College Baseball Power Rankings Theory Overview, Part Two

Part one of Ryan Nelson’s College Baseball Power Rankings statistical dive can be found here.

Welcome to Part Two of Collegiate Baseball Scouting Network’s Power Ranking Series. For those of you who have read the first part of this series, feel free to skip this first paragraph, as I will simply be summarizing for those who missed it. In Part One, I introduced our new metric called Team Rating. This aptly named statistic is a holistic measure of a team’s strength based on its runs scored and allowed, as well as its opponents’ strength. By combining the best of both the NCAA’s RPI team ranking algorithm and a Pythagorean win-percentage variant, we were able to, in our opinion, more accurately measure a team’s season performance than any previously available method. If you are interested in learning more of the details, you can read the full write-up here.

While this method of ranking teams is useful, it is not completely unique. There are other Power Ranking algorithms out there, including the previously mentioned RPI, which is the NCAA’s official algorithm for ranking teams and seeding tournaments. So while we think that our algorithm might prove slightly better in the long run, it is not completely novel. In today’s piece, we would like to break out something completely new.

After one determines that a team is good, the next logical step is to ask why the team is good, or rather, which aspects of the team are good. In baseball, there are many facets that can make a team “good” or “bad.” There are obvious ones, like hitting or pitching; less obvious ones, like defense, baserunning, or coaching; and those that are almost impossibly minuscule but still have the potential to add up, like team nutrition, travel schedule, or even the aerodynamics of the team’s uniforms. But at the end of the day, every benefit or detriment falls into one of two categories: run-scoring and run-prevention. In a game of runs, only two things matter: scoring as many runs as you can and holding your opponent to as few as you can. So we decided to create metrics that measure a team’s ability to do each of those things.

We decided to name these two metrics Batting Rating and Pitching Rating. As I mentioned above, there is more to run-scoring than batting (namely baserunning), and there is more to run-prevention than pitching (namely fielding), but Run-Scoring/Run-Prevention Rating is a mouthful, and Offense/Defense Rating is so generic it hurts, so Batting/Pitching Rating it is!

The guts of these metrics are very similar to Team Rating’s, in that they are based on runs, not wins, and they factor in opponent strength. The key difference is that they look at offense and defense individually instead of together. To do this, we use the same Pythagenport formula as in Team Rating, but instead of using both runs scored and runs allowed, we only use the value we are trying to measure, and we assign a perfectly average run total to the other side of the equation.

As an example, we will look at Kentucky’s offense from last year (spoiler alert: they rank 1st in Batting Rating from last season). Kentucky scored 484 runs in 66 games last year, a very solid total. Since we are measuring their offense, this will be their runs scored. But because we only care about Kentucky’s offense, we don’t want to include their runs allowed in the formula. Instead, we will pretend that Kentucky had a perfectly average pitching staff and defense. The average runs allowed per game in Division I baseball last year was 5.66, so a perfectly average staff would have allowed Kentucky’s opponents to score approximately 373.6 runs over 66 games. If we use this same fictitious, average pitching staff for all teams when calculating Batting Rating, then all teams will be measured solely on their offensive ability. All of this is simply reversed when measuring Pitching Rating.
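
To make the Kentucky arithmetic concrete, here is a minimal sketch of the offense-only calculation. It uses the standard published Pythagenport exponent (1.50 × log10 of combined runs per game, plus 0.45); our exact constants are not reproduced in this article, so treat them as an assumption rather than the production formula.

```python
import math

def pythagenport_wpct(runs_scored, runs_allowed, games):
    """Pythagenport expected win percentage.

    Uses the commonly published exponent x = 1.50 * log10(RPG) + 0.45,
    where RPG is combined runs per game for both sides.
    """
    x = 1.50 * math.log10((runs_scored + runs_allowed) / games) + 0.45
    return runs_scored ** x / (runs_scored ** x + runs_allowed ** x)

# Kentucky's offense: 484 runs scored in 66 games.
# Swap in a league-average defense: 5.66 runs allowed per game.
LEAGUE_AVG_RA_PER_GAME = 5.66
games, runs_scored = 66, 484
avg_runs_allowed = LEAGUE_AVG_RA_PER_GAME * games  # ~373.6 runs

offensive_wpct = pythagenport_wpct(runs_scored, avg_runs_allowed, games)
print(f"{offensive_wpct:.3f}")
```

With these inputs the offense-only win percentage comes out around 0.63, well above the .500 a perfectly average offense would produce against the same fictitious staff.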

So now that we have this half-reality-based, half-fabricated Pythagenport win percentage, what comes next? The next step is to control for opponent strength. The method is very similar to the one used for Team Rating, with one major twist. When measuring Team Rating, we calculated the strength of a team’s opponents and the strength of those opponents’ opponents. We will also go two levels deep with Batting/Pitching Rating, but rather than measure an opponent’s overall strength, we only measure the opponent’s corresponding offensive or defensive Pythagenport win percentage.

If it is not immediately clear why we would do this, think of it this way: if we were to measure the strength of Kentucky’s offense, would we care about the strength of their opponents’ offenses? No, because Kentucky’s offense does not face their opponents’ offenses; it faces their opponents’ pitching staffs. So we use Kentucky’s opponents’ Pitching Ratings. And when we go to that second level, who do Kentucky’s opponents’ pitching staffs face? They face the opposing offenses, so the opponents’ opponents’ Batting Rating is what matters. In the table below, you can see it laid out simply and in a more easily digestible format than the wall of text I just attempted to erect:

The above flip-flop pattern of which aspect to measure ensures that we are only measuring the relative strengths of the teams, not the holistic strength.
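
The flip-flop adjustment can be sketched in a few lines. Note that the weights below (0.50/0.30/0.20) are hypothetical placeholders purely for illustration; the actual weighting of the three parts is not disclosed in this article, and the `schedule`, `off_wpct`, and `def_wpct` structures are invented for the sketch.

```python
def batting_rating(team, off_wpct, def_wpct, schedule):
    """Opponent-adjusted offensive rating (illustrative weights only).

    off_wpct / def_wpct: offense-only and defense-only Pythagenport
    win percentages per team; schedule: team -> list of opponents.
    """
    opponents = schedule[team]
    # Level 1: the pitching staffs this offense actually faced.
    opp_pitching = sum(def_wpct[o] for o in opponents) / len(opponents)
    # Level 2: flip back -- those pitching staffs face opposing offenses.
    opp_opp_batting = sum(
        off_wpct[oo] for o in opponents for oo in schedule[o]
    ) / sum(len(schedule[o]) for o in opponents)
    # Hypothetical 50/30/20 weighting of the three components.
    return 0.50 * off_wpct[team] + 0.30 * opp_pitching + 0.20 * opp_opp_batting

# Toy three-team round robin with made-up win percentages:
schedule = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
off = {"A": 0.600, "B": 0.500, "C": 0.400}
dfn = {"A": 0.500, "B": 0.550, "C": 0.450}
print(round(batting_rating("A", off, dfn, schedule), 3))
```

Pitching Rating would be the mirror image: the team’s own defensive win percentage, its opponents’ Batting Ratings, and the opponents’ opponents’ Pitching Ratings.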

After weighing each of the three parts appropriately, we have our final metrics: Batting Rating and Pitching Rating. One last note before I reveal a retrospective ranking of last year’s teams: besides the raw Batting/Pitching Ratings, you will also find a scaled rating and a standardized rating. These are two different ways of presenting the ratings, each with a distinct advantage. For the scaled data, the best team is always a perfect 1.000, with subsequent teams being lower; the advantage is that it makes for a very easy comparison between two offenses. For the standardized data, the average team is exactly 0.000, with higher numbers being better and lower numbers worse. Those of you with a knowledge of statistics will recognize that a standardized rating of 1.000 is 1 standard deviation above average, 2.000 is 2 standard deviations above average, and so on, and vice versa for negative values. The benefit is that it allows for better comparison between years (is the best offense this year better than the best offense last year or the year before?), and it allows the comparison of an offense to a defense (if the best offense has a standardized rating of 2.470 and the best defense has a standardized rating of 2.682, the defense is theoretically better than the offense).
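
The two presentations are simple transformations of the raw ratings. A sketch, assuming the scaled figure is the rating divided by the maximum and the standardized figure is a population z-score (the article does not spell out either formula, so both are assumptions):

```python
import statistics

def scale(ratings):
    """Scaled: divide by the best rating, so the top team is exactly 1.000."""
    best = max(ratings)
    return [r / best for r in ratings]

def standardize(ratings):
    """Standardized: z-scores, so the average team is exactly 0.000 and
    each unit is one standard deviation (population stdev assumed)."""
    mean = statistics.mean(ratings)
    stdev = statistics.pstdev(ratings)
    return [(r - mean) / stdev for r in ratings]

# With any list of raw ratings, the leader always scales to 1.000:
print(scale([0.5922, 0.5762, 0.5625])[0])
```

Note that the standardized figures in the tables below are computed over the full Division I population, not the handful of teams shown, so running this sketch on only the listed rows will not reproduce them exactly.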

And now that I have said my piece, here are the retrospective rankings of 2017’s final season results:

| Team | Batting Rating | Scaled | Standardized | Rank |
| --- | --- | --- | --- | --- |
| Florida St. | 0.5755 | 0.991 | 2.410 | 2 |
| Wake Forest | 0.5733 | 0.987 | 2.337 | 3 |
| North Carolina | 0.5624 | 0.968 | 1.964 | 7 |
| Southern Miss | 0.5582 | 0.961 | 1.822 | 10 |
| New Mexico | 0.5553 | 0.956 | 1.723 | 11 |
| South Ala. | 0.5523 | 0.951 | 1.620 | 14 |
| Texas Tech | 0.5518 | 0.950 | 1.604 | 15 |
| Sam Houston St. | 0.5472 | 0.942 | 1.446 | 17 |
| Georgia Tech | 0.5468 | 0.941 | 1.435 | 18 |
| Texas A&M | 0.5466 | 0.941 | 1.428 | 20 |
| Southeastern Louisiana | 0.5463 | 0.940 | 1.419 | 21 |
| Virginia Tech | 0.5451 | 0.938 | 1.375 | 22 |

| Team | Pitching Rating | Scaled | Standardized | Rank |
| --- | --- | --- | --- | --- |
| Oregon St. | 0.5922 | 1.000 | 2.760 | 1 |
| North Carolina | 0.5762 | 0.973 | 2.259 | 3 |
| Texas Tech | 0.5625 | 0.950 | 1.828 | 7 |
| Miami (FL) | 0.5597 | 0.945 | 1.741 | 10 |
| South Carolina | 0.5586 | 0.943 | 1.705 | 12 |
| Ole Miss | 0.5585 | 0.943 | 1.701 | 13 |
| Long Beach St. | 0.5551 | 0.937 | 1.594 | 15 |
| Missouri St. | 0.5532 | 0.934 | 1.534 | 16 |
| Florida St. | 0.5516 | 0.931 | 1.484 | 19 |
| Texas A&M | 0.5500 | 0.929 | 1.434 | 21 |
| Cal St. Fullerton | 0.5493 | 0.928 | 1.412 | 23 |
| North Carolina St. | 0.5492 | 0.927 | 1.410 | 24 |
| Sam Houston St. | 0.5484 | 0.926 | 1.384 | 25 |


