The RPides of March: Does RPI predict NCAA inclusion and seeding?

If the middle to end of March is "March Madness", then the beginning of the month could be called "Amateur Stats Madness". These days, the discussion in college basketball is less about performance on the court, or trumped-up controversies on and off it, and more about the RPI.

There is a strong sense that the RPI is a crucial metric that can predict tournament inclusion and seeding. But even if the RPI had no effect, journalists would have a reason to act as if it did. While polls change weekly and the standings change 2-3 times a week, RPI jumps around every day there is a Division I game. That means that Joe Lunardi gets to publish a new bracketology chart every day and that his on-camera pals get to do segment after segment about who is in, or on "the bubble". We shouldn't assume that RPI is important simply because we hear a lot about it.

To get a sense of how relevant the index is to what happens later in March, I looked at RPI in March 2013 as a predictor of at-large bids and of seeds.

Getting In

First, I compared the 2013 at-large field to the field that would have existed had RPI been a perfect predictor of who made it. This was an exercise in arithmetic and nothing more.
 

In 2013 there were 37 at-large teams in total. If none of the top 37 RPI teams got automatic bids, and if RPI were a perfect predictor of getting a bid, then we would expect the team with the 37th best RPI (Illinois last year) to get a bid, and the 38th RPI team (NC State) to get an NIT bid.

Of course the cutoff isn’t actually 37, because plenty of automatic bids do go to top RPI teams. Last year 11 teams (including our Lobos) had top-37 RPIs and got automatic bids. Assuming that the number of at-large bids is fixed, this raises the cutoff by 11, to 48.

But last year that was STILL too low a cutoff, because there were two more auto-qualifying teams with RPI ranks between 37 and 48 (Ole Miss and Akron), which pushes the cutoff to 50. And even that was too low, because UConn (RPI 48) was ineligible to make the tournament, which pushes it to 51. Ultimately all eligible top-51 RPI teams would make it in, if RPI were a perfect predictor. So how did RPI do?

It wasn’t perfect…but it almost was. 97% of at-large bids (36 of 37) went to teams with top-51 RPIs.

There was exactly one team that was not invited to the tournament even though it had a top-51 RPI: Southern Miss (RPI 31). If this seems bizarre, then you can read this writer try to explain it away. I personally don’t discount the role of schedule strength, net of RPI, and ultimately conference affiliation in explaining why a Conference USA school would get omitted. Which team from outside the top 51 got an at-large bid? That would be Villanova (RPI 54), a blueblood from an elite conference.

In the end, one team’s really solid RPI didn’t help it make the tournament, while another made it with a rank slightly below the cutoff. RPI could not have been the only thing to earn 'Nova a bid, but its RPI was very close to the adjusted cutoff.
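The cutoff arithmetic in this section can be written out as a short Python sketch. The counts are the ones quoted above for 2013; the function and variable names are my own and purely illustrative.

```python
# Counts quoted in the post for the 2013 field.
AT_LARGE_BIDS = 37
AUTO_BIDS_IN_TOP_37 = 11          # automatic qualifiers with top-37 RPIs
AUTO_BIDS_BETWEEN_37_AND_48 = 2   # Ole Miss and Akron
INELIGIBLE_INSIDE_CUTOFF = 1      # UConn (RPI 48)


def adjusted_cutoff(at_large_bids, *freed_slots):
    """Each top team that needs no at-large bid pushes the cutoff down one rank."""
    return at_large_bids + sum(freed_slots)


cutoff = adjusted_cutoff(
    AT_LARGE_BIDS,
    AUTO_BIDS_IN_TOP_37,
    AUTO_BIDS_BETWEEN_37_AND_48,
    INELIGIBLE_INSIDE_CUTOFF,
)

# One at-large bid went to a team outside the cutoff: Villanova (RPI 54).
at_large_outside_cutoff = 1
share_inside = (AT_LARGE_BIDS - at_large_outside_cutoff) / AT_LARGE_BIDS

print(cutoff)                  # 51
print(f"{share_inside:.0%}")   # 97%
```

The function just adds up the slots that do not count against the at-large pool, which is all the "adjusted cutoff" amounts to.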

Getting a Seed

Maybe RPI is some kind of filter to get in, but all sorts of other factors decide seeding. Do teams with the best RPIs get the best overall ranks from the committee and ultimately the best seeds? To look at this, I created a very simple regression model where the natural log of overall NCAA rank is predicted by the natural log of RPI rank. The results are quite supportive of the RPI.

On average for tournament teams, a full 60% of the variation in a team’s 2013 NCAA rank can be explained by its end-of-season RPI rank. A 4-rank increase in a team’s RPI rank would explain a 1-rank increase in tournament rank. (My results are significant at the .01 level.)
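For readers who want to try this kind of model themselves, here is a minimal sketch of a log-log regression in plain Python. It is not the exact code behind the numbers above; the function name is hypothetical, and the ranks in the example are made up for illustration and are NOT the 2013 data.

```python
import math


def fit_log_log(rpi_ranks, ncaa_ranks):
    """Ordinary least squares of ln(NCAA rank) on ln(RPI rank).

    Returns (intercept, slope, r_squared).
    """
    x = [math.log(r) for r in rpi_ranks]
    y = [math.log(r) for r in ncaa_ranks]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of variance explained
    return intercept, slope, r_squared


# Made-up ranks, for illustration only (not real tournament data).
toy_rpi = [1, 3, 6, 10, 20, 35, 50]
toy_ncaa = [2, 1, 9, 7, 25, 30, 44]

intercept, slope, r_squared = fit_log_log(toy_rpi, toy_ncaa)
print(f"slope={slope:.2f}, R^2={r_squared:.2f}")
```

In a log-log model the slope describes proportional change: roughly, the percent change in NCAA rank that goes with a one percent change in RPI rank. The R-squared is the "percent explained" figure quoted in the text.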

When I run the model for at-large teams only, it does even better: 65% of the variation in overall rank can be explained. This suggests that RPI is probably an important tool used in seeding, alongside factors like minimizing conference matchups, lowering travel for top teams, and who knows what else. Below is a graph showing that tournament rank (logged) varies closely with RPI rank (also logged).

Patrick Adler, 2014
NOTE: Logs are taken because the relationship is not perfectly linear.


Together, these results validate the RPI as a barometer for the tournament. Last year at least, RPI was a really good predictor of who made the tournament and who didn't. RPI seems so powerful that these obsessive exercises in bracket soothsaying are probably over the top.

These numbers do not demonstrate that RPI is a better predictor than the other indices out there, which are surely highly correlated with RPI. I'm sure that ESPN can make a similar case for its own BPI. And of course, they do not demonstrate that RPI is a good indicator of team strength or a good forecast of success once the madness begins.

 

 

An early version of this post appeared on TheLoboLair.com. Thanks to everyone there for their comments.