Wednesday, March 18, 2015

How Well Did The Computers Predict The Field?

This is my annual post where I break down the RPI/Sagarin/Pomeroy numbers of the bubble teams. Below I list the ten lowest-rated at-large teams and the ten highest-rated non-NCAA Tournament teams. Keep in mind that I am only considering at-large eligible teams, and am not listing automatic bid winners (or postseason ineligible teams).

There are a bunch of really interesting things in these numbers, and I discuss them all at the bottom of the post.

Note that all of these numbers are as of Monday morning (i.e. they include all of the results up through Selection Sunday but do not include any post-Selection Sunday tournaments... even though a few have already tipped off as I'm typing this).


RPI:

Ten highest rated teams to miss the Tournament (NIT seed given):

29. Colorado State (1)
34. Temple (1)
45. Tulsa (2)
46. Old Dominion (1)
51. Iona (6)
54. Green Bay (5)
56. Richmond (1)
58. Louisiana Tech (3)
59. Stanford (2)
62. Illinois State (4)

Ten lowest rated teams to earn an at-large (seed given):
61. Indiana (10)
60. Mississippi (11)
57. LSU (9)
55. Purdue (9)
49. Oklahoma (9)
48. UCLA (11)
44. St. John's (9)
43. Iowa (7)
42. Texas (11)
40 (tied). Boise State (11)
40 (tied). Ohio State (10)


Sagarin ELO_SCORE:

Ten highest rated teams to miss the Tournament (NIT seed given):

42. Colorado State (1)
47. Miami-Florida (2)
48. Temple (1)
51. Stanford (2)
53. Texas A&M (2)
54. Illinois (3)
55. Tulsa (2)
56. Rhode Island (3)
58. Murray State (3)

Ten lowest rated teams to earn an at-large (seed given):
50. Mississippi (11)
49. Cincinnati (8)
46. UCLA (11)
45. Indiana (10)
44. Boise State (11)
43. BYU (11)
41. LSU (9)
40. Purdue (9)
39. Dayton (11)
38. North Carolina State (8)


Ten highest rated teams to miss the Tournament (NIT seed given):

33. Florida (-)
39. Illinois (3)
40. Miami-Florida (2)
42. Minnesota (-)
43. Stanford (2)
48. TCU (-)
49. Vanderbilt (5)
54. Texas A&M (2)
57. South Carolina (-)
58. Rhode Island (3)

Ten lowest rated teams to earn an at-large (seed given):
56. Oregon (8)
55. UCLA (11)
52. Cincinnati (8)
51. Dayton (11)
50. LSU (9)
47. Boise State (11)
46. St. John's (9)
44. Mississippi (11)
41. Indiana (10)
38. Georgia (10)


Ten highest rated teams to miss the Tournament (NIT seed given):

45. Florida (-)
47. Vanderbilt (5)
48. Stanford (2)
50. Texas A&M (2)
51. Richmond (1)
52. Miami-Florida (2)
55. Rhode Island (3)
56. Temple (1)
57. TCU (-)
58. Minnesota (-)

Ten lowest rated teams to earn an at-large (seed given):
53. Indiana (10)
49. Purdue (9)
46. Oregon (8)
44. Mississippi (11)
43. LSU (9)
42. St. John's (9)
41. UCLA (11)
40. Dayton (11)
39. Boise State (11)
38. North Carolina State (8)


How were the computers used differently this year?
This was a fascinating year, quite different from just about any other year we've seen.

Historically, the computer ratings have been straightforward. The RPI is always a reasonably good guide for seeds, and in years where the Sagarin PURE_ELO existed, that was slightly better, because teams were selected for resumes rather than how good they were. The Sagarin PREDICTOR and Pomeroy ratings were always fairly far off from the selected field, because they rate teams by how good they are, without any worry about resume strength.

But this year, that changed. It doesn't mean that this is a permanent change, but there was definitely a change. This year, the RPI was significantly further off than usual. In fact, the closest metric to the field, at least from the numbers above, was the Sagarin ELO_SCORE, which is an ELO-style rating that also takes into account the scores of games, putting it kind of halfway between a pure resume measure and a measure of how good teams are, like the Sagarin PREDICTOR.

The biggest victim of this was Colorado State. They were just the fourth team since the field expanded to 64+ teams with a Top 30 RPI to get left out, and they probably had the strongest resume of the bunch (the other three were 2006 Missouri State, 2006 Hofstra and 2007 Air Force). Their RPI was a bit of a gimmick of good scheduling, though. The Sagarin ELO_SCORE had them 42nd, which was still highest out of the field, but not absurdly high to get left out. The Sagarin PREDICTOR and Pomeroy ratings, meanwhile, had Colorado State in the mid-60s. So what the Selection Committee clearly saw was a team with a gimmick RPI that didn't have a Top 30 resume, and wasn't as good as their resume either. So if they were an 11 seed then so be it, but it wasn't totally out of the realm of possibility for them to get left out altogether.

Which Selection Committee decisions were most inexplicable?
Obviously UCLA getting in the field makes no sense by any metric. The Selection Committee chair claimed on television that they were "gaining steam", yet they weren't moving up in the computers and they hadn't beaten a Pomeroy Top 125 opponent in over a month. Basically, everybody watched them play Arizona in the Pac-12 tournament and was impressed by how they almost won that game. It was a bad decision that totally put to rest the Selection Committee's nonsense claim that they treat all games equally and don't put extra weight on games down the stretch.

That said, given what I said above, I think Temple actually has more of a gripe for being out than Colorado State. Temple's resume wasn't necessarily the strongest left out, but it was one of the strongest, and they were better liked by Pomeroy than Colorado State. But more importantly, they were a completely different team after adding two key transfers at the end of the fall semester, meaning that the best comparison for them was the 2010-11 USC team that added Jio Fontan after the fall semester and became significantly better afterwards. That team ended up 67th in RPI and 78th in the Sagarin ELO_CHESS, numbers which were unprecedented for an at-large team, but they got in because the Selection Committee chose to basically give them a pass for their time before Jio Fontan. The same goes for Temple, of course. And Temple not only went 13-5 in the AAC and won 11 of their final 14 games, but they also scheduled tough, knocking off Kansas and playing Villanova and Duke. We've seen year after year that the Selection Committee rewards teams for putting tough teams on their schedules (like Georgetown sliding up to a 4 seed this year), yet they ignore Temple?

There simply is no way, under the old methodology or the new, to justify Temple being out of the field. Colorado State, in contrast, was screwed simply because the Selection Committee changed their philosophy this year and, for the first time, started to care how good teams actually were.

What does this mean going forward?
The Selection Committee isn't going to start ranking teams by Pomeroy, of course. Florida was rated the best team out of the field by both Sagarin and Pomeroy, yet they were also sub-.500 this season and were not a contender for the NIT, let alone an NCAA Tournament at-large bid.

And really, as I've argued before, it wouldn't be fair or right to judge teams purely by how good they are. Even the system that I've seen a few analytics-savvy folks argue for, of using resumes to pick the Field of 68 but then using something like Pomeroy to seed the 68 teams, seems wrong to me. Do we really want a sport where Texas ends up with a better seed than Maryland? Sure, Maryland was staggeringly lucky to win as many games as they did, but they still won those games. If we shove them back to a 9 or 10 seed, then what's the point of the regular season? Why do we care that some buzzer beater went in or out if it's not going to have any significant effect on their Pomeroy score?

In my opinion, and I'm basing this just on one year's bracket and we could see the philosophy flip back next season, I think that the change we are going to see is more subtle. Basically, the Selection Committee is now aware of the fact that other metrics exist which say Maryland and Colorado State are overrated by the RPI while Iowa was underrated. And this knowledge becomes a tiebreaker. So, even though Maryland's resume clearly deserved a 3 seed, they were slid back to a 4. Even though Colorado State's resume deserved a 9 or 10 seed, they were slid back onto the edge of the bubble. Iowa's resume deserved an 8 or 9 and they were slid up to a 7. It certainly could have played a role in why Mississippi, a team that most projected to be out of the field and which was 60th in RPI, ended up getting in.

Like I said, this is just a theory. And it's certainly not what you're going to hear from Selection Committee chairs on television, who are incredibly scripted and opaque and give stupid answers to questions (like that UCLA "was gaining steam"). But it would make sense with the growing awareness of analytics, and it will be fascinating to see if this trend continues with a different Selection Committee next season.


Anonymous said...

I enjoyed this post and mostly agree with your analysis. I don't really know if I like the change in philosophy, and one of the tough things is that with different people on the committee every year, the specific criteria are always going to float some.

Mostly I just agree with the idea that using Pomeroy/SagarinPredictor would be a really bad thing for seeding or, worse, selection.

Bruin fan said...

So does UCLA still make no sense?

Jeff said...

Yes, UCLA should still not be in. Selection can never and should never be justified by tourney performance... unless you think UAB should have been seeded ahead of Iowa State also?