Monday, March 14, 2016

How Well Did The Computers Predict The Field?

This is my annual post where I break down the RPI/Sagarin/Pomeroy numbers of the bubble teams. Below I list the ten lowest-rated at-large teams and the ten highest-rated teams that missed the NCAA Tournament. Keep in mind that I am only considering at-large eligible teams, and am not listing automatic bid winners (or postseason ineligible teams).

Note that all of these numbers are as of Monday morning (i.e. they include all of the results up through Selection Sunday but do not include any post-Selection Sunday tournaments). Also note that I have added the Massey ratings this year to include a truer measure of a team's resume.


RPI
Ten highest rated teams to miss the Tournament (NIT seed given):

30. St. Bonaventure (1)
34. Akron (6)
38. Saint Mary's (2)
39. Princeton (6)
41. San Diego State (2)
49. Valparaiso (1)
52. Monmouth (1)
54. Hofstra (5)
55. Florida (2)
62. Georgia (3)

Ten lowest rated teams to earn an at-large (seed given):
72. Syracuse (10)
63. Vanderbilt (11)
59. Temple (10)
58. Tulsa (11)
57. Michigan (11)
56. Butler (9)
53. Pittsburgh (10)
51. USC (9)
48. Cincinnati (9)
47. Wichita State (11)


Massey
Ten highest rated teams to miss the Tournament (NIT seed given):

35. Saint Mary's (2)
44. South Carolina (1)
45. Georgia Tech (4)
52. Florida (2)
54. Virginia Tech (3)
55. San Diego State (2)
56. Georgia (3)
59. Ohio State (3)
61. Valparaiso (1)
62. Kansas State (-)

Ten lowest rated teams to earn an at-large (seed given):
75. Tulsa (11)
60. Vanderbilt (11)
58. Syracuse (10)
53. Temple (10)
48. Oregon State (7)
47. VCU (10)
46. USC (9)
43. Wichita State (11)
42. Texas Tech (8)
41. Colorado (8)


Ten highest rated teams to miss the Tournament (NIT seed given):

31. Saint Mary's (2)
40. BYU (2)
43. Creighton (4)
44. Valparaiso (1)
45. Florida State (4)
47. Kansas State (-)
50. South Carolina (1)
52. San Diego State (2)
53. Clemson (-)
57. Georgia Tech (4)

Ten lowest rated teams to earn an at-large (seed given):
98. Temple (10)
66. Tulsa (11)
59. Oregon State (7)
54. Providence (9)
51. Colorado (8)
48. Texas Tech (8)
46. Dayton (7)
42. USC (9)
41. Syracuse (10)
39. Wisconsin (7)


Ten highest rated teams to miss the Tournament (NIT seed given):

34. Saint Mary's (2)
36. Valparaiso (1)
43. San Diego State (2)
44. Florida (2)
48. Creighton (4)
50. Kansas State (-)
51. South Carolina (1)
52. Florida State (4)
53. Clemson (-)
57. Houston (5)

Ten lowest rated teams to earn an at-large (seed given):
86. Temple (10)
60. Oregon State (7)
58. Tulsa (11)
56. Michigan (11)
55. Colorado (8)
54. Dayton (7)
49. USC (9)
46. Providence (9)
45. Pittsburgh (10)
42. Texas Tech (8)


Worst Teams In? Worst Snubs?
The total dominance of the RPI this season might not be obvious from the lists above. After all, St. Bonaventure was 30th in RPI, and the Selection Committee saw through their gamed RPI and left them out. Their Massey rating was just 67th. Akron's RPI was also far too high, and they were left out. Even the worst team in the field, Temple (the worst at-large team I've ever tracked in Pomeroy or Sagarin in all the years I've been blogging), had a resume that wasn't really too far off from a reasonable at-large resume. They were super lucky in close games, so despite being a very mediocre team, they had a bubbly resume.

The Selection Committee wasn't perfect at seeing through gamed RPIs, though. If you look at the ten weakest Massey ratings to get in, the most glaring teams besides Tulsa are Texas Tech (I've written about their gamed schedule before) and three Pac-12 teams (the entire Pac-12 was overrated in the RPI). So, no surprise there.

But the RPI ratings themselves weren't the problem last night. It was the RPI-related metrics:

"RPI Top 50" Uber Alles
Anytime there was a confusing seeding last night, Record vs the RPI Top 50 or Record vs the RPI Top 100 was why. Why was Texas A&M seeded ahead of Kentucky? Why was Oregon State so high? Why was St. Bonaventure left out? Why did Syracuse get in? Why were small conference teams left out? Why was Purdue behind Iowa State? The answer to every single one of those questions is: Record vs the RPI Top 50/100.

And this is a problem for two reasons. First of all, it's arbitrary, as this prescient tweet about Cincinnati points out:

Second, and more importantly...

Small conference teams are ineligible now?
When "Record vs RPI Top 50" dominates, it effectively makes small conference teams ineligible. They simply cannot get RPI Top 50 opponents on their home court. They also tend to play a much larger fraction of their games on the road, which leads to a problem when the Selection Committee doesn't take into account how much tougher those games are. For example, Syracuse got credit for a home "RPI Top 50" win over St. Bonaventure while Monmouth was killed for their three horrible "RPI 200+ losses". Unfortunately:

Monmouth was basically the poster child this year for a team that was screwed by a system that doesn't take home/road into account. Monmouth led the nation with 13 road wins, meaning that their schedule was significantly more difficult than their RPI numbers would suggest. Ken Pomeroy's numbers take this into account by using "Tier A" and "Tier B", where "Tier A" is a game equivalent to playing a Top 50 team at home, and "Tier B" is equivalent to playing a Top 100 team at home. How different do the stats get when you take home/road into account?
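To make the "Tier A"/"Tier B" idea concrete, here is a rough sketch of a location-adjusted tier lookup. Only the home cutoffs (Top 50 and Top 100) come from the description above. The neutral and road cutoffs, and the names `TIER_CUTOFFS`, `tier`, and `naive_top50`, are hypothetical placeholders for illustration, not Pomeroy's actual values.

```python
# Sketch of a location-adjusted tier lookup.
# Home cutoffs (50 / 100) follow the description in the post.
# Neutral and road cutoffs are ASSUMED placeholders, not Pomeroy's real numbers.
TIER_CUTOFFS = {
    "home": {"A": 50, "B": 100},
    "neutral": {"A": 75, "B": 150},   # assumed
    "road": {"A": 100, "B": 200},     # assumed
}


def tier(opponent_rank: int, venue: str) -> str:
    """Return 'A', 'B', or 'other' for a game, adjusting for venue."""
    cutoffs = TIER_CUTOFFS[venue]
    if opponent_rank <= cutoffs["A"]:
        return "A"
    if opponent_rank <= cutoffs["B"]:
        return "B"
    return "other"


def naive_top50(opponent_rank: int) -> bool:
    """The location-blind check: is the opponent ranked in the Top 50?"""
    return opponent_rank <= 50


# A road game at the 90th-ranked team is invisible to the naive check,
# but lands in Tier A under the assumed road cutoff.
print(naive_top50(90), tier(90, "road"))   # False A
print(naive_top50(30), tier(30, "home"))   # True A
```

Under these assumed cutoffs, a team that plays most of its games on the road picks up many more top-tier games than a flat "Top 50" count would show.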

There's a reason why Wichita State was the only team from a small/mid level conference to earn an at-large bid, and even they did it with a comically low seed (an 11 seed when they were 12th in Pomeroy and probably had an 8 or 9 seed worthy resume). Teams like Monmouth, Valparaiso, Saint Mary's, and San Diego State? Shit out of luck in the system we had this season. And that's just not good for our sport.

Reliance on RPI records kills small conference teams on the back end, too
Every member of the Selection Committee who has been interviewed since yesterday has brought up RPI 200+ losses as significantly hurting teams. The problem is, this is yet another significant bias against small conference teams. As I pointed out above with the Canisius/St. Bonaventure tweet, winning on the road against RPI 200+ teams is far from a sure thing. Sure, you should win most of the time if you're good, but major conference teams face RPI 200+ teams on the road at most once a season (and likely never), while small conference teams have to do it repeatedly. This year, Monmouth had to play 11 true road games vs RPI 200+ opponents. Eleven! Temple only had to do it five times, and they lost one. Dayton did it three times, and they lost one. Get any major conference team to play 11 true road games vs RPI 200+ opponents and they're going to lose a couple of times too. Luckily for them, they never will have to.
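The arithmetic behind that claim can be sketched quickly. The game counts (11, 5, 3) are the ones quoted above. The per-game win probability of 0.85 is purely an assumption for illustration, as is treating every game as independent with the same odds.

```python
def expected_losses(games: int, win_prob: float) -> float:
    """Expected number of losses over `games` independent games."""
    return games * (1.0 - win_prob)


def prob_at_least_one_loss(games: int, win_prob: float) -> float:
    """Chance of dropping at least one of `games` independent games."""
    return 1.0 - win_prob ** games


# ASSUMED: a good team wins 85% of true road games vs RPI 200+ opponents.
ASSUMED_WIN_PROB = 0.85

# Road-game counts vs RPI 200+ opponents, as quoted in the post.
for team, games in [("Monmouth", 11), ("Temple", 5), ("Dayton", 3)]:
    losses = expected_losses(games, ASSUMED_WIN_PROB)
    risk = prob_at_least_one_loss(games, ASSUMED_WIN_PROB)
    print(f"{team}: {games} games, {losses:.2f} expected losses, "
          f"{risk:.0%} chance of at least one loss")
```

With that assumed 85% figure, 11 such games work out to about 1.65 expected losses and roughly an 83% chance of at least one "bad loss", versus about 39% over three games. The exact numbers move with the assumption, but the gap between 11 games and 3 does not go away.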

Was the over-reliance on RPI the only thing wrong this year?
Amazingly, the answer to that question is "no". There were several other deviations from previous years. For one, the Committee weighted late games significantly less than in previous years. Normally, teams like Seton Hall and Saint Joseph's would have slid up another seed line or two for their conference tournament titles. Normally, the Committee would never allow Kentucky to beat Texas A&M head-to-head in a title game, with an overall similar resume, and still end up seeded behind them. Another deviation from previous years was the increased concern for "bad losses", as discussed above with regard to Monmouth.

If the Selection Committee wants to weight late games less, and wants to worry more about bad losses, there's no inherent reason why that's bad. But it's the inconsistency that is bad. Teams, players, coaches, and fans should know what to expect. To jerk the rationale around from year to year is just not good.

There is always a furor on Selection Sunday, and I generally try to avoid it. Usually the Selection Committee does a pretty good job, and usually I spend my "How well did the computers predict the field?" post defending most of their decisions. But this year was a tire fire in every possible way. And it has to get cleaned up, for the good of the sport, and for the sake of fairness.

What has to change?
The answer is not to "let Vegas seed the bracket" or to rank teams by Pomeroy. We should be ranking teams by resume strength, and not by how good they are. I want to care whether a buzzer beater goes in or not.

However, we have decent resume metrics. Use Massey, or use one of a variety of Elo ratings, or just average them all out. I don't care.
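"Average them all out" could be as simple as the sketch below. The team names, metric names, and ranks are all hypothetical placeholders, and `composite_rank` and `order_by_composite` are illustrative helpers rather than any published method.

```python
from statistics import mean
from typing import Dict, List

# HYPOTHETICAL ranks in three resume metrics (lower is better).
resume_ranks: Dict[str, Dict[str, int]] = {
    "Team A": {"massey": 40, "elo_1": 44, "elo_2": 48},
    "Team B": {"massey": 55, "elo_1": 47, "elo_2": 51},
}


def composite_rank(ranks: Dict[str, int]) -> float:
    """Plain average of a team's ranks across the resume metrics."""
    return mean(ranks.values())


def order_by_composite(table: Dict[str, Dict[str, int]]) -> List[str]:
    """Team names sorted from best to worst average rank."""
    return sorted(table, key=lambda team: composite_rank(table[team]))


print(order_by_composite(resume_ranks))
```

A plain average treats every metric as equally trustworthy, which is the simplest possible choice. Weighting some metrics more heavily would be an easy extension.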

If the Selection Committee wants to remain subjective so it can take into account things like rewarding strength of schedule, or favoring teams with big wins and bad losses over teams with neither, I'm totally fine with that. But ditch the RPI and give them location-adjusted numbers. That Syracuse win over St. Bonaventure? It doesn't count as a "Top 50" win anymore. That Monmouth win at UCLA? Suddenly that counts for much more.

It's an easy fix, and it wouldn't fundamentally change the way the brackets are made. Just give the Selection Committee the tools to allow them to properly weight small conference teams vs elite conference teams. Otherwise, let's cut the fiction and make small conference teams ineligible for at-large bids. Because this season, they were.
