Monday, March 12, 2018

How Well Did The Computers Predict The Field?

While I no longer have the time to do regular blogging, I'd like to try to keep these annual "How well did the computers predict the field?" posts going for the sake of historical data.

The first 2018 data: the root mean squared difference between each team's actual seed and the seed it would have received if the field were ranked strictly by each rating system (1-11 seeds only). Again I am measuring RPI, BPI Strength of Record (resume strength) and Pomeroy (pure team strength).

Note that all of these numbers are as of Monday morning (i.e. they include all of the results up through Selection Sunday but do not include any post-Selection Sunday tournaments).

Average Rating Error:
2.22 - Pomeroy
2.25 - BPI SOR
2.96 - RPI
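
For anyone who wants to reproduce this kind of number, here is a minimal sketch of the calculation. It assumes the "implied" seed comes from sorting the field by rating rank and filling seed lines four teams at a time; that bucketing rule, the function names, and the toy data are illustrative assumptions, not the exact procedure or data behind the numbers above.

```python
import math

def implied_seeds(rating_ranks, teams_per_line=4):
    """Seed line each team would get if the field were ordered
    strictly by rating rank (lower rank = better team)."""
    ordered = sorted(rating_ranks, key=rating_ranks.get)
    return {team: i // teams_per_line + 1 for i, team in enumerate(ordered)}

def rms_seed_error(actual_seeds, rating_ranks, teams_per_line=4):
    """Root mean squared difference between actual and implied seeds."""
    implied = implied_seeds(rating_ranks, teams_per_line)
    squared = [(actual_seeds[t] - implied[t]) ** 2 for t in actual_seeds]
    return math.sqrt(sum(squared) / len(squared))

# Made-up toy field (NOT real 2018 data): 8 teams, 2 per seed line.
actual = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3, "F": 3, "G": 4, "H": 4}
ranks = {"A": 1, "B": 5, "C": 2, "D": 9, "E": 12, "F": 20, "G": 15, "H": 30}

print(round(rms_seed_error(actual, ranks, teams_per_line=2), 2))  # 0.71
```

In the toy field, four of the eight teams are off by one seed line and the rest match, so the error is sqrt(4/8), or about 0.71. Restricting to the 1-11 seeds would just mean filtering the two dictionaries before calling the function.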


RPI

Ten highest rated teams to miss the Tournament (NIT seed given):

33. Middle Tennessee (3)
34. USC (1)
38. Louisville (2)
39. Western Kentucky (4)
40. Saint Mary's (1)
50. Boise State (4)
52. Temple (5)
55. Northeastern (-)
56. Nebraska (5)
58. Marquette (2)

Ten lowest rated teams to earn an at-large (seed given):
66. Arizona St (11)
64. North Carolina St (9)
61. Virginia Tech (8)
54. Florida St (9)
53. Kansas St (9)
51. Texas (10)
49. Oklahoma (10)
46. Florida (6)
45. Syracuse (11)
44. Creighton (8)

BPI Strength of Record

Ten highest rated teams to miss the Tournament (NIT seed given):

31. Nebraska (5)
41. Marquette (2)
42. Louisville (2)
43. Oklahoma State (2)
44. Baylor (1)
48. Middle Tennessee (3)
50. Saint Mary's (1)
52. Mississippi State (4)
53. Notre Dame (1)
55. Maryland (-)

Ten lowest rated teams to earn an at-large (seed given):
69. Arizona State (11)
54. UCLA (11)
49. Missouri (8)
47. Syracuse (11)
40. Oklahoma (10)
39. Florida State (9)
38. North Carolina St (9)
37. Nevada (7)
36. Butler (10)
35. Creighton (8)


Pomeroy

Ten highest rated teams to miss the Tournament (NIT seed given):

28. Saint Mary's (1)
29. Penn State (4)
31. Notre Dame (1)
33. Louisville (2)
34. Baylor (1)
40. USC (1)
46. Maryland (-)
52. Middle Tennessee (3)
53. Marquette (2)
56. Oklahoma St (2)

Ten lowest rated teams to earn an at-large (seed given):
69. St. Bonaventure (11)
63. Providence (10)
54. Syracuse (11)
51. Alabama (9)
49. Rhode Island (7)
48. UCLA (11)
47. Oklahoma (10)
45. Arizona St (11)
44. Kansas St (9)
42. NC State (9)


Does The RPI Matter As Much As It Used To?
One of the fascinating statistical quirks this year is that by pure computer numbers, the Pomeroy ratings were actually a slightly better predictor of NCAA Tournament seed than the RPI or the BPI. Last year, the BPI was the strongest.

Another piece of evidence for raw RPIs mattering less than they used to is which teams got left out. In the decade-or-so that I've been tracking this data, there have been plenty of RPI Top 40 teams to get left out, but only mid-majors. This year was the first time I tracked an RPI Top 40 major conference team getting left out, and in fact this year there were two - USC and Louisville (more on them in a moment). At the same time, while RPI 60+ teams have gotten in before (there's usually been 1 or 2 per year since the bracket expanded to 68), having two of them earn single-digit seeds is interesting. And Oklahoma State was on the bubble with an RPI of 88, which would have blown away the all-time record for worst RPI to ever earn an at-large bid (72, by Syracuse in 2016).

That said, while this is all an improvement, it's a minor one. The fact is that raw RPI ratings have never really been the way that RPI dominates the Selection process. Nobody in the last decade or two would ever argue that an RPI #30 team needs to be ahead of an RPI #40 team for that reason alone - they'd argue it using other metrics, such as "Record vs RPI Top 50". The problem with those metrics is that they have a huge major conference bias, since mid-majors can very rarely get RPI Top 50 teams on their home court. To fix this, the NCAA made a big show this year of switching to a quadrant system which rewards playing on the road.
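
For readers who haven't seen the definitions, here is a rough sketch of how I understand the quadrant buckets to work. The rank cutoffs are my recollection of the NCAA's 2018 team sheets (treat them as an assumption), and the names `QUADRANT_CUTOFFS` and `quadrant` are made up for illustration.

```python
# Worst opponent RPI rank that still counts for Quadrants 1, 2 and 3,
# by game site. These cutoffs are an assumption based on the NCAA's
# 2018 definitions as I understand them; anything worse is Quadrant 4.
QUADRANT_CUTOFFS = {
    "home":    (30, 75, 160),
    "neutral": (50, 100, 200),
    "away":    (75, 135, 240),
}

def quadrant(opponent_rpi_rank, site):
    """Return the quadrant (1-4) for a game, given opponent rank and site."""
    for q, cutoff in enumerate(QUADRANT_CUTOFFS[site], start=1):
        if opponent_rpi_rank <= cutoff:
            return q
    return 4

print(quadrant(60, "home"))  # 2
print(quadrant(60, "away"))  # 1
```

The same RPI #60 opponent counts as a Quadrant 1 game on the road but only a Quadrant 2 game at home, which is the whole point of the change.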

This all sounds great! Unfortunately, it didn't quite go to plan...

"Quadrants? What are quadrants?"
What Is The Point Of Quadrants, Exactly?
After the NCAA made a big deal of the quadrant system this offseason, it's remarkable how little the quadrants actually came up yesterday. I watched the entire CBS/TNT Selection Show, and then a couple hours of ESPN bracket analysis, and I think I heard the word "tier" or "quadrant" come up just two or three times total.

To demonstrate the problem of ignoring one's own metrics, let's take the instructive case of Syracuse vs Middle Tennessee:

Record vs RPI Top 100:
8-11 Syracuse
4-6 Middle Tennessee

Record vs RPI Tiers 1+2:
7-11 Syracuse
5-4 Middle Tennessee

By the traditional "Record vs RPI Top 100" metric, it looks like Syracuse and Middle Tennessee both won about 40% of their games vs decent opponents, and thus we can give Syracuse the tiebreak because they had the higher RPI win (RPI #11 Clemson). But the quadrant/tier system significantly improves Middle Tennessee's numbers, and recognizes that they played a lot more on the road than Syracuse. In fact, Middle Tennessee led all of Division I with 12 road victories (a 12-1 record), while Syracuse went just 4-6 in true road games.
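
To make the arithmetic explicit, here is a quick sketch that turns the records listed above into winning percentages. The helper name `win_pct` is just for illustration.

```python
def win_pct(record):
    """Convert a 'W-L' string such as '8-11' into a winning percentage."""
    wins, losses = (int(x) for x in record.split("-"))
    return wins / (wins + losses)

# Records copied from the two lists above.
records = {
    "RPI Top 100": {"Syracuse": "8-11", "Middle Tennessee": "4-6"},
    "RPI Tiers 1+2": {"Syracuse": "7-11", "Middle Tennessee": "5-4"},
}

for metric, teams in records.items():
    for team, record in teams.items():
        print(f"{metric:14} {team:17} {record:5} {win_pct(record):.1%}")
```

This puts Syracuse at 42.1% and Middle Tennessee at 40.0% against the RPI Top 100, but Syracuse at 38.9% and Middle Tennessee at 55.6% against Tiers 1+2.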

Of course, if we want to really use "analytics", we can abandon the crappy RPI metrics and just use Pomeroy's tier system:

Record vs KenPom Tier A:
3-8 Syracuse
3-3 Middle Tennessee

Once we look at non-RPI metrics, we recognize that in fact Syracuse's best win wasn't over Clemson at all (since it came at home), but on the road at Louisville. Suddenly, a Syracuse team which played all season in the ACC somehow ended up with its most impressive win coming over an NIT team? Woof. Meanwhile, Middle Tennessee's victory at Murray State rates really strong (not as strong as Syracuse's best win, but close, despite far fewer chances against elite opponents).

You can apply this same analysis to other quirky at-large teams. Those two RPI Top 40 major conference teams that got left out? USC was left out because they had literally zero RPI Top 25 wins. Louisville went an ungodly 0-11 vs the RPI Top 50. These are all traditional, RPI-heavy reasons why teams got left out. The fact that Louisville's record vs quality opponents looks different by the better metrics (a 4-10 record vs KenPom Tier A opponents, which is similar to other major conference bubble teams) didn't matter, because the Selection Committee is still stuck in RPI-based metrics.

When The RPI Does And Does Not Matter Is Instructive
In terms of pure strength of record, it's clear that the best resume left out was Nebraska. It was noted by quite a few analytics folks this year that the RPI was just way down on the Big Ten compared to better metrics. The league was certainly down, but not as much as the RPI thought it was. So it's not a surprise that Nebraska actually showed up as the strongest resume to get left out of the field via BPI despite clearly not even being a serious bubble team on Selection Sunday (only earning a 5 seed in the NIT). Heck, even Maryland was one of the ten best resumes left out according to BPI, and they couldn't even get into the NIT.

Yet interestingly, the glamor teams in the Big Ten didn't suffer this same fate. Michigan State was 14th in RPI and just 3-4 vs the RPI Tier 1 (a worse RPI Tier 1 record than San Diego State, who needed an auto bid to make the field), yet they still got a 3 seed. Ohio State and Purdue also were a seed line or two higher than their RPI resumes really should have put them (since the RPI really viewed the Big Ten as basically a strong mid-major conference this season). The Selection Committee still ranked those teams highly because they are sexy #brands with a couple of sexy wins on national television. The RPI data got ignored when it was convenient.

Thinking, Fast And Slow
The inconsistent use of RPI, and the total abandonment (right when it was convenient) of the very RPI quadrant system that the NCAA created this season, are a reminder that, fundamentally, the Selection process is irrational.

I don't mean that in the sense of "LOL what a bunch of morons!" Everybody in that Selection Committee room is a reasonably intelligent and accomplished adult. What I mean is the basic concept of the classic book "Thinking, Fast And Slow", by Nobel Prize winner Daniel Kahneman. Summarizing decades of fascinating psychological research, Kahneman points out that human thought can be fundamentally separated into two categories: System 1 (fast, automatic, stereotypic, unconscious) and System 2 (slow, effortful, infrequent, logical, calculating).

What the research shows is that pure rational thought is actually very difficult and emotionally taxing, so much so that the body reacts to it very similarly to how it reacts to a difficult physical workout. It's why after taking difficult tests in school you often feel physically exhausted despite not leaving your chair for three hours. It's just hard damn work. Also, it takes a long time to solve even the simplest linear problem, and we simply could not make it through life intricately breaking down each decision we make. And thus most of our decisions in life are System 1. A complex multi-variable problem like picking out an apartment (price, square footage, view, neighborhood, number of rooms, furniture, kitchen, amenities, parking, etc) will end up just coming down to a snap judgment - we all just invent a post-hoc narrative to explain what was fundamentally an emotional and irrational decision.

The issue with NCAA Tournament selection is that the process is actually monumentally complicated. We are supposed to judge dozens of different teams (and their opponents) by winning percentage, strength of schedule, strength of record, best wins, best records vs arbitrary quadrants, multiple team strength metrics, road warriors, conference titles, injured and suspended players, and more. As I wrote in 2014 when I (accurately) predicted how college football playoff selection would play out, it's simply an impossible problem to tackle. If I ask you to solve a series of math problems like 4y + 8 = 28 then I presume that most of my readers could solve them, even if it would get exhausting after a while, but those are linear problems. NCAA Tournament selection is so complex that even a powerful computer model cannot really make sense of it (the BPI tournament odds that ESPN kept shoving down our throats in February and early March were a constant source of mirth and amusement).

So what happens when you have a problem too complex to solve? Daniel Kahneman explains that what we do is to reflexively solve the problem using System 1 thinking: We intuitively come to an answer, and then grasp for the justification later. So the Selection Committee makes snap mental judgments, driven heavily by subconscious stereotyping and narratives, and then decides later which metrics justify the decisions that they already wanted to make.

And this is why the mid-majors like Middle Tennessee will simply always lose those bubble battles with 8-10 Big 12 or 9-9 ACC teams. Subconsciously, the people on that Selection Committee simply will not understand that a road game at Murray State is equivalent to a home game vs a Top 25 team. It just doesn't mentally compute. When Selection Chair Bruce Rasmussen said that Middle Tennessee went out and scheduled good teams "but just didn't beat any of them", he apparently wasn't aware that they actually won their 2nd toughest non-conference game. He didn't realize that the road game at Murray State (which they won by 5) was a tougher game than the neutral court game vs USC (which they lost by 5).

In the end, System 1 thinking is always going to dominate in the Selection room. Forcing new quadrant metrics on them won't change anything, because fundamentally these are snap, irrational judgments. The quadrants will only get referenced when it is convenient to reference them - they won't drive decisions. The only way to adjust the results of System 1 thinking is to get newer, younger, analytically savvy people in the room who fundamentally, subconsciously, emotionally understand how difficult a road game is vs a decent mid-major. Then and only then will anything change.