Tuesday, March 17, 2009

How Well Did The Computers Predict The Field?

If you're on this site looking for help filling out your brackets, just scroll down to the next post. This post tackles the computer rankings as objective tools for predicting the behavior of the Selection Committee. Before getting into the numbers, my feeling was that this was not one of the Selection Committee's better jobs of seeding teams (Boston College as a 7? Marquette as a 6? Wisconsin as a 12?), but that they picked the field fairly well. There was certainly bias in favor of the Big East and ACC, and against the mid-majors and the Big Ten. But let's see how the three major computer rankings (RPI, Sagarin and Pomeroy) did at predicting the field. For the record, the Sagarin numbers here are the ELO_CHESS, because it only looks at wins and losses and so is the version that applies to what the Committee is doing. The PREDICTOR is useful for predicting which teams will go far in the Tournament, not which teams will get in. Here are the numbers:

RPI:

Ten highest rated teams to miss the Tournament (NIT seed given):
34 - San Diego State (1)
40 - Creighton (1)
46 - UAB (7)
47 - Illinois State (5)
48 - Saint Mary's (2)
49 - Niagara (3)
51 - George Mason (7)
53 - Tulsa (4)
54 - Florida (1)
56 - Baylor (3)

Ten lowest rated teams to earn an at-large (seed given):
62 - Arizona (12)
60 - Boston College (7)
55 - Maryland (10)
45 - Wisconsin (12)
44 - Michigan (10)
42 - Minnesota (10)
41 - Texas (7)
39 - California (7)
37 - LSU (8)
36 - Texas A&M (9)

Sagarin ELO_CHESS:

Ten highest rated teams to miss the Tournament (NIT seed given):
35 - Penn State (2)
42 - Miami (Fl) (4)
44 - Saint Mary's (2)
45 - Northwestern (5)
46 - San Diego State (1)
48 - Creighton (1)
51 - Notre Dame (2)
52 - Virginia Tech (2)
53 - Baylor (3)
55 - Providence (5)

Ten lowest rated teams to earn an at-large (seed given):
50 - Arizona (12)
47 - Tennessee (9)
43 - Dayton (11)
40 - Maryland (10)
39 - Texas A&M (9)
38 - LSU (8)
37 - Boston College (7)
34 - Texas (7)
33 - BYU (8)
32 - Utah (5)

Pomeroy:

Ten highest rated teams to miss the Tournament (NIT seed given):
26 - Georgetown (6)
31 - Washington State (7)
34 - San Diego State (1)
37 - Notre Dame (2)
38 - New Mexico (3)
40 - Miami (Fl) (4)
41 - Kansas State (4)
42 - UAB (7)
44 - Florida (1)
48 - Kentucky (4)

Ten lowest rated teams to earn an at-large (seed given):
83 - Dayton (11)
59 - Boston College (7)
54 - Maryland (10)
52 - Texas A&M (9)
49 - Michigan (10)
46 - LSU (8)
45 - Minnesota (10)
43 - Butler (9)
39 - Arizona (12)
36 - Ohio State (8)


Now the first thing I want to point out is that we shouldn't have expected the Pomeroy ratings to be a good predictor of bracket position. They rate how good teams are, rather than how good their resumes are. The Sagarin PREDICTOR is also a horrible predictor of Tournament seed, for the same reason. Use the Pomeroy (along with the PREDICTOR) to make your bracket picks now, but don't expect either to be a good predictor of where your team will be seeded by the Selection Committee.

The second thing I want to point out is that, as expected, the Sagarin ELO_CHESS was a much better predictor of the brackets than the RPI. The second best resume left out (Miami, 42nd) was rated ahead of only the three worst resumes that got in (Arizona, Tennessee and Dayton). That's pretty remarkable when you consider that the Sagarin ratings don't take into account things like injuries, streaks at the end of the season, and the name on the uniform (all of which are factors with the Selection Committee). If Arizona wasn't called "Arizona" they almost definitely would have been left out of the Tournament, and Sagarin's computer doesn't know that.
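One rough way to put a number on that comparison is to count, for each rating, how many times a team that was left out was rated better than a team that got an at-large bid. The sketch below does that with the ranks from the ten-team lists above. It assumes the three pairs of lists appear in the order RPI, Sagarin ELO_CHESS, Pomeroy, and the names `LISTS`, `count_inversions` and `inversion_table` are made up for this illustration. It is not something the Committee or any of the rating systems actually computes.

```python
# Ranks copied from the ten-team lists in this post.
# Assumption: the three pairs of lists are, in order, RPI, Sagarin ELO_CHESS, Pomeroy.
LISTS = {
    "RPI": {
        "missed": [34, 40, 46, 47, 48, 49, 51, 53, 54, 56],
        "at_large": [62, 60, 55, 45, 44, 42, 41, 39, 37, 36],
    },
    "Sagarin ELO_CHESS": {
        "missed": [35, 42, 44, 45, 46, 48, 51, 52, 53, 55],
        "at_large": [50, 47, 43, 40, 39, 38, 37, 34, 33, 32],
    },
    "Pomeroy": {
        "missed": [26, 31, 34, 37, 38, 40, 41, 42, 44, 48],
        "at_large": [83, 59, 54, 52, 49, 46, 45, 43, 39, 36],
    },
}


def count_inversions(missed, at_large):
    """Count (missed, at-large) pairs where the team left out had the better (lower) rank."""
    return sum(1 for m in missed for a in at_large if m < a)


def inversion_table(lists=LISTS):
    """Return the inversion count for each rating system."""
    return {
        name: count_inversions(ranks["missed"], ranks["at_large"])
        for name, ranks in lists.items()
    }


if __name__ == "__main__":
    for name, n in inversion_table().items():
        print(f"{name}: {n} of 100 possible pairs")
```

On these lists the count comes to 17 for the Sagarin ELO_CHESS, 40 for the RPI and 84 for Pomeroy, out of 100 possible pairs each. Fewer is better, since a perfect predictor of the field would have zero. These are only the ten-team lists, not the whole field, so treat the numbers as a rough illustration.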

Finally, remember what I was telling people all through February and March: it is very rare for a team with a Sagarin ELO_CHESS outside of 50th to make the Tournament as an at-large. That held again, with no team worse than 50th getting in. I'm sure there are a few examples of teams around the 53-55 range getting in from years past, but I can't think of any off the top of my head. I also talked about how you never see teams with a Sagarin inside 35th missing out (unlike the RPI), and this year the only team inside the Sagarin Top 40 to miss was Penn State, at 35th. Sagarin can't take into account things like the Selection Committee having a big anti-Big Ten bias and not wanting to give that conference an eighth team. If Penn State was in the Big East or ACC they'd most likely have been in the field.

So the Sagarin isn't going to be perfect because it can't take into account things like personal biases in the Selection Committee. But it's a pretty darn good predictor. So if you don't trust bracketologists like myself, just look at the Sagarin ELO_CHESS. If your team is 60th, they've got work to do. If they're in the Top 40 they're looking good. And if you're talking to somebody who isn't convinced by that argument, just send them the link to this blog post. I'll plan to do another analysis like this after each season from here on out. I hope you enjoyed seeing this information.
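The rule of thumb above can be written down as a tiny function. This is only a sketch: the function name `at_large_outlook`, the "on the bubble" label for ranks 41 through 50, and the exact cutoffs at 40 and 50 are assumptions read off this post, not a formula the Selection Committee uses.

```python
def at_large_outlook(elo_chess_rank: int) -> str:
    """Rough at-large outlook from a Sagarin ELO_CHESS rank (hypothetical cutoffs)."""
    if elo_chess_rank < 1:
        raise ValueError("rank must be 1 or higher")
    if elo_chess_rank <= 40:
        # Top 40: "looking good" per the post, though not guaranteed.
        return "looking good"
    if elo_chess_rank <= 50:
        # Assumed middle band: this year's at-large teams ran as low as 50th.
        return "on the bubble"
    # Outside 50th: at-large bids are described as very rare.
    return "work to do"


if __name__ == "__main__":
    for rank in (35, 42, 50, 60):
        print(rank, at_large_outlook(rank))
```

It is a rule of thumb and nothing more: Penn State was 35th this year, which this function calls "looking good", and they still missed the field.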


Evilmonkeycma said...

Jeff, I left out Penn State, and here's why (this may or may not be legit, but here it is). I had several resumes I was looking at for the last two spots - the usual suspects, minus Arizona (I felt they were fairly solidly in, although I was wrong). I couldn't really decide, so I looked at how well they scheduled. Penn St. fails hard on the "Who you played" count.

Novice said...

Who do you see in your final four?

Which 3+ team do you think has the best chance of winning it all?

Jeff said...

Novice, I don't pick the bracket exactly, because I view that as a silly exercise. Nobody can predict the 63 games correctly because there's too much luck involved. So rather than give an exact bracket, I talk about more general concepts and teams and match-ups. If I pick Memphis to beat Missouri, that doesn't mean it's not worthwhile to talk about what Missouri's chances would be in the Elite 8.

So I've gone through each bracket. I talk about likely upsets, possible Cinderella teams, and which top teams have the easiest routes. I hope that all helps.

Anonymous said...

"There was certainly bias...against...the Big Ten."

Actually, your stats don't back that up. They show the opposite. If you look at the Sagarin rankings you might think that there's an anti-Big Ten bias. But if you look at the RPI and Pomeroy, you would see that the Big Ten got extremely favorable treatment.

First, let's look at the highest rated teams to miss the tourney:
RPI -- 0 Big Ten teams listed
Pomeroy -- 0 Big Ten teams listed
Sagarin -- Penn State and Northwestern are listed

Now, let's look at the lowest rated teams to be put in the field:
RPI -- Wisconsin, Michigan, Minnesota
Pomeroy -- Michigan, Minnesota, Ohio State
Sagarin -- None

So, according to the RPI and Pomeroy, ZERO highly rated Big Ten teams were left out, and FOUR low rated Big Ten teams made the field. Compare that with San Diego State, who was in all three lists of the highest rated teams to miss the field. I think your argument that there was bias against mid-majors holds a lot more water.

I think what the committee did is pretty clear here. They favored teams who played strong out of conference schedules (Arizona), and punished those who did not. Good for them. Penn State's best out of conference wins were against Georgia Tech (87), Mount St Mary's (141) and Sacred Heart (168). All the rest were 200 and 300+ rated teams. Sorry, but I think a team that scheduled like that should be punished.

Jeff said...

I disagree. First of all, the point is that the Selection Committee rightly ignores the RPI and Pomeroy. The RPI is an objective metric, but it's not very accurate, so trying to get too much information out of it is a mistake. And Pomeroy doesn't rate teams by how good their resumes are, so it's not measuring what the Selection Committee is measuring.

Since Sagarin's ELO_CHESS is the best example of an objective opinion on what the Selection Committee is trying to measure, it's the one we should be looking at. And as you admit, Sagarin shows that there was bias against the Big Ten.

And as for rewarding tough out of conference scheduling, that is true and that is good, but it still doesn't explain why Arizona got in. I think the general opinion out there is that Saint Mary's should have gotten in over Arizona (I believe that as well). According to Pomeroy (the only objective measurement of out of conference strength of schedule that I'm aware of), Saint Mary's played a tough out of conference schedule. And that's with many of Saint Mary's opponents ending up not being nearly as good as they were supposed to be when scheduled (Southern Illinois, Oregon and Kent State, for example). If the Selection Committee was just rewarding tough scheduling, then they'd have to have put Saint Mary's in.

The problem is that most people are looking at the bracket through the alternative reality that ESPN has created nationwide, that the Big East is the best conference, with the ACC a bit back, and the Big Ten is embarrassingly bad. In reality, anybody who watched a lot of games objectively (and the computers agree) sees it as the ACC being the best conference, with the Big Ten slightly behind, and the Big East a way back in third.

Imagine a Big East team going through a relatively tough out of conference schedule, then going 10-8 in the Big East, then knocking off the top seed en route to a run to the Big East title game. Any chance of that team getting an 8 seed? Of course not. But that's Ohio State. And I don't think there's any question that going 10-8 in the Big Ten (or 9-7 in the ACC) is much tougher than going 10-8 in the Big East.

Of course, this will all settle out when the field plays. If you read my previews, I feel like Purdue and Wisconsin will both outplay their seeds, while Illinois is the only Big Ten team I see potentially underplaying its seed. Meanwhile, I see Syracuse, Villanova, Louisville and Marquette underplaying their seeds, and West Virginia as the only Big East team likely to outplay its seed. We'll see if that holds.