Friday, November 06, 2015

How Many Close Games Should A Team Win?

There are a lot of ways to differentiate between analytics and anti-analytics in the world of sports, but one of the biggest differences is certainty vs uncertainty.

A common strawman against analytics is that its supporters believe whatever a computer tells them: that if one player has 8.7 fWAR, he must necessarily be the MVP over the guy with 8.4 fWAR.

You can probably find some random dude in some website's comments section saying this, but this certainly is not what the inventor of fWAR intended it to be used for, nor is it analytics orthodoxy. In contrast, it is cranky old newspaper columnists, radio hosts and television screaming heads who will tell you with absolute certainty who the best player is, or who the five best pitchers are, or which team is going to win their league. You know, the "HOW CAN YOU RANK TCU AHEAD OF BAYLOR WHEN BAYLOR BEAT TCU HEAD TO HEAD YOU MORON!" type.

What analytics tells you, and what numbers in general will tell you, is a probability distribution. Using fWAR is a fine way to start an MVP debate, but there is a margin of error in all of those numbers. Certain statistics are weighted more than others, and these weights can be debated and disagreed upon. Certainly they are incomplete metrics.

A similar dichotomy occurs in college basketball when we rank teams. The anti-analytics crowd loves to pick out a computer rating that they disagree with and then declare how stupid it is that a computer thinks [Team A] is ranked [Ranking].

Of course, this isn't how the computers work. Ken Pomeroy will tell you himself that just because his rating says that a team is 8th in the country does not mean that he is particularly confident that the team is precisely the 8th best team in the country. Or even that they're somewhere between 6th and 10th. It's a rating that has a margin of error and a limited sample size.

There's a lot of conventional analytics wisdom in college hoops that has been fairly well established. For example, the fact that homecourt advantage is basically the same everywhere - it's no tougher to win at Cameron Indoor than it is to win at The Pope (assuming you're playing the same opponent in both places). Another conventional analytics wisdom is that 3P% defense is almost entirely luck. Another one is that free throw percentage defense does not exist, no matter what fun distracting stuff kids come up with. And another one, and perhaps most controversial, is that the results of close games are "random".

The media will fight to the death over "clutch" play, because it makes their game reports so much simpler to write if they can treat the results of the close games as the will of a magical "clutch" player who "willed his team to victory". It's simply boring to write "[Team A] got lucky on a few bounces to preserve their two point win over [Team B]".

In the abstract, every reasonable fan and analyst will admit that, in principle, very close games are basically out of the control of the players. A ball rolling in or out on the final shot is a roll of the dice. Maybe a ref makes a controversial block/charge call in the final minute that swings a game. You can call up a perfect final play and your guy just misses the layup, or you can play perfect defense only for the other team to nail a thirty foot game-winner. Weird stuff happens in close games.

But even with that settled, fan bases will always manage to create a narrative every time it is their team that has had a long run of close wins or losses. You've heard all of the arguments before. "We have a great ball handler who dominates the ball and takes over late in games". "Our star's shooting stats in the clutch are incredible". "This team just has so much experience. They know how to win." "Wouldn't you want to have [Star Player] on your team on the final possession?" or "He's just a great/terrible coach at drawing up plays out of timeouts."

There is a more reasonable argument against the idea that "close games" are 50/50 propositions, which is that the final score doesn't always indicate how close a game was. "Game control" is a metric worth considering.

All of this brings me back to my original point: Probabilities. It would be false to say that games with a final scoring margin less than "X" are 50/50 propositions while games with a final scoring margin above "Y" are "blowouts". What we can instead give you is a sliding scale.

The statistics don't say that clutch play doesn't matter or that a one point win isn't better than a one point loss. What they say is that we shouldn't treat a one point win as equivalent to a forty point win. A one point win is better than a one point loss, but on average it's worse than a three point win.

The question I pose here is as follows: Is there a way to define "close" games and "blowout" games reasonably, so that we can look at a team's record in close games and reasonably approximate how many games they should have won?

Below, I have taken data from the Top 25 teams in the Pomeroy ratings for the 2014-15 season, and separated out their results into games vs Pomeroy Top 50 opponents (i.e. similar quality opponents) and games vs Pomeroy 51+ opponents (i.e. weaker opponents). Anywhere you draw that line will be arbitrary, and I'm not taking into account home/road or anything else, but it's a reasonable start.

Either way, for each group of games, I calculated the winning percentages for games that went to overtime, games decided by one or two points, games decided by three or four points, et cetera. Here is the full data:

The first observation that I have is that games between similar quality opponents are pretty much 50/50 toss-ups for scoring margins of 6 points or fewer, or in overtime. For games decided by 7 to 20 points, the win probability is closer to 60%. For 21+ point blowouts, the win probability is a firm 80%.

For games between mismatched opponents, however, quality teams do have a clear advantage in close games, though it's only around 60% for games decided by 4 points or fewer. For games decided by 9 or more points, the Pomeroy Top 25 teams won more than 90% of the time. It's worth noting that games against weaker opponents are more likely than not happening on the better team's home court, which means that if there is a crucial referee call in the final minute, it's probably going in their favor. That also partially explains the >60% winning percentage.

In other words, there is a fine line here. Between equally matched opponents, we would expect games decided by a few possessions to be basically toss-ups. However, we would expect the better team to win the lion's share of games decided by five or more points.
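For anyone who wants to run the same kind of breakdown on their own schedule data, here is a minimal Python sketch of the bucketing described above. It is not the code used for this post: the function names, the two-point bins running up to 20, and the sample games are all assumptions made for illustration.

```python
from collections import defaultdict

def margin_bucket(margin, overtime):
    """Label a game by its final margin (hypothetical helper).

    Assumes two-point bins (1-2, 3-4, ...) up to 20, a single 21+ bin,
    and a separate bin for anything that went to overtime.
    """
    if overtime:
        return "OT"
    margin = abs(margin)
    if margin >= 21:
        return "21+"
    low = margin if margin % 2 == 1 else margin - 1  # odd lower edge of the bin
    return f"{low}-{low + 1}"

def win_pct_by_bucket(games):
    """games: iterable of (margin, overtime); margin > 0 means a win."""
    tally = defaultdict(lambda: [0, 0])  # bucket -> [wins, games played]
    for margin, overtime in games:
        bucket = margin_bucket(margin, overtime)
        tally[bucket][0] += margin > 0
        tally[bucket][1] += 1
    return {bucket: wins / played for bucket, (wins, played) in tally.items()}

# Made-up results for one team, NOT real 2014-15 data.
sample_games = [(2, False), (-1, False), (5, True), (-3, True), (25, False), (4, False)]
print(win_pct_by_bucket(sample_games))
```

Run it once on games against Top 50 opponents and once on games against everyone else to get the two groups compared here.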

These numbers make sense if we think about them. If you have two relatively evenly matched teams, you will need a large sample size of time to figure out which team is better with a reasonable level of confidence. A few possessions, or even a five minute overtime period, just are not enough time to overpower random chance. On the other hand, if you have two mismatched teams then five minutes really should be enough for the better team to score more points most of the time.

So in the future, when you see that a team has a 3-0 record in overtime games or is 7-3 in games decided by six points or fewer, what you need to ask for next is context. One could take this analysis into excruciating detail, but I think a reasonable rule of thumb is that for games decided by six points or fewer, teams should expect to win ~50% against equal competition compared to ~70% against cupcakes.
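That rule of thumb turns into a quick back-of-the-envelope calculation. The sketch below is only an illustration: the function name is made up, and the split of a team's close games between equal and weaker opponents is a hypothetical input you would have to look up yourself.

```python
def expected_close_wins(n_vs_equal, n_vs_weaker, p_equal=0.50, p_weaker=0.70):
    """Expected wins in games decided by six points or fewer.

    The default probabilities are the rough rule of thumb from this post,
    not fitted values.
    """
    return n_vs_equal * p_equal + n_vs_weaker * p_weaker

# Hypothetical team that went 7-3 in close games:
# assume 6 of those games were against equal competition and 4 against cupcakes.
expected = expected_close_wins(n_vs_equal=6, n_vs_weaker=4)
print(f"expected close wins: {expected:.1f} out of 10")
```

With that assumed split the expectation is a little under six wins, so a 7-3 record would be only about one win better than par.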

To bring this full circle, what this analysis does not say is that "[Team A] definitely should have won X games this season". Everything has a margin for error, and everything can be up to interpretation. However, if a team goes 12-1 in games decided by six points or fewer, I do think that we can say with a high degree of confidence that they're not as good as their record. Because over a large enough sample of games, even with vastly superior talent, that simply is not sustainable.
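To put a rough number on how unusual 12-1 is, here is a small sketch that treats each close game as an independent coin flip with a fixed win probability. That independence is a simplifying assumption, and the function name is hypothetical.

```python
from math import comb

def prob_at_least(wins, games, p):
    """Binomial probability of winning at least `wins` of `games`,
    assuming independent games that each have win probability p."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )

# Chance of going 12-1 or better in 13 close games, using the
# rule-of-thumb win rates for equal competition and for cupcakes.
for p in (0.50, 0.70):
    print(f"p = {p:.2f}: {prob_at_least(12, 13, p):.4f}")
```

Under these assumptions the chance comes out to roughly 0.2% if every game is a true toss-up, and about 6% even if every game is against a much weaker opponent.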

This shouldn't be controversial. But I can assure you that this coming season, just like every other season, it will be.
