Friday, July 15, 2011

Tempo, Clutch Play, And Why Mariano Rivera Was More Valuable Before He Was A Closer

Quick quiz:

On January 16th, 2011, Notre Dame went into Madison Square Garden to play St. John's and got wiped out by a score of 72-54. Eight days later, on January 24th, Notre Dame bounced back with a 56-51 win at Pittsburgh. My question: which game featured better offense, and which game featured better defense?

If you asked the average college basketball fan, or the average college basketball television analyst, they'd tell you that the St. John's game clearly featured better offense, while Notre Dame struggled but overcame the suffocating Pitt defense. And they'd be wrong. The ND/Pitt game featured a 44.0 FG%, 53.0 eFG% and 16 turnovers in 49 possessions. The ND/SJU game featured a 45.2 FG%, 49.4 eFG% and 35 turnovers in 70 possessions. So overall, the effective shooting was better and the turnovers were rarer in the ND/Pitt game. The offenses were better. But the game was much slower. And while the effect of tempo on raw statistics is well understood by many online writers and bloggers, it continues to be completely misunderstood by announcers and casual fans. But college basketball is not alone - baseball is another sport where casual fans and television announcers are completely wrong about some of the most important parts of the game.
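For readers who want to check the arithmetic, here is a minimal Python sketch of the tempo-free comparison. It assumes, as tempo-free stats usually do, that the possession counts above are per team (so both teams together had twice that many) and that the turnover counts are combined for both teams. The function names are mine, not from any library.

```python
def points_per_possession(combined_points, possessions_per_team):
    """Combined points divided by combined possessions (both teams)."""
    return combined_points / (2 * possessions_per_team)


def turnover_rate(combined_turnovers, possessions_per_team):
    """Share of all possessions that ended in a turnover."""
    return combined_turnovers / (2 * possessions_per_team)


# Figures quoted in the paragraph above; possessions assumed to be per team.
games = {
    "ND at Pitt (56-51)": {"points": 56 + 51, "turnovers": 16, "possessions": 49},
    "ND vs St. John's (54-72)": {"points": 54 + 72, "turnovers": 35, "possessions": 70},
}

for name, g in games.items():
    ppp = points_per_possession(g["points"], g["possessions"])
    to_rate = turnover_rate(g["turnovers"], g["possessions"])
    print(f"{name}: {ppp:.2f} PPP, {to_rate:.1%} turnover rate")
```

Under those assumptions the Pitt game comes out around 1.09 points per possession and the St. John's game around 0.90, even though the St. John's game had 19 more total points.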


What Do We Know About Closers?

Sabermetrics has gained quite a foothold in popular culture. Most casual fans understand the importance of on-base percentage and the fact that wins and RBIs really don't matter. Felix Hernandez won the Cy Young Award last season despite a 13-12 record. Heck, Brad Pitt is playing Billy Beane in a movie coming out in September. I've even seen SportsCenter list a player's BABIP. But if there's one aspect of the game that is misunderstood more than any other, it's the concept of closers. Other than with a few very progressive sabermetric writers (such as Joe Posnanski, who wrote a great article I'll link to later in this post), it's taken as a rule of the game that:

1) Your best reliever needs to be the closer
2) Your closer needs to be saved until there's a save situation (which, except on very rare occasions, means starting an inning in the ninth or later with a lead of 1-3 runs)
3) Saves are a meaningful statistic
4) A great closer is as valuable as an ace starter, and is a necessary part of a great team.

In fact, all four of those statements are false.


Mariano Rivera Was More Valuable As A Setup Man

I'm a diehard New York Mets fan. They're my most beloved team in any sport. Just three days ago the team dumped its closer, Francisco Rodriguez, for a couple of minor prospects, to get out of a potential $17.5 Million option for next season. I mentioned to a friend that even if the Mets had unlimited funds (they are expected to cut about $20 Million off their payroll next season), even the greatest closer in history in his prime wouldn't be worth $17.5 Million in a season. He responded that while K-Rod Vintage 2011 wasn't worth that money, Mariano Rivera in his prime was. And I disagreed, using a stat called WAR (wins above replacement), which encompasses all of the value of a player over a generic "replacement" player, as well as the "leverage" of the situation (more on that in a moment, but briefly it's a measure of how valuable or "clutch" a moment is in the game). There are several different formulations for WAR, but they're all very similar, and I'm using the baseball-reference.com version for consistency.

Mariano Rivera is without much doubt the greatest closer of all time. He first became a closer in 1997, and as such has played 14 full seasons as closer (I'm not including stats for the 2011 season, since it's incomplete). In those 14 full seasons as closer he made the All-Star team 11 times. These are his average stats per season:

69.2 innings, 39.6 saves, 2.02 ERA, 8.03 SO/9, 3.38 WAR

Those save, ERA and strikeout stats are all tremendous. But a WAR of 3.38? In fact, his most productive season as a closer was in 2004 (career-high 53 saves, 1.94 ERA) when he had a WAR of 4.8. To put that in perspective, Johan Santana led AL pitchers that season with 7.4. Barry Bonds led all players with 12.4, but even though that was steroid fueled, the almost surely clean singles hitter Ichiro Suzuki had 8.1 that season. In the fifty years since the beginning of the expansion era, nine different pitchers have had a WAR of greater than 10 in at least one season of their career, all of whom were starters (Steve Carlton, Bob Gibson, Dwight Gooden, Sandy Koufax, Wilbur Wood, Gaylord Perry, Roger Clemens, Dick Ellsworth, Pedro Martinez).

The most interesting part of Mariano Rivera's stats is his 1996 numbers. For those that don't remember, he began his career as a setup man for John Wetteland. Rivera's best season as a setup man was 1996. Here are his stats from that season:

107.2 innings, 5 saves, 2.09 ERA, 10.9 SO/9, 5.4 WAR

His raw pitching stats are really similar to those of his closing days, yet he had a WAR of 5.4. His most productive season came when he was a setup man, not a closer. Why? Most importantly, he pitched a whole lot more. He has never pitched more than 80.2 innings in a season since becoming closer, and has averaged under 70, yet pitched 107.2 in 1996. Weren't those innings less important, though? Let's make that a new section:


Doesn't The 9th Inning Matter More Than The 8th Inning?

The short answer is "Yes, on average". But not always, and not by as much as you'd think. Sabermetricians use a stat called "leverage" (generally abbreviated "aLI" for "average leverage index"), which weights situations by the typical effect they have on a team's chance of winning each game, with 1.0 being an average situation and greater than 1 meaning a more important situation. Rivera as a closer has had an average aLI of 1.9, with career highs of 2.2 on three occasions (including 2004). Yet even in 1996 as a setup man his aLI was 1.5.

For perspective, let's look at Mariano Rivera and his setup man this season, David Robertson:

Rivera: 34.0 innings, 1.85 ERA, 22 saves, 2.1 aLI, 1.8 WAR
Robertson: 35.1 innings, 1.27 ERA, 0 saves, 1.6 aLI, 1.7 WAR

So Robertson has been the slightly better pitcher, while Rivera has done it in slightly more important situations, and so overall their value to the team has been nearly identical. But of course, let's compare it to the team's ace pitcher:

CC Sabathia: 145.2 innings, 2.72 ERA, 0.8 aLI, 3.6 WAR

Sabathia hasn't had the sterling ERA that Rivera or Robertson has had, but he's been able to do it over more than four times as many innings. And so despite the fact that he's often pitching in lower-than-average leverage situations (the early innings of a game, naturally, are lower leverage than later innings, all else equal), he's been as valuable as Rivera and Robertson combined.
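Baseball innings totals use a notation where the digit after the decimal point counts outs, so 145.2 means 145 and two-thirds innings. The sketch below (Python, with helper names of my own choosing) converts that notation and works out the ratios from the numbers quoted above. WAR per 100 innings is not an official stat, just a quick way to separate per-inning value from total value.

```python
def innings(notation: str) -> float:
    """Convert box-score innings ('145.2' = 145 innings + 2 outs) to a float."""
    whole, _, outs = notation.partition(".")
    outs = int(outs) if outs else 0
    if outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {notation}")
    return int(whole) + outs / 3


# 2011 figures quoted above: (innings, WAR)
pitchers = {
    "Rivera": ("34.0", 1.8),
    "Robertson": ("35.1", 1.7),
    "Sabathia": ("145.2", 3.6),
}

for name, (ip, war) in pitchers.items():
    ip_value = innings(ip)
    print(f"{name}: {ip_value:.1f} IP, {war} WAR, {war / ip_value * 100:.2f} WAR per 100 IP")

ratio = innings("145.2") / innings("34.0")
print(f"Sabathia has thrown {ratio:.1f} times as many innings as Rivera")
```

The relievers come out ahead per inning, which is what you'd expect from lower ERAs in higher-leverage spots, but the starter's sheer volume more than makes up for it.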

There's an even more interesting situation if we flip over to the big rival of the Yankees, the Red Sox. Here are the stats of the closer (Jonathan Papelbon), the setup man (Daniel Bard), and the team's two best starters (Josh Beckett and Jon Lester):

Papelbon: 36.2 innings, 3.93 ERA, 20 saves, 1.5 aLI, 0.4 WAR
Bard: 44.0 innings, 2.05 ERA, 1 save, 1.4 aLI, 1.5 WAR
Beckett: 111.0 innings, 2.27 ERA, 1.0 aLI, 4.1 WAR
Lester: 114.1 innings, 3.31 ERA, 1.0 aLI, 2.8 WAR

Here we have an even more blatant situation where the setup man (Daniel Bard) is a better pitcher than the closer (Jonathan Papelbon). And because manager Terry Francona has done a good job of getting Bard in games to clean up difficult situations in the middle innings, his aLI is nearly identical to the closer's, and as such he has been a much more productive player. But even then, Bard has been less productive than a starter who is only 31st in the majors in starter ERA (Jon Lester), and far less productive than the team's ace (Josh Beckett). Despite not having quite the same ERA as Bard, and pitching in lower leverage situations, Beckett has pitched about two and a half times as many innings, and as such has been nearly three times as productive.

An even more interesting example is Dennis Eckersley, who was a starter for 12 years before converting to the bullpen and becoming a closer, where he became the man who was probably the best closer there was prior to Mariano Rivera. Eckersley's peak as a closer came from 1988 through 1992. His first five years in the majors were his best five-year stretch as a starter, so we'll compare those two five-year stretches:

1975-1979: 1148.1 innings, 3.12 ERA, 6.7 SO/9, 77-50, 3 saves, 1.0 aLI, 25.7 WAR, 1 Top 5 Cy Young finish
1988-1992: 359.2 innings, 1.90 ERA, 9.5 SO/9, 24-9, 220 saves, 1.7 aLI, 12.7 WAR, 1 Cy Young, 1 MVP, 3 Top 5 Cy Young finishes

There's no question that Eckersley was better as a closer than as a starter - his stats across the board were better. And he played in more important situations as a closer. Yet because he pitched more than three times as many innings per season as a starter, he actually was twice as productive per season in that role.
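The per-season comparison is easy to reproduce. This is a small Python sketch using only the two five-year totals quoted above; the variable and function names are illustrative, and the innings are written as whole innings plus thirds (1148.1 means 1148 and one-third).

```python
SEASONS = 5  # each stretch covers five seasons

# Totals quoted above; innings written as whole innings plus thirds.
starter = {"innings": 1148 + 1 / 3, "war": 25.7}  # 1975-1979 (1148.1 IP)
closer = {"innings": 359 + 2 / 3, "war": 12.7}    # 1988-1992 (359.2 IP)


def per_season(total, seasons=SEASONS):
    """Average a multi-season total over the number of seasons."""
    return total / seasons


for label, stretch in (("starter", starter), ("closer", closer)):
    print(f"As a {label}: {per_season(stretch['innings']):.1f} IP and "
          f"{per_season(stretch['war']):.2f} WAR per season")

innings_ratio = starter["innings"] / closer["innings"]
war_ratio = starter["war"] / closer["war"]
print(f"Innings ratio: {innings_ratio:.2f}, WAR ratio: {war_ratio:.2f}")
```

The innings ratio comes out a little above three and the WAR ratio almost exactly two, which is where the "three times the innings, twice the value" summary comes from.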

In the first 12 seasons of his career Eckersley had 359 starts and 3 saves, and a WAR of 42.1. In his latter 12 seasons he had 2 starts and 387 saves, and a WAR of 16.6. Ask most people today and they won't even remember that Eckersley ever started a game. Yet even as a "good" starter he was more valuable to his team than as the "greatest ever up until this point in time" closer.


All Those New-Fangled Stats Contradict What I See With My Eyes: They Must Be Wrong!

WAR is not a perfect stat. There are no perfect stats. So I won't pretend that there aren't alternative measures of a player's value to his team (such as WPA), but the relative results are the same. The fact is that the greatest closer ever, in his prime, isn't even close to being as valuable as an ace starting pitcher, or a leading hitter.

The first criticism of the analysis on Rivera is that he's such a "clutch" player, and that in the biggest games he's at his best. Certainly his postseason stats are staggering (a 0.71 ERA, which is the lowest of any pitcher ever), though it's worth noting that the sample size is small. If he had to pitch 400 postseason innings in his career, no matter how good he is, his ERA is most likely going to rise just out of randomness. And while Rivera gets some credit for being so great in the postseason, the reality is that the difference between a "clutch" and "not-clutch" player just isn't all that great.

Mainstream sportswriters often criticize stat-heads for not believing that players can be clutch. This is in fact a strawman - statisticians not only accept that clutch play exists, but have attempted many ways to quantify it. But the reality is that clutch play just doesn't matter that much - it's a minor rounding error. Our perception of clutch behavior is the ultimate confirmation bias - if we believe a player is clutch then we'll remember their clutch successes, and immediately forget their clutch failures.

There is an interesting set of data here, which is a collection of stats for all active hitters who played in at least 1000 games between 1996 and 2011 (I can't find the same data for pitchers - it would cost money I don't want to spend to build the same data myself). They are ranked by "WPA_Clutch", which is a weighted measure that effectively calculates how much better a player is in the most clutch situations relative to how they are on average. And naturally, Alex Rodriguez is second from the bottom. But there are many more interesting things on this chart.

First of all, the average hitter is worse in clutch situations than in ordinary situations - the average WPA_Clutch rating is negative. It makes sense that pitchers are better in clutch situations than hitters because they have to compete against 20+ batters in a row, while batters can zone in and out for three hours while just focusing on their own at bats. Pitchers will be able to narrow the intensity gap in clutch situations. This also probably has something to do with why relief pitchers tend to have better ERAs than starters, and why good pitching dominates the playoffs more often than good hitting.

Second, the players we think are "clutch" often aren't. There's no player more famous for his clutch play than Derek Jeter, yet he's actually below average in the clutch. Albert Pujols is also below average in the clutch, though this is just a reflection of how insanely good he is in non-clutch situations - you'd still want him up at the plate in key situations.

And that latter point is what you should draw from that data. As bad as Alex Rodriguez is in the clutch, that's just in respect to how awesomely good he is in non-clutch situations. He is still far better in absolute terms in the clutch than Derek Jeter is. Even in Jeter's prime, you should still prefer A-Rod at the plate in a big moment. The difference between a "clutch" and "non-clutch" player just isn't that great. I've talked many times in the past on this blog about how there is very little correlation between teams that have won a disproportionate number of close games in the past and future performance in close games. "Clutch" is simply the most overrated concept in sports.

Joe Posnanski wrote a fascinating article last year on this concept with respect to closers. He compared the likelihood that major league teams had of winning when entering the ninth inning with the lead, which even the most progressive statistician would assume had risen over the past 60 years as we went from starters closing their own games to the era of specialized closers with microscopic ERAs. Yet what Posnanski found was effectively zero trend - major league baseball teams, as a whole, have averaged approximately a 95% conversion rate in those situations for 60 straight years. Below is the key section, including an emphasis on the great Mariano Rivera:

Teams held 95.5% of their ninth-inning leads in 2010. Teams held 95.5% of their ninth-inning leads in 1952.

Well, that shocked the heck out of me. It’s not quite that simple, though. There have been a few anomalies, yes. For instance, in 1957, teams held only 92.7% of their ninth-inning leads — easily the lowest percentage over the last 60 years. That was a year for comebacks. And the highest percentage was in the strike year of 1981, when teams held 97.6% of their leads — that probably would have normalized over a full schedule.

Other than that, though, the best winning percentage for ninth-inning leads is .958. It has happened four times — 2008, 1988, 1972 and 1965. That pretty much covers the entire spectrum of bullpen use. It doesn’t change. Basically, teams as a whole ALWAYS win between a touch less than 94% and a touch more than 95% of the time. This has been stunningly, almost mockingly, consistent. The game has grown, the leagues have expanded, the roles have changed, the pressure has turned up, but the numbers don’t change.

Here, I’ll give you another example. Most of us would agree, probably, that Mariano Rivera is the greatest closer in the history of baseball, right? I mean, we can have that argument another time, but I think it’s Rivera, and you probably think it’s Rivera, and since he became a closer in 1997, the Yankees have won a rather remarkable 97.3% of the time when they lead going into the ninth inning. I don’t have an easy way to compare that to everyone over the same time period, but I’d bet that’s the best record for any team. In 2008, the Yankees won all 77 games they led going into the ninth. Most years they lose only once or twice.

So that would seem to indicate that Rivera DOES make a difference. And I think he does make a difference — compared to other closers.

But … consider the 1950s New York Yankees. Dominant team, of course. The bullpen was an ever-shifting thing, though. One year, Ryne Duren was their main guy out of the pen, another year it was Bob Grim or Art Ditmar or Tom Morgan or Tommy Byrne or Jim Konstanty … well, the names changed all the time. The bullpen changed all the time. Casey Stengel seemed to shift strategies every now and again, probably to keep things interesting, starters finished many more games, and anyway the game was very different then and …

From 1951 through 1962, the New York Yankees held 97.3% of their ninth-inning leads. If you carry it another decimal point, they actually held a slightly HIGHER percentage of their ninth-inning leads than the Mariano Yankees.


So having a great closer does matter... but very little. Posnanski suggests that teams hold their best relievers as "setup" men, so that they can be brought in to the most important situations (which often occur before the ninth inning), and also so that they can be kept on the roster for lower salaries (since saves, as meaningless as they are, mean dollars in arbitration and free agent negotiations). While it would be hard to get away with this too much in the modern era, with aggressive agents and a Players Union that will come down on any strategy it thinks suppresses salaries, there is something you can do. Posnanski suggests that the Red Sox are trying this now with Daniel Bard, keeping him as the setup guy so they're not stuck using him in the ninth inning, and so that they can sign him to a smaller contract when he becomes a free agent.


What Does This Have To Do With College Basketball?

Well, we're out of season, when I tend to make arguments while using examples from other sports (see here, here and here for examples). And to me, the obsession with closers in baseball is similar to the obsession with statistical totals in basketball, rather than tempo-free stats.

At the top of this post I talked about a relatively high-scoring Notre Dame/St. John's game that actually featured very good defense and mediocre offense, as compared to a very low-scoring Notre Dame/Pitt game that actually featured good offense and mediocre defense. If there's one thing I've been harping on for years on this blog it's that the fact that a team scores more points than another does not mean that their offense is better. Notre Dame and Pitt are among the teams (Wisconsin and Ohio State are others) that are constantly referred to by sportswriters and television broadcasters and analysts as "great defensive" teams that struggle on offense. Yet they are all teams that are typically better offensively than defensively, but score fewer points because they play a slow tempo.

A famous game that happened last season was Penn State's 36-33 win over Wisconsin in the Big Ten tournament, which was derided across the sports world as proof of how horrible Big Ten basketball is, and how those offenses are among the worst in the country. And of course, the offenses were bad that day - neither team could shoot at all. The teams combined for a 31.1 FG% and a 33.9 eFG% with 12 turnovers. But compare that to another game from the same day: a 52-51 win for Virginia Tech over Florida State in the ACC tournament. Those teams combined for a 38.8 FG% and a 43.7 eFG% with 30 turnovers. Virginia Tech and Florida State did shoot a bit better, but they were also a lot sloppier with the ball. You'd probably give the slight offensive edge to the ACC game, but only barely, yet nobody considered that game proof that the ACC was playing boring 1960s Four Corners basketball. Why? Because the FSU/Va Tech game featured 61 possessions while the Wisconsin/Penn State game had 42. So overall the ACC game had 0.84 PPP, while the Big Ten game had 0.82 PPP. Basically identical.
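The points-per-possession figures can be reproduced in a few lines of Python. The sketch assumes the possession counts are per team, and the last step is a purely hypothetical projection of my own: what the Big Ten game's combined score would have been at the ACC game's pace with the same efficiency.

```python
def ppp(score_a, score_b, possessions_per_team):
    """Combined points per possession for one game."""
    return (score_a + score_b) / (2 * possessions_per_team)


big_ten = ppp(36, 33, 42)  # Penn State 36, Wisconsin 33
acc = ppp(52, 51, 61)      # Virginia Tech 52, Florida State 51

print(f"Big Ten game: {big_ten:.2f} PPP")
print(f"ACC game:     {acc:.2f} PPP")

# Hypothetical: same Big Ten efficiency stretched over the ACC game's 61 possessions.
projected_points = big_ten * 2 * 61
print(f"Big Ten game at ACC pace: about {projected_points:.0f} combined points "
      f"(the ACC game actually had {52 + 51})")
```

At equal pace the two games would have looked almost the same on the scoreboard, which is the whole point of quoting PPP instead of raw scores.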

And it was those 42 possessions that should have stood out. Not only was it the lowest number of possessions in any D-I game all season, but the second fewest was the 48 possessions in the aforementioned Notre Dame/Pittsburgh game. There were actually only five Division I games all season with fewer than 53 possessions, and the 42 possessions were the fewest in any D-I game since 1998. So the Wisconsin/Penn State game had bad shooting, but not historically bad. What was historic was the pace. Yet how many people did you see mention that (I was one of them)? The story should have been the pace, not the offense.


Why Would Anybody Want To Watch Paint Dry?


I understand that fans prefer watching uptempo games. With all else equal I prefer watching uptempo games too (although I'd rather watch a slow but fundamentally sound game over an uptempo but sloppy game). But actually, on average the differences in tempo are hard to notice. Yes, the 42 possessions in that Wisconsin/Penn State game were ridiculous. But no team averages anything near that.

Wisconsin led the entire nation in fewest possessions per game last season, and Penn State had the fourth fewest, yet their averages were 57.3 and 59.9 per game, respectively. In comparison, Iowa was the speediest Big Ten team with 67.2 possessions per game. Providence led the Big East with 72.2, but they had to play a gimmicky game to make up for a lack of talent. If you look at the 11 Big East teams that went to the NCAA Tournament, the fastest tempo was Marquette (68.4). South Carolina and Arkansas led the SEC with 68.1. North Carolina led all major conference teams with 72.8 possessions per game.

How much does that work out on the clock? North Carolina averaged 16.5 seconds per possession. Marquette averaged 17.5. Penn State averaged 20.0. Wisconsin averaged 20.9. And keep in mind that Wisconsin's slow pace this past season was a statistical anomaly. They were the slowest major conference team in 2009-10, but with 59.9 possessions per game (20.0 seconds per possession).
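Those seconds-per-possession figures follow from simple division. Here is a minimal Python sketch, assuming a 40-minute regulation game with no overtime and possession counts that are per team (so the clock is shared across twice that many possessions). The function name is my own.

```python
GAME_SECONDS = 40 * 60  # regulation college game; overtime ignored


def seconds_per_possession(possessions_per_team, game_seconds=GAME_SECONDS):
    """Average length of one possession when both teams share the clock."""
    return game_seconds / (2 * possessions_per_team)


# Season averages quoted above (possessions per game).
tempo = {
    "North Carolina": 72.8,
    "Marquette": 68.4,
    "Penn State": 59.9,
    "Wisconsin": 57.3,
}

for team, possessions in tempo.items():
    print(f"{team}: {seconds_per_possession(possessions):.1f} seconds per possession")
```

The same function works for a single game: plug in that game's possession count instead of a season average.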

Compare those averages to the variations. Wisconsin did have their Penn State game with 42 possessions (28.6 seconds per possession), but they also had 64 possessions (18.8 seconds per possession) in regulation against UNLV and 65 in a tune-up against Prairie View A&M. North Carolina had some crazy fast games, like 94 possessions (12.8 seconds per) against LIU in the NCAA Tournament, and 81 (14.8 seconds per) in the regular season against Maryland. But they also played a 58 possession game (20.7 seconds per) against Boston College (a dreadful 48-46 game that actually had fewer points per possession - 0.81 - than that famous Wisconsin/Penn State game).


What's The Point?

The point is twofold. Basically, we judge tempo backwards. We fans tend to have the perception that there is a dramatic difference in pace and style between the conferences. Yet in fact, the overall difference between the uptempo ACC and the "watch the paint dry" Big Ten is less than a second per possession. And the difference between the absolute fastest and slowest major conference teams in a typical season is only 3-4 seconds per possession. It's noticeable, but not by much. Nobody is yelling at their TV because a possession took three seconds longer than they expected. Even Wisconsin sometimes has fast breaks, and even North Carolina sometimes has a shot clock violation.

Conversely, we all tend to forget about tempo when looking at individual games. Yet this is where the real variation is. Even Wisconsin sometimes plays games at a faster pace than the slower North Carolina games. There is a wide variation in tempo from game-to-game, and that's what we should be paying attention to. There's no reason why when we see a 90-87 score that we should immediately assume it was great offense rather than a really fast tempo. And there's no reason that when a score is 36-33 that we should immediately assume it was historically bad offense instead of a historically slow tempo.

If you're a regular reader you know that I constantly feed you a diet of tempo-free stats. I tell you PPP, I give you OR% instead of raw rebounding numbers, et cetera. And this is why. The fact is that your opponent is no less likely to score whether it took you 5 seconds or 35 seconds to score. What matters is how many points you score before you give your opponent the ball back.

Tempo-free stats are not intuitive, the same way that it's not intuitive that even the greatest closer in baseball in his absolute prime isn't even worth half of what an elite starter is worth, or that Dennis Eckersley did more for his team when he was a good starter than when he was a Hall of Fame closer. But it's the truth. And the truth is what I try my best to present to you here.
