Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
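For readers who think better in code, here is a minimal sketch of that Group A / Group B procedure. The player names, the `yards_per_target` and `ppg` fields, and every number in the list are made up for illustration; nothing here is real league data.

```python
from statistics import mean

def split_groups(players, metric, group_size):
    """Rank players by `metric` (highest first) and return (group_a, group_b)."""
    ranked = sorted(players, key=lambda p: p[metric], reverse=True)
    return ranked[:group_size], ranked[-group_size:]

def points_per_game(group):
    """Average fantasy points per game across a group of players."""
    return mean(p["ppg"] for p in group)

# Hypothetical players; all values are invented for this example.
players = [
    {"name": "WR1", "yards_per_target": 12.1, "ppg": 16.0},
    {"name": "WR2", "yards_per_target": 11.4, "ppg": 14.5},
    {"name": "WR3", "yards_per_target": 9.0, "ppg": 12.0},
    {"name": "WR4", "yards_per_target": 8.2, "ppg": 11.0},
    {"name": "WR5", "yards_per_target": 6.1, "ppg": 9.5},
    {"name": "WR6", "yards_per_target": 5.3, "ppg": 8.0},
]

# The metric alone decides who lands in each group; no hand-picking.
group_a, group_b = split_groups(players, "yards_per_target", 2)

# Step one of every prediction: confirm Group A really is ahead so far.
a_leads_so_far = points_per_game(group_a) > points_per_game(group_b)
```

The prediction itself is simply that the same comparison, run again on the rest-of-season numbers, will come out the other way.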
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is yards per target, and Antonio Brown is one of the high outliers in yards per target, then Antonio Brown goes into Group A and may the fantasy gods show mercy on my predictions. On a case-by-case basis, it's easy to find reasons why any given player is going to buck the trend and sustain production. So I constrain myself and remove my ability to rationalize on a case-by-case basis.
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of all my predictions from last year and how they fared. Here's a similar list from 2017.
What Is Regression To The Mean
For our first article of 2019, I think it's important to nail down exactly what regression to the mean is and why it is so powerful. I'd like to illustrate it with an example from basketball.
The free throw attempt might be the purest act in all of sports. There's no defense. There's no weather. The distance and angle never change. It is exactly the same every time: one player, one ball, one hoop, one shot.
For his career, Steph Curry shoots 90% on free throws, but on a game-to-game level, there's a little bit of variance. Imagine that in Week 1 of the 2019-2020 season Curry makes 3 of 6 free throws. (This would be a wildly uncharacteristic game, but it's not impossible; Curry once shot 1-of-4 and has twice gone 4-of-7.)
Nobody in their right mind would look at this game and conclude that Curry was suddenly a 50% free throw shooter, right? Instead, we'd think this game was an outlier and expect him to go back to hitting 90% the rest of the way. That's because 90% is Curry's long-term mean (or average), and we expect him to regress (or return) to it.
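To put a rough number on how unusual a 3-of-6 night would be, here is a small sketch using the binomial distribution. It assumes every free throw is an independent shot with a fixed 90% chance of going in, which is a simplification; the function names are my own.

```python
from math import comb

def binomial_pmf(made, attempts, p):
    """Probability of exactly `made` makes in `attempts` independent shots."""
    return comb(attempts, made) * p**made * (1 - p) ** (attempts - made)

def prob_at_most(made, attempts, p):
    """Probability of `made` or fewer makes in `attempts` independent shots."""
    return sum(binomial_pmf(k, attempts, p) for k in range(made + 1))

# Assumption: a true 90% shooter taking six attempts in a game.
p_exact = binomial_pmf(3, 6, 0.90)     # exactly 3 of 6
p_at_most = prob_at_most(3, 6, 0.90)   # 3 of 6 or worse
```

Under those assumptions, a six-attempt game of 3-of-6 or worse comes up about 1.6% of the time. That is rare, but over a career with hundreds of games it is the kind of thing that will happen once in a while without the shooter's true ability changing at all.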
Just like Steph Curry has an innate average free throw percentage, so does every player have an innate average talent level. And just as Curry's game-by-game results can deviate from that average, so can every player's results deviate from their own true mean. And just like we'd expect Curry to return to his average, we should expect all players to return to theirs, as well.
Jim Brown averaged 104 rushing yards per game for his career, the highest average in history. This doesn't mean he rushed for 104 yards in every game, though. In some games, he rushed for 200 yards or more. One time, he rushed 14 times in a game for 11 yards! Sometimes he strung together a couple of good games or bad games in a row. But no matter how well or how poorly he played over short spans, it was a safe bet that over longer timelines he'd produce closer to his career average.
Player performance in any given sample is a function of that player's talent plus a large element of random chance. The smaller the sample, the more random chance dominates outcomes. The larger the sample, the more random chance offsets (with good luck canceling out bad luck) and player talent dominates.
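A quick simulation makes the sample-size point concrete. This sketch treats a shooter as a coin that comes up "make" 90% of the time, then compares how much his observed percentage bounces around over 6-shot samples versus 600-shot samples. The fixed 90% rate, the sample sizes, the trial count, and the random seed are all assumptions picked for illustration.

```python
import random
from statistics import mean, pstdev

def simulate_percentages(true_rate, attempts, trials, rng):
    """Simulate `trials` samples of `attempts` shots; return each sample's make rate."""
    results = []
    for _ in range(trials):
        made = sum(rng.random() < true_rate for _ in range(attempts))
        results.append(made / attempts)
    return results

rng = random.Random(2019)  # fixed seed so the run is repeatable

small = simulate_percentages(0.90, 6, 2000, rng)    # one-game-sized samples
large = simulate_percentages(0.90, 600, 2000, rng)  # multi-season-sized samples

# Spread of the observed percentage around the true 90% talent level.
spread_small = pstdev(small)
spread_large = pstdev(large)
```

Both sets of samples average out near 90%, but the small samples scatter far more widely. In theory the spread is sqrt(p(1-p)/n), which works out to roughly 12 percentage points for 6 shots and roughly 1.2 points for 600. Same talent, same coin, very different amounts of noise.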
That's regression to the mean in a nutshell. It's a concept we all intuitively understand, even if we don't talk about it in so many words. And yet, despite understanding it, we all routinely ignore it when it comes time to look at players as individuals.
If we're going to take advantage of regression to the mean, there are four guiding principles we need to keep in mind.
Principle #1: Everyone regresses to the mean.
Principle #2: Everyone's mean is different.
The six leading passers through one week are Andy Dalton (418 yards), Dak Prescott (405 yards), Matt Stafford (385 yards), Case Keenum (380 yards), Patrick Mahomes II (378 yards), and Drew Brees (370 yards). All six of those men will average fewer yards per game going forward. I know this because no player's "true mean" is over 350 passing yards per game, which would translate to 5,600 yards over a full 16-game season. Peyton Manning holds the all-time record with 342.3 passing yards per game in his record-setting 2013 season, and that was the only time he even topped 300 yards per game.
But just because all six players will regress doesn't mean all six players will regress the same amount. There have been ten seasons in history where a quarterback averaged 320 or more yards per game, and Drew Brees owns five of them. In his first full season as a starter, Patrick Mahomes II became the seventh player in history to throw for 5,000 yards. Matt Stafford is the only player other than Brees with two of the top twelve passing seasons in history.
Andy Dalton, on the other hand, has been a starter for eight seasons and has never thrown for more than 4,030 yards. He's only even topped 4,000 yards twice. Which is still better than Case Keenum can claim; Keenum's best season to date was just 3,890 yards and his career average of 225 yards per game would translate to just 3,600 yards over a full season.
Just because all six players are guaranteed to regress doesn't mean our expectations of all six players should be the same going forward.
Principle #3: Regression by itself doesn't change player order.
Let's say I have two mystery running backs. Player A is averaging 20 points per game (or ppg), and Player B is averaging 18 ppg. I tell you that you can have your pick between them. Who do you choose?
Player A is certainly the bigger outlier. He's almost certainly going to regress more than Player B. But Player B is going to regress, as well, and unless we know something else about them we have to assume that Player A will still be ahead afterward. Maybe they average 13 and 12 ppg going forward, but you still want the player who is scoring more today. Keep this in mind the next time you see someone merely point to a player's high fantasy point total and cry "regression".
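One simple way to model this is to pull each player's average partway back toward a common baseline. In the sketch below, the baseline of 6 ppg and the "half of the gap persists" weight are assumptions I chose only because they reproduce the 13 and 12 ppg figures above; they are not estimates from real data.

```python
def regress_toward_mean(observed, baseline, weight):
    """Shrink `observed` toward `baseline`.

    `weight` is the share of the gap between observed and baseline that we
    expect to persist (0 = all luck, 1 = all talent).
    """
    return baseline + weight * (observed - baseline)

BASELINE_PPG = 6.0   # assumed common mean for both mystery players
PERSISTENCE = 0.5    # assumed: half of each player's edge is real

player_a_future = regress_toward_mean(20.0, BASELINE_PPG, PERSISTENCE)
player_b_future = regress_toward_mean(18.0, BASELINE_PPG, PERSISTENCE)
```

Player A gives back 7 points and Player B gives back 6, so A does regress more. But as long as both players are being pulled toward the same baseline with the same positive weight, the one who started ahead finishes ahead. The order only changes when we know something extra, such as the two players having different true means.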
Principle #4: Regression operates on multiple dimensions.
Our third principle tells us we can't just look at a statistic we care about (in this case, fantasy points) and apply the concept of regression directly. All that does is tell us that good players are likely to remain good (if slightly less so), and bad players are likely to remain bad (if also slightly less so).
But players are going to regress in several ways all at the same time. If a quarterback has an abnormally high number of pass attempts, but an abnormally low yards-per-attempt average, we should expect his number of attempts to come down... but we should also expect his average per attempt to come up.
Some dimensions are more stable than others. Rush attempts are much more predictable from week to week than yards per carry. Yardage totals vary a lot less than touchdown totals. By focusing on the secondary elements of a player's production that are most likely to regress, we can find ways to change the order of the list we actually care about.
So, for instance, if we want to find players who will score fewer fantasy points, we might look at players who are scoring a lot of touchdowns right now. And if we want to find players who will score more fantasy points, maybe we look at players who have lots of targets but a low yards-per-target average.
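Here is a rough sketch of how regressing the pieces separately can reorder the total. It splits receiving yards per game into targets (treated as the stable part) and yards per target (treated as the noisy part) and shrinks each toward a league baseline at its own rate. The two receivers, the baselines, and both persistence weights are hypothetical numbers chosen for illustration, not measured values.

```python
def regress_toward_mean(observed, baseline, weight):
    """Shrink `observed` toward `baseline`; `weight` is the share that persists."""
    return baseline + weight * (observed - baseline)

# Assumed league baselines and persistence weights (illustrative only).
LEAGUE_TARGETS = 6.0            # targets per game
LEAGUE_YARDS_PER_TARGET = 8.0
TARGET_PERSISTENCE = 0.8        # volume: assumed mostly stable
EFFICIENCY_PERSISTENCE = 0.2    # yards per target: assumed mostly noise

def project_yards(targets, yards_per_target):
    """Regress volume and efficiency separately, then recombine."""
    future_targets = regress_toward_mean(targets, LEAGUE_TARGETS, TARGET_PERSISTENCE)
    future_ypt = regress_toward_mean(
        yards_per_target, LEAGUE_YARDS_PER_TARGET, EFFICIENCY_PERSISTENCE
    )
    return future_targets * future_ypt

# Hypothetical receivers: X is efficient on low volume, Y is the reverse.
current_x = 5.0 * 14.0   # yards per game so far
current_y = 10.0 * 6.0

projected_x = project_yards(5.0, 14.0)
projected_y = project_yards(10.0, 6.0)
```

Receiver X leads today, but because his edge comes from the dimension assumed to be noisy, the projection has Receiver Y passing him. That is the whole trick: regression applied to the total leaves the order alone, while regression applied to the components can flip it.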
By combining these principles, we can get one step ahead of our leaguemates. We can buy and sell tomorrow's production at today's prices and consistently reap a profit. All by simply understanding regression, how it works, and how we can put it to work for us.