Regression to the mean. If you've been playing fantasy football for more than three minutes, you've heard the phrase, typically wielded like a cudgel against a player whom some owner or analyst didn't see coming, as a way to justify their own miss.
Someone has a good game? He's going to regress. Someone has a bad game? He's going to regress. Someone has a good season? He's going to regress. Someone has a bad season? He's going to regress. In many ways, regression is the Muzak of fantasy football analysis: bland, unobjectionable, engineered to blend seamlessly into the background without drawing attention to itself.
This kind of regression talk is little more than a talisman we invoke, a prayer to the fantasy gods, a ward against the unknown. It is often a tool of explanation at the expense of understanding.
That stuff about players coming off of good seasons and players coming off of bad seasons all regressing? It's true, all of it, every word. It's also useless. A receiver coming off a 1,600-yard year will probably regress. So will a receiver coming off a 1,400-yard year. And, all else being equal, the 1,600-yard receiver will still be ahead afterward.
(“All else being equal”, it should be noted, is another favorite talisman of fantasy analysts.)
What's more, guys coming off of boring, generic, ho-hum seasons are going to regress, too, in ways both visible and invisible. But the mean that Ted Ginn Jr. is regressing to is not the same mean that Antonio Brown is regressing to.
The truth is that regression to the mean should be the midpoint of thoughtful fantasy analysis, not the endpoint. (If you'll forgive a pun— and forgiving puns is something you'll likely have to do relatively often around these parts— regression analysis should be more concerned with means than ends.)
“Regression Alert” is a new column this year that seeks to change all that and demystify regression. Regression isn't a magical force; it's a simple and logical process, one that we can harness and put to good use.
The truth is, fantasy analysis doesn't have a great hit rate. As a line often attributed to Niels Bohr goes, “Prediction is very difficult, especially about the future.” Regression stands as a notable exception. We may not be able to know what's going to happen... but we can get a pretty good idea of what won't.
Hunting for Outliers
When I was tossing around the idea for this column, I was a bit at a loss for what to do for week 1. Small sample sizes are the fertile soil in which regression blooms, and after week 1, small samples are all we have.
Consider this list of the top 12 fantasy running backs after week 1 last year (standard scoring), along with where each ranked over weeks 2-17:
- DeAngelo Williams (59)
- C.J. Anderson (57)
- Spencer Ware (20)
- Theo Riddick (40)
- Carlos Hyde (19)
- DeMarco Murray (6)
- David Johnson (1)
- Ameer Abdullah (128)
- Danny Woodhead (132)
- Melvin Gordon (9)
- Matt Forte (21)
- Jalen Richard (49)
So... that's really something, right? Obviously, DeAngelo Williams' decline was predictable, and some of the other falls were the result of injuries (though the backs weren't exactly lighting it up prior to getting hurt). But when samples are this small, regression is exerting a strong pull in a million directions at once.
I had to find one solid example to cut through all that noise and perfectly illustrate what exactly regression is and how exactly it works. And for the first 45 minutes of football this year, I had no idea what that would be.
Then Kareem Hunt took the second play of the fourth quarter 78 yards for a touchdown.
Hunt finished his first career NFL game with 246 yards and three touchdowns. The yardage total was a record for a player making his NFL debut. Anytime anyone sets a record, they're obviously going to regress— if people set records every week, they wouldn't be records.
But what's maybe not obvious is that while Hunt isn't going to get 250 yards and 3 touchdowns every week, he might not get 250 yards and 3 touchdowns *any* week. Like... ever again, for his entire career.
Counting Hunt, there are 20 players in NFL history who had 200 yards and 3 touchdowns in a game as a rookie. Adrian Peterson actually did it twice. Ezekiel Elliott and David Johnson are both still early in their careers with plenty of time to potentially repeat (though it's worth pointing out that even with his monster year last year, David Johnson never matched the 229 yards from scrimmage he had in a game as a rookie).
But the other 16 players are all far enough along for us to begin to look back on their careers. Alfred Morris had a 200-yard game as a rookie, and to this point, his second-best career game is just 139 yards. Jahvid Best had 232 yards and 3 touchdowns in his second career game, then never hit either threshold again before his career was cut short by injury.
Here's a complete list of the players in question, as well as how many times they topped 200 yards again.
- Kareem Hunt (0)
- Ezekiel Elliott (0)
- David Johnson (0)
- Alfred Morris (0)
- Doug Martin (2)
- Jahvid Best (0)
- Adrian Peterson (7)
- Joseph Addai (1)
- Julius Jones (1)
- Clinton Portis (3)
- Mike Anderson (1)
- Corey Dillon (5)
- Eddie Kennison (0)
- Joey Galloway (0)
- Bo Jackson (0)
- Curt Warner (2)
- Eric Dickerson (6)
- Billy Sims (4)
- Jerry Butler (0)
- Gale Sayers (3)
There are a lot of great players on that list, and a lot of high draft picks whose careers didn't pan out but who at one point looked like the next big thing. But 200-yard games are rare events even for Hall of Famers.
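For a rough sense of just how rare, the counts above can be tallied in a few lines of Python. Treat this as a sketch: the numbers are simply copied from the list, the variable names are made up for illustration, and the decision to set aside Hunt, Elliott, Johnson, and Peterson is an assumption about which four players fall outside the "other 16" mentioned earlier.

```python
from statistics import median

# Later 200-yard games for each player, copied from the list above.
repeat_200_yard_games = {
    "Kareem Hunt": 0, "Ezekiel Elliott": 0, "David Johnson": 0,
    "Alfred Morris": 0, "Doug Martin": 2, "Jahvid Best": 0,
    "Adrian Peterson": 7, "Joseph Addai": 1, "Julius Jones": 1,
    "Clinton Portis": 3, "Mike Anderson": 1, "Corey Dillon": 5,
    "Eddie Kennison": 0, "Joey Galloway": 0, "Bo Jackson": 0,
    "Curt Warner": 2, "Eric Dickerson": 6, "Billy Sims": 4,
    "Jerry Butler": 0, "Gale Sayers": 3,
}

# Assumption: these are the four players the column sets aside,
# leaving the "other 16" whose careers we can look back on.
set_aside = {"Kareem Hunt", "Ezekiel Elliott", "David Johnson", "Adrian Peterson"}

others = {name: n for name, n in repeat_200_yard_games.items()
          if name not in set_aside}

never_again = sum(1 for n in others.values() if n == 0)
typical = median(others.values())

print(f"{never_again} of {len(others)} never topped 200 yards again")
print(f"median number of repeat 200-yard games: {typical}")
```

In other words, even among this group, the typical player managed it only about once more over an entire career, and a sizable share never did.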
It wouldn't be unheard of for a rookie to post career highs in his very first game. Anquan Boldin played 14 years and gained more than 13,500 yards after his rookie debut without ever again topping 200 yards. Cam Newton has an MVP trophy on his shelf; the first- and second-highest passing totals of his career came in the first and second games of his career.
How does it make sense for a rookie's first game to be his best? Don't players typically get better as they gain NFL experience? Well, yes, they do... but production isn't a perfect reflection of talent.
Instead, a useful mental model of production would be (Production) = (intrinsic factors) + (extrinsic factors). Intrinsic factors are the things inside a player's control: strength, speed, vision, route-running, hands, basically just overall talent level. You could even extend that to his fit within the scheme around him. Extrinsic factors are everything outside a player's control: opposing defenses, how well his teammates played on any given day, even things like weather and lucky bounces of the football.
Nobody's “intrinsic” talent level is 250 yards per game. If someone's was, that player would average 4,000 yards a season. Instead, players struggle to reach even half that.
As a result, we know that for every 250-yard game, extrinsic factors played a major role. Take Hunt's 78-yard touchdown reception. Had a defender been in better position, maybe he gets tackled after 20 yards. If that had happened, he would have lost 58 yards, ended his night with 188 instead of 246, and wouldn't have set the NFL record.
A player's intrinsic factors aren't fixed. Players can become better with experience or worse with age. But they're extremely stable from game to game and fairly stable from year to year. The same cannot be said for extrinsic factors, which vary wildly.
Sometimes a defense takes a bad pursuit angle three times in a twenty-play span and a player has a monster game. Sometimes the opposite happens and a player is shut down. Extrinsic factors are all over the map. Over a long enough timeline, however, we'd expect all the positive factors and negative factors to even out and for the player's performance to merely reflect their true talent level.
This intrinsic talent level is the player's mean, and this process of extrinsic factors evening out is what we call regression. And our ability to predict this regression gives us a crucial opportunity to turn a profit in fantasy football.
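If it helps to see that process in motion, here is a toy simulation of the "production = intrinsic + extrinsic" idea. It is a sketch under made-up assumptions, not a model of real NFL data: every number in it (a 60-yard average talent level, a 15-yard spread in talent, a 40-yard spread in game-to-game luck) is hypothetical, and so are the function names.

```python
import random
import statistics


def simulate(n_players=2000, n_later_games=15, seed=0):
    """Toy model: production = intrinsic (fixed per player) + extrinsic (new each game)."""
    rng = random.Random(seed)
    # Hypothetical talent levels: tightly bunched around 60 yards per game.
    intrinsic = [rng.gauss(60, 15) for _ in range(n_players)]
    # Week 1 is talent plus one draw of luck, and luck swings much more widely.
    week1 = [x + rng.gauss(0, 40) for x in intrinsic]
    # Rest of season: the average over many games, each with its own luck.
    later = [
        statistics.mean(x + rng.gauss(0, 40) for _ in range(n_later_games))
        for x in intrinsic
    ]
    return intrinsic, week1, later


def leaders_summary(week1, later, top_n=100):
    """Average week-1 and rest-of-season production for the week-1 leaders."""
    order = sorted(range(len(week1)), key=lambda i: week1[i], reverse=True)
    leaders = order[:top_n]
    return (
        statistics.mean(week1[i] for i in leaders),
        statistics.mean(later[i] for i in leaders),
    )


intrinsic, week1, later = simulate()
top_week1, top_later = leaders_summary(week1, later)
league_later = statistics.mean(later)

print(f"week-1 leaders in week 1:        {top_week1:6.1f}")
print(f"week-1 leaders the rest of year: {top_later:6.1f}")
print(f"league average the rest of year: {league_later:6.1f}")
```

The week-1 leaders should fall back sharply, because most of what put them on top was luck that doesn't carry over, yet they should still land above the league average, because a little of it was talent. That is the earlier point in miniature: everyone regresses, but not to the same mean.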
Again, using the “production = intrinsic + extrinsic” model, let's let intrinsic value be X and extrinsic factors be Y. The purpose of “Regression Alert” isn't to try to identify everyone's “true mean”, their X-value. That's a very difficult problem, and Footballguys already has Bob Henry, a man with countless awards for prediction accuracy, handling that.
And the point of the column isn't to highlight players who had big games, either. You really don't need me to tell you that Kareem Hunt isn't going to get 250 yards every week. Moreover, that's the kind of “regression to the mean” analysis that I was decrying at the top. A guy with 250 yards and a guy with 150 yards are both likely to regress... and all else being equal, the guy with 250 yards is still going to be better than the guy with 150 yards afterward.
Instead, Regression Alert is going to focus on some of the more subtle dimensions that regression to the mean can act on. For instance, if a receiver has 400 yards and 6 touchdowns through five games, that might seem like a sustainable level of production— to use our terminology, it might seem like that player is performing right around their X-value with minimal Y-value added.
But did you know that over a long timeline, nearly 100% of NFL receivers will average a yard-to-touchdown ratio of between 100 and 240, and around 85% will average a ratio between 130 and 200? If we assume our mystery player's “true” yard-to-touchdown ratio is 150:1 (instead of the 67:1 ratio he has to this point), this means that player has about 30% more fantasy points than we'd expect him to have. He's a prime sell-high candidate.
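The arithmetic behind that kind of estimate can be sketched as follows. The scoring rule is an assumption (standard scoring, taken here to mean 1 point per 10 receiving yards and 6 per touchdown, with nothing for receptions), and the helper name is invented for illustration.

```python
def standard_points(yards, touchdowns):
    # Assumed standard scoring: 1 point per 10 yards, 6 points per touchdown.
    return yards / 10 + 6 * touchdowns


yards, touchdowns = 400, 6
true_ratio = 150  # assumed "true" yards per touchdown, as in the example above

actual_ratio = yards / touchdowns      # yards per touchdown so far
expected_tds = yards / true_ratio      # touchdowns a 150:1 ratio would imply

actual = standard_points(yards, touchdowns)
expected = standard_points(yards, expected_tds)
surplus = actual / expected - 1        # share of points above expectation

print(f"actual ratio: {actual_ratio:.0f}:1")
print(f"actual points: {actual:.0f}, expected points: {expected:.0f}")
print(f"surplus: {surplus:.0%}")
```

Under those particular assumptions the surplus comes out in the mid-30s percent, the same neighborhood as the rough figure above. The exact number depends on the scoring format: any format that also awards points for receptions adds the same amount to both totals and so shrinks the gap.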
As I said, regression can act along many dimensions: yards per carry, yard-to-touchdown ratio, targets per route run, percent of team passing yards, and so on and so forth.
Every week, I will pick one of those factors to take a closer look at. And then I will pull out two groups of players, one of which outscored the other to that point in the season... and use the magic of regression to the mean to predict that the second group will outscore the first going forward.
If you love the idea of selling high and buying low with minimal risk, please follow along. And if you have any questions or ideas for statistics that seem like prime regression candidates, feel free to send them my way.