Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is touchdown rate, and Christian McCaffrey is one of the high outliers in touchdown rate, then Christian McCaffrey goes into Group A and may the fantasy gods show mercy on my predictions.
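The selection procedure described above can be sketched in a few lines of Python. This is only an illustration, not the actual tooling behind the column: the player names, touchdown rates, point totals, and the helper names `split_groups` and `average` are all made up for the example.

```python
def split_groups(metric_by_player, group_size):
    """Rank players by a metric and return (top slice, bottom slice)."""
    ranked = sorted(metric_by_player, key=metric_by_player.get, reverse=True)
    return ranked[:group_size], ranked[-group_size:]


def average(points_by_player, group):
    """Average fantasy points per game across the players in a group."""
    return sum(points_by_player[name] for name in group) / len(group)


# Hypothetical numbers, invented purely for illustration.
td_rate = {
    "Player 1": 0.09, "Player 2": 0.07, "Player 3": 0.05,
    "Player 4": 0.03, "Player 5": 0.02, "Player 6": 0.01,
}
points_per_game = {
    "Player 1": 18.0, "Player 2": 16.0, "Player 3": 14.0,
    "Player 4": 13.0, "Player 5": 12.0, "Player 6": 10.0,
}

# No hand-picking: the metric alone decides who lands in each group.
group_a, group_b = split_groups(td_rate, group_size=2)

a_avg = average(points_per_game, group_a)
b_avg = average(points_per_game, group_b)

# Verification step: Group A must lead to date before we predict a reversal.
assert a_avg > b_avg
print(f"Group A: {a_avg:.1f} ppg, Group B: {b_avg:.1f} ppg")
```

The only choice left to the analyst in this sketch is which metric to pass in, which mirrors the "I don't get to pick my samples" rule.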
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of my predictions from 2020 and their final results. Here's the same list from 2019 and their final results, here's the list from 2018, and here's the list from 2017. Over four seasons, I have made 30 specific predictions and 24 of them have proven correct, a hit rate of 80%.
In Week 2, I broke down what regression to the mean really is, what causes it, how we can benefit from it, and what the guiding philosophy of this column would be. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
In Week 4, I talked about yard-to-touchdown ratios and why they were the most powerful regression target in football that absolutely no one talks about, then predicted that touchdowns were going to follow yards going forward (but the yards wouldn't follow back).
In Week 5, we looked at ten years' worth of data to see whether early-season results better predicted rest-of-year performance than preseason ADP, and we found that, while the exact details fluctuated from year to year, overall they did not. No specific prediction was made.
In Week 6, I taught a quick trick to tell how well a new statistic actually measures what you think it measures. No specific prediction was made.
In Week 7, I went over the process of finding a good statistic for regression and used team rushing vs. passing touchdowns as an example.
In Week 8, I talked about how interceptions were an unstable statistic not only for quarterbacks but also for defenses.
In Week 9, we took a look at Ja'Marr Chase's season so far. He was outperforming his opportunities, which is not sustainable in the long term, but I offered a reminder that everyone regresses to a different mean, and the "true performance level" that Chase will trend towards over a long timeline is likely a lot higher than for most other receivers. No specific prediction was made.
In Week 10, I talked about how schedule luck in fantasy football was entirely driven by chance and, as such, should be completely random from one sample to the next. Then I checked Footballguys' staff leagues and predicted that the teams with the worst schedule luck would outperform the teams with the best schedule luck once that random element stopped favoring the latter.
In Week 11, I walked through how to tell the difference between regression to the mean and the gambler's fallacy (which is essentially a belief in regression past the mean). No specific prediction was made.
|Statistic for regression|Performance before prediction|Performance since prediction|Weeks remaining|
|---|---|---|---|
|Yards per Carry|Group A had 10% more rushing yards per game|Group B has 4% more rushing yards per game|None (Win!)|
|Yards per Touchdown|Group A scored 9% more fantasy points per game|Group B scored 13% more fantasy points per game|None (Win!)|
|Passing vs. Rushing TDs|Group A scored 42% more RUSHING TDs|Group A is scoring 33% more PASSING TDs|None (Win!)|
|Defensive Interceptions|Group A had 33% more interceptions|Group B had 24% more interceptions|None (Win!)|
|Schedule Luck|Group A had a 3.7% better win%|Group B has a 38.1% better win%|2|
Group A had a monstrously good week intercepting the football, but it was too little too late. The Patriots and Texans both had a whopping four interceptions, the Colts had three, and the Buccaneers and Cowboys combined to chip in three more for good measure. Altogether, Group A managed 1.56 interceptions per game, well above even the lofty average of 1.29 interceptions per game they had at the start of the prediction.
But one week does not a prediction make; Group B outperformed its prior per-game interception rate in all four weeks, and Group A underperformed its prior interception rate in three out of the four. Even with the interception explosion, Group A's per-game interception rate was 15% lower than its previous average, while Group B's was 50% higher. As a result, Group B notched a comfortable victory.
(As an aside: I had been tempted to start with a much stronger initial interception prediction that would have given Group A a 93% advantage, by far the biggest I'd ever granted in Regression Alert history. I backed out because I feared that 93% was too large a gap even for something as powerful as regression to the mean to overcome. Because of the Week 11 explosion, Group A would have finished that prediction with a 5% edge still. It would have been a dramatic reversal, but still would have granted us our first loss of the season. Sometimes discretion is the better part of valor.)
As for the schedule luck prediction, our "good but not lucky" squads are 10-4 over the last two weeks, while our "lucky but not good" teams are 4-8. Interestingly, our "unlucky" teams have benefited a little from schedule luck while our "lucky" teams have been hurt by it, which isn't a very surprising outcome because schedule luck is completely random.
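As a quick check on the arithmetic, the records above can be converted to win percentages. The sketch below assumes the 38.1% figure in the table is the gap in percentage points between the two groups, which is the reading that matches the 10-4 and 4-8 records. The helper name `win_pct` is just for illustration.

```python
def win_pct(wins, losses):
    """Win percentage on a 0-100 scale."""
    return 100 * wins / (wins + losses)


group_b_pct = win_pct(10, 4)  # "good but not lucky" teams
group_a_pct = win_pct(4, 8)   # "lucky but not good" teams

# Gap measured in percentage points, not as a ratio of the two rates.
gap = group_b_pct - group_a_pct

print(f"Group B: {group_b_pct:.1f}%  Group A: {group_a_pct:.1f}%  gap: {gap:.1f} points")
```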
(Our unlucky teams becoming lucky and our lucky teams becoming unlucky wasn't the likely outcome, either, though. Each group had a 50/50 shot at positive or negative luck, so there was about a 1-in-4 chance that we'd get a total luck swap, but also an equal 1-in-4 chance that the lucky teams would stay lucky and the unlucky teams would stay unlucky.)
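The 1-in-4 figures fall out of simply listing the possibilities. The sketch below assumes each group's new luck is an exact 50/50 coin flip and that the two groups are independent of each other; under those assumptions there are four equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

# Each outcome is (new luck of the previously lucky group,
#                  new luck of the previously unlucky group).
outcomes = list(product(["lucky", "unlucky"], repeat=2))

# Total swap: the lucky group turns unlucky and the unlucky group turns lucky.
swap = [o for o in outcomes if o == ("unlucky", "lucky")]
# No change: the lucky stay lucky and the unlucky stay unlucky.
stay = [o for o in outcomes if o == ("lucky", "unlucky")]

p_swap = Fraction(len(swap), len(outcomes))
p_stay = Fraction(len(stay), len(outcomes))

print(f"P(total swap) = {p_swap}, P(no change) = {p_stay}")
```

The remaining two outcomes, where both groups get lucky or both get unlucky, account for the other half of the probability.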
The Arrow of Time and Regression to the Mean
Most of the time when we talk about regression to the mean in this space, we're talking about it from the beginning of the process. We take a starting state with large gaps between players or teams and predict an ending state where those gaps are much smaller.
But we can just as easily perform the process in reverse. We can take an ending state with small gaps and from it predict a prior starting state where those gaps were much bigger. (Technically, "predictions" about the past are referred to as retrodictions or postdictions.)
Retrodictions aren't as directly actionable. Unless you have a time machine, you can't exactly take advantage of this new knowledge. But they're a great way to test your understanding of a subject without having to wait for new evidence to come in. And having a good understanding of a topic ensures you'll be able to make better decisions going forward.
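A toy simulation can make the shrinking gap concrete. This is a minimal sketch under an assumed model, not real NFL data: every player has a stable skill level, and each half-season score is that skill plus independent luck. The player count, group size, and spread values are arbitrary choices for illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

N_PLAYERS = 1000
GROUP_SIZE = 100

# Assumed toy model: observed score = stable skill + fresh luck each half.
skill = [random.gauss(100, 10) for _ in range(N_PLAYERS)]
first_half = [s + random.gauss(0, 20) for s in skill]
second_half = [s + random.gauss(0, 20) for s in skill]

# Groups are chosen only by first-half results, as in the column.
ranked = sorted(range(N_PLAYERS), key=lambda i: first_half[i], reverse=True)
group_a = ranked[:GROUP_SIZE]   # top performers in the starting state
group_b = ranked[-GROUP_SIZE:]  # bottom performers in the starting state


def gap(scores):
    """Group A's average minus Group B's average for one half."""
    return mean(scores[i] for i in group_a) - mean(scores[i] for i in group_b)


starting_gap = gap(first_half)
ending_gap = gap(second_half)

print(f"starting gap: {starting_gap:.1f}, ending gap: {ending_gap:.1f}")
```

Read forward, the large starting gap predicts a smaller ending gap. Read backward, the same two numbers are the retrodiction: given only the small ending gap between these groups, we would infer that the gap used to be much bigger.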