Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I break down a topic related to regression to the mean. Some weeks, I'll explain what it is, how it works, why you hear so much about it, and how you can harness its power for yourself. In other weeks, I'll give practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
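For readers who think in code, here's a minimal sketch of that process. Every name and number below is hypothetical, the top-third/bottom-third cutoff is just for illustration, and "metric" stands in for whatever stat I'm tracking that week:

```python
# A minimal sketch of the weekly methodology. All data is hypothetical.
players = [
    {"name": "WR1", "metric": 5.8, "ppg": 14.2},
    {"name": "WR2", "metric": 5.1, "ppg": 12.9},
    {"name": "WR3", "metric": 4.0, "ppg": 11.0},
    {"name": "WR4", "metric": 3.4, "ppg": 10.1},
    {"name": "WR5", "metric": 3.1, "ppg": 9.4},
    {"name": "WR6", "metric": 2.9, "ppg": 8.8},
]

# Rank the league by the metric, best to worst.
ranked = sorted(players, key=lambda p: p["metric"], reverse=True)

# Split: top third into Group A, bottom third into Group B
# (the exact cutoff varies from prediction to prediction).
third = len(ranked) // 3
group_a, group_b = ranked[:third], ranked[-third:]

def avg_ppg(group):
    return sum(p["ppg"] for p in group) / len(group)

# Step 1: verify Group A has outscored Group B to this point.
assert avg_ppg(group_a) > avg_ppg(group_b)
# Step 2: predict that Group B outscores Group A going forward.
print(f"Group A: {avg_ppg(group_a):.1f} ppg | Group B: {avg_ppg(group_b):.1f} ppg")
```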
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If I'm looking at receivers and Ja'Marr Chase is one of the top performers in my sample, then Ja'Marr Chase goes into Group A, and may the fantasy gods show mercy on my predictions.
And then, because predictions are meaningless without accountability, I track and report my results. Here's last year's season-ending recap, which covered the outcome of every prediction made in our eight-year history, giving our top-line record (46-15, a 75% hit rate) and lessons learned along the way.
Our Year to Date
Sometimes, I use this column to explain the concept of regression to the mean. In Week 2, I discussed what it is and what this column's primary goals would be. In Week 3, I explained how we could use regression to predict changes in future performance -- who would improve, who would decline -- without knowing anything about the players themselves.
Sometimes, I use this column to point out general examples of regression without making specific, testable predictions. In Week 5, I looked at more than a decade's worth of evidence showing how strongly early-season performances regressed toward preseason expectations.
Other times, I use this column to make specific predictions. In Week 4, I explained that touchdowns tend to follow yards and predicted that the players with the highest yard-to-touchdown ratios would begin outscoring the players with the lowest. In Week 6, I showed the evidence that yards per carry was predictively useless and predicted the lowest ypc backs would outrush the highest ypc backs going forward.
The Scorecard
| Statistic Being Tracked | Performance Before Prediction | Performance Since Prediction | Weeks Remaining |
| --- | --- | --- | --- |
| Yard-to-TD Ratio | Group A averaged 25% more PPG | Group B averaged 12% more PPG | None (Win!) |
| Yards per Carry | Group A averaged 39% more rushing yards per game | Group A averages 15% more rushing yards per game | 2 |
One of the main aims of this column is to convince you that regression will happen whether you believe in it or not. Well, apparently, it will happen whether I believe in it or not, too. Group A came out so strong in the first week of our Yard-to-Touchdown prediction that I thought there was no way Group B could make up the necessary ground. But they did.
The prediction has two component parts: that yardage will remain relatively stable across samples, and that NFL receivers will tend to convert that yardage into touchdowns at a rate of roughly one touchdown for every 120 to 200 yards. Both Group A and Group B posted yardage totals over the last month within 5% of their per-game averages at the time of the prediction, confirming the first part. And Group A saw its yard-to-touchdown ratio rise from 70 to 141, while Group B's fell from 433 to 189, confirming the second part.
The net result of that touchdown regression was that Group A's production fell from 11.5 to 9.6 points per game, while Group B's rose from 9.2 to 10.2.
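To make the mechanics concrete, here's a small sketch of how a change in yard-to-touchdown ratio moves scoring even when yardage holds steady. The per-game yardage figure and the scoring rules (0.1 points per yard, 6 per touchdown, receptions ignored) are assumptions for illustration, not the actual scoring behind the numbers above:

```python
def ppg(yards_per_game, yards_per_td):
    """Points per game from yardage plus touchdowns, assuming
    0.1 points per yard and 6 points per touchdown (receptions
    ignored for simplicity)."""
    return 0.1 * yards_per_game + 6 * (yards_per_game / yards_per_td)

yards = 70.0  # hypothetical stable per-game yardage

# Group A's ratio regressing from 70 yards/TD to 141 costs points...
print(ppg(yards, 70), round(ppg(yards, 141), 1))             # 13.0 10.0
# ...while Group B's ratio falling from 433 to 189 adds them.
print(round(ppg(yards, 433), 1), round(ppg(yards, 189), 1))  # 8.0 9.2
```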
Meanwhile, our second prediction also has two primary components: that attempts will remain relatively stable across samples, and that yards per carry is essentially just a random number generator, so any advantages there will disappear. The latter component has come through so far: Group A saw its ypc average fall from 5.37 to 4.42, while Group B's rose from 3.39 to 4.39. Overall, Group A went from gaining 58% more yards with each carry to gaining just 1% more.
The problem so far is that attempts per game haven't been stable. Group A has seen its workload rise by about 14%, and Group B has seen its fall by a similar amount, which has been enough to keep Group A ahead for now. We have two more weeks for things to reverse, though, and I've learned better than to doubt regression.
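To see how workload can mask ypc regression, decompose rushing yards per game into attempts times yards per carry. The ypc numbers below come from the scorecard; the attempt counts are made up, so the output lands near (but not exactly on) the table's figures:

```python
# Rushing yards per game = attempts per game * yards per carry.
# The ypc figures come from the scorecard above; the attempt
# counts are hypothetical, chosen only to show the offsetting effect.
a_att, b_att = 14.0, 16.0  # attempts per game at prediction time

before = (a_att * 5.37) / (b_att * 3.39)
after = (a_att * 1.14 * 4.42) / (b_att * 0.86 * 4.39)  # +14% / -14% workload

print(f"Before: Group A out-gained Group B by {before - 1:.0%}")  # ~39%
print(f"After:  Group A out-gains Group B by {after - 1:.0%}")    # ~17%
```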
(Most) Quarterback Stats Don't Regress As Much
When considering regression, I find it useful to think of production as the result of a combination of skill (factors innate to the player) and luck (factors outside the player's direct control). Statistics that are more luck than skill tend to regress faster and more sharply than statistics that are more skill than luck.
I've linked to research by Danny Tuccitto, who found that running backs needed 1,978 carries before their ypc average reached a level that reflected more skill than luck. Using the same methodology, Danny also found that a quarterback's yards per attempt average (or YPA) stabilizes in just 396 attempts. For a running back, 1,978 carries represents eight years of 250-carry seasons. For a quarterback, 396 attempts is less than a season's worth (only one team -- the 2022 Chicago Bears -- has finished with fewer than 400 pass attempts since 2010).
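One common way to put a stabilization point to work -- my gloss here, not necessarily Tuccitto's exact procedure -- is to treat it as the sample size at which a stat is half skill and half luck, then shrink observed rates toward the league mean accordingly. The league means and observed rates below are hypothetical:

```python
def regressed_rate(observed, n, league_mean, k):
    """Shrink an observed rate toward the league mean, treating k
    (the stabilization point) as the sample size at which the stat
    is half skill, half luck. At n == k, the estimate lands exactly
    halfway between the observation and the league mean."""
    return (n * observed + k * league_mean) / (n + k)

# A quarterback's YPA stabilizes quickly (k = 396), so one season of
# strong play moves the estimate well off the league mean...
print(regressed_rate(8.5, n=400, league_mean=7.0, k=396))   # ~7.75

# ...while a running back's ypc (k = 1,978) barely budges on the
# same sample size.
print(regressed_rate(5.5, n=400, league_mean=4.3, k=1978))  # ~4.50
```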
As a result, you're not going to see me predicting regression very often for quarterback stats like yards per attempt. (In fact, the last time I did so was in 2020, when I predicted -- successfully -- that yards per attempt wouldn't regress.)
But there is one quarterback statistic that I love to badmouth, one that is terrible, horrible, no good, very bad. That statistic is interception rate.
Why Does Interception Rate Regress So Much?
The statistics that regress most strongly are those that owe relatively more to luck than to skill. Interception rate does have a skill component. Leaguewide, quarterbacks threw an interception on 2.2% of their attempts last year. For his career, Aaron Rodgers has thrown an interception on just 1.4% of his attempts, the best rate in history. Jameis Winston, on the other hand, throws one on 3.5% of his attempts. Over 600 pass attempts, that's the difference between 8 interceptions (Rodgers), 13 interceptions (league average), and 21 interceptions (Winston).
We know that's a real difference because the samples involved are so big. Rodgers has thrown over 8,000 career pass attempts. Winston has attempted over 3,000 passes. The league as a whole attempted nearly 18,000 passes last year. These are all significantly greater than the 1,681 attempts that Tuccitto calculated were required for interception rate to stabilize.
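For illustration, here's what that kind of shrinkage looks like applied to the career rates above (attempt counts are approximate, and again, this framing is mine, not a replication of Tuccitto's method):

```python
def regressed_int_rate(observed, n, league_mean=0.022, k=1681):
    """Shrink an observed interception rate toward the league mean,
    treating 1,681 attempts as the half-skill, half-luck point."""
    return (n * observed + k * league_mean) / (n + k)

# Even after regressing toward the 2.2% league average, the gap
# between Rodgers and Winston survives (attempt counts approximate).
print(f"Rodgers: {regressed_int_rate(0.014, 8000):.1%}")  # ~1.5%
print(f"Winston: {regressed_int_rate(0.035, 3000):.1%}")  # ~3.0%
```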
But that 1,681-attempt threshold is a lot closer to a running back's 1,978-carry requirement than to the 396 attempts necessary for yards per attempt. Why is this?
First: Interceptions Are Heavily Influenced By the Situation
Remember how the league-average interception rate last year was 2.2%? On plays where a team was trailing by two scores or more (9 or more points), that rate rose to 2.6%. When playing with a two-score lead, it fell to 2.0%. These might not seem like huge differences, but over a 600-attempt season, that's an extra 3 or 4 interceptions.
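Spelled out, that's just the rate difference times a season's worth of attempts:

```python
# Extra expected interceptions over a 600-attempt season at the
# trailing-by-two-scores rate (2.6%) vs. the two-score-lead rate (2.0%).
attempts = 600
print(round(attempts * 0.026 - attempts * 0.020, 1))  # 3.6
```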
To some extent, this is selection bias. Consider this 2016 game between the Patriots and the Jets. Quarterbacks threw 0 INTs on 23 attempts with the lead vs. 3 INTs on 24 attempts while trailing. But the Patriots won that game 41-3; all attempts with a lead came from Tom Brady, while all attempts while trailing came from either Bryce Petty or Ryan Fitzpatrick. And it's no surprise that Brady threw fewer interceptions.
To the extent that good quarterbacks spend more time with the lead and good quarterbacks throw fewer interceptions, we should expect quarterbacks with the lead to throw fewer interceptions. But even when you control for the quarterback, the effect persists.
Looking just at Tom Brady: over his career, he threw 5,771 attempts with the lead and was intercepted on just 1.7% of them, versus 4,373 attempts while trailing with an interception rate of 2.1%. Every other quarterback will show a similar pattern.
And this is good. When a team trails, especially when it trails big or trails late, it needs to take bigger risks to get back into the game. Taking bigger risks will lead to more interceptions, but it will also maximize your chances of a comeback. On the other hand, when a team is ahead, it wants to take fewer risks to make a comeback as difficult as possible for the other team. Some of the best quarterbacks see some of the biggest differences in their interception rate while leading vs. trailing simply because the best quarterbacks tend to be really good at calibrating their risk/reward decision-making to the needs of the moment.
But remember, "luck" is "factors outside of the player's control", and quarterbacks don't have a ton of control over whether their team is leading or trailing at any given point. It depends a lot on how good the opponent is, how well the defense is playing, if special teams are holding up their end of the bargain, etc. So we should expect the situations a quarterback plays in to vary significantly from one sample to the next, and that should impact their expected interception rate.