
Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is yards per target, and Antonio Brown is one of the high outliers in yards per target, then Antonio Brown goes into Group A and may the fantasy gods show mercy on my predictions. On a case-by-case basis, it's easy to find reasons why any given player is going to buck the trend and sustain production. So I constrain myself and remove my ability to rationalize on a case-by-case basis.
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of all my predictions from last year and how they fared. Here's a similar list from 2017.
The Scorecard
In Week 2, I opened with a primer on what regression to the mean was, how it worked, and how we would use it to our advantage. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
In Week 4, I explained why touchdowns follow yards (but yards don't follow back), and predicted that the players with the fewest touchdowns per yard gained would outscore the players with the most touchdowns per yard gained going forward.
In Week 5, I talked about how preseason expectations still held as much predictive power as performance through four weeks. No specific prediction was made.
| Statistic For Regression | Performance Before Prediction | Performance Since Prediction | Weeks Remaining |
| --- | --- | --- | --- |
| Yards per Carry | Group A had 20% more rushing yards per game | Group B has 32% more rushing yards per game | 1 |
| Yard:Touchdown Ratio | Group A had 23% more points per game | Group B has 65% more points per game | 2 |
It only took three weeks for yards per carry to completely even out between our two groups. At the time we made our yards per carry prediction, Group A averaged 6.45 yards per carry on 255 total carries, while Group B averaged 4.14 yards per carry over 332 total carries. In the three weeks since, Group B actually leads Group A in yards per carry, 4.56 to 4.53. They've also retained their huge edge in rush attempts and are a virtual lock to give us our first victorious prediction of the season next week.
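As a rough check on how efficiency and volume combine, here is a minimal sketch using only the carry counts and per-carry averages quoted above. It assumes both groups played a comparable number of games, so that total yards stand in for yards per game; the helper names are made up for illustration.

```python
# Figures quoted in the article at the time of the prediction.
group_a = {"carries": 255, "yards_per_carry": 6.45}
group_b = {"carries": 332, "yards_per_carry": 4.14}


def total_yards(group):
    """Rushing yards implied by carries times yards per carry."""
    return group["carries"] * group["yards_per_carry"]


def percent_edge(a, b):
    """How much larger a is than b, in percent."""
    return (a / b - 1) * 100


# Assumption: comparable games played, so totals approximate per-game figures.
yards_edge_a = percent_edge(total_yards(group_a), total_yards(group_b))
carry_edge_b = percent_edge(group_b["carries"], group_a["carries"])

print(f"Group A yardage edge before the prediction: {yards_edge_a:.0f}%")
print(f"Group B carry edge before the prediction: {carry_edge_b:.0f}%")
```

Under those assumptions, Group A's yardage edge comes out close to the 20% in the scorecard, and it rested entirely on the per-carry gap. Once the per-carry averages converge, what remains is roughly Group B's edge in carries.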
Our touchdown underperformers started a little slowly in Week 4 but made up for lost time (and then some) in Week 5. Seven out of eleven Group B receivers reached the end zone this weekend (compared to three out of eight for Group A), and Group B receivers are now averaging 0.5 touchdowns per game (compared to 0.35 for Group A). Add in the fact that they're nearly doubling up Group A in receiving yards per game and things are starting to get a bit ugly.
Why Don't Quarterbacks Regress?
You may have noticed that I have a habit of picking on running backs and wide receivers in my predictions, but haven't really shown the same vim in going after quarterbacks. If you've been reading for a while, perhaps you've noticed that my track record at predicting regression for quarterbacks isn't so hot; of the 14 distinct, falsifiable hypotheses I have made over the last two years, eleven have succeeded and three have failed. Breaking it down, that record is 10-1 on predictions about running backs and wide receivers vs. 1-2 on predictions about quarterbacks.
Some of this is bad luck. Regression is about probability and not destiny. It is expected that sometimes we'll get things wrong. The goal is simply to get much more right than wrong, and I feel good about our track record on that front.
But there are unique challenges in predicting regression for the quarterback position, and I wanted to talk a little bit about them. And then after that, I figured I'd ignore everything I just said, throw caution to the wind, and break all the rules of Regression Alert just once.
Problem #1: The Available Player Pool is Smaller
Betting on any one individual player to regress over a small window is never a great idea. The odds are in your favor, but not by a huge amount. If a player has a 55% shot at regressing over the next few weeks, that means they have a 45% chance of making you look foolish.
The two solutions to this problem are to extend the window or to extend the sample. Given a long enough timeline, the odds of regression rise to 100%. Unfortunately, extending the window goes against one of the fundamental precepts of Regression Alert: accountability. It's always easy to weasel out of a bad prediction by saying we weren't wrong, it just hasn't come true yet. So we rely on the other method, extending the sample.
If you get enough guys who each have a 55% individual chance of regressing and bundle them all together, the group as a whole will have an 80% chance of regressing, 90% chance... really, the sky is the limit. The problem is that the available pool of quarterbacks is just smaller than the available pool of running backs, wide receivers, or best of all, running backs and wide receivers combined. This means that there are fewer outliers for us to bundle together and predict regression for at any given time, and our odds suffer as a result.
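To see why bundling helps, here is a toy model, not the method behind the actual predictions. It assumes each player regresses independently with the same 55% chance, and it counts the group as "regressing" when more than half of its players do. Real groups are judged on combined production rather than a head count, so treat the numbers as directional only.

```python
from math import comb


def majority_probability(n_players: int, p_single: float = 0.55) -> float:
    """Chance that more than half of n independent players regress.

    Assumption: every player has the same independent probability p_single.
    """
    needed = n_players // 2 + 1
    return sum(
        comb(n_players, k) * p_single**k * (1 - p_single) ** (n_players - k)
        for k in range(needed, n_players + 1)
    )


# Odd group sizes avoid ties. Larger bundles push the odds steadily upward.
for n in (1, 11, 51, 101):
    print(n, round(majority_probability(n), 3))
```

The same function shows the quarterback problem: with a smaller pool there are fewer outliers to bundle, so `n_players` stays small and the group's odds stay closer to the single-player 55%.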
Problem #2: There's Less Variation Between Players
When explaining yard-to-touchdown ratios, I like to say that differences between players can be genuinely meaningful, that Dez Bryant truly can score touchdowns at a higher rate than Andre Johnson without it just being luck, but that these meaningful differences exist within a tightly-defined band. One player can score a touchdown for every 100 receiving yards, and another player can score a touchdown for every 200 receiving yards, but anything outside of those limits is unsustainable.
To illustrate this, I plotted career yard-to-touchdown ratio against career scrimmage yards for the top 200 quarterbacks, 300 running backs, 300 wide receivers, and 100 tight ends in NFL history. There's nothing special about the cutoffs that I chose; they roughly equate to Brian Hoyer or Tyrod Taylor, Isaiah Crowell or Latavius Murray, Mohamed Sanu or Marvin Jones, and Jordan Reed or Charles Clay at tight end.
(A note: Jerry Rice is included in the calculations but not pictured. He's so far off to the right of the chart that he makes the rest of the data harder to read when he's included.)
Hopefully, the first thing that jumps out to you is just how much more tightly packed quarterbacks are than players at other positions. Running backs, wide receivers, and tight ends typically converge somewhere between 100 and 200 yards per touchdown, but that entire range remains in use all the way from the very left of the chart to the very right. Quarterbacks, on the other hand, converge into a much narrower band, almost to a single point by the time you get to the right side of the chart.
It's not uncommon for a running back or wide receiver to post a yard-to-touchdown ratio over 300 or under 90 over small samples. Meanwhile, no quarterback on that chart is over 243 yards, and the only guy under 100 yards per touchdown (Frankie Albert) had to cheat to get there; his yard-to-touchdown ratio in the NFL was 142, but he had a yard-to-touchdown ratio of 79 over four seasons in the rival AAFC.
Because the thing we'd predict to regress is more tightly packed, it doesn't tend to move as much even when it does regress, which doesn't leave us set up for those dramatic reversals we all love so much.
One other issue. I've included the trend lines on each of those charts. These lines indicate what happens to typical yard-to-touchdown ratios as total career yards go up. Notice that the wide receiver and tight end trend lines are relatively flat; there's very little relationship between how good a receiver is and how many yards he averages per touchdown. Some all-time greats rank quite low (Dez Bryant, Randy Moss, Terrell Owens). Some all-time greats rank quite high (Andre Johnson, Henry Ellard, Art Monk). Some all-time greats rank somewhere in between (Larry Fitzgerald, Isaac Bruce, Tim Brown).
The running back trend line slopes downward, which indicates that the better running backs tend to have lower ratios and the worse running backs tend to have higher ratios. This relationship is almost entirely explained by the prevalence of platoon systems where one back serves as the primary option between the 20s and the other serves as a goal-line and short-yardage back. If you raise the threshold of running backs you're looking at (say the top 200 all-time instead of the top 300), the trend line becomes completely flat.
The quarterback trend line, on the other hand, shows a noticeable and persistent downward slope. No matter how you slice the data, you find that the best quarterbacks tend to average fewer yards per touchdown than the worst quarterbacks. This compounds the difficulty factor because I want to put good quarterbacks into our Group A to make the prediction more noteworthy, but good quarterbacks are regressing to a completely different mean than their less-talented peers.
Problem #3: Their Statistics Are Less Volatile
In Week 3, when I talked about why I was opening the season with a prediction about yards per carry, I delved into the various bits of research on why yards per carry was such an unstable statistic (and therefore such a prime target for our purposes). My favorite piece came from Danny Tuccitto, who found that a running back would need 1978 carries on the same team in the same offense before yards per carry represented more skill than luck. You're talking about nearly a decade before we can feel confident that a player's yards per carry average is measuring something about that player himself.
Using the same methodology, Danny found that a quarterback only needs 396 pass attempts before his yards per attempt average represents more skill than luck. That's a bit more than half a season for most quarterbacks. We simply can't bet against a player's yards per attempt average like we can a player's yards per carry.
There is one quarterback statistic that is a phenomenal regression target, and it provided the one (out of three) quarterback prediction that Regression Alert has nailed to this point. That statistic is interception rate, which is extremely noisy from sample to sample, shows wide variations, and is an all-around prime regression candidate. I'll probably highlight it in greater detail sometime later this year. The problem is that from a fantasy football perspective interception rate doesn't move the needle a whole lot; many leagues don't give any penalties at all for interceptions, and in the leagues where interceptions are penalized, the penalty is typically fairly light, -1 or -2 points at worst.
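One common way to read a "more skill than luck" threshold is as the sample size at which an observed average is half signal and half noise. The sketch below assumes that reading and the standard shrinkage weight n / (n + k); it is not necessarily the exact calculation behind the 1978 and 396 figures, and the function name is made up for illustration.

```python
RB_STABILIZATION_CARRIES = 1978   # from the research cited above
QB_STABILIZATION_ATTEMPTS = 396   # from the research cited above


def skill_share(sample_size: int, stabilization_point: int) -> float:
    """Assumed share of an observed average attributable to skill.

    Uses the shrinkage weight n / (n + k), which equals 0.5 when the
    sample size n matches the stabilization point k.
    """
    return sample_size / (sample_size + stabilization_point)


# The same 250-attempt sample is far more informative for a quarterback.
print(round(skill_share(250, RB_STABILIZATION_CARRIES), 2))
print(round(skill_share(250, QB_STABILIZATION_ATTEMPTS), 2))
```

Under this reading, a few hundred carries tell you very little about a running back, while a few hundred pass attempts already tell you a fair amount about a quarterback.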
And now, Patrick Mahomes II
So I've given you all the reasons why betting on quarterback regression is hard and will naturally have a lower hit rate, and I've talked about how the greatest asset to a regression prediction is large samples. Now I'm going to throw that all to the wind.
Patrick Mahomes II is amazing. He regularly makes throws no one else would even try. He's a joy to watch. He is currently the #2 fantasy quarterback and his 11 passing touchdowns are also tied for second-most in the NFL. He's also probably going to regress. But probably not in the direction that you'd think.
Remember that chart above showing yard-to-touchdown ratios for quarterbacks? Remember how the best quarterbacks also tended to have the lowest ratios? Mahomes currently averages 166 passing yards per touchdown, which I guess is a fine ratio if your name is Andy Dalton, but this is Patrick Mahomes II we're talking about here. For his career, he averages 118 yards per touchdown, which is unsustainably low, but for a player this good I'd think 130-140 yards per touchdown would be a pretty reasonable range.
If Mahomes were averaging 140 yards per touchdown, he'd have 13 touchdowns so far this season, two more than the 11 he actually has. If he were averaging 130 yards per touchdown, he'd have thrown 14 scores to this point. Because I'd expect him to have 13-14 touchdowns and instead he only has 11, I think he's probably underperforming in the touchdown department. And like anyone who is performing outside of his "true" level, I'd expect regression to the mean to step in and nudge him in the right direction.
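The arithmetic behind those expected totals can be sketched as follows. The passing-yardage figure is an assumption: it is reconstructed from the rounded 166 yards per touchdown and 11 touchdowns quoted above, not taken from a box score.

```python
def expected_touchdowns(passing_yards: float, yards_per_td: float) -> float:
    """Touchdowns implied by a yardage total at a given yard-to-touchdown ratio."""
    return passing_yards / yards_per_td


actual_touchdowns = 11
current_ratio = 166  # yards per touchdown so far this season

# Assumption: season yardage rebuilt from the two rounded figures above.
passing_yards = actual_touchdowns * current_ratio

for ratio in (140, 130):
    expected = expected_touchdowns(passing_yards, ratio)
    shortfall = round(expected) - actual_touchdowns
    print(f"At {ratio} yards per TD: about {round(expected)} touchdowns "
          f"({shortfall} more than actual)")
```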
This means I'm predicting Patrick Mahomes II will score more touchdowns. He's averaging 2.2 touchdowns per game, which would work out to 8.8 touchdowns over the next four weeks. Instead, I'm going to predict that he averages at least 2.5 touchdowns per game, which would work out to 10 total touchdowns over the next four weeks if he stays healthy. (I'm going to count both passing and rushing touchdowns; I'm already out on a thin enough limb, I might as well tilt the odds a bit in my favor.)
Predicting with a sample size of one isn't exactly the safest thing to do if my goal is to pad my overall winning percentage, but our first two predictions have been going strong so why not take a chance and have some fun? I feel like most of the time talking about regression to the mean devolves into being a killjoy who says players who are doing cool things will soon stop doing cool things, or else speaking in broad generalities about huge classes of players. How many chances are we going to get to say that regression to the mean tells us one of the best players in the NFL is probably going to start doing even better?