Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
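For readers who like to see the procedure spelled out, here is a minimal sketch of that rank-and-split step. The `Player` class, the `split_groups` and `average_points` helpers, and every number in the sample data are hypothetical placeholders for illustration, not real players or the exact process behind the column.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    metric: float           # the stat being tested (hypothetical values below)
    points_per_game: float  # fantasy points per game so far


def split_groups(players, group_size):
    """Rank by the metric; the top slice is Group A, the bottom slice is Group B."""
    ranked = sorted(players, key=lambda p: p.metric, reverse=True)
    return ranked[:group_size], ranked[-group_size:]


def average_points(group):
    """Average fantasy points per game across a group."""
    return sum(p.points_per_game for p in group) / len(group)


# Made-up sample data, purely for illustration.
players = [
    Player("Player 1", 0.12, 14.0),
    Player("Player 2", 0.09, 12.5),
    Player("Player 3", 0.05, 10.0),
    Player("Player 4", 0.04, 11.0),
    Player("Player 5", 0.02, 9.0),
    Player("Player 6", 0.01, 8.5),
]

group_a, group_b = split_groups(players, group_size=2)

# The sanity check made before each prediction: Group A leads to date.
a_leads_so_far = average_points(group_a) > average_points(group_b)
print([p.name for p in group_a], [p.name for p in group_b], a_leads_so_far)
```

The point of the sketch is that the only choices are the metric and the group size; who lands in each group falls out of the sort.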
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is touchdown rate, and Christian McCaffrey is one of the high outliers in touchdown rate, then Christian McCaffrey goes into Group A and may the fantasy gods show mercy on my predictions.
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of my predictions from 2020 and their final results. Here's the same list from 2019 and their final results, here's the list from 2018, and here's the list from 2017. Over four seasons, I have made 30 specific predictions and 24 of them have proven correct, a hit rate of 80%.
In Week 2, I broke down what regression to the mean really is, what causes it, how we can benefit from it, and what the guiding philosophy of this column would be. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
|Statistic for regression||Performance before prediction||Performance since prediction||Weeks remaining|
|Yards per Carry||Group A had 10% more rushing yards per game||Group B has 11% more rushing yards per game||3|
When I made last week's prediction, Group B was averaging 27% more carries per game. Last week Group A closed that gap, which is a little bit concerning for our long-term chances. (On the other hand, one-week samples are often flukes.)
What's not concerning for our chances is that yards per carry really is pseudoscience. At the time of the prediction, our "high yards per carry" sample averaged 5.40 yards per carry while our "low yards per carry" sample averaged 3.87. Last week, our "high yards per carry" RBs averaged 3.82 ypc, or less than our "low ypc" group had in the first place! Meanwhile, those "low ypc" backs averaged 4.97, not quite as much as our "high ypc" backs initially had but not far behind, either.
Was this driven by a single long run or bad performance? It was not. Five out of the ten "high yards per carry" backs averaged 3.4 ypc or less in Week 3. Only three of them topped 5.0 ypc, and one of those did it on just two carries. Meanwhile, only one "low ypc" back was below 3.4 yards per carry, and half of them were over 5.0.
You could talk about how unreliable statistics are over small samples, but the Group A backs totaled 141 carries last week, which is half a season's worth of work for even a good running back. If yards per carry can swing this much over a 140-carry sample, how can we ever be confident that anyone's yards per carry average is a meaningful reflection of their innate talent and not just random noise? (The answer is that we cannot, which is why I keep making predictions that it will regress.)
PLAYING THE HITS
If you go see Lynyrd Skynyrd live, you know they're playing Sweet Home Alabama and Free Bird. The Stones are going to play (I Can't Get No) Satisfaction. KISS is going to play Rock and Roll All Nite and Detroit Rock City, and of course, Ozzy is eventually going to get around to Crazy Train.
Similarly, Regression Alert loves delving into the back catalog for obscure stats and deep cuts from time to time, but we know where our bread is buttered and we aren't shy about serving up the hits, either. Last week we played our old classic "Yards Per Carry is Pseudoscience". This week we have our seminal work "Touchdowns Follow Yards (But Yards Don't Follow Back)". Next week we're going to really drive the crowd nuts with our smash "Revisiting Preseason Expectations". But that's getting ahead of ourselves.
First, let's talk about touchdowns. Actually, before we talk about touchdowns, let's talk about vocabulary.
Stochastic (adjective): randomly determined; having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely.
Touchdowns are stochastic. Over his career, Cam Newton rushed for 70 touchdowns in 140 games, an average of 0.5 touchdowns per game. We could say that's his "true production level", and over a sufficiently long timeline, we'd probably expect him to conform to that, averaging 0.5 touchdowns per game.
Despite that being his true production level, though, guess how many times Cam Newton rushed for half a touchdown in a game? As far as I can tell (and I have researched this topic extensively), it has never happened. Instead, he either scores zero touchdowns... or he scores one touchdown. (Sometimes he scores two touchdowns, and once he even rushed for three touchdowns.) Because they are discrete, random outcomes, we can analyze Cam Newton's rushing touchdowns statistically, but we cannot predict them precisely.
Yards don't really behave like that. Over his career, Cam Newton averaged 38.6 rushing yards per game. But it's not like every week he's either getting you 0 yards or else he's getting you 75 yards. Instead, more games than not he's getting you somewhere between 20 and 60 yards. His yardage total is much more consistent from game to game than his touchdown total.
One way to measure consistency is something called standard deviation, which measures how much something varies around the average. The standard deviation of Newton's rushing yardage is 24.5 yards. The standard deviation of Newton's rushing touchdowns is 0.65 touchdowns.
Now, these numbers are not directly comparable. Standard deviations for large values are naturally bigger than standard deviations for small values. (Consider: if you switched to "feet rushing per game" rather than "yards rushing per game", the standard deviation would triple despite the underlying game-to-game variation remaining unchanged.)
But if you divide a player's standard deviation by that player's average, you get something called the coefficient of variation, or CV. CV is a way to compare how volatile different statistics are. The CV of Newton's yards is 64%, meaning it tends to vary by about 64% of his overall average. The CV of Newton's touchdowns is 130%. Touchdowns are much more random from week to week than yards are; in Newton's case, about twice as random according to CV. (For those curious, the CV of Newton's rush attempts was 42%; "usage" stats like attempts tend to be more stable from week to week even than yards.)
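The arithmetic here is simple enough to check yourself. The sketch below divides the rounded figures quoted above and lands at roughly 63% for yards, in line with the 64% cited (which was presumably computed from unrounded numbers), and at 130% for touchdowns. The five-game log in the second half is invented purely to show the unit-conversion point: switching from yards to feet triples the standard deviation but leaves the CV alone.

```python
from statistics import mean, pstdev


def coefficient_of_variation(sd: float, average: float) -> float:
    """CV is the standard deviation expressed as a fraction of the average."""
    return sd / average


# Rounded figures quoted in the text for Cam Newton.
yards_cv = coefficient_of_variation(24.5, 38.6)  # roughly 0.63
td_cv = coefficient_of_variation(0.65, 0.5)      # 1.30
print(f"yards CV: {yards_cv:.1%}, touchdown CV: {td_cv:.1%}")

# Hypothetical game log (NOT Newton's real games) to show the unit effect.
yards_by_game = [12, 45, 30, 61, 25]
feet_by_game = [3 * y for y in yards_by_game]

sd_yards = pstdev(yards_by_game)
sd_feet = pstdev(feet_by_game)  # three times sd_yards

cv_yards = coefficient_of_variation(sd_yards, mean(yards_by_game))
cv_feet = coefficient_of_variation(sd_feet, mean(feet_by_game))  # same as cv_yards
print(f"SD ratio (feet/yards): {sd_feet / sd_yards:.1f}")
print(f"CV in yards: {cv_yards:.3f}, CV in feet: {cv_feet:.3f}")
```

I used the population standard deviation (`pstdev`) here; the sample version (`stdev`) would shift the numbers slightly but not the conclusion.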
Not only are they more unstable, but touchdowns are also much more valuable than yards. In most scoring systems, one extra touchdown is worth the equivalent of 60 extra yards. Which means if Newton caught the high side of variance and scored a few extra touchdowns early in the year, it could dramatically inflate his fantasy production to date. And if he caught the low side of variance and failed to reach the end zone, it could leave him far lower than we'd otherwise expect.
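To put rough numbers on "the high side of variance", here is a small sketch. It leans on two assumptions that are mine rather than anything established above: that weekly touchdowns behave roughly like a Poisson process, and that scoring is 6 points per touchdown and 1 point per 10 yards (which is what makes one touchdown equal 60 yards). The 0.5 touchdowns and 38.6 yards per game are Newton's career averages from earlier.

```python
import math

TD_POINTS = 6.0        # assumption: 6 fantasy points per touchdown
POINTS_PER_YARD = 0.1  # assumption: 1 point per 10 yards, so 1 TD == 60 yards


def poisson_pmf(k: int, expected: float) -> float:
    """Chance of exactly k events when `expected` events are expected (Poisson assumption)."""
    return math.exp(-expected) * expected ** k / math.factorial(k)


def fantasy_points(yards: float, touchdowns: float) -> float:
    return yards * POINTS_PER_YARD + touchdowns * TD_POINTS


games = 3
expected_tds = 0.5 * games     # 0.5 rushing touchdowns per game
expected_yards = 38.6 * games  # 38.6 rushing yards per game

# How often does a three-game stretch come up empty, or double the expected rate?
p_zero = poisson_pmf(0, expected_tds)
p_three_plus = 1.0 - sum(poisson_pmf(k, expected_tds) for k in range(3))

# Same yardage, very different fantasy lines depending on touchdown luck.
cold_ppg = fantasy_points(expected_yards, 0) / games
hot_ppg = fantasy_points(expected_yards, 3) / games

print(f"P(0 TDs in {games} games):  {p_zero:.1%}")
print(f"P(3+ TDs in {games} games): {p_three_plus:.1%}")
print(f"points per game: cold {cold_ppg:.2f}, hot {hot_ppg:.2f}")
```

Under those assumptions, the same player with the same yardage can look like two completely different fantasy assets after three weeks, and both outcomes are fairly common.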
Which gives rise to my favorite statistic for regression: yard-to-touchdown ratios. Some players are really, really good at getting yards and/or not quite as good at scoring touchdowns. For years, Julio Jones has been the most famous example of this; he has gained 218 receiving yards in his career for every touchdown he has scored. This is a very high average, but there are other wide receivers in this general range; Andre Johnson averaged 203 yards for every touchdown, Henry Ellard averaged 212, etc.
Other players are really, really good at getting touchdowns but typically aren't commensurately good at getting yards. For his career, Davante Adams scores a touchdown for every 104 yards he gains receiving. Again, this is a very low average, but not historically implausible; Dez Bryant averaged 102 yards for every touchdown, while Randy Moss was all the way down at 98 yards per touchdown.
Importantly: the yard-to-touchdown ratio is not a measure of player quality. Davante Adams has twice scored 10 or more touchdowns with 1,000 or fewer yards. All else being equal, a guy who gains 1,500 yards and scores 10 touchdowns is better than a guy who gains 1,000 yards and scores 10 touchdowns, even if the latter guy has a "better" yard-to-touchdown ratio. If you asked who was the best receiver in the NFL at various points over the last five years, you might plausibly have heard Jones (218 yards per touchdown), Michael Thomas (186 yards per touchdown), DeAndre Hopkins (161 yards per touchdown), Antonio Brown (148 yards per touchdown), Odell Beckham Jr (135 yards per touchdown), Tyreek Hill (118 yards per touchdown), or Adams (104 yards per touchdown). (Similarly, I could easily find mediocre or even bad receivers who span the whole yard-to-touchdown spectrum; Devin Funchess averages 108 yards per touchdown, but he's no Davante Adams.)
With that in mind, over the long term, receivers tend to average between 100 and 200 yards per touchdown, with the majority of the league clustered between 120 and 180. Any rate that falls in that range is plausibly sustainable and perhaps a true representation of a player's relative skill at scoring touchdowns. But because touchdowns are stochastic, in the short run we see yard-to-touchdown ratios that are wildly outside of that "sustainable" zone. And because touchdowns count for so many points in fantasy football, this gives us a ton of targets for regression.
So let's pit the receivers with a lot of yards but very few touchdowns against the receivers with a lot of touchdowns but very few yards and see what happens. There are eleven receivers in the NFL right now who have 200 or fewer yards and 2 or more touchdowns (guaranteeing a yard-to-touchdown ratio of 100 or lower). Similarly, there are seventeen receivers in the NFL right now who have 201 or more yards and 1 or fewer touchdowns (resulting in a yard-to-touchdown ratio of 200 or higher). Here's the full list:
|Player||Receiving yards||Touchdowns||Yards per touchdown||Fantasy points|
|D.J. Chark Jr||154||2||77||27.4|
|Marvin Jones Jr||194||2||97||31.4|
|Henry Ruggs III||237||1||237||30.6|
|Michael Pittman Jr||220||0||undefined||22.5|
Zach Pascal, Quintez Cephus, Adam Thielen, DeAndre Hopkins, Tee Higgins, Corey Davis, D.J. Chark Jr, Tim Patrick, Amari Cooper, Marvin Jones Jr, and Emmanuel Sanders have all scored at least one touchdown per 100 receiving yards. Collectively, they average one touchdown for every 65.5 yards and 10.2 (non-PPR) fantasy points per game. This is our Group A.
On the other hand, Chase Claypool, Bryan Edwards, Julio Jones, Michael Pittman Jr, Courtland Sutton, Sammy Watkins, Hunter Renfrow, Sterling Shepard, Terry McLaurin, Henry Ruggs III, CeeDee Lamb, Keenan Allen, Tyreek Hill, D.J. Moore, Davante Adams, Brandin Cooks, and Deebo Samuel have all scored fewer than 1 touchdown per 200 receiving yards. Collectively, they average 380 receiving yards per touchdown and 9.7 fantasy points per game.
Now ordinarily I'd name this our Group B and call it a day. One of the key conceits of Regression Alert is that I don't pick the sample of players. But I can't help but notice Davante Adams and Tyreek Hill in that second sample. As I mentioned above, Adams and Hill are two of the best touchdown scorers in the NFL today, both averaging fewer than 120 yards per touchdown for their careers. Further, they're probably the top two fantasy receivers in the game. Despite both missing time, they've each outscored every other receiver since the start of 2019 by at least 25 points. They were on average the first two receivers off the board in fantasy drafts this offseason.
And while the first rule of Regression Alert is "I don't pick my sample", the second rule is "Make it as impressive as possible". Predicting that Davante Adams and Tyreek Hill are going to be productive going forward is hardly an impressive call, so let's up the degree of difficulty and remove both players from the second sample.
The remaining 15 players have averaged 401 receiving yards per touchdown and 9.4 fantasy points per game. This is going to be our Group B.
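The selection rule is mechanical enough to write down. Here is a minimal sketch that applies the two thresholds, plus the manual exclusion of Adams and Hill, to four of the players listed above, using their yardage and touchdown totals from the table. The function names and the `EXCLUDED` set are my own labels for illustration.

```python
from typing import Optional

# Pulled from Group B by hand to raise the degree of difficulty.
EXCLUDED = {"Davante Adams", "Tyreek Hill"}


def yards_per_touchdown(yards: int, touchdowns: int) -> Optional[float]:
    """Yard-to-touchdown ratio; undefined (None) for a player with no touchdowns."""
    return yards / touchdowns if touchdowns > 0 else None


def assign_group(name: str, yards: int, touchdowns: int) -> Optional[str]:
    """Apply the thresholds: A = few yards, many TDs; B = many yards, few TDs."""
    if name in EXCLUDED:
        return None
    if yards <= 200 and touchdowns >= 2:
        return "A"
    if yards >= 201 and touchdowns <= 1:
        return "B"
    return None  # everyone in between sits this prediction out


# (name, receiving yards, touchdowns) for four players from the table.
sample = [
    ("D.J. Chark Jr", 154, 2),
    ("Marvin Jones Jr", 194, 2),
    ("Henry Ruggs III", 237, 1),
    ("Michael Pittman Jr", 220, 0),
]

groups = {name: assign_group(name, yards, tds) for name, yards, tds in sample}
for name, yards, tds in sample:
    print(name, groups[name], yards_per_touchdown(yards, tds))
```

Note that a player with zero touchdowns has no defined ratio at all, which is why the rule is written in terms of raw yards and touchdowns rather than the ratio itself.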
Through three weeks, Group A has outscored our handicapped Group B by 9%, but they've done so on the back of unsustainable touchdown production. Going forward, Group B should retain its total volume edge while Group A loses its touchdown advantage, and I predict that through the magic of regression Group B will score more fantasy points per game than Group A over the next four weeks. Be sure to check back in after Week 7 to find out.