Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is yards per target, and Antonio Brown is one of the high outliers in yards per target, then Antonio Brown goes into Group A and may the fantasy gods show mercy on my predictions.
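For readers who think better in code, the procedure above can be sketched roughly as follows. This is only an illustration: the player names, the numbers, the group size, and the helper names `split_groups` and `per_game_average` are all made up for the example and are not the actual data or tooling behind the column.

```python
def split_groups(players, metric, group_size):
    """Rank players by `metric`, highest first, and return (group_a, group_b).

    Group A is the top `group_size` players, Group B the bottom `group_size`.
    Assumes len(players) >= 2 * group_size so the groups do not overlap.
    """
    ranked = sorted(players, key=lambda p: p[metric], reverse=True)
    return ranked[:group_size], ranked[-group_size:]


def per_game_average(group, stat):
    """Total of `stat` across the group divided by total games played."""
    return sum(p[stat] for p in group) / sum(p["games"] for p in group)


# Hypothetical running backs after two weeks (invented numbers).
players = [
    {"name": "RB1", "ypc": 6.1, "rush_yards": 244, "games": 2},
    {"name": "RB2", "ypc": 5.4, "rush_yards": 216, "games": 2},
    {"name": "RB3", "ypc": 4.2, "rush_yards": 168, "games": 2},
    {"name": "RB4", "ypc": 3.9, "rush_yards": 156, "games": 2},
    {"name": "RB5", "ypc": 3.1, "rush_yards": 124, "games": 2},
    {"name": "RB6", "ypc": 2.8, "rush_yards": 112, "games": 2},
]

# The metric is the only choice made by hand; the groups follow from the ranking.
group_a, group_b = split_groups(players, "ypc", group_size=2)

# Step 1: verify Group A has outproduced Group B so far.
a_avg = per_game_average(group_a, "rush_yards")
b_avg = per_game_average(group_b, "rush_yards")
print(f"Group A: {a_avg:.1f} yards per game, Group B: {b_avg:.1f} yards per game")

# Step 2 (the prediction): rerun per_game_average on the games played AFTER
# this point and check whether Group B has pulled ahead.
```

The key design point is that group membership is decided entirely by the ranking, so there is no room to quietly leave an inconvenient player out.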
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of all my predictions from last year and how they fared.
THE SCORECARD
In Week 2, I laid out our guiding principles for Regression Alert. No specific prediction was made.
In Week 3, I discussed why yards per carry is the least useful statistic and predicted that the rushers with the lowest yards-per-carry average to that point would outrush the rushers with the highest yards-per-carry average going forward.
In Week 4, I explained why touchdowns follow yards, (but yards don't follow back), and predicted that the players with the fewest touchdowns per yard gained would outscore the players with the most touchdowns per yard gained going forward.
In Week 5, I talked about how preseason expectations still held as much predictive power as performance through four weeks. No specific prediction was made.
In Week 6, I looked at how much yards per target is influenced by a receiver's role, how some receivers' per-target averages deviated from what we'd expect according to their role, and predicted that the receivers with the fewest yards per target would gain more receiving yards than the receivers with the most yards per target going forward.
In Week 7, I demonstrated how randomness could reign over smaller samples, but regression dominates over larger ones. No specific prediction was made.
In Week 8, I discussed how even something like average career length could be largely determined by regression-prone fluctuations in incoming talent. No specific prediction was made.
In Week 9, I looked at running backs scoring touchdowns at an unsustainable rate and posited that even Todd Gurley must return to earth.
In Week 10, I delved into the purpose of Regression Alert and the proper takeaways. No specific prediction was made.
In Week 11, I explained an easy way to find statistics that were more prone to regression and picked on yards per carry one more time.
In Week 12, I went into the difference between regression to the mean, (the idea that production will probably improve or decline going forward), and the gambler's fallacy, (the idea that production is "due" to improve or decline going forward). No specific prediction was made.
In Week 13, I badmouthed interception rate for a bit and then predicted that the most interception-prone quarterbacks to that point would throw fewer picks than the least interception-prone quarterbacks going forward.
In Week 14, I delved into the various biases that permeate this column and how regression to the mean works even in less spectacular ways than the ones I choose to highlight here. No specific prediction was made.
In Week 15, I explained why regression was especially cruel in the fantasy playoffs. No specific prediction was made.
In Week 16, I lamented the rash of injuries that wrecked any hopes of putting together a large enough sample size. No specific prediction was made.
| Statistic For Regression | Performance Before Prediction | Performance Since Prediction | Weeks Remaining |
|---|---|---|---|
| Yards per Carry | Group A had 24% more rushing yards per game | Group B has 4% more rushing yards per game | SUCCESS! |
| Yards:Touchdown Ratio | Group A had 28% more fantasy points per game | Group B has 23% more fantasy points per game | SUCCESS! |
| Yards per Target | Group A had 16% more receiving yards per game | Group A has 13% more receiving yards per game | Failure |
| Yards:Touchdown Ratio | Group A had 26% more fantasy points per game | Group B has 4% more fantasy points per game | SUCCESS! |
| Yards per Carry | Group A had 9% more rushing yards per game | Group B has 23% more rushing yards per game | SUCCESS! |
| Total Interceptions | Group A had 83% as many total interceptions | Group B has 43% as many total interceptions | SUCCESS! |
Let's settle this once and for all: interception rate is not a thing. I mean, sure, for someone like Aaron Rodgers with 5,000 career pass attempts, we can say he probably has some skill at avoiding interceptions. But looking at even an 11-game sample in one season? It's just noise.
Let's drive that point home here. Through 12 weeks (11 games counting byes), the quarterbacks in Group A averaged 0.52 interceptions per game and the quarterbacks in Group B averaged 1.08 interceptions per game, more than double the rate. In the last four weeks, Group A quarterbacks averaged 0.74 interceptions per game and Group B quarterbacks averaged 0.49 interceptions per game. The "high-interception" group averaged fewer interceptions per game over the last four weeks than the low-interception group averaged over the first eleven games!
Only one quarterback from Group A didn't throw an interception in the past four weeks, and it was Derek Carr, the most interception-prone Group A quarterback over the first eleven games. Interceptions in any given span are driven almost entirely by noise.
The Tampa quarterbacks might be an even more dramatic illustration. Through twelve weeks, Ryan Fitzpatrick and Jameis Winston combined for 23 interceptions on 448 attempts, an interception rate of 5.1%. And over the past four weeks? Two interceptions on 141 attempts, an interception rate of 1.4%. A lot of people were writing off Jameis Winston as too interception-prone to be an NFL starter. But his interception rate over his first three years was just a hair worse than league average, so there was every reason to believe the horrid start to 2018 was just a fluke. Interception rate isn't really a thing.
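If you want to check the Tampa arithmetic yourself, or get a feel for how much a four-week interception rate can bounce around on luck alone, here is a minimal sketch. The first half uses only the attempt and interception totals quoted above. The second half is a toy simulation built on my own assumptions: a passer with a fixed 2.5% "true" rate (a number picked purely for illustration, not a measured league figure) and every attempt treated as an independent coin flip.

```python
import random


def interception_rate(interceptions, attempts):
    """Interceptions as a percentage of pass attempts."""
    return 100.0 * interceptions / attempts


# Totals quoted above for Fitzpatrick and Winston combined.
through_week_12 = interception_rate(23, 448)
last_four_weeks = interception_rate(2, 141)
print(f"Through Week 12: {through_week_12:.1f}%")
print(f"Last four weeks: {last_four_weeks:.1f}%")

# Illustration only: the true rate below is an assumption, not real data.
TRUE_RATE = 0.025
ATTEMPTS = 141  # same number of attempts as the four-week span above


def simulate_span(rng, attempts, true_rate):
    """Count interceptions in one span, one independent trial per attempt."""
    return sum(rng.random() < true_rate for _ in range(attempts))


rng = random.Random(0)  # fixed seed so reruns give the same numbers
simulated = [
    interception_rate(simulate_span(rng, ATTEMPTS, TRUE_RATE), ATTEMPTS)
    for _ in range(1000)
]
print(f"Simulated 141-attempt spans: {min(simulated):.1f}% to {max(simulated):.1f}%")
```

The passer in the simulation never gets better or worse at avoiding picks, yet his observed rate over a 141-attempt stretch still moves around from one stretch to the next. That spread is the noise.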
One Final Scorecard
If you've been reading all season, you might be sick of hearing me say that regression operates best over longer timescales. I make predictions for four weeks because it makes me accountable, but I like to look back after the year is over just to see how they did over the full season. So here's the scorecard again, except instead of just the four weeks after the prediction, here's the entire season after the prediction.
| Statistic For Regression | Performance Before Prediction | Performance Since Prediction | Weeks Remaining |
|---|---|---|---|
| Yards per Carry | Group A had 24% more rushing yards per game | Group A has 12% more rushing yards per game | Failure |
| Yards:Touchdown Ratio | Group A had 28% more fantasy points per game | Group B has 15% more fantasy points per game | SUCCESS! |
| Yards per Target | Group A had 16% more receiving yards per game | Group A has 7% more receiving yards per game | Failure |
| Yards:Touchdown Ratio | Group A had 26% more fantasy points per game | Group B has 7% more fantasy points per game | SUCCESS! |
| Yards per Carry | Group A had 9% more rushing yards per game | Group B has 15% more rushing yards per game | SUCCESS! |
| Total Interceptions | Group A had 83% as many total interceptions | Group B has 43% as many total interceptions | SUCCESS! |
As you can see, the outcome is largely the same over a full season as it was over four games, with the one big exception being our initial yards-per-carry prediction. This prediction actually shows the downside of longer timelines; T.J. Yeldon, Kenyan Drake, Jamaal Williams, and Carlos Hyde all saw their roles change dramatically from the first two weeks. Regression to the mean had no way of knowing that Williams was only starting in the first two weeks because Aaron Jones was suspended, or that T.J. Yeldon was only keeping the seat warm for Leonard Fournette. It didn't know Kenyan Drake would lose his starting job (and you can't even pin that on his YPC, as he posted a higher average than Frank Gore, the back who supplanted him). It certainly didn't know Carlos Hyde would be traded at midseason.
By measuring production per game, our outcomes are only mildly impacted by injuries. But they become especially susceptible to demotions; because all four of those backs stayed healthy, they added to the "games played" total for their respective groups. Because three of those four were in Group B, that group's per-game average was especially hurt. Remove those four backs from the comparison and Group B outrushed Group A again.
Also, measuring per-game helps insulate against injuries and suspensions to some extent, but they still play a role. Three of Group B's top five rushers missed at least three games, (James Conner, Marshawn Lynch, and Kareem Hunt). Group A largely avoided injuries, but it's worth noting that the players that missed time rushed for slightly fewer yards per game than the players that stayed healthy.
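To see why a per-game average shrugs off an injury but not a demotion, here is a small sketch. Every number in it is invented for illustration; none of it is the real Group A or Group B data, and `yards_per_game` is just a helper name I made up.

```python
def yards_per_game(players):
    """Group rushing yards divided by group games played."""
    total_yards = sum(p["yards"] for p in players)
    total_games = sum(p["games"] for p in players)
    return total_yards / total_games


# Hypothetical backs (made-up numbers).
starters = [
    {"name": "Starter 1", "yards": 900, "games": 12},
    {"name": "Starter 2", "yards": 840, "games": 12},
]
# Hurt halfway through: productive when active, and the missed games
# never enter the denominator.
injured = {"name": "Injured starter", "yards": 450, "games": 6}
# Healthy but benched: every game counts, with very little yardage to show for it.
demoted = {"name": "Demoted back", "yards": 240, "games": 12}

baseline = yards_per_game(starters)
with_injury = yards_per_game(starters + [injured])
with_demotion = yards_per_game(starters + [demoted])

print(f"Starters only:       {baseline:.1f} yards per game")
print(f"Plus injured back:   {with_injury:.1f} yards per game")
print(f"Plus demoted back:   {with_demotion:.1f} yards per game")
```

In this toy example the injured back barely moves the group average, while the demoted back drags it down sharply, because he keeps adding games without adding yards.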
Does all of this "excuse" the loss for the prediction? That's up to the reader, but I personally still count it as a loss. Trying to find justifications to explain away every loss is a slippery slope, and I never invest nearly as much effort into finding justifications to explain away the wins. I'd love a 100% success rate, but like I keep saying about larger samples, the goal is to maximize the number of predictions and hit far more of them than you miss; do that and you'll still turn a consistent (and easy!) profit in the long run.
Now for some interesting notes about the rest of the predictions.
- When I made my initial yards:TD ratio prediction in Week 4, Group A receivers scored one touchdown for every 68 yards and Group B receivers scored one touchdown for every 776 yards. Since then, Group A receivers scored a touchdown for every 137 yards and Group B receivers scored a touchdown for every 148 yards! Touchdowns follow yards.
- (Bonus fact: Julio Jones had 812 yards in his first eight weeks without reaching the end zone a single time. Since week 9, he has tied Antonio Brown and Davante Adams for the league lead in touchdown receptions.)
- My yards-per-target prediction was the only one that failed in the four-week window. Over the course of the full season, Group B closed the gap even more but still failed to draw ahead. I know what I said above about only searching for justifications for predictions that failed, but this failure could actually be explained entirely by Lamar Jackson. Since the prediction, Michael Crabtree and John Brown averaged 53.4 yards per game with Joe Flacco and 18.5(!!!) yards per game with Lamar Jackson. Using just their stats with Flacco, Group B would have had 1% more receiving yards per game than Group A.
- My second yard-to-touchdown ratio prediction specifically singled out Todd Gurley and made a note of the fact that he actually dragged Group A's average down during the prediction's 4-week run. Well, Gurley immediately returned to form once that prediction closed and actually finished the year as the most valuable member of Group A, as expected. But Group B managed to increase its lead even more because Christian McCaffrey, Ezekiel Elliott, and Saquon Barkley were so dominant down the stretch; they wound up being the top three fantasy backs since the time of the prediction.
- Yards per carry at the time of my second YPC prediction: 5.65 for Group A, 4.06 for Group B. Yards per carry since my second YPC prediction: 4.53 for Group A, 4.64 for Group B. Yards per carry isn't a thing!
- Since the interception prediction just closed this week, the results over the full season are naturally identical to the results over the last four weeks. See above for full thoughts.
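To put a number on how far those gaps collapsed, here is a quick worked calculation using only the figures from the notes above. The variable names are mine, and expressing each gap as a simple ratio is just one reasonable way to summarize it.

```python
# Yards needed per touchdown, first yards-to-touchdown prediction.
yards_per_td_before = {"A": 68, "B": 776}
yards_per_td_since = {"A": 137, "B": 148}

# Yards per carry, second YPC prediction.
ypc_before = {"A": 5.65, "B": 4.06}
ypc_since = {"A": 4.53, "B": 4.64}

# How many times more yards Group B needed per touchdown than Group A.
td_gap_before = yards_per_td_before["B"] / yards_per_td_before["A"]
td_gap_since = yards_per_td_since["B"] / yards_per_td_since["A"]

# Group A's yards per carry relative to Group B's (above 1.0 means A leads).
ypc_gap_before = ypc_before["A"] / ypc_before["B"]
ypc_gap_since = ypc_since["A"] / ypc_since["B"]

print(f"Yards per TD, B vs. A: {td_gap_before:.1f}x before, {td_gap_since:.2f}x since")
print(f"Yards per carry, A vs. B: {ypc_gap_before:.2f}x before, {ypc_gap_since:.2f}x since")
```

Group B went from needing more than eleven times as many yards per touchdown to needing about 8% more, and Group A's yards-per-carry edge of nearly 40% turned into a slight deficit.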
To everyone who stuck with me through the season, I appreciate all of your time and feedback. I hope Regression Alert proved useful and made you reconsider how you looked at fantasy production going forward, even if just a little bit.