Does Corsi change with age? It's a good question, but trying to answer it runs into several problems.

First, a player's Corsi varies a lot from season to season.

Second, there is a lot of "censoring", meaning there is data out there that we haven't captured. By convention, the arrow of time flies from left to right. The data is "right censored" in that most players in the database will play next season and in seasons after that. Corsi data only starts in 2007, so it is also "left censored": a large number of players in the database played before 2007, but we have no way to know what their results were.

Finally, the data has cohort issues. Players who were 18 in 2007 were only 24 in 2013, so they have no age overlap with players who were 25 or older in 2007. Comparing players at age 20 to players at age 28 may be apples and oranges.
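To make the cohort problem concrete, here is a small R sketch of the age window the database can see for players of different ages in 2007. The 2007 and 2013 endpoints come from the text; the variable names and the three example ages are only for illustration.

```r
# Seasons covered by the Corsi data, as described in the text.
first_season <- 2007
last_season  <- 2013

# Hypothetical example players, identified only by their age in 2007.
age_in_2007 <- c(18, 25, 30)

# Each player can only be observed between these two ages.
window <- data.frame(
  age_in_2007   = age_in_2007,
  youngest_seen = age_in_2007,
  oldest_seen   = age_in_2007 + (last_season - first_season)
)
print(window)

# The 18-year-old is last seen at 24, before the 25-year-old is first seen,
# so the two players are never observed at the same age.
window$oldest_seen[1] < window$youngest_seen[2]
```

Any comparison between age 20 and age 28 is therefore partly a comparison between different groups of players, not the same players at two ages.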

**Why it might change with age**

Young players have to learn the ropes. They have to fill out their lanky frames, grow into their bodies, and pay their dues. Older players lose a step. I'm sure there are other adages I'm forgetting. You might think the curve of Corsi versus Age looks like:

**Why it might not**

That all sounds good, but the NHL is a cut-throat business. Teams generally don't have the leeway to let players learn on the job or fade away gracefully. Even if the Corsi versus Age curve truly looks like figure 1, the NHL part of it is probably the broad flat top. We don't get to see the ends because they take place in other leagues.

**Models**

In all these analyses, I'm looking at Corsi as a rate, where the average is 0. First, all players, unweighted:

    > LinearModel.1 = lm(CORSION ~ Age, data=CorsiAges)
    > summary(LinearModel.1)

    Call:
    lm(formula = CORSION ~ Age, data = CorsiAges)

    Residuals:
         Min       1Q   Median       3Q      Max
    -143.443   -6.295    0.828    7.370  101.517

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -2.36217    1.01118  -2.336   0.0195 *
    Age          0.01980    0.03725   0.532   0.5950

Next, all players, but weighted by minutes played:

    > LinearModel.2 = lm(CORSION ~ Age, data=CorsiAges, weights=Minutes)
    > summary(LinearModel.2)

    Call:
    lm(formula = CORSION ~ Age, data = CorsiAges, weights = Minutes)

    Weighted Residuals:
        Min      1Q  Median      3Q     Max
    -824.62 -156.34  -22.65  113.37  967.59

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  2.34161    0.69307   3.379 0.000733 ***
    Age         -0.07941    0.02493  -3.185 0.001455 **

So maybe there is a small decline. If so, the fitted line gives a 20-year-old an expected Corsi of 0.75 and a 40-year-old an expected Corsi of -0.83. Even if real, that's not enough to worry about, and the apples-to-oranges issue still applies.
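The two expected values can be reproduced directly from the weighted fit's coefficients. This is just the arithmetic of the fitted line, with the intercept and slope copied from the `LinearModel.2` output above; the variable names are mine.

```r
# Coefficients copied from the weighted fit (LinearModel.2).
intercept <- 2.34161
slope     <- -0.07941

# Expected Corsi = intercept + slope * Age
ages <- c(20, 40)
expected_corsi <- intercept + slope * ages
names(expected_corsi) <- ages

round(expected_corsi, 2)  # 0.75 at age 20, -0.83 at age 40
```

With the fitted model object still in the session, `predict(LinearModel.2, newdata = data.frame(Age = c(20, 40)))` should give the same numbers.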

Next, nesting age within each player and weighting by minutes:

    > LinearModel.13 = lm(CORSION ~ NAME/Age, data=CorsiAges, weights=Minutes)
    > anova(LinearModel.13)

    Analysis of Variance Table

    Response: CORSION
                Df    Sum Sq Mean Sq F value    Pr(>F)
    NAME      1549 173078767  111736  4.5021 < 2.2e-16 ***
    NAME:Age  1256  49032978   39039  1.5730 < 2.2e-16 ***
    Residuals 3150  78178478   24819

The NAME:Age term looks significant off the charts, but this is an overparameterized model: it fits a separate intercept and a separate age slope for every player. Looking at the NAME term, the top 5 and bottom 5 players are not exactly a Who's Who of Corsi.

| Variable | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| TREVORGILLIES | 7.031e+02 | 1.368e+03 | 0.514 | 0.607313 |
| MIKEIGGULDEN | 1.067e+03 | 2.093e+03 | 0.51 | 0.610228 |
| KYLEGREENTREE | 8.946e+02 | 1.793e+03 | 0.499 | 0.617921 |
| DARRENMCCARTY | 8.703e+02 | 1.803e+03 | 0.483 | 0.62943 |
| IVANVISHNEVSKIY | 6.704e+02 | 1.442e+03 | 0.465 | 0.641978 |
| MIKKOLEHTONEN | -2.346e+03 | 2.427e+03 | -0.967 | 0.3337 |
| RICKARDRAKELL | -1.313e+03 | 1.317e+03 | -0.997 | 0.31908 |
| BARRYTALLACKSON | -1.847e+03 | 1.760e+03 | -1.05 | 0.293805 |
| JONMATSUMOTO | -2.779e+03 | 2.133e+03 | -1.303 | 0.192674 |
| JAREDROSS | -2.054e+03 | 1.554e+03 | -1.322 | 0.186429 |
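The "overparameterized" point can be checked with some bookkeeping on the degrees of freedom in the ANOVA table above. The three Df values are copied from that table. Reading each NAME:Age degree of freedom as one player with an estimable slope is my interpretation of how `lm` handles the nested term, so treat the last figure as a rough estimate.

```r
# Degrees of freedom copied from the anova(LinearModel.13) table.
df_name     <- 1549
df_name_age <- 1256
df_residual <- 3150

n_obs     <- df_name + df_name_age + df_residual + 1  # 5956 player-seasons
n_params  <- df_name + df_name_age + 1                # 2806 fitted coefficients
n_players <- df_name + 1                              # 1550 players

# Roughly two observations per fitted coefficient.
obs_per_param <- n_obs / n_params
round(obs_per_param, 2)

# Players with no estimable age slope (assumed: only one season in the data).
n_players - df_name_age  # 294
```

With only about two player-seasons per coefficient, huge standard errors like the ones in the NAME table are what you would expect.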

The top 5 and bottom 5 on the NAME:Age term look more familiar.

| Variable | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| ANZEKOPITAR:Age | 4.679e+00 | 8.827e-01 | 5.301 | 1.23e-07 |
| DUSTINBROWN:Age | 3.864e+00 | 9.345e-01 | 4.135 | 3.64e-05 |
| RYANOREILLY:Age | 1.037e+01 | 2.594e+00 | 3.999 | 6.52e-05 |
| PATRICEBERGERON:Age | 4.482e+00 | 1.199e+00 | 3.738 | 0.000189 |
| EVGENIMALKIN:Age | 3.539e+00 | 9.584e-01 | 3.693 | 0.000226 |
| MANNYMALHOTRA:Age | -4.798e+00 | 1.156e+00 | -4.15 | 3.41e-05 |
| NIKOLAIKULEMIN:Age | -5.760e+00 | 1.264e+00 | -4.555 | 5.42e-06 |
| ANDREASLILJA:Age | -7.884e+00 | 1.690e+00 | -4.666 | 3.20e-06 |
| ALEXOVECHKIN:Age | -4.353e+00 | 8.701e-01 | -5.003 | 5.96e-07 |
| DIONPHANEUF:Age | -5.222e+00 | 8.221e-01 | -6.351 | 2.44e-10 |

Just looking at Kopitar and Phaneuf, yes, the trends are there. But their Corsi Rel moves much less than their Corsi On, which suggests that some of this is being driven by changes in the quality of their teams.

| Year | NAME | CORSIREL | CORSION | CORSIOFF | Minutes | Birth | Age |
|---|---|---|---|---|---|---|---|
| 2007 | ANZEKOPITAR | 5.5 | -4.99 | -10.45 | 1225.9 | 1987 | 20 |
| 2008 | ANZEKOPITAR | 12.3 | 9.5 | -2.77 | 1175.06 | 1987 | 21 |
| 2009 | ANZEKOPITAR | 11.2 | 8.94 | -2.24 | 1267.72 | 1987 | 22 |
| 2010 | ANZEKOPITAR | 8.3 | 8.68 | 0.4 | 1140.75 | 1987 | 23 |
| 2011 | ANZEKOPITAR | 13.8 | 19.35 | 5.5 | 1225.08 | 1987 | 24 |
| 2012 | ANZEKOPITAR | 18.4 | 25.43 | 7.06 | 726.62 | 1987 | 25 |
| 2013 | ANZEKOPITAR | 15.4 | 25.24 | 9.88 | 1205.4 | 1987 | 26 |
| 2007 | DIONPHANEUF | 8 | 9.25 | 1.22 | 1446.48 | 1985 | 22 |
| 2008 | DIONPHANEUF | -1.6 | 10.45 | 12.01 | 1395.2 | 1985 | 23 |
| 2009 | DIONPHANEUF | 5.6 | 6.36 | 0.74 | 1415.07 | 1985 | 24 |
| 2010 | DIONPHANEUF | -1.5 | -6.2 | -4.73 | 1228.92 | 1985 | 25 |
| 2011 | DIONPHANEUF | 3.3 | -0.36 | -3.66 | 1521.1 | 1985 | 26 |
| 2012 | DIONPHANEUF | -7.3 | -18.16 | -10.85 | 859.2 | 1985 | 27 |
| 2013 | DIONPHANEUF | -5.8 | -20.04 | -14.27 | 1326.4 | 1985 | 28 |

About as many players go up with age as go down with age. The pattern doesn't look any different in players under 30 versus players over 30. Another year or two of data will help sort this out. I doubt Kopitar keeps going up, up, up or Phaneuf down, down, down.

Finally, looking at age cohorts.

For this part, I limited the analysis to players who were in the league in 2007-08 and separated them by their age in 2007 into cohorts: 18-20, 21-25, 26-30, 31-35, and 35+. I then ran the analysis separately for each group. For the three youngest cohorts, there is no age effect (weighted or unweighted).

    > LinearModel.1 = lm(CORSION ~ Age, data=Cohort1820)
    > summary(LinearModel.1)

    Call:
    lm(formula = CORSION ~ Age, data = Cohort1820)

    Residuals:
        Min      1Q  Median      3Q     Max
    -54.865  -6.236   0.856   6.776  41.859

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  -7.6241     6.6880  -1.140    0.255
    Age           0.3683     0.2961   1.244    0.214

    Residual standard error: 10.87 on 320 degrees of freedom
    Multiple R-squared: 0.004812, Adjusted R-squared: 0.001703
    F-statistic: 1.547 on 1 and 320 DF, p-value: 0.2144

    > LinearModel.3 = lm(CORSION ~ Age, data=Cohort2125)
    > summary(LinearModel.3)

    Call:
    lm(formula = CORSION ~ Age, data = Cohort2125)

    Residuals:
         Min       1Q   Median       3Q      Max
    -143.678   -6.203    1.166    7.219   87.351

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.21332    3.51140   0.061    0.952
    Age         -0.07019    0.13613  -0.516    0.606

    Residual standard error: 13.25 on 1718 degrees of freedom
    Multiple R-squared: 0.0001547, Adjusted R-squared: -0.0004273
    F-statistic: 0.2658 on 1 and 1718 DF, p-value: 0.6062

    > LinearModel.5 = lm(CORSION ~ Age, data=Cohort2630)
    > summary(LinearModel.5)

    Call:
    lm(formula = CORSION ~ Age, data = Cohort2630)

    Residuals:
        Min      1Q  Median      3Q     Max
    -79.126  -6.011   0.341   6.581  36.899

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   3.0374     4.2088   0.722    0.471
    Age          -0.1349     0.1388  -0.972    0.331

    Residual standard error: 11.16 on 1178 degrees of freedom
    Multiple R-squared: 0.0008006, Adjusted R-squared: -4.763e-05
    F-statistic: 0.9438 on 1 and 1178 DF, p-value: 0.3315
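For anyone wanting to reproduce the cohort split, here is a rough sketch of one way to build it in R. It runs on a tiny made-up data frame (`toy`), not the real `CorsiAges` data, and assumes the column names `NAME`, `Year`, and `Age` from the table above. Putting a player who was exactly 35 in 2007 into the 31-35 group is also an assumption, since the text lists both "31-35" and "35+".

```r
# Made-up stand-in for CorsiAges; players A-D are hypothetical.
toy <- data.frame(
  NAME = c("A", "A", "B", "B", "C", "D"),
  Year = c(2007, 2008, 2007, 2008, 2008, 2007),
  Age  = c(19, 20, 33, 34, 22, 36)
)

# Age in 2007 for players who appeared that season.
base <- toy[toy$Year == 2007, c("NAME", "Age")]
names(base)[2] <- "Age2007"

# The inner join drops players with no 2007 season (player C here).
toy <- merge(toy, base, by = "NAME")

# cut() uses right-closed bins by default, so 35 falls in "31-35" (assumed).
toy$Cohort <- cut(toy$Age2007,
                  breaks = c(17, 20, 25, 30, 35, Inf),
                  labels = c("18-20", "21-25", "26-30", "31-35", "35+"))

# One data frame per cohort, analogous to Cohort1820, Cohort2125, and so on.
cohorts <- split(toy, toy$Cohort)
sapply(cohorts, nrow)
```

On the real data, each element of `cohorts` could then be passed to `lm(CORSION ~ Age, data = ...)` the same way the cohort data frames are used here.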

In the two older cohorts, there seems to be an age effect (weighted or unweighted). Interestingly, Corsi goes up as these players get older.

    > LinearModel.7 = lm(CORSION ~ Age, data=Cohort3135)
    > summary(LinearModel.7)

    Call:
    lm(formula = CORSION ~ Age, data = Cohort3135)

    Residuals:
        Min      1Q  Median      3Q     Max
    -34.790  -6.116   0.241   5.764  32.481

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -19.7605     6.7889  -2.911  0.00375 **
    Age           0.5191     0.1969   2.636  0.00863 **

    Residual standard error: 9.555 on 552 degrees of freedom
    Multiple R-squared: 0.01243, Adjusted R-squared: 0.01064
    F-statistic: 6.948 on 1 and 552 DF, p-value: 0.008628

    > LinearModel.9 = lm(CORSION ~ Age, data=Cohort35up)
    > summary(LinearModel.9)

    Call:
    lm(formula = CORSION ~ Age, data = Cohort35up)

    Residuals:
        Min      1Q  Median      3Q     Max
    -38.249  -7.743   0.234   6.841  34.104

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -54.4234    19.7084  -2.761  0.00675 **
    Age           1.3990     0.5139   2.722  0.00755 **

    Residual standard error: 10.84 on 109 degrees of freedom
    Multiple R-squared: 0.06366, Adjusted R-squared: 0.05507
    F-statistic: 7.41 on 1 and 109 DF, p-value: 0.007552

I think some of this is selection bias. Some of this is Chris Chelios skewing the data. Looking at the scatter plot tends to confirm this. (It also tends to highlight what an outlier Chris Chelios was. Those 3 dots out at 45, 46, and 47 are him.)

Eric T looked at this a couple of months ago and said "The average player peaks at a bit over 51 percent Corsi, which is something like 60th percentile among regulars. By age 34 or 35, he's dropped to around 47 percent, which would be about 20th percentile." Here's his figure for this. I've made one little change.

Obviously, I'm very skeptical about this. But let's suppose that somehow he has managed to stumble upon The Truth and the average player really does peak at 51% and gradually drop off to 47%. The line I drew in is the lower limit of the 95% confidence interval for a full season of Corsi. The upper limit didn't fit on his figure, but it is up at 57%, which is roughly where it says "Eric T looked at this". So a "change" from 51% to 47% is a lot less than the magnitude of the randomness. It's not significant. So if a team is considering acquiring a new player and has three options, a 29-year-old, a 31-year-old, and a 33-year-old, I would not choose among them based on a concern that Corsi would change as they age.