
Does Corsi Change With Age?

Maybe. Maybe not. We really don't have enough data for a clear-cut answer.

Elsa

Does Corsi change with age? It's a good question, and trying to answer it runs into several problems. First, a player's Corsi varies a lot from season to season. Second, there is a lot of "censoring", which means relevant data exist that we haven't captured. By convention, the arrow of time flies from left to right. The data are "right censored" because most players in the database will play next season and in seasons after that. They are also "left censored" because Corsi data only start in 2007: a large number of players in the database played before 2007, and we have no way to know what their results were. Finally, the data have cohort issues. Players who were 18 in 2007 were only 24 in 2013, so they have no overlap with players who were 25 or older in 2007. Comparing players at age 20 to players at age 28 may be comparing apples and oranges.
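A minimal sketch of the cohort problem, assuming the data run from the 2007 season through 2013: list the ages at which a player can appear, given his age in 2007. The function name is hypothetical.

```r
# Ages at which a player can be observed, given his age in 2007
# (assumes seasons 2007 through 2013, as described above)
observable_ages <- function(age_in_2007, seasons = 2007:2013) {
  age_in_2007 + (seasons - 2007)
}

young   <- observable_ages(18)   # 18 ... 24
veteran <- observable_ages(25)   # 25 ... 31
intersect(young, veteran)        # empty: the two groups share no ages
```

Every age at which the 25-year-old cohort is observed is older than any age the 18-year-old cohort reaches, so a comparison of young ages with older ages is also a comparison between different groups of players.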

Why it might change with age

Young players have to learn the ropes. They have to fill out their lanky frames, grow into their bodies, and pay their dues. Older players lose a step. I'm sure there are other adages I'm forgetting. You might think the curve of Corsi versus Age looks like:


[Figure 1: hypothetical curve of Corsi versus Age]


Why it might not

That all sounds good, but the NHL is a cut-throat business. Teams generally don't have the leeway to let players learn on the job or fade away gracefully. Even if the Corsi versus Age curve truly looks like figure 1, the NHL part of it is probably the broad flat top. We don't get to see the ends because they take place in other leagues.

[Figure: the hypothetical curve of Corsi versus Age, with the NHL seeing only the broad flat top]
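One common way to look for the hump in figure 1 would be to add a squared age term. This is not one of the models run in this post; it is a minimal sketch, on made-up data with a known peak, of how the peak age falls out of the fitted coefficients (the vertex of b0 + b1·Age + b2·Age² is at Age = -b1 / (2·b2)). The helper name is hypothetical.

```r
# Hypothetical helper (not part of the original analysis): fit a quadratic
# in age and return the age at which the fitted curve peaks.
peak_age <- function(data) {
  fit <- lm(CORSION ~ Age + I(Age^2), data = data)
  b <- coef(fit)
  # vertex of the parabola; it is a peak only when the squared term is negative
  unname(-b["Age"] / (2 * b["I(Age^2)"]))
}

# Made-up curve with a known peak at age 27
toy <- data.frame(Age = 18:40)
toy$CORSION <- 5 - 0.1 * (toy$Age - 27)^2
peak_age(toy)   # 27
```

If the NHL really only sees the flat top of the curve, the squared term would be close to zero on real data and the estimated peak age would be very unstable.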

Models

In all of these analyses, I'm looking at Corsi as a rate, where the average is 0. First, all players, unweighted:

> LinearModel.1 = lm(CORSION ~ Age, data=CorsiAges)
> summary(LinearModel.1)
Call:
lm(formula = CORSION ~ Age, data = CorsiAges)
Residuals:
Min 1Q Median 3Q Max
-143.443 -6.295 0.828 7.370 101.517
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.36217 1.01118 -2.336 0.0195 *
Age 0.01980 0.03725 0.532 0.5950

[Figure: Corsi versus Age, all players]

Next, all players, weighted by minutes played:

> LinearModel.2 = lm(CORSION ~ Age, data=CorsiAges, weights=Minutes)
> summary(LinearModel.2)
Call:
lm(formula = CORSION ~ Age, data = CorsiAges, weights = Minutes)
Weighted Residuals:
Min 1Q Median 3Q Max
-824.62 -156.34 -22.65 113.37 967.59
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.34161 0.69307 3.379 0.000733 ***
Age -0.07941 0.02493 -3.185 0.001455 **
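Weighting by minutes means each player-season counts in proportion to its ice time: `lm(..., weights = Minutes)` minimizes the sum of Minutes * residual^2. A small check on made-up numbers (not the CorsiAges data) that the weighted fit matches the weighted least squares normal equations, beta = (X'WX)^-1 X'Wy:

```r
# Made-up toy data, not the CorsiAges data
toy <- data.frame(
  Age     = c(20, 23, 25, 28, 31, 34),
  CORSION = c(3.1, -1.2, 4.0, 0.5, -2.3, -0.8),
  Minutes = c(300, 1200, 900, 1500, 1100, 400)
)

# Weighted fit, in the same form as LinearModel.2
fit <- lm(CORSION ~ Age, data = toy, weights = Minutes)

# Same estimate from the normal equations: beta = (X' W X)^-1 X' W y
X <- cbind(1, toy$Age)
W <- diag(toy$Minutes)
beta <- solve(t(X) %*% W %*% X, t(X) %*% W %*% toy$CORSION)

all.equal(unname(coef(fit)), as.vector(beta))   # TRUE
```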

So maybe. If so, a 20-year-old has an expected Corsi of 0.75 and a 40-year-old has an expected Corsi of -0.83. Even if the effect is real, a difference of about 1.6 over twenty years is not enough to worry about, and the apples-to-oranges cohort problem still applies.
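The expected values above come straight from the fitted line. A short sketch of that arithmetic, using the coefficients copied from the LinearModel.2 summary:

```r
# Coefficients copied from the summary of LinearModel.2 above
intercept <- 2.34161
slope     <- -0.07941

expected_corsi <- function(age) intercept + slope * age

round(expected_corsi(c(20, 40)), 2)   # 0.75 -0.83
slope * 10                            # change per decade of age, about -0.79
```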

Next, nesting age within each player and weighting by minutes:

> LinearModel.13 = lm(CORSION ~ NAME/Age, data=CorsiAges, weights=Minutes)
> anova(LinearModel.13)
Analysis of Variance Table
Response: CORSION
Df Sum Sq Mean Sq F value Pr(>F)
NAME 1549 173078767 111736 4.5021 < 2.2e-16 ***
NAME:Age 1256 49032978 39039 1.5730 < 2.2e-16 ***
Residuals 3150 78178478 24819
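In an R formula, `NAME/Age` expands to `NAME + NAME:Age`, so this model fits a separate line (its own intercept and its own age slope) for every player. A minimal sketch on made-up data for three hypothetical players, checking that the nested model's slopes are the same as fitting each player on his own:

```r
# Made-up toy data: three hypothetical players, four seasons each
set.seed(1)
toy <- data.frame(
  NAME    = factor(rep(c("A", "B", "C"), each = 4)),
  Age     = c(20:23, 25:28, 30:33),
  Minutes = c(900, 1100, 1000, 1200, 700, 800, 1300, 1000, 1100, 600, 900, 500)
)
toy$CORSION <- rnorm(12, mean = 0, sd = 10)

# NAME/Age: one intercept and one age slope per player
nested <- lm(CORSION ~ NAME/Age, data = toy, weights = Minutes)
nested_slopes <- coef(nested)[c("NAMEA:Age", "NAMEB:Age", "NAMEC:Age")]

# The same slopes from a separate weighted regression for each player
separate_slopes <- sapply(split(toy, toy$NAME), function(d)
  coef(lm(CORSION ~ Age, data = d, weights = Minutes))[["Age"]])

all.equal(unname(nested_slopes), unname(separate_slopes))   # TRUE
```

In the toy data, each player spends two of his four seasons on his own line. The real model is in a similar spot: going by the degrees of freedom in the ANOVA table, it has about 5,956 player-seasons for about 1,550 players, fewer than four seasons per player.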

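The NAME:Age interaction looks significant off the charts. But this is an overparameterized model: it fits a separate intercept for each of about 1,550 players and a separate age slope for about 1,250 of them, using only about 6,000 player-seasons. The NAME estimates are each player's fitted line extrapolated back to age 0 (relative to the baseline player), which is why they run into the thousands with enormous standard errors. Looking at the NAME term, the top 5 and bottom 5 players are not exactly a Who's Who of Corsi.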

| Variable | Estimate | Std. Error | t value | p-value |
|---|---|---|---|---|
| TREVORGILLIES | 7.031e+02 | 1.368e+03 | 0.514 | 0.607313 |
| MIKEIGGULDEN | 1.067e+03 | 2.093e+03 | 0.51 | 0.610228 |
| KYLEGREENTREE | 8.946e+02 | 1.793e+03 | 0.499 | 0.617921 |
| DARRENMCCARTY | 8.703e+02 | 1.803e+03 | 0.483 | 0.62943 |
| IVANVISHNEVSKIY | 6.704e+02 | 1.442e+03 | 0.465 | 0.641978 |
| MIKKOLEHTONEN | -2.346e+03 | 2.427e+03 | -0.967 | 0.3337 |
| RICKARDRAKELL | -1.313e+03 | 1.317e+03 | -0.997 | 0.31908 |
| BARRYTALLACKSON | -1.847e+03 | 1.760e+03 | -1.05 | 0.293805 |
| JONMATSUMOTO | -2.779e+03 | 2.133e+03 | -1.303 | 0.192674 |
| JAREDROSS | -2.054e+03 | 1.554e+03 | -1.322 | 0.186429 |

The top 5 and bottom 5 of the NAME:Age terms look more familiar.

| Variable | Estimate | Std. Error | t value | p-value |
|---|---|---|---|---|
| ANZEKOPITAR:Age | 4.679e+00 | 8.827e-01 | 5.301 | 1.23e-07 |
| DUSTINBROWN:Age | 3.864e+00 | 9.345e-01 | 4.135 | 3.64e-05 |
| RYANOREILLY:Age | 1.037e+01 | 2.594e+00 | 3.999 | 6.52e-05 |
| PATRICEBERGERON:Age | 4.482e+00 | 1.199e+00 | 3.738 | 0.000189 |
| EVGENIMALKIN:Age | 3.539e+00 | 9.584e-01 | 3.693 | 0.000226 |
| MANNYMALHOTRA:Age | -4.798e+00 | 1.156e+00 | -4.15 | 3.41e-05 |
| NIKOLAIKULEMIN:Age | -5.760e+00 | 1.264e+00 | -4.555 | 5.42e-06 |
| ANDREASLILJA:Age | -7.884e+00 | 1.690e+00 | -4.666 | 3.20e-06 |
| ALEXOVECHKIN:Age | -4.353e+00 | 8.701e-01 | -5.003 | 5.96e-07 |
| DIONPHANEUF:Age | -5.222e+00 | 8.221e-01 | -6.351 | 2.44e-10 |

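Looking at Kopitar and Phaneuf alone, yes, the trends are there. But their Corsi Rel numbers suggest that some of this is driven by changes in the quality of their teams: Kopitar's Corsi Off (the team's Corsi with him off the ice) rose from -10.45 to 9.88 over these seasons, while Phaneuf's fell from 1.22 to -14.27.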

| Year | NAME | CORSIREL | CORSION | CORSIOFF | Minutes | Birth | Age |
|---|---|---|---|---|---|---|---|
| 2007 | ANZEKOPITAR | 5.5 | -4.99 | -10.45 | 1225.9 | 1987 | 20 |
| 2008 | ANZEKOPITAR | 12.3 | 9.5 | -2.77 | 1175.06 | 1987 | 21 |
| 2009 | ANZEKOPITAR | 11.2 | 8.94 | -2.24 | 1267.72 | 1987 | 22 |
| 2010 | ANZEKOPITAR | 8.3 | 8.68 | 0.4 | 1140.75 | 1987 | 23 |
| 2011 | ANZEKOPITAR | 13.8 | 19.35 | 5.5 | 1225.08 | 1987 | 24 |
| 2012 | ANZEKOPITAR | 18.4 | 25.43 | 7.06 | 726.62 | 1987 | 25 |
| 2013 | ANZEKOPITAR | 15.4 | 25.24 | 9.88 | 1205.4 | 1987 | 26 |
| 2007 | DIONPHANEUF | 8 | 9.25 | 1.22 | 1446.48 | 1985 | 22 |
| 2008 | DIONPHANEUF | -1.6 | 10.45 | 12.01 | 1395.2 | 1985 | 23 |
| 2009 | DIONPHANEUF | 5.6 | 6.36 | 0.74 | 1415.07 | 1985 | 24 |
| 2010 | DIONPHANEUF | -1.5 | -6.2 | -4.73 | 1228.92 | 1985 | 25 |
| 2011 | DIONPHANEUF | 3.3 | -0.36 | -3.66 | 1521.1 | 1985 | 26 |
| 2012 | DIONPHANEUF | -7.3 | -18.16 | -10.85 | 859.2 | 1985 | 27 |
| 2013 | DIONPHANEUF | -5.8 | -20.04 | -14.27 | 1326.4 | 1985 | 28 |
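A short sketch using the rows from the table above. It checks that CORSIREL is, to rounding, CORSION minus CORSIOFF, and compares simple unweighted per-player slopes of Corsi On and Corsi Rel against age (the OLS slope is cov(x, y) / var(x)). These unweighted slopes are not the minutes-weighted NAME:Age estimates from LinearModel.13.

```r
# Rows copied from the Kopitar and Phaneuf table above
kp <- data.frame(
  NAME     = rep(c("ANZEKOPITAR", "DIONPHANEUF"), each = 7),
  Age      = c(20:26, 22:28),
  CORSIREL = c(5.5, 12.3, 11.2, 8.3, 13.8, 18.4, 15.4,
               8, -1.6, 5.6, -1.5, 3.3, -7.3, -5.8),
  CORSION  = c(-4.99, 9.5, 8.94, 8.68, 19.35, 25.43, 25.24,
               9.25, 10.45, 6.36, -6.2, -0.36, -18.16, -20.04),
  CORSIOFF = c(-10.45, -2.77, -2.24, 0.4, 5.5, 7.06, 9.88,
               1.22, 12.01, 0.74, -4.73, -3.66, -10.85, -14.27)
)

# Corsi Rel is (to rounding) Corsi On minus Corsi Off
max(abs(kp$CORSION - kp$CORSIOFF - kp$CORSIREL))   # about 0.05: rounding only

# Unweighted least-squares slope of y on x
ols_slope <- function(x, y) cov(x, y) / var(x)

# Per-player change per year of age, on-ice versus relative
slopes <- sapply(split(kp, kp$NAME), function(d)
  c(on = ols_slope(d$Age, d$CORSION), rel = ols_slope(d$Age, d$CORSIREL)))
round(slopes, 2)
```

Unweighted, Kopitar's Corsi On rises about 4.75 per year of age but his Corsi Rel only about 1.59; Phaneuf's Corsi On falls about 5.42 per year versus about 1.97 for his Corsi Rel. Most of each trend moves with the team.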

About as many players go up with age as go down. The pattern doesn't look any different for players under 30 than for players over 30. Another year or two of data will help sort this out; I doubt Kopitar keeps going up, up, up or Phaneuf down, down, down.

Finally, looking at age cohorts

For this part, I limited the analysis to players who were in the league in 2007-08 and broke the data into cohorts by the players' age in 2007: 18-20, 21-25, 26-30, 31-35, and 35+. I then ran the analysis separately for each group. For the three youngest cohorts, there is no significant age effect (weighted or unweighted).
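The cohort split can be sketched as follows. The helper names are hypothetical. The columns follow the CorsiAges data used in the models (NAME, Year, Age, CORSION, Minutes), and Year is assumed to hold the first year of the season (2007 for 2007-08, as in the Kopitar and Phaneuf table). Because the bins as written overlap at 35, the sketch assumes a player who was exactly 35 in 2007 goes in 31-35.

```r
# Hypothetical helper: keep players present in 2007 and tag every season
# with the player's age cohort in 2007.
assign_cohorts <- function(seasons) {
  in_2007 <- unique(seasons$NAME[seasons$Year == 2007])
  seasons <- seasons[seasons$NAME %in% in_2007, ]
  # age in 2007 = age in that season minus the years elapsed since 2007
  seasons$Age2007 <- seasons$Age - (seasons$Year - 2007)
  # assumption: intervals are right-closed, so age 35 falls in "31-35"
  seasons$Cohort <- cut(seasons$Age2007,
                        breaks = c(17, 20, 25, 30, 35, Inf),
                        labels = c("18-20", "21-25", "26-30", "31-35", "35+"))
  seasons
}

# Hypothetical helper: one CORSION ~ Age regression per non-empty cohort
fit_by_cohort <- function(seasons, weighted = FALSE) {
  groups <- split(seasons, seasons$Cohort, drop = TRUE)
  lapply(groups, function(d) {
    if (weighted) lm(CORSION ~ Age, data = d, weights = Minutes)
    else lm(CORSION ~ Age, data = d)
  })
}
```

On the real data, something like `lapply(fit_by_cohort(assign_cohorts(CorsiAges)), summary)` would produce one summary per cohort, in the same form as the output below.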

> LinearModel.1 = lm(CORSION ~ Age, data=Cohort1820)
> summary(LinearModel.1)
Call:
lm(formula = CORSION ~ Age, data = Cohort1820)
Residuals:
Min 1Q Median 3Q Max
-54.865 -6.236 0.856 6.776 41.859
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.6241 6.6880 -1.140 0.255
Age 0.3683 0.2961 1.244 0.214
Residual standard error: 10.87 on 320 degrees of freedom
Multiple R-squared: 0.004812, Adjusted R-squared: 0.001703
F-statistic: 1.547 on 1 and 320 DF, p-value: 0.2144

> LinearModel.3 = lm(CORSION ~ Age, data=Cohort2125)
> summary(LinearModel.3)
Call:
lm(formula = CORSION ~ Age, data = Cohort2125)
Residuals:
Min 1Q Median 3Q Max
-143.678 -6.203 1.166 7.219 87.351
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.21332 3.51140 0.061 0.952
Age -0.07019 0.13613 -0.516 0.606
Residual standard error: 13.25 on 1718 degrees of freedom
Multiple R-squared: 0.0001547, Adjusted R-squared: -0.0004273
F-statistic: 0.2658 on 1 and 1718 DF, p-value: 0.6062

> LinearModel.5 = lm(CORSION ~ Age, data=Cohort2630)
> summary(LinearModel.5)
Call:
lm(formula = CORSION ~ Age, data = Cohort2630)
Residuals:
Min 1Q Median 3Q Max
-79.126 -6.011 0.341 6.581 36.899
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.0374 4.2088 0.722 0.471
Age -0.1349 0.1388 -0.972 0.331
Residual standard error: 11.16 on 1178 degrees of freedom
Multiple R-squared: 0.0008006, Adjusted R-squared: -4.763e-05
F-statistic: 0.9438 on 1 and 1178 DF, p-value: 0.3315

In the two older cohorts, there seems to be an age effect (weighted or unweighted). Interestingly, Corsi goes up as these players get older.

> LinearModel.7 = lm(CORSION ~ Age, data=Cohort3135)
> summary(LinearModel.7)
Call:
lm(formula = CORSION ~ Age, data = Cohort3135)
Residuals:
Min 1Q Median 3Q Max
-34.790 -6.116 0.241 5.764 32.481
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -19.7605 6.7889 -2.911 0.00375 **
Age 0.5191 0.1969 2.636 0.00863 **
Residual standard error: 9.555 on 552 degrees of freedom
Multiple R-squared: 0.01243, Adjusted R-squared: 0.01064
F-statistic: 6.948 on 1 and 552 DF, p-value: 0.008628

> LinearModel.9 = lm(CORSION ~ Age, data=Cohort35up)
> summary(LinearModel.9)
Call:
lm(formula = CORSION ~ Age, data = Cohort35up)
Residuals:
Min 1Q Median 3Q Max
-38.249 -7.743 0.234 6.841 34.104
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -54.4234 19.7084 -2.761 0.00675 **
Age 1.3990 0.5139 2.722 0.00755 **
Residual standard error: 10.84 on 109 degrees of freedom
Multiple R-squared: 0.06366, Adjusted R-squared: 0.05507
F-statistic: 7.41 on 1 and 109 DF, p-value: 0.007552

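I think some of this is selection bias: the older players who stay in the league are the ones who are still playing well, so players whose Corsi falls off tend to drop out of the data. Some of it is Chris Chelios skewing the data. The scatter plot tends to confirm this, and it also highlights what an outlier Chelios was: the three dots out at ages 45, 46, and 47 are him.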

[Figure: scatter plot of Corsi versus Age for the older cohorts]
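The selection-bias point can be illustrated with a toy simulation. All parameters here are made up and nothing is fit to real data: every simulated player's true Corsi declines by 0.5 per year, but a player is gone for good after one season below a cutoff. Pooling the surviving seasons and regressing Corsi on age gives a slope well above the true -0.5, because only the better players are left at the older ages.

```r
# Toy simulation with made-up parameters (not fit to real data).
# Every player's true Corsi declines by `decline` per year, but a player
# whose observed Corsi falls below `cut_line` leaves the league for good.
simulate_survivors <- function(n_players = 2000, ages = 31:40,
                               decline = 0.5, cut_line = -5, seed = 1) {
  set.seed(seed)
  talent <- rnorm(n_players, mean = 0, sd = 5)
  active <- rep(TRUE, n_players)
  seasons <- list()
  for (a in ages) {
    ids <- which(active)
    if (length(ids) == 0) break
    corsi <- talent[ids] - decline * (a - min(ages)) + rnorm(length(ids), sd = 5)
    seasons[[length(seasons) + 1]] <- data.frame(Age = a, CORSION = corsi)
    # one bad season ends the career: selection on observed results
    active[ids[corsi < cut_line]] <- FALSE
  }
  do.call(rbind, seasons)
}

sim <- simulate_survivors()
coef(lm(CORSION ~ Age, data = sim))[["Age"]]   # compare with the true -0.5
```

A related check on the real data would be to refit the oldest cohort without Chelios, for example `update(LinearModel.9, data = subset(Cohort35up, NAME != "CHRISCHELIOS"))`, assuming his rows are keyed that way in the NAME column.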

Eric T looked at this a couple of months ago and said: "The average player peaks at a bit over 51 percent Corsi, which is something like 60th percentile among regulars. By age 34 or 35, he's dropped to around 47 percent, which would be about 20th percentile." Here is his figure, with one small change I made.

[Figure: Eric T's Corsi aging curve, with the lower limit of the 95% confidence interval for a full season drawn in]

Obviously, I'm very skeptical about this. But let's suppose that he has somehow stumbled upon The Truth, and the average player really does peak at 51% and gradually drop off to 47%. The line I drew in is the lower limit of the 95% confidence interval for a full season of Corsi. The upper limit didn't fit on his figure, but it is up at 57%. So a "change" from 51% to 47% is a lot smaller than the magnitude of the randomness; it's not significant. If a team is considering acquiring one of three players, a 29-year-old, a 31-year-old, and a 33-year-old, I would not choose among them out of concern that their Corsi will change as they age.
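The comparison can be made explicit with a normal approximation. Only the 57% upper limit is stated, so the sketch assumes the interval is symmetric around 51% (lower limit at 45%). Under that assumption, the implied standard error is about 3.06 points, and a 4-point drop is about 1.31 standard errors, short of the 1.96 needed for significance at the 5% level.

```r
# Assumption: the full-season 95% interval is symmetric around 51%,
# with the upper limit at 57% as stated above.
center     <- 51
upper      <- 57
half_width <- upper - center             # 6 percentage points
se         <- half_width / qnorm(0.975)  # implied standard error, about 3.06

change <- 51 - 47                        # peak to age 34 or 35, from Eric T's figures
z      <- change / se                    # about 1.31, below qnorm(0.975) = 1.96
c(half_width = half_width, se = round(se, 2), z = round(z, 2))
```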