Corsi Error: Part 2

If Corsi production rates were fixed, the 95% confidence interval for Corsi/60 over a season would be +/- 4.8. But Corsi production varies considerably from game to game, and as a result the 95% confidence interval for Corsi/60 over a season appears to be about +/- 13.7.

Variable Corsi Rates - Teams

I pulled the Blues' 2011-2012 Corsi events on a game-by-game basis. Because it is hard to define a shift at the team level, I broke each game into 40-second increments. For each increment, I determined whether it was 5v5 and, if so, counted the number of Corsi events for each team. For example, in Game 20015 against NSH, the event frequency table (the number of 5v5 increments in which each team recorded a given number of events) looks like:

Count    STL    NSH
0         41     55
1         19     14
2         10      4
3          3      0
4          0      0
5          0      0
6          0      0
7          0      0
Total     73     73
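A table like the one above could be built along the following lines. This is only a sketch: the post does not say how an increment was judged to be 5v5, so `is_5v5` is a hypothetical callback, and `increment_table`, `event_times`, and `game_seconds` are illustrative names rather than anything taken from the original analysis.

```python
from collections import Counter

INCREMENT_SECONDS = 40  # increment length used in the article


def increment_table(event_times, is_5v5, game_seconds=3600):
    """Count how many 5v5 increments contained 0, 1, 2, ... Corsi events.

    event_times: seconds from the start of the game for one team's events
    is_5v5: hypothetical callable; True if the increment with that index was 5v5
    """
    n_increments = game_seconds // INCREMENT_SECONDS
    # Number of events that fell into each increment index.
    per_increment = Counter(int(t // INCREMENT_SECONDS) for t in event_times)
    table = Counter()
    for idx in range(n_increments):
        if is_5v5(idx):  # increments that are not 5v5 are dropped entirely
            table[per_increment.get(idx, 0)] += 1
    return dict(sorted(table.items()))


# Toy input: five events, with increment 1 treated as a non-5v5 increment.
print(increment_table([5, 10, 50, 130, 3599], lambda idx: idx != 1))
```

Running it once per team on the same game would give the two columns of the table, with both columns summing to the number of 5v5 increments.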

I ran a Monte Carlo simulation of 100,000 seasons of 82 games each. The simulation has two levels. On the first level, for each simulated game I picked a game at random from the "population" of Blues games from 2011-12 and generated the event tables from that game. I used the Blues' Corsi counts to generate the offensive Corsi probability distribution and their opponent's Corsi counts to generate the defensive Corsi probability distribution for the simulated game. On the second level, I used a random number generator to simulate 48 minutes of 5v5 hockey from those distributions. I then computed Corsi/60 for each season. I got a mean of 6.45 and a standard deviation of 3.21, so the 95% confidence interval is about +/- 6.29 (1.96 × 3.21).
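The two-level procedure described above might be sketched as follows. This is a sketch under stated assumptions, not the code behind the numbers in this post: it treats Corsi/60 as events for minus events against per 60 minutes, draws each 40-second increment independently from the chosen game's table, and uses hypothetical names (`game_pool`, `sample_events`, `simulate_season`, `simulate`).

```python
import random
import statistics

INCREMENTS_PER_GAME = 72  # 48 minutes of 5v5 in 40-second increments
GAMES_PER_SEASON = 82


def sample_events(table, n, rng):
    """Draw n increment counts from an empirical frequency table and sum them."""
    counts = list(table.keys())
    weights = list(table.values())
    return sum(rng.choices(counts, weights=weights, k=n))


def simulate_season(game_pool, rng):
    """Return the simulated Corsi/60 (for minus against) for one season."""
    diff = 0
    for _ in range(GAMES_PER_SEASON):
        # Level 1: pick a real game at random and use its event tables.
        team_table, opp_table = rng.choice(game_pool)
        # Level 2: simulate the 5v5 increments of one game from those tables.
        diff += sample_events(team_table, INCREMENTS_PER_GAME, rng)
        diff -= sample_events(opp_table, INCREMENTS_PER_GAME, rng)
    minutes = GAMES_PER_SEASON * INCREMENTS_PER_GAME * 40 / 60
    return diff / minutes * 60


def simulate(game_pool, n_seasons, seed=0):
    """Return (mean, standard deviation) of Corsi/60 across simulated seasons."""
    rng = random.Random(seed)
    seasons = [simulate_season(game_pool, rng) for _ in range(n_seasons)]
    return statistics.mean(seasons), statistics.stdev(seasons)
```

Here `game_pool` would hold one `(team_table, opponent_table)` pair per real game, each table mapping an event count to the number of increments with that count, for example `({0: 41, 1: 19, 2: 10, 3: 3}, {0: 55, 1: 14, 2: 4})` for the Nashville game. A pool containing only that one game would not reproduce the season figures, since the game-to-game variation is the point of the first level.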

Variable Corsi Rates - Players

Looking at that same game against Nashville, here are the actual 5v5 shift-by-shift event counts for David Backes and Barret Jackman.

Count    Backes    NSH    Jackman    NSH
0            10     17         13     19
1             9      4          6      4
2             1      1          5      1
3             1      0          0      0
4             1      0          0      0
5             0      0          0      0
6             0      0          0      0
7             0      0          0      0
Total        22     22         24     24

Sharp-eyed readers might note that Backes had one shift with 4 events, whereas the Blues had no increments with more than 3. Shifts and increments are not the same: the shift in question evidently spanned more than one increment.
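To make the shift table concrete, the sketch below turns the Backes columns above into an empirical per-shift distribution, computes the average events per shift, and draws one simulated game of 22 shifts. The helper names (`pmf`, `mean_events`, `simulate_game`) are illustrative, and drawing shifts independently of one another is an assumption of the sketch.

```python
import random

# Shift counts from the Backes columns of the Nashville table above:
# events per shift -> number of shifts with that many events.
BACKES_FOR = {0: 10, 1: 9, 2: 1, 3: 1, 4: 1}
BACKES_AGAINST = {0: 17, 4 - 3: 4, 2: 1}


def pmf(table):
    """Convert a frequency table into empirical probabilities."""
    total = sum(table.values())
    return {k: v / total for k, v in table.items()}


def mean_events(table):
    """Average number of events per shift implied by the table."""
    return sum(k * v for k, v in table.items()) / sum(table.values())


def simulate_game(for_table, against_table, n_shifts=22, rng=random):
    """Draw n_shifts shifts and return (events for, events against)."""
    cf = sum(rng.choices(list(for_table), weights=list(for_table.values()), k=n_shifts))
    ca = sum(rng.choices(list(against_table), weights=list(against_table.values()), k=n_shifts))
    return cf, ca


print(pmf(BACKES_FOR))
print(mean_events(BACKES_FOR), mean_events(BACKES_AGAINST))
print(simulate_game(BACKES_FOR, BACKES_AGAINST))
```

For this game the table implies 18 events for and 6 against over 22 shifts, or roughly 0.82 and 0.27 events per shift.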

For each player, I pulled every 5v5 shift from every game. Then, for each player, I created 10,000 seasons of 1,804 shifts, broken into 82 games of 22 shifts each. The Monte Carlo once again has two levels. On the first level, for each simulated game I picked a game at random from the "population" of that player's games from 2011-12 and generated the event tables from that game. For example, for David Backes, I used his Corsi counts to generate the offensive Corsi probability distribution and the opponent's Corsi counts to generate the defensive Corsi probability distribution for the simulated game. On the second level, I used a random number generator to simulate 22 shifts of 5v5 hockey. I calculated Corsi/60 for each simulated season, then calculated the variance and standard deviation of Corsi/60 across those 10,000 seasons. For Backes, the standard deviation across those hypothetical seasons was 8.007. Repeating this for all the Blues regulars, I got:

Player           Std Dev
ARNOTT            7.4621
BACKES            8.0073
BERGLUND          6.9529
COLAIACOVO        7.2470
JACKMAN           9.0432
LANGENBRUNNER     7.0033
MCDONALD          7.4879
NICHOL            9.2192
OSHIE             6.9192
PIETRANGELO       6.5739
POLAK             8.0527
REAVES            8.6063
SHATTENKIRK       7.7718
SOBOTKA           7.6761
STEEN             6.0738
STEWART           7.9755
HUSKINS           8.1449
GRACHEV           6.1366
PORTER            7.3864
COLE              6.5986
RUSSELL           9.9555
PERRON            6.5107
CROMBEEN          9.7999
AVERAGE           7.6785

With an average standard deviation of 7.68, the average 95% confidence interval is about +/- 15.05. Converted to Corsi%, the 95% confidence interval is about 42.3% to 57.7%.
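The intervals quoted so far are consistent with multiplying each simulated standard deviation by 1.96, the usual normal-approximation multiplier for a 95% interval. A minimal sketch of that arithmetic, assuming this is how the intervals were obtained (the function name is illustrative):

```python
Z_95 = 1.96  # normal-approximation multiplier for a 95% interval


def ci_half_width(std_dev, z=Z_95):
    """Half-width of a symmetric confidence interval around the mean."""
    return z * std_dev


# Standard deviations reported in this post.
for label, sd in [("team", 3.21), ("player average", 7.6785)]:
    print(f"{label}: +/- {ci_half_width(sd):.2f}")
```

This relies on the simulated season values being roughly normally distributed, which the post does not check explicitly.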

MLE or not MLE

Using the event tables this way is based on the premise that the observed values of λ for that game are "maximum likelihood estimators" of the underlying λs. I was concerned about whether the extreme λs were truly MLEs. In essence, the question is: "Is the table we see the result of an extreme λ and a typical table, or of a less extreme λ and an atypical table?" The answer is that, for most players, the most extreme game is more likely to result from a less extreme λ and a slightly atypical result than from the most extreme λ and a typical result. If you rerun the Monte Carlo simulations with only the λs that are MLE, the average standard deviation decreases slightly to 6.99, which gives a 95% confidence interval for Corsi/60 of about +/- 13.70. The corresponding 95% confidence interval for Corsi% is about 43.2% to 56.8%.

Limitations

These simulations assume roughly 1200 minutes spread over 82 games of 22 shifts each. If a player plays more than 1200 minutes, their variability will be slightly lower; with fewer minutes, variability will go up. In the real world, the number of shifts obviously varies from game to game, and shift length also varies somewhat. The way Corsi is constructed tends to mitigate these effects. Variations in zone starts and quality of competition (QoC) would also increase the observed variation in Corsi.

Conclusion

Corsi is an estimate and contains considerable error. Over a full season of about 80 games and 1200 minutes, even if Corsi production rates were fixed, Corsi/60 for individual players would have a 95% confidence interval of about +/- 4.8. However, Corsi production rates are highly variable. Monte Carlo analysis suggests that the Corsi/60 of an individual player over a full season has a 95% confidence interval of about +/- 13.7. The corresponding 95% confidence interval for Corsi% is about 43.2% to 56.8%.

In Part 3, I will look at this issue using a different technique.