Variable Corsi Rates - Teams
I pulled the Blues' 2011-2012 Corsi events on a game-by-game basis. Because it is hard to define a shift at the team level, I broke each game into 40-second increments. For each increment, I determined whether it was 5v5. If it was, I counted the number of Corsi events for each team. For example, in Game 20015 against NSH the event frequency table looks like:
| Count | STL | NSH |
|---|---|---|
| 0 | 41 | 55 |
| 1 | 19 | 14 |
| 2 | 10 | 4 |
| 3 | 3 | 0 |
| 4 | 0 | 0 |
| 5 | 0 | 0 |
| 6 | 0 | 0 |
| 7 | 0 | 0 |
| Total | 73 | 73 |
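The increment counting described above can be sketched in Python. This is a minimal illustration, not the code used for the article: the function name, the `event_times` input (seconds from the start of the game) and the `is_5v5` callback are all hypothetical, and the article does not say exactly how an increment was judged to be 5v5.

```python
from collections import Counter

INCREMENT_SECONDS = 40  # increment length used in the article


def increment_frequency_table(event_times, is_5v5, game_seconds=3600):
    """Count how many 5v5 increments contained 0, 1, 2, ... Corsi events.

    event_times: seconds from game start of one team's Corsi events.
    is_5v5: hypothetical callable; True if the increment that starts at
            the given second counts as 5v5 (the rule is an assumption).
    """
    n_increments = game_seconds // INCREMENT_SECONDS
    per_increment = [0] * n_increments
    for t in event_times:
        idx = int(t // INCREMENT_SECONDS)
        if idx < n_increments:
            per_increment[idx] += 1
    # Keep only the 5v5 increments, then tally increments by event count.
    return Counter(
        count
        for idx, count in enumerate(per_increment)
        if is_5v5(idx * INCREMENT_SECONDS)
    )
```

The returned `Counter` maps an event count to the number of increments with that count, which is the shape of one team's column in the table above.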
I ran a Monte Carlo simulation of 100,000 seasons of 82 games each. The simulation has two levels. On the first level, for each simulated game I picked a game at random from the "population" of Blues games from 2011-12 and generated the event tables from that game. I used the Blues' Corsi counts to generate the offensive Corsi probability density function and their opponent's Corsi counts to generate the defensive Corsi probability density function for the simulated game. On the second level, I used a random number generator to simulate 48 minutes of 5v5 hockey. I then computed Corsi/60 for each season. Across the simulated seasons I got a mean of 6.45 and a standard deviation of 3.21. The 95% confidence interval is about +/- 6.29 (1.96 standard deviations).
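A two-level simulation of this kind might look like the following Python sketch. It rests on several assumptions that are not stated in the article: each game is stored as a pair of frequency tables under the hypothetical keys `"for"` and `"against"`, 48 minutes is treated as 72 increments of 40 seconds, and Corsi/60 is taken to be the event differential (for minus against) per 60 minutes. All function names are illustrative.

```python
import random
import statistics

INCREMENT_SECONDS = 40
INCREMENTS_PER_GAME = 72  # 72 x 40 s = 48 minutes of 5v5 play


def draw_events(freq_table, n, rng):
    """Total events over n increments, drawn from one game's empirical table."""
    counts = list(freq_table)
    weights = list(freq_table.values())
    return sum(rng.choices(counts, weights=weights, k=n))


def simulate_season(games, rng, n_games=82):
    """Return one simulated season's Corsi differential per 60 minutes."""
    events_for = events_against = 0
    for _ in range(n_games):
        game = rng.choice(games)  # level 1: pick a real game at random
        # Level 2: simulate 48 minutes from that game's two tables.
        events_for += draw_events(game["for"], INCREMENTS_PER_GAME, rng)
        events_against += draw_events(game["against"], INCREMENTS_PER_GAME, rng)
    minutes = n_games * INCREMENTS_PER_GAME * INCREMENT_SECONDS / 60
    return (events_for - events_against) * 60 / minutes


def corsi_spread(games, n_seasons, seed=0):
    """Mean and standard deviation of Corsi/60 across simulated seasons."""
    rng = random.Random(seed)
    seasons = [simulate_season(games, rng) for _ in range(n_seasons)]
    return statistics.mean(seasons), statistics.stdev(seasons)


# Only one game's tables appear in the article; a real run would list all 82.
stl_vs_nsh = {
    "for": {0: 41, 1: 19, 2: 10, 3: 3},
    "against": {0: 55, 1: 14, 2: 4},
}
mean, sd = corsi_spread([stl_vs_nsh], n_seasons=200)
print(f"mean {mean:.2f}, sd {sd:.2f}")
```

With a single game as the whole population, the first level has nothing to choose between, so this example only exercises the second level. The game-to-game variation in rates that the article is measuring appears only once the full list of games is supplied.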
Variable Corsi Rates - Players
Looking at that same game against Nashville, here are the actual 5v5 shift-by-shift event counts for David Backes and Barret Jackman.
| Count | Backes | NSH (Backes shifts) | Jackman | NSH (Jackman shifts) |
|---|---|---|---|---|
| 0 | 10 | 17 | 13 | 19 |
| 1 | 9 | 4 | 6 | 4 |
| 2 | 1 | 1 | 5 | 1 |
| 3 | 1 | 0 | 0 | 0 |
| 4 | 1 | 0 | 0 | 0 |
| 5 | 0 | 0 | 0 | 0 |
| 6 | 0 | 0 | 0 | 0 |
| 7 | 0 | 0 | 0 | 0 |
| Total | 22 | 22 | 24 | 24 |
Sharp-eyed readers might note that Backes had one shift with 4 events, whereas the Blues had no increments with more than 3. Shifts and increments are not the same. The shift in question evidently spanned two increments.
For each player, I pulled every 5v5 shift from every game. Then, for each player, I created 10,000 seasons of 1804 shifts broken into 82 games of 22 shifts each. The Monte Carlo once again has two levels. On the first level, for each simulated game I picked a game at random from the "population" of that player's games from 2011-12 and generated the event tables from that game. For example, for David Backes, I used his Corsi counts to generate the offensive Corsi probability density function and the opponent's Corsi counts to generate the defensive Corsi probability density function for the simulated game. On the second level, I used a random number generator to simulate 22 shifts of 5v5 hockey. I calculated Corsi/60 for each simulated season, then calculated the variance and standard deviation of Corsi/60 across those 10,000 seasons. For Backes, the standard deviation across those hypothetical seasons was 8.007. Repeating this for all the Blues regulars, I got:
| Player | Std Dev |
|---|---|
| ARNOTT | 7.4621 |
| BACKES | 8.0073 |
| BERGLUND | 6.9529 |
| COLAIACOVO | 7.2470 |
| JACKMAN | 9.0432 |
| LANGENBRUNNER | 7.0033 |
| MCDONALD | 7.4879 |
| NICHOL | 9.2192 |
| OSHIE | 6.9192 |
| PIETRANGELO | 6.5739 |
| POLAK | 8.0527 |
| REAVES | 8.6063 |
| SHATTENKIRK | 7.7718 |
| SOBOTKA | 7.6761 |
| STEEN | 6.0738 |
| STEWART | 7.9755 |
| HUSKINS | 8.1449 |
| GRACHEV | 6.1366 |
| PORTER | 7.3864 |
| COLE | 6.5986 |
| RUSSELL | 9.9555 |
| PERRON | 6.5107 |
| CROMBEEN | 9.7999 |
| AVERAGE | 7.6785 |
With an average standard deviation of 7.68, the average 95% confidence interval is about +/- 15.05. Converted to Corsi%, the 95% confidence interval is about 42.3% to 57.7%.
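The +/- figures quoted for Corsi/60 are consistent with the usual normal approximation, in which a 95% interval extends 1.96 standard deviations either side of the mean. A short Python check, assuming that approximation is what was used (the function name is illustrative):

```python
Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def ci_half_width(std_dev, z=Z_95):
    """Half-width of a normal-approximation confidence interval."""
    return z * std_dev


print(round(ci_half_width(3.21), 2))    # team-level standard deviation
print(round(ci_half_width(7.6785), 2))  # average player standard deviation
```

This prints 6.29 and 15.05, matching the team and player intervals given in the text.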
MLE or not MLE
Using the event tables this way is based on the premise that the observed values of λ for that game are "maximum likelihood estimators" of the underlying λs. I was concerned whether the extreme λs were truly MLE. In essence, the question is "is the table we see the result of an extreme λ and a typical table, or a less extreme λ and an atypical table?" The answer is that, for most players, the most extreme game is more likely to result from a less extreme λ and a slightly atypical result than from the most extreme λ and a typical result. If you rerun the Monte Carlo simulations with only the λs that are MLE, the average standard deviation decreases slightly to 6.99, which gives a 95% confidence interval for Corsi/60 of about +/- 13.70. The corresponding 95% confidence interval of Corsi% is about 43.2% to 56.8%.
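To make "the observed λ for a game" concrete, here is a small Python sketch. It assumes λ denotes a Poisson event rate per increment (or per shift), which this section does not state outright. Under that assumption, the maximum likelihood estimate of λ from a frequency table is the sample mean: total events divided by the number of intervals. The function name is illustrative.

```python
def poisson_rate_mle(freq_table):
    """MLE of a Poisson rate: total events divided by number of intervals.

    freq_table maps an event count to the number of intervals with that count.
    """
    intervals = sum(freq_table.values())
    events = sum(count * n for count, n in freq_table.items())
    return events / intervals


# Team tables from Game 20015 (40-second increments).
stl = {0: 41, 1: 19, 2: 10, 3: 3}
nsh = {0: 55, 1: 14, 2: 4}
print(round(poisson_rate_mle(stl), 4))  # 48 events over 73 increments
print(round(poisson_rate_mle(nsh), 4))  # 22 events over 73 increments
```

This only shows how a per-game estimate is formed. It does not reproduce the check of whether an extreme game reflects an extreme λ or an atypical result.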
Limitations
These simulations assume roughly 1200 minutes spread over 82 games of 22 shifts each. If a player plays more than 1200 minutes, their variability will be slightly less. With fewer minutes, variability will go up. In the real world, the number of shifts obviously varies from game to game. Shift length also varies somewhat. The way Corsi is constructed tends to mitigate these effects. Variations in Zone Starts and QOC would also increase the observed variation in Corsi.
Conclusion
Corsi is an estimate and contains considerable error. Over a full season of about 80 games/1200 minutes, even if Corsi production rates were fixed, Corsi/60 for individual players would have a 95% confidence interval of about +/- 4.8. However, Corsi production rates are highly variable. Monte Carlo analysis suggests that the Corsi/60 of individual players playing a full season has a 95% confidence interval of about +/- 13.7. The 95% confidence interval of Corsi% is about 43.2% to 56.8%.
In part 3 I will look at this issue using a different technique.