## Corsi Error: Part 2

If Corsi production rates were fixed, the 95% Confidence Interval for an individual player's full-season Corsi/60 would be +/- 4.8. But Corsi production varies considerably from game to game, and once that variation is accounted for, the 95% Confidence Interval for a season appears to be +/- 13.7.

### Variable Corsi Rates - Teams

I pulled the Blues' 2011-12 Corsi events on a game-by-game basis. Because it is hard to define a shift at the team level, I broke each game into 40-second increments. For each increment, I determined whether it was played at 5v5; if it was, I counted the number of Corsi events for each team. For example, for Game 20015 against NSH, the event frequency table (the number of 5v5 increments in which each team recorded 0, 1, 2, ... Corsi events) looks like this:

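| Events per increment | STL | NSH |
|---|---|---|
| 0 | 41 | 55 |
| 1 | 19 | 14 |
| 2 | 10 | 4 |
| 3 | 3 | 0 |
| 4 | 0 | 0 |
| 5 | 0 | 0 |
| 6 | 0 | 0 |
| 7 | 0 | 0 |
| Total increments | 73 | 73 |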
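The binning described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's code: the event format `(time_in_seconds, team)`, the `is_5v5` callback and the function names are hypothetical, and only regulation time (3,600 seconds) is binned. It also reads Corsi/60 as the classic for-minus-against differential per 60 minutes, which seems consistent with the team mean of 6.45 reported in the next paragraph.

```python
from collections import Counter

INCREMENT_SECONDS = 40  # length of each team-level "pseudo-shift"


def frequency_table(events, teams, is_5v5, game_seconds=3600, max_count=7):
    """Build an event-frequency table like the STL/NSH one above.

    events:  list of (time_in_seconds, team) Corsi events (hypothetical format)
    teams:   the two team codes, e.g. ("STL", "NSH")
    is_5v5:  hypothetical callback; is_5v5(i) is True if increment i was 5v5
    Returns {team: [increments with 0 events, with 1 event, ...]}.
    Counts above max_count are lumped into the last bin.
    """
    n_increments = game_seconds // INCREMENT_SECONDS
    per_increment = [Counter() for _ in range(n_increments)]
    for t, team in events:
        i = min(int(t // INCREMENT_SECONDS), n_increments - 1)
        per_increment[i][team] += 1

    table = {team: [0] * (max_count + 1) for team in teams}
    for i, counts in enumerate(per_increment):
        if not is_5v5(i):
            continue  # skip power plays, 4v4, etc.
        for team in teams:
            table[team][min(counts[team], max_count)] += 1
    return table


def corsi_per_60(table_for, table_against, unit_seconds=INCREMENT_SECONDS):
    """Return CF, CA and (CF - CA) per 60 minutes; both tables cover the same increments."""
    cf = sum(k * n for k, n in enumerate(table_for))
    ca = sum(k * n for k, n in enumerate(table_against))
    minutes = sum(table_for) * unit_seconds / 60
    return cf, ca, (cf - ca) / minutes * 60


# The published Game 20015 table (STL vs NSH), entered by hand.
stl = [41, 19, 10, 3, 0, 0, 0, 0]
nsh = [55, 14, 4, 0, 0, 0, 0, 0]
cf, ca, c60 = corsi_per_60(stl, nsh)
print(cf, ca, round(c60, 2))
```

Run on the Game 20015 table, this prints `48 22 32.05`: 48 Blues attempts, 22 Nashville attempts, and a +32.05 differential per 60 over the game's 73 five-on-five increments (about 48.7 minutes).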

I used a Monte Carlo simulation to create 100,000 seasons of 82 games. The simulation has two levels. At the first level, for each simulated game I picked a game at random from the "population" of Blues games from 2011-12 and took that game's event tables. I used the Blues' Corsi counts to build the offensive Corsi probability distribution and their opponent's Corsi counts to build the defensive Corsi probability distribution for the simulated game. At the second level, I used a random number generator to simulate 48 minutes of 5v5 hockey from those distributions. I then computed Corsi/60 for each season. Across the 100,000 seasons I got a mean of 6.45 and a standard deviation of 3.21, so the 95% confidence interval is about +/- 6.29 (1.96 × 3.21).
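A minimal sketch of the two-level team simulation, assuming each game is stored as a pair of frequency tables like the one above and that sampling from a table's empirical counts (via `random.choices` with weights) stands in for the per-game probability distributions. Function names, the seed and the small season count are illustrative choices so the sketch runs quickly. Because only the Game 20015 table is published here, the toy run resamples that single game, so it shows only the second-level (within-game) noise; feeding in every 2011-12 game would add the game-to-game variation the post measures.

```python
import random
import statistics

INCREMENT_SECONDS = 40
INCREMENTS_PER_GAME = 48 * 60 // INCREMENT_SECONDS  # 72 increments = 48 minutes of 5v5


def simulate_season(games, rng, n_games=82, per_game=INCREMENTS_PER_GAME):
    """One simulated season; returns Corsi/60 as (CF - CA) per 60 minutes.

    games: list of (for_table, against_table) frequency tables, one per real game;
           both tables in a pair must have the same length.
    """
    total = 0
    for _game in range(n_games):
        table_for, table_against = rng.choice(games)  # level 1: resample a real game
        values = range(len(table_for))                # possible events per increment
        # level 2: draw this game's increments from the two empirical distributions
        total += sum(rng.choices(values, weights=table_for, k=per_game))
        total -= sum(rng.choices(values, weights=table_against, k=per_game))
    minutes = n_games * per_game * INCREMENT_SECONDS / 60
    return total / minutes * 60


def season_interval(games, n_seasons=300, seed=2012):
    """Mean, SD and 95% half-width (1.96 * SD) of simulated season Corsi/60."""
    rng = random.Random(seed)
    seasons = [simulate_season(games, rng) for _ in range(n_seasons)]
    mean = statistics.mean(seasons)
    sd = statistics.stdev(seasons)
    return mean, sd, 1.96 * sd


# Toy run on the one published game; the post drew from all 2011-12 Blues games
# and simulated 100,000 seasons.
game_20015 = ([41, 19, 10, 3, 0, 0, 0, 0], [55, 14, 4, 0, 0, 0, 0, 0])
mean, sd, half_width = season_interval([game_20015])
print(f"mean {mean:.2f}, sd {sd:.2f}, 95% CI +/- {half_width:.2f}")
```

Resampling whole games first, then increments within a game, is what lets game-to-game swings in Corsi rates show up in the season-level spread.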

### Variable Corsi Rates - Players

Looking at that same game against Nashville, here are the actual 5v5 shift-by-shift event counts for David Backes and Barret Jackman.

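| Events per shift | Backes | NSH (vs. Backes) | Jackman | NSH (vs. Jackman) |
|---|---|---|---|---|
| 0 | 10 | 17 | 13 | 19 |
| 1 | 9 | 4 | 6 | 4 |
| 2 | 1 | 1 | 5 | 1 |
| 3 | 1 | 0 | 0 | 0 |
| 4 | 1 | 0 | 0 | 0 |
| 5 | 0 | 0 | 0 | 0 |
| 6 | 0 | 0 | 0 | 0 |
| 7 | 0 | 0 | 0 | 0 |
| Total shifts | 22 | 22 | 24 | 24 |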

Sharp-eyed readers might notice that Backes had one shift with 4 events, even though the Blues had no increment with more than 3. Shifts and increments are not the same thing: the shift in question must have spanned at least two increments.

For each player, I pulled every 5v5 shift from every game. Then, for each player, I created 10,000 seasons of 1804 shifts, broken into 82 games of 22 shifts each. The Monte Carlo once again has two levels. At the first level, for each simulated game I picked a game at random from the "population" of that player's 2011-12 games and took that game's event tables. For David Backes, for example, I used his Corsi counts to build the offensive Corsi probability distribution and the opponent's Corsi counts to build the defensive Corsi probability distribution for the simulated game. At the second level, I used a random number generator to simulate 22 shifts of 5v5 hockey. I calculated Corsi/60 for each simulated season, then calculated the variance and standard deviation of Corsi/60 across those 10,000 seasons. For Backes, the standard deviation across those hypothetical seasons was 8.007. Repeating this for all the Blues regulars, I got:

| Player | Std Dev (Corsi/60) |
|---|---|
| ARNOTT | 7.4621 |
| BACKES | 8.0073 |
| BERGLUND | 6.9529 |
| COLAIACOVO | 7.2470 |
| JACKMAN | 9.0432 |
| LANGENBRUNNER | 7.0033 |
| MCDONALD | 7.4879 |
| NICHOL | 9.2192 |
| OSHIE | 6.9192 |
| PIETRANGELO | 6.5739 |
| POLAK | 8.0527 |
| REAVES | 8.6063 |
| SHATTENKIRK | 7.7718 |
| SOBOTKA | 7.6761 |
| STEEN | 6.0738 |
| STEWART | 7.9755 |
| HUSKINS | 8.1449 |
| GRACHEV | 6.1366 |
| PORTER | 7.3864 |
| COLE | 6.5986 |
| RUSSELL | 9.9555 |
| PERRON | 6.5107 |
| CROMBEEN | 9.7999 |
| AVERAGE | 7.6785 |
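The per-player version of the simulation works in shifts rather than 40-second increments. The sketch below is a hedged illustration, not the author's code: it assumes each game is a pair of per-shift frequency tables and converts to a rate using roughly 1200 minutes for the 1804 shifts, the figure the Limitations section gives. Function names, the seed and the season count are illustrative. With only the Game 20015 tables for Backes and Jackman available, each toy run resamples one game, so its standard deviation reflects shift-to-shift noise only and is not comparable to the table's values.

```python
import random
import statistics


def season_corsi60(games, rng, n_games=82, shifts_per_game=22, season_minutes=1200.0):
    """One simulated season of a player's Corsi/60, i.e. (CF - CA) per 60 minutes.

    games: list of (for_table, against_table) per-shift frequency tables, one per
           real game; table[k] = number of shifts with exactly k Corsi events.
    season_minutes: assumed 5v5 ice time for the 82 x 22 = 1804 shifts.
    """
    total = 0
    for _game in range(n_games):
        table_for, table_against = rng.choice(games)  # level 1: resample a real game
        values = range(len(table_for))                # both tables must be this long
        # level 2: draw this game's shifts from the two empirical distributions
        total += sum(rng.choices(values, weights=table_for, k=shifts_per_game))
        total -= sum(rng.choices(values, weights=table_against, k=shifts_per_game))
    return total / season_minutes * 60


def player_stats(players, n_seasons=500, seed=2012):
    """Mean and standard deviation of simulated season Corsi/60 for each player."""
    rng = random.Random(seed)
    out = {}
    for name, games in players.items():
        seasons = [season_corsi60(games, rng) for _ in range(n_seasons)]
        out[name] = (statistics.mean(seasons), statistics.stdev(seasons))
    return out


# Game 20015 tables from above; a real run would list every game each player played.
players = {
    "BACKES": [([10, 9, 1, 1, 1, 0, 0, 0], [17, 4, 1, 0, 0, 0, 0, 0])],
    "JACKMAN": [([13, 6, 5, 0, 0, 0, 0, 0], [19, 4, 1, 0, 0, 0, 0, 0])],
}
stats = player_stats(players)
for name, (mean, sd) in stats.items():
    print(f"{name}: mean {mean:.2f}, SD {sd:.3f}")
```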

With an average standard deviation of 7.68, the average 95% confidence interval is about +/- 15.05 (1.96 × 7.68). Converted to Corsi%, the 95% confidence interval is about 42.3% to 57.7%.
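The step from the table to the interval is an average of the per-player standard deviations followed by the normal-approximation multiplier of 1.96. A quick check in Python, using the values from the table:

```python
import statistics

# Season Corsi/60 standard deviations from the player table.
player_sd = {
    "ARNOTT": 7.4621, "BACKES": 8.0073, "BERGLUND": 6.9529, "COLAIACOVO": 7.2470,
    "JACKMAN": 9.0432, "LANGENBRUNNER": 7.0033, "MCDONALD": 7.4879, "NICHOL": 9.2192,
    "OSHIE": 6.9192, "PIETRANGELO": 6.5739, "POLAK": 8.0527, "REAVES": 8.6063,
    "SHATTENKIRK": 7.7718, "SOBOTKA": 7.6761, "STEEN": 6.0738, "STEWART": 7.9755,
    "HUSKINS": 8.1449, "GRACHEV": 6.1366, "PORTER": 7.3864, "COLE": 6.5986,
    "RUSSELL": 9.9555, "PERRON": 6.5107, "CROMBEEN": 9.7999,
}
avg_sd = statistics.mean(player_sd.values())
half_width = 1.96 * avg_sd  # 95% CI half-width under a normal approximation
print(f"average SD {avg_sd:.4f}, 95% CI +/- {half_width:.2f}")
```

This prints `average SD 7.6785, 95% CI +/- 15.05`, matching the AVERAGE row and the interval quoted above.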

### MLE or not MLE

Using the event tables this way rests on the premise that the values of λ observed in each game are "maximum likelihood estimators" of the underlying λs. I was concerned about whether the extreme λs were truly MLE. In essence, the question is: "Is the table we see the result of an extreme λ and a typical table, or of a less extreme λ and an atypical table?" The answer is that, for most players, the most extreme game is more likely to result from a less extreme λ and a slightly atypical result than from the most extreme λ and a typical result. If you rerun the Monte Carlo simulations using only the λs that are MLE, the average standard deviation decreases slightly, to 6.99, which gives a 95% confidence interval for Corsi/60 of about +/- 13.70. The corresponding 95% confidence interval for Corsi% is about 43.2% to 56.8%.
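To make the MLE premise concrete, here is a sketch that assumes per-shift Corsi counts follow a Poisson distribution with rate λ. The post's use of λ suggests this, but the model is an assumption here, and the post's simulations sample the empirical tables directly. For a Poisson model the maximum likelihood estimate of λ is simply the sample mean, which a grid search over Backes's Game 20015 counts recovers. The last lines compare how likely his table is under a hypothetical, lower λ of 0.6.

```python
import math


def poisson_log_likelihood(lam, freq):
    """Log-likelihood of a per-shift frequency table under a Poisson(lam) model.

    freq[k] = number of shifts with exactly k Corsi events.
    """
    return sum(n * (k * math.log(lam) - lam - math.lgamma(k + 1))
               for k, n in enumerate(freq) if n)


# Backes's Corsi-for counts in Game 20015: 18 events over 22 shifts.
backes_for = [10, 9, 1, 1, 1, 0, 0, 0]
grid = [i / 1000 for i in range(1, 3001)]  # candidate lambdas from 0.001 to 3.0
lam_hat = max(grid, key=lambda lam: poisson_log_likelihood(lam, backes_for))
sample_mean = sum(k * n for k, n in enumerate(backes_for)) / sum(backes_for)
print(lam_hat, round(sample_mean, 3))

# Likelihood of the same table under a less extreme (hypothetical) rate,
# relative to the maximum likelihood rate.
ratio = math.exp(poisson_log_likelihood(0.6, backes_for)
                 - poisson_log_likelihood(lam_hat, backes_for))
print(f"L(0.6) / L(lam_hat) = {ratio:.2f}")
```

The first line prints `0.818 0.818`. A likelihood ratio that is well above zero is the intuition behind the question in the paragraph above: a lower underlying λ paired with a somewhat atypical game can explain the same table.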

### Limitations

These simulations assume roughly 1200 minutes spread over 82 games of 22 shifts each. If a player plays more than 1200 minutes, their variability will be slightly lower; with fewer minutes, it will be higher. In the real world, the number of shifts obviously varies from game to game, and shift length also varies somewhat. The way Corsi is constructed tends to mitigate these effects. Variations in zone starts and QOC (quality of competition) would also increase the observed variation in Corsi.
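A rough way to see how ice time affects the interval: if every extra minute were independent of the rest, the standard deviation of a per-60 rate would shrink like 1/√minutes. This is a simplifying assumption, since game-level swings in λ do not average out when a player simply gets more minutes per game. The sketch rescales the 7.68 average from the 1200-minute baseline. `scaled_sd` is a hypothetical helper.

```python
import math


def scaled_sd(sd_at_base, minutes, base_minutes=1200.0):
    """Rescale a season Corsi/60 standard deviation to a different amount of ice time.

    Assumes minutes are independent and identically distributed, so the SD of a
    per-60 rate falls like 1 / sqrt(minutes). Real seasons violate this somewhat.
    """
    return sd_at_base * math.sqrt(base_minutes / minutes)


for minutes in (600, 900, 1200, 1500):
    sd = scaled_sd(7.68, minutes)
    print(f"{minutes:5d} min: SD {sd:.2f}, 95% CI +/- {1.96 * sd:.2f}")
```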

### Conclusion

Corsi is an estimate and contains considerable error. Over a full season of about 80 games (1200 minutes), even if Corsi production rates were fixed, Corsi/60 for individual players would have a 95% Confidence Interval of about +/- 4.8. However, Corsi production rates are highly variable. Monte Carlo analysis suggests that the Corsi/60 of individual players playing a full season has a 95% Confidence Interval of about +/- 13.7, and the 95% confidence interval of Corsi% is about 43.2% to 56.8%.

In Part 3, I will look at this issue using a different technique.

More from St. Louis Game Time
