In the case of goalies, we have a pretty good handle on the relationship between underlying talent and observed save percentage. A goalie whose true talent is 0.920 and who faces 1600 shots should save around 1472 of them. There is a 95% probability that he will save between 1449 and 1492 of them. This gives us a 95% Confidence Interval for observed save percentage of 0.9056 to 0.9325. We know all this because a goalie saving the puck can be modeled as a Binomial process.
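The goalie arithmetic can be roughly checked with a few lines of Python. This is only a sketch: it uses the normal approximation to the Binomial (mean n*p, variance n*p*(1-p)) rather than exact Binomial quantiles, so it lands within a save or two of the figures quoted above rather than matching them exactly. The function name `binomial_interval` is made up for illustration.

```python
import math

def binomial_interval(talent, shots, z=1.96):
    """Approximate central 95% range of saves via the normal approximation."""
    mean = shots * talent                            # expected saves, n*p
    sd = math.sqrt(shots * talent * (1.0 - talent))  # sqrt(n*p*(1-p))
    return mean, sd, mean - z * sd, mean + z * sd

mean, sd, low, high = binomial_interval(0.920, 1600)
print(f"expected saves: {mean:.0f}, standard deviation: {sd:.2f}")
print(f"saves: {low:.1f} to {high:.1f}")
print(f"save percentage: {low / 1600:.4f} to {high / 1600:.4f}")
```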
What would the relationship look like if Corsi events were generated by a Poisson process?
Lambda (or λ) is the event rate, the average number of events per unit of time. Over a period of length t, the expected value of a Poisson count is λ*t. The variance is also λ*t, and the standard deviation is sqrt(λ*t). Corsi events occur at a rate of about 0.8355 per minute of 5v5 time, so λ = 0.8355 per minute. If an average player plays 1200 minutes in a season, we would expect him to generate 1003 offensive Corsi events and 1003 defensive Corsi events. Both of these would have a variance of 1003 and a standard deviation of 31.66.
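As a quick check of those Poisson figures, here is the same arithmetic in Python. The constant names are illustrative; the two input values come from the paragraph above.

```python
import math

RATE_PER_MIN = 0.8355  # Corsi events per minute of 5v5 time (from the text)
MINUTES = 1200         # season ice time for an average player (from the text)

expected = RATE_PER_MIN * MINUTES  # Poisson mean, lambda * t
variance = expected                # a Poisson variance equals its mean
std_dev = math.sqrt(variance)

print(f"expected events: {expected:.1f}")
print(f"standard deviation: {std_dev:.2f}")
```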
The difference of two independent Poisson random variables follows a distribution called a Skellam. The expected value of a Skellam is (λ1 - λ2)*t. The variance is (λ1 + λ2)*t, and the standard deviation is sqrt((λ1 + λ2)*t). Net Corsi events would be a Skellam process. Here λ1 = λ2 = 0.8355 per minute and t = 1200 minutes. The average player would have an expected value of Net Corsi events of 0, with a variance of 2006 and a standard deviation of 44.79. A 95% Confidence Interval for Net Corsi events is -88 to +88. Converted to Corsi/60, this gives a 95% Confidence Interval of -4.4 to +4.4. Converted to Corsi%, this gives a 95% Confidence Interval of 47.8% to 52.2%.
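The Skellam numbers and the two conversions can be sketched as follows. Two assumptions are mine rather than the article's: the interval is taken as the mean plus or minus 1.96 standard deviations, and the Corsi% conversion holds total events fixed at their expected value (λ1 + λ2)*t, so that Corsi% = 0.5 + net / (2 * total). The function name `skellam_summary` is hypothetical.

```python
import math

def skellam_summary(lam_for, lam_against, minutes, z=1.96):
    """Mean, spread and 95% ranges for net Corsi under the Poisson assumption."""
    mean = (lam_for - lam_against) * minutes
    total = (lam_for + lam_against) * minutes  # expected total events; also the variance
    sd = math.sqrt(total)
    half_width = z * sd                        # in net events
    return {
        "mean": mean,
        "sd": sd,
        "net_low": mean - half_width,
        "net_high": mean + half_width,
        "per60_half_width": half_width / minutes * 60,
        # Assumption: total events held at their expected value.
        "pct_low": 0.5 + (mean - half_width) / (2 * total),
        "pct_high": 0.5 + (mean + half_width) / (2 * total),
    }

summary = skellam_summary(0.8355, 0.8355, 1200)
print(f"standard deviation: {summary['sd']:.2f}")
print(f"net events: {summary['net_low']:.0f} to {summary['net_high']:.0f}")
print(f"Corsi/60: +/- {summary['per60_half_width']:.1f}")
print(f"Corsi%: {summary['pct_low']:.1%} to {summary['pct_high']:.1%}")
```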
However, Corsi events ARE NOT Poisson. A Poisson process requires that events be independent, and Corsi events are significantly dependent: they cluster more than Poisson events do. As a result, the variance will be greater. The Poisson variance is a floor for the Corsi variance.
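One common way to see clustering in count data is the variance-to-mean ratio of the counts per time slice: for Poisson counts it is close to 1, and clustering pushes it above 1. The sketch below is illustrative only. The two lists are made-up toy counts, not hockey data, and `dispersion_index` is a hypothetical helper name.

```python
from statistics import mean, pvariance

def dispersion_index(counts):
    """Variance-to-mean ratio of event counts; about 1 for Poisson counts."""
    return pvariance(counts) / mean(counts)

# Toy data (invented for illustration): 6 events in 10 slices, arranged two ways.
evenly_spread = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1]
clustered = [0, 0, 3, 0, 0, 0, 3, 0, 0, 0]

print(f"evenly spread: {dispersion_index(evenly_spread):.2f}")
print(f"clustered: {dispersion_index(clustered):.2f}")
```

Both toy lists have the same average rate; only the clustered one has a ratio above 1.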
There is no simple formula for calculating the variance of Corsi events. However, we can estimate the variance fairly accurately using a technique called "Monte Carlo". I scanned the 2011-2012 5v5 data. I chopped each game into 40-second increments, determined whether each one was 5v5, and counted how many Corsi events happened for each team in each increment. There were 102567 Corsi events in 184260 increments, for an average rate of 0.557 per 40 seconds. The results:
I used the probability distribution function defined above to describe offensive and defensive Corsi events. I used a random number generator to simulate seasons of 1800 shifts (1800 shifts of 40 seconds each is 1200 minutes). I generated 100,000 seasons. For each season, I calculated Net Corsi events. The variance of Net Corsi events is 2420.6, and the standard deviation is 49.2. Converted to Corsi/60, the 95% Confidence Interval is +/- 4.82. Converted to Corsi%, the 95% Confidence Interval is 47.6% to 52.4%.
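The simulation loop might look something like the sketch below. It is not the code behind the numbers above. The empirical per-shift distribution is not reproduced here, so the example plugs in a Poisson distribution with mean 0.557 as a stand-in; with that stand-in the simulated variance should come out near the Skellam value of about 2006, not 2420.6. It also runs 1,000 seasons rather than 100,000 to keep it quick. The function names and the fixed seed are assumptions for illustration.

```python
import math
import random
from statistics import pvariance

def poisson_pmf(rate, max_count):
    """Stand-in per-shift distribution; replace with the empirical one."""
    pmf = [math.exp(-rate) * rate**k / math.factorial(k) for k in range(max_count)]
    pmf.append(1.0 - sum(pmf))  # lump the remaining tail into the last bucket
    return pmf

def simulate_net_corsi(pmf, shifts=1800, seasons=1000, seed=0):
    """Draw events for and against per shift, return net events per season."""
    rng = random.Random(seed)
    counts = range(len(pmf))  # possible event counts in one 40-second shift
    nets = []
    for _ in range(seasons):
        events_for = sum(rng.choices(counts, weights=pmf, k=shifts))
        events_against = sum(rng.choices(counts, weights=pmf, k=shifts))
        nets.append(events_for - events_against)
    return nets

nets = simulate_net_corsi(poisson_pmf(0.557, 8))
variance = pvariance(nets)
sd = math.sqrt(variance)
per60 = 1.96 * sd / 1200 * 60  # 1800 shifts of 40 seconds = 1200 minutes

print(f"variance of net events: {variance:.0f}")
print(f"standard deviation: {sd:.1f}")
print(f"Corsi/60 95% range: +/- {per60:.2f}")
```

Passing a list of empirical per-shift probabilities in place of `poisson_pmf(0.557, 8)` is the only change needed to run the same loop on observed data.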
So an average player playing an average full season has a 95% probability of having an observed Corsi/60 in the range -4.8 to +4.8 if the rate of Corsi production is constant. Unfortunately, both λ values vary significantly from game to game. This increases the variance (and the width of the Confidence Interval) for Corsi considerably, as we will see in part 2.