There are two basic methods for estimating the talent of goaltenders: Frequentist and Bayesian. The Frequentist approach is WYSIWYG. For example, Anton Khudobin has faced 864 ES shots and made 806 saves. A Frequentist would say the best estimate for his talent is 806/864, or 0.933. I would add the disclaimer that “This is only an estimate. There is a 95% probability that his true talent lies between 0.914 and 0.949.” A Bayesian would take 0.933 and fold in additional information to come up with their best estimate. They would say “The population of NHL goalies is such that most goalies are around 0.920. As a result, the Bayesian best estimate for Khudobin is 0.926.”
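As a sketch of the two calculations: the Frequentist number is just saves over shots (with a normal-approximation interval, which comes out a hair wider than the 0.914–0.949 quoted above), while the Bayesian number shrinks it toward 0.920. The prior weight below, the equivalent of 1,000 prior shots, is my assumption rather than anything stated in the text; it happens to reproduce the 0.926 figure.

```python
import math

# Khudobin's even-strength numbers from the text
saves, shots = 806, 864

# Frequentist point estimate and a normal-approximation (Wald) 95% interval
p_hat = saves / shots
se = math.sqrt(p_hat * (1 - p_hat) / shots)
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)

# Bayesian estimate with a Beta prior centered on the league-wide 0.920.
# The prior weight (equivalent of 1000 prior shots) is an assumption;
# this particular choice reproduces the 0.926 figure in the text.
prior_shots = 1000
alpha0 = 0.920 * prior_shots          # prior "saves"
posterior_mean = (saves + alpha0) / (shots + prior_shots)

print(f"Frequentist: {p_hat:.3f}, approx 95% CI ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"Bayesian posterior mean: {posterior_mean:.3f}")
```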
For Khudobin, I really can't argue with that. But for some goalies it doesn't make sense. Imagine I construct a coin that comes up heads 80% of the time. I give the coin to a Bayesian and ask for his prior assumption. “Well, all known coins come up heads 50% of the time, so my best estimate is 50%.” He flips the coin 100 times and gets 80 heads. (This would happen about once in 1.8 billion tries with a fair coin.) After seeing this data, the Bayesian's best estimate for the coin would be something like 72%.
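The rarity of that outcome, and the Bayesian's update, can both be checked directly. The Beta(20, 20) prior below is a hypothetical choice of prior strength (equivalent to having already watched 40 flips of a fair coin) that lands near the "something like 72%" figure; the text doesn't specify its prior.

```python
from math import comb

# Exact probability of 80 or more heads in 100 flips of a fair coin
p_tail = sum(comb(100, k) for k in range(80, 101)) / 2**100
print(f"P(>=80 heads) = {p_tail:.2e}")      # roughly 5.6e-10: about once in 1.8 billion tries

# The Bayesian's updated estimate depends entirely on the prior's strength.
# A Beta(20, 20) prior (an assumed choice for illustration) gives:
posterior = (80 + 20) / (100 + 40)
print(f"posterior mean = {posterior:.3f}")
```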
To me, 72% is an illogical number. There would seem to be only two numbers you can justify: 50% and 80%. Maybe it really is a fair coin and you just happened to see once-in-the-lifetime-of-the-universe results. However, both the Bayesian and I have accepted that this is no ordinary coin. It's an outlier. Once you accept that, the population frequency of normal coins is irrelevant and the properties of normal coins are immaterial.
So back to goalies. Coming into this season, Tim Thomas had seen 9014 ES shots and stopped 8385. A Frequentist would say his ES save percentage is most likely 0.930. A Bayesian would say it is most likely 0.927. A small difference, yes, but still a difference. And the difference comes from hanging onto the prior assumption about 0.920 goalies. We have established he is an outlier. The prior no longer matters. So when should you let go of the prior assumption?
Calculating the Threshold.
To be statistically significant, the difference in the two save percentages has to be greater than a scaling factor times the standard deviation. So
P1 – P2 > SF*SD
Here P1 = 0.930, P2 = 0.920, and SD = sqrt(0.930*0.070/N) = sqrt(0.0651/N). The scaling factor is 1.96 if you want to be 95% confident that there is a true difference, or 2.576 if you want to be 99% confident.
0.010 > 2.576 * sqrt(0.0651/N)

Solving for N gives 4320 (2501 for 95% confidence). Nobody is going to remember 4320. Call it 4400.
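The algebra can be checked directly: rearranging P1 – P2 > SF*sqrt(var/N) gives N > (SF/(P1 – P2))^2 * var.

```python
p1, p2 = 0.930, 0.920
d = p1 - p2
var = p1 * (1 - p1)              # 0.0651, per the SD formula above

# Smallest N satisfying d > SF * sqrt(var/N)  =>  N > (SF/d)^2 * var
n95 = (1.96 / d) ** 2 * var
n99 = (2.576 / d) ** 2 * var
print(f"95%: N > {n95:.0f}")     # about 2501
print(f"99%: N > {n99:.0f}")     # about 4320
```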
So if a goalie gets to 4400 shots and has an ES save percentage of 0.930 or greater, you can be 99+% confident he is an outlier. I would argue that once you get to 4400 shots (or even 2600 shots) you are better off using the Frequentist estimate.
One final thought. Thomas (really, any goalie) may have a talent that is actually a little higher than his observed save percentage. One of the last pieces of randomness in front of a goalie is a defensive player tipping a puck. If the tip changes the outcome, it could change a save into a goal against, or it could turn a goal against into something else. I would argue that the net result of this is more saves being turned into goals than vice versa. Of 100 shots on net, 92 or 93 are destined to be saves, so there is far more opportunity to turn a save into a goal than the reverse. Also, of the goals turned into something else, most are probably counted as blocks, not saves. As a result, the true talent of the goalie is likely a little higher than the observed save percentage.
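To make the asymmetry concrete, here is a toy model. Every parameter (the tip rate, the tipped-shot save probability) is a hypothetical illustration, not a measured value; the only point is the direction of the bias.

```python
# Toy model of the tip effect: assume a goalie whose true talent is 0.930,
# that 5% of shots on net are meaningfully tipped, and that a tipped shot
# beats the goalie 20% of the time regardless of his talent. All three
# numbers are hypothetical illustrations.
true_sv = 0.930
tip_rate = 0.05
tipped_sv = 0.80

observed_sv = (1 - tip_rate) * true_sv + tip_rate * tipped_sv
print(f"observed save %: {observed_sv:.4f}")   # below the true 0.930
```

Because most untipped shots would have been saves, randomizing a slice of them drags the observed number below the true talent, which is the direction of bias argued above.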