

Predicting Goaltending Performance

You can't predict randomness.


In "Quantifying the added importance of recent data", Eric T. starts with the statement "It almost certainly has to be true that the recent results matter more than the older ones, but how much more? If you're trying to guess how a goalie will do this year, is last year's performance 10 percent more important than the year before that, twice as important, or ten times as important?"

This is a completely unjustified assumption. It would make more sense to first ask if prior performance has any significant relationship to current results.

Let's see whether there is any correlation between different years. For each year, I'm looking only at goalies with a four-year track record leading up to the year of analysis. So for 2003, Year-1 is 2002, Year-2 is 2001, Year-3 is 2000, and Year-4 is 1999. To be included in the analysis, a goalie also had to play in all five seasons. I'm simply skipping over the lost season of 2004, so for 2005, Year-1 is 2003. The values in these tables are correlation coefficients. If the treatment of 2004 bothers you, the years 2001-2003 and 2009-2012 are unaffected.

Year      Year-1    Year-2    Year-3    Year-4
2001      0.2715    0.2236   -0.0209   -0.1878
2002      0.3376    0.1083    0.2405   -0.0515
2003      0.4379    0.1310    0.2037    0.4460
2005      0.3814    0.0427    0.0813    0.1523
2006      0.1918    0.0818   -0.0049    0.0400
2007      0.2496    0.1637   -0.0540   -0.1452
2008      0.3063    0.0975    0.0455    0.0373
2009      0.5158   -0.0114    0.1230    0.0230
2010      0.1702    0.4042    0.2027   -0.0708
2011      0.1741    0.1957    0.1336    0.1592
2012      0.4895    0.4167    0.3788    0.1765
Average   0.3205    0.1685    0.1208    0.0526

So Year-1 is a little better than the rest, but there's not much predictive power on average. Years -2, -3, and -4 are just about useless. I would also point out that 2012 shows a strange pattern: it correlates with Years -2, -3, and -4 much more strongly than average.
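For reference, here is a minimal sketch of how a table like this can be computed, assuming a data frame laid out like the CorrAll one used in the regressions below (one row per goalie-year, with y as the current season's save percentage and x1 through x4 as the four preceding seasons):

> # correlation of each season's results with the four prior seasons,
> # computed separately for each year of analysis
> t(sapply(split(CorrAll, CorrAll$Year), function(d)
+     round(cor(d$y, d[, c("x1", "x2", "x3", "x4")]), 4)))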

A correlation of 0.32 may sound impressive, but there really is not much of a relationship there. In any given year, the number of goalies is small enough that 0.32 is not quite statistically significant. Even if it were, it would mean that one variable explains only about 10% of the variability in the other variable (r² = 0.32² ≈ 0.10).
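As a rough check on that significance claim (assuming roughly 38 goalies per season, which is consistent with the 416 goalie-seasons used in the regressions below), the two-sided 5% critical value for a correlation coefficient lands right around 0.32:

> # critical r at alpha = 0.05 for n = 38 goalies (df = n - 2 = 36)
> t.crit <- qt(0.975, df = 36)
> t.crit / sqrt(t.crit^2 + 36)   # works out to about 0.32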

Eric T. continues: "One way to try to answer this is a direct analysis of how things have turned out for goalies in recent years, how their eventual performance compared to their most recently completed seasons."

This is what is called retrospective analysis. Essentially, you are looking at your data after the fact and drawing some lines to connect the dots. Sometimes you might come up with meaningful relationships. Sometimes you might not. If you look at enough random relationships, some will seem to have associations. If you do find some relationships, the way to determine whether they are meaningful is to apply your analysis to another set of data. This is called a "validation sample" or a "hold-out sample". (You "hold" this data "out" of the first analysis.) Meaningful relationships will continue to be present in the validation sample. Spurious ones will not be there anymore.
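As a concrete sketch of the idea, again assuming the CorrAll layout used below: derive the weights from one season, then see how the resulting formula correlates with actual results in the seasons that were held out.

> # fit the weights on 2012 only, then test on everything else
> fit <- lm(y ~ x1 + x2 + x3 + x4, data = subset(CorrAll, Year == 2012))
> held.out <- subset(CorrAll, Year != 2012)
> cor(predict(fit, newdata = held.out), held.out$y)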

Eric T. suggests using the following system to predict a goalie's performance. "If we are predicting 2013, 2012 gets a weight of 100, 2011 gets a weight of 70, 2010 gets a weight of 50, and 2009 gets a weight of 30."

It is certainly true that for any year, there is some set of a, b, c, and d that best fits the equation ESSP(Year) = a*(Year-1) + b*(Year-2) + c*(Year-3) + d*(Year-4), where (Year-n) is the goalie's save percentage n seasons earlier. If there are no real trends in the data, those weights will vary from year to year. Conversely, if you try to use a single set of weights across the board, the results will not be very good. If you use his formula to "predict" 2012, it looks pretty good: R is 0.618. The problem is that this weighting is designed to optimize the "prediction" of 2012. By being that specific, it is not robust. Let's look at the validation samples:

Year      Eric T.'s Formula
2001      0.2631
2002      0.3391
2003      0.4596
2005      0.2927
2006      0.1802
2007      0.2161
2008      0.2699
2009      0.3866
2010      0.3682
2011      0.2383
Average   0.3014

His equation "predicts" 2012 because it was derived from the 2012 data. Its overall performance in years other than 2012 is a little worse than how Year-1 does by itself.
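The per-year correlations in the table above can be computed directly from the 100/70/50/30 weights (again assuming the CorrAll layout; dividing by 250 just normalizes the weights and doesn't affect the correlations):

> w <- c(100, 70, 50, 30) / 250
> CorrAll$eric <- with(CorrAll, w[1]*x1 + w[2]*x2 + w[3]*x3 + w[4]*x4)
> sapply(split(CorrAll, CorrAll$Year), function(d) round(cor(d$eric, d$y), 4))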

Global Modeling

What do we get if we try to model the entire database as ESSP(year) = a*(year-1) + b*(year-2) + c*(year-3) + d*(year-4)? Here Year = y, Year-1 = x1, Year-2 = x2, etc.

> LinearModel.1 = lm(y ~ x1 + x2 + x3 + x4, data=CorrAll)
> summary(LinearModel.1)

Call:
lm(formula = y ~ x1 + x2 + x3 + x4, data = CorrAll)

Residuals:
      Min        1Q    Median        3Q       Max
-0.139067 -0.006588  0.002703  0.011412  0.042301

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.33635    0.09443   3.562 0.000411 ***
x1           0.39723    0.06366   6.240 1.09e-09 ***
x2           0.14869    0.07396   2.010 0.045047 *
x3           0.04766    0.05619   0.848 0.396821
x4           0.03565    0.05040   0.707 0.479819

Residual standard error: 0.01855 on 411 degrees of freedom
Multiple R-squared: 0.1128, Adjusted R-squared: 0.1042
F-statistic: 13.06 on 4 and 411 DF, p-value: 5.036e-10

> anova(LinearModel.1)

Analysis of Variance Table

Response: y
           Df   Sum Sq   Mean Sq F value    Pr(>F)
x1          1 0.015708 0.0157081 45.6259 4.906e-11 ***
x2          1 0.001751 0.0017508  5.0854   0.02465 *
x3          1 0.000359 0.0003589  1.0425   0.30785
x4          1 0.000172 0.0001722  0.5002   0.47982
Residuals 411 0.141499 0.0003443

The R-squared is 0.1128, meaning we are explaining only 11.28% of the variability. Since Year-3 and Year-4 (x3 and x4) really aren't contributing to the model, we can remove them without losing anything.

> LinearModel.2 = lm(y ~ x1 + x2, data=CorrAll)
> summary(LinearModel.2)

Call:
lm(formula = y ~ x1 + x2, data = CorrAll)

Residuals:
      Min        1Q    Median        3Q       Max
-0.138896 -0.006842  0.002741  0.011437  0.042763

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.39441    0.08165   4.830 1.92e-06 ***
x1           0.40171    0.06333   6.343 5.92e-10 ***
x2           0.16424    0.07279   2.256   0.0246 *

Residual standard error: 0.01854 on 413 degrees of freedom
Multiple R-squared: 0.1095, Adjusted R-squared: 0.1052
F-statistic: 25.38 on 2 and 413 DF, p-value: 4.006e-11

> anova(LinearModel.1, LinearModel.2)

Analysis of Variance Table

Model 1: y ~ x1 + x2 + x3 + x4
Model 2: y ~ x1 + x2
  Res.Df     RSS Df   Sum of Sq      F Pr(>F)
1    411 0.14150
2    413 0.14203 -2 -0.00053111 0.7713 0.4631

Career Save Percentage

You might argue that Eric T.'s formula is an approximation of career save percentage. What if you just use career save percentage directly? To do this, I used both the straightforward Frequentist approach, which is just observed saves divided by observed shots, and a Bayesian approach, which adjusts the Frequentist result toward the league average, since average goalies are the most common. The adjustments are generally small: these goalies all have five seasons of data or they wouldn't be in the analysis, and most of the differences between the two estimates are in the range of 0.002 to 0.004.
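The exact prior isn't specified here, but a minimal sketch of this kind of shrinkage adjustment looks something like the following, where prior.mean and prior.shots are stand-in assumptions, and saves and shots are each goalie's career totals:

> # shrink each goalie's career save percentage toward the league average;
> # prior.shots controls how hard small samples get pulled toward the mean
> prior.mean  <- 0.920   # assumed league-average save percentage
> prior.shots <- 1000    # assumed prior strength, in shots
> FCareer <- saves / shots
> BAYES   <- (saves + prior.mean * prior.shots) / (shots + prior.shots)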

Before I even did this part, I was sure that (1) even if the Bayesian approach were better, the effect would be negligible. A single season of data isn't enough to see a difference between a 0.920 goalie and a 0.930 goalie, and it certainly won't show a difference between a prediction based on a 0.926 goalie and one based on a 0.923 goalie. And (2) neither approach would do better than a correlation of about 0.3 on average.

Year      Bayes     Frequentist
2001      0.0955    0.1900
2002      0.0782    0.1691
2003      0.1828    0.3047
2005      0.2049    0.3879
2006      0.5311    0.4863
2007      0.3261    0.1623
2008      0.3216    0.1481
2009      0.1914    0.0644
2010      0.2789    0.1897
2011      0.2567    0.1055
2012      0.4746    0.4802
Average   0.2674    0.2444

I love it when I'm right.

> LinearModel.4 = lm(y ~ FCareer, data=CorrPlus)
> summary(LinearModel.4)

Call:
lm(formula = y ~ FCareer, data = CorrPlus)

Residuals:
      Min        1Q    Median        3Q       Max
-0.156420 -0.006965  0.002247  0.011310  0.050006

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.2250     0.1378   1.633    0.103
FCareer       0.7494     0.1500   4.995 8.71e-07 ***

Residual standard error: 0.01906 on 414 degrees of freedom
Multiple R-squared: 0.05683, Adjusted R-squared: 0.05455
F-statistic: 24.95 on 1 and 414 DF, p-value: 8.713e-07

> LinearModel.5 = lm(y ~ BAYES, data=CorrPlus)
> summary(LinearModel.5)

Call:
lm(formula = y ~ BAYES, data = CorrPlus)

Residuals:
      Min        1Q    Median        3Q       Max
-0.152609 -0.006546  0.002930  0.011241  0.042726

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.09502    0.17877  -0.532    0.595
BAYES        1.09724    0.19455   5.640 3.16e-08 ***

Residual standard error: 0.01891 on 414 degrees of freedom
Multiple R-squared: 0.07135, Adjusted R-squared: 0.06911
F-statistic: 31.81 on 1 and 414 DF, p-value: 3.156e-08

Mistaking Randomness for a Pattern

Finally, let's look at a simulation: 600 goalies, each facing 1200 shots in each of two seasons. Plotting Year1 against Year2, we get:

[Figure: simulated Year1 vs. Year2 save percentages, all 600 goalies]

Pearson's product-moment correlation

data:  CorrSim$Year1 and CorrSim$Year2
t = 7.3861, df = 598, p-value = 5.103e-13
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2140495 0.3608331
sample estimates:
      cor
0.2891399

The correlation is about what Eric T.'s method gets, but this is just a random number generator. Here's where the correlation comes from: I think the NHL goalie population is about 10% elite (0.930) goalies, 80% average (0.920) goalies, and 10% below-average (0.910) goalies, and this sample population mirrored that distribution. Isolating the average goalies:

[Figure: simulated Year1 vs. Year2 save percentages, 0.920 goalies only]

Pearson's product-moment correlation

data:  Corr920$Year1 and Corr920$Year2
t = -1.3745, df = 478, p-value = 0.1699
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.15139486  0.02690697
sample estimates:
        cor
-0.06274459

Nothing there. Now the others:

[Figure: simulated Year1 vs. Year2 save percentages, the remaining 0.930 and 0.910 goalies]

Pearson's product-moment correlation

data:  Corr930$Year1 and Corr930$Year2
t = 10.9827, df = 118, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6093943 0.7895948
sample estimates:
      cor
0.7109766

The talent distribution in the population is creating the apparent correlation. We're not predicting anything beyond the reality that an elite goalie is likely to wind up with a higher save percentage than a below-average goalie.
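For anyone who wants to reproduce this, here is a minimal sketch of the simulation under the stated assumptions; the exact 60/480/60 split and the seed are choices of mine, not something the conclusion depends on:

> set.seed(1)   # arbitrary, for reproducibility
> # 10% elite (0.930), 80% average (0.920), 10% below average (0.910)
> talent <- rep(c(0.930, 0.920, 0.910), times = c(60, 480, 60))
> CorrSim <- data.frame(talent = talent,
+                       Year1 = rbinom(600, 1200, talent) / 1200,
+                       Year2 = rbinom(600, 1200, talent) / 1200)
> cor.test(CorrSim$Year1, CorrSim$Year2)
> # isolating the average goalies removes the talent spread,
> # and with it the correlation
> with(subset(CorrSim, talent == 0.920), cor.test(Year1, Year2))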

So how to predict goalies?

You can use Eric T.'s formula. You can use 0.336 + 0.397*(Year-1) + 0.149*(Year-2) + 0.048*(Year-3) + 0.036*(Year-4). You could use 0.394 + 0.402*(Year-1) + 0.164*(Year-2). You could use the Frequentist career save percentage (specifically, 0.225 + 0.749*FCareer). You could use the Bayesian career save percentage (specifically, -0.095 + 1.097*Bayes). Hell, you could just use last year's save percentage. It really doesn't matter. None of these formulas work very well.

The problem is a signal-to-noise issue. These formulas are trying to predict the differences between goalies. Unfortunately, those differences are much smaller than the random variation from season to season.
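To put a number on the noise: the binomial standard deviation of a single season of save percentage is nearly as large as the whole gap between an average and an elite goalie.

> # one-season sampling noise for a true 0.920 goalie facing 1200 shots
> sqrt(0.920 * 0.080 / 1200)   # about 0.008, versus a talent gap of 0.010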