We study the potential value to stakeholders of probabilistic long-term forecasts, as quantified by the mean information gain of the forecast compared to climatology. We use as a case study the USA Climate Prediction Center (CPC) forecasts of 3-month temperature and precipitation anomalies made at 0.5-month lead time since 1995. Mean information gain was positive but low (about 2% and 0.5% of the maximum possible for temperature and precipitation forecasts, respectively) and has not increased over time. Information-based skill scores showed similar patterns to other, non-information-based, skill scores commonly used for evaluating seasonal forecasts but tended to be smaller, suggesting that information gain is a particularly stringent measure of forecast quality. We also present a new decomposition of forecast information gain into Confidence, Forecast Miscalibration, and Climatology Miscalibration components. Based on this decomposition, the CPC forecasts for temperature are on average underconfident while the precipitation forecasts are overconfident. We apply a probabilistic trend extrapolation method to provide an improved reference seasonal forecast, compared to the current CPC procedure which uses climatology from a recent 30-year period. We show that combining the CPC forecast with the probabilistic trend extrapolation more than doubles the mean information gain, providing one simple avenue for increasing forecast skill.

Long-term forecasts offer prospects for enhancing climate readiness and assisting adaptation in sectors including agriculture, fisheries, municipal water supply, hydropower, tourism, and public health [

Scoring rules, which provide a metric of skill for a forecast system based on comparing previously issued forecasts to what actually occurred, may be used both to compare different forecast systems and to test improved versions of forecast systems, such as different weightings of ensemble members or methods of bias adjustment [

Information theory offers simple, general metrics of forecast performance, expressed as information gain (IG) relative to a “no-skill” prior probability distribution. Some advantages of IG as a forecast skill score were already mentioned by Good [

While information measures are not new to meteorology applications, they have not been widely applied to long-range forecasts. Acceptance of IG as a metric for scoring and optimizing long-range forecasts therefore requires systematic comparison against other commonly employed measures, such as the correlation coefficient, mean square error, the Brier skill score, and the ranked probability skill score (RPSS).

An additional consideration for scoring seasonal forecasts is what “no-skill” baseline to compare them against. Generally, a climatological mean or probability distribution from some past reference period is used as the baseline. However, in the presence of trends, climatology can give biased estimates of the expected value or probability distribution of the climate variable being forecast [

The seasonal forecast product we will evaluate here is the 3-month outlook at 0.5-month lead from the Climate Prediction Center (CPC) of the US National Weather Service (

In this paper, our aims are to (1) estimate the information gain of a seasonal forecast and compare IG to other metrics previously used to evaluate seasonal forecasts, and (2) use trend estimation to better evaluate seasonal forecast skills in a shifting climate and suggest avenues for improving them.

Information metrics for scoring forecast skill are straightforward to interpret and generalize across the type of variable being forecast (e.g., discrete or continuous). If we consider a situation with

Alternatively, denote the forecast probability distribution as

In practice, assessment of the skill of a forecast system can be based on IG averaged over a large number of forecasts:

To facilitate comparing mean IG to other measures of forecast skill, it may be convenient to normalize it by the maximum possible IG, which would be
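For a concrete illustration of IG and its normalized form, the following sketch (function names are ours; for a K-category equal-chances reference we assume the maximum possible IG is that of a certain forecast of the correct category, log K) computes the mean information skill score for three-category forecasts:

```python
import numpy as np

def information_gain(forecast, reference, outcome, base=2.0):
    """Information gain of one probabilistic forecast over a reference
    distribution, given the index of the category that actually occurred."""
    return (np.log(forecast[outcome]) - np.log(reference[outcome])) / np.log(base)

def mean_information_skill_score(forecasts, references, outcomes, base=2.0):
    """Mean IG over many forecasts, normalized by the maximum possible IG.
    With K categories and an equal-chances reference, max IG = log(K)."""
    K = forecasts.shape[1]
    igs = [information_gain(f, r, o, base)
           for f, r, o in zip(forecasts, references, outcomes)]
    return np.mean(igs) / (np.log(K) / np.log(base))

# Example: one slightly sharpened forecast verified against equal chances,
# where the favored category did occur
f = np.array([[0.5, 0.3, 0.2]])
r = np.array([[1/3, 1/3, 1/3]])
print(mean_information_skill_score(f, r, [0]))  # positive: beat climatology
```

A negative score would mean that, on average, the forecasts placed less probability on the observed categories than the reference did.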

Particularly for seasonal forecasts, where because of modest inherent predictability the forecast is often similar to climatology, the following decomposition of information gain may offer insight:

In this new decomposition, the first term (Confidence), which is independent of the outcome, is the difference between the entropy of the reference and forecast distributions; it is high if the issued forecast has much lower entropy (is much more confident) than the reference. The second term (Forecast Miscalibration) should average zero for a well-calibrated forecast; a tendency to positive values suggests an underconfident forecast (since the outcomes forecasted as likely were even more likely to occur than was forecasted), while a tendency to negative values suggests overconfidence (outcomes that were forecasted as likely in fact did not occur as often as expected). The third term (Climatology Miscalibration) is independent of the forecast issued and is zero if the reference distribution is equal chances. This Confidence-Forecast Miscalibration-Climatology Miscalibration decomposition complements the reliability-resolution-uncertainty decomposition of Weijs et al. [
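The decomposition described above can be written out explicitly. In the sketch below (our notation: f is the forecast vector, r the reference vector, o the observed category index, H the Shannon entropy), Confidence is H(r) − H(f), Forecast Miscalibration is log f[o] + H(f), and Climatology Miscalibration is −log r[o] − H(r); the three terms sum exactly to the information gain log(f[o]/r[o]):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability categories."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def ig_decomposition(f, r, o):
    """Reconstruction of the three-term decomposition described in the text:
      Confidence            = H(r) - H(f)         (outcome-independent)
      Forecast Miscalib.    = log2(f[o]) + H(f)   (averages ~0 if f is calibrated)
      Climatology Miscalib. = -log2(r[o]) - H(r)  (zero if r is equal chances)
    """
    conf = entropy(r) - entropy(f)
    fmis = np.log2(f[o]) + entropy(f)
    cmis = -np.log2(r[o]) - entropy(r)
    return conf, fmis, cmis

f = np.array([0.5, 0.3, 0.2])
r = np.array([1/3, 1/3, 1/3])
conf, fmis, cmis = ig_decomposition(f, r, o=0)
assert np.isclose(conf + fmis + cmis, np.log2(f[0] / r[0]))  # terms sum to IG
```

Note that the second term averages to zero when outcomes actually follow the forecast distribution (since the expected value of log2 f[o] is then −H(f)), which is why a systematically positive average indicates underconfidence.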

The formula for

Finally, we give the formulas for alternative metrics for evaluating probabilistic seasonal forecasts that we will compare with

This can be considered as a second-order polynomial approximation to

The cumulative variant of

The Heidke score [

Clearly, all nuance conveyed by the confidence of a probabilistic forecast is lost in HSS; a forecast vector
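This loss of nuance is easy to demonstrate. In the sketch below (our simplified formulation for an equal-chances three-category baseline, not necessarily the exact operational variant), only the most probable category of each forecast counts, so a highly confident forecast and a barely tilted one receive identical credit:

```python
import numpy as np

def heidke_skill_score(forecasts, outcomes):
    """Heidke skill score for categorical verification against an
    equal-chances baseline: HSS = (hit rate - chance) / (1 - chance)."""
    forecasts = np.asarray(forecasts)
    picks = forecasts.argmax(axis=1)          # probabilistic nuance discarded here
    hits = np.mean(picks == np.asarray(outcomes))
    chance = 1.0 / forecasts.shape[1]         # 1/3 for tercile categories
    return (hits - chance) / (1.0 - chance)

# A confident and a barely tilted forecast score identically when correct:
confident = [[0.80, 0.10, 0.10]]
tentative = [[0.35, 0.33, 0.32]]
print(heidke_skill_score(confident, [0]) == heidke_skill_score(tentative, [0]))  # True
```

An information-based score, by contrast, would reward the confident forecast far more when it verifies and penalize it far more when it fails.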

On the third Thursday of each month since the end of 1994, CPC releases forecast probabilities of high, low, or near-normal temperature and precipitation over the next 3 months on a 2° grid for the coterminous US (232 grid points); for example, January-February-March mean temperature and precipitation are forecast in mid-December. The 3 categories of high, low, and near normal are defined as thirds of a climatological distribution based on a recent 30-year period, so that a priori they are said to have equal chances; this was taken as our reference probability distribution

Given significant recent trends in climate quantities that have led to substantial shifts in probability distributions compared to past climatology, we considered improving on the equal-chances reference forecast vector by updating the climatological probability distribution each year since 1995 based on observations as

Thus, the Trend forecast
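A common way to build such a probabilistic trend extrapolation, sketched here under our own assumptions (ordinary least squares trend, Gaussian residuals; not necessarily the exact procedure used in this study), is to extrapolate a fitted linear trend one step ahead and convert the residual spread into tercile probabilities:

```python
import numpy as np
from scipy.stats import norm

def tercile_trend_forecast(history, terciles):
    """Illustrative probabilistic trend extrapolation: fit an OLS linear
    trend to past seasonal values, extrapolate one step ahead, and assume
    Gaussian residuals to assign probabilities to the climatological
    (below, near, above) terciles.

    history:  1-D array of past seasonal means
    terciles: (lower, upper) tercile boundaries of the reference climatology
    """
    t = np.arange(len(history))
    slope, intercept = np.polyfit(t, history, 1)
    predicted = slope * len(history) + intercept            # next step's mean
    sigma = np.std(history - (slope * t + intercept), ddof=2)
    lo, hi = terciles
    p_below = norm.cdf(lo, predicted, sigma)
    p_above = 1.0 - norm.cdf(hi, predicted, sigma)
    return np.array([p_below, 1.0 - p_below - p_above, p_above])

# A warming series shifts probability toward the "above normal" tercile:
rng = np.random.default_rng(0)
series = 0.05 * np.arange(30) + rng.normal(0, 0.5, 30)
lo, hi = np.quantile(series, [1/3, 2/3])
probs = tercile_trend_forecast(series, (lo, hi))
print(probs.round(3))
```

With no usable trend and large residuals, the probabilities revert toward equal chances, consistent with the near-zero confidence of the Trend forecast in its first years.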

To explore whether CPC forecasts
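One simple way to merge two probability forecasts that share a common reference, sketched here as an illustrative log-linear pool (an assumption of ours, not necessarily the combination rule used in this study), is to multiply each forecast's odds against the reference and renormalize; when the inputs agree, the combination is sharper than either alone, consistent with the Combined forecast's higher confidence:

```python
import numpy as np

def combine_forecasts(f1, f2, reference):
    """Illustrative log-linear pool of two probability vectors relative to a
    shared reference: multiply the two likelihood ratios, renormalize."""
    combined = reference * (f1 / reference) * (f2 / reference)
    return combined / combined.sum()

f_cpc   = np.array([0.40, 0.33, 0.27])   # hypothetical CPC tercile forecast
f_trend = np.array([0.45, 0.33, 0.22])   # hypothetical trend-based forecast
r       = np.array([1/3, 1/3, 1/3])
print(combine_forecasts(f_cpc, f_trend, r).round(3))  # sharper than either input
```

Such a pool treats the two forecasts as independent sources of evidence relative to the reference, which overstates sharpness when their information overlaps.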

Any measure of forecast skill is expected to vary from event to event. In order to declare a forecast system as having positive average skill, or one forecast system as having more skill than another, it is necessary to estimate the uncertainty of the average skill, with the set of available forecast-observation pairs viewed as samples from the stochastic forecast and climate systems [
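One standard way to estimate this uncertainty, sketched here with a moving-block bootstrap (our choice of method) so that serial correlation between overlapping 3-month target periods is respected, is to resample blocks of the monthly score series and examine the spread of the resampled means:

```python
import numpy as np

def block_bootstrap_ci(scores, block_len=12, n_boot=2000, alpha=0.05, seed=0):
    """Confidence interval for the mean of a monthly skill-score series,
    via a moving-block bootstrap that preserves within-block serial
    correlation (e.g., from overlapping 3-month target seasons)."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores)
    n = len(scores)
    n_blocks = int(np.ceil(n / block_len))
    means = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n - block_len + 1, n_blocks)
        sample = np.concatenate([scores[s:s + block_len] for s in starts])[:n]
        means[b] = sample.mean()
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Declare positive mean skill only if the interval excludes zero:
rng = np.random.default_rng(1)
monthly_iss = 0.02 + rng.normal(0, 0.05, 216)   # synthetic 18-year score series
lo, hi = block_bootstrap_ci(monthly_iss)
print(lo, hi)
```

The block length is a tuning choice; it should be at least as long as the serial correlation of the score series (here, a few months for overlapping seasonal targets).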

Figure 1 shows the confidence score averaged over all grid points for each month. The confidence of the trend-based probability distribution is zero for the first year and gradually increases as more years of history become available to allow estimation of trends, generally surpassing the CPC forecast over recent years. The CPC forecasts’ confidence is more variable, peaking on occasions such as strong El Niño episodes when the forecasters believed that seasonal climate had more predictability. The combined forecast, as expected from its construction, consistently has more confidence than either of its components.

(a) Mean confidence score for the temperature forecasts (markers; lines are smoothed based on local linear regression with a bandwidth of 24 months). (b) Mean information skill score for the temperature forecasts. (c)-(d) Same as (a)-(b), but for precipitation forecasts. The color scheme is blue for the CPC forecast, green for the Trend forecast, and red for the Combined forecast.


ISS is more variable than Conf since it depends on observations as well as on the forecast system; for many months, ISS is negative, meaning that the forecasts performed worse than an equal-chances prediction (Figures

Histogram of mean monthly

Same as Figure

Overall average values for Conf and ISS are given in Table

Average confidence and information gain for forecasts and trend extrapolation.

| | Temperature | | | Precipitation | | |
|---|---|---|---|---|---|---|
| | CPC | Trend | Comb | CPC | Trend | Comb |
| **1995–2012** | | | | | | |
| Conf | 0.0144 | 0.0227 | 0.0463 | 0.0068 | 0.0072 | 0.0150 |
| ISS | 0.0236 | 0.0215 | 0.0332 | 0.0031 | 0.0071 | 0.0090 |
| **2003–2012** | | | | | | |
| Conf | 0.0164 | 0.0326 | 0.0627 | 0.0060 | 0.0094 | 0.0170 |
| ISS | 0.0241 | 0.0402 | 0.0461 | 0.0047 | 0.0130 | 0.0159 |

It is of interest to see which regions account for the CPC and Trend forecast confidence and skill. Figures

Mean confidence score for the CPC forecasts of (a) temperature and (b) precipitation. (c)-(d) Same, but for Trend forecast. (e)-(f) Same, but for Combined forecast.


Mean information skill score for the CPC forecasts of (a) temperature and (b) precipitation. (c)-(d) Same, but for Trend forecast. (e)-(f) Same, but for Combined forecast.


As another depiction of the geographic variation in forecast skill, Figure

Mean information skill score by quarter of the USA for forecasts of (a) temperature and (b) precipitation.


Table

Skill scores for forecasts and trend extrapolation.

| | Temperature | | | Precipitation | | |
|---|---|---|---|---|---|---|
| | CPC | Trend | Comb | CPC | Trend | Comb |
| ISS | 0.0241 | 0.0402 | 0.0461 | 0.0047 | 0.0130 | 0.0159 |
| RISS | 0.1341 | 0.2518 | 0.3506 | 0.0012 | 0.0523 | 0.0505 |
| BSS | 0.0274 | 0.0458 | 0.0529 | 0.0053 | 0.0152 | 0.0185 |
| RPSS | 0.0407 | 0.0725 | 0.0825 | 0.0080 | 0.0226 | 0.0276 |
| HSS | 0.2163 | 0.3705 | 0.3714 | 0.0895 | 0.2980 | 0.3071 |

Fitting seasonal autoregressive models to the skill score time series showed that there was no significant trend in the CPC forecast skill either since 1995 or since 2003, regardless of the metric chosen (not shown). The mean CPC forecast skill for precipitation was not significantly different from zero under all metrics except HSS, while the mean Trend skill was greater than zero for the period since 2003 under all metrics; for temperature, both CPC and Trend had significant skill under all metrics (not shown).

Many of our results—for example, that skill for precipitation is lower than that for temperature and that CPC does not optimally account for trends—are largely consistent with previous assessments of the CPC seasonal forecasts [

If there is specific reason to believe either that no trend in the variable being forecast exists or that trends observed over recent years have now reversed, then this information should be incorporated into the reference forecast instead of relying blindly on trend extrapolation. For example, while temperature has increased rather linearly since the 1970s [

Once seasonal forecasts do incorporate trends appropriately, the time-varying trend extrapolation may in fact be a more appropriate reference probability distribution

The comparisons shown here suggest that there is large, relatively systematic variation in the skill values produced by different metrics, even when normalized to a common scale (where 0 corresponds to no skill and 1 to a perfect forecast). ISS is generally the more stringent skill score; for example, HSS averages more than a factor of 10 greater than ISS for the CPC seasonal forecasts. This suggests the need for more exploration of how well the different skill scores correspond to user requirements; in general, no single skill score can be expected to capture all aspects of forecast performance, which can only be completely described by the full joint probability distribution of forecasts and observations [

Information gain measures show that at least the CPC seasonal temperature forecast has measurable skill, but that the skill of both the temperature and precipitation forecasts can be at least doubled by adjusting the probability distribution based on recent trends. Comparing seasonal forecasts to probabilistic trend extrapolation, and comparing confidence scores to information gain (where the two should on average be equal for a well-calibrated forecast), are tools introduced here that should help improve seasonal forecasts substantially.

The authors gratefully acknowledge support from NOAA under Grants NA11SEC4810004 and NA12OAR4310084. All statements made are the views of the authors and not the opinions of the funding agency or the US government.