This paper develops and empirically compares two Bayesian and empirical Bayes space-time approaches for forecasting next-day hourly ground-level ozone concentrations. The comparison involves the Chicago area in the summer of 2000 and measurements from fourteen monitors as reported in the EPA's AQS database. One of these approaches adapts a multivariate method originally designed for spatial prediction. The second is based on a state-space modeling approach originally developed and used in a case study involving one week in Mexico City with ten monitoring sites. The first method proves superior to the second in the Chicago case study, judged by several criteria, notably root-mean-square predictive accuracy, computing times, and calibration of 95% predictive intervals.
This paper compares two methods for temporally forecasting next-day hourly ground-level ozone concentrations over spatial regions. Software implementing both methods, along with demo files, can be downloaded from
These forecasts are needed to forewarn susceptible groups of high ozone concentrations, which are associated with acute health effects. Such effects are well documented in the air quality criteria document (Ozone [
One general method for making such forecasts relies on the fusion of measured hourly ozone concentration values and simulated values obtained from chemical transport models (CTMs) such as CMAQ. Two papers [
The first method in this paper, denoted by M1, adapts a multivariate method developed for modeling space-time fields [
The second method, denoted by M2, uses a method originally developed for modeling hourly ozone concentrations in Mexico City [
The main finding of this paper is that, in the case study, M1 outperforms M2 in a number of ways. First is its computational efficiency: running M2 often took about a week to get results, while M1 took only about ten to twelve hours on the same Linux server. Thus, M2 would not be suitable for making 24-hour-ahead forecasts, while M1, running on a faster processor, could be used for that purpose. We also found that M1 produced more accurate forecasts than M2, as measured by their root-mean-square prediction errors. Moreover, M1's predictive error bands proved to be better calibrated. In other space-time domains, a similar assessment would have to be made to select a forecasting procedure, and M2 may be superior in some. Overall, we believe that the value of this paper lies in its guidance on how such an assessment could be made and in the software it provides for that purpose.
The layout of this paper will now be described. Section
This section presents our two temporal forecasting approaches. Although both are general and can be used in other contexts, we develop them as methods for forecasting an hourly response tomorrow given data up to today. The measured value of the response is available and serves as a “test value” at each of
The general approach [
The general theory involves a geographical region that includes
Validating the modeling assumptions above will usually require a transformation of the random responses, a square-root transformation in this paper's case study. Then systematic components, such as the temporal trend over the whole region, need to be removed. These can be accurately inferred from the typically large dataset formed by aggregating the data over all sites and times. Finally, something needs to be done to eliminate autocorrelation in the temporal sequence of responses. For example, the temporal series can often be filtered using a regional time series model without site-specific parameters. However, our relative abundance of data leads us here to a different approach, described in detail in the next subsection, that splits the transformed, detrended residuals into separate, disjoint subsequences of responses, which are separated widely enough in time as to be uncorrelated and hence independent under our Gaussian sampling model. In our experience, the residuals obtained after these steps have been taken usually satisfy the model assumptions above, and these comprise the response vectors in M1. The model above can then be applied to each subsequence and type II maximum likelihood estimators found for the hyperparameters. These can subsequently be averaged across the subsequences to get an overall estimate. While this approach is less efficient than a full-data approach under a correctly specified model, it avoids the risk of model misspecification in complex situations like that of the case study. The forecasting model developed below can then be applied, and the preliminary steps above reversed, to get the forecasts back on the scale of the raw data.
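As a rough sketch of these preliminary steps for a single site's hourly series, assuming the regional trend has already been estimated (the function names and the gap length are illustrative, not the paper's implementation):

```python
import numpy as np

def to_residuals(raw_hourly, regional_trend):
    """Square-root transform the raw hourly values (to make them more
    nearly Gaussian) and remove the regional temporal trend."""
    return np.sqrt(np.asarray(raw_hourly)) - np.asarray(regional_trend)

def split_into_blocks(resid, block_len=24, gap=24):
    """Carve the residual series into disjoint blocks of `block_len`
    hours, skipping `gap` hours between blocks so successive blocks
    are separated widely enough in time to be treated as uncorrelated.
    Hyperparameter estimates found per block can then be averaged."""
    blocks, start = [], 0
    while start + block_len <= len(resid):
        blocks.append(resid[start:start + block_len])
        start += block_len + gap
    return blocks
```

Type II maximum likelihood estimation would then be run on each block and the estimates averaged, as described above.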
To elaborate on our distributional assumptions, the GIW prior for
Given the observations at the gauged sites (i.e.,
Deriving that forecasting model requires a general result that concerns a sequence of
Let
Conditional on the hyperparameters
This theorem gives the joint predictive distribution for
For expository simplicity, we describe the general method M1 in terms of the goal of forecasting ozone concentrations at a specific hour on Day 121 and each of
To begin, we follow the standard practice of transforming the hourly data by taking their square roots to achieve a more nearly Gaussian data distribution [
The next step in developing the forecast model would generally require the removal of any systematic, regional components in the series. In particular, it is necessary to learn which covariates/predictors to include in the design matrix
Finally, we need to address the autocovariance structure. While the responses are primarily an AR(2) time series after removing their diurnal pattern [
In Case
However, Case
Note that in both cases, the future response or responses in Theorem
For that we need some notation. Let
The two cases referred to above are as follows.
Hyperparameter estimates for its predictive distribution conditional on observed data are found first for the odd blocs
Construction of the predictive distribution of the response at hour
To get the joint predictive distribution of responses at these gauged sites for Day 121's last hour, let
For bloc
For hour
Notice that
This subsection generalizes M1 to get a method that provides an
Here the odd bloc responses are
Notice that the total number of observations in data submatrices can be different for each
As in Section
Here the odd bloc responses are
We also let
Given the final estimates in Section
From Theorem
An alternative approach to M1, through dynamic linear modeling, can also be used for forecasting and would seem an obvious choice, being an amalgamation of state-space time series models. Let
Let
Thus, the measurement and state equations of the DLM are given by
Given the distribution of the state parameters at the last time point,
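To make the DLM recursion concrete, here is a minimal one-step Kalman filter sketch in NumPy for a generic DLM with measurement matrix F, evolution matrix G, and noise covariances V and W; the function `kalman_step` and its interface are illustrative assumptions, not the implementation used for M2 (whose random-walk state equation corresponds to G = I):

```python
import numpy as np

def kalman_step(m, C, y, F, G, V, W):
    """One Kalman filter cycle for the DLM
        y_t     = F theta_t + v_t,        v_t ~ N(0, V)
        theta_t = G theta_{t-1} + w_t,    w_t ~ N(0, W).
    Given the posterior moments (m, C) of the state at the previous
    time point, return the one-step-ahead forecast moments (f, Q) and
    the updated posterior moments after observing y."""
    a = G @ m                        # prior state mean
    R = G @ C @ G.T + W              # prior state covariance
    f = F @ a                        # one-step-ahead forecast mean
    Q = F @ R @ F.T + V              # one-step-ahead forecast covariance
    A = R @ F.T @ np.linalg.inv(Q)   # Kalman gain
    m_new = a + A @ (y - f)          # updated state mean
    C_new = R - A @ Q @ A.T          # updated state covariance
    return f, Q, m_new, C_new
```

Iterating this cycle over the observed hours and then propagating the state equation forward without data yields the next-day forecasts.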
This section implements the forecasting methods of the previous section for one Chicago summer (May 1 to August 31), using data for that urban area taken from the EPA's AQS database (2000). These extracted data come from fourteen irregularly distributed monitoring stations measuring hourly ozone concentrations in parts per billion (ppb), which, to help assure the validity of our Gaussian model assumptions, are square-root-transformed as noted in the previous section. Each station has few missing values over the study period, ozone levels being judged relative to the EPA 1997 standard (i.e., 80 parts per billion for eight-hour ground-level ozone concentrations).
To assess the model's performance for temporal forecasting, 14 sites are selected as “gauged” sites (i.e.,
Geographical locations for the Chicago AQS database (2000), where the latitude and longitude are measured in degrees. (G: gauged sites; UG: ungauged sites).
To explore these data further, weekday and hourly effects were computed for each site by averaging the transformed hourly values over each of the seven weekdays over the whole summer. We found these effects to be very similar from one gauged site to the next. Thus, since “bloc” is the unit of time
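The exploratory computation just described amounts to averaging the transformed values within each weekday-hour cell; `weekday_hour_effects` below is a hypothetical helper sketching that calculation, not the code used in the study:

```python
import numpy as np

def weekday_hour_effects(z, weekdays, hours):
    """Average the (square-root-transformed) hourly values over the
    whole summer, one mean per (weekday, hour) cell. `z`, `weekdays`
    (0-6), and `hours` (0-23) are equal-length 1-D arrays."""
    effects = np.full((7, 24), np.nan)
    for d in range(7):
        for h in range(24):
            sel = (weekdays == d) & (hours == h)
            if sel.any():
                effects[d, h] = z[sel].mean()
    return effects
```

Comparing the resulting 7 x 24 tables across gauged sites is what showed the effects to be very similar from site to site.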
The two methods considered in this paper were applied at all fourteen gauged sites (GSs for short) to predict the twenty-four left-out (test) and square-root-transformed hourly observations on Day 121. For all sites, plots showing the observations during the six days leading up to test Day 121, as well as the twenty-four forecasts by both methods for that day, may be seen in Dou et al. [
To begin, Figure
The observed square root of ozone concentrations (
In contrast to the case of spatial prediction [
The answer seems to lie in the fact that the random coefficients for the twelve hour-components of variance are less uncertain when forecasting at a monitoring site than when predicting spatially at other sites, which may be a substantial distance from the monitoring sites. Consequently, these components, although small, have larger posterior variances in the spatial predictor than in the forecaster at one of the monitoring sites. In other words, much more information is available in the data leading up to the last day for forecasting the test values at a monitored site than is available at a remote, unmonitored site.
Incidentally, the lower bounds of M2's forecast intervals can go below zero, so in practice they would need to be truncated. Moreover, few of the test values lie within the 95% credibility band.
Huerta et al. [
The root-mean-square predictive error (RMSPE) of the one-day-ahead prediction at fourteen gauged sites using M1 and M2. M1 dominates M2 in all but 1 case.
Gauged site  RMSPE (M1)  RMSPE (M2)
1   0.71  3.06
2   0.63  2.72
3   0.63  2.16
4   0.70  2.06
6   0.86  1.52
7   0.47  2.05
8   1.00  2.73
9   0.77  2.65
10  0.71  2.35
11  0.70  3.04
12  0.67  1.85
13  0.85  3.50
14  1.04  2.37
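For reference, the RMSPE tabulated above is computed from the 24 forecasts and test values at a site on the square-root scale; a minimal sketch:

```python
import numpy as np

def rmspe(forecast, observed):
    """Root-mean-square predictive error over the test hours,
    computed on the square-root-transformed scale."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((forecast - observed) ** 2)))
```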
Huerta et al. [
We do not see in any of the plots in Huerta et al. [
A summary very similar to that above for GS 1 also applies to the omitted figures for GSs 2–4 and 7–13, although for GS 11 M1's forecasts diverge from the test values over the final four hours, and for GS 13 M1 underestimates those test values in the middle of the day.
Figure
The observed square root of ozone concentrations (
Summaries similar to that for GS 5 apply to GS 6, with the exception that M1's forecasts track the test-value series quite well, unlike M2's, and to GS 14, where both methods underestimate the test values, M1 being closer overall than M2.
Figure
The width of the 95% pointwise predictive intervals of the one-day-ahead prediction at 14 gauged sites using M1.
Figure
The width of the 95% pointwise predictive intervals (PIs) of the one-day-ahead prediction at 14 gauged sites using M2.
Boxplots of the coverage probabilities using M1 and M2 at the 95% nominal level.
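The coverage probabilities summarized in the boxplots are, in outline, the fraction of test values falling inside their pointwise predictive intervals; the helper below is a hypothetical sketch of that calculation, not the code used for the figure:

```python
import numpy as np

def coverage(lower, upper, observed):
    """Empirical coverage of pointwise predictive intervals: the
    fraction of test values inside their interval. Well-calibrated
    95% intervals give coverage near 0.95."""
    lower, upper, observed = map(np.asarray, (lower, upper, observed))
    inside = (observed >= lower) & (observed <= upper)
    return float(inside.mean())
```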
Table
For forecasting ground-level hourly ozone concentrations in a Chicago summer, M1 seems better than M2. It seems more accurate, and its 95% predictive credibility intervals are better calibrated. However, in any practical application, M1 and M2 would need to be assessed in the same manner as in this paper before making a final selection. It should be noted that a new model, also based on the dynamic linear approach, has been proposed by Sahu et al. [
These methods, M1 and M2, are two quite different approaches to modeling space-time processes, and comparing and contrasting them at a more fundamental level seems worthwhile. To begin, both are quasi-Bayesian models in that they rely on some preliminary data analyses. Thus, the diurnal cycles are identified for the M2 mean function, while regional, non-site-specific weekday effects are found for M1. Both methods can then incorporate predictors or covariates in their parametric mean functions with random coefficients, as well as reflect diurnal patterns of variation. M1 proceeds with this in two steps. First, regional time-dependent covariates or predictors are identified for the construction of the design matrix
Both approaches put spatial covariance structures on their mean models as well as on the residuals. In contrast to M2, M1 does not require a nonstationary spatial covariance structure, and the form of the spatial covariance matrix is completely unspecified at level one of the Bayesian hierarchy. This is not important for the Chicago analysis, where the spatial ozone field is quite flat, but we believe it would be an important difference between the models in, say, Los Angeles or Seattle, where M1 would be favored. M2 prescribes its temporal correlation structure through the structure of its mean function, notably a random walk model for its model coefficient vector. In contrast, M1's 24-hour bloc covariance matrix is unspecified at level one of the hierarchical model, leaving the data a big role in determining its form. However, this feature comes at the price of an assumption that the 24-hour autocovariance matrix is separable from the spatial covariance. Moreover, the covariance is assumed constant over time. Both of these assumptions are limitations of M1.
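M1's separability assumption can be illustrated with a toy NumPy example: the covariance of the vectorized site-by-hour response block factors into a Kronecker product of a spatial covariance and an hourly (within-day) covariance. The numbers below are made up for illustration only:

```python
import numpy as np

# Toy 2-site, 3-hour example of a separable covariance structure.
spatial = np.array([[1.0, 0.6],
                    [0.6, 1.0]])        # between-site covariance (toy values)
hourly = np.array([[1.0, 0.5, 0.2],
                   [0.5, 1.0, 0.5],
                   [0.2, 0.5, 1.0]])    # within-day covariance (toy values)

# Covariance of the length-6 vectorized (site, hour) response:
full_cov = np.kron(spatial, hourly)     # 6 x 6, separable by construction
```

Under separability, every cross-site, cross-hour covariance is the product of a purely spatial term and a purely temporal term, which is exactly the restriction noted above.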
Both M1 and M2 can rely on day-to-day autocorrelation as well as within-day temporal correlation for forecasting next-day ozone levels. We believe responses will be somewhat autocorrelated from day to day, and that feature can be exploited to enhance forecasting performance. As formulated, M2 does borrow that additional strength, whereas M1 loses it in the way we have implemented its parent, the BSP, by dropping a day to avoid having to formulate a multivariate time series model for the vectors of daily bloc responses. However, this is not strictly necessary. The more general version of the BSP approach does allow for that correlation, and in principle we could have estimated the hyperparameters of that approach, suffered the consequences of possible misspecification, and increased the computational burden of implementing M1. Thus, M1 was formulated under the assumption of uncorrelated responses between days, unlike M2, which makes no such assumption, with the goal of ensuring timely 24-hour-ahead ozone forecasts.
M2 has a much more general parent in the dynamic linear model (DLM), and undoubtedly other implementations of the DLM could be made that retain its positive features while overcoming some of the limitations of M2 noted above. For example, a nonstationary spatial covariance could undoubtedly be used. As well, the random walk model, which has serious limitations, could be replaced by a more reasonable model such as an AR(1), albeit with an added parameter burden. That would in turn further restrict the number of monitoring sites it could realistically handle in an urban area. As it stands, M1's computational efficiency enables it to handle a much larger number of sites than M2 in an urban area such as greater Los Angeles, which has 30 sites, well beyond the reach of M2.
Although any ozone forecast for hourly concentrations 24 hours in advance cannot be much better than the baseline estimate, we have included Case
We have not considered the realistic case where only a limited number of hours of Day 120 data are available. That is because this case would be a straightforward extension of methods M1 and M2.
Finally, we would emphasize that the results in Section
Overall, we have found that for forecasting Chicago's next-day ozone concentration levels, M1 would be more practical and more accurate than M2. With its well-calibrated forecast intervals, it seems a promising methodology for practical application.
Let
The predictive distribution of
The result is straightforward by Theorem
Hence, we have
We have
Let
(i) The result is straightforward by Theorem
(ii) denote
We first decompose
Consequently, we have
Therefore, we have
This work was partially supported by funding from the Pacific Institute for the Mathematical Sciences and the Natural Sciences and Engineering Research Council of Canada.