Rapid industrial development has led to intermittent outbreaks of PM2.5 pollution, or haze, in developing countries, causing serious environmental problems, especially in big cities such as Beijing and New Delhi. We investigate the factors and mechanisms of haze change and present a long-term prediction model of Beijing haze episodes based on time series analysis. We construct a dynamic structural measurement model of daily haze increment and reduce it to a vector autoregressive model. Case studies on 886 continuous days indicate that the model performs well on next-day Air Quality Index (AQI) prediction; in severely polluted cases (AQI ≥ 300), the prediction accuracy reaches as high as 87.8%. The one-week prediction experiment shows that the model responds sensitively to sudden haze bursts and dissipations, which yields stable accuracy for AQI predictions 3–7 days ahead.
Industry in developing countries is concentrated mainly around big cities, accompanied by large populations, heavy consumption, and pollution. Together with Tianjin city and Hebei province, Northern China has become one of the most prosperous and polluted areas on Earth. By 2013, the transient population of Beijing was 37.5 million, and the intermittent outbreak of air pollution has greatly impacted every citizen’s life: physiological diseases [
This paper presents an AQI prediction model for Beijing based on time series analysis. We collected 29 continuous months of Beijing AQI data, starting in 2013, and constructed a dynamic structural prediction model. Statistical methods are used to obtain the maximum likelihood estimate of the model parameters. Both short-term and long-term experiments are carried out to test the accuracy and robustness of the model.
The remainder of this paper is organized as follows. In Section
Generally, PM2.5, or haze, arises mainly from anthropogenic factors [
Many studies use a backpropagation neural network as the simulation model [
Considering the above points, this paper presents a new AQI prediction model that integrates a natural factor, a human factor, and a self-evolution factor.
The change of daily PM2.5 concentration depends on two factors: daily overall production of PM2.5 by human activities
Thus, the daily net growth of PM2.5
Parameters
To facilitate the research and modeling process, we prove that this model can be reduced to a vector autoregressive model.
Formula (
Assume that there exists sequence autocorrelation in formula (
The next step is to find the most satisfactory value of
Assume that we use previous
In the substitution process, many assumptions are neglected. But the ordinary least squares method (OLS estimation) should not be used in the estimation of formula (
The government can set policies to control industrial PM2.5 production and obtain a “satisfying” daily production of PM2.5; that is,
The net growth of PM2.5 on previous days and the policy control index also affect the daily accumulation of PM2.5:
By analogy with formulas (
In
This is the standard form of vector autoregressive model. So it is proved that our prediction model (formula (
The regression parameters of our haze prediction model can be obtained as follows.
Let
The dynamic structural system (formula (
Assume that the disturbance terms are neither serially correlated nor correlated with each other, which means
Let
Suppose
Finally,
In summary, the prediction model of Beijing AQI considers industrial emission and policy control, together with the chemical changes of the pollution accumulated on previous days and the diffusion conditions. The model also takes the correlations between these factors into account and introduces time series haze features into the dynamic structural model. The policy control index is simulated from the record of 4 severe haze episodes during this period. Diffusion is evaluated from the weather record of daily wind power.
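As a minimal sketch of how a reduced-form vector autoregressive model can be estimated and iterated forward, the code below fits y_t = c + A_1 y_{t-1} + ... + A_p y_{t-p} + e_t and produces a 7-step forecast. It is not the estimation procedure of this paper: for an unrestricted reduced-form VAR with Gaussian disturbances, the conditional maximum likelihood estimate of the coefficients coincides with per-equation least squares, and that is what the sketch uses. The function names `fit_var` and `forecast`, the two-variable synthetic series, the lag order, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_var(y, p):
    """Least-squares fit of y_t = c + A_1 y_{t-1} + ... + A_p y_{t-p} + e_t.

    y has shape (T, k). Returns c with shape (k,) and A with shape (p, k, k),
    where A[i] multiplies the observation lagged by i + 1 steps.
    """
    y = np.asarray(y, dtype=float)
    T, k = y.shape
    # Each regressor row for time t is [1, y_{t-1}, ..., y_{t-p}].
    lags = [y[p - i:T - i] for i in range(1, p + 1)]
    X = np.hstack([np.ones((T - p, 1))] + lags)
    Y = y[p:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # shape (1 + k * p, k)
    c = B[0]
    # Block i of B maps lagged values to outputs; transpose to get A_i.
    A = B[1:].reshape(p, k, k).transpose(0, 2, 1)
    return c, A


def forecast(y_hist, c, A, steps):
    """Iterate the fitted VAR forward, feeding predictions back as inputs."""
    p = A.shape[0]
    hist = list(np.asarray(y_hist, dtype=float)[-p:])
    out = []
    for _ in range(steps):
        nxt = c + sum(A[i] @ hist[-1 - i] for i in range(p))
        out.append(nxt)
        hist.append(nxt)
    return np.array(out)


# Synthetic two-variable series (an assumption, not the Beijing data).
rng = np.random.default_rng(0)
A_true = np.array([[0.6, -0.2], [0.1, 0.5]])
c_true = np.array([1.0, 0.5])
y = np.zeros((2000, 2))
for t in range(1, 2000):
    y[t] = c_true + A_true @ y[t - 1] + rng.normal(scale=0.1, size=2)

c_hat, A_hat = fit_var(y, p=1)
week_ahead = forecast(y, c_hat, A_hat, steps=7)
```

Because each forecast step reuses the previous prediction, any error made on day 1 is carried into days 2 to 7, which is why multi-day accuracy has to be checked separately from next-day accuracy.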
We collected the daily AQI and daily weather information from 28 Oct. 2013 to 31 Mar. 2016. This complete sequence is used to test the accuracy of the prediction model. The next day’s AQI prediction experiment (Section
The next day’s weather forecast information is applied in next day’s AQI prediction. The observed and predicted daily mean AQI in Beijing are illustrated in Figure
(a, b, c) Next day’s AQI prediction on 886 continuous days.
Figure
All the 15 outliers in Figure
Date of outlier  Label
Nov. 2, 2013
Dec. 7, 2013  ?
Dec. 25, 2013
Feb. 14, 2014  ?
Feb. 25, 2014  ?
Mar. 26, 2014  ?
Oct. 10, 2014
Oct. 11, 2014
Nov. 19, 2014  ?
Nov. 20, 2014  ?
Nov. 30, 2014
Dec. 9, 2014  ?
Jan. 4, 2015  ?
Jan. 15, 2015
Mar. 7, 2015
(a) Daily AQI of the 886 days. (b) Annual AQI from 2013 to 2016.
The pie chart in Figure
The deviation of predicted and observed AQI.
Prediction accuracies of different air qualities.
In the long-term prediction, we use the historical haze data sequence and weather forecast information to predict the next 7 days’ AQI. A sample counts as correctly predicted if its deviation is less than 20% or the predicted air quality level matches the observed level. From 26 Dec. 2015 to 31 Mar. 2016, we predict the AQI in the next 7 days and check the accuracy of
The accuracy of long-term AQI prediction.
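The correctness criterion used in the long-term experiment (deviation below 20%, or matching air quality level) can be sketched as follows. Two points are assumptions rather than statements from the text: the deviation is taken relative to the observed AQI, and the level breakpoints are the six categories of China's AQI standard (50, 100, 150, 200, 300), with each boundary value assigned to the lower level. The function names and the sample values are hypothetical.

```python
# Assumed upper bounds of AQI levels 1 to 5; anything above 300 is level 6.
AQI_UPPER_BOUNDS = (50, 100, 150, 200, 300)


def aqi_level(aqi):
    """Map an AQI value to a level from 1 (best) to 6 (severely polluted)."""
    for level, upper in enumerate(AQI_UPPER_BOUNDS, start=1):
        if aqi <= upper:
            return level
    return 6


def is_correct(pred, obs, tol=0.20):
    """Correct if relative deviation < tol or the AQI levels match.

    The observed AQI is assumed to be strictly positive.
    """
    relative_dev = abs(pred - obs) / obs
    return relative_dev < tol or aqi_level(pred) == aqi_level(obs)


def accuracy(preds, observations):
    """Fraction of samples that satisfy the correctness criterion."""
    hits = [is_correct(p, o) for p, o in zip(preds, observations)]
    return sum(hits) / len(hits)


# Illustrative values only, not data from the experiments.
sample_preds = [180, 120, 260]
sample_obs = [160, 90, 210]
sample_accuracy = accuracy(sample_preds, sample_obs)
```

In this example the first pair passes on deviation (12.5%), the third passes on level (both fall in level 5 despite a deviation above 20%), and the second fails both tests, so two of the three samples count as correct.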
Three haze episodes in Jan. 2016.
Three haze episodes in Feb. 2016.
This paper presented a dynamic structural model to predict Beijing’s daily AQI. The model integrates a natural factor, a human factor, and a self-evolution factor into a time series model, and this dynamic structural measurement model of daily haze increment is proven to reduce to a vector autoregressive model. The experiments highlight two strengths of the model. First, it is very sensitive to sudden changes of AQI, including both outbreaks and diffusions, and predicts them well. Second, it is robust on the task of long-term AQI prediction. Lastly, limited by the coarse time granularity, the model sometimes “foresees” but never delays or misses any sudden changes of haze.
Many researchers use a simple backpropagation neural network to build nonlinear prediction models. Since methods based on time series have proven effective in haze prediction modeling, we believe that recurrent neural networks may give better performance in such a prediction task. Although the related factors in existing models are limited, overfitting remains a concern because, in long-term prediction, a deviation can spread and be amplified in the following days’ predictions.
The authors declare that they have no competing interests.
This research was supported by the National Natural Science Foundation of China under Grant 71271209, Beijing Municipal Natural Science Foundation under Grant 4132052, and Humanity and Social Science Youth Foundation of Ministry of Education of China under Grant 11YJC630268.