Design of Deep Belief Networks for Short-Term Prediction of Drought Index Using Data in the Huaihe River Basin



Introduction
With global environmental degradation and growing water shortages, droughts have drawn increasing attention in many countries and regions. Drought is considered the most complex but least understood of all natural hazards, affecting more people than any other disaster [1]. In recent years, droughts have occurred repeatedly in China and have seriously affected production and daily life. Losses caused by drought rank first among all natural hazards in China [2]. For example, during the extreme drought in Southwest China from 2009 to 2010, five provinces and cities suffered droughts that seriously threatened people's lives and economic production. Northern China also suffered a severe drought in 2011. Long and severe droughts have direct impacts on industrial production, people's lives, and the ecological environment, and can even lead to desertification and other natural disasters. Droughts have become a serious constraint on the sustainable development of Chinese society and economy [3, 4].

Drought prediction is an important part of the planning and management of the water resource systems of a river basin. How to monitor and forecast droughts effectively has become a research focus, because reliable forecasts help in choosing strategies and measures to mitigate drought damage. Several forecasting methods have been used in drought prediction. Lohani and Loganathan used a nonhomogeneous Markov chain model to characterize the stochastic behavior of drought and proposed an early-warning system in the form of a decision tree for drought management [5]. Jia et al. established a grey-time series combined method (GTCM) to predict annual precipitation of Huangcun Meteorological Observation, Daxing county, Beijing [6]. Yang et al.
proposed a chaotic Bayesian method based on multiple criteria decision making to forecast nonlinear hydrological time series, which can be applied to drought forecasting [7]. The predictability of drought severity from spatiotemporally varying indices of large-scale climate phenomena was studied by integrating linear and nonlinear statistical data models, and the model was applied to the Murray-Darling Basin (MDB) in Australia [8]. Meteorological droughts were characterized using the standardized precipitation index (SPI) developed by McKee et al. [9]. Drought classes based on SPI values were modeled with a Markov chain in Alentejo, southern Portugal [10]. Peng et al. used a weighted Markov chain, with standardized autocorrelation coefficients as weights, to predict the future drought index. The method was applied to the drought indexes of Nanjing city from 1959 to 2004, and satisfactory results were obtained [11]. In another study, the SPI was calculated from monthly precipitation data collected at 36 weather stations in the Guanzhong Plain and Weibei tablelands, and a weighted Markov chain model, again with standardized autocorrelation coefficients as weights, was applied to predict SPI drought intensity [12]. The vegetation temperature condition index (VTCI), based on remote sensing data, is used for drought monitoring. ARIMA models were developed to simulate the VTCI series and were applied in the Guanzhong Plain in China [13]. Loglinear modeling of three-dimensional contingency tables was used for short-term prediction of drought severity classes. The results show that three-dimensional loglinear modeling of monthly drought class transitions is able to capture the trends of drought initiation, establishment, and dissipation [14].
Mishra and Desai compared a linear stochastic model (ARIMA/SARIMA), a recursive multistep neural network (RMSNN), and a direct multistep neural network (DMSNN) for drought forecasting, using SPI series as the drought index in the Kansabati River Basin in India [15].

Traditionally, forecasting research and practice have been dominated by conventional statistical methods. Recently, the study of long-range dependence, or long memory, has received much attention in forecasting. Hurst developed a test for long-range dependence, found significant long-term correlations among fluctuations in the Nile's outflows, and described these correlations in terms of power laws [16]. Mathematical models with long-range dependence were first introduced to statistics by Mandelbrot and his coworkers [17-19]. Long-range dependence is often encountered in practice, not only in hydrology, geophysics, and finance, but in all fields of statistical applications [20-24]. Pelletier and Turcotte presented power spectra of time-series data for tree ring width chronologies, atmospheric temperatures, river discharges, and precipitation averaged over hundreds of stations worldwide. They argued that long-range persistence can have a dramatic effect on the likelihood of severe hydrologic drought and computed recurrence intervals for droughts of different magnitudes.

In this paper, we propose a deep belief network (DBN) for short-term prediction of the drought index. The aim of this study is to present the DBN model as a drought prediction method and to evaluate its performance. The model is applied to forecast the drought index using SPI series in the Huaihe River Basin, China. The results are compared with those of a BP neural network to demonstrate the validity of the DBN model. The remainder of the paper is organized as follows.
In Section 2, the standardized precipitation index (SPI) and the BP neural network are introduced, and the deep belief network (DBN) model for drought index prediction is proposed. In Section 3, a case study is presented and discussed. Finally, Section 4 gives the main conclusions and a discussion of future work.

Standardized Precipitation Index (SPI)
The SPI was formulated by McKee et al. of the Colorado Climate Center in 1993. Its purpose is to assign a single numeric value to precipitation that can be compared across regions with markedly different climates [11]. The SPI is an index based on the probability of precipitation for any time scale. Technically, the SPI is the number of standard deviations by which the observed value deviates from the long-term mean, for a normally distributed random variable. The SPI can be computed for different time scales, can provide early warning of drought, and helps assess drought severity. The SPI is a probability index that considers only precipitation, whereas Palmer's indices are water balance indices that consider water supply (precipitation), demand (evapotranspiration), and loss (runoff). The SPI is therefore less complex than the Palmer Drought Severity Index (PDSI) [35]. The SPI is now widely accepted and used throughout the world [36]. The SPI value is computed as follows [37, 38]. Let $x$ be a precipitation series at some time scale. Its probability density function under the $\Gamma$ distribution is

$$ f(x) = \frac{1}{\beta^{\gamma}\,\Gamma(\gamma)}\, x^{\gamma-1} e^{-x/\beta}, \qquad x > 0, \tag{2.1} $$

where $\Gamma(\gamma) = \int_0^{\infty} x^{\gamma-1} e^{-x}\,dx$ is the gamma function, and $\gamma > 0$ and $\beta > 0$ are the shape parameter and the scale parameter, respectively.
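The shape and scale parameters can be estimated by the maximum likelihood method as follows:

$$ \hat{\gamma} = \frac{1 + \sqrt{1 + 4A/3}}{4A}, \qquad \hat{\beta} = \frac{\bar{x}}{\hat{\gamma}}, \tag{2.2} $$

where $A = \ln(\bar{x}) - \frac{1}{n}\sum_{i=1}^{n} \ln(x_i)$, $n$ is the number of precipitation observations, $x_i$ are the samples of the precipitation data, and $\bar{x}$ is the mean of these samples.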
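The estimation step can be sketched in a few lines of Python. This is a minimal illustration, assuming the closed-form approximation to the maximum likelihood estimate commonly used in SPI calculations, shape = (1 + sqrt(1 + 4A/3)) / (4A) and scale = mean / shape, with A = ln(mean) - mean(ln x_i). The function name `fit_gamma` and the sample values are hypothetical and are not taken from the paper's data.

```python
import math

def fit_gamma(precip):
    """Estimate the gamma shape and scale from positive precipitation values."""
    positive = [x for x in precip if x > 0]   # ln(x) is undefined at zero
    n = len(positive)
    mean = sum(positive) / n
    # A = ln(mean) - (1/n) * sum(ln x_i); A > 0 unless all values are equal
    a_stat = math.log(mean) - sum(math.log(x) for x in positive) / n
    shape = (1.0 + math.sqrt(1.0 + 4.0 * a_stat / 3.0)) / (4.0 * a_stat)
    scale = mean / shape
    return shape, scale

# Made-up accumulated precipitation totals (mm), for illustration only.
sample = [12.0, 30.5, 7.2, 55.1, 21.4, 18.3, 40.0, 9.9]
shape, scale = fit_gamma(sample)
```

By construction the product of the two estimates equals the sample mean, which gives a quick sanity check on any implementation.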
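The gamma distribution is not defined for $x = 0$; however, the actual precipitation can be 0. Therefore, the cumulative probability of precipitation for a certain time scale is calculated using the following formula [38, 39]:

$$ H(x) = u + (1 - u)\,G(x), \qquad G(x) = \frac{1}{\Gamma(\hat{\gamma})} \int_0^{x/\hat{\beta}} t^{\hat{\gamma}-1} e^{-t}\,dt, \tag{2.3} $$

where the integration variable comes from the substitution $t = x/\hat{\beta}$. Here $u$ is the probability of zero precipitation and can be calculated as $m/n$, where $m$ is the number of zeros in the precipitation series and $n$ is the total number of observations in the series.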
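The mixed cumulative probability can be sketched as follows. The code assumes the form H(x) = u + (1 - u) G(x), where G is the regularized lower incomplete gamma function evaluated at x / scale and u is the fraction of zero-precipitation records. To stay within the standard library, G is computed with a power series; the function names and the example numbers are hypothetical.

```python
import math

def regularized_lower_gamma(a, x, max_terms=1000):
    """P(a, x) = (1 / Gamma(a)) * integral from 0 to x of t^(a-1) e^(-t) dt.

    Uses the series x^a e^(-x) * sum_n x^n / (a (a+1) ... (a+n)), which is
    adequate for the moderate arguments that occur here (not for huge x).
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def cumulative_probability(x, shape, scale, n_zeros, n_total):
    """H(x) = u + (1 - u) * G(x), with u = n_zeros / n_total."""
    u = n_zeros / n_total            # probability of zero precipitation
    return u + (1.0 - u) * regularized_lower_gamma(shape, x / scale)

# Illustrative values only: 3 dry records out of 60, shape 2, scale 10 mm.
h = cumulative_probability(25.0, shape=2.0, scale=10.0, n_zeros=3, n_total=60)
```

With a library such as SciPy available, the series could be replaced by its regularized incomplete gamma routine; the hand-written version is used here only to keep the sketch self-contained.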
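The cumulative probability $H(x)$ is then transformed to a standard normal random variable with zero mean and unit variance. Following Edwards and McKee [40] and Hughes and Saunders [41], the SPI can be obtained as follows:

$$ \mathrm{SPI} = \begin{cases} -\left(k - \dfrac{c_0 + c_1 k + c_2 k^2}{1 + d_1 k + d_2 k^2 + d_3 k^3}\right), & 0 < H(x) \le 0.5, \\[2ex] +\left(k - \dfrac{c_0 + c_1 k + c_2 k^2}{1 + d_1 k + d_2 k^2 + d_3 k^3}\right), & 0.5 < H(x) < 1, \end{cases} \tag{2.4} $$

where

$$ k = \begin{cases} \sqrt{\ln\left(\dfrac{1}{H(x)^2}\right)}, & 0 < H(x) \le 0.5, \\[2ex] \sqrt{\ln\left(\dfrac{1}{\left(1 - H(x)\right)^2}\right)}, & 0.5 < H(x) < 1. \end{cases} \tag{2.5} $$

In (2.4), $c_i$ and $d_i$ are the parameters used in the computation: $c_0 = 2.515517$, $c_1 = 0.802853$, $c_2 = 0.010328$, $d_1 = 1.432788$, $d_2 = 0.189269$, and $d_3 = 0.001308$.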
Droughts can be classified according to the SPI value. A drought event occurs when the SPI is continuously negative, and the event ends when the SPI becomes positive.
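The last two steps of the procedure can be illustrated with a short Python sketch. It assumes the usual rational approximation to the standard normal quantile, with k = sqrt(ln(1 / H^2)) for H at most 0.5 and k = sqrt(ln(1 / (1 - H)^2)) otherwise, together with the constants c_0 to d_3 listed above. The function names are hypothetical, and treating an SPI of exactly zero as the end of an event is an assumption of this sketch.

```python
import math

C0, C1, C2 = 2.515517, 0.802853, 0.010328
D1, D2, D3 = 1.432788, 0.189269, 0.001308

def spi_from_probability(h):
    """Map a cumulative probability 0 < h < 1 to an approximate SPI value."""
    if not 0.0 < h < 1.0:
        raise ValueError("h must lie strictly between 0 and 1")
    tail = h if h <= 0.5 else 1.0 - h
    k = math.sqrt(math.log(1.0 / tail ** 2))
    value = k - (C0 + C1 * k + C2 * k * k) / (1.0 + D1 * k + D2 * k * k + D3 * k ** 3)
    return -value if h <= 0.5 else value

def drought_events(spi_series):
    """Return (start, end) index pairs for runs of consecutive negative SPI."""
    events, start = [], None
    for i, value in enumerate(spi_series):
        if value < 0 and start is None:
            start = i                       # a drought event begins
        elif value >= 0 and start is not None:
            events.append((start, i - 1))   # the event ended at the previous index
            start = None
    if start is not None:                   # series ends during an event
        events.append((start, len(spi_series) - 1))
    return events
```

The approximation is accurate to roughly three decimal places, which is why the sketch can be checked against known normal quantiles such as 1.96 at a probability of 0.975.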

Backpropagation Neural Network (BPNN)
The BP neural network is a multilayer feed-forward network trained by the error backpropagation algorithm [42]. It is a supervised learning network that uses the steepest gradient descent method to reduce the approximation error to an arbitrarily small value. A general model of the BP neural network has the structure shown in Figure 1.
In Figure 1, the BP network contains three layers: an input layer, a hidden layer, and an output layer. Each pair of nodes in adjacent layers is directly connected by a link. Each link has a weight that represents the degree of relation between the two nodes. The BP algorithm feeds the training samples into the input layer and then obtains the calculated output through the corresponding thresholds, node functions, and connection weights between nodes [42, 43]. The node function is usually an S-type function:

$$ f(x) = \frac{1}{1 + e^{-x/Q}}, \tag{2.6} $$

where $Q$ is the sigmoid parameter that adjusts the form of the activation function; the specific algorithm is introduced in [44]. The output error is obtained by comparing the calculated output with the sample output. If the error does not meet the requirements, the network weights and thresholds are adjusted by steepest gradient descent on the backpropagated error.

[Figure: layered network diagram; layer labels: visible nodes (input), hidden nodes, hidden nodes, visible nodes (output).]
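The forward pass and one weight update of such a network can be sketched in Python. This is a simplified illustration, not the configuration used in the paper: it assumes a node function of the form f(x) = 1 / (1 + exp(-x / Q)), fixes Q = 1 during training, omits the thresholds, uses a squared-error loss, and trains on a single made-up sample. All names and numbers are hypothetical.

```python
import math

def sigmoid(x, q=1.0):
    """S-type node function f(x) = 1 / (1 + exp(-x / Q))."""
    return 1.0 / (1.0 + math.exp(-x / q))

def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One forward pass and one steepest-descent update (Q = 1, no thresholds)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    error = 0.5 * (target - output) ** 2
    # Backpropagated error terms; f'(x) = f(x) * (1 - f(x)) when Q = 1.
    delta_out = (output - target) * output * (1.0 - output)
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    new_w_out = [w - lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [[w - lr * d * xi for w, xi in zip(row, x)]
                    for row, d in zip(w_hidden, delta_hidden)]
    return new_w_hidden, new_w_out, error

# Toy example: 3 inputs, 2 hidden nodes, 1 output, one training sample.
x, target = [0.2, 0.7, 0.5], 0.6
w_hidden = [[0.1, -0.2, 0.3], [0.4, 0.1, -0.1]]
w_out = [0.2, -0.3]
errors = []
for _ in range(200):
    w_hidden, w_out, err = train_step(x, target, w_hidden, w_out)
    errors.append(err)
```

Repeating the update drives the error on this sample down, which is the behavior the stopping criterion on the output error relies on.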

Deep Belief Networks
A deep belief network (DBN) is a generative model with an input layer and an output layer, separated by many layers of hidden stochastic units. The multilayer network can be trained efficiently by composing restricted Boltzmann machines (RBMs), using the feature activations of one layer as the training data for the next. Figure 2 shows an example of a DBN structure [28]. A DBN usually consists of two kinds of layers: visible layers and hidden layers. Visible layers contain input nodes and output nodes, and hidden layers contain hidden nodes. Hinton et al. proposed a greedy layerwise unsupervised learning algorithm for DBNs that is based on sequential training of RBMs [28, 34]. An RBM is composed of two layers of units with weighted connections between them: one layer of visible nodes (neurons) and one layer of hidden units. Figure 3 shows an RBM structure. Nodes within a layer are not connected to each other, and each node is connected to all nodes in the other layer. Connections between nodes are bidirectional and symmetric. RBMs have been used as generative models of many different types of data, including labeled or unlabeled images, windows of mel-cepstral coefficients that represent speech, and so on. Their most important use is as learning modules that are composed to form deep belief nets [28].

Mathematical Problems in Engineering 7
Let $v_i$ and $h_j$ represent the states of visible node $i$ and hidden node $j$, respectively. For binary state nodes, that is, $v_i, h_j \in \{0, 1\}$, the state of $h_j$ is set to 1 with probability [47]

$$ P(h_j = 1) = \sigma\Big(b_j + \sum_i v_i w_{ij}\Big), \tag{2.7} $$

where $\sigma(x)$ is the logistic sigmoid function $1/(1 + \exp(-x))$, $b_j$ is the bias of hidden node $j$, $v_i$ is the binary state of visible node $i$, and $w_{ij}$ is the weight between $v_i$ and $h_j$. After binary states have been chosen for the hidden units, the state of $v_i$ is set to 1 with probability

$$ P(v_i = 1) = \sigma\Big(b_i + \sum_j h_j w_{ij}\Big), \tag{2.8} $$

where $b_i$ is the bias of visible node $i$.

The training process of the RBM is as follows. First, a training sample is presented to the visible nodes, which gives the states $\{v_i\}$. Then the hidden node states $\{h_j\}$ are sampled according to the probabilities in (2.7). This process is repeated once more to update the visible and then the hidden nodes, which gives the one-step "reconstructed" states $\hat{v}_i$ and $\hat{h}_j$. The update of a weight is given by

$$ \Delta w_{ij} = \eta\big(\langle v_i h_j \rangle - \langle \hat{v}_i \hat{h}_j \rangle\big), \tag{2.9} $$

where $\eta$ is the learning rate and $\langle \cdot \rangle$ refers to the expectation over the training data.

In the continuous restricted Boltzmann machine (CRBM), the state $s_j$ of node $j$ is computed from the states $s_i$ of the nodes connected to it as

$$ s_j = \varphi_j\Big(\sum_i w_{ij} s_i + \sigma \cdot N_j(0, 1)\Big), \tag{2.10} $$

where $\varphi_j(x_j) = \theta_L + (\theta_H - \theta_L) \cdot \dfrac{1}{1 + e^{-a_j x_j}}$ is a sigmoid function with lower and upper asymptotes at $\theta_L$ and $\theta_H$, and $N_j(0, 1)$ represents a unit Gaussian. Here $\sigma$ is a constant (not the logistic function used above), and $a_j$ is a "noise-control" parameter that controls the slope of the sigmoid function [49]. The update equations for $w_{ij}$ and $a_j$ are

$$ \Delta w_{ij} = \eta_w\big(\langle s_i s_j \rangle - \langle \hat{s}_i \hat{s}_j \rangle\big), \qquad \Delta a_j = \frac{\eta_a}{a_j^2}\big(\langle s_j^2 \rangle - \langle \hat{s}_j^2 \rangle\big), \tag{2.11} $$

where $\eta_w$ and $\eta_a$ represent the learning rates, $\hat{s}_j$ denotes the one-step sampled state of node $j$, and $\langle \cdot \rangle$ refers to the expectation over the training data.

To construct a DBN model, we sequentially train as many RBMs as there are hidden layers in the DBN. We adopt the learning algorithm of [28, 34, 50]. Stacking CRBMs makes it possible to train many layers of hidden units efficiently and is one of the most common deep learning strategies. As each new layer is added, the overall generative model improves. Learning continues until the prescribed number of hidden layers in the DBN has been trained.
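The binary RBM training step described above can be sketched as one contrastive-divergence update in Python. The sketch assumes the conditional probabilities P(h_j = 1) = sigmoid(b_j + sum_i v_i w_ij) and P(v_i = 1) = sigmoid(b_i + sum_j h_j w_ij), estimates the expectations from a single sample, and keeps the biases fixed because only the weight update is described. It covers the binary RBM only, not the CRBM, and all names and sizes are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, w, b_h):
    """P(h_j = 1 | v) for every hidden node j; w[i][j] links v_i and h_j."""
    return [sigmoid(b_h[j] + sum(v[i] * w[i][j] for i in range(len(v))))
            for j in range(len(b_h))]

def visible_probs(h, w, b_v):
    """P(v_i = 1 | h) for every visible node i."""
    return [sigmoid(b_v[i] + sum(h[j] * w[i][j] for j in range(len(h))))
            for i in range(len(b_v))]

def sample(probs, rng):
    """Draw a binary state for each node from its activation probability."""
    return [1 if rng.random() < p else 0 for p in probs]

def cd1_update(v0, w, b_v, b_h, lr, rng):
    """Update w in place using the data states and one-step reconstructions."""
    h0 = sample(hidden_probs(v0, w, b_h), rng)   # hidden states from the data
    v1 = sample(visible_probs(h0, w, b_v), rng)  # reconstructed visible states
    h1 = sample(hidden_probs(v1, w, b_h), rng)   # reconstructed hidden states
    for i in range(len(v0)):
        for j in range(len(h0)):
            w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
    return v1, h1

rng = random.Random(0)                           # fixed seed for repeatability
n_visible, n_hidden = 3, 2
w = [[0.0] * n_hidden for _ in range(n_visible)]
b_v, b_h = [0.0] * n_visible, [0.0] * n_hidden
v1, h1 = cd1_update([1, 0, 1], w, b_v, b_h, lr=0.1, rng=rng)
```

In practice the products would be averaged over a batch of training samples before the weights are changed; the single-sample form is used here only to keep the update easy to follow.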
To apply the DBN model to drought prediction using SPI series, a DBN with two hidden layers is selected in this paper. The main steps of drought index prediction with the DBN model are as follows.
Step 1. Compute the SPI series at different time scales from the precipitation data, using the method described in Section 2.1.
Step 2. Normalize the SPI series by formula (2.12):

$$ \mathrm{SPI}' = \frac{\mathrm{SPI} - \mathrm{SPI}_{\min}}{\mathrm{SPI}_{\max} - \mathrm{SPI}_{\min}}, \tag{2.12} $$

where $\mathrm{SPI}'$ and $\mathrm{SPI}$ represent the normalized and original SPI data, respectively, and $\mathrm{SPI}_{\min}$ and $\mathrm{SPI}_{\max}$ represent the minimum and maximum values of the corresponding SPI series.
Step 3. Determine the optimal network structure by experiments. The number of input nodes and the numbers of nodes in the first and second hidden layers are chosen, and the weight coefficients are obtained by the learning algorithm. The SPI series is split into two parts: the first part is used as the training sample, and the rest is used as the testing sample. During training, the network structure for each time-scale SPI series is determined according to the criterion of smallest RMSE and MAE.
Step 4. Forecast the drought index with the DBN model and analyze the results.
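Steps 2 and 3 can be illustrated with a short data-preparation sketch in Python. It assumes min-max scaling of the SPI series to the interval from 0 to 1, and it assumes that the network inputs are consecutive lagged SPI values with the following value as the target. The second point is an interpretation of "number of input nodes", not something the text states explicitly. The function names and the sample series are hypothetical.

```python
def normalize(series):
    """Min-max scale a series to [0, 1]; also return the bounds for inversion."""
    lo, hi = min(series), max(series)
    return [(x - lo) / (hi - lo) for x in series], lo, hi

def denormalize(value, lo, hi):
    """Map a normalized value back to the original SPI scale."""
    return value * (hi - lo) + lo

def make_windows(series, n_inputs):
    """Pair each run of n_inputs consecutive values with the value that follows."""
    return [(series[t - n_inputs:t], series[t])
            for t in range(n_inputs, len(series))]

# Made-up SPI values, for illustration only.
spi = [-1.2, 0.4, 0.9, -0.3, 1.5, -2.0, 0.1]
scaled, lo, hi = normalize(spi)
pairs = make_windows(scaled, n_inputs=3)   # 4 (input, target) pairs
```

Keeping the minimum and maximum makes it possible to convert network outputs back to SPI units before computing the error measures.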

Experimental Design
We use four precipitation data sets in the experiments, one from each of four hydrologic stations: Bengbu, Fuyang, Xuchang, and Zhumadian, in the Huaihe River Basin, which is located in the eastern part of China. The data sets contain monthly precipitation during 1958-2006. These data are used to calculate the standardized precipitation index (SPI) at four time scales, that is, SPI3, SPI6, SPI9, and SPI12. Every SPI series is divided into two parts. Taking SPI3 as an example, the observations during 1958-1999 form the training set, and the remaining observations during 2000-2006 form the testing set.
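The time scales and the split by year can be illustrated in Python. The sketch assumes the conventional meaning of SPI3, SPI6, SPI9, and SPI12, namely that the index is computed from precipitation accumulated over 3, 6, 9, and 12 consecutive months, and it labels each accumulated value by the last month of its window. The precipitation values are synthetic placeholders rather than station data, and the helper names are hypothetical.

```python
def rolling_totals(monthly_precip, scale):
    """Sum precipitation over each window of `scale` consecutive months."""
    return [sum(monthly_precip[i - scale + 1:i + 1])
            for i in range(scale - 1, len(monthly_precip))]

def month_labels(first_year, last_year):
    """(year, month) labels for a complete monthly record."""
    return [(y, m) for y in range(first_year, last_year + 1)
            for m in range(1, 13)]

labels = month_labels(1958, 2006)                 # 49 years of monthly records
precip = [float((i * 37) % 101) for i in range(len(labels))]  # placeholder data

scale = 3
totals3 = rolling_totals(precip, scale)
labels3 = labels[scale - 1:]                      # label = last month of window

# Training period 1958-1999, testing period 2000-2006.
train = [v for (year, _), v in zip(labels3, totals3) if year <= 1999]
test = [v for (year, _), v in zip(labels3, totals3) if year >= 2000]
```

The same year-based split would be applied to the SPI values obtained from these totals; whether the first incomplete windows are dropped, as here, is an assumption of the sketch.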
The purpose of this research is to explore whether the DBN model performs well in drought prediction, using the monthly rainfall data of four hydrologic stations in the Huaihe River Basin from January 1958 to 2006 to calculate the SPI at different time scales. We use two criteria to evaluate the performance of the DBN in drought forecasting: root mean square error (RMSE) and mean absolute error (MAE). These two measures of predictive accuracy are defined as follows:

$$ \mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{i=1}^{T} \left(y_i - \hat{y}_i\right)^2}, \qquad \mathrm{MAE} = \frac{1}{T} \sum_{i=1}^{T} \left|y_i - \hat{y}_i\right|, \tag{3.1} $$

where $y_i$ are the observed SPI values, $\hat{y}_i$ are the predicted SPI values, and $T$ is the total number of predictions. We use the learning sample to find an optimal network structure for each of the four time-scale SPI series. Taking the SPI3 of the Bengbu data as an example, we explain how an optimal network structure is determined. In our experiment, the DBN has two hidden layers.
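The two error measures are straightforward to compute. The Python sketch below assumes the standard definitions, RMSE as the square root of the mean squared difference and MAE as the mean absolute difference between observed and predicted values. The numbers are made up for illustration.

```python
import math

def rmse(observed, predicted):
    """Root mean square error over T paired values."""
    t = len(observed)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / t)

def mae(observed, predicted):
    """Mean absolute error over T paired values."""
    t = len(observed)
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / t

observed = [1.0, -0.5, 0.0, 2.0]     # illustrative SPI observations
predicted = [0.5, -0.5, 1.0, 1.0]    # illustrative predictions
print(rmse(observed, predicted), mae(observed, predicted))  # 0.75 0.625
```

RMSE penalizes large individual errors more heavily than MAE, so the two criteria can favor different network structures.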
The key to our experiment is to determine the numbers of input and hidden nodes. On the one hand, neural networks with too few hidden nodes may not have enough power to model the data. On the other hand, neural networks with too many hidden nodes may overfit and, as a result, forecast poorly [30]. We therefore select the number of input nodes and the numbers of nodes in the two hidden layers by experimentation. The number of input nodes ranges from 2 to 10. Because the forecasting performance of neural networks is less sensitive to the number of hidden nodes than to the number of input nodes, the number of hidden nodes is chosen from five levels, that is, 5, 10, 15, 20, and 25. This gives 9 × 5 = 45 experiments for finding the optimal structure. We compared the RMSE and MAE of all candidates to determine the number of nodes in every layer. The results are shown in Table 1. In Table 1, the RMSE is smallest when the CRBM structure is 8-25, and the MAE is smallest when the CRBM structure is 9-5. The optimal structure is therefore most likely to have 8 or 9 input nodes, and these candidates are carried into the next step, whose results are shown in Table 2. The best DBN structure is 9-5-10-1. That is, the DBN has 9 input nodes, 5 nodes in the first hidden layer, 10 nodes in the second hidden layer, and 1 output node, and both its RMSE and its MAE are the smallest of all candidates.
Following the same process, we determine the optimal DBN structures for the four stations and the different time-scale SPI series. For each series, we try nine levels of input nodes, from 2 to 10, in combination with five levels of hidden nodes (5, 10, 15, 20, and 25) for CRBM training. The optimal DBN structures for the four stations and the different time-scale SPI series are shown in Table 3.
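The search over candidate structures can be sketched as a small grid search in Python. The grid matches the ranges given above, 2 to 10 input nodes and 5, 10, 15, 20, or 25 hidden nodes. Everything else is hypothetical: `select_structure` takes any scoring function, and `toy_score` is an arbitrary placeholder that stands in for training a CRBM and measuring its error. It does not reproduce any result from the tables.

```python
from itertools import product

INPUT_NODES = range(2, 11)              # 2, 3, ..., 10
HIDDEN_NODES = (5, 10, 15, 20, 25)

def select_structure(evaluate):
    """Return the (inputs, hidden) pair with the smallest score, and the grid size."""
    candidates = list(product(INPUT_NODES, HIDDEN_NODES))
    best = min(candidates, key=lambda c: evaluate(*c))
    return best, len(candidates)

def toy_score(n_inputs, n_hidden):
    # Placeholder only: a real score would train the model with this
    # structure and return its RMSE or MAE on held-out data.
    return (n_inputs - 6) ** 2 + (n_hidden - 15) ** 2

best, n_candidates = select_structure(toy_score)
```

Because the paper uses two criteria, a faithful implementation would run the search once per criterion and then compare the resulting candidates, as is done with Tables 1 and 2.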

Results and Discussion
In this paper, the DBN and the BP neural network are used to forecast the SPI series at different time scales, and their predictions are compared. The performance of the two models is evaluated quantitatively using RMSE and MAE. The results are shown in Table 4.
Table 4 shows that the prediction errors of the DBN are smaller than those of the BP neural network. The results for SPI12 are better than those for SPI9, which are better than those for SPI6, which in turn are better than those for SPI3. In short, the DBN gives higher precision than the BP neural network in SPI-based drought prediction. Figure 4 shows the test results for SPI3, SPI6, SPI9, and SPI12 at Bengbu station. The predicted values of the SPI series at the different time scales are very close to the actual ones. Figures 5, 6, and 7 compare the observations with the SPI6 predictions of the DBN and the BP neural network at Fuyang, Xuchang, and Zhumadian stations.
In Figures 5, 6, and 7, the SPI values predicted by the DBN model agree very well with the observations. The majority of the DBN outputs are closer to the real SPI values than those of the BP neural network. The results show that the DBN model is appropriate for short-term prediction of the drought index and achieves higher precision.

Conclusion
In this paper, we proposed a deep belief network (DBN) for short-term drought index prediction. The DBN-based forecasting model was used to forecast SPI series at different time scales for four stations in the Huaihe River Basin, China. Compared with the BP neural network, the DBN-based model is more reliable and efficient for short-term prediction of the drought index, and its prediction errors are smaller. This study shows that the DBN model is a useful tool for drought prediction. Because the formation mechanism of drought disasters is complex and hydrological data have long memory, new methods that can deal with long-range dependence will be considered, and further studies are needed to handle more complex situations in drought prediction.