Standard Precipitation Index Drought Forecasting Using Neural Networks, Wavelet Neural Networks, and Support Vector Regression

Drought forecasts can be an effective tool for mitigating some of the more adverse consequences of drought. Data-driven models are suitable forecasting tools due to their rapid development times, as well as minimal information requirements compared to the information required for physically based models. This study compares the effectiveness of three data-driven models for forecasting drought conditions in the Awash River Basin of Ethiopia. The Standard Precipitation Index (SPI) is forecast and compared using artificial neural networks (ANNs), support vector regression (SVR), and wavelet neural networks (WN). SPI 3 and SPI 12 were the SPI values that were forecasted. These SPI values were forecast over lead times of 1 and 6 months. The performance of all the models was compared using RMSE, MAE, and R2. The forecast results indicate that the coupled wavelet neural network (WN) models were the best models for forecasting SPI values over multiple lead times in the Awash River Basin in Ethiopia.


Introduction
Droughts, a natural occurrence in almost all climatic zones, are a result of the reduction, for an extended period of time, of precipitation from normal amounts.Extended periods of drought can lead to several adverse consequences, which include a disruption of the water supply, low agricultural yields, and reduced flows for ecosystems.Consequently, the ability to forecast and predict the characteristics of droughts, specifically their initiation, frequency, and severity, is important.Effective drought forecasts are an effective tool for water resource management as well as an effective tool for the agricultural industry.
Currently, drought monitoring in Ethiopia is conducted by the National Meteorological Services Agency (NMSA).The NMSA regularly produces a 10-day bulletin that gives an analysis of rainfall based on the long-term average or normal.This bulletin is then circulated to a wide range of users, ranging from local development agents to decision makers at a national level.In addition to rainfall analysis, the normalized vegetation index (NDVI) is provided, which is a satellite-based index widely used to monitor vegetation and drought conditions.The NMSA produces a regular 10day bulletin regarding NDVI variation that compares the current vegetation condition with normal or conditions of the previous year [1].However, the NDVI is sensitive to changes in vegetative land cover and may not be effective in areas where vegetation is minimal.In addition, the NMSA of Ethiopia produces medium and seasonal forecasts of precipitation using the aforementioned NDVI.
Unlike other natural hazards, droughts have a slow evolution time [2].The consequences of droughts take a significant amount of time to come into effect with respect to their inception, and when they are perceived by ecosystems and hydrological systems.Due to this feature, effective mitigation of the most adverse drought impacts is possible, more than in the case of other extreme hydrological events such as floods, earthquakes, or hurricanes, provided Applied Computational Intelligence and Soft Computing a drought monitoring system, which is able to promptly warn of the onset of a drought and to follow its evolution in space and time, is in operation [3].
A common tool utilized to monitor current drought conditions is a drought index.Several drought indices can be used to forecast the possible evolution of an ongoing drought, in order to adopt appropriate mitigation measures and drought policies for water resources management [4].This is because a drought index is expressed by a numeric number, which is believed to be far more functional than raw data during decision-making [2].Several drought indices have been developed around the world in the past based on rainfall as the single variable, including the widely used Deciles [5], Standardized Precipitation Index (SPI) [6], and Effective Drought Index (EDI) [7].There is also the wellknown Palmer Drought Severity Index (PDSI) [8], which considers temperature along with rainfall.The SPI drought index was chosen to forecast drought in this study due to its simplicity, its ability to represent droughts on multiple time scales, and because it is a probabilistic drought index.In addition, the study by Ntale and Gan [9] determined that the SPI is the most appropriate index for monitoring the variability of droughts in East Africa because it is easily adapted to local climate, has modest data requirements, and can be computed at almost any time scale.
Forecasting any hydrologic phenomena can be done using either a physical, conceptual, or data-driven approach.The latter approach is widely used in hydrologic forecasting because data-driven models have low information requirements with respect to the number of variables required for inputs compared to physically based models.Datadriven models also have rapid development times.Unlike physical and conceptual models, data-driven models are not difficult to implement for the purposes of real-time forecasting.Artificial neural networks (ANNs) have been used in several studies as a drought-forecasting tool [10][11][12][13][14][15][16].The most popular type of ANN used for the purposes of drought forecasting is the multilayer perceptron (MLP) that is usually optimized with a back propagation algorithm.However, ANNs are limited in their ability to deal with nonstationarities in the data, a weakness also shared by multiple linear regression (MLR) and autoregressive integrated moving average (ARIMA) models.
This limitation with nonstationary data has led to the recent formation of hybrid models, where data is preprocessed for nonstationary characteristics and then run through a forecasting method such as ANNs to cope with the nonlinearity.Wavelet analysis, an effective tool to deal with nonstationary data, has recently been applied in hydrological forecasting to examine the rainfall-runoff relationship in a Karstic watershed [17], to characterize daily streamflow [18,19] and monthly reservoir inflow [20], to evaluate rainfallrunoff models [21], to forecast river flow [22][23][24], to forecast future precipitation values [25], and for the purposes of drought forecasting [26].The study conducted by Kim and Valdes [26] is the only study that has explored the ability of a wavelet-neural network conjunction model (WN) to forecast a given drought index.However, no studies that assess the ability of WN models to forecast the SPI drought index in particular have been explored.Support Vector Machines (SVMs) are a relatively new form of machine learning that was developed by Vapnik [27].The term SVM is used to refer to both classification and regression methods as well as the terms Support Vector Classification (SVC) and Support Vector Regression (SVR), which refer to the problems of classification and regression, respectively [28].There are several studies where SVRs were used in hydrological forecasting.Khan and Coulibaly [29] found that an SVR model was more effective at predicting 3-12 month lake water levels than ANN models.Rajasekaran et al. [30] used SVR successfully for storm surge predictions, and Kisi and Cimen [31,32] used SVR to estimate daily evaporation and daily streamflow, respectively.Finally, SVR have been successfully used to predict hourly streamflow by Asefa et al. [33] and were shown to perform better than ANN and ARIMA models for monthly streamflow prediction by Wang et al. [34] and Maity et al. [35], respectively.Yuan and Tan [36] used SVRs as a screening tool to test for drought resistance of rice.However, to date SVRs have not been applied to forecast a given drought index.
This study compared the effectiveness of three datadriven models for forecasting drought conditions in the Awash River Basin of Ethiopia.The Standard Precipitation Index (SPI) was forecasted and compared using artificial neural networks (ANNs), support vector regression (SVR), and wavelet networks (WN).SPI 3 and SPI 12 were forecast over lead times of 1 and 6 months.The forecast lead times were chosen because a 1-month lead time is a typical short-term lead time and a 6-month lead time is representative of the bimodal rainfall pattern in the Awash River Basin.Forecast results of this study are useful for the agricultural water management sector and have the potential to be applied by water resources managers to effectively manage water resources in the region.In addition, accurate forecasts using these data-driven models can complement the forecasts already being used by the NMSA of Ethiopia.

Theoretical Development
In the following section, the computation of the SPI is briefly described.In addition to the description of the SPI, this section also describes the data-driven models that were used to forecast the SPI.

The Standard Precipitation Index (SPI).
The Standard Precipitation Index (SPI) was developed by McKee et al. [6].As mentioned in the previous section, one of the main advantages of the SPI is that it only requires precipitation data as an input, which makes it ideal for areas where data collection is not as extensive (such as in Ethiopia).The fact that the SPI is based solely on precipitation makes its evaluation relatively easy [37].The SPI is a standardized index.Standardization of a drought index ensures independence from geographical position as the index in question is calculated with respect to the average precipitation in the same place [37].The computation of the SPI drought index for any location is based on the long-term precipitation record (at least 30 years) cumulated over a selected time scale [38].This long-term precipitation time series is then fitted to a gamma distribution, which is then transformed through an equal probability transformation into a normal distribution [38,39].Positive SPI values indicate wet conditions with greater than median precipitation, and negative SPI values indicate dry conditions with lower than median precipitation [38].Table 1 below indicates SPI drought classes.
In most cases, the probability distribution that best models observational precipitation data is the Gamma distribution [37].The density probability function for the Gamma distribution is given by the expression [37]: where α > 0 is the shape parameter, β > 0 is the scale parameter, and x > 0 is the amount of precipitation.Γ(α) is the value taken by the standard mathematical function known as the Gamma function, which is defined by the integral [37]: In general, the Gamma function is evaluated either numerically or using the values tabulated depending on the value taken by parameter α.
In order to model the data observed with a gamma distributed density function, it is necessary to estimate parameters α and β appropriately.Different methods have been suggested in the literature for the estimate of these two parameters.For example, the Thom [40] approximation is used for maximum probability in Edwards and McKee [41]: where for n observations The estimate of the parameters can be further improved by using the interactive approach suggested in Wilks [42].
After estimating coefficients α and β the density of probability function g(x) is integrated with respect to x and we obtain an expression for cumulative probability G(x) that a certain amount of rain has been observed for a given month and for a specific time scale [37]: The Gamma function is not defined by x = 0, and since there may be no precipitation, the cumulative probability becomes [37] H where q is the probability of no precipitation.H(x) is the cumulative probability of precipitation observed.The cumulative probability is then transformed into a normal standardized distribution with null average and unit variance from which we obtain the SPI index.The above approach, however, is neither practical nor numerically simple to use if there are many grid points of many stations on which to calculate the SPI index.In this case, an alternative method is described in Edwards and McKee [41] using the technique of approximate conversion developed in Abramowitz and Stegun [43] that converts the cumulative probability into a standard variable Z.The SPI index is then defined as where where x is precipitation, H(x) is the cumulative probability of precipitation observed, and c 0 , c 1 , c 2 , d 0 , d 1 , d 2 are constants with the following values:

Artificial Neural Networks (ANNs).
Artificial neural networks (ANNs) are flexible computing frameworks that resemble the structure of a nerve system.ANNs have been used to model a broad range of hydrologic time series over the past two decades.The main advantage of using ANNs is that there is no need to define the physical processes between the inputs and outputs [11].This feature makes ANNs suitable for the purposes of drought forecasting, where all the variables that may cause a drought are not fully understood.In this paper, the multilayer perceptron (MLP) feedforward network was used to forecast the SPI time series.Figure 1 is an illustration of a typical feed-forward neural network.ANN models in this study were trained with the Levenberg Marquardt (LM) back propagation algorithm.MLPs have been used extensively in hydrologic forecasting studies [10,12,23,26,44,45] due to their simplicity.In terms of their architecture, MLPs consist of an input layer, one or more hidden layers, and an output layer.The hidden layer contains the neuron-like processing elements that connect the input and output layers and is given by [26] where n is the number of input variables; m is the number of hidden neurons; x i (t) = the ith input variable at time step t s ; w ji = weight that connects the ith neuron in the input layer and the jth neuron in the hidden layer; w j0 = bias for the jth hidden neuron; f n = activation function of the hidden neuron; w k j = weight that connects the jth neuron in the hidden layer and kth neuron in the output layer; w k0 = bias for the kth output neuron; f 0 = activation function for the output neuron; y k (t s ) is the forecasted kth output at time step t s [26].[27] as a tool for classification and regression.SVMs embody the structural risk minimization principle, while neural networks embody the empirical risk minimization principle.In contrast to ANNs that seek to minimize training error, SVMs attempt to minimize the generalization error.SVMs have two components: support vector classification (SVC) and support vector regression (SVR).Since the main objective of this study is to forecast the SPI, the SVR was used.

Support Vector Regression. Support vector machines (SVM) were developed by Vapnik
Support vector regression (SVR) is used to describe regression with SVMs [27].In regression estimation with SVR, the purpose is to estimate a functional dependency f ( x) between a set of sampled points X = { x 1 , x 2 , . . ., x l } taken from R n and target values Y = {y 1 , y 2 , . . ., y l } with y i ∈ R (the input and target vectors (x i 's and y i 's) refer to the monthly records of the SPI index).Assuming that these samples have been generated independently from an unknown probability distribution function P( x, y) and a class of functions [27]: (11) where W and B are coefficients that have to be estimated from the input data.The main objective is to find a function f ( x) ∈ F that minimizes a risk functional [46]: where l is a loss function used to measure the deviation between the target, y, and estimate f ( x), values.As the probability distribution function P( x, y) is unknown, one cannot minimize the risk functional directly, but can only compute the empirical risk function as [46] where N is the number of samples.This traditional empirical risk minimization is not advisable without any means of structural control or regularization.To avoid this issue a regularized risk function with the smallest steepness among the functions that minimize the empirical risk function can be used as [46] where γ is a constant (γ ≥ 0).This additional term reduces the model space and thereby controls the complexity of the solution resulting in the following form of this expression [46,47]: where C c is a positive constant that has to be selected beforehand.The constant C c that influences a trade-off between or an approximation error and the regression (weight) vector W is a design parameter.The loss function in this expression, which is called an ε-insensitive loss function (l ε ), has the advantage that it will not need all the input data for describing the regression vector W and can be written as [46] This function behaves as a biased estimator when it is combined with the regularization term (γ W ).The loss is equal to 0 if the difference between the predicted and observed value is less than ε.The nonlinear regression function is described by the following expression [27,46,48]: where α i , α * 1 ≥ 0 are the Lagrange multipliers, B is a bias term, and K(x, x i ) is the Kernel function which is based upon Reproducing Kernel Hilbert Spaces [32].The Kernel function enables operations to be performed in the input space as opposed to the potentially high-dimensional feature space.Several types of functions are treated by SVR such as polynomial functions, Gaussian radial basis functions, exponential radial basis functions, multilayer perception functions, and functions with splines and so forth [32].

Wavelet Transforms.
Wavelet transforms are mathematical functions that can be used for the analysis of time-series that contain nonstationarities.Wavelet transforms allow for the use of long time intervals for low frequency information and shorter intervals for high frequency information.They are capable of revealing aspects of data like trends, breakdown points, and discontinuities that other signal analysis techniques might miss [26].Another advantage of wavelet analysis is the flexible choice of the mother wavelet according to the characteristics of the investigated time series [45].An important step in the use of wavelet transforms is the choice of a mother wavelet (ψ).The continuous wavelet transform (CWT) is defined as the sum over all time of the signal multiplied by scale and shifted versions of the wavelet function ψ [26]: where s is the scale parameter; τ is the translation and * corresponds to the complex conjugate [26].The CWT produces a continuum of all scales as the output.Each scale corresponds to the width of the wavelet; hence, a larger-scale means that more of a time series is used in the calculation of the coefficient than in smaller scales.The CWT is useful for processing different images and signals; however, it is not often used for forecasting because its computation is complex and time consuming.As an alternative, in forecasting applications, the discrete wavelet transform (DWT) is used, due to its simplicity and shorter computation time.DWT scales and positions are usually based on powers of two (dyadic scales and positions).This is achieved by modifying the wavelet representation to [49] ψ j,m (m) = 1 where j and m are integers that control the scale and translation, respectively, while s 0 > 1 is a fixed dilation step and τ 0 is a translation factor that depends on the aforementioned dilation step.The effect of discretizing the wavelet is that the time-space scale is now sampled at discrete levels.The DWT operates two sets of functions: high-pass and low-pass filters.
The original time series is passed through high-pass and low-pass filters, and detailed coefficients and approximation series are obtained.One of the inherent challenges of using the DWT for forecasting applications is that if we change values at the beginning of our time series, all of the wavelet coefficients will subsequently change.To overcome this problem, a redundant algorithm, known as the à trous algorithm can be used, given by [50] where h is the low pass filter and the finest scale is the original time series.To extract the details, w i (k), that were eliminated in (21), the smoothed version of the signal is subtracted from the coarser signal that preceded it, given by [51] where c i (k) is the approximation of the signal and c i−1 (k) is the coarser signal.Each application of ( 20) and ( 21) creates a smoother approximation and extracts a higher level of detail.Finally, the nonsymmetric Haar wavelet can be used as the low pass filter to prevent any future information from being used during the decomposition [52].

The Awash River Basin
This study forecasted the SPI in the Awash River Basin of Ethiopia.The mean annual rainfall of the basin varies from about 1,600 mm in the highlands north east of Addis Ababa, to 160 mm in the northern point of the basin [53].The total amount of rainfall also varies greatly from year to year, resulting in severe droughts in some years and flooding in others.The total annual surface runoff in the Awash Basin amounts to some 4, 900 × 106 m 3 [54].
The Awash River Basin (Figure 2) was separated into three smaller basins for the purpose of this study on the basis of various factors such as location, altitude, climate, topography, and agricultural development.A study conducted by Edossa et al. [54] separated the Awash Basin in a similar fashion.The subbasins were called the Upper, Middle, and Lower Awash Basins, respectively.The reasoning behind the use of these three subbasins was to ensure the methods used in this study were effective in forecasting short-term drought in different conditions.The characteristics of each sub-basin are briefly described in the following sections.
3.1.Upper Awash Basin.The Upper Awash Basin has a temperate climate with annual mean temperatures ranging between 15-22 • C and an annual precipitation of between 500-2000 mm [54].Rainfall distribution in the Upper Awash Basin is unimodal.Seven rainfall gauges located in the Upper Awash River Basin were chosen for this study (Table 2).These stations were chosen because their precipitation records from 1970-2005 were either complete or relatively complete.Any station, which had over 10% of their records missing was not selected.

Middle Awash Basin.
The Middle Awash Basin is in the semiarid climatic zone with a long hot summer and a short mild winter.Annual rainfall varies between 200-1500 mm [54].The rainfall distribution is bimodal in this subbasin.Minor rains normally occur in March and April and major rains from July to August.Eight rainfall gauges located in the Middle Awash Basin were selected using the same criteria as in the Upper Awash Basin and are shown in Table 2.

Lower Awash Basin.
The Lower Awash River Basin has a hot, semi-arid climate.The annual mean temperature of the region ranges between 22 and 32 • C with average annual precipitation between 500 and 700 mm [54].Five rainfall gauges were selected form the Lower Awash Basin using the same criteria used in the two other sub-basins and are shown in Table 2.

Methodology
The methodology section of this paper describes how the SPI was calculated and then forecast over two separate lead times using ANN, WN, and SVR models.

SPI Calculation.
In order to calculate the SPI, a probability density function that adequately describes the precipitation data must be determined.The gamma distribution function was selected to fit the raw rainfall data from each station in this study.The SPI is a z-score and represents an event departure from the mean, expressed in standard deviation units.The SPI is a normalized index in time and space.SPI values can be categorized according to classes.In this study, the near normal class is established from the aggregation of two classes: −1 < SPI < 0 (mild drought) and 0 ≤ SPI ≤ 1 (slightly wet).The departure from the mean is a probability indication of the severity of the wetness or drought that can be used for risk assessment.The time series of the SPI can be used for drought monitoring by setting application-specific thresholds of the SPI for defining drought beginning and ending times.Accumulated values of the SPI can be used to analyze drought severity.In this study, the SPI SL 6 program developed by the National Drought Mitigation Centre, University of Nebraska-Lincoln, was used to compute time series of drought indices (SPI) for each station in the basin and for each month of the year at different time scales.
In each sub-basin, for each station, SPI 3 and SPI 12 were computed.These SPI values were subsequently forecast over lead times of 1 and 6 months.A 3-month SPI compares the precipitation for that period with the same 3-month period over the historical record.For example, a 3-month SPI at the end of September compares the precipitation total for the July-September period with all the past totals for that same period.A 3-month SPI indicates short and medium term trends in precipitation and is still considered to be more sensitive to conditions at this scale than the Palmer Index.A 3-month SPI can be very effective in showing seasonal trends in precipitation and is a good indicator of agricultural drought.SPI 12 reflects long-term precipitation patterns.SPI 12 is a comparison of the precipitation for 12 consecutive months with the same 12 consecutive months during all the previous years of available data and is a good indicator of long-term drought conditions.Because these time scales are the cumulative result of shorter periods that may be above or below normal, the longer SPIs tend toward zero unless a specific trend is taking place.Forecast lead times of 1 and 6 months were chosen because 1 month is the shortest possible monthly lead time and 6 months is representative of the bimodal rainfall pattern in parts of the Awash River Basin discussed in Section 3.2.

Wavelet Decomposition.
In the proposed WN model, the SPI data for each of the rainfall stations was decomposed into subseries of approximations and details (DWs).The process consists of a number of successive filtering steps.The original SPI time series is first decomposed into an approximation and accompanying detail signal.The decomposition process is then iterated, with successive approximation signals being decomposed in turn.As a result the original SPI time series is broken down into many lower resolution components.
When conducting wavelet analysis, the number of decomposition levels that is appropriate for the data must be chosen.A commonly used method to determine the number of decomposition levels is based on the signal length [55] and is given by L = int[log(N)], where L is the level of decomposition and N is the length of the signal.The training set in this study comprised between 1290 and 3017 samples (samples varied depending on the number of inputs for each rainfall station).Thus, the decomposition level was selected as L = 3.
As discussed in Section 2.4, the "a trous" wavelet algorithm with a low pass Haar filter was used to create four Figure 2: Awash River Basin (Source: [54]).
sets of wavelet subseries.These four sub-series included a low frequency component (the approximation) used to uncover the trend of each signal and a set of three high frequency components (the details) used to uncover the periodicity of the signal.All decomposed sub-series were added together to generate one time series and used as an input to the ANN models.Using the sum of all the sub-series as an input in this study provided more accurate results than using certain subseries or sub-series that exhibited the highest correlations with the original time series.

ANN Models.
All the ANN models were created with the MATLAB (R.2010a) ANN toolbox.The hyperbolic tangent sigmoid transfer function was the activation function for the hidden layer, while the activation function for the output layer was a linear function.All the ANN models in this study were trained using the LM back propagation algorithm.
The LM back propagation algorithm was chosen because of its efficiency and reduced computational time in training models [45].In this study, there were between 4-8 input neurons for each ANN model.The optimal number of input neurons for each station was selected using a trial and error procedure.The data-driven models were recursive models, where a model is forecast one lead time ahead, and the subsequent forecasts include the output from the previous forecast as an input.Hence, a forecast of 6 months lead time will have the outputs from forecasts of lead times of 1-5 months.Recursive models were used because it was determined that it would be simpler to use an ANN with one output neuron.Mishra and Desai [10] compared recursive ANN models and ANN models with more than one output neuron (direct ANN models) and found the results to be comparable for forecasting the SPI.The inputs and outputs were normalized between 0 and 1.A study by Wanas et al. [56] empirically determined that the best performance of a neural network occurs when the number of hidden nodes is equal to log(T), where T is the number of training samples.Another study conducted by Mishra and Desai [10] determined that the optimal number of hidden neurons is 2n + 1, where n is the number of input layers.In this study the optimal number of hidden neurons was determined to be between log(T) and (2n + 1).For example, if using the method proposed by Wanas et al. [56] gave a result of 4 hidden neurons and using the method proposed by Mishra and Desai [10] gave 6 hidden neurons, the optimal number of hidden neurons was between 4 and 6, thereafter the optimal number was determined using trial and error.These two methods helped establish an upper and lower bound for the number of hidden neurons.
For all the ANN models the cross validation technique [57] was used to partition the data sets; 80% of the data was used to train the models, while the remaining 20% of the data was used to test and validate the models, with 10% used for testing and 10% used for validation.The training set was used to compute the error gradient and to update the network weights and biases.The error from the validation set was used to monitor the training process.If the network overfits the data, the error in the validation set will begin to rise.When the validation error increases for a specified number of iterations, the training is stopped, and the weights and biases at the minimum of the validation error are returned.The testing data set is an independent data set and is used to verify the performance of the model.

WN Models.
The WN models were trained in the same way as the ANN models, with the exception that the inputs were made up from the wavelet decomposed subseries.In this study, the significant wavelets (approximation and detail series) were summed together once the insignificant coefficients were excluded, similar to what was done by Partal [58] and Kisi and Cimen [32].In this study, the summed sub-series provided better results than using the individual wavelet coefficients as inputs.
For WN models, an input layer with 4-8 neurons, a single hidden layer composed of 4-6 neurons, and one output layer consisting of one neuron were developed.The number of neurons was determined in the same way as for the traditional ANN models.All the ANN models that had wavelet decomposed subseries as their inputs were also partitioned in a similar manner to the traditional ANN models.

SVR Models.
All SVR models were developed using the OnlineSVR software created by Parrella [59].OnlineSVR is a technique used to build support vector machines for regression.The OnlineSVR software partitions the data into only two sets: a training set and a testing set.The SVR models were partitioned in a similar manner to the ANN and WN models.
All SVR models used the nonlinear radial basis function (RBF) kernel.As a result, each SVR model consisted of three parameters that were selected: gamma (γ), cost (C), and epsilon (ε).The γ parameter is a constant that reduces the model space and controls the complexity of the solution, C is a positive constant that is a capacity control parameter, and ε is the loss function that describes the regression vector without all the input data [32].These three parameters were selected based on a trial and error procedure.The combination of parameters that produced the lowest RMSE values for the training data sets was selected.

Performance Measures.
The performance of the forecasts resulting from the data-driven models was evaluated by the following measures of goodness of fit: where y i is the mean value taken over N, y i is the observed value, y i is the forecasted value, and N is the number of data points.The coefficient of determination measures the degree of association among the observed and predicted values.The higher the value of R 2 (with 1 being the highest possible value), the better the performance of the model The Root Mean Squared Error (RMSE where SSE is the sum of squared errors and N is the number of data points used.SSE is given by with the variables already having been defined.The RMSE evaluates the variance of errors independently of the sample size The Mean Absolute Error (MAE The MAE is used to measure how close forecasted values are to the observed values.It is the average of the absolute errors.

Results and Discussion
For each subbasin of the Awash River Basin, the station that showed the best performance results for each data driven model are presented below.In this study, SPI 3 and SPI 12 were forecast over lead times of 1 and 6 months to determine the effectiveness of the data-driven models over short-and long-term lead times.As shown in Table 3(a), the best data-driven model in the Upper Awash Basin for forecasts of SPI 3 and 12 is the WN model.All the models exhibited better results for forecasts of a 1-month lead time (L1) compared to forecasts of 6-months lead time (L6).Forecasts of SPI 12, for all the data-driven models, had better performance results than forecasts of SPI 3 in terms of R 2 , RMSE, and MAE, regardless of forecast lead time.The best 1-month lead time WN forecast of SPI 12 had results of 0.9534, 0.0600, and 0.0536 in terms of R 2 , RMSE, and MAE, respectively.The second best results were from ANN models with results of 0.9451, 0.0610, and 0.0603 in terms of R 2 , RMSE and MAE, respectively.Figures 3 and 4 show the ANN and WN 1-month forecast results for SPI 12 at the Ejersalele station.
The performance of both these models is quite similar, as indicated by Figures 3 and 4. Both models adequately represent the periods of abundant and acute precipitation as indicated by the peaks and valleys in the figures.
Similar to the results for the Upper Awash Basin, the best forecast results in the Middle Awash Basin were from WN models.The WN models had the best results for both SPI 3 and SPI 12, for forecast lead times of 1 and 6 months, respectively (Table 3(b)).The forecast results of all the datadriven models deteriorated when the forecast lead time was increased from 1 to 6 months.
Figure 5 illustrates the relationship between the observed SPI 12 and the predicted SPI 12 from the ANN model at  the Nazereth station.The ANN model underestimates the severity of the drought period at 112 months.In contrast, the WN model for SPI 12 at the Nazereth station displays improved results with respect to the drought period at 112 months (Figure 6).In the Lower Awash Basin, the forecast results exhibited the same trend shown in the Upper and Middle sub-basins.The WN models had the best results for both SPI 3 and SPI 12, for forecast lead times of 1 and 6 months, respectively.Figures 7 and 8 illustrate the best SPI 12 forecasts at the Dubti station where both ANN and WN models predict the periods of abundant and acute precipitation quite well.When the forecast lead time was increased, the performance of all the models deteriorated, especially with respect to R 2 .Data-driven models in the Upper and Lower Awash basins exhibited their best results for forecasts of SPI 12, indicating that data-driven models are more effective in predicting long-term drought conditions in those two basins, while in the Middle Awash Basin most models also exhibited their best results for forecasts of SPI 12 except WN models, which exhibited their best results for forecasts of SPI 3.This trend could be due to the fact that long-term SPI, which is a cumulative of short-term time scales, tend toward zero unless a specific trend is taking place.The exception regarding the WN models in the Middle Awash Basin may be due to the fact that the precipitation record at this station is relatively stable, meaning there are not many changes from one month to the next and the SPI 3 is not sensitive to those changes.Overall, all three data-driven models forecast SPI 3 and SPI 12 well for forecast lead times of 1 and 6 months.The results indicate that ANN models are more effective than SVR models at forecasting in this study.The use of wavelet analysis improved the forecast results of ANN models, specifically in predicting extreme events as shown in Figure 6.Indeed, using a measure for peak relative error as shown by it was determined that the relative error of the ANN model, 95%, was reduced to 88% when a WN model was used.The fact that wavelet analysis is an effective tool at revealing local discontinuities helps explain why it was more effective in predicting the extreme events in the Middle Awash Basin.Wavelet analysis may help de-noise the original SPI time series compared to a traditional ANN model.The forecast of this de-noised signal may further explain the fact that extreme events are forecast better using wavelet analysis.An increase in forecast lead time results in a deterioration of performance in all the models.However, this deterioration does not result in poor models, indicating the stability of these data-driven models in predicting the SPI.The results in terms of RMSE and MAE do not deteriorate drastically with an increase of lead time.For example, for the Dubti station, the RMSE and MAE of SVR models deteriorate by 0.05 and 0.26%, respectively.
There is variability with regards to the best forecasts of both SPI 3 and SPI 12 amongst the three subbasins.For example, the best forecast of SPI 3 at a 1-month lead time occurred in the Middle Awash Basin (WN model), while the best forecast of SPI 12 at a 1-month lead time occurred in the Upper Awash Basin (WN model).While each subbasin has a different climatology, there does not seem to be a clear trend linking climatology with forecast accuracy.It seems that the reason behind the best models for each data-driven method being in various subbasins is linked with the characteristics of the individual station and not the characteristics of the subbasin as a whole.
In addition, the forecast results for SPI 12 are better than the forecast results for SPI 3 in almost all cases.For SPI 3 and other short-term SPI, each new month has a large impact on the period sum of precipitation [6].As a result, the SPI 3 is sensitive to any change in precipitation from one month to another.In the case of SPI 12, each individual month has less impact on the total and the index is not as sensitive to changes in precipitation from one month to the next.The fact that SPI 3 is more sensitive to changes in precipitation results in less accurate forecast results than SPI 12.However, the effects of wavelet analysis are more significant for SPI 3 than for SPI 12, especially for forecast lead times of 6 months.As stated previously, the ANN forecasts of SPI 12 are not as sensitive to changes in precipitation and thus good results are obtained.The ability of wavelet analysis to improve these results exists as shown but is not as high as the improvement seen in SPI 3 forecasts because ANN forecasts of SPI 3 suffer due to the sensitivity of SPI 3 to slight changes in precipitation over the long-term record.
All three subbasins had a different climatology.The forecast results have all shown that WN models are the most effective at forecasting the SPI in all the sub-basins in terms of R 2 , RMSE and MAE.Whether this is the case in all climatic zones needs to be explored in future studies.

Conclusion
This study tried to determine the most effective data-driven model for forecasts of the SPI drought index in the Awash River Basin of Ethiopia.WN models were shown to be the most effective model for forecasts of SPI 3 and 12 in all three subbasins.WN models showed greater correlation between observed and predicted SPI compared to simple ANNs and SVR models.WN models also consistently showed lower values of RMSE and MAE compared to the other data driven models explored in this study.All the data-driven models showed increased forecast results for SPI 12 compared to SPI 3. Forecast results deteriorated as the forecast lead time increased for all the models.Of the two machine learning techniques, ANNs are more effective in forecasting the SPI compared to SVR models.This trend occurs in all three subbasins and should be studied in other regions to determine if ANNs are more effective tools for drought forecasting compared to SVR models.It is thought that WN models provide more accurate results because preprocessing the original SPI time series with wavelet decompositions "denoises" the data.Future studies should attempt to explore WSVR models, ensemble WN and WSVR models, and explore SPI forecasts using these new methods in other regions with different characteristics.Future studies should also attempt to quantify time shift error as it is a part of forecasting problems with regression models.

Figure 3 :
Figure 3: SPI 12 forecast results for the best ANN model at the Ejersalele station (1-month lead time).

Figure 4 :
Figure 4: SPI 12 forecast results for the best WA-ANN model at the Ejersalele station (1-month lead time).

Figure 8 :
Figure 8: SPI 12 forecast results for the best WN model at the Dubti station (1-month lead time).

Table 2 :
Descriptive statistics for the Awash River Basin.

Table 3 :
(a) Performance results for the Ejersalele station, Upper Awash Basin, (b) Performance results of Nazereth Station, Middle Awash Basin, (c) Performance results of Dubti Station, Lower Awash Basin.
Figure 5: SPI 12 forecast results for the best ANN model at the Nazereth station (1 month lead time).Figure 6: SPI 12 forecast results for the best WN model at the Nazereth station (1-month lead time).