Reservoir Inflow Prediction under GCM Scenario Downscaled by Wavelet Transform and Support Vector Machine Hybrid Models

Climate change significantly alters precipitation patterns, which in turn causes variation in reservoir inflow. At present, Indonesian hydrologists predict reservoir inflow according to the technical guideline Pd-T-25-2004-A. This guideline does not consider climate variables directly, which results in significant deviation from observations. This research predicts reservoir inflow using statistical downscaling (SD) of General Circulation Model (GCM) outputs. The GCM outputs are obtained from the National Center for Environmental Prediction/National Center for Atmospheric Research Reanalysis (NCEP/NCAR Reanalysis). A newly proposed hybrid SD model named Wavelet Support Vector Machine (WSVM) was used. It combines Multiscale Principal Component Analysis (MSPCA) with nonlinear Support Vector Machine regression. The model was validated at Sutami Reservoir, Indonesia. Training and testing were carried out using data from 1991–2008 and 2008–2012, respectively. The results showed that MSPCA extracted the predictor data better than PCA. The WSVM generated better reservoir inflow predictions than the technical guideline. Moreover, this research also applied WSVM to future reservoir inflow prediction based on GCM ECHAM5 under scenario SRES A1B.


Introduction
Global warming caused by increased concentrations of greenhouse gases has led to climate change. It changes rainfall patterns in both space and time. Based on the fourth report of the Intergovernmental Panel on Climate Change (IPCC) [1], the pattern of rainfall and extreme rainfall events in the Southeast Asian countries will change as the climate changes. In Indonesia, rainfall patterns have also changed significantly in space and time due to climate change. Some regions have experienced a timing shift between wet months and dry months [2]. Climate change also affects the trends and patterns of rainfall at the Brantas Watershed in East Java, Indonesia [3].
The spatial-temporal changes in precipitation patterns may lead to changes in reservoir inflow. At present, reservoir inflow is predicted using the Pd-T-25-2004-A technical guideline issued by the Department of Settlement and Regional Infrastructure (Kimpraswil), Indonesia [4]. This technical guideline does not consider climate variables directly, which results in significant deviation from the observed inflow.
The objective of this research is to predict reservoir inflow by including climate variables directly. The atmospheric data of General Circulation Model (GCM) outputs are taken from the National Oceanic and Atmospheric Administration (NOAA), National Center for Environmental Prediction/National Center for Atmospheric Research Reanalysis (NCEP/NCAR Reanalysis). The NCEP/NCAR Reanalysis outputs have a coarse spatial resolution (2.5° × 2.5°), which makes them unsuitable for hydrologic modeling at the watershed scale [5–7]. This mismatch in spatial resolution is resolved by applying a downscaling technique [8, 9]. Downscaling techniques can be grouped into two approaches, that is, dynamic downscaling (DD) and statistical downscaling (SD). The DD approach uses a Regional Climate Model (RCM), which takes its physical boundary conditions on the regional scale from the GCM. This approach requires a complex design and very high computational cost [10, 11]. The SD approach is computationally simple and economical; it determines an empirical transfer function that connects atmospheric circulation variables (predictors) to local climate variables (predictand) [12].
Advances in Civil Engineering
Streamflow modeling with an SD model follows one of two approaches, that is, indirect downscaling and direct downscaling. The first approach links an SD model of precipitation to a hydrological model [7, 13–15]. The second approach uses an SD model of the streamflow that is based directly on the GCM outputs [9, 16, 17]. In the second approach, the influences of land use, soil cover, and groundwater storage are not considered.
The choice of potential predictors from the GCM outputs is an important part of the SD model [11]. The predictor selection may vary by region, depending on the characteristics of the GCM outputs and of the predictand [18]. In addition, selecting the optimum domain grid of the GCM outputs may give a better correlation between predictor and predictand. Ghosh and Mujumdar [16] and Sachindra et al. [17] employed optimum domain grids of 5 × 5 and 7 × 6, respectively, for SD of streamflow, while Tripathi et al. [11] and Tolika et al. [8] employed optimum domain grids of 6 × 6 and 4 × 6, respectively, for SD of precipitation. The preprocessing, or data extraction, of the predictors in the optimum GCM domain grid is commonly performed using Principal Component Analysis (PCA) [9, 16, 17]. Data extraction can also be based on the wavelet transform, namely, Multiscale Principal Component Analysis (MSPCA). MSPCA is better suited to data whose contributions change over time and frequency [19].
This study developed direct statistical downscaling models to predict reservoir inflow using a novel hybrid model, namely, Wavelet Support Vector Machine (WSVM). WSVM combines wavelet-based extraction of the GCM output data with nonlinear Support Vector Machine regression.

Data and Methods
To provide the inputs for the calibration and validation of the SD model, NCEP/NCAR Reanalysis data for the period 1991-2012 were obtained from http://www.esrl.noaa.gov/psd/. Potential predictors were selected from the NCEP/NCAR Reanalysis data on the basis of a correlation coefficient above 0.5 with the predictand (local precipitation). The potential predictors consist of precipitable water (prwtr), zonal velocity component (uwnd), meridional velocity component (vwnd), temperature (air), pressure (pres), sea level pressure (slp), relative humidity at 500 hPa (rhum500), relative humidity at 850 hPa (rhum850), specific humidity at 500 hPa (shum500), specific humidity at 850 hPa (shum850), omega at 500 hPa (omega500), omega at 850 hPa (omega850), zonal velocity component at 850 hPa (uwnd850), and meridional velocity component at 850 hPa (vwnd850). Monthly precipitation and reservoir inflow for the same period were obtained from Perum Jasa Tirta I, Malang, Indonesia. The datasets were separated into two groups, that is, training data (1991 to 2008) and testing data (2008 to 2012).
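The correlation-based screening of predictors can be sketched as follows. This is a minimal illustration rather than the authors' code: the helper names `pearson_r` and `screen_predictors`, the toy series, and the choice to compare the absolute value of the correlation against 0.5 are all assumptions (the text only states "above 0.5").

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def screen_predictors(candidates, predictand, threshold=0.5, use_abs=True):
    """Return names of candidate series whose correlation exceeds the threshold.

    use_abs=True is an assumption: it also keeps strongly negative correlations.
    """
    kept = []
    for name, series in candidates.items():
        r = pearson_r(series, predictand)
        score = abs(r) if use_abs else r
        if score > threshold:
            kept.append(name)
    return kept

# Toy data (hypothetical values, for illustration only).
precip = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "prwtr": [2.0, 4.0, 6.0, 8.0],  # rises with precipitation
    "slp": [8.0, 6.0, 4.0, 2.0],    # falls as precipitation rises
    "noise": [1.0, 0.0, 1.0, 0.0],  # only weakly related
}
selected = screen_predictors(candidates, precip)
print(selected)
```

With `use_abs=False`, only positively correlated predictors would pass, which is the literal reading of the text.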
For the future projection of reservoir inflow, monthly outputs of the GCM ECHAM5 under the Special Report on Emission Scenario (SRES) A1B of the IPCC were obtained from the Program for Climate Model Diagnosis and Intercomparison (PCMDI) website (http://www.ipcc-data.org/sim/gcm_monthly/SRES_AR4/index.html) for the period 2013-2035. The GCM ECHAM5 is used for reservoir inflow prediction following previous studies [20, 21]. SRES A1B is a climate change scenario in which the atmospheric CO2 concentration reaches 720 ppm in the year 2100 [18]. A study by Ambarsari and Tedjasukmana [22] showed that the atmospheric CO2 concentration over Indonesia increased from 370 ppm to 390 ppm over 8 years (2002 to 2010). If this rate of increase is assumed to remain constant (2.5 ppm per year), the concentration will reach 615 ppm in the year 2100, which is closest to the SRES A1B scenario.
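The 615 ppm figure follows from a simple linear extrapolation of the values quoted above. A minimal check, where the function name `extrapolate_linear` is a hypothetical helper introduced only for this illustration:

```python
def extrapolate_linear(value_start, year_start, value_end, year_end, target_year):
    """Extend the straight line through two (year, value) points to target_year."""
    rate = (value_end - value_start) / (year_end - year_start)
    projected = value_end + rate * (target_year - year_end)
    return rate, projected

# Values quoted in the text: 370 ppm in 2002 and 390 ppm in 2010.
rate, co2_2100 = extrapolate_linear(370.0, 2002, 390.0, 2010, 2100)
print(rate, co2_2100)  # 2.5 615.0
```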

Wavelet Transform.
Wavelet analysis is an important tool that provides information in both the frequency and time domains of time series data. The wavelet transform decomposes time series data into different frequencies using wavelet functions. The wavelet transform has been applied to streamflow forecasting by several researchers, such as Guo et al. [23], Kisi and Cimen [24], and Santos and Silva [25]. The wavelet transform is also used to reduce high data dimension [26].
The advantage of the wavelet transform over PCA in reducing the data dimension is that it offers many alternative transformation matrices to choose from, so that the resulting dimension is compatible with the original data [26]. The matrices of wavelet coefficients are obtained by dilation and translation of two types of wavelet functions, namely, the father wavelet ($\phi$) and the mother wavelet ($\psi$) [27].
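A multilevel wavelet decomposition can be illustrated with the Haar wavelet, the simplest case in which the father wavelet gives pairwise averages and the mother wavelet gives pairwise differences. This is a sketch under stated assumptions: Haar is swapped in here for brevity and is not the wavelet used in this study, and the function names and the toy series are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(signal):
    """One Haar DWT level: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / SQRT2 for a, b in pairs]
    detail = [(a - b) / SQRT2 for a, b in pairs]
    return approx, detail

def haar_decompose(signal, levels):
    """Return (A_levels, [D_1, ..., D_levels]) for the given number of levels."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest level."""
    current = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(current, detail):
            rebuilt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        current = rebuilt
    return current

# Toy series of length 8 (hypothetical values), decomposed to level 3.
series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a3, (d1, d2, d3) = haar_decompose(series, 3)
restored = haar_reconstruct(a3, [d1, d2, d3])
```

Because the transform is orthonormal, the reconstruction recovers the original series and the sum of squared coefficients equals the sum of squared samples.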

Multiscale Principal Component Analysis (MSPCA).
Data dimension reduction by wavelet transform may cause multicollinearity; thus, further analysis using PCA is needed. The combination of wavelet transform and PCA for data dimension reduction is called Multiscale Principal Component Analysis (MSPCA). MSPCA combines the ability of PCA to decorrelate the variables by extracting a linear relationship with the ability of wavelet analysis to extract deterministic features and approximately decorrelate autocorrelated measurements. MSPCA computes the PCA of the wavelet coefficients at each scale and then combines the results at the relevant scales [19]. Applications of MSPCA to data preprocessing can be found in several previous studies, such as Aminghafari et al. [28], Sharma et al. [29], Widjaja et al. [30], and Anwar et al. [31].
Figure 2 presents the MSPCA algorithm. The NCEP/NCAR Reanalysis predictors are decomposed using the Daubechies wavelet of order 10 (db-10) up to level 3, yielding detail coefficients ($D_j$) and approximation coefficients ($A_j$), with $j$ being the level. PCA is applied to the wavelet coefficients at each scale. If the first eigenvalue exceeds the mean of all eigenvalues (Kaiser's rule), new wavelet coefficients are computed. Otherwise, the wavelet coefficients at that scale are set to zero. The new wavelet coefficients are denoted $\hat{D}_j$ and $\hat{A}_j$. Finally, the data are reconstructed from the new wavelet coefficients at all scales, and the final principal components (PCs) are obtained.
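The per-scale selection step can be sketched as below. This is only an illustration of Kaiser's rule as described above, not the full MSPCA algorithm: the eigenvalues are hypothetical numbers, and the helper names `kaiser_retained` and `select_scales` are introduced here for the example.

```python
def kaiser_retained(eigenvalues):
    """Count the principal components whose eigenvalue exceeds the mean eigenvalue."""
    mean_ev = sum(eigenvalues) / len(eigenvalues)
    return sum(1 for ev in eigenvalues if ev > mean_ev)

def select_scales(eigs_by_scale):
    """Map each scale to the number of PCs kept; 0 means the scale is set to zero."""
    return {scale: kaiser_retained(eigs) for scale, eigs in eigs_by_scale.items()}

# Hypothetical PCA eigenvalues of the wavelet coefficients at each scale
# (details D1..D3 and approximation A3 of a level-3 decomposition).
eigs_by_scale = {
    "D1": [1.0, 1.0, 1.0],       # no dominant component: scale is zeroed
    "D2": [2.5, 0.4, 0.1],
    "D3": [1.8, 1.5, 0.2, 0.1],
    "A3": [3.0, 0.6, 0.4],
}
kept = select_scales(eigs_by_scale)
print(kept)
```

A scale with a count of zero contributes nothing to the reconstruction, which matches the "set to zero" branch of the algorithm.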

Support Vector Machine (SVM).
In the past decades, traditional Artificial Neural Networks (ANN) such as Multilayer Backpropagation (MLBP) and Radial Basis Function (RBF) networks have been used intensively in hydrological modeling. Local minima and overfitting are frequently encountered in modeling with ANN [32]. Vapnik [33] developed a machine learning algorithm, called Support Vector Machine (SVM), which provides an excellent solution to these problems.
Basically, SVM is based on the idea that the input space is mapped to a high dimensional feature space, as illustrated in Figure 3 [34].
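The nonlinear relationship can be expressed as

$$\hat{y} = w^{T}\varphi(x) + b, \tag{1}$$

where $\hat{y}$ is the model output, $\varphi(\cdot)$ is the mapping into the feature space, $w$ is an adjustable weight vector, and $b$ is a bias. The parameters can be estimated by minimizing the cost function [35, 36]:

$$\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\left(\xi_{i} + \xi_{i}^{*}\right) \tag{2}$$

subject to $y_{i} - w^{T}\varphi(x_{i}) - b \le \varepsilon + \xi_{i}$, $w^{T}\varphi(x_{i}) + b - y_{i} \le \varepsilon + \xi_{i}^{*}$, and $\xi_{i}, \xi_{i}^{*} \ge 0$ for $i = 1, \ldots, N$.

The slack variables $\xi_{i}$ and $\xi_{i}^{*}$ describe the $\varepsilon$-insensitive loss function. The constant $C$ is a user-specified positive parameter. The first term of the cost function controls the size of $w$ in order to improve the generalization of the model. The second term is a penalty on deviations between output and target that are larger than $\varepsilon$, measured with the $\varepsilon$-insensitive loss function. According to Smola and Schölkopf [36], the SVM soft margin setting is shown in Figure 4. The optimization in (2) is the primal problem for regression. It can be solved using the method of Lagrange multipliers [35], which gives the dual problem

$$\max_{\alpha,\,\alpha^{*}} \; \sum_{i=1}^{N} y_{i}\left(\alpha_{i} - \alpha_{i}^{*}\right) - \varepsilon\sum_{i=1}^{N}\left(\alpha_{i} + \alpha_{i}^{*}\right) - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left(\alpha_{i} - \alpha_{i}^{*}\right)\left(\alpha_{j} - \alpha_{j}^{*}\right)K(x_{i}, x_{j}) \tag{3}$$

subject to $\sum_{i=1}^{N}\left(\alpha_{i} - \alpha_{i}^{*}\right) = 0$ and $0 \le \alpha_{i}, \alpha_{i}^{*} \le C$, where $\alpha_{i}$ and $\alpha_{i}^{*}$ are Lagrange multipliers, which are nonnegative real constants. The data points for which $(\alpha_{i} - \alpha_{i}^{*})$ is nonzero are called the support vectors. From (3), the nonlinear function estimate of the SVM is obtained and expressed as

$$\hat{y} = \sum_{i=1}^{N}\left(\alpha_{i} - \alpha_{i}^{*}\right)K(x, x_{i}) + b, \tag{4}$$

where $K(x_{i}, x_{j})$ is the inner-product kernel defined in accordance with Mercer's theorem [35]. It is expressed as

$$K(x_{i}, x_{j}) = \varphi(x_{i})^{T}\varphi(x_{j}). \tag{5}$$

Several kernel functions can be used, including polynomial, Gaussian or Radial Basis Function (RBF), and sigmoid. In this study, RBF is used to map the input space into the higher dimensional feature space. The advantage of the RBF kernel is that it can effectively handle cases in which the relationship between predictors and predictand is nonlinear. Moreover, the RBF kernel is computationally simpler than the polynomial kernel, which has more parameters [11]. The RBF kernel is given by

$$K(x_{i}, x_{j}) = \exp\left(-\gamma\lVert x_{i} - x_{j}\rVert^{2}\right). \tag{6}$$

The SVM with RBF kernel involves selection of the penalty parameter ($C$) and the RBF kernel parameter ($\gamma$). The optimal values of the SVM parameters are obtained by the grid search method [37]. The architecture of SVM for regression can be seen in Figure 5 [35].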
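The building blocks of the SVM regression model (the RBF kernel, the insensitive loss, and the prediction as a kernel expansion over support vectors) can be sketched in a few lines. This is an illustration under stated assumptions, not a solver: the kernel is written with a width parameter `gamma` in the form exp(-gamma * squared distance), which is an assumed parameterization, and the support vectors, dual coefficients, and bias are hypothetical numbers rather than the result of any training.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2); gamma parameterization is assumed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def eps_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear in the excess deviation outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Kernel expansion: sum_i coef_i * K(x, x_i) + bias.

    Each coef_i stands for the difference of the two Lagrange multipliers
    of support vector x_i.
    """
    return sum(c * rbf_kernel(x, sv, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias

# Hypothetical trained quantities, for illustration only.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [2.0, -1.0]
bias = 0.5
y_hat = svr_predict([0.0, 0.0], support_vectors, dual_coefs, bias, gamma=0.05)
```

Only the support vectors enter the sum, which is why points with a zero multiplier difference do not affect the prediction.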

Performances of SD Model.
The performance of the SD model was evaluated by comparing the model output with the observed reservoir inflow. The performance criteria were the Coefficient of Determination ($R^2$), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). $R^2$ measures the proportion of variability in the dependent variable (predictand) that is explained by the regression model through the independent variables (predictors). It is a measure of the goodness of fit of the estimated regression model. The RMSE measures the difference between predicted and observed values. Because the errors are squared before averaging, it gives more weight to large errors and is therefore a better measure of performance at high values. The MAE measures the average magnitude of the errors in a set of forecasts without considering their direction and is less influenced by high values.
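The three criteria can be computed as sketched below. The formulas are the standard ones; taking the coefficient of determination as one minus the ratio of residual to total sum of squares is an assumption, since the text does not state which definition was used, and the observed and predicted values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical inflow values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
scores = (r_squared(observed, predicted), rmse(observed, predicted), mae(observed, predicted))
```

For these values the RMSE (about 0.158) is larger than the MAE (0.15) because the two larger errors dominate the squared average.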

Reservoir Inflow Prediction Using Technical Guideline.
At present, reservoir inflow is predicted by referring to the technical guideline (Pd-T-25-2004-A) issued by the Department of Settlement and Regional Infrastructure, Kimpraswil, Indonesia [4]. In the technical guideline, the reservoir inflow characteristics are classified into three hydrometeorological conditions, that is, the wet year, normal year, and dry year. The reservoir inflow is predicted from the historical inflow data by considering those three hydrometeorological conditions. The classification of hydrometeorological conditions refers to the percentage of the reservoir inflow volume as shown in Table 1.
Based on the monthly inflow record, the annual inflow is computed as the cumulative inflow of each year. The annual values are then listed in ascending order. The probability of each value is obtained from its rank number and the number of annual inflow data. A plot of annual inflow against its associated probability can then be established (Figure 6). The data are grouped into three classes based on their probability values, representing dry, normal, and wet years. Finally, the inflow prediction for the consecutive months is obtained according to the type of the year.
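The ranking and classification steps can be sketched as follows. Several details are assumptions made only for this illustration: the plotting-position formula rank/(n + 1), the class thresholds `dry_max` and `wet_min` (the actual limits are those of Table 1, which are not reproduced here), and the toy inflow values.

```python
def annual_totals(monthly_by_year):
    """Sum the monthly inflows of each year."""
    return {year: sum(months) for year, months in monthly_by_year.items()}

def empirical_probabilities(annual):
    """Rank annual totals in ascending order and assign rank / (n + 1)."""
    ranked = sorted(annual.items(), key=lambda item: item[1])
    n = len(ranked)
    return {year: rank / (n + 1) for rank, (year, _) in enumerate(ranked, start=1)}

def classify_year(probability, dry_max=0.35, wet_min=0.65):
    """Assign a year type; the thresholds are hypothetical placeholders."""
    if probability <= dry_max:
        return "dry"
    if probability >= wet_min:
        return "wet"
    return "normal"

# Toy record with three years of (shortened) monthly inflows.
monthly = {2001: [5.0, 5.0], 2002: [15.0, 15.0], 2003: [10.0, 10.0]}
totals = annual_totals(monthly)
probs = empirical_probabilities(totals)
year_types = {year: classify_year(p) for year, p in probs.items()}
print(year_types)
```

Once a year is typed, the monthly prediction would be read from the historical inflows of the matching class, a step not shown here.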

Selection of Optimum Domain Grid and Potential Predictors.
There is no specific guideline for determining the optimum size of the NCEP/NCAR Reanalysis domain grid [17]. Ghosh and Mujumdar [16] and Sachindra et al. [17] used 5 × 5 and 7 × 6 grids, respectively, in their streamflow downscaling models. This work determined the optimum grid based on the correlation coefficient. The results for various domain grids are presented in Table 2, where the target location is at the center of the domain grid (Figure 7). Table 2 shows that the 4 × 4 grid has the highest correlation value and is therefore taken as the optimum grid.

Preprocessing Data of NCEP/NCAR Reanalysis.
Each predictor on the 4 × 4 domain grid had 25 grid points (the 5 × 5 nodes of the 4 × 4 cells). SD modeling for reservoir inflow prediction required 14 predictors, so 14 × 25 = 350 predictor variables were employed.
Both PCA and MSPCA were then applied to reduce the data dimension. The results are listed in Table 3. Following [11], PCs were retained until the cumulative variance exceeded 98%. Table 3 shows that MSPCA required only 7 PCs, instead of the 11 PCs required by PCA, to reach 98% cumulative variance. Using the wavelet transform in MSPCA therefore reduced the data dimension substantially compared with PCA alone. MSPCA was chosen for the decomposition of the predictors because its multiscale representation describes the predictor data in more detail.
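Choosing the number of PCs from a cumulative variance target can be sketched as below. The eigenvalues are hypothetical and the helper name `n_components_for` is introduced only for this example; the 98% target is the one quoted in the text.

```python
def n_components_for(eigenvalues, target=0.98):
    """Smallest number of leading PCs whose cumulative variance share reaches target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, ev in enumerate(ordered, start=1):
        running += ev
        if running / total >= target:
            return count
    return len(ordered)

# Hypothetical eigenvalue spectra: a more compact one needs fewer PCs.
spread_out = [5.0, 3.0, 1.0, 0.5, 0.5]
compact = [9.0, 0.9, 0.05, 0.03, 0.02]
print(n_components_for(spread_out), n_components_for(compact))
```

The same function applied to the PCA and MSPCA eigenvalues would yield the PC counts compared in Table 3.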

SVM and WSVM Downscaling Model Calibration and Validation.
SVM downscaling models were built using the PCA and MSPCA results as input data: 11 PCs from PCA and 7 PCs from MSPCA. The SVM with MSPCA input data is named WSVM. The SVM with RBF kernel used in this study has two parameters to be determined, the penalty parameter $C$ and the RBF kernel parameter $\gamma$. These parameters are mutually dependent, so changing the value of one changes the optimal value of the other. The parameter values are obtained by the grid search method. The optimal parameters of SVM and WSVM were obtained by averaging the values from the fivefold cross validation. The optimal parameters of SVM are $C$ = 1.0 and $\gamma$ = 0.046, while the optimal parameters of WSVM are $C$ = 1.0 and $\gamma$ = 0.050.
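The mechanics of a grid search with fivefold cross validation can be sketched as below. This is an illustration under several loudly stated assumptions. Kernel ridge regression with an RBF kernel is swapped in as a stand-in for SVM regression, with 1/C playing the role of the ridge penalty, so that the example stays dependency-free. The best pair is picked by the lowest pooled cross-validation RMSE rather than by averaging per-fold optima. The interleaved fold assignment, the parameter grids, and the toy data are all hypothetical.

```python
import itertools
import math

def rbf(x, z, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_predict(X_tr, y_tr, X_te, C, gamma):
    """Kernel ridge stand-in for SVR: solve (K + I / C) alpha = y, then predict."""
    n = len(X_tr)
    K = [[rbf(X_tr[i], X_tr[j], gamma) + (1.0 / C if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y_tr)
    return [sum(a * rbf(x, xi, gamma) for a, xi in zip(alpha, X_tr)) for x in X_te]

def k_fold_indices(n, k):
    """Interleaved folds: fold i holds samples i, i + k, i + 2k, ..."""
    return [list(range(i, n, k)) for i in range(k)]

def cv_rmse(X, y, C, gamma, k=5):
    sq_err, count = 0.0, 0
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx], C, gamma)
        sq_err += sum((p - y[i]) ** 2 for p, i in zip(preds, test_idx))
        count += len(test_idx)
    return math.sqrt(sq_err / count)

def grid_search(X, y, C_grid, gamma_grid, k=5):
    """Return the (C, gamma) pair with the lowest cross-validated RMSE."""
    return min(itertools.product(C_grid, gamma_grid),
               key=lambda pair: cv_rmse(X, y, pair[0], pair[1], k))

# Toy one-dimensional data (hypothetical), standing in for PC inputs and inflow.
X = [[0.3 * i] for i in range(20)]
y = [math.sin(row[0]) for row in X]
C_grid, gamma_grid = [1.0, 10.0, 100.0], [0.05, 0.5, 5.0]
best_C, best_gamma = grid_search(X, y, C_grid, gamma_grid)
```

Because the two parameters interact, every combination on the grid is evaluated rather than tuning one parameter at a time.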
The results of the SVM and WSVM models during the training and testing stages are given in Table 4, and the corresponding plots are shown in Figures 8 and 9. The performance of Pd-T-25-2004-A can also be seen in Table 4 and Figure 9.
The results show that the WSVM needs fewer PC inputs than the SVM to generate results of similar accuracy. This indicates that WSVM is a parsimonious model that predicts reservoir inflow with as few inputs (PCs) as possible.

Comparison of Reservoir Inflow Modeling.
The results of the SD models for reservoir inflow prediction using SVM and WSVM (Figure 9 and Table 4) show that both models generate better predictions than the reservoir inflow prediction model currently in use (Pd-T-25-2004-A). The technical guideline is not able to predict the reservoir inflow well when the seasons shift abnormally or when the durations of the rainy and dry seasons depart from the normal 6-month pattern, as shown by the reservoir inflow prediction for 2010, which was lower than the observed reservoir inflow (Figure 9). The year 2010 was a wet year, in which the rainy season lasted longer than usual. The direct downscaling model using SVM and WSVM is applicable to predicting reservoir inflow under climate change. This model has the advantage of including the global climate determinants (atmospheric circulation variables) directly in the prediction of the reservoir inflow. However, the model is also limited, since the effect of the physical characteristics of the Sutami Watershed is not included. The change in reservoir inflow is assumed to be influenced only by the change in the precipitation pattern due to climate change. In reality, the change in reservoir inflow results from a complex combination of the effect of global climate change and changes in the physical characteristics of the watershed.

Reservoir Inflow Prediction under the Climate Change Scenario.
Reservoir inflow prediction under the climate change scenario is based on the Special Report on Emission Scenario (SRES) A1B, using the GCM ECHAM5 of the Max Planck Institute. Figure 10 shows the monthly change in reservoir inflow under the climate change scenario over the next 22 years (2013-2035).
The trend of the future reservoir inflow (2013-2035) follows the same pattern as the trend of the historical reservoir inflow (1991-2012). Predicting the trend of future reservoir inflow is important for optimal reservoir operation. According to Zhang et al. [38], knowledge of the trend in streamflow is important for efficient management of water resources.

Conclusion
This work built and validated a statistical downscaling framework that predicts reservoir inflow directly from GCM outputs. A newly proposed hybrid SD model called WSVM was successfully applied to predict reservoir inflow. It used MSPCA, which required fewer inputs than PCA to generate similar performance. Using NCEP/NCAR Reanalysis outputs, the model provided better predictions than the Indonesian technical guideline, which ignores the effect of climate change. The model was also used to forecast the reservoir inflow trend for the period 2013-2035 using GCM ECHAM5. However, WSVM did not consider watershed processes and conditions, such as land use change and groundwater storage, in the reservoir inflow prediction.

Figure 1: Location of the rainfall stations and the streamflow station at Sutami Watershed, Indonesia.

Figure 8: Plot of the results of running SVM and WSVM (training stage).

Table 1: The classification of hydrometeorological conditions.

Table 2: Correlation value for various domain grids.
NOAA/ESRL physical science division

Table 3: The reduction of NCEP/NCAR predictors using PCA and MSPCA.