Long-Term Precipitation Analysis and Estimation of Precipitation Concentration Index Using Three Support Vector Machine Methods

1 Faculty of Civil Engineering and Architecture, University of Nis, Aleksandra Medvedeva 14, 18000 Nis, Serbia
2 Faculty of Computer Science and Information Technology, Department of Computer System and Technology, University of Malaya, 50603 Kuala Lumpur, Malaysia
3 Faculty of Mechanical Engineering, Department for Mechatronics and Control, University of Nis, Aleksandra Medvedeva 14, 18000 Nis, Serbia
4 Department of Civil Engineering, Indian Institute of Technology, Hauz Khas, New Delhi 110016, India


Introduction
Precipitation is one of the most important climatic variables because changes in its intensity and amount affect the occurrence of hydrological hazards such as floods and droughts [1]. Therefore, numerous studies on precipitation variability and on the development of statistical indices to evaluate changes in precipitation have been undertaken [2–7]. In this study, the precipitation concentration index (PCI) is analysed. The PCI quantifies the relative distribution of precipitation patterns. It also gives a good representation of the spatial variability of monthly precipitation [5,8] and provides information on the long-term total variability in the precipitation record [9,10]. The PCI can therefore be used as an indicator of hydrological hazard risks such as floods and droughts.
In this study, a prediction model of the PCI is introduced using a soft computing method, namely, the Support Vector Machine (SVM). The SVM, one of the novel soft computing learning algorithms, has found wide application in computing, hydrology, and environmental science [11–16]. Furthermore, it has mainly been applied in pattern recognition, forecasting, classification, and regression analysis [17–20]. The most commonly used kernels are the linear, polynomial, and radial basis function (RBF) kernels, and the choice among them depends on the nature of the observed data [21]. Shamshirband et al. [22] used an adaptive neurofuzzy inference system (ANFIS) and support vector regression (SVR) for precipitation estimation, while S. Chattopadhyay and G. Chattopadhyay [23], Nastos et al. [24], and Wu and Chau [25] applied artificial neural networks (ANNs). Chen et al. [26] implemented SVM and multivariate analysis to project daily precipitation. Meyer et al. [27] compared four machine learning algorithms for their applicability in rainfall retrievals.
Metaheuristic optimization algorithms such as ant colony optimization (ACO), the genetic algorithm (GA), particle swarm optimization (PSO), and cuckoo search (CS) have been applied in different fields of science [28–37]. These algorithms are based on the mechanism of selection of the fittest in biological systems. A more recent biologically inspired metaheuristic optimization algorithm is the firefly algorithm (FFA) developed by Yang [38]. The FFA has been reported to be more efficient and robust in finding both local and global optima than other biologically inspired optimization algorithms [39–43]. The prediction accuracy of an SVM model relies heavily on the proper determination of the model parameters [44–47]. Organized strategies for selecting the parameters are important, but the parameters also need to be tuned to the data. In this study, the FFA is used to determine the SVM parameters, and the SVM is also coupled with the discrete wavelet transform.
The wavelet transform (WT) offers a number of basis functions, and the choice among them depends on the analysed signal. Wavelet analysis is used to decompose a time series into its various components, after which the decomposed components can be used as inputs for the SVM model. Over the past few years, this technique has attracted great interest in engineering applications [48–51]. Nalley et al. [52] used the discrete wavelet transform (DWT) to analyse trends in precipitation in Canada, while Hsu and Li [53] clustered spatial-temporal precipitation data using the WT. Partal and Kucuk [54] analysed long-term precipitation trends in Turkey using the DWT. Kisi and Cimen [55] applied a wavelet-Support Vector Machine conjunction model for daily precipitation forecasting and concluded that the proposed model increases the forecast accuracy.
The objectives of the current study are as follows: (1) to present the spatial variability of monthly precipitation and provide information on the long-term total variability in the precipitation data using the precipitation concentration index and (2) to construct, develop, and evaluate the SVM-Wavelet, SVM-FFA, and SVM-RBF models for PCI prediction.
According to Gocic and Trajkovic [56], precipitation increases with altitude: dry areas in the northeast part of Serbia receive less than 600 mm, the area along the valley of the South Morava to Vranje receives up to 650 mm, and in the mountains precipitation may rise to 1000 mm per year. The mean annual precipitation for the observed period for the whole country is 662.4 mm. The spatial distribution of the mean annual precipitation in Serbia for the analysed period is illustrated in Figure 1(b).

Methodology for Precipitation Analysis.
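The spatial distribution of the number of wet and dry years can be obtained from the standardized anomaly

$$z = \frac{x - \bar{x}}{s}, \tag{1}$$

where $x$ is the annual precipitation, $\bar{x}$ is the annual mean precipitation, and $s$ is the standard deviation of the annual precipitation. A year is classified as dry if $z \le -0.5$ and as wet if $z \ge 0.5$ [57]. The precipitation concentration index (PCI) [58] is calculated as follows:

$$\mathrm{PCI} = 100 \cdot \frac{\sum_{i=1}^{12} p_i^{2}}{\left(\sum_{i=1}^{12} p_i\right)^{2}}, \tag{2}$$

where $p_i$ is the precipitation amount in month $i$. The classification of PCI values is shown in Table 1.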
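The two quantities described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `pci` and `classify_years` are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def pci(monthly):
    """Precipitation concentration index for one year of 12 monthly totals (mm)."""
    if len(monthly) != 12:
        raise ValueError("expected 12 monthly totals")
    total = sum(monthly)
    if total <= 0:
        raise ValueError("annual total must be positive")
    return 100.0 * sum(p * p for p in monthly) / total ** 2


def classify_years(annual, threshold=0.5):
    """Label each year 'dry', 'wet', or 'normal' from its standardized anomaly."""
    m = mean(annual)
    s = stdev(annual)  # sample standard deviation (assumption, see lead-in)
    labels = []
    for x in annual:
        z = (x - m) / s
        if z <= -threshold:
            labels.append("dry")
        elif z >= threshold:
            labels.append("wet")
        else:
            labels.append("normal")
    return labels


# Perfectly uniform rainfall gives the lowest possible PCI, 100/12 (about 8.33);
# rainfall concentrated in a single month gives the highest possible PCI, 100.
uniform_pci = pci([50.0] * 12)
single_month_pci = pci([600.0] + [0.0] * 11)
labels = classify_years([500.0, 600.0, 700.0])
```

The two limiting cases bracket the station values reported later in the text, which all lie slightly above the uniform limit.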

Soft Computing Methodologies
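Support Vector Machine.
Support Vector Machine (SVM) [59,60] is based on machine learning theory and is formulated to maximize predictive accuracy; that is,

$$\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right) \tag{3}$$

subject to

$$d_i - w \cdot \varphi(x_i) - b \le \varepsilon + \xi_i, \qquad w \cdot \varphi(x_i) + b - d_i \le \varepsilon + \xi_i^{*}, \qquad \xi_i,\ \xi_i^{*} \ge 0,$$

where $w$ is a normal vector, $(1/2)\|w\|^{2}$ is the regularization term, $C$ is the error penalty factor, $b$ is a bias, $\varepsilon$ defines the $\varepsilon$-insensitive loss function, $x_i$ is the input vector, $d_i$ is the target value, $n$ is the number of elements in the training data set, $\varphi(x_i)$ is a feature space, and $\xi_i$ and $\xi_i^{*}$ are the upper and lower excess deviations. The architecture of SVM is shown in Figure 2. The kernel function, that is, the radial basis function (RBF), is denoted as

$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^{2}\right), \tag{4}$$

where the variables $x_i$ and $x_j$ are vectors in the input space and $\gamma$ is the kernel parameter. Lagrange multipliers are presented as $\bar{\alpha}_i = \alpha_i - \alpha_i^{*}$. The accuracy of prediction depends on the selection of three parameters, that is, $C$, $\varepsilon$, and $\gamma$, whose values are determined using the firefly algorithm.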
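The building blocks named in this subsection can be illustrated with a short, dependency-free sketch. It assumes the common Gaussian form of the RBF kernel and the standard ε-insensitive loss. The function names and the toy numbers are hypothetical, and the sketch evaluates an already trained model rather than solving the training problem.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2) (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def eps_insensitive_loss(target, prediction, eps):
    """Errors smaller than eps cost nothing; larger errors cost the excess."""
    return max(0.0, abs(target - prediction) - eps)


def svr_predict(x, support_vectors, coeffs, bias, gamma):
    """Weighted sum of kernels centred on the support vectors, plus the bias.

    coeffs holds one coefficient per support vector (the difference of its
    two Lagrange multipliers).
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, coeffs)) + bias


# Toy values chosen only to exercise the functions.
same_point = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)
inside_tube = eps_insensitive_loss(1.0, 1.05, eps=0.1)
outside_tube = eps_insensitive_loss(1.0, 1.3, eps=0.1)
prediction = svr_predict([0.0], [[0.0]], [2.0], bias=0.5, gamma=1.0)
```

The three arguments that must be chosen before training (the penalty factor, `eps`, and `gamma`) are the quantities that a parameter search would tune.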

[Figure 2 diagram labels: input layer; hidden layer with hidden nodes (support vectors); weights (Lagrange multipliers); bias; output layer.]

Firefly Algorithm.
The firefly algorithm (FFA) [38,61,62] is based on the behaviour of fireflies. The major issues in FFA development are the formulation of the objective function and the variation of the light intensity.
A firefly is an insect that uses bioluminescence to attract mates or prey. The luminance produced by a firefly enables other fireflies to trail its path in search of their prey. This concept of luminance production helps in the development of algorithms that solve optimization problems.
For example, in an optimal design problem involving maximization of an objective function, the fitness function is proportional to the brightness, that is, the amount of light emitted by the firefly. The light intensity decreases with the distance between fireflies, which in turn lessens the attractiveness among them. The light intensity at varying distance can be represented as

$$I(r) = I_0 \exp\left(-\gamma r^{2}\right), \tag{5}$$

where $I$ is the light intensity at distance $r$ from a firefly, $I_0$ represents the initial light intensity, that is, when $r = 0$, and $\gamma$ is the light absorption coefficient. As a firefly's attractiveness is proportional to the light intensity observed by adjacent fireflies, the attractiveness $\beta$ at a distance $r$ from the firefly can be represented as

$$\beta(r) = \beta_0 \exp\left(-\gamma r^{2}\right), \tag{6}$$

where $\beta_0$ represents the attractiveness at distance $r = 0$. The Cartesian distance between any two fireflies $i$ and $j$ is given by

$$r_{ij} = \|x_i - x_j\| = \sqrt{\sum_{k=1}^{d} \left(x_{i,k} - x_{j,k}\right)^{2}}, \tag{7}$$

where $x_{i,k}$ is the $k$th component of the position $x_i$ of firefly $i$ and $d$ is the number of dimensions. The movement of firefly $i$ as it is attracted to another, brighter firefly $j$ can be represented as

$$\Delta x_i = \beta_0 \exp\left(-\gamma r_{ij}^{2}\right)\left(x_j - x_i\right) + \alpha \varepsilon_i, \tag{8}$$

where the first term in the equation is due to the attraction, the second term represents the randomization with $\alpha$ as the randomization coefficient, and $\varepsilon_i$ is the random number vector derived from a Gaussian distribution. The next position of firefly $i$ is updated as

$$x_i^{t+1} = x_i^{t} + \Delta x_i. \tag{9}$$

Steps in FFA development are presented in Figure 3.
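The attraction and random-walk steps described above can be sketched as a small pure-Python routine. This is an illustrative minimization variant (a lower cost is treated as a brighter firefly), not the implementation used in the study. The function name `firefly_minimize`, the default parameter values, the clamping to bounds, and the `sphere` test objective are all assumptions made for the example. When tuning an SVM, the objective would presumably be a validation error evaluated at a candidate parameter vector.

```python
import math
import random


def firefly_minimize(objective, bounds, n_fireflies=15, n_iter=50,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimal firefly search; returns (best_x, best_cost, history)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_fireflies)]
    cost = [objective(x) for x in pop]
    best_idx = min(range(n_fireflies), key=cost.__getitem__)
    best_x, best_cost = list(pop[best_idx]), cost[best_idx]
    history = [best_cost]  # best cost found so far, per iteration
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter than firefly i
                    r2 = sum((a - b) ** 2 for a, b in zip(pop[i], pop[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    for k, (lo, hi) in enumerate(bounds):
                        step = (beta * (pop[j][k] - pop[i][k])
                                + alpha * rng.gauss(0.0, 1.0))
                        # keep the firefly inside the search box (assumption)
                        pop[i][k] = min(hi, max(lo, pop[i][k] + step))
                    cost[i] = objective(pop[i])
                    if cost[i] < best_cost:
                        best_x, best_cost = list(pop[i]), cost[i]
        history.append(best_cost)
    return best_x, best_cost, history


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_cost, history = firefly_minimize(sphere, [(-5.0, 5.0)] * 2)
```

Because the routine keeps the best position seen so far, `history` never increases, which makes it convenient for checking convergence.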

Discrete Wavelet Transform.
The wavelet transform (WT) is a mathematical tool for decomposing a time series signal into different frequency components. In this study, wavelet analysis was used to decompose the time series of precipitation data into various components, after which the decomposed components were used as inputs for the SVM model. The flow chart of the discrete wavelet algorithm used to determine the SVM parameters is shown in Figure 4. The continuous wavelet transform (CWT) of a signal $f(t)$ is a time-scale technique of signal processing that can be defined as the integral of the signal over the entire period multiplied by scaled, shifted versions of the wavelet function $\psi(t)$, given mathematically as

$$W(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t)\, \psi\!\left(\frac{t - b}{a}\right) dt, \tag{10}$$

where $\psi(t)$ is the mother wavelet function, $a$ is the scale index parameter, that is, the inverse of the frequency, and $b$ is the time shifting parameter, also known as translation. The discrete wavelet transform (DWT) can be derived by discretizing (10), where the parameters $a$ and $b$ are given as follows:

$$a = 2^{m}, \qquad b = 2^{m} n, \tag{11}$$

where the variables $n$ and $m$ are integers. Replacing $a$ and $b$ in (10) gives

$$W(m, n) = 2^{-m/2} \int_{-\infty}^{\infty} f(t)\, \psi\!\left(2^{-m} t - n\right) dt. \tag{12}$$
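To make the decomposition step concrete, the sketch below applies one level of a discrete wavelet transform to a short series and then reconstructs it. The Haar wavelet is used purely as an assumption for illustration, because the text does not state which mother wavelet was chosen. The function names `haar_dwt` and `haar_idwt` and the sample values are hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s
              for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


series = [4.0, 6.0, 10.0, 12.0]          # made-up sample values
approx, detail = haar_dwt(series)        # smooth part and fluctuation part
restored = haar_idwt(approx, detail)     # should match the input series
```

The approximation coefficients carry the slowly varying part of the series and the detail coefficients carry the fluctuations. Either set, or both, can then be passed to a regression model as separate inputs.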

Evaluating Accuracy of Proposed Models.
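In this study, the following statistical indicators were applied to compare the developed SVM models:
(1) root mean square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(P_i - O_i\right)^{2}}, \tag{13}$$

(2) coefficient of determination ($R^{2}$):

$$R^{2} = \frac{\left[\sum_{i=1}^{n} \left(P_i - \bar{P}\right)\left(O_i - \bar{O}\right)\right]^{2}}{\sum_{i=1}^{n} \left(P_i - \bar{P}\right)^{2} \sum_{i=1}^{n} \left(O_i - \bar{O}\right)^{2}}, \tag{14}$$

(3) coefficient of efficiency (EI):

$$\mathrm{EI} = 1 - \frac{\sum_{i=1}^{n} \left(P_i - O_i\right)^{2}}{\sum_{i=1}^{n} \left(P_i - \bar{P}\right)^{2}}, \tag{15}$$

where $P_i$ and $O_i$ are the experimental and predicted values of the PCI, respectively, $\bar{P}$ and $\bar{O}$ are their mean values, and $n$ is the size of the test data.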
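The three indicators can be computed with a few lines of Python. This is a sketch under the usual textbook definitions: $R^{2}$ as the squared correlation between measured and predicted values, and EI as one minus the ratio of the squared error to the variance of the measured values. The function names and the toy data are hypothetical.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between two equally long sequences."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Squared Pearson correlation between measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_m * var_p)


def efficiency(measured, predicted):
    """Coefficient of efficiency: 1 minus squared error over measured variance."""
    mean_m = sum(measured) / len(measured)
    sq_err = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return 1.0 - sq_err / sum((m - mean_m) ** 2 for m in measured)


# A prediction that is offset by a constant: perfectly correlated, yet biased.
measured = [1.0, 2.0, 3.0]
shifted = [2.0, 3.0, 4.0]
scores = (rmse(measured, shifted),
          r_squared(measured, shifted),
          efficiency(measured, shifted))
```

The offset example shows why more than one indicator is reported: the correlation-based $R^{2}$ stays at 1 for a constant bias, while RMSE and EI both penalize it.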

Analysis of Precipitation Distribution.
The number of dry and wet years is tabulated in Table 2. Dry years are most frequent in the north of Serbia, while the number of wet years is greater than the number of dry years in the west of the country. For Serbia as a whole, the number of dry years is 20, while the number of wet years is 19.
According to Gocic and Trajkovic [56], three precipitation subregions were detected: (1) subregion R1 (12 stations) is located in the north part of the country, with precipitation ranging from 223 to 1051 mm and an average value of 608.2 mm, (2) subregion R2 (7 stations) is the wettest one and includes stations in the west of the country, with precipitation between 385 and 1282 mm and a mean value of 784.5 mm, and (3) subregion R3 (10 stations) covers the east and south part of Serbia, with precipitation between 302 and 1113 mm and a mean value of 623.3 mm.
The annual precipitation shows an increasing trend in Serbia during the period of 1946-2012 (stronger in R2 and R1). Three CLINO periods (1961-1990, 1971-2000, and 1981-2010) are illustrated in Figure 5. The CLINO period 1981-2010 shows a significant increasing trend in all subregions. The most precipitation falls in June, with a value of 80.8 mm in Serbia (41.15% of the total summer precipitation), which is directly connected with the intensive convection of colder and humid, usually maritime, air masses.
Precipitation distribution is determined using the PCI. Figure 6 illustrates the spatial distribution of PCI in Serbia. The minimum PCI values were detected in Zlatibor (10.43) and Pozega (10.83), while the maximum was in Negotin (12.49). The majority of the stations had values between 11.12 in Sjenica and 11.94 in Banatski Karlovac.

Performance Evaluation of Proposed SVM Models.
Precipitation data were used to obtain six parameters: annual total precipitation, mean winter precipitation amount, mean spring precipitation amount, mean summer precipitation amount, mean autumn precipitation amount, and mean precipitation for the vegetation period (April-September). For the experiments, the first 38 years (57% of the data) were used as training samples and the subsequent 29 years (43% of the data) served as test samples. Table 3 summarizes the six variables using the following statistical indicators: the minimum, maximum, median, mean, standard deviation, and skewness. In this study, three SVM models, that is, SVM-Wavelet, SVM-RBF, and SVM-FFA, were analysed to predict the PCI. The RBF was implemented as the kernel function, which leaves three parameters, $C$, $\varepsilon$, and $\gamma$, whose selection directly influences prediction accuracy. Table 4 provides the optimal values of the parameters for the proposed SVM models. The firefly algorithm finds the optimal SVM parameters through its search procedure. For the SVM-Wavelet and SVM-RBF approaches, the parameters were selected manually after several trial-and-error iterations.
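The split into training and test years can be sketched as a simple chronological cut. The sketch assumes that the 67 years are 1946-2012, the period given in the Figure 1 caption, and that the first 38 of them form the training set. The function name `chronological_split` and the empty feature placeholders are hypothetical.

```python
def chronological_split(records, n_train):
    """Split (year, features) records into earliest n_train years and the rest."""
    if not 0 < n_train < len(records):
        raise ValueError("n_train must leave at least one test record")
    ordered = sorted(records, key=lambda rec: rec[0])  # sort by year
    return ordered[:n_train], ordered[n_train:]


# Placeholder records: one entry per year, features omitted (None).
records = [(year, None) for year in range(1946, 2013)]
train, test = chronological_split(records, n_train=38)

train_share = round(100 * len(train) / len(records))  # percentage of all years
test_share = round(100 * len(test) / len(records))
```

Keeping the test years strictly after the training years avoids evaluating a model on years that precede the data it was fitted on.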
To evaluate the performance of the SVM models, the calculated PCI values were plotted against the predicted ones (Figure 7). Figure 8 illustrates the spatial distribution of PCI in Serbia obtained with the three SVM methods, that is, SVM-Wavelet, SVM-FFA, and SVM-RBF. According to the obtained results, the spatial distribution based on the SVM-Wavelet values is similar to the spatial distribution in Figure 6.

Performance Comparison of SVM Models.
To illustrate the performance characteristics of the developed SVM models for PCI prediction, the prediction accuracies of the three SVM models were compared with each other using the statistical indicators RMSE, $R^{2}$, and EI. The wavelet algorithm decomposes a nonlinear series into multiple linearized series, which makes the regression easier.
According to the obtained results, the SVM-Wavelet model outperforms the SVM-RBF and SVM-FFA models. The predictions from the SVM models correlate highly with the actual PCI data.

Conclusion
The study followed a systematic approach to create three SVM models for PCI prediction, namely, SVM-Wavelet, SVM-RBF, and SVM-FFA. The proposed SVM-Wavelet model was obtained by combining two methods, that is, the SVM and the wavelet transform. The RBF was selected as the kernel function for the SVM, while the FFA was used to obtain the SVM parameters.
Each of these SVM approaches has advantages and disadvantages. SVM-FFA uses the firefly search algorithm to find the optimal SVM parameters. The wavelet approach divides the series into subseries in order to make it more linear, and at the end all subseries are merged. SVM-RBF is the basic approach with manual estimation of the SVM parameters. Therefore, the SVM-RBF results are not as good as those of the other two approaches.
A comparison of the SVM-Wavelet, the SVM-RBF, and the SVM-FFA was performed in order to assess the prediction accuracy. Accuracy results, measured in terms of RMSE, $R^{2}$, and EI, indicate that the SVM-Wavelet predictions are superior to those of the SVM-RBF and the SVM-FFA.
The main advantages of the SVM schemes are that they are computationally efficient and adapt well to optimization and adaptive techniques. The developed strategy is not only simple but also reliable, and it may be easy to implement in real-time applications using interfacing cards for control of various parameters. It can also be combined with expert systems and rough sets for other applications.
Further research will test the proposed soft computing methods in other parts of the world and in different climate types to confirm the results. In addition, some hybrid soft computing models will be applied and compared with the models developed in this study.

Figure 1: (a) Spatial distribution of the 29 meteorological stations on the map of Serbia; (b) spatial distribution of the mean annual precipitation in Serbia for the period of 1946-2012.

Figure 2: The network architecture of SVM.

Figure 4: Flow chart of proposed discrete wavelet algorithm.
Figure 7(a) presents the accuracy of the developed SVM-Wavelet PCI predictive model, while Figures 7(b) and 7(c) present the accuracy of the developed SVM-RBF and SVM-FFA PCI predictive models, respectively. Most of the points fall along the diagonal line for the SVM-Wavelet prediction model, which means that the prediction results are in very good agreement with the measured values for this model. This is confirmed by the high value of $R^{2}$ ($R^{2} = 0.86$).

Figure 5: The trend of annual precipitation by regions.

Figure 6: Spatial pattern of the precipitation concentration index.

Table 1: Classification of PCI values.

Table 2: Number of dry and wet years for the synoptic stations used in the study.

Table 3: Summary statistics for used data sets.

Table 4: User-defined parameters for SVM models.

Table 5: Comparative performance statistics of the SVM-Wavelet, SVM-RBF, and SVM-FFA models for PCI prediction.
Table 5 summarizes the prediction results for the test data sets in terms of RMSE, $R^{2}$, and EI, since the training error is not a credible indicator of the prediction potential of a particular model. The results in Table 5 were obtained using the same number of runs for each method, and the reported values are averages over these runs. The same number of iterations was used in order to make the comparison fair and accurate. SVM-Wavelet produced better results than the other two approaches, which is attributed to the wavelet decomposition of the series.