A Forecasting Approach Combining Self-Organizing Map with Support Vector Regression for Reservoir Inflow during Typhoon Periods

This study describes the development of a reservoir inflow forecasting model for typhoon events that improves short lead-time flood forecasting performance. To strengthen the forecasting ability of the original support vector machine (SVM) model, the proposed SOM-SVM model first uses a self-organizing map (SOM) to group the inputs into clusters. Two input-selection methods are proposed for the SVM-based forecasting model, namely, SOM-SVM1 and SOM-SVM2. The methods are applied to an actual reservoir watershed to produce 1 to 3 h ahead inflow forecasts. For 1, 2, and 3 h ahead forecasts, the improvements in mean coefficient of efficiency (MCE) obtained with the clusters of SOM-SVM1 are 21.5%, 18.5%, and 23.0%, respectively, and those obtained with SOM-SVM2 are 20.9%, 21.2%, and 35.4%, respectively. Compared with SOM-SVM1, SOM-SVM2 improves the MCE for 1, 2, and 3 h ahead forecasts by 0.33%, 2.25%, and 10.08%, respectively. These results show that the proposed models, especially SOM-SVM2, provide improved forecasts of hourly inflow. In conclusion, by considering a limited set of more highly related inputs instead of all inputs, the proposed model generates better forecasts within the different clusters obtained from the SOM process. The SOM-SVM2 model is recommended as an alternative to the original SVR (support vector regression) model because of its accuracy and robustness.


Introduction
Cyclones, typhoons, and hurricanes are the same meteorological phenomenon, named differently in different parts of the world. They are weather systems with strong winds that circulate anticlockwise around a low-pressure area in the northern hemisphere and clockwise in the southern hemisphere. Taiwan is located in the northwestern Pacific, on one of the main typhoon paths, and is hit by three to five typhoons each year on average. Because of Taiwan's complex terrain, the rainfall is unevenly distributed in both time and space. Torrential typhoon rain frequently causes serious disasters such as flooding, landslides, and debris flows. At the same time, this rain is an important water resource that should be stored, and reservoirs are the most important and effective storage facilities for coping with the uneven rainfall. Therefore, reservoir inflow forecasting plays an important role in water resource planning and management.
Constructing a physically based mathematical model is difficult because the relationship between typhoon rainfall and reservoir inflow is extremely complex and highly nonlinear. As an attractive alternative, data-driven models based on artificial intelligence methods, such as neural networks, have been favored and successfully applied to reservoir inflow forecasting [1][2][3][4][5]. Support vector machines (SVMs) are a relatively new artificial-intelligence-based method. Developed by Vapnik for classification and then extended to regression [6,7], SVMs are based on statistical learning theory. Following the structural risk minimization (SRM) principle, SVMs minimize an upper bound on the expected error of a learning machine, which reduces the problem of overfitting. In addition, training an SVM amounts to solving a convex optimization problem, which guarantees a unique, globally optimal solution. A more detailed treatment of SVMs can be found in several textbooks [8,9].
SVMs have been proposed as alternative data-driven tools in many fields [10][11][12][13][14] and have excellent generalization ability. In hydrology, applications such as time series forecasting have been reported in recent years [15][16][17][18][19][20][21][22][23][24][25][26][27][28]. Although SVMs are outstanding data-driven tools for regression, their generalization ability is reduced when SVM regression is trained on information that is excessively noisy or only weakly related to the target. In hydrological cases, this problem appears for high flows: because low values vastly outnumber high values, the generalization is dominated by the low values. Moreover, for long lead-time forecasting, SVMs can draw on only limited and noisy information. In such cases, the model cannot forecast the inflow well.
To address these problems, this study uses a self-organizing map (SOM) to group the SVM inputs in advance. Within each group, the inputs belong to similar inflow processes and are therefore highly related. Hence, the forecasts of SVMs developed from inputs in the same cluster may be more accurate.
The SOM, introduced by Kohonen [29,30], is a special category of artificial neural networks (ANNs). It projects a high-dimensional input space onto a low-dimensional topology, allowing the number of clusters to be determined by inspection. This capability enables the discovery of relationships within complex data and has been widely used in recent years [31][32][33][34][35][36][37][38][39][40]. Furthermore, the clustering performance of the SOM has been reported to be better than that of conventional clustering methods [41][42][43].
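For readers unfamiliar with the SOM, the projection described above can be illustrated with a minimal NumPy sketch. It implements a standard online SOM with a Gaussian neighborhood; the decay schedules, the initialization, and the names `train_som` and `best_matching_units` are our own illustrative choices, not the configuration used in this study (see Kohonen [44] for the full algorithm).

```python
import numpy as np


def train_som(data, rows, cols, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM with online updates; return weights (rows*cols, n_features).

    Neuron k sits at grid position divmod(k, cols) (row-major numbering).
    The learning-rate and radius schedules are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_samples = data.shape[0]
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Grid coordinates of every neuron, used by the neighborhood function.
    grid = np.array([divmod(k, cols) for k in range(rows * cols)], dtype=float)
    # Initialize the weight vectors with randomly drawn training samples.
    weights = data[rng.integers(0, n_samples, rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighborhood radius
        x = data[rng.integers(0, n_samples)]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)      # squared grid distance to BMU
        h = np.exp(-dist2 / (2.0 * sigma ** 2))            # Gaussian neighborhood
        weights += lr * h[:, None] * (x - weights)         # pull neighbors toward x
    return weights


def best_matching_units(data, weights):
    """Index of the nearest neuron (Euclidean distance) for each sample."""
    dist2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(dist2, axis=1)
```

Samples from well-separated groups end up on different neurons, which is the property the clustering step relies on.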
To improve reservoir inflow forecasting, this study proposes an approach consisting of a SOM-based clustering method and SVMs. For each event and lead time, the SOM density map is obtained using only the inflows of the past two hours as inputs. Then, SVMs are built on the results of the SOM-based clustering to forecast reservoir inflow. Finally, the proposed approach is applied to the Feitsui Reservoir watershed in northern Taiwan to produce 1 to 3 h ahead inflow forecasts. A flowchart of the proposed model is illustrated in Figure 1.

Methodology
In this paper, the SOM-SVM model, which combines the SOM with SVMs, is proposed. Details of the SOM can be found in Kohonen [44]; for SVMs, see Vapnik [8].

Two SOM-SVM Models.
In this study, to avoid a lack of variability in the information available to each neuron, which would reduce its learning ability, an enlarged training data set, referred to as a training form, is used. The enlarged training data set comprises the training data set of each neuron together with the training data sets of all other neurons in the same specified region. To obtain well-performing forecasts more efficiently, two different specified-region strategies are defined; their concept is illustrated in the flowchart in Figure 1.

In the first strategy, used by the SOM-SVM1 model, the specified regions are defined according to the different inflow processes. That is, the data sets on the SOM feature map are divided into training forms, each of which collects the neurons that share the same rainfall-runoff process. The specified-region strategy of the SOM-SVM1 model is adopted from Hsu et al. [31]; however, whereas Hsu et al. defined five regions, only four are used in this study. The training forms of the SOM-SVM1 model are denoted by

$$TF_r = \bigcup_{N_j \in R_r} D_j, \tag{1}$$

where $N_j$ denotes the $j$th neuron, $D_j$ its training data set, and $R_r$ the set of neurons having the same rainfall-runoff process in the $r$th region. In general, four rainfall-runoff processes can be considered as specified regions in typhoon events: the increasing inflow region, the base flow region, the peaking hydrograph region, and the recession region. However, for some SOM learning results, the rainfall-runoff processes cannot be clearly separated, especially among the increasing inflow, peaking hydrograph, and recession regions. In particular, when the peaking hydrograph region is not easily separated on the feature map, only three rainfall-runoff training forms can be identified.

Unlike SOM-SVM1, the specified regions of the SOM-SVM2 model are chosen to retain the neurons with the strongest relationships. Based on forecasting performance and a simple definition of the feature relationship, for each neuron on the feature map the SOM-SVM2 model collects the training data sets of specified higher-relation neurons as its enlarged training form and uses it to train the SVM model. The higher-relation neurons of a neuron are the neuron itself and the neurons in the cross area around it, that is, its immediate neighbors above, below, to the left, and to the right. For an $n \times n$ SOM feature map, the training form of neuron $i$ is therefore denoted by

$$TF_i = D_{i-n} \cup D_{i-1} \cup D_i \cup D_{i+1} \cup D_{i+n}, \tag{2}$$

where $i-1, i-n > 0$, $i+1, i+n \le n^2$, and $n$ is an integer. For an interior neuron, the enlarged training form thus combines the training data sets of the four surrounding neurons with its own. When $i-1$ or $i-n \le 0$, or $i+1$ or $i+n > n^2$, the corresponding neuron is removed from (2), so an edge neuron can adopt the training data sets of only two or three surrounding neurons. Consequently, each training form contains the training data sets of five neurons, or of three or four neurons for the edge neurons. The training forms generated by the two strategies are used to train the SVM models. Then, in each specified region, the trained SVM models generate forecasts from the forecast forms of each neuron, where the forecast forms contain only the forecast data sets inside that neuron.
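The cross-region rule described above can be sketched as a small function. This is a minimal sketch assuming the neurons are numbered 1 to n² row by row (Figure 3 gives the actual layout); it checks grid rows and columns rather than only index bounds, so that, for example, neuron 5 at the end of the first row is not paired with neuron 6 at the start of the second. The name `cross_neighbors` is hypothetical.

```python
def cross_neighbors(i, n):
    """Neurons whose training data make up the enlarged training form of neuron i.

    Assumes neurons are numbered 1..n*n row by row (an assumption; the
    actual layout is given by the SOM figure). Returns neuron i plus its
    up/down/left/right neighbors that exist on the n x n map.
    """
    if not 1 <= i <= n * n:
        raise ValueError("neuron ID out of range")
    row, col = divmod(i - 1, n)
    members = [i]
    if row > 0:
        members.append(i - n)  # neighbor above
    if row < n - 1:
        members.append(i + n)  # neighbor below
    if col > 0:
        members.append(i - 1)  # neighbor to the left
    if col < n - 1:
        members.append(i + 1)  # neighbor to the right
    return sorted(members)


for neuron in (1, 10, 17):
    print(neuron, cross_neighbors(neuron, 5))
```

Under the row-major assumption, neuron 1 (a corner) draws on three neurons, neuron 10 (an edge) on four, and neuron 17 (an interior neuron) on five, in line with the five, four, or three neurons stated above.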

The Study Area and Data.
In this paper, all the SVM-based models are applied to the Feitsui Reservoir watershed in northern Taiwan. Feitsui Reservoir is located downstream of three major tributaries (Kingkwa Creek, Diyu Creek, and Peishih Creek). It has a surface area of 10 km², a mean depth of 40 m, a maximum depth of 120 m, a full capacity of 406 million m³, and a total watershed area of 303 km² (see Figure 2). Feitsui Reservoir supplies water to Taipei city (the capital of Taiwan); thus, it is the most important reservoir in northern Taiwan. The rainfall data were collected from 1988 to 2008. The maximum and average yearly rainfalls are 5736.6 mm and 3808.6 mm, respectively. The 22 typhoon events used for model development are presented in Table 1. For each event's forecast, the 22 events are divided into two sets: the other 21 events for training and the target event for testing.
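The event-wise split described above is a leave-one-event-out scheme. A minimal sketch follows; the event labels are placeholders rather than the typhoon names in Table 1, and the function name is illustrative.

```python
def leave_one_event_out(events):
    """Yield (test_event, training_events): each event is held out once for testing."""
    for k, test_event in enumerate(events):
        # All other events form the training set for this split.
        training_events = events[:k] + events[k + 1:]
        yield test_event, training_events


# Placeholder labels for the 22 typhoon events (not the names in Table 1).
events = [f"E{k:02d}" for k in range(1, 23)]
for test_event, training_events in leave_one_event_out(events):
    pass  # build the SOM and SVMs from training_events, then forecast test_event
```

Each of the 22 splits therefore trains on 21 events, which is why 22 different SOMs are generated later in the study.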

Determination of Lag Length.
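For the model construction in this paper, the form of the model inputs (the forecast form) must be decided first. Each forecast form of all the typhoon events is

$$\hat{Q}_{t+\Delta t} = f\left(Q_t, Q_{t-1}, \ldots, Q_{t-l_Q+1};\; R_{g,t}, R_{g,t-1}, \ldots, R_{g,t-l_R+1}\right), \qquad g = 1, \ldots, G,$$

where $t$ is the current time, $\Delta t$ is the lead-time period (from 1 to 3 h), $R_{g,t}$ is the rainfall at the $g$th gauge at time $t$, $G$ is the number of rain gauges, $Q_t$ is the inflow at time $t$, $f$ is the forecasting function, and $l_Q$ and $l_R$ denote the lag lengths of inflow and rainfall, respectively.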
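The forecast form above can be assembled from hourly series with a short NumPy sketch. The function name, the array layout (one row of rainfall per gauge), and the ordering of the lagged terms inside each input vector are assumptions made for illustration.

```python
import numpy as np


def forecast_forms(inflow, rain, lag_q, lag_r, lead):
    """Build (inputs, target) pairs for Q[t + lead] = f(Q[t..t-lag_q+1], R[g, t..t-lag_r+1]).

    inflow: shape (T,), hourly inflow Q_t.
    rain:   shape (G, T), hourly rainfall at G gauges (assumed layout).
    Returns X with shape (n, lag_q + G * lag_r) and y with shape (n,).
    """
    inflow = np.asarray(inflow, dtype=float)
    rain = np.atleast_2d(np.asarray(rain, dtype=float))
    start = max(lag_q, lag_r) - 1  # first t with a complete lag history
    X, y = [], []
    for t in range(start, len(inflow) - lead):
        q_part = inflow[t - lag_q + 1 : t + 1][::-1]              # Q_t, Q_{t-1}, ...
        r_part = rain[:, t - lag_r + 1 : t + 1][:, ::-1].ravel()  # R_{g,t}, R_{g,t-1}, ... per gauge
        X.append(np.concatenate([q_part, r_part]))
        y.append(inflow[t + lead])
    return np.array(X), np.array(y)


X, y = forecast_forms([0, 1, 2, 3, 4, 5], [[10, 11, 12, 13, 14, 15]], lag_q=2, lag_r=2, lead=1)
```

With two-hour lags and one gauge, the first input vector is (Q₁, Q₀, R₁, R₀) and its target is Q₂.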
For the reservoir inflow forecasting model, choosing appropriate input lag lengths is an important part of model construction. In this paper, the RPE criterion is applied to determine the lag length of the inputs. The RPE is defined by

$$\mathrm{RPE} = \frac{\mathrm{RMSE}(l) - \mathrm{RMSE}(l+1)}{\mathrm{RMSE}(l)} \times 100\%,$$

where $\mathrm{RMSE}(l)$ and $\mathrm{RMSE}(l+1)$ are the root-mean-square errors (RMSE) of the models with lag lengths $l$ and $l+1$, respectively. The RMSE is obtained as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(Q_t - F_t\right)^2},$$

where $Q_t$ is the observed inflow at time $t$, $F_t$ is the predicted inflow at time $t$, and $N$ is the number of time steps. In general, the RMSE decreases as lag terms are added; when the RPE falls below 5%, the lag length is no longer increased. Using this procedure, the most appropriate lag lengths of typhoon rainfall, $l_R$, and inflow, $l_Q$, for a given lead time, $\Delta t$, can be determined. Lag lengths of two hours for both rainfall and inflow are used to forecast the 1 to 3 h ahead inflows. The general form of the SVM-based models with these lag lengths is then

$$\hat{Q}_{t+\Delta t} = f\left(Q_t, Q_{t-1};\; R_{g,t}, R_{g,t-1}\right), \qquad g = 1, \ldots, G.$$

Additionally, for a fair model comparison, the two parameters of the SVM regression are simply set to 1 and 0.1, respectively.
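The RPE stopping rule can be sketched as a loop that adds one lag at a time. Here `rmse_for_lag` is a hypothetical callable that trains a model with the given lag length and returns its RMSE, and the RPE is assumed to be the RMSE reduction relative to RMSE(l); `max_lag` is an illustrative safety cap not mentioned in the text.

```python
def rpe(rmse_l, rmse_next):
    """Percentage reduction in RMSE when the lag grows from l to l + 1 (assumed definition)."""
    return (rmse_l - rmse_next) / rmse_l * 100.0


def select_lag(rmse_for_lag, max_lag=10, threshold=5.0):
    """Increase the lag until one more term cuts the RMSE by less than `threshold` percent."""
    lag = 1
    current = rmse_for_lag(lag)
    while lag < max_lag:
        nxt = rmse_for_lag(lag + 1)
        if rpe(current, nxt) < threshold:
            break  # the extra lag term no longer pays off
        lag, current = lag + 1, nxt
    return lag


# Illustrative RMSE values per lag length (not results from this study).
table = {1: 100.0, 2: 80.0, 3: 78.0, 4: 77.0}
chosen = select_lag(lambda l: table[l])
```

With these illustrative values, going from lag 1 to 2 cuts the RMSE by 20%, but going from 2 to 3 cuts it by only 2.5%, so lag 2 is selected.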

Criteria.
To evaluate the individual and average performance of the original SVM model and the two SOM-SVM models, three indices are used: (1) the relative root-mean-square error (RRMSE), computed for each event; (2) the mean root-mean-square error (MRMSE):

$$\mathrm{MRMSE} = \frac{1}{M}\sum_{k=1}^{M}\mathrm{RMSE}_k;$$

and (3) the mean coefficient of efficiency (MCE):

$$\mathrm{MCE} = \frac{1}{M}\sum_{k=1}^{M}\mathrm{CE}_k, \qquad \mathrm{CE}_k = 1 - \frac{\sum_{t}\left(Q_t - F_t\right)^2}{\sum_{t}\left(Q_t - \bar{Q}\right)^2},$$

where $\bar{Q}$ is the average of the observed inflows, $M$ is the number of forecast typhoon events, and the sums in $\mathrm{CE}_k$ run over the time steps of the $k$th event.

The difference between the SOM-SVM1 and SOM-SVM2 models lies in the training form selection strategy described above. To highlight the advantage of the SOM-SVM1 selection strategy, forecasts from the original SVM model are generated for comparison. Furthermore, because the clusters are formed from lead-time information, the SOM-SVM1 clustering strategy may prevent forecasts from obtaining important information held by neighboring neurons in different regions. The SOM-SVM2 selection strategy, which considers more highly related inputs, is therefore established. Finally, the performance of the three SVM-based models is evaluated and intercompared.
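The per-event RMSE and CE, and their averages over events (MRMSE and MCE), can be computed as in the following sketch. RRMSE is left out because its normalization is not spelled out in this section; the function names are illustrative.

```python
import numpy as np


def rmse(obs, fc):
    """Root-mean-square error of one event's forecasts."""
    obs, fc = np.asarray(obs, float), np.asarray(fc, float)
    return float(np.sqrt(np.mean((obs - fc) ** 2)))


def coefficient_of_efficiency(obs, fc):
    """CE = 1 - sum((Q - F)^2) / sum((Q - mean(Q))^2); 1 means a perfect forecast."""
    obs, fc = np.asarray(obs, float), np.asarray(fc, float)
    return float(1.0 - np.sum((obs - fc) ** 2) / np.sum((obs - obs.mean()) ** 2))


def mean_over_events(metric, events):
    """Average a per-event metric over M events given as (observed, forecast) pairs."""
    return float(np.mean([metric(obs, fc) for obs, fc in events]))


obs, fc = [1, 2, 3, 4], [1, 2, 3, 5]
events = [(obs, obs), (obs, fc)]
mce = mean_over_events(coefficient_of_efficiency, events)  # MCE
mrmse = mean_over_events(rmse, events)                     # MRMSE
```

For the toy event with observations (1, 2, 3, 4) and forecasts (1, 2, 3, 5), RMSE = 0.5 and CE = 1 - 1/5 = 0.8; averaging with a perfect event gives MCE = 0.9.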

SOM Clustering.
A SOM with a small dimension is tried first. If the clustering result is reasonable and satisfactory, the cluster analysis is accepted; otherwise, a SOM with a larger dimension is chosen to analyze the input patterns. This step is repeated until a satisfactory result is obtained, that is, until the inputs within each grid share the same characteristics associated with a certain inflow process. The SOMs for the different events and lead-time forecasts are all generated with this same procedure.
Taking the 1 h ahead forecasts of Typhoon Polly as an example, the SOM is constructed from the other 21 events. Based on our experiments, a 5 × 5 SOM is adopted, so the competitive layer contains 25 grids. After the SOM clustering is performed, the corresponding feature map and density map are obtained.
In the feature map, forecast forms with similar rainfall-runoff processes are located in neighboring grids, whereas forecast forms with significantly different rainfall-runoff processes are located in grids that are distant from each other. This characteristic is also retained in the density map, which is derived from the feature map. The ID number and location of each neuron of the 5 × 5 SOM generated from the 21 typhoon events (excluding Typhoon Polly) are presented in Figure 3, and the corresponding density map is shown in Figure 4. In Figure 4, the number inside each neuron indicates how many training data sets are projected onto that topology point.
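Given the best-matching neuron of each training sample, the density map is simply a count per neuron. A minimal NumPy sketch, assuming 0-based neuron indices in row-major order (so neuron ID = index + 1); the function name is illustrative.

```python
import numpy as np


def density_map(bmu_indices, rows, cols):
    """Count how many training samples are mapped to each neuron of a rows x cols SOM."""
    counts = np.bincount(np.asarray(bmu_indices), minlength=rows * cols)
    return counts.reshape(rows, cols)


# Toy example: six samples mapped onto a 2 x 3 map.
m = density_map([0, 0, 1, 4, 4, 4], rows=2, cols=3)
```

Here `m` is `[[2, 1, 0], [0, 3, 0]]`; its maximum and minimum correspond to the largest and smallest neuron densities used when discussing how unevenly the training forms are spread over the map.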

SOM-SVM1 Model.
According to the above SOM results, the map can be divided into four specified regions based on the rainfall-runoff process characteristics of the neurons. The specified regions are shown in Figure 5, and the association between the four regions and the rainfall-runoff processes is illustrated in Figure 6. The regions are (1) the increasing inflow region (region 1), (2) the base flow region (region 2), (3) the peaking hydrograph region (region 3), and (4) the recession region (region 4). For example, region 1, located in the upper left area of the SOM (Figure 5), represents periods of increasing reservoir inflow (Figure 6). In this clustering process, all the forecast forms of each event are assigned to neurons. Then, according to (1), the training form of each specified region can be expressed as

$$TF_r = \bigcup_{N_j \in R_r} D_j, \qquad r = 1, 2, 3, 4,$$

where, as in (1), $R_r$ denotes the $r$th specified region, $N_j$ the $j$th neuron, and $D_j$ the training data set of $N_j$. Repeating this process for every event yields 22 SOMs, each generated from the other 21 typhoon events. Among the 22 events, the longest lasts 107 h and the shortest 33 h, so the SOMs are not generated from data sets of the same length. Consequently, although each SOM can be divided into four specified regions, each map always differs slightly from the other 21. The training forms of all neurons in the same specified region are collected as the training set of that region, and an SVM regression model is trained on each set to generate forecasts from the forecast forms of the neurons in that region. Thus, the SOM-SVM1 model is established.
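The SOM-SVM1 training and forecasting steps can be sketched as one regressor per specified region, with each forecast form routed to the model of its neuron's region. `make_model` is any factory returning an object with `fit` and `predict`. Assuming the two SVR parameters set to 1 and 0.1 are the penalty C and the ε-insensitive tube width, scikit-learn's `SVR(C=1.0, epsilon=0.1)` would be one choice; the kernel is not stated in the text, and all function names here are hypothetical.

```python
import numpy as np


def fit_region_models(X, y, bmu, region_of_neuron, make_model):
    """Train one regressor per specified region (SOM-SVM1-style piecewise model).

    bmu: neuron index of each training sample.
    region_of_neuron: dict mapping neuron index -> region label (e.g. 1..4).
    make_model: factory returning an object with fit(X, y) -> self and predict(X),
        e.g. lambda: sklearn.svm.SVR(C=1.0, epsilon=0.1) (assumed settings).
    """
    regions = np.array([region_of_neuron[int(k)] for k in bmu])
    models = {}
    for r in np.unique(regions).tolist():
        mask = regions == r  # all training forms whose neuron lies in region r
        models[r] = make_model().fit(X[mask], y[mask])
    return models


def predict_region_models(models, X, bmu, region_of_neuron):
    """Forecast each input with the model of the region its neuron belongs to."""
    return np.array([models[region_of_neuron[int(k)]].predict(x[None, :])[0]
                     for x, k in zip(X, bmu)])
```

Keeping the regressor behind a factory lets the same routing code be tested with a trivial model and then run with an SVR.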

SOM-SVM2 Model.
The SOM-SVM1 model above is built on the strong relationships within each of the four reservoir inflow processes. However, for different events, some training forms may be allocated to different neurons and, thus, to different specified regions. Such training forms may still carry important information for the other specified regions but cannot pass it to them. In addition, because the 22 typhoon events differ in strength and duration, their training forms are not distributed evenly over the 25 neurons of the feature map: the maximum density for the 5 × 5 feature map is 295, and the minimum is 12. For different flow processes, high-density neurons may hold less information useful to other neurons, whereas low-density neurons may hold more. Nevertheless, much of the relevant information can be found in nearby neurons. Furthermore, because the clusters consider lead-time information, some forecast forms of the SOM-SVM1 model cannot pass information to the forecasts in other regions. To address these issues, the SOM-SVM2 model is proposed.
Based on the feature map generated by the SOM discussed in Section 3.2.1, a different enlarged training form is adopted here. Apart from the training data set of each neuron itself, only the training data sets of the surrounding neurons in the cross region are adopted as its training form. Thus, at most four surrounding neurons in the cross region contribute their data to the training form of each neuron, while for neurons located on the edge of the competitive layer, only two or three surrounding neurons are adopted. Then, according to (2), the training form of neuron $i$ on an $n \times n$ SOM feature map can be written as

$$TF_i = D_{i-n} \cup D_{i-1} \cup D_i \cup D_{i+1} \cup D_{i+n}, \qquad i = 1, 2, \ldots, n^2,$$

where a neighboring term is included only if $i-1, i-n > 0$ and $i+1, i+n \le n^2$, $D_i$ is the training data set of neuron $i$, $i$ is the neuron ID number, and $n$ is an integer. Table 2 lists the neuron ID numbers that make up the enlarged training data set of each neuron for the Polly Typhoon event in the SOM-SVM2 model. Taking neuron ID numbers 1, 10, and 17 in Table 2 as examples, their enlarged training data sets are presented in Figure 7. The same enlarged training data sets are generated in the same way for each typhoon event. Unlike in the SOM-SVM1 model, the enlarged training data set of SOM-SVM2 is not selected according to the inflow process; the training forms are selected solely from the neuron locations. This selection strategy not only considers the training data with higher relationships but also does not need to exclude training data sets that belong to a different specified region. Finally, the forecast forms of each neuron are passed to the SVM trained on that neuron's enlarged training form to forecast the inflow.
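The SOM-SVM2 procedure can be sketched as one model per neuron, trained on the data of the neuron and its cross-region neighbors. This is a minimal sketch assuming 0-based, row-major neuron indices; `make_model` is any factory with `fit`/`predict` (for instance an SVR with the assumed settings C = 1 and ε = 0.1), and all names are hypothetical.

```python
import numpy as np


def cross_members(k, rows, cols):
    """0-based neuron k plus its existing up/down/left/right neighbors (row-major grid)."""
    r, c = divmod(k, cols)
    members = [k]
    if r > 0:
        members.append(k - cols)  # above
    if r < rows - 1:
        members.append(k + cols)  # below
    if c > 0:
        members.append(k - 1)     # left
    if c < cols - 1:
        members.append(k + 1)     # right
    return members


def fit_neuron_models(X, y, bmu, rows, cols, make_model):
    """Train one model per neuron on its enlarged (cross-region) training form."""
    bmu = np.asarray(bmu)
    models = {}
    for k in range(rows * cols):
        mask = np.isin(bmu, cross_members(k, rows, cols))
        if mask.any():  # skip neurons whose cross region holds no training data
            models[k] = make_model().fit(X[mask], y[mask])
    return models


def predict_neuron_models(models, X, bmu):
    """Each forecast form is predicted by the model of the neuron it falls in."""
    return np.array([models[int(k)].predict(x[None, :])[0] for x, k in zip(X, bmu)])
```

Compared with the region-based routing of SOM-SVM1, the only change is how the training mask is built: from grid position rather than from an inflow-process label.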

Comparison between the Original SVM and the SOM-SVM1.
In the SOM-SVM1 model, the SOM process clusters the forecast forms into neurons according to their input characteristics and then divides them into regions related to the different inflow processes of the typhoon events. Consequently, forecasts can be generated from highly related inputs while less related inputs are ignored. Taking Typhoon Polly, whose rising and falling rainfall-runoff processes differ, as an example, the SOM-SVM1 model generates well-performing forecasts by considering only the forecast forms that belong to the same inflow region. By contrast, the original SVM model generates forecasts regardless of whether the forecast forms contain extreme values, whereas the SOM-SVM1 model sorts the forecast forms according to the rainfall-runoff process to which they belong. That is, the SOM-SVM1 model is designed to strengthen the forecasting ability for different rainfall-runoff processes.
The inflow results generated by the original SVM model and the SOM-SVM1 model for 1 h ahead forecasting are listed in Table 3. Compared with the original SVM model, the SOM-SVM1 model yields smaller RRMSE values for every typhoon event. Only for Typhoon Xangsane does the original SVM model achieve a better CE value, and even in that case both models reach CE values of 0.9 or higher. Actually, the CE values generated from […]

When the SVM-based models are established from such arranged inputs (selected from closer, more meaningful neurons) instead of all the inputs, the models are strengthened by the selection strategies. The purpose of this study is to find an effective selection strategy for typhoon events. Figures 8(a), 8(b), and 8(c) are compared again here. Figures 8(a) and 8(b) show that the SOM-SVM1 model generates more concentrated forecasts than the original SVM model, as mentioned above. However, Figure 8(c) shows that the forecasts of the SOM-SVM2 model are even more concentrated than those of the other two SVM-based models. The 1 h ahead forecasting results of all three SVM-based models are shown in Table 3. The SOM-SVM1 model outperforms the original SVM model, as mentioned above. Although all RRMSE values of the SOM-SVM2 model are lower than those of both the original SVM and SOM-SVM1 models, the SOM-SVM2 model does not achieve a better CE value than the SOM-SVM1 model for every typhoon event. Nevertheless, unlike the other two SVM-based models, SOM-SVM2 achieves CE values above 0.9 for all events. Table 5 lists the MCE improvements of the SOM-SVM1 and SOM-SVM2 models over the original SVM model, and the same improvements are shown in Figure 12. The SOM-SVM2 model improves on the original SVM model by more than 20% at every lead time, reaching 35.4% for 3 h ahead forecasts, which is excellent compared with the other two models.

Finally, the SOM-SVM1 and SOM-SVM2 models are compared again. As mentioned above, all the CE values of the SOM-SVM2 model exceed 0.9, whereas only 19 values exceed 0.9 for the SOM-SVM1 model in 1 h ahead forecasts. However, this does not mean that SOM-SVM2 attains a higher 1 h ahead CE value than SOM-SVM1 for every typhoon event: as Table 3 shows, the SOM-SVM1 model has the better CE value for 8 of the 22 events. Nevertheless, the MCE indicates that the average CE value of the SOM-SVM2 model is higher than that of the SOM-SVM1 model. In any case, the RRMSE values in Table 3 show that the SOM-SVM2 model generates more accurate forecasts.
Comparing the SOM-SVM1 and SOM-SVM2 models, the improvements achieved by SOM-SVM2 are listed in Table 6 and shown in Figure 13. For 1 h ahead forecasts, the MCE of the SOM-SVM2 model is only 0.009 higher than that of the SOM-SVM1 model, an improvement of just 0.96%. However, for 2 h and 3 h ahead inflow forecasts, the SOM-SVM2 model clearly improves, by 2.25% and 10.08%, respectively. This means that training on the data sets selected by the SOM-SVM2 strategy yields better forecasts than training on those selected by the SOM-SVM1 strategy. As the lead time increases, SOM-SVM2 shows an increasingly strong improvement over the other two models.
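The relation between the improvements over the original SVM and the improvement of SOM-SVM2 over SOM-SVM1 is simple arithmetic: when two models' MCE gains are both measured against the same baseline, the gain of one over the other is the ratio of their (1 + gain) factors. A small sketch, with illustrative function names:

```python
def improvement(new, base):
    """Relative improvement (%) of `new` over `base`."""
    return (new - base) / base * 100.0


def chained_improvement(gain_a, gain_b):
    """Improvement (%) of model A over model B when both gains (%) share the same baseline."""
    return ((1 + gain_a / 100) / (1 + gain_b / 100) - 1) * 100


# 3 h ahead MCE gains over the original SVM: SOM-SVM2 35.4%, SOM-SVM1 23.0%.
gain_3h = chained_improvement(35.4, 23.0)
```

With the 3 h ahead gains, 1.354 / 1.230 - 1 ≈ 10.08%, matching the 10.08% reported for 3 h ahead forecasts; small differences at other lead times can arise from rounding of the reported percentages.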
In brief, the SOM-SVM2 model forecasts typhoon events using a simple structural concept and a limited set of more highly related data. For the SOM-SVM1 model, the data distribution on the SOMs must first be clustered according to the different reservoir inflow processes, so that the forecasts are generated from fewer, more highly related data. The SOM-SVM2 model, in contrast, is robust because its cross-region selection strategy does not require knowledge of how the reservoir inflow processes are distributed on the SOMs, yet it still collects fewer, more highly related data. Furthermore, the SOM-SVM2 model forecasts typhoon events better than both the original SVM model and the SOM-SVM1 model.

Summary and Conclusions
The objective of this paper is to develop a precise and stable reservoir inflow forecasting model for reservoir operations during typhoon periods. For this purpose, instead of using the original SVM model alone, two SOM-based strategies for selecting enlarged training forms are combined with the SVM to construct piecewise nonlinear models. The first, the SOM-SVM1 model, uses inputs selected according to the inflow processes. The second, the SOM-SVM2 model, builds the training form of each neuron from its neighboring neurons on the SOM feature map.
In conclusion, at least three reasons favor the use of the SOM-SVM2 model for inflow forecasts. First, both SOM-based SVM models are built on the SOM clustering, which clearly strengthens forecasting ability. Second, the SOM-SVM2 model can construct its training forms without considering the clusters on the SOM feature map: its enlarged training form contains the training data sets of each neuron and its surrounding neurons, which are selected solely from their positions on the SOM feature map, without an external inflow-process definition of higher relationships. Finally, although the SOM-SVM2 model improves the MCE of 1 h reservoir inflow forecasts by only 0.96% compared with the SOM-SVM1 model, it improves the MCE of 2 h and 3 h lead-time forecasts by 2.25% and 10.08%, respectively. Moreover, compared with the original SVM model, the SOM-SVM2 model improves the MCE of reservoir inflow forecasts by 20.9%, 21.2%, and 35.4% for 1 h, 2 h, and 3 h lead times, respectively. In other words, the advantage of the SOM-SVM2 model becomes most obvious at longer lead times. The proposed SOM-SVM2 model is recommended as an alternative to the existing models because of its accuracy, robustness, and efficiency. This modeling technique is expected to be useful in improving reservoir inflow forecasting.

Figure 1: Flowchart showing development of the two different SOM-SVM models.

Figure 2: The Feitsui Reservoir watershed in northern Taiwan.

Figure 3: The ID number and location of each neuron for a SOM with 5 × 5 dimensions.

Figure 5: The classification results for the SOM-SVM1 model according to the 21 typhoon events (taking Typhoon Polly as the testing event).

Figure 6: The relationship between the four classification regions and the rainfall-runoff processes for the SOM-SVM1 model.

Figure 7: The neurons containing the enlarged training data set for the SOM-SVM2 model (taking neuron ID numbers 1, 10, and 17 as examples).

Figure 8: The observed inflows versus the forecasts obtained by (a) the original SVM model, (b) the SOM-SVM1 model, and (c) the SOM-SVM2 model.

Figure 9: Comparison of the observed inflow with the 1 h ahead forecasts for Typhoon Polly.

Figure 10: Comparison of the observed inflow with the 2 h ahead forecasts for Typhoon Polly.

Figure 11: Comparison of the observed inflow with the 3 h ahead forecasts for Typhoon Polly.

Figure 12:

Figure 13: The CE values for 1 h ahead forecasts from three SVM-based models.

Table 1: The 22 typhoon events used in the modeling.

Table 2: The neurons containing the testing data set (1st column) and the neurons containing the enlarged training data set (2nd column) for the SOM-SVM2 model.

Table 3: Performance comparison of the SVM, SOM-SVM1, and SOM-SVM2 models for 1 h ahead forecasts using RRMSE and CE as criteria.
while the SOM-SVM1 model has 19. In addition, Figures 8(a) and 8(b) show that the SOM-SVM1 model generates forecasts with significantly less scatter than the original SVM model when plotted against the measured values. Further lead-time results are presented in Table 4. It can be found that the average values of both the value

Table 4: Performance comparison of the SVM and SOM-SVM models for different lead times using MRMSE and MCE as criteria.

SVM1 model. This is because, after the SOM clustering process, all the forecast forms with similar input characteristics are arranged in closer, or the same, neurons. As the SVM-based models are established from these

Table 5: The improvement in MCE due to the use of SOM-SVM2 and SOM-SVM1 instead of SVM.

Table 6: The improvement in MCE due to the use of SOM-SVM2 instead of SOM-SVM1.