Multiobjective Optimal Algorithm for Automatic Calibration of Daily Streamflow Forecasting Model

1School of Hydropower and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China 2Hubei Key Laboratory of Digital Valley Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China 3State Key Laboratory of Disaster Prevention and Reduction for Power Grid Transmission and Distribution Equipment, Changsha 410129, China 4State Grid Hunan Electric Company Disaster Prevention and Reduction Center, Changsha 410129, China 5School of Engineering and Technology, Hubei University of Technology, Wuhan 430068, China


Introduction
Some research has been reported that streamflow processes are affected by several known factors, such as rainfall and soil moisture, and many potential factors.Consequently, the streamflow processes always result in being nonlinear and time-varying.And the differences between the characteristics of high flow processes and low flow processes are very significant most of time.Therefore, it tends to be very difficult to predict the streamflow processes.
Firstly, selecting and constructing a proper model are the great and first important step of the research.Statistics forecast models [1], such as autoregressive (AR) model and autoregressive integrated moving average (ARIMA) model, are used the most.However, these types of models often have poor performance when outside or near the limits of the data [2].Due to the strong ability of Artificial Neural Networks (ANNs) with nonlinear mapping, they have been employed in the field of streamflow forecasting.However, the ANNs also have several shortcomings, such as overfitting, falling into local optimization, and slow convergence speed, especially when dealing with complex hydrological problems.Support vector machine (SVM), which is proposed by Vapnik [3], is one of the most effective algorithms for prediction in this decade.SVM basically involves solving a quadratic programming problem.In theory, SVM can gain the global optimum solution of the original problem.During the past few years, SVM has been widely employed for streamflow 2 Mathematical Problems in Engineering forecasting, and the performance of SVM is shown to be better than ANNs [4][5][6][7][8].Thus, we build a daily streamflow forecasting model based on SVM here.
As the model is established, the next vital work is to make calibration of the hydrologic model.Calibration of hydrologic model is to find the optimal parameters which could make the model match the characteristic of the hydrology as good as possible.Usually, the hydrological models are calibrated under the framework of single-objective paradigm.And the major work is focused on choosing the single-objective function and the optimization strategy to optimize that measure.However, several research reports reveal that single-objective functions often cannot properly describe all characteristics of the hydrological models [9].Therefore, consequently, it stands to reason that multiobjective functions are needed for calibration of hydrologic model.In Gupta et al. [10], the advantages of a multiple-objective representation of the model calibration problem were discussed.Different with single-objective optimization, model calibration with multiobjective functions will result in a set of so-called equal good solutions.And each solution is not dominated by any one solution of the set of "equal good solutions."This set of "equal good solutions" is called the Pareto front or the set of nondominated solutions.The detail of the definition of Pareto front or nondominated solutions can be referred to [11].In general, two methods can be used: one is to transform the multiobjective optimization problem to single-objective optimization by setting weight coefficients to each objective function.Although this method is simple to implement, it can only gain a single Pareto solution at a time.The other method is to solve the multiobjective optimization problem directly [10].In the recent years, a variety of multiobjective optimization algorithms are proposed to optimize hydrologic models.In the research of hydrologic modeling, Yapo et al. [12] presented a multiobjective optimization algorithm MOCOM-UA for multiple-objective global optimization calibration of hydrological models and demonstrated its features and capabilities by a simple example involving calibration of a conceptual rainfall-runoff model using two objectives, and some related research of hydrologic calibration and evaluation presented that the MOCOM-UA can efficiently generate a set of Pareto optimal solutions [10,[12][13][14][15][16][17][18]; Vrugt et al. [9] developed an effective and efficient algorithm MOSCEM-UA, which can gain a fairly uniform approximation of the Pareto frontier; Khu and Madsen [19] introduced the nondominated sorting genetic algorithm II (NSGA-II) into the multiobjective calibration of the MIKE11/NAM rainfall-runoff model; Gill et al. [20] modified the multiobjective particle swarm optimization (MOPSO) to account for multiobjective problems by introducing the Pareto rank concept.And the application of the new MOPSO in multiobjective calibration of hydrologic model has gained encouraging results; De Vos and Rientjes [21] used the NSGA-II algorithm for calibration of a feed-forward ANN model and the HBV conceptual model to compare the multiobjective performance of the two models; de Vos and Rientjes [22] presented results on the application of MOSCEM-UA, NSGA-II, and two other single-objective optimization algorithms for the training of artificial neural network rainfall-runoff models and showed that more than one objective function can be helpful in constraining the neural network training.
In this paper, a new multiobjective optimization method named multiobjective optimization method based on differential evolution with adaptive Cauchy mutation and Chaos searching (MODE-CMCS) is proposed to optimize a threeparameter support vector machine model for daily streamflow forecasting.And the efficacy of the new algorithm MODE-CMCS is compared with the NSGA-II algorithm on a three-parameter SVM based daily streamflow forecasting model.
This paper is organized as follows: Section 2 provides details of the proposed MODE-CMCS algorithm and a simple introduction about the NSGA-II and Differential Evolution (DE) algorithm is also given; Section 3 describes the performance metrics used for multiobjective algorithm evaluation in this research; in Section 4, the MODE-CMCS is first employed on solving 5 benchmark test problems; Section 5 presents the application of the MODE-CMCS to multiobjective parameter optimization of a three-parameter SVM based hydrological model for streamflow prediction; and conclusions are made in the last section.

Multiobjective Optimization Algorithms
In this section, we will give a simple introduction about the NSGA-II, and after that the details of the proposed multiobjective algorithm will be described.
2.1.NSGA-II.NSGA-II, which is an improvement over the NSGA, is first proposed by Deb et al. [23].It is one of the most effective and efficient algorithms for solving multiobjective problems.The flowchart of NSGA-II is shown in Figure 1.For more details about NSGA-II, readers are encouraged to refer to [23].
Several main features of the algorithm NSGA-II can be summarized as follows: (1) NSGA-II is significantly more efficient than the original version of the algorithm.And NSGA-II can reduce the computational complexity into ( 2 ), where  is the number of objective functions and  is the size of evolving population. ( where random( ) returns a random float number from a uniform random distribution from the section [0, 1]; random(1, ) returns a random real number from a uniform random distribution from the section [1, ]; CR is the crossover parameter.
DE adopts a greedy selection method, and then the next generation can be gained by otherwise. (3)

MODE-CMCS.
Details of the proposed multiobjective algorithm MODE-CMCS are discussed in this section.The MODE-CMCS mainly contains 7 operators as follows.
(a) Nondominated Sorting and Improved Crowd Distance Assignment Operator.The nondominated sorting method is the same as the NSGA-II.But the MODE-CMCS uses a more precise crowd distance assignment operator.For NSGA-II algorithm, the crowd distance of each point is got by calculating the sum of the two points on either side of this point along each of the objectives.But this method does not take into account the distribution of this point.For example, in Figure 2, suppose that the ranges of objective function 1 and objective function 2 have been scaled to the same scope, according to the crowd distance assignment of NSGA-II; the crowd distance of point A is 12, which is equal to that of point B, while the crowd distance of point C is 6.But it is obvious that the point A has better diversity than point B; that is to say, the crowd distance of point A should be larger than that of point B. Therefore, in this approach, we propose a more precise crowd distance assignment method as follows: where  = 1, 2, . . ., ,  is the number of individuals,  is the number of objective functions, and   and   are the distances of the th solution to its lower and upper adjacent solution along the th objective, respectively.According to (4), the crowd distance of point A is 228 while those of point B and point C are 211.3196and 114, respectively.It is evident that the new crowd distance measurement is more precise than that of NSGA-II.
(b) Archive Set Updating Operator.Similar with the improved Strength Pareto Evolutionary Algorithm (SPEA2) [31], the archive set, which is designed to store the nondominated solutions until now, is employed to enhance the algorithm convergence performance.Suppose the archive set is ; the operations on archive set can be expressed as Algorithm 1.
(c) Mutation Operator.The mutation operation here is similar to DE.But unlike DE, the random individuals   1 ,   2 , and   3 are chosen from the archive set rather than the parent population.As the individuals in the archive set are all nondominated, it can accelerate the convergence of the algorithm to some extent.Furthermore, the mutation control parameter  should be set larger at the beginning of the procedure to allow the algorithm with higher global searching ability and be set smaller at the end of the procedure to let the algorithm search within a local scope.Thus, an adaptive dynamic control mechanism for choosing the suitable value of  during the evolutionary progress is presented in this paper as where  is the current generation number and  max denotes the maximum generation number.
(d) Crossover Operator.The crossover operator is the same as DE's recombination operator.However, as mentioned in [30], the traditional constant CR cannot completely guarantee the optimization's ergodicity in the search space.Therefore, in this paper, we adjust the value of CR according to the progress of generation by where  is the current generation number and  max denotes the maximum generation number.
(e) Selection Operator.This operator uses the greedy strategy based on the concept of dominate, and it can be described as Algorithm 2.
(f) Adaptive Cauchy Mutation Operator.DE explores new space mainly by mutation and crossover operations.From ( 1) and ( 2), we can see that when the diversity of the population decreases to some extent, the algorithm will fall into local optimal solution.To overcome this premature convergence problem, the adaptive Cauchy mutation operator is employed to make algorithm escape out of local optimal area.In the procedure of DEMO-CMCS, we first calculate the diversity of each dimension of the population according to the following equation: If the value is below the preset threshold , the mutation operation will be carried out on this dimension as where   , is th dimension of the th individual, (0, 1) denotes a standard Cauchy variable,  is the coefficient of Cauchy mutation,  is the current generation number, and  max denotes the maximum generation number.
(g) Chaos Searching Operator.It is well known that Chaos has the characteristic of ergodicity, stochastic property, and "regularity"; literature [32] has revealed that the Chaos can enhance the search accuracy of algorithm when the searching scope of Chaos is small.Therefore, a Chaos searching operator with a small searching scope is introduced at the end of the procedure.In this research, the Chaos searching is based on the Logistic map, whose equation is as ∈ (0, 1) ,   ̸ = 0.25, 0.5, 0.75,  = 1, 2, . . ., where  is iteration number;   ,  = 1, 2, . . . is a chaotic series between 0 and 1 when  = 4.
The procedure of the Chaos searching operator in this paper is as Algorithm 3.
With the above 7 operators, the flowchart of the proposed multiobjective algorithm MODE-CMCS can be expressed as Figure 3.

Performance Metrics for Multiobjective Algorithm
In this paper, two common indexes are employed.Otherwise, a novel index for evaluating diversity of the solutions is proposed.The details of the three indexes are as follows.

Generational Distance Metric (GD).
GD is used to estimate the convergence of the algorithm.It measures how far the obtained set of nondominated solutions is from the real Pareto frontier.The definition of GD is as where  is the number of obtained nondominated solutions and   is the minimum Euclidean distance in the objective space between th obtained solution and solutions of the optimal Pareto set.

Generalized Spread Metric (SP).
The generalized spread metric SP measures the diversity of the obtained set of solutions [23].It is designed as where  is the number of obtained nondominated solutions;   is the Euclidean distance between th solution and ( + 1)th solution in the objective space;  is the mean of all distances   ;   and   are the Euclidean distance between the boundary solutions of the obtained nondominated set and the known set of optimal Pareto solutions.

Generalized Spread Metric Based Correlation Analysis (SPC).
In practical applications, the size of the obtained set of Pareto solutions may be different using different algorithms (in this research, NSGA-II generates 100 Pareto optimal solutions while MODE-CMCS generates 30).At this situation, taking SP as the diversity evaluation indicator is unsuitable and inequitable here, as the SP value is very sensitive with the length of sample.The explanation is as follows.
Suppose   and   are zero; we have two distance series  1 = { 1  1 , where SP 1 is the SP value of  1 ; SP 2 is the SP value of  2 ;  1 denotes the average distance of  1 ;  2 denotes the average distance of  2 .
As   and   are zero, this means that  ⋅  1 =  ⋅  2 .Therefore, it is obvious that the SP value is very sensitive with the length of distance series.
To overcome this problem, a novel index SPC is proposed in this paper.We convert the problem into calculating some indicator independent of sample length.
Then, the index SPC is defined as calculating the correlation coefficient between DIS and REF; the calculated equation is as The value of SPC can be used to measure the diversity of generated Pareto optimal solutions, and the larger the SPC is, the better the diversity of the set of Pareto optimal solutions is.

Numerical Experiments and Results
In this part, the MODE-CMCS algorithm is employed to solve 5 benchmark problems.And the performance is compared with other reported results.

Description of the Benchmark Test
Problems.ZDT1, ZDT2, ZDT3, ZDT4, and ZDT6, are adopted, and the features of these test problems are given in Table 1.

Parameter Settings of the MODE-CMCS Algorithm.
To make equal comparison with the results in [23], the parameters of MODE-CMCS algorithm are set as follows: dimension of the decision space equals the  listed in Table 1; the maximum number of function evaluations is set to 25000; the maximum iteration number of the local Chaos searching is set to 150; population size is 100; size of archive set is 100; the crossover rate is set to 0.8; the mutation rate is set to 0.02; diversity threshold  is set to 0.1; coefficient of Cauchy mutation  is 0.5; dimension of the objective space is 2.

Experimental Results
. The metrics GD and SP are selected to measure the performance of algorithms.The mean and variance of the values of the two metrics are calculated by MODE-CMCS on 30 individual tests, and the two metrics are shown in Table 2.The best values are marked by boldface.We can find that the MODE-CMCS algorithm performs better than NSGA-II on all benchmark problems.

Case Study
In this section, the performance of MODE-CMCS is compared with NSGA-II on calibration of a three-parameter SVM based daily streamflow forecasting model.In this case study, we are especially concerned with the ability of finding the approximate nondominated frontier.

Description of the Study Area.
Figure 4 shows the Yangtze River Basin, which originates in the Qinghai-Tibet Plateau and flows about 6300 km eastwards to the East China Sea.The drainage basin lies between 91 ∘ E-122 ∘ E and 25 ∘ N-35 ∘ N, covering a total area of 1808.5 × 10 3 km 2 , and its mean annual rainfall is approximately 1067 mm.As the Three Gorges Dam has interests in flood controlling, hydroelectric power generation, agriculture irrigation, and municipal and industrial water supply, the precise streamflow forecasting can make considerable economic benefit and meanwhile be helpful for flood controlling of downstream of the Yangtze River.

Description of the Streamflow Prediction Model.
The daily streamflow forecasting model used in this research is a threeparameter SVM model, which is similar with the SVM model in [20].The SVM for regression is employed in this paper to build the relationship between input data (measured daily streamflow) and output data (predicted daily streamflow).The kernel function of SVM for regression can map the nonlinearity data into high dimensional feature space.Then in this high dimensional feature space, the nonlinearity data can be described with linearity functions.Also the insensitive tube is introduced to deal with the existence of fitting errors, and the data located outside of the -insensitive tube is allocated with a penalty coefficient.There are three parameters, penalty coefficient , tolerance , and kernel parameter , of the SVM model needed to be determined through calibration.For one day ahead forecasting, the inputs to the model are the streamflow of the past five days: (), ( − 1), ( − 2), ( − 3), and ( − 4); the output is the where K is total number of flow data,   is the th measured value of flow, and Q is the kth forecasted value of flow.
Mathematical Problems in Engineering The MSLE function is more suitable for low flows due to the logarithmic transformation while the M4E is considered as an indicator of goodness-of-fit to high flows as larger deviations are given more contributions.The MSDE can be taken as an indicator of the fit of the shape of hydrograph.As the MSDE function does not take into account bias between observed and simulated value, so it cannot be used as the only objective function for calibration of hydrologic models; it should be used in combination with MSLE or M4E.

Results of Multiobjective Calibration and Discussion
. For NSGA-II, the control parameters are preset as follows: the size of the population is set to 100, the crossover rate is set to 0.8, the mutation rate is set to 0.02, and the maximum number of function evaluations is set to 100000.And for MODE-CMCS, the following values of control parameters are selected: the size of the population is set to 100, the size of the archive set is set to 30 (to lower computation load), the threshold  is set to 0.1, the maximum number of function evaluations is set to 100000, and the maximum iteration number of the local Chaos searching is set to 150.To diminish the influence of random factor, each algorithm runs 10 times independently.The approximate true Pareto set is gained by gathering all generated solutions obtained by the algorithms NSGA-II and MODE-CMCS.
The prediction results of the three-parameter SVM model are shown in Figures 5-8. Figure 5 presents the Pareto plots of the SVM model on the objective space MSLE and MSDE using NSGA-II algorithm and the proposed MODE-CMCS algorithm, whereas the Pareto solutions for combinations of MSDE and M4E are shown in Figure 6.
From Figures 5 and 6, it is apparent that there are conflicts existing between MSDE and the other two objective functions (MSLE and M4E).and vice versa.Likewise, it can be noted in Figure 6 that a good calibration of M4E leads to a bad calibration of MSDE and vice versa.The detailed convergence performance results are summarized in Table 3.
From Table 3, we can see that the MODE-CMCS algorithm can generate better Pareto solutions than the NSGA-II algorithm.When the objective functions are MSDE and MSLE, the MODE-CMCS generates significantly better solutions than NSGA-II.This indicates that there may be many local optimal solutions for this functions combination and the NSGA-II can be easily trapped into the local optimal solution.If the functions combination is M4E and MSLE, the MODE-CMCS generates slightly better results than NSGA-II.These better results are mainly due to the adaptive Cauchy mutation operator and the Chaos local searching operator, as the former operator can help the algorithm escape from local optimal solution while the latter can improve the searching accuracy of the algorithm.
However, in multiobjective optimization, we usually have two goals.The first one is to generate an approximation Pareto optimal set which are as close as possible to the optimal frontier, and the second one is to maintain the diversity of the solutions.We have to emphasize that the second goal is as important as the first one.This means that generating a better set of Pareto optimal solutions is not enough.Thus, to enhance the diversity of the generated approximation Pareto optimal set, the improved distance assignment operator, which has been discussed in Section 2.2.2(a), is proposed in this paper.
The SPC values obtained from NSGA-II and the proposed MODE-CMCS are given in Table 4. From Table 4, we can see that the performance of MODE-CMCS in maintaining diversity of the generated Pareto optimal solutions is better than NSGA-II.
From the above analysis, it can be noted that the performance of the proposed MODE-CMCS in this paper is better than the NSGA-II in terms of searching better Pareto optimal solutions and maintaining the diversity of the generated solutions.
Furthermore, the multiobjective calibration can help the modelers have a better understanding of the hydrology models.Figures 7 and 8 show the prediction uncertainty ranges associated with the Pareto optimal solutions generated by MODE-CMCS algorithm.
From Figures 7 and 8, it can be noted that the streamflow prediction ranges match the high flow and medium flow very well whereas the prediction accuracy turns to be worse during the low flow terms.Accordingly, it is suggested that the structure may be in need of improvement to enhance the prediction accuracy on low flow events.

Conclusions
This paper presents a novel multiobjective algorithm named MODE-CMCS for automatic calibration of hydrology model.Considering the drawback of the DE algorithm of being easily trapped in local minimum, the adaptive Cauchy mutation operator is introduced.And to generate better Pareto optimal solutions (which are closer to the true Pareto optimal frontier), we employ the local Chaos searching operator.Forward, the MODE-CMCS uses a more precise crowd distance assignment than that of the NSGA-II.Besides, as the traditional generalized spread metric SP is sensitive with the size of Pareto set, a novel diversity performance metric SPC, which is independent of Pareto set size, is proposed in this research.
The proposed MODE-CMCS algorithm is firstly employed to solve 5 benchmark problems.The results present that the MODE-CMCS is more powerful than NSGA-II in both providing an approximation of the true Pareto frontier and maintaining the diversity of Pareto set.Furthermore, the performance of the proposed MODE-CMCS algorithm for generating approximate Pareto optimal solutions is compared with the NSGA-II algorithm for calibration of a three-parameter SVM based daily streamflow forecasting model.And two pairs of objective functions, which are MSLE versus MSDE and MSDE versus M4E, are considered.The results of the case study also prove that the performance of the proposed MODE-CMCS is better than the NSGA-II in terms of both searching better Pareto optimal solutions and maintaining the diversity of the generated solutions.Moreover, the multiobjective calibration of hydrology models can help us understand the models better and then put forward improving proposal.

( 13 )
Second, a series REF with the length  is created by equal interval sampling of the uniform distribution on [0, 1]

Figure 4 :
Figure 4: Location of the study area (Yangtze River Basin).

Figure 5 :
Figure 5: Pareto plots of the prediction results when the objective functions are MSLE and MSDE.

Figure 6 :
Figure 6: Pareto plots of the prediction results when the objective functions are MSDE and M4E.

Figure 7 :
Figure 7: The prediction uncertainty ranges associated with the Pareto optimal solutions generated by MODE-CMCS algorithm when the objective functions are MSLE and MSDE.

Figure 8 :
Figure 8: The prediction uncertainty ranges associated with the Pareto optimal solutions generated by MODE-CMCS algorithm when the objective functions are MSLE and MSDE.
+1 is larger than the preset size  Begin Sort the individuals of  +1 with non-dominated sorting; Calculate the crowd distance of individuals in  +1 ; Select the first  better individuals with regard to their rank and crowd distance as the next population  +1 End if Algorithm 2 where   denotes the size of the population,   , is th dimension of the th individual,    is the average of jth dimension of the population, and   and   are the upper and lower bound of the th dimension, respectively.

Table 2 :
Statistics of results on metrics GD and SP.

Table 3 :
Convergence performance of two algorithms.

Table 4 :
Comparisons of SPC between NSGA-II and MODE-CMCS.