Application of Harmony Search to Design Storm Estimation from Probability Distribution Models

The precision of design storm estimation depends on the selection of an appropriate probability distribution model (PDM) and parameter estimation techniques. Generally, estimated parameters for PDMs are provided based on the method of moments, probability weighted moments, and maximum likelihood (ML). The results using ML are more reliable than the other methods. However, theML is more laborious than the other methods because an iterative numerical solutionmust be used. In the meantime, metaheuristic approaches have been developed to solve various engineering problems. A number of studies focus on using metaheuristic approaches for estimation of hydrometeorological variables. Appliedmetaheuristic approaches offer reliable solutions but usemore computation time than derivative-basedmethods.Therefore, the purpose of the current study is to enhance parameter estimation of PDMs for design storms using a recently developed metaheuristic approach known as a harmony search (HS). The HS is compared to the genetic algorithm (GA) and ML via simulation and case study. The results of this study suggested that the performance of the GA and HS was similar and showed more accurate results than that of the ML. Furthermore, the HS required less computation time than the GA.


Introduction
Floods that occurred due to severe rain or major storm generally result in damage to properties and negative impacts on human activity.Flood estimation is the process of minimizing property damage and reducing the threat to human activity.Design storm based on precipitation frequency analysis is one of the main procesess for flood estimation as well as a statistical representation of a precipitation event.
The primary purpose of precipitation frequency analysis in hydrometeorology is to estimate the magnitude of a storm event with a given frequency of occurrence.It can also be used to estimate the frequency of occurrence of a storm event with a given magnitude [1].The precision of precipitation frequency analysis depends on the selection of an appropriate probability distribution model (PDM) and parameter estimation techniques.A number of PDMs have been developed to describe the probability distribution of the hydrometeorological variables.In practice, it is often assumed that the correct PDM is a member of the developed PDMs.
However, the selection of an appropriate PDM is still one of the major problems in precipitation frequency analysis [2].
For each of the developed PDMs, estimated parameters are provided based on alternative estimation techniques, such as the method of moments (MOM), probability weighted moments (PWM), linear function of ranked observations (Lmoments), and maximum likelihood (ML) [1][2][3][4][5].The MOM is one of the most simple and commonly used methods for estimating the parameters.PWM and L-moments are discussed by Stedinger et al. [6].Estimates for PWM can be obtained from order statics.Additionally, Stedinger et al. [6] recommended a parameter estimator for PWM in regionalization studies.L-moments tend to produce less variable estimates for higher moments, when an unusually large or small observation happens to be present in a sample [1].The moment estimators, including MOM, PWM, and Lmoments, are a simpler way to obtain the PDM parameters but are less accurate than the ML estimators.The results using ML are generally more reliable than the other methods.However, the ML is more laborious than the other methods

Probability Distribution Models.
To test the performance of the proposed parameter estimation approach, the two-parameter lognormal (LN2), two-parameter gamma (GAM2), generalized extreme value (GEV), and Gumbel (GUM) distribution models are used.The general properties of the PDMs are given in Appendices.
Figure 1: Illustration of crossover and mutation.

Metaheuristic Approaches.
In the current study, two metaheuristic approaches (i.e., GA and HS) were used to estimate the parameters of PDMs.

Genetic Algorithms.
The GA is a stochastic search algorithm based on natural evolution and mechanisms of population genetics [8,9].The simple ideas of GA are based on the biological processes of survival and adaption.Only the best of the population is allowed to survive and propagate to successive generations.Therefore, the GA does not require derivatives of the objective function to solve complex and discontinuous optimization problems.A number of GAs are introduced, but the following general description encompasses most of the important features.
The analogy with nature is established by the creation of a set of solutions, called a population.The initial population of solutions is usually chosen at random and allowed to evolve over a number of generations.Each individual in a population is represented by a set of parameter values that completely describe a solution and undergo constant change by means of genetic operations of reproduction, crossover, and mutation.At each generation, the fitness of individuals with respect to the objective function is calculated for reproduction and propagated to the next generation.Based on the fitness, individuals with relatively high fitness (called parent) are selected for reproduction of the next generation.For example, as shown in Figure 1(a), there are two parents selected with binary cording.The strings, including the last three digits of two parents (i.e., 110 and 011), are recombined to produce offspring that will comprise the next generation.Then, the parents are replaced in the population by the offspring to keep a stable population size.The recombination operation is usually called the crossover.The offsprings (new generations) have a higher average fitness than their parents (previous generations).Occasionally, mutation is introduced into the population to prevent the convergence to a local optimum and help generate unexpected directions in the parameter space, as shown in Figure 1(b).A fixed fitness of generations has been created when a generation has reached the highest fitness and there is no further improvement with repeated iteration [23].
The GA method has been widely applied in a variety of engineering optimization problems.It has been established that the GA approach is an attractive alternative to solve optimization problems with nondifferentiable, nonlinear, and multimodal objective functions [9,25,26].

Harmony Search.
The HS developed by Geem et al. [24] is a phenomenon-mimicking algorithm inspired by the improvisational processes of musicians.The algorithm is based on natural musical performance processes in which a musician searches for a better state of harmony.Assume that the optimization problem is Minimize   () , where   () is an objective function;  is the set of each decision variable   ;  = 1, . . .,  (representing probability parameter);  is the number of decision variables (the number of parameters for a probability distribution); and Low  and Up  are the upper and lower limits, respectively, of the decision variable   .
To solve the optimization problem, the procedure of the HS is as follows.
(1) Generate the harmony memory (HM) randomly up to the harmony memory size (HMS) from a uniform distribution, denoted as where  = 1, . . ., ;  = 1, . . ., HMS; and [, ] represents the uniform distribution ranging from  to .The generated HM is (2) Improvise a new set (ã  ) by ã = { ã ∈ { where HMCR is the harmony memory considering rate for all the variables ã and  = 1, . . ., .Equation ( 4) implies that each decision variable of a new harmony set is sampled from the same variable of HM in (3) for the probability of HMCR.Otherwise, generate the harmony set from the uniform distribution as in (2).
(3) Adjust each variable of the improvised new set (ã  ) with the probability of the pitch adjusting rate (PAR) as where  is generated from [−ℎ, ℎ].Here, ℎ is the arbitrary distance bandwidth.If ℎ is large, the variability of the adjusted value ã *  is also large.
(4) Update the HM by replacing the worst harmony corresponding to the worst objective function with the improvised and adjusted new set.
Following the empirically based HS parameter range is recommended to produce a sufficient solution of 0.7-0.95 for HMCR, 0.2-0.5 for PAR, and 10-50 for HMS [24].

Optimization Function.
The derivative-based method usually uses the integral of the square of the error (ISE) as an optimization function to find a global solution because a derivative of ISE can be obtained relatively well.The metaheuristic approaches use a stochastic random search instead of a derivative search to find global optimization, so that various forms of the optimization function based on the integral of the absolute magnitude of the error (IAE), the integral of the time-absolute error (ITAE), and ISE are applied.
In this study, we use two metaheuristic approaches to estimate the parameters of PDMs.To attain the optimum result, an equivalent objective function (or target function) is used with two metaheuristic approaches.The objective function in this study is constructed using ISE with observed and estimated values and is expressed by where   is an objective function;   is the th ordered observation value; and   is the estimated value from the corresponding th empirical cumulative probability of the selected PDM.

Statistical Test Criteria.
Three test criteria are used to assess the adequacy of the proposed parameter estimation method: the correlation coefficient (CC), coefficient of efficiency (CE), and root mean square error (RMSE) [27,28].The mathematical forms of these criteria are as follows: where  and  are the mean of the observed data and the mean computed data from the fitted PDM, respectively.The range of CC is −1 ≤ CC ≤ 1.Here, the CC is also equivalent to the Filliben Q-Q correlation test, which was proposed by Filliben [29] as a test of normality.Vogel [30] extended this approach to lognormal and Gumbel distributions for the goodness-of-fit test and GEV [31]:  Note that higher CC and CE values and lower RMSE values represent better performance.Estimated statistics are described with boxplots.Boxes indicate the interquartile range (IQR), and whiskers extend to 1.5 IQR.The horizontal lines inside the boxes depict the median of the data.

Simulation Study
In the simulation study, we employ the GEV distribution model and estimate the parameters (i.e., scale (), shape (), and location ( 0 )) using the GA and HS techniques.For the simulation study, we produce 100 data sets using a GEV random number.An individual data set with a record length of 100 has the same parameters, which are  = 30,  = 0.1, and  0 = 100.The strategies for the GA and HS are shown in Table 1.
One hundred series are simulated for the parameter estimation of the GEV distribution model.The results of the parameter estimation based on the GA and HS methods are compared using boxplots, as shown in Figure 2. As shown in Figures 2(a) and 2(b), the  and  derived from the HS indicate a better performance than those of the GA.In addition, the variability of  and  derived from the GA is greater than that of the HS.However, as shown in Figure 2(c), the  0 derived from the HS and GA shows similar results.
The statistical test criteria in (7) and (8) were computed using the estimated parameters.In Figure 3, the test criteria for the proposed metaheuristic approaches are shown.Recall that higher CC and CE values and lower RMSE values represent better performance.As shown in Figure 3(a), the CCs for the HS are higher, while the variability is narrower than the CCs for the GA.The CE shows a similar pattern to the CC, as shown in Figure 3(b).The RMSE for the HS shows lower values than that of the GA, as shown in Figure 3(c).The RMSE indicates that the HS results have more accurate values than the GA results.
Although the metaheuristic approaches (e.g., the GA and ACO) are reasonable alternatives to solve hydrological problems, significant computation time is an obvious disadvantage [23].Therefore, we investigate the computation time of two metaheuristic approaches.The computation time for GEV parameter estimation using 100 simulations based on the GA and HS is compared using boxplots, and the results are shown in Figure 4.The computation time for the GA is in the range of 10 to 20 minutes.However, the computation time for the HS is very short and is less than 1 minute.To determine computation time by varying conditions, the population size for the GA is changed to 100, 500, and 1000, while the harmony size for HS is also changed to 100, 500, and 1000.As shown in Figure 5, the computation time for the GA  rapidly increases as the population size increases.Meanwhile, the fitness values of the objective function are only slightly improved.In contrast, the computation time for the HS rarely increases despite increasing harmony size, and the fitness values of the objective function remain consistent.Note that the fitness values of the objective function for HS with a small harmony size are better than the fitness values of the GA.The HS approach for estimation of the GEV parameter produces more reliable results than the GA approach and uses less computation time.
In addition, three PDMs (i.e., LN2, GAM2, and GUM) are used in a simulation study.The same procedure as the simulation study using GEV is employed for the parameter estimation of each PDM.The target values for the parameters of the applied PDMs and the estimated parameters by the GA and HS methods are reported in Table 2.In addition, the results of the three test criteria and the computation time for each PDM are summarized in Table 3.The results of additional simulation studies show that the two metaheuristic approaches are appropriate methods for the parameter estimation of PDMs.Furthermore, the difference between the GA and HS methods is that the HS method requires less computation time than the GA method while still providing reliable results.

Case Study
In the current case study, we carried out precipitation frequency analysis for design storm with annual maximum hourly rainfall data recorded at 74 rainfall gauges in South Korea.The annual maximum hourly rainfall data were extracted from the Korea Meteorological Administration website, http://www.kma.go.kr/.The record lengths of extracted rainfall data range from 20 to 100 years (average record length is approximately 40 years), and the locations of the 74 rainfall gauges are presented in Figure 6.
Four PDMs (i.e., LN2, GAM2, GEV, and GUM) were applied to the annual maximum hourly rainfall data.Parameters corresponding to the four PDMs were estimated using the three parameter estimation approaches: ML, GA, and HS.To compare the three parameter estimation approaches, the  quantiles for different return periods ( = 10, 25, 50, and 100 years) were estimated at the 74 rain gauges in South Korea.The variability of the quantiles for each PDM is shown in Figure 7.Note that quantiles imply the statistically estimated design storm at a given return period.As shown in Figure 7, the results of frequency analysis for the employed rainfall data show that the quantiles estimated by the two metaheuristic approaches present similar distributions, except for the quantiles of  = 50, 100 for LN2, as shown in Figure 7(a).However, the quantiles based on the ML method show a slightly different distribution when compared with the two metaheuristic approaches.
Table 4 summarizes the results of test criteria derived from the three parameter estimation approaches.The CCs for ML, GA, and HS represent similar values, and the CE represents a similar pattern to that of CC.However, RMSE for GA and HS shows lower values than the results of ML.The RMSE indicates that the two metaheuristic approaches for the parameter estimation of PDMs have a more accurate performance than ML.
In addition, the results of parameter estimation for Seoul, Daejeon, Daegu, and Busan, which are four major cities of South Korea, are summarized in Table 5.Furthermore, values of the quantiles for the four cities are estimated, and the results are summarized in Table 6.

Summary and Conclusion
The HS was developed by Geem et al. [24] and has been applied to solve a variety of engineering problems in a number of previous studies.Because the HS adopts a stochastic random search to find the optimized best solution, initial value settings of decision variables and derivative information to reach global optimum are not required.Furthermore, after considering all of the existing vectors based on the HMCR and PAR, the HS generates a new vector, whereas the GA only considers two vectors.These features increase the flexibility of the HS and produce a better solution.Therefore, we applied the HS method to enhance design storm estimation in the current study.
The results of the simulation study based on the four PDMs (i.e., LN2, GAM2, GEV, and GUM) showed that the HS approach for the parameter estimation of PDMs produced reliable results when measuring test criteria (i.e., CC, CE, and RMSE).Furthermore, the fitness values of the objective function for HS were better than the fitness values of the GA with a small harmony size.Additionally, the computation time for the HS approach was less than that of the GA method.
Precipitation frequency analysis for design storm was conducted to assess the performance of the proposed method with the four PDMs (i.e., LN2, GAM2, GEV, and GUM) and the annual maximum hourly rainfall data recorded at 74 rainfall gauges in South Korea.The results of precipitation frequency analysis were compared with those of the ML and GA methods.The results in this study suggested that performances of the GA and HS were similar and presented more accurate results than the ML in precipitation frequency analysis for annual maximum hourly rainfall data.Accordingly, we conclude that the proposed parameter estimation method based on the HS approach is a useful alternative for the parameter estimation of PDMs, particularly when conventional methods cannot be applied estimate the parameters of PDMs.where 0 ≤  < ∞ for  > 0 and −∞ <  ≤ 0 for  < 0.
The parameters  and  are the scale and shape parameters.
The shape parameter  is restricted to  > 0, while  may be positive or negative.Γ() is the complete gamma function, which is the integral function.From (B.1), the CDF of GAM2 is derived to be The function (, ) in (B.2) is the incomplete gamma function.Therefore, the quantile for GAM2 must be estimated numerically.The quantile   of return period  for GAM2 corresponding to the nonexceedance probability  is alternatively estimated using the following: where   and   are the mean and standard deviation, respectively.  is the frequency factor, which is a function of return period  and the skewness coefficient [2].

C. Generalized Extreme Value (GEV) and Gumbel (GUM) Distribution
The PDF and CDF of the GEV for a random variable  are expressed as follows: where , , and  0 are the scale, shape, and location parameter, respectively. plays an important role such that if  = 0, the distribution tends to resemble a type-1 or Gumbel distribution; if  < 0, the resulting distribution is type-2 or Log-Gumbel distribution; and if  > 0, it is a type-3 distribution or Weibull distribution.The quantile functions for GEV and Gumbel corresponding to the non-exceedance probability q are given in the following: where   is a quantile of return period  for GEV and Gumbel.

Figure 3 :
Figure 3: Boxplots of test criteria for estimated GEV parameters using 100 simulations based on GA and HS; (a) correlation coefficient; (b) coefficient of efficiency; and (c) root mean square error.

Figure 4 :
Figure 4: Boxplot of computation time for GEV parameter estimation using 100 simulations based on GA and HS techniques.

Figure 5 :
Figure 5: Comparison of computation time for GEV parameter estimation based on GA and HS methods.

Figure 6 :
Figure 6: The locations of the employed 74 rain gauges in Republic of Korea.

Table 1 :
Strategy for the GA and HS.

Table 2 :
Target values for the parameters of employed PDMs and estimated parameters by the GA and HS.
*The values are presented in the range of min to max.* * () shows mean of values.

Table 3 :
Comparison of test criteria for the three PDMs.

Table 4 :
Comparison of test criteria for the annual maximum rainfall data for the 74 rain gauges stations in Republic of Korea.

Table 5 :
The results of the parameter estimation for the four major cities in Republic of Korea.

Table 6 :
Quantiles for different return periods ( = 10, 25, 50, and 100 years) of the four major cities in Republic of Korea.