On Metaheuristics for Solving the Parameter Estimation Problem in Dynamic Systems: A Comparative Study

This paper presents an experimental study that compares the practical performance of well-known metaheuristics for solving the parameter estimation problem in a dynamic systems context. The metaheuristics produce good-quality approximations to the global solution of a small finite-dimensional nonlinear programming problem that emerges from applying the sequential numerical direct method to the parameter estimation problem. Using statistical hypothesis testing, significant differences in the performance of the metaheuristics, in terms of the average objective function values and average CPU time, are determined. Furthermore, the best obtained solutions are compared graphically, in relative terms, by means of performance profiles. Numerical comparisons with other results in the literature show that the tested metaheuristics are effective in achieving good-quality solutions with a reduced computational effort.


Introduction
The present paper addresses the problem of finding a set of parameter values in a dynamic system model that calibrates the model so that it reproduces the existing experimental data in the best possible way [1]. This is performed by minimizing an objective function that measures the goodness of the fit. Thus, an objective function, which gives the sum of the squared errors between the model-predicted state values and the observed values (at certain time instants of a fixed interval), is minimized, subject to a system of ordinary differential equations (ODE). The dynamic model defined by the system of ODE simulates the time-varying processes that take place in a certain time interval. This problem is known in the literature as the dynamic model based parameter estimation (DMbPE) process. Solving this problem is crucial in systems biology and medicine, with a great impact on both the pharmaceutical and biotechnological industries [2]. The problem is also very common in the chemical engineering area [3], and it has been extensively used to describe physical phenomena [4]. The DMbPE problem may involve nonlinear differential-algebraic equations, and a multiplicity of local solutions may exist. Nonconvexity and ill-conditioning issues of the problem may be addressed with efficient global optimization (GO) methods and proper regularization techniques [5]. The problem is even more difficult than that of an algebraic model. The nonlinear dynamic behavior makes the analytical approach rather complicated, if not impossible, for most real phenomena. Numerical direct methods are therefore good alternatives to solve the DMbPE problem. As in any parameter estimation problem, the main advantage is that the intended global minimum is known in advance. The most well-known methods, like Levenberg-Marquardt, gradient descent, and the Nelder-Mead method, which have been extensively used in parameter estimation problems, are local optimization methods and cannot guarantee convergence to a global solution. They are able to exploit a specified search region around a given initial value, but the global solution is difficult to find when the initial value is far from the required solution. Algorithms for GO can be roughly divided into two categories: exact algorithms and heuristics [6].
Sörensen and Glover [7] have defined metaheuristic as follows: "A metaheuristic is a high-level problem-independent algorithmic framework that provides a set of guidelines or strategies to develop heuristic optimization algorithms." Further, an implementation of a heuristic optimization algorithm that follows the strategies laid out in the metaheuristic framework is also referred to as a metaheuristic. The term appeared for the first time in a publication by Glover [8]. Metaheuristics are approximate methods or heuristics that are designed to search for good solutions with less computational effort and time than the more classical algorithms. While heuristics are designed to solve a specific problem, metaheuristics are general-purpose algorithms that can be applied to solve almost any optimization problem. Metaheuristics evaluate a set of potential solutions of an optimization problem and perform a series of operations on those solutions in order to find different and hopefully better solutions. Depending on the way these operations are carried out, a metaheuristic can be classified as a local search, constructive, or population-based metaheuristic. For details, the reader is referred to [7].
Most metaheuristics use random procedures that invoke artificial intelligence tools and simulate natural behaviors, and their performance does not depend on the properties of the problem to be solved. They are alternative methods to find good approximations to optimal solutions of GO problems. In addition, they are derivative-free and easy to implement [9]. Metaheuristics can locate the vicinity of global solutions with relative efficiency, although global optimality cannot be guaranteed in a finite number of iterations. On the other hand, exact methods for GO can guarantee global optimality for certain problems, but the required computational effort often increases exponentially with the problem size [10,11].
Previous use of metaheuristics, mainly those that use metaphors based on natural evolution and on the behavior of animal swarms, to solve DMbPE problems showed that they are able to provide good-quality solutions when real experimental data and noisy artificial data are considered. In [15], a hybrid metaheuristic algorithm that introduces evolutionary operations, namely, mutation and crossover, into the firefly algorithm (FA) is presented. Two variants of the differential evolution (DE) method, a trigonometric and a modified version, have been implemented in [16,17]. The authors in [2] developed a new procedure based on the scatter search (SS) methodology. A slightly different version of the SS, which uses the fmincon solver from Matlab (Matlab is a registered trademark of MathWorks, Inc.) as an improvement method, is tested in [12] to solve three well-known DMbPE problems, including the chemical isomerization of α-pinene. This latter problem is analyzed and solved by the FA in [13]. Alternatively, exact methods for GO have also been used to solve DMbPE problems. These include the spatial branch and bound (BB) algorithm to compute a global solution of the partially discretized DMbPE problem [18], the α-BB method to obtain a global solution within an ε-precision of the completely discretized DMbPE problem [10], and a method based on interval analysis and on the partially discretized DMbPE [19]. In [14], a deterministic outer approximation approach is applied to the reformulation of the DMbPE problem as a finite NLP by applying a complete discretization using orthogonal collocation on finite elements.
Since the problem of estimating the parameters of a dynamic model is important, the contribution of this study concerns the implementation and practical comparison of five very simple and easy-to-implement metaheuristics, hybridized with a local intensification phase, for solving the finite nonlinear programming (NLP) problem that arises when a numerical direct method of a sequential type is used to locate a global optimal solution to the DMbPE problem. The unknown parameters are the decision variables of the NLP problem.
The selected metaheuristics are very popular and have been used to solve a variety of real-world applications. The selection includes the FA [20], the Harmony Search (HS) algorithm [21], the DE algorithm [22], the Artificial Bee Colony (ABC) algorithm [23], and the Directed Tabu Search (DTS) algorithm [24]. During the local intensification phase, the well-known Hooke-and-Jeeves (HJ) local search [25] is implemented with all the metaheuristics. These local search enhancements lead to much faster convergence and provide good-quality solutions. The practical comparison carried out in the present study aims to analyze the performance of each metaheuristic in terms of the quality of the obtained solutions. To test the five metaheuristics, nine DMbPE problems were selected from the literature [4,10,13,19,26], yielding 12 instances due to different experimental data. The problems are described in the Appendix.
The remainder of the paper proceeds as follows. Section 2 addresses the DMbPE problem and the sequential numerical direct method, and Section 3 introduces the selected metaheuristics. Section 4 contains the results of all the numerical experiments, and the conclusions are summarized in Section 5.

Sequential Direct Method to Solve the DMbPE Problem
To solve the DMbPE problem, the sequential numerical direct method is used. For completeness, the DMbPE problem reads as follows. Find θ* such that

minimize f(θ) = Σ_{j=1}^{D} Σ_{i=1}^{m} (y_j(t_i; θ) − y^obs_{j,i})²
subject to y' = g(t, y, θ), y(t_0) = (y_{10}, ..., y_{D0}), θ^L ≤ θ ≤ θ^U,   (1)

where f : R^n → R represents the objective function that depends on θ = (θ_1, θ_2, ..., θ_n), a vector of the n decision variables (the parameters that need to be estimated); y = (y_1, y_2, ..., y_D) is the vector (with length D) of the state variables that depend on t (the independent variable); and y' represents the vector with the derivatives of y with respect to t. m represents the number of experimental data available for each y_j; y^obs_{j,i}, j = 1, ..., D, i = 1, ..., m, are the experimentally observed values; y' = g(t, y, θ) is the system of first-order ODE; (y_{10}, ..., y_{D0}) is the vector of initial values of the variables y_j, j = 1, ..., D; and θ^L and θ^U represent the vectors with the lower and upper bounds on the parameters, respectively. t_0 and t_f are the initial and final values, respectively, of the independent variable t.
To solve problem (1), there are indirect methods and direct methods [27]. Indirect methods use the first-order necessary conditions from Pontryagin's minimum principle to reformulate the problem as a two-point boundary value problem. The latter may become difficult to solve, especially if the problem contains state variable constraints.
In direct methods, the optimization present in (1) is performed directly. This type of method discretizes the problem and applies NLP techniques to the resulting finite-dimensional optimization problem. There are two types of direct methods. In the sequential direct method, the DMbPE problem is transcribed into a small finite-dimensional optimization problem through the discretization of the decision variables only. Hence, the optimization is carried out in the space of the decision variables only. Given a set of values for the decision variables, the system of ODE is accurately integrated (over the entire time interval) using a specific numerical integration formula (with error control mechanisms to enforce the accuracy of the state variable values), so that the objective function value can be evaluated [3]. Thus, the ODE are satisfied at each iteration of the NLP solver. The method is called sequential, since the processes of minimizing the sum of the squared errors and solving the ODE are done in a sequential manner. This method may lead to a slow convergence process, since the system of ODE is solved again and again each time an objective function value is required. In the simultaneous direct method, approximations to the solution of the system of ODE are computed instead. The optimization is carried out in the full space of the discretized decision and state variables, which may lead to a very-large-dimensional finite NLP problem with equality constraints (emerging from the discretization of the system of ODE), in particular if a fine grid of points is required to obtain a high level of integration accuracy. Thus, very efficient NLP solvers are required when solving the finite problem.
We use the sequential direct method, and the system of ODE is numerically integrated by the Matlab function ode15s. The scalar relative error tolerance is set to 1.0e−08, and all the components of the vector of absolute error tolerances are set to 1.0e−08. The sequentially discretized NLP problem is then solved by a stochastic metaheuristic. This is further elaborated in the next section.
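To illustrate the sequential approach, the following Python sketch re-integrates the ODE system every time the sum-of-squared-errors objective is evaluated. It is only a sketch: the paper's implementation uses Matlab and the adaptive solver ode15s, whereas here a hypothetical fixed-step RK4 integrator and a toy one-state decay model y' = −k y stand in for self-containment.

```python
import math

def rk4(g, y0, t0, tf, theta, steps=200):
    """Fixed-step RK4 integration of y' = g(t, y, theta) from t0 to tf."""
    h = (tf - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = g(t, y, theta)
        k2 = g(t + h / 2, y + h / 2 * k1, theta)
        k3 = g(t + h / 2, y + h / 2 * k2, theta)
        k4 = g(t + h, y + h * k3, theta)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

def decay(t, y, theta):
    return -theta[0] * y            # toy model: y' = -k y

# synthetic (noise-free) observations generated with the "true" value k = 0.7
t_obs = [0.5, 1.0, 1.5, 2.0]
y_obs = [math.exp(-0.7 * t) for t in t_obs]

def sse(theta):
    """Sum of squared errors: the ODE is solved anew for each evaluation,
    which is exactly why the sequential method can be slow."""
    total = 0.0
    for t, yo in zip(t_obs, y_obs):
        total += (rk4(decay, 1.0, 0.0, t, theta) - yo) ** 2
    return total
```

Any of the metaheuristics below can then minimize sse over the box [θ^L, θ^U]; with noise-free data the global minimum value is zero at the true parameter.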

Stochastic Metaheuristics
To compute global optimal solutions to NLP problems, stochastic or deterministic methods are available. Stochastic GO methods are able to provide a near-optimal solution in a short CPU time, although it may not be globally optimal. In contrast, deterministic GO methods provide an interval within which the global optimal solution falls, although they require very large computational efforts [14]. Stochastic metaheuristics, such as FA, HS, DE, ABC, and DTS, use randomly generated numbers from probability distributions to define a set of solutions in the search region, as well as to move those solutions to hopefully better positions.
They ensure convergence to a global solution in probability, while deterministic methods tend to guarantee asymptotic convergence. Global search techniques use exploration and exploitation search procedures that aim, respectively, to diversify the search so that a global optimal solution is located and to intensify the search so that a good approximation is computed in a promising area of the search space.
Most stochastic metaheuristics are classified, in terms of source of inspiration, as nature-inspired algorithms. The well-known swarm-intelligence-based algorithms belong to a wider class called the bioinspired algorithms, and these are a subclass of nature-inspired algorithms [28]. The most popular are the algorithms based on swarm intelligence, with the FA and the ABC algorithm being two examples. On the other hand, the HS is not a bioinspired algorithm and draws its inspiration from music. Although the DE algorithm is not based on any biological behavior, it can be classified as a bioinspired algorithm due to the keyword "evolution," but it is not swarm-intelligence-based [28]. Finally, the DTS algorithm is not a nature-inspired algorithm, but it has the particularity of recording past information of the search in memory. The majority of the metaheuristics define a set of approximate solutions at each iteration, called the population, and generate new trial solutions that may be accepted as the solutions for the next iteration, if some improvement is detected relative to the current set of solutions.

The FA.
The FA is a bioinspired metaheuristic algorithm that is capable of converging to a global solution of an optimization problem. It is inspired by the flashing behavior of fireflies at night [20,29]. The main rules used to construct the FA are as follows: (i) all fireflies are unisex, meaning that any firefly can be attracted to any other brighter one; (ii) the brightness of a firefly is determined from the encoded objective function; (iii) attractiveness is directly proportional to brightness but decreases with distance.
Table 1 lists the most relevant nomenclature used in the basic FA. At the beginning, the positions of the fireflies are randomly generated in the search space Ω. At each iteration, a firefly x_i is moved towards a brighter firefly x_j according to

x_i = x_i + β_0 e^{−γ r_{ij}²} (x_j − x_i) + α ε_i s,   (2)

where the second term on the right-hand side of (2) is due to the attraction, while the third term is due to randomization, with α ∈ (0, 1); β_0 is the attraction parameter when the distance r_{ij} is zero; ε_i is a number uniformly distributed in [0, 1]; and s is a vector that aims to scale the movement to the set Ω. The parameter γ is crucial to speed up the convergence of the algorithm, and it can take any value in [0, ∞).
When γ → ∞, the attractiveness tends to the minimum constant value β_min. To accelerate convergence and, at the same time, prevent premature convergence, the sequence of α values is made to decrease slowly along the iterative process. If a component j of the final position of a firefly falls outside the interval [θ^L_j, θ^U_j], it is shifted onto the boundary.
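The firefly movement just described can be sketched in Python as follows; the parameter values are illustrative defaults, not the settings used in the paper's experiments.

```python
import math
import random

def fa_move(x_i, x_j, beta0=1.0, gamma=1.0, alpha=0.2, scale=1.0):
    """One movement of firefly x_i towards a brighter firefly x_j:
    attraction beta0*exp(-gamma*r^2)*(x_j - x_i) plus a scaled random step."""
    r2 = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    beta = beta0 * math.exp(-gamma * r2)    # attractiveness decays with distance
    return [a + beta * (b - a) + alpha * random.random() * scale
            for a, b in zip(x_i, x_j)]
```

With alpha = 0 the move is purely deterministic attraction; in a full FA loop, the result would also be projected onto the bounds, as described above.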

The HS Algorithm.
The HS algorithm was developed to solve GO problems in an analogy with the music improvisation process, where music players improvise the pitches of their instruments to obtain a better harmony [21,30]. At each iteration k, the basic HS algorithm provides a set of solution vectors, available in the harmony memory (HM), from which the best and the worst solutions, in terms of objective function values, are identified. Table 2 lists the most relevant nomenclature used in the HS algorithm.
The HM contains HMS solution vectors that are maintained in memory throughout the iterative process. First, the solutions in the HM are randomly generated in the search space Ω, x_i, i = 1, ..., HMS. Then, they are evaluated, and the best harmony, x_best, and the worst, x_worst, in terms of objective function value, are identified. Hereafter, at each iteration, a new harmony is improvised; that is, a new vector v is generated using three improvisation operators. When the HM operator is used, the component j of v is chosen from the HM with probability HMCR; otherwise, the random selection operator is used and the component is randomly generated in Ω:

v_j = a value chosen from the HM, if r ≤ HMCR; otherwise, v_j randomly generated in Ω_j,

for j = 1, ..., n, where r is a random number uniformly distributed in [0, 1]. The pitch adjustment operator is subsequently applied, with a probability 0 < PAR(k) < 1 that varies with the iteration counter k, to refine only the components v_j produced by the HM operator, as follows:

v_j = v_j ± r BW(k),

where 0 < BW(k) ≤ 1 is the distance bandwidth that also depends on k [31]. Finally, the components of v are checked against the bounds and projected onto the boundaries if they fall outside. In the final stage, if f(v) < f(x_worst), the HM is updated, since the new harmony is included in the HM, replacing the worst one.
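The improvisation step can be sketched in Python as follows. This is a simplified sketch: it uses fixed PAR and BW values, whereas the paper's HS variant lets both vary with the iteration counter.

```python
import random

def improvise(hm, bounds, hmcr=0.9, par=0.3, bw=0.05):
    """Generate one new harmony v from the harmony memory hm
    (a list of solution vectors) over box bounds [(lo, hi), ...]."""
    v = []
    for j, (lo, hi) in enumerate(bounds):
        if random.random() <= hmcr:                  # memory consideration
            vj = random.choice(hm)[j]
            if random.random() <= par:               # pitch adjustment
                vj += (2.0 * random.random() - 1.0) * bw
        else:                                        # random selection
            vj = lo + random.random() * (hi - lo)
        v.append(min(max(vj, lo), hi))               # project onto the bounds
    return v
```

In a full HS loop, the improvised v would replace the worst member of hm whenever it has a lower objective value.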

The DE Algorithm.
The DE is a bioinspired population-based algorithm that relies on three strategies, mutation, crossover, and selection, to define the NP solutions/points for the next iteration [22]. The most important nomenclature and parameters are shown in Table 3.
The initial population of points, x_i ∈ R^n, i = 1, ..., NP, is randomly generated in the search space Ω. The most commonly used mutation strategy defines the mutant point v_i as follows:

v_i = x_{r1} + F (x_{r2} − x_{r3}),

with uniformly chosen random indices r_1, r_2, and r_3 from the set {1, 2, ..., NP}, mutually different, where F is a real parameter in [0, 2] which controls the amplification of the differential variation x_{r2} − x_{r3}. The indices r_1, r_2, and r_3 are also chosen to be different from the index i. x_{r1} is called the base point.
The components of the mutant vector are then mixed with components of the x_i vector to generate the trial point u_i. This strategy is referred to as crossover and can be described as follows:

u_{i,j} = v_{i,j}, if r_j ≤ CR or j = s_i; otherwise, u_{i,j} = x_{i,j},

for j = 1, ..., n, where r_j denotes a random number uniformly generated within the interval [0, 1] that performs the mixing of the component j of the points, CR ∈ [0, 1], and s_i, an index randomly selected from {1, ..., n}, aims to ensure that u_i gets at least one component from v_i. Finally, the components of u_i are projected onto the boundaries of [θ^L, θ^U] if they fall outside. Then, a selection strategy compares each trial point u_i with x_i in terms of objective function values, and the best one is selected for the population of the next iteration. The variant described above is referred to as DE/rand/1/bin, where "rand" specifies the vector that is mutated, "1" defines the number of difference vectors, and "bin" means that crossover is based on independent binomial experiments [22]. There are other frequently used DE variants, for instance, the DE/best/1/bin, which uses the best point of the population as the base point to define the mutant point, and the DE/rand-to-best/1/bin, with two differential variations:

v_i = x_i + F (x_best − x_i) + F (x_{r1} − x_{r2}),

where x_best is the best point in the current population [32].
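One DE/rand/1/bin generation step can be sketched compactly in Python; the F and CR values are illustrative defaults, and bound handling is omitted for brevity.

```python
import random

def de_trial(pop, i, F=0.5, CR=0.9):
    """Build the trial point for population member i: rand/1 mutation
    followed by binomial crossover."""
    n = len(pop[i])
    r1, r2, r3 = random.sample([r for r in range(len(pop)) if r != i], 3)
    mutant = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(n)]
    j_rand = random.randrange(n)    # guarantees at least one mutant component
    return [mutant[j] if (random.random() <= CR or j == j_rand) else pop[i][j]
            for j in range(n)]
```

Selection would then keep whichever of pop[i] and the trial point has the lower objective value.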

The ABC Algorithm.
The ABC algorithm is an optimization algorithm based on the intelligent behavior of honeybee swarms [23]. The colony of artificial bees of size NP includes the employed bees, the onlooker bees, and the scout bees. The first half of the colony consists of employed bees, and the second half constitutes the onlookers and scouts. The position of a food source represents a possible solution of the optimization problem, and the amount of nectar in the food source gives the quality of that solution. The number of food sources, Nf, is taken to be equal to the number of employed bees. The most important nomenclature and parameters in the ABC algorithm are shown in Table 4.
At the initial stage, a set of food source positions are randomly selected by the bees; that is, the positions are randomly generated in the search space Ω, and their nectar amounts are determined in terms of fitness values. During the employed bee phase, new candidate food positions, called mutant solutions, are produced as follows:

v_{i,j} = x_{i,j} + φ (x_{i,j} − x_{k,j}),   (8)

for i = 1, ..., Nf, where k (different from i) and j are randomly chosen indices from the sets {1, ..., Nf} and {1, ..., n}, respectively, and φ is a random number uniformly generated within the interval [−1, 1]. If a component of the mutant solution falls outside the bounds, it is shifted to the boundaries. Afterwards, the mutant solution v_i is compared with x_i, and a greedy selection is applied to choose the one with the better nectar, that is, with the better fitness, as described in (9). If the current solution x_i is maintained (meaning that it has not been improved), its trial counter is increased.
On the other hand, during the onlooker bee phase, the food sources are randomly chosen according to probability values p_i (associated with the food sources), which depend on their fitness fit_i, evaluated by

p_i = fit_i / Σ_{k=1}^{Nf} fit_k.

The mutant solution is produced from the chosen old one, as previously shown in (8), and the greedy selection is applied between the current x_i and its mutant. Finally, in the scout bee phase, if the trial counter of the solution that has gone longest without improvement exceeds a predetermined value, called "limit," that solution is abandoned and replaced by a position randomly generated in the search space Ω [33].
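The employed-bee move (8) and the onlooker selection probabilities can be sketched in Python as follows; the sketch assumes positive fitness values, as in the usual ABC fitness transformation.

```python
import random

def abc_mutant(foods, i):
    """Perturb one randomly chosen coordinate of food source i using a
    randomly chosen neighbour k and phi uniform in [-1, 1]."""
    n = len(foods[i])
    k = random.choice([r for r in range(len(foods)) if r != i])
    j = random.randrange(n)
    phi = 2.0 * random.random() - 1.0
    v = list(foods[i])
    v[j] = foods[i][j] + phi * (foods[i][j] - foods[k][j])
    return v

def onlooker_probs(fitness):
    """Roulette-wheel probabilities p_i = fit_i / sum(fit)."""
    total = sum(fitness)
    return [f / total for f in fitness]
```

A greedy selection between foods[i] and its mutant, plus the trial-counter bookkeeping for the scout phase, completes one ABC cycle.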

The DTS Algorithm.

The tabu search (TS) algorithm, introduced to continuous optimization in [34], is capable of guiding the search out of local optima and of exploring new regions for the global optimum. It is a point-by-point iterative procedure that maintains a list of the most recent movements, denoted the "tabu list" (TL), so that movements leading to previously visited solutions are avoided.
The DTS method developed in [24] is a hybridization of the TS with a direct search method that aims to search the neighborhood of a local minimum. The method comprises three main search procedures: exploration, diversification, and intensification. The main loop consists of the exploration and diversification search procedures. The exploration search aims to explore the search space Ω and uses direct search methods to stabilize the search, in particular in the vicinity of a local minimum. Cycling is prevented by the TL and by another four TS memory elements: the multiranked tabu list, the tabu region, the semitabu region, and the visited region list (VRL). The VRL works as a diversification tool and is used to direct the search towards regions of the search space that have not been visited. The most important nomenclature and parameters of the DTS algorithm are listed in Table 5.
During the exploration procedure, DTS uses an adaptive pattern search strategy to generate an approximate descent direction (ADD) for the objective function f at the current approximation. First, based on an approximation x, n pattern directions parallel to the coordinate axes are constructed and n trial points y_i, i = 1, ..., n, are generated along these directions with a given step length. The ADD v is then obtained as a weighted combination of the directions from x to the trial points, with weights determined by the corresponding objective function decreases (the precise formula is given in [24]). Second, two local trial points y_{n+1} and y_{n+2} are generated along v, with two different step lengths, aiming to explore the region along v [24]. The diversification procedure aims to randomly generate a new trial point p ∈ Ω outside the previously visited regions. The VRL contains the centers of the visited regions and the frequency with which these regions are visited. If the shortest distance of p to a region's center, weighted by its frequency, exceeds the predefined region radius, the point is accepted; otherwise, a new trial point is generated. When one of the best obtained trial solutions is sufficiently close to a global minimum, or its value has not changed for a specified number of iterations, the DTS algorithm leaves the exploration and diversification search procedures and invokes an intensification procedure. Here, the HJ local search is used to compute a solution still closer to the global minimum (see the next section).
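The diversification acceptance test can be sketched as follows. Note that the frequency weighting used here (dividing the distance by the visit count) is an assumption for illustration only; the exact weighting and the region radius ρ are defined in [24].

```python
import math
import random

def diversified_point(bounds, centers, freqs, radius, max_tries=100):
    """Sample candidate points in the box until one is far enough (in a
    frequency-weighted sense, assumed here as distance/frequency) from
    every visited-region center recorded in the VRL."""
    p = None
    for _ in range(max_tries):
        p = [lo + random.random() * (hi - lo) for lo, hi in bounds]
        if all(math.dist(p, c) / f > radius for c, f in zip(centers, freqs)):
            return p
    return p    # fall back to the last candidate after max_tries attempts
```

Frequently visited regions (large f) shrink the weighted distance, so the sampler is pushed away from them more strongly than from rarely visited ones.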

Hooke-and-Jeeves Local Search.
A pattern search method directs the search towards a minimizer using a pattern of a specific number of points. At least n + 1 points must be provided by the pattern, where n is the number of variables. Based on a current approximation, the HJ method uses a pattern search strategy to define a trial solution that gives an improvement in the objective function value when compared with the current point [25]. Let the current approximation be denoted by x^(k), where k represents the iteration counter in this iterative process. Then, a trial point v is generated along a search direction (starting from the current point) with a certain step size Δ^(k) > 0 as follows:

v = x^(k) + Δ^(k) d,

where d is the search direction chosen from a finite set D of positive spanning directions in R^n. The most used set contains the 2n coordinate directions, defined as the positive and negative unit coordinate vectors, D = {e_1, ..., e_n, −e_1, ..., −e_n}. The most important property of this set is that at least one of the coordinate directions is a descent direction for the objective f when x^(k) is not a stationary point of f. When the search fails to generate a trial point that is better than x^(k), the iteration is called unsuccessful, the step size Δ^(k) is halved so that a refined search can be carried out, and x^(k+1) ← x^(k). If Δ^(k) falls below a given stopping tolerance, the algorithm terminates and the current x^(k) is considered an approximate minimizer of f. However, when at the end of the iteration the objective function value has been reduced, the iteration is called successful, Δ^(k) is not changed, and the trial point is accepted as x^(k+1).

For a fair comparison, and to improve the quality of the produced solutions, we propose the implementation of the HJ intensification phase with all the above-mentioned metaheuristics. Based on the final solution provided by the metaheuristic, the HJ local search algorithm is invoked and allowed to run for 5 iterations. The number of function evaluations required by the HJ local search is added to the function evaluations of the metaheuristic, herein denoted by NFEval, to produce the total number of evaluations of the run.
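A bare-bones Python sketch of a coordinate pattern search in the spirit of HJ is given below: it polls the 2n coordinate directions and halves the step on unsuccessful iterations. The full HJ method also performs pattern (acceleration) moves, which are omitted here for brevity.

```python
def pattern_search(f, x0, delta=0.5, tol=1e-6, max_iter=200):
    """Poll +/- each coordinate direction; keep improving trial points,
    halve the step size when no direction gives a better point."""
    x, fx = list(x0), f(x0)
    while delta >= tol and max_iter > 0:
        max_iter -= 1
        improved = False
        for j in range(len(x)):
            for step in (delta, -delta):
                y = list(x)
                y[j] += step
                fy = f(y)
                if fy < fx:                 # successful poll in direction j
                    x, fx, improved = y, fy, True
                    break
        if not improved:
            delta /= 2.0                    # unsuccessful iteration: refine
    return x, fx
```

In the paper's hybrid setting, such a routine would be started from the metaheuristic's final solution and stopped after a few iterations, with its function evaluations added to NFEval.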

Numerical Experiments
This section presents and analyzes the statistical significance of the numerical results obtained when the metaheuristics are used to solve the DMbPE problems in the sequential direct method context. The dynamic system models were coded in the Matlab programming language, and the computational application of the sequential direct method with the FA, HS, DE, ABC, and DTS codes was developed in the Matlab programming environment. The computational tests were performed on a PC with a 2.2 GHz Core i7-2670QM and 8 GB of RAM. The parameter values for the tested algorithms are set as shown in Table 6. Other values have been tested, in particular for the parameter NP in the FA, DE, and ABC and the parameter HMS in HS, but those used seem adequate with reduced computational effort.
Nine case study problems have been selected to analyze the performance of the described stochastic algorithms. A comparison is made considering the quality of the solutions and the time spent to reach the solution after a threshold number of function evaluations. The problems and the experimental data can be found in the Appendix.
Table 7 shows f_avg (the average of the obtained final objective function values), St.D. (the standard deviation of the solutions), and t_avg (the average time in seconds) after running each instance (inst.) 10 times with each metaheuristic (MH). For these experiments, each algorithm is terminated solely with the condition NFEval > NF_max, where we have tested three values of NF_max: 50, 100, and 200. The goal here is to analyze the quality of the obtained solutions. Since the target objective value is 0, the lower the value of f_avg, the better.
To analyze the statistical significance of the results, we use the Matlab function friedman, which performs a nonparametric statistical test for multiple comparisons. The Friedman test aims to determine significant differences in the mean for one independent variable, for instance, f_avg or t_avg, with different levels that correspond to the five metaheuristics, and a dependent variable (corresponding to matched groups, herein taken as the 12 instances) [35]. The null hypothesis in this test is that the mean ranks assigned to the results of the metaheuristics under testing are not statistically different. Let k be the number of metaheuristics to be ranked and N be the number of groups. We note here that the distribution of the Friedman statistic, here denoted by χ²_F, approaches the ordinary χ² distribution as N increases. The exact distribution of χ²_F has been obtained for special cases of k and N (the reader is referred to [35] and the references cited in [36]). It has been concluded in [35] that "when the number of groups is moderately large (say greater than 5 for four or more ranks) the significance of the Friedman statistic can be tested by reference to the available χ² tables." However, in [36], it is argued that "the usual χ² approximation is grossly inaccurate for most commonly used combinations of k and N." Thus, hereafter, we use another statistic that has been recommended in [36,37] and depends on χ²_F, as well as on both k and N:

F_F = (N − 1) χ²_F / (N(k − 1) − χ²_F),

which is distributed according to the F-distribution with k − 1 and (k − 1)(N − 1) degrees of freedom. When applied to f_avg for NF_max = 50, χ²_F = 25.27 and the value of F_F is 12.229. With k = 5 and N = 12, F_F is distributed according to the F-distribution with 4 and 44 degrees of freedom. For a significance level of 5%, the critical value F_0.05(4,44) is 2.59 (computed by linear interpolation between F_0.05(4,40) and F_0.05(4,50)). Since 12.229 > F_0.05(4,44), we have enough evidence to reject the null hypothesis of "no significant differences on mean ranks";
that is, the observed differences between the five distributions of f_avg values are statistically significant. Pairwise comparisons may be carried out to determine which mean ranks are significantly different. The Matlab function multcompare is applied. The estimates of the 95% confidence intervals are shown in Figure 1(a) for each case under testing. The distributions of f_avg are significantly different if their intervals are disjoint and are not significantly different if their intervals overlap. Hence, we conclude that the mean rank produced by DE is significantly different from that of ABC. For the other pairs of comparisons, there are no significant differences in the mean ranks.
When the test is applied to t_avg for NF_max = 50, χ²_F = 14.97 and F_F = 4.985, which exceeds the value of F_0.05(4,44), indicating that the null hypothesis is rejected at a significance level of 5%, and we conclude that the distributions of t_avg values have statistically significant differences. Figure 1(b) shows the estimates of the 95% confidence intervals. We conclude that the mean rank of ABC is significantly different from those of DE and DTS.
A similar analysis is made for f_avg and t_avg when NF_max = 100. We obtain for the statistics χ²_F = 25.82 with F_F = 12.805 and χ²_F = 7.47 with F_F = 2.027, respectively. Hence, we may conclude that there is enough evidence to reject the null hypothesis of "no significant differences on mean ranks" of the distributions of f_avg, at a significance level of 5%. From Figure 2(a), we conclude that the mean ranks (of the distributions of f_avg values) of ABC and DTS are significantly different from the mean rank of DE. For the t_avg distributions, we conclude that there is not enough evidence to reject the hypothesis of "no significant differences on mean ranks," at a significance level of 5%. Figure 2(b) shows the corresponding estimates of the 95% confidence intervals, and it is noticed that all pairs have their intervals overlapping.
When the statistical analysis is extended to the distributions of f_avg and T_avg values for NF_max = 200, the statistic χ²_F gives 16.34 (with F_F = 5.677) and 10.64 (with F_F = 3.133), respectively. Hence, we have evidence to reject the null hypothesis relative to the f_avg values, at a significance level of 5%. Figure 3(a) shows the corresponding estimates of the 95% confidence intervals (ABC and DTS have mean ranks significantly different from DE). As far as the distributions of T_avg values are concerned, we conclude that there is enough evidence to reject the hypothesis of "no significant differences on mean ranks" at a significance level of 5%. From Figure 3(b), we conclude that the differences are on the mean ranks of DE and DTS. We note that, at a significance level of 1% (a smaller probability of making a wrong decision), for which the critical value F_0.01(4, 44) is 3.79 (computed by linear interpolation between F_0.01(4, 40) and F_0.01(4, 50)), there is no evidence to reject the null hypothesis.
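The interpolated critical values used above can be reproduced directly. The sketch below uses F_0.05(4, 40) = 2.61 and F_0.05(4, 50) = 2.56, values assumed from a standard F-distribution table, and interpolates linearly in the denominator degrees of freedom, matching the 2.59 reported in the text.

```python
def interp_f_critical(df2, df2_lo, f_lo, df2_hi, f_hi):
    """Linearly interpolate between two tabulated F critical values
    at denominator degrees of freedom df2_lo and df2_hi."""
    return f_lo + (f_hi - f_lo) * (df2 - df2_lo) / (df2_hi - df2_lo)

# Assumed table values: F_0.05(4, 40) = 2.61, F_0.05(4, 50) = 2.56.
f_crit = interp_f_critical(44, 40, 2.61, 50, 2.56)
```

Since every F_F value reported for f_avg (12.229, 12.805, 5.677) exceeds this critical value, the corresponding null hypotheses are rejected at the 5% level.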
We now show in Table 8 the best solution obtained by each metaheuristic, f_best (over the 10 runs), and the time required to reach that value, T_best, for the three values of NF_max. Based on the metrics f_best and T_best, we use a graphical procedure, known as performance profiles [38], to visualize the performance differences among the results produced by the five metaheuristics, in relative terms, on the 12 instances. Each plot reports (on the vertical axis) the percentage of problems for which the result of each metaheuristic is within a certain threshold τ (on the horizontal axis) of the best result. The performance profiles of the five distributions of f_best values for NF_max = 50 are shown in Figure 4(a). Figure 4(b) corresponds to the metric T_best.
The higher the percentage, the better. A higher value at τ = 1 means that the metaheuristic most often achieves the lowest f_best value (or the smallest T_best).
Therefore, the metaheuristic DE is the most successful, since it has the highest probability of achieving the lowest f_best value when NF_max = 50. From the plot, the probability that DE wins on a given instance is 0.5, since it produces the lowest f_best in 6 of the 12 instances. It is followed by HS, which has a probability of winning (prob w) of 0.25, by DTS with prob w = 0.17, and by FA with prob w = 0.08. We note that the differences in the produced f_best values are so small that the plots of the cumulative distributions are packed around τ = 1.
From Figure 4(b), it is possible to conclude that the fastest metaheuristic is ABC, since its probability of achieving the smallest value of T_best (time in seconds) is 0.58. Similarly, Figures 5(a) and 5(b) contain the performance profiles of the five distributions of f_best and T_best values, respectively, when NF_max = 100. From the figure on the left, we may conclude that DE is the most successful, since it has prob w = 0.58, followed by FA with prob w = 0.25 (achieving the best results in 3 out of 12 instances), by HS with prob w = 0.17, and by ABC and DTS with prob w = 0.08. As far as T_best is concerned, the metaheuristic with the highest probability of requiring the lowest time is ABC, followed by DE.
Finally, Figures 6(a) and 6(b) display the performance profiles of the metrics f_best and T_best, respectively, when NF_max = 200. From these comparisons, we may conclude that DE is the most successful in reaching the lowest f_best, with prob w = 0.5, followed by HS and ABC with prob w = 0.25, and finally the metaheuristics FA and DTS with prob w = 0.08. When the profiles of T_best are analyzed, it is possible to conclude that the metaheuristic DE is in general faster than the others, but on some problems it is outstripped by FA and HS.
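The profiles in Figures 4-6 follow the usual construction [38]: for each instance, each metaheuristic's f_best (or T_best) is divided by the best value over all metaheuristics on that instance, and the profile value at τ is the fraction of instances whose ratio does not exceed τ. The value at τ = 1 is exactly the "probability of winning" quoted above. A minimal sketch (illustrative, not the code used to produce the figures):

```python
def performance_profile(results, taus):
    """Dolan-More style performance profile.

    results: n rows (instances) by k columns (methods) of positive
    metric values (e.g. f_best or T_best); lower is better.
    Returns, for each tau, the fraction of instances on which each
    method is within a factor tau of the best method.
    """
    n, k = len(results), len(results[0])
    # ratio to the per-instance best value
    ratios = [[v / min(row) for v in row] for row in results]
    return [[sum(1 for row in ratios if row[j] <= tau) / n
             for j in range(k)]
            for tau in taus]
```

Plotting the returned fractions against τ for each method reproduces the cumulative curves shown in the figures.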
Tables 9-19 contain comparative results relative to 11 instances. (The results obtained for instance VOL A are not used in the comparison, since the experimental data have been generated herein for this study.) Tables 9-19 show the best and the average f values (computed from the 10 runs), as well as the average CPU time and number of function evaluations, T_avg and NFE_avg, respectively. Our results always appear along the first row of each table and are taken from the metaheuristic that produced the lowest f_best with the fewest function evaluations; this metaheuristic is identified in the table caption. (We note that the implemented metaheuristics have been enhanced with the HJ local search, and the reported NFE_avg is the sum of NF_max and the function evaluations required by the HJ search.) In the tables, "-" means that the information is not available in the cited paper and "n.a." means "not applicable." The comparisons use the results reported in the papers [2, 4, 10, 12-14, 16-19, 26]. Results in [16] are based on a trigonometric version of the DE, and in [17] a modified version of the DE is implemented. The authors in [12] use a modified version of the metaheuristic scatter search (SSm) with a local search as an improvement method; the results shown in the table use the function fmincon from Matlab. SSm uses a population of 100 points and a penalty technique with a weight of 1000 to penalize infeasible solutions. In [19], the sequential approach is implemented with an interval-based method for the NLP optimization; the solution obtained therein is an ε-global solution. Global solutions are produced by the method of orthogonal collocation on finite elements presented in [26]; this strategy is implemented within the Matlab code Dynopt after 100 runs of a multistart approach. (We note that a direct comparison of the CPU times cannot be made, since the computational platforms are different.)
Overall, after all the numerical comparisons, it is possible to conclude that the selected metaheuristics, when enhanced with the local intensification phase, are effective in achieving good quality solutions with a reduced computational effort. Furthermore, we have also shown that the sequential direct method combined with the selected metaheuristics competes with the other strategies reported in the literature.

Conclusions
In this paper, we have analyzed the performance of five well-known metaheuristics when solving parameter estimation problems in dynamic system models. The sequential numerical direct method is applied to the DMbPE problem, and the resulting optimization is performed directly, making use of a numerical integration formula for solving the system of ODE.
Using nine DMbPE problems and different experimental data, error-free and random error-added data, a total of 12 instances have been used in the comparative experiments. The solutions produced by the metaheuristics have been analyzed in terms of quality, by stopping the algorithms after a specified number of function evaluations, and compared using statistical hypotheses testing. The statistical tests show that the average obtained solutions and the average CPU times are mostly significantly different, at a significance level of 5%. The best solutions produced by the metaheuristics are analyzed by means of performance profiles. After the graphical comparisons for different numbers of allowed function evaluations, we are able to conclude that DE is the metaheuristic with the highest probability of giving the lowest value of the objective function f. It is followed by HS. The FA, ABC, and DTS metaheuristics are, in this sequence, the least effective. Thus, our recommendation for a reader who favors good quality solutions falls on the DE and HS metaheuristics. On the other hand, ABC and DE are in general the metaheuristics that require the least computational effort, meaning that, for a limited computational budget, our recommendation is to choose one of them. Finally, from the comparisons with other results in the literature, involving both sequential and simultaneous direct methods and stochastic as well as deterministic global optimization methods, we state that the sequential direct method combined with one of the above recommended metaheuristics is a good alternative to solve DMbPE problems. Good quality solutions with a reduced computational effort are thus provided.

The experimental data were generated using t = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] and parameter values 0.3 and 0.1, with a small added error, and are available in [26]. Only the concentrations of two of the components were measured.
Lotka-Volterra Predator-Prey Model (VOL). This is a system of ODE with two dependent variables and two parameters (see Section 6.4 in [19] and Section 7.6 in [10]). A second set of experimental data is available in [10] and has also been tested, corresponding to the herein denoted instance VOL B.
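The VOL instance illustrates how the sequential direct method evaluates the objective: the ODE system is integrated numerically up to each observation time, and the sum of squared errors between predicted and observed states is returned. The sketch below assumes the classic predator-prey form y1' = p1 y1 (1 − y2), y2' = p2 y2 (y1 − 1) with a fixed-step RK4 integrator; the initial state, step size, and observation grid are illustrative choices, not taken from [10] or [19].

```python
def _rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
    k4 = f(t + h, [yi + h * k for yi, k in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def simulate_vol(p, t_obs, y0=(1.2, 1.1), h=0.01):
    """Integrate the assumed VOL dynamics, recording the state at each
    observation time in t_obs (increasing, starting after t = 0)."""
    f = lambda t, y: [p[0] * y[0] * (1 - y[1]),
                      p[1] * y[1] * (y[0] - 1)]
    t, y, out = 0.0, list(y0), []
    for t_next in t_obs:
        while t < t_next - 1e-12:
            step = min(h, t_next - t)  # shorten last step to hit t_next
            y = _rk4_step(f, t, y, step)
            t += step
        out.append(list(y))
    return out

def vol_sse(p, t_obs, y_obs):
    """Least-squares objective: sum of squared prediction errors."""
    pred = simulate_vol(p, t_obs)
    return sum((pi - oi) ** 2
               for prow, orow in zip(pred, y_obs)
               for pi, oi in zip(prow, orow))
```

A metaheuristic then minimizes vol_sse over the parameter vector p; since the data are generated from the model, the intended global minimum value is known in advance (near zero for error-free data), as noted in the introduction.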

Figure 1: Estimates of the 95% confidence intervals for f_avg and T_avg, using NF_max = 50.

Figure 2: Estimates of the 95% confidence intervals for f_avg and T_avg, using NF_max = 100.
Figure 3: Estimates of the 95% confidence intervals for f_avg and T_avg, using NF_max = 200.

Table 1: Nomenclature for the FA.

Table 2: Nomenclature for the HS algorithm.

Table 3: Nomenclature for the DE algorithm.

Table 4: Nomenclature for the ABC algorithm.

Table 5: Nomenclature for the DTS algorithm.

Table 7: Comparison of f_avg, St.D., and T_avg.

Table 8: Best results (f_best and T_best).

Table 29: Error-added generated experimental values for 10 time instants.