Hybridization of Adaptive Differential Evolution with an Expensive Local Search Method

Differential evolution (DE) is an effective and efficient heuristic for global optimization problems. However, it has difficulty exploiting the local region around the approximate solution. To handle this issue, local search (LS) techniques can be hybridized with DE to improve its local search capability. In this work, we hybridize an updated version of DE, adaptive differential evolution with optional external archive (JADE), with an expensive LS method, Broyden-Fletcher-Goldfarb-Shanno (BFGS), for solving continuous unconstrained global optimization problems. The new hybrid algorithm is denoted by DEELS. To validate the performance of DEELS, we carried out extensive experiments on the well-known test suites CEC2005 and CEC2010. The experimental results, in terms of function error values, success rate, and some other statistics, are compared with those of some state-of-the-art algorithms: self-adaptive control parameters in differential evolution (jDE), sequential DE enhanced by neighborhood search for large-scale global optimization (SDENS), and the differential ant-stigmergy algorithm (DASA). These comparisons reveal that DEELS outperforms jDE and SDENS, though not DASA, on the majority of test instances.


Introduction
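Optimization is concerned with finding the best solution for an objective function. In general, an unconstrained optimization problem can be stated as follows: find the global optimum $\mathbf{x}^*$ of an objective function $f(\mathbf{x})$, where $\mathbf{x} = (x_1, x_2, \ldots, x_D) \in \mathbb{R}^D$ and $D$ is the dimension of the problem (for minimization, $f(\mathbf{x}^*) \le f(\mathbf{x})$ for all $\mathbf{x}$).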
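To make the problem statement concrete, the sketch below defines two textbook objective functions: the unimodal sphere function and the multimodal Rastrigin function, both with global optimum $\mathbf{x}^* = \mathbf{0}$ and $f(\mathbf{x}^*) = 0$. These are illustrative, unshifted and unrotated forms; the CEC2005 and CEC2010 instances used later add shifts, rotations, and biases that are omitted here.

```python
import math

def sphere(x):
    """Unimodal sphere function: f(x) = sum_j x_j^2, minimized at x* = 0."""
    return sum(xj * xj for xj in x)

def rastrigin(x):
    """Multimodal Rastrigin function: f(x) = 10 D + sum_j (x_j^2 - 10 cos(2 pi x_j))."""
    return 10.0 * len(x) + sum(xj * xj - 10.0 * math.cos(2.0 * math.pi * xj) for xj in x)

# Both functions attain f(x*) = 0 at x* = (0, ..., 0) in D dimensions.
D = 30
x_star = [0.0] * D
print(sphere(x_star), rastrigin(x_star))  # 0.0 0.0
```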
Evolutionary algorithms (EAs) are inspired by the Darwinian theory of evolution [1]. They are very efficient at finding the global optimum of many real-world problems, including problems from mathematics, engineering, economics, business, and medicine. The EA family consists of a variety of stochastic algorithms, such as Genetic Algorithms (GAs) [2], Particle Swarm Optimization (PSO) [3,4], Evolution Strategies (ES) [5], and the differential evolution algorithm (DE) [6,7].
Among EAs, DE is a relatively recent algorithm and is efficient in solving many optimization problems. DE has many advantages: for example, it is simple to understand and implement, has few control parameters, and is robust [8]. There is no doubt that DE is a remarkable optimizer for many optimization problems, but it has a few limitations, such as stagnation, premature convergence, and loss of population diversity [9,10]. Being a global optimizer, DE has difficulty searching the neighborhood of the approximate solution to a given problem. This leaves room for hybridizing DE with other techniques to improve its poor exploitation (i.e., its search of the neighborhood of approximate solutions). On the other hand, the role of LS methods is to stabilize the search, especially in the vicinity of a local optimum. Thus, they can be combined with global search algorithms to enhance their local search.
The main aim of this paper is to experimentally validate the performance of our newly proposed hybrid algorithm, DEELS, which combines JADE [11,12] and BFGS [13]. In particular, we want to see whether this hybridization further improves the performance of JADE. Unlike our published preliminary work [14], this paper presents DEELS in full depth. It also comments on the performance of DEELS on large-scale global optimization problems with dimension 1000. Moreover, in contrast to our previously published comparison with JADE only [14], this time DEELS is compared with jDE [15], SDENS [16], and DASA [17] on problems from the CEC2005 and CEC2010 test suites to further explore its capabilities for handling small- and large-dimensional problems.
The rest of this paper is organized as follows. Section 2 describes the basic DE, JADE, and BFGS algorithms. Section 3 presents a literature review. Section 4 presents the proposed algorithm. Section 5 gives the experimental results, and finally Section 6 concludes this paper and discusses future research directions.

Some Relevant Existing Methods
As mentioned earlier, DEELS depends on JADE and BFGS. Thus, this section presents the basic operators of DE, JADE, and BFGS.

DE.
DE [6,7] is a relatively recent bio-inspired scheme for finding the global optimum $\mathbf{x}^*$ of an optimization problem. This subsection briefly reviews the DE algorithm; more details can be found in [18-22]. The working of DE can be described as follows.

Reproduction.
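To generate an offspring, DE uses two genetic operators, mutation and crossover, detailed as follows:
(1) Mutation. For each target vector $\mathbf{x}_i$, three mutually distinct vectors $\mathbf{x}_{r_1}$, $\mathbf{x}_{r_2}$, and $\mathbf{x}_{r_3}$, all different from $\mathbf{x}_i$, are selected at random from the population. Mutation then produces a mutant vector $\mathbf{v}_i$ by adding a scaled difference of two of the chosen vectors to the third; that is,
$$\mathbf{v}_i = \mathbf{x}_{r_1} + F\,(\mathbf{x}_{r_2} - \mathbf{x}_{r_3}), \qquad (1)$$
where $F \in (0, 1)$ is the scaling factor.
(2) Crossover. After mutation, the components of the parent vector $\mathbf{x}_i$ and the mutant vector $\mathbf{v}_i$ are mixed by a crossover operator, and a trial vector $\mathbf{u}_i$ is generated as follows:
$$u_{j,i} = \begin{cases} v_{j,i}, & \text{if } \operatorname{rand}_j(0,1) \le CR \text{ or } j = j_{\text{rand}}, \\ x_{j,i}, & \text{otherwise}, \end{cases} \qquad (2)$$
where $j \in \{1, 2, \ldots, D\}$, $CR \in [0, 1]$ is the crossover probability, and $j_{\text{rand}}$ is a randomly chosen index that ensures $\mathbf{u}_i$ takes at least one component from $\mathbf{v}_i$.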
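A minimal sketch of the two reproduction operators, assuming the classic DE/rand/1 mutation and binomial crossover with a random index $j_{\text{rand}}$. The function names are illustrative, not taken from the original implementation; vectors are plain Python lists so the example needs only the standard library.

```python
import random

def de_rand_1_mutation(pop, i, F, rng=random):
    """Mutation: v_i = x_r1 + F * (x_r2 - x_r3), with r1, r2, r3 distinct and != i."""
    candidates = [r for r in range(len(pop)) if r != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    return [a + F * (b - c) for a, b, c in zip(pop[r1], pop[r2], pop[r3])]

def binomial_crossover(x, v, CR, rng=random):
    """Crossover: take v_j if rand_j(0,1) <= CR or j == j_rand, otherwise keep x_j."""
    D = len(x)
    j_rand = rng.randrange(D)  # guarantees at least one component from the mutant
    return [v[j] if (rng.random() <= CR or j == j_rand) else x[j] for j in range(D)]

# Usage: one mutant and one trial vector for the first member of a random population.
rng = random.Random(1)
pop = [[rng.uniform(-5.0, 5.0) for _ in range(4)] for _ in range(6)]
v = de_rand_1_mutation(pop, 0, 0.5, rng)
u = binomial_crossover(pop[0], v, 0.9, rng)
```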

Survival Selection.
At the end, the trial vector generated in (2) is compared with its parent vector on the basis of its objective function value, and the better of the two becomes a member of the new generation; that is,
$$\mathbf{x}_i^{g+1} = \begin{cases} \mathbf{u}_i^{g}, & \text{if } f(\mathbf{u}_i^{g}) \le f(\mathbf{x}_i^{g}), \\ \mathbf{x}_i^{g}, & \text{otherwise}. \end{cases} \qquad (3)$$

JADE.
JADE [11] is an adaptive version of DE that modifies it in three aspects.

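DE/Current-to-pbest Strategy.
JADE uses two mutation strategies, one with an external archive and the other without it. These strategies can be expressed as follows [11]:
$$\mathbf{v}_i = \mathbf{x}_i + F_i\,(\mathbf{x}_{\text{best}}^{p} - \mathbf{x}_i) + F_i\,(\mathbf{x}_{r_1} - \tilde{\mathbf{x}}_{r_2}), \qquad (4)$$
$$\mathbf{v}_i = \mathbf{x}_i + F_i\,(\mathbf{x}_{\text{best}}^{p} - \mathbf{x}_i) + F_i\,(\mathbf{x}_{r_1} - \mathbf{x}_{r_2}), \qquad (5)$$
where $\mathbf{x}_{\text{best}}^{p}$ is a vector chosen randomly from the top $100p\%$ individuals, $\mathbf{x}_i$, $\mathbf{x}_{r_1}$, and $\mathbf{x}_{r_2}$ are chosen from the current population $P$, while $\tilde{\mathbf{x}}_{r_2}$ is chosen randomly from $P \cup A$, where $A$ denotes the archive of JADE, and $p$ is a constant chosen as 0.5. In DEELS, we use the strategy given in (4).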
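The sketch below illustrates JADE's DE/current-to-pbest/1 mutation with an external archive, under stated assumptions: minimization, a population stored as lists with matching fitness values, and index choices $r_1 \ne i$ and $r_2 \notin \{i, r_1\}$. Passing an empty archive gives the variant without archive. The function name is hypothetical.

```python
import random

def current_to_pbest_1(pop, fitness, archive, i, F, p, rng=random):
    """DE/current-to-pbest/1 (minimization):
    v_i = x_i + F * (x_pbest - x_i) + F * (x_r1 - x~_r2),
    with x_pbest drawn from the top 100p% of pop, x_r1 from pop, and
    x~_r2 from pop + archive. An empty archive gives the no-archive variant."""
    NP = len(pop)
    n_top = max(1, int(round(p * NP)))                    # size of the top-100p% group
    ranked = sorted(range(NP), key=lambda k: fitness[k])  # best (smallest f) first
    x_pbest = pop[rng.choice(ranked[:n_top])]
    r1 = rng.choice([k for k in range(NP) if k != i])
    union = pop + archive                                 # indices >= NP refer to archive members
    r2 = rng.choice([k for k in range(len(union)) if k not in (i, r1)])
    x, a, b = pop[i], pop[r1], union[r2]
    return [xj + F * (pj - xj) + F * (aj - bj)
            for xj, pj, aj, bj in zip(x, x_pbest, a, b)]

rng = random.Random(0)
pop = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(8)]
fitness = [sum(z * z for z in x) for x in pop]
archive = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(2)]
v = current_to_pbest_1(pop, fitness, archive, i=2, F=0.5, p=0.5, rng=rng)
```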

Control Parameters Adaptation.
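For each individual $\mathbf{x}_i$, the scale factor $F_i$ and the crossover probability $CR_i$ are generated independently from Cauchy and normal distributions, respectively, as follows [11]:
$$F_i = \operatorname{randc}_i(\mu_F, 0.1), \qquad CR_i = \operatorname{randn}_i(\mu_{CR}, 0.1).$$
These values are then truncated to $(0, 1]$ and $[0, 1]$, respectively. Initially, both $\mu_F$ and $\mu_{CR}$ are set to 0.5. They are then updated at the end of each generation as follows:
$$\mu_{CR} = (1 - c)\,\mu_{CR} + c \cdot \operatorname{mean}_A(S_{CR}), \qquad (6)$$
$$\mu_F = (1 - c)\,\mu_F + c \cdot \operatorname{mean}_L(S_F), \qquad (7)$$
where $c \in (0, 1]$ is a positive constant controlling the rate of adaptation, $\operatorname{mean}_L$ denotes the Lehmer mean, $\operatorname{mean}_L(S_F) = \sum_{F \in S_F} F^2 / \sum_{F \in S_F} F$, $\operatorname{mean}_A$ denotes the arithmetic mean, $S_F$ is the set of successful $F_i$'s, and $S_{CR}$ is the set of successful $CR_i$'s at generation $g$.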
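A sketch of this adaptation scheme follows. Two details are assumptions taken from the original JADE description rather than from this paper: a Cauchy sample $F_i \le 0$ is regenerated rather than clipped, and the adaptation constant defaults to $c = 0.1$. The Cauchy variate is drawn by inverse-transform sampling.

```python
import math
import random

def sample_F(mu_F, rng=random):
    """F_i ~ Cauchy(mu_F, 0.1): regenerate if F_i <= 0, truncate to 1 if F_i > 1."""
    while True:
        # Inverse-transform sample of a Cauchy variate with location mu_F and scale 0.1.
        F = mu_F + 0.1 * math.tan(math.pi * (rng.random() - 0.5))
        if F > 0.0:
            return min(F, 1.0)

def sample_CR(mu_CR, rng=random):
    """CR_i ~ N(mu_CR, 0.1), truncated to [0, 1]."""
    return min(max(rng.gauss(mu_CR, 0.1), 0.0), 1.0)

def lehmer_mean(values):
    """Lehmer mean used for S_F: sum(F^2) / sum(F)."""
    return sum(v * v for v in values) / sum(values)

def update_means(mu_F, mu_CR, S_F, S_CR, c=0.1):
    """End-of-generation update of mu_F and mu_CR; unchanged if no trial succeeded."""
    if S_CR:
        mu_CR = (1.0 - c) * mu_CR + c * (sum(S_CR) / len(S_CR))  # arithmetic mean
    if S_F:
        mu_F = (1.0 - c) * mu_F + c * lehmer_mean(S_F)            # Lehmer mean
    return mu_F, mu_CR
```

The Lehmer mean weights larger successful $F$ values more heavily than the arithmetic mean would, which counteracts the tendency of $\mu_F$ to drift toward small values.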

Optional External Archive.
At each generation, the parents that fail in selection (i.e., are replaced by their trial vectors) are sent to the archive. If the archive size exceeds $NP$, randomly selected solutions are deleted from it to keep its size equal to $NP$. The inferior solutions in the archive play a role in JADE's mutation strategy with archive: they not only provide information about promising directions but also improve population diversity.
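A minimal sketch of the archive maintenance just described, assuming solutions are stored as lists and the size limit is passed in as `max_size` (equal to $NP$ in JADE). The helper name is illustrative.

```python
import random

def update_archive(archive, failed_parents, max_size, rng=random):
    """Append parents that lost to their trial vectors, then delete randomly
    chosen members until the archive holds at most max_size solutions."""
    archive = archive + [list(x) for x in failed_parents]
    while len(archive) > max_size:
        archive.pop(rng.randrange(len(archive)))
    return archive
```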

BFGS.
The BFGS method, a quasi-Newton method, uses the gradient and an approximation of the inverse Hessian to compute a suitable search direction. BFGS is considered a good LS method due to its efficiency. The detailed algorithm of BFGS is presented in Algorithm 1.

Algorithm 1: BFGS.
...
end if
(10) Compute the search direction using the current inverse-Hessian approximation $H_k$: $\mathbf{s}_k = -H_k \nabla f(\mathbf{x}_k)$;
(11) Calculate the step length $\alpha_k$ by the golden-section method [23];
(12) $\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \mathbf{s}_k$;
(13) end while
Output: $\mathbf{x}_{k+1}$ is the output of the algorithm.
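The following Python sketch shows one way to realize the BFGS loop with the ingredients named above: an inverse-Hessian approximation $H_k$, the direction $\mathbf{s}_k = -H_k \nabla f(\mathbf{x}_k)$, and a golden-section line search. Assumptions not stated in the text: the gradient is approximated by central differences ($2D$ evaluations), the step length is searched on the bracket $[0, 1]$, and the update is skipped when the curvature condition fails. It is a stdlib-only sketch for small $D$, not the authors' implementation.

```python
import math

def central_gradient(f, x, h=1e-6):
    """Central-difference gradient: 2 * len(x) function evaluations."""
    g = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g

def golden_section(phi, a=0.0, b=1.0, tol=1e-8):
    """Minimize a unimodal 1-D function phi on [a, b] by golden-section search."""
    r = (math.sqrt(5.0) - 1.0) / 2.0          # inverse golden ratio, about 0.618
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:                           # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:                                 # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    return (a + b) / 2.0

def bfgs(f, x0, max_iter=50, gtol=1e-8):
    """BFGS with inverse-Hessian approximation H and a golden-section step on [0, 1]."""
    D = len(x0)
    x = list(x0)
    H = [[1.0 if i == j else 0.0 for j in range(D)] for i in range(D)]  # H_0 = I
    g = central_gradient(f, x)
    for _ in range(max_iter):
        if math.sqrt(sum(gi * gi for gi in g)) < gtol:
            break
        s = [-sum(H[i][j] * g[j] for j in range(D)) for i in range(D)]  # s_k = -H_k grad f(x_k)
        alpha = golden_section(lambda a: f([xi + a * si for xi, si in zip(x, s)]))
        x_new = [xi + alpha * si for xi, si in zip(x, s)]             # x_{k+1} = x_k + alpha_k s_k
        g_new = central_gradient(f, x_new)
        dx = [p - q for p, q in zip(x_new, x)]
        dg = [p - q for p, q in zip(g_new, g)]
        sy = sum(p * q for p, q in zip(dx, dg))
        if sy > 1e-12:                        # skip the update if the curvature condition fails
            rho = 1.0 / sy
            Hy = [sum(H[i][j] * dg[j] for j in range(D)) for i in range(D)]
            yHy = sum(p * q for p, q in zip(dg, Hy))
            # H_{k+1} = (I - rho dx dg^T) H (I - rho dg dx^T) + rho dx dx^T, expanded.
            H = [[H[i][j] - rho * (dx[i] * Hy[j] + Hy[i] * dx[j])
                  + (rho * rho * yHy + rho) * dx[i] * dx[j]
                  for j in range(D)] for i in range(D)]
        x, g = x_new, g_new
    return x

x_min = bfgs(lambda z: (z[0] - 1.0) ** 2 + 10.0 * (z[1] + 2.0) ** 2, [0.0, 0.0])
```

Restricting the line search to $[0, 1]$ keeps the sketch simple and lets the quasi-Newton step $\alpha_k = 1$ be accepted near convergence; a production code would normally use a bracketing phase or a Wolfe line search instead.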

Brief Review of Variants of DE and Hybridization of DE with Local Search Methods
To improve the performance of DE, many researchers have modified the classic DE and proposed different variants. Some modified the selection scheme [24], while others varied the mutation and crossover operators [25]. Recently, in [26], orthogonal crossover was used instead of binomial and exponential crossover. Some have introduced new variants such as opposition-based DE (ODE) [27], centroid-based initialization (ciJADE) [28], jDE [15], and genDE [8], while others introduced adaptation and self-adaptation of the control parameters $F$ and $CR$, as in [29,30], SaDE [31], JADE [11,12], SHADE [32], and EWMA-DECrF [33]. Some introduced cooperative coevolution into DE for large-scale optimization [34]. One group of researchers applied DE to discrete problems [35,36], while others took advantage of its global search ability in continuous domains [26,37-40].
In recent years, the hybridization of DE with LS methods has attracted much attention because it combines the individual merits of both. Many hybrid algorithms have shown significant performance improvements. Here, we review some of the methods in this category.
A new differential evolution algorithm with localization around the best point (DELB) is proposed in [41]. In DELB, the initial steps are the same as those in DE, except that the mutation scale factor $F$ is chosen randomly from $[-1, -0.4] \cup [0.4, 1]$ for each mutant vector. DELB also modifies the selection step by introducing reflection and contraction. The trial vector is compared with the current best and with the parent vector. If the parent is worse than the trial vector, it is replaced by a new contracted or reflected vector. Thus, in DELB, the parent vector can be replaced by its trial vector, a reflected vector, or a contracted vector, while in classic DE only the trial vector can replace the parent.
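For illustration, the DELB scale-factor rule can be sampled as below: because both intervals have length 0.6, drawing a magnitude uniformly from $[0.4, 1]$ and a random sign gives a uniform draw over their union. This is a sketch of that one step only; the reflection and contraction moves of [41] are not shown, and the function name is illustrative.

```python
import random

def delb_scale_factor(rng=random):
    """Sample F uniformly from [-1, -0.4] U [0.4, 1] (two intervals of equal length)."""
    magnitude = rng.uniform(0.4, 1.0)
    return magnitude if rng.random() < 0.5 else -magnitude

rng = random.Random(7)
samples = [delb_scale_factor(rng) for _ in range(1000)]
```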
Recently, in [42], DE was hybridized with the nonlinear simplex method; the resulting method is known as NSDE. The authors of [42] applied the nonlinear simplex method together with uniform random numbers to initialize the DE population. First, $NP$ individuals are generated uniformly at random, and then another $NP$ individuals are generated from these $NP$ points by applying the Nelder-Mead Simplex (NMS) method. From this population of $2NP$ individuals, the fittest $NP$ are selected as DE's initial population, and the rest of DE is unaltered. Thus, NSDE modifies DE only in the population initialization step. It has shown good performance in reducing function evaluations and CPU time.
In another study, Brest et al. [43] hybridized DE with Sequential Quadratic Programming (SQP), an efficient but expensive gradient-based LS method. Their hybrid applies the DE algorithm until the number of function evaluations reaches 30% of the maximum. It then applies SQP for the first time to the best point obtained so far. Afterwards, SQP is applied to the best solution of the current search every 100 generations. The number of iterations of the expensive local search is set to ⌊√dimension/5⌋. In their hybrid, the population size is gradually reduced, and the process ends with the minimum population size. DE provides users with flexible offspring generation strategies [44]; hence, hybridization of DE is likely to remain an active field of multidisciplinary research in the years to come.
Thus, we present a new algorithm, DEELS, which utilizes an expensive local search for refining the solutions.The details of DEELS are presented in the following section.

A New Hybrid Algorithm: DEELS
In this section, we present our new proposed algorithm, DEELS, which is the combination of two methods with contrasting features.First, we will discuss the main features of the algorithm.Then, we will describe it explicitly.

Main Idea.
Although JADE, owing to its adaptive parameter control strategy, performs better than classic DE on many optimization problems, its performance worsens as the dimension increases. BFGS is an LS technique with a strong self-correcting ability [45] in searching for the optimal solution, but it is not as good a global searcher as JADE. The important question is how to reconcile these two different aspects to solve the minimization problem.
A natural way is to hybridize these two techniques, JADE and BFGS, for solving unconstrained optimization problems. The issue is how to combine them in a way that is easy to understand and implement. Many hybrid approaches incorporate expensive methods to find the best solution. Here, however, the new algorithm uses the robust but costly method not only to refine good solutions but also to locate them in the population during the search process.
DEELS begins with JADE and lets it search for a fixed number of generations (the LS interval). It then selects a few of the best individuals from this population and applies the expensive LS, that is, BFGS, to them for the first time. The aim of this refinement is to turn them into strong parents that produce better offspring and lead the search in promising directions. The refined solutions are then introduced into the population, and the same number of worst solutions is removed from it.
The purpose of calling BFGS after each LS interval is to concentrate the population, add local search ability to the overall scheme, and thus help it avoid getting trapped in local optima. For these reasons, BFGS is invoked two more times during the evolution, each after a further interval of generations. If the best function value falls below a threshold, the current best solution is likely to be in the neighborhood of the value to reach and might lead the search to the desired optimal solution. In that case, it is desirable to apply the efficient LS to this best solution for more than one iteration; thus, BFGS is applied for several (refinement) iterations when the best solution is in the vicinity of a local optimum. If the output solution of BFGS reaches the value to reach within the required accuracy, the algorithm stops; otherwise, it continues until the maximum allowed number of function evaluations is reached.
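The LS schedule described above can be summarized by a small decision rule. The sketch below is a hypothetical helper, not the authors' code: `interval` stands for the number of generations between BFGS calls, `max_calls = 3` encodes the first call plus the two further invocations, and `threshold` is the switch to refinement. Giving refinement priority over concentration is an assumption.

```python
def deels_ls_step(gen, interval, calls_done, f_best, threshold, max_calls=3):
    """Decide the LS action at the end of generation `gen` (hypothetical helper).

    Returns "refine" when the best value is below `threshold` (several BFGS
    iterations on the best solution), "concentrate" every `interval`
    generations until `max_calls` concentration calls have been made
    (BFGS on a few of the best points), and None otherwise."""
    if f_best < threshold:
        return "refine"
    if gen > 0 and gen % interval == 0 and calls_done < max_calls:
        return "concentrate"
    return None

# Generations that trigger concentration with interval = 300 when the best
# value never drops below the threshold.
calls, triggered = 0, []
for gen in range(1, 1501):
    if deels_ls_step(gen, 300, calls, f_best=5.0, threshold=1e-3) == "concentrate":
        calls += 1
        triggered.append(gen)
print(triggered)  # [300, 600, 900]
```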
In [43], the population size is reduced dynamically, whereas in our hybrid algorithm the population size is kept fixed, since reducing it might result in a loss of population diversity, which is very important for DE. DEELS draws much inspiration from the state-of-the-art paper [46]. We apply an expensive LS in combination with an EA (DE) instead of the inexpensive LS used in [46]. In [46], both methods are LSs, while DEELS combines BFGS with JADE to investigate the effect of combining an EA with an LS method. In [46], a restart mechanism is also incorporated, whereas this is not necessary in DEELS.

Algorithmic Framework of DEELS.
The details of DEELS are given in Algorithm 2. Here, we explain the different strategies used in DEELS.

Global Search.
JADE improves the population of solutions by updating it from generation to generation with the help of the genetic operators mutation and crossover. These operators help the search by producing promising solutions. JADE thus contributes its global search ability to DEELS. Moreover, being a population-based method, JADE can maintain the diversity of the population and thus decreases the chances of DEELS getting trapped in local optima.

LS.
The BFGS method has very strong self-correcting properties (when an appropriate line search is used). If, at some iteration, the Hessian approximation contains bad curvature information, BFGS can correct these inaccuracies within only a few updates [45]. For this reason, BFGS generally performs very well, and once in the neighborhood of a minimizer it can attain superlinear convergence [45]. Although BFGS is efficient, it is costly: it computes the gradient at the given point, which in DEELS requires $2D$ function evaluations per gradient. Furthermore, it approximates the Hessian, a $D \times D$ matrix of second-order partial derivatives [47], at a computational cost of $O(D^2)$ per iteration [47]; that is, BFGS needs $O(D^2)$ arithmetic operations per iteration [45]. Thus, the overall overhead of BFGS is also $O(D^2)$ per iteration.
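A back-of-the-envelope sketch of these costs follows, under the assumption (not stated explicitly in the text) that the $2D$ evaluations per gradient come from central differences. Line-search evaluations are passed in as a parameter because their number depends on the golden-section tolerance; the example uses 3 points and 1 iteration per concentration phase, the values adopted in Experiment 1. The function names are illustrative.

```python
def bfgs_iteration_fes(D, line_search_fes):
    """Function evaluations per BFGS iteration: 2*D for the gradient plus the line search."""
    return 2 * D + line_search_fes

def concentration_fes(D, n_points, n_iters, line_search_fes):
    """Evaluations for one concentration phase: n_iters BFGS iterations on n_points solutions."""
    return n_points * n_iters * bfgs_iteration_fes(D, line_search_fes)

for D in (30, 1000):
    # Gradient-only cost per iteration, entries of the D x D inverse-Hessian
    # approximation, and one concentration phase (3 points x 1 iteration, gradient part only).
    print(D, bfgs_iteration_fes(D, 0), D * D, concentration_fes(D, 3, 1, 0))
# 30 60 900 180
# 1000 2000 1000000 6000
```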
The BFGS method plays two roles in DEELS: first, it is employed to generate promising solutions in the population after specified intervals of evolution; second, it improves the quality of the best solution found so far by JADE and BFGS together.
Next, we explain what we mean by the terms concentration and refinement. As said earlier, the aim is an easy-to-understand and easy-to-implement search process. To achieve this, we rely on the fact that the problem is to distinguish among ordinary points, of which we have many; good points (local optima), of which we have relatively few; and the best points (global optima), of which we may potentially have only one or none.
Figure 1 illustrates the main process, which relies on LS to perform a coarse clustering (i.e., to bring the majority of the good points in the population towards the basins of local optima) and a refinement step in which, hopefully, the local optimum will be identified. Initially, this process will clearly be rather ineffective because of the sheer randomness of the population of solutions, as shown in Figure 1(a); unless we are very lucky, good points are unlikely to be generated in the first population. The important point, however, is that the process becomes more and more effective as concentration takes effect on the population (see Figure 1(c)).

Updating the Population.
Adding promising solutions to the population of DEELS and removing the worst points from it can improve the quality of offspring in the next generations. Just as good parents can produce good offspring, worse parents are likely to produce worse solutions; hence, their removal can benefit the entire population. New promising solutions can also increase the convergence rate.

Algorithm 2: DEELS.
(1) Inputs: Generate $NP$ uniformly distributed random points $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{NP}$ in the search space to form the population $P$;
(2) $k$: the number of points selected for LS;
(3) $n_c$: the number of iterations of LS for concentration;
(4) $n_r$: the number of iterations of LS for refining the solution;
(5) $NP$: population size;
(6) FES: number of function evaluations;
...
(13) while FES < MaxFES do
(14) Start the algorithm with JADE, using (4) to generate mutant vectors, (2) for trial vectors, (3) for selection, and (6) and (7) for adaptation of the control parameters;
(15) Explore the population for $G_{LS}$ generations;
(16) Sort the objective values;
(17) Select the $k$ best points;
(18) for $i = 1$ to $k$ do
(19) Apply $n_c$ iterations of BFGS to these $k$ points;
(20) if $f_{\text{best}} < f(\mathbf{x}^*) + \delta_0$ then
...
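The population update can be sketched as follows, assuming minimization and that the refined copies are added while the original points they came from stay in the population; the same number of worst members is discarded so that the population size stays fixed. The helper name is hypothetical.

```python
def insert_and_drop_worst(pop, fitness, refined, refined_fitness):
    """Add LS-refined solutions and remove as many of the worst members (minimization)."""
    k = len(refined)
    ranked = sorted(range(len(pop)), key=lambda i: fitness[i])  # best first
    keep = ranked[:len(pop) - k]                                # drop the k worst
    new_pop = [pop[i] for i in keep] + [list(x) for x in refined]
    new_fit = [fitness[i] for i in keep] + list(refined_fitness)
    return new_pop, new_fit

pop = [[5.0], [1.0], [4.0], [2.0], [3.0]]
fitness = [5.0, 1.0, 4.0, 2.0, 3.0]
new_pop, new_fit = insert_and_drop_worst(pop, fitness, [[0.5], [0.7]], [0.5, 0.7])
print(new_fit)  # [1.0, 2.0, 3.0, 0.5, 0.7]
```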

Stopping Condition.
DEELS stops when either of the following conditions is met:
(1) The maximum number of function evaluations is reached.
(2) $|f(\mathbf{x}) - f(\mathbf{x}^*)| < \varepsilon_0$, where $\mathbf{x}$ is the best individual found in a run, $f(\mathbf{x}^*)$ is the known value to reach of the test instance, and $\varepsilon_0$ is the required accuracy.
The maximum number of function evaluations is set to $3 \times 10^{6}$ for the CEC2010 test instances with dimension 1000 and to $3 \times 10^{5}$ for the 30-dimensional CEC2005 problems.
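The two stopping conditions can be expressed as a single predicate. In this sketch, `eps0` is the required accuracy of condition (2), and the budgets are the ones stated above; the names are illustrative.

```python
MAX_FES = {"CEC2005, D = 30": 3 * 10**5, "CEC2010, D = 1000": 3 * 10**6}

def deels_should_stop(fes, max_fes, f_best, f_star, eps0):
    """Stop when the evaluation budget is exhausted or |f(x) - f(x*)| < eps0."""
    return fes >= max_fes or abs(f_best - f_star) < eps0
```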

Comparison Studies
This section reports on two sets of experiments. In Experiment 1, DEELS is compared with jDE, while in Experiment 2, DEELS is compared with SDENS and DASA. For the comparison with SDENS and DASA, the best, median, mean, and standard deviation values of these algorithms are taken from [17]. All experiments are conducted in MATLAB.

Experiment 1.
In our preliminary work [14], DEELS was compared only with JADE, its underlying global optimizer. Here, we compare DEELS with another state-of-the-art algorithm, jDE [15], a self-adaptive DE variant, on 30-dimensional problems.

Test Instances for Experiment 1.
To study the performance of DEELS, we use the CEC2005 test suite (see Table 1). This suite was specifically designed for single-objective unconstrained continuous optimization in relatively low dimensions, for example, 30 and 50, which is why we selected these instances for our experimental study. More details about these instances can be found in [48]. The instances of CEC2005 can be divided into the following groups:
(i) Unimodal test instances ($F_1$-$F_5$).
(ii) Multimodal test instances ($F_6$-$F_{14}$).
(iii) Hybrid composition test instance ($F_{15}$). The 15th test instance, $F_{15}$, is designed by combining ten different benchmark functions: two Rastrigin's functions, two Weierstrass's functions, two Griewank's functions, two Ackley's functions, and two Sphere functions. Its value to reach is 120.

Parameter Settings for Experiment 1.
The population size $NP$ is set to 75, because $NP$ should be between $2D$ and $4D$, as suggested in [49]. Here, the problem dimension $D$ is set to 30 for all the test instances in both jDE and DEELS. The means of the control parameters $F$ and $CR$ are initially set to 0.5, since this initial setting works well for all the test instances [11]; afterwards, the parameter values used in JADE are adopted. The number of elite solutions that undergo LS, $k$, is set to 3. The intensity of LS for concentration, $n_c$, is set to 1, and the number of iterations of LS for refining the solution, $n_r$, is set to 3. The interval between LS calls, $G_{LS}$, is set to 300 generations, which is equivalent to $300 \times NP$ function evaluations.
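The settings above can be collected as a Python dictionary with a quick check of the arithmetic they imply: 75 lies between $2D = 60$ and $4D = 120$, and an LS interval of 300 generations corresponds to $300 \times 75 = 22{,}500$ function evaluations, assuming $NP$ evaluations per JADE generation. The key names are illustrative.

```python
settings = {
    "D": 30,               # problem dimension
    "NP": 75,              # population size
    "mu_F0": 0.5,          # initial mean of F
    "mu_CR0": 0.5,         # initial mean of CR
    "k": 3,                # elite solutions sent to LS
    "n_c": 1,              # BFGS iterations for concentration
    "n_r": 3,              # BFGS iterations for refinement
    "interval_gens": 300,  # generations between LS calls
    "max_fes": 3 * 10**5,  # evaluation budget for CEC2005, D = 30
}

assert 2 * settings["D"] <= settings["NP"] <= 4 * settings["D"]
interval_fes = settings["interval_gens"] * settings["NP"]
print(interval_fes)  # 22500
```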

Evaluation Metrics.
Thirty independent runs were conducted for DEELS and jDE. The function error $|f(\mathbf{x}) - f(\mathbf{x}^*)|$ of the best solution $\mathbf{x}$ is recorded for each run, and its mean and standard deviation over the runs are reported. We also record the success rate (SR) [31] for each test instance. A run is considered successful if it achieves the desired accuracy within the maximum allowed number of function evaluations. The SR for a particular function is calculated as follows:
$$SR = \frac{\text{number of successful runs}}{\text{total number of runs}} \times 100\%.$$
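A sketch of these statistics, assuming each run reports the final error of its best solution and that success means this error is below the required accuracy `eps0`; the use of the sample standard deviation ($n - 1$ denominator) is an assumption. With 30 runs, 26 successes correspond to an SR of 86.7% and a single success to 3.3%. The error values in the usage example are hypothetical.

```python
import statistics

def summarize_runs(final_errors, eps0):
    """Return (mean error, sample std of error, SR in %) over independent runs."""
    successes = sum(1 for e in final_errors if e < eps0)
    sr = 100.0 * successes / len(final_errors)
    return statistics.mean(final_errors), statistics.stdev(final_errors), sr

errors = [1e-9] * 26 + [0.5] * 4  # 30 hypothetical runs
mean_err, std_err, sr = summarize_runs(errors, eps0=1e-8)
print(round(sr, 1))  # 86.7
```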

Comparison of DEELS with jDE.
The experimental results (function error values and SR) of jDE and DEELS are presented in Table 2. The convergence graphs of both algorithms are obtained by plotting the objective function values against the number of function evaluations. DEELS outperforms jDE on 10 of the 15 test instances, while on the remaining 5 the performance of both algorithms is comparable. In the following, we comment on the behavior of DEELS in each category of test instances.

Unimodal Test Instances ($F_1$-$F_5$).
As can be observed from Table 2, DEELS performed better on three of the five test instances, $F_3$-$F_5$, in terms of function error values. On the remaining two, $F_1$ and $F_2$, the two algorithms are comparable.
In terms of SR, DEELS again performed better on two unimodal test instances, $F_4$ and $F_5$, whereas jDE showed a higher SR only on $F_2$. Overall, DEELS is better than jDE on the unimodal test instances, as can be observed in the last column of Table 2.

Multimodal Test Instances ($F_6$-$F_{14}$).
On the multimodal test instances, DEELS performed very well on six, $F_7$ and $F_{10}$-$F_{14}$, in achieving good solutions (see Table 2). The convergence graphs in Figure 2 for these multimodal test instances also show that the yellow curve (jDE) lies above the green curve (DEELS), that is, the solutions obtained by DEELS have smaller objective values than those obtained by jDE. This indicates that DEELS outperforms jDE on these instances. Both algorithms showed equal performance on the remaining multimodal test instances ($F_6$, $F_8$, and $F_9$), as indicated by the ≈ symbol in Table 2.
For $F_9$, both algorithms attained a 100% SR, as given in Table 2. On the remaining multimodal test instances, neither algorithm reached the desired accuracy in any run, except on $F_6$, on which DEELS obtained an SR of 86.7% versus 0% for jDE. Thus, here again, DEELS is better than jDE on the multimodal test instances.

A Hybrid Composition Test Instance ($F_{15}$).
This test instance, being a combination of other test functions, is challenging; hence, finding its global optimum or attaining a 100% SR is not an easy task. DEELS obtains a better local optimum for it than jDE (see the convergence graphs in Figure 2). DEELS also obtained a 3.3% SR on this test instance, against 0% for jDE. This good performance of DEELS may be due to the fact that DEELS benefits from both global search and LS, whereas jDE is purely a global search method and may therefore be less effective at exploiting good solutions.
In general, it is interesting to note that jDE, though equivalent to DEELS on five of the 15 test instances, did not obtain a better function error value than DEELS on any test instance.

Experiment 2.
In this section, we compare DEELS first with SDENS [50] and then with DASA [17] on the CEC2010 test instances with problem dimension 1000.

Test Instances for Experiment 2.
We further investigate the behavior of DEELS on ten newer and more complex test instances with problem dimension $D = 1000$, used in the CEC2010 Special Session and Competition on Large-Scale Global Optimization [51]. The test instances used in our experiments are the first ten test instances of CEC2010, which can be divided into two categories as follows:
(i) Unimodal test instances ($F_1$, $F_4$, $F_7$, and $F_9$).
(ii) Multimodal test instances ($F_2$, $F_3$, $F_5$, $F_6$, $F_8$, and $F_{10}$).

Parameter Setting for Experiment 2.
The parameter settings follow those required by the CEC2010 competition for these instances [51]. For this experiment, the population size $NP$ is set to 50 and the problem dimension $D$ to 1000. The maximum number of function evaluations is $3 \times 10^{6}$, and the value to reach is set to $10^{-2}$. Twenty-five independent runs of DEELS were performed on each test instance.

Comparison with SDENS.
The best, median, mean, and standard deviation of function error values obtained in 25 runs of DEELS are presented in Table 3.
As can be seen from Table 3, DEELS performed better than SDENS in terms of the best solution on seven of the ten test instances: $F_2$, $F_4$, $F_5$, and $F_7$-$F_{10}$. This better performance is likely due to the additional exploitation ability of DEELS. On the remaining three test instances, $F_1$, $F_3$, and $F_6$, SDENS obtained better best solutions than DEELS. $F_1$ and $F_3$ are separable functions, while $F_6$ is a single-group nonseparable multimodal function. The mean value obtained by DEELS on $F_6$ in Table 3 is substantially larger than that of SDENS. Therefore, it is reasonable to suspect that the failure of DEELS is due to BFGS, which may get trapped at a local optimum.
It is also interesting to note from Table 3 that DEELS found consistently better median and mean error values than SDENS on the same seven test instances, $F_2$, $F_4$, $F_5$, and $F_7$-$F_{10}$. However, on the remaining three test instances, $F_1$, $F_3$, and $F_6$, SDENS maintained its dominance over DEELS. This poor performance of DEELS might be due to one of the reasons mentioned above.
In terms of standard deviation, the two algorithms are evenly matched (Table 3): DEELS has smaller standard deviations on five test instances, $F_4$, $F_5$, and $F_7$ to $F_9$, while SDENS does better on the other five, $F_1$ to $F_3$, $F_6$, and $F_{10}$.
Overall, DEELS performed better than SDENS in terms of best, median, and mean values, whereas in terms of standard deviation the two algorithms are evenly matched, each being better on five test instances.

Comparison with DASA.
Table 3 also presents the best, median, mean, and standard deviation values for DASA, which are taken from [17]. This table shows that DEELS is superior to DASA on four test instances, $F_4$, $F_6$, $F_8$, and $F_{10}$, in terms of best function error values. These test instances are single-group nonseparable, except $F_{10}$, which is $D/(2m)$-group nonseparable ($m$ being the CEC2010 group size); all are multimodal except $F_4$. On the remaining six test instances, DEELS performed poorly against DASA in terms of best function error values; note that, among these six functions, $F_1$, $F_2$, and $F_3$ are separable. Table 3 also shows that DEELS outperforms DASA in terms of median and mean function error values on five test instances, $F_4$, $F_5$, $F_6$, $F_8$, and $F_{10}$, while it is inferior to DASA on the other five, $F_1$ to $F_3$, $F_7$, and $F_9$; here, $F_7$ is single-group nonseparable and $F_9$ is $D/(2m)$-group nonseparable. In terms of standard deviation, DEELS is superior to DASA on only three test instances, $F_4$, $F_5$, and $F_8$, while DASA surpasses DEELS on the other seven, $F_1$ to $F_3$, $F_6$, $F_7$, $F_9$, and $F_{10}$.
Thus, one can conclude that, based on the median and mean function error values, DASA and DEELS perform similarly on the nonseparable test functions, whereas DASA is better in terms of standard deviation and best values. DEELS failed completely on the separable test functions, $F_1$ to $F_3$, and also remained poor on two nonseparable test functions, $F_7$ and $F_9$.

Conclusion
In this paper, we described DEELS, a new hybrid algorithm that combines two well-known algorithms, JADE and BFGS, to keep a balance between exploration and exploitation. DEELS performed efficiently on the majority of the test instances against jDE and SDENS, but not against DASA. Based on the experimental results, it can be concluded that an LS method can improve the local tuning of solutions provided that it is hybridized with the global optimizer at a proper interval; otherwise, it can cause early termination of the algorithm and result in premature convergence. It is also observed that DEELS fails on separable functions.

Figure 1: Concentration and refinement of solutions in a population.

Table 2: Experimental results of jDE and DEELS on 15 test instances of 30 variables with $3 \times 10^{5}$ FES: mean and standard deviation of the function error values obtained in 30 independent runs.

Table 3: Experimental results of SDENS, DASA, and DEELS on 10 test instances of 1000 variables with $3 \times 10^{6}$ FES: best, median, mean, and standard deviation of the function error values obtained in 25 independent runs. "−", "+", and "≈" denote that the performance of SDENS or DASA is worse than, better than, or similar to that of DEELS, respectively.