Memetic Differential Evolution with an Improved Contraction Criterion

Memetic algorithms with an appropriate trade-off between the exploration and exploitation can obtain very good results in continuous optimization. In this paper, we present an improved memetic differential evolution algorithm for solving global optimization problems. The proposed approach, called memetic DE (MDE), hybridizes differential evolution (DE) with a local search (LS) operator and periodic reinitialization to balance the exploration and exploitation. A new contraction criterion, which is based on the improved maximum distance in objective space, is proposed to decide when the local search starts. The proposed algorithm is compared with six well-known evolutionary algorithms on twenty-one benchmark functions, and the experimental results are analyzed with two kinds of nonparametric statistical tests. Moreover, sensitivity analyses for parameters in MDE are also made. Experimental results have demonstrated the competitive performance of the proposed method with respect to the six compared algorithms.


Introduction
In 1989, the name of "memetic algorithms" (MAs) [1] was introduced for the first time. In the last two decades, MAs gradually became one of the recent growing areas of research in evolutionary computation. They combine various evolutionary algorithms (EAs) with different LS methods to balance exploration and exploitation. Existing examples of memetic algorithms are NM-BRO [2], MA-LSCh-CMA [3], LBBO [4], IMMA [5], and MPSO [6]. In the framework of MAs, LS operators are used to execute further exploitation for the individuals generated by common EA operations, which is helpful to enhance the EA's capacity of solving complicated problems.
Differential evolution was first proposed by Storn and Price [7] in 1995 to solve global numerical optimization problems over continuous search spaces. It shares some similarities with other EAs. For example, DE works with a population of solutions, called vectors; it uses recombination and mutation operators to generate new vectors and, finally, it has a replacement process to discard the less fit vectors. DE represents solutions with real coding. Some of the differences with respect to other EAs are as follows: DE uses a special mutation operator based on the linear combination of three individuals and uses a uniform crossover operator. It has several attractive features. DE is relatively simple to implement and was demonstrated to be very effective on a large number of cases. In the past few decades, DE has been successfully used in many real-world applications, such as space trajectory design [8][9][10], hydrothermal optimization [11], underwater glider path planning [12], and vehicle routing problem [13].
Despite its successful applications to different classes of problems in different fields, DE was demonstrated to converge to a fixed point, a level set [10], or a hyperplane not containing the global optimum [14]. Furthermore, in some cases it was shown to have slow local convergence.
In order to overcome these shortcomings, some authors have proposed a hybridization of DE with some local search heuristics. dos Santos Coelho and Mariani [15] proposed a version of memetic DE which combines DE with the generator of chaos sequences and sequential quadratic programming technique (DEC-SQP). In this memetic algorithm, DE with chaos sequences is the global optimizer and SQP is applied to the best individual to find the local minimum. Noman and Iba [16] proposed an adaptive hill-climbing crossover-based local search operation for enhancing the performance of standard differential evolution (DEahcSPX). Muelas et al. [17] developed MDE-DC which combines DE with multiple trajectory search algorithm (MTS). Neri and Tirronen [18] proposed the scale factor local search differential evolution (SFLSDE). In SFLSDE, golden section search and unidimensional hill-climb local search are applied to detect an optimal value of the scale factor and generate a higher quality offspring. Wang et al. [19] proposed an adaptive MA framework called DE-LS. In DE-LS, selfadaptive differential evolution (SaDE) [20] is the global search method, while covariance matrix adaptation evolution strategy (CMA-ES) [21] and self-adaptive mixed distribution based univariate EDA (MUEDA) [22] are employed as the local search methods. Vasile et al. [10] proposed an inflationary differential evolution algorithm (IDEA), which hybridizes DE with the restarting procedure of Monotonic Basin Hopping (MBH), to solve space trajectory optimization problems. Minisci and Vasile [9] and Di Carlo et al. [8] proposed an adaptive version of inflationary differential evolution algorithm (AIDEA) and a multipopulation version of AIDEA (MP-AIDEA) which automatically adapt the values of four control parameters. Locatelli et al. [23] proposed a memetic differential evolution for disk-packing and spherepacking problems. In this algorithm, two kinds of local searches (MINOS and SNOPT) are used to detect local minima. Asafuddoula et al. [24] proposed an adaptive hybrid DE algorithm (AH-DEa) which has three features. The first is its use of adaptive crossover rates from a given set of discrete values. The second is an adaptive crossover strategy at different stages of the evolution. The last is the inclusion of a local search strategy to further improve the best solution. Qin et al. [25] proposed an advanced SaDE, which incorporates two different local search chains (Lamarckian and Baldwinian) into SaDE to enhance exploitation capability. Trivedi et al. [26] hybridized DE and GA to solve the unit commitment scheduling problems, in which GA was used to handle the binary unit commitment variables while DE was employed to optimize the continuous power dispatch related variables. In the same year, Li et al. [27] proposed a new hybridization, named DEEP, based on DE framework and the key features of CMA-ES, which generates a trial vector by first using a DE/rand/1/bin strategy followed by an Evolution Path (EP) mutation of CMA-ES.
The focus of this paper is to optimally combine DE global search operators with Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm to improve local search in continuous optimization. A new contraction criterion, which is based on the maximum distances in objective space and decision space, is proposed. When the contraction criterion is satisfied, BFGS starts from the best solution at the current generation. Furthermore, a restart mechanism is employed. If the best solution is not improved during the course of the local search, the population is reinitialized to increase the chance to find the global optimum.
The paper is organized as follows: DE is briefly introduced in Section 2. The proposed DE algorithm with local search and reinitialization is presented in Section 3. The design of the experiments, the results, and the corresponding discussions are included in Section 4. The last section, Section 5, is devoted to conclusions and the future work.

A Short Introduction to Differential Evolution
DE is a population-based stochastic parallel optimization method. Each vector (or individual) of the population at generation is called the target vector, and it will generate one offspring called the trial vector. For example, the th vector of the population will generate one trial vector . Trial vectors are generated by adding weighted difference vectors to the target vector. This process is referred to as the mutation operator where the target vector is mutated. A crossover step is then applied to produce an offspring which is only accepted if it improves on the fitness of the parent individual. Many variants of standard DE have been proposed, which use different learning strategies and/or recombination operations in the reproduction stage. A general DE variant may be recorded as DE/a/b/c, where "a" denotes the mutation strategy, "b" specifies the number of difference vectors used, and "c" specifies the crossover scheme which may be binomial or exponential. The DE/rand/1/exp is described in Algorithm 1.

Proposed Algorithm
In this section, we describe four major operations of the proposed MDE algorithm in detail, including contraction criterion, BFGS search, reinitialization scheme, and boundary constraint handling. The detailed description of MDE is given in Algorithm 2.

Contraction Criterion.
In order to design an effective and efficient hybrid algorithm for global optimization, we need to take advantage of both the exploration capabilities of EA and the exploitation capabilities of LS and combine them in a wellbalanced manner. To incorporate BFGS into DE successfully, a triggering condition, called contraction criterion, is needed to decide when the local search has to start. There are several kinds of methods to define a contraction criterion. Qin and Suganthan [20] applies local search method after a fixed number of generations (every 200 generations). Sun et al. [5] starts the LS if the promising solution is not updated in tconsecutive generations. Simon et al. [4] use the minimum fitness in the objective space as the contraction criterion; Di Carlo et al. [8][9][10] perform LS when the maximum distance in decision space is below a given threshold.
In MDE, we propose a new contraction criterion which combines two criteria: (a) 1 is the improved maximum distance in objective space and (b) 2 is the maximum distance in decision space. The idea of 1 is derived from [28] where 1 is a measure of the diversity of the population in objective space.
Computational Intelligence and Neuroscience

BFGS Search.
In MDE, the local search utilizes the better solutions obtained by the global search to update the population of MDE and thus enhances MDE's exploitation ability to find the best solution. In MDE, we use the BFGS algorithm as the local search method. BFGS is one of the quasi-Newton methods which do not need the precise Hessian matrix and is able to approximate it based on the individual successive gradients. BFGS is considered as the most effective and popular quasi-Newton method and has been proven to have good performance even for nonsmooth optimizations. The details can be found in [29].

Reinitialization Scheme.
If the best solution has not been improved after local search, a reinitialization of the whole population is used to give the algorithms more opportunities to find the global optimum. Simon et al. [4] proposed a partial reinitialization of the population. Every 20 generations, the algorithm selects the best individuals from a temporary population of 2 + 2 individuals as the reinitialization pool. Sun et al. [5] chose the individuals, which have the largest distances from the local optima, from a temporary population to form the next population. Zamuda et al. [30] proposed a population size reduction method as the reinitialization scheme. In MDE, we apply a simple reinitialization scheme described in Algorithm 3. If the result of the local search does not improve the best individual in the population, a reinitialization of the population is triggered. A counter keeps track of the number of restarts. For < max , where max is user-defined, individuals are generated randomly in the search space, drawing samples from a uniform distribution. For ≥ max , 2 /3 individuals in the population are initialized randomly in the search space, while the rest are initialized by a normal distribution which takes the best individual as the center and ( − )/50 as the standard deviation. Algorithm 3 summarises the reinitialization procedure. The function randreal draws samples from a uniform distribution while function Gaussian draws samples from a normal distribution and [ , ] are the lower and upper boundaries on .

Boundary Constraint
Handling. After mutation and crossover, each generated trial vector undergoes boundary constraint check. If some variables of are out of the boundary, a repair method is applied as follows: where randreal ( , ) can generate a random real number from [ , ].

Experimental Results
In order to verify the performance of MDE, we select the 21 nonnoisy benchmark functions from CEC2005 special 4 Computational Intelligence and Neuroscience (1) Generate the initial population , define as the th vector in , best is the best vector in the current generation, min is the global best vector in the generation. is the population size, NFEs is the number of function evaluations in each run, Max_GEN is the maximum generation, Max_NFEs is the number of max function evaluation, is the number of decision variable, is the mutation factor, CR is crossover rate, 1 and 2 are the contraction criterion, Flag is the restart mark. (2) NFEs = 0 (3) Evaluate the fitness ( ) for the each individual in . NFEs = NFEs + . (4) Flag = 0, count = 0 (5) while ≤ Max_GEN and NFES ≤ Max_NFEs do (6) Global search using DE (7) for = 1 to do if NFES < 0.2 × Max_NFEs then (12) Using DE/rand/1/bin to generate (13) else (14) Using DE/rand/1/exp to generate (15) end if (16) Evaluate the trial vector . NFEs = NFEs + 1 (17) if is better than then end if (20) end for (21) if ( best ) < ( min ) then (22) Replace min with best . (23) end if (24) Local search using BFGS (25) Calculate the contraction criterion as described in Section 3.1 (26) if ( 1 < 1,max or 2 < 2,max ) then (27) Pick up the best as the initial point of the local search. (28) Apply BFGS ( best ) to find the resultant new local optimum local as described in Section 3.2. (29) if ( local ) < ( best ) then (30) Replace best with local .
Computational Intelligence and Neuroscience Algorithm 3: Pseudocode of reinitialization scheme.

Experimental Setup.
For each algorithm on each benchmark problem, we conduct 25 independent runs and limit each run to 10000 * max function evaluations, where is the problem dimension ( = 10, = 30, and = 50). The performance of the algorithms is evaluated in terms of function error value [31], defined as ( ) − ( * ), where * is the global optimum of the test function. The mean and the standard deviation of the function error values are recorded. The parameters of MDE are set as = 30, 1,max = 2.0, 2,max = 2.0, max = 3, ∈ (0.8, 0.1), and ∈ (0.5, 0.1); the mutation and crossover strategies are the same as those in [24]. For the other six algorithms, we use the same parameter settings in their original papers.

Performance Criteria.
To effectively analyze the results, two nonparametric statistical tests, as similarly done in [35,36], are used in the experiments. (i) Wilcoxon's signed-rank test at = 0.05 is performed to test the statistical significance of the experimental results between two algorithms on both single-problem and multiproblem. (ii) Friedman's test is employed to obtain the average rankings of all the compared algorithms. Wilcoxon's signed-rank test on single-problem is calculated by Matlab, while Wilcoxon's signed-rank test on multiproblem and Friedman test are calculated by the software of KEEL [37].  which means that in those cases MDE is significantly better than CLPSO, GL-25, and CMA-ES. In addition, Friedman's test is employed to evaluate the significant differences of all the compared algorithms. As shown in Figure 1, MDE gets the second average ranking value, while L-SHADE gets the first average ranking values on the 10-dimensional problems. Table 2 shows that MDE performs significantly better than the other six compared algorithms in the majority of the test functions. For example, MDE wins in 12 cases, only loses in 3 cases, and ties in 6 cases, compared with SFLSDE. We can also find that MDE obtains much better solutions than LBBO and CLPSO. As for L-SHADE, MDE wins in 8 cases and loses in 10 cases. According to the results of the multiple-problem statistical analysis shown in Table 5 Figure 2 shows that MDE performs the first average ranking value and 6 Computational Intelligence and Neuroscience  Computational Intelligence and Neuroscience 7  L-SHADE obtains the second average ranking values on the 30-dimensional problems by Friedman's test. Table 3 shows that MDE also performs significantly better than five compared algorithms in the majority of the test functions except L-SHADE. For example, MDE wins in 15 cases, only loses in 3 cases, and ties in 3 cases, compared with LBBO. When comparing with L-SHADE, MDE wins in 8 cases and loses in 12 cases. Table 6 shows that MDE can perform higher + values than − values in five cases. According to Wilcoxon's test at = 0.05 and = 0.1, there are significant differences in two cases (MDE versus GL-25 and MDE versus LBBO), which means that in those cases MDE is significantly better than GL-25 and LBBO. Figure 3 shows that L-SHADE obtains better average ranking values than the other six algorithms on 50-dimensional problems by Friedman's test.

Comparison between the Other Six Algorithms and MDE.
In general, according to the analysis above, we can conclude that MDE and L-SHADE have better average rankings among the seven algorithms on 21 benchmark problems for all of three different dimensions. The performance of MDE is comparable to L-SHADE on 10-and 30-dimensional problems, while L-SHADE is better than MDE on 50-dimensional problems because of the larger initial population and linear population size reduction.

Influence of Contraction Criterion.
In the previous experiments the recommended initial 1,max = 2.0 and 2,max = 2.0 are used. In order to test the influence of different initial contraction criterion values to the enhanced performance of MDE, in this section, MDE is tested with different initial  Table 7.
From Table 7, MDE owes the best average ranking value at 2.0,2.0 than the other 8 groups on both 10-dimensional and 30-dimensional test functions. On 50-dimensional test functions, 1.0,1.0 is the better choice to MDE. In general, we can conclude that it is better to set smaller contraction criterion values as the dimension of test functions increases.

Influence of Parameter max .
The experiment is to test the influence of parameter max in MDE. Friedman's test results are shown in Table 8, where the values of max are set as max = {3, 5, 7, 9, 11, 13, 15} in Table 8. All other parameters are kept unchanged as described in Section 4.1. In addition, all experiments are conducted for 25 independent runs for each function.
It can be seen from Table 8 that MDE with max = 3.0 gets the better average ranking value than the other six cases at = 10. At = 30, max = 5.0 is the best choice and max = 3.0 is the second best choice to MDE. On 50-dimensional test functions, max = 3.0 is the better choice parameter to MDE. Generally speaking, the small max value such as 3 or 5 is good to enhance the performance of the MDE algorithm.  Table 9, where the values of are set as {30, 60, 90, 120, 150}. All other parameters are kept unchanged as described in Section 4.1.
From Table 9, MDE with = 30 ranks the first both at = 10 and = 30, while MDE with = 90 performs the best at = 50. From the results, it can be concluded that, as the dimension increases, a properly increased population size can enhance the search capability of MDE.

Conclusion
A memetic differential evolution, MDE, has been introduced in this paper. MDE uses a new contraction criterion to decide when the local search starts. In addition, MDE includes the global and local search operators, along with reinitialization, to improve performance.
To evaluate the performance of MDE, 21 benchmark functions with different characteristics are chosen for test. The results show that (i) MDE can obtain better or at least comparable results, compared with the other six algorithms; (ii) small contraction criterion values and max can enhance the performance of MDE in terms of the quality of the final results; (iii) a large population size is good to MDE as the dimension increases.
In this paper, some preliminary experiments have been performed to verify its effect on the performance of MDE. In our future work, MDE will be tested on some real-world applications problems. Moreover, we believe that some other local search algorithms and adaptive population size strategy can also be used in MDE.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.  / "−," "+," and "=" denote that the performance of this algorithm is, respectively, worse than, better than, and similar to MDE according to the Wilcoxon signed-rank test at = 0.05.

10
Computational Intelligence and Neuroscience