An Enhanced Differential Evolution Algorithm Based on Multiple Mutation Strategies

Differential evolution (DE) is a simple yet efficient metaheuristic for global optimization over continuous spaces. However, standard DE suffers from premature convergence, especially in DE/best/1/bin. In order to take advantage of the direction guidance information of the best individual in DE/best/1/bin while avoiding local traps, an enhanced differential evolution algorithm based on multiple mutation strategies, named EDE, is proposed in this paper. The EDE algorithm integrates an opposition-based learning initialization technique for improving the initial solution quality, a new combined mutation strategy composed of DE/current/1/bin and DE/pbest/1/bin for accelerating standard DE while preventing the population from clustering around the global best individual, and a perturbation scheme for further avoiding premature convergence. In addition, we introduce two linear time-varying functions, which decide which solution search equation is chosen at the mutation and perturbation phases, respectively. Experimental results on twenty-five benchmark functions show that EDE is far better than standard DE. In further comparisons with five other state-of-the-art approaches, the results show that EDE is superior to or at least comparable with these methods on most benchmark functions.


Introduction
Optimization problems are ubiquitous in various areas, including production, daily life, and the scientific community. These optimization problems are usually nonlinear and nondifferentiable. In particular, the number of their local optima may increase exponentially with the problem size. Thus, evolutionary algorithms (EAs), which only need the values of the objective function, have many advantages and have drawn increasing attention from researchers all over the world. As a result, a great number of evolutionary algorithms have been developed, such as genetic algorithms (GAs), particle swarm optimization (PSO), ant colony optimization (ACO), and the differential evolution (DE) algorithm. Among them, differential evolution is one of the most powerful stochastic real-parameter optimization algorithms [1]. It was originally developed by Storn and Price [2,3] in 1995.
According to the aforementioned statements, it can be seen that DE has been very successful in solving various optimization problems. As far as the type of optimization problem is concerned, most research focuses on continuous function optimization. However, the convergence precision and convergence speed on function optimization still need to be improved; that is, the exploration ability and exploitation ability of DE cannot be well balanced. To overcome this imbalance, more and more researchers have developed a large number of DE variants. For example, Noman and Iba [11] proposed an accelerated differential evolution incorporating an adaptive local search technique. Rahnamayan et al. [13] proposed an opposition-based differential evolution (ODE for short), in which a novel opposition-based learning (OBL) technique and a generation-jumping scheme are employed.
Qin et al. [14] proposed a self-adaptive differential evolution algorithm, called SaDE, in which both the trial vector generation strategies and their associated parameter values are dynamically self-adapted during the process of producing promising solutions. Zhang and Sanderson [15] proposed a novel differential evolution referred to as JADE, in which a novel self-adaptive parameter scheme and a new mutation strategy with an optional archive are proposed. These improvements enable JADE to achieve a very fast convergence speed and high-quality solutions. Subsequently, Gong et al. [22,23] proposed several enhanced DE versions based on JADE by introducing adaptive strategy selection schemes or control parameter adaptation mechanisms. In summary, all these state-of-the-art DE variants have achieved better convergence performance than traditional DE.
Unfortunately, no single DE variant has so far achieved the best solution for all optimization problems, because exploration and exploitation often contradict each other in practice. Hence, searching for better approaches is still necessary. In order to solve continuous optimization problems more efficiently, an enhanced differential evolution algorithm based on multiple mutation strategies, called EDE for short, is presented in this paper.
The rest of the paper is organized as follows. The standard differential evolution algorithm is briefly described in Section 2. In Section 3, the enhanced differential evolution algorithm is presented in detail. Subsequently, Section 4 employs a set of benchmark functions to comprehensively investigate the performance of the proposed algorithm through experimental results and comparisons with other well-known evolutionary algorithms. Finally, conclusions and directions for further study are given in Section 5.

Differential Evolution Algorithm
Differential evolution algorithm was first proposed by Storn and Price [2,3]. Like other evolutionary algorithms, an initialization phase is its first task. In addition, it consists of three major operations: mutation, crossover, and selection. Meanwhile, several mutation strategies were proposed in the work [3]. In order to distinguish the different DE versions with various mutation strategies or different crossover schemes, the well-known notation DE/x/y/z was introduced in the literature [3], where x represents the vector to be mutated, y is the number of difference vectors used, and z denotes the crossover scheme employed. DE/rand/1/bin has been applied most commonly and is usually considered the canonical DE version. To be specific, the canonical DE version can be described as follows.

Initialization.
At the first step, a population of NP individuals is generated randomly by the following form:

x_{i,j} = x_j^min + rand(0, 1) ⋅ (x_j^max − x_j^min),

where i = 1, 2, . . . , NP and j = 1, 2, . . . , D; x_j^min and x_j^max are the lower and upper bounds of the jth parameter, respectively. Then, the cost function of each solution is evaluated.
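The initialization formula above can be sketched as a short NumPy routine (the function name and parameter names are illustrative, not from the original paper):

```python
import numpy as np

def random_init(NP, lo, hi, rng=None):
    """Uniform random initialization: x_{i,j} = lo_j + rand(0,1) * (hi_j - lo_j)."""
    rng = rng or np.random.default_rng()
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    # one uniform sample per individual and per dimension
    return lo + rng.random((NP, len(lo))) * (hi - lo)
```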

Mutation. Mutation strategy is very important in DE.
At this step, a mutant vector v_i is generated by the following formula for each D-dimensional target vector x_i:

v_i = x_{r1} + F ⋅ (x_{r2} − x_{r3}),

where i = 1, 2, . . . , NP and r1, r2, r3 ∈ {1, 2, . . . , NP} are mutually different random integers such that r1 ≠ r2 ≠ r3 ≠ i. The mutation scale factor F is a real and constant factor, F ∈ [0, 2], which controls the amplification of the differential variation (x_{r2} − x_{r3}) [3].
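A minimal sketch of the DE/rand/1 mutation (helper name and defaults are illustrative):

```python
import numpy as np

def de_rand_1_mutation(pop, i, F=0.5, rng=None):
    """DE/rand/1: v_i = x_r1 + F * (x_r2 - x_r3),
    with r1, r2, r3 mutually different and different from the target index i."""
    rng = rng or np.random.default_rng()
    NP = len(pop)
    # pick three distinct indices, all different from i
    candidates = [r for r in range(NP) if r != i]
    r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])
```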

Crossover.
In order to exchange information between a mutant vector v_i and the current target vector x_i, a crossover operation is introduced. At this time, a trial vector u_i = (u_{i,1}, u_{i,2}, . . . , u_{i,D}) is produced by the following form:

u_{i,j} = v_{i,j} if rand[0, 1] ≤ Cr or j = j_rand; otherwise u_{i,j} = x_{i,j},

where j = 1, 2, . . . , D, rand[0, 1] is a random real number within [0, 1], and j_rand ∈ {1, 2, . . . , D} is a randomly chosen index, which ensures that the trial vector obtains at least one parameter from the mutant vector v_i. The crossover rate Cr is a predefined constant within the range [0, 1] and it controls the fraction of parameter values copied from the mutant vector.
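The binomial crossover rule can be sketched as follows (function name is illustrative):

```python
import numpy as np

def binomial_crossover(target, mutant, Cr=0.9, rng=None):
    """Binomial crossover: take the mutant component when rand <= Cr,
    plus one forced index j_rand; otherwise keep the target component."""
    rng = rng or np.random.default_rng()
    D = len(target)
    j_rand = rng.integers(D)          # guarantees at least one mutant gene
    mask = rng.random(D) <= Cr
    mask[j_rand] = True
    return np.where(mask, mutant, target)
```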

Selection.
After the crossover operation, the trial vector u_i is compared to the target vector x_i through a greedy selection mechanism. The winner is retained and becomes a member of the next generation. For a minimization problem, the selection process can be described by the following equation:

x_i^(G+1) = u_i if f(u_i) ≤ f(x_i); otherwise x_i^(G+1) = x_i,

where f(⋅) denotes the objective value of a solution and x_i^(G+1) is the offspring corresponding to the target vector x_i. In a word, except for the initialization phase, the aforementioned steps are repeated in turn until a stopping criterion is reached.
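Putting the four steps together, the canonical DE/rand/1/bin loop can be sketched as below. All names and the simple clamp-to-bounds repair are assumptions of this sketch, not the paper's exact implementation:

```python
import numpy as np

def de_rand_1_bin(f, bounds, NP=30, F=0.5, Cr=0.9, max_gens=200, seed=0):
    """Minimal canonical DE/rand/1/bin for minimization."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    D = len(lo)
    pop = lo + rng.random((NP, D)) * (hi - lo)          # initialization
    fit = np.array([f(x) for x in pop])
    for _ in range(max_gens):
        for i in range(NP):
            r1, r2, r3 = rng.choice([r for r in range(NP) if r != i],
                                    size=3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])       # mutation
            j_rand = rng.integers(D)
            mask = rng.random(D) <= Cr
            mask[j_rand] = True
            u = np.where(mask, v, pop[i])               # binomial crossover
            u = np.clip(u, lo, hi)                      # simple bound repair
            fu = f(u)
            if fu <= fit[i]:                            # greedy selection
                pop[i], fit[i] = u, fu
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```

For example, running it on a 5-dimensional sphere function drives the objective value close to zero well within the evaluation budget.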

Initialization Based on Opposition-Based Learning.
Recently, Rahnamayan et al. [12,13] proposed a new scheme for generating random numbers, called opposition-based learning (OBL), which can effectively make use of random numbers and their opposites. Moreover, the ability of OBL to accelerate the optimization, search, or learning process in many soft computing techniques has been reported in the literature [12,13]. At first, a state-of-the-art algorithm, named ODE, was proposed by applying the OBL scheme to accelerate DE [13]. After that, the OBL scheme has been successfully used in other evolutionary algorithms such as the artificial bee colony algorithm [40], harmony search algorithm [41], particle swarm optimization [42,43], and teaching-learning-based algorithm [44]. A comprehensive survey of the OBL scheme can be found in [45].

(1) for i = 1 to NP do
(2)   for j = 1 to D do
(3)     x_{i,j} = x_j^min + rand(0, 1) ⋅ (x_j^max − x_j^min)
(4)     ox_{i,j} = x_j^min + x_j^max − x_{i,j} // opposition-based learning
(5)   end for
(6) end for

Algorithm 1: Initialization based on opposition-based learning.
In order to improve the solution quality of the initial population, the OBL scheme is employed to initialize the population of EDE in this work. The initialization process is described in Algorithm 1, in which two sets, X = {x_1, x_2, . . . , x_NP} and OX = {ox_1, ox_2, . . . , ox_NP}, are generated. The initial population consists of the top NP individuals chosen from the set X ∪ OX according to their fitness values.
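Algorithm 1 and the top-NP selection can be sketched as follows (names are illustrative; minimization is assumed):

```python
import numpy as np

def obl_initialization(f, lo, hi, NP, rng=None):
    """Opposition-based initialization: generate NP random points X,
    their opposites OX = lo + hi - X, and keep the NP fittest points
    from the union X ∪ OX."""
    rng = rng or np.random.default_rng()
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    X = lo + rng.random((NP, len(lo))) * (hi - lo)
    OX = lo + hi - X                    # opposition-based learning
    union = np.vstack([X, OX])
    fitness = np.array([f(x) for x in union])
    best = np.argsort(fitness)[:NP]     # minimization: lowest f first
    return union[best], fitness[best]
```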
In order to better take advantage of the guiding information of the best individual, a new version of DE/best/1/bin, DE/pbest/1/bin, proposed by Zhang and Sanderson [15], is further employed in this work to speed up the convergence of the proposed approach EDE. That is, a mutant vector v_i is produced as follows:

v_i = x_pbest + F ⋅ (x_{r1} − x_{r2}),

where pbest ∈ {1, 2, . . . , q} ⊆ {1, 2, . . . , NP} is a random index and q denotes the number of top individuals according to the fitness values of the population. It should be noted that p of DE/pbest/1/bin in JADE [15] is a proportional number within [0, 1].
More specifically, according to the first mutation strategy, DE/current/1/bin, newly generated mutant vectors are scattered around their respective target vectors, which not only keeps good population diversity but also avoids the over-randomness of the classic mutation strategy DE/rand/1/bin. According to the second mutation strategy, DE/pbest/1/bin, owing to the guidance of one of several better individuals (x_pbest) rather than the single best individual x_best, the mutation strategy can drive the population towards better individuals so as to enhance the convergence speed. In addition, it can also prevent EDE from congregating in the vicinity of the global best individual to some extent.
In the meantime, a time-varying probability parameter p1 is designed to control which of the two mutation strategies is executed at the mutation step. The parameter p1 can be described as follows:

p1 = p1,max − (p1,max − p1,min) ⋅ FEs / maxFEs, (7)

where p1,max and p1,min denote the maximum and minimum probability values, respectively, FEs is the current number of fitness function evaluations, and maxFEs represents the maximum number of fitness function evaluations. As a matter of fact, the probability parameter p1 plays an important role in balancing the exploration ability and the exploitation ability. That is, good population diversity is kept at the beginning of evolution and fast convergence is achieved at the end of the search.
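The combined mutation step can be sketched as follows. The linearly decreasing form of p1 and the choice that p1 selects the exploratory DE/current/1 branch are assumptions of this sketch, consistent with keeping diversity early and exploiting late:

```python
import numpy as np

def p1(FEs, max_FEs, p_max=1.0, p_min=0.1):
    """Linearly decreasing selection probability for the mutation phase."""
    return p_max - (p_max - p_min) * FEs / max_FEs

def combined_mutation(pop, fit, i, FEs, max_FEs, F=0.5, q=4, rng=None):
    """With probability p1 use DE/current/1 (exploration); otherwise use
    DE/pbest/1 guided by one of the q fittest individuals (exploitation)."""
    rng = rng or np.random.default_rng()
    NP = len(pop)
    r1, r2 = rng.choice([r for r in range(NP) if r != i],
                        size=2, replace=False)
    if rng.random() < p1(FEs, max_FEs):
        base = pop[i]                          # DE/current/1
    else:
        top = np.argsort(fit)[:q]              # indices of q best individuals
        base = pop[rng.choice(top)]            # DE/pbest/1
    return base + F * (pop[r1] - pop[r2])
```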

Perturbation.
After repeating all operations (mutation, crossover, and selection) of differential evolution, a perturbation scheme is conducted on the best individual in order to further trade off the searching ability of the aforementioned solution search equations. During this process, two perturbation equations are introduced and the best individual is perturbed dimension by dimension according to them, which are described by (8) and (9), respectively. One has

TX_j = x_{best,k} + (2 ⋅ rand(0, 1) − 1) ⋅ (x_{best,j} − x_{r,j}), (8)

where TX is a temporary copy of the best individual, best represents the index of the best individual in the current population, r ∈ {1, 2, . . . , NP} ∧ r ≠ best is a uniformly random index, and k ∈ {1, 2, . . . , D} is also a random number. One has

TX_j = x_{best,j} + (2 ⋅ rand(0, 1) − 1) ⋅ (x_{best,j} − x_{r,j}), (9)

where all the notations are the same as those in (8).
From (9), it can be observed that the perturbation operates on the current component of the best individual, and the differential variation (x_{best,j} − x_{r,j}) acts as the perturbation scale. Notice that the dimension k in (8) may be different from j, which helps to enrich the perturbation scales to some extent; that is, it may increase the probability of escaping from a local minimum. What is more, the first term x_{best,k} on the right-hand side of (8) is different from the term x_{best,j} of (9). The reason for introducing (8) is that information between different dimensions of the best individual can be shared. Thus, the EDE algorithm can escape from a local optimum with a larger probability.
Like the aforementioned tradeoff scheme, a probability parameter p2 is employed. The parameter p2 is linearly time-varying during the evolution process as follows:

p2 = p2,max − (p2,max − p2,min) ⋅ FEs / maxFEs,

where p2,max and p2,min denote the maximum and minimum probability values, respectively. The rest of the parameters are the same as those in (7). Concretely speaking, (8) is executed with probability p2, while (9) is executed with probability (1 − p2).
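The dimension-by-dimension perturbation of the best individual can be sketched as below. The linearly decreasing p2 and the helper names are assumptions of this sketch, following the reconstructed forms of (8) and (9):

```python
import numpy as np

def perturb_best(pop, best, FEs, max_FEs, p2_max=0.2, p2_min=0.0, rng=None):
    """Perturb the best individual dimension by dimension.
    With probability p2, Eq. (8) uses a randomly chosen dimension k of the
    best individual as the base term; otherwise Eq. (9) uses dimension j."""
    rng = rng or np.random.default_rng()
    NP, D = pop.shape
    p2 = p2_max - (p2_max - p2_min) * FEs / max_FEs
    TX = pop[best].copy()                 # temporary copy of the best
    for j in range(D):
        r = rng.choice([s for s in range(NP) if s != best])
        scale = (2.0 * rng.random() - 1.0) * (pop[best, j] - pop[r, j])
        if rng.random() < p2:             # Eq. (8): cross-dimension base term
            k = rng.integers(D)
            TX[j] = pop[best, k] + scale
        else:                             # Eq. (9): same-dimension base term
            TX[j] = pop[best, j] + scale
    return TX
```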

Boundary Constraints Handling Technique.
In order to keep solutions subject to boundary constraints, components of a solution that violate the predefined boundary constraints should be repaired. That is, if a parameter value produced by the solution search equations exceeds its predefined boundaries, the parameter should be reset to an acceptable value. The repair rule used in the literature [17] is employed in this work.

The Proposed Approach.

In order to effectively make use of the guidance information of the best individual, the mutation strategy DE/best/1/bin is considered. In order to prevent a large number of individuals from clustering around the global best individual, inspired by JADE [15], the mutation strategy DE/pbest/1/bin is actually used. In addition, another mutation strategy, DE/current/1/bin, is employed to further trade off the exploitation ability of DE/pbest/1/bin. At the same time, a selection probability p1 with a linear time-varying nature is introduced to decide which mutation strategy works at the mutation phase of DE. Subsequently, a perturbation scheme for the best individual is incorporated into the modified DE version. In short, the pseudocode of EDE is given in Algorithm 2 based on the above explanation.

Benchmark Functions and Parameter Settings.
To verify the optimization effectiveness of EDE, twenty-five benchmark functions with different characteristics taken from Yao et al. [46], Gong et al. [23], and Gao and Liu [40] are employed here.
These benchmark functions are listed briefly in Table 1, in which D designates the dimensionality of the test functions. All the functions are scalable, high-dimensional problems. Functions f01–f05, f14, and f15 are unimodal. Function f06, that is, the step function, has one minimum and is discontinuous. Function f07 is a quartic function with noise. Functions f08–f13 and f16–f19 are difficult multimodal functions whose number of local minima increases exponentially with the dimension of the test function. In addition, six shifted functions are chosen to evaluate the performance of EDE; namely, functions f20–f25 are shifted functions, and o = (o_1, o_2, . . . , o_D), representing a shifted vector, is generated randomly in the corresponding search range.
In our experimental study, all benchmark functions are tested in 30 dimensions and 100 dimensions. The corresponding maximum number of fitness function evaluations (max FEs) is 15 × 10^4 and 50 × 10^4, respectively. Moreover, the other specific parameters of DE and EDE are set as follows.
DE Settings. In canonical DE/rand/1/bin, the scale factor F is set to 0.5, the crossover rate Cr is set to 0.9, and the population size SN is 100. It should be noted that the values of these three parameters are the same as those of the state-of-the-art algorithm ODE [13].
EDE Settings. In our proposed algorithm, the scale factor F is set to 0.5, the crossover rate Cr is set to 0.9, and the population size SN is 20. A few other parameters are set as follows: p1,max = 1, p1,min = 0.1, p2,max = 0.2, p2,min = 0, and q = 4.
For the set of experiments on the 25 benchmark functions, we use the aforementioned parameter settings unless a change is mentioned. Furthermore, each test case is optimized over thirty independent runs. Experimental results for these well-known problems, as well as comparisons with other famous methods, are reported as follows.

Comparison between DE and EDE.
For the purpose of validating the enhancing effectiveness of EDE, EDE is first compared with canonical DE in terms of the best, worst, median, mean, and standard deviation (Std.) values of the solutions achieved by each algorithm in 30 independent runs. The corresponding results are listed in Table 2. Furthermore, the Wilcoxon rank sum test is conducted to assess the significance of the differences between DE and EDE at the α = 0.05 significance level; the related test results are also reported in Table 2. In addition, some representative convergence curves of DE and EDE are shown in Figure 1 in order to show the convergence speed of EDE more clearly.
According to the aforementioned analyses, it can be concluded that EDE is better than or approximately equal to DE on almost all the functions. In other words, the multiple mutation strategies and the perturbation scheme effectively enhance the performance of standard DE.

Comparison between EDE and Other Three DE Variants.
In this subsection, EDE is further compared with some representative state-of-the-art DE variants, namely, SaDE [14], JADE [15], and SaJADE [23]. Here sixteen test functions are used for the comparison. The related comparison results are listed in Table 3. For a fair comparison, except for the proposed algorithm EDE, the results reported in Table 3 are taken directly from Gong et al. [23]. From Table 3, it can be seen that EDE is obviously better than JADE on twelve functions, that is, f01, f02, f04, f05, f06, f08, f09, f10, f12, f13, f19, and f21. JADE works better than EDE on four functions; notice that EDE is just slightly inferior to JADE on three of them, f03, f07, and f18. When compared with SaDE, EDE performs better on thirteen functions, and the results found by EDE are very close to those found by SaDE on two other functions, f07 and f18. When compared with SaJADE, SaJADE is better than EDE on four functions, but its superiority is not obvious on three of them, f05, f07, and f18, except for function f11. Yet EDE is better than or equal to SaJADE on the other twelve functions.
It should be pointed out that the results are summarized as w/t/l in the last line of Table 3, which means that EDE wins on w functions, ties on t functions, and loses on l functions when compared with its competitor. For JADE, SaDE, and SaJADE, the counts are 12/0/4, 13/0/3, and 11/1/4, respectively. These results show that EDE is superior or similar to the other three approaches on the majority of benchmark functions.

Comparison among EDE and Two Artificial Bee Colony Algorithms.

The artificial bee colony (ABC) algorithm introduced by Karaboga and Basturk is a relatively new swarm-based optimization algorithm [47], and it has become a promising technique [48]. In particular, a modified artificial bee colony algorithm, named MABC, proposed by Gao and Liu [40], is an outstanding representative of the many enhanced ABC versions. In order to further demonstrate the superiority of EDE, EDE is compared with standard ABC and MABC on twenty-one functions. In this experimental study, the maximum number of fitness function evaluations (max FEs) is set to 15 × 10^4 for all compared algorithms, as recommended by Gao and Liu [40].
The further comparison results are given in Table 4. For convenience, apart from the data achieved by the EDE algorithm, the rest of the results are taken directly from Gao and Liu [40].
From Table 4, it is clear that EDE is better than or at least even with ABC on nineteen functions, while ABC works better than EDE on only two functions. EDE is better than or equal to MABC on eighteen functions, and MABC surpasses EDE on only three functions. In addition, the solution accuracy obtained by EDE is far better than that obtained by ABC on many benchmark functions, such as f01, f02, f08, f14, f15, and f19. Meanwhile, the solution accuracy obtained by EDE is far better than that obtained by MABC on some test functions, including f01, f02, and f14. In summary, EDE is superior to both ABC and MABC.

Conclusion
In order to achieve a better compromise between the exploration ability and the exploitation ability of DE, an enhanced differential evolution algorithm, called EDE, is presented in this work. In EDE, first, an opposition-based learning initialization technique is employed. Next, inspired by JADE [15], the mutation strategy DE/pbest/1/bin is introduced, together with a second mutation strategy, DE/current/1/bin; that is, EDE employs multiple mutation strategies composed of these two strategies to better balance the exploration and exploitation of DE. When performing the EDE algorithm, one of the two mutation strategies is chosen randomly according to a linear time-varying scheme. Last, a perturbation scheme for the best individual, also composed of two solution search equations, is presented in order to escape from local minima; specifically, the best individual is perturbed dimension by dimension in two modes. All these modifications make up the proposed algorithm EDE.
To verify the convergence performance of EDE, twenty-five benchmark functions with different characteristics from the literature are employed. The first set of experimental results demonstrates that EDE significantly enhances the performance of standard DE in terms of the best, worst, median, mean, and standard deviation (Std.) values of the final solutions in most cases. Moreover, two further comparisons also show that EDE performs significantly better than, or is at least highly competitive with, five other well-known algorithms, namely, JADE, SaDE, SaJADE, ABC, and MABC, on the majority of the corresponding benchmark functions. Therefore, it can be concluded that EDE is an efficient method and may be a good alternative for solving complex numerical optimization problems.
Last but not least, it is desirable to further apply the EDE algorithm to other optimization problems, such as the training of neural networks, system parameter identification, and data clustering.

Note: the results obtained by EDE on function f08 are reported as zero when they are smaller than 1E−308. The coefficient −418.98288727243369 in function f08 has limited precision and may therefore produce slightly negative results, which should in fact be zero.