Research Article Differential Evolution with Population and Strategy Parameter Adaptation

Differential evolution (DE) is simple and effective in solving numerous real-world global optimization problems. However, its effectiveness critically depends on the appropriate setting of the population size and strategy parameters. Therefore, to obtain optimal performance, time-consuming preliminary tuning of the parameters is needed. Recently, different strategy parameter adaptation techniques, which can automatically update the parameters to appropriate values to suit the characteristics of optimization problems, have been proposed. However, most of these works do not control the adaptation of the population size. In addition, they try to adapt each strategy parameter individually but do not take into account the interaction between the parameters that are being adapted. In this paper, we introduce a DE algorithm where both strategy parameters are self-adapted, taking into account the parameter dependencies by means of a multivariate probabilistic technique based on Gaussian Adaptation working on the parameter space. In addition, the proposed DE algorithm starts by sampling a huge number of sample solutions in the search space, and in each generation a constant number of individuals from the huge sample set are adaptively selected to form the population that evolves. The proposed algorithm is evaluated on 14 benchmark problems of CEC 2005 with different dimensionality.


Introduction
Like most stochastic numerical optimization algorithms, differential evolution (DE) [1] starts with randomly sampled solution vectors which evolve over the generations with the help of genetic operators such as mutation, crossover, and selection. Due to its effectiveness, DE has been successfully employed to solve numerous optimization problems in various fields of engineering such as communication [1], optics [2], and power systems [3].
However, it has been demonstrated experimentally [4] and theoretically [5] that the performance of DE is sensitive to the selection of the mutation and crossover strategies, to their associated parameters such as the crossover rate (CR) and the scale factor ($F$), and to the population size ($NP$). In other words, the optimal combination of population size, strategies, and their associated control parameters can be different for different optimization problems. In addition, for the same optimization problem the optimal combination can vary depending on the available computational resources and accuracy requirements [6]. Therefore, to successfully solve a specific optimization problem, it is necessary to perform a trial-and-error search for the most appropriate combination of population size, strategies, and their associated parameter values. However, the trial-and-error search process is time consuming and incurs high computational costs. To overcome this time-consuming procedure, different adaptation schemes such as SaDE [7] and JADE [8] have been proposed in the literature.
From the literature on adaptive/self-adaptive parameter control techniques, it can be observed that even a moderate parameter adaptation scheme performs much better than a well-tuned combination of individual parameters on a given set of benchmark problems. In addition, adaptive/self-adaptive techniques can enhance robustness by dynamically adapting the parameters to the characteristics of different fitness landscapes. In other words, a well-designed parameter adaptation technique can effectively solve various optimization problems without the need for the trial-and-error process of parameter tuning. In addition, the convergence rate can be improved if the control parameters are adapted to appropriate values at different evolution stages of a specific problem.

Mathematical Problems in Engineering
Unlike most parameter adaptation techniques in DE [7] that employ explorative mutation strategies to obtain better performance, the authors in [8] proposed a parameter adaptation method with a greedy mutation strategy and a binomial crossover strategy as the search basis. The proposed greedy mutation strategy "DE/current-to-pbest" utilizes the information present in multiple best solutions to balance the greediness of the mutation and the diversity of the population. In addition, the parameter adaptation technique evolves the mutation factors and crossover probabilities based on their historical record of success.
In the DE literature, most of the proposed parameter adaptation techniques adapt the two parameters, crossover probability and scale factor, individually and do not consider the interaction between them. In other words, they do not take into account the side effects introduced by changing the values of the parameters individually. In addition, unlike the adaptation of strategies and their associated parameters, the adaptation of the population size as a means of enhancing the performance of the DE algorithm has not been given significant consideration [9].
In this paper, we propose a parameter adaptation technique based on Gaussian Adaptation (GaA) [10], an estimation of distribution algorithm (EDA), to manage the dependencies between the two parameters considered (the mutation scale factor, $F$, and the crossover probability, CR). In addition, we propose a population adaptation scheme where the algorithm has a large set of sampled solutions which evolve over the generations. In each generation, a fixed number of individuals from the large set become the population members of the DE algorithm. The fixed number of individuals from the large set can be selected randomly or based on the objective value, depending on the stage of evolution.
The remainder of this paper is organized as follows. Section 2 presents a literature survey on (1) DE and different adaptive DE variants and (2) Gaussian Adaptation. Section 3 presents the proposed algorithm. Section 4 presents the experimental results and discussions while Section 5 concludes the paper.

Differential Evolution.
Differential Evolution (DE) [11], a parallel real-coded global optimization algorithm over continuous spaces, utilizes $NP$ $D$-dimensional parameter vectors, $\mathbf{X}_{i,G} = \{x^1_{i,G}, \ldots, x^D_{i,G}\}$, $i = 1, \ldots, NP$, uniformly sampled within the search space to encode the candidate solutions.
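During every generation $G$, corresponding to each individual $\mathbf{X}_{i,G}$ in the current population, referred to as the target vector, a mutant vector $\mathbf{V}_{i,G}$ is produced by one of the following mutation strategies:
"DE/best/1" [12]:
$$\mathbf{V}_{i,G} = \mathbf{X}_{\mathrm{best},G} + F \cdot (\mathbf{X}_{r_1^i,G} - \mathbf{X}_{r_2^i,G}) \tag{1}$$
"DE/best/2" [12]:
$$\mathbf{V}_{i,G} = \mathbf{X}_{\mathrm{best},G} + F \cdot (\mathbf{X}_{r_1^i,G} - \mathbf{X}_{r_2^i,G}) + F \cdot (\mathbf{X}_{r_3^i,G} - \mathbf{X}_{r_4^i,G}) \tag{2}$$
"DE/rand/1" [12]:
$$\mathbf{V}_{i,G} = \mathbf{X}_{r_1^i,G} + F \cdot (\mathbf{X}_{r_2^i,G} - \mathbf{X}_{r_3^i,G}) \tag{3}$$
"DE/current-to-rand/1" [13]:
$$\mathbf{V}_{i,G} = \mathbf{X}_{i,G} + K \cdot (\mathbf{X}_{r_1^i,G} - \mathbf{X}_{i,G}) + F \cdot (\mathbf{X}_{r_2^i,G} - \mathbf{X}_{r_3^i,G}) \tag{4}$$
The indices $r_1^i$, $r_2^i$, $r_3^i$, $r_4^i$, $r_5^i$ are mutually exclusive integers different from the index $i$ and are randomly generated within the range $[1, NP]$. The scale factor $F$ is a positive value, while $K$ is randomly chosen within the range $[0, 1]$. $\mathbf{X}_{\mathrm{best},G}$ is the solution vector with the best fitness value in the population at generation $G$.
After mutation, the crossover operation is applied to each pair of the target vector $\mathbf{X}_{i,G}$ and its corresponding mutant vector $\mathbf{V}_{i,G}$ to generate a trial vector $\mathbf{U}_{i,G}$. In DE, the most commonly used crossover is the binomial (uniform) crossover defined as follows [11]:
$$u^j_{i,G} = \begin{cases} v^j_{i,G} & \text{if } \mathrm{rand}_j[0,1] \le CR \text{ or } j = j_{\mathrm{rand}}, \\ x^j_{i,G} & \text{otherwise,} \end{cases} \qquad j = 1, \ldots, D \tag{5}$$
The crossover rate CR is a user-specified constant within the range $[0, 1]$, $\mathrm{rand}_j[0,1]$ is a uniform random number drawn for each component $j$, and $j_{\mathrm{rand}}$ is a randomly chosen integer in the range $[1, D]$.
After the crossover, the trial vectors are evaluated to obtain their objective function values and the selection operation is performed. The objective function value of each trial vector ($\mathbf{U}_{i,G}$) is compared to that of its corresponding target vector ($\mathbf{X}_{i,G}$) in the current population. If the trial vector is better than the corresponding target vector, the trial vector replaces the target vector and enters the population of the next generation. In a minimization problem, the selection operation can be expressed as follows:
$$\mathbf{X}_{i,G+1} = \begin{cases} \mathbf{U}_{i,G} & \text{if } f(\mathbf{U}_{i,G}) \le f(\mathbf{X}_{i,G}), \\ \mathbf{X}_{i,G} & \text{otherwise.} \end{cases} \tag{6}$$
In DE, mutation, crossover, and selection are repeated generation after generation until a termination criterion is satisfied. The algorithmic description of DE is summarized as follows.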

Differential Evolution Algorithm
Step 1. Set the generation number $G = 0$ and randomly initialize a population of $NP$ individuals.
Step 2. WHILE stopping criterion is not satisfied DO.
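The mutation, crossover, and selection loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: it uses the "DE/rand/1" strategy with binomial crossover, a fixed number of generations as the stopping criterion, and simple clipping to keep trial vectors inside the bounds. The names `de_rand_1_bin` and `sphere`, the default values of `np_size`, `f`, and `cr`, and the toy objective are all hypothetical choices made for the example.

```python
import random


def sphere(x):
    """Toy objective used only for this illustration: sum of squares."""
    return sum(v * v for v in x)


def de_rand_1_bin(objective, bounds, np_size=20, f=0.5, cr=0.9, max_gen=200, seed=0):
    """Minimize `objective` with DE/rand/1 mutation and binomial crossover."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 1: uniformly sample NP individuals inside the search space.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(np_size)]
    fit = [objective(x) for x in pop]
    for _ in range(max_gen):  # Step 2: fixed generation budget (an assumption)
        new_pop, new_fit = list(pop), list(fit)
        for i in range(np_size):
            # Mutation: three mutually distinct indices, all different from i.
            r1, r2, r3 = rng.sample([k for k in range(np_size) if k != i], 3)
            mutant = [pop[r1][j] + f * (pop[r2][j] - pop[r3][j]) for j in range(dim)]
            # Binomial crossover: j_rand guarantees one component from the mutant.
            j_rand = rng.randrange(dim)
            trial = [mutant[j] if (rng.random() <= cr or j == j_rand) else pop[i][j]
                     for j in range(dim)]
            # Bound handling by clipping (an assumption, not taken from the text).
            trial = [min(max(t, lo), hi) for t, (lo, hi) in zip(trial, bounds)]
            # Selection for minimization: keep the trial if it is not worse.
            f_trial = objective(trial)
            if f_trial <= fit[i]:
                new_pop[i], new_fit[i] = trial, f_trial
        pop, fit = new_pop, new_fit
    best = min(range(np_size), key=fit.__getitem__)
    return pop[best], fit[best]


best_x, best_f = de_rand_1_bin(sphere, [(-5.0, 5.0)] * 5)
print(best_f)
```

The fixed values of `np_size`, `f`, and `cr` in this sketch are exactly the settings whose manual tuning the adaptive schemes discussed next try to avoid.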
As mentioned earlier, the performance of the conventional DE algorithm depends on the population size, the chosen strategies, and their associated control parameters. In addition, as the complexity of the optimization problem increases, the performance of the DE algorithm becomes more sensitive to the parameter settings [4]. Therefore, an inappropriate choice of population size, mutation and crossover strategies, and their associated parameters may lead to premature convergence, stagnation, or wastage of computational resources [6]. In the literature, various empirical guidelines have been suggested for choosing the appropriate population size, strategies, and control parameter settings depending on the characteristics of the optimization problems [6]. However, depending on the complexity of the optimization problem, choosing an appropriate population size, strategies, and control parameters is not straightforward due to the complex interaction of the control parameters with DE's performance [14]. In addition, the manual setting and/or tuning of DE strategies and parameters based on the guidelines has resulted in various conflicting conclusions, which lack sufficient justification. Therefore, to avoid the tuning of parameters by a trial-and-error procedure, various adaptive techniques have been proposed. Among the three parameters ($NP$, $F$, and CR), most of the parameter adaptive techniques, except [9], set the population size ($NP$) to a predefined value based on the dimensionality of the problem.
Among the different adaptive DE variants, the adaptive differential evolution proposed in [8], referred to as JADE, demonstrates good performance in terms of convergence speed and robustness on a variety of optimization problems. JADE [8] implements the mutation strategy "DE/current-to-pbest" as a generalization of the classic "DE/current-to-best" strategy. Unlike the classic mutation strategy, which uses only the current best individual, "DE/current-to-pbest" utilizes the information present in the $p$ fitter individuals of the current population. The use of multiple solutions helps in balancing the greediness of the mutation and the diversity of the population. In JADE, the control parameters ($F$ and CR) are updated in an adaptive manner in order to alleviate the trial-and-error search. In JADE, using "DE/current-to-pbest", the mutant vector corresponding to the individual $\mathbf{X}_{i,G}$ in generation $G$ is generated as
$$\mathbf{V}_{i,G} = \mathbf{X}_{i,G} + F_i \cdot (\mathbf{X}^p_{\mathrm{best},G} - \mathbf{X}_{i,G}) + F_i \cdot (\mathbf{X}_{r_1^i,G} - \mathbf{X}_{r_2^i,G}) \tag{7}$$
where $\mathbf{X}_{r_1^i,G}$, $\mathbf{X}_{r_2^i,G}$, and $\mathbf{X}^p_{\mathrm{best},G}$ are selected from the current population. At each generation, the scale factor $F_i$ and crossover probability $CR_i$ of each individual $\mathbf{X}_{i,G}$ are independently generated as
$$F_i = \mathrm{randc}_i(\mu_F, 0.1) \tag{8}$$
$$CR_i = \mathrm{randn}_i(\mu_{CR}, 0.1) \tag{9}$$
As shown in (8) and (9), the parameters $F$ and CR corresponding to each individual are sampled using Cauchy and Normal distributions, respectively. The mean values $\mu_F$ and $\mu_{CR}$ are initialized to 0.5 and are updated at the end of each generation as
$$\mu_F = (1 - c) \cdot \mu_F + c \cdot \mathrm{mean}_L(S_F) \tag{10}$$
$$\mu_{CR} = (1 - c) \cdot \mu_{CR} + c \cdot \mathrm{mean}_A(S_{CR}) \tag{11}$$
where $c$ is a positive constant between 0 and 1. The terms $\mathrm{mean}_A(\cdot)$ and $\mathrm{mean}_L(\cdot)$ denote the arithmetic mean and the Lehmer mean [8], respectively. $S_F$ and $S_{CR}$ denote the sets of mutation factors and crossover probabilities, respectively, that produced successful trial vectors in the previous generation.
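The sampling and update rules in (8)-(11) can be illustrated with the following sketch. It assumes the conventions commonly described for JADE, which are not spelled out in the text above: an $F$ value that falls at or below zero is regenerated and one above 1 is truncated to 1, CR is truncated to $[0, 1]$, and the Lehmer mean is $\sum F^2 / \sum F$. The function names and the value `c=0.1` are hypothetical choices for the example, and leaving a mean unchanged when its success set is empty is likewise an assumption.

```python
import math
import random


def sample_f(mu_f, rng, scale=0.1):
    """Cauchy-distributed F as in (8); regenerate if <= 0, truncate at 1."""
    while True:
        # Inverse-CDF sampling of a Cauchy variate with location mu_f.
        f = mu_f + scale * math.tan(math.pi * (rng.random() - 0.5))
        if f > 0.0:
            return min(f, 1.0)


def sample_cr(mu_cr, rng, std=0.1):
    """Normally distributed CR as in (9), truncated to [0, 1]."""
    return min(max(rng.gauss(mu_cr, std), 0.0), 1.0)


def lehmer_mean(values):
    """Lehmer mean sum(F^2) / sum(F); it weights larger values more heavily."""
    return sum(v * v for v in values) / sum(values)


def update_means(mu_f, mu_cr, s_f, s_cr, c=0.1):
    """Apply (10) and (11); a mean is left unchanged if its success set is empty."""
    if s_f:
        mu_f = (1 - c) * mu_f + c * lehmer_mean(s_f)
    if s_cr:
        mu_cr = (1 - c) * mu_cr + c * (sum(s_cr) / len(s_cr))
    return mu_f, mu_cr


rng = random.Random(1)
f_values = [sample_f(0.5, rng) for _ in range(1000)]
cr_values = [sample_cr(0.5, rng) for _ in range(1000)]
print(update_means(0.5, 0.5, [0.5, 1.0], [0.2, 0.4]))
```

Note that the two means are updated from two separate success sets, so any correlation between successful $F$ and CR values is discarded.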
During the past decade, hybridization of EAs has gained significance, due to their ability to complement each other's strengths and overcome the drawbacks of the individual algorithms. In [15], the authors proposed a DE parameter adaptation technique based on the harmony search (HS) algorithm in which a group of DE control parameter combinations is randomly initialized. The randomly initialized DE parameter combinations form the initial harmony memory (HM) of the HS algorithm. Each combination of the parameters present in the HM is evaluated by testing it on the DE population during the evolution. Based on the effectiveness of the DE parameter combinations present in the HM, the HS algorithm evolves the parameter combinations. At any given point of time during the evolution of the DE population, the HM contains an ensemble of DE parameters that suits the evolution process of the DE population.
As mentioned above, among the different adaptive DE variants [16][17][18][19][20][21][22], there exist only a few significant works considering the adaptation of the population size in the DE algorithm [9]. In addition to the adaptation of the control parameters $F$ and CR, the author in [9] considered the self-adaptation of the population size, and the algorithm is referred to as DESAP. In DESAP the population size ($NP$) is automatically adapted from initialization to the completion of the evolution process. DESAP is proposed with two encoding methodologies: absolute encoding and relative encoding.

Gaussian Adaptation.
Optimization algorithms that rely on probabilistic models, such as Covariance Matrix Adaptation (CMA) [23] and Gaussian Adaptation (GaA) [10], belong to the class of Estimation of Distribution Algorithms (EDAs) and do not rely on any variation operators such as crossover or mutation. In EDAs, the most promising solutions of the previous generation are used to update the probability distribution model. The updated probability distribution model is used to sample the solutions for the next generation.
In other words, EDAs rely on iterative random sampling and updating of the probability distribution model in order to approximate the problem at hand. Therefore, the process by which the random samples are generated plays a crucial role. In continuous spaces, typical EDAs employ a multivariate Gaussian distribution as the probability density model [24]. Continuous optimization methods, such as GaA [10] and Evolution Strategies (ES) [23], use Gaussian sampling to generate candidate solutions from the target distribution and evaluate the target distribution at the sample points.
The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [6] and GaA [10] constantly adapt the covariance matrix of the sampling distribution based on the previously accepted samples. In CMA-ES, covariance adaptation is employed to increase the likelihood of generating successful mutations, while GaA adapts the covariance to maximize the entropy of the search distribution under the constraint that acceptable search points are found with a predefined, fixed hitting probability.
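Gaussian Adaptation (GaA) is a stochastic process that adapts a Gaussian distribution to a region or set of feasible points in parameter space. As a result of the adaptation, GaA becomes a maximum dispersion process extending the sampling over the largest possible volume in parameter space while keeping the probability of finding feasible points at a suitable level. GaA is based on the principle of maximum entropy and tries to maximize the entropy $H$ of a multivariate Gaussian distribution $\mathcal{N}(\mathbf{m}, \mathbf{C})$ given the mean ($\mathbf{m}$) and the covariance ($\mathbf{C}$) information:
$$H(\mathcal{N}) = \log \sqrt{(2 \pi e)^n \det(\mathbf{C})} \tag{12}$$
where $n$ is the dimensionality of the parameter space. The entropy is maximized by maximizing the determinant of the covariance matrix. The GaA algorithm starts with the mean of a multivariate Gaussian distribution ($\mathbf{m}^{(0)}$) and an initial point ($\mathbf{x}^{(0)}$). In iteration $(g + 1)$, a new solution is sampled as
$$\mathbf{x}^{(g+1)} = \mathbf{m}^{(g)} + r^{(g)} \mathbf{Q}^{(g)} \boldsymbol{\eta}^{(g)} \tag{13}$$
where $\boldsymbol{\eta}^{(g)} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. $\mathbf{Q}^{(g)}$ is the normalized square root of $\mathbf{C}^{(g)}$ and is obtained by the following decomposition:
$$\mathbf{C}^{(g)} = (r \mathbf{Q}^{(g)}) (r \mathbf{Q}^{(g)})^{T} = r^2 \mathbf{Q}^{(g)} (\mathbf{Q}^{(g)})^{T} \tag{14}$$
where $r$ is the scalar step size.
In order to minimize a real-valued objective function $f(\mathbf{x})$, GaA uses a fitness-dependent acceptance threshold $c_T$ which is monotonically lowered until some convergence criteria are met. If the objective value of the newly sampled solution in (13) is less than $c_T$, then the mean ($\mathbf{m}$), the covariance ($\mathbf{C}$), and the step size ($r$) are updated as follows:
$$\mathbf{m}^{(g+1)} = \left(1 - \frac{1}{N_m}\right) \mathbf{m}^{(g)} + \frac{1}{N_m} \mathbf{x}^{(g+1)} \tag{15}$$
$$\mathbf{C}^{(g+1)} = \left(1 - \frac{1}{N_C}\right) \mathbf{C}^{(g)} + \frac{1}{N_C} \Delta\mathbf{x} \, \Delta\mathbf{x}^{T} \tag{16}$$
$$r^{(g+1)} = f_e \cdot r^{(g)} \tag{17}$$
where $f_e > 1$ is the expansion factor, $N_m$ and $N_C$ are the weighting factors, and $\Delta\mathbf{x} = (\mathbf{x}^{(g+1)} - \mathbf{x}^{(g)})$.
If the objective value of the newly sampled solution $\mathbf{x}^{(g+1)}$ is greater than the threshold, then the mean and covariance are not adapted but the step size is reduced as
$$r^{(g+1)} = f_c \cdot r^{(g)} \tag{18}$$
where $f_c < 1$ is the contraction factor. In order to use GaA for optimization, the acceptance threshold $c_T$ is continuously lowered as follows:
$$c_T^{(g+1)} = \left(1 - \frac{1}{N_T}\right) c_T^{(g)} + \frac{1}{N_T} f(\mathbf{x}^{(g+1)}) \tag{19}$$
where $N_T$ is the weighting factor. The fitness-dependent update of $c_T$ makes the algorithm invariant to linear transformations of the objective function.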
Step 2. WHILE stopping criterion is not satisfied DO.
Step 2.2. Evaluate and check whether the objective value of the newly sampled solution is less than the threshold $c_T$.
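The sample, test, and update cycle of GaA described in this section can be sketched as follows. This is an illustration under several assumptions rather than the reference algorithm of [10]: $\mathbf{Q}$ is taken as the Cholesky factor of $\mathbf{C}$ rescaled to unit determinant (one reading of "normalized square root"), $\mathbf{x}^{(g)}$ in $\Delta\mathbf{x}$ is taken as the last accepted point, the threshold is lowered only when a sample is accepted, and the values of `f_e`, `f_c`, `n_m`, `n_c`, and `n_t` are illustrative placeholders, not the settings proposed in [10]. All function names are hypothetical.

```python
import math
import random


def sphere(x):
    """Toy objective used only for this illustration."""
    return sum(v * v for v in x)


def cholesky(a):
    """Lower-triangular L with L L^T = a for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                # Tiny floor guards against round-off in near-singular cases.
                low[i][j] = math.sqrt(max(a[i][i] - s, 1e-300))
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def gaussian_adaptation(objective, x0, r=1.0, max_iter=2000, seed=0,
                        f_e=1.1, f_c=0.95, n_m=10.0, n_c=10.0, n_t=10.0):
    rng = random.Random(seed)
    n = len(x0)
    m, x_prev = list(x0), list(x0)
    cov = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    c_t = objective(x0)  # initial acceptance threshold (an assumption)
    best_x, best_f = list(x0), c_t
    for _ in range(max_iter):
        if r < 1e-12:  # step size collapsed: treat as converged
            break
        low = cholesky(cov)
        norm = math.prod(low[i][i] for i in range(n)) ** (1.0 / n)
        q = [[v / norm for v in row] for row in low]  # det(Q) = 1
        eta = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x_new = [m[i] + r * sum(q[i][k] * eta[k] for k in range(n)) for i in range(n)]
        f_new = objective(x_new)
        if f_new < c_t:  # accepted: adapt mean, covariance, step size, threshold
            dx = [x_new[i] - x_prev[i] for i in range(n)]
            m = [(1 - 1 / n_m) * m[i] + x_new[i] / n_m for i in range(n)]
            cov = [[(1 - 1 / n_c) * cov[i][j] + dx[i] * dx[j] / n_c
                    for j in range(n)] for i in range(n)]
            r *= f_e
            c_t = (1 - 1 / n_t) * c_t + f_new / n_t
            x_prev = x_new
            if f_new < best_f:
                best_x, best_f = x_new, f_new
        else:  # rejected: only contract the step size
            r *= f_c
    return best_x, best_f


best_x, best_f = gaussian_adaptation(sphere, [3.0, 3.0])
print(best_f)
```

Because the threshold is a weighted average of accepted objective values, each acceptance lowers it, which matches the monotone lowering described above.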

Proposed Differential Evolution Algorithm
As highlighted in the previous section, depending on the nature of the problem (unimodal or multimodal) and the available computational resources, different optimization problems require different population sizes and different mutation and crossover strategies combined with different parameter values to obtain optimal performance. In addition, to solve a specific problem, different mutation and crossover strategies with different parameter settings may be better during different stages of the evolution than a single set of strategies with unique parameter settings as in the conventional DE. Motivated by these observations, many adaptive and self-adaptive parameter control techniques have been proposed [16][17][18][19][20][21][22]. As mentioned earlier, most of the adaptive DE algorithms consider adapting the scale factor and crossover rate parameters only. In addition, most of the adaptive techniques which consider adapting the $F$ and CR values adapt them individually. For instance, in JADE [8], the mutation factors and crossover probabilities are evolved based on their historical record of success. The $F$ and CR values corresponding to the individuals in the current generation are generated from the corresponding mean values using Cauchy and Gaussian distributions, respectively. After the selection process, the $F$ and CR values that were able to produce successful trial vectors are collected. Then the respective mean values of $F$ and CR are updated using Lehmer and arithmetic means, respectively. In other words, $F$ and CR are generated (see (8) and (9)) and adapted (see (10) and (11)) individually. Therefore, JADE does not consider the intercorrelation between the two parameters. However, in [6], it has been demonstrated that the performance of DE depends on the combination of $F$ and CR. In other words, the parameters $F$ and CR on which the performance of DE depends are intercorrelated. Therefore, adapting the two parameters individually may not result in the optimal performance of the DE algorithm.
In this paper, we present an adaptation scheme for the population size, crossover rate, and scale factor. The proposed algorithm considers the intercorrelation between the two parameters, mutation scale factor and crossover probability, during the process of adaptation. In other words, the parameters evolve based on the Gaussian adaptation process, which is used for parameter optimization.

Parameter Adaptation Based on Gaussian Adaptation.
Unlike most DE adaptation algorithms, the proposed algorithm adapts $F$ and CR by employing GaA on the bidimensional continuous space composed of $F$ and CR. Therefore, the data structures employed by the proposed algorithm are the mean vector $\mathbf{m}$ and the covariance matrix $\mathbf{C}$. The mean vector ($\mathbf{m}$) comprises the mean values of $F$ and CR while the covariance matrix ($\mathbf{C}$) comprises the interdependencies between the two parameters.
As in JADE [8], every DE individual is assigned a personal version of the parameters; that is, there is a pair ($F_i$, $CR_i$) for each individual $i$, sampled using (13). In other words, every time these parameters are needed (for mutation and crossover in DE), they are sampled from the multivariate Gaussian distribution identified by $\mathbf{m}$ and $\mathbf{C}$. In the current work, the mean vector $\mathbf{m}$ is initialized to $[0.5, 0.5]$ and the covariance matrix ($\mathbf{C}$) is set to the identity matrix.
During every generation of the DE evolution, the $F_i$ and $CR_i$ values corresponding to the individuals in the population are generated from the mean ($\mathbf{m}$) and the covariance matrix ($\mathbf{C}$) using (13). Each individual in the DE algorithm uses its $F_i$ and $CR_i$ values to produce the mutant vector and consequently the trial vector. The combination of $F_i$ and $CR_i$ values that resulted in the offspring with the maximum improvement is used to update the mean ($\mathbf{m}$) and the covariance ($\mathbf{C}$). The continuous updating of $\mathbf{m}$ and $\mathbf{C}$ by the parameter combinations that produced better solutions helps the parameter search move to the regions where more suitable combinations of the parameters can be generated. The limits of $F$ and CR are set to $(0, 1.0]$ and $[0, 1.0]$, respectively.
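As an illustration of how a correlated ($F_i$, $CR_i$) pair could be drawn with (13) in the two-dimensional parameter space, consider the sketch below. The text does not state how the limits on $F$ and CR are enforced, so the bound handling here (retrying while $F \le 0$, then clamping) is an assumption, as are the function names `sample_f_cr` and `best_pair` and the step size used in the usage line. The square root of $\mathbf{C}$ is computed with the closed-form Cholesky factor of a $2 \times 2$ matrix and rescaled to unit determinant.

```python
import math
import random


def sample_f_cr(m, cov, r, rng, max_tries=100):
    """Draw one (F, CR) pair from a bivariate Gaussian centred at m."""
    # Closed-form Cholesky factor of a 2x2 symmetric positive-definite matrix.
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    norm = math.sqrt(l11 * l22)  # rescales the factor to unit determinant
    f, cr = m[0], m[1]
    for _ in range(max_tries):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        f = m[0] + r * l11 * e1 / norm
        cr = m[1] + r * (l21 * e1 + l22 * e2) / norm
        if f > 0.0:  # retry while F is non-positive (assumed bound handling)
            break
    f = min(max(f, 1e-6), 1.0)   # F kept in (0, 1]
    cr = min(max(cr, 0.0), 1.0)  # CR kept in [0, 1]
    return f, cr


def best_pair(pairs, parent_fit, trial_fit):
    """Return the pair with the largest improvement, or None if none improved."""
    gains = [p - t for p, t in zip(parent_fit, trial_fit)]
    k = max(range(len(gains)), key=gains.__getitem__)
    return pairs[k] if gains[k] > 0 else None


rng = random.Random(0)
identity = [[1.0, 0.0], [0.0, 1.0]]
pairs = [sample_f_cr([0.5, 0.5], identity, 0.2, rng) for _ in range(500)]
print(best_pair([(0.5, 0.9), (0.7, 0.1)], [10.0, 10.0], [9.0, 4.0]))
```

The pair returned by `best_pair` plays the role of the accepted sample $\mathbf{x}^{(g+1)}$ in the GaA update, so off-diagonal entries of $\mathbf{C}$ can grow when good $F$ and CR values tend to occur together.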

Population Adaptation.
In this work, the proposed algorithm starts by sampling a huge set of solutions within the search space. During every generation, a number of solutions equal to the population size ($NP$) is selected to evolve with the help of the mutation, crossover, and selection operations. After the generation, during which the evolution of the selected individuals takes place, the members of the population are placed back into their respective positions in the huge set.
In the proposed algorithm, the selection of individuals from the huge set during every generation should be done in an appropriate manner to balance exploration and exploitation. In this paper, we use a simple criterion, based on the objective value distribution of the individuals in the huge set, to select the population individuals that can evolve.
The mean objective value corresponding to a set of individuals that are uniformly and randomly sampled from a search space will always be less than the median objective value of the set of individuals. As long as the mean objective value is less than the median objective value in the large set, the population members for the next generation are selected using tournament selection. Tournament selection allows the fittest individuals to enter the population and evolve. However, when the mean objective value is greater than the median objective value, it indicates that most of the solutions in the large set are within a basin. Solutions within a single basin may lead to premature convergence, and therefore, to preserve the exploration capability, the population members for the next generation are taken at random from the large set.
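The selection rule above can be sketched as a small function that returns the indices of the individuals chosen from the large set. The tournament size, the choice to run tournaments without replacement so that each individual is selected at most once, and the treatment of the case where mean and median are equal (handled as random selection) are assumptions of this example; the name `select_population` is hypothetical.

```python
import random
import statistics


def select_population(fitness, np_size, rng, tournament_size=2):
    """Return sorted indices of np_size individuals chosen from the large set."""
    mean_f = statistics.fmean(fitness)
    median_f = statistics.median(fitness)
    if mean_f < median_f:
        # Tournament selection (minimization): the lowest objective value wins.
        pool = list(range(len(fitness)))
        chosen = []
        while len(chosen) < np_size:
            contenders = rng.sample(pool, min(tournament_size, len(pool)))
            winner = min(contenders, key=fitness.__getitem__)
            chosen.append(winner)
            pool.remove(winner)  # each individual is selected at most once
        return sorted(chosen)
    # Mean >= median: pick members uniformly at random to favour exploration.
    return sorted(rng.sample(range(len(fitness)), np_size))


rng = random.Random(0)
skewed_low = [5.0, 1.0, 9.0, 9.0, 9.0, 9.0, 9.0]   # mean < median: tournament
skewed_high = [0.0, 0.0, 0.0, 0.0, 100.0]          # mean > median: random
print(select_population(skewed_low, 2, rng, tournament_size=7))
print(select_population(skewed_high, 3, rng))
```

With a tournament as large as the pool, the rule reduces to picking the best remaining individuals, which is why the first call returns the two lowest objective values; smaller tournaments give weaker selection pressure.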

Outline of Proposed DE Algorithm
Step 1. Set the generation count $G = 0$, and randomly sample $20 \times D$ solution vectors. Initialize $\mathbf{m}$, $\mathbf{C}$, $r$, and $c_T$.
Step 2. WHILE stopping criterion is not satisfied DO.
Step 2.1. Select population members from the large set.
Step 2.6. Replace the solution vectors in the large set with the members of the current population at the respective indices.
Step 2.7. Check if the improvement by the best parameter combination is greater than the threshold $c_T$.

Experimental Setup and Results
We evaluated the performance of the proposed algorithm on a set of 14 test problems of CEC 2005 [25].Each of the 14 test problems is scalable and the algorithm is tested on the 10D, 30D, 50D, and 100D versions of the test problems.

The maximum number of function evaluations considered is $10000 \times D$ (100000 for 10D; 300000 for 30D; 500000 for 50D; 1000000 for 100D). The algorithm is terminated when the maximum number of function evaluations is reached.
In the current work, each algorithm is run 30 times independently on each problem. To compare the performance of different algorithms, we employ two types of statistical tests, namely the $t$-test and the Wilcoxon rank-sum test. The $t$-test, being a parametric method, can be used to compare the performance of two algorithms on a single problem. To compare the performance of two algorithms over a set of different problems, we use a nonparametric test such as the Wilcoxon rank-sum test.
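For readers unfamiliar with the rank-sum test, the sketch below computes its statistic for two sets of results using the large-sample normal approximation. It is a simplified illustration, not the exact procedure used to produce the tables: ties receive average ranks but no tie correction is applied to the variance, the approximation is rough for very small samples, and the function name `rank_sum_test` and the data in the usage lines are made up for the example.

```python
import math


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    # Pool both samples, remembering which group each value came from.
    combined = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    rank_sum_a = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_a - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return z, p


z, p = rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(z, p)
```

A negative `z` means the first sample tends to hold the smaller values, which for a minimization problem means the first algorithm tends to reach lower objective values.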
The proposed algorithm employs the "DE/current-to-pbest" mutation strategy along with the binomial crossover. As mentioned above, the proposed adaptation scheme works in the bidimensional parametric space ($F$ and CR). In the proposed algorithm, we initially sample $20 \times D$ solution vectors, out of which 100 individuals are selected as the population members at the start of every generation. After the generation, the solution vectors are placed back into the large set. In addition to the parameters of the DE algorithm, the parameters of the GaA algorithm are set to the values proposed in [10].

Effect of Population Size and Population Adaptation in DE Algorithm.
At first, we demonstrate the effect of the population size ($NP$) on the performance of adaptive DE algorithms. To demonstrate the effect, we consider the JADE algorithm as the base algorithm and evaluate it on the 30D versions of the 14 benchmark problems of CEC 2005 with different population sizes: 50, 100, 200, 400, 600, 800, and 1000. The results are summarized in Table 1.
In Table 1, corresponding to each problem, statistically significant results (based on the $t$-test) are marked in bold font. From the results, it can be observed that JADE with a single population size setting is not suitable for all benchmark problems. JADE with $NP$ values less than 400 performs better on almost all the benchmark problems except $f_7$. In addition, $NP = 50$ and $NP = 100$ perform equally well on most of the benchmark problems compared to $NP = 200$ and $NP = 400$, in terms of solution quality. However, $NP = 400$ shows significantly better performance compared to $NP = 50$ and $NP = 100$ on $f_3$, $f_4$, and $f_5$. During the experimentation, a similar observation was made on the benchmark problems with dimensionality 10D, 50D, and 100D. Therefore, from the results it can be observed that a single setting of the population size, even when the strategy parameters ($F$ and CR) are self-adapted, may not result in optimal performance on a given set of benchmark problems. In other words, these results demonstrate the need for self-adaptation of the population in the DE algorithm.
In Table 2, we compare the performance of the proposed population adaptation scheme with JADE using a fixed population size. For each benchmark problem, Table 2 presents the best result obtained using JADE with a fixed population size. The population size in the first column indicates the population size at which the results reported in columns 2 and 3 (taken from Table 1) were obtained. The results in columns 4 and 5 correspond to the mean and standard deviation values obtained using JADE with the proposed population adaptation scheme.
From the results in Table 2, it can be observed that the proposed population adaptation scheme is slightly better than or equal to (based on the $t$-test) the best results obtained using a fixed population size. In addition, the adaptation of the population alleviates the need to tune the parameter depending on the characteristics of the benchmark problems. In other words, the proposed population adaptation scheme is robust and can overcome the need to tune the population size parameter using a time-consuming trial-and-error method. In addition, Figure 1 shows the convergence characteristics of JADE with a fixed population of 400 and JADE with the proposed adaptive population technique. From the figure, it can be seen that the proposed population adaptation achieves a better balance between the exploration and exploitation capabilities. In addition, the Wilcoxon rank-sum test results demonstrate that the adaptive population is better than each of the individual population sizes (50, 100, 200, 400, 600, 800, and 1000).

Adaptation of 𝐹 and 𝐶𝑅 Using Gaussian Adaptation.
In this section, we evaluate the performance of the proposed algorithm, which includes the population adaptation and adapts the strategy parameters ($F$ and CR) using Gaussian adaptation. The proposed algorithm is evaluated using the 10D, 30D, 50D, and 100D versions of the 14 benchmark problems of CEC 2005 and is compared with JADE with population adaptation. The results are summarized in Table 3 and statistically significant (based on the $t$-test) results are marked in bold font.
From the results in Table 3, it can be observed that the proposed algorithm is statistically always better than or equal to, and never worse than, JADE in terms of solution quality. The significant difference in performance between JADE and the proposed GaA-based parameter adaptation is in the linked problems such as $f_3$ and $f_{10}$. In these problems, the superiority of the proposed algorithm can be attributed to the modelling of the interdependency between the two strategy parameters ($F$ and CR). Even in the unimodal problems such as $f_1$ and $f_2$, the difference in performance becomes significant as the dimensionality increases.
In Table 3, the last column contains the Wilcoxon rank-sum test results. The value 1 below the proposed algorithm indicates that the proposed algorithm is statistically better than JADE with adaptive $NP$ alone.

Conclusion
In this paper, we propose a DE algorithm where the population is adapted in addition to the strategy parameters. Unlike most adaptive DE algorithms, the proposed algorithm adapts the DE strategy parameters by taking into account the interactions between them. The interactions are modelled using a Gaussian distribution in the bivariate parameter space, referred to as Gaussian adaptation. With the help of simulations, we were able to demonstrate the superiority of the proposed population adaptation technique compared to a fixed population size in terms of balancing the exploitation and exploration stages of the evolution. In addition, the Gaussian Adaptation of the strategy parameters demonstrates the importance of considering the interactions between the parameters while designing parameter adaptation techniques in evolutionary algorithms.

Figure 1: Convergence characteristics of JADE with fixed and adaptive population for $f_3$.

Table 1: Effect of population size on the performance of JADE algorithm (30D benchmark problems).

Table 2: Comparison between JADE with proposed population adaptation and JADE with different population sizes.

Table 3: Comparison between JADE with adaptive population and proposed algorithm.