An External Selection Mechanism for Differential Evolution Algorithm

The procedures of the differential evolution (DE) algorithm can be summarized as population initialization, mutation, crossover, and selection. However, to the best of our knowledge, the successful solutions generated in each iteration have not been fully utilized. In this study, an external selection mechanism (ESM) is presented to improve the performance of the DE algorithm. The proposed method stores the successful solutions of each iteration in an archive. When an individual is in a state of stagnation, the parents for the mutation operation are selected from the archive to restore the algorithm's search. Most significantly, a crowding entropy diversity measurement in the fitness landscape is proposed and combined with fitness rank to preserve the diversity and superiority of the archive. The ESM can be integrated into existing algorithms to improve their ability to escape stagnation. The CEC2017 benchmark functions are used to verify the proposed mechanism's performance. Experimental results show that the ESM is universal: it can improve the accuracy of DE and its variant algorithms alike.


Introduction
Differential evolution (DE) [1], introduced by Storn and Price in 1995, is one of the most popular evolutionary algorithms. DE is easy to understand and simple in principle, yet it demonstrates good optimization ability and has been used successfully in many single-objective optimization problems, including continuous optimization [2], discrete optimization [3], constrained optimization [4], and unconstrained optimization [5].
Different from other meta-heuristic algorithms [6], the procedure of differential evolution can be summarized as population initialization, mutation, crossover, and selection. After each iteration, promising solutions with better fitness values are selected to survive into the next iteration. These solutions can be called successful solutions. DE repeats this procedure until a predefined termination criterion is reached. However, DE may suffer stagnation during the evolution process, meaning that it stops generating successful solutions and converges to a fixed point [7].
When the population stagnates, taking appropriate strategies to restore the search should theoretically further improve the algorithm's performance.
Successful solutions are generally located in valleys of the fitness landscape. As the iteration progresses, successful solutions containing useful information may be discarded and underutilized. We therefore propose an external selection mechanism (ESM). First, the ESM stores the successful solutions of each iteration in an archive. When an individual is in a state of stagnation, the parents for the mutation operation are selected from the archive. In this way, the population can regain diversity and restore its search ability. Second, the diversity and superiority of the archive are particularly important because they directly affect the performance of the algorithm. Therefore, a rule is proposed to update the archive based on a novel crowding entropy diversity measurement in the fitness landscape and on fitness rank.
To verify and analyze the effectiveness of the ESM, we performed a series of experiments and comparisons on the CEC2017 benchmark set [8], incorporating three classic DEs and five state-of-the-art DE variants. Results indicate that the ESM can effectively improve algorithm performance without increasing computational complexity. The rest of this study is organized as follows: Section 2 introduces the canonical DE algorithm, including its typical mutation, crossover, and selection operators. Section 3 reviews related work. The proposed ESM procedures are introduced in Section 4. The effectiveness of the proposed mechanism is discussed in Section 5 based on the computational results. Conclusions and future work are presented in Section 6.

Differential Evolution
In this section, we introduce the basic differential evolution algorithm, including the well-known mutation strategy DE/rand/1 [1] and other widely used mutation strategies.

2.1. Initialization.
An initial random population P consists of NP individuals, each represented by X_i^t = (x_{i,1}^t, x_{i,2}^t, ..., x_{i,D}^t), i = 1, 2, ..., NP, where t = 0, 1, 2, ..., T_max is the iteration number and D is the number of dimensions of the solution space. In DE, uniformly distributed random numbers are used to generate the initial solutions:

x_{i,j}^0 = x_{j,min} + rand(0, 1) · (x_{j,max} − x_{j,min}),

where x_{j,max} and x_{j,min} are, respectively, the maximum and minimum boundary values of the j-th dimension.
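The initialization step above can be sketched in Python (a minimal illustration; the paper's experiments were implemented in MATLAB, so names such as `initialize_population` are ours):

```python
import numpy as np

def initialize_population(NP, D, x_min, x_max, rng=None):
    """Uniform random initialization:
    x_{i,j}^0 = x_{j,min} + rand(0,1) * (x_{j,max} - x_{j,min})."""
    rng = np.random.default_rng(0) if rng is None else rng
    x_min = np.asarray(x_min, dtype=float)
    x_max = np.asarray(x_max, dtype=float)
    # One uniform sample per individual and dimension, scaled to the bounds.
    return x_min + rng.random((NP, D)) * (x_max - x_min)

pop = initialize_population(NP=100, D=30, x_min=[-100.0] * 30, x_max=[100.0] * 30)
```

Every individual then lies inside the box [x_min, x_max] in each dimension.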

2.2. Mutation.
At iteration t, for each target vector X_i^t, a mutant (donor) vector V_i^t is generated according to

DE/rand/1: V_i^t = X_{r1}^t + F · (X_{r2}^t − X_{r3}^t).

Other widely used mutation strategies include

DE/best/1: V_i^t = X_best^t + F · (X_{r1}^t − X_{r2}^t),
DE/current-to-best/1: V_i^t = X_i^t + F · (X_best^t − X_i^t) + F · (X_{r1}^t − X_{r2}^t),
DE/best/2: V_i^t = X_best^t + F · (X_{r1}^t − X_{r2}^t) + F · (X_{r3}^t − X_{r4}^t),
DE/rand/2: V_i^t = X_{r1}^t + F · (X_{r2}^t − X_{r3}^t) + F · (X_{r4}^t − X_{r5}^t),

where F is the scaling factor, r1, r2, r3, r4, and r5 are mutually different integers randomly generated from {1, 2, ..., NP} and different from i (r1 ≠ r2 ≠ r3 ≠ i), and X_best^t is the individual with the best fitness value in the population at iteration t.
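Two of these strategies can be sketched as follows (illustrative Python, not the authors' code; `F` is the scaling factor and a minimization problem is assumed):

```python
import numpy as np

def de_rand_1(pop, i, F, rng):
    """DE/rand/1: V_i = X_r1 + F * (X_r2 - X_r3), r1, r2, r3 distinct and != i."""
    idx = [k for k in range(len(pop)) if k != i]
    r1, r2, r3 = rng.choice(idx, size=3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])

def de_current_to_best_1(pop, fitness, i, F, rng):
    """DE/current-to-best/1: V_i = X_i + F*(X_best - X_i) + F*(X_r1 - X_r2)."""
    best = int(np.argmin(fitness))  # best individual under minimization
    idx = [k for k in range(len(pop)) if k != i]
    r1, r2 = rng.choice(idx, size=2, replace=False)
    return pop[i] + F * (pop[best] - pop[i]) + F * (pop[r1] - pop[r2])
```

Note that if all parents are identical, both strategies return that same point, since the scaled difference vectors vanish.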

2.3. Crossover.
We illustrate binomial crossover, in which the target vector X_i^t and the donor vector V_i^t exchange elements according to

u_{i,j}^t = v_{i,j}^t, if rand(0, 1) ≤ CR or j = j_rand; otherwise u_{i,j}^t = x_{i,j}^t,

where the crossover rate CR ∈ [0, 1] controls the fraction of elements inherited from the donor vector, and j_rand is a uniformly distributed random integer in [1, D] that ensures at least one element of the trial vector is inherited from the donor vector.
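A minimal sketch of binomial crossover (illustrative Python):

```python
import numpy as np

def binomial_crossover(x, v, CR, rng):
    """Binomial crossover: take v_j when rand <= CR or j == j_rand, else keep x_j."""
    D = len(x)
    j_rand = rng.integers(D)          # forced donor position
    mask = rng.random(D) <= CR
    mask[j_rand] = True               # at least one element comes from the donor
    return np.where(mask, v, x)
```

With CR = 1 the trial vector equals the donor; with CR close to 0 it still inherits exactly the forced component j_rand.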

2.4. Selection.
The greedy selection strategy is generally used in DE. The trial vector U_i^t is assigned to X_i^{t+1} when its fitness value is equal to or better than that of X_i^t, and otherwise X_i^t is retained:

X_i^{t+1} = U_i^t, if f(U_i^t) ≤ f(X_i^t); otherwise X_i^{t+1} = X_i^t,

where f(·) is the fitness function (minimization is assumed).
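The greedy selection rule is a one-liner (illustrative Python; `sphere` is our example fitness function):

```python
def greedy_select(x, u, f):
    """Greedy selection: the trial vector u survives iff f(u) <= f(x) (minimization)."""
    return u if f(u) <= f(x) else x

# Example with the sphere function f(z) = sum(z_j^2):
sphere = lambda z: sum(c * c for c in z)
survivor = greedy_select([2.0], [1.0], sphere)  # trial [1.0] is better and survives
```

The survivor of this comparison is exactly what the paper calls a successful solution when the trial vector wins.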

Related Work
Most research on improving DE has focused on the mutation operator [9]. In recent decades, many mutation operators with distinct search performance have been proposed. Fan et al. [10] proposed a triangular mutation strategy, which was considered a local search operator. Zhang et al. [11] introduced a new DE variant, JADE, improving optimization performance with a new mutation strategy, DE/current-to-pbest/1, which is one of the most successful mutation operators because of its relatively balanced performance between exploration and exploitation. Wang et al. [12] proposed a new mutation strategy called DE/current-to-lbest/1 based on values near the current parameter to balance exploitation and exploration capabilities during the evolution. A novel DE algorithm with an intersect mutation operation, called intersect mutation differential evolution (IMDE), was proposed [13] to further improve the performance of the standard DE algorithm. Mohamed et al. [14] proposed two mutation strategies, DE/current-to-ord_best/1 and DE/current-to-ord_pbest/1, both based on ordering three selected vectors to achieve different search behaviors. Deng et al. [15] proposed a novel DE variant called DCDE based on a dynamic combination-based mutation operator to achieve an appropriate balance between global exploration ability and local exploitation ability.
In addition, novel neighborhood-based [16], dimension-based [17], opposition-based [18], and regeneration-based [19] mechanisms have also shown their effectiveness in improving the performance of DE and its variants. In this study, we focus on related work on archive-based techniques for DE. In multiobjective evolutionary algorithms, a subpopulation called the external archive is used to store the nondominated solutions found during the search [20,21]; employing an archive is almost standard in multiobjective optimization [22]. The goal of single-objective global numerical optimization is to find a global optimum of a given objective function in decision space. Recently, archives have also been introduced into global numerical optimization. In JADE [11], a set of recently explored inferior solutions is archived, and their difference from the current population is considered a promising direction toward the optimum; the archive is applied to the second difference vector of JADE's mutation strategy. Zhou et al. [23] proposed a DE framework with a guiding archive (GAR-DE), which chooses the base vector of the mutation strategy from the archive to help DE escape from stagnation and uses the Manhattan distance metric to maintain the diversity of the archive. Zeng et al. [24] proposed a new selection operator (NSO) for DE, which archives not only the successfully updated solutions but also the discarded trial solutions; the NSO focuses on selecting an appropriate solution from the archive to survive into the next generation.
As can be seen from the preceding explanation, the differences among archive-based mechanisms mainly lie in three aspects: what information/solutions should be stored in the archive; how to maintain and update the archive; and how to use the archive to guide the search. The most striking feature of the proposed ESM is the use of information entropy rather than a distance metric [23,25] to maintain the diversity of the archive.
Shannon introduced information entropy [26] in 1948. In information theory, entropy measures the expected information content of a random variable. As an evaluation scale for the information quantity of a stochastic process, information entropy has been extended into a general and effective tool for difficult numerical problems and uncertain polynomial combinatorial optimization problems [27,28]. An information-theoretic technique was adopted to analyze the ruggedness of a continuous fitness landscape in [29,30]. Petalas et al. [27] proposed a memetic particle swarm optimization algorithm that exploits Shannon's information entropy for decision-making at the swarm level, as well as a probabilistic decision-making scheme at the particle level, to determine when and where local search is applied. As these studies show, entropy reflects the degree of disorder of a system. Therefore, the entropy principle is a promising tool for ensuring population diversity in evolutionary algorithms.

The Proposed ESM
In this section, we discuss the characteristics of the successful solution archive and archive-updating procedure and describe the ESM implementation.

Successful Solution Archive.
From Section 2, we know that DE selects one vector from the trial vector and the parent vector to survive into the next generation; the survivor is known as the successful solution. As shown in Figure 1(a), in the two-dimensional solution space of a multimodal problem, the population is evenly distributed at the initial stage. As the iteration continues, Figure 1(b) shows that superior successful solutions (red points) can be generated in possibly optimal regions. However, in Figure 1(c), the random and greedy selection operations make the population gradually converge to a local optimum. Some successful solutions may be discarded without being fully exploited as the iteration progresses. An appropriate mechanism that utilizes successful solutions with a high diversity contribution can make the algorithm re-explore missed potentially optimal regions. To some extent, it is also a remedy for the greedy selection operation.
Considering the above analysis, as depicted in Figure 2, an external archive is designed to store the successful solutions generated during each iteration. When an individual is in a state of stagnation, the archive is activated for the parent selection of the mutation operation. For instance, the classic mutation strategy DE/rand/1 can be modified to

V_i^t = X_{ar1}^t + F · (X_{ar2}^t − X_{ar3}^t),

where ar1, ar2, ar3 ∈ A^t and ar1 ≠ ar2 ≠ ar3, and A^t is the successful solution archive at iteration t. A simple but efficient criterion is adopted here: when the number of consecutive iterations in which an individual's fitness value fails to improve reaches a predetermined value Q, the individual is considered to be in a stagnant state [31].
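The stagnation-triggered switch between population parents and archive parents can be sketched as follows (illustrative Python; the per-individual counter array `stall` and the way it is maintained are our assumptions about the bookkeeping, not the authors' code):

```python
import numpy as np

def esm_mutation(pop, archive, stall, i, F, Q, rng):
    """DE/rand/1 whose parents are drawn from the archive once individual i has
    failed to improve for Q consecutive iterations (the stagnation criterion);
    otherwise parents come from the main population as usual."""
    stagnant = stall[i] >= Q and len(archive) >= 3
    source = np.asarray(archive) if stagnant else pop
    # Exclude i itself only when sampling from the main population.
    idx = [k for k in range(len(source)) if stagnant or k != i]
    r1, r2, r3 = rng.choice(idx, size=3, replace=False)
    return source[r1] + F * (source[r2] - source[r3])
```

When the counter for individual i reaches Q, all three parents come from the archive, which is what lets the search restart from previously discovered valleys.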

Archive Updating Procedure.
Once the successful solution archive has been established, the remaining problem is how to update the archive so as to keep its diversity and superiority throughout the evolution process. An archive containing diverse successful solutions can pull an individual out of stagnation without crippling the algorithm's performance.

Individual Diversity Measure.
To the best of our knowledge, in single-objective optimization, diversity is usually defined by the spatial distribution of the whole population [32], but there are few measures of the diversity contribution of a single individual to the whole population. Wang et al. [33] proposed a Euclidean distance-based diversity metric for each individual, but a huge computational cost is needed to calculate the distance between every pair of individuals. To utilize diversity information while reducing the computational cost, we can estimate diversity from the fitness value distribution at the individual level. Fitness distance can represent spatial distance to some extent [34].
In multiobjective optimization, Deb et al. [35] proposed the crowding distance measure to estimate the density of solutions surrounding a particular solution in the population, calculated as the average distance of the two points on either side of this point along each of the objectives. However, the distribution of the point itself is not considered, and the average distance alone may not accurately reflect the crowding degree. By exploiting the characteristics of fitness distance, we can extend the crowding distance measure to single-objective optimization. For example, in Figure 3, the crowding distance of solution A is 3L + 1L = 4L (following the crowding distance calculation in [35]), which is the same as that of solution B (2L + 2L = 4L), whereas solution B obviously has a higher diversity contribution than solution A.
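The tie described above can be checked numerically (the gap values 3L, 1L and 2L, 2L are taken from the Figure 3 example):

```python
def crowding_distance(d_lower, d_upper):
    """Crowding distance in the sense of Deb et al. [35]: the sum of the
    gaps to the two neighboring solutions along the (single) objective."""
    return d_lower + d_upper

L = 1.0
# Solution A (gaps 3L and 1L) and solution B (gaps 2L and 2L) score identically,
# even though B is more uniformly placed than A.
assert crowding_distance(3 * L, 1 * L) == crowding_distance(2 * L, 2 * L) == 4 * L
```

This degeneracy is precisely what motivates replacing the plain sum of gaps with an entropy of the gap distribution.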

Archive Update Procedure.
Building on the above, this study proposes a crowding entropy diversity measurement in the fitness landscape at the individual level. As shown in Figure 4, we sort the archive population in ascending order of fitness during each iteration; a fitness distance estimation operator can then be formulated as

Dl(i) = f_i − f_{i−1},  Du(i) = f_{i+1} − f_i,

where f_i is the fitness value of the i-th individual in the sorted archive. Dl(i) and Du(i) are further normalized,

dl_i = Dl(i) / (Dl(i) + Du(i)),  du_i = Du(i) / (Dl(i) + Du(i)),

and the crowding entropy of the i-th individual is then defined as

CE_i = −(dl_i · log2(dl_i) + du_i · log2(du_i)),

where CE_i ∈ [0, 1]; a larger CE means that the individual's placement is more uniform, so it has a higher diversity contribution. For the example in Figure 3, CE_A = 0.8 and CE_B = 1, which is consistent with their diversity contributions. Since the best and worst individuals lie on the boundary, each has only one neighbor; therefore, we set CE_1 = CE_2 and CE_NP = CE_{NP−1} so that the ESM functions normally. To preserve the superiority of the archive, the fitness value rank is further combined with the crowding entropy into a score R_i that represents the priority of the individual to be updated. An individual in the archive with a higher rank (i.e., far from the global optimum) or with lower entropy (i.e., located in a dense area) should have a higher probability of being updated.
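Under the definition above (normalized gaps dl, du and a two-term Shannon entropy), the Figure 3 values can be reproduced:

```python
import math

def crowding_entropy(d_lower, d_upper):
    """CE = -(dl*log2(dl) + du*log2(du)), where dl and du are the gaps to the
    two neighbors normalized to sum to 1; CE lies in [0, 1] and peaks at 1
    when the two gaps are equal (most uniform placement)."""
    total = d_lower + d_upper
    ce = 0.0
    for gap in (d_lower, d_upper):
        p = gap / total
        if p > 0.0:
            ce -= p * math.log2(p)
    return ce

# Figure 3: solution A has gaps 3L and 1L, solution B has gaps 2L and 2L.
print(round(crowding_entropy(3.0, 1.0), 2))  # 0.81, i.e. the 0.8 quoted in the text
print(crowding_entropy(2.0, 2.0))            # 1.0
```

The tie from the crowding-distance example is broken: B scores 1.0 while A scores about 0.81.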

The Implementation of the ESM.
Based on the ESM described above, we present the ESM updating procedure in Algorithm 1 and the complete implementation based on classic DE with the mutation strategy DE/rand/1 in Algorithm 2. Different from existing alternatives for parent selection [33,36], the proposed mechanism is an adaptive approach that considers feedback from recently successful solutions, so the robustness of the DE algorithm can potentially be enhanced by dynamically supplying promising parents in situations of stagnation. Moreover, the proposed ESM is computationally efficient: the time complexity of the archive updating operations is O(NP). Additionally, the selection of parents, the counting of consecutive unsuccessful updates, and stagnation detection do not increase the overall time complexity.
Experimental Results
Experiments are conducted on the CEC2017 benchmark set, which is widely used in performance testing of evolutionary algorithms [9]. It is worth mentioning that the ESM does not change the existing mechanisms of the incorporated algorithms; their parameter settings are listed in Table 1. The common parameters of the incorporated algorithms are set as follows: the population size NP is 100, the stagnation tolerance Q is 120, and the maximum allowed number of function evaluations is Max_FEs = 10000 · D, where D is the problem dimension. Moreover, H represents the historical memory size associated with the adaptive control of the scaling factor (F) and crossover rate (CR), and LP is the learning period.
Limited by space, this study presents results at D = 30. For fairness of comparison, each algorithm was implemented in MATLAB 2017b and executed over 51 independent runs on a Windows 10 64-bit desktop PC with 32 GB of RAM and a 3.0 GHz Intel Core i7-9700 processor.

Improving the Performance of Classic DEs and State-of-the-Art DE Variants.
The following tables show the mean value (Mean) and standard deviation (Std.Dev) of the error with respect to the global optimum. The "+," "=," and "−" signs in each row of the tables indicate that the ESM-based algorithm is, respectively, better than, equal to, or worse than the comparison algorithm.

(1) Input: successful solution X_i, the archive A;
(2) Output: the archive A;
(3) FOR each solution A_i in the archive A
(4) Calculate R_i of A_i by equations (6)-(12);
(5) END FOR
(6) Get the index j of the smallest R;
(7) Replace the solution A_j with X_i;
ALGORITHM 1: The pseudo code of the ESM updating procedure.

Step 1: Initialization
(1)-(7) …
Step 2: Evolution Iteration
(8) WHILE FEs < Max_FEs
(9) DO
(10) Step 2.1: Mutation Operation
(11) FOR i = 1 to NP
(12) Randomly select the parent vectors;
(13) IF individual i is not stagnant
(14) Generate donor vectors V_i^t by (1);
(15) ELSE
(16) Generate donor vectors V_i^t by (5);
(17) END
(18) END FOR
(19) Step 2.2: Crossover Operation
(20) FOR i = 1 to NP
(21) Generate trial vectors U_i^t by (3);
(22) END FOR
(23) Step 2.3: Selection and Archive Update
(24) FOR i = 1 to NP
…
(30) END IF
(31) Generate new vectors X_i^{t+1} by (4);
(32) END FOR
(33) Step 2.4: Increase the Count
(34) FEs = FEs + NP;
(35) t = t + 1
(36) END WHILE
ALGORITHM 2: The pseudo code of DE/rand/1 with the proposed ESM.

Tables 2-4 show the performance of the three classic DEs. For DE/rand/1, which is widely recognized for its diversity-maintaining capability and relatively low search efficiency, the ESM significantly improves performance on 24 test functions and causes degradation on only two, as shown in Table 2. DE/best/1 is well known for its local exploitation capability and fast convergence speed; the ESM makes progress on 19 functions, as presented in Table 3. The performance of ESM-DE/best/1 decreases on five functions (F13, F16, F24, F27, and F29) and remains the same as DE/best/1 on the remaining six. DE/current-to-best/1 is more inclined to balance global exploration and local exploitation. Tables 5-9 show the five state-of-the-art variants and their ESM-based competitors' performance. For ESM-j2020, as shown in Table 5, significant improvements are achieved on 23 functions. From Tables 6 to 9, the number of functions marked "+" decreases compared with the results shown in Tables 2-5.
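Algorithm 1 can be sketched in Python. Note that the paper's equations (6)-(12) for the priority score R are not reproduced in this excerpt, so the combination R_i = CE_i / rank_i used below is a HYPOTHETICAL stand-in that merely has the stated property: a worse-ranked or lower-entropy member receives a smaller R and is replaced first.

```python
import numpy as np

def crowding_entropies(f_sorted):
    """Crowding entropy for each member of a fitness-sorted archive, with the
    boundary rule CE_1 = CE_2 and CE_n = CE_{n-1} from the text."""
    n = len(f_sorted)
    ce = np.ones(n)
    for k in range(1, n - 1):
        dl = f_sorted[k] - f_sorted[k - 1]
        du = f_sorted[k + 1] - f_sorted[k]
        tot = dl + du
        ce[k] = 0.0
        if tot > 0:
            for p in (dl / tot, du / tot):
                if p > 0:
                    ce[k] -= p * np.log2(p)
    ce[0], ce[-1] = ce[1], ce[-2]
    return ce

def esm_update(archive, fitness, x_new, f_new):
    """Algorithm 1 sketch: score every member, replace the one with smallest R.
    R_i = CE_i / rank_i is a hypothetical score, not the paper's exact formula."""
    fit = np.asarray(fitness, dtype=float)
    order = np.argsort(fit)                  # ascending fitness; rank 1 = best
    ce = crowding_entropies(fit[order])
    rank = np.arange(1, len(fit) + 1)        # rank within the sorted archive
    R = ce / rank                            # small R: far from optimum or crowded
    j = order[int(np.argmin(R))]             # map back to the original index
    archive[j] = x_new
    fit_out = fit.copy()
    fit_out[j] = f_new
    return archive, fit_out
```

In this toy scoring, a member that is both worst-ranked and in a sparse tail still loses to better-ranked members, matching the stated preference for replacing inferior or crowded archive entries.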
Meanwhile, from Tables 6 to 9, only a few functions show performance degradation after applying the ESM. We can therefore conclude that the ESM remains competitive even with recognized excellent DE variants.
We further conducted the Wilcoxon signed-rank test at the 0.05 significance level [40] to statistically verify the effectiveness of the ESM. In Table 10, the p-values obtained by ESM-DE/rand/1, ESM-DE/best/1, ESM-DE/current-to-best/1, ESM-j2020, and ESM-BeSD are all less than 0.05, which indicates that a significant improvement is achieved by the proposed ESM. It can also be found that ESM-EJADE, ESM-EBSHADE, and ESM-EBLSHADE still maintain their advantages in terms of the R+ and R− values, which also reflects that the ESM achieves more promising results.
As mentioned before, the functions of CEC2017 can be divided into four categories: unimodal functions, simple multimodal functions, hybrid functions, and composition functions. To analyze whether the performance of the ESM is related to the type of function, the performance of the ESM-based algorithms for each function category is presented in Table 11. As shown there, the ESM performs best on hybrid functions: after adopting the ESM, the algorithms have an 81.25% probability of finding a better solution but only a 16.25% probability of finding a worse one on hybrid functions. Besides, the performance of the ESM is almost the same on simple multimodal functions and composition functions, and relatively worst on unimodal functions. Table 11 demonstrates that if the optimization problem is a multimodal, hybrid, or composition function, then the proposed selection mechanism can be considered for the DE algorithm.

Convergence Analysis.
To show more clearly the influence of the ESM on algorithm convergence while avoiding digression, four representative functions from CEC2017 are selected (F7, F16, F20, and F24), and convergence curves for these functions are presented for the three classic DEs and their ESM-based versions. F7 is a simple multimodal function with a characteristic of convexity. From Figure 5(a), we find that all six comparison algorithms have a similar convergence rate, which also shows that the ESM does not cripple the convergence speed of the algorithm on F7. From Figures 5(b) to 5(d), it can be seen that when each algorithm adopts the proposed ESM, it converges to a better value than the original algorithm, and does so faster.
This is mainly because the original algorithm can hardly escape once it suffers stagnation. The algorithm that uses the proposed ESM has a greater capability to maintain search efficiency.

Population Diversity Analysis.
As the ESM adopts the entropy-based individual diversity measure, we further calculated the diversity of the main population and the archive population on F7, F16, F20, and F24 for the three classic DEs. It should be pointed out that we used the population diversity measure proposed in [41]. As illustrated in Figure 6(a), the main population and the archive population have almost the same diversity on F7. From Figures 6(b) to 6(d), it can be concluded that the archive population maintains a relatively higher diversity than the main population at the intermediate stage for DE/current-to-best/1 and DE/best/1. To sum up, the archive population can maintain relative diversity during the search process, which makes it possible for the algorithm to recover from stagnation.
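For reference, one common population diversity definition (mean Euclidean distance of the individuals to the population centroid) can be written as follows; the exact formula used in [41] may differ, so this is an illustrative stand-in:

```python
import numpy as np

def population_diversity(pop):
    """Mean Euclidean distance of the individuals to the population centroid
    (a common diversity measure; not necessarily the exact one in [41])."""
    pop = np.asarray(pop, dtype=float)
    centroid = pop.mean(axis=0)
    return float(np.linalg.norm(pop - centroid, axis=1).mean())
```

A fully converged population gives diversity 0, and the value grows as individuals spread out around the centroid.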

Parameter Sensitivity Analysis.
As illustrated in the previous sections, the ESM contains one parameter, Q, that requires adjustment. Q is a threshold representing the number of stagnation steps. Table 12 shows the B-W values, which represent the difference between the number of better ("+") and worse ("−") mean values after applying the ESM. It can be seen that the performance of the ESM-based algorithms is not very sensitive to changes of Q over a finite range. For further comparison, a wider range of Q values and the corresponding B-W values are shown in Figure 7. In general, when the Q value is too large, the overall performance of the ESM begins to deteriorate. This is mainly because a too large Q reduces the probability of the ESM intervening in the algorithm; it is therefore difficult for the ESM to take effect within a limited number of iterations. In summary, considering the overall performance of the ESM, Q is set to 120.

Scalability Analysis.
The dimension of the test functions governs the difficulty of finding the global optimum: higher-dimensional functions are generally more difficult to solve. To verify the relationship between dimensionality and the performance of the proposed mechanism, we evaluate the average performance of the eight ESM-based DE algorithms on the CEC2017 benchmark set. Table 13 shows the results (+/=/−) of the considered algorithms at D = 10, 30, 50, and 100 on the test functions; for intuition, we convert the corresponding B-W values into Figure 8. We find that the performance of the ESM-based algorithms fluctuates slightly but does not degrade significantly as the problem dimension increases, compared to the original algorithms. In Table 14, the Friedman test [40], a widely used nonparametric test in the EA community, is used to validate the performance of all algorithms based on the mean values. The p-values from D = 10 to D = 100 calculated by the Friedman test are all less than 0.05. Therefore, there are significant differences in the performance of the comparison algorithms on the corresponding dimensions, and ESM-EBLSHADE obtains the first rank overall.

Conclusion and Future Work
In DE, making full use of the successful solutions generated during the iterations is a meaningful way to improve algorithm performance. In this study, an external selection mechanism (ESM) is proposed to restore the search ability of the algorithm when an individual is in a state of stagnation. The ESM mainly contains a successful solution archive mechanism and a crowding entropy diversity control strategy. It can be easily integrated into existing DE algorithms to further improve their performance. Experiments are conducted on the CEC2017 benchmark set with three classic DEs and five state-of-the-art DE variants. From the experimental results, we can see that the ESM-based algorithms significantly improve the original algorithms' solution accuracy, and the ESM does not increase the computational complexity of the original algorithm despite the introduction of entropy. Further experiments also show that the ESM has a clearly positive effect on multimodal, hybrid, and composition functions, and its scalability across problem dimensions shows a certain universality.
Notably, although the experimental results show that the ESM can effectively improve the performance of various DE variants, more study is required. Therefore, our future work will mainly focus on (1) improving the ESM to make it suitable for various single-objective DEs and (2) researching the application of the proposed ESM in other evolutionary algorithms.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.