Subspace Clustering Mutation Operator for Developing Convergent Differential Evolution Algorithm

Many studies have identified the differential evolution algorithm (DE) as one of the most powerful stochastic real-parameter algorithms for global optimization problems. However, DE variants still suffer from stagnation. To overcome this disadvantage, two improvement ideas have recently emerged. One is to combine multiple mutation operators to balance the exploration and exploitation abilities. The other is to develop DE variants that are convergent in theory, which decreases the probability of stagnation. Given that, this paper proposes a subspace clustering mutation operator, called SC qrtop. Five DE variants, which hold global convergence in probability, are then developed by combining the proposed operator with each of five mutation operators of DE. The SC qrtop randomly selects an elite individual as the perturbation's center and employs the difference between two randomly generated boundary individuals as the perturbation's step. Theoretical analyses and numerical simulations demonstrate that SC qrtop prefers to search in the orthogonal subspaces centered on the elite individual. Experimental results on CEC2005 benchmark functions indicate that all five convergent DE variants with SC qrtop mutation outperform the corresponding DE algorithms.


Introduction
The classical optimization methods frequently used in scientific applications consist of strategies based on the Hessian matrix [1] and on the gradient [2]. It can be proved that the solution obtained by the classical methods is globally optimal [3]. However, if the derivative of an objective function cannot be calculated, it becomes difficult for classical optimization methods to find the optimal solution [4]. Metaheuristic algorithms have therefore become popular in scientific applications that involve nondifferentiable nonlinear objective functions. The most notable metaheuristic algorithms include the genetic algorithm (GA), the particle swarm optimization algorithm (PSO), the differential evolution algorithm (DE), the artificial bee colony algorithm (ABC), and the Cuckoo search (CK) algorithm.
Among those metaheuristic algorithms, DE has been identified as one of the most powerful optimizers. DE, proposed by Storn and Price in 1995 [5], is the only one that has continued to secure competitive rankings in the optimization competitions of the IEEE International Conferences on Evolutionary Computation (CEC) [6][7][8] since 1996. The competitiveness of DE is also supported by many comparative studies [9][10][11][12].
However, the stagnation problem still exists in DE variants [13,14]. To overcome this disadvantage, two kinds of ideas for improving DE algorithms have gradually appeared in recent studies. One is to develop DE variants based on composite trial vector generation strategies. The other is to develop DE variants that are convergent in theory.

DE Variants Based on Composite Trial Vector Generation Strategy.
The classical mutation operators of the DE algorithm favor either the exploration ability or the exploitation ability to some degree, which easily results in a blind search over the feasible region or in insufficient population diversity. The population is therefore easily trapped in stagnation when a single mutation operator is used to generate trial vectors. To solve this problem, a natural idea is to combine different learning strategies to trade off the exploration and exploitation abilities. Wang et al. [10] proposed a composite DE, which generates trial vectors by combining three mutation operators, that is, DE/rand/1/bin, DE/rand/2/bin, and DE/current-to-rand/1. Rahnamayan et al. [15] proposed an opposition-based DE, which combines an opposition-based learning method and the classical mutation operators to generate trial vectors. Mohamed et al. [16] proposed a directed mutation rule based on the weighted difference vector between the best and the worst individuals and then developed an alternative DE by combining the directed mutation and the classical mutation strategies.

DE Variants Holding Convergence in Probability.
A convergent algorithm may be more robust, so its probability of being trapped in stagnation can be smaller than that of algorithms which cannot guarantee global convergence. With the progress of theoretical research on DE, some convergent DE algorithms based on mathematical theory have been proposed.
In [17], Hu et al. proved that the classical DE cannot converge to the global optimal set with probability 1 and then proposed a convergent DE algorithm. In [18], Hu et al. summarized the theoretical works on DE in three aspects, that is, (1) theoretical research on the time complexity, (2) theoretical research on the dynamical behavior of DE's population, and (3) research on the convergence properties of DE. That paper then proved a sufficient condition for global convergence of DE and proposed a convergent DE algorithm framework. In [19], Hu et al. proposed a self-adaptive DE algorithm and then proved its convergence by using the sufficient condition presented in [18]. Ter Braak [20] proposed a differential evolution Markov chain algorithm (DE-MC) and proved that its population sequence has a unique joint stationary distribution. Zhao [21] presented a convergent DE using a hybrid optimization strategy and a transform function and proved its convergence by the Markov process. Zhan and Zhang [22] presented a DE-RW algorithm which applies a random-walk mechanism to the basic DE variants (the convergence of DE-RW was not proved, but it can be easily proved by Theorem 2 in Section 4 below). Li et al. [23] proposed a convergent DE algorithm by incorporating Gaussian mutation and a diversity-triggered reverse sampling into DE/rand/1/bin.
Based on the above two research lines, the main contributions of this paper can be summarized as follows.
(i) Firstly, this paper proposes a subspace clustering mutation operator, called SC qrtop, which randomly selects an elite individual as the perturbation's center and employs the difference between two randomly generated boundary individuals as the perturbation's step. Theoretical analyses and numerical simulations demonstrate that SC qrtop mutation prefers to search in the orthogonal subspaces centered on the elite individual.
(ii) Secondly, this paper presents a convergent DE model by combining SC qrtop mutation with the classical mutation operators of DE and gives a theoretical proof of the algorithm's convergence.
(iii) Finally, numerical experiments on CEC2005 benchmark functions validate that the SC qrtop mutation operator has a positive effect on the performance of all five classical mutation operators of DE.
The rest of this paper is structured as follows. Section 2 briefly introduces the basic DE algorithm. Section 3 presents and analyzes the subspace clustering mutation operator. Section 4 gives the DE variants based on the subspace clustering mutation and proves their convergence in theory. Numerical experiments on CEC2005 benchmark functions are then presented in Section 5. Section 6 discusses the theoretical significance of the proposed operator, followed by the conclusion and future work in Section 7.

Classical Differential Evolution
DE is used for dealing with continuous optimization problems. This paper supposes that the objective function to be minimized is $f(\vec{x})$, $\vec{x} = (x_1, \ldots, x_n) \in \mathbb{R}^n$, and that the feasible solution space is $\Psi = \prod_{j=1}^{n} [L_j, U_j]$. DE [19,24,25] works through a simple cycle of operators, namely mutation, crossover, and selection, after initialization. The classical DE procedures are described in detail as follows.
Initialization. The population of generation $g$ consists of $N$ individuals, where $g = 0, 1, \ldots, g_{\max}$ is the current generation and $g_{\max}$ is the maximum number of generations. For the first generation ($g = 0$), the population should cover the optimization search space as much as possible. Initialization is implemented by uniform sampling of the potential individuals in the optimization search space. We can initialize the $j$th dimension of the $i$th individual according to
$$x^0_{i,j} = L_j + \mathrm{rand}(0,1) \cdot (U_j - L_j),$$
where $\mathrm{rand}(0,1)$ is a uniformly distributed random number in $[0, 1]$.
Mutation Operator. After initialization, the mutation operator generates a donor vector $\vec{v}^g_i$ for each target vector. If the element values of the donor vector $\vec{v}^g_i$ exceed the prespecified upper bound or lower bound, we can change the element values by the periodic mode rule.
Crossover Operator. Following mutation, the crossover operator is applied to further increase the diversity of the population. In crossover, a trial vector, $\vec{u}^g_i$, is built from the target vector and the donor vector.
The pseudocode of the classical DE algorithm (DE/rand/1) is illustrated in Pseudocode 1.
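The initialization step and the bound handling described above can be sketched as follows. This is a minimal illustration, not the paper's Pseudocode 1: the function names `init_population` and `periodic_repair` are hypothetical, and the periodic mode rule is assumed here to be a modular wrap-around into $[L_j, U_j]$, which is one common reading of periodic bound handling.

```python
import random

def init_population(pop_size, lower, upper, rng):
    """Uniformly sample pop_size individuals inside the box [lower, upper]."""
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(pop_size)
    ]

def periodic_repair(vector, lower, upper):
    """Wrap out-of-bound components back into [lower_j, upper_j].

    Assumption: 'periodic mode' is read as modular wrap-around.
    """
    repaired = []
    for v, lo, hi in zip(vector, lower, upper):
        if v < lo or v > hi:
            v = lo + (v - lo) % (hi - lo)
        repaired.append(v)
    return repaired
```

With this reading, a component that overshoots the upper bound re-enters from the lower bound by the same amount, so a repaired donor always stays inside the feasible box.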

Subspace Clustering Mutation Operator
DE variants with global convergence are attracting more and more attention. A common convergent model of evolutionary algorithms (EAs) is identified by two characteristics. One is that each population is ergodic. The other is that the best solution of each generation is preserved in the next generation. Since the greedy selection strategy of DE preserves the best solution, the ergodicity of the population becomes the key problem in developing convergent DE variants. In addition, considering the balance of exploration and exploitation, we propose a subspace clustering mutation operator, called SC qrtop. The operator builds the donor vector by perturbing $\vec{x}^g_{\mathrm{qrtop}}$ with a step derived from the difference $\vec{x}^g_{b_1} - \vec{x}^g_{b_2}$, where $\vec{x}^g_{\mathrm{qrtop}}$ is an individual selected by random sampling from the top $q\%$ of the $g$th population, and $\vec{x}^g_{b_1}$ and $\vec{x}^g_{b_2}$ are two boundary individuals, each element of which equals the upper boundary value, $U_j$, or the lower boundary value, $L_j$, with equal probability.
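A minimal sketch of one way to realize the operator just described is given below. Several details are assumptions made only for illustration and are not confirmed by the text: the step is scaled by a uniform random factor drawn per component, out-of-bound components are wrapped periodically, and the name `sc_qrtop_donor` and its parameters are hypothetical.

```python
import random

def sc_qrtop_donor(population, fitness, lower, upper, q, rng):
    """Generate one donor by perturbing a randomly chosen top-q individual.

    Assumptions (not stated in the text): the boundary difference is scaled
    by a uniform random factor, and bounds are enforced by periodic wrapping.
    """
    n_top = max(1, int(round(q * len(population))))
    ranked = sorted(range(len(population)), key=lambda i: fitness[i])  # minimization
    elite = population[rng.choice(ranked[:n_top])]
    donor = []
    for x, lo, hi in zip(elite, lower, upper):
        b1 = rng.choice((lo, hi))  # component of the first boundary individual
        b2 = rng.choice((lo, hi))  # component of the second boundary individual
        v = x + rng.random() * (b1 - b2)  # zero step whenever b1 == b2
        if v < lo or v > hi:
            v = lo + (v - lo) % (hi - lo)
        donor.append(v)
    return donor
```

Under these assumptions, each coordinate keeps the elite value whenever the two boundary picks coincide, which is what concentrates donors on the axes through the elite individual.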
The characteristics of SC qrtop can be summarized as follows.
(i) Employing SC qrtop mutation makes the population ergodic. In fact, SC qrtop mutation makes the probability that the donor individual locates in any small region of the whole search space greater than 0. This paper calls this probability the ergodic probability.
(ii) SC qrtop mutation reproduces $\vec{x}^g_{\mathrm{qrtop}}$ with a small probability, and this probability equals the ergodic probability. This characteristic helps balance the exploration and exploitation abilities to some degree.
(iii) The individuals generated by SC qrtop mutation tend to locate in the orthogonal subspaces of $\vec{x}^g_{\mathrm{qrtop}}$. So the mutation operator improves the search capacity in the orthogonal subspaces of the outstanding individuals. Given that, we call the operator subspace clustering mutation (SC qrtop for short).
(iv) The implementation of SC qrtop mutation is very simple. It is also easy to combine the SC qrtop mutation with the mutation operators of the classical DE.
The reasons why SC qrtop mutation has the above characteristics are analyzed both theoretically and experimentally in the following three subsections, that is, the probability analysis, the statistical experiment, and the implementation tips of SC qrtop mutation. Let $A^n_0$ denote the event that, in an $n$-dimensional search space, an individual locates in the null space, and let $P^n_0$ denote the probability of the event $A^n_0$. More generally, $A^n_k$ denotes the event that the individual locates in a $k$-dimensional subspace centered on $\vec{x}_{\mathrm{qrtop}}$, and $P^n_k$ denotes its probability.

Probability Analysis of SC qrtop
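(i) Supposing the Search Space Is One Dimensional. In this case, $n = 1$, we establish a coordinate system with origin $\vec{x}_{\mathrm{qrtop}}$, and the search space then has two subspaces: one is the null space, and the other is the space itself. As shown in Figure 1, the region A is the null space, which includes only the point $\vec{x}_{\mathrm{qrtop}}$. That is to say,
$$A = \{\vec{x}_{\mathrm{qrtop}}\}, \qquad B = [L, U] \setminus \{\vec{x}_{\mathrm{qrtop}}\}.$$
B is the set of all points except $\vec{x}_{\mathrm{qrtop}}$. That the SC donor vector locates in the subspace A means $\vec{v} = \vec{x}_{\mathrm{qrtop}}$, that is, the boundary individual $\vec{x}_{b_1}$ equals the other boundary individual $\vec{x}_{b_2}$. So we get
$$P^1_0 = P\{\vec{x}_{b_1} = \vec{x}_{b_2}\} = \tfrac{1}{2}.$$
In addition, from the above definition of $P^n_k$, $P^1_1$ is the probability of the SC donor vector locating in the region $[L, U]$ except for the null space A, that is, the probability of the SC donor locating in the region B. So
$$P^1_1 = 1 - P^1_0 = \tfrac{1}{2},$$
and we then get $P^1_0 = P^1_1 = \tfrac{1}{2}$.
(ii) Supposing the Search Space Is Two Dimensional. In this case, $n = 2$, a rectangular coordinate system with origin $\vec{x}_{\mathrm{qrtop}}$ is established as shown in Figure 2. In coordinates relative to this origin, the regions C, D, and E in Figure 2 can be represented as follows:
$$C = \{(0, 0)\}, \qquad D = \{(x_1, x_2) : x_1 x_2 = 0\} \setminus C, \qquad E = \{(x_1, x_2) : x_1 x_2 \neq 0\}.$$
Next, we analyze the probability of the SC donor locating in each subspace.
Firstly, the probability of the SC donor locating in the region C equals the joint probability of the independent events $A^1_0$ on the $x_1$ axis and on the $x_2$ axis. So
$$P^2_0 = P^1_0 \cdot P^1_0 = \tfrac{1}{4}.$$
Secondly, that the SC donor locates in the region D means the joint occurrence of $A^1_0$ on the $x_1$ axis and $A^1_1$ on the $x_2$ axis, or of $A^1_1$ on the $x_1$ axis and $A^1_0$ on the $x_2$ axis. So
$$P^2_1 = P^1_0 \cdot P^1_1 + P^1_1 \cdot P^1_0 = \tfrac{1}{2}.$$
Similarly, we can get the probability of the SC donor locating in the region E:
$$P^2_2 = P^1_1 \cdot P^1_1 = \tfrac{1}{4}.$$
(iii) Supposing the Search Space Is $n$ Dimensional. Following the same procedure as in the case $n = 2$, we can get the probabilities $P^n_k$, for $k = 0, 1, \ldots, n$, as follows:
$$P^n_k = \binom{n}{k} \left(\tfrac{1}{2}\right)^{k} \left(\tfrac{1}{2}\right)^{n-k} = \binom{n}{k} \frac{1}{2^n}.$$

Figure 3: In the 2-dimensional space, the markers "*, +, ×" correspond to the events $A^2_0$, $A^2_1$, and $A^2_2$, respectively. In the 3-dimensional space, the markers "*, +, •, ×" correspond to the events $A^3_0$, $A^3_1$, $A^3_2$, and $A^3_3$, respectively.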
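The probabilities for any dimension can be tabulated with a few lines of code. The sketch below assumes, as in the one-dimensional analysis, that each coordinate independently falls in the null space with probability 1/2, so that the number of perturbed coordinates is binomial; the function name `subspace_probabilities` is introduced here only for illustration.

```python
from math import comb

def subspace_probabilities(n):
    """Return [P_0, ..., P_n], where P_k = C(n, k) / 2**n.

    P_k is the probability that exactly k of the n coordinates of the
    donor differ from the elite individual (independent coordinates,
    each perturbed with probability 1/2).
    """
    return [comb(n, k) / 2 ** n for k in range(n + 1)]
```

Calling `subspace_probabilities(2)` gives the three values used for the two-dimensional case, and the first and last entries always coincide because C(n, 0) = C(n, n) = 1.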

Statistical Experiment of SC qrtop Mutation.
Statistical experiments are conducted to show the distribution of sample points generated by SC qrtop mutation. Figure 3 shows three distributions for different cases, that is, (1) a two-dimensional space, one top individual, and 100 independent repetitions; (2) a two-dimensional space, three top individuals, and 300 independent repetitions; and (3) a three-dimensional space, one top individual, and 300 independent repetitions. The figure shows the clustering of subdonor points in the subspaces. Taking Figure 3(a) as an example, some subdonor points (marked by "*") locate at the origin, some (marked by "+") locate on the two perpendicular axes, and the remainder (marked by "×") are distributed uniformly in the two-dimensional search space. These three cases correspond to the occurrence of the events $A^2_0$, $A^2_1$, and $A^2_2$. As shown in Table 1, in 100 independent repetitions, the occurrence times are 28, 44, and 28, in that order. The experimental proportions are close to the theoretical probabilities $P^2_0$, $P^2_1$, and $P^2_2$, that is, 0.25, 0.50, and 0.25. In Table 1, the "Pra sub" values for the three-dimensional case are $P^3_0$, $P^3_1$, $P^3_2$, and $P^3_3$, respectively, which can be calculated as follows:
$$P^3_k = \binom{3}{k} \frac{1}{2^3}, \quad k = 0, 1, 2, 3, \quad \text{that is, } 0.125,\ 0.375,\ 0.375,\ 0.125.$$
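A sampling experiment of this kind can be imitated with a short simulation. The sketch below is not the authors' experiment: it only counts, for each trial, how many coordinates receive a nonzero step, under the assumption that each component of the two boundary individuals is drawn independently from the lower or upper bound with equal probability. The name `simulate_subspace_counts` and the `seed` parameter are hypothetical.

```python
import random
from collections import Counter

def simulate_subspace_counts(n, trials, seed=0):
    """Count how often exactly k of n coordinates are perturbed (k = 0..n)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        # A coordinate is perturbed only when the two boundary picks differ.
        k = sum(rng.choice((0, 1)) != rng.choice((0, 1)) for _ in range(n))
        counts[k] += 1
    return [counts[k] for k in range(n + 1)]
```

Dividing the returned counts by `trials` gives experimental proportions that can be compared with the theoretical values; with only 100 trials, deviations of a few points from 0.25, 0.50, and 0.25 are to be expected.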

Implementation Tips of SC qrtop Mutation. It is easy to incorporate SC qrtop into other classical mutation operators.
Taking the DE/rand/1 mutation as an example, we enlarge the range of the random integer $r_1$ beyond the population size $N$. If $r_1 \le N$, then the classical DE/rand/1 is executed; otherwise, the SC qrtop mutation operator is executed. That is to say, the modified algorithm employs the SC qrtop mutation operator with probability $q\%$ (considering the balance of exploration and exploitation, we suggest that this probability equal the proportion used to select the top individual $\vec{x}_{\mathrm{qrtop}}$). The other classical mutation operators also require a random integer $r_1$ to be generated, so the SC qrtop mutation operator can be applied to them in the same way.
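One possible way to implement this switch is sketched below. The enlarged upper bound of the integer range is an assumption: it is chosen here so that the probability of drawing a value above the population size is approximately $q$. The function name `choose_operator` and the returned labels are hypothetical.

```python
import random

def choose_operator(pop_size, q, rng):
    """Draw r1 from an enlarged integer range and pick the operator.

    Assumption: the range is [1, round(pop_size / (1 - q))], so that
    P(r1 > pop_size) is approximately q.
    """
    extended = max(pop_size + 1, round(pop_size / (1.0 - q)))
    r1 = rng.randint(1, extended)  # inclusive on both ends
    if r1 <= pop_size:
        return "rand/1", r1        # r1 is still a valid individual index
    return "sc_qrtop", r1          # out-of-range draw triggers SC qrtop
```

For a population of 60 and q = 0.2, the range becomes [1, 75], so 15 of the 75 equally likely draws trigger the SC qrtop branch. A design advantage of this trick is that no extra random number is needed beyond the index the classical operator already draws.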

Convergent DE Algorithm Based on Subspace Clustering Mutation
In this section, this paper proposes a convergent DE algorithm framework and proves its global convergence in probability.

Convergence Proof.
There are some different kinds of definitions of convergence for analyzing asymptotic convergence of random algorithms.A frequently used convergence definition, that is, convergence in probability, will be used in this paper.
Pseudocode 2: Pseudocode of CDE/SC qrtop rand/1.

A convenient theorem for proving the global convergence of DE variants is the one recently presented by Hu et al. [19]. The theorem is based on the previous two. It only requires checking whether the probability that the offspring in any subsequence population enters the optimum solution set is big enough. The theorem can be described in detail as follows.
Theorem 2 (see [19]). Consider $\{X(t), t = 0, 1, 2, \ldots\}$ to be a population sequence of a DE variant with a greedy selection operator. In the $t_k$th target population $X(t_k)$, there exists at least one individual $\vec{x}$, which corresponds to the trial individual $\vec{u}$, such that
$$P\{\vec{u} \in S^*_\delta\} \ge \zeta(t_k) > 0$$
and the series $\sum_{k=1}^{\infty} \zeta(t_k)$ diverges; then the DE variant holds global convergence.
Here $\{t_k, k = 1, 2, \ldots\}$ denotes any subsequence of the natural number set, $P\{\vec{u} \in S^*_\delta\}$ denotes the probability that $\vec{u}$ belongs to the optimal solution set $S^*_\delta$, and $\zeta(t_k)$ is a small positive real number which depends on $t_k$.
From Theorem 2, we can get that if the probability entering into the optimal set in a certain subsequence population is large enough, then the DE variant holds global convergence.
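The role of the divergent series can be made concrete with a short worked bound. This is a sketch, not part of the cited proof: it writes $\zeta(t_k)$ for the lower bound in Theorem 2 and $S^*_\delta$ for the optimal solution set, and it additionally assumes that the bound holds conditionally on the history of the search. The constant $c$ is introduced here only for illustration.

```latex
% Probability that no trial individual has entered S^*_\delta in the
% first K generations of the subsequence (assumption: the lower bound
% \zeta(t_k) holds conditionally on the past):
\[
  P\{\text{no entry up to } t_K\}
  \le \prod_{k=1}^{K} \bigl(1 - \zeta(t_k)\bigr)
  \le \exp\Bigl(-\sum_{k=1}^{K} \zeta(t_k)\Bigr)
  \xrightarrow[K \to \infty]{} 0
  \quad \text{when } \sum_{k=1}^{\infty} \zeta(t_k) = \infty .
\]
% Special case: a constant bound \zeta(t_k) \ge c > 0 gives
% \sum_{k=1}^{K} \zeta(t_k) \ge K c \to \infty .
```

The second inequality uses $1 - x \le e^{-x}$. Because the greedy selection never discards the best individual, a single entry into the optimal set is enough for the population to remain there.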
In fact, for each generation of CDE/SC qrtop, the probability that an SC donor individual locates in any given subspace follows from the probabilities $P^n_k$ derived in Section 3. Let $P^n_{\min} = \min\{P^n_k > 0,\ k = 1, 2, \ldots, n\}$; then, whether or not the classical mutation operator generates an optimum, the probability that at least one donor individual $\vec{v}$ locates in the optimal set $S^*_\delta$ can be bounded below by a positive quantity expressed in terms of $P^n_{\min}$ and the measure $\mu(\cdot)$ of the sets involved, where $\mu(\cdot)$ denotes the measure of a measurable set. If the crossover probability CR < 1, then a corresponding positive lower bound holds for the trial individual $\vec{u}$. So if $\zeta(t)$ takes this lower bound, we get that $\sum_{t=1}^{\infty} \zeta(t)$ diverges. Hence, we conclude that the CDE/SC qrtop algorithm holds global convergence.

Numerical Experiments
The main purpose of the numerical experiments is to show that the proposed SC qrtop operator can enhance the search ability of all five classical mutation operators, that is, rand/1, best/1, current-to-best/1, best/2, and rand/2. This paper therefore compares each CDE/SC qrtop algorithm with the corresponding classical DE algorithm for the five classical mutations. The experiments were conducted on the 25 test instances proposed in the CEC2005 special session on real-parameter optimization [28]. This benchmark function set includes four classes: unimodal functions (F1-F5), basic multimodal functions (F6-F12), expanded multimodal functions (F13-F14), and hybrid composition functions (F15-F25). The number of decision variables, $n$, was set to 10 for all 25 test functions. The population size, $N$, was set to 60 for all the algorithms. The mutation factor, $F$, was set to 0.5, and the crossover probability, CR, was set to 0.9. The probability of using SC qrtop mutation was set to 20%. For each algorithm and each test function, 25 independent runs were conducted with 150000 function evaluations (FEs) as the termination criterion.
Generally speaking, because they use the best solution of the current population, the DE variants with a mutation strategy based on the best solution, that is, DE/best/1, DE/current-to-best/1, and DE/best/2, have more powerful exploitation ability, while the random mutation strategies give the DE variant more powerful exploration ability. Given that, this section analyzes the experimental results from two aspects, that is, the comparison on the three mutation strategies based on the best solution and the comparison on the other two random mutation strategies. The results for the three mutation strategies based on the best solution are listed in Tables 2, 3, and 4, respectively. The bottom right corner of every table summarizes the statistical analyses of the experimental results. The priority of the comparison is the best solution, the mean, and the standard deviation, in that order.
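For reference, the five classical mutation operators named above can be sketched in the forms in which they are commonly written in the DE literature. The paper's own equations are not reproduced here, so the exact variants (for example, a single scale factor `F` for every difference term) are assumptions, and the function name `mutate` and its parameters are hypothetical.

```python
import random

def mutate(strategy, pop, i, best, F, rng):
    """Return a donor for target index i using a standard DE mutation form.

    pop is a list of individuals (lists of floats), best is the index of the
    best individual, and the population must hold at least six individuals.
    """
    n = len(pop[i])
    others = [k for k in range(len(pop)) if k != i]
    r1, r2, r3, r4, r5 = rng.sample(others, 5)  # distinct indices, all != i

    def diff(a, b, j):
        return pop[a][j] - pop[b][j]

    if strategy == "rand/1":
        return [pop[r1][j] + F * diff(r2, r3, j) for j in range(n)]
    if strategy == "best/1":
        return [pop[best][j] + F * diff(r1, r2, j) for j in range(n)]
    if strategy == "current-to-best/1":
        return [pop[i][j] + F * diff(best, i, j) + F * diff(r1, r2, j)
                for j in range(n)]
    if strategy == "best/2":
        return [pop[best][j] + F * diff(r1, r2, j) + F * diff(r3, r4, j)
                for j in range(n)]
    if strategy == "rand/2":
        return [pop[r1][j] + F * diff(r2, r3, j) + F * diff(r4, r5, j)
                for j in range(n)]
    raise ValueError(f"unknown strategy: {strategy}")
```

The three strategies that use `best` pull donors toward the current best individual, which is the exploitation bias discussed in this section, while rand/1 and rand/2 rely only on randomly chosen individuals.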

Comparison on Three Mutation Strategies Based on the Best Solution
From Tables 2, 3, and 4, the numbers of benchmark functions on which the three CDE/SC qrtop variants outperform the corresponding DE algorithms are 14, 21, and 12, in that order. Meanwhile, the numbers of benchmark functions on which the CDE/SC qrtop variants are worse than the corresponding DE algorithms are 3, 4, and 6, in that order. Figure 4 shows the evolution curves of the average error of the best function values over 25 runs for all six algorithms on benchmark functions 1-14. The results show that the SC qrtop mutation can greatly improve the search ability of the three mutation strategies based on the best solution.
On the remaining functions, where the corresponding DE algorithms already achieve (or approach) the optimum solution, the three CDE/SC qrtop variants obtain results similar to those of the corresponding DE algorithms. The results for the other two random mutation strategies are listed in Tables 5 and 6, respectively. As above, the bottom right corner of every table summarizes the statistical analyses of the experimental results. The priority of the comparison is the best solution, the mean, and the standard deviation, in that order.

Comparison on the Other Two Random Mutation Strategies
As shown in Tables 5 and 6, the numbers of benchmark functions on which the two CDE/SC qrtop variants outperform the corresponding DE algorithms are both 9. Meanwhile, the numbers of benchmark functions on which the CDE/SC qrtop variants are worse than the corresponding DE algorithms are both 7. The results show that the SC qrtop mutation can slightly improve the search ability of the two random mutation strategies.
In summary, the SC qrtop mutation can improve the search ability of all the classical mutation operators of DE. The improvement is very significant for the three mutation strategies based on the best solution, while it is small for the two random mutation strategies. The results also demonstrate that the SC qrtop mutation focuses on exploration more than on exploitation. The experimental results support the theoretical conclusion that the CDE/SC qrtop algorithms can guarantee global convergence in probability.
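The win and loss counts quoted in this section for Tables 2 to 6 can be collected into a small tally. The sketch below only restates the reported numbers and derives the count of "approximate" outcomes by subtraction from the 25 benchmark functions; the helper name `tally` and the dictionary layout are introduced here for illustration.

```python
TOTAL_FUNCTIONS = 25

# (better, worse) counts as reported in the text for Tables 2-6.
reported = {
    "Table 2": (14, 3),
    "Table 3": (21, 4),
    "Table 4": (12, 6),
    "Table 5": (9, 7),
    "Table 6": (9, 7),
}

def tally(better, worse, total=TOTAL_FUNCTIONS):
    """Derive the 'approximate' count and the net number of wins."""
    return {
        "better": better,
        "approximate": total - better - worse,
        "worse": worse,
        "net": better - worse,
    }

summary = {name: tally(*counts) for name, counts in reported.items()}
```

The `net` field makes the contrast explicit: the margin is at least 6 for each of Tables 2 to 4 and only 2 for each of Tables 5 and 6.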

Discussion
The preceding theoretical analysis proved that the differential evolution algorithm incorporating the SC qrtop mutation operator holds convergence in probability. The numerical experiments showed that these convergent SC qrtop DE algorithms are significantly better than, or at least comparable to, the corresponding DE algorithms.
Generally, the populations of a convergent algorithm have more diversity, which can enhance the exploration ability of the algorithm and make it more robust. DE-RW [22] uses a random-walk mechanism to enhance the population diversity until the individuals are ergodic, thereby making the algorithm hold global convergence in probability. Like the DE-RW algorithm, the convergent DE algorithm presented in [23] utilizes a Gaussian mutation operator to enhance the exploration ability. Research on these convergent DE algorithms has been a significant step for the DE field. However, algorithmic performance depends on the balance between the exploration and exploitation abilities. Enhancing only the exploration ability may decrease the convergence speed of an algorithm. Unlike the random-walk and the Gaussian mutation operators, the proposed SC qrtop mutation operator takes this balance into account. As shown in Table 1, the occurrence probability of the event $A^n_0$ always equals the occurrence probability of the event $A^n_n$. That is to say, the probability $P(A^n_0)$ of reproducing elite individuals equals the probability $P(A^n_n)$ of randomly sampling in the whole solution space. The reproduction of elite individuals enhances the exploitation ability, while the random sampling in the whole solution space enhances the exploration ability.
In addition, the proposed SC qrtop mutation operator can be incorporated into any state-of-the-art DE algorithm, thereby yielding DE algorithms that are convergent in theory. Because the SC qrtop mutation operator takes account of the balance between the exploration and exploitation abilities, the convergent algorithms built on state-of-the-art DE variants are expected to perform well.
(i) [...] [29], differential evolution using a neighborhood-based mutation operator [30], and so forth. Numerous numerical experiments have verified that these algorithms perform better on the majority of benchmark problems. Following the study in this paper, we can incorporate the subspace clustering operator into these outstanding DE variants, and it is easy to prove that the resulting algorithms guarantee global convergence in probability. However, whether the subspace clustering operator can further enhance the performance of these outstanding DE variants remains to be verified by numerical experiments.
(ii) Generalize the work to other similar evolutionary algorithms, such as particle swarm optimization (PSO), the Cuckoo search algorithm (CK), and the artificial bee colony algorithm (ABC).

The first step of DE is the initialization of a population of $N$ $n$-dimensional potential solutions (individuals) over the optimization search space. We will symbolize each individual by $\vec{x}^g_i = (x^g_{i,1}, \ldots, x^g_{i,n})$, for $i = 1, \ldots, N$.

The trial vector, $\vec{u}^g_i$, is generated by the binomial crossover, which combines the elements of the target vector, $\vec{x}^g_i$, and the donor vector, $\vec{v}^g_i$:
$$u^g_{i,j} = \begin{cases} v^g_{i,j}, & \text{if } \mathrm{rand}(0,1) \le \mathrm{CR} \text{ or } j = j_{\mathrm{rand}}, \\ x^g_{i,j}, & \text{otherwise,} \end{cases} \tag{8}$$
where CR ∈ (0, 1) is the probability of crossover and $j_{\mathrm{rand}}$ is a random integer on $[1, n]$.
Selection Operator. Finally, the selection operator is employed to keep the most promising trial individuals in the next generation. The classical DE adopts a simple selection scheme. It compares the objective value of the target $\vec{x}^g_i$ with that of the trial individual $\vec{u}^g_i$. If the trial individual reduces the value of the objective function, then it is accepted for the next generation; otherwise, the target individual is retained in the population.
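The binomial crossover and the greedy selection described above can be sketched as follows. The function names are hypothetical, indices are zero-based rather than running from 1 to n, and the strict inequality in the selection follows the wording "reduces the value of the objective function"; many DE implementations also accept trials of equal value.

```python
import random

def binomial_crossover(target, donor, CR, rng):
    """Build a trial vector by mixing donor and target components."""
    n = len(target)
    j_rand = rng.randrange(n)  # guarantees at least one donor component
    return [
        donor[j] if (rng.random() <= CR or j == j_rand) else target[j]
        for j in range(n)
    ]

def greedy_select(target, trial, f):
    """Keep the trial only if it reduces the objective (minimization)."""
    return trial if f(trial) < f(target) else target
```

Because the better of the two vectors always survives, the best objective value in the population never gets worse from one generation to the next, which is the property the convergence argument relies on.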

Figure 1: Subspace sketch on 1 dimension. "L, U" denote the lower and upper boundary values, respectively.

Figure 2: Subspace sketch on 2 dimensions. "$L_j$, $U_j$", for $j$ = 1 or 2, denote the lower and upper boundary values of the $j$th dimension, respectively.

Figure 3(c): Three dimensions, 300 points, and 1 top individual.

Figure 4: Evolution curves of the average error of the best function values over 25 runs for CDE/SC qrtop best/1, CDE/SC qrtop cur best/1, and CDE/SC qrtop best/2 and the three corresponding DE algorithms.

Table 1: Statistical analysis of Figure 3. "top" denotes the number of top individuals; "S T, O T" denote the sampling times and occurrence times, respectively. "Pro sub" denotes the proportion of individuals locating in each subspace in the experiments; "Pra sub" denotes the probability of individuals locating in each subspace in theory.

Table 6: Continued. The comparison marks denote that the results of CDE/SC qrtop are "better," "approximate," and "worse" than the corresponding DE, respectively.

[...] ability and exploitation ability of DE variants. Taking into account that an algorithm that is convergent in theory is more robust, this paper proposed a subspace clustering mutation operator for DE variants. By combining the proposed mutation with the classical mutation operators, this paper developed five convergent DE variants. The experimental results on CEC2005 benchmark functions indicated that all five convergent DE variants with the subspace clustering mutation operator outperform the corresponding DE algorithms. They also indicated that combining the subspace clustering mutation operator with any one of the three mutation strategies based on the best solution (i.e., DE/best/1, DE/current-to-best/1, and DE/best/2) is more effective than combining it with the other two random mutation strategies (i.e., DE/rand/1 and DE/rand/2). Two possible directions for future work can be summarized as below.