Differential Evolution with Novel Mutation and Adaptive Crossover Strategies for Solving Large Scale Global Optimization Problems

This paper presents a Differential Evolution algorithm for solving high-dimensional optimization problems over continuous spaces. The proposed algorithm, named ANDE, introduces a new triangular mutation rule based on the convex combination vector of the triplet defined by three randomly chosen vectors and on the difference vectors between the best, better, and worst individuals among those three vectors. The new mutation rule is combined with the basic mutation strategy DE/rand/1/bin and is applied with probability 2/3, since it offers both exploration ability and exploitation tendency. Furthermore, we propose a novel self-adaptive scheme for gradually changing the values of the crossover rate; it benefits from the past experience of the individuals in the search space during the evolution process and thereby balances the common trade-off between population diversity and convergence speed. The proposed algorithm has been evaluated on the 20 standard high-dimensional benchmark numerical optimization problems of the IEEE CEC-2010 Special Session and Competition on Large Scale Global Optimization. Comparisons between ANDE, its two versions, and seven state-of-the-art evolutionary algorithms tested on the same suite indicate that the proposed algorithm and its two versions are highly competitive for solving large scale global optimization problems.


Introduction
In general, a global numerical optimization problem can be expressed as follows (without loss of generality, a minimization problem is considered here):

min f(x⃗), x⃗ = [x_1, x_2, ..., x_D], subject to x_j^L ≤ x_j ≤ x_j^U, j = 1, 2, ..., D,

where f is the objective function, x⃗ ∈ R^D is the decision vector consisting of D variables, D is the problem dimension, that is, the number of variables to be optimized, and x_j^L and x_j^U are the lower and upper bounds for each decision variable, respectively.
The optimization of large scale problems of this kind (i.e., D = 1000) is considered a challenging task, since the solution space of a problem often increases exponentially with the problem dimension and the characteristics of a problem may change with the scale [1]. Generally speaking, there are different types of real-world large scale global optimization (LSGO) problems in engineering, manufacturing, and economic applications (biocomputing, data or web mining, scheduling, vehicle routing, etc.). In order to draw more attention to this challenge, the first competition on LSGO was held at CEC 2008 [2]. Consequently, in recent years, LSGO has gained considerable attention and has attracted much interest from Operations Research and Computer Science professionals, researchers, and practitioners, as well as mathematicians and engineers. These challenges have motivated researchers to design and improve many kinds of efficient, effective, and robust metaheuristic algorithms that can solve LSGO problems with high-quality solutions and high convergence performance at low computational cost. Evolutionary algorithms (EAs) have been proposed to meet these global optimization challenges. The structure of EAs is inspired by the mechanisms of natural evolution. Due to their adaptability and robustness, EAs are especially capable of solving difficult optimization problems, such as highly nonlinear, nonconvex, nondifferentiable, and multimodal optimization problems. Generally, the process of EAs is based on the exploration and exploitation of the search space through selection and reproduction operators [3]. Similar to other EAs, Differential Evolution (DE) is a stochastic population-based search method proposed by Storn and Price [4]. Its advantages are its simplicity of implementation, ease of use, speed, and robustness. Due to these advantages, it has been successfully applied
for solving many real-world applications, such as admission capacity planning in higher education [5, 6], financial market dynamic modeling [7], solar energy [8], and many others. In addition, many recent studies show that the performance of DE is highly competitive with, and in many cases superior to, other EAs in solving unconstrained, constrained, multiobjective, and other complex optimization problems [9]. However, DE has many weaknesses, as do all other evolutionary search techniques. Generally, DE has good global exploration ability and can reach the region of the global optimum, but it is slow at exploiting the solution [10]. Additionally, the parameters of DE are problem dependent and difficult to adjust for different problems. Moreover, DE performance decreases as search space dimensionality increases [11]. Finally, the performance of DE deteriorates significantly when premature convergence and/or stagnation occur [11, 12]. The performance of DE basically depends on the mutation strategy and the crossover operator. Besides, the intrinsic control parameters (population size NP, scaling factor F, and crossover rate CR) play a vital role in balancing the diversity of the population and the convergence speed of the algorithm. In the original DE, these parameters are user-defined and kept fixed during the run. However, many recent studies indicate that the performance of DE is highly affected by the parameter settings, and the choice of optimal parameter values is always problem dependent. Moreover, prior to an actual optimization process, the traditional time-consuming trial-and-error method is used for fine-tuning the control parameters for each problem. Furthermore, in order to achieve acceptable results even for the same problem, different parameter settings along with different mutation schemes at different stages of evolution may be needed. Therefore, some techniques have been
designed to adjust control parameters in an adaptive or self-adaptive manner instead of the trial-and-error procedure, and new mutation rules have been developed to improve the search capability of DE [13-22]. Based on the above considerations, in this paper we present a novel DE, referred to as ANDE, that includes two novel modifications: a triangular mutation rule and a self-adaptive scheme for gradually changing the CR values. In ANDE, the novel triangular mutation rule balances the global exploration ability and the local exploitation tendency and enhances the convergence rate of the algorithm. Furthermore, a novel adaptation scheme for CR is developed that benefits from the past experience of the individuals in the search space during the evolution process. Scaling factors are produced according to a uniform distribution to balance global exploration and local exploitation during the evolution process. ANDE has been tested on the 20 benchmark test functions developed for the 2010 IEEE Congress on Evolutionary Computation (IEEE CEC 2010) [1]. The experimental results indicate that the proposed algorithm and its two versions are highly competitive for solving large scale global optimization problems. The remainder of this paper is organized as follows. Section 2 briefly introduces DE and its operators. Section 3 reviews the related work. Then, ANDE is presented in Section 4. The experimental results are given in Section 5. Section 6 discusses the effectiveness of the proposed modifications. Finally, conclusions and future work are drawn in Section 7.

Differential Evolution (DE)
This section provides a brief summary of the basic Differential Evolution (DE) algorithm. In simple DE, generally known as DE/rand/1/bin [4, 23], an initial population consists of NP vectors x⃗_i, i = 1, 2, ..., NP, randomly generated according to a uniform distribution within the lower and upper boundaries (x_L, x_U). After initialization, these individuals are evolved by the DE operators (mutation and crossover) to generate a trial vector. A comparison between the parent and its trial vector is then made to select the vector that survives to the next generation [9]. The DE steps are discussed below.

Initialization.
In order to establish a starting point for the optimization process, an initial population must be created. Typically, each decision variable in every vector of the initial population is assigned a randomly chosen value from within its boundary constraints:

x_{i,j}^0 = x_j^L + rand_j · (x_j^U − x_j^L),

where rand_j denotes a uniformly distributed random number in [0, 1], generated anew for each decision variable.
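As a concrete illustration, the initialization step above can be sketched in Python (a minimal sketch; the bounds, population size, and dimension are arbitrary example values, not taken from the paper):

```python
import numpy as np

def initialize_population(pop_size, dim, lower, upper, seed=None):
    """Assign each decision variable a uniform random value within its bounds:
    x[i][j] = lower[j] + rand(0,1) * (upper[j] - lower[j])."""
    rng = np.random.default_rng(seed)
    return lower + rng.random((pop_size, dim)) * (upper - lower)

# Example: 50 vectors of 10 variables in [-5, 5]^10
lo, hi = np.full(10, -5.0), np.full(10, 5.0)
pop = initialize_population(50, 10, lo, hi, seed=1)
```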

Mutation.
At generation G, for each target vector x⃗_i^G, a mutant vector v⃗_i^{G+1} is generated according to

v⃗_i^{G+1} = x⃗_{r1}^G + F · (x⃗_{r2}^G − x⃗_{r3}^G),

with randomly chosen, mutually distinct indices r1, r2, r3 ∈ {1, 2, ..., NP}. F is a real number that controls the amplification of the difference vector (x⃗_{r2}^G − x⃗_{r3}^G); according to Price et al. [24], the range of F is [0, 2]. In this work, if a component of a mutant vector violates the search space, the value of this component is regenerated using (2).
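Under the notation above, the DE/rand/1 mutation can be sketched as follows (illustrative Python; the fixed F and random population are example values only):

```python
import numpy as np

def de_rand_1(pop, i, F, rng):
    """Mutant vector v = x_r1 + F * (x_r2 - x_r3), with r1, r2, r3
    mutually distinct and different from the target index i."""
    candidates = [r for r in range(len(pop)) if r != i]
    r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])

rng = np.random.default_rng(7)
pop = rng.random((10, 4))       # NP = 10 vectors of dimension 4
v = de_rand_1(pop, 0, F=0.5, rng=rng)
```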

Crossover.
There are two main crossover types, binomial and exponential. In the binomial crossover, the target vector is mixed with the mutated vector, using the following scheme, to yield the trial vector u⃗_i^{G+1}:

u_{i,j}^{G+1} = v_{i,j}^{G+1} if (rand_j ≤ CR or j = j_rand); otherwise u_{i,j}^{G+1} = x_{i,j}^G,

where j_rand is a randomly chosen index that ensures the trial vector differs from its parent in at least one component; otherwise, the old value x_{i,j}^G is retained. The selection scheme is as follows (for a minimization problem): the trial vector u⃗_i^{G+1} replaces the target vector x⃗_i^G in the next generation if f(u⃗_i^{G+1}) ≤ f(x⃗_i^G); otherwise, the target vector is retained. A detailed description of the standard DE algorithm is given in Algorithm 1.
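The binomial crossover and greedy selection steps can be sketched together (a minimal sketch; the zero/one vectors and the sphere objective are example values only):

```python
import numpy as np

def binomial_crossover(target, mutant, CR, rng):
    """u_j = v_j if (rand_j <= CR or j == j_rand), else x_j."""
    dim = len(target)
    j_rand = rng.integers(dim)      # forces at least one mutant component
    mask = rng.random(dim) <= CR
    mask[j_rand] = True
    return np.where(mask, mutant, target)

def select(target, trial, f):
    """Greedy one-to-one survivor selection (minimization)."""
    return trial if f(trial) <= f(target) else target

rng = np.random.default_rng(3)
x = np.zeros(5)                     # parent (already optimal for the sphere)
v = np.ones(5)                      # mutant
u = binomial_crossover(x, v, CR=0.5, rng=rng)
winner = select(x, u, f=lambda z: float(np.sum(z**2)))
```

Since the trial vector inherits at least one nonzero component, its sphere-function value exceeds the parent's, so the parent survives.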

Related Work
As previously mentioned, during the past few years LSGO has attracted much attention from researchers due to its significance, as many real-world problems and applications are high-dimensional in nature. Among the proposed approaches, the two-stage ensemble algorithm EOEA [28] divides the optimization process into a global shrinking stage and a local exploitation stage. The objective of the first stage is to shrink the search scope to a promising area as quickly as possible using an EDA based on mixed Gaussian and Cauchy models (MUEDA) [29], while, to achieve the second objective, a CC-based algorithm, different from previous CC-based methods, is adopted to explore the limited area extensively to find as good a solution as possible. Compared with some previous LSGO algorithms, EOEA demonstrates better performance. Many CC-based algorithms have been developed during the past decade, such as FEPCC [30], DECC-I, DECC-II [31], MLCC [32], DEwSaCC [33], and CPSO [34]. On the other hand, there are many other approaches that optimize LSGO problems as a whole; that is, no divide-and-conquer method is used. This is considered a challenging task, as it requires developing novel evolutionary operators that can strengthen the capability of the algorithms in high-dimensional search spaces. Examples include the Differential Ant-Stigmergy Algorithm (DASA) [35]; the authors of [39] proposed a hybrid approach called DMS-PSO-SHS by combining the dynamic multiswarm particle swarm optimizer (DMS-PSO) with a subregional harmony search (SHS). A modified multitrajectory search (MTS) algorithm is also applied frequently to several selected solutions, and an external memory of selected past solutions is used to enhance the diversity of the swarm. Generally, the proposed ANDE algorithm belongs to this category.

ANDE Algorithm
In this section, we outline a novel DE algorithm, ANDE, and explain the steps of the algorithm in detail.

Triangular Mutation Scheme.
Storn and Price [4, 24] proposed the basic mutation scheme DE/rand/1. In fact, from the literature [9], it is considered the most successful and widely used operator. The main idea behind this strategy is that three vectors are randomly selected from the population to form the mutation; the base vector is chosen at random among the three, and the difference vector formed by the other two vectors is added to it. Obviously, the basic mutation scheme has excellent ability to maintain population diversity and global search capability, as it is not directed toward any specific search direction; however, the convergence speed of DE algorithms consequently slows down significantly [15]. In the same context, the DE/rand/2 strategy, which is similar to the basic scheme but adds a second difference vector formed by two extra vectors, provides better perturbation than the original mutation with one difference vector [15]; consequently, it offers stronger exploration than the basic scheme.

The proposed mutation strategy is based on the convex combination vector of the triplet defined by the three randomly chosen vectors and on three difference vectors between the tournament best, better, and worst selected vectors. The proposed mutant vector is generated in the following manner:

v⃗_i^{G+1} = x⃗_c^G + F1 · (x⃗_best^G − x⃗_better^G) + F2 · (x⃗_best^G − x⃗_worst^G) + F3 · (x⃗_better^G − x⃗_worst^G),

where x⃗_c^G is the convex combination vector of the triangle; F1, F2, and F3 are the mutation factors associated with x⃗_i, independently generated according to a uniform distribution in (0, 1); and x⃗_best, x⃗_better, and x⃗_worst are the tournament best, better, and worst of the three randomly selected vectors, respectively. The convex combination vector x⃗_c of the triangle is a linear combination of the three randomly selected vectors, defined as

x⃗_c^G = w1 · x⃗_best + w2 · x⃗_better + w3 · x⃗_worst,

where the real weights satisfy w_i ≥ 0 and w1 + w2 + w3 = 1, with w_i = p_i / (p1 + p2 + p3), i = 1, 2, 3, where p1 = 1, p2 = rand(0.75, 1), and p3 = rand(0.5, p2); rand(a, b) is a function that returns a real number between a and b, where a and b are not included. For unconstrained optimization problems, at any generation G > 1 and for each target vector, three vectors are randomly selected, sorted in ascending order according to their objective function values, and assigned to x⃗_best, x⃗_better, and x⃗_worst, respectively. Without loss of generality, we only consider minimization problems.
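The triangular mutation can be sketched in Python as follows. This is a reconstruction from the prose above, so the exact equation form and the assignment of the weights p1 ≥ p2 ≥ p3 to the best/better/worst vectors are assumptions, and the random population and sphere fitness are example values only:

```python
import numpy as np

def triangular_mutation(pop, fitness, rng):
    """Convex-combination (triangular) mutation sketch (assumed form).

    Three random vectors are sorted into best/better/worst by fitness;
    the mutant is their weighted centroid plus three directed differences."""
    r = rng.choice(len(pop), size=3, replace=False)
    order = r[np.argsort(fitness[r])]       # ascending: best, better, worst
    x_best, x_better, x_worst = pop[order[0]], pop[order[1]], pop[order[2]]
    # weights p1 = 1, p2 = rand(0.75, 1), p3 = rand(0.5, p2), normalized
    p2 = rng.uniform(0.75, 1.0)
    p3 = rng.uniform(0.5, p2)
    w = np.array([1.0, p2, p3]) / (1.0 + p2 + p3)
    x_c = w[0] * x_best + w[1] * x_better + w[2] * x_worst
    F1, F2, F3 = rng.random(3)              # uniform in (0, 1)
    return (x_c + F1 * (x_best - x_better)
                + F2 * (x_best - x_worst)
                + F3 * (x_better - x_worst))

rng = np.random.default_rng(0)
pop = rng.normal(size=(20, 6))
fit = np.sum(pop**2, axis=1)                # sphere fitness, for illustration
v = triangular_mutation(pop, fit, rng)
```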
Obviously, from mutation equation (6), it can be observed that incorporating the objective function value into the mutation scheme has two benefits. Firstly, the perturbation part of the mutation is formed by the three sides of the triangle in the direction of the best vector among the three randomly selected vectors. Therefore, the directed perturbations in the proposed mutation resemble the concept of a gradient, as the difference vectors are directed from the worst to the better to the best vectors [41]. Thus, the scheme explores the landscape of the objective function being optimized in different subregions around the best vectors within the search space throughout the optimization process. Secondly, the convex combination vector x⃗_c is a weighted sum of the three randomly selected vectors in which the best vector has the most significant contribution; therefore, x⃗_c is strongly biased toward the best vector rather than the remaining two. Consequently, the global solution can be reached more easily if all vectors follow the direction of the best vectors while also moving away from the worst vectors among those randomly selected. Indeed, the new mutation process exploits the nearby region of each x⃗_c in the direction of (x⃗_best − x⃗_worst) for each mutated vector; in a nutshell, it concentrates exploitation in some subregions of the search space. Thus, it has a better local search tendency, which accelerates the convergence speed of the proposed algorithm. Besides, the global exploration ability of the algorithm is significantly enhanced by the many different sizes and shapes of triangles formed in the feasible region throughout the optimization process. Thus, the proposed directed mutation balances global exploration capability and local exploitation tendency.
Thus, since the proposed directed mutation balances both global exploration capability and local exploitation tendency while the basic mutation favors exploration only, the probability of using the proposed mutation is twice that of applying the basic rule. The new mutation strategy is embedded into the DE algorithm and combined with the basic mutation strategy DE/rand/1/bin as follows.
If (rand(0, 1) ≤ 2/3), then the new triangular mutation is applied; else the basic DE/rand/1 mutation is applied, where F is a uniform random variable in (−1, 0) ∪ (0, 1) and rand(0, 1) returns a real number between 0 and 1 with uniform probability. From the above scheme, it can be seen that for each vector only one of the two strategies is used to generate the current trial vector, depending on a uniformly distributed random value within the range (0, 1). For each vector, if the random value is greater than 2/3, the basic mutation is applied; otherwise, the proposed one is performed. It is worth mentioning that the proposed triangular mutation and the trigonometric mutation proposed by Fan and Lampinen [23] both use three randomly selected vectors but are completely different in the following two main points.
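The strategy selection and the sampling of F for the basic branch can be sketched as follows (an illustrative sketch; the empirical-frequency check at the end is only a demonstration, not part of the algorithm):

```python
import numpy as np

def sample_basic_F(rng):
    """F for the basic DE/rand/1 branch, uniform over (-1, 0) U (0, 1)."""
    F = 0.0
    while F == 0.0:        # exclude exactly zero (a probability-zero event)
        F = rng.uniform(-1.0, 1.0)
    return F

def choose_strategy(rng):
    """Triangular mutation with probability 2/3, DE/rand/1 otherwise."""
    return "triangular" if rng.random() <= 2.0 / 3.0 else "rand/1"

rng = np.random.default_rng(42)
draws = [choose_strategy(rng) for _ in range(30000)]
frac = draws.count("triangular") / len(draws)   # close to 2/3
F = sample_basic_F(rng)
```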
(1) The proposed mutation strategy is based on the convex combination vector (weighted mean) of the triplet defined by the three randomly chosen vectors (as a donor) and on three difference vectors between the tournament best, better, and worst selected vectors (these are directed differences, i.e., they resemble the concept of a gradient, as the difference vectors are directed from the worst to the better to the best vectors). The trigonometric mutation, however, is based on the center point (the mean) of the hypergeometric triangle defined by the three randomly chosen vectors, and the perturbation imposed on the donor is made up of a sum of three weighted vector differentials that are randomly constructed (not directed).
(2) With respect to the scaling factors in the proposed algorithm, at each generation G, the scale factors F1, F2, and F3 of each individual target vector are independently generated according to a uniform distribution in (0, 1) to enrich the search behavior. In the trigonometric mutation, by contrast, the scaling factors are determined: at each generation G, they are computed as the ratio of the objective function value of each vector to the sum of the objective function values of the three vectors (so they sum to 1). Therefore, the trigonometric mutation is a rather greedy operator, since it biases the new trial solution strongly in the direction of the best of the three individuals chosen for the mutation. It can thus be viewed as a local search operator, and the perturbed individuals are produced only within the trigonometric region defined by the triangle used for the mutation operation [23]. Consequently, it is easily trapped in local optima on multimodal problems and may also stagnate, as it lacks the exploration capability to search the whole space. The proposed triangular mutation, however, has both exploration capability and exploitation tendency, because its directed perturbation resembles the concept of a gradient: the difference vectors are directed from the worst to the better to the best vectors. It can therefore explore the landscape of the objective function in different subregions around the best vectors, including outside the trigonometric region defined by the triangle, throughout the optimization process. Moreover, as noted above, the convex combination vector x⃗_c is a weighted sum of the three randomly selected vectors in which the best vector has the most significant contribution, so the mutation is biased toward the best vector while moving away from the worst. This concentrates exploitation in promising subregions, while the many different sizes and shapes of triangles formed in the feasible region throughout the optimization process preserve global exploration.

Parameter Adaptation Schemes in ANDE.
The successful performance of the DE algorithm depends significantly on the choice of its three control parameters: the scaling factor F, the crossover rate CR, and the population size NP [4, 24]. They play a vital role because they greatly influence the effectiveness, efficiency, and robustness of the algorithm. Furthermore, it is difficult to determine the optimal values of the control parameters for a variety of problems with different characteristics at different stages of evolution. In the proposed ANDE algorithm, NP is kept as a user-specified parameter, since it highly depends on problem complexity. Generally speaking, F is an important parameter that controls the evolving rate of the population; that is, it is closely related to convergence speed [15]. A small F value encourages the exploitation tendency of the algorithm, making the search focus on the neighborhood of the current solutions and hence enhancing convergence speed, but it may also lead to premature convergence [41]. On the other hand, a large F value improves the exploration capability of the algorithm, distributing the mutant vectors widely in the search space and increasing population diversity [40], but it may slow down the search [41]. With respect to the scaling factors in the proposed algorithm, at each generation G, the scale factors F1, F2, and F3 of each individual target vector are independently generated according to a uniform distribution in (0, 1) to enrich the search behavior. The crossover rate CR reflects the probability with which the trial individual inherits the actual individual's genes, that is, which and how many components are mutated in each element of the current population [17, 41]. CR practically controls the diversity of the population [40]. If CR is high, population diversity increases, but the stability of the algorithm may decrease. On the other hand, small values of CR increase the possibility of stagnation, which may weaken the ability of the algorithm to open up new regions of the search space. Additionally, CR is usually sensitive to problem characteristics such as unimodality versus multimodality and separability versus nonseparability. For separable problems, CR in the range (0, 0.2) works best, while for multimodal, parameter-dependent problems CR in the range (0.9, 1) is suitable [42]. There is a wide variety of approaches for adapting or self-adapting control parameter values during the optimization process. Most of these methods are based on generating random values from uniform, normal, or Cauchy distributions, or on selecting values from a predefined parameter candidate pool; in addition, they use previous experience (of generating better solutions) to guide the adaptation of these parameters [11, 15-17, 19, 40, 43-46]. The present work proposes a novel self-adaptation scheme for CR.
The core idea of the proposed self-adaptation scheme for the crossover rate CR is based on the following principle. In the initial stage of the search process, the difference among individual vectors is large, because the vectors in the population are completely dispersed and the population diversity is high due to the random distribution of the individuals in the search space; this calls for a relatively small crossover rate. Then, as the population evolves through generations, the diversity of the population decreases, as the vectors become clustered and each individual gets closer to the best vector found so far. Consequently, in order to maintain population diversity and improve convergence speed, larger crossover values should gradually be used as the generations of evolution increase, so as to preserve good genes as far as possible and promote convergence performance. In this way, population diversity can be maintained through generations. However, there is no single CR value that balances both diversity and convergence speed for a given problem over the whole optimization process. Consequently, to address this problem, and following the SaDE algorithm [15], a novel adaptation scheme for CR is developed in this paper that can benefit from past experience across the generations of evolution.
Crossover Rate Adaptation. At each generation G, the crossover probability CR_i of each individual target vector is independently generated at random from the pool of CR values according to a uniform distribution, and Procedure 1 is applied through the generations.
In Procedure 1, the pool of crossover rate (CR) values changes during and after the learning period LP; we set LP = 10% of GEN, where G is the current generation number and GEN is the maximum number of generations. The lower and upper limits of the ranges are determined experimentally. CR_Flag_List contains one of two binary values (0, 1) for each individual at generation G, where 0 represents failure (no improvement, when the target vector is better than the trial vector during and after the learning period) and 1 represents success (improvement, when the trial vector is better than the target vector during and after the learning period). Failure_counter_list monitors the performance of individuals in terms of fitness value during the generations after the learning period is completed: if there is no improvement in fitness, the failure counter of the target vector is increased by one. This process is repeated until the counter reaches a prespecified Max_failure_counter, which is set to 20 as determined experimentally. CR_Ratio_List records, for each value in the pool, the relative change improvement ratio between the trial and target objective function values at generation G. It can be clearly seen from Procedure 1 that, at G = 1, CR = 0.05 for each target vector. Then, at each generation G, if the generated trial vector is better than the target vector, the relative change improvement ratio (RCIR) associated with the current CR value is computed and the corresponding ratio is updated. During the learning period, if the target vector is better than the trial vector, the CR value is chosen randomly from the associated pool of CR values (to which more values are gradually added according to the generation number); in this case there is no improvement for this CR value, and its ratio remains unchanged. After termination of the learning period, if the target vector is better than the trial vector, that is, if there is no improvement in fitness, the failure counter is increased by one in each generation until it reaches the prespecified Max_failure_counter of 20; the CR value is then changed to a new value randomly selected from the pool, which covers the range 0.1-0.9 in steps of 0.1, with 0.05 and 0.95 also included as the lower and upper values, respectively. Note that the RCIR is only updated if there is an improvement; otherwise, it remains constant. Thus, the CR value with the maximum ratio changes continuously according to the evolution process at each subsequent generation. In fact, although all test problems included in this study have an optimum of zero, the absolute value is used in calculating the RCIR, as a general rule, in order to handle positive, negative, or mixed objective function values.
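The per-individual CR bookkeeping after the learning period can be sketched as follows. This is a simplified sketch: the variable names and the structure of the update function are our own, and only the pool values, the initial CR of 0.05, and the failure threshold of 20 come from the text.

```python
import numpy as np

# Pool: 0.05, then 0.1-0.9 in steps of 0.1, then 0.95 (as described above)
CR_POOL = [0.05] + [round(0.1 * k, 2) for k in range(1, 10)] + [0.95]
MAX_FAILURE_COUNTER = 20

def update_cr(cr, improved, rcir, fail_count, ratio_table, rng):
    """Return the (possibly re-sampled) CR value and updated failure count.

    improved: whether the trial vector beat the target this generation.
    rcir: relative change improvement ratio for the current CR (assumed
    to be computed elsewhere from the two objective values)."""
    if improved:
        # accumulate the RCIR credited to this CR value and reset failures
        ratio_table[cr] = ratio_table.get(cr, 0.0) + rcir
        return cr, 0
    fail_count += 1
    if fail_count >= MAX_FAILURE_COUNTER:
        # stagnating CR is replaced by a uniform draw from the pool
        return float(rng.choice(CR_POOL)), 0
    return cr, fail_count

rng = np.random.default_rng(5)
ratios = {}
cr, fails = 0.05, 0
for _ in range(25):                 # 25 consecutive failures
    cr, fails = update_cr(cr, False, 0.0, fails, ratios, rng)
```

After 20 consecutive failures the CR is re-sampled and the counter resets, so five more failures leave the counter at 5.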
Concretely, Procedure 1 shows that, during the first half of the learning period, the construction of the pool of CR values preserves population diversity: the crossover probability of each target individual increases gradually, in a staircase fashion, as the generations of the evolution process increase, taking into consideration that the probability of choosing small CR values is greater than the probability of choosing larger CR values while the diversity of the population is still high. Additionally, in the second half of the learning period, the larger values 0.9 and 0.95 are added to the pool, as they favor nonseparable functions. All values then have an equally likely chance of occurrence, which maintains diversity across different values of CR. Consequently, the successful CR values with a high relative change improvement ratio in this period survive to be used in subsequent generations of the optimization process until they fail to achieve improvement 20 times, at which point they are replaced randomly by new values. Thus, the value of CR adapts as the diversity of the population changes through generations; it varies from one individual to another during the generations, and it also differs from one function being optimized to another. Generally, adaptive control parameters taking different values during successive generations enrich the algorithm with controlled randomness, which enhances its global optimization performance in terms of exploration and exploitation capabilities. Therefore, it can be concluded that the proposed adaptation scheme for gradually changing the crossover rate can benefit from the past experience of the individuals in the search space during the evolution process, which in turn can considerably balance the common trade-off between population diversity and convergence speed.

It is worth noting that although this work is an extension and modification of our previous work in [47], there are significant differences, as follows: (1) The work in [47] was proposed for solving small scale unconstrained problems (i.e., 10, 30, and 50 dimensions), whereas this work targets large scale unconstrained problems with 1000 dimensions. (2) The crossover rate in [47] is given by a dynamic nonlinear increasing probability scheme, whereas this work develops a novel self-adaptation scheme for CR that benefits from past experience through the generations of evolution. (3) In [47], the mutation uses only one difference vector, between the best and the worst individuals among the three randomly selected vectors, with one scaling factor (a uniform random number in (0, 1)); in this work, three difference vectors between the tournament best, better, and worst selected vectors, with three corresponding scaling factors independently generated according to a uniform distribution in (0, 1), are used in the mutation scheme. (4) In this work, the triangular mutation rule is combined with the basic mutation strategy DE/rand/1/bin through a fixed probability, whereas in the previous work [47], the triangular mutation strategy is embedded into the DE algorithm and combined with DE/rand/1/bin through a nonlinear decreasing probability rule. (5) The previous work [47] uses a restart mechanism based on random mutation and modified BGA mutation to avoid stagnation or premature convergence, whereas this work does not.

Benchmark Functions.
The performance of the proposed ANDE algorithm has been tested on the 20 scalable optimization functions of the CEC 2010 special session and competition on large scale global optimization. A detailed description of these test functions can be found in [1]. The 20 test functions can be divided into four classes: (1) separable functions F1-F3; (2) partially separable functions, in which a small number (m = 50) of variables are dependent while all the remaining ones are independent, F4-F8; (3) partially separable functions that consist of multiple independent subcomponents, each of which is nonseparable (m = 50), F9-F18; (4) fully nonseparable functions F19-F20. The sphere function, the rotated elliptic function, Schwefel's Problem 1.2, the Rosenbrock function, the rotated Rastrigin function, and the rotated Ackley function are used as the basic functions. The control parameter used to define the degree of separability of a given function in the test suite is set to m = 50, and the dimension (D) of the functions is 1000.

Parameter Settings and Involved Algorithms.
To evaluate the performance of the algorithm, experiments were conducted on the test suite. We adopt the solution error measure (f(x) - f(x*)), where x is the best solution obtained by the algorithm in one run and x* is the known global optimum of each benchmark function; it is recorded after 1.2E+05, 6.0E+05, and 3.0E+06 function evaluations (FEs). All experiments for each function were run 25 times independently, and statistical results are provided, including the best, median, mean, and worst results and the standard deviation. The population size in ANDE was set to 50. The learning period (LP) and the maximum failure counter (MFC) are set to 10% of the total generations and 20 generations, respectively. For the separable functions f1-f3, CR is set to 0.05. ANDE was compared to population-based algorithms that were all tested on this test suite in this competition. These algorithms are (i) DE Enhanced by Neighborhood Search for Large Scale Global Optimization (SDENS) [37], (ii) Large Scale Global Optimization using Self-Adaptive Differential Evolution (jDElsgo) [36], (iii) Cooperative Coevolution with Delta Grouping for Large Scale Nonseparable Function Optimization (DECC-DML) [27], (iv) the Differential Ant-Stigmergy Algorithm for Large Scale Global Optimization (DASA) [35], (v) two-stage based Ensemble Optimization for Large Scale Global Optimization (EOEA) [28], (vi) Memetic Algorithm Based on Local Search Chains for Large Scale Continuous Global Optimization (MA-SW-CHAINS) [38], and (vii) Dynamic Multiswarm Particle Swarm Optimizer with Subregional Harmony Search (DMS-PSO-SHS) [39].
Note that since paper [48] was not accepted (based on private communication with its author), it was excluded from this comparison. To compare the solution quality of the different algorithms from a statistical angle and to check the behavior of the stochastic algorithms [38, 49], the results are compared using the multiproblem Wilcoxon signed-rank test at a significance level of 0.05. The Wilcoxon signed-rank test is a nonparametric statistical test that allows us to judge the difference between paired scores when the assumptions required by the paired-samples t-test cannot be made, for example, that the populations are normally distributed. R+ denotes the sum of ranks for the test problems in which the first algorithm performs better than the second algorithm (in the first column), and R- represents the sum of ranks for the test problems in which the first algorithm performs worse than the second algorithm (in the first column). Larger rank sums indicate a larger performance discrepancy. The numbers in the better, equal, and worse columns denote the number of problems in which the first algorithm is better than, equal to, or worse than the second algorithm. As the null hypothesis, it is assumed that there is no significant difference between the mean results of the two samples, whereas the alternative hypothesis is that there is a significant difference between the mean results of the two samples; the number of test problems is n = 20 for 1.25E+05, 6.00E+05, and 3.00E+06 function evaluations at the 5% significance level. The smaller of the two rank sums is used as the test value and compared with the critical value, or the p value is compared with the significance level. The null hypothesis is rejected if the test value is less than or equal to the critical value or if the p value is less than or equal to the significance level (5%). Based on the result of the test, one of three signs (+, -, and ≈) is assigned for the comparison of any two algorithms (shown in the last column), where the (+) sign means the first algorithm is significantly better than the second, the (-) sign means the first algorithm is significantly worse than the second, and the (≈) sign means that there is no significant difference between the two algorithms. To perform a comprehensive evaluation, the presentation of the experimental results is divided into three subsections. First, the effectiveness of the proposed self-adaptive crossover rate scheme, the modified basic differential evolution, and the new triangular mutation scheme is evaluated. Second, overall performance comparisons between ANDE, ANDE-1, and ANDE-2 and other state-of-the-art DE and non-DE approaches are provided. Finally, the effect of the proposed modifications and parameter settings on the performance of the ANDE algorithm is discussed.
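The R+ and R- rank sums described above can be computed with a short, dependency-free sketch; the paired error values in the usage note below are made-up placeholders, not results from the paper.

```python
def wilcoxon_rank_sums(errors_a, errors_b):
    """Compute the R+ / R- rank sums of the Wilcoxon signed-rank test for
    two paired lists of solution errors (smaller error = better).
    R+ sums the ranks where algorithm A beats B; R- the opposite.
    Zero differences are discarded; tied absolute differences get the
    average rank, as is standard for this test."""
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    # Indices of differences, ordered by absolute magnitude.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    r_plus = sum(r for d, r in zip(diffs, ranks) if d < 0)   # A better
    r_minus = sum(r for d, r in zip(diffs, ranks) if d > 0)  # B better
    return r_plus, r_minus
```

For example, `wilcoxon_rank_sums([1, 2, 3], [2, 1, 5])` yields `(4.5, 1.5)`: the smaller of the two sums is then compared against the critical value for the given n, exactly as the procedure in the text describes.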

Experimental Results and Discussions.
First, some trials were performed to evaluate the benefits and effectiveness of the proposed new triangular mutation and self-adaptive crossover rate on the performance of ANDE. Two different versions of ANDE, denoted ANDE-1 and ANDE-2, have been tested and compared against the proposed algorithm.
(1) ANDE-1, which is the same as ANDE except that only the new triangular mutation scheme is used. (2) ANDE-2, which is the same as ANDE except that only the basic mutation scheme is used. The solution errors of ANDE and the two variants ANDE-1 and ANDE-2 were recorded at 1.2E+05, 6.0E+05, and 3.0E+06 function evaluations (FEs); all experiments for each function were run 25 times independently, and statistical results are provided, including the best, median, mean, and worst results and the standard deviation, in the supplemental file (Tables S1-S3 in the Supplementary Material available online at https://doi.org/10.1155/2017/7974218).
In this section, we compare directly the mean results obtained by ANDE, ANDE-1, and ANDE-2.
Tables S1, S2, and S3 contain the results obtained by ANDE, ANDE-1, and ANDE-2 at 1.2E+05, 6.0E+05, and 3.0E+06 function evaluations (FEs), respectively. To mark the best algorithm, the best median and mean for each function are highlighted in boldface. The following characteristics can be clearly observed: (i) For the majority of the functions, the differences between mean and median are small, even in cases where the final results are far from the optimum, regardless of the number of function evaluations. This implies that ANDE and its versions are robust algorithms. (ii) For many functions, the results with FEs = 3.0E+06 are significantly better than those with FEs = 6.0E+05, and the results with FEs = 6.0E+05 are in turn significantly better than those with FEs = 1.2E+05. Therefore, ANDE, ANDE-1, and ANDE-2 benefit from additional FEs, and there are continual improvements until the maximum FEs are reached.
From Table S1, we can see that all algorithms perform well on several problems and that there is no significant difference between them there. However, on the remaining problems, it can be clearly seen that the performance of ANDE and ANDE-1 is almost similar and that they outperform ANDE-2, while ANDE-2 performs somewhat better than ANDE and ANDE-1 on f10, f15, f16, and f20. On the other hand, convergence behavior is another important factor that must be considered in the comparison among the proposed algorithms. Therefore, in order to analyze the convergence behavior of each algorithm, the convergence graph of the median run has been plotted for each test problem. Figure S1 presents the convergence characteristics of all algorithms. From Figure S1, it can be observed that the convergence behavior supports the above analysis and discussion. Finally, it can be clearly deduced that the remarkable performance of the ANDE and ANDE-1 algorithms is due to the proposed mutation scheme, which has both exploration ability and exploitation tendency. Additionally, it is also visible that the proposed self-adaptive crossover procedure enhances the performance of the basic DE algorithm and significantly promotes the performance of the ANDE and ANDE-1 algorithms. In order to compare the performance of the proposed algorithms ANDE, ANDE-1, and ANDE-2 at the statistical level, the multiproblem Wilcoxon signed-rank test at a significance level of 0.05 was performed on the mean errors of all problems (with 1.25E+05 FEs, 6.00E+05 FEs, and 3.00E+06 FEs), and the results are presented in Tables 1, 2, and 3, respectively. From Table 1, it can be clearly seen that there is no significant difference between ANDE-1, ANDE-2, and ANDE. Moreover, it can be seen from Tables 2 and 3 that ANDE and ANDE-1 are significantly better than ANDE-2, while there is no significant difference between ANDE-1 and ANDE. Although ANDE-2 achieved good performance in the initial phase of the search, it cannot maintain the same performance during the rest of the optimization phases, as expected, because it depends only on the basic mutation, which lacks exploitation capability. On the other hand, ANDE and ANDE-1 have almost the same excellent performance due to the new triangular mutation.
(i) For many test functions, the worst results obtained by the proposed algorithms are better than the best results obtained by the other algorithms at all FEs. (ii) For many test functions, there is continuous improvement in the results obtained by our proposed algorithms, especially ANDE and ANDE-1, at all FEs, and the results at FEs = 6.0E+05 are very close to the results at FEs = 3.0E+06 obtained by some of the compared algorithms, which indicates that our proposed approaches are scalable and can balance the exploration and exploitation abilities well when solving high-dimensional problems until the maximum FEs are reached. (iii) For many functions, the remarkable performance of ANDE and its two versions at FEs = 1.20E+05 and FEs = 6.0E+05, compared to the performance of the other algorithms, shows their fast convergence behavior. The results of the multiproblem Wilcoxon signed-rank test at the three FE budgets are presented in Tables 4, 5, and 6, respectively, where R+ is the sum of ranks for the functions in which the first algorithm outperforms the second algorithm in the row, and R- is the sum of ranks for the opposite.
From Table 4, it can be clearly seen that ANDE is significantly better than the SDENS, jDElsgo, and DECC-DML algorithms and significantly worse than the DASA, EOEA, and MA-SW-CHAINS algorithms; there is no significant difference between DMS-PSO-SHS and ANDE. Likewise, ANDE-1 is significantly better than the SDENS and jDElsgo algorithms and significantly worse than the DASA and MA-SW-CHAINS algorithms, while there is no significant difference between DMS-PSO-SHS, DECC-DML, EOEA, and ANDE-1. Regarding ANDE-2, it is significantly better than the SDENS and jDElsgo algorithms and significantly worse than the DASA, EOEA, and MA-SW-CHAINS algorithms; there is no significant difference between ANDE-2 and DMS-PSO-SHS and DECC-DML.
On the other hand, from Table 5, it can be clearly seen that ANDE is significantly better than the SDENS, jDElsgo, and DECC-DML algorithms and significantly worse than the MA-SW-CHAINS algorithm, while there is no significant difference between DMS-PSO-SHS, DASA, EOEA, and ANDE. Besides, ANDE-1 is significantly better than … Rosenbrock) with all types of modality and nonseparability. This proves that the proposed triangular mutation with a small value of CR performs well on some unimodal, multimodal, and nonseparable problems. For the remaining problems, ANDE can perform well by finding the unknown optimal CR value appropriate for these types of problems. Additionally, this also shows the effectiveness of our proposed idea of gradually increasing the crossover rate during the initial stage of the search process. In fact, the main purpose of this comparison is to show that the proposed ANDE algorithm can achieve the results that some of the problems reach only when CR is tuned manually. This indicates that the proposed mutation scheme has good exploration and exploitation abilities given appropriate CR values for each problem, and that the proposed self-adaptive scheme for CR plays a vital role in determining suitable CR values for all problems during the optimization process. Therefore, the statistical analysis of the results in Table 7 confirms that ANDE with the proposed self-adaptive crossover scheme outperforms the other compared ANDE versions with a constant CR value and with one fixed list of CR values.

Parametric Study on the Learning Period Parameter.
The crossover rate (CR) practically controls the diversity of the population, and it is directly affected by the value of the learning period parameter. In this subsection, some trials were performed to determine suitable values of the learning period (LP). Thus, the performance of ANDE was investigated under two additional LP values (5% GEN and 20% GEN), and the results were compared to those obtained with the value of 10% GEN. To analyze the sensitivity of this parameter, we tested two extra configurations: (1) ANDE LP=5%, which is the same as ANDE except that the learning period was set to LP = 5% GEN; (2) ANDE LP=20%, which is the same as ANDE except that the learning period was set to LP = 20% GEN. The statistical results, including the best, median, mean, and worst results and the standard deviation over 25 independent runs of these algorithms, are summarized in Table S8.
It can be clearly seen from Table 8 that ANDE is significantly better than the ANDE LP=20% algorithm. Besides, there is no significant difference between ANDE LP=5% and ANDE. However, Table 8 shows that ANDE obtains a higher R+ value, which means that ANDE is overall better than ANDE LP=5%. Moreover, from Table 8, ANDE outperforms ANDE LP=5% and ANDE LP=20% on 13 out of 17 problems while losing to them on 4 problems, which indicates that the performance of ANDE with LP = 10% is better than its performance with the other two learning periods. However, this does not mean that an LP of 10% is suitable for all problems at all function evaluation budgets. Indeed, a closer look at Tables 2 and 3 reveals that the solution of f6 obtained by the three algorithms using 6.00E+05 FEs is better than the solution achieved using 3.00E+06 FEs. The main reason for this can be deduced from the convergence plot for f6 presented in Figure S1. It shows very fast convergence of all algorithms in the first 2.0E+05 evaluations, which is considered premature convergence, meaning that, in the rest of the given time interval, the algorithms move slowly in the neighborhood of a local optimum. This is due to the delay in finding a suitable crossover value during the learning period, which is 6000 generations. Practically, we applied the same learning period for the 6.00E+05 FEs budget, as can be seen in Table S2. Therefore, the learning period plays a vital role in the optimization process, and it surely varies from one problem to another; future research will focus on how to control the learning period. Overall, the chosen learning period is suitable for all the other problems, as explained.

Parametric Study on the Maximum Failure Counter.
In this subsection, some trials were performed to determine appropriate values of the maximum failure counter (MFC). The crossover rate (CR) practically controls the diversity of the population, and it is directly affected by the value of the maximum failure counter parameter. Thus, the performance of ANDE was investigated under three additional maximum failure counter values (0, 10, and 30), and the results were compared to those obtained with the default value of 20.
To analyze the sensitivity of this parameter, we tested three extra configurations: (1) ANDE MFC=0, which is the same as ANDE except that the maximum failure counter was set to MFC = 0; (2) ANDE MFC=10, which is the same as ANDE except that the maximum failure counter was set to MFC = 10; (3) ANDE MFC=30, which is the same as ANDE except that the maximum failure counter was set to MFC = 30. The statistical results, including the best, median, mean, and worst results and the standard deviation over 25 independent runs of these algorithms, are summarized in Table S9.
It can be clearly seen from Table 9 that ANDE is significantly better than the ANDE MFC=0 algorithm, which shows the vital role and benefit of introducing the MFC into the proposed self-adaptive CR scheme. Besides, there is no significant difference between ANDE MFC=10 and ANDE. Moreover, ANDE outperforms ANDE MFC=10 on 9 out of 17 problems, loses on 7, and ties on 1, indicating that the performance of ANDE with MFC = 10 is almost similar to its performance with MFC = 20. However, ANDE is significantly better than ANDE MFC=30, which indicates that increasing the MFC value too far has a negative effect that can lead to significant deterioration in the performance of the ANDE algorithm. From the above statistical analysis, it can be concluded that ANDE with the self-adaptive crossover rate, a learning period (LP) of 5% or 10%, and a maximum failure counter (MFC) of 10-20 generations performs well on the majority of the problems.
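The interplay between the learning period and the maximum failure counter in the self-adaptive CR scheme can be sketched roughly as follows. The CR pool, the improvement-ratio bookkeeping, and the function name are illustrative assumptions, not the paper's exact procedure; the sketch only captures the idea that CR values are explored during the learning period, the best-performing value is exploited afterwards, and MFC consecutive failures trigger re-exploration.

```python
import random

def select_cr(cr_pool, improvement_ratio, fail_count, mfc=20, learning=False):
    """Rough sketch of self-adaptive CR selection.
    - During the learning period, sample CR values from the pool at random
      so every value gets credited or penalized by its outcomes.
    - Afterwards, exploit the CR with the highest recorded improvement ratio.
    - If the chosen CR has failed for `mfc` consecutive generations, fall
      back to random sampling to escape a stale choice."""
    if learning or fail_count >= mfc:
        return random.choice(cr_pool)
    return max(cr_pool, key=lambda c: improvement_ratio[c])
```

With this structure, MFC = 0 degenerates to purely random CR selection (never exploiting past experience), which is consistent with the observed inferiority of the ANDE MFC=0 configuration, while a very large MFC can lock the search onto a CR value that has stopped helping.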

Conclusion
In order to efficiently concentrate the exploitation tendency in promising subregions of the search space and to significantly promote the exploration capability over the whole search space during the evolutionary process of the conventional DE algorithm, a Differential Evolution (ANDE) algorithm for solving large scale global numerical optimization problems over continuous space was presented in this paper. The proposed algorithm introduced a new triangular mutation rule based on the convex combination vector of the triplet defined by the three randomly chosen vectors and the difference vectors between the best, better, and worst individuals among the three randomly selected vectors. The mutation rule is combined with the basic mutation strategy DE/rand/1/bin, where the triangular mutation rule is applied with a probability of 2/3 since it has both exploration ability and exploitation tendency. Furthermore, we proposed a novel self-adaptive scheme for the gradual change of the crossover rate values that benefits from the past experience of the individuals in the search space during the evolution process, which in turn considerably balances the common trade-off between population diversity and convergence speed. The proposed mutation rule was shown to enhance the global and local search capabilities of basic DE and to increase the convergence speed. The algorithm was evaluated on the standard high-dimensional benchmark problems. The comparison results between ANDE and its versions and the other seven state-of-the-art evolutionary algorithms, all tested on this test suite at the IEEE Congress on Evolutionary Computation 2010 competition, indicate that the proposed algorithm and its two versions are highly competitive for solving large scale global optimization problems. The experimental results and comparisons showed that the ANDE algorithm performed better on large scale global optimization problems of different types and complexity;
it performed better with regard to search process efficiency, final solution quality, convergence rate, and robustness when compared with the other algorithms. Finally, the performance of the ANDE algorithm was statistically superior to or competitive with other recent and well-known DE and non-DE algorithms. The effectiveness and benefits of the proposed modifications used in ANDE were experimentally investigated and compared. It was found that the two proposed algorithms ANDE and ANDE-1 are competitive in terms of solution quality, efficiency, convergence rate, and robustness; they were statistically superior to the state-of-the-art DE and non-DE algorithms, and they performed better than the ANDE-2 algorithm in many cases, although the performance of ANDE-2 remained competitive with some of the compared algorithms. Several current and future works can be developed from this study. Firstly, current research effort focuses on how to control the scaling factors by a self-adaptive mechanism and on developing another self-adaptive mechanism for the crossover rate. Additionally, a new version of ANDE combined with the Cooperative Coevolution (CC) framework is being developed and will be experimentally investigated soon. Moreover, future research will investigate the performance of the ANDE algorithm in solving constrained and multiobjective optimization problems as well as real-world applications such as data mining and clustering problems. In addition, large scale combinatorial optimization problems will be taken into consideration. Another possible direction is integrating the proposed triangular mutation scheme into the compared and other self-adaptive DE variants, as well as combining the proposed self-adaptive crossover with other DE mutation schemes. Additionally, a promising research direction is joining the proposed triangular mutation with evolutionary algorithms, such as genetic algorithms, harmony search, and particle swarm optimization, as well as foraging algorithms such as artificial bee colony, the bees algorithm, and Ant Colony Optimization.

Algorithm 2: Description of the ANDE algorithm (recovered step: if the trial vector is better than the target vector through generations, then select the CR value from A with the maximum relative change improvement ratio (RCIR) from CR Ratio List).

In an exponential crossover, an integer value n is randomly chosen within the range {1, D}. This integer acts as a starting point in x_i,G, from where the crossover or exchange of components with v_i,G+1 starts. Another integer value L (denoting the number of components) is then chosen from the interval {1, D - n}. The trial vector u_i,G+1 is created by inheriting the values of the variables in locations n to n + L from v_i,G+1 and the remaining ones from x_i,G.

2.4. Selection. DE adopts a greedy selection strategy. If and only if the trial vector u_i,G+1 yields a fitness function value as good as or better than that of x_i,G, then u_i,G+1 is set as x_i,G+1 [27].

Cooperative Coevolution (CC) has become a popular and effective technique in evolutionary algorithms (EAs) for large scale global optimization since its initiation in the publication of Potter and De Jong [25]. The main idea of CC is to partition the LSGO problem into a number of subproblems; that is, the decision variables of the problem are divided into smaller subcomponents, each of which is optimized using a separate EA. By using this divide-and-conquer method, classical EAs are able to effectively solve many separable problems [25]. CC shows better performance on separable problems but deteriorates on nonseparable problems, because the interacting variables cannot be grouped into one subcomponent. Recently, different versions of CC-based EAs have been developed and have shown excellent performance. Yang et al. [26] proposed a new decomposition strategy called random grouping as a simple way of increasing the probability of grouping interacting variables in one subcomponent. According to this strategy, without any prior knowledge of the nonseparability of a problem, the D-dimensional decision vector is subdivided into s-dimensional subcomponents. Later, Omidvar et al. [27] proposed the DECC-DML algorithm, which is a Differential Evolution algorithm adopting the CC frame. They suggested a new decomposition strategy called delta grouping. The central idea of this technique is that the improvement interval of interacting variables would be limited if they were placed in different subcomponents. The delta method measures the averaged difference in a certain variable across the entire population and uses it for identifying interacting variables. The experimental results show that this method is more effective than the existing random grouping method. However, DECC-DML is less efficient on nonseparable functions with more than one group of rotated variables. Likewise, Wang and Li [28] proposed another CC-based technique, named EOEA, to handle LSGO problems, in which the search procedure is divided into two stages: (1) the global shrinking stage and (2) …
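The exponential crossover and the greedy selection described in this section can be sketched as follows (0-based indices here; `f` stands for any objective function to be minimized, and the helper names are illustrative):

```python
import random

def exponential_crossover(target, mutant):
    """Exponential crossover: choose a random starting index n and a
    block length L from {1, ..., D - n}, copy that contiguous block of
    components from the mutant into the trial vector, and inherit the
    remaining components from the target vector."""
    D = len(target)
    trial = list(target)
    n = random.randrange(D)       # starting point (0-based)
    L = random.randint(1, D - n)  # number of consecutive components
    trial[n:n + L] = mutant[n:n + L]
    return trial

def greedy_select(target, trial, f):
    """Greedy DE selection: the trial vector replaces the target only if
    its fitness is as good as or better (minimization)."""
    return trial if f(trial) <= f(target) else target
```

Because the copied block is contiguous, exponential crossover tends to preserve groups of neighboring variables, in contrast to binomial crossover, which samples each component independently.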

Table 1: Results of the multiple-problem Wilcoxon's test for ANDE, ANDE-1, and ANDE-2 over all functions at a significance level of 0.05 (with 1.25E+05 FEs).

It can be seen from Table S2 that ANDE-2 performs significantly better than the ANDE and ANDE-1 algorithms on the separable functions (f1, f2). For f3 and the single-group m-nonseparable functions (f4-f8), the performance of the three algorithms is almost similar, with the exception of test function f6, where ANDE-2 performs better than the ANDE and ANDE-1 algorithms. Regarding the D/2m-group m-nonseparable functions (f9-f13), the performance of the three algorithms is almost similar for f9. ANDE-2 outperforms ANDE and ANDE-1 on f10 and f11, while on f12 it loses to them. For f13, ANDE outperforms the other two algorithms. For the D/m-group m-nonseparable functions (f14-f18), ANDE and ANDE-1 perform best, and ANDE-2 only performs better than ANDE and ANDE-1 on f16; the results of ANDE-1 on f13 and f17 are relatively better than those of ANDE. As for the fully nonseparable functions (f19-f20), it is obvious that ANDE and ANDE-1 perform better than ANDE-2, and on f20 ANDE performs significantly better than the ANDE-1 and ANDE-2 algorithms. As can be seen from Table S3, the ANDE, ANDE-1, and ANDE-2 algorithms were able to nearly reach the global optimal solution with high consistency on the separable f1 (unimodal) and f3 (multimodal) functions. However, ANDE-2 performs better than ANDE and ANDE-1 on the separable multimodal function f2. Besides, ANDE, ANDE-1, and ANDE-2 got close to the optimum of the single-group nonseparable multimodal function f6, while only ANDE and ANDE-1 were also very close to the optimal solution of the unimodal function f7. As for f4, f5, and f8, ANDE-1 outperforms ANDE and ANDE-2. Regarding the