An Adaptive Cauchy Differential Evolution Algorithm for Global Numerical Optimization

Appropriately adapting the control parameters, such as the scaling factor (F), crossover rate (CR), and population size (NP), is one of the major problems in the Differential Evolution (DE) literature. A well-designed adaptive or self-adaptive parameter control method can greatly improve the performance of DE. Although many methods for adapting the control parameters have been suggested, adapting them properly for a given problem remains a challenging task. In this paper, we present an adaptive parameter control DE algorithm. In the proposed algorithm, each individual has its own control parameters. The control parameters of each individual are adapted by using the Cauchy distribution, whose location is the average of the parameter values of the successfully evolved individuals. Through this, the control parameters of each individual are assigned either near the average parameter value or far from it, where a better parameter value for the next generation might lie. The experimental results show that the proposed algorithm is more robust than the standard DE algorithm and several state-of-the-art adaptive DE algorithms in solving various unimodal and multimodal problems.


Introduction
Differential Evolution (DE) is a powerful population-based search technique for optimization problems. Many researchers have used DE in practical fields because it has good convergence properties and is easy to apply [1]. DE has three control parameters: the scaling factor (F), the crossover rate (CR), and the population size (NP). The performance of DE is largely influenced by the values assigned to these control parameters. Therefore, finding suitable parameter values is of crucial importance for good optimization performance [2,3]. In DE, the control parameters are usually adjusted by trial-and-error search. However, values found by trial and error might be efficient for one type of problem and inefficient for others [4]. Moreover, the search requires a lot of computational resources. Parameter adaptation has been used as a solution to this problem. According to Eiben et al. [5,6], parameter adaptation can be categorized into three classes as follows.
(1) Deterministic parameter control: the control parameters are adapted by some deterministic rule.
(2) Adaptive parameter control: the control parameters are adapted by some form of feedback from evolutionary search.
(3) Self-adaptive parameter control: the control parameters are adapted by the evolution-of-evolution technique. The control parameters are encapsulated in each individual as additional chromosomes and undergo the evolutionary procedure.
A well-designed adaptive or self-adaptive parameter control method can improve the performance of DE. Therefore, adaptive and self-adaptive parameter controls are more applicable than the trial-and-error search method. So far, many adaptive and self-adaptive DE algorithms have been proposed, and they have shown more robust performance than the standard DE algorithm on many benchmark functions [4,7,8].
The Scientific World Journal
Although there are many suggestions for adapting the control parameters, adapting them properly for a given problem is still a challenging task. Based on various experiments, we found that the parameter adaptation should be performed in every generation and that the control parameters of each individual should be adapted by using the Cauchy distribution, based on the average of the parameter values of the successfully evolved individuals.
The Cauchy distribution is one of the long-tail distributions: it generates a large step away from its peak location with a higher probability than a short-tail distribution does. Many evolutionary algorithms have used this long-tail property as a method for escaping local minima. The proposed algorithm also utilizes the Cauchy distribution, but in a different manner, namely for the parameter adaptation. In the proposed algorithm, each individual has its own control parameters. The control parameters of each individual are adapted by using the Cauchy distribution, based on the average of the parameter values of the successfully evolved individuals. The reason is that the successfully evolved individuals were led by appropriate parameter values. That is to say, appropriate parameter values move the individuals into good regions for solving the problem. However, parameter values that are appropriate in the current generation might be inappropriate in the next generation. Therefore, we cannot assure that a parameter adaptation based only on the average parameter value produces well-suited parameter values for future generations. In view of these considerations, the parameter adaptation of the proposed algorithm utilizes the Cauchy distribution as a large-step method. Accordingly, the control parameters of each individual are assigned either near the average parameter value or far from it, where a better parameter value for the next generation might lie. The experimental results show that the proposed algorithm is more robust than the standard DE algorithm and some adaptive and self-adaptive DE algorithms, such as jDE [4], SaDE [7], and MDE [9], on multimodal problems as well as unimodal problems.
The rest of this paper proceeds as follows. In Section 2, we introduce basic operations of standard DE algorithm and some adaptive and self-adaptive parameter control DE algorithms. In Section 3, the Cauchy distribution is described. The proposed algorithm is explained in detail in Section 4. Section 5 presents the experimental results. We conclude this paper in Section 6.

DE Algorithm.
In the DE, a parent vector is called the "target vector," a mutant vector is generated by mixing donor vectors, and an offspring obtained by crossover between the target vector and the mutant vector is called the "trial vector." A target vector generates a trial vector, which is moved around the search space by the mutation and crossover operations. If the fitness value of the trial vector is better than or equal to the fitness value of the target vector, the trial vector is accepted and included in the population of the next generation; otherwise it is discarded and the target vector remains for the next generation. This cycle of operations is repeated until some specific termination condition is satisfied.

Initialization.
The population of the DE consists of NP individuals. Each individual is a $D$-dimensional parameter vector, denoted as $X_{i,G} = (x^1_{i,G}, \ldots, x^D_{i,G})$, where $i = 1, \ldots, NP$ and $G$ is the generation index. In the initialization stage, the DE first designates the search space of the test problem by prescribing the minimum ($X_{\min} = (x^1_{\min}, \ldots, x^D_{\min})$) and the maximum ($X_{\max} = (x^1_{\max}, \ldots, x^D_{\max})$) parameter bounds. After that, the parameter vector of each individual is initialized as follows:
$$x^j_{i,0} = x^j_{\min} + \mathrm{rand}[0,1] \cdot (x^j_{\max} - x^j_{\min}), \quad j = 1, \ldots, D,$$
where rand[0, 1] is a uniformly distributed random number lying between 0 and 1. By doing this, all the individuals are randomly scattered in the search space. After initialization, the DE executes a loop of the operations: mutation, crossover, and selection.
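The initialization step can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `initialize_population` and its arguments are hypothetical and are not taken from the paper.

```python
import random


def initialize_population(np_size, x_min, x_max, rng=random):
    """Scatter np_size individuals uniformly inside the box [x_min, x_max]."""
    dim = len(x_min)
    return [
        # x_j = x_min_j + rand[0, 1) * (x_max_j - x_min_j) for each dimension j
        [x_min[j] + rng.random() * (x_max[j] - x_min[j]) for j in range(dim)]
        for _ in range(np_size)
    ]


# Example: 5 individuals in a 3-dimensional box with per-dimension bounds.
population = initialize_population(
    5, [-5.0, -5.0, 0.0], [5.0, 5.0, 1.0], random.Random(0)
)
```

Passing a seeded `random.Random` instance keeps runs reproducible, which is convenient when comparing parameter settings.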

Mutation Operation.
The mutation is the first operation to generate child individuals from their parent individuals. So far, many mutation strategies have been proposed. Here, we explain one example, called DE/rand/1/bin, which was introduced by Storn and Price [1]. First of all, the mutation strategy randomly selects three mutually distinct individuals with indices in [1, NP]. They are called the "donor vectors" and are denoted as $X_{r_1,G}$, $X_{r_2,G}$, and $X_{r_3,G}$. A mutant vector $V_{i,G}$ is generated by adding the scaled difference of $X_{r_2,G}$ and $X_{r_3,G}$ to $X_{r_1,G}$:
$$V_{i,G} = X_{r_1,G} + F \cdot (X_{r_2,G} - X_{r_3,G}),$$
where $F$ is a scaling factor for amplifying the difference between $X_{r_2,G}$ and $X_{r_3,G}$.
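A minimal sketch of the DE/rand/1 mutation is given below. The function name `mutate_rand_1` is hypothetical. The sketch also assumes, as is common in DE implementations but not stated in the text above, that the three donor indices differ from the target index `i`, which requires a population of at least four individuals.

```python
import random


def mutate_rand_1(population, i, f, rng=random):
    """Return the mutant V_i = X_r1 + F * (X_r2 - X_r3) for target index i."""
    # Assumption: donors are distinct from each other and from the target i.
    candidates = [k for k in range(len(population)) if k != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    x1, x2, x3 = population[r1], population[r2], population[r3]
    return [a + f * (b - c) for a, b, c in zip(x1, x2, x3)]
```

With `f = 0` the mutant is simply a copy of the first donor, which shows that the scaling factor controls how far the mutant moves away from existing individuals.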

Crossover Operation.
The crossover generates the trial vectors by making a crossover between the target vector and the mutant vector. Two crossover operations are commonly used: the binomial and the exponential crossovers. Here, we describe the binomial crossover. At the beginning, a random number is selected. If the random number is less than or equal to the crossover rate CR, the first element of the trial vector is taken from the first element of the mutant vector. Otherwise, the element is taken from the target vector. This procedure is repeated $D$ times for each individual, once per element. The trial vector $U_{i,G} = (u^1_{i,G}, \ldots, u^D_{i,G})$ is generated as follows:
$$u^j_{i,G} = \begin{cases} v^j_{i,G}, & \text{if } \mathrm{rand}_j[0,1] \le CR \text{ or } j = j_{\mathrm{rand}}, \\ x^j_{i,G}, & \text{otherwise,} \end{cases}$$
where $v^j_{i,G}$ and $x^j_{i,G}$ are the $j$th elements of the mutant vector and the target vector, respectively. Prior to crossover, the DE selects another random integer $j_{\mathrm{rand}}$ in [1, $D$]. This random number is used to guarantee that at least one element of the trial vector is taken from the mutant vector.
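The binomial crossover can be sketched as follows. The function name `binomial_crossover` is hypothetical, and indices are 0-based here rather than 1-based as in the text.

```python
import random


def binomial_crossover(target, mutant, cr, rng=random):
    """Build a trial vector element by element from the mutant or the target."""
    dim = len(target)
    j_rand = rng.randrange(dim)  # this position always comes from the mutant
    return [
        mutant[j] if (rng.random() <= cr or j == j_rand) else target[j]
        for j in range(dim)
    ]
```

Even with `cr = 0`, the forced index `j_rand` keeps the trial vector from being an exact copy of the target, so every trial evaluates a new point unless the mutant and target coincide at that position.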

Selection Operation.
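The selection is the last operation of each DE iteration. It compares the fitness values of the target and the trial vectors. If the fitness value of the trial vector is better than or equal to the fitness value of the target vector, the trial vector is accepted and forms part of the population; otherwise it is discarded and the target vector remains for the next generation. For a minimization problem with objective function $f$, this procedure is formulated as follows:
$$X_{i,G+1} = \begin{cases} U_{i,G}, & \text{if } f(U_{i,G}) \le f(X_{i,G}), \\ X_{i,G}, & \text{otherwise,} \end{cases}$$
where $U_{i,G}$ is the trial vector and $X_{i,G}$ is the target vector.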
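A short sketch of the selection step for a minimization problem is shown below. The names `select` and `sphere` are hypothetical; the sphere function is used only as a stand-in objective. The returned flag records whether the trial vector succeeded, which is the feedback signal that adaptive schemes rely on.

```python
def sphere(x):
    """Stand-in objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


def select(target, trial, fitness):
    """Return (survivor, success) where success means the trial was accepted."""
    if fitness(trial) <= fitness(target):
        return trial, True
    return target, False


survivor, success = select([1.0, 1.0], [0.5, 0.5], sphere)
```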

jDE.
Brest et al. [4] proposed a self-adaptive parameter control DE (called jDE) based on DE/rand/1/bin. In jDE, the control parameters $F$ and CR are encapsulated in each individual as additional chromosomes. Therefore, all individuals have their own control parameters, denoted as $F_i$ and $CR_i$. jDE utilizes four additional parameters: $\tau_1$, $\tau_2$, $F_l$, and $F_u$. The first two parameters are used to determine whether the control parameters need to be updated or not, and the last two parameters designate the range of the control parameter $F$. At the beginning, the values of $F_i$ and $CR_i$ are initialized to 0.5 and 0.9, respectively. Then, the control parameters $F_i$ and $CR_i$ for the next generation are adapted as follows:
$$F_{i,G+1} = \begin{cases} F_l + \mathrm{rand}_1 \cdot F_u, & \text{if } \mathrm{rand}_2 < \tau_1, \\ F_{i,G}, & \text{otherwise,} \end{cases} \qquad CR_{i,G+1} = \begin{cases} \mathrm{rand}_3, & \text{if } \mathrm{rand}_4 < \tau_2, \\ CR_{i,G}, & \text{otherwise,} \end{cases}$$
where $\mathrm{rand}_k$, $k = 1, \ldots, 4$, are uniformly distributed random numbers in [0, 1]. The authors used $\tau_1 = 0.1$, $\tau_2 = 0.1$, $F_l = 0.1$, and $F_u = 0.9$. This procedure is executed before applying the mutation operation. Therefore, the newly generated control parameters affect the mutation and the crossover operations.

DESAP.
In [10], the first self-adaptive population size DE (called DESAP) was proposed, based on the self-adaptive Pareto DE [11]. DESAP can self-adapt not only the scaling factor $F$ and the crossover rate CR but also the population size NP. This algorithm utilizes additional parameters such as $\delta$, $\eta$, and $\pi$. These parameters are encapsulated in each individual as additional chromosomes and also participate in the mutation and the crossover operations so that they evolve themselves. The newly generated control parameters are selected when the fitness value of the trial vector is lower than or equal to the fitness value of the target vector. DESAP is divided into two algorithms (i.e., DESAP-abs and DESAP-rel) according to the equation used for the population size of the next generation. DESAP has shown the effectiveness of the self-adaptive population size technique.

JADE.
Zhang and Sanderson [12,13] proposed a new mutation strategy called DE/current-to-pbest, which is less greedy than DE/current-to-best/1. This strategy utilizes not the best individual of the population but one randomly selected from the top $100p\%$ ($p \in (0, 1]$) individuals. In addition, an external archive scheme was proposed, which stores the parameter vectors of recently discarded individuals. These parameter vectors provide additional information about promising progress directions and increase the population diversity. The following equations represent DE/current-to-pbest with and without the external archive:
(1) DE/current-to-pbest with archive:
$$V_{i,G} = X_{i,G} + F_i \cdot (X^p_{\mathrm{best},G} - X_{i,G}) + F_i \cdot (X_{r_1,G} - \tilde{X}_{r_2,G}),$$
(2) DE/current-to-pbest without archive:
$$V_{i,G} = X_{i,G} + F_i \cdot (X^p_{\mathrm{best},G} - X_{i,G}) + F_i \cdot (X_{r_1,G} - X_{r_2,G}),$$
where $X^p_{\mathrm{best},G}$ is the individual randomly selected from the top $100p\%$ individuals, $X_{r_1,G}$ and $X_{r_2,G}$ are individuals randomly selected from the population, and $\tilde{X}_{r_2,G}$ is an individual randomly selected from the union of the population and the external archive. In terms of parameter adaptation, JADE adapts the crossover rate $CR_i$ as follows:
$$CR_i = \mathrm{rndn}_i(\mu_{CR}, 0.1),$$
where rndn is the Gaussian distributed random number generator. After that, the crossover rate $CR_i$ is truncated to [0, 1]. Moreover, $\mu_{CR}$, the mean value used to generate $CR_i$, is modified as follows:
$$\mu_{CR} = (1 - c) \cdot \mu_{CR} + c \cdot \mathrm{mean}_A(S_{CR}),$$
where $c$ is a constant value in [0, 1], $\mathrm{mean}_A$ stands for the arithmetic mean, and $S_{CR}$ contains the successfully evolved crossover rates of individuals after the selection operation. Similarly, the scaling factor $F_i$ is adapted as follows:
$$F_i = \mathrm{rndc}_i(\mu_F, 0.1),$$
where rndc is the Cauchy distributed random number generator. After that, the scaling factor $F_i$ is truncated to 1 if $F_i \ge 1$ or regenerated if $F_i \le 0$. Also, $\mu_F$, the mean value used to generate $F_i$, is modified as follows:
$$\mu_F = (1 - c) \cdot \mu_F + c \cdot \mathrm{mean}_L(S_F), \qquad \mathrm{mean}_L(S_F) = \frac{\sum_{F \in S_F} F^2}{\sum_{F \in S_F} F},$$
where $c$ is a constant value in [0, 1], $\mathrm{mean}_L$ stands for the Lehmer mean, and $S_F$ contains the successfully evolved scaling factors of individuals after the selection operation.

MDE.
In [9], a Modified Differential Evolution (MDE) was proposed. This algorithm utilizes the Cauchy distribution as another mutation operation. In the selection operation, all individuals are monitored and the results of the selection operation are stored in a failure counter. If an individual consecutively fails to be selected for the next generation more than MFC (Maximum Failure Counter) times, MDE assumes that this individual has fallen into a local minimum. Therefore, the algorithm applies the Cauchy distributed mutation to such individuals, instead of the mutation and the crossover operations, to escape the local minimum. After that, the failure counter is reset to 0. MDE has shown good performance on higher dimensional problems, compared with DE/rand/1/bin.
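The jDE and JADE parameter updates reviewed in this section can be sketched as follows. This is an illustrative sketch rather than the reference implementation of either algorithm. All function names are hypothetical. The scale 0.1 of the Gaussian and Cauchy generators follows the published description of JADE, the default `c = 0.1` is only an example of a constant in [0, 1], and leaving the means unchanged when no individual succeeded is an assumption made here.

```python
import math
import random


def jde_update(f_i, cr_i, rng=random, tau1=0.1, tau2=0.1, f_l=0.1, f_u=0.9):
    """jDE: resample F with probability tau1 and CR with probability tau2."""
    new_f = f_l + rng.random() * f_u if rng.random() < tau1 else f_i
    new_cr = rng.random() if rng.random() < tau2 else cr_i
    return new_f, new_cr


def cauchy_sample(location, scale, rng=random):
    """Draw from a Cauchy distribution by inverting its CDF."""
    return location + scale * math.tan(math.pi * (rng.random() - 0.5))


def lehmer_mean(values):
    """Lehmer mean sum(F^2) / sum(F); it weights larger values more heavily."""
    return sum(v * v for v in values) / sum(values)


def jade_sample_f(mu_f, rng=random):
    """JADE: F_i ~ Cauchy(mu_f, 0.1), regenerated if <= 0, truncated to 1."""
    while True:
        f = cauchy_sample(mu_f, 0.1, rng)
        if f > 0.0:
            return min(f, 1.0)


def jade_sample_cr(mu_cr, rng=random):
    """JADE: CR_i ~ Normal(mu_cr, 0.1), truncated to [0, 1]."""
    return min(1.0, max(0.0, rng.gauss(mu_cr, 0.1)))


def jade_update_means(mu_f, mu_cr, s_f, s_cr, c=0.1):
    """Move the means toward the successful parameter values of this generation."""
    if s_f:  # assumption: keep the old mean when nothing succeeded
        mu_f = (1 - c) * mu_f + c * lehmer_mean(s_f)
    if s_cr:
        mu_cr = (1 - c) * mu_cr + c * (sum(s_cr) / len(s_cr))
    return mu_f, mu_cr
```

The contrast worth noting is that jDE resamples uniformly and ignores which values succeeded, whereas JADE centres its sampling on a running mean of successful values.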

Analysis of the Cauchy Distribution
The Cauchy distribution is a continuous probability distribution with two parameters, $x_0$ and $\gamma$. Here, $x_0$ is the peak location of the distribution and $\gamma$ stands for the half-width at half-maximum (HWHM) of the distribution. The value of $\gamma$ determines the shape of the Cauchy distribution. If $\gamma$ is assigned a lower value, the peak of the probability density function is higher and its width is narrower. On the other hand, if $\gamma$ is assigned a higher value, the probability density function has a lower peak and a wider width. The Cauchy distribution generates a large step away from the peak with a higher probability than a short-tail distribution does. In general, many evolutionary algorithms have used this long-tail property as a technique for escaping local minima. The probability density function and the cumulative distribution function of the Cauchy distribution are defined by
$$f(x; x_0, \gamma) = \frac{1}{\pi \gamma \left[ 1 + \left( \frac{x - x_0}{\gamma} \right)^2 \right]}, \qquad F(x; x_0, \gamma) = \frac{1}{\pi} \arctan\left( \frac{x - x_0}{\gamma} \right) + \frac{1}{2}.$$
Figure 1 illustrates various probability density functions of the Cauchy distribution. The two values listed for each curve denote the location ($x_0$) and the scale ($\gamma$), respectively. In addition, $x_0 = 0$ and $\gamma = 1$ generate the standard Cauchy distribution.
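The density, the distribution function, and a sampler for the Cauchy distribution can be sketched with the standard library alone. The function names are hypothetical, and sampling by inverting the CDF is one common choice, not necessarily the one used in the experiments.

```python
import math
import random


def cauchy_pdf(x, x0=0.0, gamma=1.0):
    """Probability density with peak location x0 and HWHM gamma."""
    z = (x - x0) / gamma
    return 1.0 / (math.pi * gamma * (1.0 + z * z))


def cauchy_cdf(x, x0=0.0, gamma=1.0):
    """Cumulative distribution function."""
    return math.atan((x - x0) / gamma) / math.pi + 0.5


def cauchy_sample(x0=0.0, gamma=1.0, rng=random):
    """Inverse-CDF sampling: x = x0 + gamma * tan(pi * (u - 1/2))."""
    return x0 + gamma * math.tan(math.pi * (rng.random() - 0.5))
```

Evaluating `cauchy_pdf(x0 + gamma, x0, gamma)` gives exactly half of the peak value `cauchy_pdf(x0, x0, gamma)`, which is what "half-width at half-maximum" means for this distribution.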

When Parameter Adaptation Should Be Performed?
Finding the appropriate moments to adapt the control parameters is an important problem for improving DE performance. In this section, we explain when parameter adaptation should be performed. Among previous studies, jDE utilizes a self-adaptive method which allows each individual to maintain suitable control parameter values by itself. However, the parameter adaptation of jDE depends on the predefined probabilities ($\tau_1$ and $\tau_2$). Therefore, this method does not guarantee that the maintained control parameter values are adequate for the current generation. In other words, some individuals may keep unsuitable control parameter values. In SaDE, the scaling factor is calculated in every generation by using a Gaussian distribution with a predefined mean value, and all individuals share it. As for the crossover rate of SaDE, each individual has its own crossover rate, which is calculated by using a Gaussian distribution whose mean is the median of the information accumulated from the selection operation results during the learning period. This adaptation is performed at the end of every learning period. The parameter adaptation of SaDE has two problems. First, although each individual is in a different state during the DE iterations, the scaling factor adaptation of SaDE does not take this into account. Therefore, many individuals might use unsuitable control parameter values. Second, the selection operation results of past generations might become unnecessary or even noisy information for adapting the crossover rate. In addition, similar to jDE, some individuals may keep unsuitable control parameter values during the learning period.
Parameter adaptation should be performed whenever the current control parameter values are not suitable for finding the optimal value. Because the DE is based on elitism, we can use the selection operation results to distinguish whether an individual has suitable control parameters or not. We found that parameter adaptation should be performed in every generation, that is, every generation is an appropriate moment for adapting the control parameters. The reasons are as follows.
(1) If an individual has good control parameter values, its child individual can succeed in the selection operation and reach a better region for finding the optimal value than the region of its parent individual. At this moment, the characteristics of the child's region might differ from those of its parent's region. This means that control parameter values more suitable for the new region than the previous ones may exist. Therefore, even when an individual succeeds in the selection operation, we should apply parameter adaptation to find more suitable control parameter values.
(2) On the contrary, if an individual does not have good parameter values, its child individual might fail in the selection operation, and the individual then remains in the same region as its parent. This indicates that the individual needs more suitable control parameter values to escape the region. Therefore, if an individual fails to evolve, we should also apply parameter adaptation.
As a result, because the individuals of DE evolve to explore new regions, it is hard to assure that previously suitable control parameter values remain suitable until some probability is satisfied or throughout some period. Therefore, parameter adaptation should be performed in every generation.

How Parameter Adaptation Should Be Performed?
Finding a proper method of adapting the control parameters is also an important problem for improving DE performance. In this section, we explain how parameter adaptation should be performed. When performing parameter adaptation, we can utilize the control parameter values of the successfully evolved individuals. The reason is that the successfully evolved individuals were led by good parameter values. That is to say, good parameter values move the individuals into good regions for solving the problem.
Among previous studies, jDE adapts the control parameter values by using the uniform distribution. However, randomly generated control parameter values might not be suitable for finding a better region. In SaDE, the crossover rates of the successfully evolved individuals are stored in the CR Memory. After the learning period, the parameter adaptation of SaDE extracts the median value from the CR Memory and uses it as the mean value of the Gaussian distribution. In general, the median is not strongly influenced by outliers. However, the outliers give us information about the possibility of new, better control parameter values. Therefore, we should take the outliers into account as well.
As a result, parameter adaptation based on the control parameter values of the successfully evolved individuals is appropriate.
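The difference between the median and the arithmetic mean in the presence of an outlier can be seen in a small example. The memory contents below are hypothetical values chosen for illustration, not data from the experiments.

```python
from statistics import mean, median

# Hypothetical scaling factors of successful individuals; 0.95 is an outlier.
success_f_memory = [0.48, 0.50, 0.52, 0.51, 0.95]

location_mean = mean(success_f_memory)      # pulled toward the outlier
location_median = median(success_f_memory)  # unaffected by the outlier's size
```

Here the median stays at 0.51, while the arithmetic mean moves up to about 0.59, so a distribution centred on the mean is shifted toward the unusual value that also led to a success.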

The Proposed Algorithm.
The proposed algorithm makes use of DE/rand/1/bin as its basic framework, whose mutation is one of the less greedy mutation strategies. In general, this mutation strategy is not very efficient in solving unimodal problems, since its lack of a fast convergence property makes the population converge slowly to the global minimum. However, if the control parameters are adapted suitably, this strategy can also perform well on both unimodal and multimodal problems.
The proposed algorithm adjusts two control parameters, $F$ and CR, but not NP. The control parameter NP affects the performance of DE less seriously than the other two control parameters. Prior to explaining the adaptation procedures, the characteristics of these parameters are described. The control parameter $F$ is related to the convergence speed of DE. A higher value of $F$ encourages exploration, which is generally useful in the early stage of DE. On the other hand, a lower value of $F$ promotes exploitation, which is usually desirable in the later stage of DE. Moreover, the value of the control parameter CR is related to the diversity of the population.
The parameter adaptation of the proposed algorithm utilizes the F Memory and the CR Memory. The scaling factors and crossover rates of the successfully evolved individuals are stored in these memories. When performing parameter adaptation, the arithmetic mean function is applied to extract the mean values, and these are used as the location parameters of the Cauchy distribution. The Cauchy distribution is one of the long-tail distributions and generates a large step away from the peak location with a higher probability. Parameter values that are appropriate in the current generation might be inappropriate in the next generation. Therefore, we cannot assure that the average parameter value is still a well-suited parameter value for future generations. In view of these considerations, the parameter adaptation of the proposed algorithm utilizes the Cauchy distribution as a large-step method. Through this, the control parameters of each individual are assigned either near the average parameter value or far from it, where a better parameter value for the next generation might lie.
The details of the proposed algorithm are given as follows. First of all, all individuals have their own control parameters, $F_i$ and $CR_i$, where $i$ is the individual's index. At the initialization stage, these parameters are initialized to 0.5 and 0.9, respectively. The mutation and crossover operations used in DE/rand/1/bin are employed. In the selection operation, if the trial vector is selected as an individual for the next generation, the control parameter values of this individual are stored in the F Memory and the CR Memory. After the selection operation, the parameter adaptation is carried out.
The parameter $F_i$ is adapted by the Cauchy distribution located at the average parameter value. After that, $F_i$ is truncated to 0.1 if it is less than 0.1, or to 1 if it is greater than 1. The adaptation of the scaling factor is performed as follows:
$$F_{i,G+1} = C(F_{\mathrm{avg},G}, \gamma),$$
where $C(x_0, \gamma)$ denotes a random number drawn from the Cauchy distribution with location $x_0$ and scale $\gamma$, and $F_{\mathrm{avg},G}$ is the average of the values accumulated in the F Memory, used as the location parameter of the Cauchy distribution. The scale parameter $\gamma$ is assigned 0.1.
Similarly, $CR_i$ is adapted by the Cauchy distribution located at the average parameter value. After that, $CR_i$ is truncated to 0 if it is less than 0, or to 1 if it is greater than 1. The adaptation of the crossover rate is given as follows:
$$CR_{i,G+1} = C(CR_{\mathrm{avg},G}, \gamma),$$
where $CR_{\mathrm{avg},G}$ is the average of the values accumulated in the CR Memory, used as the location parameter of the Cauchy distribution. The scale parameter $\gamma$ is again assigned 0.1. Algorithm 1 describes the pseudocode of the proposed algorithm.
When performing parameter adaptation, if there is no successfully evolved individual, the average parameter values of the last generation are reused.
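The procedure described in this section can be sketched end to end as follows. This is a minimal sketch written from the textual description, not the authors' Algorithm 1. It assumes that the two memories are cleared at the start of each generation, that the initial values 0.5 and 0.9 serve as the averages until the first success, and that mutants are not repaired when they leave the search bounds. All names are hypothetical.

```python
import math
import random


def cauchy(location, scale, rng):
    """Inverse-CDF sample from a Cauchy distribution."""
    return location + scale * math.tan(math.pi * (rng.random() - 0.5))


def clip(value, low, high):
    return max(low, min(high, value))


def sphere(x):
    """Stand-in objective used only for the demonstration."""
    return sum(v * v for v in x)


def adaptive_cauchy_de(fitness, x_min, x_max, np_size=30, generations=200,
                       gamma=0.1, seed=0):
    rng = random.Random(seed)
    dim = len(x_min)
    pop = [[x_min[j] + rng.random() * (x_max[j] - x_min[j]) for j in range(dim)]
           for _ in range(np_size)]
    cost = [fitness(x) for x in pop]
    f = [0.5] * np_size       # per-individual scaling factors
    cr = [0.9] * np_size      # per-individual crossover rates
    f_avg, cr_avg = 0.5, 0.9  # assumed averages before the first success

    for _ in range(generations):
        f_memory, cr_memory = [], []  # assumption: memories reset each generation
        new_pop, new_cost = [], []
        for i in range(np_size):
            # DE/rand/1 mutation with three distinct donors different from i.
            r1, r2, r3 = rng.sample([k for k in range(np_size) if k != i], 3)
            mutant = [pop[r1][j] + f[i] * (pop[r2][j] - pop[r3][j])
                      for j in range(dim)]
            # Binomial crossover with one forced mutant position.
            j_rand = rng.randrange(dim)
            trial = [mutant[j] if (rng.random() <= cr[i] or j == j_rand)
                     else pop[i][j] for j in range(dim)]
            trial_cost = fitness(trial)
            # Selection: successful parameter values go into the memories.
            if trial_cost <= cost[i]:
                new_pop.append(trial)
                new_cost.append(trial_cost)
                f_memory.append(f[i])
                cr_memory.append(cr[i])
            else:
                new_pop.append(pop[i])
                new_cost.append(cost[i])
        pop, cost = new_pop, new_cost
        # With no success, the averages of the last generation are reused.
        if f_memory:
            f_avg = sum(f_memory) / len(f_memory)
            cr_avg = sum(cr_memory) / len(cr_memory)
        # Every individual gets new parameters in every generation.
        f = [clip(cauchy(f_avg, gamma, rng), 0.1, 1.0) for _ in range(np_size)]
        cr = [clip(cauchy(cr_avg, gamma, rng), 0.0, 1.0) for _ in range(np_size)]

    best = min(range(np_size), key=cost.__getitem__)
    return pop[best], cost[best]


if __name__ == "__main__":
    print(adaptive_cauchy_de(sphere, [-5.0] * 5, [5.0] * 5))
```

Because selection is elitist, the best cost returned can never be worse than the best cost of the initial population for the same seed; how fast it improves depends on the problem and on the settings above.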

Benchmark Functions.
The performance of the proposed algorithm was evaluated on fourteen benchmark functions. The first eleven benchmark functions are from [14,15], and the remaining benchmark functions are Extended 12 ($f_{12}$), Bohachevsky ($f_{13}$), and Schaffer ($f_{14}$). The functions are shown in Table 1.
The characteristics of the benchmark functions are as follows: $f_1$ to $f_3$ are continuous unimodal functions, $f_4$ is a discontinuous step function, $f_5$ is a noisy quadratic function, and $f_6$ to $f_{14}$ are continuous multimodal functions whose number of local minima increases exponentially as their dimension grows. A more detailed description of each function is given in [14,15].
The proposed algorithm shows better performance on the unimodal problems as well as on the multimodal problems, except for the $f_3$ benchmark function, on which jDE shows the best performance. The proposed algorithm outperformed the other algorithms on all multimodal problems. The second best algorithm is jDE. Although SaDE utilizes strategy adaptation as well as parameter adaptation, jDE shows better results than SaDE on all benchmark functions. This suggests that parameter adaptation is the more important of the two for improving the performance of DE. MDE shows better performance than DE/rand/1/bin on several unimodal and multimodal problems. Table 3 shows the success rates of the compared algorithms. The success rate is the number of successful runs divided by the number of experiment runs (50). The proposed algorithm and the two adaptive DE algorithms (jDE and SaDE) show perfect success rates. However, DE/rand/1/bin and MDE show lower success rates than the proposed algorithm, and they completely failed to find the global optimum on several benchmark functions. Figure 2 shows the average best graphs of adaptive Cauchy DE and the compared DE algorithms.
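The success rate computation can be sketched as below. The text does not state which criterion makes a run "successful", so the error threshold used here is an assumption, and the function name and the data are hypothetical.

```python
def success_rate(final_errors, threshold):
    """Fraction of runs whose final error is at or below the threshold."""
    # Assumption: a run counts as successful when its error reaches the threshold.
    successes = sum(1 for err in final_errors if err <= threshold)
    return successes / len(final_errors)


# Hypothetical outcome of 50 runs: 45 reach the threshold, 5 do not.
rate = success_rate([1e-9] * 45 + [1.0] * 5, threshold=1e-6)
```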

Comparison of Adaptive Cauchy DE and FEP and CEP.
The mean deviation and the standard deviation of the experiment results obtained by adaptive Cauchy DE, DE/rand/1/bin, FEP (Fast Evolutionary Programming), and CEP (Classic Evolutionary Programming) for $f_1$ to $f_{11}$ with $D = 30$ are summarized in Table 4. The results of FEP and CEP are taken from [13, Tables 2-4].
The proposed algorithm shows better performance on all benchmark functions than DE/rand/1/bin, FEP, and CEP. The second best algorithm is DE/rand/1/bin. However, DE/rand/1/bin shows lower performance than FEP on several benchmark functions ($f_6$ and $f_7$).

Comparison of Adaptive Cauchy DE and Adaptive LEP and Best Lévy.
The mean deviation and the standard deviation of the experiment results obtained by adaptive Cauchy DE, DE/rand/1/bin, adaptive LEP, and best Lévy for $f_1$ and $f_6$ to $f_{11}$ with $D = 30$ are summarized in Table 5. The results of adaptive LEP and best Lévy are taken from [18, Table 3]. The population size NP is fixed at 100 in all experiments. The maximum number of generations is set to 1500 for all benchmark functions.
The proposed algorithm shows better performance on all benchmark functions than DE/rand/1/bin, adaptive LEP, and best Lévy. The second best algorithm is again DE/rand/1/bin. However, DE/rand/1/bin shows lower performance than adaptive LEP and best Lévy on several benchmark functions ($f_6$ and $f_7$).
The results show that adapting the control parameters with $FC_F = 0$ and $FC_{CR} = 0$, and with $FC_F = 1$ and $FC_{CR} = 0$, performed well in the comparison, where FC denotes the failure counter. However, when comparing success rates, $FC_F = 0$ with $FC_{CR} = 0$ had a higher success rate than $FC_F = 1$ with $FC_{CR} = 0$ on the $f_3$ benchmark function.
Note that as the failure counter increases, the performance of the algorithm decreases. Therefore, parameter adaptation should be performed in every generation. This is because the individuals of DE evolve to explore new regions, so it is hard to assure that previously suitable control parameter values remain suitable until some probability is satisfied or throughout some period.
Tables 8 and 9 show the experiment results for various parameter adaptation methods. The goal of these experiments is to find a proper method of utilizing the F Memory and the CR Memory for parameter adaptation. Arithmetic mean indicates that the proposed algorithm utilized the arithmetic mean function.
The results show that adapting the control parameters based on the arithmetic mean function performed better than adapting them based on the median function. This is because the outliers give us information about the possibility of new, better control parameter values. Therefore, the arithmetic mean function is more applicable than the median function. Parameter adaptation based on an individual's own control parameter values shows a good success rate in the comparison. However, its performance was lower than that of the other methods. Parameter adaptation based on the best individual's control parameter values shows good performance only on unimodal problems.
Tables 10 and 11 show the comparison of the Cauchy distribution with the Gaussian distribution as the parameter adaptation method. The goal of these experiments is to find which distribution property (short tail or long tail) is more suitable for parameter adaptation. Cauchy $\gamma = 0.1$ indicates that parameter adaptation is performed based on the Cauchy distribution with its scale parameter set to 0.1. Gaussian Std = 0.1 indicates that parameter adaptation is performed based on the Gaussian distribution with its standard deviation set to 0.1.
The experiment results show that the Cauchy distribution with $\gamma = 0.1$ performed better than the others. This is because the Cauchy distribution generates a large step away from the peak location with a higher probability. Therefore, the control parameters of each individual are assigned either near the average parameter value or far from it, where a better parameter value for the next generation might lie.
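As a rough worked illustration of the long-tail argument, consider the probability of drawing a parameter value more than 0.3 away from the average $\mu$ when the scale is 0.1. The calculation below is made before the truncation step and assumes that both distributions are centred at the same $\mu$; the threshold 0.3 is chosen only for illustration.

```latex
P_{\text{Cauchy}}\bigl(|X - \mu| > 0.3\bigr)
  = 1 - \frac{2}{\pi}\arctan\!\left(\frac{0.3}{0.1}\right) \approx 0.205,
\qquad
P_{\text{Gauss}}\bigl(|X - \mu| > 0.3\bigr)
  = \operatorname{erfc}\!\left(\frac{3}{\sqrt{2}}\right) \approx 0.0027 .
```

Under these assumptions, roughly one in five Cauchy samples lands far from the average, compared with fewer than three in a thousand Gaussian samples, which is consistent with the Cauchy-based adaptation trying distant parameter values much more often.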

Conclusion
The parameters of DE should be adequately assigned to attain better performance, but finding suitable values demands a lot of computational resources. To address this, we presented a new DE algorithm which utilizes success memories of scaling factors and crossover rates to properly adjust the control parameters of DE; the control parameters are adapted in each generation based on the Cauchy distribution located at the mean values of the success memories. Experimental results showed that the adaptive Cauchy DE algorithm generally achieves better performance than existing DE variants on various multimodal and unimodal test problems. The results also supported the claim that a long-tail distribution is more reliable than a short-tail distribution in adjusting the control parameters.