Adaptive Randomness: A New Population Initialization Method

1 School of Computer Science and Information Engineering, Zhejiang Gongshang University, Hangzhou, Zhejiang 310018, China 2 College of Information, South China Agricultural University, Guangzhou, Guangdong 510642, China 3Wenzhou University Library, Wenzhou University, Wenzhou, Zhejiang 325035, China 4 School of Software and Communication Engineering, Jiangxi University of Finance and Economics, Nanchang, Jiangxi 330013, China


Introduction
Evolutionary algorithms (EAs) are population-based stochastic optimization algorithms. For each optimization problem, they maintain a set of candidate solutions that play the role of individuals in a population, perform crossover and mutation operations on this set to generate different solutions, and use a fitness function to determine the environment within which the solutions live. In the last decade, EAs have been applied successfully to many real-world and benchmark optimization problems. However, as population-based algorithms, EAs such as the genetic algorithm (GA) [1] and differential evolution (DE) [2,3] share common drawbacks: long computational time and premature convergence, especially when the solution space is hard to explore.
Since reducing the computation time needed to reach optimal solutions and improving the quality of the final solutions would be beneficial, many efforts have already been made. However, most of this work has focused on the introduction and improvement of selection mechanisms, crossover and mutation operators, parameter adjustments, and other hybrid strategies. If no information about the solution is available, the most commonly used method to generate the initial population is random initialization. Little work has been done on population initialization, even though it is a crucial task in EAs and can affect the convergence speed as well as the quality of the final solution. Maaranen et al. [4] used quasirandom sequences to generate the initial population for GAs. The experimental results showed that their approach could improve the quality of the final solutions but with no noteworthy improvement in convergence speed. Rahnamayan et al. [5] proposed an opposition-based population initialization method, which achieved a fast convergence speed. Wang et al. [6] presented a population initialization method based on space transformation search (in their subsequent work, this method is renamed generalized opposition-based initialization [7]). Experimental results showed that their approach, when combined with other strategies, outperformed the traditional random initialization and opposition-based initialization.
This paper proposes a new approach for population initialization that employs adaptive randomness (AR) to improve the quality of the final solutions and also accelerate the convergence speed. AR initialization is an enhanced version of random initialization. It is simple and easy to implement. The main idea of AR is to make use of the differences between individuals to spread them more evenly over the entire search space, so that a better approximation for the current candidate solution is obtained. Although this paper only embeds AR in the population initialization of classical DE, the idea is general enough to be applied to all other EAs. Experimental results on 34 well-known benchmark problems show that the proposed approach performs better than random initialization, opposition-based initialization, and generalized opposition-based initialization in both the quality of the final solutions and the convergence speed.
The remainder of this paper is organized as follows: in Section 2, the concept of AR is briefly explained. In Section 3, the classical DE is briefly reviewed. In Section 4, the proposed AR-based population initialization algorithm is presented. Experimental results are given in Section 5, with focus on the test functions used, parameter settings, results, and results' analysis. In Section 6, we conclude the work; all benchmark functions are listed in the appendix.

Adaptive Randomness
Traditionally, EAs imitate natural evolution in a population. The population is a set of candidate solutions to an optimization problem, allowing several solutions to be considered at the same time. The population evolves from one generation to another as the individuals are crossbred and mutated until the predefined criteria are satisfied. If no a priori information about the solution is available, the initial population is often selected randomly using random numbers [4]. Obviously, the computation time is directly related to the distance of the random numbers from optimal solutions [5].
In practice, truly random numbers cannot be generated algorithmically. Algorithmically generated numbers (usually called pseudorandom numbers) only try to imitate random numbers. However, it is usually more important that the numbers are as evenly distributed as possible than that they imitate random numbers [4], for evenly distributed numbers provide much more information about the fitness function. This observation forms the basis of our approach for population initialization, namely, adaptive randomness (AR).
AR slightly modifies random initialization by controlling which individuals may enter the initial population. When adding a new individual to the initial population, AR ensures that the individual is not too close to any of the individuals already in the initial population.
To achieve this, AR maintains two sets of individuals, that is, the partial initial population (PP) and the set of trial individuals (ST). Before concentrating on AR-based population initialization, we define the two sets first.

Definition 1. Let P(ps) = {X_1, X_2, ..., X_ps} be the initial population of a specific optimization method, where ps is the population size and X_i (i = 1, 2, ..., ps) is a candidate solution in a D-dimensional search space. Then PP is defined by PP ⊆ P.

Definition 2. ST(m) = {T_1, T_2, ..., T_m} is the set of m trial individuals such that ST ∩ PP = ∅. Each trial individual T_j (j = 1, 2, ..., m) is randomly chosen from the D-dimensional search space, and m is the predefined number of trial individuals.
Obviously trial individuals are those individuals that are randomly generated from the search space but have not been added into PP.
To distribute the individuals in PP as evenly as possible, when adding a new individual into PP, AR first generates m trial individuals to form ST and then selects the trial individual that is farthest away from all individuals in PP to be added into PP. This process is repeated until the number of individuals in PP reaches ps.
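The selection step above can be sketched in a few lines of Python. This is our own minimal sketch, not the paper's code: the function names, the uniform sampling within box bounds, and the reading of "farthest away from all individuals in PP" as maximizing the minimum distance to PP (maximizing the sum of distances is another plausible reading) are all assumptions.

```python
import random

def ar_initialize(ps, dim, bounds, m=3):
    """Build an initial population of size ps. Each new individual is the
    one among m uniformly sampled trials that lies farthest (here: whose
    nearest neighbour is farthest) from the partial population PP."""
    lo, hi = bounds

    def sample():
        return [random.uniform(lo, hi) for _ in range(dim)]

    def dist2(a, b):  # squared Euclidean distance is enough for comparisons
        return sum((u - v) ** 2 for u, v in zip(a, b))

    pp = [sample()]                            # first individual: purely random
    while len(pp) < ps:
        trials = [sample() for _ in range(m)]  # the set ST of m trial individuals
        # Keep the trial whose closest individual in PP is farthest away.
        best = max(trials, key=lambda t: min(dist2(t, p) for p in pp))
        pp.append(best)
    return pp
```

With m = 1 this degenerates to plain random initialization, matching the remark later in the paper that DE_ar with m = 1 equals DE_r.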
Before introducing the AR-based population initialization algorithm, the classical DE is briefly reviewed in the following section.

Brief Review of the Classical DE
DE, first proposed by Storn and Price in [2,3], is a population-based stochastic optimization algorithm that has been used successfully on both benchmark test functions and real-world applications. It is simple yet effective and robust. A plethora of experimental studies show that it often performs better than other EAs.
The proposed algorithm is also based on this DE scheme. Let us assume that X_i(t) (i = 1, 2, ..., ps) is the ith individual in population P(t), where ps is the population size, t is the generation index, and P(t) is the population in the tth generation. The main idea of DE is to generate trial vectors. Mutation and crossover are used to produce new trial vectors, and selection determines which of the vectors will survive into the next generation.
For classical DE (DE/rand/1/bin), the mutation, crossover, and selection operators can be defined as follows.

Mutation. For each individual X_i(t), a mutant vector is generated as

V_i(t + 1) = X_r1(t) + F ⋅ (X_r2(t) − X_r3(t)),

where r1, r2, and r3 are mutually distinct indices in {1, 2, ..., ps}, all different from i, and F is the differential amplification factor.

Crossover. The trial vector U_i(t + 1) mixes the mutant with the current individual component by component:

U_i,j(t + 1) = V_i,j(t + 1) if rand_j ≤ CR or j = j_rand; otherwise U_i,j(t + 1) = X_i,j(t),

where rand_j is a uniform random number in [0, 1], CR is the crossover probability constant, and j_rand is a randomly chosen dimension index that guarantees at least one component is taken from the mutant.

Selection. A greedy selection mechanism is used:

X_i(t + 1) = U_i(t + 1) if f(U_i(t + 1)) ≤ f(X_i(t)); otherwise X_i(t + 1) = X_i(t),

where f(⋅) is the fitness function. Without loss of generality, this paper only considers minimization problems. If, and only if, the trial vector U_i(t + 1) is better than X_i(t) (i.e., f(U_i(t + 1)) ≤ f(X_i(t))), X_i(t + 1) is set to U_i(t + 1); otherwise, X_i(t + 1) remains unchanged, that is, X_i(t + 1) = X_i(t). Hence the population either improves or remains the same with respect to the fitness function but never deteriorates. Though there are many variants of DE [2,3], to maintain a general comparison, this paper only uses the classical DE in the conducted experiments to demonstrate the improvement of the convergence speed and the quality of the final solutions by using AR-based population initialization.
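The DE/rand/1/bin scheme reviewed above can be sketched as follows. This is our own illustrative implementation, not the paper's: helper names and the random (non-AR) initialization are assumptions, and mutant vectors are not clipped back into the bounds for brevity.

```python
import random

def de_rand_1_bin(f, bounds, dim, ps=20, F=0.5, CR=0.9, max_nfc=20000):
    """Classical DE/rand/1/bin with greedy selection; minimizes f."""
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(ps)]
    fit = [f(x) for x in pop]
    nfc = ps                                   # number of function calls so far
    while nfc < max_nfc:
        for i in range(ps):
            # Mutation (DE/rand/1): three mutually distinct indices, all != i.
            a, b, c = random.sample([j for j in range(ps) if j != i], 3)
            v = [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # Binomial crossover: at least one dimension comes from the mutant.
            jrand = random.randrange(dim)
            u = [v[d] if (random.random() < CR or d == jrand) else pop[i][d]
                 for d in range(dim)]
            # Greedy selection: the population never deteriorates.
            fu = f(u)
            nfc += 1
            if fu <= fit[i]:
                pop[i], fit[i] = u, fu
    best = min(range(ps), key=lambda i: fit[i])
    return pop[best], fit[best]
```

Replacing the random initialization line with an AR-based one is the only change the paper's DE_ar variant requires.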

The Proposed AR-Based Population Initialization Algorithm
For a specific optimization problem, when a priori information about the solutions is lacking, the initial population is usually created using random numbers. AR makes full use of distance information during population initialization. By applying the AR strategy, we can distribute the individuals as evenly as possible and obtain a better approximation for the current candidate solutions. So, instead of using pure random initialization, we propose the following AR-based population initialization algorithm (see Algorithm 1).
As shown in Algorithm 1, PP is initially empty, and the first individual of PP is randomly chosen from the search space. During population initialization, PP is incrementally updated with individuals selected from ST until the number of individuals in PP reaches the population size ps.
The flowcharts of DE with random population initialization, opposition-based population initialization, generalized opposition-based population initialization, and AR-based population initialization are shown in Figure 2.
AR-based population initialization will be embedded in the classical DE in Section 5 to show its effectiveness in the improvement of the convergence speed and the quality of the final solutions.

Empirical Study
To investigate the effectiveness of the proposed AR-based population initialization algorithm in improving the convergence speed and the final solution, we embedded it in the classical DE and conducted controlled experiments. Our experiments were carried out on a PC at 2.3 GHz with 2 GB of RAM.
In the following subsections, we provide details on the test functions of our study (Section 5.1), parameter settings (Section 5.2), and our experimental results and analysis (Section 5.3).

Test Functions. Our experiments use a set of 34 well-known benchmark functions [8,9]. The definition, the range of the search space, and the global optimum(s) for each function are given in the appendix. The dimensionality of these problems varies from 2 to 100, covering a wide range of problem complexity.

Parameter Settings.
For all the conducted experiments, the parameters of the classical DE, namely, the population size (ps), the differential amplification factor (F), the crossover probability constant (CR), and the maximum number of function calls (MAX_NFC), are fixed to 100, 0.5, 0.9, and 10^6, respectively, unless a change is mentioned. This setting follows the suggestions given in the literature (e.g., [10-12]). The parameter m of ST in AR-based population initialization is set to 3 unless a change is mentioned. For each optimization problem, the NFC is recorded when a specific algorithm reduces the best value below the value-to-reach (VTR) before meeting MAX_NFC. In order to minimize the effect of the stochastic nature of the algorithms, all reported NFCs are averaged over 1,000 independent trials. Obviously, a smaller NFC means a higher convergence speed. In order to compare the convergence speed between two specific algorithms, we introduce another metric, the acceleration rate (ARE), which is defined as

ARE = NFC_algA / NFC_algB,

where NFC_algA and NFC_algB are the NFCs for the two algorithms algA and algB (algA and algB are chosen from {DE_r, DE_o, DE_go, DE_ar}). So ARE > 1 means that algB is faster. The VTR is set to 10^−6 for all benchmark functions. The same setting has been used in the literature (e.g., [13,14]).
We also compare the robustness of DE_r, DE_o, DE_go, and DE_ar by measuring the success rate (SR) [13]. In the current work, a successful run means that a specific algorithm reaches the VTR for a test function within the allowed MAX_NFC. So SR can be calculated as

SR = (number of trials reaching the VTR) / (total number of trials).
SR is a commonly used metric to characterize the robustness of a specific algorithm; that is, a larger SR means that the algorithm is more robust.
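The SR and ARE metrics above reduce to one-line computations. The sketch below is ours; the helper names are assumptions, not the paper's code.

```python
def success_rate(reached_vtr):
    """SR = (# trials that reached the VTR) / (total # trials).
    reached_vtr is a list of booleans, one per independent trial."""
    return sum(reached_vtr) / len(reached_vtr)

def acceleration_rate(nfc_alg_a, nfc_alg_b):
    """ARE = NFC_algA / NFC_algB; ARE > 1 means algB converged faster."""
    return nfc_alg_a / nfc_alg_b

def averages(nfcs, srs, ares):
    """Averages over the n test functions where all algorithms share the
    same SR (NFC_avg, SR_avg, ARE_avg of the tables' last rows)."""
    n = len(nfcs)
    return sum(nfcs) / n, sum(srs) / n, sum(ares) / n
```

For example, an ARE of 1.0116 between DE_r and DE_ar (as reported later) means DE_r needed 1.16% more function calls, that is, DE_ar was 1.16% faster on average.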
Further, the average NFC (NFC_avg), the average SR (SR_avg), and the average ARE (ARE_avg) over the n test functions are calculated as

NFC_avg = (1/n) Σ_{i=1}^{n} NFC_i,  SR_avg = (1/n) Σ_{i=1}^{n} SR_i,  ARE_avg = (1/n) Σ_{i=1}^{n} ARE_i.

So we can conclude that DE_ar shows a better convergence speed than the other 3 algorithms under the same parameter settings with m fixed to 3 for DE_ar. Some sample bar charts for the performance comparison of the 4 algorithms are given in Figure 3. From the results, it can be seen that DE_ar achieves better results than DE_r, DE_o, and DE_go on 24 test functions (about 70.6% of the test functions). DE_ar achieves the same performance as DE_go on function f_26, and they are both much better than the other algorithms on this problem. For the remaining 9 functions, all the algorithms achieve the same results.

Comparison of DE_r, DE_o, DE_go, and DE_ar in Terms of the Quality of Final Solutions
To compare the performance of multiple algorithms on the test suite, the average ranking of the Friedman test is conducted following the suggestions in [7,15]. Table 4 shows the average ranking of the 4 DE algorithms on functions f_1-f_34. These algorithms can be sorted by average ranking into the following order: DE_ar, DE_r, DE_go, and DE_o. This means that DE_ar and DE_o are the best and the worst among the four algorithms, respectively. As seen, although opposition-based population initialization can accelerate the convergence speed on some test problems, when compared with DE_r, it cannot improve the quality of the final solutions. So if a high-quality solution is desired, opposition-based population initialization cannot be used alone.
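The Friedman average ranking used here can be reproduced as follows: on each test function, the algorithms are ranked by mean error (rank 1 = best, ties share the averaged rank), and the ranks are averaged over all functions. This sketch and its function name are ours, under those standard assumptions.

```python
def average_ranks(results):
    """results[f][a] is the mean error of algorithm a on function f
    (lower is better). Returns each algorithm's average Friedman rank."""
    n_alg = len(results[0])
    totals = [0.0] * n_alg
    for row in results:
        order = sorted(range(n_alg), key=lambda a: row[a])
        rank = [0.0] * n_alg
        i = 0
        while i < n_alg:
            j = i
            # Extend j over a group of tied values.
            while j + 1 < n_alg and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1        # averaged rank for the tie group
            for k in range(i, j + 1):
                rank[order[k]] = shared
            i = j + 1
        for a in range(n_alg):
            totals[a] += rank[a]
    return [t / len(results) for t in totals]
```

The algorithm with the smallest average rank (DE_ar in Table 4) is the overall best.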
To investigate significant differences between the behavior of two algorithms, we conduct four tests, that is, Nemenyi's, Holm's, Shaffer's, and Bergmann-Hommel's [7,15]. For each test, we calculate the adjusted p values on pairwise comparisons of all algorithms. Table 5 shows the adjusted p values. Under the null hypothesis, the two algorithms are equivalent. If the null hypothesis is rejected, then the performances of the two algorithms are significantly different. In this paper, we only discuss whether a hypothesis is rejected at the 0.05 level of significance. As we can see, all four tests reject hypotheses 1-3.
Besides the above four tests, we also conduct Wilcoxon's test to recognize significant differences between the behavior of two algorithms [7,15]. We investigate the correlation between m and the quality of the final solutions using the Spearman correlation [16]. We repeat the experiments of Section 5.3.3 for m ∈ [10, 100] (since ps = 100) with a step size of 10 (i.e., 1,000 trials per function per m ∈ {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}). For lack of space, we do not show all the final solutions; only those obtained on f_2 and f_3 are shown in Table 8 for illustration. However, almost the same behavior has been observed for all functions: the quality of the final solution is better than that of DE_r, DE_o, and DE_go whenever m > 1.
Table 9 shows the Spearman correlation test results between m and the final solutions obtained on each test function. As seen, there is no significant correlation between m and the quality of the final solutions. This means that m, like the other control parameters of DE, has a problem-oriented value. Since a larger m requires more time to initialize the population, especially when the dimensionality of the problem is large, our limited experiments suggest using a small value of m.
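In practice one would typically call `scipy.stats.spearmanr` for this test; the self-contained sketch below (ours, with assumed names) uses the classical formula ρ = 1 − 6Σd²/(n(n² − 1)), which is valid when there are no ties — as for the distinct m values 10, 20, ..., 100 used here.

```python
def spearman(x, y):
    """Spearman rank correlation coefficient (no ties assumed)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry, n = ranks(x), ranks(y), len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # squared rank differences
    return 1 - 6 * d2 / (n * (n * n - 1))
```

A coefficient near ±1 would indicate a monotone relationship between m and solution quality; the near-zero values in Table 9 indicate none.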

Conclusions
This paper employs the concept of adaptive randomness (AR) for population initialization. The main idea of AR is to make use of the differences between individuals to spread them more evenly over the search space, so that a better approximation for the current candidate solution is obtained. In order to investigate the performance of AR-based population initialization, the classical DE has been utilized. (iv) The influence of m (the number of trial individuals) was studied by investigating the Spearman correlation between m and the quality of the final solutions. The results obtained on the 34 test functions show that there is no significant correlation between m and the quality of the final solutions. But the quality of the final solution is better than that of the other three DE algorithms when m > 1.
The main motivation of the current work was the introduction of the concept of adaptive randomness for population initialization. Although this paper only embeds AR within the classical DE, the idea is general enough to be applied to all other population-based methods (e.g., GA and PSO). Further, since AR is new, studies are still required to investigate its benefits, weaknesses, and limitations. The current work can be considered a first step in applying AR.

Figure 1 :
Figure 1: Illustration of the selection of a trial individual into PP in a two-dimensional space.
In this section, DE_ar is compared with DE_r, DE_o, and DE_go with respect to the quality of the final solutions. All the experiments were conducted 1,000 times, and the mean function error value and standard deviation of the results were recorded. The results of the 4 DE algorithms on the 34 test problems are presented in Tables 2 and 3, where "Mean" indicates the mean function error value and "Std. Dev." stands for the standard deviation. The best results among the 4 DE algorithms are shown in boldface.

Table 1 :
Comparison of convergence speed (NFC) and success rate (SR) for DE with random population initialization (DE_r), with opposition-based population initialization (DE_o), with generalized opposition-based population initialization (DE_go), and with AR-based population initialization (DE_ar). The reported average values (the last row) are calculated only on the functions where all the algorithms have the same success rate (i.e., all SRs = 1 and one SR = 0.95).

Figure 3 :
Figure 3: Some sample bar charts for the performance comparison of the 4 DE algorithms. The values are calculated at every 1,000 function calls.
Results and Analysis. The experiments are categorized as follows. In Section 5.3.1, DE_r, DE_o, DE_go, and DE_ar are compared in terms of convergence speed and robustness. In Section 5.3.2, DE_r, DE_o, DE_go, and DE_ar are compared in terms of the quality of the final solutions. In Section 5.3.3, the effect of problem dimensionality is investigated. In Section 5.3.4, the effect of the parameter m is studied. All the experiments are conducted 1,000 times with different random seeds, and the average results throughout the optimization runs are recorded. It should be noted that, in the experiments, we found that for a small number of trials the evaluation counts and the final solutions are not stable.

5.3.1. Comparison of DE_r, DE_o, DE_go, and DE_ar in Terms of Convergence Speed and Robustness. Following the suggestions in the literature (e.g., [5, 6, 9]), we compare the convergence speed of DE_r, DE_o, DE_go, and DE_ar by measuring the number of function calls (NFC).

Table 1 summarizes the numerical results when solving the 34 benchmark functions shown in the appendix. The best NFC for each function is highlighted in boldface, and NFC_avg, SR_avg, and ARE_avg are shown in the last row. Since comparing the algorithms with different SR values seems meaningless, the reported average values are calculated only on the functions where all the algorithms have the same success rate. As seen, DE_ar outperforms DE_r on 22 test functions (about 64.7% of the problems). Though DE_r is faster than DE_ar on 4 functions (i.e., f_6, f_16, f_26, and f_31), its SR is worse. Except for f_6, the remaining 3 functions have a low dimensionality (D ≤ 10). DE_ar outperforms DE_o on 22 test functions (about 64.7% of the problems), while DE_o surpasses DE_ar only on 4 functions (i.e., f_6, f_18, f_26, and f_29). Though DE_o is faster than DE_ar on f_6 and f_26, its SR is worse. DE_ar outperforms DE_go on 24 test functions (about 70.6% of the problems), while DE_go surpasses DE_ar only on f_4 and f_6. Though DE_go is faster than DE_ar on f_6, its SR is worse. All the algorithms fail to solve 6 functions (i.e., f_19, f_21, f_22, f_23, f_24, and f_25). On f_15 and f_30, only DE_ar can solve them, with a very small ARE, while the other three algorithms all fail. The ARE_avg between DE_r and DE_ar is 1.0116, which means that DE_ar is on average 1.16% faster than DE_r. Similarly, DE_ar is on average 1.52% faster than DE_o and 1.38% faster than DE_go.

Table 2 :
Comparison among the 4 DE algorithms on test problems f_1-f_22, where "Mean" indicates the mean function error value and "Std. Dev." stands for the standard deviation. The best results among the four algorithms are shown in boldface.

Table 3 :
Comparison among the 4 DE algorithms on test problems f_23-f_34, where "Mean" indicates the mean function error value and "Std. Dev." stands for the standard deviation. The best results among the four algorithms are shown in boldface.

Table 4 :
Average ranking of the 4 DE algorithms.
Table 6 shows the p values of applying Wilcoxon's test between DE_ar and the other three DE algorithms. The p values below 0.05 (the significance level) are shown in boldface. From the results, it can be seen that DE_ar is significantly better than DE_r, DE_o, and DE_go.

5.3.3. Scalability Test: Effect of Problem Dimensionality. The performance of most EAs (including DE) deteriorates quickly with the growth of the dimensionality of the problem. The main reason is that, in general, the complexity of the problem (search space) increases exponentially with its dimension. Here, we show a scalability test of DE_ar for D/2, D, and 2D for each scalable function in our test set. Table 7 summarizes the comparison results of the four DE algorithms for D/2, D, and 2D. From the results, it can be seen that DE_ar is not always affected by the growth of dimensionality. For f_1, f_2, f_3, f_7, f_8, f_17, f_18, f_21, and f_28, DE_ar achieves similar performance when the dimension increases from D/2 to 2D. The performance of DE_ar deteriorates quickly with the growth of dimension for five functions (f_4, f_5, f_6, f_19, and f_20). For the remaining function (f_13), the growth of dimension does not affect the performance of DE_ar.

5.3.4. Effect of Different m Settings. In DE_ar, a new control parameter m (the number of trial individuals) is added to DE's parameters (ps, F, and CR). Since m denotes the number of trial individuals, m should be a positive integer. And when m = 1, DE_ar is equal to DE_r. So in our study we restrict m to the positive integers within the range [2, ps]. As mentioned above, m was fixed to 3 in all experiments. Such a value was set without any effort to find an optimal value. However, the performance of DE_ar may be influenced by different settings of m.

Table 6 :
Wilcoxon test between DE_ar and the other three DE algorithms on functions f_1-f_34. The p values below 0.05 are shown in boldface.
By embedding AR within DE, DE_ar was proposed. Experiments were conducted on 34 benchmark functions. The experimental results can be summarized as follows. (i) DE_ar is compared with other DE algorithms, namely, DE with random initialization (DE_r), DE with opposition-based learning (DE_o), and DE with generalized opposition-based learning (DE_go), with respect to convergence speed and robustness. The results demonstrate that DE_ar performs better than the other three DE algorithms on at least 64.7% of the test functions. Although the other three DE algorithms outperform DE_ar on some functions, their success rates are always worse. (ii) DE_ar is further compared with DE_r, DE_o, and DE_go with respect to the quality of the final solutions. The results show that DE_ar performs better than the other three DE algorithms on the majority (about 70.6%) of the test functions. And on the rest of the functions, DE_ar obtains results no worse than those of the other three DE algorithms. Statistical comparisons also show that DE_ar is the best of the four DE algorithms. (iii) A scalability test of DE_ar over 15 test functions with different problem dimensions (D/2, D, and 2D) was conducted. The 15 functions are scalable and chosen from the 34 test functions. The results show that DE_ar is not always affected by the growth of dimensionality. For 9 functions, DE_ar achieves similar performance when the dimension increases from D/2 to 2D. For 5 functions, the performance of DE_ar deteriorates quickly with the growth of dimension, while for the remaining 1 function, the growth of dimension does not affect the performance of DE_ar.

Table 7 :
Mean function error values of DE_ar for D/2, D, and 2D for each scalable function in our test set. Note. The values of D for each test function are the same as in Table 1.

Table 8 :
Final solutions obtained on f_2 and f_3 on the interval [10, 100] of m with a step size of 10.

Table 9 :
Spearman's correlation test results between m and the final solutions on all the test functions. Note. SC denotes the correlation coefficient of the Spearman correlation test.