A Comparative Study of EAG and PBIL on Large-Scale Global Optimization Problems Imtiaz

Estimation of Distribution Algorithms (EDAs) use global statistical information effectively to sample offspring, disregarding the location information of the locally optimal solutions found so far. Evolutionary Algorithm with Guided Mutation (EAG) combines global statistical information and location information to sample offspring, with the aim that this hybridization improves the search and optimization process. This paper presents a comparative study of Population-Based Incremental Learning (PBIL), a representative of EDAs, and EAG on large-scale global optimization problems. We implemented PBIL and EAG to build an experimental setup upon which simulations were run. The performance of these algorithms was analyzed in terms of solution quality and computational cost. We found that EAG performed better than PBIL in attaining a good-quality solution, but the latter performed better in terms of computational cost. We also compared the performance of EAG and PBIL with MA-SW-Chains, the winner of CEC'2010, and found that the overall performance of EAG is comparable to MA-SW-Chains.


Introduction
Many search and optimization techniques have been developed to solve complex optimization problems, such as the travelling salesman problem. One widely studied approach in this area is Estimation of Distribution Algorithms (EDAs) [1][2][3]. The main difference between traditional evolutionary algorithms [4][5][6], for example, genetic algorithms, and EDAs lies in their offspring generation strategies. Traditional evolutionary algorithms use crossover and mutation to generate new solutions, whereas EDAs use probabilistic models to sample offspring. The probabilistic models are based on global statistical information extracted from the population. According to the proximate optimality principle [7], which assumes that good solutions have similar structure, an ideal offspring generator should be able to generate a solution that is close to the best solutions found so far. In this respect, both evolutionary algorithms and EDAs have their own merits and demerits. Evolutionary algorithms ensure that new solutions are not far away from the best solutions found so far, whereas EDAs have no mechanism to directly control the similarity between an offspring and its parent. On the other hand, EDAs better control the similarity among solutions in the current population because they use the global statistical information effectively to sample offspring.
Evolutionary Algorithm with Guided Mutation (EAG) [8] combines global statistical information (i.e., the EDA approach) and location information (i.e., the traditional evolutionary algorithmic approach) to sample offspring. The authors evaluated the performance of EAG on the maximum clique problem and showed promising results. In [9], Khan used numerical optimization functions [10,11] and standard genetic algorithm test problems [12,13] to compare the performance of EAG against two classic EDAs, Population-Based Incremental Learning (PBIL) [14] and Compact Genetic Algorithm [15]. These previous studies revealed that EAG performed better than its competitors. However, so far, the performance of EAG has been measured only on small-scale optimization problems. Therefore, it is unclear how scalable EAG is.
In this study, we thoroughly evaluated the performance of EAG against PBIL on a set of benchmark functions for large-scale global optimization problems [16]. The results of these two algorithms are also compared with MA-SW-Chains [17], the winner of the CEC'2010 competition [16]. This paper contributes in two different ways. First, our results further strengthen the previous findings that a combination of EDAs and traditional evolutionary algorithms works better than standalone EDAs or evolutionary algorithms. Second, our findings indicate that EAG, originally developed for discrete optimization problems, is also suitable for continuous optimization problems. The algorithm is scalable and its performance is comparable to MA-SW-Chains, the winner of CEC'2010. The rest of this paper is organized as follows. In Section 2, we give a background overview of PBIL, EAG, other closely related EDA approaches, and large-scale global optimization problems. The empirical study is outlined in Section 3. We discuss the main findings and limitations of our approach in Section 4. The paper concludes in Section 5.

Require: learning rate, mutation rate, mutation shift, length of the probability vector, and population size
(1) Initialize the probability vector
(2) Repeat Steps 3 to 7 until stopping criteria are met
(3) Generate a population of solutions using the probability vector
(4) Evaluate the fitness of the solutions (generated in Step 3)
(5) Select the best solutions from the population
(6) Update the probability vector towards the selected best solutions, at the learning rate
(7) Mutate the probability vector: for each position, if random(0, 1) < mutation rate, then shift the probability at that position by the mutation shift
Pseudocode 1: The PBIL algorithm.

Background
2.1. PBIL. PBIL is a statistical approach to evolutionary computation in which solutions are represented as fixed-length binary strings $x = (x_1, x_2, x_3, \ldots, x_n)$. A probability vector $p = (p_1, p_2, p_3, \ldots, p_n)$ is used to sample offspring. In this vector, $p_i$ measures the distribution of 1s (and consequently 0s) at the $i$th position of the simulated population in a given search space. Initially, these probabilities are set to 0.5 at each position to give a uniform distribution over the search space. Solutions are generated using $p$; for each solution $x$, a 1 is generated at position $x_i$ with probability $p_i$. The probabilities in $p$ are then moved gradually towards 1 or 0 as the search progresses. The PBIL algorithm is illustrated in Pseudocode 1.
Apart from the stopping criteria and the number of best solutions used to update the probability vector, PBIL is sensitive to four main parameters: learning rate, mutation rate, mutation shift, and population size. Among these, the learning rate and the mutation rate are kept low (e.g., 0.1 and 0.02, respectively) [14].
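The steps of PBIL can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in this study: the update and mutation rules shown are the commonly cited textbook forms and are assumed here, and all names (`pbil`, `n_best`, `learning_rate`, and so on) as well as the default values are hypothetical.

```python
import random


def pbil(fitness, n_bits, pop_size=100, n_best=10, learning_rate=0.1,
         mutation_rate=0.02, mutation_shift=0.05, generations=200, seed=0):
    """Minimise `fitness` over bit strings with a textbook-style PBIL loop."""
    rng = random.Random(seed)
    p = [0.5] * n_bits                       # Step 1: uniform start
    best, best_fit = None, float("inf")
    for _ in range(generations):             # Step 2: fixed generation budget
        # Step 3: sample each bit as 1 with probability p[i]
        pop = [[1 if rng.random() < p[i] else 0 for i in range(n_bits)]
               for _ in range(pop_size)]
        # Steps 4-5: evaluate and keep the n_best lowest-cost solutions
        pop.sort(key=fitness)
        elite = pop[:n_best]
        elite_fit = fitness(elite[0])
        if elite_fit < best_fit:
            best, best_fit = elite[0][:], elite_fit
        for i in range(n_bits):
            # Step 6 (assumed rule): move p[i] towards the elite mean
            mean_i = sum(x[i] for x in elite) / n_best
            p[i] = (1 - learning_rate) * p[i] + learning_rate * mean_i
            # Step 7 (assumed rule): occasionally shift p[i] towards a random bit
            if rng.random() < mutation_rate:
                p[i] = (1 - mutation_shift) * p[i] + mutation_shift * rng.randint(0, 1)
    return best, best_fit, p
```

For example, `pbil(lambda x: len(x) - sum(x), 16)` minimises the number of zeros in a 16-bit string; the probability vector drifts towards 1 at every position as the search progresses.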

EAG. EAG (Evolutionary Algorithm with Guided Mutation) is a hybrid of evolutionary algorithms and EDAs. Variation operators in evolutionary algorithms directly use the location information of the locally optimal solutions found so far, disregarding the distribution of promising solutions in the search space. The offspring thus produced are close to their parents, but they may be far away from other best solutions in the current population. This is because evolutionary algorithms do not benefit from the global statistical information. On the other hand, EDAs use the global statistical information effectively to sample offspring, but they disregard the location information of the optimal solutions found so far. This is an important limitation of EDAs because there is no mechanism to directly control the similarity between the new solutions and the current "good" solutions. EAG combines global statistical information and the location information of optimal solutions to sample offspring, with the aim of improving the solution quality. To this end, EAG introduces a new variation operator called "guided mutation". The pseudocode of EAG is similar to that of PBIL, except that the solutions for the next generation are produced using the guided-mutation operator. The guided-mutation operator is illustrated in Pseudocode 2.
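A minimal Python sketch of a guided-mutation operator is given below, assuming the per-gene formulation described in [8]: each gene of the offspring is either sampled from the probability vector or copied from the parent. The function name and the parameter name `beta` (standing for the guided-mutation rate) are hypothetical and chosen only for illustration.

```python
import random


def guided_mutation(parent, p, beta, rng=random):
    """Build an offspring gene by gene from `parent` and probability vector `p`."""
    child = []
    for x_i, p_i in zip(parent, p):
        if rng.random() < beta:
            # Sample this gene from the probability vector (global statistics)
            child.append(1 if rng.random() < p_i else 0)
        else:
            # Copy this gene from the parent (location information)
            child.append(x_i)
    return child
```

With `beta = 0` the offspring is an exact copy of the parent, and with `beta = 1` it is sampled entirely from the probability vector, as in PBIL; intermediate values trade off the two sources of information.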
The guided-mutation operator is sensitive to the guided-mutation rate. Based on this rate, the operator decides, gene by gene, whether to copy the value from the parent or to sample it from the probability vector; the larger the rate, the more genes of the offspring are sampled from the probability vector. EAG is sensitive to the learning rate, the guided-mutation rate, and the population size [8].

Many EDAs have been proposed in the literature [1-3, 14, 15, 18-26]. Here we review the work devoted to hybrid approaches in EDAs [18, 21-25]; for a detailed review of EDAs, please see [27]. Mahnig and Mühlenbein used a hybrid approach in which mutation operators were introduced into EDAs using a Bayesian prior [18]. They found that the introduction of mutation in EDAs greatly decreases the dependence on an optimal population size. In another study, Handa incorporated mutation operators into EDAs to maintain diversity in the EDA population [21]. His results show that mutation improves the search ability of EDAs, even with a small population size. In [23], the authors combined EDAs with the variable neighborhood search (VNS) heuristic and found that this hybrid approach performed reasonably well compared to simple EDA or simple VNS approaches. It is worth mentioning here that the problem dimensions explored in these studies were relatively small; for example, in [21] the problem dimensions were not more than 70. In yet another study [24], Valdez et al. developed a hybrid algorithm that combines EDA with a support vector machine for the selection of key feature genes. They compared their method with some other hybrid EDAs and found it effective. In a very recent study [25], the authors presented an EDA based on a Gaussian probability distribution. They showed that on higher-dimensional problems their algorithm offered better performance than its competitors.

Large-Scale Global Optimization.
In any empirical investigation of evolutionary algorithms, the selection of test problems is vital. Care must be taken to select problems that are likely to prove illuminating for the investigation at hand. As a rule of thumb, at least two factors are often considered while selecting test problems: (a) comparison with previous findings/results and (b) representativeness. Usually, a well-studied and broad range of problems is suitable because it provides a useful means of comparison with previous experimental results.
In recent years, the evolutionary computation community has seen a substantial number of studies evaluating the performance of evolutionary algorithms on large-scale global optimization problems, such as those with more than one hundred decision variables [16,28]. These test problems pose significant challenges not only because of their high dimensionality, that is, the "curse of dimensionality" [29], but also because most of them are nonseparable. A function of $n$ variables is separable if it can be rewritten as a sum of $n$ functions of just one variable.
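The definition of separability can be written out as follows; the symbols $f_i$ and the two example functions are chosen here for illustration and use the standard textbook forms of the sphere and Rosenbrock functions.

```latex
% Separable: f decomposes into n one-variable terms
f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} f_i(x_i)

% Example (separable): the sphere function, with f_i(x_i) = x_i^2
f_{\mathrm{sphere}}(x) = \sum_{i=1}^{n} x_i^2

% Example (nonseparable): Rosenbrock's function couples x_i and x_{i+1}
f_{\mathrm{Rosenbrock}}(x) = \sum_{i=1}^{n-1} \left[ 100\,(x_{i+1} - x_i^2)^2 + (1 - x_i)^2 \right]
```

A separable function can in principle be optimized one variable at a time, whereas in the Rosenbrock example the best value of $x_{i+1}$ depends on $x_i$, so the variables cannot be treated independently.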
In this study, we compared the performance of PBIL and EAG on the benchmark functions provided for CEC'2010 [16]. Their characteristics vary from separable to nonseparable and from unimodal to multimodal. Moreover, all these functions are shifted, scalable minimization problems whose global minimum value is known to be 0. These functions are listed in Table 1; for details, see [16].

Empirical Study
We used an empirical approach to compare the strength of EAG and PBIL on large-scale global optimization problems. Three primary aims were pursued in this study: (a) an experimental setup upon which simulations are run, (b) optimal parameter settings for EAG and PBIL (formative experiment), and (c) statistical analysis to compare the performance of EAG with PBIL on the selected test problems (summative experiment). The performance of each algorithm was analyzed in terms of solution quality and total elapsed time. In what follows, we discuss the experimental setup, the optimal parameter settings, and the analysis of results in turn.

Experimental Setup. An empirical investigation requires a system upon which experiments are run. We implemented the system in Matlab-R2009a. A brief description of this implementation is outlined as follows.
(i) The solutions were encoded as binary strings. The length of a string was set to 10 times the dimension of the test problem (in this study, for the summative experiment, the dimension of a problem was set to 1000 for each test function).
(ii) For uniformity, the probability vector was initialized to 0.5 for both PBIL and EAG.
(iii) The initial population was sampled randomly for both algorithms using the same initial probability vector; the population size was kept at 100 throughout the study.
(iv) Because the fitness of an individual is computed from its phenotype value, a function binry2real was implemented, which maps a binary value (genotype) into the corresponding decimal value (phenotype).
(v) In each generation, the best and average fitness of the population were recorded; for each algorithm, the total elapsed time was also recorded. Both algorithms terminated when the number of generations exceeded a preset limit, which was set to 10000 generations for the formative experiment and 3.0E+06 generations for the summative experiment.
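The genotype-to-phenotype mapping in item (iv) can be sketched as follows. This is only an illustration of a typical linear decoding with 10 bits per dimension; the function name `binary_to_real` and the search bounds `lower` and `upper` are assumptions, since the actual decoding and variable ranges used in the Matlab implementation are not given here.

```python
def binary_to_real(bits, bits_per_dim=10, lower=-5.0, upper=5.0):
    """Decode a flat list of bits into one real value per dimension."""
    if len(bits) % bits_per_dim != 0:
        raise ValueError("bit string length must be a multiple of bits_per_dim")
    max_int = 2 ** bits_per_dim - 1          # largest integer a chunk can encode
    values = []
    for start in range(0, len(bits), bits_per_dim):
        chunk = bits[start:start + bits_per_dim]
        as_int = int("".join(str(b) for b in chunk), 2)
        # Map the integer linearly onto [lower, upper] (assumed bounds)
        values.append(lower + (upper - lower) * as_int / max_int)
    return values
```

Under this scheme each variable can take only 1024 distinct values, so the resolution is (upper - lower) / 1023 per variable; a coarse grid of this kind limits how closely any binary-coded algorithm can approach a continuous optimum.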

Formative Experiment.
Before running the main experiment, we conducted a pilot study to find the optimal parameter values for EAG and PBIL. Since parameter tuning in a system based on evolutionary algorithms is a challenging task, a principled approach is required for this purpose. In this study, we were interested in tuning three parameters for EAG (learning rate, guided-mutation rate, and population size) and four parameters for PBIL (learning rate, mutation rate, mutation shift, and population size). We used the following numerical optimization functions to find the optimal parameter values for each algorithm; for details, see [30]. (We encoded these functions as 300-bit strings, 10 consecutive bits for each dimension.)
(i) Sphere Function. This is a smooth, unimodal function. It is separable and relatively easy to optimize.
(ii) Rosenbrock's Function. This is a difficult optimization function whose global minimum lies inside a long, narrow, parabolic-shaped valley.
(iii) Rastrigin's Function. This function has many local minima, but only one global minimum.
(iv) Griewank's Function. This is a multimodal function with an exponentially increasing number of local minima as its dimension increases.
(v) Ackley's Function. This is a multimodal function. It is nonseparable and difficult to optimize.
To find the optimal values for the selected sensitive parameters, we varied them in an orderly manner and recorded the fitness value, as shown in Algorithm 1.
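For reference, the five tuning functions can be written in Python as below. These are the standard textbook definitions, assumed here for illustration; the exact variants, shifts, and variable ranges used in this study are those of [30] and may differ.

```python
import math


def sphere(x):
    return sum(v * v for v in x)


def rosenbrock(x):
    # Each term couples neighbouring variables x[i] and x[i + 1]
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))


def rastrigin(x):
    return 10.0 * len(x) + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) for v in x)


def griewank(x):
    total = sum(v * v for v in x) / 4000.0
    prod = math.prod(math.cos(v / math.sqrt(i)) for i, v in enumerate(x, start=1))
    return total - prod + 1.0


def ackley(x):
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    mean_cos = sum(math.cos(2.0 * math.pi * v) for v in x) / n
    return -20.0 * math.exp(-0.2 * math.sqrt(mean_sq)) - math.exp(mean_cos) + 20.0 + math.e
```

In these standard forms, all five functions have a global minimum value of 0, attained at the origin for every function except Rosenbrock's, whose minimum lies at (1, 1, ..., 1).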
We ran both algorithms on each test problem in turn for a maximum of 10000 generations; 10 independent runs were performed to stabilize the parameter settings. With regard to EAG, we observed that a smaller learning rate (mostly 0.1), a larger guided-mutation rate (mostly 0.9), and a larger population size (mostly above 50) provided better results. We further fine-tuned the guided-mutation rate; keeping the population size at 100 and the learning rate at 0.1, we observed that a guided-mutation rate of 0.95 gave much better results; with this combination of parameter values, EAG was able to find the global optimal value for each test function. Similarly, for PBIL, we observed that a combination of parameter values (mutation rate = 0.02, mutation shift = 0.05, learning rate = 0.1, and population size = 100), the same as reported in [14], gave better results. Therefore, in the summative experiment, we used population size = 100, learning rate = 0.1, and guided-mutation rate = 0.95 as parameter values for EAG (throughout this study, for EAG, we used mutation rate = 0.02, mutation shift = 0.05, and negative learning rate = 0.075 [8]). Also, for PBIL, the following combination of parameter values was used: learning rate = 0.1, mutation rate = 0.02, mutation shift = 0.05, and population size = 100.

Summative Experiment.
The performance of an evolutionary algorithm can be manifested in different ways. We are interested in two performance gains: (a) quality improvement and (b) speed improvement. However, we mainly focused on the former. This is important because, even though most evolutionary algorithms use sophisticated strategies to find good solutions, finding an acceptably "good-enough" solution is not guaranteed. So, if the solution quality is not "good-enough", then secondary aspects such as speed are of little consequence. We say that an algorithm A beats an algorithm B under the quality criterion if algorithm A attains a converged solution of higher fitness than algorithm B. Similarly, an algorithm A beats an algorithm B under the speed criterion if algorithm A attains a solution of a given quality/fitness in less time than algorithm B. It is possible, however, that a gain in quality may be obtained at the expense of time and vice versa. Therefore, in order to show that the overall performance of an algorithm A is better than that of B, we must show one of the following propositions to be true.
P1: algorithm A performs better than algorithm B in both speed and quality.
P2: algorithm A performs better in terms of speed without being outperformed in quality.
P3: algorithm A performs better in terms of quality without being outperformed in speed.
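The three propositions can be expressed as a small decision rule. The sketch below is purely illustrative: the function name and the encoding of each comparison as +1 (A better), 0 (no difference), or -1 (B better) are assumptions introduced here.

```python
def overall_better(quality_cmp, speed_cmp):
    """Return True if A beats B overall under P1, P2, or P3.

    Each argument is +1 if A is better, 0 if there is no difference,
    and -1 if B is better on that criterion.
    """
    p1 = quality_cmp > 0 and speed_cmp > 0    # better on both criteria
    p2 = speed_cmp > 0 and quality_cmp >= 0   # faster, not worse in quality
    p3 = quality_cmp > 0 and speed_cmp >= 0   # better quality, not slower
    return p1 or p2 or p3
```

Note that when one algorithm wins on quality and the other wins on speed, `overall_better` returns `False` in both directions, so none of the propositions settles the comparison.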
To test these propositions, a series of experiments were run on a Matlab-R2009a system, Sony Vaio Core i5-2430M, with 2.4 GHz speed and 4 GB RAM (DDR3). We recorded (a) the fitness of a solution for each generation, (b) the best fitness by the end of a run, (c) the best fitness at various points (5E+05, 1E+06, 1.5E+06, 2E+06, and 2.5E+06) during the evolution process, and (d) the total elapsed time.

Results and Discussion.
In this section, we present the simulation results of PBIL and EAG on each test problem. To test whether the apparent differences in the performance gain are statistically significant, we also report on a two-tailed pairwise $t$-test.
We measured the quality of a solution $x$ in terms of the function error value, defined as $f(x) - f(x^*)$, where $x^*$ is the known global optimum of $f$ [31]. Table 2 shows the simulation results of the two algorithms on each test problem in 25 independent runs, including the overall best solution (Best), the mean of the best-of-run solutions averaged over 25 runs (Mean), the standard deviation of the best-of-run solutions, and the $p$ value for a two-tailed $t$-test (the $p$ value measures whether or not the pairwise difference in the best solution for each algorithm in 25 independent runs is statistically significant; moreover, in this study, $p < 0.05$ is our threshold for statistical significance).
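As an illustration of these measures, the following Python sketch computes the function error value and the test statistic of a paired $t$-test on two sets of best-of-run results. It assumes that the pairwise test is a paired-samples test over matched runs, which is an interpretation made here; the function names are hypothetical, and the $p$ value would still have to be looked up from the $t$ distribution with the returned degrees of freedom.

```python
import math
import statistics


def function_error(f_x, f_opt=0.0):
    """Error of a solution: f(x) - f(x*), where f(x*) is 0 for the benchmarks here."""
    return f_x - f_opt


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    diffs = [u - v for u, v in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)           # sample standard deviation
    return mean_d / (sd_d / math.sqrt(n)), n - 1
```

With 25 independent runs per algorithm, such a test has 24 degrees of freedom.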
It is clear from Table 2 that both algorithms failed to find the known global optimum value for any function. However, the performance of EAG looks better than that of PBIL on all functions. It is interesting to note that on the separable functions, namely, $f_1$, $f_2$, and $f_3$, both algorithms offered competitive performance. On the remaining 17 functions, $f_4$-$f_{20}$, the performance of EAG was far better than that of PBIL.
Figure 1 depicts the evolution of solutions for some selected functions. It is clear from the figure that EAG performed better than PBIL during the search process. It was observed that, for functions $f_4$-$f_{20}$, the average solution quality of EAG at various points (5E+05, 1E+06, 1.5E+06, 2E+06, and 2.5E+06) was greatly improved over PBIL.
We also compared the performance of EAG and PBIL with MA-SW-Chains [17], the winner of CEC'2010 [16]. It is evident from Table 2 that MA-SW-Chains performed better than EAG and PBIL. However, interestingly, we found that, for two separable functions ($f_1$ and $f_2$) and seven partially nonseparable functions ($f_4$-$f_9$ and $f_{13}$), the best results of EAG are better than the best mean results of MA-SW-Chains (cf. Table 2, the results in bold).
A one-way ANOVA was used to test whether the apparent differences in the best solutions among the competing algorithms are significant or not. Results are shown in Table 3. Results differed significantly across the three algorithms on the 20 selected functions: $F(2, 57) = 7.46$, $F$-critical = 2.83, $p < 0.5$. To further analyze the performance of EAG and PBIL, we report pairwise comparisons using a $t$-test. Pairwise comparisons revealed that EAG outperformed PBIL on 17 functions ($f_4$-$f_{20}$): $p < 0.01$. On the remaining three functions ($f_1$-$f_3$), however, the difference between EAG and PBIL was not significant: $p > 0.5$.
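The $F$ statistic of a one-way ANOVA and its degrees of freedom can be computed as in the following sketch, which uses the standard between-group and within-group sums of squares. The function name is hypothetical, and the code illustrates the general procedure rather than the exact analysis pipeline of this study.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

With three algorithms and one value per function on 20 functions, the degrees of freedom are (3 - 1, 60 - 3) = (2, 57); with two algorithms they are (1, 38).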
To test our second performance criterion (speed gain), we recorded the time (in milliseconds) taken by each algorithm to finish the search process. As discussed earlier, both PBIL and EAG terminated after a fixed number of generations, which was set to 3.0E+06. Again, all the simulation results were averaged over 25 independent runs. The results are shown in Table 4, which indicates that PBIL converged more quickly than EAG on all functions. Again, a one-way ANOVA test revealed that the speed differences between EAG and PBIL are statistically significant: $F(1, 38) = 5.93$, $F$-critical = 2.09, $p < 0.5$ (Table 5). Pairwise comparisons further revealed that the convergence time of PBIL was significantly shorter ($p < 0.01$) than that of EAG on all functions. These results suggest that EAG is computationally more expensive than PBIL.
None of our propositions (P1, P2, and P3) was found to be true, because EAG performed better than PBIL in terms of solution quality, but the latter outperformed the former in terms of speed. However, as mentioned earlier, solution quality was our primary objective in this study; therefore, we can conclude that EAG offered better performance than PBIL. To this end, we also explored another dimension of speed gain, namely, the first generation in which the best solution was found. Interestingly, it was observed that overall EAG found the best solution earlier than PBIL; these results are also apparent in Figure 1.

General Discussion
The study presented here revealed two primary results. First, EAG outperformed PBIL in attaining a good-quality solution. This indicates that, in sampling offspring, a combination of global statistical information and the location information of the solutions found so far is better than using global statistical information only. We observed that, on separable functions, both algorithms offered competitive performance, but on nonseparable functions EAG outperformed PBIL. This suggests that the underlying assumption in EDAs that problem variables are independent may prevent efficient convergence to the global optimum when problem variables interact strongly. Second, PBIL was found to be faster than EAG. This suggests that EAG is computationally expensive. The reason is that the guided-mutation operator used in EAG involves many computations in sampling offspring. We also observed that EAG slows down further as the chromosome length increases.
If both solution quality and computational time are addressed, this raises the question of how these two dimensions should be traded off against each other. If the output of one algorithm was better than that of another but was found more slowly, which of the two algorithms should be preferred? Perhaps solution quality should be given more weight than speed.
Finally, the rather large difference between the known global optimum solution and the solution found by PBIL and EAG could be because of our suboptimal encoding scheme. We encoded the solutions (genotype) as bit strings, but the fitness of these solutions was computed from their phenotype values. Therefore, binary (genotype) to decimal (phenotype) conversion was necessary, and this conversion could have resulted in a significant loss of accuracy. It was also observed that this conversion takes a significant amount of time and, as a whole, degrades the performance of an algorithm in terms of speed as well.

Note. It is important to mention here that we do not include time for MA-SW-Chains, because the authors did not report on computational time.

Conclusion
This paper has described a comparative study of EAG and PBIL on large-scale global optimization problems. In a nutshell, we found that combining global statistical information and the location information of the solutions found so far significantly improves the quality of the search/optimization process. We observed that, on separable functions, both algorithms offered competitive performance, but on nonseparable functions EAG outperformed PBIL. These findings suggest that the location information should be used and that it is not sufficient to use a very limited number of dependence relationships (i.e., the EDA approach) to solve optimization and search problems. We also observed that EAG achieved better solution quality at the expense of higher computational cost. We conclude that the computational overhead should not penalize EAG, because if the solution quality is not good enough (e.g., in a fault-critical situation), then secondary aspects such as speed are of little consequence. The results show that EAG, originally developed for discrete optimization problems, is also suitable for continuous optimization problems. We found that the solution quality of EAG is comparable to MA-SW-Chains, the winner of CEC'2010.

EAG
for N from 10 to 100 Step 10
    for learning rate from 0.1 to 1.0 Step 0.1
        for guided-mutation rate from 0.1 to 1.0 Step 0.1
            Record solution-fitness value
        end for
    end for
end for

PBIL
for N from 10 to 100 Step 10
    for learning rate from 0.1 to 1.0 Step 0.1
        for mutation rate from 0.01 to 0.1 Step 0.01
            for mutation shift from 0.01 to 0.1 Step 0.01
                Record solution-fitness value
            end for
        end for
    end for
end for

Algorithm 1: Parameter tuning mechanism for EAG and PBIL.

Table 2: The solution quality of PBIL and EAG on CEC'2010 benchmark functions. The boldface values indicate that the best results of EAG are better than the best mean results of MA-SW-Chains.

Table 3: One-way ANOVA test results (solution quality).

Table 4: Computational time of PBIL and EAG on CEC'2010 benchmark functions.