Evaluating Multiobjective Evolutionary Algorithms Using MCDM Methods

1 Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters, Nanjing University of Information Science & Technology, Nanjing 210044, China
2 Research Center for Prospering Jiangsu Province with Talents, Nanjing University of Information Science & Technology, Nanjing 210044, China
3 China Institute for Manufacture Developing, Nanjing University of Information Science & Technology, Nanjing 210044, China
4 School of Management Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China


Introduction
Without loss of generality, a multiobjective optimization problem (MOP) can be expressed as follows:

min F(x) = (f_1(x), f_2(x), \ldots, f_m(x)), \quad x \in \Omega, \qquad (1)

where x = (x_1, x_2, \ldots, x_n) is the decision vector in the decision space \Omega and F(x) is the objective vector. Generally speaking, the objectives contradict each other, so no single solution can optimize all of them simultaneously: improving one objective often leads to deterioration in at least one other.
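Because the objectives conflict, solutions are compared by Pareto dominance rather than a single scalar value. The following is a minimal NumPy sketch of that relation (illustrative only, not code from the paper; the names `dominates` and `nondominated` are our own):

```python
import numpy as np

def dominates(f_a, f_b):
    """True if objective vector f_a Pareto-dominates f_b (all objectives minimized)."""
    f_a, f_b = np.asarray(f_a), np.asarray(f_b)
    return bool(np.all(f_a <= f_b) and np.any(f_a < f_b))

def nondominated(front):
    """Filter a set of objective vectors down to its nondominated subset."""
    front = np.asarray(front)
    keep = [not any(dominates(other, f) for j, other in enumerate(front) if j != i)
            for i, f in enumerate(front)]
    return front[keep]
```

For example, `(1, 2)` dominates `(2, 3)`, while `(1, 3)` and `(2, 2)` are mutually nondominated.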
In single-objective optimization, algorithm performance can be evaluated by the difference between f(x) and the optimal function value. However, this method cannot be adopted for MOPs. To address this, many criteria have been proposed to evaluate the performance of MOEAs. In fact, the experimental results of almost every algorithm indicate that the proposed algorithm is competitive compared with state-of-the-art algorithms. Nondominated objective space and box plots are adopted in SPEA2 [4]. NSGAII employs convergence and diversity metrics to compare with SPEA and PAES [8]. Set coverage and the inverted generational distance (IGD) are used to evaluate the performance of MOEAD [9].

The epsilon indicator is used in IBEA [11]. Convergence measurement, spread, hypervolume, and computational time are selected as performance metrics in epsilon-MOEA [12]. To validate the proposed MOPSO, four quantitative performance indexes (success counting, IGD, set coverage, and two-set difference hypervolume) and a qualitative performance index (plotting the Pareto fronts) are adopted [13]. Three quality indicators, the additive unary epsilon indicator, spread, and hypervolume, are considered in SMPSO [14]. Spacing and two binary metrics are used in GDE3 [15]. Three metrics, generational distance (GD), spread, and hypervolume, are used to evaluate AbYSS [16]. GD, diversity, computational time, and box plots are considered as measurements in MOSOS [17]. GD and diversity metrics are adopted in MOEDA [18]. Three metrics, GD, IGD, and hypervolume, are used in GAMODE [19]. Among these metrics, some focus on the convergence of MOEAs, while others pay attention to their diversity. Convergence measures the ability to attain the global Pareto front, while diversity measures the distribution of solutions along the Pareto front. It can be observed that each proposed algorithm introduces only a few metrics to estimate performance based on benchmark results, and the conclusion of each study is that its MOEA is the best or highly competitive. However, it is unreliable to measure MOEA performance by only one or two metrics. Each metric captures only a specific aspect of performance while neglecting other information. For instance, GD provides information about the convergence of an MOEA but cannot evaluate its diversity. Therefore, such evaluations are not comprehensive and cannot capture the whole performance of an MOEA. As the evaluation of MOEAs involves many metrics, it can be regarded as a multiple-criteria decision making (MCDM) problem, and MCDM techniques can be used to cope with it. In order to overcome this problem and make fair comparisons, a
framework using MCDM methods is proposed. In the framework, comprehensive performance metrics are established, in which both convergence and diversity are considered. Two MCDM methods are employed to evaluate six MOEAs. This effort gives fairer and more faithful comparisons than a single metric.
The rest of this paper is organized as follows: Section 2 proposes the framework, in which six algorithms, five performance metrics, and two MCDM methods are briefly introduced. Experiments are presented in Section 3, and conclusions are drawn in Section 4.

Evaluation Framework
A framework for evaluating multiobjective algorithms is proposed in Figure 1. Six MOEAs, five performance metrics, and two MCDM methods are employed in the framework.

Six MOEAs
(1) NSGAII [8]. NSGAII was proposed to address the high computational complexity, lack of elitism, and need to specify the sharing parameter in NSGA. In NSGAII, a selection operator is designed that creates a mating pool by combining the parent and offspring populations. Nondominated sorting and crowding-distance ranking are also implemented in the algorithm.
(2) PAES [2]. The Pareto archived evolution strategy (PAES) is a simple evolutionary algorithm. It is a (1 + 1) evolution strategy, employing local search from a population of one but using a reference archive of previously found solutions in order to identify the approximate dominance ranking of the current and candidate solution vectors.
(3) SPEA2 [4]. The strength Pareto evolutionary algorithm (SPEA) was proposed in 1999 by Zitzler. Based on SPEA, an improved version, SPEA2, was proposed, incorporating a fine-grained fitness assignment, a density estimation technique, and an enhanced archive truncation method.
(4) MOEAD [9]. The multiobjective evolutionary algorithm based on decomposition (MOEAD) was proposed by Zhang and Li. It decomposes a multiobjective optimization problem into a number of scalar optimization subproblems and optimizes them simultaneously. Each subproblem is optimized using information only from neighboring subproblems, which makes the algorithm effective and efficient. It won the outstanding paper award of IEEE Transactions on Evolutionary Computation.
(5) MOPSO [13]. The multiobjective particle swarm optimizer (MOPSO) is based on Pareto dominance and the use of a crowding factor to filter the list of available leaders. Different mutation operators act on different subdivisions of the swarm. The epsilon-dominance concept is also incorporated in the algorithm.
(6) SMPSO [14]. Speed-constrained multiobjective PSO (SMPSO) was proposed in 2009. It constrains particle velocity so that new particle positions can still be generated effectively when the velocity would otherwise become extremely high. A turbulence factor is applied, and an external archive stores the nondominated solutions found during the search.

Performance Metrics.
Nowadays, there are many metrics for measuring the performance of MOEAs. Among them, the following five are widely employed; together they reveal both the convergence and the diversity of MOEAs. However, many studies employ only a few of them to evaluate algorithms and argue that their proposed algorithms are the best. In fact, it is unfair to draw that conclusion without comprehensive metrics and evaluations. Therefore, these five metrics are selected here to make comprehensive comparisons.
(1) GD. The generational distance is defined as

GD = \frac{\sqrt{\sum_{i=1}^{n} d_i^2}}{n},

where d_i = \min \| F(x_i) - \mathrm{PF}_{\mathrm{true}} \| is the distance between nondominated solution F(x_i) and the nearest true Pareto front point in objective space. It measures the closeness of the solutions to the real Pareto front. If GD equals zero, all the generated nondominated solutions lie on the real Pareto front. Therefore, a lower GD value indicates better algorithm performance [20].
(2) IGD. Let PF_true be a set of uniformly distributed points along the true Pareto front in the objective space, and let P be the nondominated solution set obtained by an algorithm. The distance from PF_true to P is defined as

\mathrm{IGD} = \frac{\sum_{v \in \mathrm{PF}_{\mathrm{true}}} d(v, P)}{|\mathrm{PF}_{\mathrm{true}}|},

where d(v, P) is the minimum Euclidean distance between v and the points in P. Algorithms with smaller IGD values are desirable [21, 22].
(3) Hypervolume. The hypervolume metric calculates the volume (in the objective space) covered by the members of a nondominated solution set obtained by an MOEA, where all objectives are to be minimized [16]. It can be calculated as

\mathrm{HV} = \mathrm{volume}\left( \bigcup_{i=1}^{n} v_i \right),

where v_i is the hypercube spanned by solution i and a reference point. The larger the HV value, the better the algorithm.
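For the two-objective case, HV can be computed by sweeping the front in order of the first objective and summing the rectangular slabs it dominates below the reference point. The following NumPy sketch is illustrative only (the reference point `ref` and the function name are our own, not from the paper):

```python
import numpy as np

def hypervolume_2d(front, ref):
    """Hypervolume of a 2-objective nondominated front w.r.t. a reference
    point (both objectives minimized); front is a list of (f1, f2) pairs."""
    pts = np.array(sorted(map(tuple, front)))  # sort by f1 ascending (f2 descends)
    rx, ry = ref
    hv, prev_y = 0.0, ry
    for x, y in pts:
        hv += (rx - x) * (prev_y - y)          # slab between consecutive points
        prev_y = y
    return hv
```

For the front {(1, 3), (2, 2), (3, 1)} with reference point (4, 4), the dominated area is 6.0.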
(4) Spacing. The spacing metric measures how uniformly the nondominated set is distributed. It can be formulated as

S = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} (\bar{d} - d_i)^2 },

where d_i is the same as in the GD metric, \bar{d} is the average of the d_i, and n is the number of individuals in the nondominated set. The smaller the spacing, the better the algorithm performs [23, 24].
(5) Maximum Pareto Front Error (MPFE). This metric measures the worst case and can be formulated as

\mathrm{MPFE} = \max_{i} d_i,

where d_i is the same as in GD. MPFE is the largest of these distances. The lower the MPFE value, the better the algorithm [25].
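The four distance-based metrics above can be sketched in a few lines of NumPy. This illustrative implementation (not from the paper) follows the paper's convention that the d_i used in spacing and MPFE are the same point-to-front distances used in GD; all function names are our own:

```python
import numpy as np

def _dists(A, B):
    """Pairwise Euclidean distances between rows of A and rows of B."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

def metrics(front, pf_true):
    """GD, IGD, spacing, and MPFE for a nondominated set vs. a reference front."""
    front, pf_true = np.asarray(front, float), np.asarray(pf_true, float)
    d = _dists(front, pf_true).min(axis=1)        # d_i: solution to nearest PF point
    gd = np.sqrt((d ** 2).sum()) / len(front)
    igd = _dists(pf_true, front).min(axis=1).mean()
    spacing = np.sqrt(((d.mean() - d) ** 2).sum() / (len(front) - 1))
    mpfe = d.max()
    return gd, igd, spacing, mpfe
```

When the obtained front lies exactly on the reference front, GD, spacing, and MPFE are all zero, while IGD is zero only if every reference point is also attained.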
To elaborate on the five metrics, Figure 2(a) illustrates the distances used in GD, spacing, and MPFE, Figure 2(b) presents the distances used in the IGD metric, and Figure 2(c) depicts the HV metric.

TOPSIS.
TOPSIS is an MCDM method for evaluating alternatives. In TOPSIS, the best alternative should have two characteristics: it is the farthest from the negative-ideal solution and the nearest to the positive-ideal solution. The negative-ideal solution maximizes the cost criteria and minimizes the benefit criteria; it consists of the worst values attainable on all criteria. The positive-ideal solution minimizes the cost criteria and maximizes the benefit criteria; it consists of the best values attainable on all criteria [26, 27]. TOPSIS consists of the following steps.
Step 1 (obtain the decision matrix). If the number of alternatives is J and the number of criteria is n, a decision matrix with J rows and n columns is obtained, as in Table 1.

Step 2 (normalize the decision matrix). Each entry is divided by the norm of its criterion column, yielding the normalized decision matrix.

Step 3 (compute the weighted normalized decision matrix). Each normalized column is multiplied by the weight of the corresponding criterion, yielding entries v_{ij}.
Step 4 (find the negative-ideal and positive-ideal solutions).

A^+ = \{ v_1^+, \ldots, v_n^+ \}, \quad A^- = \{ v_1^-, \ldots, v_n^- \},

where v_i^+ is the minimum of column i for cost criteria and its maximum for benefit criteria, and v_i^- is defined the other way around.
Step 5 (calculate the n-dimensional Euclidean distances). The separation of each algorithm from the positive-ideal solution is

S_j^+ = \sqrt{ \sum_{i=1}^{n} (v_{ij} - v_i^+)^2 }, \quad j = 1, \ldots, J,

and its separation from the negative-ideal solution is

S_j^- = \sqrt{ \sum_{i=1}^{n} (v_{ij} - v_i^-)^2 }, \quad j = 1, \ldots, J.

Step 6 (calculate the relative closeness to the ideal solution). The relative closeness of the jth algorithm is defined as

CC_j = \frac{S_j^-}{S_j^+ + S_j^-}.

Step 7 (rank the algorithms). CC_j lies between 0 and 1; the larger CC_j is, the better algorithm j is.
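The steps above can be sketched compactly in NumPy. This is an illustrative implementation, not code from the paper; it assumes vector (Euclidean) normalization of each criterion column and a boolean mask `benefit` (our own name) marking the criteria to be maximized:

```python
import numpy as np

def topsis(X, weights, benefit):
    """Rank alternatives (rows of X) by relative closeness CC.
    benefit[i] is True for criteria to maximize, False for costs."""
    X = np.asarray(X, float)
    V = X / np.linalg.norm(X, axis=0) * weights                # Steps 2-3
    best  = np.where(benefit, V.max(axis=0), V.min(axis=0))    # positive-ideal A+
    worst = np.where(benefit, V.min(axis=0), V.max(axis=0))    # negative-ideal A-
    s_plus  = np.linalg.norm(V - best, axis=1)                 # Step 5
    s_minus = np.linalg.norm(V - worst, axis=1)
    return s_minus / (s_plus + s_minus)                        # CC in [0, 1]
```

An alternative that is best on every criterion gets CC = 1, and one that is worst on every criterion gets CC = 0.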

VIKOR Method.
The VIKOR method was proposed by Opricovic and Tzeng [28-31]. It is developed to rank and select from a set of alternatives; a multicriteria ranking index is introduced based on the idea of closeness to the ideal solution. VIKOR requires the following steps.

Step 1. Determine the best value f_i^* and the worst value f_i^- of each criterion (the maximum and minimum of the column for a benefit criterion, and the reverse for a cost criterion), where f_{ij} is the value of the ith criterion for alternative a_j, n is the number of criteria, and J is the number of alternatives.
Step 2. S_j and R_j (j = 1, 2, \ldots, J) are formulated as

S_j = \sum_{i=1}^{n} w_i \frac{f_i^* - f_{ij}}{f_i^* - f_i^-}, \quad R_j = \max_{i} \left[ w_i \frac{f_i^* - f_{ij}}{f_i^* - f_i^-} \right],

where w_i is the weight of the ith criterion. S_j and R_j are employed to measure the ranking.
Step 3. Compute the values Q_j (j = 1, 2, \ldots, J) as

Q_j = v \frac{S_j - S^*}{S^- - S^*} + (1 - v) \frac{R_j - R^*}{R^- - R^*},

where S^* = \min_j S_j, S^- = \max_j S_j, R^* = \min_j R_j, and R^- = \max_j R_j. The alternative achieving S^* has maximum group utility, the alternative achieving R^* has minimum individual regret of the opponent, and v is the weight of the strategy of the majority of criteria, often set to 0.5.
Step 4. Rank the alternatives by each of the three measures S, R, and Q separately, yielding three ranking lists.
Step 5. The alternative a' is considered the best if the following two conditions are met. C1 (acceptable advantage): Q(a'') - Q(a') \geq 1/(J - 1), where a'' is the alternative in second position in the ranking list by Q and J is the number of alternatives. C2 (acceptable stability): alternative a' must also be the best ranked by S or R.
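Steps 1-3 can be sketched as follows. This is an illustrative implementation, not code from the paper; it assumes each criterion's best and worst values differ, and likewise for S and R across alternatives, since the normalizations divide by those ranges:

```python
import numpy as np

def vikor(X, weights, benefit, v=0.5):
    """Compute VIKOR's S, R, and Q for alternatives (rows of X);
    smaller is better for all three. benefit marks criteria to maximize."""
    X = np.asarray(X, float)
    f_best  = np.where(benefit, X.max(axis=0), X.min(axis=0))   # f_i^* (Step 1)
    f_worst = np.where(benefit, X.min(axis=0), X.max(axis=0))   # f_i^-
    regret = weights * (f_best - X) / (f_best - f_worst)        # per-criterion regret
    S, R = regret.sum(axis=1), regret.max(axis=1)               # Step 2
    scale = lambda a: (a - a.min()) / (a.max() - a.min())
    Q = v * scale(S) + (1 - v) * scale(R)                       # Step 3
    return S, R, Q
```

Note that the same expression (f_best - X)/(f_best - f_worst) handles cost criteria too, since both numerator and denominator flip sign.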

Experiments
The experiments are designed to evaluate the six algorithms above. In order to make fair comparisons, thirteen benchmark test functions widely used in MOPs are employed. They can be divided into two groups: the ZDT suite and the WFG suite. All of these test problems minimize their objectives. Detailed information is given in Table 2 [32, 33].
The mathematical forms of the WFG functions can be obtained in [32], and the ZDT functions used here are defined as follows, with x \in [0, 1]^n and g(x) = 1 + 9 \sum_{i=2}^{n} x_i / (n - 1) unless noted otherwise:

ZDT2: f_1(x) = x_1, \quad f_2(x) = g(x) \left[ 1 - (f_1/g)^2 \right];

ZDT3: f_1(x) = x_1, \quad f_2(x) = g(x) \left[ 1 - \sqrt{f_1/g} - (f_1/g) \sin(10\pi f_1) \right];

ZDT6: f_1(x) = 1 - \exp(-4x_1) \sin^6(6\pi x_1), \quad f_2(x) = g(x) \left[ 1 - (f_1/g)^2 \right], with g(x) = 1 + 9 \left[ \sum_{i=2}^{n} x_i / (n - 1) \right]^{0.25}.

The parameter settings of these algorithms are the same as in their original papers. The maximum number of function evaluations is set to 25,000. Each algorithm runs thirty times, and the average values of the performance metrics are reported.
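ZDT2, for example, translates directly into code. This sketch is illustrative (not from the paper) and simply evaluates the two objectives for a given decision vector:

```python
import numpy as np

def zdt2(x):
    """ZDT2 objectives (concave Pareto front); x is a vector in [0, 1]^n."""
    x = np.asarray(x, float)
    f1 = x[0]
    g = 1.0 + 9.0 * x[1:].sum() / (len(x) - 1)
    f2 = g * (1.0 - (f1 / g) ** 2)
    return f1, f2
```

On the Pareto-optimal set (x_2 = ... = x_n = 0), g = 1 and the front is f_2 = 1 - f_1^2.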

Results.
In order to illustrate the whole calculation process, the ZDT1 results for the five metrics are presented in Table 3. The four metrics GD, IGD, MPFE, and spacing of SMPSO are the smallest, and its hypervolume is the biggest. PAES is the worst because its values on all five metrics are the worst among the six algorithms. The normalized decision matrix of the five performance metrics is presented in Table 4. Suppose that each weight equals 1/5. Then, according to Table 4, the positive-ideal and negative-ideal solutions are

A^+ = \{0.0163, 0.0553, 0.0239, 0.0410, 0.4150\} \times (1/5);
A^- = \{0.9939, 0.9864, 0.9981, 0.9115, 0.3805\} \times (1/5); \qquad (20)

The distances S^+ and S^- are then calculated according to (10) and (11) and are shown in Table 5. The global performance of each algorithm is determined by CC^*, calculated by (12) and presented in Table 5. Therefore, the ranking of the six algorithms is SMPSO > SPEA2 > MOPSO > NSGAII > MOEAD > PAES: SMPSO is the best algorithm and PAES is the worst for ZDT1.
For the VIKOR method, Q, S, and R are calculated and presented in Table 6. According to the properties of Q, S, and R, SMPSO is the best and PAES is the worst, and SPEA2 is better than MOEAD. However, as the condition Q(a'') - Q(a') \geq 1/(6 - 1) = 0.2 cannot be satisfied, the S value is used to determine the ranking among NSGAII, SPEA2, MOEAD, and MOPSO. Therefore, the ranking of the six algorithms is SMPSO > SPEA2 > MOPSO > NSGAII > MOEAD > PAES.
However, the TOPSIS and VIKOR methods give different rankings for WFG1, WFG6, and WFG7. Take WFG1 as an instance. The final values of TOPSIS and VIKOR are presented in Table 9. As there are six algorithms, J is set to six, so 1/(J - 1) = 1/(6 - 1) = 0.2. This means the Q-value difference between two algorithms should exceed 0.2; otherwise, the rank between the two algorithms is determined by S or R.
From Table 9, it can be noticed that this condition is not met between NSGAII and SPEA2, so the S values are used to compare the two algorithms. The S value of NSGAII is smaller than that of SPEA2, so NSGAII is better than SPEA2: under VIKOR, NSGAII ranks first and SPEA2 second. However, TOPSIS directly uses CC as the ranking criterion. The CC value of SPEA2 is bigger than that of NSGAII, so under TOPSIS, SPEA2 ranks first and NSGAII second.
3.2. Discussion. To make further comparisons, the best- and worst-performing of the six algorithms are selected, and the nondominated solutions they obtain are depicted in Figures 3-5. For WFG1, NSGAII and SPEA2 achieve the best rankings under VIKOR and TOPSIS, respectively. ZDT6 is a biased function, as the first objective function value is large compared to the second one. MOEAD obtains superior results on it, so, if a problem has this feature, MOEAD should be chosen.
From Tables 7 and 8, the no-free-lunch theorem can also be observed: any performance gain by an optimization algorithm on one class of problems is exactly paid for by a loss on another class. No algorithm achieves the best or the worst performance on all test functions.

Conclusions
There are many MOEAs. When a multiobjective optimization algorithm is proposed, the experimental results often indicate that the algorithm is competitive based on only one or two performance metrics. Generally, such comparisons are unfair and the results unreliable. In order to make fair comparisons and rank MOEAs, a framework is proposed to evaluate them. The framework employs six well-known MOEAs, five performance metrics, and two MCDM methods. The six MOEAs are NSGAII, PAES, SPEA2, MOEAD, MOPSO, and SMPSO. The five performance metrics are GD, IGD, MPFE, spacing, and hypervolume, through which both the convergence and the diversity of the nondominated solutions are fully considered. The two MCDM methods are TOPSIS and VIKOR.
The results indicate that SPEA2 is the best algorithm overall and PAES the worst. However, SPEA2 does not perform well on all test functions, nor does PAES achieve the worst performance on all of them; the experimental results are consistent with the no-free-lunch theorem. Moreover, the observations show that the ability of an MOEA to solve an MOP depends on both the MOEA and the features of the MOP.

Figure 2: The distance and nondominated solutions used in the above metrics.

Table 1: The multiple-attribute decision matrix.

Table 3: Five metric results for ZDT1.

Table 4: Normalized decision matrix of the five performance metrics.

Table 6: The results of Q, S, and R from VIKOR.

Table 9: The results of CC, Q, S, and R from TOPSIS and VIKOR.