Decomposition-Based Multiobjective Evolutionary Optimization with Adaptive Multiple Gaussian Process Models



Introduction
Multiobjective optimization problems (MOPs) widely exist in scientific research and engineering applications. Since the first multiobjective evolutionary algorithm (MOEA) was reported in 1985 [1], MOEAs have been studied extensively and have become one of the most active research directions in the field of evolutionary computation [2][3][4]. MOEAs represented by NSGA-II [5], SPEA2 [6], PAES [7], HypE [8], MOEA/D [9], etc., have been widely used in many application fields. According to the selection strategies used to handle convergence enhancement and diversity maintenance, most existing MOEAs can be roughly divided into the following three categories.

Pareto-Based MOEAs.
The basic idea of such algorithms, represented by NSGA-II and SPEA2, is to use a Pareto-based ranking scheme to sort the population into different convergence layers and then estimate the density of individuals in the last accepted layer. In this way, the population can be sorted according to the dominance relationship and density estimation, and the relatively superior individuals are selected for the next generation. Crowding distance [5], the K-nearest neighbor method [6], ε-domination [10,11], grading [12], and other methods [13][14][15] are often used to estimate the density of individuals. As Pareto-based MOEAs have advantages such as a simple principle, easy implementation, and few parameters, they have attracted much research and found extensive applications. However, their ability to guarantee convergence degrades dramatically when the number of objectives is larger than three, mainly due to the loss of selection pressure.

Decomposition-Based MOEAs.
Decomposition-based MOEAs transform a MOP into a set of subproblems and then solve them simultaneously using a collaborative evolutionary search; representatives include MOEA/D [9], RVEA [16], NSGA-III [17], RPEA [18], SPEA/R [19], and RdEA [20]. Note that most of these algorithms adopt additional reference information (reference vectors, reference points, or weight vectors) during environmental selection, which helps to maintain the diversity of the population. Owing to their clearer mathematical interpretation, decomposition-based MOEAs have become very popular in recent years [21][22][23][24].

Indicator-Based MOEAs.
Indicator-based MOEAs directly employ a performance indicator, such as hypervolume (HV), generational distance (GD), or R2, to guide the selection of promising solutions for the next generation. IBEA [25], MOMBI-II [26], HypE [8], GD-MOEA [27], R2-IBEA [28], and DDE [29] are representatives of the indicator-based MOEAs. In these MOEAs, performance indicators are used as the selection criterion to rank nondominated solutions that cannot be distinguished by traditional Pareto dominance. However, this kind of MOEA usually requires high computational complexity for calculating the performance indicator, which is very challenging especially when the number of objectives is large.
These MOEAs usually include two main components, i.e., variation and selection [6]. Selection plays an important role in MOEAs in maintaining the promising solutions, as introduced above for classifying MOEAs, while variation is the key factor determining the quality of the generated solutions. In [30], the effects of several variation operators are studied on some test problems with variable linkages, showing that variable linkages may cause difficulties for MOEAs. Actually, a number of real-valued variation approaches have been proposed during recent decades [31], which can be classified into three main kinds, i.e., traditional recombination operators, estimation of distribution algorithms, and inverse model methods for variation.
Traditional recombination operators are used in most existing MOEAs due to their simplicity. This kind of variation simulates the binary crossover method to produce real-valued offspring, such as simulated binary crossover (SBX) [32], Laplace crossover [33], parent-centric crossover [34], blend crossover α [35], unimodal normal distribution crossover [36], and simple crossover [37]. Moreover, differential evolution (DE) [38] is also used as the recombination operator in many MOEAs, which samples offspring based on the difference vectors of parents. Evolution paths are used in DE [39] to depict the population movement and predict its tendency, which could produce potential solutions to speed up convergence toward the PS. Recently, a number of hybridized recombination operators have been proposed, trying to combine the advantages of different recombination operators. In [40], a cooperative DE framework is designed for constrained MOPs, in which multiple DE operators are run in different subpopulations, each optimizing its own constrained subproblem. In [41], four DE operators are combined, and a sliding window is adopted in [42] to provide the reward for each DE operator according to the enhancement on subproblems. Similarly, four DE operator pools are presented in [43], each pool including two DE operators with complementary search patterns, to provide an improved search ability. In ACGDE [44], an adaptive cross-generation DE operator is designed by exploiting information from individuals across generations to adapt the parameters. The estimation of distribution algorithms (EDAs) [45] exploit probabilistic models extracted from the population's distribution to perform variation [46,47]. Unlike the above traditional recombination operators, no crossover or mutation procedures are run in EDAs.
Instead, the global statistical information of the selected parents is used in EDAs to build a posterior probability distribution model, and offspring are then sampled from this model. Several EDAs are studied for solving continuous MOPs in [48,49], while the Gaussian distribution model, the mixture Gaussian distribution model, and the mixed Gaussian model with principal component analysis (PCA) are introduced in [50] for offspring variation. Different from the individual-level information used in traditional recombination operators, the local PCA operator is employed in [45,47] to generate new offspring. Different from [47], the PCA model was replaced by the locally linear embedding (LLE) [51] model in [52]. In this way, the model only considers the decision space, without considering too much about the target MOP itself. The inverse model methods for variation utilize machine learning approaches to capture the mapping from the objective space to the decision space by exploiting the characteristics of the target MOP. Representative algorithms such as IM-MOEA [53] and E-IM-MOEA [54] use a Gaussian process-based inverse model to perform recombination. In another work [55], inspired by the LLE manifold learning idea, a new LLE modeling approach is introduced that uses the mapping known in the MOP, where the decision space is considered as the high-dimensional space and the objective space is regarded as the low-dimensional space. Thus, this modeling method no longer builds an overall low-dimensional embedding of the samples that is then mapped back to the high-dimensional space; instead, it directly constructs new samples in the high-dimensional space. In this way, a model mapping from the objective space to the decision space is built based on the approximated Pareto set obtained during the evolution. In other words, such a model can use a probability model or surrogate model to build a bridge from the objective space to the decision space.
Among the above three kinds of variation operators, traditional recombination operators are the most often used in MOEAs. However, as these operators work on individuals, they only provide a finite number of search patterns and are also criticized for a lack of mathematical explanation. For the inverse model methods for variation, mapping from the objective space into the decision space is very challenging when tackling some complicated MOPs. EDAs usually adopt one standard Gaussian process model with a fixed variance, which may not work well for various kinds of MOPs. To enhance the search capability of EDAs, this paper introduces a decomposition-based MOEA with adaptive multiple Gaussian process models, called MOEA/D-AMG, which is effective in generating superior offspring. Based on the performance enhancements on a number of decomposed subproblems, a suitable Gaussian process model is selected accordingly. In this way, our method is more intelligent and provides more search patterns to produce diversified solutions. After evaluating our performance on the well-known F, UF, and WFG test instances, the experimental results validate the superiority of our algorithm over six competitive MOEAs (MOEA/D-SBX [9], MOEA/D-DE [56], MOEA/D-GM [57], RM-MEDA [47], IM-MOEA [53], and AMEDA [50]). The rest of this paper is organized as follows. Section 2 presents some background on MOPs, the Gaussian process model, and some Gaussian process model-based MOEAs. Section 3 introduces the details of MOEA/D-AMG. Finally, Section 4 provides and discusses the simulation results, while Section 5 gives the conclusions and some future work.

Multiobjective Optimization Problems.
In real life, there exist a number of engineering problems with multiple complicated optimization objectives, which are often called multiobjective optimization problems (MOPs). This paper considers continuous MOPs, formulated as

minimize F(x) = (f_1(x), f_2(x), ..., f_m(x))^T
subject to x ∈ ∏_{i=1}^{n} [a_i, b_i] ⊂ R^n,     (1)

where n and m are, respectively, the numbers of decision variables and objectives, a_i and b_i are, respectively, the lower and upper bounds of the i-th decision variable of x, x = (x_1, ..., x_n)^T ∈ R^n is a decision variable vector, ∏_{i=1}^{n} [a_i, b_i] ⊂ R^n is the feasible search space, f_i : R^n → R, i = 1, ..., m, is a continuous mapping, and F(x) consists of m continuous objective functions.
In MOPs, conflicts often exist among the different objectives, i.e., the improvement of one objective results in the deterioration of another. Generally, there does not exist a single optimal solution that minimizes all the objectives in (1) at the same time. Instead, a set of trade-off solutions among the objectives can be found, which are equally optimal when all the objectives are considered. Suppose that there are two vectors p = (p_1, ..., p_m)^T and q = (q_1, ..., q_m)^T ∈ R^m, where m is the number of objectives. p is said to dominate q in (1), denoted by p ≺ q, if p_i ≤ q_i for all i = 1, ..., m, and p ≠ q. A solution x* ∈ Ω, where Ω is the feasible search space in (1), is called a Pareto optimal solution if and only if there is no other x ∈ Ω such that F(x) ≺ F(x*). The collection of all the Pareto optimal solutions is called the Pareto optimal set (PS), and the projection of the PS in the objective space is called the Pareto optimal front (PF).
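The dominance relation defined above is easy to express in code. The sketch below (function names are illustrative, not from the paper) checks Pareto dominance between two objective vectors under minimization and filters a set of vectors down to its nondominated subset:

```python
import numpy as np

def dominates(p, q):
    """Return True if objective vector p Pareto-dominates q (minimization):
    p is no worse in every objective and strictly better in at least one."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return bool(np.all(p <= q) and np.any(p < q))

def nondominated(points):
    """Keep only the objective vectors not dominated by any other vector."""
    pts = [np.asarray(v, dtype=float) for v in points]
    return [v for v in pts
            if not any(dominates(u, v) for u in pts if u is not v)]
```

For example, (1, 2) dominates (2, 2), but neither (1, 2) nor (2, 1) dominates the other, so both belong to the nondominated set.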

Gaussian Process Model.
The recombination operator based on the Gaussian process model belongs to the EDA family. Different from classical recombination operators, recombination using a Gaussian process model exploits the distribution information of the whole population to generate offspring, which helps ensure the diversity of search patterns.
The Gaussian model is one of the most widely used probability models in scientific research and practical applications [58][59][60][61]. In general, a random variable x = (x_1, x_2, ..., x_n)^T with a Gaussian distribution can be expressed as

x ∼ N(μ, Σ),     (2)

where μ is an n-dimensional mean vector and Σ is the n × n covariance matrix. The probability density function of the random variable is

p(x) = (2π)^(−n/2) |Σ|^(−1/2) exp(−(1/2)(x − μ)^T Σ^(−1) (x − μ)).     (3)

For a given set of data x^1, x^2, ..., x^K, the mean vector and covariance matrix are, respectively, estimated by

μ = (1/K) ∑_{k=1}^{K} x^k,  Σ = (1/(K − 1)) ∑_{k=1}^{K} (x^k − μ)(x^k − μ)^T.     (4)

Hence, a new solution x = (x_1, x_2, ..., x_n)^T can be generated using the Gaussian model in the three steps of Algorithm 1. First, decompose the covariance matrix Σ into a lower triangular matrix A by the Cholesky decomposition method [62] in line 1, where Σ = AA^T. Then, generate a vector y = (y_1, ..., y_n)^T in line 2, of which each element y_i, i = 1, ..., n, is sampled from a standard Gaussian distribution N(0, 1). After that, a new trial solution x is obtained as x = μ + Ay in line 3. Generally, to improve the search capability of the Gaussian model, a small perturbation is added to the new solution by the polynomial mutation operator [2].
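The three sampling steps above can be sketched as follows. This is a minimal illustration assuming NumPy; the `sigma` argument scales the standard normal vector y (1.0 recovers the standard model, other values mimic the multiple models used later in the paper), and a small diagonal jitter is an added safeguard for the Cholesky factorization that the paper does not mention:

```python
import numpy as np

def sample_gaussian_model(samples, rng, sigma=1.0):
    """Fit a Gaussian to the selected parents and sample one trial
    solution x = mu + A y, where Sigma = A A^T (Cholesky)."""
    X = np.asarray(samples, dtype=float)       # K x n matrix of parents
    mu = X.mean(axis=0)                        # mean vector, eq. (4)
    cov = np.cov(X, rowvar=False)              # covariance matrix Sigma
    cov += 1e-10 * np.eye(len(mu))             # jitter for numerical safety
    A = np.linalg.cholesky(cov)                # line 1: Sigma = A A^T
    y = sigma * rng.standard_normal(len(mu))   # line 2: y ~ N(0, sigma^2 I)
    return mu + A @ y                          # line 3: new trial solution
```

In a full algorithm, the returned vector would additionally be perturbed by polynomial mutation and clipped to the box constraints of (1).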

Gaussian Process Model-Based MOEAs.
In recent years, several Gaussian process model-based MOEAs have been proposed. Their details are introduced in the following paragraphs, after which the motivation of this paper is clarified.
In [63], a mixture-based multiobjective iterated density-estimation evolutionary algorithm (MIDEA) with both discrete and continuous representations is proposed. This approach employs clustering analysis to discover the nonlinear distribution structure, based on the observation that a simple model suffices to describe a cluster and that the population can be adequately represented by a mixture of simple models. Accordingly, a mixture of Gaussian models is used to produce new solutions for continuous real-valued MOPs in MIDEA.
Although MIDEA is very effective for solving certain MOPs, the regularity property of MOPs is not considered, so it may perform poorly on some problems.
Thus, a multiobjective evolutionary algorithm based on decomposition and probability model (MOEA/D-MG) is designed in [57]. In this approach, multivariate Gaussian models are embedded into MOEA/D [9] for continuous multiobjective optimization. Either a local Gaussian distribution model is built around a subproblem based on its neighborhood, or a global model is constructed based on the whole population.
Thus, the population distribution can be captured by all the probability models working together. However, MOEA/D-MG has to reconstruct a Gaussian model for each subproblem, resulting in a large computational cost in the model-building process. Moreover, when building the Gaussian models for similar subproblems, some individuals may be used repeatedly, which further strains the computational resources.
To reduce the computational cost of the modeling process, an improved MOEA/D-MG with high modeling efficiency (MOEA/D-MG2) is reported in [64], where neighboring subproblems share the same covariance matrix to build Gaussian models for sampling solutions. First, a population sorting process is used to adjust the sampling order of the subproblems, which tries to avoid the insufficient diversity caused by the reuse of the covariance matrix and can obtain a uniformly distributed offspring set. Then, in the global search, only some subproblems are selected randomly to construct the Gaussian model. Although MOEA/D-MG2 performs well on some MOPs, it only builds the Gaussian model for solution sampling in the neighborhoods as defined in MOEA/D, so it is applicable only within the MOEA/D paradigm.
Moreover, an adaptive multiobjective estimation of distribution algorithm with a novel Gaussian sampling strategy (AMEDA) is presented in [50]. In this work, a clustering analysis approach is adopted to reveal the structure of the population distribution. Based on these clusters, a local multivariate Gaussian model or a global model is built for each solution to sample a new solution, which enhances the modeling accuracy and the search ability of AMEDA. Moreover, an adaptive update strategy of the probability is developed to control the contributions of the two types of Gaussian models.
From the above studies, it can be observed that their modeling differences mainly lie in the methods of selecting sampling solutions. However, the Gaussian models in [50,63,64] only adopt the standard one with a fixed variance, which cannot adaptively adjust the search step sizes according to the different characteristics of MOPs. To alleviate this problem, a decomposition-based MOEA with adaptive multiple Gaussian process models (called MOEA/D-AMG) is proposed in this paper. Multiple Gaussian models with a set of different variances are used in our algorithm.
Then, based on the performance enhancements of the decomposed subproblems, a suitable Gaussian model is adaptively selected, which helps to enhance the search capability of MOEA/D-AMG and to handle various kinds of MOPs well, as validated in the experimental section.

The Proposed Algorithm
In this section, our proposed algorithm MOEA/D-AMG is introduced in detail. First, the adaptive multiple Gaussian process models are described. Then, the details of MOEA/D-AMG are presented.

Adaptive Multiple Gaussian Process Models.
Many approaches have been designed to take advantage of the regularity in the distributions of Pareto optimal solutions in both the decision and objective spaces to estimate the population distribution. Under mild conditions, it can be induced from the Karush-Kuhn-Tucker (KKT) condition that the PF of an m-objective optimization problem is an (m − 1)-dimensional piecewise continuous manifold [65]. That is to say, the PF of a continuous biobjective MOP is a piecewise continuous curve, while the PF of a continuous three-objective MOP is a piecewise continuous surface. Thus, the Gaussian process model has been widely studied for both single-objective and multiobjective optimization [66][67][68][69][70][71][72]. However, a single Gaussian process model is not so effective for modeling the population distribution when tackling some complicated MOPs, as studied in [57]. In consequence, multiple Gaussian models are used in this paper, which can explicitly exploit the abilities of different Gaussian models with various distributions. By using multiple Gaussian process models with good diversity, a more suitable one can be adaptively selected to capture the population structure and sample new individuals more accurately. Thus, five types of Gaussian models are used in this paper, which have the same mean value 0 with different standard deviations, i.e., 0.6, 0.8, 1.0, 1.2, and 1.4. The distributions of the five Gaussian models (represented by g1, g2, g3, g4, and g5, respectively) are plotted in Figure 1.
ALGORITHM 1: Sampling model.
(1) Obtain a lower triangular matrix A by decomposing the covariance matrix with the Cholesky method, where Σ = AA^T
(2) Generate a Gaussian distribution variable y
(3) Output the new trial solution x = μ + Ay

To select a suitable one from the multiple Gaussian process models, an adaptive strategy is proposed in this paper to improve the comprehensive search capability. The probabilities of selecting the different models depend on their performance in optimizing the subproblems. Once a Gaussian process model is selected, it is used to generate the Gaussian distribution variable y in Algorithm 1. Before the adaptive strategy comes into play, the predefined Gaussian models are selected with equal probability. In our approach, a set of offspring solutions is produced by using the different Gaussian distribution variables y. After analyzing the quality of these new offspring solutions, the contribution rate of each Gaussian process model can be calculated. It should be noted that the quality of a new offspring is determined by its fitness value, which can be calculated by many available methods. In our work, the Tchebycheff approach in [9] is adopted:

g(x | λ^i, z) = max_{1≤j≤m} λ^i_j |f_j(x) − z_j|,     (5)

where z is the reference point, λ^i is the i-th weight vector, and m is the number of objectives. A smaller fitness value of the new offspring generated with a Gaussian distribution variable indicates a greater contribution of the corresponding Gaussian model. Hence, for each Gaussian model, the improvement of fitness (IoF) value is obtained as

IoF_{k,G} = Fe_{k,G−LP} − Fe_{k,G},     (6)

where Fe_{k,G} is the fitness of the new offspring with the k-th Gaussian distribution in generation G. In order to maximize the effectiveness of the proposed adaptive strategy, it is executed every LP generations during the whole run of our algorithm. Afterwards, the contribution rates (Cr) of the different Gaussian distributions are calculated as

Cr_{k,G} = IoF_{k,G} + ε,     (7)

where ε is a very small value, which takes effect when IoF is zero.
Then, the probabilities (Pr) of the different distributions being selected are updated by

Pr_{k,G} = Cr_{k,G} / ∑_{j=1}^{K} Cr_{j,G},     (8)

where K is the total number of Gaussian models. As described above, we can adaptively adjust the probability that the k-th Gaussian distribution is selected by updating the value of Pr_{k,G}.
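A minimal sketch of this adaptive bookkeeping, in the spirit of equations (5)-(8), is given below; the exact formulas in the paper may differ in detail, and all function names are illustrative. Improvements are shifted by ε (so a model with zero improvement keeps a nonzero chance) and normalized into selection probabilities; the Tchebycheff aggregation used as the fitness is included for completeness:

```python
import numpy as np

def update_model_probabilities(fitness_gain, eps=1e-6):
    """Turn per-model fitness improvements (IoF) into contribution
    rates Cr = IoF + eps and then selection probabilities Pr."""
    iof = np.maximum(np.asarray(fitness_gain, dtype=float), 0.0)
    cr = iof + eps            # contribution rate; eps guards IoF = 0
    return cr / cr.sum()      # normalized selection probabilities

def tchebycheff(f, weight, z):
    """Tchebycheff aggregation g(x | lambda, z) = max_j lambda_j |f_j - z_j|."""
    return float(np.max(np.asarray(weight, dtype=float)
                        * np.abs(np.asarray(f, dtype=float)
                                 - np.asarray(z, dtype=float))))
```

A model whose offspring achieved the largest fitness improvement over the last LP generations thus receives the largest selection probability in the next period.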

The Details of MOEA/D-AMG.
The above adaptive multiple Gaussian process models are used as the recombination operator, which can be embedded into a state-of-the-art MOEA based on decomposition (MOEA/D [9]), giving the proposed algorithm MOEA/D-AMG. In this section, the details of the proposed MOEA/D-AMG are introduced.
To clarify the running of MOEA/D-AMG, its pseudocode is given in Algorithm 2, and the parameters used are introduced below:
(1) N indicates the number of subproblems and also the size of the population.
(2) K is the number of Gaussian process models.
(3) G is the current generation.
(4) ξ is the parameter applied to control the balance between exploitation and exploration.
(5) gss is the number of randomly selected subproblems for constructing the global Gaussian model.
(6) LP is the parameter controlling the frequency of using the proposed adaptive strategy.
Lines 1-2 of Algorithm 2 describe the initialization process, which sets the population size N and generates N weight vectors using the approach in [56] to define N subproblems in (5). Then, an initial population of N individuals is randomly generated, and the neighborhood size T is set for selecting the T neighboring subproblems used to construct the Gaussian model. Then, the reference point z = (z_1, z_2, ..., z_m) is obtained by collecting the minimal value of each objective. In line 3, the multiple Gaussian distributions are defined, and the corresponding initial probabilities are set to an equal value. A maximum number of generations is used as the termination condition in line 4, and all the subproblems are randomly traversed once at each generation in line 5. After that, the main evolutionary loop of the proposed algorithm is run in lines 5-26. Each subproblem selected at a generation goes through three components, i.e., model construction, model update, and population update, as observed from Algorithm 2.
Lines 6-10 run the process of model construction. As controlled by the parameter ξ, the set of subproblems B is either the neighborhood B_i of the current subproblem in line 7 or gss subproblems selected randomly in line 9. If the neighborhood of the current subproblem is used, local multiple Gaussian models are constructed for exploitation; otherwise, global models are constructed for exploration. At last, the generation counter G is incremented by 1, and the reference point z is updated by including the new minimal value of each objective. If the termination condition in line 4 is not satisfied, the evolutionary loop in lines 5-26 is run again; otherwise, the final population is output in line 28.
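The exploitation/exploration branch in lines 6-10 of Algorithm 2 can be sketched as follows, assuming the neighborhoods are precomputed index lists; this is an illustrative reading of the pseudocode, not the authors' code:

```python
import numpy as np

def select_mating_pool(i, neighbors, pop_size, gss, xi, rng):
    """With probability xi, use the neighborhood B_i of subproblem i
    (local model, exploitation); otherwise pick gss distinct subproblem
    indices at random from the whole population (global model, exploration)."""
    if rng.random() < xi:
        return list(neighbors[i])                  # local: neighborhood B_i
    return list(rng.choice(pop_size, size=gss,     # global: random subset
                           replace=False))
```

The returned index set B would then be used to fit the mean vector and covariance matrix fed into Algorithm 1.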

Test Instances.
Twenty-eight unconstrained MOP test instances are employed here as the benchmark problems for empirical studies. To be specific, UF1-UF10 are the benchmark problems of the CEC 2009 MOEA competition, and F1-F9 are proposed in [48]. These test instances have complicated PS shapes. We also consider the WFG test suite [49] with different problem characteristics, including nonseparable, deceptive, and degenerate problems, mixed PF shapes, and variable dependencies. The number of decision variables is set to 30 for UF1-UF10 and F1-F9; for WFG1-WFG9, the numbers of position- and distance-related decision variables are set to 2 and 4, respectively, while the number of objectives is set to 2.

Performance Metrics
The Inverted Generational Distance (IGD) [73]. Assume that P* is a set of solutions uniformly sampled from the true PF and S represents a solution set obtained by a MOEA [73]. The IGD value from P* to S calculates the average distance from each point of P* to the nearest solution of S in the objective space:

IGD(P*, S) = (1/|P*|) ∑_{x∈P*} dist(x, S),

where dist(x, S) returns the minimal distance from a solution x in P* to any solution in S, and |P*| returns the size of P*.

Hypervolume (HV) [74]. Assume that a reference point Z^r = (z^r_1, z^r_2, ..., z^r_m) in the objective space is dominated by all Pareto-optimal objective vectors [74]. The HV metric measures the size of the objective space dominated by the solutions in S and bounded by Z^r:

HV(S) = VOL(∪_{x∈S} [f_1(x), z^r_1] × ⋯ × [f_m(x), z^r_m]),

where VOL(·) indicates the Lebesgue measure.
ALGORITHM 2: The main framework of MOEA/D-AMG.
(1) Initialize N subproblems, including the weight vectors, N individuals, and the neighborhood size T
...
Model construction
(6) if rand() < ξ then
(7)     Define the neighborhood individuals B = B_i
(8) else
(9)     Select gss individuals from {x^1, x^2, ..., x^N} to construct B
(10) end
Model update
(11) if mod(G, LP) == 0 then
(12)     Update Pr_{k,G} using equations (5)-(8)
(13) end
(14) Select a Gaussian model according to Pr_{k,G}
(15) Generate y based on the selected Gaussian model
(16) Feed B and y into Algorithm 1 to generate a new offspring x
Population update
(17) Set counter c = 0
(18) for each j ∈ B do
(19)     if g^i(x) < g^i(x^j) and c < n_r then
(20)         x^j is replaced by x
...

Both the IGD and HV metrics reflect the convergence and diversity of the solution set S simultaneously. A lower IGD value (or a larger HV value) indicates a better quality of S in approximating the entire true PF. In this paper, six competitive MOEAs, namely, MOEA/D-SBX [9], MOEA/D-DE [56], MOEA/D-GM [57], RM-MEDA [47], IM-MOEA [53], and AMEDA [50], are included to validate the performance of the proposed MOEA/D-AMG. All the comparison results obtained by these MOEAs regarding IGD and HV are presented in the corresponding tables, where the best mean metric values are highlighted in bold and italics. In order to draw statistically sound conclusions, Wilcoxon's rank sum test at a 5% significance level is conducted to assess the significance of the differences between MOEA/D-AMG and each compared algorithm.
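Both metrics can be computed compactly. The IGD function below follows the definition given above, while the hypervolume routine is a simple two-dimensional sweep (sufficient for the biobjective setting used in this paper); the function names are illustrative:

```python
import numpy as np

def igd(pf_ref, solutions):
    """Inverted Generational Distance: mean over the reference set P* of
    the distance to the nearest obtained solution in S (objective space)."""
    P = np.asarray(pf_ref, dtype=float)
    S = np.asarray(solutions, dtype=float)
    d = np.linalg.norm(P[:, None, :] - S[None, :, :], axis=2)  # |P*| x |S|
    return float(d.min(axis=1).mean())

def hv_2d(solutions, ref):
    """Hypervolume for a biobjective minimization set: area dominated by S
    and bounded by the reference point Z^r (simple sweep, 2-D only)."""
    S = np.asarray(solutions, dtype=float)
    S = S[np.all(S <= np.asarray(ref, dtype=float), axis=1)]  # inside Z^r
    S = S[np.argsort(S[:, 0])]            # sweep left to right on f1
    area, prev_f2 = 0.0, float(ref[1])
    for f1, f2 in S:
        if f2 < prev_f2:                  # nondominated strip contributes
            area += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return area
```

For instance, the set {(1, 2), (2, 1)} with reference point (3, 3) dominates an area of 3: two 2 × 1 rectangles whose 1 × 1 overlap is counted once.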

Public Parameters.
For the population size N, we set N = 300 for biobjective problems and N = 600 for three-objective problems. Each algorithm is independently run 20 times on each test instance, and the termination condition of an algorithm is a predefined maximal number of function evaluations, which is set to 300000 for the UF instances, 150000 for the F instances, and 200000 for the WFG instances. We set the neighborhood size T = 20 and the mutation probability p_m = 1/n, where n is the number of decision variables of each test problem, with the distribution index μ_m = 20.

Parameters in MOEA/D-SBX.
The crossover probability and distribution index are set to 0.9 and 30, respectively.

Parameters in MOEA/D-DE.
The crossover rate is CR = 1.0 and the scaling factor is F = 0.5 in DE, as recommended in [9]; the maximal number of solution replacements is n_r = 2, and the probability of selecting the neighboring subproblems is δ = 0.9.

Parameters in MOEA/D-GM.
The neighborhood size of each subproblem is K = 15 for biobjective problems and K = 30 for triobjective problems (this neighborhood size K is crucial for generating offspring and updating parents during evolution), the parameter balancing exploitation and exploration is p_n = 0.8, and the maximal number of old solutions allowed to be replaced by a new one is C = 2.

Parameters in RM-MEDA.
The number of clusters K is set to 5 (this value indicates the number of disjoint clusters obtained by applying local principal component analysis to the population), and the maximum number of iterations of the local PCA algorithm is set to 50.

Parameters in IM-MOEA.
The number of reference vectors K is set to 10, and the model group size L is set to 3.

Parameters in AMEDA.
The initial control probability is β_0 = 0.9, the history length is H = 10, and the maximum number of clusters is K = 5 (this value indicates the maximum number of local clusters obtained by applying the hierarchical clustering analysis approach to the population).

Parameters in MOEA/D-AMG.
The number of Gaussian process models is K = 5, and the initial Gaussian distributions are (0, 0.6²), (0, 0.8²), (0, 1.0²), (0, 1.2²), and (0, 1.4²), each selected with an equal probability at the start.

The comparison results on all the test instances are summarized in Tables 1 and 2, regarding the IGD and HV metrics, respectively. When compared to the other algorithms, the performance of MOEA/D-AMG shows a significant improvement when the multiple Gaussian process models are used adaptively. It is observed that the proposed algorithm improves the performance of MOEA/D on most of the test problems. Table 1 summarizes the statistical results in terms of IGD values obtained by the compared algorithms, where the best result for each test instance is highlighted. The Wilcoxon rank sum test is also adopted at a significance level of 0.05, where the symbols "+," "−," and "∼" indicate that the results obtained by the other algorithms are significantly better than, significantly worse than, and not significantly different from those obtained by our algorithm MOEA/D-AMG, respectively. To be specific, MOEA/D-AMG shows the best results on 12 out of the 28 test instances, while the other compared algorithms achieve the best results on 3, 3, 5, 2, 2, and 1 out of the 28 problems in Table 1. Regarding the HV metric in Table 2, the experimental results also demonstrate the superiority of MOEA/D-AMG on these test problems. Moreover, we also use the R2 indicator to further show the superior performance of MOEA/D-AMG, and similar conclusions can be drawn from Table 3.
To examine the convergence speed of the seven algorithms, the mean IGD metric values versus the number of fitness evaluations for all the compared algorithms over 20 independent runs are plotted in Figures 2-4, respectively, for some representative problems from the F, UF, and WFG test suites. It can be observed from these figures that the curves of the mean IGD values obtained by MOEA/D-AMG reach the lowest positions with the fastest search speed in most cases, including F2-F5, F9, UF1-UF2, UF8-UF9, WFG3, and WFG7-WFG8. Even for F1, F6-F7, UF6-UF7, and WFG9, MOEA/D-AMG achieves the second lowest mean IGD values in our experiments. The promising convergence speed of the proposed MOEA/D-AMG can be attributed to the adaptive strategy used with the multiple Gaussian process models.
To observe the final PFs, Figures 5 and 6 present the final nondominated fronts with the median IGD values found by each algorithm over 20 independent runs on F4-F5, F9, UF2, and UF8-UF9. Figure 5 shows that the final solutions of F4 yielded by RM-MEDA, IM-MOEA, and AMEDA do not reach the PF, while MOEA/D-GM and MOEA/D-AMG approximate the true PF well. Nevertheless, the solutions achieved by MOEA/D-AMG on the right end of the nondominated front have better convergence than those achieved by MOEA/D-GM. In Figure 5, it seems that F5 is a hard instance for all the compared algorithms. This might be due to the fact that the optimal solutions of two neighboring subproblems are not very close to each other, so that mating among the solutions of these neighboring subproblems makes little sense. Therefore, the final nondominated fronts of F5 are not uniformly distributed over the PF, especially at the right end. However, the proposed MOEA/D-AMG outperforms the other algorithms on F5 in terms of both convergence and diversity. For F9, which is plotted in Figure 5, all algorithms except MOEA/D-GM and MOEA/D-AMG show poor performance in searching for solutions that approximate the true PF. With respect to the UF problems in Figure 6, the final solutions with median IGD obtained by MOEA/D-AMG have better convergence and spread uniformly over the whole PF when compared with the solutions obtained by the other algorithms. These visual comparison results reveal that MOEA/D-AMG has much more stable performance in generating satisfactory final solutions with better convergence and diversity on these test instances. In contrast, RM-MEDA and AMEDA construct their models by using clustering methods, which cannot provide as good diversity as the MOEA/D framework. Both algorithms using the EDA method first classify the individuals and then use all the individuals in the same class as samples for training the model. Implicitly, an individual in the same class can be considered a neighboring individual.
However, individuals on the boundary of a class may be far from each other, and individuals from other classes may be closer to such a boundary individual than its own classmates, so the constructed model can be inaccurate. For IM-MOEA, which uses reference vectors to partition the objective space, the training data become too scarce when dealing with problems with low-dimensional decision spaces, such as some of the test instances used in our empirical studies.

Effectiveness of the Proposed Adaptive Strategy.
According to the experimental results in the previous section, MOEA/D-AMG with the adaptive strategy shows great advantages over the compared algorithms. To illustrate the effectiveness of the proposed adaptive strategy, we designed additional experiments for in-depth analysis. In this section, an algorithm called MOEA/D-FMG is used for this comparison. The difference between MOEA/D-FMG and MOEA/D-AMG lies in that MOEA/D-FMG adopts a fixed Gaussian distribution to control the generation of new offspring, without the adaptive strategy, during the whole evolution. Here, MOEA/D-FMG with five different distributions, (0, 0.6²), (0, 0.8²), (0, 1.0²), (0, 1.2²), and (0, 1.4²), is compared with MOEA/D-AMG, and the results are reported in Table 4 for the used test problems.
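To make the MOEA/D-FMG baseline concrete, its fixed-distribution offspring generation can be sketched as below, under the assumption that each decision variable of a parent is perturbed by zero-mean Gaussian noise with the fixed standard deviation and then clipped to the box constraints; the paper's exact operator may combine the Gaussian process model prediction differently, so this is only an illustrative reading:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

def fmg_offspring(parent, sigma, lower, upper):
    """Generate one offspring by perturbing a parent with fixed
    zero-mean Gaussian noise N(0, sigma^2) per decision variable,
    then clipping to the box constraints [lower, upper]."""
    child = parent + rng.normal(0.0, sigma, size=parent.shape)
    return np.clip(child, lower, upper)
```

With a fixed sigma such as 0.8 the perturbation strength never adapts to the search stage, which is exactly the limitation the adaptive variant addresses.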

As observed from Table 4, MOEA/D-AMG obtains the best mean IGD values on 17 out of 28 cases, while the MOEA/D-FMG variants together achieve the best mean IGD values on the remaining 11 cases. More specifically, the five MOEA/D-FMG variants with the distributions (0, 0.6²), (0, 0.8²), (0, 1.0²), (0, 1.2²), and (0, 1.4²) obtain the best IGD results on 4, 1, 3, 2, and 1 of the 28 comparisons, respectively. The one-to-one comparisons in the last row of Table 4 further confirm this advantage. Therefore, it can be concluded from Table 4 that the proposed adaptive strategy for selecting a suitable Gaussian model contributes significantly to the superior performance of MOEA/D-AMG in solving MOPs.

The above analysis only considers five distributions with variances from 0.6² to 1.4². The performance with Gaussian distributions with larger variances is further studied here. Thus, three variants of MOEA/D-FMG with the fixed distributions (0, 1.6²), (0, 1.8²), and (0, 2.0²) are used for comparison. Table 5 lists the IGD results on all adopted instances. As observed from the results, MOEA/D-AMG again shows an obvious advantage, performing best on most of the comparisons (24 out of 28 cases for IGD), while the other three competitors fail to obtain good performance. In addition, we observe that the competitors' performance worsens as the variance increases. The reason may be that, with too large a variance throughout the evolutionary process, the offspring become too scattered, leading to poor convergence. Therefore, it is not surprising that performance degrades with larger variances. The results in Table 5 also explain why we only choose Gaussian distributions from (0, 0.6²) to (0, 1.4²) for MOEA/D-AMG.
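The adaptive selection among the K candidate Gaussian models can be sketched as credit-based roulette selection, where a model earns credit from the improvements its offspring produce on the decomposed subproblems. The class name, the `eps` floor, and the proportional credit rule below are our hypothetical bookkeeping to illustrate the idea, not the paper's exact update rule:

```python
import numpy as np

rng = np.random.default_rng(1)

class AdaptiveModelSelector:
    """Roulette-wheel selection among K Gaussian models, weighted by
    the accumulated improvement each model has contributed to the
    subproblems' aggregation values."""
    def __init__(self, k, eps=1e-3):
        self.credit = np.zeros(k)
        self.eps = eps  # floor so no model's selection probability hits zero

    def pick(self):
        """Sample a model index with probability proportional to credit."""
        p = self.credit + self.eps
        p /= p.sum()
        return rng.choice(len(self.credit), p=p)

    def reward(self, model_idx, improvement):
        """Credit a model when its offspring improved a subproblem."""
        if improvement > 0:
            self.credit[model_idx] += improvement
```

Under this scheme, a model whose offspring keep improving subproblems is sampled increasingly often, while the `eps` floor preserves a small chance of revisiting currently unproductive models.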

Parameter Analysis.
In the above comparisons, our algorithm MOEA/D-AMG is initialized with K = 5, i.e., five types of Gaussian models: (0, 0.6²), (0, 0.8²), (0, 1.0²), (0, 1.2²), and (0, 1.4²). In this section, we test the proposed algorithm with different values of K (3, 5, 7, and 10), each using K kinds of Gaussian distributions. To clarify the experimental setting, the standard deviations of the Gaussian models adopted for each K value are listed in Table 6. As shown by the IGD results in Table 7, K = 5 is a preferable number of Gaussian models for MOEA/D-AMG, since it achieves the best IGD results in 16 of the 28 cases. It can also be found that, as K increases beyond 5, the average IGD values increase for most test problems. This is because the population size in MOEA/D-AMG is often relatively small, which makes it of little sense to set a large K value; a large K thus has little effect on improving the performance of the algorithm. However, K = 3 seems too small to take full advantage of the proposed adaptive multiple Gaussian process models. According to the above analysis, we recommend K = 5.
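One simple way to construct the per-K model sets is to space the K standard deviations evenly over the same range; this reproduces the 0.6-1.4 set for K = 5, though the exact grids for other K values are those given in Table 6, so the even spacing here is only an illustrative assumption:

```python
import numpy as np

def sigma_grid(k, lo=0.6, hi=1.4):
    """K evenly spaced standard deviations in [lo, hi]; for K = 5 this
    yields 0.6, 0.8, 1.0, 1.2, 1.4, matching the set used above.
    (Illustrative assumption; Table 6 defines the grids actually used.)"""
    return np.linspace(lo, hi, k)
```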

Conclusions and Future Work
In this paper, a decomposition-based multiobjective evolutionary algorithm with adaptive multiple Gaussian process models (MOEA/D-AMG) has been proposed for solving MOPs. Multiple Gaussian process models are used in MOEA/D-AMG, which helps to solve various kinds of MOPs. To enhance the search capability, an adaptive strategy is developed to select the most suitable Gaussian process model, based on the contributions to the optimization performance across all the decomposed subproblems. To investigate the performance of MOEA/D-AMG, twenty-eight test MOPs with complicated PF shapes were adopted, and the experiments show that MOEA/D-AMG is superior in most cases when compared to six competitive MOEAs. Further experiments verified that the adopted adaptive strategy significantly improves the performance of MOEA/D-AMG. Compared with generic recombination operators and other Gaussian process-based recombination operators, the proposed method based on multiple Gaussian process models is effective on most of the test problems. However, in MOEA/D-AMG, the number of Gaussian models built in each generation equals the number of subproblems, which still incurs considerable computational cost. In future work, we will try to improve the computational efficiency of MOEA/D-AMG; it is also interesting to study its potential application to many-objective optimization problems and engineering problems.
Data Availability
The source code and data are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.