A Fast Elitism Gaussian Estimation of Distribution Algorithm and Application for PID Optimization

Estimation of distribution algorithm (EDA) is an intelligent optimization algorithm based on probability and statistics theory. A fast elitism Gaussian estimation of distribution algorithm (FEGEDA) is proposed in this paper. A Gaussian probability model is used to model the solution distribution. The parameters of the Gaussian are obtained from the statistical information of the best individuals by a fast learning rule. The fast learning rule is used to enhance the efficiency of the algorithm, and an elitism strategy is used to maintain the convergence performance. The performance of the algorithm is examined on several benchmarks. In the simulations, a one-dimensional benchmark is used to visualize the optimization process and the probability model learning process during the evolution, and several two-dimensional and higher dimensional benchmarks are used to test the performance of FEGEDA. The experimental results indicate the capability of FEGEDA, especially on the higher dimensional problems, where FEGEDA exhibits better performance than some other algorithms and EDAs. Finally, FEGEDA is used for PID controller optimization of a PMSM and compared with the classical PID and GA.


Introduction
Various optimization problems arise in engineering and academic research, each of which requires finding the best solution. If the problem is conventional or linear, common mathematical methods are effective. However, if the problem is too complicated for the common methods, heuristic algorithms are considered. Evolutionary algorithms (EAs) have been very popular heuristic optimization techniques in recent years. EA is a general term for several optimization algorithms that are inspired by the Darwinian theory of natural evolution. They are capable of solving complicated optimization problems with nonlinear, high-dimensional, and noncontinuous characteristics. The algorithms search for the optimal solution among many possible solutions, and genetic operators, which simulate the principle of natural genetic evolution, are used to update the individuals. After several iterations, the optimal solution is obtained. Examples include genetic algorithms (GAs) [1], evolution strategies (ES), differential evolution (DE) [2,3], and the artificial immune system (AIS) [4,5], as well as swarm evolutionary algorithms such as particle swarm optimization (PSO) [2,6,7].
Although these algorithms have been applied successfully to many kinds of optimization problems [8], they have some inherent drawbacks. For example, if the building blocks are spread all over the solutions, the standard EAs perform very poorly [9]. The recombination operators either break the building blocks frequently or do not mix them effectively.
In recent years, estimation of distribution algorithms (EDAs) have attracted a lot of attention. They were proposed by Mühlenbein and Paaß [10] and emerged as a generalization of EAs to overcome some problems of EAs, such as broken building blocks, poor performance on high dimensional problems, and the difficulty of modeling the distribution of solutions. In EAs, gene blocks built by simple selection and crossover operators are sometimes not effective enough to reach the optimum solution, because the blocks may be broken [9,11]. In contrast to EAs, EDAs do not use crossover or mutation operators to update individuals [12]. Instead, they extract global statistical information from the superior individuals of the previous iteration and build a probability distribution model of the solutions for sampling new individuals [13]. The main advantage of EDAs over EAs is that the search process is guided by a probabilistic model with explanatory and transparent characteristics [14,15]. The algorithms based on probabilistic models follow two steps: (1) collect the statistical information of the selected individuals and establish the probability model and (2) generate a new population by sampling the probability model. Therefore, the new offspring of EDAs are generated from the probability distribution instead of by recombination of individuals as in EAs.
The type of probabilistic model used by EDAs and the methods employed to learn it may vary according to the characteristics of the optimization problem. Therefore, different EDAs have been proposed for discrete and continuous problems. In traditional EDAs, the individuals are encoded as binary strings, a representation inherited from EAs. In the population-based incremental learning (PBIL) algorithm [16], the individuals are encoded as fixed-length binary strings. The population of solutions is updated by the probability vector, which is initially set to probability 0.5 for each position of the binary strings. In the univariate marginal distribution algorithm (UMDA) [10], the frequencies of values at each position are computed from the selected individuals and are then used to generate the new population. The compact genetic algorithm (cGA) [17] updates the population according to the probability vector, like the PBIL. However, unlike the PBIL, it modifies the probability vector according to the contribution of individuals.
For real-valued problems, there are two approaches to extending EDAs to other domains: mapping the other domain to the domain of fixed-length binary strings and then mapping the solution back to the problem's original domain, or extending or modifying the class of probabilistic models to the other domain. The first approach might lead to undesirable limitations and errors on real-coded problems. For the second approach, the normal pdf is commonly used in continuous EDAs to represent univariate or multivariate distributions. Therefore, some EDAs based on the Gaussian distribution have been designed. In stochastic hill climbing with learning by vectors of normal distributions [18], the population of solutions is modeled by a vector of means of Gaussian normal distributions, one for each variable. The standard deviation is stored globally and is the same for all variables. After a number of new solutions are generated, the means are shifted towards the best solutions and the standard deviation is reduced to make future exploration more focused. Various ways of modifying the parameters have been explored in [19]. Regularized estimation of distribution algorithms (RegEDA) [20] make use of regularized model estimation in EDAs for continuous optimization. The regularization techniques can lead to more robust model estimation in EDAs. The continuous Gaussian estimation of distribution algorithm (CGEDA) [14], which is a kind of multivariate EDA, was proposed for real-coded problems. A Gaussian data distribution and dependent individuals are the two assumptions made in CGEDA. In the algorithm, the joint distribution of promising solutions is used in every dimension of the problem. In [21,22], an estimation of distribution algorithm with a Gaussian process based on a subspace method was proposed, which can reduce the computation for complex problems.
A real-coded EDA using multiple probabilistic models (RMM) was proposed in [23], which includes multiple types of probabilistic models with different learning rates and diversities. There are also other EDAs that adopt more involved probability models and mixtures of pdfs. However, the probability models cannot reflect the problem completely, because it is hard to obtain an accurate model. In particular, as the number of variables and the number of mixture components increase, the optimization results become unreliable [24]. Therefore, we specifically focus on the use of a single normal distribution in this paper, as it is more intuitive to analyze. Moreover, the use of a single, simple normal pdf does not prevent us from obtaining a better understanding of the exploitation of the solutions. We propose a fast elitism Gaussian EDA (FEGEDA) based on the standard process of EDA. A fast learning rule is used to learn the parameters of the pdf, and an elitism strategy is used for better performance. Hence, faster convergence is expected in this study.

The Fast Elitism Gaussian EDA
2.1. The Framework of the Algorithm. EDA is realized by probability estimation and sampling. The probability model is used to estimate the solution distribution, and the probability sampling is used to generate new individuals. In order to improve the performance of standard EDA, we adopt an elitism strategy in FEGEDA. Figure 1 is the flowchart of FEGEDA.
The steps of the FEGEDA are as follows.
Step 1 (initialization). Set the population size $N$, define the number $BN$ of best individuals used for probability model establishment, and generate the initial population Pop(0).
Step 2 (population evaluation). Evaluate the fitness of every individual in the current population.
Step 3 (statistical information obtaining). Select the $BN$ best individuals according to the fitness and obtain the statistical information of mean $\mu$ and standard deviation $\sigma$.
Step 4 (probability model $p(x_1, x_2, \ldots, x_n)$ establishment). Use the fast learning rule and build the Gaussian normal distribution from the vector of means $\mu$ and the variances $\sigma^2$.
Step 5 (new population Pop($t$) generation). Use the sampling technique to generate a new population according to the probability model built in Step 4.
Step 6. Finally, the iteration is terminated according to the termination criteria. These criteria can be as simple as a fixed number of generations or can involve a statistical analysis of the current population to evaluate the stopping condition. If the stopping conditions are not met, return to Step 2.
The probability model is built according to the distribution of the best solutions in the current population. Therefore, solutions sampled from this model should fall in promising areas with high probability. After some iterations, the solutions are more likely to be close to the global optimum. The details of the main algorithm are explained in the following.
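The steps above can be sketched as one short loop. The following is a minimal illustration, not the authors' implementation: the function name `fegeda`, the default parameter values, the clipping of samples to the variable bounds, and the `sphere` test function are all assumptions made for the example, and the problem is treated as a minimization.

```python
import random
import statistics


def fegeda(objective, bounds, pop_size=50, best_n=25, n_elite=1,
           max_iter=100, seed=0):
    """Minimize `objective` over the box `bounds` with a univariate Gaussian EDA."""
    rng = random.Random(seed)
    # Step 1: uniform random initial population inside the variable domain.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(max_iter):
        # Step 2: evaluate and rank the population (smaller is better).
        pop.sort(key=objective)
        best = pop[:best_n]
        # Steps 3-4: per-dimension mean and standard deviation of the best
        # individuals; the fast rule uses them directly as model parameters.
        columns = list(zip(*best))
        mu = [statistics.fmean(c) for c in columns]
        sigma = [statistics.pstdev(c) for c in columns]
        # Step 5: copy the elites, then sample the rest from the model.
        new_pop = [ind[:] for ind in pop[:n_elite]]
        while len(new_pop) < pop_size:
            # Clipping to the bounds is an assumption of this sketch.
            child = [min(max(rng.gauss(m, s), lo), hi)
                     for m, s, (lo, hi) in zip(mu, sigma, bounds)]
            new_pop.append(child)
        pop = new_pop
    # Step 6: a fixed iteration budget is used as the termination criterion.
    return min(pop, key=objective)


def sphere(x):
    """Hypothetical test objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best = fegeda(sphere, [(-5.0, 5.0)] * 3)
print(best, sphere(best))
```

Setting `n_elite=0` in this sketch switches the elitism strategy off, which is one way to reproduce the kind of with/without comparison discussed later in the simulations.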

Initialization.
In the algorithm, few parameters need to be set apart from the population size $N$ and the number $BN$ of best individuals selected to build the probability model. Then, a random function is used to generate the initial population according to the variable domain $[a, b]$. The random function generates variables $r_i \in [r_{\min}, r_{\max}]$, which are then converted to the domain $[a_i, b_i]$ by
$$x_{ki} = a_i + \frac{r_i - r_{\min}}{r_{\max} - r_{\min}} (b_i - a_i),$$
where $x_{ki}$ is the $i$th optimization variable of the $k$th individual, $r_i$ is the $i$th random variable, $r_{\min}$ and $r_{\max}$ are the bounds of the $i$th random variable, and $a_i$ and $b_i$ are the bounds of the $i$th optimization variable.
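As an illustration of this initialization, the sketch below draws random numbers in an assumed range `[r_min, r_max]` (by default `[0, 1]`) and rescales them linearly to each variable's bounds. The function name `init_population` and the example bounds are hypothetical.

```python
import random


def init_population(pop_size, bounds, r_min=0.0, r_max=1.0, seed=0):
    """Generate pop_size individuals, one variable per (a, b) pair in bounds."""
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        individual = []
        for a, b in bounds:
            r = rng.uniform(r_min, r_max)
            # Linear map from [r_min, r_max] to the variable domain [a, b].
            individual.append(a + (r - r_min) / (r_max - r_min) * (b - a))
        population.append(individual)
    return population


pop0 = init_population(5, [(2.7, 7.5), (-5.12, 5.12)])
print(pop0)
```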

Population Evaluation.
The evaluation of the individuals depends on the characteristics of the problem. Conventionally, we define an objective function $f(x)$ in order to evaluate the fitness of individuals. Consider
$$\min f(x), \quad \text{s.t. } g_i(x) \le 0, \quad h_j(x) = 0, \quad a \le x \le b,$$
where $x$ is the $n$-dimensional optimization variable, $f(x)$ is the objective function, $g_i(x)$ is the $i$th inequality constraint, and $h_j(x)$ is the $j$th equality constraint. $a$ and $b$ are the bounds of the variable.

The Establishment of Probability Model.
The most important and crucial step of EDAs is the construction of the probabilistic model of the solution distribution. In this step of FEGEDA, a Gaussian distribution of individuals is assumed in order to model and estimate the distribution of promising solutions in every dimension of the problem. Therefore, the mean and standard deviation parameters of the promising population are required, which are computed adaptively by the maximum likelihood technique.

Statistical Information Acquisition.
In order to construct a pdf model of the promising solutions, we should obtain the statistical information of promising solutions. Hence, statistical techniques have been extensively applied to the optimization problems. Fortunately, these parameters can be efficiently computed by the maximum-likelihood estimations [24].
In pdf models that assume full independence, every variable is assumed independent of any other variable. It must be noted that, in difficult optimization problems, different dependency relations can appear between variables and, hence, considering all of them independent may provide a model that does not represent the problem accurately. However, even if more involved probability models and mixtures of pdfs are defined and used in EDAs, the probability models cannot reflect the problem completely. For system modeling, the dependency relations between variables are very important. Conversely, for an optimization problem, it is desirable that the problem can be decoupled into a combination of independent variables. Therefore, we specifically focus on the use of an independent probability model to construct a fast elitism Gaussian EDA with better performance. That is, the probability distribution $p(x_1, x_2, \ldots, x_n)$ of the vector $(x_1, x_2, \ldots, x_n)$ of variables is assumed to consist of a product of the distributions of individual variables:
$$p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i).$$
This is very suitable for calculation. Different from the discrete case, the number of parameters to be estimated does not grow exponentially with $n$. Therefore, it is relatively fast and efficient.
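The mean and variance parameters of the normal pdf can be estimated from the selected individuals. Consider
$$\mu_i(t) = \frac{1}{BN} \sum_{k=1}^{BN} x_{ki}(t), \qquad \sigma_i^2(t) = \frac{1}{BN} \sum_{k=1}^{BN} \left(x_{ki}(t) - \mu_i(t)\right)^2,$$
where $\mu_i(t)$ is the mean of the $i$th variable in the $t$th iteration, $BN$ is the number of selected individuals, and $\sigma_i^2(t)$ is the variance of the $i$th variable in the $t$th iteration.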
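A minimal sketch of this estimation step follows. It assumes a minimization problem and uses the maximum-likelihood (population) form of the standard deviation, that is, division by the number of selected individuals; the function name `estimate_parameters` and the toy data are illustrative.

```python
import statistics


def estimate_parameters(population, fitness, best_n):
    """Return per-variable mean and standard deviation of the best_n individuals."""
    ranked = sorted(population, key=fitness)  # minimization: best first
    best = ranked[:best_n]
    columns = list(zip(*best))                # one tuple per variable
    mu = [statistics.fmean(c) for c in columns]
    # pstdev divides by the sample count, matching the ML estimate.
    sigma = [statistics.pstdev(c, m) for c, m in zip(columns, mu)]
    return mu, sigma


def toy_fitness(x):
    return abs(x[0]) + abs(x[1])


pop = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0], [10.0, -10.0]]
mu, sigma = estimate_parameters(pop, toy_fitness, best_n=3)
print(mu, sigma)
```

In this toy run the fourth individual has the worst fitness and is excluded, so the statistics are taken over the first three individuals only.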
These parameters are learned continuously during the optimization. Iterative learning approaches are used in the literature [23, 25-27] as follows:
$$\mu_i(t) \leftarrow \alpha \mu_i(t) + \beta \mu_i(t-1), \qquad \sigma_i(t) \leftarrow \alpha \sigma_i(t) + \beta \sigma_i(t-1),$$
where $\alpha$ and $\beta$ are the weights of the parameters at iterations $t$ and $t-1$, respectively. The learning method depends on the class of models used; this step involves parametric or structural learning, also known as model fitting and model selection, respectively. Learning can improve the performance of EDAs, no matter how simple or complex the learning rule is. We adopt a fast learning method ($\alpha = 1$ and $\beta = 0$) in this paper, so that the model parameters are taken directly from the current best individuals, and an elitism strategy is adopted to maintain a smooth convergence process.
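The difference between the fast rule and a slower incremental rule can be seen in a few lines. The sketch below assumes the weighted-sum form of the learning rule; the function name `learn`, the weight names `alpha` and `beta`, and the numeric values are illustrative.

```python
def learn(current, previous, alpha=1.0, beta=0.0):
    """Blend the current estimate with the previous model parameters."""
    return [alpha * c + beta * p for c, p in zip(current, previous)]


mu_previous = [0.0, 0.0]   # model means from the previous iteration
mu_current = [2.0, 4.0]    # means estimated from the current best individuals

fast = learn(mu_current, mu_previous)                        # alpha=1, beta=0
smooth = learn(mu_current, mu_previous, alpha=0.3, beta=0.7)
print(fast, smooth)
```

With the fast rule the model jumps straight to the current estimate, which speeds up learning but removes the smoothing that the previous-iteration term would provide; this is the motivation given in the text for pairing it with elitism.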

Probability Model.
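In this paper, the normal pdf $N(\mu, \sigma)$ for $n$ variables is parameterized by the vector of means $\mu$ and the vector of standard deviations $\sigma$. For a single variable, it is defined by
$$p(x_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right).$$
The probability distribution $p(x_1, x_2, \ldots, x_n)$ of the vector $(x_1, x_2, \ldots, x_n)$ of variables is
$$p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right).$$
The parameters $(\mu, \sigma)$ are estimated from the selected best individuals. The estimate of the marginal parameters $(\mu_i, \sigma_i)$ is updated in every iteration.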
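The independent Gaussian model can be evaluated directly as a product of univariate normal densities. The sketch below is illustrative; the function names `normal_pdf` and `joint_pdf` are not from the paper.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a univariate normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def joint_pdf(xs, mus, sigmas):
    """Joint density under full independence: product of the marginals."""
    return math.prod(normal_pdf(x, m, s) for x, m, s in zip(xs, mus, sigmas))


density = joint_pdf([0.0, 0.0], [0.0, 0.0], [1.0, 2.0])
print(density)
```

Only $2n$ parameters (one mean and one standard deviation per variable) are needed for $n$ variables, which is why the model is cheap to estimate.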

Probability Sampling.
The probability sampling is used to generate new individuals using the learned probabilistic models. The sampling method depends on the type of probabilistic model and the characteristics of the problem. For the normal pdf, a conversion is used to convert the normal pdf to a standard normal pdf. Suppose
$$z = \frac{x - \mu}{\sigma}.$$
The normal pdf of $x$ is converted to a standard normal pdf of $z$. Consider
$$N(x, \mu, \sigma) \rightarrow N(z, 0, 1).$$
The variable $x$ can be calculated by
$$x = \mu + \sigma z.$$
In the probability models, every random variable $(r_1, r_2, \ldots, r_n)$ is assumed independent of any other variable. The mean and variance of the variable $r_k$ are $m$ and $s^2$; when $n \to \infty$,
$$\frac{\sum_{k=1}^{n} r_k - n m}{\sqrt{n}\, s} \rightarrow N(0, 1).$$
Therefore, taking $z$ as this normalized sum, $z \to N(0, 1)$ when $n \to \infty$. We can select an appropriate $n$ to generate a normal pdf for probability sampling. Figure 2 shows the histogram of the sampling data for different $n$. From the figure, we can see that the sampling data follow the pdf better as $n$ increases.
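The sampling idea described here, approximating a standard normal variable by a normalized sum of $n$ independent random numbers, can be sketched as follows. The choice of uniform random numbers on $[0, 1]$ (mean $1/2$, variance $1/12$) is an assumption of this example, as are the function names and the values $n = 12$, $\mu = 5$, and $\sigma = 2$.

```python
import math
import random
import statistics


def clt_standard_normal(n, rng):
    """Approximate N(0, 1) draw from the normalized sum of n uniform numbers."""
    total = sum(rng.random() for _ in range(n))
    # Each uniform number has mean 1/2 and variance 1/12.
    return (total - n / 2.0) / math.sqrt(n / 12.0)


def sample_variable(mu, sigma, n, rng):
    """Convert the approximate standard normal draw back to N(mu, sigma)."""
    return mu + sigma * clt_standard_normal(n, rng)


rng = random.Random(1)
draws = [sample_variable(5.0, 2.0, 12, rng) for _ in range(20000)]
print(statistics.fmean(draws), statistics.pstdev(draws))
```

Larger `n` gives a closer match to the normal shape at the cost of more random numbers per sample. In practice a library routine such as `random.gauss` could be used instead.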

Elitism Strategy.
The elitism strategy is an effective way to ensure that the best individual is carried into the next generation in EAs, because the best individual may contain information about the optimal solution. Therefore, elitism can improve the convergence performance of EAs in many cases [28], and elitism has long been considered an effective method for improving the efficiency of EAs [29]. This is achieved by simply copying the best individual directly to the new generation [30]. However, the number of best individuals copied to the next generation must be chosen carefully; otherwise it may lead to premature convergence or fail to improve the efficiency of the algorithm. Figure 3 shows the process of new population generation. The elite individuals are copied to the new generation directly, and the best individuals are used to establish a probability model that generates the other individuals of the next generation. Consider
$$\mathrm{Pop}(t) = \mathrm{Elitism}(\mathrm{Pop}(t-1)) \cup \mathrm{Sample}(p(x_1, x_2, \ldots, x_n)),$$
where Elitism($\cdot$) is the operator that copies the best solution of Pop($t-1$) to Pop($t$) and Sample($\cdot$) is the sampling function.
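The generation step with elitism can be sketched as below, assuming a minimization problem. The function name `next_generation`, the use of `random.gauss` for sampling, and the toy population are illustrative assumptions.

```python
import random


def next_generation(pop, fitness, mu, sigma, n_elite, rng):
    """Copy the n_elite best individuals and sample the rest from the model."""
    ranked = sorted(pop, key=fitness)              # best first
    elites = [ind[:] for ind in ranked[:n_elite]]  # copied unchanged
    sampled = [[rng.gauss(m, s) for m, s in zip(mu, sigma)]
               for _ in range(len(pop) - n_elite)]
    return elites + sampled


def square(ind):
    return ind[0] ** 2


rng = random.Random(0)
pop = [[3.0], [-1.0], [0.5], [2.0]]
new_pop = next_generation(pop, square, mu=[0.5], sigma=[1.0],
                          n_elite=1, rng=rng)
print(new_pop)
```

Because the elite is copied unchanged, the best fitness in the new population can never be worse than in the old one, which is the source of the smooth convergence curve.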

Simulation
In the simulation, in order to exhibit the performance of FEGEDA, we carry out several different simulations, including one-dimensional benchmark, two-dimensional benchmarks, and higher dimensional benchmarks. Moreover, we compare the FEGEDA with other EDAs and other kinds of optimization algorithms.
3.1. One-Dimensional Benchmark. A one-dimensional problem is easy for FEGEDA. In order to visualize the optimization process and the model learning process during the evolution clearly, we carry out a one-dimensional benchmark optimization simulation:
$$f_0(x) = \sin(x) + \sin\left(\frac{10x}{3}\right) + \ln(x) - 0.84x + 3,$$
where $f_0$ is a multimodal function [31], $x \in [2.7, 7.5]$, with several local minima and the global minimum value $-1.6013$ at $x = 5.19978$. The number $BN$ of best individuals selected to build the probability model is a very important parameter for FEGEDA. The elitism strategy is important for maintaining a smooth optimization process. Therefore, in order to visualize the effect of each part, we use different $BN$ to test the effect of the number of best individuals on the algorithm performance, and we switch the elitism strategy on and off to test its effect on the convergence performance of the algorithm. Many studies [32-34] have proved that EDAs converge under certain conditions. From Figure 4, we can see that, without the elitism strategy, the optimization process is unstable because of the fast learning rule, no matter what $BN$ is. The elitism strategy makes the convergence process smooth and improves the convergence performance.
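The benchmark can be checked numerically. The following sketch assumes that $f_0$ is the standard univariate test function $\sin(x) + \sin(10x/3) + \ln(x) - 0.84x + 3$, whose domain and optimum match the values quoted in the text; the dense grid search and the names `f0`, `xs`, and `x_best` are illustrative only.

```python
import math


def f0(x):
    """Assumed form of the one-dimensional multimodal benchmark."""
    return (math.sin(x) + math.sin(10.0 * x / 3.0)
            + math.log(x) - 0.84 * x + 3.0)


# Dense grid over the domain [2.7, 7.5] to locate the global minimum.
steps = 100000
xs = [2.7 + i * (7.5 - 2.7) / steps for i in range(steps + 1)]
x_best = min(xs, key=f0)
print(round(x_best, 4), round(f0(x_best), 4))
```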
In Figure 5, the individuals' distribution and the probability models of some iterations are exhibited. The individuals' distributions of iterations 1, 10, and 100 are shown in Figure 5. The individuals spread over the optimized function at the initial iteration and then concentrate on the area of the optimum solution as the iterations go on. Therefore, the pdf curve is flat at the beginning. The standard deviation of the pdf becomes smaller and smaller as the iterations increase, and the pdf focuses on the global optimum solution. The probability model learning process is shown in Figure 5. The elitism strategy is a very important part of the algorithm. From the probability model learning process in Figure 5, we can see that the learning process is smooth when the elitism strategy is adopted; otherwise it is unstable. The number $BN$ of best selected individuals is also an important parameter. The convergence is faster when $BN$ is $N/2$. However, if it is too small, it will lead to premature convergence. Figure 6 shows the statistical information of the population at some iterations. From Figure 6, we can see the population distribution when $BN = N$ with the elitism strategy (Figure 6(a)) or without it (Figure 6(b)), and when $BN = N/2$ with the elitism strategy (Figure 6(c)) or without it (Figure 6(d)). According to Figure 6, the distribution of the population is stable when the elitism strategy is used; otherwise it fluctuates, regardless of whether $BN = N$ or $BN = N/2$. A small $BN$ makes the individuals focus on a certain area quickly.
The Scientific World Journal

Two-Dimensional Problems.
In order to test the optimization capability of FEGEDA further, three two-dimensional complex functions $f_1$, $f_2$, and $f_3$ are considered, where $x, y \in [-5.12, 5.12]$. $f_1$ has infinitely many local maxima, and the global maximum value 1 is at point (0, 0). A circular ridge surrounds the global maximum. Hence, it is easy to fall into a local maximum, which makes the function suitable for testing the global searching capability of the algorithm. $f_2$ is a local peak function, and the maximum value is 3600 at point (0, 0). This function can be used to determine the local searching capability of the algorithm. $f_3$ is a complicated function with a vibrating circular ridge outside the global maximum value 0. This function can verify the global and local optimization capability of the algorithm simultaneously. Figure 7 shows the functions $f_1$, $f_2$, and $f_3$. We compare the optimization result with three other algorithms [35]. The population size is set to 50, the maximum iteration is set to 100, and $BN$ is set to $N/2$ in order to compare under the same conditions. From Figure 8, we can see that FEGEDA reaches the optimum solution faster. Its optimization capability is similar to that of CDMIA, which performs well on the three benchmarks.

Higher Dimensional Problems.
The advantage of FEGEDA is its capability of solving higher dimensional problems. Some typical benchmarks are considered, including the Quadric, Rosenbrock, Ackley, Griewank, Rastrigin, and Schaffer's f7 functions [21], which are shown in Table 1. They are configured with a dimension $n = 10$. In order to compare with other EDAs under the same conditions, the population size of FEGEDA is 300 and the maximum iteration is 100. The algorithm is tested under different $BN$ (from $N$ to $N/20$). The convergence results under different $BN$ are shown in Figure 9. From Figure 9, we can see that the optimization process is slow when $BN = N$.
With the decrease of $BN$, the convergence becomes faster. However, the gain in convergence speed is limited. If $BN$ is too small, the optimization is easily trapped in a local minimum.
We compare FEGEDA with the other EDAs in [21]. Figure 10 shows the comparison. From Figure 10, we can see that FEGEDA is superior to the standard EDA and the other improved ones on the six benchmarks, except for the Ackley function, for which the performance of FEGEDA is the same as that of EDA. For the Rosenbrock function, the initial fitness of FEGEDA is lower than that of the other EDAs. Therefore, we include an enlarged diagram of the corresponding area.

PID Controller Optimization
PID is the most widely used controller in permanent magnet synchronous motor (PMSM) control. However, a PID controller performs poorly in PMSM control when its parameters are inappropriate. During the past decades, great attention has been paid to stochastic approaches, which have potential in dealing with this problem [36,37]. In this paper, we adopt FEGEDA to optimize the PID controller of a PMSM.

Mathematical Model of PMSM.
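The mathematical model of PMSM in the $d$, $q$ two-phase rotating coordinate system is shown below [38]. The voltage equation is
$$u_d = R i_d + L_d \frac{di_d}{dt} - \omega L_q i_q, \qquad u_q = R i_q + L_q \frac{di_q}{dt} + \omega L_d i_d + \omega \psi_f,$$
where $u_d$ and $u_q$ are the direct-axis and quadrature-axis stator voltages, respectively; $i_d$ and $i_q$ are the direct-axis and quadrature-axis currents, respectively; $R$ is the stator phase resistance; $L_d$ is the direct-axis inductance; $L_q$ is the quadrature-axis inductance; $\psi_f$ is the flux linkage produced in the stator winding by the fundamental excitation field of the permanent magnet; and $\omega$ is the electric angular speed of the rotor.
The magnetic linkage equation can be expressed as follows:
$$\psi_d = L_d i_d + \psi_f, \qquad \psi_q = L_q i_q,$$
where $\psi_d$ and $\psi_q$ are the direct-axis and quadrature-axis stator flux linkages, respectively.
The electromagnetic torque of PMSM in the $d$, $q$ coordinate is
$$T_e = \frac{3}{2} p \left(\psi_d i_q - \psi_q i_d\right),$$
where $p$ is the number of pole pairs. According to the motion equation of the motor,
$$J \frac{d\Omega}{dt} = T_e - B\Omega - T_L,$$
where $\Omega$ is the mechanical angular speed of the rotor, $B$ is the viscous friction coefficient, $J$ is the total moment of inertia of the rotor and load, and $T_L$ is the load torque.
Thus, the state equation can be derived from the above equations as follows:
$$\frac{di_d}{dt} = \frac{u_d - R i_d + \omega L_q i_q}{L_d}, \qquad \frac{di_q}{dt} = \frac{u_q - R i_q - \omega L_d i_d - \omega \psi_f}{L_q}, \qquad \frac{d\Omega}{dt} = \frac{T_e - B\Omega - T_L}{J}.$$
In the vector control (VC) system of PMSM, $i_d = 0$. Therefore, the state space equation (22) is described as
$$\frac{di_q}{dt} = \frac{u_q - R i_q - \omega \psi_f}{L_q}, \qquad \frac{d\Omega}{dt} = \frac{\frac{3}{2} p \psi_f i_q - B\Omega - T_L}{J}.$$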

PID Controller.
The continuous form of a PID controller, with input $e(t)$ and output $u(t)$, is shown as follows:
$$u(t) = K_p e(t) + K_i \int_0^t e(\tau)\,d\tau + K_d \frac{de(t)}{dt},$$
where $K_p$ is the proportional gain, $K_i$ is the integral gain, and $K_d$ is the derivative gain. There are two types of discrete PID obtained by discretizing the continuous PID. The position-type discrete PID is described as
$$u(k) = K_p e(k) + K_i T \sum_{j=0}^{k} e(j) + K_d \frac{e(k) - e(k-1)}{T},$$
where $u(k)$ is the controller output, $e(k)$ is the error, and $T$ is the sample time. In practical system control, the integral part is not flexible. Therefore, the velocity-type discrete PID is described as
$$\Delta u(k) = K_p \left[e(k) - e(k-1)\right] + K_i T e(k) + K_d \frac{e(k) - 2e(k-1) + e(k-2)}{T},$$
where $\Delta u(k) = u(k) - u(k-1)$ is the increment of the controller output. For the velocity-type PID, we do not need to calculate the integral part, and the controller output is the increment of PID. Therefore, it is often used in practical system control. The aggregation function is a conventional method that converts a multiobjective problem into a single-objective problem. Consider
$$F = \sum_{i} w_i f_i,$$
where $F$ is the aggregate fitness, $w_i$ is the weight of the $i$th objective, and $f_i$ is the fitness value of the $i$th objective. In the optimization process, the objective is to evaluate the performance of the PID controller. Thus, for PID, the fitness function is written in terms of the system error $e(t)$, the control output $u(t)$, and the rising time $t_u$ [39].
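The velocity-type (incremental) PID can be sketched as a small class. This is an illustration under stated assumptions rather than the controller used in the paper: the gains are taken in their continuous-time form and scaled by the sample time, and the first-order plant, its time constant `tau`, and the gain values are hypothetical and unrelated to the PMSM.

```python
class IncrementalPID:
    """Velocity-type discrete PID: each step computes an increment of u."""

    def __init__(self, kp, ki, kd, ts):
        self.kp, self.ki, self.kd, self.ts = kp, ki, kd, ts
        self.e1 = 0.0  # error at step k-1
        self.e2 = 0.0  # error at step k-2
        self.u = 0.0   # accumulated controller output

    def step(self, e):
        du = (self.kp * (e - self.e1)
              + self.ki * self.ts * e
              + self.kd * (e - 2.0 * self.e1 + self.e2) / self.ts)
        self.e2, self.e1 = self.e1, e
        self.u += du
        return self.u


# Hypothetical first-order plant tau * dy/dt = -y + u, forward Euler steps.
pid = IncrementalPID(kp=2.0, ki=5.0, kd=0.0, ts=0.01)
y, tau, setpoint = 0.0, 0.5, 1.0
for _ in range(1000):
    u = pid.step(setpoint - y)
    y += pid.ts * (-y + u) / tau
print(round(y, 4))
```

No running sum of the error is stored; only the two previous errors are kept, which is the practical advantage of the velocity form mentioned in the text.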
To avoid overshoot, a penalty value is adopted in the fitness function. That is, once overshoot occurs, the value of the overshoot is added to the fitness function through a penalty function. Making use of the aggregation function, the fitness function is constructed as the weighted sum of the error, control output, rising time, and penalty terms, where $w_1$, $w_2$, $w_3$, and $w_4$ are the weight coefficients and $w_4 \gg w_1$.
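One possible form of such an aggregated fitness is sketched below. The individual terms are assumptions made for illustration (integral of the absolute error, integral of the squared control output, rising time, and integral of the output excess above the setpoint as the overshoot penalty), since the exact expressions are not reproduced here. The default weights follow the values reported for the test in this paper; the function name `pid_fitness` and the sample data are hypothetical.

```python
def pid_fitness(errors, controls, outputs, setpoint, ts, rise_time,
                weights=(1.0, 0.1, 2.0, 50.0)):
    """Weighted sum of error, control effort, rising time, and overshoot."""
    w1, w2, w3, w4 = weights
    f_error = ts * sum(abs(e) for e in errors)       # assumed error term
    f_control = ts * sum(u * u for u in controls)    # assumed effort term
    # Assumed penalty: accumulated output excess above the setpoint.
    f_overshoot = ts * sum(max(0.0, y - setpoint) for y in outputs)
    return w1 * f_error + w2 * f_control + w3 * rise_time + w4 * f_overshoot


controls = [1.0, 1.0, 1.0, 1.0]
no_overshoot = pid_fitness([1.0, 0.5, 0.1, 0.0], controls,
                           [0.0, 0.5, 0.9, 1.0], 1.0, 0.1, 0.3)
with_overshoot = pid_fitness([1.0, 0.2, -0.2, 0.0], controls,
                             [0.0, 0.8, 1.2, 1.0], 1.0, 0.1, 0.2)
print(no_overshoot, with_overshoot)
```

Because the overshoot weight is much larger than the error weight, the faster response with overshoot receives the worse (larger) fitness in this example.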

PID Controller Optimization Based on FEGEDA.
According to state space equation (6), we can build the state space model of PMSM in MATLAB/Simulink as shown in Figure 11. The parameters of the PMSM, set according to the motor, take the values 0.9664, 0.00621, 4, 0.00033, 0.0001619, and 0.09382.
The PMSM component is encapsulated into a module. A speed controller is added to the speed closed loop. Figure 12 is the diagram of the PMSM control system. The "simouterror," "simoutui," and "simout" units are used to record the simulation data for optimization.
In order to test the algorithm, GA and the traditional PID are selected for comparison against FEGEDA. The weights $w_1$, $w_2$, $w_3$, and $w_4$ of the fitness function are set according to the requirements of the control system.
$w_1$ corresponds to the error, $w_2$ is the weight coefficient of the controlled variable, $w_3$ corresponds to the rising time, and $w_4$ is the penalty for overshoot. If we want a system without overshoot and with a small rising time, $w_1$, $w_3$, and $w_4$ are set larger and $w_2$ is set smaller. If the controlled variable is limited, $w_2$ is set larger. Therefore, these parameters can be set according to the practical requirements. In the test, $w_1$ is 1, $w_2$ is 0.1, $w_3$ is 2, and $w_4$ is 50. The optimal results are shown in Figure 13. Although the optimal result of the traditional PID has a shorter response time, its overshoot is larger. The optimal result of GA has no overshoot, but the system response is slower. Therefore, the performance of FEGEDA is promising.

Conclusions
We studied the estimation of distribution algorithm in this paper and proposed a fast elitism Gaussian EDA for continuous optimization. Every dimension of the individuals is represented by the mean and standard deviation of a Gaussian distribution. These parameters are estimated using the maximum likelihood technique with the fast learning rule. Then the new population is generated by sampling and the elitism strategy. The elitism strategy improves the convergence performance of the algorithm. In the one-dimensional test, we exhibit the optimization process and the probability model learning process clearly. In the two-dimensional and higher dimensional problems, we compare FEGEDA with the danger immune algorithm and other EDAs, and FEGEDA exhibits good performance. Although the performance of FEGEDA is promising, further studies are still recommended, for instance, on how to increase the diversity of the population under a fast convergence rate.