Parallel and Cooperative Particle Swarm Optimizer for Multimodal Problems

Although the original particle swarm optimizer (PSO) and its variants are effective on many optimization problems, they may easily get trapped in local optima, especially when solving complex multimodal problems. Aiming to solve this issue, this paper puts forward a novel method called parallel and cooperative particle swarm optimizer (PCPSO). When the elements of the D-dimensional function vector X = [x_1, x_2, ..., x_d, ..., x_D] do not interact, the cooperative particle swarm optimizer (CPSO) is used. Based on this, PCPSO is presented to solve real problems. Since in real problems the dimensions cannot be split into several lower-dimensional search spaces because the elements interact, PCPSO exploits the cooperation of two parallel CPSO algorithms through orthogonal experimental design (OED) learning. First, the CPSO algorithm is used to generate two locally optimal vectors separately; then OED learns the merits of these two vectors and creates a better combination of them for further search. Experimental studies on a set of test functions show that PCPSO exhibits better robustness and converges much closer to the global optimum than several other peer algorithms.


Introduction
Inspired by social and cognitive behavior, Kennedy and Eberhart [1, 2] proposed the particle swarm optimizer (PSO), which searches for optimal values through a population-based iterative learning algorithm. Due to its simple implementation and effective searching ability, PSO has been widely used in feature selection [3], robot path planning [4], data processing [5], and other problems. However, many experiments have shown that PSO may easily get trapped in local optima, especially when facing complex multimodal optimization problems. Better optimization algorithms are always needed for solving complex real-world engineering problems. In general, the unconstrained optimization problems that we are going to solve can be formulated as a D-dimensional minimization problem:

Min f(X), X = [x_1, x_2, ..., x_d, ..., x_D],

where X is the vector to be optimized and D is the number of parameters [6].
In PSO, a member of the swarm, called a particle, represents a potential solution, which is a point in the search space. Let N denote the size of the swarm. The current state of each particle i is represented by its position vector X_i = [x_i1, x_i2, ..., x_id, ..., x_iD], and its movement by the velocity vector V_i = [v_i1, v_i2, ..., v_id, ..., v_iD], where i = 1, 2, ..., N indexes the particles in the swarm. With t = 1, 2, ..., T denoting the iteration number, the velocity v_id(t) and the position x_id(t) are updated as follows:

v_id(t + 1) = ω v_id(t) + c_1 r_1d (pBest_id(t) − x_id(t)) + c_2 r_2d (gBest_d(t) − x_id(t)), (1)
x_id(t + 1) = x_id(t) + v_id(t + 1), (2)

where the inertia weight ω ∈ (0, 1) determines how much of the previous velocity is preserved: a large value favors global exploration, a small value local exploitation. c_1 and c_2 denote the acceleration constants, which are usually set to 2.0 or adaptively controlled [7]. r_1d and r_2d are random numbers generated between 0 and 1 for the d-th dimension of the i-th particle. pBest_i(t) = [pBest_i1(t), pBest_i2(t), ..., pBest_iD(t)] represents the best previous position of particle i found in the first t iterations, and gBest(t) = [gBest_1(t), gBest_2(t), ..., gBest_D(t)] represents the best position found in particle i's neighborhood (defined by the gBest or lBest model) in the first t iterations. The gBest model, which is inclined to exploitation, has faster convergence but a higher probability of getting stuck in local optima than the lBest model. On the contrary, the lBest model, which focuses more on exploration, is less vulnerable to the attraction of local optima but converges more slowly than the gBest model [8]. The position vector X_i and the velocity vector V_i are initialized randomly and are updated by (1) and (2) generation by generation until some criteria are met.
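The update rules (1) and (2) can be sketched in a few lines of Python (an illustrative sketch with our own variable names, not the authors' code):

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.494, c2=1.494):
    """Apply velocity update (1) and position update (2) to one particle.

    x, v: current position and velocity (lists of floats);
    pbest: this particle's best previous position; gbest: neighborhood best.
    """
    for d in range(len(x)):
        r1, r2 = random.random(), random.random()  # r_1d, r_2d in [0, 1)
        v[d] = (w * v[d]
                + c1 * r1 * (pbest[d] - x[d])
                + c2 * r2 * (gbest[d] - x[d]))
        x[d] += v[d]
    return x, v
```

Note that when pbest and gbest coincide with x, the update reduces to v ← ω v, so a fully converged swarm simply decelerates.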
There are many modified versions of PSO reported in the literature. Most studies address the performance improvement of PSO from one of four aspects: population topology [9-11], diversity maintenance [6, 12, 13], hybridization with auxiliary operations [14-16], and adaptive PSO [8, 17]. However, it is difficult to find a robust solution because of the complexity and large search space of high-dimensional problems. Therefore, people try to split the large search space into smaller ones in order to simplify the problem. In [18], the search space of a genetic algorithm is divided by splitting the solution vector into several smaller vectors. Each smaller search space is optimized separately, and the fitness function is evaluated by combining all the solutions found in the smaller search spaces. The same technique is used in PSO, called cooperative PSO (CPSO-H), which uses several subswarms to search the parts of the D-dimensional function vector X = [x_1, x_2, ..., x_d, ..., x_D] separately [19]. The search results are integrated by a global swarm to improve the performance of the original PSO on high-dimensional problems. Compared with traditional PSO, CPSO-H shows significantly better solution quality and robustness, and it performs better and better as the dimension of the problem increases. However, the efficiency of the algorithm is highly affected by the degree of interaction among the elements of X.
Inspired by previous studies, a novel algorithm called parallel and cooperative PSO (PCPSO) is proposed in this paper. PCPSO tries to overcome the influence of interaction between vector elements while using a splitting method similar to [19]. Assuming that there is no interaction among the elements of the vector X = [x_1, x_2, ..., x_d, ..., x_D], the CPSO is used [19]. Although CPSO alone is of little use for real problems, it provides a framework for PCPSO. When the vector elements interact strongly, CPSO reaches only a local optimum [20]. In order to jump out of the local optimum, orthogonal experimental design (OED) is used to learn from the two locally optimal vectors obtained by CPSO [21]. A better combination of these two vectors can be obtained to push the search further and get closer to the global optimum.
The rest of this paper is organized as follows.Section 2 describes the PCPSO algorithm.In Section 3, the benchmark test functions are used to test PCPSO algorithm and their results are compared with some peer algorithms taken from literature to verify the effectiveness and robustness of PCPSO algorithm.Final conclusions are given in Section 4.

Parallel and Cooperative Particle Swarm Optimizer

Here, we first introduce the CPSO used in [19]; then PCPSO is proposed based on CPSO. OED is used to learn from the two locally optimal vectors achieved by CPSO and to create a new vector, which is treated as a starting point for further search.
In [19], there are two cooperative PSO algorithms, CPSO-S_K and CPSO-H_K. In CPSO-S_K, the D-dimensional search space is decomposed into K subcomponents, each corresponding to a swarm of s dimensions (where D = K × s). CPSO-H_K is a hybrid method combining a standard PSO with CPSO-S_K. The K subcomponents are used to place interacting elements together; however, the interacting elements are not known in real problems. We simplify the CPSO-S_K algorithm to CPSO, meaning that the D-dimensional search space is decomposed into D subcomponents. Algorithm 1 shows the pseudocode of CPSO. In order to evaluate the fitness of a particle in a swarm, the context vector is applied, which is a concatenation of the global best particles from all K swarms. The evaluation of the i-th particle in the j-th swarm is done by calling the function b(j, S_j · x_i), which returns the D-dimensional context vector with its j-th component replaced by S_j · x_i [22].
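The decomposition and the context-vector evaluation b(j, ·) can be sketched as follows (a simplified, hypothetical implementation under our own parameter names; the real Algorithm 1 is in [19]):

```python
import random

def cpso(f, D, K, n=10, iters=60, w=0.7, c=1.494, lo=-5.0, hi=5.0, seed=1):
    """Minimal CPSO sketch: the D-dimensional space is split into K parts of
    s = D // K dimensions; swarm j optimizes only part j, and a candidate
    part z is scored by b(j, z): the concatenation of all swarms' global
    bests (the context vector) with part j replaced by z."""
    rng = random.Random(seed)
    s = D // K
    pos = [[[rng.uniform(lo, hi) for _ in range(s)] for _ in range(n)]
           for _ in range(K)]
    vel = [[[0.0] * s for _ in range(n)] for _ in range(K)]
    pbest = [[p[:] for p in pos[j]] for j in range(K)]
    gbest = [pos[j][0][:] for j in range(K)]  # best part found by each swarm

    def b(j, z):  # context-vector evaluation
        ctx = [x for part in gbest for x in part]
        ctx[j * s:(j + 1) * s] = z
        return f(ctx)

    pbest_val = [[b(j, pbest[j][i]) for i in range(n)] for j in range(K)]
    gbest_val = []
    for j in range(K):
        k = min(range(n), key=lambda i: pbest_val[j][i])
        gbest[j] = pbest[j][k][:]
        gbest_val.append(pbest_val[j][k])

    for _ in range(iters):
        for j in range(K):
            for i in range(n):
                for d in range(s):
                    r1, r2 = rng.random(), rng.random()
                    vel[j][i][d] = (w * vel[j][i][d]
                                    + c * r1 * (pbest[j][i][d] - pos[j][i][d])
                                    + c * r2 * (gbest[j][d] - pos[j][i][d]))
                    pos[j][i][d] += vel[j][i][d]
                val = b(j, pos[j][i])
                if val < pbest_val[j][i]:
                    pbest[j][i], pbest_val[j][i] = pos[j][i][:], val
                    if val < gbest_val[j]:
                        gbest[j], gbest_val[j] = pos[j][i][:], val
    best = [x for part in gbest for x in part]
    return best, f(best)
```

Each swarm evaluates a full D-dimensional vector even though it only moves its own part, which is exactly why CPSO works well when the parts do not interact.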
Algorithm 1: Pseudocode for the generic CPSO algorithm.
Table 1: Best combination levels of Sphere function by using OED method.
With different initial values, the independent vectors X_L1 and X_L2, each containing good information for the optimum search, will fall into different locally optimal vectors. A method is needed to extract the good information from X_L1 and X_L2 and form a new vector X_Ln that is closer to the globally optimal vector. If we exhaustively tested all combinations of X_L1 and X_L2 for the new vector X_Ln, there would be 2^D trials, which is unrealistic in practice. With the help of OED [15, 16, 23, 24], a relatively good vector X_Ln can be obtained from X_L1 and X_L2 using only a few experimental tests.
OED, with both an orthogonal array (OA) and factor analysis, makes it possible to discover the best combination of different factors with only a small number of experimental samples [23, 24]. Here we briefly discuss OED with an example on the vectors X_L1 and X_L2. To simplify the problem, assume D = 3 and let the optimization problem be to minimize the Sphere function f(X) = x_1^2 + x_2^2 + x_3^2, with two locally optimal vectors X_L1 and X_L2 derived from First1D and Second1D (actually First1D and Second1D should reach the same globally optimal vector because the test function is simple; here we just want to explain the OED theory). The whole analysis of using OED for the Sphere function is shown in Table 1.
An OA is a predefined table of numbers arranged in rows and columns. In Table 1 there are three factors x_1, x_2, and x_3 that affect the function value and two levels, l_1 and l_2, to choose from. The rows represent the level combinations tested, and the columns represent the specific factors that change across combinations [15, 24]. A program for generating an OA can be found in [24]. Factor analysis evaluates the effect of each individual factor on the function value and determines the best level for each factor to form a new vector X_Ln. Let F_m denote the function value of combination m, where m = 1, ..., M and M is the total number of combinations. The main effect of factor d at level l, denoted S_dl, is

S_dl = (Σ_{m=1}^{M} F_m · z_ml) / (Σ_{m=1}^{M} z_ml), d = 1, ..., D, l = 1, 2,

where z_ml = 1 if combination m uses level l of factor d, and z_ml = 0 otherwise. OED works well when the interaction among the vector elements is small.
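The factor analysis can be sketched as follows (illustrative code, not the authors'; the OA below is the standard L4(2^3) array, and the example vectors in the test are made-up values, not necessarily those of Table 1):

```python
def oed_combine(f, xl1, xl2, oa):
    """Form X_Ln from X_L1 and X_L2: run the M trials prescribed by the
    orthogonal array `oa` (entries 0/1 pick the level, i.e. xl1 or xl2),
    compute the main effect S_dl of every factor-level pair as the mean
    function value over the trials using that level, and keep the level
    with the smaller effect (minimization)."""
    D = len(xl1)
    trials = [[xl1[d] if row[d] == 0 else xl2[d] for d in range(D)]
              for row in oa]
    vals = [f(t) for t in trials]
    new = []
    for d in range(D):
        s, cnt = [0.0, 0.0], [0, 0]
        for m, row in enumerate(oa):
            s[row[d]] += vals[m]
            cnt[row[d]] += 1
        eff = [s[l] / cnt[l] for l in (0, 1)]  # S_d1, S_d2
        new.append(xl1[d] if eff[0] <= eff[1] else xl2[d])
    return new
```

With the Sphere function, X_L1 = [2, 3, 3], X_L2 = [4, 0, 0], and the L4(2^3) array, only M = 4 of the 2^3 = 8 combinations are evaluated, yet the method returns [2, 0, 0], a combination that was never tested directly.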

Figure 1: Flowchart of the PCPSO algorithm. Two vectors X_L1 and X_L2 obtained from First1D and Second1D are used for OED learning to create a new vector X_Ln; the better of X_L1 and X_L2 is kept as X_L1, and X_Ln is assigned to X_L2; then First1D and Second1D run in parallel with X_L1 and X_L2 as context vectors, respectively, each for eighty iterations.

When the OED method is used to process functions with highly interacting vector elements, the function value of the new vector X_Ln may be larger than the function values of X_L1 and X_L2. However, using OED on highly correlated problems still has two advantages. One is that the particle with the new vector X_Ln jumps out of the locally optimal vector; the other is that each element of X_Ln, chosen between X_L1 and X_L2, is probably closer to the globally optimal vector. Therefore, we use the new vector X_Ln as the context vector and repeat the CPSO algorithm.
Based on the analysis above, we propose the PCPSO algorithm to overcome the problem of falling into local optima. Figure 1 shows the flowchart of PCPSO. First, two parallel CPSO algorithms (First1D and Second1D) are run; then OED learning forms the new vector X_Ln, which is assigned to X_L2, while the better of X_L1 and X_L2 is assigned to the new X_L1. The new vectors X_L1 and X_L2, treated as context vectors in the CPSO algorithm, keep the search going. To illustrate the efficiency of PCPSO, Figure 2 shows the convergence characteristic of PCPSO on Rotated Ackley's function; a detailed description of this function can be found in Section 3. The parameter settings for First1D and Second1D are the same as for the CPSO-H algorithm [19]: ω decreases linearly from 0.9 to 0.4, c_1 = c_2 = 1.494, the particle number is s = 10, and the dimension is D = 30. First1D and Second1D each need 80 CPSO iterations to reach locally optimal vectors, followed by an OED computation; with 6 rounds of OED computation, the total number of CPSO iterations is 6 × 80 = 480. From Figure 2 we can see that the function value drops sharply in the first 80 iterations and that First1D and Second1D perform similarly. For the next 80 iterations, the better of X_L1 and X_L2 is assigned to X_L1 for another First1D run, and the new vector X_Ln is assigned to X_L2 for another Second1D run. The performance of First1D and Second1D differs markedly from iteration 80 to 240: First1D gets trapped in a locally optimal vector, while Second1D still makes the function value drop sharply even though it starts higher at the 81st iteration. Since First1D has almost lost its searching ability after 80 iterations, we can also adopt the strategy of running First1D only for the first 80 iterations in order to save computational cost. Vector X_L1 is used for saving the best vector found so far and is combined with vector X_L2 into the new vector X_Ln by the OED method after every 80 iterations. These operations iterate generation by generation until some criteria are met.
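The outer loop of Figure 1 can be sketched as follows; `cpso_run` and `oed_combine` stand for the sub-procedures described above and are passed in as functions (the stand-ins used in the test are toy placeholders, not the real CPSO):

```python
def pcpso(f, cpso_run, oed_combine, x1, x2, cycles=6):
    """PCPSO outer loop: refine X_L1 and X_L2 with two parallel CPSO runs
    (First1D, Second1D), merge them into X_Ln by OED learning, keep the
    better locally optimal vector as the new X_L1, and restart Second1D
    from X_Ln (assigned to X_L2)."""
    for _ in range(cycles):
        x1 = cpso_run(f, x1)          # First1D: 80 CPSO iterations
        x2 = cpso_run(f, x2)          # Second1D: 80 CPSO iterations
        xn = oed_combine(f, x1, x2)   # OED learning on the two local optima
        x1 = min((x1, x2), key=f)     # better local vector kept as X_L1
        x2 = xn                       # X_Ln becomes the next context vector
    return min((x1, x2), key=f)
```

The cost-saving variant mentioned above corresponds to calling `cpso_run` on x1 only in the first cycle and thereafter updating x2 alone.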
Compared with other PSO algorithms using OED [15, 16], PCPSO needs some extra computation to update vector X_L1. However, the cooperation between the local vectors X_L1 and X_L2 makes the search more effective. In [15], the OED result is set as an exemplar for the other particles to learn from; yet, as Figure 2 shows, the function values after OED are sometimes even larger. Another limitation of [15] is that the search gets trapped in local optima easily because the two vectors used for OED learning are similar. Based on the principle that each element of the function vector moves closer to the globally optimal vector after OED, the new vector X_Ln jumps out of the local optimum and keeps searching even though its function value is higher than the corresponding local optimum at the starting iteration.

Experimental Results and Discussions
3.1. Test Functions. Test functions, including unimodal and multimodal functions, are used to investigate the performance of PSO algorithms [25]. We choose 15 test functions on 30 dimensions. The formulas and properties of these functions are presented below.
In Group A, the functions can be divided into smaller search spaces because their elements interact weakly; therefore, the CPSO algorithm is suitable for optimizing these functions. In Group B, seven rotated multimodal functions are generated by an orthogonal matrix M to test the performance of the PCPSO algorithm. The rotated vector Y = M * X, obtained by left-multiplying the original vector X by the orthogonal matrix M, behaves like a vector with highly interacting elements, because all elements of Y change once one element of X changes. A detailed illustration of generating the orthogonal matrix is given in [26]. Meanwhile, one shifted and rotated function is also listed in Group B to test the performance of the PSO algorithms. Table 2 shows the globally optimal vector X*, the corresponding function value f(X*), the search range, and the initialization range of each function, where o = [o_1, o_2, ..., o_D] is the shifted globally optimal vector.
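A rotated test function can be produced as sketched below (a plain-Python illustration using Gram-Schmidt orthogonalization; the paper's actual generation method is the one described in [26]):

```python
import random

def orthogonal_matrix(D, seed=0):
    """Build a random D x D orthogonal matrix M (orthonormal rows) by
    Gram-Schmidt on Gaussian random vectors."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < D:
        v = [rng.gauss(0.0, 1.0) for _ in range(D)]
        for u in rows:  # remove components along earlier rows
            dot = sum(a * b for a, b in zip(u, v))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-8:  # skip (rare) nearly dependent draws
            rows.append([a / norm for a in v])
    return rows

def rotate(M, x):
    """Y = M * X: every element of Y depends on every element of X."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def rotated(f, M):
    """Wrap a separable function f into its rotated, nonseparable version."""
    return lambda x: f(rotate(M, x))
```

Because M is orthogonal, the rotation preserves distances, so the rotated function keeps the same optimum value as the original, only at a rotated location.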

3.2. PCPSO's Performance on 15 Functions. The performance of PCPSO is tested on functions f_1-f_15, which include multimodal, rotated, and shifted functions in 30 dimensions. The parameter settings for the PCPSO algorithm are as follows: ω decreases linearly from 0.9 to 0.4, c_1 = c_2 = 1.494, and the particle number is s = 10. Since the vectors of f_1-f_8 have very weakly interacting elements, Figure 3 shows the convergence characteristics of the PCPSO algorithm on test functions f_1-f_8 in 30 iterations, where the y-axis is in log scale. From Figure 3 we can see that all function values drop significantly within 30 iterations except on Rosenbrock's function. In order to fully test PCPSO's performance on Rosenbrock's function, we compare PCPSO with the CPSO algorithm in Figure 4. The upper plot of Figure 4 shows the convergence characteristic of PCPSO, in which the two independent algorithms (First1D and Second1D) cooperate through OED learning; the lower plot shows the convergence characteristic of the CPSO algorithm. From Figure 4 we can see that OED learning makes the function vector jump out of the locally optimal vector and finally lowers the function value. The convergence characteristics of the remaining six test functions f_10-f_15 are shown in Figure 5. All six are rotated multimodal, nonseparable functions. Similar to Figure 2, despite the fact that the function value of Second1D is sometimes increased at the starting point by OED learning, it drops sharply as the iterations increase; this phenomenon is especially obvious on test function f_12. To sum up, PCPSO works well in terms of escaping locally optimal vectors and lowering the function value sharply. OED learning finds the best combination of the two locally optimal vectors X_L1 and X_L2 at iterations 80, 160, and 240. However, there are still cases where the vector falls into a locally optimal vector, as on f_14; the reason is that once X_L1 and X_L2 become equal to each other, the OED learning ability is finally lost.

3.3. Comparison with Other PSOs and Discussions.
Seven PSO variants are taken from the literature for comparison on the 15 test functions with 30 dimensions. In the following we briefly describe the peer algorithms shown in Table 3. The first algorithm is the standard PSO (SPSO), whose performance is improved by using a random topology. The second, fully informed PSO (FIPS), uses all the neighbor particles to influence the flying velocity. The third, orthogonal PSO (OPSO), aims to generate a better position by using OED. The fourth, fitness-distance-ratio PSO (FDR-PSO), addresses the premature convergence problem

by using the fitness distance ratio. The fifth, comprehensive learning PSO (CLPSO), is proposed for solving multimodal problems. The sixth, cooperative PSO (CPSO-H), uses one-dimensional swarms to search each dimension separately. The seventh, orthogonal learning PSO (OLPSO), guides particles to fly towards an exemplar constructed from pBest and gBest by the OED method. The parameter settings are as follows:

SPSO [27]: ω: 0.4-0.9, c_1 = c_2 = 1.193
FIPS [10]: χ = 0.7298, Σ c_i = 4.1
OPSO [16]: ω: 0.4-0.9, c_1 = c_2 = 2.0
FDR-PSO [13]: ω: 0.4-0.9, Σ c_i = 4.0
CLPSO [6]: ω: 0.4-0.9, c_1 = c_2 = 2.0
CPSO-H [19]: ω: 0.4-0.9, c = 1.49, K = 6
OLPSO [15]: ω: 0.4-0.9, c = 2.0, G = 5

The swarm size is set to 40 for the seven PSO variants mentioned above and to 10 for the PCPSO algorithm. The maximal number of FEs for the eight algorithms is set to 200,000 in each run on each test function. All functions were run 25 times, and the mean values and standard deviations of the results are presented in Table 4, with the best results shown in boldface. From the results we observe that CPSO-H, CLPSO, and OLPSO perform well on the first 8 functions. The reason is that CPSO-H suits one-dimensional search well, while CLPSO and OLPSO can search further by using their own learning strategies: the strategy in CLPSO prevents falling into local optima by random learning, while the strategy in OLPSO finds a better combination of the locally optimal vector. The PCPSO method keeps searching through the cooperation of the two locally optimal vectors X_L1 and X_L2; with the implementation of Second1D, the algorithm can always find a better vector X_L2 than the previous best one X_L1 by the OED method. PCPSO performs well on f_2-f_3, f_9-f_11, f_13, and f_15. However, the function values obtained are not close enough to the globally optimal values, and premature convergence still happens on f_12 and f_14 due to the lack of vector diversity. The advantages of the method are its rapid convergence speed and robustness: whatever the test function, the algorithm can find an acceptable optimum, especially when facing complex functions. The nonparametric Wilcoxon rank sum test is used as the h-test in Table 5 to determine whether the results obtained by PCPSO are statistically different from the best results achieved by the other algorithms. An h value of one indicates that the performance of the two algorithms is statistically different with 95% certainty, whereas an h value of zero implies that the performance is not statistically different. Among the 15 functions, there are 10 on which the best results and the results of PCPSO are statistically different. From Tables 4 and 5 we can see that PCPSO obtains 5 best results (f_2, f_3, f_10, f_11, f_13) that are statistically different from the others. In addition, most of the best results obtained by PCPSO are on the rotated multimodal functions.
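The h values can be computed as sketched below (a plain-Python Wilcoxon rank-sum test using the two-sided normal approximation without tie correction; in practice a library routine such as SciPy's `ranksums` would be used):

```python
from statistics import NormalDist

def _ranks(values):
    """1-based ranks, with ties given the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_h(a, b, alpha=0.05):
    """Return h = 1 if samples a and b differ significantly at level alpha
    (two-sided Wilcoxon rank-sum test, normal approximation), else 0."""
    n1, n2 = len(a), len(b)
    ranks = _ranks(list(a) + list(b))
    r1 = sum(ranks[:n1])                       # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (r1 - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return 1 if p < alpha else 0
```

With 25 runs per algorithm per function, the normal approximation is adequate; the exact test matters mainly for very small samples.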

Conclusion
In this paper, a novel PCPSO algorithm has been presented to overcome the problem of falling into local optima. In PCPSO, considering the complex interaction among the elements of the vector, the CPSO algorithm (First1D and Second1D) is used twice to create two locally optimal vectors. These two vectors are combined by OED learning so that the best elements can be chosen to jump out of the local optimum and search further.
Although the OED operation may produce a function value higher than the local optima in some cases, the function value always drops sharply after OED because the operation chooses the elements that are closer to the globally optimal vector. Experimental tests have been conducted on 15 benchmark functions including unimodal, multimodal, rotated, and shifted functions. From the comparative results, it can be concluded that PCPSO significantly improves the performance of PSO and provides a better solution on most rotated multimodal problems.

Figure 2: The convergence characteristic of the PCPSO on Rotated Ackley's function in 30 dimensions. First1D and Second1D are CPSO algorithms.

Figure 3: The convergence characteristics of the PCPSO algorithm on test functions f_1-f_8 in 30 iterations.
The OED method can change all elements of the vector X = [x_1, x_2, ..., x_d, ..., x_D] simultaneously [21], whereas PSO variants can usually change only one element at a time when facing a local optimum. Although the OED method is used to combine the best elements of the vector X as an exemplar [21], the function value f(X) usually gets bigger because of the high correlation.
In Table 1, we first calculate the effect S_11 of level 1 of factor x_1, and then the effect S_12 of level 2 of factor x_1. Since this is a minimization problem, level 1 is chosen because its effect value S_11 is less than S_12 in Table 1. As the example in Table 1 shows, although the new vector X_Ln = [2, 0, 0] does not appear among the four combinations tested, the best combination can still be obtained by the OED method.

Table 2: Globally optimal vector X*, corresponding function value f(X*), search range, and initialization range of the test functions.

Table 3: Parameter settings for the seven PSO variants.