Convergence Time Analysis of Particle Swarm Optimization Based on Particle Interaction

We analyze the convergence time of particle swarm optimization (PSO) with respect to particle interaction. We first introduce a statistical interpretation of social-only PSO in order to capture the essence of particle interaction, which is one of the key mechanisms of PSO. We then use the statistical model to obtain theoretical results on the convergence time. Since the theoretical analysis is conducted on the social-only model of PSO instead of on the models commonly used in practice, we verify the validity of our results with numerical experiments on benchmark functions using a regular PSO program.


Introduction
Particle swarm optimizer (PSO), introduced by [1,2], is a stochastic population-based algorithm for solving continuous optimization problems. As shown by [3] and by many real-world applications, PSO is an efficient and effective optimization framework. Although PSO has been widely applied in many fields [4][5][6][7], the theoretical understanding of PSO is still quite limited. Most previous theoretical results [8][9][10][11][12][13][14][15][16][17][18] are derived for a system that assumes either a fixed attractor or a swarm consisting of a single particle.
Because particle interaction in PSO lacks theoretical analysis, in this paper we attempt to analyze the convergence time of PSO with respect to particle interaction. In particular, we first introduce a statistical interpretation of PSO, proposed by [19], to capture the essence of particle interaction. We then analyze the convergence time based on this statistical model. Finally, we conduct numerical experiments with a normal PSO configuration to confirm the validity of our theoretical results, which are obtained on a simplified PSO, the social-only model.
In the next section, we briefly introduce the PSO algorithm and the statistical interpretation of social-only PSO. In Section 3, we analyze the convergence time of PSO based on the statistical model. The experimental results are presented in Section 4, followed by Section 5, which concludes this work.

Particle Swarm Optimization and the Statistical Interpretation
The social-only model of PSO is described by the pseudocode in Algorithm 1. In this paper, we use boldface for vectors, for example, $\mathbf{X}_i$ and $\mathbf{V}_i$. Without loss of generality, we assume that the goal is to minimize the objective function.
According to Algorithm 1, m particles are initialized at the beginning, where the swarm size m is an algorithmic parameter of PSO. Each particle contains three types of information: its location ($\mathbf{X}_i$), its velocity ($\mathbf{V}_i$), and its personal best position ($\mathbf{Pb}_i$). At each generation, each particle updates its personal best position ($\mathbf{Pb}_i$) and the neighborhood best position ($\mathbf{Nb}$) according to its objective value. It then updates its velocity according to $\mathbf{Pb}_i$ and $\mathbf{Nb}$. In the velocity update formula, w is the inertia weight, which is usually a constant. $C_p$ and $C_n$ are random values sampled from the uniform distributions $U(0, c_p)$ and $U(0, c_n)$, where $c_p$ and $c_n$ are called acceleration coefficients. Finally, each particle updates its position according to its velocity and proceeds to the next generation.
procedure Social-only PSO (objective function $F : \mathbb{R}^n \to \mathbb{R}$)
    Initialize m particles
    while the stopping criterion is not satisfied

From the brief description above, we can already see that particle interaction is a crucial mechanism in the design of PSO. Although there have been previous studies on particle interaction and PSO behavior, most of them rely entirely on the assumption of fixed attractors, a condition that does not hold for PSO in action. To take particle interaction into account, we use an alternative view of PSO that regards the whole swarm as a single entity. Instead of tracking the movement of each particle, we consider the overall swarm behavior by transforming the state of the entire swarm into a statistical abstraction. Furthermore, to concentrate on particle interaction, we adopt the social-only model of PSO [20], which does not consider personal best positions.
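The per-generation update described above for Algorithm 1 can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' program: it assumes a global-best neighborhood (Nb is the best personal best in the whole swarm), draws $C_p$ and $C_n$ independently per particle and per dimension, and uses the hypothetical function name `pso_generation`. Setting $c_p = 0$ removes the personal-best term, which gives the social-only velocity update.

```python
import numpy as np

def pso_generation(F, X, V, Pb, w, c_p, c_n, rng):
    """One PSO generation following the description of Algorithm 1.

    X, V, Pb: arrays of shape (m, n) with positions, velocities and
    personal best positions. Assumption: global-best neighborhood.
    """
    # Update personal bests according to the objective values (minimization).
    improved = np.array([F(x) < F(p) for x, p in zip(X, Pb)])
    Pb = np.where(improved[:, None], X, Pb)
    # Neighborhood best: the best personal best in the (global) neighborhood.
    Nb = Pb[np.argmin([F(p) for p in Pb])]
    # C_p ~ U(0, c_p) and C_n ~ U(0, c_n), drawn per particle and dimension.
    C_p = rng.uniform(0.0, c_p, size=X.shape)
    C_n = rng.uniform(0.0, c_n, size=X.shape)
    # Velocity update with inertia weight w, then position update.
    V = w * V + C_p * (Pb - X) + C_n * (Nb - X)
    X = X + V
    return X, V, Pb

# Usage: 100 generations on the sphere function (illustrative settings).
rng = np.random.default_rng(0)
sphere = lambda x: float(np.sum(x ** 2))
m, n = 20, 5
X = rng.uniform(-5.0, 5.0, size=(m, n))
V = np.zeros((m, n))
Pb = X.copy()
initial_best = min(sphere(p) for p in Pb)
for _ in range(100):
    X, V, Pb = pso_generation(sphere, X, V, Pb, w=1 / (2 * np.log(2)),
                              c_p=1.0, c_n=1.0, rng=rng)
final_best = min(sphere(p) for p in Pb)
print(final_best <= initial_best)
```

Because personal bests are only ever replaced by better positions, the best personal-best value can never increase from one generation to the next.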
The statistical interpretation of PSO used in this paper is modified from [19] and summarized in Algorithm 2. In the statistical model, the exact particle locations are not traced but are modeled as a distribution θ over $\mathbb{R}^n$. Velocities are viewed as random vectors $\mathbf{V} \in \mathbb{R}^n$. The swarm size m is regarded as the number of samples drawn from θ. Since the geographic knowledge is embodied in the distribution, the neighborhood attractor can be viewed as the best of the m samples.
Each particle $\mathbf{P}_i$ is considered a random vector sampled from θ, and its velocity $\mathbf{V}_i$ is sampled from the velocity distribution $\mathcal{V}$. The neighborhood attractor can then be defined as $\mathbf{P}_a := \arg\min_{\mathbf{P}_i \in \{\mathbf{P}_1, \ldots, \mathbf{P}_m\}} F(\mathbf{P}_i)$. At each generation, the jth component $P_{ij}$ of particle i is updated to $P_{ij} + wV_{ij} + C(P_{aj} - P_{ij})$. The next distribution is thus a statistical characterization given by functions of the observed values:
Since w is a constant, the velocity distribution $\mathcal{V}$ can be removed: given two random vectors $\mathbf{X} \sim \theta$ and $\mathbf{V} \sim \mathcal{V}$, we can simply let θ be the distribution of $\mathbf{X} + w\mathbf{V}$.
For simplicity, we assume in this paper that each dimension of a particle's position is sampled independently, with the ith dimension drawn from the distribution $\theta_i$. Consider the random variable $X \sim \theta_i$, and let the real line be partitioned into s regions $R_1, \ldots, R_s$, each associated with a random velocity variable $V_i \sim \mathcal{V}_i$. By picking $x_i \in R_i$ for each region, when s is sufficiently large, the swarm can be characterized as:
Each component of $\frac{1}{s}\sum_{i=1}^{s} V_i$ can be approximated by a normal distribution by the central limit theorem. Normal distributions are therefore a reasonable choice for describing the behavior of the entire swarm. We let the distribution of the ith dimension, $\theta_i$, be $N(\mu_i, \sigma_i^2)$, the normal distribution with mean $\mu_i$ and variance $\sigma_i^2$. Updating the distribution then reduces to computing the mean and the variance.
The mean is calculated by averaging the updated positions, and the variance is estimated by maximum likelihood estimation (MLE). Let $\sigma_{t,i}^2$ and $\mu_{t,i}$ be the variance and mean of the ith dimension at the tth generation. For $j = 1, 2, \ldots, m$, let $y_j = P_{ji}$ and $y'_j = P'_{ji}$, and let $y_a = P_{ai}$ and $\bar{y} = \frac{1}{m}\sum_{j=1}^{m} y_j$. To estimate the variance of the ith dimension at the (t + 1)th generation, the likelihood function $L(\sigma_{t,i}^2)$ is defined as the joint probability (3), and the value of $\sigma_{t+1,i}^2$
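Under the normal-distribution assumption, one generation of the statistical model can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: it assumes that the velocity term has already been absorbed into θ (as argued above), that C is drawn from U(0, c) independently for each particle and dimension, and it uses the hypothetical name `model_generation`. The new mean is the average of the updated samples, and the new variance is the Gaussian MLE, whose divisor is m rather than m − 1.

```python
import numpy as np

def model_generation(F, mu, var, m, c, rng):
    """One generation of a Gaussian statistical model of social-only PSO.

    mu, var: arrays of shape (n,) with the per-dimension mean and variance.
    Returns the updated (mu, var).
    """
    # Draw m particles; dimension i is sampled independently from N(mu_i, var_i).
    P = rng.normal(mu, np.sqrt(var), size=(m, mu.size))
    # Neighborhood attractor: the sample with the smallest objective value.
    P_a = P[np.argmin([F(p) for p in P])]
    # Move every particle toward the attractor: P_ij + C (P_aj - P_ij).
    # Assumption: C ~ U(0, c), independent per particle and dimension.
    C = rng.uniform(0.0, c, size=P.shape)
    P_new = P + C * (P_a - P)
    # Mean of the updated samples and MLE variance (divisor m, not m - 1).
    mu_new = P_new.mean(axis=0)
    var_new = ((P_new - mu_new) ** 2).mean(axis=0)
    return mu_new, var_new

# Usage: iterate the model for 50 generations on the sphere function.
rng = np.random.default_rng(1)
sphere = lambda x: float(np.sum(x ** 2))
mu, var = np.full(10, 50.0), np.full(10, 100.0)
for _ in range(50):
    mu, var = model_generation(sphere, mu, var, m=50, c=1.0, rng=rng)
print(var.max())
```

The MLE line is equivalent to `P_new.var(axis=0)`, since NumPy's `var` uses a divisor of m by default (`ddof=0`).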

Convergence Time Analysis
In this section, we analyze the PSO convergence time based on the statistical interpretation of the social-only model described above. The first step is to define the state of convergence. Since we regard the entire swarm as a distribution, convergence is defined in terms of the variance of that distribution: the swarm is in the state of convergence when the variance of every dimension is less than a given value ε > 0. With this definition, we can start our analysis of the PSO convergence time. To estimate the variance after a distribution update, we need the following lemma from [21].
With this lemma, we can obtain the following.

Lemma 2. Given the swarm size m, acceleration coefficient c, and variance of the ith dimension at the tth generation σ
Proof. We know $\sigma_{t+1,i}^2$
Lemma 2 is derived under the condition that $\sigma_{t,i}^2$ is given. The following lemma derives the relationship between
Proof.
We can now obtain the relationship between the convergence time and the algorithmic parameters of PSO.

Theorem 4. Given swarm size m, acceleration coefficient c,
, and
Proof. From Lemma 3, we know
The last inequality holds because $\log\frac{1}{3}$
We have two corollaries immediately from Theorem 4.
, m, c, and ε such that

Corollary 6. Given swarm size
Corollary 5 reveals a linear relationship between the level of convergence, −ln ε, and the convergence time. Corollary 6 can be interpreted as follows: once the swarm size is sufficiently large, further enlarging the swarm has little effect on the convergence time. In the next section, we empirically examine the two corollaries with a common practical PSO configuration.
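The convergence criterion used throughout this section (every per-dimension variance below ε) and the resulting convergence time can be expressed as a small predicate and a generation counter. In this sketch, `step` is a hypothetical callable that maps the per-dimension variances at generation t to those at generation t + 1; the halving step in the usage line is a toy assumption chosen so the result can be checked by hand, not the variance update of Lemma 2.

```python
from typing import Callable, List, Optional, Sequence

def has_converged(variances: Sequence[float], eps: float) -> bool:
    """State of convergence: the variance of every dimension is less than eps."""
    return all(v < eps for v in variances)

def convergence_time(
    var0: Sequence[float],
    step: Callable[[List[float]], List[float]],
    eps: float,
    max_gen: int = 100_000,
) -> Optional[int]:
    """Generations until has_converged holds, or None if max_gen is exceeded."""
    variances = list(var0)
    for t in range(max_gen + 1):
        if has_converged(variances, eps):
            return t
        variances = step(variances)
    return None

# Toy usage: variances halve every generation (an illustrative assumption only).
print(convergence_time([1.0, 4.0], lambda vs: [0.5 * v for v in vs], 1e-3))  # 12
```

With the toy halving step, the counter returns the smallest t with $\max_i \sigma_{0,i}^2 \cdot 2^{-t} < \varepsilon$, so each tenfold decrease of ε adds roughly $\log_2 10 \approx 3.3$ generations. This linear growth in −ln ε is a property of the toy step only; it mirrors, but does not prove, the relationship stated in Corollary 5.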

Experiments
In this section, we verify the validity of Corollaries 5 and 6 by running Standard PSO 2006, downloaded from Particle Swarm Central. We use two objective functions in our experiments: (i) the sphere function $f_1(x)$ [22] and (ii) Schwefel's problem 1.2, $f_2(x)$ [22]. We set D = 10 for both $f_1(x)$ and $f_2(x)$ in all of the following experiments. We first examine Corollary 5. The PSO parameters are $c_p = 1$, $c_n = 1$, $w = 1/(2\ln 2)$, and a swarm size of 50. The value of ε is varied from $10^{-1}$ to $10^{-10}$, and for each value of ε we perform 100 independent runs. In each run, we count the number of generations from initialization until the variances of all dimensions are smaller than ε, and we then compute the mean number of generations over the 100 runs.
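The measurement procedure above can be sketched with a simplified social-only PSO. This is not Standard PSO 2006 and will not reproduce the reported numbers: it drops the personal-best term, assumes a global-best neighborhood, zero initial velocities, initialization in $[-100, 100]^D$, $C_n$ drawn from $U(0, c_n)$ per particle and dimension, and the classical unshifted benchmark forms $f_1(x) = \sum_{i=1}^{D} x_i^2$ and $f_2(x) = \sum_{i=1}^{D} \left(\sum_{j=1}^{i} x_j\right)^2$, which may differ from the versions defined in [22]. All function names are hypothetical.

```python
import numpy as np

def sphere(x):
    return float(np.sum(x ** 2))

def schwefel_1_2(x):
    # Sum over i of (x_1 + ... + x_i)^2, via cumulative sums.
    return float(np.sum(np.cumsum(x) ** 2))

def generations_to_converge(F, eps, m=50, D=10, w=1 / (2 * np.log(2)), c=1.0,
                            max_gen=20_000, seed=None):
    """Generations until every per-dimension position variance is below eps."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-100.0, 100.0, size=(m, D))
    V = np.zeros((m, D))
    best = X[np.argmin([F(x) for x in X])].copy()
    for t in range(max_gen + 1):
        # Convergence check: variance across particles, per dimension.
        if np.all(X.var(axis=0) < eps):
            return t
        # Social-only velocity update toward the best position found so far.
        V = w * V + rng.uniform(0.0, c, size=(m, D)) * (best - X)
        X = X + V
        cand = X[np.argmin([F(x) for x in X])]
        if F(cand) < F(best):
            best = cand.copy()
    return None

def mean_convergence_time(F, eps, runs=100, **kwargs):
    """Mean generation count over independent runs (seeds 0 .. runs - 1)."""
    times = [generations_to_converge(F, eps, seed=r, **kwargs) for r in range(runs)]
    if any(t is None for t in times):
        raise RuntimeError("some runs did not converge within max_gen")
    return float(np.mean(times))

# Usage: mean convergence time on f_1 for eps = 1e-6 over 10 runs.
print(mean_convergence_time(sphere, 1e-6, runs=10))
```

Sweeping `eps` over $10^{-1}, \ldots, 10^{-10}$ with a fixed swarm size, or sweeping `m` with a fixed `eps`, gives the two kinds of curves compared against Corollaries 5 and 6.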
The comparison of these experimental results with our theoretical results is shown in Figures 1 and 2. From Figure 1, we can see that the experimental results of $f_1(x)$ are very close to $-4.6\log\varepsilon + 43 = O(-\log\varepsilon)$, and from Figure 2, the experimental results of $f_2(x)$ are very close to $-4.7\log\varepsilon + 43.5 = O(-\log\varepsilon)$. The experimental results agree with our estimation in Corollary 5, in which −ln ε and the PSO convergence time are linearly related.
Having empirically verified Corollary 5 with the standard PSO, we now examine Corollary 6. The PSO parameters are $c_p = 1$, $c_n = 1$, $w = 1/(2\ln 2)$, and $\varepsilon = 10^{-6}$. The swarm size ranges from 50 to 1000 in steps of 5. For each swarm size, we perform 100 independent runs and record the mean, as in the previous experiment.
The comparison of experimental and theoretical results is shown in Figures 3, 4, 5, and 6. From Figures 3 and 4, we can see that the convergence time is close to −64.9/ log 0.555 ( , where c = 0.405. As these figures show, once the swarm size becomes large, further increases in swarm size change the convergence time only insignificantly, confirming our estimation in Corollary 6.

Conclusions
In this paper, a statistical interpretation of a simplified model of PSO was adopted to analyze the PSO convergence time.
To capture the essence of particle interaction, the statistical model adopted in this paper assumes no fixed attractors, so the effect of particle interaction is included in our analysis. Our theoretical results reveal the relationship between the convergence time and the level of convergence, as well as the relationship between the convergence time and the swarm size. Numerical experiments with standard PSO settings were conducted to empirically verify the theoretical results derived for the simplified PSO configuration. The agreement between the experimental and theoretical results indicates the importance of particle interaction in PSO.
Consequently, more research effort should be devoted to analyzing how particle interaction works in order to better understand particle swarm optimization. Several extensions of this study can now be explored. First, the relationship between the convergence time and the number of dimensions should be investigated, that is, in the adopted model, the relationship between t and $E[\sigma_{t(n)}^2]$, where $\sigma_{t(n)}^2 = \max\{\sigma_{t,1}^2, \sigma_{t,2}^2, \ldots, \sigma_{t,n}^2\}$. Second, the theoretical analysis conducted in this study is independent of the objective function, yet in Section 4 we verified it with only two objective functions. More functions with various features and properties, which do not violate the settings of the adopted statistical model, should be used to examine the estimation. Third, the distribution used in this paper is the normal distribution. However, there are likely objective functions that force the swarm into other distributions, and the analysis presented in this paper will fail on those functions. More sophisticated models should therefore be adopted to describe the PSO macrobehavior well and to enable researchers to derive more accurate PSO estimations. Finally, the social-only PSO model adopted in this paper does not take personal experience into account, so more sophisticated models are also needed to analyze the PSO macrobehavior influenced by personal experience.

Algorithm 1: Social-only model of PSO.

Figure 2: Comparison of experimental results and theoretical results from Corollary 5 for $f_2(x)$. The x-axis represents the value of ε, and the y-axis represents the mean number of generations. The experimental results are very close to $O(-\log\varepsilon)$.

Figure 3: Comparison of experimental results and theoretical results from Corollary 6 for $f_1(x)$. The x-axis represents the swarm size, ranging from 50 to 200, and the y-axis represents the mean number of generations. The experimental results are very close to $O(-1/\log_c(1 - 1/m))$ with c < 1.

Figure 5: Comparison of experimental results and theoretical results from Corollary 6 for $f_2(x)$. The x-axis represents the swarm size, ranging from 50 to 200, and the y-axis represents the mean number of generations. The experimental results are very close to $O(-1/\log_c(1 - 1/m))$ with c < 1.