Convergence Analysis of a Class of Computational Intelligence Approaches

Computational intelligence approaches (CIAs) form a relatively new interdisciplinary field of research with many promising application areas. Although CIAs have gained huge popularity, their convergence is difficult to analyze. In this paper, a computational model is built for a class of CIAs represented by the canonical forms of genetic algorithms, ant colony optimization, and particle swarm optimization, in order to describe the common features of these algorithms. Two quantification indices, the variation rate and the progress rate, are then defined to indicate, respectively, the variety and the optimality of the solution sets generated in the search process of the model. Moreover, four types of probabilistic convergence are given for the solution set updating sequences, and their relations are discussed. Finally, sufficient conditions are derived for the almost sure weak convergence and the almost sure strong convergence of the model by introducing martingale theory into the Markov chain analysis.


Introduction
Computational intelligence approaches (CIAs) are a set of nature-inspired computational methodologies that interpret and construct optimization problem-solving processes from different biological points of view, and a large number of derived approaches have been developed on this basis [1]. A particularly successful research direction in CIAs is the genetic algorithm (GA) [2], which generates solutions to optimization problems using techniques inspired by natural evolution, such as selection, crossover, and mutation. Differential evolution (DE) [3, 4], proposed by Storn and Price, is similar to GA in the sense that it uses the same evolutionary operators, namely mutation, crossover, and selection, for guiding the population towards the optimum solution. Ant colony optimization (ACO) [5] is another very popular CIA, initially proposed by Marco Dorigo in his Ph.D. thesis. The first ACO algorithm aimed to search for an optimal path in a graph, based on the behavior of ants seeking a path between their colony and a source of food.
The original idea has since been diversified to solve a wider class of numerical problems, and as a result, several variants have emerged, drawing on various aspects of the behavior of ants. Particle swarm optimization (PSO) [6], introduced in 1995, was originally intended for simulating bird flocking or fish schooling. PSO optimizes a problem by maintaining a population of particles and moving these particles around the search space according to mathematical formulae over each particle's position and velocity. Innumerable variants and hybrids of these techniques have been proposed, and many more applications of CIAs to specific problems have been reported [7-9]. In their basic methodology, however, these algorithms have some features in common: (i) looking for global solutions of optimization problems by means of random search with heuristic information; (ii) promoting problem-solving efficiency by multipoint search strategies; and (iii) synthesizing the information involved in a number of currently obtained candidate solutions to produce subsequent search decisions. Owing to this similarity, these algorithms possess several common advantages, such as robustness against imprecise mathematical descriptions and ill-behaved mathematical properties of optimization problems, as well as the capability to attain global optima. But they also share some shortcomings. One of the best-known disadvantages is that the algorithm parameters are determined by experience rather than theoretical guidance. Therefore, when the algorithm parameters are improperly set, the search process is apt to sink into a stagnant state; that is, no candidate solutions superior to their parents will be produced (or the candidate solutions converge too early and end up suboptimal).
A number of theoretical studies have been conducted to seek the causes of these stagnation problems. Rudolph [10] analyzed the convergence properties of the canonical GA (CGA) and proved that, in the sense of probability, CGA does not converge to global optima for any choice of initial population, crossover operator, and objective function, but some variants of CGA with an elite reserve policy were shown to converge in probability. Xu et al. [11] considered a class of GAs with a strategy in which parental populations are put into competition with their offspring. It was shown that a variant of CGA with such a strategy can reach global optima after a finite number of iterations with probability one. Rastegar [12] gave lower bounds on the probability with which the compact genetic algorithm (cGA) converges to global optima and proved that the algorithm converges almost surely to the global optima of a special class of functions when the learning rate is sufficiently small. In the studies on ACO, Gütjahr [13, 14] first proved the convergence of the graph-based ant system by means of graph theory. It was shown that under certain conditions, the candidate solutions generated in each iteration converge to the global solution of a given problem instance with a probability that can be made arbitrarily close to one. Stützle and Dorigo [15] and Badr and Fahmy [16] studied the convergence properties of ACO with the aid of algebraic theory, but their work was incomplete due to the many hypotheses made in their proofs. Duan et al.
[17] regarded the pheromone trail vector in ACO as a Markov chain and presented a new approach for analyzing the almost sure convergence properties of ACO based on martingale theory. However, the pheromone trail vector is not self-contained if the path information is unknown, so further discussion of Duan's conclusions is needed. The convergence of PSO is also an issue of concern in the work on CIAs. Clerc and Kennedy [18] analyzed a single particle's trajectory in simplified PSO systems and extended their analysis to a generalized algorithm model in which a set of coefficients is used to control the system convergence. Based on Clerc's analysis, the dynamic behavior and the convergence of the simplified (deterministic) PSO algorithm were analyzed by Trelea [19] with the aid of tools from discrete-time dynamic system theory. Van den Bergh and Engelbrecht [20] investigated the influence of the inertia factor on particle trajectories in PSO and provided a formal proof that each particle converges to a stable point. Poli [21] presented a method that allows one to exactly determine the sampling distribution of PSO and explained how it changes over time during stagnation for a large class of PSO variants. Liu et al. [22] proved that PSO converges to global optima with probability one based on an iterative function model and probability theory. Chen and Jiang [23] analyzed the behavior of PSO from the perspective of particle interaction and proposed a statistical interpretation of PSO in order to capture the stochastic behavior of the entire swarm. Based on a social-only model, they derived upper and lower bounds on the expected particle norm.
The basic methodology in the aforementioned studies is to first investigate the convergence properties of a specific implementation of GA, ACO, or PSO and then extend the results to a wider range of the corresponding algorithm. In this paper, the authors attempt to analyze the general convergence properties of these algorithms on the basis of their common methodological features. The main attention is paid to the characteristics of the heuristic search strategies, which play crucial roles in problem solving. The purpose of this paper is to gain a deeper understanding of the essence of the convergence of CIAs and, on that basis, to develop more efficient algorithms for complex optimization problems.
The rest of this paper is organized as follows. In Section 2, a computational model of a class of CIAs is built on the basis of several common features of the heuristic search strategies adopted by GA, ACO, and PSO, and a set of definitions is given to describe the external and internal changes of a solution set updating sequence in CIAs. In Section 3, the concepts of convergence in probability and almost sure convergence are defined for the solution set updating sequences generated in the search process of the computational model, and their relations are discussed. In Section 4, the conditions under which the computational model converges almost surely to a space of weakly optimal solution sets are studied based on the Markov chain properties of the solution set updating sequences. Then the almost sure convergence to a strongly optimal solution space is proved for the computational model with the aid of martingale theory. Several concluding remarks are given in Section 5.

The CIA Model
In this section, the characteristics of solution set representation and solution set update in CIAs are analyzed, and based on that, the computational model of a class of CIAs is established. Moreover, the variation rate of a set component and the progress rate of a solution set are defined, respectively, to describe the variety and the optimality of the solution sets generated in the search process of the model.
Without loss of generality, consider an optimization problem of the following type:
$$\max_{x \in S} f(x), \tag{1}$$
where $S$ is usually called the search (or solution) space, and $f(x)$ is an evaluation function satisfying $0 < f(x) < \infty$.
A problem-solving procedure observing the following principles is stipulated as the computational model of a class of the CIAs ("the CIA model" for short) under study in this paper.
Step 1. Transform the feasible space S into the coded space of variables.
Step 2. Generate a certain number of candidate solutions to compose an initial solution set.
Step 3. Produce a new generation of the solution set based on the heuristic information involved in the current solution set and the corresponding evaluation values, keeping the size of the solution set fixed.
Step 4. Stop the iterations and output the best candidate solution within the current solution set if a predefined termination criterion is satisfied; otherwise go back to Step 3.
Obviously, the canonical forms of GA, ACO, and PSO, as well as most of their derived algorithms, obey the aforementioned computational rules and can thus be regarded as realization instances of the CIA model.
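As an illustration, Steps 1-4 can be sketched as a generic population-based loop with an elite reserve. This is a minimal sketch, not any specific algorithm from the paper: the bit-flip update operator, the OneMax evaluation function, and all parameter values are assumptions standing in for the abstract updating operator and evaluation function.

```python
import random

def cia_model(evaluate, dim, pop_size=20, generations=100, mutation_rate=0.1, seed=0):
    """Generic skeleton of the CIA model (Steps 1-4) with an elitist strategy.

    `evaluate` is the evaluation function f with 0 < f(x) < inf; the random
    bit-flip below is an assumed stand-in for the heuristic updating operator
    (selection/crossover/mutation in GA, pheromone-guided path construction
    in ACO, velocity and position updates in PSO).
    """
    rng = random.Random(seed)
    # Step 2: generate an initial solution set of binary strings.
    pop = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    best = max(pop, key=evaluate)
    for _ in range(generations):
        # Step 3: produce a new generation from the current one (size fixed).
        pop = [[bit ^ (rng.random() < mutation_rate) for bit in x] for x in pop]
        # Elite reserve: the best candidate found so far is always retained.
        pop[0] = list(best)
        cand = max(pop, key=evaluate)
        if evaluate(cand) > evaluate(best):
            best = list(cand)
    # Step 4: output the best candidate solution found.
    return best

# Example: maximize the number of ones (OneMax), shifted so f is strictly positive.
onemax = lambda x: sum(x) + 1
best = cia_model(onemax, dim=16)
```

The elite-reserve line is the only part the analysis below actually relies on; everything else can be swapped for any operator that satisfies the four steps.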
We consider a set of candidate solutions ("solution set" for short) $X = \{x_1, x_2, \ldots, x_m\} \in S^m$ composed of $m$ strings of length $n$ in CIAs, where $S$ is the search space of candidate solutions and $S^m$ is the search space of solution sets of $m$th order. A candidate solution is composed of $n$ bit numbers as its components ("solution components" for short). The candidate solution and its components take on specific meanings in different CIAs; Table 1 illustrates the expressions of the solution candidate and solution component in GA, ACO, and PSO. A solution set $X$ generated by CIAs can therefore be written in vector and matrix forms, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{in}) \in S$ is the $i$th candidate solution in $X$, and $x_{ij}$ is the $j$th solution component of the $i$th candidate solution, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$. For the sake of convenience, the vector consisting of the $j$th components of all the candidate solutions in $X$ is called the $j$th set component, written as $c_j = (x_{1j}, x_{2j}, \ldots, x_{mj})^T$. As we know, a new generation of the solution set in the CIA model generally relies on the heuristic information involved in the current solution set rather than in previous solution sets. Thereby, a state transition sequence of solution sets can be expressed as a Markov chain, denoted by $\{X(t), t \ge 0\}$ [24]. In essence, the search procedure of the CIA model can be regarded as the updating process of solution sets, $X(t+1) = T(X(t))$, where $T(\cdot)$ is the solution set updating operator, or in other words, the heuristic search strategy adopted by the CIA model. It is instantiated by the evolutionary operations composed of selection, crossover, and mutation in GA; by the path search guided by pheromone updating and path selection in ACO; and by the particle motion adjusted by velocity and position updating in PSO. It is well known that "elite reserve" is a commonly used strategy in many realization instances of the CIA model. In this strategy, the best candidate solution of the previous generation is always retained in the current generation, so the best candidate solution obtained so far will not be missed in the future search process. We assume that an elite reserve strategy is always adopted by the CIA model.
It is reasonable to estimate the evolution trend of a solution set by the change of its solution components. In order to quantify the variation of solution components, the solution set $X$ is first pretreated into a canonical form in the following way (see Figure 1, which illustrates the sorting process of the solution candidates in $X$). First, according to the element values of set component $c_1$, the solution set $X$ is divided into two parts, $X_1^0$ and $X_1^1$. Then, depending upon the element values of set component $c_2$, $X_1^0$ and $X_1^1$ are each further decomposed into two parts, and so on. Let $D(t)$ be a descriptive function of the set component variation from the $(t-1)$th generation to the $t$th generation of the solution set based on this canonical form, whose $j$th element is defined by
$$d_j(t) = [x_{1j}(t) \oplus x_{1j}(t-1), \ldots, x_{mj}(t) \oplus x_{mj}(t-1)]^T,$$
where the sign "⊕" expresses an operator very similar to the logical exclusive-OR; that is, the output of two compared values is 0 only if $x_{ij}(t-1)$ and $x_{ij}(t)$ are the same. This implies that an element of $D(t)$ is 1 if the corresponding components of $X(t)$ and $X(t-1)$ have different values; otherwise the element is 0.
Based on the descriptive function () we can define an index to measure the evolution trend of a solution set quantitatively.
Definition 1 (variation rate of set component). Let $\{X(t), t \ge 0\}$ be the solution set updating sequence generated by the CIA model, and let $\sum d_j(t)$ be the sum of all the element values in $d_j(t)$. We call $\rho_j(t) = \sum d_j(t) / m$ the variation rate of the $j$th set component of the solution set evolving from $X(t-1)$ to $X(t)$.
From Definition 1 and the fact that $0 \le \sum d_j(t) \le m$, we know that $0 \le \rho_j(t) \le 1$ for all $j$. The variation rate of a set component measures its variation by calculating the percentage of its elements that change their values from the previous generation to the current one. Thus, if $\rho_j(t) = 0$, all the elements in set component $c_j$ remain unchanged, whereas $\rho_j(t) = 1$ implies that all the values in $c_j$ have changed from $X(t-1)$ to $X(t)$.
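Definition 1 translates directly into code: the variation rate of the $j$th set component is the fraction of the $m$ candidate solutions whose $j$th component changed between generations. The function name and example data below are illustrative, not from the paper.

```python
def variation_rate(prev, curr, j):
    """Variation rate rho_j(t) of the j-th set component: the fraction of
    the m candidate solutions whose j-th component changed from X(t-1)
    to X(t) -- the XOR-like comparison of Definition 1."""
    m = len(curr)
    return sum(prev[i][j] != curr[i][j] for i in range(m)) / m

# Three candidate solutions (m = 3), each with two components (n = 2).
X_prev = [[0, 1], [1, 1], [0, 0]]
X_curr = [[1, 1], [1, 0], [0, 0]]
print(variation_rate(X_prev, X_curr, 0))  # component 0 changed in 1 of 3 rows -> 0.3333...
```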
Without loss of generality, it is assumed that the objective function $f(\cdot)$ of optimization problem (1) is used as the evaluation function in the CIA model. In order to quantitatively estimate the optimality of the candidate solutions in a solution set, it is useful to introduce the concept of the progress rate of a solution set as follows.
From Definition 2, we know that $0 < \xi(t) \le 1$. The progress rate $\xi(t)$ approaches 0 if most of the candidate solutions are far worse than the best candidate solution in $X(t)$. On the other hand, $\xi(t)$ approaches 1 if the evaluation values of most of the candidate solutions in $X(t)$ are very close to the evaluation value of the best candidate solution.
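A quantity with these stated properties can be sketched as the ratio of the average evaluation value to the maximal one. Note that this ratio form is an assumption consistent with $0 < \xi(t) \le 1$ and with the role of $\xi$ in Corollary 12 below, not necessarily the exact expression of Definition 2.

```python
def progress_rate(evals):
    """Sketch of a progress rate xi(t): the ratio of the average evaluation
    value of the solution set to the maximal one. ASSUMPTION: this ratio
    form is inferred from the stated properties (0 < xi <= 1, xi -> 1 when
    most candidates are close to the best), not quoted from Definition 2."""
    return (sum(evals) / len(evals)) / max(evals)

print(progress_rate([10.0, 10.0, 10.0]))  # -> 1.0: every candidate equals the best
print(progress_rate([1.0, 1.0, 100.0]))   # -> 0.34: most candidates far worse than the best
```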

The Basic Concepts on the Convergence of the CIA Model
In this section, we first present two different notions of convergence of random variables in probability theory. Then, these concepts are introduced into the solution set updating sequence in the CIA model. Finally, the relationships among the different types of convergence are discussed.
There exist several different notions of convergence of random variables, and the convergence of sequences of random variables to some limit random variable is an important concept in probability theory. We assume that $\{X_n, n = 1, 2, \ldots\}$ is a sequence of random variables and $X$ is a random variable.
Definition 3 (convergence in probability [25]). A sequence $\{X_n, n = 1, 2, \ldots\}$ of random variables converges in probability towards $X$ if, for all $\varepsilon > 0$, $\lim_{n \to \infty} P\{|X_n - X| \ge \varepsilon\} = 0$.

Definition 4 (almost sure convergence [25]). A sequence $\{X_n, n = 1, 2, \ldots\}$ of random variables converges almost surely (or almost everywhere) towards $X$ if $P\{\lim_{n \to \infty} X_n = X\} = 1$. This means that the values of $X_n$ approach the value of $X$ in the sense that the events for which $X_n$ does not converge to $X$ have probability 0.
Almost sure convergence implies convergence in probability. Convergence in probability is the type of convergence established by the weak law of large numbers, while almost sure convergence is the type established by the strong law of large numbers.
The definition of convergence may be extended from random variables to more complex random elements in CIAs. Before giving the definitions of the convergence of CIAs, we explain two different concepts of the space of optimal solution sets.
Let the global optimal value of the evaluation function $f(x)$ be denoted by $f_{\max} = \max_{x \in S} f(x)$, and let the set of global optimal solutions of problem (1) be represented as $S^* = \{x \in S \mid f(x) \ge f(y) \text{ for all } y \in S\}$. Obviously, for all $x \in S^*$, we have $f(x) = f_{\max}$.
A solution set $X \in S^m$ satisfying $\hat f(X) = f_{\max}$, where $\hat f(X)$ denotes the maximal evaluation value over the candidate solutions in $X$, is called a weakly optimal solution set. All the weakly optimal solution sets produced by the CIA model in its search process constitute the space of weakly optimal solution sets, denoted by $\mathrm{WOS} = \{X \in S^m \mid \hat f(X) = f_{\max}\}$. A solution set $X \in S^m$ satisfying $\bar f(X) = f_{\max}$, where $\bar f(X)$ denotes the average evaluation value over the candidate solutions in $X$, is called a strongly optimal solution set. All the strongly optimal solution sets constitute the space of strongly optimal solution sets, denoted by $\mathrm{SOS} = \{X \in S^m \mid \bar f(X) = f_{\max}\}$.
From the aforementioned definitions, we know that for any $X \in \mathrm{WOS}$ there exists $x_i \in X$ such that $f(x_i) = f_{\max}$, while for $X \notin \mathrm{WOS}$, $f(x_i) < f_{\max}$ for all $x_i \in X$. Similarly, if $X \in \mathrm{SOS}$, then $f(x_i) = f_{\max}$ for all $x_i \in X$, and if $X \notin \mathrm{SOS}$, there exists $x_i \in X$ with $f(x_i) < f_{\max}$. It is thus easy to deduce the relationship among the space of strongly optimal solution sets, the space of weakly optimal solution sets, and the coded space of solution sets: $\mathrm{SOS} \subset \mathrm{WOS} \subset S^m$.
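The membership tests for WOS and SOS, and the inclusion $\mathrm{SOS} \subset \mathrm{WOS}$, can be checked with two small predicates (function names and the tolerance handling are illustrative):

```python
def is_weakly_optimal(evals, f_max, tol=1e-12):
    """X is in WOS iff its best candidate attains f_max."""
    return abs(max(evals) - f_max) <= tol

def is_strongly_optimal(evals, f_max, tol=1e-12):
    """X is in SOS iff every candidate attains f_max; hence SOS is a
    subset of WOS (strong optimality implies weak optimality)."""
    return all(abs(f - f_max) <= tol for f in evals)

evals = [5.0, 3.0, 5.0]  # best candidate is optimal, but not all are
print(is_weakly_optimal(evals, 5.0), is_strongly_optimal(evals, 5.0))  # True False
```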
In view of the aforementioned notions, there are four kinds of convergence related to a solution set updating sequence in the CIA model.
Based on these definitions, the almost surely weak convergence of the CIA model can also be expressed as $P\{\lim_{t \to \infty} \hat f(X(t)) = f_{\max}\} = 1$, and the almost surely strong convergence of the CIA model as $P\{\lim_{t \to \infty} \bar f(X(t)) = f_{\max}\} = 1$.
According to the above definitions, we can deduce the relationships among the four types of convergence (Theorem 5): (ii) almost surely strong convergence ⇒ strong convergence in probability ⇒ weak convergence in probability.
Proof. Based on the definition of the spaces of optimal solution sets, we have $\mathrm{SOS} \subset \mathrm{WOS}$, and hence $\{\bar f(X(t)) = f_{\max}\} \subset \{\hat f(X(t)) = f_{\max}\}$ for every $t$. By the monotonicity of probability, we obtain $P\{\lim_{t \to \infty} \bar f(X(t)) = f_{\max}\} \le P\{\lim_{t \to \infty} \hat f(X(t)) = f_{\max}\}$ and, likewise, $P\{\bar f(X(t)) = f_{\max}\} \le P\{\hat f(X(t)) = f_{\max}\}$ for every $t$. Since almost sure convergence implies convergence in probability, the stated chains of implications follow. Based on Theorem 5, we know that almost surely strong convergence implies almost surely weak convergence and strong convergence in probability, and hence implies weak convergence in probability. However, almost surely weak convergence and strong convergence in probability are not interdeducible.

Convergence Analysis of the CIA Model
In this section, the conditions under which the CIA model converges almost surely to the space of weakly optimal solution sets are studied according to the Borel-Cantelli lemma. Then the conditions under which the CIA model converges almost surely to the space of strongly optimal solution sets are deduced based on the submartingale convergence theorem.
Lemma 6 (Borel-Cantelli lemma [26]). Let $\{A_n, n = 1, 2, \ldots\}$ be a sequence of events in a probability space, and let $\limsup_{n \to \infty} A_n = \bigcap_{n=1}^{\infty} \bigcup_{k \ge n} A_k$ denote the set of outcomes that occur infinitely many times within the infinite sequence of events $(A_n)$. The Borel-Cantelli lemma states the following facts: (i) if the sum of the probabilities of the events $A_n$, $n = 1, 2, \ldots$, is finite, then the probability that infinitely many of the events occur is zero; namely, if $\sum_{n=1}^{\infty} P\{A_n\} < \infty$, then $P\{\limsup_{n \to \infty} A_n\} = 0$; (ii) if the events $A_n$, $n = 1, 2, \ldots$, are independent and the sum of their probabilities diverges to infinity, then the probability that infinitely many of the events occur is one; namely, if $\sum_{n=1}^{\infty} P\{A_n\} = \infty$ and the events $A_n$ are independent, then $P\{\limsup_{n \to \infty} A_n\} = 1$.
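The two parts of the lemma can be illustrated numerically over a finite horizon. This is an empirical sketch only, since the lemma itself is an asymptotic statement; the probability sequences $2^{-n}$ (summable) and $1/n$ (divergent) are assumed examples of the two cases.

```python
import random

def count_occurrences(probs, rng):
    """Sample independent events A_n with the given probabilities and
    count how many of them occur."""
    return sum(rng.random() < p for p in probs)

rng = random.Random(42)
n = 10_000
# Part (i): sum of 2^-n is finite, so only finitely many events occur
# almost surely; within the horizon we typically see very few.
few = count_occurrences([2.0 ** -k for k in range(1, n + 1)], rng)
# Part (ii): sum of 1/k diverges and the events are independent, so
# infinitely many occur almost surely; the count keeps growing with the
# horizon (roughly like log n in expectation).
many = count_occurrences([1.0 / k for k in range(1, n + 1)], rng)
print(few, many)  # typically: few is tiny, many grows with the horizon
```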
Proof. Consider the case of $X(0) \in \mathrm{WOS}$. By the definitions in Section 3, $\hat f(X(0)) = f_{\max}$. Since the CIA model is assumed to adopt an elite reserve strategy, we get $\hat f(X(t+1)) \ge \hat f(X(t))$ for all $t \ge 0$. Combining these two relations yields $\hat f(X(t)) = f_{\max}$, that is, $X(t) \in \mathrm{WOS}$, for all $t \ge 0$. Thus, the theorem is valid for $X(0) \in \mathrm{WOS}$.
Consider the case of $X(0) \notin \mathrm{WOS}$. Let $A_t = \{X(t) \notin \mathrm{WOS}\}$. According to Lemma 7, and considering the fact that the relevant constant lies strictly between 0 and 1, the sum $\sum_{t=1}^{\infty} P\{A_t\}$ is finite, so the first part of Lemma 6 applies and $P\{\limsup_{t \to \infty} A_t\} = 0$; that is, $X(t)$ almost surely leaves $\mathrm{WOS}$'s complement permanently after finitely many steps.
The sequence of the maximal evaluation values and the sequence of the average evaluation values of the solution sets generated by the CIA model will be used to describe the evolution process of the solution sets in the further discussion. Martingale theory is introduced into the Markov chain analysis of the solution set updating sequence in place of the traditional ergodic theory. The conditions under which the CIA model converges almost surely to the space of strongly optimal solution sets are deduced based on the submartingale convergence theorem and the related conclusions in Section 2.
Based on the concept of submartingale and Doob's upcrossing inequality, the following submartingale convergence theorem was deduced for random sequences.
The search process of the CIA model can be regarded as the progression of the maximal evaluation values of the solution sets generated in the process, expressed by the sequence $\{\hat f(X(t)), t \ge 0\}$, or as the progression of the average evaluation values of the solution sets, expressed by the sequence $\{\bar f(X(t)), t \ge 0\}$. Both sequences are Markov chains. Thus, the convergence analysis of the CIA model can first be transferred to the submartingale analysis of the maximal evaluation value sequence $\{\hat f(X(t)), t \ge 0\}$ and then further transferred to the submartingale analysis of the average evaluation value sequence $\{\bar f(X(t)), t \ge 0\}$.

Lemma 10. WOS is a closed set.
Proof. Let $\{X(t), t \ge 0\}$ be a solution set updating sequence generated by the CIA model. Since the elite reserve strategy is adopted in the CIA model, we have $\hat f(X(t+1)) \ge \hat f(X(t))$. On the other hand, according to the definitions in Section 3, for all $Y \in \mathrm{WOS}$ and all $Z \notin \mathrm{WOS}$, we have $\hat f(Y) > \hat f(Z)$. Thus $X(t+1) \ne Z$ if $X(t) = Y$, which results in the conclusion that $P(X(t+1) = Z \mid X(t) = Y) = 0$. Then, for all $Y \in \mathrm{WOS}$, we have $P(X(t+1) \in \mathrm{WOS} \mid X(t) = Y) = 1$, or equivalently $P(X(t+1) \notin \mathrm{WOS} \mid X(t) = Y) = 0$. Therefore, WOS is a closed set.
According to Lemma 10, we can prove that the maximal evaluation value sequence of solution set is a submartingale with respect to the corresponding solution set updating sequence.
Lemma 11. The maximal evaluation value sequence $\{\hat f(X(t)), t \ge 0\}$ generated by the CIA model is a submartingale with respect to the solution set updating sequence $\{X(t), t \ge 0\}$.
Proof. According to the non-after-effect (Markov) property of the chain, $E[\hat f(X(t+1)) \mid X(t), \ldots, X(0)] = E[\hat f(X(t+1)) \mid X(t)]$, based on which we only need to prove that for all $Y \in S^m$ the following inequality is true: $E[\hat f(X(t+1)) \mid X(t) = Y] \ge \hat f(Y)$. According to the theory of random processes [26], we have $E[\hat f(X(t+1)) \mid X(t) = Y] = \sum_{Z \in S^m} \hat f(Z)\, P(X(t+1) = Z \mid X(t) = Y)$. For the maximal evaluation value sequence $\{\hat f(X(t)), t \ge 0\}$, the elite reserve strategy gives $\hat f(Z) \ge \hat f(Y)$ for every $Z$ reachable from $Y$, and by Lemma 10 the transitions out of WOS have probability zero. Combining these relations yields $E[\hat f(X(t+1)) \mid X(t) = Y] \ge \hat f(Y) \sum_{Z \in S^m} P(X(t+1) = Z \mid X(t) = Y) = \hat f(Y)$, which completes the proof. Based on Lemma 11, we can associate the average evaluation value sequence of the solution set with the progress rate of the solution set defined in Section 2 to obtain the following corollary.
Corollary 12. The average evaluation value sequence $\{\bar f(X(t)), t \ge 0\}$ is a submartingale with respect to the solution set updating sequence $\{X(t), t \ge 0\}$ if the corresponding progress rate of the solution set satisfies $\xi(t+1) \ge \xi(t)$ for all $t \ge 0$.
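The deterministic core behind Lemma 11 is that the elite reserve strategy makes the maximal evaluation value sequence nondecreasing along every sample path. A quick empirical check under an assumed random bit-flip update (all names and parameter values are illustrative, not from the paper):

```python
import random

def max_values_with_elitism(evaluate, dim=12, pop_size=10, steps=50, seed=1):
    """Track fhat(X(t)) under a random bit-flip update with an elite reserve.

    Elitism guarantees the recorded sequence is nondecreasing on every
    sample path -- the deterministic core of the submartingale property
    in Lemma 11. The bit-flip operator here is an assumed stand-in for
    the CIA model's updating operator."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    best = max(pop, key=evaluate)
    history = [evaluate(best)]
    for _ in range(steps):
        pop = [[b ^ (rng.random() < 0.1) for b in x] for x in pop]
        pop[0] = list(best)                      # elite reserve
        best = max(pop + [best], key=evaluate)   # never lose the best so far
        history.append(evaluate(best))
    return history

hist = max_values_with_elitism(lambda x: sum(x) + 1)
assert all(a <= b for a, b in zip(hist, hist[1:]))  # fhat(X(t)) is nondecreasing
```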
Based on Lemma 11 and its corollary, sufficient conditions can be obtained for the strong convergence of the solution set updating sequence generated by the CIA model.

Proof. Let the complement of the set of globally optimal solutions be denoted by $\bar S^*$, and let $\{A_t, t \ge 0\}$ be the sequence of random events in which at least some of the candidate solutions in solution set $X(t)$ belong to $\bar S^*$; that is, $A_t = \{X(t) \cap \bar S^* \ne \emptyset\} = \{\bar f(X(t)) \ne f_{\max}\}$. Similarly, let $\bar A_t$ be the complementary event sequence of $A_t$, in which no candidate solution in $X(t)$ belongs to $\bar S^*$, expressed by $\bar A_t = \{X(t) \cap \bar S^* = \emptyset\} = \{\bar f(X(t)) = f_{\max}\}$. Let $P\{A_t\}$ and $P\{\bar A_t\}$ be the probability measures of $A_t$ and $\bar A_t$, respectively. From Corollary 12, we know that $\{\bar f(X(t)), t \ge 0\}$ is a submartingale with respect to $\{X(t), t \ge 0\}$. Then, according to the properties of submartingales, $f_{\max} \ge E(\bar f(X(t+1))) \ge E(\bar f(X(t)))$. In addition, in view of the fact that $0 \le \bar f(X(t)) \le f_{\max}$ and according to the properties of expectation, the sequence is bounded. Then, according to Lemma 9, there exists $X(\infty) = \lim_t X(t)$ with $P\{\lim_{t \to \infty} X(t) = X(\infty)\} = 1$; in other words, $\{X(t), t \ge 0\}$ is almost surely convergent.
In the following, we further prove that the average evaluation value sequence of $\{X(t), t \ge 0\}$ converges to $f_{\max}$ almost surely. It can be seen that $\bar f(X(t+1)) = f_{\max}$ if the event $\bar A_t$ occurs; namely, $\bar A_t$ surely triggers the event $\bar A_{t+1}$, so $P\{\bar A_{t+1} \mid \bar A_t\} = 1$ holds for the event sequence $\{\bar A_t, t \ge 0\}$. On the other hand, for any solution sets $Y$ and $Z$, if $Y \cap \bar S^* \ne \emptyset$, then there exists $x_i \in Y$ with $f(x_i) < f_{\max}$; and if $Z \cap \bar S^* = \emptyset$, then $f(x_i) = f_{\max}$ for all $x_i \in Z$. Hence the inequality $\bar f(Z) - \bar f(Y) \ge \delta/m$ holds, where $\delta = \min(f_{\max} - f(x_i))$ and $m$ is the size of the solution set.
In addition, by the properties of conditional expectation, the definition of $A_t$, and the definitions in Section 3, a recursive inequality can be derived for $P\{A_t\}$. Summing both sides of this inequality from $t = 1$ to $T$ and letting $T \to \infty$, it follows that $P\{A_t\} \to 0$ and $P\{\bar A_t\} \to 1$ if $\sum_{t=1}^{\infty} \prod_{j=1}^{n} \rho_j(t) = \infty$, which further leads to $P\{\bar A_{t+1}\} \to 1$. Thus the almost surely strong convergence follows.

Remark 14. The method shown in this paper has a certain similarity to the methods reported in [11] for GA and in [17] for ACO. More specifically, the elitist strategy has the same benefits as the strategy of the parent population competing with its offspring, and the condition on the variation rates of the set components in Theorem 13, $\sum_{t=1}^{\infty} \prod_{j=1}^{n} \rho_j(t) = \infty$, is expected to act in a similar manner to the condition in [11] that the minimum mutation probability of the population space has a divergent sum. However, the method reported in [11] applies only to GA rather than to a class of CIAs. Furthermore, the variation rates of the set components are measurable, while the minimum mutation probability of the population space is not. Duan's convergence proof of the ant colony algorithm is also based on the Markov chain and martingale theory, but it ignores important information [17]: namely, the vector of pheromone values $\{\tau(t), t \ge 1\}$, without the best-found path $\omega(t-1)$, is not a Markov process and cannot be defined in a self-contained way. However, the stochastic process with states $\{(\tau(t), \omega(t-1)), t \ge 1\}$ is an inhomogeneous Markov process in discrete time [14].

Conclusions
CIAs constitute a relatively new interdisciplinary field of research that has gained huge popularity in recent years. One of the most important problems besetting researchers in the area of CIAs is the premature convergence or stagnation of an algorithm. The solution to this problem is not only helpful for setting up the parameters of an existing algorithm for fast convergence to global optima, but also relevant to constructing more effective and efficient algorithms for complex optimization problems. In this paper, we first analyzed the common characteristics shared by GA, ACO, and PSO, and based on them, a computational model was constructed for a class of CIAs. Two quantification indices, the variation rate and the progress rate, were subsequently defined to estimate the evolution trend of the solution sets generated by CIAs. Secondly, we defined four different notions of convergence of a solution set updating sequence generated by the CIA model and discussed their relations. Thirdly, a weak convergence criterion was presented by associating the almost surely weak convergence with the positive definiteness of the variation rates of the set components. Finally, the conditions for strong convergence of CIAs were derived based on the submartingale convergence theorem. Two sufficient conditions were obtained for the almost surely strong convergence: (i) the progress rate of the solution set does not diminish in the search process; and (ii) the sum over generations of the products of the variation rates of the set components is infinite.
It is obvious that the conditions described in Theorems 8 and 13 are sufficient but not necessary; thus, there is a possibility that these theorems are too conservative to be used in practice. Moreover, a fast convergence rate is absolutely critical for a wide range of applications of CIAs, but the convergence rate cannot be estimated using the results presented in this paper. Further research will be conducted by the authors in an effort to solve these interesting and important problems.

Theorem 5. There are the following relations among the four types of convergence of the CIA model:

(i) almost surely strong convergence ⇒ almost surely weak convergence ⇒ weak convergence in probability.

Table 1: The expressions of solution candidate and solution component.