On the Convergence of Biogeography-Based Optimization for Binary Problems

Biogeography-based optimization (BBO) is an evolutionary algorithm inspired by biogeography, which is the study of the migration of species between habitats. A finite Markov chain model of BBO for binary problems was derived in earlier work, and some significant theoretical results were obtained. This paper analyzes the convergence properties of BBO on binary problems based on the previously derived BBO Markov chain model. Analysis reveals that BBO with only migration and mutation never converges to the global optimum. However, BBO with elitism, which maintains the best candidate in the population from one generation to the next, converges to the global optimum. In spite of previously published differences between genetic algorithms (GAs) and BBO, this paper shows that the convergence properties of BBO are similar to those of the canonical GA. In addition, the convergence rate estimate of BBO with elitism is obtained in this paper and is confirmed by simulations for some simple representative problems.


Introduction
Mathematical models of biogeography describe the immigration and emigration of species between habitats. Biogeography-based optimization (BBO) was first presented in 2008 [1] and is an extension of biogeography theory to evolutionary algorithms (EAs). BBO is modeled after the immigration and emigration of species between habitats. One distinctive feature of BBO is that it uses the fitness of each candidate solution to determine its immigration and emigration rates. The immigration rate determines how likely a candidate solution is to change its decision variables, and the emigration rate determines how likely a candidate solution is to share its decision variables with other candidate solutions. Specifically, a candidate solution's emigration rate increases with fitness, and its immigration rate decreases with fitness.
Although BBO is a relatively new EA, it has demonstrated good performance on various unconstrained and constrained benchmark functions [2][3][4][5] and on real-world optimization problems such as sensor selection [1], economic load dispatch [6], robot controller tuning [7], satellite image classification [8], and power system optimization [9]. In addition, Markov models have been derived for BBO on binary problems [10, 11]. Reference [12] discusses the conceptual, algorithmic, and performance differences between BBO and GAs using both Markov model comparisons and benchmark simulation results. These simulation and theoretical results confirm that BBO is a competitive evolutionary algorithm. But until now there have not been any theoretical results concerning its convergence properties.
We say that an optimization algorithm converges to the global optimum if the value of at least one of its candidate solutions, in the limit as the generation count approaches infinity, is equal to the global optimum of the optimization problem. Several mathematical tools have been used to analyze EA convergence [13][14][15][16][17]. Recent work includes the analysis of EA convergence using Markov's inequality, Chebyshev bounds, Chernoff bounds, and martingales for minimum spanning tree problems, maximum matching problems, scheduling problems, shortest path problems, Eulerian cycle problems, multiobjective problems, and others [18][19][20].
Markov chain models are still some of the most frequently used methods for the analysis of EA convergence. They have been widely used in a variety of EAs, including genetic algorithms (GAs) [21][22][23][24][25] and simulated annealing [4, 26], to prove probabilistic convergence to the global optimum. A Markov chain is a random process which has a discrete set of possible states S_i (i = 1, 2, ..., T). The probability that the system transitions from state S_i to state S_j is given by p_ij, which is called a transition probability. The T × T matrix P = [p_ij] is called the transition matrix, where T is the total number of possible population distributions. A population distribution is a specific multiset of individuals with a cardinality that is equal to the population size. A Markov state in [11] represents a BBO population distribution. Each state represents a particular population distribution, that is, how many individuals there are at each point of the search space. Probability p_ij is the probability that the population transitions from the ith population distribution to the jth population distribution in one generation.
This paper analyzes the global convergence properties of BBO as applied to optimization problems with binary search spaces, based on a previously derived BBO Markov model, and obtains the convergence rate estimate using homogeneous finite Markov chain properties. Section 2 gives a brief review of BBO and its Markov transition probabilities. Section 3 gives some basic definitions, obtains BBO's convergence properties, and obtains the convergence rate estimate. Section 4 confirms the theory using simple numerical simulation. The convergence properties and convergence rates derived here are not surprising in view of previous EA convergence results, but this paper represents the first time that such results have been formalized for BBO. Finally, Section 5 presents some concluding remarks and directions for future work.

Biogeography-Based Optimization (BBO)
This section presents a review of the biogeography-based optimization (BBO) algorithm with migration and mutation (Section 2.1) and then provides a review of the Markov transition probability of BBO populations (Section 2.2).

Overview of Biogeography-Based Optimization

This section provides an overview of BBO. The review in this section is very general because it applies to optimization problems with real domains, integer domains, binary domains, or combinations thereof.
BBO is a new optimization approach, inspired by biogeography theory, to solve general optimization problems. A biogeography habitat corresponds to a candidate solution to the optimization problem. A multiset of biogeography habitats corresponds to a population of candidate solutions. Habitat suitability index (HSI) in biogeography corresponds to the goodness of a candidate solution, which is also called fitness in standard EA notation. Like other EAs [27], BBO probabilistically shares information between candidate solutions to improve candidate solution fitness. In BBO, each candidate solution comprises a set of features, which are also called independent variables or decision variables in the optimization literature. Note that the decision variables can be taken from sets of real numbers, integers, binary numbers, or combinations thereof, depending on the problem. Each candidate solution immigrates features from other candidate solutions based on its immigration rate and emigrates features to other candidate solutions based on its emigration rate. BBO consists of two main steps: migration and mutation.
Migration. Migration is a probabilistic operator that is intended to improve a candidate solution x_k. For each feature of a given candidate solution x_k, the candidate solution's immigration rate λ_k is used to probabilistically decide whether or not to immigrate. If immigration occurs, then the emigrating candidate solution x_j is probabilistically chosen based on the emigration rate μ_j. Migration is written as

x_k(s) ← x_j(s), (1)

where s is a candidate solution feature. In BBO, each candidate solution x_k has its own immigration rate λ_k and emigration rate μ_k. A good candidate solution has a relatively high μ and low λ, while the converse is true for a poor candidate solution. According to [1], the immigration rate λ_k and emigration rate μ_k of the candidate solution x_k are based on linear functions and are calculated as

λ_k = 1 − fitness(x_k), μ_k = fitness(x_k), (2)

where fitness(x_k) denotes the candidate solution fitness value, which is normalized to the range [0, 1]. The probabilities of immigrating to x_k and of emigrating from x_k are calculated as

Pr(immigrate to x_k) = λ_k / Σ_{j=1}^{N} λ_j, Pr(emigrate from x_k) = μ_k / Σ_{j=1}^{N} μ_j, (3)

where N is the population size.
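As a concrete illustration, the linear rates above (λ decreasing and μ increasing with normalized fitness) can be sketched in a few lines of Python. The helper name and the min-max normalization of raw fitness to [0, 1] are our assumptions, not part of the original algorithm specification.

```python
def migration_rates(fitnesses):
    """Compute linear immigration/emigration rates from raw fitness values.
    Raw fitness is first min-max normalized to [0, 1]; lambda falls and
    mu rises with fitness.  (Helper name and normalization are assumptions.)"""
    lo, hi = min(fitnesses), max(fitnesses)
    span = (hi - lo) or 1.0   # avoid division by zero when all fitnesses are equal
    lam, mu = [], []
    for f in fitnesses:
        norm = (f - lo) / span    # normalized fitness in [0, 1]
        lam.append(1.0 - norm)    # immigration rate decreases with fitness
        mu.append(norm)           # emigration rate increases with fitness
    return lam, mu

lam, mu = migration_rates([2.0, 5.0, 8.0])
print(lam)  # best individual has the lowest immigration rate
print(mu)   # best individual has the highest emigration rate
```

Note that the best individual gets λ = 0 and μ = 1, and the worst gets λ = 1 and μ = 0, matching the linear curves with maximum rates of 1.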
Mutation. Mutation is a probabilistic operator that randomly modifies a candidate solution feature. The purpose of mutation is to increase diversity among the population, just as in other EAs. Mutation of the kth candidate solution is implemented as shown in Algorithm 1.
In the mutation logic (Algorithm 1), rand(a, b) is a uniformly distributed random number between a and b, p_m is the mutation rate, and L_j and U_j are the lower and upper search bounds of the jth independent variable. The above logic mutates each independent variable with a probability of p_m. If mutation occurs for a given independent variable, then that independent variable is replaced with a random number within its search domain.
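A minimal sketch of this mutation logic for real-valued variables follows; the function and argument names are ours, not taken from Algorithm 1.

```python
import random

def mutate(candidate, p_m, lower, upper):
    """Mutate each independent variable with probability p_m, replacing it
    with a uniform random value within its search bounds (a sketch of the
    mutation logic; names are assumptions)."""
    mutated = list(candidate)
    for j in range(len(mutated)):
        if random.random() < p_m:
            # rand(L_j, U_j): uniform sample within the variable's bounds
            mutated[j] = random.uniform(lower[j], upper[j])
    return mutated

random.seed(1)
x = mutate([0.5, 0.5, 0.5], 1.0, [0, 0, 0], [1, 1, 1])
print(x)  # with p_m = 1 every variable is resampled within [0, 1]
```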
A description of one generation of BBO is given in Algorithm 2. The search space is the set of all bit strings y_i consisting of q bits each. Therefore, the cardinality of the search space is n = 2^q. Suppose that BBO is currently in the tth generation. Based on the previously derived Markov chain model for BBO [11], the probability that migration results in the kth candidate solution x_k being equal to y_i at generation t + 1 is given by

Pr(x_{k,t+1} = y_i) = Π_{s=1}^{q} [ (1 − λ_k) 1_0(x_{k,t}(s) − y_i(s)) + λ_k (Σ_{j∈ζ_i(s)} v_j μ_j) / (Σ_{j=1}^{n} v_j μ_j) ], (4)

where 1_0 is the indicator function on the set {0} (i.e., 1_0(a) = 1 if a = 0, and 1_0(a) = 0 if a ≠ 0), s denotes the index of the candidate solution feature (i.e., the bit number), λ_k denotes the immigration rate of candidate solution x_k, μ_j denotes the emigration rate of candidate solution y_j, and v_j denotes the number of y_j individuals in the population. The notation ζ_i(s) in (4) denotes the set of search space indices j such that the sth bit of y_j is equal to the sth bit of y_i; that is, ζ_i(s) = {j : y_j(s) = y_i(s)}. Note that the first term in the product on the right side of (4) denotes the probability that x_{k,t+1}(s) = y_i(s) if immigration of the sth candidate solution feature does not occur, and the second term denotes the probability if immigration of the sth candidate solution feature does occur.
Example 1. To clarify the notation in (4), an example is presented. We use the notation v_i to denote the number of y_i individuals in the population. Suppose we have a two-bit problem (q = 2, n = 4) with a population size N = 3. The search space consists of the bit strings Y = {y_1, y_2, y_3, y_4} = {00, 01, 10, 11}. Suppose that the individuals in the current population are x = {y_2, y_4, y_4} = {01, 11, 11}. Then we have v_1 = 0, v_2 = 1, v_3 = 0, and v_4 = 2. To clarify the notation ζ_i(s), we now explain how to calculate ζ_1(1). We arbitrarily number bits from left to right; that is, in any given bit string, bit 1 is the leftmost bit and bit 2 is the rightmost bit. From the definition of ζ_i(s) we see that

ζ_1(1) = {j : y_j(1) = y_1(1)}. (5)

Since y_1 = 00, we see that y_1(1) = 0 (i.e., the leftmost bit is 0). Then (5) can be written as

ζ_1(1) = {j : y_j(1) = 0}. (6)

But y_j(1) = 0 for y_j ∈ {00, 01}, which in turn indicates that y_j(1) = y_1(1) for j ∈ [1, 2]; therefore, ζ_1(1) = {1, 2}. Continuing this process, we see that

ζ_2(1) = {1, 2}, ζ_3(1) = ζ_4(1) = {3, 4}, ζ_1(2) = ζ_3(2) = {1, 3}, ζ_2(2) = ζ_4(2) = {2, 4}. (7)

Mutation. Mutation operates independently on each candidate solution by probabilistically reversing each bit in each candidate solution. Suppose that the event that each bit of a candidate solution is flipped is stochastically independent and occurs with probability p_m ∈ (0, 1). Then the probability that candidate solution y_i mutates to become y_j can be written as

U_ij = Pr(y_i → y_j) = p_m^{H_ij} (1 − p_m)^{q − H_ij}, (8)

where q is the number of bits in each candidate solution and H_ij represents the Hamming distance between bit strings y_i and y_j.
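The mutation probability formula above (U_ij = p_m^H_ij (1 − p_m)^(q − H_ij)) can be checked numerically by building the full mutation matrix U for a small q; this sketch assumes that bit strings are indexed in binary order.

```python
def mutation_matrix(q, p_m):
    """Build the n x n mutation matrix U for q-bit strings:
    U[i][j] = p_m**H_ij * (1 - p_m)**(q - H_ij), where H_ij is the Hamming
    distance between the i-th and j-th bit strings in binary order."""
    n = 2 ** q
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            h = bin(i ^ j).count("1")   # Hamming distance H_ij
            U[i][j] = p_m ** h * (1 - p_m) ** (q - h)
    return U

U = mutation_matrix(q=2, p_m=0.1)
print(U[0])       # [0.81, 0.09, 0.09, 0.01] up to rounding
print(sum(U[0]))  # each row sums to 1
```

Every entry of U is positive for p_m ∈ (0, 1), which is exactly the property used later to establish primitivity of the BBO transition matrix.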

Convergence of Biogeography-Based Optimization
The previous section reviewed the BBO algorithm and its Markov model. In this section, which comprises the main contribution of this paper, we use the results of Section 2 to analyze the convergence behavior of BBO. Section 3.1 gives some basic foundations of Markov transition matrices, including notation and basic theorems that we will need later. Section 3.2 reviews previously published Markov theory as it relates to BBO. This leads to Section 3.3, which obtains some important properties of the BBO transition matrix. Section 3.4 uses transition matrix properties to analyze BBO convergence to the solution of a global optimization problem. This leads to Section 3.5, which uses the BBO convergence analysis to obtain an estimate of the convergence rate.

A Markov chain whose transition matrix P does not change from one generation to the next is said to be homogeneous. Given an initial probability distribution of states π(0) as a row vector, the probability distribution of the Markov chain after t steps is given by π(t) = π(0)P^t. Therefore, a homogeneous finite Markov chain is completely specified by π(0) and P, and the limiting distribution as t → ∞ depends on the structure of P. For homogeneous finite Markov chains, we have the following two theorems [28, 29].

Preliminary Foundations of Markov Chains

Theorem 2 (see [28, page 123]). Let P be a primitive stochastic matrix of order m; that is, all of the elements of P^k are positive for some integer k. Then P^t converges as t → ∞ to a stochastic matrix which has all nonzero entries and identical rows. That is,

lim_{t→∞} p_ij^(t) = π_j^∞ for all i, j ∈ [1, m], (9)

where π^∞ = (π_1^∞, ..., π_m^∞) and π_j^∞ ≠ 0 for 1 ≤ j ≤ m.

Theorem 3 (see [29]). Let P be a stochastic matrix of the form P = [C 0; R Q], where C is a primitive stochastic matrix of order m and R, Q ≠ 0. Then P^t converges as t → ∞ to a stochastic matrix with identical rows. That is,

lim_{t→∞} p_ij^(t) = π_j^∞ for all i, j, (10)

where π^∞ = (π_1^∞, ..., π_m^∞, 0, ..., 0), and π_j^∞ ≠ 0 for 1 ≤ j ≤ m and π_j^∞ = 0 for j > m; the limiting distribution is unique regardless of the initial distribution.
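Theorem 2 is easy to illustrate numerically: for a primitive stochastic matrix, every row of P^t approaches the same positive vector. The matrix below is an arbitrary small example, not a BBO transition matrix.

```python
import numpy as np

# A primitive stochastic matrix: some power of P is entrywise positive
# even though P itself contains zeros.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.2, 0.3, 0.5]])
print((np.linalg.matrix_power(P, 3) > 0).all())  # primitivity check: True

# For large t, every row of P^t converges to the same positive vector.
Pinf = np.linalg.matrix_power(P, 200)
print(Pinf[0])  # the (unique) limiting distribution, all entries nonzero
```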
We will use these theorems in Section 3.3 to derive important properties of the BBO transition matrix and in Section 3.4 to derive BBO convergence properties.

BBO Markov Theory.
In previous work [11], the transition probability of BBO with migration and mutation was obtained. This provides us with the probability Pr(u | v) of transitioning in one generation from population vector v = [v_1, v_2, ..., v_n], where v_i is the number of candidate solutions y_i in the population and n is the size of the search space, to population vector u = [u_1, u_2, ..., u_n]. BBO can be described as a homogeneous finite Markov chain: the state of BBO is defined as the population vector, so the element P_ij of the state transition matrix P is obtained by computing Pr(u | v) for each possible v and each possible u. Namely, P_ij = Pr(u | v) denotes the probability that the ith population vector, denoted as v, transitions to the jth population vector, denoted as u, where i, j ∈ [1, T]. Note that the cardinality of the state space is |S| = T, where T is the total number of possible populations; that is, T is the number of possible vectors v and the number of possible vectors u. The number T can be calculated in several different ways, as discussed in [11].
Let the N × n matrix M = (M_ki) and the n × n matrix U = (U_ij) be intermediate transition matrices corresponding to only migration and only mutation, respectively, where N is the population size and n is the cardinality of the search space. Note that M_ki = Pr(x_{k,t+1} = y_i) ≥ 0 and U_ij = Pr(y_i → y_j) > 0. That is, M_ki is the probability that the kth individual in the population transitions to the ith individual in the search space when only migration is considered, and U_ij is the probability that the ith individual in the search space transitions to the jth individual in the search space when only mutation is considered. We can use [11] to obtain the transition probability from the ith population state vector v to the jth population state vector u as

Pr(u | v) = Σ_{J∈Y} Π_{k=1}^{N} Π_{i=1}^{n} [Q_ki(v)]^{J_ki}, (11)

where Y ≡ {J ∈ ℝ^{N×n} : J_ki ∈ {0, 1}, Σ_{i=1}^{n} J_ki = 1 for all k, Σ_{k=1}^{N} J_ki = u_i for all i}, and where Q_ki(v) is a single element of the product of M and U. The matrix composed of the Q_ki(v) elements can be represented as [Q_ki(v)] = MU, where k ∈ [1, N] and i ∈ [1, n]. Q_ki(v) denotes the probability that the kth migration trial followed by mutation results in candidate solution y_i.
Note that Q_ki(v) is a scalar, and the transition matrix P = (P_ij) is a T × T matrix, each element of which can be obtained by (11).
Example 4. Here we use a simple example based on [11] to clarify (11). Consider a simple BBO experiment in which a trial of migration and mutation can result in one of four possible outcomes y_1, y_2, y_3, and y_4 with probabilities Q_k1, Q_k2, Q_k3, and Q_k4, respectively. Index k refers to the migration trial number (i.e., the "For each x_k" loop in Algorithm 2). Assume that the total number of trials (i.e., the population size N) is equal to 2, and suppose that the probabilities Q_ki are given. In this example, we calculate the probability that y_1 and y_4 occur after two migration trials. In order to calculate this probability, let c = [c_1, c_2, c_3, c_4] denote a vector of random variables, where c_i is the total number of times that y_i occurs after two migration trials. Based on (11), we use u_1 = 1, u_2 = 0, u_3 = 0, and u_4 = 1 to obtain

Pr(c = [1, 0, 0, 1]) = Σ_{J∈Y} Π_{k=1}^{2} Π_{i=1}^{4} Q_ki^{J_ki}, (13)

where

Y = {J ∈ ℝ^{2×4} : J_ki ∈ {0, 1}, Σ_{i=1}^{4} J_ki = 1 for all k, Σ_{k=1}^{2} J_ki = u_i for all i}. (14)

According to (13), J belongs to Y if it satisfies the following conditions: (1) J is a 2 × 4 matrix; (2) each element of J is either 0 or 1; (3) the elements in each row of J add up to 1; and (4) the elements in the ith column of J add up to u_i.
Note from [11] that the cardinality of Y is

|Y| = N! / (u_1! u_2! ⋯ u_n!). (15)

The number of matrices J that satisfy these conditions is therefore 2!/(1! 0! 0! 1!) = 2, and the J matrices are found as follows:

J(1) = [1 0 0 0; 0 0 0 1], J(2) = [0 0 0 1; 1 0 0 0].

Substituting these matrices into (13) gives

Pr(c = [1, 0, 0, 1]) = Q_11 Q_24 + Q_14 Q_21 = 0.08. (16)
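The sum over J matrices in (11) can be verified by brute-force enumeration for small N and n. The trial probabilities Q below are hypothetical placeholders (the numerical values used in the example above are not reproduced here), so the resulting probability differs from the 0.08 above.

```python
from itertools import product

def transition_prob(Q, u):
    """Evaluate (11) by enumeration: sum over all 0/1 matrices J whose rows
    sum to 1 and whose i-th column sums to u[i], of prod_k prod_i Q[k][i]**J[k][i].
    Since each row of J selects exactly one column, we enumerate column choices."""
    N, n = len(Q), len(Q[0])
    total, count = 0.0, 0
    for cols in product(range(n), repeat=N):      # cols[k] = outcome of trial k
        if all(cols.count(i) == u[i] for i in range(n)):
            count += 1                            # one valid J matrix
            p = 1.0
            for k, i in enumerate(cols):
                p *= Q[k][i]
            total += p
    return total, count

# Hypothetical trial probabilities: N = 2 trials, n = 4 outcomes.
Q = [[0.1, 0.2, 0.3, 0.4],
     [0.4, 0.3, 0.2, 0.1]]
u = [1, 0, 0, 1]   # one occurrence of y1 and one of y4

prob, num_J = transition_prob(Q, u)
print(num_J)  # |Y| = 2!/(1! 0! 0! 1!) = 2, matching (15)
print(prob)   # Q11*Q24 + Q14*Q21 = 0.1*0.1 + 0.4*0.4 = 0.17
```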

BBO Transition Matrix Properties.
Recall that the migration probability and mutation probability can be calculated by (4) and (8), respectively: M_ki = Pr(x_{k,t+1} = y_i) ≥ 0 and U_ij = Pr(y_i → y_j) > 0. Therefore, M is a nonnegative matrix; although it is not a transition matrix, since it is not square, each of its rows sums to 1. We also see that U is a positive left stochastic matrix; that is, each of its columns sums to 1. We now present two theorems which show that there is a nonzero probability of obtaining any individual in the search space from any individual in a BBO population after migration and mutation. This means that there is a nonzero probability of transitioning from any population vector v to any other population vector u in one generation, which means that the BBO transition matrix is primitive.
Theorem 6. The transition matrix P = (P_ij) of BBO with migration and mutation is positive.

Proof. Since M is nonnegative with rows that sum to 1, and U is positive, every element of the product MU is positive; that is, Q_ki(v) > 0 for all k ∈ [1, N] and i ∈ [1, n], where Q_ki(v) is given in (11). Hence every term of the sum in (11) is positive, so the transition matrix P = (P_ij) of BBO is positive. Therefore, P is primitive since every positive transition matrix is primitive.

Corollary 7.
There exists a unique limiting distribution for the states of the BBO Markov chain. Also, the probability that the Markov chain is in the ith state at any time is nonzero for all i ∈ [1, T].
Proof. Corollary 7 is an immediate consequence of Theorems 2 and 6.

Convergence Properties of BBO.
Before we obtain the convergence properties of BBO, some precise definitions of the term convergence are required [15]. Assume that the search space of a global optimization problem is Y with cardinality |Y| = n, and let Y* ⊆ Y denote the set of global optima. Further assume that the BBO algorithm with population size N consists of both migration and mutation, as shown in Algorithm 2. We use the notation x*(t) to denote an arbitrary element of X*(t), the set of best individuals in the population at generation t. Because of migration and mutation, x*(t) and its fitness will change randomly over time. As t → ∞, the convergence, or lack of convergence, of x*(t) to the subset Y* indicates whether or not the BBO algorithm is globally convergent.

Definition 8. BBO is said to converge if

Pr(lim_{t→∞} x*(t) ∈ Y*) = 1. (17)

Note that x*(t) is not necessarily unique. However, Definition 8 states that the BBO algorithm is globally convergent if and only if (17) holds for every x*(t). Clearly, the evolution of x*(t) is a homogeneous finite Markov chain, which we call an x*(t)-chain.
Now we sort all the states of Y in order of descending fitness; that is, Y = {y_1, ..., y_n} with f(y_1) ≥ f(y_2) ≥ ⋯ ≥ f(y_n). We define E as the set of indices of Y; that is, E = {1, 2, ..., n}. Further, we define E* as the set of elements i of E such that y_i ∈ Y*; that is, y_i ∈ Y* for all i ∈ E*. This leads to the following definition.

Definition 9. Let P = (p_ij) be the transition matrix of an x*(t)-chain, where p_ij for i, j ∈ [1, n] is the probability that x*(t) = y_i transitions to x*(t + 1) = y_j. The BBO algorithm converges to a global optimum if and only if x*(t) transitions from any state i ∈ E to E* as t → ∞ with probability one, that is, if

lim_{t→∞} Σ_{j∈E*} p_ij^(t) = 1 for all i ∈ E. (18)

As noted earlier, there may be more than one x*(t)-chain since more than one element of the search space may have a globally maximum fitness. Definition 9 states that the BBO algorithm converges to a global optimum if and only if (18) holds for every x*(t)-chain. Also note that P depends on the other individuals in the population at generation t. Definition 9 therefore requires that (18) holds for every possible transition matrix P of every x*(t)-chain.
Theorem 10. If the transition matrix P = (p_ij) of an x*(t)-chain is a positive stochastic matrix, then BBO with migration and mutation does not converge to any of the global optima.
Proof. Since every positive matrix is also a primitive one, it follows from Theorem 2 that the limiting distribution of P is unique with all nonzero entries. Therefore, for any i ∈ E,

lim_{t→∞} Σ_{j∈E−E*} p_ij^(t) = Σ_{j∈E−E*} π_j^∞ > 0, (19)

where we use the notation E − E* to denote all elements of E that do not belong to E*. We see that (18) is not satisfied, which completes the proof.

Theorem 11. If the transition matrix P = (p_ij) of an x*(t)-chain is a stochastic matrix of the form P = [C 0; R Q], where C is a primitive stochastic matrix of order |E*| and R, Q ≠ 0, then BBO converges to a global optimum.

Proof. From Theorem 3, we see that, for all i, j ∈ E,

lim_{t→∞} p_ij^(t) = π_j^∞, (20)

where π^∞ = (π_1^∞, ..., π_{|E*|}^∞, 0, ..., 0), π_j^∞ ≠ 0 for 1 ≤ j ≤ |E*|, and Σ_{j∈E*} π_j^∞ = 1. It follows directly that, for any i ∈ E,

lim_{t→∞} Σ_{j∈E*} p_ij^(t) = 1. (21)

We see that (18) is satisfied, which completes the proof.
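The two limiting behaviors used in these proofs can be checked numerically with small matrices; the values below are illustrative two-state examples, not matrices derived from a BBO model.

```python
import numpy as np

# Theorem 10 setting: a positive stochastic matrix.  The limiting
# distribution puts nonzero mass on every state, so probability remains
# on non-optimal states and criterion (18) fails.
P_pos = np.array([[0.8, 0.2],
                  [0.3, 0.7]])
lim_pos = np.linalg.matrix_power(P_pos, 200)
print(lim_pos[0])  # approx [0.6, 0.4]: both entries nonzero

# Theorem 11 setting: block form [[C, 0], [R, Q]] with state 1 optimal.
# All limiting mass ends up on the optimal state, so (18) holds.
P_blk = np.array([[1.0, 0.0],
                  [0.4, 0.6]])
lim_blk = np.linalg.matrix_power(P_blk, 200)
print(lim_blk[1])  # approx [1, 0]: mass on the optimal state tends to 1
```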
Theorems 10 and 11 can be applied directly to determine the global convergence of BBO if the structure of the transition matrix of the Markov chain can be determined, as we will show in the remainder of this section. In particular, we will formalize the observation that the transition matrix of BBO without elitism satisfies the conditions of Theorem 10 (as stated in Theorem 6). We will further show that the transition matrix of the x*(t)-chain of BBO with elitism satisfies the conditions of Theorem 11.
Elitism. We now discuss a modified BBO which uses elitism, an idea which is also implemented in many EAs. There are many ways to implement elitism, but here we define elitism as the preservation of the best individual at each generation in a separate partition of the population space. This enlarges the population by one individual; the elite individual increases the population size from N to N + 1. However, note that the population size is still constant (i.e., equal to N + 1) from one generation to the next. The elite individual does not take part in migration or mutation but is maintained separately from the other N members of the population. At each generation, if an individual in the N-member main population is better than the elite individual, then the elite individual is replaced with a copy of the better individual.
Relative to a standard N-member BBO population, elite BBO increases the number of possible population distributions by a factor of n, which is the search space size. That is, each possible population distribution of the N-member main population could also include one of n elite individuals. The number of possible population distributions therefore increases from T (see Section 3.1) to nT. We order these new states so that each group of T states has the same elite individual. Also, the elite individual in the ith group of T states is the ith best individual in the search space, for i = 1, ..., n.
The elitist-preserving process can be represented by an upgrade transition matrix O, which contains the probabilities that each population distribution of the (N + 1)-member population transitions to some other population distribution after the elitist-preserving step. That is, the element in the ith row and jth column of O, denoted as O(i, j), is the probability that the ith population distribution transitions to the jth population distribution after the step in which the elite individual is replaced with the best individual from the N-member main population. The upgrade matrix is similar to the one in [29]; it does not include the effects of migration or mutation but only includes the elitism-preserving step. The upgrade matrix only includes the probability of changing the elite individual; it does not include the probability of changing the N-member main population, since it does not include migration or mutation. If there are no individuals in the N-member main population that are better than the elite individual, then the elite individual does not change. The structure of the upgrade matrix O can be written as

O = [ O_11 0 ⋯ 0 ; * O_22 ⋯ 0 ; ⋮ ⋮ ⋱ ⋮ ; * * ⋯ O_nn ], (22)

where each O_ii matrix is T × T, where T is the number of population distributions in an EA with a population size of N and search space cardinality of n, and where the blocks marked * may be nonzero. O_11 is the identity matrix since the first T population distributions have the global optimum as their elite individual, and the elite individual can never be improved from the global optimum.
Matrices O_ii with i ≥ 2 are diagonal matrices composed of all zeros and ones. Since the population distributions are ordered by grouping common elite individuals, and since the elite individuals in the population distribution ordering are in order of decreasing fitness, the super block diagonals in O are zero matrices as shown in (22); that is, there is zero probability that the ith population distribution transitions to the jth population distribution if i < j. So the Markov chain of elite BBO can be described by the transition matrix

P⁺ = (I_n ⊗ P) O, (23)

where I_n is the n × n identity matrix, ⊗ denotes the Kronecker product (so I_n ⊗ P is block diagonal with P in each of the n elite groups), and P is the T × T transition matrix described in Section 3.1.
Example 12. To explain the upgrade matrix O described in (22), a simple example is presented. Suppose there exists a search space consisting of n = 3 individuals Y = {y_1, y_2, y_3}, where the fitness of y_1 is the lowest and the fitness of y_3 is the highest. Suppose the main population size is N = 1, so the elitist population size is N + 1 = 2. Then there are nine possible populations before the elitist-preserving step:

P_1 = {y_3, y_3}, P_2 = {y_3, y_2}, P_3 = {y_3, y_1}, P_4 = {y_2, y_3}, P_5 = {y_2, y_2}, P_6 = {y_2, y_1}, P_7 = {y_1, y_3}, P_8 = {y_1, y_2}, P_9 = {y_1, y_1}.

Note that the first element in each population is the elite individual and the last N elements (N = 1 in this example) are the main population. Also note that the populations are ordered in such a way that the first three have the most fit individual as their elite individual, the next three have the second most fit individual as their elite individual, and the last three have the least fit individual as their elite individual. The upgrade matrix O is a 9 × 9 matrix.
The population P_1 = {y_3, y_3} transitions to the population P_1 = {y_3, y_3} with probability 1; that is, O(1, 1) = 1. Population P_1 cannot transition to any other population P_j (j ≠ 1); that is, O(1, j) = 0 for j ≠ 1. Similarly, population P_2 = {y_3, y_2} transitions to P_2 with probability 1 since the elite y_3 is better than the main-population member y_2; therefore, O(2, 2) = 1 and O(2, j) = 0 for j ≠ 2. Continuing with this reasoning, we obtain the O matrix as follows: the nonzero elements of O are

O(1, 1) = O(2, 2) = O(3, 3) = O(5, 5) = O(6, 6) = O(9, 9) = 1, O(4, 1) = O(7, 1) = O(8, 5) = 1, (24)

and all other elements are 0, where each O_ii block is 3 × 3.

Now we consider the convergence of the x*(t)-chain, which is the sequence of elite individuals in the elite BBO algorithm. If the elite individual is equal to the global optimum, we call this an absorbing state of the x*(t)-chain. Recall that the elite individual in elite BBO can only be replaced by one with better fitness. Therefore, the x*(t)-chain of elite BBO contains three classes of states: (1) at least one absorbing state; (2) nonabsorbing states which transition to absorbing states in one step; and (3) nonabsorbing states which transition to nonabsorbing states in one step. So the transition matrix P of the x*(t)-chain, which we introduced in (18)-(21), can be written as

P = [ I_k 0 ; R Q ], (25)

where I_k is a k × k unit matrix corresponding to optimal individuals (k is the number of optima), R is a matrix of order (|E| − k) × k corresponding to nonabsorbing states that transition to absorbing states (|E| is the cardinality of the state space E, so |E| − k is the number of nonabsorbing states), and Q is a matrix of order (|E| − k) × (|E| − k) corresponding to nonabsorbing states that transition to nonabsorbing states. The matrix P of (25) has the same structure as the matrix P in Theorem 11. It follows from Theorem 11 that the x*(t)-chain of elite BBO is globally convergent. These results are similar to those for the canonical GA [29], which is proven to never converge to the global optimum, but elitist variants of which are proven to converge to the global optimum. We sum up these results
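The upgrade matrix of Example 12 can be rebuilt mechanically by applying the elitism-preserving step to each ordered (elite, main) pair; the fitness values below are illustrative, since only their order matters.

```python
# Rebuild the upgrade matrix O of Example 12: search space of n = 3
# individuals with f(y1) < f(y2) < f(y3), main population size N = 1.
fitness = {1: 1.0, 2: 2.0, 3: 3.0}   # illustrative values; only order matters

# Populations (elite, main), ordered so elites appear in decreasing fitness.
pops = [(e, m) for e in (3, 2, 1) for m in (3, 2, 1)]

def upgrade(pop):
    """Elitism-preserving step: replace the elite if the main member is better."""
    elite, main = pop
    return (main, main) if fitness[main] > fitness[elite] else pop

O = [[1.0 if upgrade(p) == q else 0.0 for q in pops] for p in pops]
for row in O:
    print(row)  # deterministic step: exactly one 1 per row
```

For instance, P_4 = {y_2, y_3} maps to {y_3, y_3} = P_1, giving O(4, 1) = 1, in agreement with the reasoning in the example.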
in the following corollary.

Corollary 13. BBO with migration and mutation does not converge to any of the global optima, but elite BBO, which preserves the best individual at each generation, converges to the global optimum.
Proof. This is an immediate consequence of Theorems 6 and 10 (the nonconvergence of BBO without elitism), Theorem 11 (the convergence of BBO with elitism), and the discussion above.

Convergence Rate.
The previous subsection analyzed the convergence properties of elite BBO, and this subsection discusses its convergence rate. The transition matrix of elite BBO after t steps can be found from (25) as follows:

P^t = [ I_k 0 ; R_t Q^t ], (26)

where

R_t = (Σ_{i=0}^{t−1} Q^i) R. (27)

If ‖Q‖ < 1, then Q^t → 0 as t → ∞, and the limiting distribution of the Markov chain of BBO can be found from P^∞ = lim_{t→∞} P^t, which can be written as

P^∞ = [ I_k 0 ; (I − Q)^{−1} R 0 ].

Modified BBO with elitism was proven to converge to a global optimum in the previous subsection, and there exists a limiting distribution π* = π(0)P^∞, where π(0) is the initial probability distribution of the states.

Theorem 14. Let β = ‖Q‖ < 1, where Q is given in (25) and ‖⋅‖ is a matrix norm. Then the distribution π(t) = π(0)P^t of elite BBO satisfies

‖π(t) − π*‖ = O(β^t). (28)

Note that elite BBO is guaranteed to converge to a global optimum regardless of the initial state. In addition, note that we can improve the convergence rate bound by decreasing the parameter β. That is, reducing the number of nonabsorbing states which transition to other nonabsorbing states can accelerate the convergence of elite BBO. In spite of differences between GAs and BBO [12], we see from Theorem 14 that the convergence rate of BBO with elitism is very similar to that of GAs [22, Theorem 5].
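The limiting form of P^t can be confirmed numerically: for a block matrix [I, 0; R, Q] with ‖Q‖ < 1, the lower-left block of P^t approaches (I − Q)^{−1}R. The blocks below are illustrative values, not matrices derived from a BBO model.

```python
import numpy as np

# One absorbing state, two transient states.
I1 = np.eye(1)
R = np.array([[0.4], [0.1]])
Q = np.array([[0.3, 0.3],
              [0.2, 0.7]])
P = np.block([[I1, np.zeros((1, 2))],
              [R, Q]])
print(P.sum(axis=1))  # rows sum to 1: P is stochastic

# Direct powering versus the closed-form limit of the lower-left block.
Pinf_direct = np.linalg.matrix_power(P, 500)
Rinf = np.linalg.inv(np.eye(2) - Q) @ R
print(Rinf.ravel())   # matches the lower-left block of P^500
```

With a single absorbing state, each row of (I − Q)^{−1}R sums to 1, reflecting absorption with probability one.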

Simulation Results
Theorem 14 gives an upper bound on the convergence rate of elite BBO. In this section, we use simulation experiments to confirm this theorem. Note that in (28) the parameter β is a norm: β = ‖Q‖. Here we define ‖⋅‖ as the infinity norm ‖⋅‖_∞; that is, ‖Q‖_∞ = max_i (Σ_j |Q_ij|), where Q_ij is the element in the ith row and jth column of matrix Q. Now note that the transition matrix P in (25) can be obtained from (11) and (23) using elementary matrix transformations. We can thus use Theorem 11 to check for BBO convergence, and we can use Theorem 14 to estimate the convergence rate of BBO. That is, we define π(t) − π* as the error between a BBO population distribution and a distribution that includes at least one optimal solution. We then define the convergence criterion as an arbitrarily small error (e.g., ‖π(t) − π*‖ = 10^{−6}). We can then estimate the time τ to convergence from (28) as follows:

β^τ ≈ 10^{−6}, that is, τ ≈ log_β 10^{−6}. (29)
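The convergence-time estimate can be computed directly; the matrix Q below is an illustrative sub-stochastic block rather than one derived from (11).

```python
import numpy as np
from math import log

# Convergence-time estimate: with beta = ||Q||_inf < 1, the error decays
# like beta**t, so the error reaches 1e-6 at roughly t = log_beta(1e-6).
Q = np.array([[0.3, 0.2],
              [0.1, 0.4]])
beta = np.abs(Q).sum(axis=1).max()   # infinity norm: maximum row sum
tau = log(1e-6) / log(beta)          # log base beta of 1e-6
print(beta)  # 0.5
print(tau)   # about 19.9 generations
```

As the text notes, a smaller β (less probability of staying among nonabsorbing states) yields a smaller τ.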
Test functions are limited to three-bit problems with a search space cardinality of eight and a population size of four. The three fitness functions that we examine are a unimodal one-max problem f_1, a multimodal problem f_2, and a deceptive problem f_3. Note that all three problems are to be maximized. Fitness values are listed in binary order, so the first element of each fitness function corresponds to the bit string 000, the second element corresponds to the bit string 001, and so on. For the BBO parameters, we use a maximum immigration rate and maximum emigration rate of 1, and we use linear migration curves as described in (2). We test elite BBO with three different mutation rates which are applied to each bit in each individual at each generation: 0.1, 0.01, and 0.001. Note that we do not test with a zero mutation rate because the theory in this paper requires that the mutation rate be positive (see Theorem 5); convergence is not guaranteed unless the mutation rate is positive. Numerical calculations show that the transition matrices for these three problems satisfy the convergence conditions of Theorem 11, which indicates that the BBO algorithm converges to one or more of the global optima. As a heuristic test of Theorem 14, we use simulations to record the generation number of first obtaining a population in which all individuals are optimal; all results are computed from 25 independent runs. Tables 1, 2, and 3 show the theoretical convergence time τ, the corresponding parameter β based on (11) and (28), and the generation number of first finding an all-optimal population using BBO and GA, averaged over 25 Monte Carlo simulations.
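For readers who want to reproduce the qualitative behavior, the following is a minimal elite-BBO run on the three-bit one-max problem. It is a sketch under our own parameter choices and helper names, not the exact experimental setup behind Tables 1-3; it demonstrates that elitism makes the elite fitness nondecreasing from one generation to the next.

```python
import random

random.seed(0)
q, N, p_m, generations = 3, 4, 0.01, 50
fitness = lambda x: sum(x)              # one-max: count the 1 bits

pop = [[random.randint(0, 1) for _ in range(q)] for _ in range(N)]
elite = max(pop, key=fitness)[:]
history = [fitness(elite)]

for _ in range(generations):
    fits = [fitness(x) for x in pop]
    mu = [f / q for f in fits]          # emigration rate, fitness scaled to [0, 1]
    lam = [1 - m for m in mu]           # immigration rate
    new_pop = []
    for k in range(N):
        x = pop[k][:]
        for s in range(q):
            if random.random() < lam[k]:    # immigrate bit s?
                # roulette-wheel choice of the emigrating individual by mu
                j = random.choices(range(N), weights=[m + 1e-9 for m in mu])[0]
                x[s] = pop[j][s]
            if random.random() < p_m:       # bit-flip mutation
                x[s] = 1 - x[s]
        new_pop.append(x)
    pop = new_pop
    best = max(pop, key=fitness)
    if fitness(best) > fitness(elite):      # elitism-preserving step
        elite = best[:]
    history.append(fitness(elite))

print(history[-1])  # elite fitness never decreases across generations
```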
, and the generation number of first finding an all-optimal population, averaged over 25 independent simulations.Tables 1-3 show time to convergence and time to finding an optimum, for both BBO and GA.The table confirms the statement following Theorem 14 that the convergence behavior of BBO is similar to that of GA.The tables show that GA converges slightly faster than BBO for high mutation rates, but BBO converges slightly faster for low mutations, and this latter behavior is more important in practice because low mutations rates provide faster convergence.
Several things are notable about the results in Tables 1-3. First, the mutation rate affects the convergence rate of BBO: for all test problems, the convergence rate improves when the mutation rate decreases, so we can accelerate the convergence of BBO by decreasing the mutation rate. This may provide practical guidance for BBO tuning in real-world problems. Second, by analyzing the relationship between the parameter β and the convergence time t in Tables 1-3, we see that the convergence time t is exponentially related to the parameter β, as predicted by Theorem 14. Third, the theoretical results and the simulation results match well for most of the test problems, which confirms the convergence rate estimate provided by Theorem 14.
The three-bit problems analyzed above are small, and the results do not tell us how convergence rates scale with the problem dimension. Also, the transition matrix grows faster than exponentially with the problem dimension and the population size [11], so realistically sized problems cannot be directly analyzed with the methods in this paper. However, our methods could be used to study the effect of BBO tuning parameters on small problems, which could provide guidance for larger, real-world problems. Also, similar population distributions could be grouped into the same Markov model state to reduce the transition matrix dimension of large problems to a manageable size [16], which could make the methods in this paper practical for realistically sized problems.

Conclusion
In this paper we modeled BBO as a homogeneous finite Markov chain to study convergence, and we obtained new theoretical results for BBO. The analysis revealed that BBO with only migration and mutation does not converge to the global optimum. However, an elite version of BBO, which maintains the best solution in the population from one generation to the next, converges to a population subset containing at least one globally optimal solution. In other words, BBO with elitism will converge to the global optimum in any binary optimization problem.
In addition, an upper bound for the BBO convergence rate was obtained in Theorem 14. We used simulations to confirm this theorem for a unimodal one-max problem, a multimodal problem, and a deceptive problem. The results in this paper are similar to those of the canonical GA [29], so our results are not surprising, but this paper represents the first time that such results have been formalized for BBO.
The results in this paper are limited to binary problems but can easily be extended to discrete problems with any finite alphabet. This is due to the simple fact that any discrete problem can be reduced to a binary problem.
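As an illustration of this reduction, each symbol of a q-ary alphabet can be encoded with ⌈log₂ q⌉ bits. The following sketch uses a helper name of our own choosing:

```python
import math

def to_binary(solution, q):
    """Encode a solution over the alphabet {0, ..., q-1} as a bit list,
    using ceil(log2(q)) bits per symbol (most significant bit first)."""
    bits_per_symbol = math.ceil(math.log2(q))
    return [(x >> b) & 1
            for x in solution
            for b in reversed(range(bits_per_symbol))]

# A quaternary alphabet needs 2 bits per symbol:
# [3, 0, 2] -> [1, 1, 0, 0, 1, 0]
print(to_binary([3, 0, 2], 4))
```

Any binary optimizer, including BBO, can then be run on the encoded strings, so convergence results for binary problems carry over to finite discrete alphabets.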
For future work there are several important directions. First, it is of interest to study how to improve BBO convergence time and robustness based on these results. Second, the asymptotic convergence of variations of BBO should be studied, including partial emigration-based BBO, total immigration-based BBO, and total emigration-based BBO [12]. The theorems in this paper provide the foundation to study these variations, so no additional theoretical tools are needed to analyze their convergence. Third, hybrid BBO algorithms, which combine BBO with other EAs, should be developed, and their convergence behaviors studied using the theory presented here. Finally, it would be of interest to extend these results to continuous optimization problems, which are the types of problems to which real-world implementations of BBO are typically applied.

Theorem 11.
If the transition matrix P = (p_ij) of the BBO Markov chain is a stochastic matrix with the block structure

P = [ C 0 ]
    [ R Q ],

where C is a positive stochastic matrix whose order equals the number of states that contain a global optimum, and R, Q ≠ 0, then the BBO algorithm converges to one or more of the global optima.
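The block structure required by Theorem 11 can be checked numerically. The following is a sketch under our own assumptions: the function name is ours, and the states are assumed to be ordered so that the first n_opt states contain a global optimum:

```python
import numpy as np

def satisfies_theorem_11(P, n_opt):
    """Check the Theorem 11 structure: P must be stochastic with block
    form [[C, 0], [R, Q]], where C is positive and R, Q are nonzero."""
    P = np.asarray(P, dtype=float)
    C = P[:n_opt, :n_opt]   # transitions among optimal states
    Z = P[:n_opt, n_opt:]   # must be all zero: optima are never lost
    R = P[n_opt:, :n_opt]   # transitions from nonoptimal to optimal states
    Q = P[n_opt:, n_opt:]   # transitions among nonoptimal states
    stochastic = (P >= 0).all() and np.allclose(P.sum(axis=1), 1.0)
    return bool(stochastic and (C > 0).all() and np.allclose(Z, 0)
                and R.any() and Q.any())

# Example with one optimal state that cannot be lost:
P = [[1.0, 0.0, 0.0],
     [0.5, 0.3, 0.2],
     [0.4, 0.1, 0.5]]
print(satisfies_theorem_11(P, 1))  # True
```

This is the kind of numerical calculation referred to earlier: for each test problem, one verifies that the transition matrix satisfies these conditions.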
Theory. A finite Markov chain is a random process which has a finite number of possible state values S = {s_i} (i = 1, 2, ..., T), where T is the total number of states, that is, the cardinality |S|. The probability that the system transitions from state s_i to state s_j at time step t is given by p_ij(t), which is called the transition probability. The T × T matrix P = (p_ij(t)) is called the transition matrix, where p_ij ∈ [0, 1] for i, j ∈ [1, T], and ∑_{j=1}^{T} p_ij = 1 for all i. The P matrix is called stochastic because the elements in each row sum to 1. If the transition probability is independent of t, that is, p_ij(t₁) = p_ij(t₂) for all i, j ∈ [1, T] and for all t₁ and t₂, then the Markov chain is said to be homogeneous.

Algorithm 2: One generation of the BBO algorithm. y is the entire population of candidate solutions, y_k is the kth candidate solution, and y_k(s) is the sth feature of y_k.

For each y_k, define emigration rate μ_k proportional to the fitness of y_k, with μ_k ∈ [0, 1]
For each y_k, define immigration rate λ_k = 1 − μ_k
For each y_k
    For each candidate solution feature s
        Use λ_k to probabilistically decide whether to immigrate to y_k (see (2) and (3))
        If immigrating then
            Use {μ_j} to probabilistically select the emigrating solution y_j (see (2) and (3))
            y_k(s) ← y_j(s)
        End if
    Next candidate solution feature
    Probabilistically decide whether to mutate y_k
Next solution
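One generation of the BBO algorithm can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes linear migration curves with maximum rates of 1 and per-bit mutation with probability pm (as in the simulations), and elitism is omitted for brevity:

```python
import random

def bbo_generation(pop, fitness, pm=0.01, rng=random):
    """One generation of BBO following Algorithm 2. pop is a list of bit
    lists, fitness is a function to maximize, and pm is the per-bit
    mutation probability."""
    f = [fitness(y) for y in pop]
    fmin, fmax = min(f), max(f)
    # Emigration rate mu_k increases with fitness (scaled into [0, 1]);
    # immigration rate lambda_k = 1 - mu_k (linear migration curves).
    mu = [(fk - fmin) / (fmax - fmin) if fmax > fmin else 0.5 for fk in f]
    lam = [1.0 - m for m in mu]
    new_pop = []
    for k, y in enumerate(pop):
        z = y[:]
        for s in range(len(y)):
            if rng.random() < lam[k]:  # immigrate to y_k at feature s?
                # Select the emigrating solution y_j with probability
                # proportional to mu_j.
                j = rng.choices(range(len(pop)), weights=mu)[0]
                z[s] = pop[j][s]
            if rng.random() < pm:      # per-bit mutation
                z[s] = 1 - z[s]
        new_pop.append(z)
    return new_pop
```

An elite variant, as analyzed in this paper, would additionally copy the best solution of the current population into the next generation unchanged.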

Table 1 :
Convergence rate comparison for the three-bit unimodal one-max problem f₁.

Table 2 :
Convergence rate comparison for the three-bit multimodal problem f₂.

Table 3 :
Convergence rate comparison for the three-bit deceptive problem f₃.