Multithreshold Segmentation by Using an Algorithm Based on the Behavior of Locust Swarms

As an alternative to classical techniques, the problem of image segmentation has also been handled through evolutionary methods. Recently, several algorithms based on evolutionary principles have been successfully applied to image segmentation with interesting performances. However, most of themmaintain two important limitations: (1) they frequently obtain suboptimal results (misclassifications) as a consequence of an inappropriate balance between exploration and exploitation in their search strategies; (2) the number of classes is fixed and known in advance.This paper presents an algorithm for the automatic selection of pixel classes for image segmentation. The proposed method combines a novel evolutionary method with the definition of a new objective function that appropriately evaluates the segmentation quality with respect to the number of classes.The new evolutionary algorithm, called Locust Search (LS), is based on the behavior of swarms of locusts. Different to the most of existent evolutionary algorithms, it explicitly avoids the concentration of individuals in the best positions, avoiding critical flaws such as the premature convergence to suboptimal solutions and the limited exploration-exploitation balance. Experimental tests over several benchmark functions and images validate the efficiency of the proposed technique with regard to accuracy and robustness.


Introduction
Image segmentation [1] consists in grouping image pixels based on some criteria such as intensity, color, and texture and still represents a challenging problem within the field of image processing.Edge detection [2], region-based segmentation [3], and thresholding methods [4] are the most popular solutions for image segmentation problems.
Among such algorithms, thresholding is the simplest method.It works by considering threshold (points) values to adequately separate distinct pixels regions within the image being processed.In general, thresholding methods are divided into two types depending on the number of threshold values, namely, bilevel and multilevel.In bilevel thresholding, only a threshold value is required to separate the two objects of an image (e.g., foreground and background).On the other hand, multilevel thresholding divides pixels into more than two homogeneous classes that require several threshold values.
The thresholding methods use a parametric or nonparametric approach [5].In parametric approaches [6,7], it is necessary to estimate the parameters of a probability density function that is capable of modelling each class.A nonparametric technique [8][9][10][11] employs a given criteria such as the between-class variance or the entropy and error rate, in order to determine optimal threshold values.
A common method to accomplish parametric thresholding is the modeling of the image histogram through a Gaussian mixture model [12] whose parameters define a set of pixel classes (threshold points).Therefore, each pixel that belongs to a determined class is labeled according to its corresponding threshold points with several pixel groups gathering those pixels that share a homogeneous grayscale level.

Mathematical Problems in Engineering
The problem of estimating the parameters of a Gaussian mixture that better model an image histogram has been commonly solved through the Expectation Maximization (EM) algorithm [13,14] or gradient-based methods such as Levenberg-Marquardt, LM [15].Unfortunately, EM algorithms are very sensitive to the choice of the initial values [16], meanwhile gradient-based methods are computationally expensive and may easily get stuck within local minima [17].
As an alternative to classical techniques, the problem of Gaussian mixture identification has also been handled through evolutionary methods.In general, they have demonstrated to deliver better results than those based on classical approaches in terms of accuracy and robustness [18].Under these methods, an individual is represented by a candidate Gaussian mixture model.Just as the evolution process unfolds, a set of evolutionary operators are applied in order to produce better individuals.The quality of each candidate solution is evaluated through an objective function whose final result represents the similarity between the mixture model and the histogram.Some examples of these approaches involve optimization methods such as Artificial Bee Colony (ABC) [19], Artificial Immune Systems (AIS) [20], Differential Evolution (DE) [21], Electromagnetism Optimization (EO) [22], Harmony Search (HS) [23], and Learning Automata (LA) [24].Although these algorithms own interesting results, they present two important limitations.(1) They frequently obtain suboptimal approximations as a consequence of a limited balance between exploration and exploitation in their search strategies.(2) They are based on the assumption that the number of Gaussians (classes) in the mixture is preknown and fixed; otherwise, they cannot work.The cause of the first limitation is associated with their evolutionary operators employed to modify the individual positions.In such algorithms, during their evolution, the position of each agent for the next iteration is updated yielding an attraction towards the position of the best particle seen so far or towards other promising individuals.Therefore, as the algorithm evolves, these behaviors cause that the entire population rapidly concentrates around the best particles, favoring the premature convergence and damaging the appropriate exploration of the search space [25,26].The second limitation is produced as a consequence of the objective function that evaluates the similarity between the mixture model and the histogram.Under such an objective function, the number of Gaussians functions in the mixture is fixed.Since the number of threshold values (Gaussian functions) used for image segmentation varies depending on the image, the best threshold number and values are obtained by an exhaustive trial and error procedure.
On the other hand, bioinspired algorithms represent a field of research that is concerned with the use of biology as a metaphor for producing optimization algorithms.Such approaches use our scientific understanding of biological systems as an inspiration that, at some level of abstraction, can be represented as optimization processes.
In the last decade, several optimization algorithms have been developed by a combination of deterministic rules and randomness, mimicking the behavior of natural phenomena.Such methods include the social behavior of bird flocking and fish schooling such as the Particle Swarm Optimization (PSO) algorithm [27] and the emulation of the differential evolution in species such as the Differential Evolution (DE) [28].Although PSO and DE are the most popular algorithms for solving complex optimization problems, they present serious flaws such as premature convergence and difficulty to overcome local minima [29,30].The cause for such problems is associated with the operators that modify individual positions.In such algorithms, during the evolution, the position of each agent for the next iteration is updated yielding an attraction towards the position of the best particle seen so far (in case of PSO) or towards other promising individuals (in case of DE).As the algorithm evolves, these behaviors cause that the entire population rapidly concentrates around the best particles, favoring the premature convergence and damaging the appropriate exploration of the search space [31,32].
Recently, the collective intelligent behavior of insect or animal groups in nature has attracted the attention of researchers.The intelligent behavior observed in these groups provides survival advantages, where insect aggregations of relatively simple and "unintelligent" individuals can accomplish very complex tasks using only limited local information and simple rules of behavior [33].Locusts (Schistocerca gregaria) are a representative example of such collaborative insects [34].Locust is a kind of grasshopper that can change reversibly between a solitary and a social phase, with clear behavioral differences among both phases [35].The two phases show many differences regarding the overall level of activity and the degree to which locusts are attracted or repulsed among them [36].In the solitary phase, locusts avoid contact to each other (locust concentrations).As consequence, they distribute throughout the space, exploring sufficiently over the plantation [36].On the other hand, in the social phase, locusts frantically concentrate around those elements that have already found good food sources [37].Under such a behavior, locusts attempt to efficiently find better nutrients by devastating promising areas within the plantation.
This paper presents an algorithm for the automatic selection of pixel classes for image segmentation.The proposed method combines a novel evolutionary method with the definition of a new objective function that appropriately evaluates the segmentation quality with regard to the number of classes.The new evolutionary algorithm, called Locust Search (LS), is based on the behavior presented in swarms of locusts.In the proposed algorithm, individuals emulate a group of locusts which interact to each other based on the biological laws of the cooperative swarm.The algorithm considers two different behaviors: solitary and social.Depending on the behavior, each individual is conducted by a set of evolutionary operators which mimics different cooperative conducts that are typically found in the swarm.Different to most of existent evolutionary algorithms, the behavioral model in the proposed approach explicitly avoids the concentration of individuals in the current best positions.Such fact allows avoiding critical flaws such as the premature convergence to suboptimal solutions and the incorrect exploration-exploitation balance.In order to automatically define the optimal number of pixel classes (Gaussian functions in the mixture), a new objective function has been also incorporated.The new objective function is divided into two parts.The first part evaluates the quality of each candidate solution in terms of its similarity with regard to the image histogram.The second part penalizes the overlapped area among Gaussian functions (classes).Under these circumstances, Gaussian functions that do not "positively" participate in the histogram approximation could be easily eliminated in the final Gaussian mixture model.
In order to illustrate the proficiency and robustness of the proposed approach, several numerical experiments have been conducted.Such experiments are divided into two sections.In the first part, the proposed LS method is compared to other well-known evolutionary techniques over a set of benchmark functions.In the second part, the performance of the proposed segmentation algorithm is compared to other segmentation methods which are also based on evolutionary principles.The results in both cases validate the efficiency of the proposed technique with regard to accuracy and robustness.
This paper is organized as follows: in Section 2 basic biological issues of the algorithm analogy are introduced and explained.Section 3 describes the novel LS algorithm and its characteristics.A numerical study on different benchmark function is presented in Section 4 while Section 5 presents the modelling of an image histogram through a Gaussian mixture.Section 6 exposes the LS segmentation algorithm and Section 7 the performance of the proposed segmentation algorithm.Finally, Section 8 draws some conclusions.

Biological Fundamentals and Mathematical Models
Social insect societies are complex cooperative systems that self-organize within a set of constraints.Cooperative groups are good at manipulating and exploiting their environment, defending resources and breeding, yet allowing task specialization among group members [38,39].A social insect colony functions as an integrated unit that not only possesses the ability to operate at a distributed manner but also undertakes a huge construction of global projects [40].It is important to acknowledge that global order for insects can arise as a result of internal interactions among members.
Locusts are a kind of grasshoppers that exhibit two opposite behavioral phases: solitary and social (gregarious).Individuals in the solitary phase avoid contact to each other (locust concentrations).As consequence, they distribute throughout the space while sufficiently exploring the plantation [36].In contrast, locusts in the gregarious phase gather into several concentrations.Such congregations may contain up to 10 10 members, cover cross-sectional areas of up to 10 km 2 , and a travelling capacity up to 10 km per day for a period of days or weeks as they feed causing a devastating crop loss [41].The mechanism to switch from the solitary phase to the gregarious phase is complex and has been a subject of significant biological inquiry.Recently, a set of factors has been implicated to include geometry of the vegetation landscape and the olfactory stimulus [42].
Only few works [36,37] that mathematically model the locust behavior have been published.Both approaches develop two different minimal models with the goal of reproducing the macroscopic structure and motion of a group of locusts.Considering that the method in [36] focuses on modelling the behavior for each locust in the group, its fundamentals have been employed to develop the algorithm that is proposed in this paper.
2.1.Solitary Phase.This section describes how each locust's position is modified as a result of its behavior under the solitary phase.Considering that x   represents the current position of the th locust in a group of  different elements, the new position x +1  is calculated by using the following model: with Δx  corresponding to the change of position that is experimented by x   as a consequence of its social interaction with all other elements in the group.
Two locusts in the solitary phase exert forces on each other according to basic biological principles of attraction and repulsion (see, e.g., [36]).Repulsion operates quite strongly over a short length scale in order to avoid concentrations.Attraction is weaker and operates over a longer length scale, providing the social force that is required to maintain the group's cohesion.Therefore, the strength of such social forces can be modeled by the following function: Here,  is a distance,  describes the strength of attraction, and  is the typical attractive length scale.We have scaled the time and the space coordinates so that the repulsive strength and length scale are both represented by the unity.We assume that  < 1 and  > 1 so that repulsion is stronger and features in a shorter-scale, while attraction is applied in a weaker and longer-scale; both facts are typical for social organisms [21].The social force exerted by locust  over locust  is where   = ‖x  − x  ‖ is the distance between the two locusts and d  = (x  − x  )/  is the unit vector pointing from x  to x  .The total social force on each locust can be modeled as the superposition of all of the pairwise interactions: The change of position Δx  is modeled as the total social force experimented by x   as the superposition of all of the pairwise interactions.Therefore, Δx  is defined as follows: In order to illustrate the behavioral model under the solitary phase, Figure 1 presents an example, assuming a population of  three different members ( = 3) which adopt a determined configuration in the current iteration .As a consequence of the social forces, each element suffers an attraction or repulsion to other elements depending on the distance among them.Such forces are represented by s 12 , s 13 , s 21 , s 23 , s 31 , and s 32 .Since x 1 and x 2 are too close, the social forces s 12 and s 13 present a repulsive nature.On the other hand, as the distances ‖x 1 − x 3 ‖ and ‖x 2 − x 3 ‖ are quite long, the social forces s 13 , s 23 , s 31 , and s 32 between x 1 ↔ x 3 and x 2 ↔ x 3 all belong to the attractive nature.Therefore, the change of position Δx 1 is computed as the vector resultant between s 12 and s 13 (Δx 1 = s 12 + s 13 ) is S 1 .The values Δx 2 and Δx 3 are also calculated accordingly.
In addition to the presented model [36], some studies [43][44][45] suggest that the social force s  is also affected by the dominance of the involved individuals x  and x  in the pairwise process.Dominance is a property that relatively qualifies the capacity of an individual to survive, in relation to other elements in a group.The locust's dominance is determined by several characteristics such as size, chemical emissions, and location with regard to food sources.Under such circumstances, the social force is magnified or weakened depending on the most dominant individual that is involved in the repulsion-attraction process.

Social Phase.
In this phase, locusts frantically concentrate around the elements that have already found good food sources.They attempt to efficiently find better nutrients by devastating promising areas within the plantation.In order to simulate the social phase, the food quality index Fq  is assigned to each locust x  of the group as such index reflects the quality of the food source where x  is located.
Under this behavioral model, each of the  elements of the group is ranked according to its corresponding food quality index.Afterwards, the  elements featuring the best food quality indexes are selected ( ≪ ).Considering a concentration radius   that is created around each selected element, a set of  new locusts is randomly generated inside   .As a result, most of the locusts will be concentrated around the best  elements.Figure 2 shows a simple example of the behavioral model under the social phase.In the example, the configuration includes eight locusts ( = 8), just as it is illustrated by Figure 2(a) that also presents the food quality index for each locust.A food quality index near to one indicates a better food source.Therefore, Figure 2(b) presents the final configuration after the social phase, assuming  = 2.

The Locust Search (LS) Algorithm
In this paper, some behavioral principles drawn from a swarm of locusts have been used as guidelines for developing a new swarm optimization algorithm.The LS assumes that entire search space is a plantation, where all the locusts interact to each other.In the proposed approach, each solution within the search space represents a locust position inside the plantation.Every locust receives a food quality index according to the fitness value of the solution that is symbolized by the locust's position.As it has been previously discussed, the algorithm implements two different behavioral schemes: solitary and social.Depending on each behavioral phase, each individual is conducted by a set of evolutionary operators which mimics the different cooperative operations that are typically found in the swarm.The proposed method formulates the optimization problem in the following form: maximize/minimize  (l) , l = ( 1 , . . .,   ) ∈ R  subject to l ∈ S, (6) where  : R  → R is a nonlinear function whereas S = {l ∈ R  |   ≤   ≤   ,  = 1, . . ., } is a bounded feasible search space, which is constrained by the lower (  ) and upper (  ) limits.
In order to solve the problem formulated in (6), a population L  ({l  1 , l  2 , . . ., l   }) of  locusts (individuals) is evolved inside the LS operation from the initial point ( = 0) to a total  number of iterations ( = ).Each locust l   ( ∈ [1, . . ., ]) represents an -dimensional vector {  ,1 ,   ,2 , . . .,   , } where each dimension corresponds to a decision variable of the optimization problem to be solved.The set of decision variables constitutes the feasible search space S = {l   ∈ R  |   ≤   , ≤   }, where   and   correspond to the lower and upper bounds for the dimension , respectively.The food quality index that is associated with each locust l   (candidate solution) is evaluated through an objective function (l   ) whose final result represents the fitness value of l   .In the LS algorithm, each iteration of the evolution process consists of two operators: (A) solitary and (B) social.Beginning by the solitary stage, the set of locusts is operated in order to sufficiently explore the search space.On the other hand, the social operation refines existent solutions within a determined neighborhood (exploitation) subspace.

Solitary Operation (A).
One of the most interesting features of the proposed method is the use of the solitary operator to modify the current locust positions.Under this approach, locusts are displaced as a consequence of the social forces produced by the positional relations among the elements of the swarm.Therefore, near individuals tend to repel each other, avoiding the concentration of elements in regions.On the other hand, distant individuals tend to attract to each other, maintaining the cohesion of the swarm.A clear difference to the original model in [20] considers that social forces are also magnified or weakened depending on the most dominant (best fitness values) individuals that are involved in the repulsion-attraction process. and the other  − 1 individuals in the swarm.In the original model, social forces are calculated by using (3).However, in the proposed method, it is modified to include the best fitness value (the most dominant) of the individuals involved in the repulsion-attraction process.Therefore, the social force, that is exerted between l   and l   , is calculated by using the following new model: where (  ) is the social force strength defined in (2) and d  = (l   − l   )/  is the unit vector pointing from l   to l   .Besides, rand(1, −1) is a randomly generated number between 1 and −1.Likewise, (l   , l   ) is the dominance function that calculates the dominance value of the most dominant individual from l   and l   .In order to operate (l   , l   ), all the individuals from L  ({l  1 , l  2 , . . ., l   }) are ranked according to their fitness values.The ranks are assigned so that the best individual receives the rank 0 (zero) whereas the worst individual obtains the rank  − 1.Therefore, the function (l   , l   ) is defined as follows: where the function rank() delivers the rank of the individual.According to (8), (l   , l   ) yields a value within the interval (1, 0).Its maximum value of one in (l   , l   ) is reached when either individual l   or l   is the best element of the population L  regarding their fitness values.On the other hand, a value close to zero is obtained when both individuals l   and l   possess quite bad fitness values.Figure 3 shows the behavior of (l   , l   ) considering 100 individuals.In the figure, it is assumed that l   represents one of the 99 individuals with ranks between 0 and 98 whereas l   is fixed to the element with the worst fitness value (rank 99).
Under the incorporation of (l   , l   ) in ( 7), social forces are magnified or weakened depending on the best fitness value (the most dominant) of the individuals involved in the repulsion-attraction process.Finally, the total social force on each individual l   is modeled as the superposition of all of the pairwise interactions exerted over it: Therefore, the change of position Δl  is considered as the total social force experimented by l   as the superposition of all of the pairwise interactions.Thus, Δl  is defined as follows: After calculating the new positions P ({p 1 , p 2 , . . ., p  }) of the population L  ({l  1 , l  2 , . . ., l   }), the final positions F ({f 1 , f 2 , . . ., f  }) must be calculated.The idea is to admit only the changes that guarantee an improvement in the search strategy.If the fitness value of p  ((p  )) is better than l   ((l   )), then p  is accepted as the final solution.Otherwise, l   is retained.This procedure can be resumed by the following statement (considering a minimization problem): In order to illustrate the performance of the solitary operator, Figure 4 presents a simple example with the solitary operator being iteratively applied.A population of 50 different members ( = 50) is assumed which adopt a concentrated configuration as initial condition (Figure 4

Social Operation (B).
The social procedure represents the exploitation phase of the LS algorithm.Exploitation is the process of refining existent individuals within a small neighborhood in order to improve their solution quality.
The social procedure is a selective operation which is applied only to a subset E of the final positions F (where E ⊆ F).The operation starts by sorting F with respect to fitness values, storing the elements in a temporary population B = {b 1 , b 2 , . . ., b  }.The elements in B are sorted so that the best individual receives the position b 1 whereas the worst individual obtains the location b  .Therefore, the subset E is integrated by only the first  locations of B (promising solutions).Under this operation, a subspace   is created around each selected particle f  ∈ E. The size of   depends on the distance   which is defined as follows: where   and   are the upper and lower bounds in the th dimension and  is the number of dimensions of the optimization problem, whereas  ∈ [0, 1] is a tuning factor.Therefore, the limits of   can be modeled as follows: where    and    are the upper and lower bounds of the th dimension for the subspace   , respectively.
Considering the subspace   around each element f  ∈ E, a set of ℎ new particles (M ℎ  = {m 1  , m 2  , . . ., m ℎ  }) are randomly generated inside bounds fixed by (13).Once the ℎ samples are generated, the individual l +1  of the next population L +1 must be created.In order to calculate l +1  , the best particle m   , in terms of its fitness value from the ℎ samples (where is better than f  according to their fitness values, l +1  is updated with m   ; otherwise, f  is selected.The elements of F that have not been processed by the procedure (f  ∉ E) transfer their corresponding values to L +1 with no change.
The social operation is used to exploit only prominent solutions.According to the proposed method, inside each subspace   , ℎ random samples are selected.Since the number of selected samples at each subspace is very small (typically ℎ < 4), the use of this operator substantially reduces the number of fitness function evaluations.
In order to demonstrate the social operation, a numerical example has been set by applying the proposed process to a simple function.Such function considers the interval of −3 ≤  1 ,  2 ≤ 3 whereas the function possesses one global maxima of value 8.1 at (0, 1.6).Notice that  1 and  2 correspond to the axis coordinates (commonly  and ).For this example, a final position population F of six 2-dimensional members ( = 6) is assumed.Figure 5 shows the initial configuration of the proposed example, with the black points representing half of the particles with the best fitness values (the first three elements of B,  = 3) whereas the grey points (f 2 , f 4 , f 6 ∉ E) correspond to the remaining individuals.From Figure 5, it can be seen that the social procedure is applied to all black particles (f , and m 2 5 , for each black point inside of their corresponding subspaces  1 ,  3 , and  5 .Considering the particle f 3 in Figure 7, the particle m 2  3 corresponds to the best particle (m  3 ) from the two randomly generated particles (according to their fitness values) within  3 .Therefore, the particle m  3 will substitute f 3 in the individual l +1  3 for the next generation, since it holds a better fitness value than f 3 ((f 3 )<(m  3 )).The LS optimization procedure is defined over a bounded search space S. Search points that do not belong to such area are considered to be infeasible.However, during the evolution process, some candidate solutions could fall outside the search space.In the proposed approach, such infeasible solutions are arbitrarily placed with a random position inside the search space S.

Complete LS Algorithm.
LS is a simple algorithm with only seven adjustable parameters: the strength of attraction , the attractive length , number of promising solutions , the population size , the tuning factor , the number of random samples ℎ, and the number of generations .The operation of LS is divided into three parts: initialization of the solitary and social operations.In the initialization ( = 0), the first population In the evolution process, the solitary (A) and social (B) operations are iteratively applied until the number of iterations  =  has been reached.The complete LS procedure is illustrated in Algorithm 1.

Discussion about the LS Algorithm.
Evolutionary algorithms (EA) have been widely employed for solving complex optimization problems.These methods are found to be more powerful than conventional methods that are based on formal logics or mathematical programming [46].In the EA algorithm, search agents have to decide whether to explore unknown search positions or to exploit already tested positions in order to improve their solution quality.Pure exploration degrades the precision of the evolutionary process but increases its capacity to find new potentially solutions.On the other hand, pure exploitation allows refining existent solutions but adversely drives the process to local optimal solutions.Therefore, the ability of an EA to find a global optimal solution depends on its capacity to find a good balance between the exploitation of found-so-far elements and the exploration of the search space [47].
Most of swarm algorithms and other evolutionary algorithms tend to exclusively concentrate the individuals in the current best positions.Under such circumstances, such algorithms seriously limit their exploration-exploitation capacities.
Different to most of existent evolutionary algorithms, in the proposed approach, the modeled behavior explicitly avoids the concentration of individuals in the current best positions.Such fact allows not only to emulate the cooperative behavior of the locust colony in a good realistic way but also to incorporate a computational mechanism to avoid critical flaws that are commonly present in the popular evolutionary algorithms, such as the premature convergence and the incorrect exploration-exploitation balance.
It is important to emphasize that the proposed approach conducts two operators (solitary and social) within a single iteration.Such operators are similar to those that are used by other evolutionary methods such as ABC (employed bees, onlooker bees, and scout bees), AIS (clonal proliferation operator, affinity maturation operator, and clonal selection operator), and DE (mutation, crossover, and selection), which are all executed in a single iteration.

Numerical Experiments over Benchmark Functions
A comprehensive set of 13 functions, all collected from [48][49][50], has been used to test the performance of the LS approach as an optimization method.Tables 11 and 12 11 and 12.
We have applied the LS algorithm to 13 functions whose results have been compared to those produced by the Particle Swarm Optimization (PSO) method [27] and the Differential Evolution (DE) algorithm [28], both considered as the most popular algorithms for many optimization applications.In all comparisons, the population has been set to 40 ( = 40) individuals.The maximum iteration number for all functions has been set to 1000.Such stop criterion has been selected to maintain compatibility to similar works reported in the literature [48,49].
The parameter setting for each of the algorithms in the comparison is described as follows: (1) PSO: in the algorithm,  1 =  2 = 2 while the inertia factor () is decreased linearly from 0.9 to 0.2.(2) DE: the DE/Rand/1 scheme is employed.The parameter settings follow the instructions in [28,51].The crossover probability is CR = 0.9 and the weighting factor is  = 0.8.(3) In LS,  and  are set to 0.6 and , respectively.
Besides,  is fixed to 20 (/2), ℎ = 2,  = 0.6 whereas  and  are configured to 1000 and 40, respectively.Once such parameters have been experimentally determined, they are kept for all experiments in this section.
In the comparison, three indexes are considered: the average best-so-far solution (ABS), the standard deviation (SD), and the number of function evaluations (NFE).The first two indexes assess the accuracy of the solution whereas the last one measures the computational cost.The average best-so-far solution (ABS) expresses the average value of the best function evaluations that have been obtained from 30 independent executions.The standard deviation (SD) indicates the dispersion of the ABS values.Evolutionary methods are, in general, complex pieces of software with several operators and stochastic branches.Therefore, it is difficult to conduct a complexity analysis from a deterministic perspective.Under such circumstances, it is more appropriate .Such functions maintain a narrow curving valley that is hard to optimize, in case the search space cannot be explored properly and the direction changes cannot be kept up with [54].For this reason, the performance differences are directly related to a better trade-off between exploration and exploitation that is produced by LS operators.In the practice, a main goal of an optimization algorithm is to find a solution as good as possible within a small time window.The computational cost for the optimizer is represented by its NFE values.According to Table 1, the NFE values that are obtained by the proposed method are smaller than its A nonparametric statistical significance proof known as Wilcoxon's rank sum test for independent samples [55,56] has been conducted with an 5% significance level, over the "average best-so-far" data of Table 1.Table 2 reports the  values produced by Wilcoxon's test for the pairwise comparison of the "average best-so-far" of two groups.Such groups are formed by LS versus PSO and LS versus DE.As a null hypothesis, it is assumed that there is no significant difference between mean values of the two algorithms.The alternative hypothesis considers a significant difference between the "average best-so-far" values of both approaches.All  values reported in the table are less than 0.05 (5% significance level) which is a strong evidence against the null hypothesis, indicating that the LS results are statistically significant and that it has not occurred by coincidence (i.e., due to the normal noise contained in the process).

Multimodal Test Functions.
Multimodal functions possess many local minima which make the optimization a difficult task to be accomplished.In multimodal functions, the results reflect the algorithm's ability to escape from local optima.We have applied the algorithms over functions  8 to  13 where the number of local minima increases exponentially as the dimension of the function increases.The dimension of such functions is set to 30.The results are averaged over 30 runs, with performance indexes being reported in Table 3 as follows: the average best-so-far solution (ABS), the standard deviation (SD), and the number of function evaluations (NFE).Likewise,  values of the Wilcoxon signed-rank test of 30 independent runs are listed in Table 4. From the results, it is clear that LS yields better solutions than others algorithms for functions  9 ,  10 ,  11 , and  12 , in terms of the indexes ABS and SD.However, for functions  8 and  13 , LS produces similar results to DE.The Wilcoxon rank test results, that are presented in Table 6, confirm that  3, the NFE values obtained by the proposed method are smaller than those produced by other optimizers.The reason of this remarkable performance is associated with its two operators: (i) the solitary operator allows a better particle distribution in the search space, increasing the algorithm's ability to find the global optima and (ii) the use of the social operation provides a simple exploitation operator that intensifies the capacity of finding better solutions during the evolution process.

Gaussian Mixture Modelling
In this section, the modeling of image histograms through Gaussian mixture models is presented.Let one consider an image holding  gray levels [0, . . .,  − 1] whose distribution is defined by a histogram ℎ() represented by the following formulation: where   denotes the number of pixels with gray level  and Np the total number of pixels in the image.Under such circumstances, ℎ() can be modeled by using a mixture () of Gaussian functions of the form: where  symbolizes the number of Gaussian functions of the model whereas   is the a priori probability of function .  and   represent the mean and standard deviation of the th Gaussian function, respectively.Furthermore, the constraint ∑  =1   = 1 must be satisfied by the model.In order to evaluate the similarity between the image histogram and a candidate mixture model, the mean square error can be used as follows: where  represents the penalty associated with the constrain ∑  =1   = 1.Therefore,  is considered as the objective function which must be minimized in the estimation problem.In order to illustrate the histogram modeling through a Gaussian mixture, Figure 6 presents an example, assuming three classes, that is,  = 3. Considering Figure 6(a) as the image histogram ℎ(), the Gaussian mixture (), that is shown in Figure 6(c), is produced by adding the Gaussian functions  1 (),  2 (), and  3 () in the configuration presented in Figure 6(b).Once the model parameters that better model the image histogram have been determined, the final step is to define the threshold values   ( ∈ [1, . . ., ]) which can be calculated by simple standard methods, just as it is presented in [19][20][21].

Segmentation Algorithm Based on LS
In the proposed method, the segmentation process is approached as an optimization problem.Computational optimization generally involves two distinct elements: (1) a search strategy to produce candidate solutions (individuals, particles, insects, locust, etc.) and ( 2) an objective function that evaluates the quality of each selected candidate solution.Several computational algorithms are available to perform the first element.The second element, where the objective function is designed, is unquestionably the most critical.In the objective function, it is expected to embody the full complexity of the performance, biases, and restrictions of the problem to be solved.In the segmentation problem, candidate solutions represent Gaussian mixtures.The objective function  (17) is used as a fitness value to evaluate the similarity presented between the Gaussian mixture and the image histogram.Therefore, guided by the fitness values ( values), a set of encoded candidate solutions are evolved using the evolutionary operators until the best possible resemblance can be found.Over the last decade, several algorithms based on evolutionary and swarm principles [19][20][21][22] have been proposed to solve the problem of segmentation through a Gaussian mixture model.Although these algorithms own certain good performance indexes, they present two important limitations.
(1) They frequently obtain suboptimal approximations as a consequence of an inappropriate balance between exploration and exploitation in their search strategies.(2) They are based on the assumption that the number of Gaussians (classes) in the mixture is preknown and fixed; otherwise, they cannot work.
In order to eliminate such flaws, the proposed approach includes (A) a new search strategy and (B) the definition of a new objective function.For the search strategy, the LS method (Section 4) is adopted.Under LS, the concentration of individuals in the current best positions is explicitly avoided.Such fact allows reducing critical problems such as the premature convergence to suboptimal solutions and the incorrect exploration-exploitation balance.Since the proposed approach aims to automatically select the number of Gaussian functions  in the final mixture (), the objective function must be modified.The new objective function  new is defined as follows: where  is a scaling constant.The new objective function is divided into two parts.The first part  evaluates the quality of each candidate solution in terms of its similarity with regard to the image histogram (17).The second part  penalizes the overlapped area among Gaussian functions (classes), with  being defined as follows: where  and  represent the number of classes and the gray levels, respectively.The parameters   () and   () symbolize the Gaussian functions  and , respectively, that are to be evaluated on the point  (gray level) whereas the elements   and   represent the a priori probabilities of the Gaussian functions  and , respectively.Under such circumstances, mixtures with Gaussian functions that do not "positively" participate in the histogram approximation are severely penalized.
Figure 7 illustrates the effect of the new objective function  new in the evaluation of Gaussian mixtures (candidate solutions).From the image histogram (Figure 7(a)), it is evident that two Gaussian functions are enough to accurately

Gray level
Frequency p 1 (x) Gray level approximate the original histogram.However, if the Gaussian mixture is modeled by using a greater number of functions (e.g., four as it is shown in Figure 7(b)), the original objective function  is unable to obtain a reasonable approximation.
Under the new objective function  new , the overlapped area among Gaussian functions (classes) is penalized.Such areas, in Figure 7(c), correspond to  12 ,  23 , and  34 , where  12 represents the penalization value produced between the Gaussian function  1 () and  2 ().Therefore, due to the penalization, the Gaussian mixture shown in Figures 7(b) and 7(c) provides a solution of low quality.On the other hand, the Gaussian mixture presented in Figure 7(d) maintains a low penalty; thus, it represents a solution of high quality.From Figure 7(d), it is easy to see that functions  1 () and  4 () can be removed from the final mixture.This elimination could be performed by a simple comparison with a threshold value , since  1 () and  4 () have a reduced amplitude ( 1 () ≈  2 () ≈ 0).Therefore, under  new , it is possible to find the reduced Gaussian mixture model starting from a considerable number of functions.Since the proposed segmentation method is conceived as an optimization problem, the overall operation can be reduced to solve the formulation of (20) by using the LS algorithm: where   ,   , and   represent the probability, mean, and standard deviation of the class .It is important to remark that the new objective function  new allows the evaluation of a candidate solution in terms of the necessary number of Gaussian functions and its approximation quality.Under such circumstances, it can be used in combination with any other evolutionary method and not only with the proposed LS algorithm.

Complete Segmentation Algorithm.
Once the new search strategy (LS) and objective function ( new ) have been defined, the proposed segmentation algorithm can be summarized by Algorithm 2. The new algorithm combines operators defined by LS and operations for calculating the threshold values.
(Line 1) The algorithm sets the operative parameters , , , , , , , and .They rule the behavior of the segmentation algorithm.(Line 2) Afterwards, the population L 0 is initialized considering  different random Gaussian mixtures of  functions.The idea is to generate an -random Gaussian mixture subjected to the constraints formulated in (20).The parameter  must hold a high value in order to correctly segment all images (recall that the algorithm is able to reduce the Gaussian mixture to its minimum expression).(Line 3) Then, the Gaussian mixtures are evolved by using the LS operators and the new objective function  new .This process is repeated during  cycles.(Line 8) After this procedure, the best Gaussian mixture l   is selected according to its objective function  new .(Line 9) Then, the Gaussian mixture l   is reduced by eliminating those functions whose amplitudes are lower than  (  () ≤ ).(Line 10) Then, the threshold values   from the reduced model are calculated.(Line 11) Finally, the calculated   values are employed to segment the image.Figure 8 shows a flowchart of the complete segmentation procedure.The proposed segmentation algorithm permits to automatically detect the number of segmentation partitions (classes).Furthermore, due to its remarkable search capacities, the LS method maintains a better accuracy than previous algorithms based on evolutionary principles.However, the proposed method presents two disadvantages: first, it is related to its implementation which in general is more complex than most of the other segmentators based on evolutionary basics.The second refers to the segmentation procedure of the proposed approach which does not consider any spatial pixel characteristics.As a consequence, pixels that may belong to a determined region due to its position are labeled as a part of another region due to its gray level intensity.Such a fact adversely affects the segmentation performance of the method.

Segmentation Results
This section analyses the performance of the proposed segmentation algorithm.The discussion is divided into three parts: the first one shows the performance of the proposed LS segmentation algorithm while the second presents a comparison between the proposed approach and others segmentation algorithms that are based on evolutionary and swam methods.The comparison mainly considers the capacities of each algorithm to accurately and robustly approximate the image histogram.Finally, the third part presents an objective evaluation of segmentation results produced by all algorithms that have been employed in the comparisons.

Performance of LS Algorithm in Image Segmentation.
This section presents two experiments that analyze the performance of the proposed approach.Table 5 presents the algorithm's parameters that have been experimentally determined and kept for all the test images through all experiments.
First Image.The first test considers the histogram shown by Figure 9(b) while Figure 9(a) presents the original image.After applying the proposed algorithm, just as it has been configured according to parameters in Table 5, a minimum model of four classes emerges after the start from Gaussian 29.9 10.1 mixtures of 10 classes.Considering 30 independent executions, the averaged parameters of the resultant Gaussian mixture are presented in Table 6.One final Gaussian mixture (ten classes), which has been obtained by LS, is presented in Figure 10.Furthermore, the approximation of the reduced Gaussian mixture is also visually compared with the original histogram in Figure 10.On the other hand, Figure 11 presents the segmented image after calculating the threshold points.
Second Image.For the second experiment, the image in Figure 12 is tested.The method aims to segment the image by using a reduced Gaussian mixture model obtained by the LS approach.After executing the algorithm according to parameters from Table 5, the resulting averaged parameters of the resultant Gaussian mixture are presented in Table 6.In order to assure consistency, the results are averaged considering 30 independent executions.Figure 13 shows the approximation quality that is obtained by the reduced Gaussian mixture model in (a) and the segmented image in (b).(1) (3) that are reported in the literature [18].The parameter setting for each of the segmentation algorithms in the comparison is described as follows: (1)  + ABC [19]: in the algorithm, its parameters are configured as follows: the abandonment limit = 100,  = 0.05 and limit = 30.
(4) In  new + LS, the method is set according to the values described in Table 5.
In order to conduct the experiments, a synthetic image is designed to be used as a reference in the comparisons.The main idea is to know in advance the exact number of classes (and their parameters) that are contained in the image so that the histogram can be considered as a ground truth.The synthetic image is divided into four sections.Each section corresponds to a different class which is produced by setting each gray level pixel V  to a value that is determined by the following model: where  represents the section, whereas   and   are the mean and the dispersion of the gray level pixels, respectively.The comparative study employs the image of 512 × 512 that is shown in Figure 14(a) and the algorithm's parameters that have been presented in Table 7.In the comparison, the discussion focuses on the following issues: first of all, accuracy; second, convergence; and third, computational cost.
Convergence.This section analyzes the approximation convergence when the number of classes that are used by the algorithm during the evolution process is different to the actual number of classes in the image.Recall that the proposed approach automatically finds the reduced Gaussian mixture which better adapts to the image histogram.
In the experiment, the methods,  + ABC,  + AIS, and  + DE, are executed considering Gaussian mixtures composed of 8 functions.Under such circumstances, the number of classes to be detected is higher than the actual number of classes in the image.On the other hand, the proposed algorithm maintains the same configuration of Table 5.Therefore, it can detect and calculate up to ten classes ( = 10).
As a result, the techniques  + ABC,  + AIS, and  + DE tend to overestimate the image histogram.This effect can be seen in Figure 15(a), where the resulting Gaussian functions are concentrated within actual classes.Such a behavior is a consequence of the evaluation that is considered by the original objective function , which privileges only the approximation between the Gaussian mixture and the image histogram.This effect can be graphically illustrated by Figure 16(a) that shows the pixel misclassification produced by the wrong segmentation of Figure 14( It is evident from Figure 15 that the techniques,  + ABC,  + AIS, and  + DE, all need an a priori knowledge of the number of classes that are contained in the actual histogram in order to obtain a satisfactory result.On the other hand, the proposed algorithm is able to find a reduced Gaussian mixture whose classes coincide with the actual number of classes that are contained in the image histogram.
Accuracy.In this section, the comparison among the algorithms in terms of accuracy is reported.Most of the reported comparisons [19][20][21][22][23][24][25][26] are concerned about comparing the parameters of the resultant Gaussian mixtures by using real images.Under such circumstances, it is difficult to consider a clear reference in order to define a meaningful error.Therefore, the image defined in Figure 14 has been used in the experiments because its construction parameters are clearly defined in Table 7.
Since the parameter values from Table 7 act as ground truth, a simple averaged difference between them and the values that are computed by each algorithm could be used as comparison error.However, as each parameter maintains different active intervals, it is necessary to express the differences in terms of percentage.Therefore, if Δ is the parametric difference and  the ground truth parameter, the percentage error Δ% can be defined as follows: In the segmentation problem, each Gaussian mixture represents a -dimensional model where each dimension corresponds to a Gaussian function of the optimization problem to be solved.Since each Gaussian function possesses three parameters   ,   , and   , the complete number of parameters is 3 ⋅  dimensions.Therefore, the final error  produced by the final Gaussian mixture is where  V ∈ (  ,   ,   ).
In order to compare accuracy, the algorithms,  + ABC,  + AIS,  + DE, and the proposed approach are all executed over the image shown by Figure 14(a).The experiment aims to estimate the Gaussian mixture that better approximates the actual image histogram.Methods  + ABC,  + AIS, and  + DE consider Gaussian mixtures composed of 4 functions ( = 4).In case of the  new + LS method, although the algorithm finds a reduced Gaussian mixture of four functions, it is initially set with ten functions ( = 10).Table 8 presents the final Gaussian mixture parameters and the final error .The final Gaussian mixture parameters have been averaged over 30 independent executions in order to assure consistency.A close inspection of Table 8 reveals that the proposed method is able to achieve the smallest error () in comparison to the other algorithms.Figure 16 presents the histogram approximations that are produced by each algorithm whereas Figure 17 shows their correspondent segmented images.Both illustrations present the median case obtained throughout 30 runs. Figure 18 exhibits that  + ABC,  + AIS, and  + DE present different levels of misclassifications which are nearly absent in the proposed approach case.
Computational Cost.The experiment aims to measure the complexity and the computing time spent by the  + ABC,   9 shows the averaged measurements after 30 experiments.It is evident that the  + ABC and  + DE are the slowest to converge (iterations) and the  + AIS shows the highest computational cost (time elapsed) because it requires operators which demand long times.On the other hand, the  new + LS shows an acceptable trade-off between its convergence time and its computational cost.Therefore, although the implementation of  new + LS in general requires more code than most of other evolution-based segmentators, such a fact is not reflected in the execution time.Finally, Figure 19 below shows the segmented images as they are generated by each algorithm.It can be seen that the proposed approach generate more homogeneous regions whereas  + ABC,  + AIS, and  + DE present several artifacts that are produced by an incorrect pixel classification.

Performance Evaluation of the Segmentation Results
. This section presents an objective evaluation of segmentation results that are produced by all algorithms in the comparisons.The ill-defined nature of the segmentation problem makes the evaluation of a candidate algorithm difficult [57].Traditionally, the evaluation has been conducted by using some supervised criteria [58] which are based on the computation of a dissimilarity measure between a segmentation result and a ground truth image.Recently, the use of unsupervised measures has substituted supervised indexes for the objective evaluation of segmentation results [59].They  Evaluation Criteria.In this paper, the unsupervised index ROS proposed by Chabrier et al. [60] has been used to objectively evaluate the performance of each candidate algorithm.This index evaluates the segmentation quality in terms of the homogeneity within segmented regions and the heterogeneity among the different regions.ROS can be computed as follows: where  quantifies the homogeneity within segmented regions.Similarly,  measures the disparity among the regions.A segmentation result  1 is considered better than  2 , if ROS  1 > ROS  2 .The interregion homogeneity characterized by  is calculated considering the following formulation: where   represents the number of partitions in which the image has been segmented.  symbolizes the number of where   is the average gray level in the partition .
Experimental Protocol.In the comparison of segmentation results, a set of four classical images has been chosen to integrate the experimental set (Figure 20).The segmentation methods used in the comparison are  + ABC [19],  + AIS [20], and  + DE [21].From all segmentation methods used in the comparison, the proposed  new + LS algorithm is the only one that has the capacity to automatically detect the number of segmentation partitions (classes).In order to conduct a fair comparison, all algorithms have been proved by using the same number Figure 21 presents the segmentation results obtained by each algorithm considering the experimental set from Figure 20.On the other hand, Table 10 shows the evaluation of the segmentation results in terms of the ROS index.Such values represent the averaged measurements after 30 executions.From them, it can be seen that the proposed  new + LS method obtains the best ROS indexes.Such values indicate that the proposed algorithm maintains the best balance between the homogeneity within segmented regions and the heterogeneity among the different regions.From Figure 21, it can be seen that the proposed approach generates more homogeneous regions whereas  + ABC,  + AIS, and  + DE present several artifacts that are produced by an incorrect pixel classification.

Conclusions
Despite the fact that several evolutionary methods have been successfully applied to image segmentation with interesting results, most of them have exhibited two important limitations: (1) they frequently obtain suboptimal results (misclassifications) as a consequence of an inappropriate balance between exploration and exploitation in their search strategies; (2) the number of classes is fixed and known in advance.
In this paper, a new swarm algorithm for the automatic image segmentation, called the Locust Search (LS), has been presented.The proposed method eliminates the typical flaws presented by previous evolutionary approaches by combining a novel evolutionary method with the definition of a new objective function that appropriately evaluates the segmentation quality with respect to the number of classes.In order to illustrate the proficiency and robustness of the proposed approach, several numerical experiments have been conducted.Such experiments have been divided into two parts.First, the proposed LS method has been compared to other well-known evolutionary techniques on a set of benchmark functions.In the second part, the performance of the proposed segmentation algorithm has been compared to other segmentation methods based on evolutionary principles.The results in both cases validate the efficiency of the proposed technique with regard to accuracy and robustness.
Several research directions will be considered for future work such as the inclusion of other indexes to evaluate similarity between a candidate solution and the image histogram, the consideration of spatial pixel characteristics in the objective function, the modification of the evolutionary LS operators to control the exploration-exploitation balance, and the conversion of the segmentation procedure into a multiobjective problem.

Figure 1 :
Figure 1: Behavioral model under the solitary phase.

Figure 4 :
Figure 4: Examples of different distributions.(a) Initial condition, (b) distribution after applying 25, (c) 50 and (d) 100 operations.The green asterisk represents the minimum value so far.
(a)).As a consequence of the social forces, the set of elements tends to distribute throughout the search space.Examples of different distributions are shown in Figures 4(b), 4(c), and 4(d) after applying 25, 50, and 100 different solitary operations, respectively.

Figure 5 :
Figure 5: Operation of the social procedure.

6. 1 .
New Objective Function  new .Previous segmentation algorithms based on evolutionary and swarm methods use (17) as objective function.Under these circumstances, each candidate solution (Gaussian mixture) is only evaluated in terms of its approximation with the image histogram.

Figure 7 :
Figure 7: Effect of the new objective function  new in the evaluation of Gaussian mixtures (candidate solutions).(a) Original histogram, (b) Gaussian mixture considering four classes, (c) penalization areas, and (d) Gaussian mixture of better quality solution.

Figure 11 :
Figure 11: Image segmented with the reduced Gaussian mixture.

Figure 12 :
Figure 12: (a) Original second image used on the first experiment and (b) its histogram.

7. 2 .Figure 13 :
Figure 13: (a) Segmentation result obtained by LS for the first image and (b) the segmented image.

Figure 14 :
Figure 14: (a) Synthetic image used in the comparison and (b) its histogram.

Figure 15 :Figure 16 :
Figure 15: Convergence results.(a) Convergence of the following methods:  + ABC,  + AIS, and  + DE considering Gaussian mixtures of 8 classes.(b) Convergence of the proposed method (reduced Gaussian mixture).

Figure 19 :
Figure 19: Images employed in the computational cost analysis.

Figure 21 : 1 (
Figure 21: Segmentation results using in the evaluation.
[1]sent the benchmark functions used in our experimental study.Such functions are classified into two different categories: unimodal test functions (Table11) and multimodal test functions (Table12).In these tables,  is the function dimension;  opt is the minimum value of the function, with S being a subset of   .The optimum locations (x opt ) for functions in Tables11 and 12are in [0]  , except for  5 ,  12 , and  13 with x opt in[1] and  8 in [420.96] .A detailed description of optimum locations is given in Tables

Table 1 :
Minimization results from the benchmark functions test in Table11with  = 30.Maximum number of iterations = 1000.Functions  1 to  7 are unimodal functions.The results for unimodal functions over 30 runs are reported in Table1considering the average best-so-far solution (ABS), the standard deviation (SD), and the number of function evaluations (NFE).According to this table, LS provides better results than PSO and DE for all functions in terms of ABS and SD.In particular, the test yields the largest performance difference in functions  4 - 7 4.1.Unimodal Test Functions.

Table 2 :
values produced by Wilcoxon's test that compares LS versus PSO and DE over the "average best-so-far" values from Table1.

Table 3 :
Minimization results from the benchmark functions test in Table12with  = 30.Maximum number of iterations = 1000.

Table 4 :
values produced by Wilcoxon's test comparing LS versus PSO and DE over the "average best-so-far" values from Table3.- 12 , whereas, from a statistical viewpoint, there is no difference between results from LS and DE for  8 and  13 .According to Table

Table 5 :
Parameter setup for the LS segmentation algorithm.

Table 6 :
Results of the reduced Gaussian mixture for the first and the second image.

Table 7 :
Employed parameters for the design of the reference image.

Table 8 :
Results of the reduced Gaussian mixture in terms of accuracy.
J + ABC