A Global Multilevel Thresholding Using Differential Evolution Approach

Otsu’s function measures the properness of threshold values in multilevel image thresholding. Optimal threshold values are necessary for some applications, and a global search algorithm is required. Differential evolution (DE) is an algorithm that has been used successfully for solving this problem. Because the difficulty of a problem grows exponentially as the number of thresholds increases, the ordinary DE fails when the number of thresholds is greater than 12. An improved DE, using a new mutation strategy, is proposed to overcome this problem. Experiments were conducted on 20 real images with the number of thresholds varied from 2 to 16. The proposed algorithm was compared with existing global optimization algorithms, that is, DE, rank-DE, artificial bee colony (ABC), particle swarm optimization (PSO), DPSO, and FODPSO. The experimental results show that the proposed algorithm not only achieves a higher success rate but also yields a lower threshold value distortion than its competitors in the search for optimal threshold values, especially when the number of thresholds is large.


Introduction
Thresholding is the simplest and most commonly used method of image segmentation. It can be bilevel or multilevel [1]. Both of these types can be classified into parametric and nonparametric approaches [1]. Surveys of thresholding techniques for image segmentation can be found in [2][3][4][5][6][7]. The surveys revealed that Otsu's method is a commonly used technique [4,8]. This method finds the optimal thresholds by maximizing the weighted sum of between-class variances (BCV) [9]. The BCV function is also called Otsu's function. However, finding the solution by exhaustive search is very time-consuming because the complexity grows exponentially with the number of thresholds.
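Otsu's BCV objective described above can be sketched in Python. This is a minimal illustration; the class-boundary convention used here (each threshold opens a new class over the half-open interval of gray levels) is an assumption of the sketch, not a detail given in the text.

```python
import numpy as np

def between_class_variance(hist, thresholds):
    """Otsu's objective: weighted sum of between-class variances (BCV)
    for a gray-level histogram split at the given thresholds."""
    hist = np.asarray(hist, dtype=float)
    prob = hist / hist.sum()                      # normalized histogram
    levels = np.arange(len(prob))
    mu_total = (levels * prob).sum()              # global mean intensity
    # Class intervals: [0, t1), [t1, t2), ..., [tk, L)
    bounds = [0] + sorted(thresholds) + [len(prob)]
    bcv = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        w = prob[lo:hi].sum()                     # class probability (weight)
        if w > 0:
            mu = (levels[lo:hi] * prob[lo:hi]).sum() / w   # class mean
            bcv += w * (mu - mu_total) ** 2
    return bcv
```

For a bimodal histogram, a threshold placed between the two peaks yields a much larger BCV than one that lumps both peaks into a single class, which is exactly the property the optimizers in this paper exploit.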
Multilevel image thresholding based on Otsu's function has been used as a benchmark for comparing the capability of evolutionary algorithms (EAs). EAs are nongradient-based optimization algorithms. Several algorithms have been widely applied to solve multilevel thresholding. A group of successful works were based on a combination of Otsu's function with some state-of-the-art algorithms: PSO [10], DE [11], ABC [12], and FODPSO [13]. Kulkarni and Venayagamoorthy [14] showed that PSO was faster than Otsu's method in searching for the optimal thresholds of multilevel image thresholding. Akay [15] presented a comprehensive comparative study of the ABC and PSO algorithms. The results showed that the ABC algorithm with both the between-class variance and the entropy criterion can be efficiently used in multilevel thresholding. Hammouche et al. [16] focused on solving the image thresholding problem by combining Otsu's function with metaheuristic techniques, that is, genetic algorithm (GA), PSO, DE, ant colony, simulated annealing, and Tabu search. Their results revealed that DE was the most efficient with respect to the quality of solution. Osuna-Enciso et al. [17] presented an empirical comparative study of the ABC, PSO, and DE algorithms to perform image thresholding using a mixture of Gaussian functions. The results showed that the DE algorithm was superior in performance in minimizing the Hellinger distance and used fewer evaluations of the Hellinger distance. Ghamisi et al. [18] showed that a global optimal search for optimal threshold values of Otsu's function was essential for the multilevel segmentation of multispectral and hyperspectral images.

Mutation Operators.
The differential mutation operator is one of the three operators of DE. The mutation operator is applied to generate the mutant vector V_i for each target vector X_i in the current population. A mutant vector is generated according to

V_i = X_r1 + F (X_r2 − X_r3),

where the randomly chosen indexes r1, r2, r3 ∈ {1, 2, ..., NP} are mutually different random integers and are also different from the running index i. Consequently, i, r1, r2, and r3 are all different, so that NP ≥ 4. F is a real and constant factor, F ∈ [0, 2], which controls the amplification of the differential variation; X_r1 is called the base vector, X_r2 is called the terminal vector, X_r3 is called the other vector, and (X_r2 − X_r3) is called the difference vector.
There have been many proposed mutation strategies for DE [20,21]. Each strategy has different characteristics and is suitable for a set of problems. However, the choice of the best mutation operator for DE on a specific problem is difficult [22][23][24]. The "DE/rand/1/bin" strategy has been widely used in the DE literature [25][26][27][28]. It is more reliable than the strategies based on the best-so-far solution, such as "DE/best/1" and "DE/current-to-best/1". However, "DE/rand/1/bin" has slower convergence. Simply put, it has high exploration but low exploitation ability.
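The "DE/rand/1" mutation described above can be sketched as follows. This is a minimal illustration of the standard operator; the helper name and default F are illustrative.

```python
import numpy as np

def de_rand_1_mutation(pop, i, F=0.5, rng=None):
    """Create the mutant vector V_i = X_r1 + F*(X_r2 - X_r3), where
    r1, r2, r3 are mutually different indices, all different from the
    running index i (hence NP >= 4)."""
    rng = rng or np.random.default_rng()
    NP = len(pop)
    # draw three mutually different indices, none equal to i
    candidates = [j for j in range(NP) if j != i]
    r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])
```

Note that when the population has collapsed to a single point, the difference vector vanishes and the mutant equals the base vector, which is why a prematurely converged population can no longer explore.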

Selection.
Selection determines whether the target or the trial vector survives to the next generation. The selection operation is described as

X_{i,G+1} = U_{i,G} if f(U_{i,G}) ≤ f(X_{i,G}), and X_{i,G+1} = X_{i,G} otherwise,

where f() is the objective function to be minimized. Therefore, if the objective of the new trial vector, f(U_{i,G}), is equal to or less than the objective of the old target vector, f(X_{i,G}), then X_{i,G+1} is set to U_{i,G}; otherwise, the old value X_{i,G} is retained.
The pseudocode of basic DE with "DE/rand/1/bin" strategy is shown in Algorithm 1.
The function rndint[1, D] returns a uniformly distributed random integer between 1 and D. rndreal[0, 1) is a uniformly distributed random real value in [0, 1). The word "better" in line 17 means "less than" if the problem requires minimization, see (12) and its explanation, and it means "greater than" if the problem requires maximization. The best X_{i,G_max}, where G_max is the maximum number of generations, is the solution of the algorithm. The word "best" also depends on the type of problem.
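Since Algorithm 1 itself is not reproduced in this text, a compact runnable sketch of basic DE with the "DE/rand/1/bin" strategy (mutation, binomial crossover, greedy selection, for minimization) might look like this; the parameter defaults and bounds handling are illustrative assumptions.

```python
import numpy as np

def de_rand_1_bin(fobj, bounds, NP=20, F=0.5, CR=0.9, G_max=100, seed=0):
    """Minimal sketch of basic DE with the 'DE/rand/1/bin' strategy."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    D = len(lo)
    pop = rng.uniform(lo, hi, size=(NP, D))         # initial population
    fit = np.array([fobj(x) for x in pop])
    for _ in range(G_max):
        for i in range(NP):
            r1, r2, r3 = rng.choice([j for j in range(NP) if j != i],
                                    size=3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])   # DE/rand/1 mutation
            jrand = rng.integers(D)                 # rndint-style index
            mask = rng.random(D) < CR
            mask[jrand] = True                      # at least one gene from v
            u = np.clip(np.where(mask, v, pop[i]), lo, hi)  # binomial crossover
            fu = fobj(u)
            if fu <= fit[i]:                        # greedy selection
                pop[i], fit[i] = u, fu
    best = fit.argmin()
    return pop[best], fit[best]
```

On a smooth 2-D test function such as the sphere, this sketch converges well within the budgets discussed later in the paper.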

The Proposed DE with Onlooker Ranking-Based Mutation Operator
In 2013, Gong and Cai [19] proposed the rank-DE algorithm. They claimed that probabilistically selecting the vectors X_r1 and X_r2 in the mutation operator from the better population can improve the exploitation ability of basic DE. To the best of the authors' knowledge, however, rank-DE may also lead to premature convergence (this will be shown in the experiments). That means that rank-DE has too much exploitation ability and cannot balance the exploration and exploitation abilities. In order to balance the two abilities, we propose DE with an onlooker and ranking-based mutation operator, named O(p)R-DE. The proposed algorithm improves rank-DE by homogenizing it with the onlooker phase of the ABC algorithm. The details of the O(p)R-DE algorithm are described as follows.

Ranking Assignment.
To perform the maximization, the fitness of each vector is sorted in ascending order (i.e., from worst to best). Then, the rank R_i of the i-th vector, X_i, is assigned based on its sorted ordering as follows: R_i = order_i, order_i = 1, 2, ..., NP.
As a result, the best vector in the current population will obtain the highest ranking, that is, NP.

Probabilistic Selection.
After assigning the ranking to each vector, the selection probability p_i of the i-th vector X_i is calculated as

p_i = R_i / NP. (14)
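The ranking assignment and the selection probabilities can be sketched together. The linear model p_i = R_i / NP follows rank-DE as cited above; the function name and the maximize/minimize switch are illustrative.

```python
import numpy as np

def selection_probabilities(fitness, maximize=True):
    """Rank-based selection probabilities in the style of rank-DE:
    sort fitness from worst to best, assign rank R_i (worst -> 1,
    best -> NP), then p_i = R_i / NP so the best vector gets p = 1."""
    fitness = np.asarray(fitness, dtype=float)
    NP = len(fitness)
    # ascending order for maximization, descending for minimization
    order = fitness.argsort() if maximize else (-fitness).argsort()
    ranks = np.empty(NP)
    ranks[order] = np.arange(1, NP + 1)
    return ranks / NP
```

With this model, the best vector in the current population receives the highest selection probability, 1, matching the statement that it obtains the highest ranking, NP.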

A New Strategy for Base Vector, Terminal Point, and the Other Vector Selections
Definition 1 (a worse population and a better population).
Let p be a real value with 0 ≤ p < 1. The set of vectors having selection probability less than p is called the worse population, and the set of vectors having selection probability greater than or equal to p is called the better population.
In rank-DE, the base vector X_r1 and the terminal point X_r2 are chosen based on their selection probabilities. The other vector in the mutation operator, X_r3, is selected randomly as in the original DE algorithm. The vectors with higher rankings (higher selection probabilities) are more likely to be chosen as the base vector or the terminal point in the mutation operator.
Our investigation revealed that if both the X_r1 and X_r2 vectors of rank-DE were chosen from better vectors, then the distribution of the target vectors may collapse quickly and possibly lead to premature convergence. Accordingly, when rank-DE was applied to a very difficult problem, it could not reach the optimal solution.
If the steps of the DE algorithm are compared with the ABC algorithm, the population in the current generation can be considered as the employed bees and the population in the next generation can be considered as the onlooker bees. To follow the concept of ABC, the base vector X_r1 of a new vector chooses a food source with respect to a probability that is computed from the fitness values of the current population. The probability p_i with which X_i is chosen as the base vector X_r1 can be calculated by using the expression given in (14). After a base source X_r1 for a new vector is probabilistically chosen, both X_r2 and X_r3 are chosen in the same manner as the terminal point and the other vector selections in rank-DE. The mutant vector V_i is created by the mutation formula of DE. The trial vector U_i is created after the mutant vector is crossed with the target vector, and then its fitness value is computed. As in the ordinary DE, a greedy selection is applied between X_i and U_i. Hence, the new population contains better sources and positive feedback behavior appears. This idea is expressed as pseudocode in Algorithm 2. Since the selection of X_r1 is the onlooker selection and the selections of X_r2 and X_r3 are brought from rank-DE, the algorithm is called onlooker and ranking-based vector selection.
The pseudocode of the onlooker and ranking-based vector selection is shown in Algorithm 2. The differences between the original ranking-based selection and the onlooker and ranking-based selection are highlighted by "⇐".
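Since Algorithm 2 is not reproduced here, the index-selection step described above can be sketched as follows. The acceptance loop (draw a candidate, accept it with its probability p_j) follows the usual ABC onlooker style; the exact loop structure in Algorithm 2 may differ, so treat this as an assumption of the sketch.

```python
import numpy as np

def select_r1_r2_r3(i, probs, rng):
    """Onlooker and ranking-based vector selection (sketch).
    The base vector r1 and the terminal point r2 are chosen
    probabilistically (onlooker-style acceptance with probability p_j);
    the other vector r3 is chosen uniformly at random, as in basic DE.
    All three indices are mutually different and different from i."""
    NP = len(probs)

    def onlooker(exclude):
        # keep drawing candidates until one is accepted
        while True:
            j = int(rng.integers(NP))
            if j not in exclude and rng.random() < probs[j]:
                return j

    r1 = onlooker({i})
    r2 = onlooker({i, r1})
    r3 = r1
    while r3 in (i, r1, r2):
        r3 = int(rng.integers(NP))
    return r1, r2, r3
```

Vectors with higher selection probabilities survive the acceptance test more often, so better vectors are more likely to serve as the base vector or terminal point, which is the positive feedback behavior described above.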

The DE with Onlooker-Ranking-Based Mutation Operator.
The procedures in Sections 4.1, 4.2, and 4.3 are combined to create a better DE algorithm. The parameter p, 0 ≤ p < 1, determines the fraction of the worse population to be eliminated. When p = 0 there is no worse population; each single vector in the current population can act as the base vector. If 0 < p < 1, then each vector having a selection probability less than p is a worse vector and will not be selected as the base vector. If p is not a constant or is outside [0, 1), each base vector is chosen as an onlooker bee. Accordingly, the name of the algorithm is Onlooker(p) Ranking-Based Differential Evolution (O(p)R-DE). To achieve the global solution, a user can set a proper value for p to control the balance of the exploration and exploitation abilities of the algorithm. The pseudocode of O(p)R-DE is shown in Algorithm 3, and the differences between rank-DE and O(p)R-DE are highlighted by "⇐".
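The role of p in eliminating the worse population can be illustrated with a small helper. This is a minimal sketch of Definition 1 only; the full O(p)R-DE (Algorithm 3) combines this filter with the onlooker acceptance shown earlier.

```python
import numpy as np

def base_vector_candidates(probs, p):
    """Indices of vectors eligible to act as the base vector in
    O(p)R-DE: vectors with selection probability below p form the
    'worse population' and are excluded.  p = 0 keeps every vector;
    larger p eliminates a larger worse fraction."""
    return np.flatnonzero(np.asarray(probs) >= p)
```

A small p favors exploration (almost everyone can lead a mutation); a large p favors exploitation (only high-ranked vectors lead), which is exactly the exploration/exploitation dial the section describes.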

Experimental Setup.
The global multilevel thresholding problem deals with finding optimal thresholds within the range [0, L − 1] that maximize the BCV function. The dimension of the optimization problem is the number of thresholds, d, and the search space is [0, L − 1]^d. The parameter p of O(p)R-DE is rndreal[0, 1) or is set to one of 0.0, 0.1, ..., 0.9. The variations of the proposed O(p)R-DE were implemented and compared with existing metaheuristics that perform image thresholding, that is, PSO, DPSO, FODPSO, ABC, and several variations of DE algorithms. All the methods were programmed in Matlab R2013a and were run on a personal computer with a 3.4 GHz CPU and 8 GB RAM under the Microsoft Windows 7 64-bit operating system. The experiments were conducted on 20 real images. Nineteen images, namely, starfish, mountain, cactus, butterfly, circus, snow, palace, flower, wherry, waterfall, bird, police, ostrich, viaduct, fish, houses, mushroom, snow mountain, and snake, were taken from the Berkeley Segmentation Dataset and Benchmark [29]. The last image, Riosanpablo, is a satellite image, "New ISS Eyes see Rio San Pablo," March 1, 2013 (http://visibleearth.nasa.gov/view.php?id=80561). Each image has a unique gray level histogram. These original images and their histograms are depicted in Figure 1. An experiment of an image with a specific number of thresholds is called a "subproblem." The number of thresholds investigated in the experiments was 2, 3, ..., 16. Thus, there are 20 × 15 subproblems per algorithm. Each subproblem was repeated 50 times, and each repetition is called a run.
To compare the PSO, ABC, and DE algorithms fairly, the number of objective function evaluations is computed as NP × G_max, where NP is the population size and G_max is the number of generations. A particle of PSO and a vector of the DEs call Otsu's function one time per generation; the population size in the PSO and DE algorithms was set to 50. A bee in the ABC calls Otsu's function two times per generation; therefore, the number of food sources was set to half of the PSO's population size, that is, 25. The stopping criterion was the maximum number of generations, G_max. In this experiment, G_max was set to 50, 100, 150, 200, 300, 400, 600, 800, 1000, 1500, 2000, 3000, 4000, 5000, and 6000 when d was 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, and 16, respectively. For the PSO, DPSO, and FODPSO algorithms, the parameters were set as suggested in [30] and are shown in Table 1. The other control parameter of the ABC algorithm, limit, was set to 50 [15]. The control parameters F and CR of the DE algorithms were set to 0.5 and 0.9, respectively [31,32].

Comparison Strategies and Metrics.
To compare the performance of the different algorithms, three metrics are used: (1) the convergence rate, compared by the average number of generations (NG); a lower NG means a faster convergence rate; (2) the stability, compared by the average success rate (SR_HM); a higher SR_HM means higher stability; (3) the reliability, compared by the threshold value distortion measure (TVD); a lower TVD means higher reliability. The details of the three metrics are described as follows.
When all 50 runs of an algorithm performing on an image with a specific number of thresholds have terminated, the outcomes are analyzed. The r-th run is called a successful run if there is a generation g ≤ G_max such that BCV(g) ≥ VTR, and the number of generations (NG_r) of the successful run is recorded. Thus, NG_r is defined if r is a successful run and is undefined otherwise.
The average of NG_r over those successful runs is represented by NG. The success rate (SR), the ratio with which the algorithm succeeds in reaching the VTR for each subproblem, is computed as SR = (number of successful runs) / (total number of runs).
The experiments were conducted on 20 images. The arithmetic mean (AM) of NG over the entire set of images with a specific number of thresholds, NG_AM, is calculated by averaging NG over the N images, where N is the total number of images. NG_AM is shown in Table 3. The worst-case scenario is that there is no successful run for a subproblem; such a subproblem is called an "unsuccessful subproblem." If an algorithm encounters this scenario, the subproblems are grouped by their number of thresholds, and the number of images in each group is counted and assigned to n. These scenarios are represented by NA(n), as shown in Tables 3 and 4. The average of the success rate over the entire dataset with a specific number of thresholds, SR_HM, is computed with the harmonic mean (HM). The SR_HM is vital for evaluating the stability of an algorithm: because evolutionary methods are stochastic search algorithms, the solutions are not the same in each run and depend on the search ability of the algorithm. A higher SR_HM means better stability of the algorithm.
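The harmonic-mean aggregation of success rates can be sketched directly; the helper name is illustrative.

```python
import numpy as np

def sr_hm(success_rates):
    """Harmonic mean of per-image success rates (SR_HM).  A single
    image with SR = 0 drives the harmonic mean to zero, which is why
    one unsuccessful subproblem makes a whole threshold level fail."""
    sr = np.asarray(success_rates, dtype=float)
    if np.any(sr == 0):
        return 0.0
    return len(sr) / np.sum(1.0 / sr)
```

Unlike the arithmetic mean, the harmonic mean is dominated by the worst images, so SR_HM is a conservative stability measure: an algorithm must succeed on every image to score well.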
An algorithm producing SR_HM < 0.5 means that more than 50 percent of its independent runs cannot reach the global solution; such an algorithm should not be selected to solve the problem. The experiments were conducted for the number of thresholds varying from 2 to 16. From these experiments we record the maximum number of thresholds such that the algorithm yields SR_HM ≥ 0.5, represented by d_0.5 in Table 4, and the maximum number of thresholds that the algorithm can solve, represented by d_max; above d_max there is a case in which all 50 runs of some subproblem miss the VTR, so the success rate and the associated SR_HM are both zero. Let d be the number of thresholds. The reliability of a solution is measured by the threshold value distortion measure (TVD), computed from the optimal thresholds, where t_j^* is the threshold value producing the VTR, t_j is the threshold value obtained from the algorithm, and 1{t_j^* ≠ t_j} is the indicator function, which is equal to 1 when t_j^* ≠ t_j and is zero otherwise. TVD is zero if the algorithm reaches the VTR in every run. The lower the TVD, the more reliable the algorithm.
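The exact TVD formula did not survive in this text, so the sketch below shows one plausible form consistent with the description: it is zero whenever every run reproduces the VTR thresholds exactly and is driven by the indicator 1{t_j^* ≠ t_j}. The averaging over runs and over the d thresholds is an assumption of this sketch, not the paper's definition.

```python
import numpy as np

def tvd(optimal, runs):
    """Hypothetical threshold value distortion: average, over runs and
    thresholds, of |t_j* - t_j| restricted to mismatched positions via
    the indicator 1{t_j* != t_j}.  Zero iff every run hits the VTR
    thresholds exactly.  (Averaging details are an assumption.)"""
    optimal = np.asarray(optimal)
    total = 0.0
    for t in runs:
        t = np.asarray(t)
        mismatch = t != optimal                  # indicator 1{t* != t}
        total += np.abs(t - optimal)[mismatch].sum() / len(optimal)
    return total / len(runs)
```

Whatever the exact averaging, the key property stated in the text holds here: a perfectly reliable algorithm scores TVD = 0, and larger deviations from the VTR thresholds raise the score.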

The Value to Reach (VTR).
Following the completion of all of the experiments, the best values of the between-class variance and the corresponding thresholds were collected.

Results Produced by Local Search Method.
The multithresh function of the Matlab toolbox was run on the same images and numbers of thresholds as the other search methods. The capabilities of a local search and a global search in finding the optimal solution are discussed here; this is the reason we focused on the global search, that is, the proposed O(0.0)R-DE algorithm. Table 2 shows the between-class variances and threshold values of the "mountain" image. These values are the best outcomes of 50 runs produced by the multithresh function in the Matlab R2013a toolbox and by the proposed O(0.0)R-DE algorithm.
The termination condition of the multithresh function was set by "MaxFunEvals" = 500000; that is, the multithresh function performs more function calls than the O(0.0)R-DE algorithm. It can be seen from columns 3 and 5 that all the BCVs produced by the O(0.0)R-DE algorithm are better than the BCVs produced by the multithresh function; the difference of the BCVs is shown in column 7. The differences in the thresholds from the two algorithms, shown in column 8, tend to be large when the number of thresholds increases.
Figure 2 shows the graph of the TVD for all the images and thresholds. These results follow the same pattern as the results for the "mountain" image in Table 2. That means the ability of the proposed global search algorithm to find the optimal solution is higher than that of the multithresh function, especially when the number of thresholds is large. This illustrates the difficulty of the problem: a problem of this kind can be multimodal [33] or can have a nearly flat top surface [34]. The multithresh function solves the problem by performing the Nelder-Mead simplex method [35], which is a local search method that cannot guarantee an optimal solution. Thus, its solutions are inferior to the solutions produced by an algorithm using a global search.

Convergence
The algorithm with the highest d_max, that is, the lowest number of unsuccessful subproblems, and the lowest average number of generations is the winner. The ranking of the algorithms depends on the ordering of the triples (a_1, b_1, c_1) and (a_2, b_2, c_2) as follows.
First, rank on a_1 and a_2. Since both a_1 and a_2 are numeric, the higher value has the higher rank. Second, rank on b_1 and b_2. If a_1 is 16, then b_1 must be NA(0). If a_1 is less than 16, then b_1 must be NA(n), where n is the number of unsuccessful subproblems, and b_2 has the same characteristics as b_1. The ordering of these two numeric values is reversed; the lower value has the higher rank.
Third, rank on c_1 and c_2. They are numeric, but lower is better, so the ordering of the two numeric values is also reversed: the lower value has the higher rank.
When the ordering is finished, assign the numeric value of "1" to the object having the highest rank, assign the numeric value of "2" to the first runner up, and so on.These values are represented in the last row of the table.
If a row having d = n must be ranked, it can be done in the same manner as above with some minor modifications. If n is less than d_max, the values of the triple will be (n, NA(0), NG_AM). If n is greater than d_max, the values of the triple will be (d_max, NA(n), ∞). Thus the ranking can now be performed.
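The three-step ranking procedure above is a lexicographic comparison of triples, which can be sketched as follows; the function name is illustrative.

```python
def rank_algorithms(triples):
    """Rank triples (d_max, NA(n), NG_AM) lexicographically:
    a higher d_max is better, a lower number n of unsuccessful
    subproblems is better, and a lower NG_AM is better.
    Returns rank 1 for the best algorithm, 2 for the runner-up, etc."""
    order = sorted(range(len(triples)),
                   key=lambda i: (-triples[i][0],   # maximize d_max
                                  triples[i][1],    # minimize NA(n)
                                  triples[i][2]))   # minimize NG_AM
    ranks = [0] * len(triples)
    for pos, i in enumerate(order, start=1):
        ranks[i] = pos
    return ranks
```

Note how the unsuccessful-subproblem count only breaks ties between algorithms with equal d_max, and the average generation count only breaks ties beyond that, matching the First/Second/Third steps of the text.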
From the ranking results (see the second last row of Table 3), the convergence rates of the algorithms can be ranked from best to worst. As can be seen from Table 3, the DE algorithm cannot complete the task when d > 12 and the rank-DE algorithm cannot complete the task when d > 9. Thus, rank-DE cannot compete with DE in searching for global multilevel thresholding. In order for O(p)R-DE to outperform DE, p must be in the range [0.1, 0.6] or set to rndreal[0, 1).

Stability Analysis.
The harmonic mean of the success rate (SR_HM) for each specific number of thresholds was computed and is presented in Table 4. The results of each algorithm are presented in the correspondingly named column. The second row from the bottom shows the harmonic mean of the success rate of each algorithm over all threshold levels.
In each column, the cells containing SR_HM ≥ 0.5 run from the row associated with d = 2 to the row associated with d = d_0.5. The cells from the row associated with d = d_0.5 + 1 to the row associated with d = d_max contain SR_HM < 0.5. The cells from the row associated with d = d_max + 1 to the row associated with d = 16 have SR_HM = 0; these cells are excluded from the comparison. The second last row of the table is filled with the triple (d_0.5, d_max, HM(SR_HM of d = 2 until d = d_max)). (24) The algorithm with the highest d_0.5, the highest d_max, and the highest average success rate is the winner. The ranking of the algorithms depends on the ordering of the triples (a_1, b_1, c_1) and (a_2, b_2, c_2) as follows.
First, rank on a 1 and a 2 .
Second, rank on b 1 and b 2 .
Third, rank on c 1 and c 2 .
Because they are numeric, the higher value has the higher rank. When the ordering is finished, the numeric value of "1" is assigned to the object having the highest rank, the numeric value of "2" is assigned to the first runner up, and so on.
If a row having d = n must be ranked, it can be done in the same manner as above with some minor modifications. If n is less than or equal to d_0.5, the values of the triple will be (n, n, SR_HM). If d_0.5 < n ≤ d_max, then the values of the triple will be (d_0.5, n, SR_HM). If n is greater than d_max, the values of the triple will be (d_0.5, d_max, SR_HM). Thus, the ranking can now be performed.
From the ranking results (see Table 4), the success rates of the algorithms can be ranked from best to worst. As can be seen from Table 4, the DE algorithm has SR_HM ≥ 0.5 until d = 9 and the rank-DE algorithm has SR_HM ≥ 0.5 until d = 7. This result confirms that rank-DE cannot compete with DE in searching for global multilevel thresholding. If the correct p is selected, the proposed algorithm works very well: for p ≤ 0.5, O(p)R-DE has a higher rank than DE. The multithresh function cannot compete with any of the other algorithms. It can also be seen that the proposed O(0.0)R-DE algorithm has the best stability because its SR_HM is greater than 0.5 for d = 2 to 16.

Reliability Comparison.
The threshold value distortion (TVD) for each specific number of thresholds is computed, shown in Table 5, and depicted in Figure 2. The results of each algorithm are presented in the correspondingly named column. The second last row of Table 5 shows the slope, or approximated growth rate, m, of the TVD of each algorithm over all threshold levels. The m is the slope of the robust linear regression computed by the Matlab function "robustfit"; a lower slope indicates better reliability. The m of each algorithm is sorted in descending order. From these results, the reliability of the algorithms can be ranked from best to worst. We can see from these results that rank-DE has a higher approximated TVD growth rate than DE, while O(p)R-DE with p ≤ 0.5 and O(rand)R-DE are still better than DE. O(0.0)R-DE produced the best result, with a very flat slope and a very low y-intercept. The multithresh function yielded a higher growth rate of solution distortion and a higher y-intercept than the other algorithms. A higher growth rate of solution distortion means that the quality of the solution drops very fast as the number of thresholds increases; a higher y-intercept means that the solution distortion at the lowest number of thresholds is already high. Thus, a global optimization algorithm is required for solving multilevel thresholding.
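The growth-rate fit can be sketched as follows. Ordinary least squares (`np.polyfit`) is used here as a simple stand-in for Matlab's robust regression `robustfit`, so in the presence of outliers the slopes would differ slightly from the paper's values.

```python
import numpy as np

def tvd_growth_rate(thresholds, tvd_values):
    """Approximate growth rate of TVD versus the number of thresholds:
    fit a line TVD = m*d + b and return (slope, intercept).  A lower
    slope m means the solution quality degrades more slowly as the
    number of thresholds grows."""
    m, b = np.polyfit(thresholds, tvd_values, deg=1)
    return m, b
```

Comparing algorithms by (m, b) captures both effects discussed above: the slope measures how fast distortion grows with d, and the intercept measures how distorted the solutions already are at the smallest d.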

Figure 2 :
Figure 2: The threshold value distortion (TVD) of algorithms versus number of thresholds.

Table 2 :
The best values of the between-class variance and thresholds of the "mountain" image produced by the multithresh function of the Matlab toolbox and by the proposed O(0.0)R-DE algorithm.The number of thresholds varies from 2 to 16.

Table 3 :
The average of mean number of generation (NG AM ) and ranks of the methods.

Table 4 :
The average success rate (SR HM ) and ranks of the methods.

Table 5 :
The average of threshold value distortion (TVD) and ranks of the methods.
Rate Comparison. The number of generations (NG) is the measure used for the convergence rate comparisons. If the target value, VTR, is achieved in a smaller number of generations, the algorithm has a faster convergence rate. Table 3 shows the average of NG (NG_AM) for each specific number of thresholds. The results of each algorithm are presented in the correspondingly named column. In each column, the cells containing NG_AM run from the row associated with d = 2 to the row associated with d = d_max. The cells associated with d = d_max + 1 to the row associated with d = 16 are filled with NA(n). The second last row of the table is filled with the triple: