A Multiple Pheromone Table Based Ant Colony Optimization for Clustering

. Ant colony optimization (ACO) is an efficient heuristic algorithm for combinatorial optimization problems, such as clustering. Because the search strategy of ACO is similar to those of other well-known heuristics, the probability of searching particular regions will be increased if better results are found and kept. Although this kind of search strategy may find a better approximate solution, it also has a high probability of losing the potential search directions. To prevent the ACO from losing too many potential search directions at the early iterations, a novel pheromone updating strategy is presented in this paper. In addition to the “original” pheromone table used to keep track of the promising information, a second pheromone table is added to the proposed algorithm to keep track of the unpromising information so as to increase the probability of searching directions worse than the current solutions. Severalwell-knownclusteringdatasetsareusedtoevaluatetheperformanceoftheproposedmethodinthispaper.Theexperimental resultsshowthattheproposedmethodcanprovidebetterresultsthanACOandotherclusteringalgorithmsintermsofquality.


Introduction
Partitional clustering is a classical NP-hard problem [1][2][3].The goal of this problem is to divide a set of patterns into several different groups based on their features.Moreover, the basic concept of clustering is simply to put similar data together.For example, consider four flowers with different colors: red, yellow, blue, and purple.They can be split up into warm color group with red and yellow flowers and cool color group with blue and purple flowers.Clustering is important because technologies relevant to it can be applied to a large number of practical applications, such as bioinformation [4,5], document analysis [6,7], human face recognition [8,9], and search engine [10][11][12].
Jain and Dubes [1] divided the clustering process into several steps: pattern representation, definition of a pattern proximity measure appropriate for the data domain, clustering or grouping, data abstraction, and assessment of output.Pattern representation refers to the number of data, groups, and other features available to the clustering algorithm.Pattern proximity uses objective function (or fitness function) to measure the difference of patterns.Two kinds of measurements are usually used for evaluating the results of clustering.The first kind considers the degree of closeness between patterns.For instance, one of the well-known measurements is to calculate the sum of distances between patterns and centroids to which the patterns belong.The second kind considers the difference among the groups.One widely used measurement is to compute the sum of distances among centroids.A large sum implies that groups are quite different from each other.
Since clustering technologies can be used in our daily life, a large number of search methods were presented to solve the partitional clustering problem of which -means [13,14] is one of the most widely used, for it is simple and easy to implement.Initially, -means generates a set of centroids, the number of which is predefined.Then, the patterns will be assigned to the nearest groups (centroids).At this step, patterns assigned to the same group are regarded as in the same group.After the assignment process, -means will then recalculate the centroid of each group by averaging the positions of patterns in that group.The assignment and update processes will be repeated until the positions of all centroids do not change or the changes are less than a predefined threshold.Although -means are simple and easy to implement, the result is extremely sensitive to the initial solution.Another drawback is that -means is easily falling into local optima, and there is simply no way to escape from the local optima.
Therefore, many researches attempt to solve the clustering problem using different metaheuristic algorithms.Tabu search (TS) [15] uses a tabu list to keep track of solutions that have been tried recently so as to avoid searching for the same solutions repeatedly in the near future.Genetic algorithm (GA) [16,17] uses the crossover and mutation operations to share information available between chromosomes to enlarge the search region.Recently, another promising research trend has been using particle swarm optimization (PSO) [18,19] that takes into account both the population experience and the individual experience to obtain a better clustering result than traditional clustering algorithms do.Another swarm intelligence, ant colony optimization (ACO) [20], has also been applied to the clustering problem, by having the ants put pheromone on the paths they passed through so as to share the information learned by each ant.This strategy can also lead the algorithm to find good solutions.
Generally speaking, ACO is a powerful search method, especially for combinatorial optimization problems, because all the search information can be shared and accumulated on the pheromone table.However, for some optimization problems, ACO also suffered from the problem of falling into local optima at early iterations even though it has a better chance to select different search directions.Also, paths with higher pheromone concentration do not always lead to the final goal.Hence, we present in this paper an ACO-based clustering algorithm, called multiple pheromone table based ant colony optimization (MPTACO), to solve this problem (this is basically an extended version of our previous study [21]).A second pheromone table is added to the ACO to keep track of the unpromising information so that the ACO can try to find better clustering results by using this unpromising information.To evaluate its performance, the proposed method is compared with four state-of-the-art algorithms: standard -means [14], genetic -means algorithm [16], ant colony system (ACS) [22], and ACODPT [21].
The remainder of the paper is organized as follows.Section 2 gives a brief introduction to the ACO-based algorithms and improvement on them.Section 3 describes in detail the proposed algorithm.Performance evaluation of the proposed algorithm is presented in Section 4, and the conclusion is drawn in Section 5.
for the other ants to find the food.The ants tend to select the path with higher pheromone concentration.For this reason, a path with higher pheromone concentration will lead more ants to go through the path, implying that the path has a larger chance to find food.
Algorithm 1 gives an outline of the ACO algorithm.In the solution construction process, the ants need to decide which paths to take.To realize this concept, the probability of choosing (, ) as the next path is defined as follows: where   () denotes the set of candidates which ant  can reach from the current position ; (, ) is the pheromone value on the path (, ); (, ) is the inverse of the distance of the path (, );  and  are the weight of exploration and exploitation, respectively.The operator for updating the pheromone concentration (, ) on each path (, ) is defined as follows: where  denotes the pheromone decay which is a value in the range (0, 1);  is the number of ants;   is the length of the tour created by ant .
In a later research [22], Dorigo et al. presented another algorithm called ant colony system (ACS) which is an extended version of [23].ACS enhances the convergence of the original ACO.The algorithm gives ants a certain chance to directly select the path with the highest pheromone concentration as follows: where  denotes a random value and  0 is a threshold for determining the strategy to be used.
The first strategy selects the best next partial solution in the case  ≤  0 while the second strategy selects the next partial solution S determined by the following probability distribution: For this reason, the ant will not always check the probability of each path.Therefore, ACS can speed up the convergence process, thus reducing some of the computation time.

Ant Colony Optimization for
Clustering and Improvements.Shelokar et al. [24] presented an ACO-based clustering algorithm which uses pheromone trails.The main idea is transferring the meaning of information in the solutions.Different from the original version of ACO for the traveling salesman problem (TSP) in which the solutions record the order of the cities the salesman travels, for the clustering problem, the solutions [24] record the cluster numbers to which the patterns are assigned, by using an  by  pheromone matrix where  is the number of nodes and  is the number of centroids.The matrix records the pheromone concentration on the path between each pattern and each centroid.As illustrated in Figure 1, unlike the pheromone table which is represented by a matrix, each solution is encoded as a onedimensional array, the indices of which correspond to the pattern number while the value of each element corresponds to the cluster to which that pattern is assigned.The example shows that pattern 1 is assigned to centroid 3, pattern 2 is assigned to centroid 5, pattern 3 is assigned to centroid 8, and so on.By keeping track of to which cluster each pattern is assigned as illustrated in Figure 1, ACO can be applied to the clustering problem.Moreover, the local search method is also added to speed up the convergence.
In [25], different strategies are added to the ACO to decide the solution.One of the strategies is that ants only check part of the paths instead of all the paths.Another strategy is using a threshold of pheromone to split patterns into different groups and checking the groups at the end.If only a few patterns are in a group and the centroid of that group is close to the centroid of another group, the two groups are merged.In [26], several additional steps are used in the ACO.First, the root cluster, which contains all the patterns, is split into several smaller clusters, and each pattern is assigned to a suitable cluster.Then, clusters which are close to each other are merged.Finally, dissimilar patterns are removed from their clusters, and new clusters for these patterns are constructed.
More recently, Tiwari et al. [27] used two strategies to enhance the performance of ACO.One is setting the current value of pheromone to the initial value once every 50 iterations while the other is setting the pheromones on all paths to the initial value if the pheromones on the paths remain intact for the past 10 iterations in a row.Akarsu and Karahoca [28] used sequential backward selection (SBS) to reduce the dimensions of patterns.They use Manhattan distance as the fitness function and consider the pheromone only when the ants select paths.Jiang et al. [29] split the structure of solutions into two parts.The first part is used to decide which patterns will be included in the following computation.The second part is used to record the relationship between patterns and clusters.Some researches are focused on selfadaptive clustering.For instance, Liu and Fu [30] presented a method which uses Jaccard index to decide a suitable cluster number.Niknam and Amiri [31] presented a method which combines fuzzy adaptive particle swarm optimization (FAPSO), ACO, and -means algorithm.The experiment results show that these methods get a better performance.

Local Minima Problem of Clustering Algorithm.
Since the clustering problem is a traditional optimization problem, a large number of clustering algorithms have been presented for several years [1].As we mentioned in Section 1, -means is one of the most well-known clustering algorithms, but its search process may fall into local minima because it is very sensitive to the randomly generated initial solution and it does not have any mechanism to escape the local optima [32].Metaheuristic algorithms provide an alternative way to prevent the search process from falling into local minima.One of them is the simulated annealing [33] which has a small chance to accept a solution that is worse than the current solution as the new search direction.In [34], tabu search used the so-called tabu list to avoid the search process from falling into local minima.Different from simulated annealing and tabu search, genetic algorithm (GA) [35] used multiple search directions to keep the search direction from falling into local minima or searching the same region repeatedly.Also because GA has a mutation operator, a small percentage of subsolutions can be disturbed.This implies that GA has a chance to escape the local minima.Particle swarm optimization [36] and ant colony optimization [23] also used multiple search directions to search for the solution; they will not easily fall into local minima at early iterations.In summary, most clustering algorithms will confront the dilemma of local minima; therefore, how to mitigate this problem has become an active research topic in recent years.Using the metaheuristic algorithm alone and combining it with traditional clustering algorithms are two promising solutions to keep the search process from falling into local minima at early iterations [3].

The Proposed Method
In this section, we will describe in detail the proposed algorithm MPTACO in three steps: first the concept, then the proposed algorithm, and finally an example.

Concept.
In general, most ACO-based algorithms, even most metaheuristic algorithms, focus on the positive information.For example, for the ACO, ants move along the trail with higher pheromone; for GA, fitter chromosomes survive; for PSO, particles move toward the best particle.However, searching the so-called good directions does not always lead to good results.In fact, the trajectory of search is tortuous when dealing with complex problems.To prevent the ACO from choosing only paths that have high pheromone values and remove paths that have low pheromone values so as to increase the search diversity on the convergence process, the proposed algorithm adds a second pheromone table to keep track of negative information.The consequence is that the proposed algorithm has a better opportunity to select worse partial solutions during the search process so as to find solutions better than those the original ACO can find.As we mentioned in Section 2.3, although most metaheuristic algorithms have mechanisms to avoid their search process from falling into local minima at early iterations, they do not guarantee that their search process will not fall into local minima.Because the search direction of metaheuristic algorithms typically is toward a result that is better in terms of the fitness value (i.e., the value returned by a fitness or objective function), if the algorithm cannot find a better result for a long while before it finds the global optima, the search process might get stuck at a particular position or region.This phenomenon is called falling into local optima.Most of the metaheuristic algorithms have some strategies to escape the local optima.
According to our observation, the search process of ACO will move toward particular solutions (or paths), because the pheromone values associated with these solutions will be strengthened if they are better than the other solutions ACO finds.One of the reasons is that ACO keeps only the promising solutions.To avoid the search process from moving toward particular search directions, a different way to construct the solutions is considered in this paper, that is, taking into account the information of unpromising solutions ACO finds.As shown in Figure 2,   denotes the th pattern,   is the th centroid, and a solid line is the assignment of a pattern to a centroid.This example shows that pattern  1 is assigned to centroid  3 , pattern  2 is assigned to centroid  1 , and pattern  3 is assigned to centroid  2 .The ant picks first path ( 1 ,  3 ) and then path ( 2 ,  1 ).Afterwards, the proposed algorithm eliminates the path ( 3 ,  3 ), based on values in the negative pheromone table and the selection strategy, and then selects path ( 3 ,  2 ).The dashed line denotes the virtual path, which is used to show that the ant moves continuously.In the following subsection, we will discuss briefly the concept first and then explain how the ants choose paths.

Algorithm.
As shown in Figure 3, the major difference between the proposed method and the ACO is in that MPTACO employs two tables, which we refer to as the positive pheromone table and the negative pheromone table, in the transition phase to decide which path an ant is supposed to take.The positive pheromone table is the only pheromone table in ACO, which is used to select paths in the candidate list, whereas the negative pheromone table is used by MPTACO to eliminate paths from the candidate list.3.2.1.Initialization.In the initialization phase, that is, initially, the value of each path in the positive pheromone table is set equal to  0 while the value of each path in the negative pheromone is set equal to 1.Moreover, since the negative pheromone table is used to keep track of the probability of selection, all the paths will be in the candidate list at the very beginning.For each ant, the initial partial solution is set randomly.In other words, each pattern is assigned to a cluster randomly, and then the centroids are calculated.The proposed method represents solutions in such a way that it is the same as described in [24], that is, to which centroid each pattern belongs.Instead of binary encoding, integer encoding is used for two reasons: (1) it is easier to understand the meaning of the solutions, and (2) it is shorter to integer encode the solutions.But, on the other side, integer encoding usually takes more memory space than binary encoding in storing the information.

Transition.
This phase decides the paths in which ants move.In other words, this phase decides the relationship between patterns and centroids.Two kinds of strategies are used for the transition of the proposed algorithm.Equation (5) shows how the decision is made in the proposed algorithm for the transition phase: where   is the number used to decide which strategy to choose, (, ) and (, ) are the same as defined in ACS,  is the weight to control the influence of distance, and  is a random variable that is used to select the next partial solution based on the following probability distribution: with   (, ) representing the probability of pattern  of ant  belonging to centroid  in the next partial solution,    the set of paths which ant  can reach from pattern , and    , generated by (7), the set of paths which will not be selected:  where   is a random number uniformly distributed in the range of 0 to 1.It will be regenerated every time the algorithm checks the negative pheromone.Like ACS, the ants have some chances to select the path with the highest positive pheromone concentration.The negative pheromone used in the second strategy is the probability to determine whether a path will be added to the candidate list or not.Then, the ants will select the paths in terms of the positive pheromone from the candidate list.This implies that the proposed algorithm has an opportunity to eliminate the paths which are really bad.The remaining paths will be chosen with a higher probability.Moreover, the main difference between MPTACO and ACODPT [21] is in the transition phases of them.For ACODPT, three search strategies are used by each ant.For MPTACO, the search strategies for each ant have been simplified to two, and   , which is defined as is used to increase the search diversity at early iterations of the proposed algorithm.Figure 4 gives a simple example to illustrate how the transition phase works.First, the example shows that a number of paths can be selected, each of which is associated with a pair of numbers denoted by / where  denotes the positive pheromone and  is the negative pheromone.For instance, the positive pheromone of path 1 is 10.0 while the negative pheromone is 1.0.The first strategy will select the path with the most positive pheromone, so path 1 is selected.The second strategy assumes that path 3 has been eliminated so that the probability of choosing the remaining paths increases.
Here we assume that path 4 is selected in this example.
In general, the proposed algorithm will search more broadly at early iterations and more deeply at later iterations.For this reason, the proposed algorithm has a larger chance to select the first strategy at early iterations and the second strategy at later iterations.Which strategy is taken depends on the threshold   defined above.

Means Update.
The proposed algorithm will then update the positions of centroids by using the current partial solution for each ant as follows: where   denotes the th pattern of th centroid and   is the number of patterns belonging to centroid .

Local Pheromone
Update.Same as described in [23], the positive pheromone table will be updated by using the following equation: where (, ) denotes the pheromone value associated with the path (, ),  is the evaporation degree, and  0 is the initial pheromone on each path.This phase updates the positive pheromone on the paths which are taken by the ants.If a path is taken by more ants, the positive pheromone on the path will be higher.

Global Pheromone
Update.This phase will update both the positive and the negative pheromone tables.The algorithm will choose the best ant so far to update the positive pheromone table.The positive pheromone on the paths passed through by the best ant will be enhanced.The best ant means the ant with the best fitness value.For example, if the sum of squared errors (SSE) is used as the fitness value, the ant with the smallest SSE will be the best.The pheromone update on all the paths (, ) of the best ant is defined as where  denotes the influence degree of the best ant and  best is the total distance which the best ant has moved.As for the updating of negative pheromone table, the proposed algorithm will select the worst ant so far to update the negative pheromone on the paths through which it passed.Same as the best ant, the worst ant means the ant with the worst fitness value.For every path (, ) the worst ant has passed through, the negative pheromone will be updated as follows: where  is a constant in the range [0, 1], indicating the evaporation rate of negative pheromone.Besides, the proposed algorithm will also update the negative pheromone on all the paths (, ) which are passed through by the best ant so far as follows: Because   (, ) is used as a probability to decide whether the path (, ) should be put into the candidate list or not, the value of   (, ) has to be in the range [0, 1].That is, if   (, ) is larger than 1, it will be set to 1.A larger negative pheromone of the path (, ) means that the path (, ) has a higher probability of being put into the candidate list.The path passed through by the best ant means that it may be the best choice of the final solution.So the selection probability of the path will increase.

Termination Criterion.
The termination criteria can vary.For instance, the search process can be terminated if the total distance of any ant is shorter than a threshold or if the difference between ants is less than a threshold.In this paper, the search process will be terminated if the number of iterations has reached a predefined number.

Summary.
In ACO, only one kind of pheromone is used.All the ants use the same kind of pheromone to decide the search directions.However, using only one kind of pheromone to decide the search directions may neglect some search directions which have the potential to lead to better results, and it may also lead the search process to fall into local optima easily.For this reason, ACODPT adds the second kind of pheromone, which is referred to as the negative pheromone, to improve the quality of the end result.The negative pheromone is used to eliminate the search directions, thus increasing the chance the remaining search directions are selected.This way, the proposed method can have a higher probability to search for all the potential directions.Compared to ACODPT and MPTACO, the number of decision strategies in the transition operator of ACODPT is three, but it has been reduced to two in MPTACO.More important, the transition operator of MPTACO can select the decision strategies automatically according to the search time instead of the probability.
Comparing to the hill-climbing method, ACO, ACODPT, and MPTACO do not always search for better solutions.The selection strategies give them chances to select worse directions.If the algorithm always accepts a better solution just like the hill-climbing method does, it will fall into local optima at early iterations because it does not search for other possibilities in a large search space.Moreover, the pheromone tables and the corresponding strategies make the proposed method capable of mitigating the problem of falling into local optima quickly.The datasets used in the experiments are taken from UCI [37], the details of which are as described in Table 1.

Simulation Results.
To evaluate the performance of MPTACO, we compare it with four state-of-the-art clustering algorithms: standard -means [14], genetic -means algorithm (GKA) [16], ant colony system (ACS) [22], and ACODPT [21].The reasons that these algorithms are chosen for the purpose of comparison are in order.Standard -means is a simple, easy to implement, and widely used algorithm for the clustering problem.It is a single-solution-based algorithm and has the characteristic of not being able to escape the local optima.For this reason, -means can be used to compare the difference when using strategies that can escape the local optima.GKA for clustering leverages the strength of genetic algorithm (GA) and -means; thus, it is not only capable of The parameter settings of each algorithm are as follows.Except -means, the population sizes of GKA, ACS, and MPTACO are fixed at 30.For the ACS and MPTACO, the population size is the number of ants.For the GKA, the crossover operator is replaced by a 2-iteration -means operator, the mutation probability is 0.01, and the number of generations is 1,000.For the ACS and MPTACO,  is set to 0.1,  to 11.0,  to 0.1, and  to 0.01.For the ACS,  0 is set to 1.All the algorithms are run for 30 times, and the averages are taken as the experimental results.Tables 2 and 3 show, respectively, the quality and the computation time.The numbers in bold indicate which algorithm gives the best result for the dataset evaluated.The quality is measured by the sum of squared errors (SSE) defined as and the numbers after ± are the standard deviation.This implies that the smaller the value, the better the result.Table 2 shows that the proposed method outperforms the other algorithms evaluated in most cases in terms of the quality.-means gets the worst results for all datasets because it has no strategy to escape the local optima.As for GKA, it combines GA and -means to get rid of the weakness of means.This is why GKA can find a better result, even the best result for some of the datasets.ACS has a strong global search capability, as is reflected in complex datasets like abalone.Nevertheless, the local search capability of ACS is not as powerful as that of GKA.As such, for some simple datasets, the result is worse than that of GKA.The proposed method beats ACS not only for complex datasets but also for simple datasets.For some datasets, the proposed method provides a worse result than that of GKA and ACODPT.This can be easily justified by observing that the proposed method lacks the local search strategy to fine-tune the result.
Table 3 shows the computation time of each algorithm.Undoubtedly, -means takes the least time because it only searches toward the best direction.The performance of the proposed method is the worst in terms of the computation time.This is basically a trade-off between quality and computation time.For applications the computation time of which is not at a premium, but the quality is a major concern, then the proposed algorithm is a good choice.
Besides the previous experiments, we are also interested in the scales of the values in the datasets.It is obvious that attributes of datasets may influence the clustering result because of the different scales of the values in the datasets.Table 4 shows the results after the attributes of all the datasets are normalized to be in the range [0, 1].As can be easily seen from the results, the best value of some datasets has been changed.This implies that the scale of attributes can influence the clustering result.

Impact of Parameters.
In this experiment, we test the influence of the parameter , which is used to control the proportion between positive pheromone and distance.Figure 5 shows the results.The four datasets tested are abalone, yeast, normalized abalone, and normalized yeast.The table shows that a larger  implies a better quality.This means that the distance has a stronger influence on the result than the positive pheromone.This is because SSE is used as the fitness function, which considers only distances between patterns  and centroids.If a different fitness function is used, the result may be different.
in ( 8) is used to automatically decide the search strategy according to the total number of iterations and the current iteration number.At the beginning of the search procedure, the diversification is respected to discover more regions   (, ) = 1 (7) } (8) While (the termination criterion is not met) { (9) //Transition (10) Foreachant ( 11) Foreachpattern (12) Assign  to centroid  using ( 5), (6), and (7) (13) //Means update (14) Foreachant (15) For each centroid  (16) Update the position of  using (9) (17) //Local pheromone update (18) Foreachant (19) For each pair of pattern  and centroid  (20) I f( , ) is selected by ant  (21) U p d a t e (, ) using (10) (22) //Golbal pheromone update (23) For each pair of pattern  and centroid  { (24) I f( , ) is selected by the best ant (25) U p d a t e(, ) and   (, ) using ( 11) and ( 13) (26) I f( , ) is selected by the worst ant (27) U p d a t e  (, ) using ( 12 that may have better local optima.The algorithm will have a higher chance to choose the search direction which is not the best at the moment.As the time passes by, the search procedure will focus on the intensification.In the end of the search procedure, the algorithm tries to converge to the local optima of the regions.In other words, this strategy makes the algorithm search widely at the beginning but deeply in the end.

Conclusions and Future Works
In this paper, we proposed an efficient algorithm, called MPTACO, to enhance the quality of the ACO-based algorithms for clustering.The idea is adding a second table to record the negative pheromone and to eliminate the most impossible paths so as to increase the probability of choosing worse paths in the table.It can prevent the ants from falling into local optima again and again.In addition, MPTACO can automatically adjust the convergence process according to the number of iterations carried out so far.The experimental results show that the proposed method can get a better result in most cases, especially for large-scale and complex datasets.Our future goal is to reduce the computation time of the proposed method while at the same time enhancing its quality.Also, another concern is how to use the negative pheromone table more efficiently.

Pseudocode of MPTACO
Pseudocode 1 gives the pseudocode of the proposed algorithm MPTACO.As the pseudocode shows, the positive and negative pheromones of each path will be initialized to  0 and 1, respectively, before the main loop is performed repeatedly until the termination criterion is met, as follows.First, all the ants are transited to the new states in the transition phase, as lines (10)- (12) show.Next, in the means update phase, MPTACO will update the centroids of each ant, as lines ( 14)- (16) show.Then, in the local pheromone update phase, the positive pheromone values will be updated, as lines ( 18)- (21) show.Finally, in the global pheromone update phase, the best and worst ants will be found and the positive and negative pheromone values on the paths they passed through will be updated, as lines ( 23)-( 28) depict.Once the termination criterion is met, MPTACO will output the best result (line (30)) and terminate the search process.

Figure 4 :
Figure 4: Example for illustrating the transition.
Datasets.The experiments are conducted on a PC with 2.67 GHz Intel Core i7-920 CPU and 4 GB of memory running Fedora 12 with Linux 2.6.31.5-127.Also, the programs are written in C and compiled by gcc version 4.4.220091027 (Red Hat 4.4.2-7).

Figure 5 :
Figure 5: The experiment result of .

Table 4 :
Quality for normalized datasets.