A Clustering Approach Using Cooperative Artiﬁcial Bee Colony Algorithm

. Artiﬁcial Bee Colony (cid:2) ABC (cid:3) is one of the most recently introduced algorithms based on the intelligent foraging behavior of a honey bee swarm. This paper presents an extended ABC algorithm, namely, the Cooperative Article Bee Colony (cid:2) CABC (cid:3) , which signiﬁcantly improves the original ABC in solving complex optimization problems. Clustering is a popular data analysis and data mining technique; therefore, the CABC could be used for solving clustering problems. In this work, ﬁrst the CABC algorithm is used for optimizing six widely used benchmark functions and the comparative results produced by ABC, Particle Swarm Optimization (cid:2) PSO (cid:3) , and its cooperative version (cid:2) CPSO (cid:3) are studied. Second, the CABC algorithm is used for data clustering on several benchmark data sets. The performance of CABC algorithm is compared with PSO, CPSO, and ABC algorithms on clustering problems. The simulation results show that the proposed CABC outperforms the other three algorithms in terms of accuracy, robustness, and convergence speed.


Introduction
Swarm Intelligence SI is an innovative artificial intelligence technique for solving complex optimization problems.In recently years, many SI algorithms have been proposed, such as Ant Colony Optimization ACO 1 , Particle Swarm Algorithm PSO 2 , and Bacterial Foraging Optimization BFO 3 .Artificial Bee Colony ABC algorithm is a new swarm intelligent algorithm that was first introduced by Karaboga in Erciyes University of Turkey in 2005 4 , and the performance of ABC is analyzed in 2007 5 .The ABC algorithm imitates the behaviors of real bees in finding food sources and sharing the information with other bees.Since ABC algorithm is simple in concept, easy to implement, and has fewer control parameters, it has been widely used in many fields.Until now, ABC has been applied successfully to some engineering problems, such as constrained optimization problems 6 , neural networks 7 , and clustering 8 .However, like other stochastic optimization algorithms, such as PSO and Genetic Algorithm GA , as the dimensionality of the search space increases, ABC algorithm possesses a poor convergence behavior.Cooperative search is one of the solutions to this problem, which has been extensively studied in the past decade.Potter proposed cooperative coevolutionary genetic algorithm CCGA 9 , Van den Bergh and Engelbrecht proposed cooperative particle swarm optimizer, called CPSO 10 , and Chen et al. proposed cooperative bacterial foraging optimization 11 .This paper applies Potter's cooperative search technique to the ABC, resulting in a new cooperative ABC algorithm, namely CABC.In order to evaluate the performance of the CABC, we compared the performance of the CABC algorithm with that of ABC, PSO, and CPSO on a set of well-known benchmark functions such as Rosenbrock, Griewank, Rastrigin, Ackley, and Schwefel.From the simulation results, the CABC algorithm shows remarked performance improvement over the other algorithms in all benchmark functions.
Data clustering is the process of grouping data into a number of clusters.The goal of data clustering is to make the data in the same cluster share a high degree of similarity while being very dissimilar to data from other clusters.Clustering algorithms have been applied to a wide range of problems, such as data mining 12 , data analysis, pattern recognition 13 , and image segmentation 14 .Clustering algorithms can be simply classified as hierarchical clustering and partitional clustering 15 .This paper mainly focuses on partitional clustering.Partitional clustering algorithm divides data vectors into a predefined number of clusters by optimizing some certain criterion.The most popular partitional clustering algorithm is Kmeans.In the past three decades, K-means clustering algorithm has been used in various domains.However, K-means algorithm is sensitive to the initial states and always converges to the local optimum solution.In order to overcome this problem, many methods have been proposed, such as Zhang and hsu have introduced K-harmonic means KHM  In this paper, a CABC algorithm is applied to solve the clustering problem, which has been tested on a variety of data sets.The performance of the CABC on clustering is compared with results of the ABC, PSO, and CPSO algorithms on the same data sets.The above data sets are provided from the UCI database 23 .
The rest of the paper is organized as follows.In Section 2 we will introduce the original ABC algorithm.Section 3 will discuss cooperative search methods, and our cooperative implementations of the ABC algorithm will be presented.The details of CABC algorithm will be given in this section.Section 4 tests the algorithms on the benchmarks, and the results obtained are presented and discussed.In Section 5, the cluster analysis problem is discussed.The application of CABC algorithm on clustering is shown in Section 6, and the performance of CABC algorithm is compared with PSO, CPSO, and ABC algorithms on clustering problem in this section.Finally, conclusions are given in Section 7.

The Original ABC Algorithm
The artificial bee colony algorithm is a new population-based metaheuristic approach, initially proposed by Karaboga 4,5 and further developed by Karaboga and Basturk 6, 7 .It has been used in various complex problems.The algorithm simulates the intelligent foraging behavior of honey bee swarms.The algorithm is very simple and robust.In the ABC algorithm, the colony of artificial bees is classified into three categories: employed bees, onlookers, and scouts.Employed bees are associated with a particular food source that they are currently exploiting or are "employed" at.They carry with them information about this particular source and share the information to onlookers.Onlooker bees are those bees that are waiting on the dance area in the hive for the information to be shared by the employed bees about their food sources, and then make decision to choose a food source.A bee carrying out random search is called a scout.In the ABC algorithm, the first half of the colony consists of the employed artificial bees and the second half includes the onlookers.For every food source, there is only one employed bee.In other words, the number of employed bees is equal to the number of food sources around the hive.The employed bee whose food source has been exhausted by the bees becomes a scout.The position of a food source represents a possible solution to the optimization problem and the nectar amount of a food source corresponds to the quality fitness of the associated solution represented by that food source.Onlookers are placed on the food sources by using a probability-based selection process.As the nectar amount of a food source increases, the probability value with which the food source is preferred by onlookers increases, too 4, 5 .The main steps of the algorithm are given below.
In the initialization phase, the ABC algorithm generates a randomly distributed initial food source positions of SN solutions, where SN denotes the size of employed bees or onlooker bees.Each solution x i i 1, 2, . . ., SN is a D-dimensional vector.Here, D is the number of optimization parameters.And then evaluate each nectar amount fit i .In the ABC algorithm, nectar amount is the value of benchmark function.
In the employed bees' phase, each employed bee finds a new food source v i in the neighborhood of its current source x i .The new food source is calculated using the following expression: where k ∈ 1, 2, . . ., SN and j ∈ 1, 2, . . ., D are randomly chosen indexes, and k / i. φ ij is a random number between −1, 1 .And then employed bee compares the new one against the current solution and memorizes the better one by means of a greedy selection mechanism.
In the onlooker bees' phase, each onlooker chooses a food source with a probability which is related to the nectar amount fitness of a food source shared by employed bees.Probability is calculated using the following expression:

2.2
In the scout bee phase, if a food source can not be improved through a predetermined cycles, called "limit", it is removed from the population, and the employed bee of that food source becomes scout.The scout bee finds a new random food source position using the

Cooperative ABC
In the ABC algorithm, the goal of each individual bee is to produce the best solution.From expression 2.1 , we can see that the new food source is produced by a random neighborhood of current food position and a random single dimension of D-dimensional vector.This will bring about a problem that an individual may have discovered a good dimension, but the fitness of the individual is computed by using D-dimensional vector, hence we know it is very probable that the individual is not the best solution in the end, and the good dimension which the individual has found will be abandoned.For example, there are two vectors x 1 10, 8 and x 2 8, 10 , and the fitness function f x x 2 , where the global minimizer of the fitness function is 0, 0 .Set x 1 and x 2 as the current positions.And x 1 8, 6 and x 2 4, 12 represent the final positions which x 1 and x 2 can find.We can see that f x 1 100 is smaller than f x 2 160, so we select x 1 as the solution of the function.However, we notice that the first dimension of x 2 is 4 which is closer to the global minimizer than the first dimension of x 1 .Because x 2 is not the best solution, the first dimension of x 2 is not selected.If we can save the good dimension, the final solution will become x 1 4, 6 and the quality of solution will be significantly improved.Therefore, we hope to find every best dimension in all individuals.We need each individual's contribution to the best solution.
The same problem is met in the GA and PSO algorithms.Cooperative search technique has been applied to GA and PSO 9, 10 .To produce a good solution vector, all the populations must cooperate.And the information from all the populations need to be used.Therefore, we apply cooperative search to solve the problem in the ABC algorithm and propose the Cooperative ABC algorithm.In the CABC algorithm, we set a super best solution vector, namely, gbest and its each component of D-dimensional is the best in all populations.For gbest: g 1 , g 2 , . . ., g i , . . ., g D g i corresponds to the ith component of the gbest.The fitness function is represented by f. in algorithm 1 The main steps of CABC algorithm are given.
In the initialization phase, we evalute the fitness of the initial food source positions and set the position which has the best fitness as the initial gbest.
In the employed bees' and onlooker bees' phase, we use the j component of each individual to replace the corresponding component of the gbest to find the best position of the j component.Gbest do not influence the employed and onlooker bees finding new food sources.It is a virtual bee.It just saves the best one of each component.After all phases, the best solution achieved by all individuals and the gbest will be compared.

Benchmark Functions
In order to compare the performance of the proposed CABC algorithm with ABC, PSO, and CPSO, we used six well-known benchmark functions.One of the benchmark functions is unimodal and the others have a number of local minima 25 .
The first function is Sphere function and its global minimum value is 0 at 0, 0, . . ., 0 .Initialization range for the function is −5.12, 5.12 .It is a unimodal function and easy to solve.The Sphere function is defined as The second function is Rosenbrock function and its global minimum value is 0 at 1, 1, . . ., 1 .Initialization range for the function is −15, 15 .It is a multimodal function.It has a narrow valley near the global optimum and it is difficult to converge the global optimum.The Rosenbrock function is defined as The third function is Griewank function and its global minimum value is 0 at 0, 0, . . ., 0 .Initialization range for the function is −600, 600 .This function has a product term, introducing an interdependency between the variables, thereby making it difficult to reach the global optimum.The Griewank function is defined as The fourth function is Rastrigin function and its global minimum value is 0 at 0, 0, . . ., 0 .Initialization range for the function is −15, 15 .It is a complex multimodal function, and it has a large number of local optima.Thereby it is easy for optimization algorithm to trap in local optimum.The Rastrigin function is defined as The fifth function is Ackley function and its global minimum value is 0 at 0, 0, . . ., 0 .Initialization range for the function is −32.768,32.768 .This function has one narrow global optimum.The second exponential term that covers its surface with many local minima.The Ackley function is defined as The sixth function is Schwefel function and its global minimum value is 0 at 420.9867, 420.9867, . . ., 420.9867 .Initialization range for the function is −500, 500 .The Schwefel function is very complex.There are a lot of peaks and valleys on its surface.There is a second best minimum far from the global minimum and many algorithms fall into it:

Parameter Settings for the Involved Algorithms
In the experiment all functions are tested on thirty dimensions, and the population size of all algorithms was 100.The PSO algorithm we used is the standard PSO.In PSO and CPSO algorithm, inertia weight ω varies from 0.9 to 0.7 linearly with the iterations and the acceleration factors c 1 and c 2 being both 2.0 26 .In order to compare the different algorithms, a fair time measure must be selected.It was, therefore, decided to use the number of function evaluations FEs as a time measure 10 .Thereby FEs in a time measure is our termination criterion.

Simulation Results for Benchmark Functions
The experimental results, including the best, worst, average, and standard deviation of the function values found in 30 runs are proposed in Table 1 and all algorithms were terminated after 100,000 function evaluations.
From Table 1, the CABC algorithm is better than the other algorithms on four benchmark functions Sphere, Rastrigin, Ackley, Schwefel while the ABC algorithm shows better performance than the other algorithms on two benchmark functions Rosenbrock, Griewank .The PSO converges very slowly and its performance is very bad on four benchmark functions Griewank, Rastrigin, Ackley, Schwefel , as can be seen in Figure 1.
On Sphere function, all algorithms perform very well.However, Table 1 shows that the performance of CABC is much better than the others'.The speed of convergence of CABC is much faster, as can be seen in Figure 1 a .On Rosenbrock function, the ABC algorithm shows better performance in terms of values of average, best, and standard deviation than the other methods.The CABC algorithm is a little worse than the ABC.We can see that the CABC, PSO, and CPSO algorithm converge fast at first, but they become trapped in a local minimum very soon, as can be seen in Figure 1 b .
On Griewank function, the CABC algorithm performs much worse than ABC, but it is better than CPSO and PSO algorithms.Table 1 shows that the best value of the fitness function for CABC algorithm is 0 much smaller than this of the other four methods.Nevertheless, CABC algorithm is easyly trapped at local optimum, so the average value is worse than ABC.In the initialization phase, the CABC and CPSO algorithms perform well, but their performance rapidly deteriorates when they run about 30,000 function evaluations, as can be seen in Figure 1 c .From the results of Rastrigin and Ackley, we can observe that the ability of exploiting the optimum of the CABC algorithm is very strong.The CABC is definitely better than the other four methods for these two test functions.On Rastrigin function, the CPSO is more successful in terms of all values than ABC.However, on Ackley function, ABC algorithm outperforms CPSO.We can see that PSO algorithm is easy trapped in local optimum on Rastrigin and Ackley function.From Figures 1 d and 1 e , we can observe that the CABC algorithm is able to continue improving its solution on these two functions.
On Schwefel function, it is a very complex function and there is a second best minimum that many algorithms fall into.The performance of ABC and PSO algorithms deteriorate in optimizing this function and the CPSO algorithm performs a little better than them.However, the CABC algorithm shows better performance on this, as can be seen in Figure 1 f .From Table 1, we notice that the average, best, worst, and standard deviation values of the CABC algorithm are almost the same.That means that CABC falls into this position every time.

Discrete Dynamics in Nature and Society
From the results above, it is concluded that the proposed algorithm can be efficiently used for multivariable, multimodal function optimization.For that reason, in this paper, we apply our CABC algorithm to solve clustering problem.

K-Means Algorithm
As mentioned above, the goal of data clustering is grouping data into a number of clusters and K-means algorithm is the most popular clustering algorithm.In this section, we briefly describe the K-means algorithm.Let X x 1 , x 2 , . . ., x n be a set of n data and let each data vector be a p-dimensional vector.Let C {c 1 , c 2 , . . ., c k } be a set of K clusters and K denotes the number of cluster centroids which is provided by the user.In K-means algorithm, firstly, randomly initialize the K cluster centroid vectors and then assign each data vector to the class with the closest centroid vector.In this study, we will use Euclidian metric as a distance metric.The expression is given as follows:

5.1
After all data being grouped, recalculate the cluster centroid vectors using where n j is the number of data vectors which belong to cluster j.After the above process, reassign the data to the new cluster centroids and repeat the process until a criterion is satisfied.In this study, the criterion is when the maximum number of iterations has been exceeded.To know whether the partition is good or not, a measure for partition must be defined.A popular performance function for measuring goodness of the partition is the total within-cluster variance or the total mean-square quantization error MSE 27, 28 , which is defined as follows: Because K-means algorithm is sensitive to the initial states and always converges to the local optimum solution, more population-based stochastic search algorithms are presented.In this paper, we will use CABC algorithm to solve clustering problem.

CABC Algorithm on Clustering
In the CABC algorithm, each individual represents a solution in K dimensional space.The number of dimension is equal to the number of clusters.Each component of an individual

For data vector x i
Calculate the Euclidean distance by using 5.1 Assign x i to the closest centroid cluster c j .Calculate the measure function using equation 5.3 EndFor.Return value of the fitness function.represents a cluster centroid and each cluster centroid is a p-dimensional vector.In the initialization phase, we use maximum and minimum value of each component of the data set which is to be grouped as CABC algorithm individuals' initialization range.And initial solution is randomly generated in this range.We use expression 5.3 to calculate the fitness function of individuals.Here the main steps of the fitness function are given in algorithm 3

Data Clustering Experimental Results
To evaluate performance of the proposed CABC approach for clustering, we compare the results of the K-means, PSO, CPSO, ABC, and CABC clustering algorithms using six different data sets which are selected from the UCI machine learning repository 23 .
Motorcycle data N 133, d 2, K 4 : the Motorcycle benchmark consists of a sequence of accelerometer readings through time following a simulated motorcycle crash during an experiment to determine the efficacy of crash helmets.
Iris data N 150, d 4, K 3 : this data set with 150 random samples of flowers from the iris species setosa, versicolor, and virginica collected by Anderson 1935 .From each species there are 50 observations for sepal length, sepal width, petal length, and petal width in cm.This data set was used by Fisher 1936 in his initiation of the linear-discriminantfunction technique 29 .
Wine data N 178, d 13, K 3 : this is the wine data set, which is also taken from MCI laboratory.These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars.The analysis determined the quantities of 13 constituents found in each of the three types of wines.There are 178 instances with 13 numeric attributes in wine data set.All attributes are continuous.There is no missing attribute value 29 .
Contraceptive Method Choice N 1473, d 10, K 3 : this data set is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey.The samples are married women who were either not pregnant or do not know if they were at the time of interview.The problem is to predict the current contraceptive method choice no use, longterm methods, or short-term methods of a woman based on her demographic and socioeconomic characteristics 29 .
Wisconsin breast cancer N 683, d 9, K 2 , which consists of 683 objects characterized by nine features: clump thickness, cell size uniformity, cell shape uniformity, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses.There are two categories in the data: malignant 444 objects and benign 239 objects 29 .For every data set, each algorithm is applied 30 times individually with random initial solution.The parameters of all algorithms are set like Section 4. Table 2 summarizes the intracluster distances, as defined in 5.3 , obtained from all algorithms for the data sets above.The average, best, and worst solution of fitness from the 30 simulations, and standard deviation are presented in Table 2. Figure 2 shows the search progresses of the average values found by four SI algorithms over 30 runs for six data sets.Figure 3 shows the original data distribution of MotorCycle and Iris data sets and the clustering result by CABC algorithm.
From the values in Table 2, we can conclude that the results obtained by CABC are clearly better than the other algorithms for all data sets; CPSO is a little better than PSO; the K-means is the worst for all data sets.
For MotorCycle data set, the optimum of the fitness function for all algorithms, except K-means, is 2.0606e 003.That means they all can found the global solution.However, the standard deviations for them are 3.2158e −001, 1.9118e 001, 4.6354e 001, and 1.3340e 001, respectively.From the standard deviation, we can see that the CABC algorithm is better than the other methods.
For Iris data set, CABC and ABC provide the optimum value and small standard deviation in compare to those of obtained by the other methods.The average values of the fitness function for CABC and ABC are 9.4603e 001 and 9.4607e 001, respectively; the standard deviations for CABC and ABC algorithms are less than 1.That means CABC and ABC converge to the global optimum most of the times.For the CABC, note that I adopted

Conclusion
In this paper, based on the cooperative approaches, a novel Article Bee Colony ABC algorithm is presented, namely, Cooperative Article Bee Colony CABC .Because of cooperation, the final solution is produced using information from all the populations.This results in a significant improvement in the performance in terms of solution quality and convergence speed.In order to demonstrate the performance of the CABC algorithm, we compared the performance of the CABC with those of ABC, PSO, and CPSO optimization algorithms on several benchmark functions.From the simulation results, it is concluded that the proposed algorithm has the ability to attain the global optimum; moreover, the CABC definitely outperforms the original ABC.
Because the CABC algorithm can be efficiently used for multivariable, multimodal function optimization, we apply it to solve clustering problems.The algorithm has been tested on several well-known real data sets.To evaluate the performance of the CABC algorithm on clustering problems, we compare it with the original ABC, PSO, CPSO, and kmeans.From the experimental results, we can see that the proposed optimization algorithm is better than the other algorithms in terms of average value and standard deviations of fitness function.Additionally, the simulation result illustrates that the proposed CABC optimization algorithm can be considered as a viable and efficient method to solve optimization problems.
16 , and Bezdeck has proposed fuzzy c-means FCM clustering 17 .Over the last decade, more and more stochastic, population-based optimization algorithms have been applied to clustering problems.For instance, Shelokar et al. have introduced an evolutionary algorithm based on ACO algorithm for clustering problem 18, 19 , Merwe et al. have presented PSO to solve the clustering problem 20, 21 , and Karaboga and Ozturk have used the ABC algorithm 22 .
and upper bounds of parameter j, respectively.These steps are repeated through a predetermined number of cycles, called Maximum Cycle Number MCN , or until a termination criterion is satisfied 4, 5, 24 .

Figure 1 :
Figure 1: The median convergence characteristics of fitness functions.a Sphere function.b Rosenbrock function.c Griewank function.d Rastrigin function.e Ackley function.f Schwefel function.

Algorithm 3 :
Main steps of the fitness function.

Figure 2 :
Figure 2: Comparing the convergence of the proposed CABC-based clustering with other approaches in terms of total mean-square quantization error a MotorCycle b Iris.c Wine. d CMC. e Cancer.f Glass.
Initialize the food source positions x i , i 1 • SN 2.3 Evaluate the nectar amount fitnessfit i of food sources and find the best food source which is the initial value of gbest SN Replace thejcomponent of the gbest by using thejcomponent of bee i Calculate the f newgbest g 1 , g 2 , . . ., x ij , . . ., g D If f newgbest better than f gbest Then gbest is replaced by newgbest For employed bee i produce new food source positions vi by using 2.1 Calculate the value fit i i Replace the j component of the gbest by using the j component of bee i Calculate the f newgbest g 1 , g 2 , . . ., x ij , . . ., g D If f newgbest better than f gbest Then gbest is replaced by newgbest For onlooker bee i produce new food source positions vi by using 2.1 Calculate the value fit i

Table 1 :
Results comparison of different optimal algorithms for 30 runs.

Table 2 :
Comparison of intracluster distances for the five clustering algorithms.