A Hybrid Monkey Search Algorithm for Clustering Analysis

Clustering is a popular data analysis and data mining technique, and the k-means clustering algorithm is one of the most commonly used methods. However, its result depends heavily on the initial solution, and it is easily trapped in a local optimum. To address these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm based on the search operator of the artificial bee colony algorithm for clustering analysis. Experiments on synthetic and real-life data sets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.


Introduction
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. Cluster analysis originated in anthropology with Driver and Kroeber in 1932, was introduced to psychology by Zubin in 1938 and Tryon in 1939, and was famously used by Cattell beginning in 1943 [1] for trait theory classification in personality psychology. Many clustering methods have been proposed; they are divided into two main categories: hierarchical and partitional. The k-means clustering method [2] is one of the most commonly used partitional methods. However, the results of k-means on the clustering problem depend heavily on the initial solution, and the method is easily trapped in local optimal solutions. Zhang et al. have proposed an improved k-means clustering algorithm called k-harmonic means [3], but the accuracy of the results obtained by that method is not high.
In order to overcome this problem, many scholars began to solve the clustering problem using metaheuristic algorithms. In 1991, Colorni et al. presented the ant colony optimization (ACO) algorithm, based on the behavior of ants seeking a path between their colony and a source of food. Then Shelokar et al. and Kao and Cheng solved the clustering problem using the ACO algorithm [4,5]. Niknam et al. have proposed an efficient hybrid evolutionary algorithm based on combining ACO and SA (simulated annealing algorithm, 1989 [6]) for the clustering problem [7,8]. In 1995, Kennedy and Eberhart proposed the particle swarm optimizer (PSO) algorithm, which simulates the movement of organisms in a bird flock or fish school [9]. The algorithm has also been adopted to solve this problem by Omran et al. and Merwe and Engelbrecht [10,11]. Kao et al. have presented a hybrid approach combining the k-means algorithm, Nelder-Mead simplex search, and PSO for clustering analysis [12]. Niknam et al. have presented a hybrid evolutionary algorithm based on PSO and SA to solve the clustering problem [13]. Niknam has proposed an efficient hybrid approach based on PSO, ACO, and k-means, called the PSO-ACO-K approach, for cluster analysis [14]. In 2005, the artificial bee colony (ABC) algorithm was described by Karaboga [15], and it has been adopted to solve this problem by Karaboga and Ozturk [16]. Zou et al. have proposed a cooperative artificial bee colony algorithm to solve the clustering problem and experimented on synthetic and real-life data sets to evaluate its performance [17]. Voges and Pope have used an evolutionary-based rough clustering algorithm for the clustering problem [18].
Monkey algorithm (MA) is a new type of swarm intelligence algorithm. It was put forward by Ruiqing and Wansheng [19] in 2008 for solving large-scale, multimodal optimization problems. The method derives from the simulation of the mountain-climbing processes of monkeys.
It consists of three processes: the climb process, the watch-jump process, and the somersault process. In the original MA, the time consumed mainly lies in using the climb process to search for local optimal solutions. The essential feature of this process is the calculation of the pseudogradient of the objective function, which requires only two measurements of the objective function regardless of the dimension of the optimization problem. The purpose of the somersault process is to make the monkeys find new search domains, which largely prevents the search from being confined to a local region. As a result, MA has been successfully applied to various optimization problems, such as transmission network expansion planning [20], intrusion detection technology [21], optimal sensor placement in structural health monitoring [22], and the optimization of the gas filling station project scheduling problem [23]. In view of the characteristics of the clustering problem, this paper proposes a monkey algorithm with the search operator of the artificial bee colony algorithm (ABC-MA). The algorithm introduces the ABC search operator before the climb process to strengthen the local search ability and improves the somersault process by combining it with the k-means method. These changes improve the calculation accuracy to a certain degree. The numerical experiment results show that the proposed algorithm performs better than the basic monkey algorithm on the clustering problem.

The k-Means Clustering Algorithm
The goal of data clustering is to group data into a number of clusters. The k-means algorithm is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. It was proposed by MacQueen in 1967 [24]. The procedure follows a simple and easy way to classify a given data set $O = \{o_1, o_2, \ldots, o_n\}$ into a certain number of clusters $C_1, C_2, \ldots, C_k$ (assume $k$ clusters) fixed a priori; each data vector is a $p$-dimensional vector, and the clusters satisfy the following conditions [25,26]:
(1) $C_i \neq \emptyset$, $i = 1, 2, \ldots, k$;
(2) $C_i \cap C_j = \emptyset$, $i, j = 1, 2, \ldots, k$, $i \neq j$;
(3) $\bigcup_{i=1}^{k} C_i = O$.
The k-means clustering algorithm is as follows.
(1) Fix the number of clusters $k$.
(2) Choose $k$ initial centroids $z_1, z_2, \ldots, z_k$.
(3) Assign each object to the group that has the closest centroid. The principle of division is as follows: if $\|o_i - z_j\| < \|o_i - z_m\|$ for $m = 1, 2, \ldots, k$ and $m \neq j$, the data object $o_i$ is assigned to the classified collection $C_j$.
(4) When all objects have been assigned, recalculate the positions of the centroids $z_1^*, z_2^*, \ldots, z_k^*$:
$$z_j^* = \frac{1}{|C_j|} \sum_{o_i \in C_j} o_i, \quad j = 1, 2, \ldots, k,$$
where $|C_j|$ is the number of points in the classified collection $C_j$.
(5) Repeat steps (3) and (4) until the centroids no longer move.
The main idea is to define $k$ centroids, one for each cluster. These centroids should be placed carefully, because different locations lead to different results. So, the better choice is to place them as far away from each other as possible. In this study, we use the Euclidean metric as the distance metric. For a data object $o_i$ and a centroid $z_j$, each with $p$ components, the expression is given as follows:
$$\|o_i - z_j\| = \sqrt{\sum_{m=1}^{p} (o_{i,m} - z_{j,m})^2}.$$
Finally, this algorithm aims at minimizing an objective function, in this case, a squared error function.
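The procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the experiments: the function names, the `max_iter` cap, the seeded random choice of initial centroids from the data, and the rule that an empty cluster keeps its previous centroid are all assumptions made for the example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def assign(objects, centroids):
    """For each object, return the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: euclidean(o, centroids[j]))
        for o in objects
    ]


def recompute(objects, labels, centroids):
    """Mean of each cluster; an empty cluster keeps its old centroid (assumption)."""
    updated = []
    for j, old in enumerate(centroids):
        members = [o for o, lab in zip(objects, labels) if lab == j]
        if members:
            updated.append([sum(col) / len(members) for col in zip(*members)])
        else:
            updated.append(list(old))
    return updated


def k_means(objects, k, max_iter=100, seed=0):
    """Alternate assignment and centroid update until the centroids stop moving."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct objects drawn at random.
    centroids = [list(o) for o in rng.sample(objects, k)]
    for _ in range(max_iter):
        labels = assign(objects, centroids)
        updated = recompute(objects, labels, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(objects, centroids)
```

Calling `k_means(data, k)` on a list of equal-length numeric lists returns the final centroids and one cluster index per object. Because the result depends on the initial centroids, different seeds can give different partitions on harder data, which is the sensitivity the hybrid algorithm is meant to reduce.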

Description of Modified Monkey Algorithm
The MA is a novel kind of evolutionary algorithm that can solve a variety of difficult optimization problems featuring nonlinearity, nondifferentiability, and high dimensionality. It differs from other algorithms in that the time consumed by the MA mainly lies in using the climb process to search for local optimal solutions. So, according to the characteristics of the clustering problem, a new monkey algorithm with the search operator of the artificial bee colony algorithm is proposed. In this section, we describe the main components of the algorithm: representation of solution, initialization, climb process, watch-jump process, improved somersault process, and search operator. The details are as follows.

Representation of Solution.
At first, an integer $M$ is defined as the population size of monkeys. Then, for the monkey $i$, its position is denoted as a vector $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,k \cdot p})$, where $k$ is equal to the number of cluster centroids and each cluster centroid includes $p$ components. The position is employed to express a solution of the optimization problem.

Initial Population.
Initialization of the population has a great effect on the precision. In the original MA, the initial population of possible solutions is generated randomly in the solution interval. However, for the clustering problem, each component of the data has a different interval. So, for the monkey $i$, we randomly choose $k$ of the samples (each sample includes $p$ components) from the data set.
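The encoding and initialization described above can be illustrated with a short Python sketch. The helper names `init_monkey`, `decode`, and `init_population` and the seeded generator are hypothetical and chosen only for this example; the sketch assumes the $k$ samples are drawn without repetition and concatenated in the order drawn.

```python
import random


def init_monkey(samples, k, rng):
    """Build one position by concatenating k randomly chosen samples."""
    chosen = rng.sample(samples, k)  # assumption: drawn without repetition
    return [component for sample in chosen for component in sample]


def decode(position, k, p):
    """Split a flat position of length k * p back into k centroids of p components."""
    return [position[j * p:(j + 1) * p] for j in range(k)]


def init_population(samples, k, M, seed=0):
    """Create M monkeys, each encoding k centroids taken from the data set."""
    rng = random.Random(seed)
    return [init_monkey(samples, k, rng) for _ in range(M)]
```

Drawing the initial centroids from the data keeps every component inside the range actually observed for that attribute, which is the reason given above for not sampling uniformly from a single solution interval.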

Climb Process.
The climb process is a step-by-step procedure that changes the monkeys' positions from the initial positions to new ones that improve the objective function. The climb process is designed to use the idea of pseudo-gradient-based simultaneous perturbation stochastic approximation (SPSA) [27,28], a kind of recursive optimization algorithm. For the monkey $i$, its position is $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,k \cdot p})$, $i = 1, 2, \ldots, M$, and $f(x_i)$ is the corresponding fitness value. The improved climb process is given as follows.
(1) Randomly generate two vectors of the form $\Delta x_i = (\Delta x_{i,1}, \Delta x_{i,2}, \ldots, \Delta x_{i,k \cdot p})$, with components indexed by $j = 1, 2, \ldots, k \cdot p$. The parameter $a$ ($a > 0$), called the step of the climb process, can be determined by specific situations. The step length plays a crucial role in the precision of the approximation of the local solution in the climb process. Usually, the smaller the parameter $a$ is, the more precise the solutions are.
(4) Update $x_i$ with $y$ provided that $y$ is feasible. Otherwise, we keep $x_i$ unchanged.
(5) Repeat steps (1) to (4) until the maximum allowable number of iterations (called the climb number, denoted by $N_c$) has been reached. Figure 1 shows the climb process of a monkey seeking the local optimal solution of $f(x) = x^2$ with climb step 0.001 and climb number 1000 in 3D space. The red point represents the initial position and the green point is the end position.
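For orientation, the pseudo-gradient idea behind the climb process can be sketched as follows. This sketch follows the SPSA-style climb of the original MA with a single perturbation vector, adapted to minimization; it is not the improved climb process of this paper. The perturbation rule ($\pm a$ with equal probability), the move against the sign of the pseudo-gradient, and the names `climb`, `n_climb`, and `feasible` are all assumptions made for the illustration.

```python
import random


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def climb(f, x, a, n_climb, rng, feasible=lambda y: True):
    """SPSA-style climb for minimization; returns the final position and
    the number of objective function evaluations used."""
    x = list(x)
    evaluations = 0
    for _ in range(n_climb):
        # Assumption: each perturbation component is +a or -a with equal probability.
        delta = [a if rng.random() < 0.5 else -a for _ in x]
        f_plus = f([xj + dj for xj, dj in zip(x, delta)])
        f_minus = f([xj - dj for xj, dj in zip(x, delta)])
        evaluations += 2  # two measurements, whatever the dimension
        pseudo_gradient = [(f_plus - f_minus) / (2 * dj) for dj in delta]
        # Step against the pseudo-gradient because the objective is minimized.
        y = [xj - a * sign(g) for xj, g in zip(x, pseudo_gradient)]
        if feasible(y):
            x = y
    return x, evaluations
```

The evaluation counter makes the cost visible: the number of objective function evaluations is twice the climb number, independent of the length of the position vector.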

Watch-Jump Process.
After the climb process, each monkey arrives at its own mountaintop. It then takes a look and determines whether there are other points around it that are higher than the current one. If so, it jumps there from the current position and then repeats the climb process until it reaches the top of the mountain. For the monkey $i$, its position is $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,k \cdot p})$, $i = 1, 2, \ldots, M$. The watch-jump process is given as follows.
(1) Randomly generate real numbers $y_j$ from $(x_{i,j} - b, x_{i,j} + b)$, $j = 1, 2, \ldots, k \cdot p$. Let $y = (y_1, y_2, \ldots, y_{k \cdot p})$. The parameter $b$ is called the eyesight of the monkeys and can be determined by specific situations. Usually, the bigger the feasible space of the optimization problem is, the bigger the value of $b$ should be.
(2) Update $x_i$ with $y$ provided that $f(y) \geq f(x_i)$ and $y$ is feasible. Otherwise, repeat step (1) until an appropriate point $y$ is found. For the clustering problem, we only replace $x_i$ with a point $y$ whose function value is smaller than or equal to $f(x_i)$.
(3) Repeat the climb process by employing $y$ as an initial position.
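Steps (1) and (2) of the watch-jump process can be sketched in Python for the clustering (minimization) case. The function name, the `feasible` callback, and the `max_tries` cap are assumptions added so that the example always terminates; the text above instead repeats step (1) until an appropriate point is found.

```python
import random


def watch_jump(f, x, b, rng, feasible=lambda y: True, max_tries=100):
    """Look for a point within eyesight b that is no worse than x (minimization)."""
    fx = f(x)
    for _ in range(max_tries):
        # Step (1): sample every component uniformly around the current position.
        y = [rng.uniform(xj - b, xj + b) for xj in x]
        # Step (2): accept y only if it is feasible and not worse than x.
        if feasible(y) and f(y) <= fx:
            return y
    # Assumption: if no acceptable point is found, stay at the current position.
    return list(x)
```

The returned point would then be used as the initial position for another climb, as in step (3).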

Somersault Process Based on the k-Means.
After repetitions of the climb process and the watch-jump process, each monkey will find a locally maximal mountaintop around its initial point. In order to find a much higher mountaintop, it is natural for each monkey to somersault to a new search domain. In the original MA, the monkeys somersault along the direction pointing to the pivot, which is equal to the barycenter of all monkeys' current positions. Figure 2 shows the somersault process of the original MA [19]. However, for the clustering problem a monkey easily leaves the solution interval, and after many iterations the population loses its diversity because all monkeys somersault along the direction pointing to the pivot. Here, using the k-means algorithm, we choose the center of the objects belonging to each cluster as the pivot instead of the center of all monkeys. For the monkey $i$, its position is $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,k \cdot p})$; the improved somersault process is given as follows.
(1) Assign each object to the group that has the closest centroid among $z_1, z_2, \ldots, z_k$, according to the location of the monkey $x_i$.
(2) Randomly generate real numbers from the interval $[c, d]$ (called the somersault interval, which decides the maximum distance that monkeys can somersault).
(5) Update $x_i$ with $y$ provided that $f(y) \geq f(x_i)$ and $y$ is feasible. Otherwise, generate a new candidate solution.

Search Operator.
The original MA mainly relies on the climb process to search for local optimal solutions. The climb step plays a crucial role in the precision of the approximation of the local solution. The smaller the climb step is, the bigger the climb number must be and the higher the precision of the solution is, but more time is spent calculating the objective value. For example, if the climb step is 0.01, the climb number should be set to 100, so the objective function value must be calculated 200 times in every climb process. When we set the climb step to 0.001, the climb number should be set to 1000, and the objective function value must be calculated 2000 times in every climb process. In order to reduce the computing time, this paper introduces the search operator of the artificial bee colony algorithm before the climb process.
The artificial bee colony (ABC) optimization algorithm was described by Karaboga based on the foraging behavior of honey bees [29]. In the ABC, the colony consists of three groups of bees: employed bees, onlookers, and scouts. Each employed bee seeks a food source near its current food source according to the search operator (7), evaluates its nectar amount, and determines whether to update the food source by a greedy strategy. After all employed bees complete the search process, they share the position information of the food sources with the onlookers on the dance area. Each onlooker watches the dance of the employed bees and chooses one of their sources with a probability depending on the nectar amounts of the sources. If a food source cannot be improved through a predetermined number of cycles, called the "limit," it is removed from the population, and the employed bee of that food source becomes a scout. The search operator of employed bees is as follows:
$$v_{i,j} = x_{i,j} + \varphi_{i,j}\,(x_{i,j} - x_{r,j}), \quad (7)$$
where $r \in \{1, 2, \ldots, M\}$ and $j \in \{1, 2, \ldots, k \cdot p\}$ are randomly chosen indexes. Although $r$ is determined randomly, it must be different from $i$; $\varphi_{i,j}$ is a random number in $[-1, 1]$. The experimental results show that ABC has good optimization performance on complex multimodal problems [29] due to the strong local exploration ability of the search operator.
In the MA, the local exploration ability of the climb process is weak, while the somersault process has strong global search ability. Here we introduce the ABC search operator before the climb process to strengthen the search for the local optimal solution. For each monkey, each component is updated once using the ABC search operator, so each monkey moves $k \cdot p$ times. The local search process before the climb process is shown in Algorithm 1.
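The local search described above can be illustrated with the following Python sketch, in which every component of every monkey is perturbed once with the ABC search operator. It is not a transcription of Algorithm 1: the function name, the greedy acceptance rule for minimization, and the in-place update of the population are assumptions made for the example.

```python
import random


def abc_local_search(f, population, rng):
    """Update each component of each monkey once with the ABC search operator.

    Requires at least two monkeys so that a different neighbour always exists.
    """
    M = len(population)
    fitness = [f(x) for x in population]
    for i in range(M):
        for j in range(len(population[i])):
            # Pick a neighbour r different from i and a factor phi in [-1, 1].
            r = rng.choice([t for t in range(M) if t != i])
            phi = rng.uniform(-1.0, 1.0)
            v = list(population[i])
            v[j] = v[j] + phi * (v[j] - population[r][j])
            fv = f(v)
            # Assumption: greedy selection, keep v only if it is not worse.
            if fv <= fitness[i]:
                population[i], fitness[i] = v, fv
    return population, fitness
```

Under these assumptions, one pass costs one objective evaluation per component per monkey, plus the initial evaluation of each position.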
To sum up, the whole flowchart of ABC-MA to find the optimal solution of the clustering problem is shown in Figure 3.

Simulation Experiment
In this section, the experiments were done using a desktop computer with a 3.01 GHz AMD Athlon(tm) II X4 640 processor and 3 GB of RAM, running a minimal installation of Windows XP. The application software was Matlab 2012a. The experimental results comparing the ABC-MA clustering algorithm with six typical stochastic algorithms, namely, the MA [19], PSO [30], CPSO [1,17], ABC [16,17], CABC [17], and k-means algorithms, are provided for two artificial data sets and ten real-life data sets: Iris, Teaching Assistant Evaluation (TAE), wine, seeds, Ripley's glass, Statlog (heart), Haberman's survival, balance scale, Contraceptive Method Choice (CMC), and Wisconsin breast cancer, which are selected from the UCI machine learning repository [31].
Teaching Assistant Evaluation ($n = 151$, $p = 5$, $k = 3$): the data consist of evaluations of teaching performance over three regular semesters and two summer semesters of 151 teaching assistant (TA) assignments at the Statistics Department of the University of Wisconsin-Madison. The scores were divided into 3 roughly equal-sized categories ("low," "medium," and "high") to form the class variable [31].
Wine data ($n = 178$, $p = 13$, $k = 3$): this is the wine data set, which is also taken from MCI laboratory. These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines. There are 178 instances with 13 numeric attributes in the wine data set. All attributes are continuous. There is no missing attribute value [14,17,31].
Seeds data ($n = 210$, $p = 7$, $k = 3$): this data set consists of 210 patterns belonging to three different varieties of wheat: Kama, Rosa, and Canadian. For each variety there are 70 observations of area $A$, perimeter $P$, compactness ($C = 4\pi A / P^2$), length of kernel, width of kernel, asymmetry coefficient, and length of kernel groove [31].
Statlog (heart) data ($n = 270$, $p = 13$, $k = 2$): this data set is a heart disease database similar to a database already present in the repository (heart disease databases) but in a slightly different form [31].
Haberman's survival ($n = 306$, $p = 3$, $k = 2$): the data set contains cases from a study that was conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer. It records one of two survival statuses for each patient, together with the age of the patient at the time of operation, the patient's year of operation, and the number of positive axillary nodes detected [31].
Balance scale data ($n = 625$, $p = 4$, $k = 3$): this data set was generated to model psychological experimental results. Each example is classified as having the balance scale tip to the right, tip to the left, or be balanced. The attributes are the left weight, the left distance, the right weight, and the right distance. The correct way to find the class is to take the greater of (left-distance * left-weight) and (right-distance * right-weight). If they are equal, the scale is balanced [31].
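The class rule for the balance scale data can be written as a small function. The function name is hypothetical, and the single-letter labels `"L"`, `"B"`, and `"R"` are assumed here to follow the labels used in the repository.

```python
def balance_class(left_weight, left_distance, right_weight, right_distance):
    """Return "L", "R", or "B" by comparing the two weight-distance products."""
    left = left_weight * left_distance
    right = right_weight * right_distance
    if left > right:
        return "L"  # scale tips to the left
    if right > left:
        return "R"  # scale tips to the right
    return "B"      # products are equal, scale is balanced
```

Because the class depends on products of attributes rather than on distances between attribute vectors, this data set is not naturally separated by centroid-based clustering.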
Contraceptive Method Choice ($n = 1473$, $p = 10$, $k = 3$): this data set is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey. The samples are married women who were either not pregnant or did not know if they were at the time of interview. The problem is to predict the current Contraceptive Method Choice (no use, long-term methods, or short-term methods) of a woman based on her demographic and socioeconomic characteristics [14,17,31].

Algorithm Comparison.
For every data set, each algorithm is applied 20 times individually with a random initial solution. For the art1 and art2 data sets, once the randomly generated parameters are determined, the same parameters are used to test the performance of the three algorithms. The best value, the worst value, the mean value, and the standard deviation are recorded in Tables 1-12. The results are kept to four digits after the decimal point.
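The four statistics reported for each algorithm can be computed from the objective values of the independent runs as in the sketch below. The function name and the dictionary keys are hypothetical, and two choices are assumptions: the best value is taken as the minimum because the objective is minimized, and the sample standard deviation is used because the text does not say which form is reported.

```python
import statistics


def summarize_runs(values, digits=4):
    """Best, worst, mean, and standard deviation of the final objective values."""
    return {
        "best": round(min(values), digits),    # minimization: smaller is better
        "worst": round(max(values), digits),
        "mean": round(statistics.mean(values), digits),
        "std": round(statistics.stdev(values), digits),  # sample standard deviation
    }
```

For 20 runs, `values` would hold the 20 final objective values of one algorithm on one data set.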
The simulation results given in Tables 1-12 show that ABC-MA is very precise. As seen from the results, the ABC-MA algorithm provides the optimum value and a small standard deviation compared to those obtained by the other methods. For the Iris data set, the optimum value, the worst value, the average value, and the standard deviation of ABC-MA are 96.6555, 96.6563, 96.6558, and 3.2699e-04, respectively. CABC also finds the optimum solution 96.6555, but its standard deviation is bigger than that of ABC-MA, while the best solutions of MA, ABC, CPSO, PSO, and k-means are 96.6614, 96.6566, 96.6580, 96.6556, and 97.1901, respectively. Table 4 shows the results of the algorithms on the TAE data set. The optimum value is 1490.9258, which is obtained only by ABC-MA; noticeably, the other algorithms fail to attain this value even once within 20 runs. The mean value of ABC-MA is 1490.9456, which is smaller than that of MA, CABC, ABC, CPSO, PSO, and k-means. Table 5 provides the results of the algorithms on the wine data set. As seen from the results, the results of ABC-MA are far superior to those obtained by the others. In Table 8, the best value, the worst value, the mean value, and the standard deviation of ABC-MA are 10622.9824, 10622.9826, 10622.9824, and 3.0810e-05, respectively. This means that the ABC-MA algorithm is able to converge to the global optimum 10622.982 in all runs, while k-means, PSO, and CPSO may be trapped at local optimum solutions. For the Haberman's survival data set, the optimum value 2566.9888 can be obtained by ABC-MA and ABC, but the standard deviation of ABC is 1.2646e-04, which is a little smaller than that of ABC-MA. The standard deviation of PSO is a little smaller than that of CPSO. Table 10 shows the results of the algorithms on the balance scale data set. As seen from the results, the best value, the worst value, and the mean value of the ABC-MA algorithm are much better than those obtained by the others.
For the Wisconsin breast cancer data set, the best value and the worst value are 2964.3870 and 2964.9883. They are very close, so the standard deviation is very small. The global optimal value can also be obtained by the CABC algorithm. The best value and the worst value of PSO are 5766.6412 and 6059.5781, which means that PSO may fall into local optimum solutions. From Tables 1-12, we can conclude that the results obtained by ABC-MA are clearly better than those of the other algorithms for most of the data sets; CABC is a little better than ABC and CPSO is a little better than PSO; k-means is the worst for most of the data sets. Figures 6-17 show the convergence curves of the various algorithms on the different data sets. As seen from the figures, the convergence rate of MA is the fastest. Figures 18-21 show the original data distribution of the Iris and Haberman's survival data sets and the clustering results of the ABC-MA algorithm.

Algorithm Evaluation.
In the original MA, the climb step plays a crucial role in the precision of the approximation of the local solution in the climb process. For example, for the wine data set, when the climb step is 0.01, the optimum value, the worst value, and the average value of MA are 16302.7254, 16467.6147, and 16366.5331, respectively.
In the original MA, the time consumed mainly lies in using the climb process to search for local optimal solutions. When the climb number is set to 200, function values must be computed 400 times for every monkey in the climb process. Each iteration needs about 2000 function evaluations. For ABC-MA, the computing time is determined by the number of clusters and the dimension of the objects. For example, for the Iris data set, the number of clusters is 3 and the dimension of the objects is 4; each iteration needs to calculate the objective value about 160 times, which is far less than that of MA. For PSO and ABC, the number of function evaluations is 100 at every iteration, but the results are poor. Because they introduce a cooperative strategy, CPSO [32] and CABC [17] need much more computation time than PSO and ABC with the same population size. For example, for the Iris data set, when the population size is 100, the numbers of function evaluations of CABC, CPSO, ABC, and PSO are about 1400, 1300, 200, and 100, respectively. However, CABC and CPSO are slow to converge, and the result of CPSO is not good.
In order to compare the performance of the three kinds of improved algorithms, the ABC-MA, CABC, and CPSO algorithms are each run 20 times individually with 10000 function evaluations. The results are recorded in Table 13. As seen from the results, the ABC-MA algorithm performs better than CABC and CPSO: it obtains a better solution and a smaller standard deviation on most of the data sets.
The results of CPSO and CABC differ noticeably between 100 iterations and 10000 function evaluations, whereas the difference for ABC-MA between the two is small. We can conclude that ABC-MA has a faster convergence speed than CABC and CPSO. The simulation results in the tables demonstrate that the proposed hybrid evolutionary algorithm converges to the global optimum with a smaller standard deviation and a better global value, and they lead naturally to the conclusion that the ABC-MA algorithm is a viable and robust technique for data clustering. Figure 22 shows the boxplots of the distribution of the objective values obtained by CPSO, CABC, and ABC-MA over 20 independent executions. We can see that ABC-MA obtains a smaller upper bound, a smaller average, and a smaller lower bound of the objective values.

Conclusions
Monkey algorithm is a new swarm intelligence algorithm; its outstanding advantage is that it can effectively avoid falling into local optimal solutions through the somersault process. In the original MA, the precision of the solution is decided by the climb step and the climb number of the climb process. Because the climb number is large, a lot of running time is consumed in the climb process. In this paper, an improved MA is proposed in which the artificial bee colony algorithm search operator is introduced on the basis of the original MA. The local optimal solution can be found by the climb process combined with the artificial bee colony algorithm search operator, so the climb number is reduced and the running time is far less than that of the original MA. In view of the clustering problem, we use the k-means algorithm to choose the center of the objects belonging to each cluster as the pivot in the somersault process, instead of the center of all monkeys. In this paper, 10 real instances are tested and compared with other algorithms under 100 iterations and 10000 function evaluations. The numerical experiment results show that the improved MA obtains better results than the k-means method, PSO, ABC, CPSO, CABC, and MA; the results with 10000 function evaluations are especially good, and the running time is far lower than that of the original algorithm. So the improved MA performs better than the basic monkey algorithm for clustering analysis.