A Modified MinMax k-Means Algorithm Based on PSO

The MinMax k-means algorithm is widely used to tackle the effect of bad initialization by minimizing the maximum intracluster error. Two parameters, the exponent parameter and the memory parameter, are involved in its execution. Since different parameter values yield different clustering errors, it is crucial to choose appropriate values. The original algorithm provides a practical framework that extends MinMax k-means to automatically adapt the exponent parameter to the data set. It has been believed that once the maximum exponent parameter has been set, the program can reach the lowest intracluster error. However, our experiments show that this is not always correct. In this paper, we modify the MinMax k-means algorithm by using PSO to determine the parameter values that allow the algorithm to attain the lowest clustering errors. The proposed clustering method is tested on several popular data sets under different initial situations and is compared with the k-means algorithm and the original MinMax k-means algorithm. The experimental results indicate that our proposed algorithm reaches the lowest clustering errors automatically.


Introduction
Clustering has broad applications in pattern recognition, image processing, machine learning, and statistics [1,2]. The aim is to partition a collection of patterns into disjoint clusters such that patterns in the same cluster are similar, while patterns in different clusters are distinct.
One of the most popular clustering methods is the k-means algorithm, where clusters are identified by minimizing the clustering error. The k-means algorithm is widely accepted in the literature. However, it is sensitive to the choice of initial starting conditions [3,4].
To deal with this problem, several methods have been proposed that eliminate the dependence on random initial conditions. The global k-means algorithm [5] is an incremental approach that starts from one cluster; at each step a new cluster is deterministically added to the solution according to an appropriate criterion. Based on this algorithm, Bagirov et al. proposed some modifications [6,7]. Tzortzis and Likas extended the algorithm to kernel space [8,9]. Zang et al. developed a fuzzy k-means clustering algorithm and applied it to the investigation of speech signals [10]. An alternative approach to eliminating the influence of the initial starting conditions is the multi-restarting k-means algorithm [11-14]. A recent variant of this approach is the MinMax k-means clustering algorithm [15], which starts from a randomly picked set of cluster centers and tries to minimize the maximum intracluster error. Its application to intrusion detection [16] shows that the algorithm is efficient.
Particle Swarm Optimization (PSO), a population-based stochastic search process, was first proposed by Eberhart and Kennedy in 1995 [17]. It was introduced to simulate the social behavior of bird flocking or fish schooling when a group of birds or fish searches for food. PSO is fast, simple, and easy to understand and implement. It does not require the adjustment of many parameters, and the memory space it requires is small. PSO has been widely used to improve the performance of other algorithms, for example in ANNs [18-20], scheduling problems [21,22], traveling salesman problems [23,24], and anomaly detection problems [25]. PSO has also been successfully applied to clustering problems [26-30]. Recently, PSO and the k-means algorithm have been combined to develop novel clustering algorithms [31,32].

Computational Intelligence and Neuroscience
In this paper, a new version of the MinMax k-means algorithm is proposed. A recent investigation indicates that once p_max has been set, the program can reach the lowest E_max for some p ∈ [p_init, p_max] [15]. However, our experiments imply that this conclusion is not always correct.
Different values of p_max result in different values of E_max, and the results do not always comply with the rule that the larger the value of p_max, the lower the value of E_max. In the MinMax k-means algorithm, the memory parameter β needs to be set a priori as well, and different values of β also result in different clustering errors. The clustering errors follow no obvious regularity as functions of these parameters. Therefore we should determine the parameter values that minimize the clustering errors.
In this paper, we calculate the clustering errors for each parameter pair (p, β) separately, without using the automatically adapted exponent of [15]. By utilizing PSO, we choose the parameter values that yield the minimum clustering errors. Thus, we obtain the minimum clustering errors without choosing the parameters manually.
We carry out many experiments on different data sets, including synthetic data sets and real world data sets in five different initial situations. Balanced type, unbalanced type, and almost balanced type data sets are considered in the study. Our investigation indicates that the proposed algorithm can search proper parameters to get the minimum clustering errors.
The rest of the paper is organized as follows. We briefly describe the k-means, MinMax k-means, and PSO algorithms in Section 2. In Section 3 we propose our algorithm. Experimental evaluation is presented in Section 4. Lastly, Section 5 concludes our work.

The k-Means Algorithm.
Given a data set X = {x_1, x_2, ..., x_N}, x_i ∈ R^d (i = 1, 2, ..., N), we aim to partition this data set into M disjoint clusters C_1, C_2, ..., C_M, such that a clustering criterion is optimized. Usually, the clustering criterion is the sum of the squared Euclidean distances between each data point x_i and the center m_k of the cluster that contains it. This criterion is called the clustering error and depends on the cluster centers m_1, m_2, ..., m_M:

E(m_1, m_2, ..., m_M) = Σ_{i=1}^{N} Σ_{k=1}^{M} I(x_i ∈ C_k) ‖x_i − m_k‖²,   (1)

where

I(X) = 1 if X is true and I(X) = 0 otherwise.   (2)

Generally, we call V_k = Σ_{i=1}^{N} I(x_i ∈ C_k) ‖x_i − m_k‖² the intracluster error (variance) of cluster C_k. Obviously, the clustering error is the sum of the intracluster errors. Therefore, we use E_sum to denote E(m_1, m_2, ..., m_M); that is, E_sum = E(m_1, m_2, ..., m_M). The k-means algorithm finds locally optimal solutions with respect to the clustering error. The main disadvantage of the method is its sensitivity to the initial positions of the cluster centers.
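The clustering error (1) and the standard k-means iteration can be sketched as follows (a minimal sketch; the array layout, the random-initialization scheme, and all function names are our illustrative assumptions):

```python
import numpy as np

def clustering_error(X, centers, labels):
    """E_sum of (1): sum of squared Euclidean distances from each point to its cluster center."""
    return sum(float(np.sum((X[labels == k] - centers[k]) ** 2)) for k in range(len(centers)))

def kmeans(X, M, iters=100, seed=0):
    """Plain k-means; its result depends on the random choice of initial centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), M, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned points (keep old center if cluster empty)
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(M)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Running the sketch several times with different seeds illustrates the sensitivity to initialization discussed above.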

The MinMax k-Means Algorithm.
As is known, the k-means algorithm minimizes the clustering error. The MinMax k-means algorithm instead minimizes the maximum intracluster error:

E_max = max_{1≤k≤M} V_k,   (3)

where V_k is defined as in (1). Since directly minimizing the maximum intracluster variance E_max is difficult, a relaxed maximum variance objective was proposed [15]. The authors constructed a weighted formulation of the sum of the intracluster variances:

E_w = Σ_{k=1}^{M} w_k^p V_k,   w_k ≥ 0,   Σ_{k=1}^{M} w_k = 1,   (4)

where the exponent p is a constant. The greater (smaller) the p value is, the less (more) similar the weight values become, as relative differences of the variances among the clusters are enhanced (suppressed).
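The effect of the exponent p can be illustrated with the closed-form weight rule of [15], w_k ∝ V_k^{1/(1−p)} (the function name below is ours):

```python
import numpy as np

def minmax_weights(variances, p):
    """Weights w_k ∝ V_k^(1/(1-p)) from [15], normalized to sum to 1.
    Requires 0 <= p < 1. As p -> 0 the weights are simply proportional to the
    variances; as p -> 1 the weight concentrates on the maximum-variance cluster."""
    v = np.asarray(variances, dtype=float)
    w = v ** (1.0 / (1.0 - p))
    return w / w.sum()
```

For variances (1, 2, 3), a small p spreads the weights out, while p close to 1 pushes nearly all weight onto the third (largest-variance) cluster, matching the enhancement/suppression behavior described above.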
In [15], the authors give a practical framework that extends MinMax k-means to automatically adapt the exponent p to the data set. It begins with a small p (p_init) that is increased by p_step after each iteration, until a maximum value (p_max) is attained. For this method, we must first decide the values of the parameters p_init, p_max, and p_step. Now, all clusters contribute to the objective, to different degrees regulated by the w_k values. It is clear that the more a cluster contributes (the higher its weight), the more intensely its variance will be minimized. So the weights w_k are calculated by

w_k = V_k^{1/(1−p)} / Σ_{k'=1}^{M} V_{k'}^{1/(1−p)}.   (5)

To enhance the stability of the MinMax k-means algorithm, a memory effect can be added to the weights:

w_k^{(t)} = β w_k^{(t−1)} + (1 − β) V_k^{1/(1−p)} / Σ_{k'=1}^{M} V_{k'}^{1/(1−p)},   0 ≤ β ≤ 1,   (6)

where β is the memory parameter.

2.3. PSO. PSO is a population-based metaheuristic algorithm. It is launched with a population (called a swarm) of individuals (called particles), where each particle represents a potential solution in the problem space. All of the particles move around the problem space iteratively, searching for the optimal solution (the best position) according to a velocity. In a d-dimensional problem space, the position and the velocity of the ith particle at iteration t are denoted, respectively, by the vectors

x_i(t) = (x_{i1}(t), x_{i2}(t), ..., x_{id}(t)),   v_i(t) = (v_{i1}(t), v_{i2}(t), ..., v_{id}(t)).   (7)

The solution is evaluated by the fitness value of each particle at every iteration. A record of the best position of each particle, based on its fitness value, is saved: the best previously visited position of the ith particle at iteration t is denoted by the vector p_i(t) = (p_{i1}(t), p_{i2}(t), ..., p_{id}(t)), the personal best. The best position found by all particles up to iteration t is also recorded as the global best position, denoted by g(t). At each iteration, the velocity and the position of each particle are updated according to the following equations:

v_i(t+1) = w v_i(t) + c_1 r_1 (p_i(t) − x_i(t)) + c_2 r_2 (g(t) − x_i(t)),   x_i(t+1) = x_i(t) + v_i(t+1),   (8)

where w is an inertia weight that introduces a preference for the particle to continue moving in the same direction.
Here, c_1 and c_2 are two positive constants called acceleration coefficients, and r_1 and r_2 denote two random numbers uniformly distributed in (0, 1). In order to prevent a blind search by the particles, each component of v_i is kept within the range [−v_max, v_max] and each component of x_i within [−x_max, x_max].
The inertia weight is not included in the original version [17]. Its inclusion in PSO was first proposed by Shi and Eberhart [33], who subsequently investigated the effect of the inertia weight and the maximum velocity on the performance of the particle swarm optimizer and provided guidelines for selecting these two parameters [34]. Several inertia-weight strategies are described in [20,35-37], and there are other ways to ensure the convergence of PSO, such as a constriction factor [38] and a conditional random approach [39].
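The PSO updates of (8), including the velocity and position clipping just described, can be sketched as follows (parameter defaults, bounds, and names are our illustrative assumptions, not values from the paper):

```python
import numpy as np

def pso(fitness, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
        x_max=1.0, v_max=0.5, seed=0):
    """Minimize `fitness` using the velocity/position updates of (8)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-x_max, x_max, (n_particles, dim))
    v = rng.uniform(-v_max, v_max, (n_particles, dim))
    pbest = x.copy()                                     # personal best positions p_i
    pbest_f = np.array([fitness(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()                   # global best position g
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        v = np.clip(v, -v_max, v_max)                    # keep velocities in [-v_max, v_max]
        x = np.clip(x + v, -x_max, x_max)                # keep positions in [-x_max, x_max]
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[pbest_f.argmin()].copy()
    return g, float(pbest_f.min())
```

A quick check on a shifted quadratic shows the swarm converging to the minimizer.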

The Proposed Algorithm
The effectiveness and robustness of the MinMax k-means algorithm depend on the initialization of its parameters [15]. Reference [15] introduces a practical framework that extends MinMax k-means to automatically adapt the exponent p to the data set. The authors concluded that once p_max has been set, the program reaches the lowest E_max for some p ∈ [p_init, p_max]. However, our experiments show that this is not always correct. We conducted experiments on the well-known data set Pendigits to support our claim; the description of this data set is given in Section 4. In our calculations the results of the MinMax k-means algorithm are the averages over 100 runs of E_max and E_sum, defined by (3) and (1), respectively. The results are reported in Table 1. One can see from Table 1 that E_max and E_sum take different values for different p_max and β. To address this problem we propose a new algorithm to find the optimal values of the parameters p and β which provide the minimum values of E_max and E_sum. We call it the PSO MinMax k-means algorithm.
The proposed algorithm includes a PSO process and a MinMax k-means process. We utilize PSO to optimize the two parameters; that is, we find the optimal parameters and feed them into the MinMax k-means process to obtain the minimum clustering errors. The specific method is as follows.
Step 1. Set up the parameters of PSO, including the number of iterations, the population size, the maximum velocity (v_max), the inertia weight (w), and the two learning factors (c_1, c_2); give the number of clusters M and the initial weights w_k = 1/M.
Step 2. Initialize each particle's position x_i and velocity v_i randomly; randomly choose the center of each cluster. Note that each particle is a vector of the two parameters (p, β) of the MinMax k-means algorithm. Therefore, x_i can be represented as x_i = (p_i, β_i).
Step 4. Calculate the weighted sum of the intracluster variances by (4) for each particle.
Step 7. If the stopping conditions of the MinMax k-means algorithm are not satisfied, go back to Step 3; otherwise, go to Step 8.
Step 8. Calculate the fitness value for each particle using formula (1) or (3); that is, the clustering errors serve as the fitness functions.
Step 9. Update the personal best p_i and the global best g.
Step 10. According to the best positions p_i and g, update the velocity and position of each particle using formula (8). Note that v_i cannot be larger than v_max or smaller than −v_max, and x_i cannot be larger than x_max or smaller than −x_max; components that exceed these bounds are clipped to them.

Step 11. Record the final best p and β if the specified number of iterations is reached; otherwise, go back to Step 3.
After performing all the steps above, we find the optimal parameters of the MinMax k-means algorithm and then obtain the final clustering results by plugging the optimal parameters into the MinMax k-means algorithm. Actually, Steps 3 to 7 constitute the MinMax k-means process.
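The overall procedure can be outlined in code. Since only the clustering error returned by a full MinMax k-means run matters to the PSO layer, the inner process (Steps 3 to 7) is abstracted here as a callable; all names, bounds, and default values are our illustrative assumptions:

```python
import numpy as np

def pso_minmax_params(clustering_error_fn, p_bounds=(0.0, 1.0), b_bounds=(0.0, 1.0),
                      n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Outer PSO loop over the parameter pair (p, beta).
    Each particle is x_i = (p, beta); its fitness is the clustering error
    produced by a MinMax k-means run with those parameters."""
    rng = np.random.default_rng(seed)
    lo = np.array([p_bounds[0], b_bounds[0]])
    hi = np.array([p_bounds[1], b_bounds[1]])
    x = rng.uniform(lo, hi, (n_particles, 2))            # Step 2: random particle positions
    v = np.zeros((n_particles, 2))
    v_max = (hi - lo) / 2.0
    pbest = x.copy()
    pbest_f = np.array([clustering_error_fn(*pt) for pt in x])
    g = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):                               # outer PSO iterations
        r1 = rng.random((n_particles, 2))
        r2 = rng.random((n_particles, 2))
        v = np.clip(w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x), -v_max, v_max)
        x = np.clip(x + v, lo, hi)                       # keep (p, beta) within bounds
        f = np.array([clustering_error_fn(*pt) for pt in x])   # Steps 3-8 per particle
        improved = f < pbest_f                           # Step 9: update bests
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[pbest_f.argmin()].copy()
    return g, float(pbest_f.min())                       # Step 11: best (p, beta) and error
```

With a real `clustering_error_fn`, each fitness evaluation would run the MinMax k-means process and return E_sum or E_max.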

Computational Results
In the following subsections, we report experimental results for five different states (State 1 to State 5) using both synthetic and real world data sets. We also compare the k-means, MinMax k-means, and PSO MinMax k-means algorithms using numerical results.
In each state, the results of the MinMax k-means algorithm are tested for different parameters (p_max, β), and we set p_init = 0 and p_step = 0.01 as in [15]; we set the population size to 20 and the number of generations to 100 in the PSO MinMax k-means algorithm. The experiments in each state of the PSO MinMax k-means algorithm have just two sets of results; the parameter values in each set are one of the optimal choices that yield one minimum of the clustering errors.

Synthetic Data Sets.
In this subsection, two synthetic data sets S_1 and S_2 from [40] are used to test the algorithms. They are generated from mixtures of four and three bivariate Gaussian distributions, respectively, on the plane, so each cluster takes the form of a Gaussian distribution. All the Gaussian distributions have covariance matrices of the form σ²I, where σ is the standard deviation. The first data set S_1, with four Gaussian distributions and 300 sample points, has its components located at (−1, 0), (1, 0), (0, 1), and (0, −1), with σ = 0.4. As for data set S_2, we use three Gaussian distributions located at (1, 0), (0, 1), and (0, −1), with 400, 300, and 200 sample points, respectively. Therefore, S_2 represents the asymmetric situation where the clusters do not take the same shape and have different numbers of sample points. The data sets are shown in Figure 2.
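Data sets of this kind can be generated as follows (a sketch; the equal 75/75/75/75 split for S_1 is our assumption, since the paper states only the total of 300 points, and the generator name is ours):

```python
import numpy as np

def make_gaussian_mixture(centers, sizes, sigma=0.4, seed=0):
    """Sample 2-D points from isotropic Gaussians (covariance sigma^2 * I)."""
    rng = np.random.default_rng(seed)
    return np.vstack([rng.normal(c, sigma, (n, 2)) for c, n in zip(centers, sizes)])

# S_1: four balanced clusters, 300 points in total (equal split assumed)
S1 = make_gaussian_mixture([(-1, 0), (1, 0), (0, 1), (0, -1)], [75, 75, 75, 75])
# S_2: three unbalanced clusters with 400, 300, and 200 points
S2 = make_gaussian_mixture([(1, 0), (0, 1), (0, -1)], [400, 300, 200])
```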

Real World Data Sets.
Coil-20 [41] is a data set that contains 72 images taken from different angles for each of the 20 included objects. As in [15], we use the subset Coil2, which contains the images of objects 15, 18, and 19. This subset includes 216 instances, and each instance has 1000 features.
Yeast (UCI) [42] includes 1484 instances describing the cellular localization sites of proteins, with eight attributes. The proteins belong to ten categories. Five of the classes are extremely underrepresented and are not considered in our evaluation. The data set is unbalanced.
Pendigits (UCI) [42] includes 10992 instances of handwritten digits (0-9) and 16 attributes. The data set is almost balanced. In the experiments, the Pendigits sample data are first normalized, and the algorithms are then run on the normalized data.
Ecoli (UCI) [42] is composed of 336 protein localization sites for the E. coli bacterium, with seven attributes.

The optimal parameters corresponding to the experiments of Tables 2, 3, and 5-8 are shown in Table 9. Based on the analysis shown in the tables, first, we find that our proposed algorithm attains the lowest E_max and E_sum values at the optimal parameters, except in Table 5 (States 2-5). In States 2-5 of Table 5, our proposed algorithm did not attain the lowest E_max. This is due to a drawback of the PSO algorithm itself, which may only reach a locally optimal solution. Sometimes our proposed algorithm has a better E_max than the k-means algorithm and the original MinMax k-means algorithm (see Tables 3 and 7), and sometimes it has both a better E_max and a better E_sum than the other algorithms. We also find that E_max and E_sum cannot reach their lowest values simultaneously.
Second, it follows from the tables that the clustering errors of k-means and MinMax k-means are equal when setting p_max = 0 and β = 0, implying that the k-means algorithm can be considered a special case of the MinMax k-means algorithm.
Third, the proper parameter values in our algorithm are not unique; here we just give one of them. Hence, the MinMax k-means algorithm and our proposed algorithm can both reach the lowest clustering errors with different parameter values.
Fourth, regarding the running time of the algorithms, it is easy to observe that the k-means algorithm consumes the least time. The running time of our proposed algorithm depends on the population size (S), the number of generations (G), and the speed of convergence. For convenience, denote the running time of a single MinMax k-means run by t; then the running time of our proposed algorithm is on the order of S · G · t, where t itself varies with the parameters and the speed of convergence. Comparing the running time of MinMax k-means to that of our proposed algorithm, it is hard to say which method consumes less time. For example, on the Coil2 data set (State 1) with p_max = 0.5 and β = 0.3, the running time of MinMax k-means is 0.7403 s, while the running time of our proposed algorithm (S = 3, G = 2) is 0.6653 s. However, on the Ecoli data set (State 1), the running time is 0.3637 s for MinMax k-means (p_max = 0.5, β = 0.3) and 0.7848 s for our proposed algorithm (S = 10, G = 5).
Finally, Tzortzis and Likas [15] stated that a high p value forces clusters with large variance to lose most or even all of their instances, as their enormous weights excessively distance the instances from their centers.
Hence, their solution has the following property: whenever an empty or singleton cluster emerges, no matter whether p_max has been reached or not, p is decreased by p_step, the cluster assignments revert to those corresponding to the previous p value, and clustering resumes from there. Our algorithm resolves this problem automatically. We performed experiments on all data sets mentioned in this paper, with the following results. On the Pendigits data set, when β = 0, the MinMax k-means algorithm converges for p < 0.5. If p is larger than this convergence value, we get E_max = E_sum, which is much larger than the errors obtained with a proper p. A similar situation occurs on the Yeast data set. In summary, we find that when p is larger than the convergence value, we get E_max = E_sum, much larger than with a proper p; meanwhile, the data set is clustered into just one class. Because such parameter values produce large fitness values, the PSO process of our proposed algorithm will not select them. Therefore, our proposed algorithm does not produce singleton or empty clusters.
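The degenerate situation described here can be detected with a minimal check after each assignment step (function and variable names are ours):

```python
import numpy as np

def degenerate_clusters(labels, M):
    """Return the indices of empty or singleton clusters in an assignment."""
    counts = np.bincount(np.asarray(labels), minlength=M)
    return [k for k in range(M) if counts[k] <= 1]
```

A run whose assignment collapses toward one class (the E_max = E_sum case above) would show up here as several empty clusters.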

Conclusions
We modified the MinMax k-means algorithm so that it attains the lowest clustering errors automatically. First, we use PSO to search for the optimal parameters that result in the minimum errors; then we plug the parameters obtained by PSO into the MinMax k-means algorithm. Experiments were conducted on different data sets in different initial states, and the results show that our proposed algorithm is efficient in most situations.
As for future work, we plan to accelerate the proposed algorithm. One possible direction is preprocessing the data sets, for example with PCA. We also plan to improve the time efficiency of the PSO process, since many iterations of the algorithm may involve repetitive calculations.