An Improved Fuzzy c-Means Clustering Algorithm Based on Shadowed Sets and PSO

To organize a wide variety of data sets automatically and obtain accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets to perform feature clustering. SP-FCM uses the global search property of PSO to address the premature convergence of conventional fuzzy clustering, uses the vagueness balance property of shadowed sets to handle overlapping among clusters, and models uncertainty in class boundaries. The new method uses the Xie-Beni index as the cluster validity measure and automatically finds the optimal cluster number within a specific range, with cluster partitions that provide compact and well-separated clusters. Experiments show that the proposed approach significantly improves the clustering effect.


Introduction
Clustering is the process of partitioning a set of objects into homogeneous subsets called clusters, so that objects in each cluster are more similar to each other than to objects from different clusters, based on the values of their attributes [1]. Clustering techniques have been studied extensively in data mining [2], pattern recognition [3], and machine learning [4].
Clustering algorithms can be generally grouped into two main classes, namely, supervised clustering and unsupervised clustering, in which the parameters of the classifier are optimized. Many unsupervised clustering algorithms have been developed. One such algorithm is c-means, which assigns objects to clusters by minimizing the sum of squared Euclidean distances between the objects in each cluster and the cluster center. The main drawback of the c-means algorithm is that the result is sensitive to the selection of initial cluster centroids and may converge to local optima [5].
To handle randomly distributed data sets, soft computing has been introduced in clustering [6]; it exploits the tolerance for imprecision and uncertainty in order to achieve tractability and robustness. Fuzzy sets and rough sets have been incorporated in the c-means framework to develop the fuzzy c-means (FCM) [7] and rough c-means (RCM) [8] algorithms.
Fuzzy algorithms can assign a data object partially to multiple clusters and handle overlapping partitions. The degree of membership in the fuzzy clusters depends on the closeness of the data object to the cluster centers. The most popular fuzzy clustering algorithm is FCM, which was introduced by Bezdek [9] and is now widely used. FCM is an effective algorithm, but the random selection of center points makes the iterative process fall easily into saddle points or local optima. Furthermore, if the data sets contain severe noise points or are high dimensional, as in bioinformatics [10], the alternating optimization often fails to find the global optimum. In these cases, the probability of finding the global optimum can be increased by stochastic methods such as evolutionary or swarm-based methods. Bezdek and Hathaway [11] optimized the hard c-means (HCM) model with a genetic algorithm. Runkler [12] introduced an ant colony optimization algorithm which explicitly minimizes the HCM and FCM cluster models. Al-Sultan and Selim [13] proposed the simulated annealing algorithm (SA) to overcome some of these limits and obtained promising results.
PSO is a population-based optimization tool developed by Eberhart and Kennedy [14], which can be implemented and applied easily to solve various function optimization problems. Runkler and Katz [15] introduced two new methods for minimizing the reformulated objective functions of the FCM clustering model by PSO: PSO-V and PSO-U. A PSO-based fuzzy clustering algorithm was discussed in [16]; this algorithm uses the global search capacity of PSO to overcome the shortcomings of FCM. To find more appropriate cluster centers, a generalized FCM optimized by the PSO algorithm [17] was proposed.
Shadowed sets are considered a conceptual and algorithmic bridge between rough sets and fuzzy sets; they incorporate the generic merits of both and have been successfully used for unsupervised learning. Shadowed sets introduce the (0, 1) interval to denote the belongingness of clustering points, and the uncertainty among patterns lying in the shadowed region is efficiently handled in terms of membership. Thus, in order to disambiguate and capture the essence of a distribution, the concept of shadowed sets has recently been introduced [18]; it can also raise the efficiency of the iterative computation of new prototypes by eliminating some "bad points" that have a bad influence on the cluster structure [19,20]. Compared with FCM, shadowed c-means is more capable when dealing with outliers [21].
Although many clustering algorithms based on FCM, PSO, or shadowed sets have been proposed, most of them require a preestimated cluster number $C$ as input. To obtain desirable cluster partitions in a given data set, $C$ is commonly set manually, which is a very subjective and somewhat arbitrary process. A number of approaches have been proposed to select the appropriate $C$. Bezdek et al. [22] suggested the rule of thumb $C \le N^{1/2}$ for a data set of $N$ objects, where the upper bound must be determined based on knowledge or applications about the data. Another approach is to use a cluster validity index as a criterion for measuring the data partition, such as the Davies-Bouldin (DB) [23], Xie-Beni (XB) [24], and Dunn [25] indices. These indices often follow the principle that the distance between objects in the same cluster should be as small as possible and the distance between objects in different clusters should be as large as possible. They have also been used to acquire the optimal number of clusters according to their maximum or minimum value. Therefore, we wish to find the best $C$ in some range, obtain cluster partitions by considering compactness and intercluster separation, and reduce the sensitivity to initial values. Here, we propose a modified algorithm named SP-FCM, which integrates the merits of PSO and interleaves shadowed sets between stabilization iterations. It can automatically estimate the optimal cluster number with a faster initialization than our previous approach.
The structure of the paper is as follows. Section 2 outlines all necessary prerequisites. In Section 3, a new clustering approach called SP-FCM is presented for automatically finding the optimal cluster number. Section 4 includes the results of experiments involving UCI data sets, yeast gene expression data sets, and a real data set. Section 5 presents the main conclusions.

Related Clustering Algorithms
In this section, we briefly describe some basic concepts of FCM, PSO, shadowed sets, and the XB validity index and review the PSO-based clustering method.

FCM.
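We define $X = \{x_1, \ldots, x_N\}$ as the universe of a clustering data set, $V = \{v_1, \ldots, v_C\}$ as the prototypes of $C$ clusters, and $U = [u_{ij}]_{N \times C}$ as a fuzzy partition matrix, where $u_{ij} \in [0, 1]$ is the membership of $x_i$ in a cluster with prototype $v_j$; $x_i, v_j \in \mathbb{R}^P$, where $P$ is the data dimensionality, $1 \le i \le N$, and $1 \le j \le C$. The FCM algorithm is derived by minimizing the objective function [22]
$$J_{\mathrm{FCM}}(U, V) = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} d_{ij}^{2}, \tag{1}$$
where $m > 1.0$ is the weighting exponent on each fuzzy membership and $d_{ij} = \|x_i - v_j\|$ is the Euclidean distance from data vector $x_i$ to cluster center $v_j$, subject to
$$\sum_{j=1}^{C} u_{ij} = 1, \quad 1 \le i \le N. \tag{2}$$
This produces the following update equations:
$$u_{ij} = \left[ \sum_{k=1}^{C} \left( \frac{d_{ij}}{d_{ik}} \right)^{2/(m-1)} \right]^{-1}, \tag{3}$$
$$v_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} x_i}{\sum_{i=1}^{N} u_{ij}^{m}}. \tag{4}$$
After computing the memberships of all the objects, the new prototypes of the clusters are calculated. The process stops when the prototypes stabilize, that is, when the distance between the prototypes from the previous iteration and those generated in the current iteration is less than an error threshold.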
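The alternating optimization described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' implementation; the function names `fcm_step` and `fcm`, the tolerance `tol`, and the handling of a point that coincides with a prototype are assumptions introduced here.

```python
import math

def fcm_step(data, centers, m=2.0):
    """One pass of FCM: membership update, then prototype update."""
    n, c, p = len(data), len(centers), len(data[0])
    exponent = 2.0 / (m - 1.0)
    u = []
    for x in data:
        d = [math.dist(x, v) for v in centers]
        if min(d) == 0.0:
            # Assumption: a point lying on one or more prototypes shares
            # full membership equally among those prototypes.
            hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
            u.append([h / sum(hits) for h in hits])
        else:
            u.append([1.0 / sum((d[j] / d[k]) ** exponent for k in range(c))
                      for j in range(c)])
    new_centers = []
    for j in range(c):
        w = [u[i][j] ** m for i in range(n)]
        total = sum(w)
        if total == 0.0:
            new_centers.append(list(centers[j]))  # keep an empty prototype unchanged
        else:
            new_centers.append([sum(w[i] * data[i][q] for i in range(n)) / total
                                for q in range(p)])
    return u, new_centers

def fcm(data, centers, m=2.0, tol=1e-5, max_iter=100):
    """Iterate until no prototype moves by more than tol."""
    u = []
    for _ in range(max_iter):
        u, new_centers = fcm_step(data, centers, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return u, centers

data = [[0.0], [0.1], [5.0], [5.1]]
u, centers = fcm(data, [[1.0], [4.0]])
```

On this toy data the two prototypes settle near the two obvious groups, and every row of the membership matrix sums to 1.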

PSO.
PSO was originally introduced in terms of the social and cognitive behavior of bird flocking and fish schooling. The potential solutions are called particles, which fly through the problem space by following the current best particles. Each particle keeps track of its coordinates in the problem space which are associated with the best solution that it has achieved so far. The solution is evaluated by the fitness value, which is also stored. This value is called pbest. Another best value that is tracked by PSO is the best value obtained so far by any particle in the swarm. This value is a global best and is called gbest. The search for better positions follows the rule
$$V_l(t+1) = w V_l(t) + c_1 r_1 \left( \mathrm{pbest}_l(t) - X_l(t) \right) + c_2 r_2 \left( \mathrm{gbest}(t) - X_l(t) \right),$$
$$X_l(t+1) = X_l(t) + V_l(t+1), \tag{5}$$
where $X_l$ and $V_l$ are the position and velocity vectors of particle $l$, respectively, $w$ is the inertia weight, $c_1$ and $c_2$ are positive constants, called acceleration coefficients, which control the influence of pbest and gbest in the search process, and $r_1$ and $r_2$ are random values in the range [0, 1]. The fitness value of each particle's position is determined by a fitness function, and PSO is usually executed with repeated application of (5) until a specified number of iterations has been exceeded or the velocity updates are close to zero over a number of iterations.
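As an illustration of the update rule, the following sketch runs a basic PSO on a simple test function. It is a generic PSO written for this example, not the authors' code: it minimizes an objective rather than maximizing a fitness, draws fresh random factors for every coordinate, and the names `pso_minimize`, `bounds`, and `seed` are assumptions. The default values of the inertia weight and acceleration coefficients are the ones reported in the experiments of this paper.

```python
import random

def pso_minimize(objective, dim, n_particles=20, iters=50,
                 w=0.72, c1=1.49, c2=1.49, bounds=(-5.0, 5.0), seed=0):
    """Basic global-best PSO; returns the best position and its value."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda l: pbest_val[l])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for l in range(n_particles):
            for q in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive pull + social pull.
                vel[l][q] = (w * vel[l][q]
                             + c1 * r1 * (pbest[l][q] - pos[l][q])
                             + c2 * r2 * (gbest[q] - pos[l][q]))
                # Position update.
                pos[l][q] += vel[l][q]
            val = objective(pos[l])
            if val < pbest_val[l]:
                pbest[l], pbest_val[l] = pos[l][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[l][:], val
    return gbest, gbest_val

def sphere(x):
    return sum(xi * xi for xi in x)

best, best_val = pso_minimize(sphere, dim=2, iters=100)
```

Because the personal and global bests are only replaced by strictly better values, the returned value never gets worse as the number of iterations grows.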

PSO-Based FCM.
In this algorithm [26], each particle $\mathrm{Part}_l$ represents a cluster center vector, which is constructed as
$$\mathrm{Part}_l = (v_{l1}, v_{l2}, \ldots, v_{lC}), \tag{6}$$
where $\mathrm{Part}_l$ represents the $l$th particle, $l = 1, 2, \ldots, L$, $L$ is the number of particles, and $L < N$; $v_{lj}$ is the $j$th cluster center of particle $\mathrm{Part}_l$. Therefore, a swarm represents a number of candidate cluster centers for the data vectors. Each data vector belongs to a cluster according to its membership function, and thus a fuzzy membership is assigned to each data vector. Each cluster has one cluster center per iteration, and each particle presents a solution which gives a vector of cluster centers. This method determines the position vector $\mathrm{Part}_l$ for every particle, updates it, and then changes the position of the cluster centers. The fitness function for evaluating the generalized solutions is stated as
$$f(\mathrm{Part}_l) = \frac{1}{J_{\mathrm{FCM}}(\mathrm{Part}_l)}. \tag{7}$$
The smaller $J_{\mathrm{FCM}}$ is, the better the clustering effect and the higher the fitness function $f(\mathrm{Part}_l)$.
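A particle of this kind can be evaluated as in the sketch below. The flat encoding of the cluster centers, the helper names `decode`, `j_fcm`, and `fitness`, and the choice of the plain reciprocal of the FCM objective as the fitness are assumptions made for illustration; the reciprocal is only one form consistent with the statement that a smaller objective gives a higher fitness.

```python
import math

def decode(particle, n_clusters, dim):
    """Split a flat position vector into n_clusters centers of length dim."""
    return [particle[j * dim:(j + 1) * dim] for j in range(n_clusters)]

def j_fcm(data, centers, m=2.0):
    """FCM objective, with memberships derived from the centers."""
    exponent = 2.0 / (m - 1.0)
    total = 0.0
    for x in data:
        d = [math.dist(x, v) for v in centers]
        if min(d) == 0.0:
            continue  # point sits on a prototype: its contribution is zero
        for dj in d:
            u = 1.0 / sum((dj / dk) ** exponent for dk in d)
            total += (u ** m) * dj * dj
    return total

def fitness(particle, data, n_clusters, m=2.0):
    """Assumed fitness: reciprocal of the FCM objective."""
    j = j_fcm(data, decode(particle, n_clusters, len(data[0])), m)
    return float("inf") if j == 0.0 else 1.0 / j

data = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good = [0.0, 0.5, 10.0, 0.5]   # centers placed inside the two groups
bad = [4.0, 0.5, 6.0, 0.5]     # centers placed between the groups
```

Here `fitness(good, data, 2)` exceeds `fitness(bad, data, 2)`, since the first particle places its centers inside the two groups.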

Shadowed Sets.
Conventional uncertainty models like fuzzy sets tend to capture vagueness through membership values and associate precise numeric values of membership with vague concepts. By introducing an $\alpha$-cut [19], a fuzzy set can be converted into a classical set. Shadowed sets map each element of a given fuzzy set into 0, 1, or the unit interval [0, 1], which correspond to excluded, included, and uncertain, respectively.
For constructing a shadowed set, Mitra et al. [21] proposed an optimization based on the balance of vagueness. By elevating the membership values of some regions to 1 and at the same time reducing the membership values of other regions to 0, the uncertainty in these regions can be eliminated. To keep the total uncertainty balanced, these changes must be compensated by the construction of uncertain regions, namely, shadows that absorb the eliminated partial membership at the low and high ranges. The shadowed sets induced by a fuzzy membership function are shown in Figure 1.
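Here $x$ denotes the objects; $\mu(x) \in [0, 1]$ is the continuous membership function of the objects belonging to a cluster. The symbol $\Omega_1$ shows the reduction of membership, the symbol $\Omega_2$ depicts the elevation of membership, and the symbol $\Omega_3$ shows the formation of shadows. In order to balance the total uncertainty, the retention of balance translates into the following dependency:
$$\Omega_1 + \Omega_2 = \Omega_3, \tag{8}$$
and the integral forms are given as
$$\Omega_1 = \int_{x:\, \mu(x) \le \alpha} \mu(x)\, dx, \quad \Omega_2 = \int_{x:\, \mu(x) \ge 1 - \alpha} \left( 1 - \mu(x) \right) dx, \quad \Omega_3 = \int_{x:\, \alpha < \mu(x) < 1 - \alpha} dx. \tag{9}$$
The thresholds for reducing and elevating are $\alpha$ and $1 - \alpha$ ($\alpha \in (0, 0.5)$). The optimal value of $\alpha$ is acquired by minimizing the following objective function:
$$V(\alpha) = \left| \Omega_1 + \Omega_2 - \Omega_3 \right|. \tag{10}$$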

Computational Intelligence and Neuroscience
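For a fuzzy set with a discrete membership function, the balance equation is modified as
$$\Omega_1 = \sum_{x_i:\, u_{ij} \le \alpha_j} u_{ij}, \quad \Omega_2 = \sum_{x_i:\, u_{ij} \ge u_{j\max} - \alpha_j} \left( u_{j\max} - u_{ij} \right), \quad \Omega_3 = \mathrm{card}\left\{ x_i \mid \alpha_j < u_{ij} < u_{j\max} - \alpha_j \right\}. \tag{11}$$
In order to find the best $\alpha_j$, the following optimization problem should be solved:
$$\alpha_j = \arg\min_{\alpha_j} \left| \sum_{x_i:\, u_{ij} \le \alpha_j} u_{ij} + \sum_{x_i:\, u_{ij} \ge u_{j\max} - \alpha_j} \left( u_{j\max} - u_{ij} \right) - \mathrm{card}\left\{ x_i \mid \alpha_j < u_{ij} < u_{j\max} - \alpha_j \right\} \right|, \tag{12}$$
where $u_{ij} \in [0, 1]$ is the membership of $x_i$ in a cluster with prototype $v_j$; $u_{j\max}$ and $u_{j\min}$ denote the highest and lowest membership values to the $j$th cluster; and $\alpha_j$ is the threshold of the $j$th cluster. The range of feasible values of the threshold $\alpha_j$ is $[u_{j\min}, (u_{j\min} + u_{j\max})/2]$. This approach considers all membership values with respect to a fixed cluster when updating the prototype of this cluster. The main merits of shadowed sets are the optimization mechanism for choosing a separate threshold for each cluster and the reduction of the burden of plain numeric computations.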
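The discrete threshold selection can be illustrated with a simple grid search over one column of the partition matrix. This is a sketch under stated assumptions rather than the authors' procedure: the search interval from the lowest membership to the midpoint of the lowest and highest memberships follows the shadowed c-means formulation [21], and the evenly spaced grid, the `steps` parameter, and the function names are introduced here for illustration.

```python
def shadow_objective(column, alpha):
    """Absolute imbalance between removed membership and shadow size."""
    u_max = max(column)
    reduced = sum(u for u in column if u <= alpha)
    elevated = sum(u_max - u for u in column if u >= u_max - alpha)
    shadow = sum(1 for u in column if alpha < u < u_max - alpha)
    return abs(reduced + elevated - shadow)

def optimal_threshold(column, steps=200):
    """Grid search for the threshold that best balances the vagueness."""
    u_min, u_max = min(column), max(column)
    hi = (u_min + u_max) / 2.0
    candidates = [u_min + (hi - u_min) * s / steps for s in range(steps + 1)]
    return min(candidates, key=lambda a: shadow_objective(column, a))

def shadowed_column(column, alpha):
    """Map memberships to 1 (core), 0 (excluded), or keep them (shadow)."""
    u_max = max(column)
    return [1.0 if u >= u_max - alpha else 0.0 if u <= alpha else u
            for u in column]

column = [0.05, 0.1, 0.4, 0.5, 0.6, 0.9, 0.95]
alpha = optimal_threshold(column)
hardened = shadowed_column(column, alpha)
```

A finer grid, or a search restricted to the membership values that actually occur in the column, would serve equally well; the grid is used here only to keep the sketch short.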

XB Clustering Validity Index.
The clustering algorithms described above require prespecification of the number of clusters. The partition results depend on the choice of $C$. There exist validity indices that evaluate the goodness of clustering for a given number of clusters; therefore, these validity indices can be used to acquire the optimal value of $C$ [27].
The XB index presents a fuzzy-validity criterion based on a validity function which identifies overall compact and separate fuzzy $c$-partitions. This function depends upon the data set, the geometric distance measure, the distance between cluster centroids, and the fuzzy partition, irrespective of the fuzzy algorithm used. For evaluating the goodness of the data partition, both cluster compactness and intercluster separation should be taken into account. For the FCM algorithm with $m = 2.0$, the Xie-Beni index can be shown to be
$$\mathrm{XB} = \frac{\sum_{j=1}^{C} \sum_{i=1}^{N} u_{ij}^{2} \|x_i - v_j\|^{2}}{N\, d_{\min}^{2}}, \tag{13}$$
where $d_{\min} = \min_{j \ne k} \|v_j - v_k\|$ is the minimum distance between cluster centroids. The more separate the clusters, the larger $d_{\min}$ and the smaller XB.
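For concreteness, the Xie-Beni index for a given partition can be computed as in the following sketch, which assumes the standard definition of the index with squared memberships (weighting exponent 2.0). The function name `xie_beni` and the toy data are illustrative only.

```python
import math

def xie_beni(data, centers, u):
    """Compactness divided by N times the squared minimum center separation."""
    n, c = len(data), len(centers)
    compactness = sum((u[i][j] ** 2) * math.dist(data[i], centers[j]) ** 2
                      for i in range(n) for j in range(c))
    d_min_sq = min(math.dist(centers[j], centers[k]) ** 2
                   for j in range(c) for k in range(j + 1, c))
    return compactness / (n * d_min_sq)

data = [[0.0], [1.0], [10.0], [11.0]]
centers = [[0.5], [10.5]]
u = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # crisp memberships
xb = xie_beni(data, centers, u)  # 1.0 / (4 * 100.0) = 0.0025
```

In this example each point lies 0.5 from its own center, so the compactness is 1.0, the squared separation is 100.0, and the index is 0.0025.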

Shadowed Sets-Based PSO-Fuzzy Clustering: SP-FCM
FCM strives to find $C$ compact clusters in $X$, where $C$ is one of the specified parameters. But the process of selecting and adjusting $C$ manually to obtain desirable cluster partitions in a given data set is very subjective and somewhat arbitrary.
To seek the optimal cluster structure, $C$ is always allowed to be overestimated [28], such that the distances between some clusters are not large enough or the membership values of some objects with respect to different clusters are close and ambiguous in a given data set. In this case, modifying the prototypes through a long iteration is meaningless.
The main subject of cluster validation is the evaluation of clustering results to find the partitioning that best fits the data set. Based on the foregoing algorithms, we wish to find cluster partitions that contain compact and well-separated clusters. In our algorithm $C$ is also overestimated and the clusters compete for data membership. We can set $[C_{\min}, C_{\max}]$ as the reasonable range of the cluster number based on knowledge of the data. This provides a more transparent and tractable process of cluster number reduction. Considering the fuzzy partition matrix $U = [u_{ij}]_{N \times C}$, each column is comprised of the membership values of all feature vectors with respect to a single cluster center. Thus, an optimal threshold $\alpha_j$ ($j = 1, 2, \ldots, C$) for each column should be found to create a harder partition by (12). The number of data points that are assigned a membership value equal to 1 is identified as the cardinality of the corresponding cluster. According to $\alpha_j$, the cardinality of the $j$th column is
$$M_j = \mathrm{card}\left\{ x_i \mid u_{ij} \ge u_{j\max} - \alpha_j \right\}. \tag{14}$$
Here, the threshold $\alpha_j$ is not subjectively user-defined; it is established on the balance of uncertainty and can be adjusted automatically in the clustering process. This property of shadowed sets can be used to reduce the cluster number. In order to control the convergence speed, the decision to delete clusters can be based on some thresholds. Different threshold values should be set for different data sets depending on the cluster structure and the size of the data sets. Here, a cardinality threshold $\varepsilon$ and an attrition rate $\rho$ ($0 < \rho < 1$) are set. The decision to delete clusters in SP-FCM is based solely on cluster cardinality and the threshold $\varepsilon$. If $\varepsilon$ is too small, $C$ is reduced more slowly and the reduction may stop prematurely before the optimal cluster number is found. On the other hand, if $\varepsilon$ is too large, $C$ may be reduced too drastically. In our method, clusters whose cardinalities satisfy $M_j < \varepsilon$ are considered "candidates" for removal, and we can remove up to $\lfloor \rho \times C \rfloor$ clusters having the lowest cardinality from the pool of candidates specified by $\varepsilon$.
Limiting the number of clusters that can be removed at one time prevents $C$ from being reduced too drastically when $\varepsilon$ is set too high for a given data set. In this way the best cluster number is estimated automatically while a faster, consistent, and repeatable initialization technique is used. For evaluating the goodness of the data partition, both cluster compactness and intercluster separation should be taken into account. Hence the XB index is adopted.
For each $C$ in the range $[C_{\min}, C_{\max}]$, a set of cluster validity indexes is calculated, where $C_{\max}$ is the initial cluster number, which is set to be much larger than the expected cluster number. The partition matrix $U$ with $C$ clusters that has the best aggregate validity index is selected as the final cluster partition. The SP-FCM algorithm is summarized in Algorithm 1.
Algorithm 1: SP-FCM.
(1) Initialize $C_{\max}$ and $C_{\min}$, let $C = C_{\max}$, and initialize the real number $m$, the iteration counters $t = 0$ and $k = 0$, and the maximum iteration number of PSO $T$.
(2) Initialize the population size $L$, the initial velocity of particles, the initial position of particles, $c_1$, $c_2$, $w$, the threshold $\varepsilon$, and the attrition rate $\rho$.
Repeat {
(b) Calculate the cluster center for each particle by (4).
(c) Calculate the fitness value for each particle by (7).
(d) Calculate pbest for each particle.
(e) Calculate gbest for the swarm.
(f) Update the velocity and position of each particle by (5).
(g) $t = t + 1$
} Until the PSO termination condition is met
(*)
(i) Calculate the optimal threshold $\alpha_j$ ($j = 1, 2, \ldots, C$) for each column of the partition matrix $U(\mathrm{gbest})$ by (12), and relocate $u_{ij}$ ($1 \le i \le N$) of the $j$th cluster according to $\alpha_j$.
(ii) Calculate the cardinality $M_j$ for each cluster, $1 \le j \le C$, as the number of data points whose membership value is equal to 1, by (14).
(iii) Remove all clusters whose $M_j < \varepsilon$ and whose cardinality is among the $\lfloor \rho \times C \rfloor$ lowest.
(iv) Update the cluster number $C$.
Here, if $\lfloor \rho \times C \rfloor$ is equal to 0, we set it to 1. This means that the cluster with the lowest cardinality may be removed. The initial $C_{\max}$ cluster prototypes can be initialized using exemplars from data points selected by the index $i = \lfloor (N / C_{\max})\, j \rfloor$ for the $j$th prototype. After termination, the $V$ and $U$ from $C \in [C_{\min}, C_{\max}]$ with the best cluster validity index XB are selected as the final cluster prototypes and partition.
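The cluster reduction steps can be sketched as follows. This is an illustrative reading of the removal rule, not the authors' code: the names `cardinality`, `clusters_to_remove`, `eps`, and `rho` are introduced here, and the `c_min` guard that keeps the cluster number from dropping below the lower bound of the search range is an added assumption.

```python
import math

def cardinality(column, alpha):
    """Count the points whose membership is elevated to 1 by the threshold."""
    u_max = max(column)
    return sum(1 for u in column if u >= u_max - alpha)

def clusters_to_remove(cards, eps, rho, c_min=2):
    """Indices of clusters to delete in one reduction step.

    Candidates are clusters with cardinality below eps; at most
    floor(rho * C) of them (at least 1) are removed, lowest first.
    """
    c = len(cards)
    quota = max(1, math.floor(rho * c))
    quota = max(0, min(quota, c - c_min))  # assumption: never go below c_min
    candidates = sorted((j for j in range(c) if cards[j] < eps),
                        key=lambda j: cards[j])
    return candidates[:quota]

cards = [50, 3, 40, 8, 12, 1]
removed_fast = clusters_to_remove(cards, eps=10, rho=0.4)  # [5, 1]
removed_slow = clusters_to_remove(cards, eps=10, rho=0.1)  # [5]
```

With six clusters and an attrition rate of 0.4, the quota is 2, so the two smallest candidates (cardinalities 1 and 3) are removed; with a rate of 0.1 the floor is 0 and the quota falls back to a single cluster.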

Experimental Results
In this section, the performance of the FCM, RCM, shadowed c-means (SCM) [21], shadowed rough c-means (SRCM) [19], and SP-FCM algorithms is presented on four UCI data sets, four yeast gene expression data sets, and real data. For evaluating the convergence effect, the fundamental criterion can be described as follows: the distance between different objects in the same cluster should be as small as possible; the distance between objects in different clusters should be as large as possible. Here we use the DB index and the Dunn index to evaluate the clustering effect. For a given data set and $C$ value, the higher the similarity within the clusters and the intercluster separation, the lower the DB index value. A good clustering procedure should make the value of the DB index as low as possible. Conversely, higher values of the Dunn index indicate better clustering in the sense that the clusters are well separated and relatively compact. The details of the experiments are given below.
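To make the two evaluation criteria concrete, the sketch below computes both indices for a crisp partition. It uses common textbook definitions, which are an assumption here since the exact variants used in the experiments are not stated: the DB index averages, over clusters, the worst ratio of summed within-cluster scatter to centroid distance, and the Dunn index divides the smallest between-cluster point distance by the largest cluster diameter. Clusters are assumed to be non-empty and not all singletons.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def davies_bouldin(clusters):
    """Lower is better: compact clusters with distant centroids."""
    cents = [centroid(c) for c in clusters]
    scatter = [sum(math.dist(x, v) for x in c) / len(c)
               for c, v in zip(clusters, cents)]
    k = len(clusters)
    return sum(max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                   for j in range(k) if j != i)
               for i in range(k)) / k

def dunn(clusters):
    """Higher is better: large separation relative to the widest cluster."""
    k = len(clusters)
    separation = min(math.dist(x, y)
                     for i in range(k) for j in range(i + 1, k)
                     for x in clusters[i] for y in clusters[j])
    diameter = max(math.dist(x, y) for c in clusters for x in c for y in c)
    return separation / diameter

clusters = [[[0.0], [1.0]], [[10.0], [11.0]]]
db = davies_bouldin(clusters)  # (0.5 + 0.5) / 10 = 0.1
dn = dunn(clusters)            # 9 / 1 = 9.0
```

For fuzzy partitions, one common convention is to assign each point to the cluster of highest membership before computing these indices.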

UCI Data Set.
In our experiments, four UCI data sets are used: the 4-dimensional Iris, 13-dimensional Wine, 10-dimensional Glass, and 34-dimensional Ionosphere data sets. There are 3 clusters in the Iris data set, each of which has 50 data patterns; 3 clusters in the Wine data set, which have 50, 60, and 68 data patterns; 6 clusters in the Glass data set, which have 30, 35, 40, 42, 36, and 31 data patterns, respectively; and 2 clusters in the Ionosphere data set, which have 226 and 125 data patterns. The validity indices of each method are compared in Table 1. SP-FCM can identify compact groups compared to other algorithms when given the cluster number $C$. It can also be seen that SRCM and SP-FCM have more obvious advantages than FCM, RCM, and SCM. SP-FCM performs slightly better than SRCM in most cases due to its global search ability, which enables it to converge to an optimum or near-optimum solution. Moreover, the shadowed set- and rough set-based clustering methods, namely, SP-FCM, SRCM, RCM, and SCM, perform better than FCM. This implies that the partition of approximation regions can reveal the nature of the data structure and that only the lower bound and boundary region of each cluster contribute positively to the process of updating the prototypes.
As usual, the number of clusters is implied by the nature of the problem. Here, with the shadowed sets involved, one can anticipate that the optimal number of clusters could be found. The fuzzification coefficient $m$ can be optimized; however, it is common to assume a fixed value of 2.0, which is associated with the form of the membership functions of the generated clusters. For testing the SP-FCM algorithm, the rule $C \le N^{1/2}$ is adopted, and the range of the expected cluster number can be set as $[C_{\min} = 2, C_{\max} = 16]$. The swarm size is set as $L = 20$, the maximum iteration number of PSO is $T = 50$, and, for cluster reduction, the cluster cardinality threshold is $\varepsilon = 10$ and the attrition rate is $\rho = 0.1$.
In each cycle, we obtain the distribution of every cluster, remove some of the clusters according to their cardinality, and calculate the XB index, while the cluster number varies from $C_{\max}$ to $C_{\min}$. After the loop ends, the partition with the lowest XB value is selected as the final result. Figure 2 presents the validity indices in the process of generating the optimal cluster number. Smaller values indicate more compact and well-separated clusters.
The validity indices reach their minimum values at $C$ = 3, 3, 6, and 2, respectively, which correspond to the final cluster prototypes and the best partitions. Through the shadowed sets and PSO approaches, the influence of each boundary region on the formation of the prototypes and the clusters can be properly resolved. Although SP-FCM requires more computing time, it yields reasonable results when processing overlapping and vague data patterns.

Yeast Gene Expression Data Set.
Four yeast gene expression data sets are used in the experiments: GDS608, GDS2003, GDS2267, and GDS2712, downloaded from Gene Expression Omnibus. The numbers of classes and samples are 26 and 6303 for GDS608, 23 and 5617 for GDS2003, 14 and 9275 for GDS2267, and 15 and 9275 for GDS2712. Table 2 presents the validity indices of the different methods when the cluster number is given. SP-FCM and SRCM obtain the same effect and perform better than the other clustering algorithms. The improvement can be attributed to the fact that the global search capacity of PSO is conducive to finding more appropriate cluster centers while escaping from local optima.
To obtain the optimum automatically, we let $m = 2.0$, $c_1 = 1.49$, $c_2 = 1.49$, and $w = 0.72$, and the rule $C \le N^{1/2}$ is adopted. The swarm size is set as $L = 20$ and the maximum iteration number of PSO is $T = 80$; for cluster reduction, the range of the expected cluster number, the cluster cardinality threshold $\varepsilon$, and the attrition rate $\rho$ are set for each data set. In each cycle, we obtain the distribution of every cluster, remove some of the clusters according to their cardinality, and calculate the XB index, while the cluster number varies from $C_{\max}$ to $C_{\min}$. The partition with the lowest XB value is selected as the final result after the loop ends. As seen in Figure 3, the cluster number decreases at a faster rate at the beginning. For GDS608, it takes 26 iterations to reduce the cluster number from $C = 80$ to $C = 30$ and 4 iterations from $C = 30$ to $C = 26$, and the XB index begins to increase when the cluster number $C < 26$. For GDS2003, it takes 24 iterations to reduce the cluster number from $C = 75$ to $C = 30$ and 7 iterations from $C = 30$ to $C = 23$, and the XB index begins to increase when $C < 23$. For GDS2267, it takes 23 iterations to reduce the cluster number from $C = 96$ to $C = 20$ and 6 iterations from $C = 20$ to $C = 14$, and the XB index begins to increase when $C < 14$. For GDS2712, it takes 23 iterations to reduce the cluster number from $C = 96$ to $C = 20$ and 5 iterations from $C = 20$ to $C = 15$, and the XB index begins to increase when $C < 15$. Here, the advantages of fuzzy sets, PSO, and shadowed sets are integrated in SP-FCM, which makes the algorithm suitable for dealing with overlapping partitions and with the uncertainty and vagueness arising from the boundary regions. In addition, the optimization process in the shadowed sets makes the method robust to outliers, so that the approximation regions of each cluster can be determined accurately and the obtained prototypes approach the desired locations.

Real Data.
In this experiment, 10 different packages are tested in total. Each package is represented by 100 frames captured from different angles by a camera, and SIFT feature points are extracted from each frame and used for training a recognition system. Figure 4 shows some images with their SIFT keypoints. This data set is comprised of 248150 descriptors. We let $m = 2.0$, $c_1 = 1.49$, $c_2 = 1.49$, $w = 0.72$, $L = 20$, $\varepsilon = 30$, and $\rho = 0.01$ for SP-FCM and choose the reasonable range $[C_{\min} = 200, C_{\max} = 360]$ according to the number of package categories and the distribution of keypoints in each image. Eighty iterations of PSO are run for each given $C$ to produce the cluster prototypes and partition matrix as the starting point for the shadowed sets. Longer PSO stabilization is needed to obtain more stable cluster partitions. Within each cluster, the optimal $\alpha_j$ decides the cardinality and realizes cluster reduction, and the XB index is calculated. Each $C$-partition is ranked using this index, and the partition with the smallest index value, which indicates the most compact and well-separated clusters, is selected as the final output. At the beginning, the cluster number decreases at a faster speed; it takes 26 iterations to reduce the cluster number from $C = 360$ to $C = 289$ and 20 iterations from $C = 289$ to $C = 267$. The XB index increases at a relatively faster rate when the cluster number $C < 267$. Figure 5 shows the XB index for $C \in [267, 289]$. The index reaches its minimum value at $C = 276$, which means that the best partition for this data set has 276 clusters. Table 3 presents the comparative analysis of the convergence effect. As expected, SP-FCM provides sound results for the real data, as assessed by the validity indices.

Conclusions
This paper presents a modified fuzzy c-means algorithm based on particle swarm optimization and shadowed sets to perform unsupervised feature clustering. This algorithm, called SP-FCM, utilizes the global search property of PSO and the vagueness balance property of shadowed sets, such that it can estimate the optimal cluster number as it runs through its alternating optimization process. As a randomized approach, SP-FCM is able to alleviate the problems faced by FCM, namely, sensitivity to initialization and convergence to local minima. Moreover, this algorithm avoids the subjective and somewhat arbitrary trials needed to estimate the appropriate cluster number, and it finds the optimal cluster number within a specific range using cluster validity measures as indicators. The use of the XB validity index allows the algorithm to find the optimum cluster number with cluster partitions that provide compact and well-separated clusters. The experiments show that the SP-FCM algorithm produces good results with respect to the DB and Dunn indices, especially for high-dimensional and large data sets.