A Weight Possibilistic Fuzzy C-Means Clustering Algorithm

School of Computer Engineering, Jiangsu Ocean University, Lianyungang, Jiangsu 222003, China
School of Mathematics and Information Engineering, Lianyungang Normal College, Lianyungang, Jiangsu 222003, China
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 210016, China
J. B. Speed School of Engineering, University of Louisville, KY 40208, USA


Introduction
Clustering is an unsupervised learning method and has been applied in various fields, including data mining, pattern recognition, computer vision, and bioinformatics. Clustering methods can be summarized as partition-based [1,2], hierarchical [3], density-based [4][5][6], and grid-based [7]. Partition methods include hard partition [8,9] and soft partition [10][11][12]. Soft partition is represented by fuzzy membership, whose value lies in the interval [0, 1]. Many fuzzy clustering algorithms have been developed and widely used in a variety of areas [13][14][15][16], such as data mining and pattern recognition. Ruspini [17] introduced fuzzy C-means (FCM) as a clustering algorithm, and Dunn [18] analyzed the fuzzy exponent m and fixed its value at 2; Bezdek generalized it to any m > 1. The constraint of FCM (∑_{i=1}^{c} u_ij = 1) may cause membership to conflict with the intuitive degree of belonging; furthermore, it makes the clustering results sensitive to noise. To overcome this defect, Krishnapuram and Keller [19] relaxed the constraint and proposed a new algorithm named possibilistic C-means (PCM) [20], which reduces the influence of noise on clustering and has good robustness. However, PCM relies on the initialization and may produce coincident clusters [21]. Many algorithms have been developed to overcome the coincidence problem. For example, studies [22,23] modified the PCM objective function by adding an inverse function of the distances between cluster centers. The study [23] proposed a new model named fuzzy possibilistic C-means (FPCM), which introduced for unlabeled data both a membership and a typicality value t_ij subject to ∑_{j=1}^{n} t_ij = 1. FPCM reduces the sensitivity to noise of FCM and resolves the coincidence problem of PCM; however, because of the row-sum constraint, the typicality values become very small as the dataset scale increases.
The study by Pal [24] proposed a new algorithm named possibilistic fuzzy C-means (PFCM), a hybridization of PCM and FCM that overcomes the problems of PCM, FCM, and FPCM. PFCM solves the noise sensitivity and has therefore been widely applied in many fields [25][26][27]. PFCM adds coefficients a and b for membership and possibility, which measure their relative importance in the computation of centroids; however, the values of a and b are simply fixed at 1, which means that membership and possibility have the same importance during the computation of centroids.
This setting makes the clustering results less evident in some cases, and PFCM gives no scientific and rational method to compute the parameters. The main objective of this study is to generalize the FCM, PCM, and PFCM algorithms and propose a new algorithm named weight possibilistic fuzzy C-means (WPFCM). We design a new objective function on the basis of PFCM. From the requirement of minimizing the objective function, iterative formulas for membership, typicality, and centroids are obtained by constructing the Lagrange function and setting its derivatives to zero.
This study is quite different from the literature [28]. First, this study focuses on clustering with possibilistic fuzzy C-means, while the study by Schneider [28] addresses the possibilistic C-means algorithm; the underlying algorithms are different. Second, the design of the weight parameter differs. The algorithm in this study automatically allocates weight values to inlier and outlier samples according to the calculation method of the weight parameters, which maximizes the membership value and reduces the influence of outliers on estimation. The weight parameter satisfies the optimization objective, makes the iteration faster, and avoids the coincidence problem. The method in the study by Schneider [28] does not have these advantages.
Experiments on different datasets show that the new algorithm not only makes the clustering results more evident but also partitions overlapping data better, while reducing the number of iterations and speeding up convergence. The rest of this study is organized as follows: Section 2 reviews the FCM, PCM, FPCM, and PFCM clustering algorithms. Section 3 presents a new method for computing the parameters and proposes WPFCM. Section 4 experimentally demonstrates the performance improvement of WPFCM on some UCI datasets. Section 5 offers conclusions.

Related Works
Since fuzzy set theory was introduced by Zadeh, it has rapidly been applied to clustering. FCM is one of the most famous algorithms; it obtains clustering results by minimizing an objective function while iterating membership and centroid updates. The objective function of FCM is

J_FCM(U, V) = ∑_{i=1}^{c} ∑_{j=1}^{n} u_ij^m D_ij^2,        (1)

where the fuzzy exponent m is subject to m > 1 and the squared Euclidean distance is defined as D_ij^2 = ‖x_j − v_i‖^2. Membership is obtained by minimizing objective function (1); the iterative functions of membership and centroid are

u_ij = 1 / ∑_{k=1}^{c} (D_ij / D_kj)^{2/(m−1)},   v_i = ∑_{j=1}^{n} u_ij^m x_j / ∑_{j=1}^{n} u_ij^m.        (2)
The clustering performance is good; however, the algorithm is subject to three constraints: u_ij ∈ [0, 1], ∑_{i=1}^{c} u_ij = 1 (j = 1, 2, . . . , n), and 0 < ∑_{j=1}^{n} u_ij < n (i = 1, 2, . . . , c), which make the algorithm sensitive to noise and usually lead to center deviation caused by individual anomalous data points.
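The FCM iteration described above can be sketched in a few lines of NumPy. This is a minimal illustration of equations (1) and (2), not the authors' code; the function name and the (clusters × points) array layout are my conventions.

```python
import numpy as np

def fcm_step(X, V, m=2.0):
    """One FCM iteration: update memberships, then centroids.

    u_ij = 1 / sum_k (D_ij / D_kj)^(2/(m-1))
    v_i  = sum_j u_ij^m x_j / sum_j u_ij^m
    """
    # Squared Euclidean distances D2[i, j] = ||x_j - v_i||^2
    D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    D2 = np.fmax(D2, 1e-12)          # guard against a point sitting on a centroid
    W = D2 ** (-1.0 / (m - 1))       # equals D_ij^(-2/(m-1)) since D2 is squared
    U = W / W.sum(axis=0)            # normalize so each column sums to 1
    Um = U ** m
    V_new = (Um @ X) / Um.sum(axis=1, keepdims=True)
    return U, V_new
```

Note that the column normalization enforces the constraint ∑_{i=1}^{c} u_ij = 1 directly, which is exactly what makes the update sensitive to noise points far from every centroid.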
The constraints of FCM require each data point to account for its relation to points in the current cluster and in other clusters; therefore, membership may conflict with the intuitive degree of belonging and does not directly reflect the real clustering structure. The FCM algorithm is sensitive to noise and obtains poor clustering results in noisy data environments. Krishnapuram and Keller [19] improved FCM and proposed the possibilistic C-means algorithm, which relaxes the constraint. The objective function is

J_PCM(T, V) = ∑_{i=1}^{c} ∑_{j=1}^{n} t_ij^q D_ij^2 + ∑_{i=1}^{c} η_i ∑_{j=1}^{n} (1 − t_ij)^q,        (3)

where η_i is the scaling parameter of the i-th class, defined as η_i = K ∑_{j=1}^{n} u_ij^q D_ij^2 / ∑_{j=1}^{n} u_ij^q (commonly K = 1), the exponent q is subject to q > 1, and the Euclidean distance is D_ij^2 = ‖x_j − v_i‖^2. The iterative functions of typicality and centroid, obtained by minimizing objective function (3), are

t_ij = 1 / (1 + (D_ij^2 / η_i)^{1/(q−1)}),        (4)

v_i = ∑_{j=1}^{n} t_ij^q x_j / ∑_{j=1}^{n} t_ij^q.        (5)
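The PCM updates (3)-(5) can be sketched similarly. A common practice, assumed here, is to estimate η_i from a membership matrix produced by a preliminary FCM run; the function and variable names are mine.

```python
import numpy as np

def pcm_typicality(D2, U, q=2.0, K=1.0):
    """PCM update: eta_i = K * sum_j u_ij^q D2_ij / sum_j u_ij^q,
    then t_ij = 1 / (1 + (D2_ij / eta_i)^(1/(q-1))).

    D2[i, j] holds squared distances ||x_j - v_i||^2 and U is an
    initial membership matrix (e.g. from FCM)."""
    Uq = U ** q
    eta = K * (Uq * D2).sum(axis=1) / Uq.sum(axis=1)   # one scale per cluster
    T = 1.0 / (1.0 + (D2 / eta[:, None]) ** (1.0 / (q - 1)))
    return T, eta
```

Because each t_ij depends only on its own distance and η_i, rows and columns are decoupled, which is exactly the relaxation that makes PCM robust to noise but prone to coincident clusters.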
In equation (3), t_ij is not a membership but a possibility, and the clustering results are easy to interpret. PCM [29] relaxes the constraint ∑_{i=1}^{c} u_ij = 1 (j = 1, 2, . . . , n) and instead requires ∑_{i=1}^{c} t_ij ≤ c (j = 1, 2, . . . , n), so rows and columns are independent and the data structure becomes loose. Therefore, the algorithm is insensitive to noise and can deal with datasets that include outliers; on the other hand, there is a weakness: experiments show that PCM's clustering results depend on the initialization and can generate the coincidence problem. Pal [30] held that clustering centroids are drawn close to data centers by the effect of membership and proposed a new algorithm, FPCM, on the basis of FCM and PCM. FPCM uses the data center as the clustering center, which is feasible to a great extent. Membership is a good tool when data points need to be labeled clearly, because it is natural to assign a point to the cluster whose prototype is nearest to it, while possibility is important for estimating clustering centers and effectively reduces the influence of abnormal data points. The objective function is

J_FPCM(U, T, V) = ∑_{i=1}^{c} ∑_{j=1}^{n} (u_ij^m + t_ij^q) D_ij^2,        (6)

where membership is subject to the constraint ∑_{i=1}^{c} u_ij = 1 (j = 1, . . . , n), typicality is subject to ∑_{j=1}^{n} t_ij = 1 (i = 1, . . . , c), the other constraints are m > 1, q > 1, and 0 < u_ij, t_ij < 1, and the Euclidean distance is D_ij^2 = ‖x_j − v_i‖^2. The iterative functions of membership, typicality, and prototype, obtained by minimizing the objective function, are

u_ij = 1 / ∑_{k=1}^{c} (D_ij / D_kj)^{2/(m−1)},        (7)

t_ij = 1 / ∑_{k=1}^{n} (D_ij / D_ik)^{2/(q−1)},        (8)

v_i = ∑_{j=1}^{n} (u_ij^m + t_ij^q) x_j / ∑_{j=1}^{n} (u_ij^m + t_ij^q).        (9)
Although FPCM overcomes the weaknesses of PCM and FCM, the typicality values become very small as the number of samples increases, because typicality is limited by the row-sum constraint; on a large dataset, the typicality values are therefore inconsistent with the real values. Pal [24] improved FPCM by relaxing the typicality row-sum constraint while retaining the membership column-sum constraint, and proposed a new algorithm named possibilistic fuzzy C-means (PFCM). The objective function is

J_PFCM(U, T, V) = ∑_{i=1}^{c} ∑_{j=1}^{n} (a u_ij^m + b t_ij^q) D_ij^2 + ∑_{i=1}^{c} η_i ∑_{j=1}^{n} (1 − t_ij)^q,        (10)

where the parameters are subject to the constraints m > 1 and q > 1, and a and b are constants. The iterative functions of membership, typicality, and prototype, obtained by minimizing the objective function, are

u_ij = 1 / ∑_{k=1}^{c} (D_ij / D_kj)^{2/(m−1)},        (11)

t_ij = 1 / (1 + (b D_ij^2 / η_i)^{1/(q−1)}),        (12)

v_i = ∑_{j=1}^{n} (a u_ij^m + b t_ij^q) x_j / ∑_{j=1}^{n} (a u_ij^m + b t_ij^q),        (13)

where η_i = K ∑_{j=1}^{n} u_ij^m D_ij^2 / ∑_{j=1}^{n} u_ij^m and, usually, K is a constant (K = 1).
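One PFCM pass over equations (11)-(13) can be sketched as follows; η_i is assumed to be precomputed (e.g. from a preliminary FCM run), and the names are my conventions rather than the authors' code.

```python
import numpy as np

def pfcm_step(X, V, eta, a=1.0, b=1.0, m=2.0, q=2.0):
    """One PFCM iteration: FCM-style memberships (11), PCM-style
    typicalities scaled by b (12), centroids weighted by a*u^m + b*t^q (13)."""
    D2 = np.fmax(((X[None] - V[:, None]) ** 2).sum(-1), 1e-12)
    W = D2 ** (-1.0 / (m - 1))
    U = W / W.sum(axis=0)                                  # sum_i u_ij = 1
    T = 1.0 / (1.0 + (b * D2 / eta[:, None]) ** (1.0 / (q - 1)))
    G = a * U ** m + b * T ** q                            # per-point centroid weights
    V_new = (G @ X) / G.sum(axis=1, keepdims=True)
    return U, T, V_new
```

The coefficients a and b enter only through the typicality scale and the centroid weights, which is why fixing both at 1, as the text notes, makes the trade-off between membership and typicality invisible.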

WPFCM Algorithm
This section includes three parts. The motivation for the weight parameters is introduced first; the calculation method of the weight parameters is presented in the second part; and the last part gives the objective function and the steps of the algorithm.

Motivation of Weight Parameters. PFCM integrates the merits of PCM and FCM, combining membership and typicality. PFCM reduces the sensitivity to noise of FCM, overcomes the coincidence problem of PCM, and avoids the problem in FPCM that typicality values become very small as the number of data points increases. After analyzing the parameters a and b, we found that their values influence membership and typicality and thus affect the clustering results. If a is greater than b, the prototype is affected more by membership than by typicality; conversely, if b is higher than a, the prototype is affected more by typicality than by membership. Therefore, if we want to reduce the influence of outliers on the clustering results, we should select a lower than b. Determining the parameter values, however, is difficult. Usually both are fixed at 1, which means that membership and typicality have the same importance, and setting two parameters a and b becomes meaningless. In many situations, we do not know whether particular values are suitable, and determining a and b then depends on experience. Assigning values to a and b lacks mathematical basis, so the choice in PFCM is arbitrary and unscientific, and the clustering results become unstable. Another weakness of PFCM is that all data vectors share the same parameter values during clustering, although different vectors have different importance; so it is unreasonable for a and b to be fixed at 1. To overcome these weaknesses, we propose a new method to compute weight parameters that replace a and b in PFCM. The new parameters consider the importance of each sample during clustering, and the new calculation method is more reasonable.
The importance of the parameters lies in the fact that the values of a and b directly affect the typicality value t_ij and the centroid v_i, indirectly affect membership, and thereby influence the clustering results.
The study by Fan et al. [32] assigned weights to attributes according to the importance of each attribute to the clustering process. For example, in the dataset IRIS [34], the third and fourth attributes are beneficial for obtaining clear clustering results, so they are assigned high weight values and the others low weight values.
The premise is that we must know which attributes are important and which are not; for an unknown dataset, this method is inappropriate and cannot be applied. The study by Nock and Nielsen [33] estimated the probability density of all samples by an analogy method, which needs a great deal of computation. The study by Hung [29] gave a prototype-driven learning of a parameter α based on the exponential separation strength between clusters, updated at each iteration to improve the performance of FCM. Equation (14) defines the parameter α.
where the parameter β is determined by the distances from the data points x_j to the sample mean and is defined as

β = ∑_{j=1}^{n} ‖x_j − x̄‖^2 / n,        (15)

and x̄ is the sample mean:

x̄ = ∑_{j=1}^{n} x_j / n.        (16)
Definition 1. A given sample set to be classified is divided into fuzzy subsets, and c is the number of clusters.

Definition 2.
According to the importance of the data point x_j (x_j ∈ X) during the clustering process, the weight parameter is defined as γ_ij, the weight of x_j with respect to class i. It is calculated as

γ_ij = exp(−‖x_j − v_i‖^2 × n / (∑_{j=1}^{n} ‖x_j − x̄‖^2 × c)).        (17)
Theorem 1. The distance from x_j to the center v_i determines the weight: if the distance is long, the weight value is high; conversely, if the distance is short, the weight value is low.
Proof. x̄ is the sample mean, and the difference between x_j and x̄ is reflected by the distance from x_j to x̄, whose sum over the dataset is constant. The smaller the value of D_ij^2, the shorter the distance from x_j to class i and the smaller the weight value; conversely, the longer the distance from x_j to class i, the larger the value of exp(−‖x_j − v_i‖^2 × n / (∑_{j=1}^{n} ‖x_j − x̄‖^2 × c)). Optimization requires minimizing the objective function, and the weight parameter should serve this optimization; a long distance should therefore receive a large γ_ij, so it is appropriate to use γ_ij as the weight parameter. According to the rule of classification, there is little difference among data in the same class and great difference between classes. When designing the objective function, a data point should be assigned to the center at the nearest distance, which is expressed by maximizing the membership value, while the typicality value is used to reduce the influence of outliers on estimation. The new objective function should meet two requirements: on the one hand, the role of membership should be increased when the sample is an inlier; on the other hand, the role of typicality should be increased when the sample is an outlier. Therefore, the objective function is designed as in the following equation, which includes two parts: the first part is the fuzzy term, weighted by the fuzzy weight parameter, and the second part is the typicality term, weighted by the typicality weight parameter.
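The weight formula quoted in the proof of Theorem 1 can be transcribed directly into NumPy. Equation (17) is only partially recoverable from the source, so the sign convention (whether γ_ij or 1 − γ_ij pairs with membership) should be checked against the original paper; this is a sketch of the formula exactly as quoted.

```python
import numpy as np

def wpfcm_weights(X, V):
    """gamma_ij = exp(-||x_j - v_i||^2 * n / (c * sum_j ||x_j - xbar||^2)),
    transcribed from the formula as it appears in the proof of Theorem 1."""
    n, c = X.shape[0], V.shape[0]
    xbar = X.mean(axis=0)                       # sample mean
    spread = ((X - xbar) ** 2).sum()            # sum_j ||x_j - xbar||^2
    D2 = ((X[None] - V[:, None]) ** 2).sum(-1)  # D2[i, j] = ||x_j - v_i||^2
    return np.exp(-D2 * n / (c * spread))
```

Note that, as written, exp(−d^2 · const) is larger for points nearer a centroid, so the complementary quantity 1 − γ_ij grows with distance; which of the two plays the "typicality weight" role depends on the sign convention of the original equation (17).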

Definition 3.
The new objective function, based on FCM, PCM, and PFCM, is designed as

J_WPFCM(U, T, V) = ∑_{i=1}^{c} ∑_{j=1}^{n} [(1 − γ_ij) u_ij^m + γ_ij t_ij^q] D_ij^2 + ∑_{i=1}^{c} η_i ∑_{j=1}^{n} (1 − t_ij)^q,        (18)

where γ_ij (0 < γ_ij < 1) denotes the weight between data point x_j and class i, which comes from equation (17). Different data points have different weight values, so the clustering results become more reasonable and the coincidence problem is avoided. U, T, and V denote the membership matrix (c × n), the typicality matrix (c × n), and the centroid matrix (c × 1), respectively. Here, u_ij (0 < u_ij < 1) is the membership of feature point x_j in cluster c_i and t_ij (0 < t_ij < 1) is the typicality of x_j in cluster c_i. D_ij^2 = ‖x_j − v_i‖^2 is the squared Euclidean distance between data point x_j and v_i. The parameters m (m > 1) and q (q > 1) are the fuzzy exponents. The parameter η_i (η_i > 0) is a constant defined by η_i = K ∑_{j=1}^{n} u_ij^m D_ij^2 / ∑_{j=1}^{n} u_ij^m, where K is usually fixed at 1.
According to the preceding analysis, the nearer the distance between data point x_j and cluster c_i, the smaller the value of the weight parameter γ_ij. A short distance from x_j to cluster c_i indicates that x_j belongs to the i-th cluster, so the weight of the membership term should be increased and is set to (1 − γ_ij). Conversely, the farther x_j is from the i-th cluster, the greater the difference between them, and x_j may be an anomalous point; the typicality weight should then be increased to reduce the effect of x_j on clustering, so it is set to γ_ij. As the membership weight increases (decreases), the typicality weight decreases (increases). The weight parameter is calculated for each individual data point, which overcomes the unreasonable fixed values of a and b in PFCM and resolves the coincidence problem caused by a small value of a and poor centroid initialization.
According to Definition 3, the Lagrangian multiplier method is used to construct the Lagrange function. To minimize equation (18), the partial derivatives with respect to u_ij and t_ij are computed under the constraints ∑_{i=1}^{c} u_ij = 1 and ∑_{j=1}^{n} t_ij = 1, yielding the membership u_ij, the typicality t_ij, and the centroid v_i in equations (19)-(21).
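Equations (19)-(21) did not survive extraction. The following sketch derives one consistent set of updates from the weighted objective (18): memberships from the column constraint ∑_{i} u_ij = 1, typicalities from the η_i penalty term (rather than the column constraint on t), and centroids from setting the gradient to zero. Treat it as an illustration under these assumptions, not the authors' exact formulas.

```python
import numpy as np

def wpfcm_step(X, V, G, eta, m=2.0, q=2.0):
    """One WPFCM iteration with weights G[i, j] = gamma_ij:
    memberships use (1 - gamma) * D2 in place of D2, typicalities use
    gamma * D2, and centroids are weighted by (1-g)u^m + g t^q."""
    D2 = np.fmax(((X[None] - V[:, None]) ** 2).sum(-1), 1e-12)
    W = np.fmax((1.0 - G) * D2, 1e-24) ** (-1.0 / (m - 1))
    U = W / W.sum(axis=0)                       # sum_i u_ij = 1
    T = 1.0 / (1.0 + (G * D2 / eta[:, None]) ** (1.0 / (q - 1)))
    H = (1.0 - G) * U ** m + G * T ** q         # per-point centroid weights
    V_new = (H @ X) / H.sum(axis=1, keepdims=True)
    return U, T, V_new
```

With γ_ij constant across all points (e.g. 0.5 everywhere) the membership update reduces to plain FCM, which is a useful sanity check on the derivation.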

WPFCM Algorithm.
According to the objective function, the steps of the algorithm are provided in Algorithm 1.
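The loop structure of Algorithm 1 can be sketched end to end. Since equations (19)-(21) are not recoverable from the source, this sketch uses updates derived from the reconstructed objective (18), with η_i re-estimated on each pass using K = 1; it illustrates the iteration structure, not the authors' implementation.

```python
import numpy as np

def wpfcm(X, c, m=2.0, q=2.0, eps=1e-6, max_iter=100, V0=None, seed=0):
    """Sketch of the full WPFCM loop following the steps of Algorithm 1."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: initialize centroids (random data points unless V0 is given)
    V = X[rng.choice(n, size=c, replace=False)] if V0 is None else np.array(V0, float)
    spread = ((X - X.mean(axis=0)) ** 2).sum()
    eta = np.ones(c)
    prev = np.inf
    for _ in range(max_iter):
        # Step 2: squared distances; Step 3: weights gamma_ij (equation (17))
        D2 = np.fmax(((X[None] - V[:, None]) ** 2).sum(-1), 1e-12)
        G = np.exp(-D2 * n / (c * spread))
        # Step 4: memberships and typicalities (assumed update equations)
        W = np.fmax((1.0 - G) * D2, 1e-24) ** (-1.0 / (m - 1))
        U = W / W.sum(axis=0)
        Um = U ** m
        eta = (Um * D2).sum(axis=1) / Um.sum(axis=1)      # K = 1
        T = 1.0 / (1.0 + (G * D2 / eta[:, None]) ** (1.0 / (q - 1)))
        # Step 5: objective value; Step 6: convergence test
        obj = (((1 - G) * Um + G * T ** q) * D2).sum() \
              + (eta[:, None] * (1 - T) ** q).sum()
        if abs(prev - obj) < eps:
            break
        prev = obj
        # Step 7: centroid update, then back to step 2
        H = (1.0 - G) * Um + G * T ** q
        V = (H @ X) / H.sum(axis=1, keepdims=True)
    return U, T, V
```

On two well-separated blobs this loop settles onto one centroid per blob within a few iterations, matching the convergence behaviour the experiments report.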

Experiments
To validate the efficiency of the algorithm, experiments on different datasets were carried out. The initial parameter values are set as follows: ε = 0.000001, maximum number of iterations max_iter = 100, constant K = 1, number of classes Cluster_n = 2 for dataset X_12, and Cluster_n = 3 for dataset IRIS [34].
X_12 is a two-dimensional dataset with 12 data points. The coordinates of X_12 are given in Table 1. Figure 1 shows the distribution of X_12: ten points form two clusters of five points each on the left and right sides of the y-axis. Data points x_6 and x_12 are considered noise, and each has the same distance to the two clusters. Table 2 presents the centroids generated by running FCM, PFCM, and WPFCM on X_12. Define the distance Dist_X = ‖V_X12 − V_X‖^2, which denotes the distance from the real centroid to the centroid V_X generated by each algorithm.
[Table 1: the recoverable entries are x_6 (0.00, 0.00), x_7 (1.67, 0.00), x_8 (3.34, 1.67), x_9 (3.34, 0.00), x_10 (3.34, −1.67), x_11 (5.00, 0.00), and x_12 (0.00, 10.00); x_1 to x_5 mirror x_7 to x_11 across the y-axis.]
Scientific Programming
Table 3 provides the minimum iteration counts of FCM, PFCM, and WPFCM with optimal parameters. The iteration count of WPFCM is slightly less than that of FCM and far less than that of PFCM; therefore, WPFCM has lower running time on large datasets and a high convergence speed. Table 4 presents the membership values obtained by running FCM, PFCM, and WPFCM. By comparison, the membership values of WPFCM are better than those of the other two algorithms; in particular, for data points x_3 and x_9 the membership values are equal to one. Data points x_3 and x_9 are the centers of the two clusters, which shows that WPFCM recognizes the cluster centers more easily. Membership values cannot distinguish the noisy data points x_6 and x_12, but the noisy data are identified by the typicality values in Table 5. Analyzing Table 5, the typicality values of WPFCM are greater than those of PFCM; if a data point has a larger typicality value, it is more likely to belong to the cluster. The typicality values of x_3 and x_9 reach 1 in Table 5, which shows that each belongs to its respective cluster with large possibility.
The membership values of the noisy data x_6 and x_12 are equal to 0.5. Figure 1 shows that the distance from x_6 to the two cluster centers is far less than that of x_12, but Table 4 cannot show this difference. Table 5 shows that the typicality values of the ten regular data points are greater than 0.9, while those of x_6 and x_12 are far smaller, so we consider x_6 and x_12 to be noise. We also find in Table 5 that the typicality value of x_12 is far less than that of x_6, which shows that x_12 belongs to the two clusters with less possibility than x_6 and reflects the distribution of x_6 and x_12 in Figure 1. WPFCM thus improves on the defect of FCM. From Table 5, we also find that the typicality values of WPFCM are better than those of PFCM, so WPFCM obtains clearer clustering results. Table 6 presents the centroids and iteration counts obtained by running WPFCM on dataset X_12 with different parameters. The clustering results of WPFCM are better than those of FCM and PFCM as a whole, and the iteration counts of WPFCM are somewhat lower. When m is kept unchanged and q varies from 2 to 5, the membership values show a slight increasing tendency, while the typicality values show an evident decreasing tendency. The cluster centers also change considerably, moving closer to the real centroids, and the iteration counts decrease. With the increase of q, the influence of the weight parameter γ_ij on the clustering results increases. The weight parameter γ_ij is generated during the iterative procedure and the initial centroids are generated randomly, so WPFCM overcomes the defect of randomly selecting a and b and reduces the uncertainty of the clustering results. The clustering results in Table 7 are better than those in Table 6; however, the membership values decrease greatly and the iteration counts increase. Comprehensive consideration suggests setting m and q to 1.5 and 5, respectively.
IRIS is a four-dimensional dataset including three classes: setosa, versicolor, and virginica. Each class has 50 data points, 150 in total. The first class, setosa, is well separated from the other two classes without overlapping.
There is some overlap between versicolor and virginica.
The data in Tables 8 and 9 were acquired by running FCM, PCM, PFCM, and WPFCM many times on IRIS. Each algorithm obtained good cluster centroids. Compared with the other algorithms, WPFCM acquired more distinct membership and typicality values and better separation. The two centroids of versicolor and virginica obtained by the PCM algorithm almost overlap. It is difficult to see the separation between clusters in Tables 8 and 9, so to compare the separation between classes, we define the distance between classes as Dist_ij = ‖V_i − V_j‖^2, which denotes the distance from the i-th cluster to the j-th cluster. Table 10 provides the distances between the centroids generated by FCM, PCM, PFCM, and WPFCM on IRIS. The values of Dist_12 and Dist_13 in WPFCM, FCM, and PFCM reflect the fact that setosa is separated from the other two classes, versicolor and virginica. However, in PCM, Dist_12 and Dist_13 are almost identical and Dist_23 is nearly zero, so the results do not reflect the features of the dataset, which is caused by the coincidence problem of PCM. Although FCM, PFCM, and WPFCM all reflect the separation of setosa from the other two classes and the overlap between versicolor and virginica, comparing Dist_23 shows that the value in WPFCM is nearest to the real value. We conclude that WPFCM reflects the characteristics of the dataset better than the other algorithms and more easily obtains a good partition, especially for the classes versicolor and virginica. Table 11 provides the distances between the real centroids and the centroids generated by FCM, PCM, PFCM, and WPFCM. Equation (24), Dist_X = ∑_{i=1}^{c} Dist_X_i, is defined as the sum of the distances between the centroids acquired by each algorithm and the real centroids, where Dist_X_i is the distance from the i-th cluster center to the real centroid. Each Dist_X_i in WPFCM is less than in the other algorithms.
Comparing the values of Dist_X in Table 11, we have Dist_WPFCM < Dist_PFCM < Dist_FCM < Dist_PCM, which shows that the centroids of WPFCM differ least from the real centroids. The iteration counts of FCM, PCM, PFCM, and WPFCM on IRIS are given in Table 12. The iteration count of WPFCM is slightly larger than that of FCM but far less than those of PCM and PFCM; the WPFCM algorithm acquires the cluster centers quickly and has a fast convergence speed. The numbers of resubstitution errors of FCM, PCM, PFCM, and WPFCM on the dataset IRIS are given in Table 13. The resubstitution errors of WPFCM are slightly less than those of FCM and PFCM but far less than those of PCM, with regard to both membership and typicality. Table 13 shows two relations: U_eWPFCM < U_ePFCM < U_eFCM < U_ePCM and T_eWPFCM < T_ePFCM < T_eFCM < T_ePCM. The resubstitution errors of membership and typicality reach 50 in PCM, far greater than in the other algorithms; the reason is that PCM has the cluster-coincidence problem and there is overlapping data between versicolor and virginica.

Conclusions
A new possibilistic fuzzy C-means algorithm based on weight parameters was proposed according to the importance of membership and typicality in the clustering process. First, aiming at the unreasonable parameters a and b, we designed the weight parameter γ_ij based on the literature [23] and provided a concrete calculation method. The weights (1 − γ_ij) and γ_ij were assigned to membership and typicality, the objective function was improved (equation (18)), and the new algorithm (Algorithm 1) was provided. Experiments on different datasets show that the new algorithm performs well on noisy data and obtains better clustering results: WPFCM resolves the coincidence problem and overcomes the sensitivity to noisy data. The study also discusses the influence of different values of the exponent parameters m and q on membership values, typicality values, and centroids; the exponents are determined by comprehensively comparing membership values, typicality values, and centroids. The experiments also compare iteration counts across algorithms: WPFCM needs fewer iterations and converges fast, and its resubstitution errors are close to those of FCM and PFCM but far less than those of PCM.

Algorithm 1: Weight possibilistic fuzzy C-means clustering algorithm.
(1) Initialize the parameters m (m > 1), q (q > 1), ε, and γ (0 < γ < 1); set the maximum cycle number max_iter; set the initial cycle number to 1; randomly generate the centroids V_0.
(2) Compute the distances according to D_ij^2 = ‖x_j − v_i‖^2.
(3) Compute the weight parameters γ_ij and (1 − γ_ij) by using equation (17).
(4) Compute the membership values u_ij and typicality values t_ij by using equations (19) and (20).
(5) Compute the objective function obj_fcn.
(6) If |obj_fcn(i) − obj_fcn(i − 1)| < ε or the iteration count reaches max_iter, then stop; else obj_fcn(i) → obj_fcn(i − 1).
(7) Compute the centroids v_i by using equation (21) and go to step 2.
Comprehensively, many performance indexes suggest that WPFCM overcomes the noise sensitivity of FCM, resolves the coincidence problem of PCM, and replaces the unreasonable weight parameters of PFCM. Future work is to extend the new algorithm to non-point prototype clustering models such as the spherical prototype, the quadric prototype, and the shell prototype.

Data Availability
The data used to support the findings of this study are available at http://archive.ics.uci.edu/ml/datasets/iris.

Conflicts of Interest
The authors declare that they have no conflicts of interest.