Weighted Subspace Fuzzy Clustering with Adaptive Projection



Introduction
Clustering, an unsupervised learning technique, partitions data into distinct groups or clusters based on predefined principles, where the similarity between samples within a cluster is expected to be maximal and the similarity between samples from different clusters minimal. Clustering has been widely studied in numerous domains, such as image processing [1, 2], data mining [3, 4], and recommendation systems [5]. Several clustering models have been introduced over the past few decades, such as k-means clustering and its variations [6, 7], hierarchical clustering [8, 9], graph-based clustering models [10, 11], and spectral clustering (SC) [12]. Despite the mature development of clustering, improving clustering performance in high-dimensional scenarios remains a challenge.
When handling high-dimensional data, the "curse of dimensionality" arises, and the efficiency and performance of prototype-based clustering techniques, typically k-means and fuzzy c-means (FCM) [13], usually decrease drastically as the feature dimensionality increases [14]. In practical applications, raw data often possess a large number of dimensions and may contain superfluous or noisy features that are useless or even detrimental for separating classes [15]. Consequently, detecting intrinsic structures directly from raw data may not provide reliable outcomes.
Subspace clustering has an intuitive interpretation and good performance and has been extensively explored [16-18]. Spectral clustering [12], a popular graph-based subspace clustering method, has been shown to be particularly effective. It transforms the clustering task into a graph segmentation problem. Generally, the spectral clustering method comprises two stages: it first leverages the eigenvalue decomposition of the graph Laplacian matrix to generate spectral embeddings of the data and then groups the low-dimensional embeddings into different clusters by using the k-means technique. In spectral clustering, similarity graph construction plays a paramount role in achieving favorable performance. Representative methods to construct a similarity graph include using a Gaussian function directly, sparse subspace clustering (SSC) [16], and low-rank representation (LRR) [19]. Besides, Nie et al. [20] proposed a construction method based on adaptive neighbors (CAN and PCAN). Zhou et al. [21] adopted the notion of typicality rather than probability when learning a similarity graph. However, these methods do not consider the differing importance of features, which may make the obtained graph biased.
Additionally, manifold learning-based clustering for high-dimensional data has gained significant attention. Manifold learning algorithms can handle high-dimensional data better by capturing the low-dimensional manifold structure of the data. Isometric mapping (Isomap) [22], local linear embedding (LLE) [23], locality preserving projection (LPP) [24], and neighborhood preserving embedding (NPE) [25] belong to this branch. However, the two-stage process in manifold learning-based clustering models unavoidably results in a significant issue: the representation generated by the first stage (manifold learning) is not necessarily the optimal input for the downstream clustering task, which amplifies the learning error.
As a classic clustering method, k-means is widely used in the postprocessing steps of spectral clustering and manifold learning. However, the effectiveness of k-means usually decreases in overlapping regions of the data. The fuzzy version of k-means, i.e., fuzzy c-means (FCM), uses a membership matrix to evaluate the membership of each sample to each cluster, which can better capture overlapping partitions. However, FCM is sensitive to noisy data and outliers, and it usually relies on Euclidean distances, which makes it unsuitable for high-dimensional data that may be contaminated by redundant features.
Recently, fuzzy clustering has also been widely studied in deep learning. Feng et al. [26] exploited AE-based reconstruction of the original data, intercluster separation, and pseudo-label-based affinity regularization to find a suitable new representation of the data; the generalization ability of the fuzzy clustering model is thereby improved in complex high-dimensional situations. Gong et al. [27] extended curriculum learning from sample-wise learning to cluster number-wise learning and proposed an end-to-end deep fuzzy clustering incorporating curriculum learning. Cheng et al. [28] proposed a novel degradation-agnostic multitask image restoration framework, which performs fuzzy clustering to achieve degradation type-agnostic background extraction. Although deep fuzzy clustering has made certain progress, it is not suitable for high-dimensional datasets with a small number of samples, and the features generated in the latent space lack interpretability.
In traditional clustering models, all features are treated as equally important, which does not accurately reflect the significance of different features. In practical settings, features are formed in terms of the human understanding of the clustering tasks and data properties. For example, in gene expression data of cancer patients, the dimensionality of the data is relatively high, and redundant or noisy features have a substantial impact on the clustering process. Hence, it is beneficial to consider the differing importance of features for clustering tasks. Several studies related to weighted features have been reported. Keller and Klawonn [29] proposed a weighting fuzzy c-means clustering (WFCM). Huang et al. [30] introduced the feature-weighting technique to k-means type clustering. Based on the notion of adaptive neighbors, Nie et al. [31] presented a self-weighted clustering (SWCAN), and Li et al. [32] further introduced a projected SWCAN (WPANFS) model as a new unsupervised feature selection method. Lu et al. [33] put forward a robust weighted clustering method with discrimination properties. Bodyanskiy et al. [34] proposed a method of fuzzy clustering for high-dimensional data using an information weighting strategy; the sensitive problem of "norm concentration" is addressed through weighted parameters, making the model more suitable for high-dimensional data processing. Pimentel et al. [35] proposed two types of weights to prevent the cluster membership ambiguity problem. Chen et al. [36] proposed a sparse FCM clustering with principal component analysis embedding to identify noise or outliers by adding weighting factors to each data point. Generally, lower weights are assigned to features that contribute little or have negative impacts on the objective tasks, such as noisy features, while higher weights are assigned to features that are critical to the performance. In practical applications, it is unrealistic to assign different weights to features directly based on human experience; developing a data-driven mechanism that assigns weights to different features is a potential solution.
This paper proposes a weighted subspace fuzzy clustering (WSFC) model with locality preservation that can quantify the importance of different features, project data onto a lower-dimensional space, and perform fuzzy clustering simultaneously. Since the importance of each feature can be well quantified and redundant features can be revealed, WSFC exhibits the sparsity and robustness of fuzzy clustering. The intrinsic local structures of the data can also be well preserved while enhancing the interpretability of clustering tasks.
The contributions of this article are summarized as follows: (1) A weighted subspace fuzzy clustering (WSFC) model is proposed which can automatically quantify the importance of different features. The negative influence of noisy or harmful features can be suppressed, and the interpretability of the captured features and the robustness of the model can be improved.
(2) The ideas of weighted features, geometrical structure preservation, and fuzzy clustering are seamlessly integrated into a unified model.

The remainder of this article is arranged as follows. Some notations and related works are summarized in Section 2. In Section 3, the motivation of WSFC is first introduced; then the details and solutions of WSFC are presented; finally, the convergence and complexity of the proposed model are analyzed. Extensive experiments are conducted in Section 4. Section 5 draws the conclusions.

Preliminaries
2.1. Notations. Given a data matrix A, $a_j$ denotes its jth column, and $a_{ij}$ stands for its (i, j)th element. The transpose, trace, and rank of A are denoted by $A^T$, $\mathrm{tr}(A)$, and $\mathrm{rank}(A)$, respectively. The other basic notations used in this article are listed in Table 1.

2.2. Fuzzy C-Means (FCM). Fuzzy c-means (FCM) [13] differs from the k-means algorithm in that it introduces degrees of membership of data points to different clusters rather than binary cluster assignments. The objective function is

$$J_{FCM}(U, V) = \sum_{i=1}^{N} \sum_{k=1}^{C} u_{ik}^{\psi} d_{ik}^{2}, \quad \text{s.t.} \ \sum_{k=1}^{C} u_{ik} = 1, \ u_{ik} \geq 0, \tag{1}$$

where $U \in \mathbb{R}^{N \times C}$ is the fuzzy membership degree matrix, which indicates the membership degree of each data point in each cluster, and $\psi$ ($\psi > 1$) is a fuzzifier coefficient. The larger $\psi$ is, the fuzzier the clustering result is; $\psi$ is usually set to 2. $d_{ik}$ represents the distance between each data point and each cluster center. Different distance metrics can be selected according to different clustering tasks, among which the Euclidean distance $d_{ik} = \|x_i - v_k\|_2$ is most commonly used. The minimization of $J_{FCM}(U, V)$ can be solved by an iterative method that alternately updates the membership matrix U and the prototype matrix V:

$$u_{ik} = \frac{1}{\sum_{l=1}^{C} \left( d_{ik} / d_{il} \right)^{2/(\psi - 1)}}, \qquad v_k = \frac{\sum_{i=1}^{N} u_{ik}^{\psi} x_i}{\sum_{i=1}^{N} u_{ik}^{\psi}}. \tag{2}$$

If $d_{ik} = 0$, then set $u_{ik} = 1$ and $u_{il} = 0$ for all $l \neq k$.
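For concreteness, the following is a minimal NumPy sketch of these alternating FCM updates; the random prototype initialization and the small constant guarding against zero distances are our own choices, not prescribed by [13]:

```python
import numpy as np

def fcm(X, C, psi=2.0, n_iter=100, tol=1e-9, seed=0):
    """Minimal FCM sketch. X: (N, M) data, C: number of clusters, psi: fuzzifier > 1."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=C, replace=False)].copy()  # initial prototypes
    for _ in range(n_iter):
        # squared Euclidean distances d_ik^2 between samples and prototypes
        D = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1) + 1e-12
        # membership update: u_ik = 1 / sum_l (d_ik^2 / d_il^2)^(1/(psi-1))
        U = 1.0 / ((D[:, :, None] / D[:, None, :]) ** (1.0 / (psi - 1.0))).sum(-1)
        # prototype update: v_k = sum_i u_ik^psi x_i / sum_i u_ik^psi
        V_new = ((U ** psi).T @ X) / (U ** psi).sum(0)[:, None]
        if np.abs(V_new - V).max() < tol:
            return U, V_new
        V = V_new
    return U, V
```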

2.3. Weighting Fuzzy Clustering (WFCM). Weighting fuzzy clustering (WFCM) [29] introduced a new distance measure that embodies a weighting strategy. The distance is defined as

$$d_{ik}^{2} = \sum_{s=1}^{M} \alpha_{ks}^{\xi} \left( x_i^{(s)} - v_k^{(s)} \right)^{2}, \tag{3}$$

where $x_i^{(s)}$ and $v_k^{(s)}$ stand for the sth feature of $x_i$ and $v_k$, respectively, and M denotes the original dimensionality. $\alpha_{ks}$ is an impact factor determining the influence of the sth feature with respect to the kth cluster, and $\xi$ is an exponential coefficient that emphasizes the strength of each feature for each cluster. The objective function of WFCM is

$$J_{WFCM}(U, V, \alpha) = \sum_{i=1}^{N} \sum_{k=1}^{C} u_{ik}^{\psi} \sum_{s=1}^{M} \alpha_{ks}^{\xi} \left( x_i^{(s)} - v_k^{(s)} \right)^{2}, \quad \text{s.t.} \ \sum_{s=1}^{M} \alpha_{ks} = 1, \ \alpha_{ks} \geq 0. \tag{4}$$

Unlike most algorithms that treat all variables equally, WFCM introduces the idea of a weighting strategy. The weights of features measure the influence of features in the clustering process; smaller weights can suppress or reduce the influence of unimportant features. With this strategy, the importance of each feature in each cluster can be exactly detected.
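The per-cluster weighted distance (3) can be computed in vectorized form; a short sketch, assuming α is stored as a (C, M) matrix with simplex-constrained rows:

```python
import numpy as np

def wfcm_distances(X, V, alpha, xi=2.0):
    """Weighted squared distances d_ik = sum_s alpha_ks^xi (x_i^(s) - v_k^(s))^2.
    X: (N, M) samples, V: (C, M) prototypes, alpha: (C, M) per-cluster feature weights."""
    diff2 = (X[:, None, :] - V[None, :, :]) ** 2          # (N, C, M)
    return (diff2 * (alpha ** xi)[None, :, :]).sum(-1)    # (N, C)
```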

2.4. Locality Preserving Projection (LPP). Locality preserving projection (LPP) [24] is a linear dimensionality reduction method based on local information, which maps data points onto a low-dimensional space while maintaining their local neighborhood structure from the original high-dimensional space. LPP can be considered a linear approximation to the nonlinear Laplacian eigenmap [37].

The local information of the data can be described by a similarity graph $G = (X, W)$, where X is the set of vertices and the affinity matrix W measures the weights of the edges. The objective function of LPP is expressed as

$$\min_{P} \sum_{i,j=1}^{N} \left\| P^T x_i - P^T x_j \right\|_2^2 w_{ij}, \tag{5}$$

where $P \in \mathbb{R}^{M \times r}$ is the projection matrix that maps the original samples onto a low-dimensional space. Minimizing (5) means that if the projected versions of $x_i$ and $x_j$, i.e., $P^T x_i$ and $P^T x_j$, are far apart while $w_{ij}$ is large, the objective function imposes a severe penalty. This ensures that if $x_i$ is close to $x_j$, then $P^T x_i$ and $P^T x_j$ are also expected to be located closely, thereby preserving the local structure of the data during the projection.
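In practice, minimizing (5) under the usual LPP scale constraint $p^T X D X^T p = 1$ (with D the degree matrix of W) reduces to a generalized eigenproblem; a minimal sketch, where the small ridge on B is our own numerical safeguard:

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, W, r):
    """LPP sketch. X: (M, N) with samples as columns, W: (N, N) affinity, r: target dim."""
    D = np.diag(W.sum(axis=1))                   # degree matrix
    L = D - W                                    # graph Laplacian
    A = X @ L @ X.T                              # numerator of the LPP criterion
    B = X @ D @ X.T + 1e-9 * np.eye(X.shape[0])  # ridge added for numerical stability
    # generalized eigenproblem A p = lambda B p; keep eigenvectors of the r smallest eigenvalues
    vals, vecs = eigh(A, B)
    return vecs[:, :r]                           # projection matrix P of shape (M, r)
```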

Weighted Subspace Fuzzy Clustering with Adaptive Projection
3.1. Motivation. Most traditional fuzzy clustering models conduct clustering in the initially formed feature space or in a subspace obtained after extracting features from the original space. However, they are based on the premise that the contributions of different features are equal and rarely consider the differing contributions of features when solving clustering tasks. As the dimensionality increases, a large number of irrelevant features may produce negative effects.
To solve these problems, feature-weighting techniques are a potential solution. Inspired by this idea, we consider weighting the features of the initial data samples to suppress the impact of noisy and useless features on the clustering tasks. Due to the high dimensionality, the clustering process is also expected to be performed in a lower-dimensional subspace. In this way, a two-stage framework arises: mapping the raw data onto a new low-dimensional space and then completing the downstream clustering tasks. Since the results of the first stage may not be the optimal initialization for the subsequent stage, we integrate feature weighting, dimensionality reduction, and clustering into a unified end-to-end model.
According to the above considerations, a generalized weighted subspace fuzzy clustering framework is given as follows:

$$\min_{U, \Theta, g} \sum_{i=1}^{N} \sum_{k=1}^{C} \zeta(u_{ik}) \, L\!\left(g(\Theta x_i)\right) + \gamma R(U) + \mu \sum_{i=1}^{N} \widetilde{L}\!\left(g(\Theta x_i)\right) + \eta R(g), \quad \text{s.t.} \ \sum_{k=1}^{C} u_{ik} = 1, \ u_{ik} \geq 0, \ \sum_{i=1}^{M} \Theta_{ii} = 1, \ \Theta_{ii} \geq 0, \tag{6}$$

where ζ denotes a suitable convex function imposed on the membership degree matrix U, and $\Theta \in \mathbb{R}^{M \times M}$ is a weighted diagonal matrix whose entry $\Theta_{ii}$ is the weight of the ith feature. Noisy features are expected to receive small weights so that they are suppressed when optimizing the model. $\Theta x_i$ can be considered a weighted version of the sample $x_i$, and g is a transformation operator mapping $\Theta x_i$ onto a lower-dimensional space, so that $g(\Theta x_i)$ is a new representation of the weighted sample in the transformed space. $L(\cdot)$ and $\widetilde{L}(\cdot)$ are loss functions: the first term is the clustering loss computed on the weighted samples in the transformed subspace rather than on the raw data in the initially formed space, and $\widetilde{L}(g(\Theta x_i))$ is the representation loss incurred when conducting the space transformation. R(U) and R(g) are regularization terms, μ is a balance parameter, and γ and η are regularization factors.
3.2. Formulation. In (6), different loss functions and regularization terms can be exploited. By integrating the LPP technique, the proposed optimization function is realized as follows:

$$\min_{U, V, P, \Theta} \sum_{i=1}^{N} \sum_{k=1}^{C} u_{ik} \left\| P^T \Theta (x_i - v_k) \right\|_2^2 + \gamma \|U\|_F^2 + \mu \sum_{i,j=1}^{N} \left\| P^T \Theta (x_i - x_j) \right\|_2^2 w_{ij} + \eta R(P, \Theta), \quad \text{s.t.} \ \sum_{k=1}^{C} u_{ik} = 1, \ u_{ik} \geq 0, \ \sum_{i=1}^{M} \Theta_{ii} = 1, \ \Theta_{ii} \geq 0, \ P^T P = I, \tag{7}$$

where $P \in \mathbb{R}^{M \times r}$ denotes a projection matrix and $R(P, \Theta)$ is a regularization term on the projection and the feature weights. Data samples with weighted features are expected to be mapped onto a subspace to further reduce the impact of noisy features.
Here, diag(·) either extracts the diagonal elements of a matrix or returns a square diagonal matrix whose diagonal elements equal the given vector. Four parts are involved in the objective function (7). The first part measures the fuzzy clustering error in the lower-dimensional subspace. Different from the classical FCM, the features of the data are weighted first so that the importance of features to clustering can be distinguished and irrelevant features can be suppressed. Then, the feature-weighted data samples are projected onto a lower-dimensional subspace. Jointly with the action of the third part, the influence of noisy data can be further suppressed while better preserving the local geometrical structure of the data.
In the second part, γ controls the sparseness of U, as in robust and sparse fuzzy clustering (RSFCM) [38]. When the value of γ becomes larger, U contains more nonzero elements; if γ is large enough, all elements of U are nonzero and U is nonsparse. In contrast, for small γ the sparsity of U manifests itself. The third part introduces the locality preserving projection, which first weights the data features and then projects the samples onto a lower-dimensional subspace while maintaining the local structures of the data. The optimal weight matrix Θ is thus learned not only from the clustering loss but also from adhering to the local neighborhood structure. μ is a balance parameter adjusting the influence of preserving the neighborhood structures among the data: the third part contributes more to (7) if μ is large; otherwise, the first part is decisive.
The fourth part avoids the trivial solution $P^T \Theta = 0$: when the objective function becomes small, the value of the fourth term grows, which prevents the objective from collapsing to a meaningless value of 0. η > 0 is a regularization parameter.
Obviously, if $\Theta_{ii} = 1/M$, i.e., each feature has the same weight, WSFC reduces to projected fuzzy c-means clustering (LPFCM) as reported in [39]. Furthermore, if P = I, WSFC becomes RSFCM [38]. Consequently, WSFC not only inherits the merits of LPFCM and RSFCM but also quantifies the differences in feature importance. In this way, the solution space of the clustering model is further extended, and better solutions can be achieved.

3.3. Solution and Algorithm. Since formula (7) is convex with respect to each of U, P, Θ, and V individually, an iterative optimization strategy can be employed.

3.3.1. Solving V When Fixing U, P, and Θ. In this case, the problem (7) is simplified as

$$\min_{V} \sum_{i=1}^{N} \sum_{k=1}^{C} u_{ik} \left\| P^T \Theta (x_i - v_k) \right\|_2^2. \tag{8}$$

The derivative of (8) with respect to $v_k$ is

$$\frac{\partial J}{\partial v_k} = -2 \, \Theta P P^T \Theta \sum_{i=1}^{N} u_{ik} (x_i - v_k). \tag{9}$$

Setting (9) to zero produces

$$v_k = \frac{\sum_{i=1}^{N} u_{ik} x_i}{\sum_{i=1}^{N} u_{ik}}. \tag{10}$$

If $\mathrm{rank}(\Theta P P^T \Theta) = M$, (10) is the unique solution; however, if $\mathrm{rank}(\Theta P P^T \Theta) < M$, (10) is one of the solutions obtained by setting (9) to zero.
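A direct implementation of the closed-form update (10) is a one-liner; the small constant in the denominator is our own safeguard against empty clusters:

```python
import numpy as np

def update_prototypes(X, U):
    """Closed-form update (10): v_k = sum_i u_ik x_i / sum_i u_ik.
    X: (N, M) samples, U: (N, C) memberships; returns V of shape (C, M)."""
    return (U.T @ X) / (U.sum(axis=0)[:, None] + 1e-12)
```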

3.3.2. Solving U When Fixing V, P, and Θ. The problem (7) can be rewritten as

$$\min_{U} \sum_{i=1}^{N} \sum_{k=1}^{C} u_{ik} d_{ik} + \gamma u_{ik}^2, \quad \text{s.t.} \ \sum_{k=1}^{C} u_{ik} = 1, \ u_{ik} \geq 0, \tag{11}$$

which can be decomposed into N subtasks. For each $x_i$, the corresponding subtask is

$$\min_{u_i^T \mathbf{1} = 1, \, u_i \geq 0} \sum_{k=1}^{C} u_{ik} d_{ik} + \gamma u_{ik}^2, \tag{12}$$

where $d_{ik} = \| P^T \Theta (x_i - v_k) \|_2^2$ stands for the distance between the weighted $x_i$ and the weighted $v_k$ in the lower-dimensional subspace. Equation (12) can be further written as

$$\min_{u_i^T \mathbf{1} = 1, \, u_i \geq 0} \left\| u_i + \frac{d_i}{2\gamma} \right\|_2^2, \tag{13}$$

where $d_i = [d_{i1}, \ldots, d_{iC}]^T$. The problem (13) is a quadratic optimization task under simplex constraints which can be solved by the method presented in [40].
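Under the quadratic form of (13), each membership row is the Euclidean projection of $-d_i/(2\gamma)$ onto the probability simplex. The sketch below uses the classical sorting-based projection (our choice for illustration; the Newton-type solver of [40] would serve equally):

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection of y onto {u : u >= 0, sum(u) = 1} (sorting-based method)."""
    u = np.sort(y)[::-1]                                   # sort descending
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(y) + 1) > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(y - theta, 0.0)

def update_membership_row(d_i, gamma):
    """Solve min_{u in simplex} sum_k u_k d_ik + gamma u_k^2 by projecting -d_i/(2 gamma)."""
    return project_simplex(-d_i / (2.0 * gamma))
```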

3.3.3. Solving P When Fixing U, V, and Θ. In this case, the problem (7) can be converted into a trace-minimization form over P. Collecting all quantities that are fixed in this step into a matrix $E \in \mathbb{R}^{M \times M}$, which is obviously real and symmetric, formula (14) can be rewritten as

$$\min_{P^T P = I} \mathrm{tr}\left( P^T E P \right). \tag{15}$$

Thus, the eigenvalue decomposition method can be utilized on E to solve for $P \in \mathbb{R}^{M \times r}$: P is formed from the eigenvectors whose corresponding eigenvalues are the r smallest nonzero ones.
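The projection update then amounts to an ordinary symmetric eigendecomposition; a minimal sketch (the tolerance for discarding numerically zero eigenvalues is our own choice):

```python
import numpy as np

def update_projection(E, r, tol=1e-10):
    """Form P from the eigenvectors of the real symmetric E whose eigenvalues are
    the r smallest nonzero ones."""
    vals, vecs = np.linalg.eigh(E)        # eigenvalues in ascending order
    nonzero = np.where(vals > tol)[0]     # discard (numerically) zero eigenvalues
    return vecs[:, nonzero[:r]]
```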

3.3.4. Solving Θ When Fixing U, P, and V. In this case, the problem (7) becomes a subproblem in the weight vector θ, denoted as (18). First, a theorem is given as follows.

Theorem 1 (see [32]). Assume matrices $A, B, \Theta \in \mathbb{R}^{M \times M}$ with $\Theta = \mathrm{diag}(\theta)$. Then $\mathrm{tr}(\Theta A \Theta B) = \theta^T (A \circ B^T) \theta$, where $\circ$ denotes the Hadamard product and θ is the vector whose entries correspond to the diagonal elements of Θ.

According to (15) and Theorem 1, the first part of (18) can be rewritten as a quadratic form in θ, and the second and third parts of (18) can be rewritten analogously. Substituting these quadratic forms (19), (20), and (21) into (18) yields problem (23), which is also a quadratic programming problem with simplex constraints. The solution of (23) can be obtained with the techniques reported in [40].
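Theorem 1 is easy to verify numerically:

```python
import numpy as np

# Numerical check of Theorem 1: tr(Theta A Theta B) = theta^T (A ∘ B^T) theta
rng = np.random.default_rng(0)
M = 6
A = rng.standard_normal((M, M))
B = rng.standard_normal((M, M))
theta = rng.random(M)
Theta = np.diag(theta)

lhs = np.trace(Theta @ A @ Theta @ B)
rhs = theta @ (A * B.T) @ theta           # * is the elementwise (Hadamard) product
assert np.isclose(lhs, rhs)
```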
According to the solutions of U, P, V, and Θ, the proposed method is described in Algorithm 1.
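A high-level sketch of the alternating loop follows, reusing the helper routines sketched in the preceding subsections (update_membership_row, update_prototypes). The initialization choices are our own, and the P and Θ updates are left abstract because they depend on the full form of (7):

```python
import numpy as np

def wsfc(X, W, C, r, gamma, mu, eta, n_iter=50, seed=0):
    """Sketch of Algorithm 1. X: (N, M) data, W: (N, N) affinity, C: clusters,
    r: projected dimensionality; gamma, mu, eta: regularization parameters."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    theta = np.full(M, 1.0 / M)                      # uniform initial feature weights
    V = X[rng.choice(N, size=C, replace=False)].copy()
    P = np.eye(M)[:, :r]                             # trivial initial projection
    for _ in range(n_iter):
        # distances in the weighted, projected subspace: ||P^T Theta (x_i - v_k)||^2
        Z, Vz = (X * theta) @ P, (V * theta) @ P
        D = ((Z[:, None, :] - Vz[None, :, :]) ** 2).sum(-1)
        # update memberships row by row (simplex-constrained QP, Section 3.3.2)
        U = np.stack([update_membership_row(D[i], gamma) for i in range(N)])
        # update prototypes in closed form (Section 3.3.1)
        V = update_prototypes(X, U)
        # updates of P (eigendecomposition of E, Section 3.3.3) and theta
        # (simplex-constrained QP, Section 3.3.4) are omitted in this sketch
    return U, V, P, theta
```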

3.4. Convergence and Complexity Analysis. Since U, P, V, and Θ are optimized alternately, an optimal solution of each subproblem is guaranteed in each iteration step. From steps 3-6 of Algorithm 1, it holds that

$$J_{WSFC}\left(U^{(t)}, V^{(t)}, P^{(t)}, \Theta^{(t)}\right) \geq J_{WSFC}\left(U^{(t+1)}, V^{(t)}, P^{(t)}, \Theta^{(t)}\right) \geq \cdots \geq J_{WSFC}\left(U^{(t+1)}, V^{(t+1)}, P^{(t+1)}, \Theta^{(t+1)}\right),$$

which indicates that the convergence of the proposed method WSFC is guaranteed.

The computational complexity of Algorithm 1 is dominated by the iterative updates of U, P, V, and Θ. Suppose the number of iterations of Algorithm 1 is T. The complexity of updating U is O(TsNCM), where s is the number of iterations of the involved Newton method. The time for computing V is O(TNCM). Since the eigenvalue decomposition technique is used, the time for forming P is O(TM³). When solving Θ, the time for computing H is O(TNM²); since H ∈ R^{M×M}, the time complexity for solving Θ is O(TM³), so the total time for updating Θ is O(T(NM² + M³)). Generally, since C ≪ N and s ≪ N, the total time complexity of WSFC simplifies to O(T(NM² + M³)).

Experiments and Discussion
4.1. Datasets and Methods for Comparison. Several datasets are used in the experiments: the Breast and CNS datasets are from [41] (available online at https://csse.szu.edu.cn/staff/zhuzx/Datasets.html), and the other benchmark datasets come from the UCI repository (available online at https://archive-beta.ics.uci.edu/). The main information of these datasets is listed in Table 2.
Two commonly used indices are utilized to evaluate the effectiveness of the selected methods: classification accuracy (ACC) and normalized mutual information (NMI) [48]. The best results over ten repeated executions are recorded to illustrate the performance of each method. The higher the ACC and NMI values, the better the clustering performance.
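For reference, ACC is conventionally computed by finding the best one-to-one mapping between predicted and true labels via the Hungarian method, and NMI is available in scikit-learn; a sketch under these standard definitions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_acc(y_true, y_pred):
    """ACC via the best one-to-one label mapping (Hungarian method)."""
    K = int(max(y_true.max(), y_pred.max())) + 1
    count = np.zeros((K, K), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1
    rows, cols = linear_sum_assignment(-count)        # maximize matched counts
    return count[rows, cols].sum() / len(y_true)

# NMI is computed directly: normalized_mutual_info_score(y_true, y_pred)
```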

4.2. Experimental Settings. A lower-dimensional space is produced in LPFCM, SOGFS, SSC, LPP, PCA, and WSFC. For the selected gene datasets, to speed up execution and avoid the singularity problem when optimizing the projection matrix, each higher-dimensional dataset is first projected onto a 100-dimensional space through PCA. Subsequently, the dimensionality of the projected subspace varies from 10 to 100 with a step size of 10, whereas the projected dimensions for all selected UCI datasets are tuned from 2 to M (the number of features in the original space). Then, the best ACC value and its corresponding NMI value are recorded as the final results. The proposed method WSFC has three parameters: γ, μ, and η. When γ is assigned large values, U has more nonzero elements; otherwise, U tends to be sparse. The parameter μ balances the contribution of the local structure preservation property in (7): if μ is large, geometric structure preservation plays a more important role in (7). η is a regularization parameter. In the experiments, a grid search strategy, commonly used in the available research studies [38, 39, 44], determines the values of γ, μ, and η. The tuning range of γ is $\{10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}\}$, the range of μ is $\{10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}\}$, and the range of η is $\{10^{-2}, 10^{0}, 10^{2}, 10^{4}\}$. The parameters involved in the other comparison methods are set or tuned to the values reported in their original articles.

4.3. Clustering Results. Tables 3 and 4 list the ACC and NMI indices obtained by each method on each dataset, respectively. The best results among all methods on each dataset are in bold, and the second best results are highlighted within brackets. The projected spaces obtained by LPP, PCA, LPFCM, and WSFC on the Wine dataset are visualized in Figure 2 using the t-SNE method. From Figure 2, we can find that many points with different colors, i.e., coming from different clusters, are blended together in LPP, PCA, and LPFCM. Due to the negative impact of redundant features, the results visualized directly from the original data are also undesirable; refer to Figure 2(a). The visualization of WSFC clearly forms three clusters, in which the local structure of the data is well detected and samples coming from different clusters are well separated. This illustrates the validity of the projection strategy involved in WSFC.
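As a concrete illustration of the tuning protocol in Section 4.2, the grid search can be sketched as follows (wsfc and clustering_acc refer to the sketches given earlier; X, W, C, r, and the ground-truth labels y_true are assumed to be in scope):

```python
import itertools
import numpy as np

gammas = [1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2]
mus = [1e-2, 1e-1, 1e0, 1e1, 1e2]
etas = [1e-2, 1e0, 1e2, 1e4]

best_acc, best_params = -np.inf, None
for gamma, mu, eta in itertools.product(gammas, mus, etas):
    U, V, P, theta = wsfc(X, W, C, r, gamma, mu, eta)
    acc = clustering_acc(y_true, U.argmax(axis=1))    # hard labels from memberships
    if acc > best_acc:
        best_acc, best_params = acc, (gamma, mu, eta)
```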

4.4. Parameter Sensitivity Analysis.
To comprehensively compare the performance of the methods involving projection strategies, the ACC variation curves with respect to different values of r on several datasets are displayed in Figure 3. It can be found that WSFC generally outperforms the other clustering methods for most values of r.
The balance parameter μ and the regularization factors γ and η are involved in WSFC. To analyze their influence, Figure 4 presents the performance of WSFC with respect to different parameter value sets, where the color of each point denotes the corresponding ACC or NMI value. We can observe that larger values of μ and smaller values of γ yield better performance, while the effect of η is not significant. A larger value of μ emphasizes the effectiveness of the third part of (7), i.e., preserving the local geometrical structure of the data, and a smaller value of γ ensures the sparsity of the membership degree matrix U.

4.5. Ablation Experiments.
To further verify the effectiveness of each term in the proposed model, we conduct an ablation study; the accuracy (ACC) results are shown in Table 5. For the variant without the third term, labeled "notLPP," we remove the effect of the third term by setting μ = 0 in the original model. In the variant without the weighting strategy, labeled "not Θ," the feature-weighting matrix is fixed to the identity matrix Θ = I, enforcing uniform feature importance and disabling adaptive weighting. From Table 5, WSFC outperforms both ablation configurations, with an average ACC approximately 4% higher than notLPP and 6% higher than not Θ. These results highlight the critical roles of the third term and the feature-weighting strategy in the proposed model: the third term captures the local structure of the dataset, while feature weighting determines the importance of features during space transformation and clustering, thereby minimizing the negative impact of redundant features.
4.6. The Learned Weights. In addition, the weights obtained on twelve datasets are visualized in Figure 5. Obviously, the learned weight distribution on each dataset is sparse; in particular, the weights of some features approach zero, which means that these features are redundant or even harmful. It is known that the value ranges of the features of the Wine dataset diverge strongly. From Figure 5(l), the fifth and thirteenth features are assigned very small weights, which indicates that these two features have no effect on the clustering process, while the fourth and tenth features contribute little. By introducing the weighting strategy, WSFC performs far better than the other fuzzy clustering methods on Wine (refer to Table 3). Similarly, only six features are assigned relatively large weights on the Ionosphere dataset (refer to Figure 5(h)), and only one feature is useful on the Balance and Wholesale datasets (refer to Figures 5(a) and 5(f), respectively). In this way, the contribution of important features can be enhanced precisely, and their influence on the clustering process can be maintained.
To further verify the effectiveness of the feature-weighting strategy, we use t-SNE to visualize the original space (X), the projection subspace ($P^T X$), and the feature-weighted projection subspace ($P^T \Theta X$) of several datasets, as shown in Figure 6. It can be seen that there is a large amount of overlap between data points belonging to different clusters in the original data space, with no obvious boundaries between clusters. The data distribution in the projection subspace is better than in the original space but still not ideal. The visualization in the feature-weighted projection subspace is as desired, with different clusters clearly separated into different groups. The main reason is that the feature-weighting strategy pays more attention to the discriminative features, thereby reducing the impact of redundant and harmful features on clustering and capturing the intrinsic structure of the data.

4.7. Convergence Analysis.
The convergence curves of WSFC are presented in Figure 7. It can be found that WSFC converges quickly, almost always within 10 iterations. Although the variables involved in WSFC are optimized alternately, the convergence of WSFC is guaranteed by the block coordinate descent (BCD) technique.

4.8. Statistical Analysis.
We employ the Friedman-Nemenyi testing framework [49] to ascertain the statistical significance of the differences among the various methods on the evaluated datasets. Following the results presented in Table 3, the average rank of each method over all datasets is displayed in Figure 8. The smaller the average rank, the better the performance of the corresponding method.
Based on the Friedman test, the statistic $\tau_F$ is formally defined as

$$\chi_F^2 = \frac{12a}{b(b+1)} \left( \sum_{i=1}^{b} Ar_i^2 - \frac{b(b+1)^2}{4} \right), \qquad \tau_F = \frac{(a-1)\chi_F^2}{a(b-1) - \chi_F^2},$$

where $\tau_F$ follows an F-distribution with degrees of freedom b − 1 and (b − 1)(a − 1), a stands for the number of datasets, b represents the number of compared methods, and $Ar_i$ corresponds to the average rank of the ith method over all datasets. The ACC validity index is used here for illustration; the NMI validity index can be analyzed analogously, and the same conclusions can be drawn. The computed value of the statistic is $\tau_F = 8.2487$. Given a confidence level of 0.95, the critical value is $F_{0.05}(12, 12 \times 11) = 1.8262$, which is less than $\tau_F = 8.2487$. Therefore, the null hypothesis is rejected, indicating that the differences in performance among the compared methods are statistically significant.
The Nemenyi post hoc test is then employed to ascertain the presence of statistically significant differences between pairs of methods. The critical difference is computed as

$$CD = q_{\alpha} \sqrt{\frac{b(b+1)}{6a}}.$$

For a significance level of α = 0.05 and b = 13, $q_{\alpha} = 3.3128$, which gives CD = 5.2670; the Friedman test diagram is shown in Figure 9. When two lines do not overlap, the performance difference between the corresponding methods is significant. As shown in Figure 9, WSFC exhibits superior performance in terms of ACC compared to most methods, with the exception of RSFCM, SOGFS, RGC, and LPFCM. In particular, WSFC achieves significantly better performance than LPP and WFCM. This indicates that the proposed unified model can alleviate the error accumulation of a two-stage clustering framework and that the weighting strategy can effectively quantify the importance of different features, which benefits the learning model.
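Both quantities are straightforward to compute from the average ranks; a sketch using the standard Friedman (Iman-Davenport) statistic and Nemenyi critical difference, which reproduces CD = 5.2670 for a = 12 datasets and b = 13 methods:

```python
import numpy as np

def friedman_nemenyi(avg_ranks, a, q_alpha=3.3128):
    """avg_ranks: average rank of each of the b methods over a datasets."""
    r = np.asarray(avg_ranks, dtype=float)
    b = len(r)
    chi2 = 12.0 * a / (b * (b + 1)) * ((r ** 2).sum() - b * (b + 1) ** 2 / 4.0)
    tau_f = (a - 1) * chi2 / (a * (b - 1) - chi2)          # Iman-Davenport correction
    cd = q_alpha * np.sqrt(b * (b + 1) / (6.0 * a))        # Nemenyi critical difference
    return tau_f, cd

# Example: with b = 13 methods and a = 12 datasets, cd evaluates to about 5.267.
```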

Conclusions
In this study, a weighted subspace fuzzy clustering (WSFC) method is introduced in which the quantification of feature importance, the optimal subspace, and the clustering are achieved simultaneously. WSFC not only inherits the merits of FCM and some of its variations but also extends the model's solution space, which is beneficial for attaining better solutions. By integrating the locality preserving projection technique, the local geometrical structures of the data are well maintained when transforming the raw data into a projected lower-dimensional subspace. Since the importance of each feature is well quantified, the interpretability of the feature selection and projection mechanism is improved. Extensive experiments show that some features are assigned relatively small weights, indicating that these features can be suppressed to eradicate their negative effects, and WSFC thereby achieves superior performance compared to other available clustering methods. The affinity matrix involved in WSFC is defined in advance; adaptively learning this matrix according to the data distribution may further improve the effectiveness of WSFC.

Figure 8: Average rank value of each method over all datasets with respect to ACC.
Algorithm 1: Input: Data set X, the values of γ, μ, and η, the number of clusters C, the projected dimensionality r, and the affinity matrix W; Output: Prototypes V, final membership degrees U, weight matrix Θ, and projection matrix P.

Table 3: The ACC results. The bold values indicate the best results among the compared methods. The second best results are highlighted within brackets.