Neutrosophic Clustering Algorithm Based on Sparse Regular Term Constraint

Clustering is one of the important research topics in machine learning. Neutrosophic clustering generalizes fuzzy clustering and has been applied in many fields. This paper presents a new neutrosophic clustering algorithm built on regularization. Firstly, a regularization term is introduced into the FC-PFS algorithm to induce sparsity, which reduces the complexity of the algorithm on large data sets. Secondly, we propose a method that simplifies the process of determining the regularization parameter. Finally, experiments show that the clustering results of the proposed algorithm on artificial and real data sets are mostly better than those of other clustering algorithms, so the algorithm is effective in most cases.


Introduction
With the continuing development of information technology, the dimensionality of data on the Internet has increased exponentially. For example, the dimensions of documents, multimedia, and gene expression data can reach hundreds of thousands. To process such data, scholars have proposed many data processing methods [1][2][3].
In 1965, Zadeh [4] proposed the concept of the fuzzy set. Fuzzy theory is applied in many areas, such as multiattribute decision-making [5][6][7], image processing [8], and cluster analysis [9]. In particular, fuzzy clustering has made considerable progress in the past few decades. Based on fuzzy sets, the FCM algorithm [10] was proposed. The quality of its clustering results is good, but it still has difficulty with uncertain data. Therefore, in recent years, scholars have devoted themselves to improving various aspects of the fuzzy c-means algorithm. Hwang et al. [11] combined the type-2 fuzzy set with FCM (T2-FCM) and improved the handling of the uncertainty that affects the final assignment to the c classes. Linda et al. [12] improved the general type-2 fuzzy c-means (GT2-FCM) algorithm through the alpha-plane representation theorem, described ambiguity in linguistic terms, and transformed linguistic uncertainty into uncertain fuzzy positions of the extracted clusters. The algorithm of [12] works well when there are noisy samples or insufficient training samples. Both T2-FCM and GT2-FCM address the uncertainty of the fuzzy c-means algorithm.
In 1986, Atanassov [13] proposed the concept of intuitionistic fuzzy sets, which overcomes some of the drawbacks of traditional fuzzy sets and is more capable of processing uncertain information. Chaira et al. [14] introduced intuitionistic fuzzy entropy into the traditional fuzzy c-means algorithm, and the new algorithm was used to cluster partial CT brain scan images, where it can identify brain abnormalities. Bukiewicz et al. [15] introduced a variable to handle uncertainty and similarity measurement between intuitionistic fuzzy sets in the fuzzy c-means algorithm and proposed a fuzzy clustering method for data sets based on intuitionistic fuzzy set theory. Zhao et al. [16] constructed the corresponding λ-cutting matrix by calculating correlation coefficients on intuitionistic fuzzy sets and then clustered on the cutting matrix. Cuong [17] proposed the concept of the picture fuzzy set (PFS), which is a direct extension of the fuzzy set and the intuitionistic fuzzy set.
Thong [18] proposed a picture fuzzy clustering algorithm based on picture fuzzy sets. The algorithms proposed in [14][15][16][17][18] have better clustering performance than traditional algorithms, but they have certain limitations in application: the generated membership matrix is not sparse, which increases the amount of calculation. In view of the limitations of intuitionistic fuzzy sets, Smarandache [19] proposed neutrosophic set theory. Its basic idea is that everything can be described by three degrees: truth, indeterminacy, and falsity. Each object has three membership functions, each belonging to the standard or nonstandard subsets of ]0⁻, 1⁺[. Neutrosophic set theory can not only describe uncertainty problems better but also solve problems that arise when applying fuzzy theory. Therefore, scholars have done in-depth research on neutrosophic sets [20][21][22][23][24] and proposed many neutrosophic clustering algorithms. Ye [25] proposed a single-valued neutrosophic minimum spanning tree (SVNMST) clustering algorithm, which shows great advantages in clustering single-valued neutrosophic observation data. In the same year, Ye [26] proposed single-valued neutrosophic clustering methods based on similarity measures between single-valued neutrosophic sets (SVNSs). Guo [27] proposed the neutrosophic c-means clustering algorithm (NCM). The NCM algorithm can calculate certainty and uncertainty, and its membership function is not affected by noise. Nowadays, neutrosophic clustering has been applied to many fields such as image segmentation and biology [28][29][30][31][32]. The PFS is a standardized form of the neutrosophic set, so the FC-PFS algorithm proposed in [18] is in fact a neutrosophic-set-type algorithm. However, it needs to compute three matrices of the same size, and its membership matrix is not sparse, which affects the clustering effect to a certain extent.
To solve the abovementioned problems, this paper proposes a new algorithm, the sparse neutrosophic fuzzy clustering algorithm (SNCM). The main idea is to introduce a regularization term into the FC-PFS algorithm. The new algorithm produces sparsity, since it reduces the number of nonzero membership entries of each sample; thus, SNCM reduces the complexity of the model. Experiments show that the performance of the proposed algorithm is better than that of several other clustering algorithms, and the experimental results produce a sparse membership matrix, which reflects the effectiveness of the algorithm. The rest of this article is arranged as follows. The second section introduces related basic concepts and algorithms, the third section presents the proposed algorithm and its solution process, the fourth section demonstrates the effectiveness of the proposed algorithm through experiments, and the fifth section gives the conclusions.

Related Algorithms
In this paper, the data set contains n data points, each of which is a d-dimensional feature vector; the purpose of clustering is to obtain c clusters. The following introduces the clustering algorithms FCM and FC-PFS.

FCM Algorithm.
The FCM algorithm, proposed in 1984, is a very well-known algorithm. It is used not only in fuzzy engineering but also in fields such as medical diagnosis and communication. The FCM algorithm assigns each data point x_i to the clusters with membership values, where u_ij denotes the membership value of the i-th data point x_i in the j-th cluster and v_j ∈ R^d is the center of the j-th cluster. The objective function of the FCM algorithm is

J = Σ_{i=1}^{n} Σ_{j=1}^{c} u_ij^m ‖x_i − v_j‖², (1)

where m > 1 is a fuzzy parameter, and the constraint condition of formula (1) is

u_ij ∈ [0, 1], Σ_{j=1}^{c} u_ij = 1, i = 1, . . . , n. (2)

Using the Lagrangian multiplier method, the iterative updates of the membership degrees and cluster centers are obtained:

u_ij = 1 / Σ_{k=1}^{c} (‖x_i − v_j‖ / ‖x_i − v_k‖)^{2/(m−1)}, v_j = Σ_{i=1}^{n} u_ij^m x_i / Σ_{i=1}^{n} u_ij^m.

The iteration terminates when the number of iterations reaches the maximum value or |J^(t) − J^(t−1)| < ε, where J^(t) and J^(t−1) are the objective function values of the t-th and (t−1)-th iterations and ε is the termination threshold, generally in the range (0, 0.1). According to the fuzzy membership values, if u_il = max(u_i1, u_i2, . . . , u_ic), then x_i is assigned to the l-th cluster. It can be proved that the algorithm finally converges to a local optimum or a saddle point of the objective function.
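As a minimal illustration, the alternating updates above can be sketched in Python. This is a generic FCM sketch, not the authors' code; the function name, data, and parameter defaults are hypothetical:

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Minimal fuzzy c-means sketch: alternate the standard u and v updates."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)                 # each row of U sums to 1
    J_prev = np.inf
    for _ in range(max_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]        # cluster centers v_j
        D = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)  # squared distances
        D = np.maximum(D, 1e-12)                      # avoid division by zero
        # u_ij = 1 / sum_k (D_ij / D_ik)^(1/(m-1))
        U = 1.0 / (D ** (1.0 / (m - 1)) *
                   (1.0 / D ** (1.0 / (m - 1))).sum(axis=1, keepdims=True))
        J = (U ** m * D).sum()                        # objective (1)
        if abs(J_prev - J) < eps:                     # |J(t) - J(t-1)| < eps
            break
        J_prev = J
    return U, V
```

Each row of the returned U sums to 1, and the hard assignment of x_i is the column index of the largest u_ij, as described above.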

FC-PFS Algorithm
Definition 1. Let X be a set of objects (points) with elements x ∈ X. A picture fuzzy set A on X is characterized by three functions: μ_A(x), the degree of positive membership of x in A; η_A(x), the degree of neutral membership of x in A; and γ_A(x), the degree of negative membership of x in A. They satisfy the condition

0 ≤ μ_A(x) + η_A(x) + γ_A(x) ≤ 1, for all x ∈ X.

The refusal degree of an element is calculated as ξ_A(x) = 1 − (μ_A(x) + η_A(x) + γ_A(x)).

Definition 2. Let X be a set of objects (points) with elements x ∈ X. A neutrosophic set A on X can be expressed as A = {⟨x, T_A(x), I_A(x), F_A(x)⟩ | x ∈ X}, where T_A(x) is the truth membership degree, I_A(x) is the indeterminacy membership degree, and F_A(x) is the falsity membership degree, each belonging to the standard or nonstandard subsets of ]0⁻, 1⁺[.

From the abovementioned two definitions, it can be seen that the picture fuzzy set is in fact a standardized form of the neutrosophic set. Therefore, the FC-PFS algorithm proposed by Thong and Son is based on the neutrosophic set. The objective function of the algorithm is

J = Σ_{i=1}^{n} Σ_{j=1}^{c} (u_ij(2 − ξ_ij))^m ‖x_i − v_j‖² + Σ_{i=1}^{n} Σ_{j=1}^{c} η_ij(log η_ij + ξ_ij), (8)

where u_ij, ξ_ij, and η_ij are the true membership degree, refusal membership degree, and neutral membership degree of the data point x_i belonging to the j-th cluster, respectively. The constraints of formula (8) are

u_ij, ξ_ij, η_ij ∈ [0, 1], Σ_{j=1}^{c} u_ij(2 − ξ_ij) = 1, Σ_{j=1}^{c} (η_ij + ξ_ij/c) = 1. (9)

Using the Lagrangian multiplier method, iterative update formulas for u_ij, ξ_ij, η_ij, and v_j are obtained, and the iteration terminates when the number of iterations reaches the maximum or the change of the objective function falls below the threshold ε.
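A quick numerical check of the two conditions in Definition 1, using hypothetical membership values:

```python
import numpy as np

# Hypothetical positive, neutral, and negative membership degrees for 4 elements.
mu  = np.array([0.5, 0.2, 0.7, 0.1])
eta = np.array([0.2, 0.3, 0.1, 0.2])
gam = np.array([0.2, 0.4, 0.1, 0.3])

# Picture fuzzy set condition: the three degrees sum to at most 1.
assert np.all(mu + eta + gam <= 1.0)

# Refusal degree of each element.
xi = 1.0 - (mu + eta + gam)
```

Here each element's refusal degree is simply whatever membership mass remains after the positive, neutral, and negative degrees are accounted for.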

Determining the Objective Function.
In traditional k-means clustering, each row of the membership matrix U contains a single 1 and the remaining c − 1 elements of that row are 0; hence, each row sum of U is 1 and each column sum equals the number of sample points in the corresponding cluster. The fuzzy c-means algorithm, in turn, needs an appropriately chosen fuzziness degree m. Different from the abovementioned three clustering algorithms, the algorithm in this paper relaxes each element of U to a nonnegative value not exceeding 1 under the constraint conditions and fixes the fuzziness m = 1. Our goal is to obtain a sparse U, so we introduce a regularization term and obtain the objective function of the new algorithm:

J = Σ_{i=1}^{n} Σ_{j=1}^{c} u_ij(2 − ξ_ij)‖x_i − v_j‖² + γ Σ_{i=1}^{n} Σ_{j=1}^{c} (u_ij(2 − ξ_ij))² + Σ_{i=1}^{n} Σ_{j=1}^{c} η_ij(log η_ij + ξ_ij). (11)

The abovementioned formula satisfies the following constraints:

u_ij, ξ_ij, η_ij ∈ [0, 1], Σ_{j=1}^{c} u_ij(2 − ξ_ij) = 1, Σ_{j=1}^{c} (η_ij + ξ_ij/c) = 1.

We can see that if a sample point is assigned entirely to a single cluster, u_ij(2 − ξ_ij) equals 1 for that cluster; otherwise, it is a nonnegative value less than 1.
The new algorithm takes into account the sparsity of the membership degrees assigning each sample point to the different clusters. In minimizing equation (11), the importance of each part is controlled by the parameter γ. If γ tends to zero, each membership vector concentrates on a single cluster and is maximally sparse. As γ gradually increases, the membership vector contains more and more nonzero elements; once γ is large enough, all elements of the membership vector are nonzero and the vector is no longer sparse. Therefore, this parameter controls the sparsity of the membership vector. A method for determining an appropriate value of this parameter, yielding more accurate clustering results, is given in a subsequent part.
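The effect of γ on sparsity can be checked numerically. The sketch below assumes that each sample's membership subproblem reduces to a Euclidean projection of −d_i/(2γ) onto the probability simplex (d_i being the squared distances to the c centers, here hypothetical values); the number of nonzero memberships then grows with γ:

```python
import numpy as np

def project_simplex(a):
    """Euclidean projection of a onto {s : s >= 0, sum(s) = 1} (sort-based method)."""
    u = np.sort(a)[::-1]                       # sort entries in descending order
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, a.size + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)     # shift that enforces sum(s) = 1
    return np.maximum(a - theta, 0.0)

d = np.array([0.1, 0.4, 0.9, 1.6])             # hypothetical squared distances, c = 4
for gamma in (0.05, 0.5, 5.0):
    s = project_simplex(-d / (2.0 * gamma))
    print(gamma, np.count_nonzero(s))          # nonzero count rises as gamma grows
```

For these distances, the three γ values yield progressively denser membership vectors, matching the discussion above: a small γ gives a near-hard assignment, a large γ spreads membership over all clusters.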
Solve V with fixed U, ξ, and η. Setting ∂J/∂v_j = 0, we have

v_j = Σ_{i=1}^{n} u_ij(2 − ξ_ij) x_i / Σ_{i=1}^{n} u_ij(2 − ξ_ij). (16)

Solve U with fixed V, ξ, and η. To facilitate the solution, let s_ij = u_ij(2 − ξ_ij) and d_ij = ‖x_i − v_j‖²; then the relevant part of the objective function can be rewritten as

Σ_{i=1}^{n} Σ_{j=1}^{c} (s_ij d_ij + γ s_ij²), (17)

where (17) can be divided into n independent subproblems, one per sample:

min_{s_i ≥ 0, s_i^T 1 = 1} Σ_{j=1}^{c} (s_ij d_ij + γ s_ij²). (18)

Completing the square, (18) is written in the following vector form:

min_{s_i ≥ 0, s_i^T 1 = 1} ‖s_i + d_i/(2γ)‖₂², (19)

where d_i = (d_i1, . . . , d_ic)^T. By solving problem (19), the solution of S can be obtained, and the update formula of U follows as u_ij = s_ij/(2 − ξ_ij). The specific solution process for problem (19) is given in Section 3.3.

Fix the variables U, ξ, and V, and use the Lagrange multiplier method to solve for η. Differentiating the Lagrangian function L with respect to η and setting the derivative to zero gives

η_ij = e^{−ξ_ij} (1 − (1/c) Σ_{j=1}^{c} ξ_ij) / Σ_{k=1}^{c} e^{−ξ_ik}.

Finally, using a technique similar to Yager's generating operators [33], we modify the hesitation degree π^{1/α} of the intuitionistic fuzzy set, substituting u_ij + η_ij for μ_A(x), to obtain the refusal degree:

ξ_ij = 1 − (u_ij + η_ij) − (1 − (u_ij + η_ij)^α)^{1/α} (i = 1, . . . , n; j = 1, . . . , c).
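The η and ξ updates can be sketched together as below. This is a hedged reconstruction under stated assumptions (an FC-PFS-style constraint Σ_j (η_ij + ξ_ij/c) = 1 and a Yager-type operator with exponent α); the function name and inputs are illustrative, not the authors' code:

```python
import numpy as np

def update_eta_xi(U, Xi, alpha=0.9):
    """Sketch of the neutral (eta) and refusal (xi) updates, assuming m = 1.

    U, Xi : n-by-c arrays of membership and refusal degrees.
    """
    n, c = U.shape
    # Neutral membership: eta_ij proportional to exp(-xi_ij), scaled per row so
    # that sum_j (eta_ij + xi_ij / c) = 1 holds for every sample i.
    E = np.exp(-Xi)
    Eta = E / E.sum(axis=1, keepdims=True) * (1.0 - Xi.sum(axis=1, keepdims=True) / c)
    # Refusal degree via a Yager-type operator applied to mu = u + eta.
    M = np.clip(U + Eta, 0.0, 1.0)
    NewXi = 1.0 - M - (1.0 - M ** alpha) ** (1.0 / alpha)
    return Eta, NewXi
```

With α = 1 the Yager term collapses and the refusal degree vanishes; values of α below 1 leave a small positive refusal mass, which is why the experiments later fix the exponent near 0.9.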
3.3. Optimization Method for γ. In practice, the regularization parameter of problem (19) is difficult to determine, since its value can range from zero to infinity. This section gives a method for determining the regularization parameter γ. The Lagrangian function of problem (19) is

L(s_i, λ, β_i) = ‖s_i + d_i/(2γ)‖₂² − λ(s_i^T 1 − 1) − β_i^T s_i,

where λ and the entries of β_i are nonnegative Lagrange multipliers.
According to the KKT conditions, the optimal solution has the form

s_ij = (−d_ij/(2γ_i) + λ)₊. (30)

In practice, focusing on the locality of the data usually yields better performance; therefore, it is preferable to learn a sparse s_i. Another advantage of learning a sparse matrix S is that it greatly reduces the computational burden of subsequent processing. Without loss of generality, assume d_i1, d_i2, . . . , d_ic are sorted from small to large. If the optimal s_i has exactly k nonzero elements, then according to equation (30), we have s_ik > 0 and s_{i,k+1} = 0, that is,

−d_ik/(2γ_i) + λ > 0 and −d_{i,k+1}/(2γ_i) + λ ≤ 0.

According to equation (30) and the constraint s_i^T 1 = 1, we obtain

λ = 1/k + (1/(2kγ_i)) Σ_{j=1}^{k} d_ij. (31)

Combining the abovementioned relations, we have an inequality for γ_i:

(k/2) d_ik − (1/2) Σ_{j=1}^{k} d_ij < γ_i ≤ (k/2) d_{i,k+1} − (1/2) Σ_{j=1}^{k} d_ij.

Therefore, in order to obtain the optimal solution of problem (19) with exactly k nonzero values, we can set

γ_i = (k/2) d_{i,k+1} − (1/2) Σ_{j=1}^{k} d_ij. (33)

Taking the average of γ_1, γ_2, . . . , γ_n, the calculation formula is as follows:

γ = (1/n) Σ_{i=1}^{n} [(k/2) d_{i,k+1} − (1/2) Σ_{j=1}^{k} d_ij]. (35)

Equation (35) gives a method to determine the regularization parameter.
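The closed-form row solution with exactly k nonzeros and the per-sample γ_i can be sketched as follows. This is a hedged illustration; the helper name and inputs are assumptions, with d holding the squared distances from one sample to the c centers:

```python
import numpy as np

def sparse_membership_row(d, k):
    """Row solution with exactly k nonzeros, plus the per-sample gamma_i.

    d : 1-D array of squared distances from sample i to the c cluster centers.
    k : desired number of nonzero memberships, 1 <= k < len(d).
    """
    idx = np.argsort(d)                      # ascending: d_{i1} <= ... <= d_{ic}
    ds = d[idx]
    denom = k * ds[k] - ds[:k].sum()         # k * d_{i,k+1} - sum_{j<=k} d_{ij}
    gamma_i = 0.5 * denom                    # gamma_i = (k/2) d_{i,k+1} - (1/2) sum d_{ij}
    s = np.zeros_like(d, dtype=float)
    s[idx[:k]] = (ds[k] - ds[:k]) / denom    # k nonzero entries; the rest stay 0
    return s, gamma_i
```

By construction the nonzero entries sum to (k·d_{i,k+1} − Σ_{j≤k} d_ij)/denom = 1, so the simplex constraint holds and the k nearest centers receive all of the membership mass.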
According to equations (31), (33), and (35), the optimal solution s_ij (i = 1, 2, . . . , n; j = 1, 2, . . . , c) can be obtained, and u_ij then follows from equation (28); the last step of the algorithm sets the condition for leaving the iteration loop.

Below, we analyze the complexity of the algorithm. First, consider the time complexity. From the algorithm steps, the basic statement of the algorithm is the loop body that iteratively updates the variables, in which the loop computing u is embedded, so the time complexity of the algorithm is O(nt), where t is the number of iterations and n is the number of sample points. Second, the space complexity of the algorithm depends on the data scale, so the space complexity is O(nm), where n is the number of sample points and m is the data dimension.

Results and Discussion
To verify the feasibility of the proposed clustering algorithm SNCM, several classic clustering algorithms are selected for comparison: FCM [10], K-means [34], Ncut [35], Rcut [36], FC-PFS, and an effective clustering method based on data indeterminacy in the neutrosophic set domain (INCM) [37]. Evaluation indicators including accuracy (ACC) and normalized mutual information (NMI) are used to evaluate the clustering results.
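For reference, ACC under the optimal one-to-one matching between predicted clusters and true classes can be computed as below. This is a generic sketch, not tied to the authors' implementation; the function name is an assumption:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC: fraction of correct labels under the best cluster-to-class matching."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    count = np.zeros((k, k), dtype=int)      # count[p, t]: points with prediction p, truth t
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1
    rows, cols = linear_sum_assignment(-count)  # Hungarian method, maximizing matches
    return count[rows, cols].sum() / y_true.size
```

NMI, which is invariant to label permutations, is available off the shelf as sklearn.metrics.normalized_mutual_info_score.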
In terms of parameters, due to the instability of K-means, FCM, and FC-PFS, their results are averaged over 50 runs. For Rcut and Ncut, the experiments use the widely used self-tuning Gaussian method to construct the affinity matrix (with self-tuned scale). The parameter in the FC-PFS algorithm is set to 0.9. The parameter values in the INCM algorithm are the best values reported in [37]. The corresponding parameter in the SNCM algorithm is 0.9, the value of the parameter k is self-adjusted, and maxSteps is 1000.
In terms of experimental environment, all experiments in this article were run on Microsoft Windows 10, with an Intel(R) Core(TM) i5-7200U CPU @ 2.50 GHz 2.70 GHz processor and 8.00 GB of memory; the programming software used is MATLAB R2016a.

SNCM Algorithm Descriptions.
First, we illustrate the process of the proposed SNCM algorithm by clustering the WBC data set; here n = 683 and c = 2. The algorithm starts from initial membership, uncertainty, and rejection matrices. The distribution of data points under these initializations is illustrated in Figure 1(a), after which the SNCM algorithm computes the cluster centers. Then, we calculate the new membership matrix, uncertainty matrix, and rejection matrix. According to these matrices, the computed value of |J^(t) − J^(t−1)| is greater than ε, so the iteration continues. Figure 1(b) shows the distribution of clusters after the first iteration.
Through a similar process, we continue to update the cluster centers, membership degrees, uncertainty degrees, and rejection matrix until the stopping condition is met, yielding the final membership, hesitation, and rejection matrices. The final cluster centers are then computed, and the distribution of clusters and cluster centers is shown in Figure 1.

Verification of Sparsity.
First of all, experiments are carried out using the artificial Aggregation data set and the real Wine data set. The Aggregation data set consists of 788 2-dimensional data points in 7 clusters. The Wine data set consists of 178 12-dimensional data points in 3 clusters. The parameter k satisfies k ≤ c. The goal of the experiment is to show that the membership matrix generated by the SNCM algorithm is sparse compared with that of the FCM algorithm. Because of the large number of sample points, it is inconvenient to present the complete membership matrices in the article, so some sample points are selected for display. Tables 1-4 show the membership matrices obtained by the SNCM algorithm and the FCM algorithm on the two data sets. It can be seen from the experimental results that the SNCM algorithm effectively reduces the complexity of the model. Next, we perform experiments on further artificial data sets. Figures 2(a) and 2(b) show the distributions of two data sets, where data set (a) has four clusters and data set (b) has three clusters. Clustering is performed using the proposed algorithm, and the clustering results and the weighted connection graphs are shown in Figures 2(c)-2(f). Figures 2(d) and 2(f) use the final membership degree as the connection weight between a data point and a cluster center. It can be seen that points within a cluster are closely connected to their cluster center, while points in other clusters are separated from it, so the proposed algorithm can effectively cluster the aforementioned data sets and can effectively separate clusters even when the number of categories is small.
These data sets are from the UCI Machine Learning Repository. They cover various characteristics, such as high and low dimensionality and large and small sample sizes. The information of the nine real data sets is shown in Table 5.
The experimental results on the real data sets are shown in Tables 6 and 7. The bold data represent the best result, followed by the italic. Table 6 shows the ACC comparison of the different algorithms on each data set. Table 7 shows the corresponding NMI comparison. On average, the proposed SNCM achieves 61.38%, which is higher than INCM (54.02%), FCM (53.30%), FC-PFS (53.23%), K-means (59.81%), Ncut (50.72%), and Rcut (41.39%). The specific situation is shown in Figure 3.
For the parameters, Figure 4 reports the average clustering accuracy of the proposed algorithm under different exponents. We find that the clustering quality of SNCM is relatively stable and that, as the exponent increases, the accuracy of the SNCM algorithm tends to increase. Therefore, the parameter value in the experimental part is set to 0.9 to improve the clustering accuracy of the SNCM algorithm.
Finally, we test the convergence of SNCM on the data sets. The results are shown in Figure 5. It can be seen that the SNCM algorithm converges within a few iteration steps.
The SNCM algorithm improves generalization ability by introducing regularization terms, so that the membership matrix is sparse, and the calculation of the membership takes the sparsity degree k into account. Compared with the comparison algorithms, in most cases, the results of this algorithm are better.
The experiments of the algorithm on multiple data sets also illustrate this point, and the parameter k has a great influence on the results.

Conclusion
In this paper, we have proposed a novel method, called the neutrosophic clustering algorithm based on a sparse regular term constraint. Different from previous neutrosophic clustering algorithms, the algorithm proposed in this paper can handle the case of fuzziness m = 1 and is not limited to the condition m > 1. Furthermore, a regular term is introduced to make the algorithm sparse, thereby reducing its computational complexity. Moreover, we propose a method to simplify the process of determining the regularization parameter and improve the clustering effect. In addition, a large number of experiments show that the clustering results of the proposed algorithm on artificial and real data sets are mostly better than those of other clustering algorithms. However, the parameter k has a considerable impact on the clustering effect, so we will focus on this in future work.
Data Availability

The data in this article come from the UCI Machine Learning Repository and are available in the official database.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.