A Belief Two-Level Weighted Clustering Method for Incomplete Pattern Based on Multiview Fusion

Incomplete pattern clustering is a challenging task because the unknown attributes of the missing data introduce uncertain information that affects the accuracy of the results. In addition, clustering methods based on a single view ignore the complementary information available from multiple views. Therefore, a new belief two-level weighted clustering method based on multiview fusion (BTC-MV) is proposed to deal with incomplete patterns. Initially, the BTC-MV method estimates the missing data by an attribute-level weighted imputation method with a k-nearest neighbor (KNN) strategy based on multiple views. The unknown attributes are replaced by the average of the KNN. Then, a clustering method based on multiple views is proposed for the complete data set with estimations; the view weights represent the reliability of the evidence from the different source spaces. The membership values from the multiple views, which indicate the probability of a pattern belonging to the different categories, reduce the risk of misclustering. Finally, a view-level weighted fusion strategy based on belief function theory is proposed to integrate the membership values from the different source spaces, which improves the accuracy of the clustering task. To validate the performance of the BTC-MV method, extensive experiments are conducted to compare it with classical methods, such as MI-KM, MI-KMVC, KNNI-FCM, and KNNI-MFCM. Results on six UCI data sets show that the error rate of the BTC-MV method is lower than that of the other methods. Therefore, it can be concluded that the BTC-MV method has superior performance in dealing with incomplete patterns.


Introduction
In the information era, data have abundant research value, but collecting complete data is significantly difficult. In the collection process, the reasons for missing data are varied, including subjective and objective factors such as equipment malfunction, personnel operation error, false memory, and partial rejection by the respondents [1]. Missing data, also called an incomplete pattern, is a common phenomenon in practical applications. A survey shows that 45% of the data sets in the UCI machine learning repository, which covers many fields, are incomplete [2]. Deletion and imputation methods are commonly used to deal with missing data. Deleting incomplete patterns is an easy method, which is acceptable when incomplete patterns account for less than 5% of the whole data set [3]. The imputation method, which replaces missing values with estimations, is a popular way of dealing with incomplete patterns [4]. For instance, the KNN technique and its derivatives have been used in many application fields because of their strong operability [5][6][7].
A number of imputation methods based on the KNN technique have been proposed [8][9][10]. In an early method, the average value of the k-nearest neighbors of the incomplete pattern was used to express the missing value [9]. In addition, imputation methods integrating KNN and other techniques have been proposed by some researchers [11][12][13]. For example, an adaptive imputation method for missing values, which uses KNN and a self-organizing map (SOM) based on belief function theory, is proposed in [11]; in this method, the uncertainty caused by the missing data is represented. A local linear approximation method is presented in [13], which uses the KNN with optimal weights obtained by local linear reconstruction to estimate the missing values. The estimated values obtained by the traditional KNN based on a single view are globally optimal but may not be locally optimal. Therefore, imputation methods based on a single view may decrease the accuracy of clustering methods.
Clustering is an important task of pattern recognition and machine learning, which divides objects into different clusters based on the similarity between patterns [14]. Hard clustering and fuzzy clustering methods have been used in many fields owing to their universality [15][16][17], but clustering methods based on a single view ignore the information from multiple views [18]. Compared with single-view clustering methods, clustering methods based on multiple views explore the complementary information of each view, which can improve the accuracy of the clustering result [19,20]. Recently, multiview clustering has become a popular research topic [21,22]. A collaborative multiview clustering method is proposed in [23] to overcome the disagreement between views and the different properties and scales of the views. Weights that represent the importance of views and features are proposed in [24]; an objective function is designed to express the heterogeneity of the different views and the consistency across views during the iterations. Jiang et al. [25] proposed a multiview FCM clustering algorithm with view and feature weights based on collaborative learning; this method can exclude irrelevant components in the clustering procedure, which increases the precision of the clustering results. In addition, multiview spectral clustering methods have been studied recently. Spectral clustering algorithms consist of two steps: learning the similarity graphs from the instances and obtaining the clustering result based on spectral clustering. Tang et al. [26] proposed a unified one-step multiview spectral clustering method (UOMvSC), which combines the multiview embedding matrices and graphs into a unified graph to obtain the clustering results. A joint affinity graph for multiview clustering is proposed in [27]; its diversity regularization term is designed to learn the different weights of the diverse views. Zheng et al.
[28] proposed a novel multiview clustering method that integrates within-view partial graph learning, cross-view partial graph fusion, and cluster structure recovery. However, most clustering methods for incomplete patterns are based on a single view, and their clustering results are not accurate enough. In addition, to our knowledge, there is little research on multiview imputation methods, although researchers have proposed numerous methods to improve the accuracy of the estimation.
In this paper, we develop a belief two-level weighted clustering method for incomplete patterns based on multiview fusion (BTC-MV). The main contributions of this work are summarized as follows: (1) Attribute-level weighted imputation strategy for incomplete patterns: In this strategy, the variance of each attribute, called the attribute weight, is used to reflect its importance. The weighted attributes are used in searching for the KNN of the incomplete pattern based on multiple views.
(2) View-level weighted fusion strategy based on belief function theory: The view-level weights are obtained by optimizing the new objective function of the clustering method based on multiple views. They are regarded as the discounting factors in the belief fusion, which represent the importance of the evidence from the different view spaces. (3) To the best of our knowledge, a belief two-level weighted clustering method for incomplete patterns based on multiview fusion is proposed for the first time. Compared with other state-of-the-art methods, the BTC-MV method performs better in multiview clustering for incomplete patterns.
The rest of this paper is organized as follows: In Section 2, we introduce related work on missing data classification methods and the basics of belief function theory. The details of the belief clustering method for incomplete patterns based on multiview fusion are given in Section 3. In Section 4, we compare the BTC-MV method with other state-of-the-art methods on six UCI data sets. Finally, the conclusion is drawn in Section 5.

Classification of the Missing Data.
According to the missing mechanism, incomplete patterns can be divided into missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) [29]. Wang et al. [30] proposed a query algorithm based on the Spark framework to handle query problems with incomplete data sets. The clustering method for incomplete patterns includes the imputation of the missing data and the clustering of the data sets.
In addition to the abovementioned imputation methods based on the KNN technique, the mean imputation (MI) and fuzzy c-means imputation (FCMI) methods have made significant research progress [31][32][33]. In MI [34], the missing data are estimated by the mean value or mode of the corresponding attribute, and it is suitable for data sets with a similar attribute distribution in each category. However, the estimations of the same attribute in different incomplete patterns are equal. In FCMI [33], the estimations are calculated from the clustering centers and the distances between the centers and the patterns. However, the performance of this imputation strategy depends on the initial conditions. Clustering algorithms are applied to partition a data set into several clusters and have been widely used in various fields. A cluster-based information retrieval approach was proposed in [35]; the k-means clustering method and frequent closed item set mining were combined to extract clusters of documents and find frequent terms. A clustering method and a pattern mining algorithm were integrated to search for the most relevant object from a clustered set of objects [36]. Space-time series clustering methods, such as hierarchical, partitioning-based, and overlapping clustering methods, were used on big urban traffic data sets [37]. In addition to the single-view clustering methods described above, many researchers have extended single-view clustering methods to multiview clustering methods. A multiview FCM clustering method based on collaborative learning was proposed in [38]; it includes a single-view partition process and a collaborative step to share information between the different views. Wang and Chen [39] proposed a multiview fuzzy clustering method with min-max optimization. Multiview clustering methods can integrate the information from the different views.
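The limitation of MI noted above (every pattern missing the same attribute receives the same estimate) can be seen in a minimal Python sketch; the function name and the NumPy array layout are our own illustration, not from the cited works:

```python
import numpy as np

def mean_impute(X):
    """Minimal sketch of mean imputation (MI): each missing entry (np.nan)
    in a column is replaced by that column's observed mean, so all patterns
    missing the same attribute receive an identical estimate."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)      # per-attribute mean over observed values
    rows, cols = np.where(np.isnan(X))     # positions of the missing entries
    X[rows, cols] = np.take(col_means, cols)
    return X

X = np.array([[1.0, 2.0],
              [3.0, np.nan],
              [np.nan, 6.0]])
Xi = mean_impute(X)
```

Here `Xi[1, 1]` and any other pattern missing attribute 1 would both get the column mean 4.0, which is exactly why MI ignores between-pattern structure.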
In recent years, with the development of neural networks, many models based on deep learning have been built to classify incomplete data sets [40][41][42]. In [41], a multivariate time series generative adversarial network is proposed for multivariate time series imputation, which improves the imputation performance. However, the performance of deep learning classification models depends on large data sets. When the data set is small, the model may not be stable.

Basics of the Belief Function Theory.
The belief function theory, also called evidence theory or Dempster-Shafer theory (DST), is a classic theoretical framework used in probabilistic reasoning [43,44]. The belief function theory can generate a belief mass by fusing the useful evidence from independent sources, and it is used in many fields [45,46]. In this theory, the frame of discernment consists of the finite, mutually exclusive, and exhaustive elements of the problem under study, and it is represented by Ω = {ω_1, ω_2, ..., ω_c}. The power set of the frame of discernment Ω expresses the uncertainty, and it is denoted as 2^Ω. The basic belief assignment (BBA) is a function m(·) from 2^Ω to [0, 1] that satisfies the conditions

m(∅) = 0 and Σ_{A ∈ 2^Ω} m(A) = 1,

where m(A) expresses the probability that the evidence supports proposition A without supporting the occurrence of any proper subset of A. All elements that satisfy A ∈ 2^Ω and m(A) > 0 are called focal elements of m(·). The outputs of classifiers or fuzzy clustering methods indicate the extent to which the corresponding evidence supports the different classes. The DS fusion rule [47,48] is used in many fields because it can integrate the evidence from many independent sources thanks to its commutative and associative properties. The fusion of the evidence m_1(·) and m_2(·) from two different independent sources on the frame of discernment 2^Ω is given by

m(A) = (1/(1 − K)) Σ_{B ∩ C = A} m_1(B) m_2(C), for A ≠ ∅, (2)

where K = Σ_{B ∩ C = ∅} m_1(B) m_2(C) indicates the conflicting belief mass between the evidence from the different sources. However, the result of the DS fusion rule is unreasonable when the conflict between the evidence from the different sources is significantly high. Therefore, a series of methods has been proposed to solve this problem, such as the fusion strategy proposed by Dubois and Prade in [49] and the PCR6 rules based on proportional conflict redistribution [50].
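As a concrete illustration of Dempster's rule above, the following Python sketch combines two BBAs over a two-class frame; the helper name and the dict-of-frozensets representation are our own choices, not from the paper:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two BBAs with Dempster's rule. Each BBA is a dict mapping
    frozenset focal elements to masses summing to 1."""
    combined = {}
    conflict = 0.0  # K: total mass on pairs with empty intersection
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    # Normalize the non-conflicting mass by 1 - K
    # (undefined when the two sources are in total conflict, K = 1)
    return {f: v / (1.0 - conflict) for f, v in combined.items()}

omega = frozenset({"w1", "w2"})
m1 = {frozenset({"w1"}): 0.6, omega: 0.4}
m2 = {frozenset({"w1"}): 0.5, frozenset({"w2"}): 0.3, omega: 0.2}
fused = dempster_combine(m1, m2)
```

In this example the conflict K = 0.6 × 0.3 = 0.18, and after normalization the fused masses again sum to 1.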

Clustering Method for Incomplete Pattern
We propose the BTC-MV method to decrease the error rate of clustering incomplete patterns, where data are randomly missing or unobserved. The flowchart of the BTC-MV method is presented in Figure 1. First, an attribute-level weighted imputation strategy is proposed to estimate the missing or unobserved values in the data set X. In this step, the variance of each attribute in the data set X is calculated and used as the weight in the KNN search, and the missing values are estimated from the KNN. Second, a fuzzy C-means clustering method based on multiple views is proposed to cluster the complete data set with estimated values, and the membership values and the weights of the multiple views are submitted to a view-level weighted fusion strategy to obtain precise results. Third, the BTC-MV method uses the weights of the multiple views as discounting factors, and a belief fusion strategy is proposed to fuse the membership values of each pattern across the different views. Finally, the clustering results are obtained. The details of the BTC-MV method are given as follows.

Attribute-Level Weighted Imputation Strategy.
Here, all attributes in the data set X are divided into N views. x_i^μ expresses the feature vector of the pattern x_i under the view space μ. We assume that some attributes of a pattern x_i may be unobserved, because the clustering of incomplete patterns is our research topic.
In the BTC-MV method, the attribute-level weighted imputation strategy based on the KNN is proposed to estimate the missing values. First, we calculate the variance of each attribute in the data set X, as shown in equation (3), which expresses the importance of the different attributes in X. A bigger variance indicates a larger difference between the instances in that attribute space, so the estimation calculated from the k-nearest instances is more accurate. Then, the weighted KNN method is used to search for the top-k nearest neighbors of y_i^μ in the view space μ; the distance between a complete pattern x_i^μ and the incomplete pattern y_i^μ is given in equation (4). The variance of each attribute is regarded as its weight in the distance between complete and incomplete patterns. According to the weighted distance, we obtain the K neighbors closest to the incomplete pattern and estimate the missing values. Finally, the estimated value of the missing data is calculated by equation (5). The imputation strategy is summarized in Algorithm 1; it introduces the variances of the different attributes in X to estimate the missing data and improve the precision of the estimation.
where x_p^μ is the pth attribute of the data set X in the view space μ, n is the number of patterns in the view space μ, n_μ is the number of attributes in the view space μ, and S_p^μ is the variance of attribute p, which is normalized as the weight of attribute p under the view space μ. y_i is an incomplete pattern with t unobserved attributes.
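Since equations (3)-(5) are not reproduced in this excerpt, the strategy can be sketched for a single view as follows; the function name, the use of normalized NaN-aware variances as attribute weights, and the tie-breaking details are our assumptions rather than the paper's exact formulas:

```python
import numpy as np

def weighted_knn_impute(X, k=5):
    """Sketch of the attribute-level weighted KNN imputation for one view.
    X: (n_patterns, n_attrs) array with np.nan marking missing values.
    Attribute variances (over observed values) weight the distance, and each
    missing entry is replaced by the mean of its k nearest complete patterns."""
    X = X.copy()
    weights = np.nanvar(X, axis=0)
    weights = weights / weights.sum()           # normalized attribute weights
    complete = ~np.isnan(X).any(axis=1)         # patterns with no missing entries
    Xc = X[complete]
    for i in np.where(~complete)[0]:
        obs = ~np.isnan(X[i])                   # observed attributes of y_i
        # variance-weighted Euclidean distance on the observed attributes only
        d = np.sqrt(((Xc[:, obs] - X[i, obs]) ** 2 * weights[obs]).sum(axis=1))
        nn = np.argsort(d)[:k]                  # indices of the k nearest neighbors
        X[i, ~obs] = Xc[nn][:, ~obs].mean(axis=0)   # average of the KNN
    return X

X = np.array([[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [5.0, 6.0], [np.nan, 2.0]])
Xi = weighted_knn_impute(X, k=3)
```

In this toy example the incomplete pattern is closest (on its observed attribute) to the first three patterns, so its missing value is imputed as their average.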

Clustering Method Based on Multiple Views.
After the attribute-level weighted imputation method based on multiple views, the data set X is regarded as a complete data set with estimations. The fuzzy C-means clustering method based on multiple views is conducted on the complete data set X, as shown in Algorithm 2. In each view, we calculate the clustering centers, the membership values, and the view weights. The objective function of the clustering method based on multiple views is shown in the following equation: where β is the weight exponent that determines the fuzziness of the clustering result, m_ic is the membership value of the ith pattern x_i to the cth cluster center v_c, and w_μ is the weight of the μth view. d²(x_i^μ, v_c^μ) expresses the squared Euclidean distance between x_i and v_c in the view space μ.
The optimal values of the multiview clustering method are obtained by minimizing the objective function through iterative optimization. In general, the optimal values are derived by setting the partial derivatives of the objective function to zero. According to the Lagrangian multiplier method, the Lagrangian function of the objective function J under the constraints of equation (7) is shown in the following equation: where λ_i and ϕ are the Lagrangian multipliers.
The optimal values of the objective function J, namely the cluster centers v_c^μ, the view weights w_μ, and the membership values m_ic, are obtained by calculating the partial derivatives of the function L, and they are shown in the following equations.
Computational Intelligence and Neuroscience
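The update equations (9)-(12) are not reproduced in this excerpt, so the iteration can only be sketched under assumptions. The Python sketch below uses a common closed-form variant in which the view weights enter the objective squared (J = Σ_μ w_μ² Σ_ic m_ic^β d²), so that minimizing over the simplex gives w_μ ∝ 1/D_μ; this exponent, the function name, and all parameters are our assumptions, not the paper's derivation:

```python
import numpy as np

def multiview_wfcm(views, C, beta=2.0, max_iter=100, eps=1e-6, seed=0):
    """Illustrative view-weighted fuzzy C-means in the spirit of Algorithm 2.
    'views' is a list of (n, d_mu) arrays sharing one membership matrix m;
    assumed objective: J = sum_mu w_mu^2 * sum_ic m_ic^beta * d^2(x_i, v_c)."""
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    m = rng.random((n, C))
    m /= m.sum(axis=1, keepdims=True)       # memberships sum to 1 per pattern
    prev_J = np.inf
    for _ in range(max_iter):
        mb = m ** beta
        # cluster centers per view (standard FCM center update)
        centers = [(mb.T @ X) / mb.sum(axis=0)[:, None] for X in views]
        # squared Euclidean distances per view, shape (n, C)
        d2 = [((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
              for X, V in zip(views, centers)]
        # view weights: w_mu proportional to 1 / D_mu under the squared-weight objective
        D = np.array([(mb * d).sum() for d in d2])
        w = (1.0 / D) / (1.0 / D).sum()
        # memberships from the weight-aggregated distances
        agg = sum(wi ** 2 * d for wi, d in zip(w, d2)) + 1e-12
        inv = agg ** (-1.0 / (beta - 1.0))
        m = inv / inv.sum(axis=1, keepdims=True)
        J = (w ** 2 * D).sum()
        if abs(prev_J - J) < eps:           # stop when J stabilizes
            break
        prev_J = J
    return w, m
```

On two well-separated clusters observed in two views, this sketch recovers the partition while keeping the view weights on the simplex.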

View-Level Weighted Fusion Strategy Based on the Belief Function Theory.
In the multiview clustering process, the weights of the various views are different, which indicates that the reliability of the evidence from the various sources is different. Therefore, the membership values of a pattern belonging to the different clustering centers are not equally weighted across the views. We use discounting techniques and the DS fusion rule to integrate the different membership values of a pattern, and we name this the view-level weighted fusion strategy based on belief function theory. In this method, the classic discounting rule proposed by Shafer in [43] is applied; the membership values based on the multiple views can be regarded as evidence that the pattern belongs to each of the possible classes in the frame of discernment. First, we multiply the membership values by the view weights, which represent reliability. Then, the discounted membership values in the different views are fused by the belief function theory. Finally, the clustering results can be obtained. In this section, the membership values are treated as mass values, the view weights are regarded as the discounting factors, and the discounted masses are obtained by equation (13). The discounted masses are regarded as the probabilities that the pattern belongs to the different categories in the multiple views. m′(Ω) represents the imprecision of the clustering method due to the incomplete patterns. In the BTC-MV method, the discounted masses from the multiple views are fused by the DS rule, as shown in equation (2). Finally, the clustering results are determined by the maximum belief masses.
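For one pattern, the discounting and fusion steps above can be sketched as follows. Shafer's discounting moves the fraction (1 − w_μ) of each view's mass onto the whole frame Ω, and Dempster's rule then combines the discounted masses over the singleton clusters. The function name and the singleton-plus-Ω representation are simplifying assumptions of ours, not the paper's equation (13):

```python
import numpy as np

def discount_and_fuse(memberships, weights):
    """View-level weighted fusion sketch for a single pattern.
    memberships[mu]: length-C membership vector of the pattern in view mu;
    weights[mu]: reliability (discounting factor) of view mu.
    Each view's BBA has the C singletons plus Omega as focal elements."""
    # discounted masses: singletons scaled by w_mu, the rest moved to Omega
    masses = [(w * mu_m, 1.0 - w) for mu_m, w in zip(memberships, weights)]
    fused_sing, fused_omega = masses[0]
    for sing, omega in masses[1:]:
        # Dempster combination: a singleton survives via singleton-singleton
        # (same cluster) and singleton-Omega intersections
        new_sing = fused_sing * sing + fused_sing * omega + fused_omega * sing
        new_omega = fused_omega * omega
        total = new_sing.sum() + new_omega      # 1 - K, the non-conflicting mass
        fused_sing, fused_omega = new_sing / total, new_omega / total
    return fused_sing, fused_omega

m_view1 = np.array([0.7, 0.3])    # pattern's memberships in view 1
m_view2 = np.array([0.6, 0.4])    # and in view 2
sing, om = discount_and_fuse([m_view1, m_view2], [0.8, 0.5])
label = int(np.argmax(sing))      # cluster with the maximum belief mass
```

The residual mass on Ω (here `om`) plays the role of m′(Ω), the imprecision left after fusing the two unequally reliable views.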

Experiment Application
In this section, in order to test the performance of the BTC-MV method, we conduct extensive experiments on six data sets with different dimensions from the UCI repository [2]. We divided the attributes of each data set into different groups to satisfy the multiview scenario. Some attributes of each data set are randomly removed to meet the assumption of an incomplete data set. The important information of these well-known data sets, including the number of attributes (N_a), classes (N_c), instances (N_i), and views (N_v), is shown in Table 1. These six data sets, whose attributes range from 4 to 16, views from 2 to 4, classes from 2 to 7, and instances from 150 to 13611, are representative, so generic results can be obtained. In order to justify the performance of the BTC-MV method, classic imputation methods and clustering methods are combined and compared with the proposed BTC-MV method. The typical methods of estimating missing data include MI [34] and k-nearest neighbors (KNN) [9]. The classic clustering methods used in the comparison experiments include k-means [51] and FCM [52]. According to the number of views, the methods can be divided into single-view clustering and multiview clustering. The error rate, denoted R_e, is used to evaluate the performance of the BTC-MV method. It is calculated as R_e = N_e/N, where N_e is the number of patterns with erroneous clustering results and N is the total number of patterns used in the experiments. The experiments are conducted with MATLAB software.

Experiment 1.
In the methods of MI-KM, MI-KMVC, KNNI-FCM, KNNI-MFCM, and BTC-MV, the parameter K represents the number of patterns used to estimate the missing data, and it is one of the main parameters of BTC-MV. In the BTC-MV method, the K patterns closest to the incomplete data are searched from the multiple views using the known attributes. It is worth noting that the parameter K influences the precision of the estimations and the performance of the clustering methods. In order to verify the influence of the parameter K on the clustering methods, numerous experiments are carried out under different K values, and the comparison results are shown in Figure 2. The error rate of the BTC-MV method varies with the parameter K. However, when K takes a value from 3 to 20, the error rate of the BTC-MV method fluctuates within an acceptable range. This result indicates that the BTC-MV method is strongly robust to the parameter K, which is an advantage of the BTC-MV method in practical classification applications.

Experiment 2.
In this experiment, we set each data set to have 10%, 30%, and 50% incomplete patterns, respectively. Moreover, for each incomplete pattern, there are 50% unknown attributes. We compare the performance of BTC-MV with the other clustering methods on the six incomplete data sets; the results are shown in Tables 2-4 (in each data set, the bold value is the lowest error rate). The error rate of the BTC-MV method on the different data sets is lower than that of the other methods. This may be because the attribute-level weighted imputation strategy in the BTC-MV method performs better: it can accurately estimate the missing values because it brings the patterns with high attribute correlation closer to the missing data. Thus, we can obtain complete data sets with precise estimations and reduce the error rate of the clustering method. It is noteworthy that as the amount of missing data increases, the error rate of all these methods also increases. This phenomenon indicates that the missing data make the information ambiguous, leading to a degradation in the performance of the clustering methods.

ALGORITHM 1: Attribute-level weighted imputation strategy (only Steps 2 and 3 are recoverable from this excerpt).
Step 2: Search for the k-nearest neighbors of the incomplete pattern y_i^μ in the view space μ by equation (4).
Step 3: Estimate the missing data of the incomplete pattern y_i^μ by equation (5).

ALGORITHM 2: The clustering method based on multiple views.
Input: the complete data set X with estimated values.
Parameters: C, the number of clustering centers; N, the number of views; ε, the threshold used to decide whether to stop the iteration; τ_max, the maximum number of iterations; β, the fuzziness index.
Output: the view weights w and the membership matrix m.
Initialization: randomly generate the membership matrix m.
For each view μ:
  For τ = 1 : τ_max:
    Calculate the clustering centers by equation (9);
    Calculate the weight of the μth view by equations (10) and (11);
    Calculate the membership values by equation (12);
    Calculate the objective function J_τ by equation (6);
  End
End
Return w and m.

Experiment 3.
In this section, we test the influence of the number of unknown attributes in the incomplete patterns. We set each data set to have 30% missing data and each incomplete pattern to have 30%, 50%, and 70% unknown attributes, respectively. We compare the performance of BTC-MV with the other clustering methods on the six incomplete data sets; the results are shown in Tables 5-7.
The results of these experiments indicate that an increase in unknown attributes generally leads to a decrease in clustering performance, as the missing data introduce uncertain information. However, compared with the other methods, the BTC-MV method has superior performance. This experiment further validates the effectiveness and robustness of the BTC-MV method.

Conclusions
In this paper, the new BTC-MV method is proposed to meet the challenges of incomplete data clustering. The BTC-MV method estimates the unknown attributes by a weighted KNN strategy based on multiple views; the weights are given by the variance of each attribute, which reflects its importance. The attribute-level weighted imputation strategy improves the precision of the estimations. Then, a clustering method based on multiple views is proposed in BTC-MV, and the view weights express the reliability of the evidence from the different spaces. Therefore, the membership values of a pattern belonging to the various categories in the multiple views should not be equally weighted. Finally, in the BTC-MV method, a view-level weighted fusion strategy based on belief function theory is proposed to integrate the evidence from the different source spaces. We conducted experiments on six UCI data sets to compare the performance of the BTC-MV method with that of other state-of-the-art methods. The experimental results show the effectiveness of the BTC-MV method in clustering incomplete patterns.
In the BTC-MV method, the attribute-level weighted imputation strategy makes an important contribution to improving the accuracy of clustering incomplete patterns. However, it introduces a large computational cost because the distances need to be calculated in the KNN strategy. We will consider using other methods to reduce the computational complexity in future work. In addition, we will also research other ways to optimize the data set in order to obtain superior clustering performance.

Data Availability
The data sets used in this paper are extracted from the University of California Irvine (UCI) machine learning repository.