In real-world applications of multiview clustering, some views may be incomplete due to noise, sensor failure, etc. Most existing studies of incomplete multiview clustering focus on early fusion strategies, for example, learning a shared subspace from multiple views. However, these studies overlook the fact that, under the random missing assumption, clustering the visible instances in each view can already yield reliable results; accordingly, it seems more natural to learn the final clustering decision via late fusion of the clustering results from the incomplete views. To this end, we propose a late fusion method for incomplete multiview clustering. More specifically, the proposed method performs kernel k-means clustering on the visible instances in each view and then fuses the resulting clustering decisions.
The term “multiview data” refers to a collection of different data sources or modalities that describe the same samples. For example, clinical text and images serve as two views of a patient’s diagnosis file, and an image on a webpage may be described by both its pixel data and the surrounding text. Clustering is an unsupervised learning task that divides samples into disjoint sets, revealing the intrinsic structure of the data [
However, in real-world applications of multiview clustering, incomplete views often exist. For example, in patient grouping [
A straightforward strategy for handling incomplete multiview clustering is to first fill in the missing view information and then apply a standard multiview clustering algorithm. Widely used filling algorithms include zero filling, mean value filling, and k-nearest-neighbor (KNN) filling.
In addition to simple filling methods, a few early fusion methods have been proposed for incomplete multiview clustering. In [
What these studies overlook is that the clustering results from incomplete views can be reliable under a random missing assumption. Most studies on incomplete multiview clustering rely on this assumption, which holds that whether an instance in a view is missing is independent of the corresponding sample’s cluster label. Under this assumption, the missing ratio of each cluster should be almost the same; therefore, the overall cluster structure is largely preserved in an incomplete view.
Accordingly, we build a toy dataset consisting of three Gaussian distributions to illustrate how the cluster structure is maintained under random missing. We randomly delete instances at different ratios and perform kernel k-means clustering on the remaining visible instances.
The cluster structure of the visible instances remains stable when the view suffers different ratios of random missing. The complete view consists of three Gaussian distributions. ACC denotes the clustering accuracy of kernel k-means on the visible instances.
We repeat the random missing procedure 100 times at each missing ratio, calculate the average ACC, and plot the cluster centroids of the visible instances under random missing. Black crosses mark the cluster centroids of the complete view; red squares mark the cluster centroids of the visible instances under random missing.
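This toy experiment is easy to reproduce in outline. The sketch below uses illustrative cluster means, a 50% missing ratio, and plain (linear-kernel) k-means in place of the paper's kernel k-means; none of these specific parameters come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: three well-separated 2-D Gaussian clusters.
# (Illustrative means and variance; the paper's exact toy parameters are not given.)
means = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 5.0]])
X = np.vstack([rng.normal(m, 0.5, size=(200, 2)) for m in means])

def kmeans(data, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    C = [data[0]]
    for _ in range(k - 1):  # farthest-point seeding avoids degenerate starts
        d = np.min(((data[:, None, :] - np.array(C)[None]) ** 2).sum(-1), axis=1)
        C.append(data[d.argmax()])
    C = np.array(C)
    for _ in range(iters):
        labels = ((data[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                C[j] = data[labels == j].mean(0)
    return C, labels

# Centroids of the complete view.
C_full, _ = kmeans(X, 3)

# Randomly delete 50% of the instances (random missing) and recluster
# only the visible instances.
visible = rng.random(len(X)) > 0.5
C_vis, _ = kmeans(X[visible], 3)

# Each centroid of the complete view should have a nearby counterpart
# among the centroids of the visible instances.
drift = max(np.linalg.norm(C_vis - c, axis=1).min() for c in C_full)
print("max centroid drift:", drift)
```

Under random missing, each cluster loses roughly the same fraction of points, so the centroids barely move; this is the stability the figure illustrates.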
Since the clustering results from incomplete views can thus be reliable, we propose a late fusion method for incomplete multiview clustering, whereas most previous studies have focused on early fusion. Firstly, we perform kernel k-means clustering on the visible instances of each view.
A brief example of incomplete multiview clustering via late fusion.
We conclude this section by highlighting the main contributions of this work: (1) We propose a late fusion method for incomplete multiview clustering, whereas most previous studies have concentrated on early fusion methods; experimental results also validate the effectiveness of the proposed method. (2) In the second step of the proposed method, we design an alternate updating algorithm with proven convergence to learn the clustering decision that achieves the best
In this section, we introduce some preliminary knowledge to facilitate better understanding of our proposed method. We first outline the notations used in this paper, after which
Suppose the incomplete multiview data have
The idea behind
An alternate updating algorithm is designed to solve this problem. Firstly, the centroids of the clusters are initialized. The cluster assignment is then updated by assigning the cluster label of each sample according to the closest centroid. Next, the centroids are updated by calculating the average of the samples in each cluster. The centroids and the cluster assignment are alternately updated until the cluster assignment no longer changes.
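The alternation described above can be written down directly. The minimal NumPy sketch below (a textbook Lloyd's k-means, not the paper's code) stops exactly when the cluster assignment no longer changes:

```python
import numpy as np

def kmeans(X, k, seed=0, max_iters=300):
    """k-means by alternate updating: assign each sample to its nearest
    centroid, recompute each centroid as its cluster mean, and stop as soon
    as the assignment no longer changes."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iters):
        # Assignment step: nearest centroid under squared Euclidean distance.
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        new_labels = dist.argmin(1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignment unchanged -> converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its cluster.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(0)
    return centroids, labels
```

Because each step can only decrease the within-cluster sum of squares and there are finitely many assignments, the alternation must terminate.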
Kernel
Define
However, the problem in Equation (
In a departure from conventional subspace methods, we develop a late fusion method for incomplete multiview clustering. This method performs kernel k-means clustering on the visible instances of each view and then fuses the per-view clustering results.
In line with most previous research on incomplete multiview clustering, we also assume that the instances in each view satisfy the random missing assumption. Although an incomplete view contains missing instances, a standard clustering method can be applied directly to its visible instances. As pointed out in the introduction, the clustering result in each view is then reliable, which makes late fusion of these results promising. In this paper, we perform kernel k-means clustering on the visible instances of each view.
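As a concrete but simplified illustration of the late fusion idea, the sketch below aligns each view's cluster labels to a reference view with a greedy overlap-based mapping and then takes a per-sample majority vote. The paper instead learns the consensus decision by the alternate optimization derived below, so this is only a stand-in; the `-1` convention for missing instances and all function names are our own.

```python
import numpy as np

def align_labels(ref, other, k):
    """Greedily map the cluster ids of `other` onto those of `ref` by
    co-occurrence counts on samples visible in both views (a simple
    stand-in for a learned permutation). -1 marks a missing instance."""
    conf = np.zeros((k, k), dtype=int)
    for r, o in zip(ref, other):
        if r >= 0 and o >= 0:
            conf[o, r] += 1
    mapping = -np.ones(k, dtype=int)
    used = set()
    for o, r in sorted(np.ndindex(k, k), key=lambda p: -conf[p]):
        if mapping[o] == -1 and r not in used:
            mapping[o] = r
            used.add(r)
    return mapping

def late_fusion(view_labels, k):
    """Fuse per-view clusterings (one label array per view, -1 = missing):
    align every view to the first one, then majority-vote per sample."""
    ref = np.asarray(view_labels[0])
    aligned = [ref]
    for lab in view_labels[1:]:
        lab = np.asarray(lab)
        m = align_labels(ref, lab, k)
        aligned.append(np.where(lab >= 0, m[np.maximum(lab, 0)], -1))
    aligned = np.array(aligned)
    fused = np.empty(aligned.shape[1], dtype=int)
    for i in range(aligned.shape[1]):
        votes = aligned[:, i][aligned[:, i] >= 0]
        fused[i] = np.bincount(votes, minlength=k).argmax()
    return fused
```

The alignment step matters because cluster ids are arbitrary within each view; without it, a vote across views is meaningless.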
To create a fusion of the clustering results
For view
For the multiview situation, we wish to find a consistent clustering decision
Similar to
The updating of
Equation (
Denoting
When
By taking the derivative of Equation (
Equation (
Therefore, to minimize Equation (
The derivative of
The alternate updating of
For the alternate optimization,
The overall algorithm is summarized in Algorithm
Input: the incomplete multiview data, the number of clusters, and the initial clustering decision.
Output: the final clustering decision.
Initialize the centroids and perform kernel k-means clustering on the visible instances of each view; then alternately update the clustering decision and the centroids until convergence.
Eigenvector decomposition is applied to solve the kernel k-means relaxation.
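A standard way to carry this out is sketched below, under the usual spectral relaxation (this is the textbook relaxation, not necessarily the paper's exact formulation): the relaxed partition matrix is spanned by the top-k eigenvectors of the kernel matrix, and a discrete clustering is recovered by running plain k-means on its rows.

```python
import numpy as np

def kernel_kmeans_relaxed(K, k, iters=100):
    """Spectral relaxation of kernel k-means: take the top-k eigenvectors of
    the (PSD) kernel matrix K as the relaxed partition matrix H, then
    discretize by clustering the rows of H with plain k-means."""
    _, vecs = np.linalg.eigh(K)            # eigenvalues in ascending order
    H = vecs[:, -k:]                       # top-k eigenvectors as columns
    # Deterministic farthest-point seeding, then Lloyd iterations on rows of H.
    C = [H[0]]
    for _ in range(k - 1):
        d = np.min(((H[:, None, :] - np.array(C)[None]) ** 2).sum(-1), axis=1)
        C.append(H[d.argmax()])
    C = np.array(C)
    for _ in range(iters):
        labels = ((H[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                C[j] = H[labels == j].mean(0)
    return labels
```

For a nearly block-diagonal kernel (well-separated clusters), the top-k eigenvectors are close to cluster indicator vectors, so the final k-means step recovers the partition easily.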
Experimental comparisons are conducted on six multiple kernel learning benchmark datasets. In these datasets, each kernel serves as a view.
A precomputed kernel dataset from [
Consumer video analysis benchmark dataset proposed in [
Handwritten numerals (0–9) dataset from UCI Machine Learning Repository. The original dataset consists of 6 feature sets and can be downloaded from
17 category flower dataset from Visual Geometry Group. The original dataset can be downloaded from
102 category flower dataset from Visual Geometry Group. The original dataset can be downloaded from
Fold recognition dataset, which consists of 694 proteins from 27 SCOP folds [
The basic information of these datasets is summarized in Table
Information of datasets.
| Dataset | Sample number | Kernel number | Cluster number |
|---|---|---|---|
| Caltech102 | 1530 | 25 | 102 |
| CCV | 6773 | 6 | 20 |
| Digital | 2000 | 3 | 10 |
| Flower17 | 1360 | 7 | 17 |
| Flower102 | 8189 | 4 | 102 |
| ProteinFold | 694 | 12 | 27 |
The proposed method is compared with several imputation methods and a representative early fusion method. Moreover, the best result of a single view is also provided as a baseline.
Best single view (BS): the best clustering result from a single view. We select the view that achieves the highest clustering performance on its visible instances. If this view is incomplete, we assign random labels to its missing instances and then report the performance.
Zero filling (ZF): the missing kernel entries are filled with zeros, after which multiple kernel k-means is applied.
Mean filling (MF): the missing kernel entries are filled with the average of the corresponding visible entries in the other views, after which multiple kernel k-means is applied.
KNN filling (KNN): the incomplete kernels are filled using the k-nearest-neighbor method.
Alignment-maximization filling (AF): the alignment-maximization filling proposed in [
This subspace method, proposed in [
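The two simplest imputation baselines can be sketched directly on the kernel matrices. The function names and the boolean missing-mask convention below are our own; ZF and MF follow the descriptions above.

```python
import numpy as np

def zero_fill(K, missing):
    """ZF: zero out the kernel rows/columns of missing samples.
    `missing` is a boolean vector, True where the sample is absent."""
    K = K.copy()
    K[missing, :] = 0.0
    K[:, missing] = 0.0
    return K

def mean_fill(kernels, missing):
    """MF: replace each missing entry of a view's kernel by the average of
    the corresponding entries in the views where both samples are visible."""
    filled = []
    for v, K in enumerate(kernels):
        K = K.copy()
        n = K.shape[0]
        for i in range(n):
            for j in range(n):
                if missing[v][i] or missing[v][j]:
                    vals = [kernels[u][i, j] for u in range(len(kernels))
                            if u != v and not (missing[u][i] or missing[u][j])]
                    K[i, j] = float(np.mean(vals)) if vals else 0.0
        filled.append(K)
    return filled
```

After filling, any complete multiple kernel clustering algorithm can be run on the repaired kernels, which is exactly how these baselines are used here.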
In our experiments, the number of clusters is treated as prior knowledge. Base kernels are centered and scaled during preprocessing, following the suggestion in [
Since the base kernels in the original datasets are complete, incomplete kernels must be generated manually. We assume that the ratio of samples with missing views (the incomplete sample ratio) is
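One common way to generate such incomplete data is sketched below; the constraint that every sample remains visible in at least one view is a frequent convention in this literature, and the paper's exact generation scheme may differ in detail.

```python
import numpy as np

def generate_missing(n, n_views, ratio, seed=0):
    """Randomly mark round(ratio * n) samples as incomplete. Each incomplete
    sample is deleted from a random non-empty proper subset of the views, so
    every sample remains visible in at least one view. (A common protocol;
    the paper's exact generation scheme may differ.)"""
    rng = np.random.default_rng(seed)
    missing = np.zeros((n_views, n), dtype=bool)       # True = instance missing
    incomplete = rng.choice(n, size=int(round(ratio * n)), replace=False)
    for i in incomplete:
        n_miss = rng.integers(1, n_views)              # miss 1 .. n_views-1 views
        views = rng.choice(n_views, size=n_miss, replace=False)
        missing[views, i] = True
    return missing
```

The returned mask can then be used to delete rows/columns of the base kernels before running any of the compared methods.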
The proposed method requires an initial clustering decision
Performance comparisons between the initialization and the corresponding late fusion in terms of NMI (%).
[Only the initialization (“Initial”) values of this table are recoverable; the corresponding late-fusion values and the incomplete-sample-ratio labels of the three settings were lost.]

First incomplete-sample-ratio setting:

| Initialization | ProteinFold | Flower17 | Caltech102 | Digital | CCV | Flower102 |
|---|---|---|---|---|---|---|
| BS | 33.53 | 37.27 | 52.70 | 57.24 | 15.14 | 43.05 |
| ZF | 34.79 | 41.95 | 57.21 | 44.85 | 13.06 | 39.90 |
| MF | 35.27 | 42.45 | 57.07 | 44.56 | 13.23 | 39.83 |
| KNN | 35.19 | 42.60 | 57.23 | 65.59 | 12.59 | 40.04 |
| AF | 37.14 | 43.87 | 58.16 | 48.08 | 13.20 | 40.35 |

Second incomplete-sample-ratio setting:

| Initialization | ProteinFold | Flower17 | Caltech102 | Digital | CCV | Flower102 |
|---|---|---|---|---|---|---|
| BS | 28.27 | 39.52 | 48.62 | 45.03 | 10.94 | 35.74 |
| ZF | 30.05 | 38.28 | 53.35 | 41.34 | 8.88 | 37.04 |
| MF | 31.10 | 37.45 | 53.26 | 40.17 | 8.97 | 36.98 |
| KNN | 33.73 | 38.74 | 54.86 | 65.90 | 9.53 | 37.52 |
| AF | 33.94 | 42.24 | 57.41 | 47.35 | 10.92 | 38.08 |

Third incomplete-sample-ratio setting:

| Initialization | ProteinFold | Flower17 | Caltech102 | Digital | CCV | Flower102 |
|---|---|---|---|---|---|---|
| BS | 24.63 | 22.18 | 46.87 | 35.28 | 7.61 | 29.66 |
| ZF | 26.03 | 33.38 | 50.99 | 39.06 | 8.76 | 35.21 |
| MF | 27.50 | 33.28 | 50.53 | 35.38 | 9.01 | 35.17 |
| KNN | 32.92 | 34.18 | 53.14 | 58.95 | 8.40 | 35.34 |
| AF | 32.73 | 40.02 | 56.21 | 46.07 | 10.61 | 35.66 |
Although the experimental results in the previous section show that improvement can be obtained using the late fusion method, the question of how to choose a suitable initialization to ensure the best final performance remains unsolved. In this section, we conduct some empirical studies to determine the relationship between the initialization method and the final late fusion performance.
For each dataset, we calculate the mean NMI over the different incomplete sample ratios for the late fusion of each initialization. We then rank the performance on each dataset to see which initialization achieves the best final performance, as shown in Table
Rank of the late fusion performance with different initializations in terms of NMI (%).
| Init | ProteinFold mean | Rank | Flower17 mean | Rank | Caltech102 mean | Rank | Digital mean | Rank | CCV mean | Rank | Flower102 mean | Rank | Rank score | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BS | 34.62 | 3 | 43.67 | 5 | 53.02 | 5 | 64.28 | 2 | 16.78 | 1 | 42.89 | 1 | 3.33 | 3 |
ZF | 34.18 | 5 | 44.70 | 2 | 56.22 | 3 | 52.82 | 4 | 12.80 | 5 | 37.95 | 4 | 4.83 | 4 |
MF | 34.50 | 4 | 44.65 | 3 | 56.19 | 4 | 50.26 | 5 | 12.91 | 3 | 37.91 | 5 | 5.00 | 5 |
KNN | 35.93 | 1 | 44.28 | 4 | 56.87 | 2 | 68.63 | 1 | 12.80 | 4 | 37.97 | 3 | 3.17 | 2 |
AF | 35.90 | 2 | 45.02 | 1 | 58.17 | 1 | 53.77 | 3 | 13.11 | 2 | 38.14 | 2 | 2.50 | 1 |
However, as shown in Table
Rank of the performance change with different initializations on different datasets in terms of NMI (%).
| Init | ProteinFold change | Rank | Flower17 change | Rank | Caltech102 change | Rank | Digital change | Rank | CCV change | Rank | Flower102 change | Rank | Rank score | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BS | 5.31 | 1 | 14.20 | 1 | 3.78 | 1 | 18.09 | 1 | 4.93 | 1 | 6.73 | 1 | 1.00 | 1 |
ZF | 3.86 | 2 | 7.29 | 2 | 2.48 | 3 | 10.87 | 2 | 2.52 | 2 | 0.55 | 2 | 2.17 | 2 |
MF | 3.16 | 3 | 7.17 | 3 | 2.56 | 2 | 10.14 | 3 | 2.51 | 3 | 0.51 | 3 | 2.83 | 3 |
KNN | 2.11 | 4 | 5.93 | 4 | 1.76 | 4 | 4.62 | 5 | 2.30 | 4 | 0.14 | 4 | 4.50 | 4 |
AF | 1.47 | 5 | 3.02 | 5 | 0.89 | 5 | 6.47 | 4 | 1.18 | 5 | −0.17 | 5 | 5.00 | 5 |
In short, it may be impossible to find a universally best initialization for the proposed late fusion method. However, the empirical results allow us to draw some conclusions regarding the choice of initialization. (1) If we have strong prior knowledge about which view is most important, BS may be a suitable initialization, since BS is substantially boosted by late fusion (overall rank 1, as shown in Table
Figure
Comparison between the best late fusion and the commonly used imputation methods. (a) Performance on Caltech102. (b) Performance on CCV. (c) Performance on Digital. (d) Performance on Flower17. (e) Performance on Flower102. (f) Performance on ProteinFold.
In this section, we compare the proposed method with partial view clustering (PVC), which is a representative early fusion method proposed in [
Comparison with early fusion method. (a) Performance on Digital view 1 and view 2. (b) Performance on Digital view 1 and view 3. (c) Performance on Digital view 2 and view 3.
In this paper, we propose a novel late fusion method that learns a consensus clustering decision from the clustering results of incomplete views without imputation. To learn the consensus decision, we design an alternate updating algorithm and prove its convergence theoretically. Moreover, we conduct comprehensive experiments to study how the initialization affects the final performance of the proposed method. Although no single initialization is best in all situations, we suggest that the clustering result of the best single view is an effective initialization. With a suitable initialization, the proposed method outperforms commonly used imputation methods and a representative early fusion method.
Although the proposed method demonstrates the effectiveness of the late fusion strategy for incomplete multiview clustering, several promising directions remain for further research. The first is to generate the clusters automatically without fixing their number: in many real-world applications the number of clusters is unknown, in which case the proposed method cannot be applied. Instead of using kernel k-means, clustering algorithms that do not require a predefined number of clusters could be adopted.
The data used to support the findings of this study are available from the corresponding author upon request.
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This work was supported by the National Key R&D Program of China (No. 2018YFB1003203), the National Natural Science Foundation of China (Nos. 61672528, 61403405, and 61702593), and Hunan Provincial Natural Science Foundation of China (No. 2018JJ3611).