Semi-Supervised Multi-View Clustering with Weighted Anchor Graph Embedding

Numerous studies have shown that multi-view clustering achieves better performance on complete multi-view data. However, real-world data usually suffer from missing samples in some views and contain only a small number of labeled samples. Moreover, almost all existing multi-view clustering models handle incomplete multi-view data poorly and fail to fully exploit the labeled samples to reduce computational complexity, which limits their practical application. To address these problems, this paper proposes a novel framework called Semi-supervised Multi-View Clustering with Weighted Anchor Graph Embedding (SMVC_WAGE), which is conceptually simple and efficiently generates high-quality clustering results in practice. Specifically, we introduce a simple and effective anchor strategy. Based on the selected anchor points, we exploit the intrinsic and extrinsic view information to bridge all samples and capture more reliable nonlinear relations, which greatly enhances efficiency and improves stability. Meanwhile, we construct the global fused graph compatibly across multiple views via a nearly parameter-free graph fusion mechanism that directly coalesces the view-wise graphs. As a result, the proposed method not only deals well with complete multi-view clustering but can also be easily extended to incomplete multi-view cases. Experimental results show that our algorithm surpasses some state-of-the-art competitors in clustering ability and time cost.


Introduction
In many practical applications, a growing amount of real-world data naturally appears in multiple views. Such data are called multi-view data, and they may be characterized by different attributes or collected from diverse sources. For example, an image can be described with different features, such as SIFT (Scale-Invariant Feature Transform), HOG (Histogram of Oriented Gradient), and LBP (Local Binary Pattern) [1]; a piece of news can be reported by multiple news organizations [2]; and a web page can be represented by its links, texts, and images [3]. In other words, each of these objects is described by different characteristics, and each characteristic is referred to as one view of the object.
Generally, an individual view contains enough information to carry out machine learning tasks, but using it alone ignores the consistent and complementary information across multiple views [4]. Proper use of such information can improve the performance of various machine learning tasks. Therefore, it is critical to consider how to effectively leverage such information.
Multi-view clustering, which adaptively separates data into corresponding groups by utilizing the consistency or complementarity principle among multiple views, is a very popular research direction. From the perspective of the technologies involved, most existing methods can be roughly classified into three types: matrix factorization-based, graph-based, and subspace-based approaches. As Kang et al. [5] pointed out, matrix factorization-based approaches seek a common matrix among different views, graph-based approaches explore a common affinity graph, and subspace-based approaches learn a consensus subspace of low dimension. Therefore, for multi-view clustering, the key to high performance is to ensure that an optimal consistent representation is generated. To this end, many multi-view clustering models have been presented [6][7][8][9][10][11][12][13][14][15][16][17] and widely used in various real-world scenarios, for instance, object recognition [18], feature selection [19], and information retrieval [20].
One basic assumption adopted by the aforementioned multi-view clustering approaches is that all views are complete. However, in real-world applications, it is very common that samples are missing in some views for many reasons, such as human error or temporary sensor failure.
Thus, previous complete multi-view methods cannot work well in this scenario, since the pairwise information of samples missing some views cannot be directly used. If we want to apply conventional multi-view clustering algorithms to an incomplete dataset, we can either remove the incomplete samples or fill in the missing entries during pre-processing. Nevertheless, these pre-processing methods cause the original data to lose information or introduce noise, which makes conventional multi-view clustering methods unavoidably degrade or even fail. Therefore, incomplete multi-view clustering has drawn increasing interest recently, and many attempts have been made to tackle this problem [2][21][22][23][24][25][26].
Moreover, real-world data usually contain a small number of labeled samples in some practical applications. The aforementioned methods are unsupervised and cannot leverage this prior information to improve performance, which limits their application. In practice, when labeled samples are available, efficiently exploiting them can significantly improve clustering performance and reduce clustering time. Motivated by this observation, some advanced semi-supervised multi-view clustering frameworks have recently been created to perform various clustering tasks [27][28][29][30][31][32][33]. However, most of these methods learn the optimal common indicator matrix from multiple views by alternating optimization, which leads to high computational complexity and prevents their wide use.
In view of the above issues, we present a new framework called Semi-supervised Multi-View Clustering with Weighted Anchor Graph Embedding (SMVC_WAGE), which is conceptually simple and efficiently generates high-quality clustering results in practice. SMVC_WAGE employs inherent consistency and external complementary information to seek the optimal fusion graph that spans multiple views compatibly in structure. Specifically, we apply anchor graph learning to bridge all the intrinsic view samples, which can greatly enhance efficiency and improve stability. Moreover, this also solves the dilemma that samples sharing no common views cannot be directly used for computing cross-view similarities. Besides, instead of regularizing or weighting the loss of each view in a conventional way, the proposed method directly combines the graphs of different views to construct the global optimal fused graph, where the weights are learned in a nearly parameter-free manner. Therefore, by selecting anchors from labeled samples and designing a weighted fusion mechanism for multiple views, the proposed method not only deals well with complete multi-view clustering but can also be easily extended to the incomplete multi-view case.
The main contributions of this paper are summarized as follows:
(1) We provide a simple and effective anchor strategy. Based on these anchor points, the proposed method can exploit the intrinsic and extrinsic view information to bridge all samples and capture more reliable nonlinear relations, which can greatly enhance efficiency and improve stability while partitioning multi-view data into different clusters.
(2) We propose a novel graph fusion mechanism that constructs the global fused graph by directly coalescing the view-wise graphs, and the procedure is nearly free of parameters.
(3) We present a more general semi-supervised clustering framework that deals well with complete multi-view clustering and can be easily extended to incomplete multi-view cases.
(4) Experimental results on six widely used multi-view datasets show that our algorithm surpasses some state-of-the-art competitors in clustering ability and time cost.
Other parts of the paper are organized as follows: Section 2 briefly reviews the related works. In Section 3, the proposed algorithm is described in detail. Afterwards, the experimental results and discussion are given in Section 4. Finally, Section 5 concludes the paper.

Related Work
In this section, we first introduce recent progress in two specific types of multi-view clustering, namely complete and incomplete multi-view clustering. Then, we briefly describe related work on semi-supervised multi-view clustering.

Complete Multi-View Clustering.
Multi-view clustering exploits the consistent and complementary information from multi-view data to increase clustering performance and stability, and it has attracted extensive attention recently. Numerous multi-view clustering models have been built. Usually, multi-view clustering approaches assume that all samples have complete information in each view; such samples are called complete multi-view data. Roughly speaking, in terms of the techniques involved, these approaches can be mainly divided into two categories: graph-based and subspace-based methods.
Graph-based methods aim to construct an optimal fusion graph, which is then partitioned by graph-cut or other techniques to obtain the final result. Li et al. [6] developed a novel approach, named Multi-view Spectral Clustering (MVSC), which selects several uniform salient points to construct a bipartite graph that represents the manifold structures of multi-view data. Nie et al. [7] offered a new approach called Self-weighted Multi-view Clustering (SwMC), which is completely self-weighted and directly assigns the cluster label to the corresponding data point without any post-processing. Wang et al. [8] proposed a general Graph-based Multi-view Clustering (GMC), which jointly learns the graph of each view and the unified graph in a mutually enhanced manner and directly generates the final clustering result. Tang et al. [9] presented a robust model for Multi-view Subspace Clustering, which designs a diversity regularization term to enhance the diversity and reduce the redundancy among different feature views. Additionally, graph-based methods usually need predefined graphs, and the quality of the graph largely determines the final clustering performance. The work in [10] introduced a novel model named Multi-view Clustering with Graph Learning (MVGL), which learns one global graph from the different graphs constructed by all views to improve the quality of the final fusion graph. The work in [11] presented a novel method named Multi-view Consensus Graph Clustering (MCGC), which minimizes disagreement among all views and imposes a low-rank constraint on the Laplacian matrix to obtain a consensus graph. The study in [12] proposed a novel model called Graph Structure Fusion (GSF), which designs an objective function to adaptively tune the structure of the global graph. The work in [13] proposed a novel multi-view clustering method, which learns a unified graph via cross-view graph diffusion (CGD), where the input is the predefined graph matrix of each view.
To further learn a compact feature representation, the study in [14] proposed to capture both the shared information and distinguishing knowledge across different views via projecting each view into a common label space and preserve the local structure of samples by using the matrix-induced regularization.
Subspace-based methods are widely studied; they utilize various techniques to obtain a low-dimensional embedding. In general, they can efficiently reduce the dimensionality of the raw data and are easy to interpret. Because of this property, the study in [15] proposed to model different views as different relations in a knowledge graph, which learns a unified embedding and several view-specific embeddings from similarity triplets to perform multi-view clustering. The work in [16] proposed a novel model called Latent Multi-view Subspace Clustering (LMSC), which encodes complementary information between different views to automatically learn one latent consistent representation. To decrease the computational complexity and the memory requirement, the work in [17] introduced a novel framework entitled Binary Multi-View Clustering (BMVC), which jointly learns collaborative binary codes and binary cluster structures to perform large-scale multi-view clustering.

Incomplete Multi-View Clustering.
In practical applications, we are more likely to be provided with incomplete multi-view data. However, conventional multi-view clustering approaches unavoidably degrade or even fail when dealing with incomplete multi-view data. Recently, many methods have been proposed to solve this issue, which can be generally classified into matrix factorization-based and graph-based methods in terms of the techniques involved.
Matrix factorization-based methods directly learn a latent consistent representation with low dimensionality from all views by utilizing matrix factorization techniques. Li et al. [21] developed a pioneering approach called Partial multi-View Clustering (PVC), which learns a latent consistent subspace of complete samples and a private latent representation of incomplete samples by exploiting nonnegative matrix factorization (NMF) and sparsity norm regularization. Zhao et al. [22] presented a model that learns the compact global structure over the entire samples across all views by integrating Partial multi-View Clustering and a graph Laplacian term. Shao et al. [23] presented the framework named Multi-Incomplete-view Clustering (MIC), which exploits weighted NMF and $L_{2,1}$-norm regularization to learn the latent consistent feature matrix. Hu and Chen [24] proposed the approach called Doubly Aligned Incomplete Multi-view Clustering (DAIMC), which can handle negative entries by integrating weighted semi-NMF and $L_{2,1}$-norm regularized regression. While the above approaches can deal with incomplete multi-view data, their comparatively large storage and computational complexities limit their real-world applications. Liu et al. [25] proposed a novel framework called Late Fusion Incomplete Multi-view Clustering (LF-IMVC), which simultaneously imputes each incomplete sample and learns a consistent indicator matrix.
Graph-based methods focus on learning a low-dimensional representation from the graph constructed for each view and on uncovering the relationships between all samples. Wen et al. [26] introduced a general framework, which learns the low-dimensional representations from all views by exploiting a spectral constraint and a co-regularization term. Guo and Ye [2] proposed a new algorithm named Anchor-based Partial Multi-view Clustering (APMC), which integrates the intrinsic and extrinsic view information into the fused similarities via anchors; then, the unified clustering outcome can be achieved by performing spectral clustering on the fused similarities.

Semi-Supervised Multi-View Clustering.
Semi-supervised multi-view clustering, which uses a small proportion of labeled samples as well as a great number of unlabeled samples to perform clustering, is one of the hottest research directions in machine learning. As the most popular technique in the area of semi-supervised multi-view clustering, graph-based methods construct a graph whose vertices contain unlabeled and labeled data and whose edges, reflecting the similarity of vertices, spread information from labeled to unlabeled vertices. Thinking of each kind of feature as a modality, Cai et al. [27] proposed an algorithm named Adaptive Multi-Modal Semi-Supervised classification (AMMSS), which jointly learns the weight and the commonly shared class indicator matrix. Karasuyama and Mamitsuka [28] proposed a new method called Sparse Multiple Graph Integration (SMGI), which linearly combines multiple graph Laplacian matrices with sparse weights for label propagation. Nie et al. [29] presented a new framework called Auto-weighted Multiple Graph Learning (AMGL), which automatically learns a set of optimal weights without any parameters. Nie et al. [30] presented a novel model named Multi-view Learning with Adaptive Neighbors (MLAN), which directly partitions the final optimal graph into corresponding groups and whose only parameter controls robustness. To take advantage of the information in multi-view data, Nie et al. [31] proposed a new model called Adaptive MUlti-view SEmi-supervised (AMUSE), which obtains a more suitable unified graph for semi-supervised learning by imposing a structural regularization constraint. Aiming at the incomplete multi-view issue, Yang et al. [32] proposed a novel framework called Semi-supervised Learning with Incomplete Modalities (SLIM). It employs the inherent modal consistency to learn discriminative modal predictors and performs clustering via the external complementary information of unlabeled data.
However, graph-based approaches cannot always guarantee that the final representation has the same label as the raw data. Cai et al. [33] introduced a new semi-supervised Multi-View Clustering method based on Constrained Nonnegative Matrix Factorization (MVCNMF). It propagates the label information to a consistent representation by exploiting matrix factorization techniques.

Proposed Method
In this section, we elaborate on our simple yet effective approach called Semi-supervised Multi-View Clustering with Weighted Anchor Graph Embedding (SMVC_WAGE), which provides a general framework for semi-supervised multi-view clustering. Specifically, SMVC_WAGE first provides a simple and effective anchor strategy that exploits the intrinsic and extrinsic view information to bridge all samples and capture more reliable nonlinear relations. Then, the proposed method learns the weight of each view by utilizing the seed-based semi-supervised K-Means and the designed mathematical techniques to seek the optimal fusion graph that spans multiple views compatibly in structure. Ultimately, spectral clustering is conducted on the global fused graph to obtain a unified clustering result. In the following, we first describe the notation and problem definition and then introduce the Semi-supervised K-Means based on Seed for single-view clustering. Third, we propose SMVC_WAGE for solving both complete and incomplete multi-view clustering.

Notation and Problem Definition
(1) Notations. Except in some specified cases, italic, non-bold letters ($v, V, \ldots$) represent scalars. Bold uppercase letters ($\mathbf{X}, \ldots$) denote matrices, while bold lowercase letters ($\mathbf{x}, \ldots$) are vectors. $\mathbf{I}$ is an identity matrix of appropriate size, and $\mathbf{1}$ is an all-one vector of compatible length.
(2) Definition. In multi-view data, each sample is characterized by multiple views with one unified label. Assume that we are provided with a dataset $\mathbf{X} = \{\mathbf{X}^{(1)}, \mathbf{X}^{(2)}, \ldots, \mathbf{X}^{(V)}\}$ composed of $N$ samples from $V$ views in $K$ clusters, in which $\mathbf{X}^{(v)} \in \mathbb{R}^{N \times d_v}$ denotes the data matrix of the $v$-th view with $d_v$ features. Multi-view clustering aims to partition all samples into $K$ clusters by utilizing the consistent and complementary information from multi-view data, where $K$ is assumed to be predefined by users.

Semi-Supervised K-Means Based on Seed.
The proposed method performs spectral clustering on the global fused graph to obtain a unified clustering result, and K-Means clustering is an important component of spectral clustering. Additionally, for our method, the seed-based semi-supervised K-Means is the key step in learning the weights of multiple views. Therefore, it is necessary to review Semi-supervised K-Means based on Seed.
Without loss of generality, we assume a single-view data matrix $\mathbf{X} = [\mathbf{x}_1; \mathbf{x}_2; \ldots; \mathbf{x}_n] \in \mathbb{R}^{n \times d}$, where $\mathbf{X}$ can be acquired from the above-mentioned multi-view data. Suppose that the single-view data matrix $\mathbf{X}$ is categorized into $K$ clusters $\{C_l\}_{l=1}^{K}$. In a semi-supervised single-view clustering framework, we customarily collect a small amount of labeled data through prior knowledge, termed the seed set $\mathbf{X}_S \subseteq \mathbf{X}$, and we suppose that, for each cluster $C_l \subseteq \mathbf{X}$, there is typically at least one seed point $\mathbf{x}_i \in \mathbf{X}_S$. Note that we take a disjoint $K$-partitioning $\{\mathbf{X}_{S,l}\}_{l=1}^{K}$ of the seed set $\mathbf{X}_S$, so that every $\mathbf{x}_i \in \mathbf{X}_{S,l}$ belongs to $C_l$. In semi-supervised K-Means, the seed set $\mathbf{X}_S$ is utilized to initialize the K-Means approach.
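Thus, the centroid of the $l$-th cluster $C_l$ is initialized with the mean of the $l$-th partition $\mathbf{X}_{S,l}$; then, the semi-supervised K-Means objective function can be written as

$$\min \sum_{l=1}^{K} \sum_{\mathbf{x}_i \in \mathbf{X}} \delta(\mathbf{x}_i - \mathbf{u}_l)\, \|\mathbf{x}_i - \mathbf{u}_l\|_2^2, \tag{1}$$

where $\mathbf{x}_i \in \mathbb{R}^d$ is the $i$-th sample of the single-view data matrix $\mathbf{X}$, $\mathbf{u}_l \in \mathbb{R}^d$ is the mean of the $l$-th partition $\mathbf{X}_{S,l}$, and $\delta$ is the Dirac delta function. Furthermore, $\mathbf{u}_l$ and $\delta(\mathbf{x}_i - \mathbf{u}_l)$ can be defined as the following equations, respectively:

$$\mathbf{u}_l = \frac{1}{|\mathbf{X}_{S,l}|} \sum_{\mathbf{x}_i \in \mathbf{X}_{S,l}} \mathbf{x}_i, \tag{2}$$

$$\delta(\mathbf{x}_i - \mathbf{u}_l) = \begin{cases} 1, & l = \arg\min_{j} \|\mathbf{x}_i - \mathbf{u}_j\|_2^2, \\ 0, & \text{otherwise}, \end{cases} \tag{3}$$

where $|\mathbf{X}_{S,l}|$ is the number of samples in $\mathbf{X}_{S,l}$. Finding the optimal solution of the K-Means objective function (1) is an NP-hard problem [34]. However, the objective function can be quickly minimized to a local optimum by using efficient iterative relocation algorithms [35].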
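The seeded initialization and iterative relocation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `seeded_kmeans`, its arguments, and the choice to let seed points be reassigned freely after initialization are assumptions made for the example.

```python
import numpy as np

def seeded_kmeans(X, seed_idx, seed_labels, n_clusters, max_iter=100):
    """Seed-initialized K-Means (hypothetical helper, not from the paper)."""
    X = np.asarray(X, dtype=float)
    seed_idx = np.asarray(seed_idx)
    seed_labels = np.asarray(seed_labels)
    # Initial centroid of cluster l = mean of the seed points labeled l.
    centroids = np.stack([
        X[seed_idx[seed_labels == l]].mean(axis=0) for l in range(n_clusters)
    ])
    labels = np.full(X.shape[0], -1)
    for _ in range(max_iter):
        # Assignment step: nearest centroid in squared Euclidean distance.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so a local optimum is reached
        labels = new_labels
        # Update step: recompute centroids; an empty cluster keeps its old one.
        for l in range(n_clusters):
            members = X[labels == l]
            if len(members) > 0:
                centroids[l] = members.mean(axis=0)
    return labels, centroids
```

Because the centroids start from class-wise seed means, cluster `l` stays aligned with seed label `l` on well-separated data, which is what allows cluster labels to be compared directly with ground-truth seed labels later on.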

Anchor-Based Global Fused Similarity Matrix Construction in Multi-View Data.
In recent years, some studies [2,36,37] have applied an anchor-based scheme to form the similarity matrix $\mathbf{S}$. Generally, the anchor-based scheme consists of two steps. The first step selects $m$ anchor points from the raw data, where $m \ll n$. The second step designs a matrix $\mathbf{Z} \in \mathbb{R}^{n \times m}$ that measures the similarity between anchor points and data points. There are two common methods for anchor point generation: random selection and the K-Means method. Random selection extracts a portion of the data as anchor points by random sampling from the original data. Although the random selection strategy saves time, it cannot ensure that the selected anchor points are always good, which makes the results neither ideal nor stable. The K-Means approach utilizes the clustering centroids as anchor points, which makes the chosen anchors more representative in comparison with random selection. Nevertheless, an inevitable problem is that K-Means is sensitive to its initial centroids. To mitigate this problem, the K-Means method requires numerous independent repeated runs. For this reason, exploiting K-Means as a pre-processing or post-processing step is also unpredictable and computationally expensive. In practice, several real samples may be labeled, and samples that belong to the same cluster have similar statistical characteristics, while samples belonging to different clusters differ more in their statistical characteristics. Considering this, we can obtain the seed set $\mathbf{X}_S = \{\mathbf{X}_S^{(1)}, \ldots, \mathbf{X}_S^{(V)}, \mathbf{y}\}$, where $\mathbf{X}_S^{(v)} = [\mathbf{X}_{S,1}^{(v)}; \ldots; \mathbf{X}_{S,K}^{(v)}] \in \mathbb{R}^{q \times d_v}$ denotes the seed set in the $v$-th view with $K$ clusters, $\mathbf{y} = [y_1; y_2; \ldots; y_q] \in \mathbb{R}^{q}$ denotes the corresponding label vector, and $q = |\mathbf{X}_S^{(v)}|$ denotes the number of labeled samples. Then, the mean of each partition in the seed set $\mathbf{X}_S$ can be chosen as an anchor point.
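Specifically, the generated anchor point set in the $v$-th view can be represented as $\mathbf{U}^{(v)} = [\mathbf{u}_1^{(v)}; \mathbf{u}_2^{(v)}; \ldots; \mathbf{u}_K^{(v)}] \in \mathbb{R}^{K \times d_v}$, where each $\mathbf{u}_l^{(v)}$ can be obtained according to (2). Then, the similarity between data point $\mathbf{x}_i^{(v)}$ and anchor point $\mathbf{u}_l^{(v)}$ is defined as

$$Z_{il}^{(v)} = \frac{\mathcal{K}_\sigma(\mathbf{x}_i^{(v)}, \mathbf{u}_l^{(v)})}{\sum_{j=1}^{K} \mathcal{K}_\sigma(\mathbf{x}_i^{(v)}, \mathbf{u}_j^{(v)})}, \tag{4}$$

where the Gaussian kernel $\mathcal{K}_\sigma(\mathbf{x}, \mathbf{u}) = \exp(-\|\mathbf{x} - \mathbf{u}\|_2^2 / (2\sigma^2))$ is usually adopted. The parameter $\sigma$ can be set to 1 without loss of generality.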
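As an illustration of this step, the sketch below builds the anchors of one view as class-wise seed means and then computes a row-normalized Gaussian similarity between every sample and every anchor. The helper names `seed_anchors` and `truncated_similarity` are hypothetical, and normalizing each row over all $K$ anchors with kernel exponent $-\|\cdot\|^2 / (2\sigma^2)$ is an assumption about the exact kernel form.

```python
import numpy as np

def seed_anchors(X, seed_idx, seed_labels, n_clusters):
    """Anchors of one view: the mean of the seed points of each class."""
    X = np.asarray(X, dtype=float)
    seed_idx = np.asarray(seed_idx)
    seed_labels = np.asarray(seed_labels)
    return np.stack([
        X[seed_idx[seed_labels == l]].mean(axis=0) for l in range(n_clusters)
    ])

def truncated_similarity(X, anchors, sigma=1.0):
    """Row-normalized Gaussian similarity between samples and anchors."""
    X = np.asarray(X, dtype=float)
    sq = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    # Subtracting the row minimum avoids underflow and cancels on normalization.
    sq = sq - sq.min(axis=1, keepdims=True)
    K = np.exp(-sq / (2.0 * sigma ** 2))
    return K / K.sum(axis=1, keepdims=True)
```

Each row of the returned matrix sums to one, so a sample that is equidistant from two anchors receives equal similarity to both.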
For multi-view clustering, a common assumption is that clustering performance and stability can be increased by appropriately exploiting the consistent and complementary information between different views. Based on this assumption, how to seamlessly combine multiple views is crucial to the final clustering result. Considering the differences in the clustering quality of each view, we first calculate the clustering accuracy of each view through the prior information and then obtain the weights of the different views, so that a view with greater clustering accuracy has a larger weight during information fusion and a view with lower clustering accuracy has a smaller weight. More specifically, we utilize the semi-supervised K-Means to acquire the clustering result $\mathbf{c}^{(v)}$ of the seed set in each view. Note that $\mathbf{c}^{(v)}$ and $\mathbf{y}$ are the cluster labels and the ground-truth labels of the seed set $\mathbf{X}_S^{(v)}$, respectively, and then we calculate the clustering accuracy of each view by (17) in the seed set $\mathbf{X}_S$. Furthermore, to ensure that the view with greater clustering accuracy has a larger weight, we apply the softmax function to acquire the weights of the different views. The weights of the views can be represented by

$$w_v = \frac{\exp(\lambda a_v)}{\sum_{j=1}^{V} \exp(\lambda a_j)}, \tag{5}$$

where $w_v$ is the non-negative normalized weight of the $v$-th view and the sum of all elements of $\mathbf{w}$ is 1, $a_v$ is the clustering accuracy of the $v$-th view in the seed set $\mathbf{X}_S^{(v)}$, and $\lambda$ is a scalar used to control the distribution of weights between different views. The truncated similarity matrix $\mathbf{Z}^{(v)}$ can be obtained by (4), and then all truncated similarity matrices are integrated into a global truncated similarity matrix $\mathbf{Z} \in \mathbb{R}^{N \times K}$ between all samples and anchors:

$$\mathbf{Z} = \sum_{v=1}^{V} w_v \mathbf{Z}^{(v)}. \tag{6}$$
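To make the weighting step concrete, the following sketch turns per-view seed accuracies into softmax weights and forms the weighted combination of the view-wise matrices. The function names `view_weights` and `fuse_views` are hypothetical, the accuracies are assumed to be already computed, and the default `lam=10.0` simply mirrors the initialization value of the trade-off parameter λ stated for the algorithm.

```python
import numpy as np

def view_weights(accuracies, lam=10.0):
    """Softmax over lam * accuracy; larger lam favors the best views more."""
    a = np.asarray(accuracies, dtype=float)
    e = np.exp(lam * (a - a.max()))  # shifting by the max keeps exp() stable
    return e / e.sum()

def fuse_views(Z_list, weights):
    """Weighted sum of view-wise truncated similarity matrices."""
    return sum(w * np.asarray(Z, dtype=float) for w, Z in zip(weights, Z_list))
```

With accuracies 0.9 and 0.8 and λ = 10, the two weights are roughly 0.73 and 0.27. Since the weights sum to one and every view-wise row sums to one, each row of the fused matrix also sums to one.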
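Once we obtain the matrix $\mathbf{Z}$, the global fused similarity matrix $\mathbf{S} \in \mathbb{R}^{N \times N}$ between all samples can be approximated by an anchor graph [36]:

$$\mathbf{S} = \mathbf{Z} \mathbf{\Lambda}^{-1} \mathbf{Z}^{T}, \tag{7}$$

where $\mathbf{\Lambda} = \mathrm{diag}(\mathbf{Z}^{T} \mathbf{1}) \in \mathbb{R}^{K \times K}$ is a diagonal matrix.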
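A small numerical sketch of the anchor graph construction is given below. The function name `anchor_graph` is hypothetical, and the code materializes the full N × N matrix only for illustration; in practice one would work with the N × K factor instead.

```python
import numpy as np

def anchor_graph(Z):
    """Anchor-graph similarity S = Z diag(Z^T 1)^{-1} Z^T."""
    Z = np.asarray(Z, dtype=float)
    lam = Z.sum(axis=0)      # diagonal entries of Lambda = diag(Z^T 1)
    return (Z / lam) @ Z.T   # dividing column l by lam[l] applies Lambda^{-1}
```

When every row of `Z` sums to one, the resulting matrix is symmetric and each of its rows also sums to one, which is the doubly stochastic property relied on in the spectral analysis.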

Spectral Analysis on Global Fused Similarity Matrix.
To further simplify the clustering process, spectral clustering can be performed on the global fused similarity matrix $\mathbf{S}$. Specifically, the objective function of spectral clustering is

$$\min_{\mathbf{F}^{T}\mathbf{F} = \mathbf{I}} \mathrm{tr}(\mathbf{F}^{T} \mathbf{L} \mathbf{F}), \tag{8}$$

where $\mathrm{tr}(\cdot)$ is the matrix trace operator, $\mathbf{F} \in \mathbb{R}^{N \times K}$ is the indicator matrix, and $K$ is the number of clusters. The Laplacian matrix $\mathbf{L}$ is defined as $\mathbf{L} = \mathbf{D} - \mathbf{S}$ in graph theory, where the degree matrix $\mathbf{D} \in \mathbb{R}^{N \times N}$ is a diagonal matrix with $D_{ii} = \sum_{j=1}^{N} S_{ij}$. We can obtain the indicator matrix $\mathbf{F}$, which consists of the eigenvectors corresponding to the $K$ smallest eigenvalues, by performing eigen decomposition on $\mathbf{L}$. However, the computational complexity of eigen decomposition on $\mathbf{L}$ is $O(N^2 K)$, which makes it unsuitable for large-scale data.
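Fortunately, according to [2,37], $\mathbf{S}$ is a doubly stochastic matrix. Thus, the degree matrix $\mathbf{D} = \mathrm{diag}(\mathbf{S}\mathbf{1})$ is an identity matrix $\mathbf{I}$, and the Laplacian matrix $\mathbf{L}$ can be written as $\mathbf{L} = \mathbf{I} - \mathbf{S}$. To make the analysis simple, (8) is equivalent to the following equation:

$$\max_{\mathbf{F}^{T}\mathbf{F} = \mathbf{I}} \mathrm{tr}(\mathbf{F}^{T} \mathbf{S} \mathbf{F}). \tag{9}$$

Note that $\mathbf{S}$ can be written as $\mathbf{S} = \mathbf{Z}\mathbf{\Lambda}^{-1}\mathbf{Z}^{T} = \mathbf{A}\mathbf{A}^{T}$ with $\mathbf{A} = \mathbf{Z}\mathbf{\Lambda}^{-1/2} \in \mathbb{R}^{N \times K}$. The Singular Value Decomposition (SVD) of $\mathbf{A}$ can be formulated as

$$\mathbf{A} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^{T}, \tag{10}$$

where $\mathbf{U} \in \mathbb{R}^{N \times N}$, $\mathbf{\Sigma} \in \mathbb{R}^{N \times K}$, and $\mathbf{V} \in \mathbb{R}^{K \times K}$ are the left singular vector matrix, singular value matrix, and right singular vector matrix, respectively. Furthermore, $\mathbf{U}$ and $\mathbf{V}$ satisfy both $\mathbf{U}^{T}\mathbf{U} = \mathbf{I}$ and $\mathbf{V}^{T}\mathbf{V} = \mathbf{I}$. Thus $\mathbf{S}$ can be derived from (10) as

$$\mathbf{S} = \mathbf{A}\mathbf{A}^{T} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^{T}\mathbf{V}\mathbf{\Sigma}^{T}\mathbf{U}^{T} = \mathbf{U}\mathbf{\Sigma}\mathbf{\Sigma}^{T}\mathbf{U}^{T}. \tag{11}$$

It is obvious that the column vectors of $\mathbf{U}$ are the eigenvectors of the similarity matrix $\mathbf{S}$. To reduce the computational complexity, we prefer to conduct SVD on $\mathbf{A}$ to acquire the desired $\mathbf{F}$ rather than to directly perform eigenvalue decomposition on $\mathbf{S}$. Based on this, (10) is written as

$$\mathbf{A}\mathbf{V} = \mathbf{U}\mathbf{\Sigma}. \tag{12}$$

Since $\mathbf{\Sigma}$ and $\mathbf{V}$ are the singular value matrix and right singular vector matrix of $\mathbf{A}$, respectively, we can perform eigen decomposition on a small matrix $\mathbf{R} = \mathbf{A}^{T}\mathbf{A} \in \mathbb{R}^{K \times K}$. Denote by $\mathbf{V} \in \mathbb{R}^{K \times K}$ a column-orthonormal matrix containing the $K$ eigenvectors and by $\mathbf{\Theta} = \mathrm{diag}(\theta_1, \ldots, \theta_K) \in \mathbb{R}^{K \times K}$ a diagonal matrix storing the $K$ eigenvalues on the main diagonal. It is obvious that

$$\mathbf{R} = \mathbf{A}^{T}\mathbf{A} = \mathbf{V}\mathbf{\Sigma}^{T}\mathbf{\Sigma}\mathbf{V}^{T} = \mathbf{V}\mathbf{\Theta}\mathbf{V}^{T}, \tag{13}$$

where $\mathbf{\Sigma}^{T}\mathbf{\Sigma}$ returns a $K \times K$ diagonal matrix storing all the eigenvalues of $\mathbf{R} = \mathbf{A}^{T}\mathbf{A}$. Thus, the singular value matrix $\mathbf{\Sigma}$ can be derived as $\mathbf{\Sigma} = \mathbf{\Theta}^{1/2}$. Then, taking the $K$ leading columns of $\mathbf{U}$, the final solution can be simplified as

$$\mathbf{F} = \mathbf{A}\mathbf{V}\mathbf{\Theta}^{-1/2} = \mathbf{Z}\mathbf{\Lambda}^{-1/2}\mathbf{V}\mathbf{\Theta}^{-1/2}, \tag{14}$$

where $\mathbf{F} \in \mathbb{R}^{N \times K}$ is the indicator matrix. After that, we can perform semi-supervised K-Means on $\mathbf{F}$ to acquire the final results. The whole procedure of SMVC_WAGE for complete multi-view data is summarized in Algorithm 1.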
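The shortcut of replacing the N × N eigen decomposition with a K × K one can be checked numerically. The sketch below is an illustration under stated assumptions: the function name `anchor_spectral_embedding` is hypothetical, `Z` is assumed to have rows summing to one and full column rank, and the clipping of tiny eigenvalues is a numerical safeguard added here, not part of the derivation.

```python
import numpy as np

def anchor_spectral_embedding(Z, n_components):
    """Leading eigenvectors of S = A A^T obtained from the small matrix A^T A."""
    Z = np.asarray(Z, dtype=float)
    lam = Z.sum(axis=0)                  # diagonal of Lambda = diag(Z^T 1)
    A = Z / np.sqrt(lam)                 # A = Z Lambda^{-1/2}, shape (N, K)
    R = A.T @ A                          # small K x K symmetric matrix
    theta, V = np.linalg.eigh(R)         # eigenvalues in ascending order
    order = np.argsort(theta)[::-1][:n_components]
    theta, V = theta[order], V[:, order]
    theta = np.maximum(theta, 1e-12)     # safeguard against division by ~0
    F = (A @ V) / np.sqrt(theta)         # F = A V Theta^{-1/2}
    return F, theta
```

The columns of `F` are orthonormal and satisfy `S @ F == F * theta` up to rounding, so they are eigenvectors of the anchor graph without the N × N matrix ever being decomposed. For a row-normalized `Z`, the largest eigenvalue equals one.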

The Proposed Method for Incomplete Multi-View Data.
Our proposed method (SMVC_WAGE) not only deals well with complete multi-view clustering but can also be easily extended to incomplete multi-view clustering. To simplify the incomplete multi-view case, we take three views as an example, which illustrates that SMVC_WAGE can be straightforwardly extended to the scenarios of incomplete multi-view data. Similar to the problem definition in Section 3.1, we still assume that the incomplete three-view data consist of $N$ samples. To keep the discussion simple without losing generality, we follow [2] and rearrange the original dataset as $\mathbf{X} = \{\mathbf{X}^{(1,2,3)}, \mathbf{X}^{(1,2)}, \mathbf{X}^{(1,3)}, \mathbf{X}^{(2,3)}, \mathbf{X}^{(1)}, \mathbf{X}^{(2)}, \mathbf{X}^{(3)}\}$, where $\mathbf{X}^{(1,2,3)} \in \mathbb{R}^{n_c \times (d_1 + d_2 + d_3)}$, $\mathbf{X}^{(1,2)} \in \mathbb{R}^{n_{12} \times (d_1 + d_2)}$, $\mathbf{X}^{(1,3)} \in \mathbb{R}^{n_{13} \times (d_1 + d_3)}$, $\mathbf{X}^{(2,3)} \in \mathbb{R}^{n_{23} \times (d_2 + d_3)}$, $\mathbf{X}^{(1)} \in \mathbb{R}^{n_1 \times d_1}$, $\mathbf{X}^{(2)} \in \mathbb{R}^{n_2 \times d_2}$, and $\mathbf{X}^{(3)} \in \mathbb{R}^{n_3 \times d_3}$ denote the samples present in the three views, both view-1 and view-2, both view-1 and view-3, both view-2 and view-3, only view-1, only view-2, and only view-3, respectively. Here, $n_c$ is the number of samples described by all three views, and $n_{12}$ denotes the number of samples shared only by view-1 and view-2; $n_{13}$ and $n_{23}$ have analogous meanings. $n_v$ ($v = 1, 2, 3$) stands for the number of samples existing only in the $v$-th view. The total number of samples is $N = n_c + n_{12} + n_{13} + n_{23} + n_1 + n_2 + n_3$.
As stated in Section 3.3, the proposed method (SMVC_WAGE) for incomplete multi-view data mainly consists of two steps, i.e., construction of the anchor-based global fused similarity matrix $\mathbf{S}$ and spectral analysis of the global fused similarity matrix. Figure 1 shows the whole construction process of the global fused similarity matrix, in which all possible cases of incomplete three-view data are considered, i.e., missing two views, missing one view, and missing no view.
It is challenging to generate anchor points from arbitrarily chosen labeled data in incomplete multi-view data, as some labeled samples miss one or two views, and thus pairwise information may be unavailable. Fortunately, the common samples appearing in all views can help generate anchor points to solve this dilemma. Based on the above analysis, we assume that the labeled samples covering each cluster are all included in the common samples; then, we can obtain the seed set $\mathbf{X}_S = \{\mathbf{X}_S^{(1,2,3)}, \mathbf{y}\}$ from the labeled common samples, where $\mathbf{X}_S^{(1,2,3)} \in \mathbb{R}^{q \times (d_1 + d_2 + d_3)}$ denotes the seed set present in all views, $\mathbf{y} = [y_1; y_2; \ldots; y_q] \in \mathbb{R}^{q}$ denotes the corresponding label vector, and $q$ represents the number of samples in the seed set.
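Then, as stated in Section 3.3.1, the generated anchor point set in the $v$-th view can be represented as $\mathbf{U}^{(v)} = [\mathbf{u}_1^{(v)}; \mathbf{u}_2^{(v)}; \ldots; \mathbf{u}_{n_k}^{(v)}] \in \mathbb{R}^{n_k \times d_v}$, where each $\mathbf{u}_l^{(v)}$ can be obtained according to (2).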
As illustrated in the second column in Figure 1, we partition this incomplete three-view case into three scenarios. Specifically, we rearrange the samples according to the characteristics of each sample so that we can directly perform the anchor-based truncated similarity matrix construction method described in Section 3.3.1 on each scenario. Each scenario can be represented as a view and the view's anchor points, where missing samples are removed. Taking the first scenario as an example, there are $n_c + n_{12} + n_{13} + n_1$ samples that appear in view-1, and $n_k$ anchor points are generated from the seed set $\mathbf{X}_S^{(1,2,3)}$. Then, we construct an anchor-based truncated similarity matrix $\mathbf{Z}^{(1)} \in \mathbb{R}^{(n_c + n_{12} + n_{13} + n_1) \times n_k}$ by (4). The other scenarios can be analyzed similarly.
To fuse the truncated similarity matrices of the three scenarios appropriately, we reorder them into aligned matrices, with rows and columns following the order of the original samples. To fully exploit the consistent and complementary information among different views, we give the view with higher quality a larger weight in the common representation by employing the prior knowledge from multi-view data. More specifically, we first obtain the clustering accuracy $a_v$ of each view in the seed set $\mathbf{X}_S = \{\mathbf{X}_S^{(1,2,3)}, \mathbf{y}\}$ and apply the softmax function to acquire the weights $\mathbf{w}$ of the different views as mentioned in Section 3.3.1. Then, we obtain the global truncated similarity matrix $\mathbf{Z} \in \mathbb{R}^{N \times n_k}$ according to the weighted combining scheme by (6). Finally, we acquire the global fused similarity matrix $\mathbf{S} \in \mathbb{R}^{N \times N}$ by (7), as Figure 1 shows. According to Section 3.3.2, as a final step, spectral clustering is performed on the global fused similarity matrix $\mathbf{S}$ to acquire a unified clustering result. The whole procedure of SMVC_WAGE for incomplete three-view data is summarized in Algorithm 2.
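The alignment and fusion of view-wise matrices with missing samples can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the function name `fuse_incomplete` and its arguments are hypothetical, and renormalizing each row by the total weight of the views in which the sample is present is an assumption made here so that every row of the fused matrix still sums to one.

```python
import numpy as np

def fuse_incomplete(Z_views, present_idx, weights, n_samples):
    """Align per-view matrices to the global sample order and fuse them.

    Z_views[v]     : (n_v, n_anchors) similarities of samples present in view v
    present_idx[v] : global indices of those n_v samples (no duplicates)
    weights[v]     : weight of view v
    """
    n_anchors = np.asarray(Z_views[0]).shape[1]
    Z = np.zeros((n_samples, n_anchors))
    mass = np.zeros(n_samples)            # total weight of available views
    for Zv, idx, w in zip(Z_views, present_idx, weights):
        idx = np.asarray(idx)
        Z[idx] += w * np.asarray(Zv, dtype=float)  # missing rows stay zero
        mass[idx] += w
    if np.any(mass == 0):
        raise ValueError("every sample must be present in at least one view")
    # Assumed renormalization: divide by the weight mass actually observed.
    return Z / mass[:, None]
```

For a sample observed in a single view, the fused row reduces to that view's row, while a sample observed in several views receives their weighted average.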

Theoretical Analysis of the Proposed Algorithm.
In this section, we provide a brief theoretical analysis of the proposed algorithm, containing computational complexity analysis and convergence analysis.

Computational Complexity Analysis.
The computational complexity of the proposed algorithm mainly consists of five parts, i.e., calculating $U^{(v)}$, $Z^{(v)}$, $w_v$, $F$, and the final clustering results. In Algorithm 1, the corresponding computation is in steps 3, 4, 6, 9, and 10, where the number of anchor points equals the number of clusters, denoted by $K$, in each view. Specifically, the computational complexity of these steps is summarized as follows:
(3) The number of clusters $K$; the trade-off parameter $\lambda$.
Procedure
(1) Initialize the trade-off parameter $\lambda = 10$ and the width parameter $\sigma = 1$ in the Gaussian kernel function.
(2) Generate the seed set $X_S = \{X_S^{(1)}, \ldots\}$.
(3) ... (2).
(4) Construct the truncated similarity matrix $Z^{(v)} \in \mathbb{R}^{N \times K}$ for each view by (4).
(5) Calculate the clustering accuracy $a_v$ of each view by (1) and (17) in the seed set $X_S$.
(6) Acquire the weight $w_v$ for different views by (5).
Note that the number of views $V \ll N$, the number of clusters or anchor points $K \ll d$ and $K \ll N$, and the number of labeled samples $q$ depends on the number of samples $N$ and the percentage of labeled data $\xi$. Since we exploit semi-supervised K-Means to obtain the clustering result, $t_v$ and $t$ are usually small [38].
Compared with Algorithm 1, the main difference of Algorithm 2 is to deal with incomplete multi-view data.
Therefore, similarly to Algorithm 1, the total computational complexity of Algorithm 2 is given by (16), where $N_v$ denotes the number of non-missing samples in the $v$-th view. According to the above analysis, and to further simplify the representation, the overall computational complexity of SMVC_WAGE is $O(Nd + qd + N)$, where $d = \max(d_1, d_2, \ldots, d_V)$. In addition, the running-time experiments also support the computational advantages of SMVC_WAGE.

Convergence Analysis.
Firstly, the only iterative component of SMVC_WAGE is semi-supervised K-Means, which is used to calculate the clustering result, and the strong convergence property of semi-supervised K-Means has been proven in [38,39]. Secondly, by solving (13) via eigendecomposition, the indicator matrix $F$ attains the globally optimal solution [9].
Thirdly, the experimental results of the convergence study also demonstrate the strong convergence of SMVC_WAGE. In summary, the proposed method has a good convergence property.
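To illustrate the second point, the following minimal sketch shows why no iteration is needed for the indicator matrix: under the usual spectral relaxation, it is read directly off the leading eigenvectors of the normalized similarity matrix. Equation (13) is not reproduced in this section, so the sketch assumes it has the standard form $\max_F \mathrm{Tr}(F^{\top} D^{-1/2} S D^{-1/2} F)$ subject to $F^{\top} F = I$; the function name `spectral_embedding` is hypothetical.

```python
import numpy as np

def spectral_embedding(S, n_clusters):
    """Return the relaxed indicator matrix F (N x n_clusters) as the
    eigenvectors of the normalized similarity with the largest eigenvalues."""
    S = (S + S.T) / 2.0                       # enforce exact symmetry
    deg = S.sum(axis=1)
    deg[deg == 0] = 1.0                       # guard against isolated samples
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    S_norm = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(S_norm)  # eigenvalues in ascending order
    return eigvecs[:, -n_clusters:]            # columns are orthonormal
```

A clustering step such as K-Means would then be run on the rows of the returned matrix to obtain discrete labels.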

Experiments
In this section, extensive experiments are performed to evaluate the performance of our method (SMVC_WAGE). Firstly, we describe the six multi-view datasets used in the experiments. Secondly, we introduce the comparative methods and evaluation metrics. Finally, the comparison results show the proposed method's effectiveness and efficiency.

Input:
(1) Given the incomplete three-view data $X = \{X^{(1)}, X^{(2)}, \ldots\}$.
(2) Given $q$ labeled samples appearing in all views and covering $K$ clusters; the corresponding label vector $y \in \mathbb{R}^q$.
(3) The number of clusters $K$; the trade-off parameter $\lambda$.
(3) Generate the seed set $X_S = \{X_S^{(1,2,3)}, y\}$, which contains $q$ labeled samples appearing in three views, where ... (1) and (17) in the seed set $X_S$.
(9) Acquire the weight $w$ for different views by (5).
ALGORITHM 2: The proposed SMVC_WAGE for incomplete three-view data.

Datasets Description.
Six real-world multi-view datasets are adopted to validate our method. Among these datasets, the first two are text datasets, and the other four are image datasets. They are widely used benchmark datasets. The descriptions of these datasets are given below, and some important statistical information is presented in Table 1.
(1) Cornell (http://lig-membres.imag.fr/grimal/data.html): this text dataset is one of the popular WebKB datasets [3,26]. It includes 195 documents over five labels: student, project, course, staff, and faculty. Each document is characterized by two views, the citation view and the content view, with 195 citation features and 1703 content features.
(2) 3Sources (http://erdos.ucd.ie/datasets/3sources.html): this text dataset is naturally an incomplete multi-view dataset [2] and is collected from three well-known online news sources: BBC, Reuters, and The Guardian. In total, it contains 948 news articles covering 416 distinct news stories, which are categorized into six topical labels: business, entertainment, health, politics, sport, and technology. Among these distinct stories, 53 appear in a single news source, 194 are in two sources, and 169 are reported in all three sources.

Compared Methods and Experimental Settings.
Our proposed method addresses both complete and incomplete multi-view clustering. Thus, to demonstrate the efficiency and effectiveness of this framework, we choose Spectral Clustering [44] and three multi-view methods for comparison on complete multi-view clustering: MVSC [6], AMGL [29], and MLAN [45]. Similarly, we compare against Spectral Clustering [44], PVC [21], IMG [22], DAIMC [24], IMSC_AGL [26], and APMC [2] on incomplete multi-view clustering. We denote the proposed method as SMVC_WAGE. The descriptions of these methods are given as follows:
(1) SC: we perform Spectral Clustering (SC) [44] on all views independently as the baseline.
(3) MVSC [6] constructs a bipartite graph and then uses local manifold fusion to integrate the graph of each view into a fused graph. Finally, Spectral Clustering is performed on the fused graph to obtain the result.
(4) AMGL: Auto-weighted Multiple Graph Learning (AMGL) [29] is a Spectral Clustering-based method and is easily extended to semi-supervised multi-view clustering. It automatically learns a set of optimal weights without any parameters.
(5) MLAN: Multi-view Learning with Adaptive Neighbors (MLAN) [45] is a graph-based multi-view learning model that calculates the ideal weights automatically after a finite number of iterations. It can perform local manifold structure learning and semi-supervised clustering simultaneously.
(6) PVC: Partial multi-View Clustering (PVC) [21] is based on non-negative matrix factorization and learns a consistent representation. Lastly, K-Means is performed on the consistent representation to acquire the result.
For the comparison methods, the source code is available from the authors' websites. Since the 3Sources dataset is a naturally incomplete multi-view dataset, we utilize it for incomplete multi-view clustering and conduct complete multi-view clustering on the other datasets.
We select the best two views from the 3Sources dataset as the input of PVC and IMG, because they cannot handle more than two views. Since SC cannot directly deal with incomplete multi-view data, we first fill in the missing entries with the mean of the feature values in the corresponding view. Empirically, the number of nearest neighbors is set to 10% of the dataset size. Since all the comparison methods conduct K-Means clustering on the latent consistent representation, we set the maximum number of iterations to 1000 for K-Means clustering. Considering the limitations of the comparison methods, we first learn a latent consistent representation of the raw data and then use the labeled data to generate seed clusters, which are utilized to initialize the cluster centroids of semi-supervised K-Means. Furthermore, to make the experiments more conclusive and fair, the parameters of each method are initialized according to the settings reported in the corresponding paper, and we present the final result of SMVC_WAGE with the trade-off parameter $\lambda = 10$ and the width parameter $\sigma = 1$ in the Gaussian kernel function. For semi-supervised clustering, on all datasets, we randomly choose a small proportion of the samples in each category as labeled data, where the proportion is denoted by $\xi$ (10%, 20%, 30%, 40%). To account for randomness, we run each method 20 times with different random initializations and record the mean performance as well as the standard deviation in all experiments. Due to different parameter ranges and preprocessing, some of the results may be inconsistent with the published ones.
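The semi-supervised protocol described above (drawing a proportion $\xi$ of labeled samples per category, then using them as seed clusters to initialize K-Means) can be sketched as follows. This is a minimal illustration under assumptions: it uses a seeded variant in which labels only initialize the centroids and are not enforced afterwards, which may differ from the exact semi-supervised K-Means of [38]. The function names `sample_labeled` and `seeded_kmeans` are hypothetical.

```python
import numpy as np

def sample_labeled(y, xi, rng):
    """Pick a proportion xi of the samples of every class as labeled seeds
    (at least one per class) and return their sorted indices."""
    idx = []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        n_pick = max(1, int(round(xi * len(members))))
        idx.extend(rng.choice(members, size=n_pick, replace=False))
    return np.array(sorted(idx))

def seeded_kmeans(X, seed_idx, seed_labels, max_iter=1000):
    """K-Means whose centroids are initialized from the labeled seed clusters."""
    classes = np.unique(seed_labels)
    centroids = np.stack([X[seed_idx[seed_labels == c]].mean(axis=0)
                          for c in classes])
    assign = None
    for _ in range(max_iter):
        # Squared Euclidean distance from every sample to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break                                  # assignments are stable
        assign = new_assign
        for k in range(len(classes)):
            if np.any(assign == k):                # keep centroid if cluster is empty
                centroids[k] = X[assign == k].mean(axis=0)
    return classes[assign]
```

Because the centroids start from labeled seeds, the returned cluster labels are already expressed in the ground-truth label space. Repeating the procedure with different random generators and averaging the scores mirrors the 20-run protocol.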

Evaluation Metrics.
There are many evaluation metrics for assessing the clustering performance [46]. In our experiments, we choose three evaluation metrics, namely, Clustering Accuracy (ACC), Normalized Mutual Information (NMI), and Purity, to conduct a comprehensive evaluation.
These evaluation metrics are calculated from the clustering result and the ground truth of the dataset. The first evaluation metric is ACC, usually defined as follows:
$$\mathrm{ACC} = \frac{\sum_{i=1}^{n} \delta\left(y_i, \mathrm{map}(c_i)\right)}{n}, \quad (17)$$
where $n$ is the number of samples, $y_i$ is the ground-truth label of the $i$-th sample, $c_i$ is the corresponding computed cluster label, $\delta$ is the Dirac delta function,
$$\delta(a, b) = \begin{cases} 1, & a = b, \\ 0, & \text{otherwise}, \end{cases}$$
and $\mathrm{map}(\cdot)$ is the optimal mapping function that arranges the cluster labels to match the ground-truth labels via the Kuhn-Munkres algorithm [47].
The second evaluation metric is NMI, which integrates mutual information and entropy. NMI is formulated in terms of $I(y_i, c_i)$, the mutual information between $y_i$ and $c_i$, and $E(\cdot)$, which returns the entropy. Let $n_{c_i}$ be the number of samples in cluster $C_i$ $(1 \le i \le k)$ obtained by the clustering method, and $n_{y_j}$ be the number of samples belonging to cluster $Y_j$ $(1 \le j \le k)$ with the ground-truth label. Then, NMI can be rewritten in terms of these counts and $n_{i,j}$, the number of samples in the intersection between $C_i$ and $Y_j$.
The third evaluation metric is Purity, which measures the effectiveness of clustering by calculating the percentage of correct labels. Purity is defined by
$$\mathrm{Purity} = \frac{1}{n} \sum_{i=1}^{k} \max_{1 \le j \le k} n_{i,j}.$$
For the three evaluation metrics, a higher value indicates a better performance. Readers can refer to [48] for more details about their definitions. In the result tables, bold numbers denote the best results.
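As a rough illustration of the three metrics, the following standard-library sketch computes them from two label sequences. Two choices here are assumptions rather than the paper's specification: the optimal label mapping is found by brute force over permutations, which is practical only for a small number of clusters (the Kuhn-Munkres algorithm, available for example as `scipy.optimize.linear_sum_assignment`, gives the same optimum in polynomial time), and NMI is normalized by the geometric mean of the two entropies, which is one common convention. The function names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import log, sqrt

def clustering_accuracy(y_true, y_pred):
    """ACC under the best one-to-one mapping from clusters to labels."""
    counts = Counter(zip(y_pred, y_true))
    pred_labels = sorted(set(y_pred))
    true_labels = sorted(set(y_true))
    # Pad with placeholders so that every predicted cluster can be mapped.
    targets = true_labels + [None] * max(0, len(pred_labels) - len(true_labels))
    best = max(
        sum(counts[(p, t)] for p, t in zip(pred_labels, perm))
        for perm in permutations(targets, len(pred_labels))
    )
    return best / len(y_true)

def nmi(y_true, y_pred):
    """Mutual information divided by sqrt(E(y) * E(c)) (assumed normalization)."""
    n = len(y_true)
    joint = Counter(zip(y_pred, y_true))
    n_c, n_y = Counter(y_pred), Counter(y_true)
    mutual = sum(n_ij / n * log(n * n_ij / (n_c[c] * n_y[t]))
                 for (c, t), n_ij in joint.items())
    h_c = -sum(v / n * log(v / n) for v in n_c.values())
    h_y = -sum(v / n * log(v / n) for v in n_y.values())
    if h_c == 0 or h_y == 0:
        return 0.0          # degenerate case: a single cluster or a single class
    return mutual / sqrt(h_c * h_y)

def purity(y_true, y_pred):
    """Fraction of samples that belong to the majority class of their cluster."""
    joint = Counter(zip(y_pred, y_true))
    best_per_cluster = Counter()
    for (c, _), n_ij in joint.items():
        best_per_cluster[c] = max(best_per_cluster[c], n_ij)
    return sum(best_per_cluster.values()) / len(y_true)
```

All three functions are invariant to how the cluster labels are named, which is why a mapping step is needed for ACC but not for NMI or Purity.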

Complete Multi-View Clustering Results.
To explore the effectiveness of our method, the complete multi-view methods are run on five complete multi-view datasets with different percentages of labeled data, and the experimental results are reported in Tables 2-6 in terms of ACC, NMI, and Purity. From these tables, we make the following observations:
(1) From Tables 2-6, we can see that the clustering performance differs considerably across the single-view clustering scenarios for all multi-view datasets. This is mainly because the views differ in feature scale and distribution. The experimental results also imply that it is necessary to study how to appropriately combine multiple views to enhance the clustering performance.
(2) From Tables 2-6 and Figures 2(a)-2(e), we can find that the proposed SMVC_WAGE obtains much better results than the best single view and concat in all scenarios. Meanwhile, concat performs the worst in most instances, mainly because directly concatenating views into one long view may introduce redundant information, resulting in poor clustering results. Thus, these experimental results demonstrate that clustering performance can be effectively improved by properly exploiting the consistent and complementary information to learn a common representation.
(3) From Tables 2-6, we can see that the proposed SMVC_WAGE outperforms all competitors, namely MVSC, AMGL, and MLAN, in most of the complete multi-view clustering settings. This is mainly because SMVC_WAGE not only fully exploits the intrinsic consistency and extrinsic complementary information across different views, but also gives a high-quality single view a larger weight in the common representation by utilizing the prior information in the multi-view data. These experimental results show that our method is effective in complete multi-view clustering.
(4) From Tables 2-6, we observe that the performance of the above methods first rises to a high value and then varies only slightly as the amount of labeled data increases. With 30% or 40% labeled data, the proposed SMVC_WAGE always obtains the best result. Meanwhile, with 10% or 20% labeled data, our method obtains slightly worse results. The main reason is that our method depends heavily on constructing the graph from prior information. Thus, the graph structure cannot be generated optimally when there is less labeled data, leading to slightly worse results.

Incomplete Multi-View Clustering Results.
To explore the effectiveness of the presented SMVC_WAGE in dealing with incomplete multi-view data, we conduct experiments on the naturally incomplete 3Sources dataset, where the missing rates of the three views are 16%, 28%, and 30%, respectively. The results are recorded in Table 7 and Figure 2(f). Similar to the complete multi-view clustering, the comparison results show that the performance of the proposed SMVC_WAGE is significantly superior to all the compared methods on the 3Sources dataset for the different percentages of labeled data. Thus, our method can deal with incomplete multi-view clustering well. The above experimental results on Cornell, UCI Handwritten Digit, ORL, NUS-WIDE-OBJECT, MSRC-v1, and 3Sources show that the presented SMVC_WAGE outperforms most algorithms in terms of clustering ability. The main reason is that SMVC_WAGE ...
Figure 3: The average running time (seconds) of the methods mentioned above on each dataset. To ensure visual discernibility, the maximum running time displayed on the coordinate axis is 5 seconds; the specific values can be seen in Table 8. (a) The five complete datasets: Cornell, Digit, ORL, NUS, and MSRC-v1; (b) the incomplete dataset: 3Sources.

Running Time.
The running time was recorded to compare the computational cost of the methods on all datasets. From Table 8, it is clear that the proposed SMVC_WAGE has the shortest running time on all datasets except MSRC-v1. Meanwhile, as shown in Figure 3, which plots the data from Table 8, the running time of SMVC_WAGE is many times shorter than that of the above-mentioned multi-view clustering algorithms on all datasets, especially on the UCI Handwritten Digit, ORL, NUS-WIDE-OBJECT, and 3Sources datasets. This is mainly because these datasets have a relatively large number of views and samples, and the data quality of each view varies greatly. In summary, the experimental results support the computational advantages of SMVC_WAGE.

Parameter Sensitivity Analysis.
Our proposed SMVC_WAGE has only one hyperparameter $\lambda$, which trades off the weight of each view. In the following, parameter analysis experiments are performed on each dataset to reveal the effect of this parameter. We first set the percentage of labeled data $\xi$ from 10% to 40% as mentioned before; then, we explore the ACC of SMVC_WAGE by varying $\lambda$ within $\{0.01, 0.1, 1, 10, 100\}$ and record the average performance. As shown in Figures 4(a) and 4(d), we observe that, with $\lambda$ increasing from 0.01 to 100, the mean ACC for fixed $\xi$ first increases to a high value and then decreases. Regarding Figures 4(b)-4(f), similarly, we observe that the result of SMVC_WAGE increases first and then varies only slightly. Therefore, SMVC_WAGE obtains stable, strong performance across a wide range of $\lambda$, and the performance is best at $\lambda = 10$. These experiments demonstrate that the final results of SMVC_WAGE are not very sensitive to variation of the hyperparameter $\lambda$.

Convergence Study.
To investigate the convergence empirically, we record the ACC of SMVC_WAGE in every iteration on each dataset, where we set the percentage of labeled data $\xi = 10\%$ and the hyperparameter $\lambda = 10$. In a full run, SMVC_WAGE first calculates the clustering accuracy for all views by performing semi-supervised K-Means in order to obtain the global fused similarity matrix. In this process, the final ACC is not calculated, but some time is consumed. Without loss of generality, we count each run of semi-supervised K-Means as one iteration and record the final ACC. We plot the ACC in Figure 5. In each subfigure, the ACC is zero during the first several iterations because our algorithm is still using the prior information of the data; after a finite number of iterations, the ACC begins to increase and gradually stabilizes. Moreover, SMVC_WAGE usually converges within 50 iterations on all datasets, which empirically demonstrates the high efficiency of our algorithm.

Conclusion
In this paper, a new semi-supervised multi-view clustering framework is developed, which is conceptually simple and efficiently generates high-quality clustering results in practice. Specifically, our method introduces a simple and effective anchor strategy that exploits the intrinsic and extrinsic view information to bridge all samples and capture more reliable nonlinear relations, which can greatly enhance efficiency and improve stability. Besides, this strategy also resolves the dilemma that samples sharing no common views cannot be directly used for computing cross-view similarities. Meanwhile, instead of regularizing or weighting the loss of each view in a conventional way, the proposed method constructs a global fused graph that spans multiple views compatibly via a parameter-free graph fusion mechanism, which directly coalesces the view-wise graphs. As a result, the proposed method not only deals with complete multi-view clustering well but can also be easily extended to the incomplete multi-view case. Experimental results on six widely used real-world datasets clearly show that our proposed algorithm is superior to some state-of-the-art competitors in clustering ability and time cost.
When handling incomplete multi-view clustering, we found that the main limitation of this approach may be that anchor points can only be generated from common samples appearing in all views, which remains to be further studied.

Data Availability
Six publicly available benchmark multi-view datasets are utilized: the Cornell dataset, 3Sources dataset, UCI Handwritten Digit dataset, ORL dataset, NUS-WIDE-OBJECT dataset, and MSRC-v1 dataset. All the multi-view datasets' homepages are listed in this paper.

Conflicts of Interest
The authors declare that they have no conflicts of interest.