Robust Semisupervised Nonnegative Local Coordinate Factorization for Data Representation

Obtaining an optimum data representation is a challenging issue that arises in many intellectual data processing techniques such as data mining, pattern recognition, and gene clustering. Many existing methods formulate this problem as a nonnegative matrix factorization (NMF) approximation problem. The standard NMF uses the least square loss function, which is not robust to outlier points and noises and fails to utilize prior label information to enhance the discriminability of representations. In this study, we develop a novel matrix factorization method called robust semisupervised nonnegative local coordinate factorization by integrating robust NMF, a robust local coordinate constraint, and local spline regression into a unified framework. We use the l2,1 norm for the loss function of the NMF and a local coordinate constraint term to make our method insensitive to outlier points and noises. In addition, we exploit the local and global consistencies of sample labels to guarantee that data representation is compact and discriminative. An efficient multiplicative updating algorithm is deduced to solve the novel loss function, followed by a strict proof of the convergence. Several experiments conducted in this study on face and gene datasets clearly indicate that the proposed method is more effective and robust compared to the state-of-the-art methods.


Introduction
Owing to the rapid development of data collection and storage techniques, there has been an increase in the demand for effective data representation approaches [1] to cope with image and gene information, particularly in the fields of pattern recognition, machine learning, and gene clustering. For large databases, an efficient representation of data [2][3][4] can improve the performance of numerous intelligent learning systems such as those used for classification and clustering analysis. In many application fields, the input samples are represented in high-dimensional form, which is infeasible for direct calculation. The efficiency and effectiveness of learning models exponentially decrease with each increase in the dimensionality of input samples, which is generally referred to as the "curse of dimensionality." Accordingly, dimensionality reduction [5][6][7] is becoming increasingly important as it can overcome the curse of dimensionality, enhance the learning speed, and even offer critical insights into the essence of the issue. In general, dimensionality reduction methods can be divided into two categories: feature extraction [5,8,9] and feature selection [10][11][12][13][14]. Feature selection involves selecting discriminative and highly related features from an input feature set, whereas feature extraction combines original features to form new features of data variables.
In recent years, there has been an increasing interest in feature extraction. Many feature extraction methods are designed to obtain a low-dimensional feature of high-dimensional data. These methods include singular value decomposition (SVD), principal component analysis (PCA) [5], nonnegative matrix factorization (NMF) [15,16], and concept factorization (CF) [17]. Despite the different motivations of these models, they can all be interpreted as matrix decomposition, which often finds two or more low-dimensional matrices to approximate the original matrix. Factorization leads to a reduced representation of high-dimensional data and belongs to the category of methods employed for dimension reduction.
Unlike PCA [5] and SVD, NMF [15,16] factorizes a sample matrix as a product of two matrices constrained by nonnegative elements. One matrix comprises new basis vectors that reveal the semantic structure, and the other matrix can be regarded as the set of coefficients composed of linear combinations of all sample points based on the new bases. Owing to its ability to extract the most discriminative features and its computational feasibility, many extensions [4,18,19] of NMF have been developed from various perspectives to enhance the original NMF. Sparseness-constrained NMF [20] has been introduced by adding $l_1$ norm minimization on the learned factor matrices to enhance sparsity for data representation. Fisher's criterion [21] has been incorporated into the NMF formulation and is used to achieve discriminant representation. The semi- and convex-NMF formulations [22] relax the nonnegativity constraint of NMF by allowing the basis and coefficient matrices to have mixed signs, thereby extending the applicability of the method. Liu et al. [23] proposed a constrained NMF in which the label information is incorporated into the standard NMF for data representation. Cai et al. [24] extended NMF and proposed a graph-regularized NMF (GNMF) scheme, which imposes the intrinsic geometry latent in a high-dimensional dataset onto the traditional NMF using an affinity graph. Chen et al. [9] presented a nonnegative local coordinate factorization (NLCF) method that imposes a locality constraint onto the original NMF to explore faithful intrinsic geometry.
Traditional NMF and its variants usually adopt the square Euclidean distance to measure the approximation error. Although it has a solid theoretical foundation in mathematics and has shown encouraging performance in most cases, the square Euclidean distance is not always optimal for decomposition of a data matrix. The squared error has proved to be the best for both Gaussian and Poisson noise [25]. However, in real-world applications, data that violate the assumptions are usually involved. The squared loss is sensitive to outlier points and noises when the reconstruction error is measured. Even a single outlier point may sometimes easily dominate the objective function. In recent years, some variants have been presented to enhance the robustness of the classical NMF. A robust type of NMF that factorizes the sample matrix as the summation of two nonnegative matrices and one sparse error matrix was presented by Zhang et al. [26]. Zhang et al. [27] presented a robust NMF (RNMF) using the $l_{2,1}$ norm objective function, which can deal with outlier points and noises. Zhang et al. [28] presented a robust nonnegative graph-embedding framework (RNGE) that can simultaneously cope with noisy labels, noisy data, and uneven distribution.
Supervised learning algorithms [29][30][31][32] generally can achieve better performance than unsupervised learning techniques when label information is available in many applications. The motivation of semisupervised learning methods [33][34][35][36][37][38] is to employ numerous unlabeled samples as well as relatively few labeled samples to construct a better high-dimensional data analysis model. A surge of research interest in graph-based semisupervised learning techniques [37][38][39][40] has recently occurred. Gaussian fields and harmonic functions (GFHF) [33] is an efficient and effective semisupervised learning method in which the predicted label matrix is estimated on the graph with respect to manifold smoothness and label fitness. Xiang et al. [37] presented a method called local spline regression (LSR) in which an iterative algorithm is built on local neighborhoods through spline regression. Han et al. [38] presented a model of video semantic recognition using semisupervised feature selection via spline regression (S2FS2R). These methods not only consider label information but also employ the local and global structure consistency assumption.
Despite NMF's appealing advantages, it suffers from the following problems in real-world applications:
(1) Data may often be contaminated by noise and outliers due to illumination (e.g., specular reflections), image noises (e.g., scanned image data), and occlusion (e.g., sunglasses and scarf in front of a face), among others. Although NMF can deal with noise in the test data to some extent, it suffers from severe performance degradation when the training samples are noisy.
(2) In an NMF method, a data point may be represented by base vectors that are far from the data point, resulting in poor clustering performance. The standard NMF does not preserve locality during its decomposition process, whereas local coordinate coding can preserve such properties.
(3) One of the challenges for classification tasks in the real world is the lack of labeled training data. Therefore, data labeled by an expert is often used as an alternative. Unfortunately, designating labels requires considerable human effort and is thus time-consuming and difficult to manage. In addition, an accurate label may require expert knowledge. However, unlabeled samples are relatively easy to obtain.
To address all the aforementioned issues, we present an efficient and effective matrix factorization framework called robust semisupervised nonnegative local coordinate factorization (RSNLCF) in which both the data reconstruction function and a local coordinate constraint regularization term are formulated in an $l_{2,1}$ norm manner to make our model robust to outlier points and noises. By integrating Green's functions and a set of primitive polynomials into the local spline, the local and global label consistency of data can be characterized based on their distribution. The main work of our study and its contributions are summarized as follows:
(i) The proposed RSNLCF model is robust to outlier points and noises as a result of employing the $l_{2,1}$ norm formulations of NMF and a local coordinate constraint regularization term. In addition, to guarantee that the data representation is discriminative, local spline regression over labels is exploited.
(ii) Unlike traditional dimension reduction approaches that treat feature extraction and selection separately, the proposed RSNLCF algorithm integrates the two aspects into a single optimization framework.
(iii) We present an efficient algorithm to solve the presented RSNLCF model and provide a rigorous convergence proof and correctness analysis of our model.

Complexity
The remainder of this paper is organized as follows. Related studies are introduced in Section 2. We introduce our RSNLCF method and the optimization scheme in Section 3 and offer a convergence proof in Section 4. We describe and analyze the results of our experiments in Section 5. We conclude and discuss future work in Section 6.

Related Work
In this section, we summarize the notations and definitions of norm used in this study and briefly review NMF.
2.1. Notations and Definitions. Matrices and vectors are denoted by boldface capital and lowercase letters, respectively. $\|x\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$ denotes the $l_p$ norm of the vector $x \in \mathbb{R}^n$. $x^i$ and $x_j$ denote the ith row and the jth column of matrix $X = [x_{ij}]$, respectively. $x_{ij}$ is the element in the ith row and jth column of X, $\mathrm{Tr}(X)$ denotes the trace of X if X is a square matrix, and $X^T$ denotes the transposed matrix of X. The Frobenius norm of the matrix $X \in \mathbb{R}^{M \times N}$ is defined as

$$\|X\|_F = \sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} x_{ij}^2} = \sqrt{\mathrm{Tr}(X^T X)}.$$

The $l_{2,1}$ norm of a matrix is defined as

$$\|X\|_{2,1} = \sum_{i=1}^{M} \sqrt{\sum_{j=1}^{N} x_{ij}^2} = \sum_{i=1}^{M} \|x^i\|_2 = 2\,\mathrm{Tr}(X^T D X),$$

where D is a diagonal matrix with $D_{ii} = 1/(2\|x^i\|_2)$. However, $\|x^i\|_2$ could approach zero. For this case, we define $D_{ii} = 1/(2\|x^i\|_2 + \varepsilon)$, where $\varepsilon$ is a very small constant.

Assume that the data samples are represented as $X = [x_1, \ldots, x_L, x_{L+1}, \ldots, x_N]$, where $\{x_i\}_{i=1}^{L}$ and $\{x_j\}_{j=L+1}^{N}$ denote labeled and unlabeled data, respectively. The labels of $\{x_i\}_{i=1}^{L}$ are denoted as $l_i \in \{1, 2, \ldots, L_c\}$, with $L_c$ being the total number of categories. Let $F \in \mathbb{R}^{L \times L_c}$ be a binary label indicator matrix with the (i, j)th entry $f_{ij} = 1$ if and only if $x_i$ is labeled with the jth class; $f_{ij} = 0$ otherwise. We also introduce a predicted label matrix $Y \in \mathbb{R}^{N \times L_c}$, where each row is the predicted label vector of the data point $x_i$.

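Given a nonnegative data matrix $X \in \mathbb{R}^{M \times N}_{+}$, each column of X is a sample point. The main idea of NMF is to find two nonnegative matrices $U = [u_{ik}] \in \mathbb{R}^{M \times K}_{+}$ and $V = [v_{kj}] \in \mathbb{R}^{K \times N}_{+}$ that minimize the Euclidean distance between X and UV. The corresponding optimization problem is as follows:

$$\min_{U, V} \|X - UV\|_F^2 \quad \text{s.t. } U \ge 0,\ V \ge 0,$$

where $\|\cdot\|_F$ is the Frobenius norm. To solve the objective function, Lee and Seung [15] proposed an iterative multiplicative updating algorithm as follows:

$$u_{ik} \leftarrow u_{ik} \frac{(X V^T)_{ik}}{(U V V^T)_{ik}}, \qquad v_{kj} \leftarrow v_{kj} \frac{(U^T X)_{kj}}{(U^T U V)_{kj}}.$$

By NMF, each column $u_k$ of U can be viewed as a basis vector, while the matrix V can be treated as the set of coefficients. Each sample point $x_i$ is approximated by a linear combination of the K bases, weighted by the components of the ith column of V.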
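As an illustration of the standard NMF reviewed above, the following is a minimal NumPy sketch of the multiplicative updates of Lee and Seung [15]. The function name `nmf_multiplicative`, the random initialization, the iteration count, and the small constant `eps` that guards against division by zero are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def nmf_multiplicative(X, K, n_iter=200, eps=1e-10, seed=0):
    """Approximate a nonnegative X (M x N) by U (M x K) times V (K x N)."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    U = rng.random((M, K)) + eps   # nonnegative random initialization (assumption)
    V = rng.random((K, N)) + eps
    errors = []
    for _ in range(n_iter):
        # u_ik <- u_ik * (X V^T)_ik / (U V V^T)_ik
        U *= (X @ V.T) / (U @ V @ V.T + eps)
        # v_kj <- v_kj * (U^T X)_kj / (U^T U V)_kj
        V *= (U.T @ X) / (U.T @ U @ V + eps)
        errors.append(np.linalg.norm(X - U @ V, "fro") ** 2)
    return U, V, errors

# Synthetic nonnegative data with an exact rank-4 factorization.
rng = np.random.default_rng(1)
X = rng.random((20, 4)) @ rng.random((4, 30))
U, V, errors = nmf_multiplicative(X, K=4)
```

Because every factor in the update ratios is nonnegative, U and V stay nonnegative without any explicit projection step, and the recorded squared Frobenius error should not increase from one iteration to the next.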

The Proposed RSNLCF Framework
In this section, we introduce our novel learning method for image clustering (RSNLCF), which is used to find an effective and robust representation of data.
3.1. Robust Sparse NMF. The square loss function based on the Frobenius norm is used to learn the data representations in NMF. However, it is very sensitive to outlier points and noises. Therefore, our robust representation model is formulated as

$$\min_{U \ge 0,\ V \ge 0} \|X - UV\|_{2,1} + \lambda \|V\|_{2,1},$$

where $\lambda > 0$ is the regularization parameter. Because the $l_{2,1}$ norm does not square the residuals, it reduces the contribution of large-magnitude errors to the loss function, so that corrupted samples do not dominate the objective function. In this sense, the loss function $\|X - UV\|_{2,1}$ is insensitive to outlier points and noises. Meanwhile, the regularization term $\|V\|_{2,1}$ ensures that V is sparse in rows. This means that some of V's rows approximate zero. Consequently, V can be considered the combination coefficient for the most discriminative features. Feature selection is then achieved by V, where only the features related to the nonzero rows in V are chosen.
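The robustness argument above can be checked numerically. The sketch below assumes the row-wise convention for the $l_{2,1}$ norm (the sum of the $l_2$ norms of the rows) and the diagonal reweighting matrix D from the notation section; the helper names `l21_norm` and `reweighting_matrix` and the toy residual matrix are hypothetical and serve only as an illustration.

```python
import numpy as np

def l21_norm(E):
    """Sum of the l2 norms of the rows of E (row-wise convention, assumed)."""
    return float(np.sum(np.linalg.norm(E, axis=1)))

def reweighting_matrix(E, eps=1e-12):
    """Diagonal D with D_ii = 1 / (2 * ||e^i||_2 + eps)."""
    return np.diag(1.0 / (2.0 * np.linalg.norm(E, axis=1) + eps))

# Toy residual matrix: nine rows of norm 1 and one outlier row of norm 10.
E = np.zeros((10, 4))
E[:9, 0] = 1.0
E[9, 0] = 10.0

frob_sq = np.linalg.norm(E, "fro") ** 2   # 9 * 1 + 100 = 109
l21 = l21_norm(E)                         # 9 * 1 + 10  = 19

# Share of each loss that is caused by the single outlier row.
outlier_share_frob = 10.0 ** 2 / frob_sq  # about 0.92
outlier_share_l21 = 10.0 / l21            # about 0.53

# The trace form 2 * Tr(E^T D E) reproduces the l2,1 norm up to eps.
D = reweighting_matrix(E)
trace_form = 2.0 * np.trace(E.T @ D @ E)
```

In this toy case the outlier accounts for roughly 92% of the squared Frobenius loss but only about 53% of the $l_{2,1}$ loss, which is the sense in which the latter is less sensitive to corrupted entries.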

Robust Local Coordinate Constraint.
Motivated by the concept of local coordinate coding [41], we present a robust local coordinate constraint as a regularization term for image clustering. First, we define coordinate coding.
Definition 1. Coordinate coding [41] can be written as a pair $(\gamma, C)$, where C is a set of anchor points with d dimensions and $\gamma$ is a map that assigns each point $x \in \mathbb{R}^d$ its coordinates with respect to the anchor points in C.

For the local coordinate coding system, NMF can be considered as coordinate coding in which the columns of the matrix U can be viewed as a set of anchor points, and each column of the coefficient matrix V represents the corresponding coordinate coding for each data point. We might further hope that each sample point is represented as a linear combination of only a few proximate anchor points. A natural assumption here would be that if $x_i$ is far away from the anchor point $u_k$, then its coordinate coding $v_{ki}$ with respect to $u_k$ will tend to be zero, thus achieving sparsity and locality simultaneously. The local coordinate constraint [41] can be defined as follows:

$$\sum_{i=1}^{N} \sum_{k=1}^{K} |v_{ki}| \, \|u_k - x_i\|_2^2,$$

where $x_i$ denotes the ith column of X, $u_k$ is the kth column of U, and $v_{ki}$ is the coordinate of $x_i$ with respect to $u_k$. The local coordinate constraint employs a square loss. When the dataset is corrupted by outlier points and noises, the local coordinate constraint may fail to achieve sparsity and locality simultaneously. In order to alleviate the side effect of noisy data, our robust local coordinate constraint is obtained by substituting the $l_{2,1}$ norm for the Frobenius norm-based square loss function.
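To make the locality argument concrete, the sketch below evaluates a square-loss local coordinate penalty, assuming it takes the commonly used form $\sum_i \sum_k |v_{ki}| \, \|u_k - x_i\|_2^2$. The function name `local_coordinate_penalty` and the two-anchor toy data are hypothetical; the robust $l_{2,1}$ variant proposed in this paper is not implemented here.

```python
import numpy as np

def local_coordinate_penalty(X, U, V):
    """Sum over i, k of |v_ki| * ||u_k - x_i||^2 for X (M x N), U (M x K), V (K x N)."""
    # sq_dist[k, i] holds the squared Euclidean distance between u_k and x_i.
    sq_dist = ((U[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    return float(np.sum(np.abs(V) * sq_dist))

# Two anchors u_1 = (0, 0) and u_2 = (10, 10), stored as columns of U.
U = np.array([[0.0, 10.0],
              [0.0, 10.0]])
# Two samples: x_1 = (1, 1) lies near u_1, x_2 = (9, 9) lies near u_2.
X = np.array([[1.0, 9.0],
              [1.0, 9.0]])

V_local = np.array([[1.0, 0.0],      # each sample coded by its nearest anchor
                    [0.0, 1.0]])
V_nonlocal = np.array([[0.0, 1.0],   # each sample coded by the distant anchor
                       [1.0, 0.0]])

penalty_local = local_coordinate_penalty(X, U, V_local)        # 2 + 2 = 4
penalty_nonlocal = local_coordinate_penalty(X, U, V_nonlocal)  # 162 + 162 = 324
```

The coding that relies on distant anchors is penalized far more heavily, so minimizing the constraint pushes $v_{ki}$ toward zero whenever $u_k$ is far from $x_i$.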

Local Spline Regression.
In this subsection, we briefly introduce local spline regression [42].
Given N data points $x_1, x_2, \ldots, x_N$ sampled from the underlying submanifold $\mathcal{M}$, we use the set $\mathcal{N}(x_i) = \{x_{i_j}\}_{j=1}^{k}$ to denote $x_i$ and its $k - 1$ nearest neighbor points, where $i_j \in \{1, 2, \ldots, N\}$, and $Y_i = [y_{i_1}, y_{i_2}, \ldots, y_{i_k}]^T$ is the local predicted label matrix for the ith region. The task of local spline regression is to seek the predicting function $g_i: \mathbb{R}^M \to \mathbb{R}$ in order to map each data point $x_{i_j} \in \mathbb{R}^M$ to the local predicted class label $y_{i_j} = g_i(x_{i_j})$. The model of local spline regression can be expressed as

$$J(g_i) = \sum_{j=1}^{k} \left(y_{i_j} - g_i(x_{i_j})\right)^2 + \gamma S(g_i),$$

where $S(g_i)$ is a regularization term and $\gamma > 0$ is a small positive regularization parameter to control the smoothness of the spline [42]. If $S(g_i)$ is defined as a seminorm of a Sobolev space, $g_i$ can be solved by the following objective function [43]:

$$g_i(x) = \sum_{j=1}^{d} \beta_{i,j}\, p_j(x) + \sum_{j=1}^{k} \alpha_{i,j}\, G_{i,j}(x),$$

where $d = C^{s}_{M+s-1}$, in which s is the order of the partial derivatives [43]. $\{p_j(x)\}_{j=1}^{d}$ and $G_{i,j}$ are a set of primitive polynomials and a Green's function, respectively. The coefficients $\alpha_i$ and $\beta_i$ can be obtained by solving the linear system (10), in which $K_i$ is a symmetric matrix with elements $K_{r,c} = G_{r,c}(\|x_{i_r} - x_{i_c}\|)$, and $P_i$ is a matrix with elements $P_{i,j} = p_i(x_{i_j})$.
The local spline regression model can then be expressed as [42]

$$\min_{Y_i} \mathrm{Tr}(Y_i^T M_i Y_i),$$

where $M_i$ is the upper left $k \times k$ submatrix of the inverse matrix of the coefficient matrix in (10). Because the local predicted label matrix $Y_i$ is a part of the global predicted label matrix Y, we can construct a selection matrix $S_i \in \mathbb{R}^{k \times N}$ for each $Y_i$ such that

$$Y_i = S_i Y,$$

where the selection matrix $S_i$ is defined as follows:

$$(S_i)_{r,c} = \begin{cases} 1, & \text{if } c = i_r, \\ 0, & \text{otherwise.} \end{cases}$$

After the local predicted label matrices are established, we combine them by minimizing the following loss function:

$$\sum_{i=1}^{N} \mathrm{Tr}(Y_i^T M_i Y_i) = \mathrm{Tr}(Y^T M Y),$$

where $M = \sum_{i=1}^{N} S_i^T M_i S_i$. Based on the studies of [33,34], the predicted label matrix Y of the labeled data points should be consistent with the ground truth label matrix F. With the consistency constraints, the objective function (14) can be written as follows:

$$\min_{Y} \mathrm{Tr}(Y^T M Y) + \eta\, \mathrm{Tr}\left((Y - F)^T E (Y - F)\right),$$

where E is a diagonal matrix whose diagonal elements are 1 for labeled data and 0 for unlabeled data, and the elements of F are defined as $f_{ij} = 1$ if $x_i$ is labeled with the jth class and $f_{ij} = 0$ otherwise. When $\eta$ is sufficiently large, the optimal solution Y to the problem (16) makes the second term approximately equal to zero. Thus, the objective function (16) guarantees local and global structural consistency over labels. All the elements of Y are restricted to be nonnegative.

The objective function (18) integrates robust NMF, the robust local coordinate constraint, and local spline regression into a unified framework, where $\tau$ and $\mu$ are two trade-off parameters. We call (18) our proposed RSNLCF.
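The role of the selection matrices can be illustrated with a short NumPy sketch. It assumes the usual construction in which row r of $S_i$ contains a single 1 in column $i_r$, so that $Y_i = S_i Y$ picks out the rows of Y belonging to the ith neighborhood, and it assembles $M = \sum_i S_i^T M_i S_i$. The neighborhoods and the local matrices $M_i$ below are random placeholders (assumptions); in the method they would come from the k-nearest-neighbor search and the spline system, respectively.

```python
import numpy as np

def selection_matrix(neighbor_idx, N):
    """Return S (k x N) with S[r, neighbor_idx[r]] = 1 and zeros elsewhere."""
    S = np.zeros((len(neighbor_idx), N))
    S[np.arange(len(neighbor_idx)), neighbor_idx] = 1.0
    return S

rng = np.random.default_rng(0)
N, k, Lc = 6, 3, 2
Y = rng.random((N, Lc))                       # global predicted label matrix
# Hypothetical neighborhoods: each point together with the next two indices.
neighborhoods = [[i, (i + 1) % N, (i + 2) % N] for i in range(N)]

M = np.zeros((N, N))
local_sum = 0.0
for idx in neighborhoods:
    S_i = selection_matrix(idx, N)
    A = rng.random((k, k))
    M_i = A @ A.T                             # placeholder symmetric PSD matrix
    Y_i = S_i @ Y                             # equals Y[idx]
    local_sum += np.trace(Y_i.T @ M_i @ Y_i)  # local loss of the ith region
    M += S_i.T @ M_i @ S_i                    # accumulate the global matrix

global_value = np.trace(Y.T @ M @ Y)          # matches local_sum
```

The final comparison reflects the identity $\sum_i \mathrm{Tr}(Y_i^T M_i Y_i) = \mathrm{Tr}(Y^T M Y)$, which is what allows the local regressions to be combined into a single global regularizer on Y.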

Optimization
The objective function (18) involves the $l_{2,1}$ norm, which is nonsmooth and does not have a closed-form solution. Consequently, we propose to solve it as follows.
When considering the nonnegative constraint on U, V, and Y, the objective function (18) can be reformulated as (19), in which A, B, and C are three diagonal matrices. The objective function (19) is not convex in U, V, and Y together. Therefore, it is unrealistic to expect an algorithm to find the global minimum. In this subsection, we develop an iterative algorithm based on the Lagrangian multiplier method, which can achieve a local minimum. Following some algebraic steps, the objective function can be rewritten as (20). To tackle the nonnegative constraint on U, V, and Y, the objective (20) can be rewritten as the Lagrangian function (21), where $\Psi = [\psi_{jk}]$, $\Phi = [\phi_{ki}]$, and $\Theta = [\theta_{is}]$ are the Lagrangian multipliers. Let the partial derivatives of the objective function (21) with respect to U, V, and Y be zero. In the resulting expressions, H denotes a diagonal matrix of row sums. Based on the Karush-Kuhn-Tucker conditions [44], we have

$$\psi_{jk} u_{jk} = 0, \qquad \phi_{ki} v_{ki} = 0, \qquad \theta_{is} y_{is} = 0.$$

The corresponding equivalent formulas are (24), (25), and (26). Solving (24), (25), and (26), we obtain the update rules (27), (28), and (29). In this manner, we obtain the solver for the objective function (19).

Convergence Analysis.
In this subsection, we demonstrate that the objective function (20) converges to a local optimum by using the update rules (27), (28), and (29) after finite iterations. We adopt the auxiliary function approach [16] to prove the convergence. Here, we first introduce the definition of an auxiliary function.

Definition 1. $Z(q, q')$ is an auxiliary function for $F(q)$ if the following properties are satisfied:

$$Z(q, q') \ge F(q), \qquad Z(q, q) = F(q). \tag{30}$$

Lemma 1. If Z is an auxiliary function for F, then F is nonincreasing under the update

$$q^{t+1} = \arg\min_{q} Z(q, q^{t}). \tag{31}$$

Proof. $$F(q^{t+1}) \le Z(q^{t+1}, q^{t}) \le Z(q^{t}, q^{t}) = F(q^{t}). \tag{32}$$

Lemma 2. For any nonnegative matrices $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{k \times k}$, $S \in \mathbb{R}^{n \times k}$, and $S' \in \mathbb{R}^{n \times k}$, where A and B are symmetric, the following inequality holds:

$$\sum_{i=1}^{n} \sum_{p=1}^{k} \frac{(A S' B)_{ip}\, S_{ip}^2}{S'_{ip}} \ge \mathrm{Tr}(S^T A S B). \tag{33}$$

The convergence of the algorithm is demonstrated in the following. For given X, optimizing the objective function (20) w.r.t. V is equivalent to minimizing a function $O(V)$, given in (34).

Theorem 1. The function $Z(V, V')$ given in (35) is an auxiliary function for $O(V)$.

Proof. That $Z(V, V) = O(V)$ is obvious. However, we need to prove that $Z(V, V') \ge O(V)$. To accomplish this, we compare (34) and (35) term by term. By applying Lemma 2, we obtain upper bounds on the trace terms of $O(V)$ that have the form $\mathrm{Tr}(S^T A S B)$. To obtain the upper bound for the third and fifth terms in $O(V)$, we use the inequality $a^2 + b^2 \ge 2ab$, which holds for any $a, b \ge 0$. To obtain lower bounds for the remaining terms, we adopt the inequality $z \ge 1 + \log z$, $\forall z > 0$. Summing all inequalities, we obtain $Z(V, V') \ge O(V)$.

Theorem 2. The updating rule (28) can be obtained by minimizing the auxiliary function $Z(V, V')$.

Proof. To find the minimum of $Z(V, V')$, we set the derivative $\partial Z(V, V') / \partial v_{ki} = 0$. Thus, by simple algebraic manipulation, we obtain the iterative updating rule for V as (28). Based on the properties of the auxiliary function, the objective function (20) is nonincreasing under the updating of $v_{ki}$.

The convergence proofs for the updates of $u_{jk}$ and $y_{is}$ using (27) and (29) are similar to the aforementioned.

Experiments and Discussion
We systematically evaluated the performance of our presented RSNLCF method and compared it to the popular clustering methods.
(iii) AR dataset: the AR dataset contains over 4000 frontal face images of 126 individuals (70 men and 56 women) with different facial expressions, illumination conditions, and occlusions (sunglasses and scarf). All individuals participated in two photo sessions, and 26 images of each individual were captured. Each image was scaled to 32 × 32.
(iv) Leukemia dataset: the leukemia dataset contains samples of acute myelogenous leukemia (AML) and acute lymphoblastic leukemia (ALL). ALL can be further classified into T and B subtypes. This dataset consists of 5000 genes in 38 tumor samples and contains 19 samples of B-cell ALL, eight samples of T-cell ALL, and 11 samples of AML.

Experimental Design.
In this section, we describe our evaluation metrics, the compared methods, and our parameter selection.

Evaluation Metrics.
In our experiments, two widely used metrics (i.e., accuracy (Acc) and normalized mutual information (NMI)) were adopted to evaluate the clustering results [45]. We evaluated the algorithms by comparing the cluster label of each data point with its label provided by the dataset. The Acc metric is defined as follows:

$$\mathrm{Acc} = \frac{\sum_{i=1}^{n} \delta\left(l_i, \mathrm{map}(r_i)\right)}{n},$$

where n refers to the total number of samples, $r_i$ denotes the cluster label of $x_i$, and $l_i$ is the true class label. In addition, $\delta(x, y)$ is the delta function that is equal to 1 if $x = y$ and 0 otherwise, and $\mathrm{map}(r_i)$ is the mapping function that maps the obtained label $r_i$ to the equivalent label from the dataset. The best mapping function can be determined by using the Kuhn-Munkres algorithm [46]. The value of Acc is equal to 1 if and only if the clustering result and the true label are identical. The second measure is the NMI, which is adopted in order to evaluate the quality of clusters. Given a clustering result, the NMI is defined as follows:

$$\mathrm{NMI} = \frac{\sum_{i} \sum_{j} n_{i,j} \log \dfrac{n\, n_{i,j}}{n_i\, n'_j}}{\sqrt{\left(\sum_{i} n_i \log \dfrac{n_i}{n}\right) \left(\sum_{j} n'_j \log \dfrac{n'_j}{n}\right)}},$$

where $n_i$ denotes the number of images contained in the ith cluster $C_i$ based on clustering results, $n'_j$ is the number of images belonging to the jth class $C'_j$, and $n_{i,j}$ is the number of images that are in the intersection of $C_i$ and $C'_j$.
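The two metrics can be sketched in plain Python as follows. This is an illustrative implementation, not the evaluation code used in the experiments: the best label mapping is found by brute force over permutations, which stands in for the Kuhn-Munkres algorithm [46] and is practical only for a small number of clusters, and the NMI is computed in the count-based form normalized by the geometric mean of the two entropies. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

def clustering_accuracy(true_labels, cluster_labels):
    """Acc under the best one-to-one mapping from cluster labels to class labels."""
    clusters = sorted(set(cluster_labels))
    classes = sorted(set(true_labels))
    # Pad with dummy targets if there are more clusters than classes.
    classes += [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[r] == l for r, l in zip(cluster_labels, true_labels))
        best = max(best, hits)
    return best / len(true_labels)

def nmi(true_labels, cluster_labels):
    """Normalized mutual information computed from cluster and class counts."""
    n = len(true_labels)
    n_i = Counter(cluster_labels)                    # cluster sizes
    n_j = Counter(true_labels)                       # class sizes
    n_ij = Counter(zip(cluster_labels, true_labels)) # intersection sizes
    mutual = sum(c * math.log(n * c / (n_i[i] * n_j[j]))
                 for (i, j), c in n_ij.items())
    h_i = sum(c * math.log(c / n) for c in n_i.values())
    h_j = sum(c * math.log(c / n) for c in n_j.values())
    denom = math.sqrt(h_i * h_j)
    return mutual / denom if denom > 0 else 0.0

truth = [0, 0, 1, 1, 2, 2]
relabeled = [2, 2, 0, 0, 1, 1]   # same partition, permuted cluster names
imperfect = [0, 0, 0, 1, 2, 2]   # one sample placed in the wrong cluster

acc_relabeled = clustering_accuracy(truth, relabeled)
acc_imperfect = clustering_accuracy(truth, imperfect)
nmi_relabeled = nmi(truth, relabeled)
nmi_imperfect = nmi(truth, imperfect)
```

Both metrics are invariant to how the clusters are named, which is why a permuted but otherwise identical partition still attains the maximum score.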

Parameter Selection.
Some parameters had to be tuned in the evaluated algorithms. To compare different algorithms fairly, we ran them using different parameters and chose the best average performance obtained for comparison. We set the number of clusters to be the same as the true number of categories on three image datasets and the leukemia dataset. Note that there was no parameter selection for RNMF and CNMF when the number of clusters was given. The regularization parameters were searched over the grid {0.001, 0.01, 0.1, 1, 10, 100, 1000} for semi-GNMF, URNGE, NLCF, and RSNLCF. The neighborhood size k to build the graph was chosen from {1, 2, …, 10}, and the 0-1 weighting scheme was adopted for its simplicity in the graph-based methods of semi-GNMF and URNGE. We applied the approach presented in the literature [16] to automatically adjust the value of λ for LCSNMF.

Face Clustering under Illumination Variations.
The robustness of the approaches to illumination changes was tested with the extended YaleB dataset. Figure 1(a) shows some samples from this dataset. We used only the frontal face images of the first 18 individuals. Our experiments were performed with various numbers of clusters. For the fixed cluster number k, the images of k categories from the extended YaleB dataset were randomly selected and mixed for evaluation. For the semisupervised methods semi-GNMF, CNMF, and URNGE, eight face images per individual were randomly chosen as labeled samples; the rest of the dataset was used as unlabeled samples. On the clustering set, the compared methods were used to achieve new data representations. For a fair comparison, we used k-means to cluster samples based on the new data representations. Because the results of k-means depend on initialization, we repeated the experiments 20 times with different initializations. The clustering results were measured by the commonly used evaluation metrics, Acc and NMI. Table 1 shows the detailed clustering results for different numbers of clusters. The final row shows the average clustering accuracy (NMI) over k. Compared with the second best method, our method (RSNLCF) achieves an 11.41% improvement in clustering accuracy. For mutual information, it achieved a 10.63% improvement over the second best algorithm.

Face Clustering under Pixel Corruptions.
Two experiments were designed to test the robustness of RSNLCF against random pixel corruptions on the ORL face dataset. For the semisupervised algorithms of semi-GNMF, CNMF, URNGE, and RSNLCF, three images per individual were randomly chosen as labeled samples, and the remaining images were used as unlabeled samples. In the first experiment, each image was corrupted by replacing pixel values with independent and identically distributed samples whose lower and upper bounds were the minimum and maximum pixel values of the image, respectively. The percentage of corrupted pixels in each image varied from 10 to 90% in increments of 10%. Figure 1(b) shows several examples. Because the corrupted pixels were randomly selected for each test sample, we repeated the experiments 20 times. Figure 2 displays the recognition accuracies over different levels of corruption. The recognition accuracies of the methods decreased rapidly as the level of corruption increased. From Figure 2, we can observe that the proposed method consistently outperformed the others. When the samples had a high percentage of pixel corruption, the methods failed to obtain improved recognition performance because of inadequate discriminative information.
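The corruption protocol of the first experiment can be sketched as below. The text specifies only that the replacement values are independent and identically distributed and bounded by the minimum and maximum pixel values of the image; the uniform distribution, the 32 × 32 image size, and the function name `corrupt_pixels` are assumptions made for this illustration.

```python
import numpy as np

def corrupt_pixels(image, fraction, rng):
    """Replace a random fraction of pixels with i.i.d. values in [min, max]."""
    corrupted = image.astype(float).copy()
    n_corrupt = int(round(fraction * image.size))
    # Choose distinct pixel positions (flat indices) to corrupt.
    idx = rng.choice(image.size, size=n_corrupt, replace=False)
    # Uniform noise bounded by the image's own pixel range (assumed distribution).
    corrupted.flat[idx] = rng.uniform(image.min(), image.max(), size=n_corrupt)
    return corrupted, idx

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32))   # stand-in for a face image
corrupted, idx = corrupt_pixels(image, fraction=0.3, rng=rng)
```

Sweeping `fraction` from 0.1 to 0.9 in steps of 0.1 and redrawing the corrupted positions for every run would mirror the repeated trials described above.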

In the second experiment, 40% of the pixels randomly selected from each sample were replaced by setting the pixel value to 255. The percentage of corrupted samples of each individual was gradually increased from 10 to 90%. We conducted the evaluations 20 times at different corruption percentages and computed the average Acc and NMI. Figure 3 illustrates the clustering Acc and NMI curves of RSNLCF and its six competitors versus the percentage of corrupted images. From Figure 3, which depicts the comparison results on the ORL dataset, we can clearly see that RSNLCF obtained the best recognition accuracy in all situations.
5.5. Face Clustering under Contiguous Occlusions. We validated the robustness of RSNLCF against partial block occlusions (see Figure 1(c) for examples). Two experiments were conducted on the ORL face dataset. For the semisupervised algorithms of semi-GNMF, CNMF, URNGE, and RSNLCF, we randomly selected three samples from each category and used their category number as the label information. The first experiment was performed with a fixed contiguous block occlusion size of 40 × 40 pixels. We chose a fraction r of the face samples of each individual for occlusion, with r varying from 10 to 90%. The position of the block was randomly selected. The evaluations were performed 20 times for each r, and the means of Acc and NMI were recorded. Figure 4 shows the means of clustering Acc and NMI of the compared methods for different percentages of corrupted images. As shown in Figure 4, the performances of NMF, RNMF, semi-GNMF, CNMF, URNGE, and NLCF were lower than that of RSNLCF. With an increasing number of occluded samples, the clustering accuracy of RSNLCF dropped, which matched expectations.
In the second experiment, we simulated various levels of contiguous occlusions in each image by using an unrelated image of size p × p with p ∈ {5, 10, 20, …, 80}. The evaluations were conducted 20 times at each occlusion level, and the average Acc and NMI curves were recorded. Figure 5 plots the clustering Acc and NMI results of the compared methods under different occlusion levels. Although the clustering accuracy of each method degraded with each increment in occlusion level, RSNLCF consistently exceeded the other methods. When the occlusion size increased to 50 × 50, the occluding part dominated the image and caused the clustering performance to diminish rapidly.
5.6. Face Clustering under Real Occlusions. We evaluated the robustness of RSNLCF against real malicious occlusions. The AR dataset adopted in this experiment contains 2600 frontal face images from 100 individuals (50 males and 50 females from two photo sessions). Figure 1(d) shows some face samples with real occlusions by sunglasses and scarf. Note that because RNMF, LCSNMF, and NLCF are unsupervised algorithms, we did not compare them here. In this experiment, we randomly selected r face images per individual as labeled samples, in which r was varied from four to 18 in increments of two. The remaining images were unlabeled samples. For each configuration, we conducted 20 test runs with each method. The mean and the standard deviation of clustering accuracy were recorded. Table 2 tabulates the detailed clustering results by Acc and NMI on the AR dataset and shows that our algorithm achieved 8.55, 12.82, and 14.53% Acc improvement over URNGE, CNMF, and semi-GNMF, respectively. For NMI, the result of RSNLCF was 7.06, 9.66, and 10.87% higher than those of URNGE, CNMF, and semi-GNMF, respectively.
5.7. Gene Data Clustering on the Leukemia Dataset. Finally, we assessed clustering performance on the leukemia dataset. The gene expression dataset was rather challenging in terms of clustering, because it contains numerous features but only a few samples. We filtered out genes with max/min < 15 and max − min < 500, leaving a total of 1999 genes. Note that because RNMF, LCSNMF, and NLCF are unsupervised algorithms, we did not compare them here. For each category of data, c = 2, 3, 4, 5, 6, 7, 8 samples were randomly chosen and labeled, with the remaining samples being unlabeled. As the samples were randomly selected, for each c, we repeated each experiment 20 times and calculated the average clustering accuracy. Figure 6 plots the clustering Acc and NMI results of the compared methods under different numbers of labeled samples. We can observe that our RSNLCF approach achieved the best clustering performance of all the compared approaches.
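The gene filtering step can be sketched as a simple variation filter. The version below follows the wording of the text literally, so a gene is removed only when both conditions hold (max/min < 15 and max − min < 500); the function name `variation_filter`, the genes-by-samples layout, and the requirement that expression values are strictly positive are assumptions made for this example.

```python
import numpy as np

def variation_filter(expr, ratio_thresh=15.0, diff_thresh=500.0):
    """Return a boolean mask of genes (rows) to keep.

    expr is a genes x samples array of strictly positive expression values.
    """
    gmax = expr.max(axis=1)
    gmin = expr.min(axis=1)
    # A gene is filtered out when it varies little by both criteria.
    low_variation = (gmax / gmin < ratio_thresh) & (gmax - gmin < diff_thresh)
    return ~low_variation

expr = np.array([
    [100.0, 120.0, 110.0],   # ratio 1.2, range 20    -> removed
    [10.0, 2000.0, 50.0],    # ratio 200, range 1990  -> kept
    [100.0, 700.0, 300.0],   # ratio 7,   range 600   -> kept (range >= 500)
])
keep = variation_filter(expr)
filtered = expr[keep]
```

If the intended rule were to remove a gene when either condition holds, the `&` would become `|`, and the third toy gene would be removed as well.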
5.8.Parameter Sensitivity.In our proposed method, several parameters were tuned beforehand.We observed that RSNLCF is insensitive to τ in the range of [10 −3 ,10 3 ].Accordingly, we fixed η to be 10 6 and τ to be 10 for both the        12 Complexity extended YaleB and leukemia datasets.To study the sensitivity of RSNLCF with respect to the remaining parameters (i.e., μ and λ), we varied these parameters.In the experiment, we plotted the Acc and NMI of RSNLCF with respect to μ and λ.  5.9.Convergence Analysis.In the previous section, we proved the convergence of our presented method.In our study, an experiment was performed to compare all algorithms' speed of convergence on the extended YaleB and leukemia datasets.The two parameters μ and λ were both fixed at 10.The time is measured using a computer with Intel Core ™ I7 2600 and 16 GB memory.Figure 9 demonstrated the objective function value versus computational time for different algorithms.
The horizontal and vertical axes represent the training time and the value of the objective function, respectively. We can observe from Figure 9 that the objective function value of all algorithms decreases steadily over time and that RSNLCF requires less time than the other graph-based methods, demonstrating that the proposed method is both effective and efficient.
5.10. Overall Observations and Discussion. We ran several groups of experiments on different databases: the extended YaleB database mainly involved illumination changes, the ORL database focused on pixel corruptions and block occlusions, the AR database included face images with different facial variations and with sunglasses and scarf occlusions, and the leukemia dataset contained a large number of features but only a few samples.
From the aforementioned experimental results, we gained the following insights:
(i) In most cases, the performance of CNMF was lower than that of the graph-based approaches, which demonstrates the superiority of intrinsic geometrical structure representation in discovering potential discriminative information.
(ii) On every dataset, our RSNLCF algorithm outperformed all six other methods. The reason is that RSNLCF exploits local and global consistencies over labels simultaneously to uncover an underlying subspace structure. In addition, RSNLCF proved robust to outlier points and noise as a result of employing the l2,1 norm formulation of NMF and the local coordinate constraint regularization term.
(iii) Future research on this topic will include how to use multicore processors [48,49] to accelerate our proposed method and how to extend the idea of semisupervised learning to the existing clustering algorithms.

Conclusion
In this study, we proposed a novel matrix decomposition method (RSNLCF) to learn an efficient representation for data in a semisupervised learning scenario. An efficient iterative algorithm for RSNLCF was also presented. The convergence of the presented method was theoretically proved. Extensive experiments over diverse datasets demonstrated that the presented method is quite effective and robust at learning an efficient data representation for clustering tasks. More importantly, experimental results revealed that our optimization algorithm quickly converges, indicating that our method can be utilized to solve practical problems.

Figure 1: Sample images. (a) Extended YaleB dataset, (b) ORL dataset with random pixel corruption, (c) ORL dataset with random block occlusions, and (d) AR dataset with contiguous occlusions by sunglasses and scarves.

Figure 2: Clustering Acc and NMI curves across percentages of corrupted pixels of each image for the compared methods on the ORL dataset.

Figure 3: Clustering Acc and NMI curves across percentages of corrupted images for the compared methods on the ORL dataset.

Figure 5: Clustering Acc and NMI curves of the compared methods under different occlusion levels with each image in the ORL dataset.

Figure 4: Clustering Acc and NMI curves of the compared methods on percentages of corrupted images with random block occlusions for the ORL dataset.

Figure 6: Clustering Acc and NMI curves of the compared methods under different numbers of labeled samples for the leukemia dataset.

Figure 7: Clustering accuracy of the proposed method with respect to the parameters μ and λ on the extended YaleB dataset.

Figures 7 and 8 clearly show the 3D results of RSNLCF. The horizontal axes are the parameters μ and λ, and the vertical axis represents the clustering accuracy of RSNLCF. In the 3D graphs, the square/circle marker indicates the best μ/λ for varying μ/λ. Next to each marker is a number giving the value of Acc or NMI. We can see from Figures 7 and 8 that the clustering performance varied with different combinations of μ and λ. However, how to choose the best parameters theoretically remains unknown. The regularization parameters should be associated with the characteristics of the dataset.
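Because no theoretical rule for choosing μ and λ is given, the best combination has to be read off an empirical grid such as the one plotted in Figures 7 and 8. The sketch below shows one way to do that, assuming the scores are stored in a dictionary keyed by (μ, λ). The helper names, the grid values, and the scores are hypothetical and are not results from the paper; picking the best μ for each fixed λ is only one possible reading of the markers in the figures.

```python
def best_mu_per_lambda(scores):
    """For each lambda, return the (mu, score) pair with the highest score.

    scores: dict mapping (mu, lam) -> clustering Acc (or NMI).
    """
    best = {}
    for (mu, lam), acc in scores.items():
        if lam not in best or acc > best[lam][1]:
            best[lam] = (mu, acc)
    return best


def best_overall(scores):
    """Return the ((mu, lam), score) entry with the highest score."""
    return max(scores.items(), key=lambda item: item[1])


# Made-up accuracies for illustration only; NOT results from the paper.
toy_scores = {
    (0.1, 0.1): 0.52, (0.1, 10): 0.58,
    (10, 0.1): 0.61, (10, 10): 0.66,
    (1000, 0.1): 0.57, (1000, 10): 0.60,
}
per_lambda = best_mu_per_lambda(toy_scores)
top = best_overall(toy_scores)
```

Swapping the roles of μ and λ in `best_mu_per_lambda` gives the best λ for each fixed μ.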

Figure 8: Clustering accuracy of the proposed method with respect to the parameters μ and λ on the leukemia dataset.
5.1. Datasets. Three standard face datasets and one gene dataset were selected to evaluate the different methods. The four datasets are described as follows:
(i) Extended YaleB dataset: the extended YaleB dataset contains 2414 frontal face images of 38 individuals. Each face image has a size of 192 × 168, and the images were acquired under 64 illumination conditions and nine poses. Each image was resized to 32 × 32 in our experiments.
(ii) ORL face dataset: the ORL dataset contains 400 images of 40 individuals. All images were captured at different times and with different variations including lighting, facial expressions (open and closed eyes, smiling, and not smiling), and specific facial details (glasses and no glasses). The original images had a size of 92 × 112. Each image was rescaled to 32 × 32.

Table 1: Clustering performance on the extended YaleB dataset.

Table 2: Clustering performance on the AR dataset.