A Rank-Constrained Matrix Representation for Hypergraph-Based Subspace Clustering

This paper presents a novel rank-constrained matrix representation combined with hypergraph spectral analysis to recover the original subspace structures of corrupted data. Real-world data are frequently corrupted by both sparse error and noise. Our matrix decomposition model separates the low-rank, sparse error, and noise components of the data in order to enhance robustness to corruption. To obtain the desired rank representation of the data within a dictionary, our model applies the rank constraint directly by restricting the upper bound of the rank range. An alternating projection algorithm is proposed to estimate the low-rank representation and separate the sparse error from the data matrix. To further capture the complex relationships between data distributed in multiple subspaces, we represent the data with a hypergraph, encapsulating multiple related samples into one hyperedge. The final clustering result is obtained by spectral decomposition of the hypergraph Laplacian matrix. Validation experiments on the Extended Yale Face Database B, AR, and Hopkins 155 datasets show that the proposed method is a promising tool for subspace clustering.


Introduction
High-dimensional data spaces are frequently encountered in computer vision and machine learning tasks. In most cases, the data points lie in multiple low-dimensional subspaces embedded in a high-dimensional ambient space, and their intrinsic dimension is often much smaller than the dimension of the ambient space [1,2]. When the subspace structure and the membership of the data points to the subspaces are unknown, it is necessary to cluster the data into multiple subspaces. Subspace clustering is therefore of use in computer vision (e.g., image segmentation, motion segmentation, and face clustering), machine learning, and image analysis [3,4].
Over the past twenty years, several subspace clustering methods [5] have been proposed. The existing methods can be roughly divided into four categories: factorization methods [6][7][8], algebraic methods [9,10], statistical methods [11][12][13], and sparse methods [14][15][16][17]. In matrix factorization-based algorithms, a similarity matrix is built by factorization of the data matrix, followed by spectral clustering of the similarity matrix. These methods assume that the subspaces are independent and the data are clean [6][7][8]. Thus, they cannot cope well with nonindependent subspace structures, and their performance degrades with noisy data. Algebraic methods utilize the structure of subspaces for clustering. Generalized principal component analysis (GPCA) [9,10] is the archetypal algebraic method. It does not assume that linear subspaces are independent or disjoint, but its complexity increases exponentially with the number of samples and the dimensions of the subspaces. In statistical methods, a distribution model is defined for the data drawn from the subspaces, and the model parameters are then estimated by statistical inference. Mixture of Probabilistic PCA (MPPCA) [11] utilizes a mixture of Gaussians to represent the a priori probability of the data. The Agglomerative Lossy Compression (ALC) algorithm [12] assumes that the data are drawn from a mixture of degenerate Gaussians. RANSAC [13] uses a greedy strategy for labeling data as inliers and outliers, before iteratively fitting the sampled points to the statistical model and updating the parameters based on the residual. Sparse-based methods utilize low-rank and sparse properties of the data for subspace clustering. Sparse Subspace Clustering (SSC) [14] represents a data point as a sparse combination of all other data points in the set by minimizing the $\ell_1$ norm of the coefficients. The low-rank representation (LRR) algorithm [15,16] aims to find a low-rank representation of the data matrix, minimizing the nuclear norm, that is, the sum of the singular values of the coefficient matrix. Different from sparse representation, LRR represents a data sample as a linear combination of the atoms in a dictionary and jointly constrains the low-rank property of all the coefficients of the sample set, so it captures the global structure of the data [15,17]. Owing to this advantage, LRR has recently attracted much attention.
The LRR algorithm uses a relaxed convex model to find the low-rank representation of the data, with exact decomposition of the data matrix into low-rank and sparse error components. However, when data are noisy, an exact low-rank and sparse matrix decomposition does not always exist for an arbitrary matrix [18,19]. Furthermore, the rank range of the LRR model cannot be directly controlled, and, in some cases, the range of the rank of the representation needs to be explicitly constrained. For example, in a face recognition problem, the images of the face of an individual under different lighting conditions can be simply characterized as a nine-dimensional linear subspace in the space of all possible images [20]. In a motion segmentation problem, if $k$ objects move independently and arbitrarily in 3D space, then the motion trajectories lie in $k$ independent affine subspaces of three dimensions [5]. Thus, these prior ranges can be used as the upper bound to construct an efficient rank-constrained representation when the face images or motion trajectories are corrupted by lighting variations or outliers.
In the LRR model, the low-rank representation is used to define an undirected weighted pairwise graph for spectral clustering. In fact, according to the analysis in [15,16], the large coefficients in the low-rank representation usually cluster in groups. This implies that group information among the data is useful for clustering, in addition to the pairwise relationship between two samples. However, the conventional pairwise graph, as in [15,16], fails to effectively describe the complex correlations that exist between samples [21,22].
To overcome the abovementioned limitations, we propose a new rank-constrained matrix representation model with a hypergraph structure. In contrast to previously described low-rank matrix representations, this method directly handles the nonconvex decomposition model to recover the clean data from observations simultaneously corrupted by sparse error and noise: it seeks the desired rank representation within a dictionary and separates the sparse error. The desired rank representation is obtained by explicitly restricting the upper bound of the rank range. An alternating projection algorithm is proposed to seek the desired rank representation within a dictionary and to separate the sparse error component from the corrupted data matrix. Bilateral random projections (BRPs) [18,23] are adopted to obtain a low-rank approximation of the matrix in the iteration procedure, which avoids the expensive computation of the SVD. Furthermore, with the aim of utilizing the complex high-order correlations between samples, a hypergraph is constructed by grouping highly related samples into one hyperedge [21,22]. The final clustering results are obtained by spectral decomposition of the hypergraph Laplacian matrix.
Different from [15,16], our model produces an approximate representation $Z$ of a matrix $X$ in the presence of both noise $G$ and sparse error $E$, with an upper bound constraint on $\operatorname{rank}(Z)$. In contrast, LRR [15,16] assumes $X = DZ + E$ and decomposes $X$ into $DZ$ and $E$ while minimizing $\operatorname{rank}(Z)$. Our model constrains the rank range of the coefficient matrix $Z$, which is valuable for subspace clustering problems such as face clustering and motion segmentation. Furthermore, we develop new ways to construct hyperedges and compute weights in the hypergraph to better describe the local group information of each vertex, which makes hypergraph clustering very different from standard Laplacian clustering. In summary, the main contributions of this research are as follows:
(1) A rank-constrained matrix representation model is proposed to obtain the desired rank representation of the data by a rank upper bound constraint. An alternating projection algorithm is proposed to solve this model, and bilateral random projections are used to compute the approximation of the designated rank.
(2) A hypergraph model is introduced to capture the complex and higher order relationships between data, in order to further improve the performance of subspace clustering.

Rank-Constrained Matrix Representation
Assume that a set of data vectors $X = [x_1, x_2, \ldots, x_n]$ in $\mathbb{R}^d$ is drawn from a union of $k$ subspaces $\{S_i\}_{i=1,\ldots,k}$ with unknown dimensions $d_i$. The objective is to cluster each sample into its underlying subspace. In real applications, the data are often simultaneously contaminated by both noise and error, and a fraction of the data vectors are grossly corrupted or even missing. Following the GoDec model [18], the observed data should be represented as
$$X = L + E + G, \tag{1}$$
where a rank-related constraint is used to obtain the low-rank approximation $L$ of $X$, $E$ is the sparse error, and $G$ is the noise. However, the GoDec model implicitly assumes that the underlying data structure is a single low-rank subspace.
Facing the subspace segmentation task, we need to extend the recovery of corrupted data from single subspace to multiple subspaces.
To better handle mixed data lying near multiple subspaces, a more general representation model should be adopted. Data $X$ can be represented by a linear combination of the atoms in a dictionary $D = [d_1, d_2, \ldots, d_m]$; that is, $X = DZ$. The dictionary should be learned to adapt to the properties of the data $X$, and it is necessary to select the optimal representation under the desired property:
$$X = DZ + E + G, \tag{2}$$
where $Z = [z_1, z_2, \ldots, z_n]$ is the coefficient matrix, $z_i$ is the representation of $x_i$ within the dictionary $D$, $G$ is the noise, and $E$ is the sparse error. Because of the explicit noise corruption, the exact low-rank and sparse decomposition of the LRR model [15,16] may not exist. At the same time, the rank range of the representation cannot be directly constrained in the LRR model. However, in some cases, the range of the rank of the representation needs to be explicitly controlled according to prior knowledge of the problem, so we look for a rank-range-constrained representation $Z$:
$$\min_{Z, E, G, D} \ \|G\|_F^2 + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad X = DZ + E + G, \ \operatorname{rank}(Z) \le r, \tag{3}$$
where $G \in \mathbb{R}^{d \times n}$ is the noise component, $E \in \mathbb{R}^{d \times n}$ is the sparse error component, $D \in \mathbb{R}^{d \times m}$ is the dictionary, and $Z \in \mathbb{R}^{m \times n}$ is the low-rank representation. $\operatorname{rank}(Z)$ is the rank of the matrix $Z$, and $r$ is the desired rank range. $D$ is learned adaptively to better represent $X$. $\|E\|_{2,1}$ is the $\ell_{2,1}$ norm, defined as the sum of the $\ell_2$ norms of the columns of $E$; $\lambda$ is the regularization parameter that balances the weight of the noise and sparse error components.
We use the matrix $E$ to separate sample-specific corruptions (and outliers), that is, the phenomenon that a fraction of the samples (i.e., columns of the matrix $X$) lie far away from the subspaces. Thus, the $\ell_{2,1}$ norm is used to encourage columns of $E$ to be zero and to isolate the error of specific samples. Note that the proper norm for the matrix $E$ should be chosen according to the corruption type. For element-wise sparse error, for example, $\|E\|_1$ is an advisable constraint to separate the error component.
In (3), the optimal solution $Z^*$ may not be block diagonal because of the degradation caused by sparse error and noise. However, it still serves as an affinity matrix, and spectral clustering algorithms are applied to $Z^*$ to obtain the final clustering results. For simplicity, we call model (3) the rank-constrained matrix representation (RMR) model. It is intrinsically different from the LRR model in the following respects:
(1) The RMR model produces an approximate low-rank representation $Z$ of a general matrix $X$ upon the dictionary $D$ in the presence of noise $G$, with an upper bound constraint on $\operatorname{rank}(Z)$. In contrast, LRR assumes that $X = DZ + E$ (where $E$ is the sparse error) and exactly decomposes $X$ into $DZ$ and $E$ while minimizing the nuclear norm $\|Z\|_*$. Furthermore, the dictionary $D$ is adaptively learned in our model, which helps to represent the data $X$.
(2) LRR minimizes a convex surrogate of the rank, that is, the nuclear norm of $Z$. Although convex relaxation simplifies the optimization procedure, the resulting solution only approximates that of the original rank problem. Our RMR model instead constrains the rank range of $Z$ and directly addresses the nonconvex model. The constraint $\operatorname{rank}(Z) \le r$ is used to obtain the desired representation, and prior knowledge of the problem can be used to set the rank range $r$.

Solving the RMR Model
The RMR model has four optimization variables: $Z$, $E$, $G$, and $D$. This section proposes an optimization algorithm to solve this multivariable model. First, we replace the variable $G$ with $X - DZ - E$ in accordance with the equality constraint (2), so that objective (3) can be rewritten as
$$\min_{Z, E, D} \ \|X - DZ - E\|_F^2 + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad \operatorname{rank}(Z) \le r. \tag{4}$$
Then, we propose an iterative algorithm to solve (4), that is, to estimate the low-rank term $Z$ and the sparse term $E$ from $X$. Alternating minimization over multiple variables provides a useful framework for deriving iterative optimization algorithms. With the dictionary fixed, $Z$ and $E$ are the two unknown matrix variables in (4), and (4) can be solved by alternately solving the following two subproblems until convergence. For the $k$th iteration,
$$Z^{k+1} = \arg\min_{\operatorname{rank}(Z) \le r} \|X - DZ - E^{k}\|_F^2, \qquad E^{k+1} = \arg\min_{E} \|X - DZ^{k+1} - E\|_F^2 + \lambda \|E\|_{2,1}. \tag{5}$$
The first subproblem concerns the low-rank matrix representation of $X$, and it can be reformulated as finding the matrix $Z$ with rank upper bound $r$ that minimizes the Euclidean distance $\|X - DZ - E^{k}\|_F$:
$$\min_{Z} \ \|X - DZ - E^{k}\|_F^2 + \delta_{C(r)}(Z), \tag{6}$$
where $\delta_{C(r)}(Z)$ is the indicator function of the set $C(r) = \{Z : \operatorname{rank}(Z) \le r\}$, defined as
$$\delta_{C(r)}(Z) = \begin{cases} 0, & Z \in C(r), \\ +\infty, & \text{otherwise}. \end{cases} \tag{7}$$
Here, we adopt the accelerated proximal gradient method [24] to solve (6). With $f(Z) = \|X - DZ - E^{k}\|_F^2$ and a step size $\tau > 0$, the iteration is
$$Z^{t+1/2} = Z^{t} - \tau \nabla f(Z^{t}) = Z^{t} + 2\tau D^{T}(X - DZ^{t} - E^{k}), \qquad Z^{t+1} = P_{C(r)}(Z^{t+1/2}). \tag{8}$$
In (8), the first formula is the gradient descent operation and the second formula is the projection operator $P_{C(r)}(Z^{t+1/2})$, which means finding the rank-$r$ approximation of $Z^{t+1/2}$. We adopt bilateral random projections (BRPs) to obtain the rank-$r$ approximation quickly, as in [18,23]. Given two random matrices, $A_1 \in \mathbb{R}^{q \times r}$ and $A_2 \in \mathbb{R}^{p \times r}$ ($r < q$), the rank-$r$ bilateral random projections of a matrix $M \in \mathbb{R}^{p \times q}$ ($p \ge q$) are computed as $Y_1 = M A_1$ and $Y_2 = M^{T} A_2$. Then, we can get
$$M_r = Y_1 (A_2^{T} Y_1)^{-1} Y_2^{T}.$$
Only the inverse of an $r \times r$ matrix and three matrix multiplications need to be calculated. $2pqr$ floating-point multiplications are required to obtain $Y_1$ and $Y_2$, and $r^2(2q + r) + pqr$ multiplications are needed to obtain $Z^{t+1}$. Thus, the computational cost is much less than that of the SVD-based approximation, whose complexity is $pq^2 + q^3$ [23]. Nesterov's accelerated strategy is also adopted to further improve the convergence speed. Please refer to [24] for more details of Nesterov's accelerated strategy.
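The bilateral random projection step described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `brp_low_rank`, the use of Gaussian random matrices, and the fixed seed are choices made only for this example.

```python
import numpy as np

def brp_low_rank(M, r, rng):
    """Rank-r approximation of M via bilateral random projections."""
    p, q = M.shape
    A1 = rng.standard_normal((q, r))   # right random projection matrix
    A2 = rng.standard_normal((p, r))   # left random projection matrix
    Y1 = M @ A1                        # p x r
    Y2 = M.T @ A2                      # q x r
    # Y1 (A2^T Y1)^{-1} Y2^T, using a linear solve instead of an explicit inverse
    return Y1 @ np.linalg.solve(A2.T @ Y1, Y2.T)

rng = np.random.default_rng(0)
# Hypothetical test matrix of exact rank 3
M = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))
M_r = brp_low_rank(M, 3, rng)
rel_err = np.linalg.norm(M - M_r) / np.linalg.norm(M)
```

When the input has exact rank `r`, the formula reproduces it up to rounding error, because the projections then span the full column and row spaces; for a higher-rank input the result is only an approximation whose rank is at most `r`.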
For the second subproblem, there is a closed-form solution. Let $Q = X - DZ^{k+1}$; then $E^{k+1}$ is obtained by column-wise soft-threshold shrinkage of $Q$, which is an operation of only linear complexity.
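Column-wise soft thresholding can be sketched as below. The function name `column_soft_threshold` and the threshold argument `tau` are assumptions for illustration; for an objective of the form $\|Q - E\|_F^2 + \lambda\|E\|_{2,1}$ the first-order optimality condition gives $\tau = \lambda/2$, but the exact constant used in the paper's algorithm is not recoverable from the text.

```python
import numpy as np

def column_soft_threshold(Q, tau):
    """Shrink the Euclidean norm of each column of Q by tau (zero if norm <= tau)."""
    norms = np.linalg.norm(Q, axis=0)
    # Guard against division by zero for all-zero columns
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return Q * scale                   # broadcasting rescales each column

Q = np.array([[3.0, 0.1],
              [4.0, 0.1]])
E = column_soft_threshold(Q, tau=1.0)
# First column (norm 5) is scaled by 0.8; second column (norm < 1) is set to zero.
```

Columns with small norm are removed entirely, which is what lets the $\ell_{2,1}$ penalty isolate sample-specific corruption in a few columns.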
The third subproblem corresponds to the dictionary update. In order to reduce the computational complexity, we adopt the gradient descent method to update $D$:
$$D^{k+1} = D^{k} - \eta \, \nabla_D \|X - DZ^{k+1} - E^{k+1}\|_F^2 \Big|_{D = D^{k}} = D^{k} + 2\eta \, (X - D^{k}Z^{k+1} - E^{k+1})(Z^{k+1})^{T},$$
where $\eta$ is the iteration step size. By the self-expressive property [15,16], each data point drawn from a union of subspaces can be effectively reconstructed by a linear combination of other data points lying in the same space. Namely, the sample set $X$ can itself be adopted as the dictionary to represent the samples. Thus, we initialize $D^0$ as the sample set $X$; that is, $D^0 = X$.
The three subproblems are solved repeatedly until the stopping criterion ($\mathrm{err}_1 < \varepsilon_1$ and $\mathrm{err}_2 < \varepsilon_2$) is met or the maximum number of iterations, $k_{\max}$, is reached. $\varepsilon_1$ and $\varepsilon_2$ are small tolerance constants. $\mathrm{err}_1$ measures the relative reconstruction error and $\mathrm{err}_2$ the relative variation of the variables $Z$ and $E$ between the $(k+1)$th and $k$th iterations, both evaluated in the Frobenius norm.
The complete optimization algorithm for RMR is summarized in Algorithm 1. Because the algorithm alternates between several variables, it is difficult to give a theoretical proof of the convergence of Algorithm 1. Nevertheless, we find that it converges asymptotically in our experiments. Figure 1 plots the relative reconstruction error $\mathrm{err}_1$ and the relative variation $\mathrm{err}_2$ of the variables $Z$ and $E$ (log scale) versus the iteration number in the face clustering experiment on the Extended Yale Face Database B. Both relative errors decay rapidly with the number of iterations, which indicates the convergence of our optimization algorithm.
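The alternating scheme can be sketched end to end as follows. This is a simplified illustration under several assumptions, not Algorithm 1 itself: it takes a single plain projected gradient step per outer iteration instead of the Nesterov-accelerated inner loop, uses a truncated SVD in place of bilateral random projections, sets the shrinkage threshold to $\lambda/2$, picks step sizes from spectral norms, and runs a fixed number of iterations instead of the stopping test. All function names are hypothetical.

```python
import numpy as np

def truncate_rank(Z, r):
    """Best rank-r approximation by truncated SVD (stand-in for BRP)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def shrink_columns(Q, tau):
    """Column-wise soft thresholding."""
    norms = np.linalg.norm(Q, axis=0)
    return Q * np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))

def objective(X, D, Z, E, lam):
    return np.linalg.norm(X - D @ Z - E) ** 2 + lam * np.linalg.norm(E, axis=0).sum()

def rmr_sketch(X, r, lam, n_iter=30):
    D = X.copy()                                   # D^0 = X
    Z = np.zeros((X.shape[1], X.shape[1]))
    E = np.zeros_like(X)
    history = [objective(X, D, Z, E, lam)]
    for _ in range(n_iter):
        # Z step: gradient step with step size 1/Lipschitz, then rank projection
        step_z = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2 + 1e-12)
        Z = truncate_rank(Z + 2.0 * step_z * D.T @ (X - D @ Z - E), r)
        # E step: closed-form column-wise shrinkage of the residual
        E = shrink_columns(X - D @ Z, lam / 2.0)
        # D step: one gradient descent step on the dictionary
        step_d = 1.0 / (2.0 * np.linalg.norm(Z, 2) ** 2 + 1e-12)
        D = D + 2.0 * step_d * (X - D @ Z - E) @ Z.T
        history.append(objective(X, D, Z, E, lam))
    return D, Z, E, history

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 4)) @ rng.standard_normal((4, 30))  # low-rank data
X += 0.01 * rng.standard_normal(X.shape)                         # small dense noise
X[:, :3] += 5.0 * rng.standard_normal((20, 3))                   # corrupted samples
D, Z, E, history = rmr_sketch(X, r=4, lam=0.5)
```

With these step sizes each of the three updates cannot increase the objective of (4), so `history` is non-increasing; this says nothing about convergence to a global optimum of the nonconvex problem.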

Hypergraph-Based Subspace Clustering
Once the rank-constrained representation of the data is obtained by the RMR model, we can calculate the similarity matrix $W$ for spectral clustering as in [15,16] by
$$W = |Z^*| + |Z^*|^{T}. \tag{11}$$
The popular way is to construct an undirected pairwise graph with weight $W_{ij}$ assigned to the edge linking $x_i$ and $x_j$, and spectral clustering is then performed on the Laplacian matrix of the graph. Reference [15] stated that the large coefficients in the low-rank representation matrix usually cluster in groups. Our experiments reach the same conclusion for the proposed method. As shown in Figure 6, the large coefficients of the matrix $Z$ cluster in groups along the main diagonal in the face clustering experiment on the Extended Yale Face Database B. The $i$th datum has close relationships with the whole set of prominent data in its rank-constrained reconstruction, and the relation among them is of higher order than pairwise. This implies that local group information is useful for clustering. Because the multivariate relation is broken into many pairwise edge connections, the conventional pairwise graph is insufficient to capture the high-order relationship. Group information among the data ought to be utilized for clustering, in addition to the pairwise relationship between two samples. In contrast to a pairwise graph, a hypergraph is a generalization of a graph in which each edge (called a hyperedge) can connect more than two vertices. Vertices with similar characteristics can all be enclosed by one hyperedge, so high-order information in the data, beyond pairwise information, can be effectively captured, which may be very useful for subspace clustering tasks.
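A small sketch of the similarity computation follows. It assumes the LRR-style symmetrization $|Z| + |Z|^T$ of [15,16] and a square coefficient matrix (dictionary initialized with the samples themselves); the function name and the toy coefficients are hypothetical.

```python
import numpy as np

def similarity_from_coefficients(Z):
    """Symmetric, nonnegative similarity matrix from a square coefficient matrix."""
    A = np.abs(Z)
    return A + A.T

# Toy 3 x 3 coefficient matrix: column j holds the representation of sample j
Z = np.array([[0.0, 0.8, -0.1],
              [0.6, 0.0,  0.0],
              [0.0, 0.2,  0.0]])
W = similarity_from_coefficients(Z)
```

Taking absolute values discards the sign of the coefficients, so two samples count as similar whenever either one contributes strongly to the reconstruction of the other.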
In this section, we propose a method for constructing the so-called RMR-HyperGraph, in which the vertices comprise all the samples and the hyperedge $e_i$ associated with each vertex $v_i$ describes its rank-constrained reconstruction. For each data point $x_i$, RMR-HyperGraph seeks the $K$ most relevant neighbors in its rank-constrained representation $z_i$ to form a hyperedge, so that the data points in hyperedge $e_i$ have strong dependency. The weight of each hyperedge $e_i$ is computed to reveal the degree of homogeneity of all the data points in the hyperedge. The task of subspace clustering is then formulated as a hypergraph partition problem.

Hypergraph Preliminaries.
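A hypergraph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, w)$ is formed by the vertex set $\mathcal{V}$, the hyperedge set $\mathcal{E}$, and the hyperedge weight vector $w$. Here, each hyperedge $e_j \in \mathcal{E}$ is a subset of $\mathcal{V}$ and is assigned a positive weight $w(e_j)$. A $|\mathcal{V}| \times |\mathcal{E}|$ incidence matrix $H$ denotes the relationship between the vertices and the hyperedges, defined as
$$h(v_i, e_j) = \begin{cases} 1, & v_i \in e_j, \\ 0, & \text{otherwise}. \end{cases}$$
Based on $H$, the vertex degree of each vertex $v_i \in \mathcal{V}$ and the edge degree of each hyperedge $e_j \in \mathcal{E}$ can be calculated as
$$d(v_i) = \sum_{e_j \in \mathcal{E}} w(e_j)\, h(v_i, e_j), \qquad \delta(e_j) = \sum_{v_i \in \mathcal{V}} h(v_i, e_j).$$
Let $D_v$ and $D_e$ denote the diagonal matrices containing the vertex and hyperedge degrees, respectively, and let $W_e$ denote the diagonal matrix containing the weights of the hyperedges.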
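The preliminaries above translate directly into a few matrix operations. The toy hypergraph below (four vertices, two hyperedges, and the chosen weights) is a made-up example, and the variable names are illustrative.

```python
import numpy as np

# Incidence matrix: rows are vertices, columns are hyperedges.
# e_0 = {v_0, v_1, v_2}, e_1 = {v_1, v_2, v_3}
H = np.array([[1, 0],
              [1, 1],
              [1, 1],
              [0, 1]], dtype=float)
w = np.array([2.0, 0.5])          # positive hyperedge weights

d_v = H @ w                       # vertex degrees: weighted count of incident hyperedges
delta_e = H.sum(axis=0)           # hyperedge degrees: number of vertices per hyperedge

D_v = np.diag(d_v)                # diagonal vertex-degree matrix
D_e = np.diag(delta_e)            # diagonal hyperedge-degree matrix
W_e = np.diag(w)                  # diagonal hyperedge-weight matrix
```

Vertices `v_1` and `v_2` belong to both hyperedges, so their degree is the sum of both weights, while `v_0` and `v_3` each inherit the weight of a single hyperedge.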

Hyperedge Construction and Weight Computation.
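A hypergraph is a generalization of the pairwise graph, and its key issues are how to build the hyperedges and how to compute their weights. Most previous works have adopted $K$-NN searching to generate the hyperedges, whereby each sample and its $K$ nearest neighbors are taken as a hyperedge. In contrast, the number of neighbors here is selected adaptively for each vertex from its column of the similarity matrix, so that the selected coefficients carry at least 80% of the energy of that column. As in [25,26], we also relax the incidence matrix $H$ of the hypergraph in a soft way, defined as
$$h(v_i, e_j) = \begin{cases} W_{ij}, & v_i \in e_j, \\ 0, & \text{otherwise}. \end{cases}$$
According to this assignment, $v_i$ is "partly" assigned to $e_j$ based on the similarity $W_{ij}$ between $v_i$ and $v_j$, if $v_i$ belongs to $e_j$. The soft incidence matrix presents not only the local grouping information but also the importance of a vertex within a hyperedge. In this way, the correlation between vertices is described more accurately.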
The hyperedge weight $w(e_j)$ is computed as the sum of the pairwise similarities within the hyperedge. Based on this definition, a "compact" hyperedge (local group) is assigned a higher weight.
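The construction can be sketched as follows, under assumptions that go beyond what the text fixes: "energy" is taken to be the squared $\ell_2$ norm, $K$ is taken as the smallest number of coefficients reaching the 80% level, a vertex is excluded from its own neighbor search but always added to its own hyperedge, and the hyperedge weight is taken as the sum of similarities between the centroid and the members. The function names are hypothetical.

```python
import numpy as np

def adaptive_neighbors(w_col, ratio=0.8):
    """Indices of the fewest largest entries that hold `ratio` of the energy."""
    order = np.argsort(-w_col, kind="stable")      # decreasing similarity
    energy = np.cumsum(w_col[order] ** 2)
    if energy[-1] == 0.0:
        return order[:0]                           # isolated vertex: no neighbors
    K = int(np.searchsorted(energy, ratio * energy[-1])) + 1
    return order[:K]

def build_hypergraph(W, ratio=0.8):
    n = W.shape[0]
    member = np.zeros((n, n), dtype=bool)          # member[i, j]: v_i belongs to e_j
    for j in range(n):
        col = W[:, j].copy()
        col[j] = 0.0                               # exclude v_j from its own neighbor search
        member[adaptive_neighbors(col, ratio), j] = True
        member[j, j] = True                        # the centroid belongs to its hyperedge
    H = np.where(member, W, 0.0)                   # soft incidence matrix
    weights = H.sum(axis=0)                        # assumed weight: similarities to the centroid
    return member, H, weights

# Toy similarity matrix with two tight pairs: (0, 1) and (2, 3)
W = np.array([[0.00, 0.90, 0.05, 0.00],
              [0.90, 0.00, 0.00, 0.05],
              [0.05, 0.00, 0.00, 0.80],
              [0.00, 0.05, 0.80, 0.00]])
member, H, weights = build_hypergraph(W)
```

In this toy case a single neighbor already carries more than 80% of the energy in every column, so each hyperedge contains just the centroid and its strongest partner; a flatter column would yield a larger hyperedge.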

Hypergraph Spectral Decomposition for Subspace Clustering.
Based on the constructed hypergraph, a hypergraph Laplacian matrix is constructed to find the spectral signature of the dataset for subspace clustering, based on hypergraph spectral analysis [27]. The principal idea is to perform spectral decomposition on the Laplacian matrix of the hypergraph model to obtain its eigenvectors and eigenvalues. The hypergraph Laplacian matrix is computed as
$$\Delta = I - D_v^{-1/2} H W_e D_e^{-1} H^{T} D_v^{-1/2}, \tag{18}$$
where $D_v$, $D_e$, and $W_e$ denote the diagonal matrices of the vertex degrees, the hyperedge degrees, and the hyperedge weights, respectively. As with pairwise graphs, the problem of hypergraph partition can be relaxed into a generalized eigenvalue decomposition of the hypergraph Laplacian matrix. The hypergraph-based subspace clustering procedure is summarized in Algorithm 2.
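The spectral step can be sketched as below, assuming the normalized hypergraph Laplacian $I - D_v^{-1/2} H W_e D_e^{-1} H^T D_v^{-1/2}$. The sketch keeps the eigenvectors of the $k$ smallest eigenvalues as the embedding and omits both the selection of nonzero eigenvalues and the final $k$-means step of Algorithm 2; the function names and the toy hypergraph are hypothetical.

```python
import numpy as np

def hypergraph_laplacian(H, w):
    """Normalized hypergraph Laplacian from an incidence matrix and hyperedge weights."""
    d_v = H @ w                                        # vertex degrees
    delta_e = H.sum(axis=0)                            # hyperedge degrees
    inv_sqrt_dv = 1.0 / np.sqrt(np.maximum(d_v, 1e-12))
    theta = (H * (w / np.maximum(delta_e, 1e-12))) @ H.T   # H W_e D_e^{-1} H^T
    theta = inv_sqrt_dv[:, None] * theta * inv_sqrt_dv[None, :]
    return np.eye(H.shape[0]) - theta

def spectral_embedding(L, k):
    """Eigenvalues (ascending) and the eigenvectors of the k smallest ones."""
    vals, vecs = np.linalg.eigh(L)
    return vals, vecs[:, :k]

# Two disjoint hyperedges: {v_0, v_1, v_2} and {v_3, v_4, v_5}
H = np.array([[1, 0], [1, 0], [1, 0],
              [0, 1], [0, 1], [0, 1]], dtype=float)
w = np.ones(2)
L = hypergraph_laplacian(H, w)
vals, emb = spectral_embedding(L, 2)
```

For this disconnected toy hypergraph the embedding rows coincide within each group and differ across groups, so any reasonable clustering of the rows of `emb` recovers the two hyperedges.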

Experimental Results
In the experiments, we test the proposed method on face clustering and motion segmentation problems and compare it with some state-of-the-art algorithms, including SSC [14], LRR [15,16], and GPCA [9].The parameters of all other competing algorithms are selected to have optimal results.

Simulated Data.
The first experiment uses simulated data. We construct seven independent subspaces, where the basis of each subspace is generated by a random rotation of the basis of the previous one. Each subspace has a dimension of 5, and 30 data vectors of dimension 70 are randomly sampled from each subspace. In order to simulate noisy and error-corrupted data, a small Gaussian noise with zero mean and variance $\sigma_n \|x\|_2$ is added to each sample point $x$. Meanwhile, a certain percentage, $p_c$, of the points are also selected to receive a large Gaussian noise with variance $0.3\|x\|_2$; these points can be regarded as outliers deviating from the original subspace.
In order to test the stability of the various algorithms, we conduct two groups of experiments, which perform clustering under various percentages of corruption and various noise intensities, respectively. First, $\sigma_n$ is fixed at 0.2, while $p_c$ is varied from 0% to 60%. Second, $p_c$ is fixed at 20%, and $\sigma_n$ is varied from 0 to 0.6. After performing the RMR decomposition to obtain the coefficient matrix, we use RMR-Graph and RMR-HyperGraph to segment the data into seven clusters and compare the segmentation accuracy with LRR and SSC. The regularization parameters $\lambda$ are set as $\lambda_{\mathrm{SSC}} = 0.01$, $\lambda_{\mathrm{LRR}} = 0.12$, and $\lambda_{\mathrm{RMR}} = 0.23$ for SSC, LRR, and RMR, respectively, all tuned by cross-validation to achieve the best performance. The rank parameter $r$ is set to 35 for our method, and the experiments are repeated 10 times to obtain the mean accuracy.
As shown in Figures 2 and 3, RMR is superior to both LRR and SSC, especially as the percentage of outliers and the noise variance increase. The performance of RMR-HyperGraph is also comparable to or better than that of RMR-Graph. These results demonstrate that hypergraph clustering techniques improve robustness to noise and corruption in data clustering problems.

Data-Hopkins 155 Motion Dataset.
The Hopkins 155 motion dataset is an extensive benchmark for testing feature-based motion segmentation algorithms [27]. It contains 155 sequences with two and three motions (each motion corresponds to a subspace). These sequences can roughly be divided into three categories: checkerboard sequences, traffic sequences, and "other" (articulated/nonrigid sequences). Some sample images with superimposed trajectory points are shown in Figure 4. For each sequence, the trajectories are extracted automatically with a tracker and outliers are manually removed. Therefore, the trajectories are only corrupted by noise and do not have missing entries or outliers. This database can thus be considered to contain only slight corruptions.

Algorithm 2: Hypergraph-based subspace clustering.
Input: data matrix $X$, number of classes $k$, rank $r$ of coefficient matrix $Z$.
(1) Obtain the rank-constrained matrix representation via optimization Algorithm 1.
(2) Construct a $K$-nearest neighbors hypergraph by using the rank-$r$ representation to define the hyperedges and the incidence matrix $H$ of the hypergraph.
(3) Compute the hypergraph Laplacian matrix via (18).
(4) Perform spectral decomposition of the hypergraph Laplacian matrix and take the first $k$ eigenvectors with nonzero eigenvalues as the embedded representation.
Output: Use a $k$-means clustering algorithm on the eigenspace to partition the vertices of the graph into $k$ clusters.
The task of motion segmentation is to cluster the $n$ trajectories tracked and extracted from a video sequence into $k$ different groups, so that the trajectories in the same group represent a single rigid-body motion [27]. Each sequence is a separate clustering task; there are, therefore, 155 clustering tasks in total. The feature point trajectories of a single rigid-body motion lie in an affine subspace of dimension at most three. Given $n$ trajectories of $k$ rigidly moving objects, these trajectories can be approximately regarded as lying in a union of $k$ affine subspaces.
The algorithms tested in this experiment include SSC, LRR, and RMR-HyperGraph. For RMR, the parameters are set as $\lambda = 0.26$ and $r = 3k$, where $k$ is the number of moving objects in the sequence. The parameters of the other methods are also optimally tuned. Segmentation accuracy of … To some extent, low-rank representation is good enough to recover the subspace structure in this case of approximately clean trajectory data, so the benefits of the rank range constraint and the hypergraph model are not fully exerted.

Extended Yale Face Database B.
The Extended Yale Face Database B [28] consists of frontal face images in 38 classes. For fair comparison, we select the first 10 classes in these experiments, as in [15,16]. Each class contains 64 images taken under different illumination conditions (see Figure 5), giving 640 images in total. Most of the data samples are corrupted by shadows and noise, making the database an ideal test bed for the proposed algorithm.
As expected for an object showing Lambertian properties, the set of all images taken under varying lighting conditions clusters in a cone of the image space, which can be approximated very well by a 9-dimensional linear subspace [20]. With the assumption that the subspaces of the individuals are independent, the rank is set as $r = 90$. The parameter $\lambda$ is set as $\lambda_{\mathrm{SSC}} = 0.17$, $\lambda_{\mathrm{LRR}} = 0.18$, and $\lambda_{\mathrm{RMR}} = 0.24$ for the SSC, LRR, and RMR algorithms, respectively, to obtain the optimal performance by cross-validation. Figure 6 displays the resulting coefficient matrix $Z$, with large coefficients clustering along the main diagonal and small coefficients irregularly scattered. The clustering results are listed in Table 3, which shows that our algorithm significantly outperforms the others. This is because RMR can effectively recover the rank-constrained representation of a set of data vectors from corrupted data. At the same time, the application of hypergraph clustering also enhances its robustness to noise and corruption. Figure 7 compares decomposition examples of the RMR and LRR methods. The low-rank component $DZ^*$ is expected to recover the clean sample data. We can see that our model removes the shadow and stripe noise more effectively.
The rank upper bound $r$ is an important parameter in the proposed model. Figure 8 reports the performance of RMR-HyperGraph with different values of $r$. The RMR model achieves approximately its best performance when $r$ is about 90, which is consistent with the prior knowledge that each of the 10 subjects has a 9-dimensional linear lighting subspace. In this case, the rank of the coefficient matrix of the LRR model is about 135, which deviates considerably from the prior rank range of 90, and the clustering accuracy of LRR is negatively affected by this inaccurate rank range. Our RMR model directly controls the rank range, which addresses this problem of LRR and improves clustering accuracy.
In order to further test robustness to noise of various variances, a small Gaussian noise with zero mean and variance $\sigma_n \|x\|$ is added to the face images; $\sigma_n$ varies from 0.01 to 0.12. Figure 9 displays the clustering accuracy of RMR-HyperGraph, RMR-Graph, and the LRR method for different values of $\sigma_n$. RMR-HyperGraph always achieves better performance than RMR-Graph and LRR.

AR Face Database.
The AR face database contains over 4000 images corresponding to 126 subjects (70 men and 56 women) [29]. The images have different facial expressions, illuminations, and occlusions. Each class contains about 26 images with a resolution of 55 × 40. Some sample images are shown in Figure 10. In particular, the large occlusions by sunglasses or scarves make the corruption more severe than that in the Extended Yale Face Database B.
We use the first 10 classes to test the proposed methods. As before, we empirically set the parameters $r = 90$ and $\lambda = 0.24$ for our method on this database. The parameters of the other methods are also tuned for optimal performance. Table 4 lists the clustering results. Our algorithm is more robust to gross corruption and significantly superior to the … In Figure 11(c), it is also interesting that our RMR model can alleviate the expression change and recover the normal face. Our RMR algorithm achieves robust segmentation accuracy even in the case of very serious corruption.

Running Time.
We next analyze the computational cost of each algorithm. The codes of SSC [14] and LRR [15,16] are downloaded from their authors' homepages. In particular, the CVX version of the code is used to run the SSC algorithm. All the algorithms are implemented in Matlab R2011b running on Windows 7, with an Intel Core i7-2600 3.40 GHz processor and 8 GB of memory.
The running time (in seconds) of each algorithm on the face databases and motion datasets is listed in Table 5. SSC is the most time-consuming method, and our algorithm requires the least time. Compared with the LRR method, the low computational cost of our algorithm mainly benefits from the use of bilateral random projections (BRPs) to compute the low-rank approximation: the computational cost of BRPs is less than that of the SVD-based approximation in the LRR method.

Figure 1: Relative reconstruction error $\mathrm{err}_1$ and relative variation $\mathrm{err}_2$ versus iteration number. The parameter of RMR is set to $1.0 \times 10^{-8}$.

Figure 2: Subspace clustering accuracy of LRR and RMR versus the corruption percentage $p_c$.

Figure 4: Sample images from the Hopkins 155 database.

Figure 5: Some samples taken from Extended Yale Face Database B.

Figure 6: The coefficient matrix $Z$ of RMR for Extended Yale Face Database B.

Figure 7: The decomposed components of some samples taken from Extended Yale Face Database B. The first row in each of (a), (b), and (c) displays the decomposition of our RMR method, and the second row corresponds to the results of the LRR method. Panel title: decomposition examples of the fifth class.

Figure 8: Clustering accuracy of our algorithm with different $r$ values on Extended Yale Face Database B.

Figure 9: Subspace clustering accuracy with different $\sigma_n$ values on Extended Yale Face Database B.

Figure 10: Some samples taken from the AR face database.

Figure 11: The decomposed components of some samples taken from AR. The first row in each of (a), (b), and (c) displays the decomposition of our RMR method, and the second row corresponds to the results of the LRR method.
End For. Step 2 (sparse part update): set $Q = X - D^{k} Z^{k+1}$ and shrink each column $q_i$ of $Q$ with the soft threshold to obtain $E^{k+1}$, where $q_i$ is the $i$th column of $Q$.
The $K$-NN method, in which each sample and its $K$ nearest neighbors are taken as a hyperedge, is simple, but the fixed number of neighbors (the size of the hyperedge) is not adaptive to the local data distribution of each data point. Using the similarity matrix $W$ defined in (11), we define a hypergraph with adaptive neighbors. The $K$ neighbors of each vertex $v_i$ are identified as the samples whose coefficients rank among the $K$ largest in the $i$th column of $W$ (denoted by $w_i$). Meanwhile, the number of neighbors $K$ is adaptively selected for each vertex $v_i$ according to the rule that the energy of the vector comprising the $K$ largest elements of $w_i$ is at least 80% of the energy of $w_i$. In other words, the $K$ largest coefficients, which correspond to the $K$ nearest neighbor samples, carry at least 80% of the energy of $w_i$. Then, each vertex (data sample) and its $K$ nearest neighbors are linked as a hyperedge.

Table 3: Clustering accuracy on the Extended Yale Face Database B.

Table 4: Clustering accuracy on the AR database.

Table 5: Running time (in seconds) of each algorithm on the multiple face and motion datasets.