Low-Rank Representation for Incomplete Data

Low-rank matrix recovery (LRMR) has become an increasingly popular technique for analyzing data with missing entries, gross corruptions, and outliers. As a significant component of LRMR, the model of low-rank representation (LRR) seeks the lowest-rank representation among all samples and is robust for recovering subspace structures. This paper attempts to solve the problem of LRR with partially observed entries. First, we construct a nonconvex minimization that takes low-rankness, robustness, and incompleteness into consideration. Then we employ the technique of augmented Lagrange multipliers to solve the proposed program. Finally, experimental results on synthetic and real-world datasets validate the feasibility and effectiveness of the proposed method.


Introduction
In the communities of pattern recognition, machine learning, and computer vision, the investigated datasets usually have an intrinsically low-rank structure although they are often high-dimensional. Low-rank matrix recovery (LRMR) [1][2][3] is a class of models which utilizes this crucial low-complexity information to complete missing entries, recover sparse noise, identify outliers, and build affinity matrices. It can also be regarded as a generalization of compressed sensing from vectors to matrices, since the low-rankness of a matrix is equivalent to the sparsity of its vector of singular values. Recently, LRMR has received increasing attention in information science and engineering and has achieved great success in video background modeling [4, 5], collaborative filtering [6, 7], and subspace clustering [3, 8, 9], to name just a few applications.
Generally, LRMR comprises three appealing problem types: matrix completion (MC) [1], robust principal component analysis (RPCA) [2, 4], and low-rank representation (LRR) [3, 8]. Among them, MC aims to complete the missing entries with the aid of the low-rank property and was initially described as an affine rank minimization problem. In the past few years, affine rank minimization has been convexly relaxed into nuclear norm minimization [10], and it has been proven that if the number of sampled entries and the singular vectors satisfy certain conditions, then most low-rank matrices can be perfectly recovered by solving the aforementioned convex program [1].
Classical principal component analysis (PCA) [11] is very effective against small Gaussian noise, but it does not work well in practice when data samples are corrupted by outliers or large sparse noise. For this reason, several robust variants of PCA were proposed during the past two decades [12, 13]. Since the seminal work [4], the principal component pursuit (PCP) approach has become a standard for RPCA. This approach minimizes a weighted combination of the nuclear norm and the ℓ1-norm subject to linear equality constraints. It is proven that, under some conditions, both the low-rank and the sparse components can be recovered exactly with dominant probability by solving PCP [4].
In subspace clustering, a commonly used assumption is that the data lie in a union of multiple low-rank subspaces and that each subspace has sufficiently many samples relative to its rank. Liu et al. [3] proposed a robust subspace recovery technique via LRR. Any sample in each subspace can be represented as a linear combination of the bases, and the low complexity of the linear representation coefficients is very useful in exploiting the low-rank structure. LRR seeks the lowest-rank representation of all data jointly, and it has been demonstrated that data contaminated by outliers can be exactly recovered under certain conditions by solving a convex program [8]. If the bases are chosen as the columns of an identity matrix and the ℓ1-norm is employed to measure the sparsity, then LRR reduces to the PCP formulation of RPCA.
For datasets with missing entries and large sparse corruption, the robust recovery of subspace structures can be a challenging task. The available algorithms for MC are not robust to gross corruption. Moreover, a large number of missing values degrades the recovery performance of LRR and RPCA. In this paper, we address the problem of low-rank subspace recovery in the presence of missing values and sparse noise. Specifically, we present a model of incomplete low-rank representation (ILRR) which is a direct generalization of LRR. The ILRR model boils down to a nonconvex optimization problem which minimizes a combination of the nuclear norm and the ℓ2,1-norm. To solve this program, we design an iterative scheme by applying the method of inexact augmented Lagrange multipliers (ALM).
The rest of this paper is organized as follows. Section 2 briefly reviews preliminaries and related works on LRMR. The model and algorithm for ILRR are presented in Section 3. In Section 4, we discuss the extension of ILRR and its relationship with existing works. We compare the performance of ILRR with state-of-the-art algorithms on synthetic and real-world datasets in Section 5. Finally, Section 6 draws some conclusions.

Preliminaries and Related Works
This section introduces the relevant preliminary material concerning matrices and representative models of low-rank matrix recovery (LRMR).
The choice of matrix norms plays a significant role in LRMR. In the following, we present four important types of matrix norms. For arbitrary X ∈ R^{m×n}, the Frobenius norm of X is ‖X‖_F = (∑_{i,j} x_{ij}^2)^{1/2}, the ℓ1-norm is ‖X‖_1 = ∑_{i,j} |x_{ij}|, the ℓ2,1-norm is ‖X‖_{2,1} = ∑_{j=1}^{n} (∑_{i=1}^{m} x_{ij}^2)^{1/2}, and the nuclear norm is ‖X‖_* = ∑_{i=1}^{min(m,n)} σ_i, where x_{ij} is the (i, j)th entry of X and σ_i is the ith largest singular value. Among them, the nuclear norm is the tightest convex relaxation of the rank function, while the ℓ2,1-norm and the ℓ1-norm are frequently used to measure the sparsity of a noise matrix.
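These four norms can be evaluated directly with NumPy; a minimal illustration (not part of the original paper):

```python
import numpy as np

def frobenius_norm(X):
    # ||X||_F = square root of the sum of squared entries
    return np.sqrt((X ** 2).sum())

def l1_norm(X):
    # ||X||_1 = sum of absolute values of all entries
    return np.abs(X).sum()

def l21_norm(X):
    # ||X||_{2,1} = sum of the l2 norms of the columns
    return np.linalg.norm(X, axis=0).sum()

def nuclear_norm(X):
    # ||X||_* = sum of the singular values
    return np.linalg.svd(X, compute_uv=False).sum()
```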
Consider a data matrix D ∈ R^{m×n} stacked by n training samples, where each column of D indicates a sample of dimension m. Within the field of LRMR, the following three proximal minimization problems [3, 4] are extensively employed:

min_X ε‖X‖_1 + (1/2)‖X − D‖_F^2,  min_X ε‖X‖_* + (1/2)‖X − D‖_F^2,  min_X ε‖X‖_{2,1} + (1/2)‖X − D‖_F^2,  (1)

where ε is a positive constant used to balance the regularization term and the approximation error. For given ε and D, we define three thresholding operators S_ε(D), D_ε(D), and W_ε(D) as follows:

(S_ε(D))_{ij} = sgn(d_{ij}) · max(|d_{ij}| − ε, 0),
D_ε(D) = U S_ε(Σ) V^T,   (2)
(W_ε(D))_j = (max(‖d_j‖_2 − ε, 0)/‖d_j‖_2) d_j,

where UΣV^T is the singular value decomposition (SVD) of D and d_j denotes the jth column of D. It is proven that the aforementioned three optimization problems have closed-form solutions given by S_ε(D) [4], D_ε(D) [14], and W_ε(D) [8], respectively.

We assume that D is low-rank. Because the number of degrees of freedom of a low-rank matrix is far less than its number of entries, it is possible to recover all missing entries exactly from the partially observed ones, provided that the number of sampled entries satisfies certain conditions. Formally, the problem of matrix completion (MC) [1] can be formulated as

min_X ‖X‖_*   s.t.  x_{ij} = d_{ij}, (i, j) ∈ Ω,   (3)

where Ω denotes the index set of observed entries. We define a linear projection operator P_Ω(·): R^{m×n} → R^{m×n} by

(P_Ω(X))_{ij} = x_{ij} if (i, j) ∈ Ω, and 0 otherwise.   (4)

Hence, the constraints in problem (3) can be rewritten as P_Ω(X) = P_Ω(D).

PCA obtains the optimal estimate under small additive Gaussian noise but breaks down under large sparse contamination. Here, the data matrix D is assumed to be the superposition of a low-rank matrix A and a sparse noise matrix E. In this situation, robust principal component analysis (RPCA) is very effective in recovering both the low-rank and the sparse components by solving a convex program. Mathematically, RPCA can be described as the following nuclear norm minimization [2]:

min_{A,E} ‖A‖_* + λ‖E‖_1   s.t.  D = A + E,   (5)

where λ is a positive weighting parameter. Subsequently, RPCA was generalized into a stable version which is simultaneously stable to small perturbations and robust to gross sparse corruption [15]. We further assume that the dataset is self-expressive and that the representation coefficient matrix is also low-rank.
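The three proximal operators and the projection P_Ω can be sketched in NumPy as follows (an illustrative sketch; function names mirror the operators in the text):

```python
import numpy as np

def S(D, eps):
    # Entrywise soft thresholding: proximal operator of the l1-norm.
    return np.sign(D) * np.maximum(np.abs(D) - eps, 0.0)

def D_op(D, eps):
    # Singular value thresholding: proximal operator of the nuclear norm.
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return U @ np.diag(np.maximum(s - eps, 0.0)) @ Vt

def W(D, eps):
    # Column-wise shrinkage: proximal operator of the l2,1-norm.
    out = np.zeros_like(D, dtype=float)
    norms = np.linalg.norm(D, axis=0)
    keep = norms > eps
    out[:, keep] = D[:, keep] * (norms[keep] - eps) / norms[keep]
    return out

def P_Omega(X, mask):
    # Keep observed entries (mask == 1), zero out the rest.
    return X * mask
```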
Based on the above two assumptions, the model of low-rank representation (LRR) [3] is expressed as

min_{Z,E} ‖Z‖_* + λ‖E‖_{2,1}   s.t.  D = DZ + E,   (6)

where Z ∈ R^{n×n} is the coefficient matrix, E is the noise matrix, and λ is a positive trade-off parameter. This model is very effective in detecting outliers, and the optimal Z favors robust subspace recovery. In subspace clustering, the affinity matrix can be constructed from the optimal Z of problem (6).

Problems (3), (5), and (6) are all nuclear norm minimizations. The existing algorithms for these optimizations mainly include iterative thresholding, the accelerated proximal gradient method, the dual approach, and the augmented Lagrange multipliers (ALM) method [16]. These algorithms are scalable owing to their use of first-order information. Among them, ALM, also called the alternating direction method of multipliers (ADMM) [17], is a very popular and effective method for solving nuclear norm minimizations.

Model and Algorithm of Incomplete Low-Rank Representation
This section proposes a model of low-rank representation for incomplete data and develops a corresponding iterative scheme for this model.

Model.
We consider an incomplete data matrix M ∈ R^{m×n} and denote the sampling index set by Ω. The (i, j)th entry m_{ij} of M is missing if and only if (i, j) ∉ Ω. For the sake of convenience, we set all missing entries of M to zero.
To recover simultaneously the missing entries and the low-rank subspace structure, we construct the incomplete low-rank representation (ILRR) model

min_{Z,E,D} ‖Z‖_* + λ‖E‖_{2,1}   s.t.  D = DZ + E,  P_Ω(D) = M,   (7)

where λ is a positive constant and D corresponds to the completion of M. If there are no missing entries, that is, Ω = [m] × [n], then the above model is equivalent to LRR. In other words, LRR is a special case of ILRR.
In order to solve the nonconvex nuclear norm minimization (7) conveniently, we introduce two auxiliary matrix variables X ∈ R^{m×n} and J ∈ R^{n×n}. Under this circumstance, the above optimization problem is reformulated as

min_{Z,E,D,X,J} ‖J‖_* + λ‖E‖_{2,1}   s.t.  D = XZ + E,  Z = J,  X = D,  P_Ω(D) = M.   (8)

Let μ > 0 be a penalty factor. Without considering the constraint P_Ω(D) = M, we construct the augmented Lagrange function of problem (8) as follows:

L(J, Z, D, E, X, Y_1, Y_2, Y_3) = ‖J‖_* + λ‖E‖_{2,1} + ⟨Y_1, D − XZ − E⟩ + ⟨Y_2, Z − J⟩ + ⟨Y_3, X − D⟩ + (μ/2)(‖D − XZ − E‖_F^2 + ‖Z − J‖_F^2 + ‖X − D‖_F^2),   (9)

where ⟨·, ·⟩ is the inner product between matrices and Y_i is a Lagrange multiplier matrix, i = 1, 2, 3. In the next part, we propose an inexact augmented Lagrange multipliers (IALM) method to solve problem (8) via the function (9).

Algorithm.
The inexact ALM (IALM) method employs an alternating update strategy: at each iteration, it minimizes the function L with respect to each block variable in turn and then maximizes L with respect to the Lagrange multipliers via a gradient ascent step.
Computing J. When J is unknown and the other variables are fixed, J is updated by

J := arg min_J (1/μ)‖J‖_* + (1/2)‖J − (Z + Y_2/μ)‖_F^2 = D_{1/μ}(Z + Y_2/μ).

Computing Z. If matrix Z is unknown and the other variables are given, Z is updated by minimizing L with respect to Z. Let f(Z) denote the terms of L that depend on Z. By setting the derivative of f(Z) to zero, we have

(I + X^T X) Z = X^T (D − E) + J + (X^T Y_1 − Y_2)/μ,

or, equivalently,

Z := (I + X^T X)^{-1} (X^T (D − E) + J + (X^T Y_1 − Y_2)/μ),

where I is the n-order identity matrix.
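As a numerical sanity check on the closed-form Z-update (an illustration with random data, under the constraint convention D = XZ + E, Z = J, X = D assumed in this reconstruction), the solved Z satisfies the linear system exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, mu = 8, 6, 2.0
X = rng.standard_normal((m, n)); D = rng.standard_normal((m, n))
E = rng.standard_normal((m, n)); J = rng.standard_normal((n, n))
Y1 = rng.standard_normal((m, n)); Y2 = rng.standard_normal((n, n))

# Z-update: solve (I + X^T X) Z = X^T (D - E) + J + (X^T Y1 - Y2) / mu
A = np.eye(n) + X.T @ X
B = X.T @ (D - E) + J + (X.T @ Y1 - Y2) / mu
Z = np.linalg.solve(A, B)
```

By construction, the gradient of the Z-dependent part of the augmented Lagrangian vanishes at this Z.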
Computing D. Minimizing L with respect to D gives

D := (XZ + E + X + (Y_3 − Y_1)/μ)/2.

Considering the constraint P_Ω(D) = M, we further obtain the iteration formula

D := P_Ω(M) + P_{Ω̄}((XZ + E + X + (Y_3 − Y_1)/μ)/2),

where Ω̄ is the complementary set of Ω.
Computing E. Fix Z, D, J, X, and Y and minimize L with respect to E:

E := arg min_E (λ/μ)‖E‖_{2,1} + (1/2)‖E − (D − XZ + Y_1/μ)‖_F^2 = W_{λ/μ}(D − XZ + Y_1/μ).

Computing X. Fix Z, D, J, E, and Y and let h(X) denote the terms of L that depend on X. Setting ∇h(X) = 0 yields

X := (D + (D − E)Z^T + (Y_1 Z^T − Y_3)/μ)(I + Z Z^T)^{-1}.

Computing Y. Given Z, E, D, J, and X, the multipliers are updated by

Y_1 := Y_1 + μ(D − XZ − E),  Y_2 := Y_2 + μ(Z − J),  Y_3 := Y_3 + μ(X − D).

We denote by O_{m×n} the m × n zero matrix. The whole iterative procedure is outlined in Algorithm 1. The stopping criterion of Algorithm 1 can be set as

max{‖D − XZ − E‖_∞, ‖Z − J‖_∞, ‖X − D‖_∞} ≤ ε,

where ε is a sufficiently small positive number.
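Putting the updates together, one possible IALM loop reads as follows. This is a schematic sketch under the reconstructed constraint set (D = XZ + E, Z = J, X = D); the penalty schedule parameters ρ and μ_max are standard choices assumed here, not values taken from the paper:

```python
import numpy as np

def svt(A, eps):
    # singular value thresholding (nuclear norm proximal operator)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - eps, 0.0)) @ Vt

def col_shrink(A, eps):
    # column-wise shrinkage (l2,1-norm proximal operator)
    out = np.zeros_like(A)
    norms = np.linalg.norm(A, axis=0)
    keep = norms > eps
    out[:, keep] = A[:, keep] * (norms[keep] - eps) / norms[keep]
    return out

def ilrr(M, mask, lam=0.1, mu=1e-2, rho=1.5, mu_max=1e6, n_iter=100):
    # M: incomplete data matrix (missing entries set to zero)
    # mask: 1 at observed entries (the set Omega), 0 elsewhere
    m, n = M.shape
    D, X = M.copy(), M.copy()
    Z = np.zeros((n, n)); J = np.zeros((n, n))
    E = np.zeros((m, n))
    Y1 = np.zeros((m, n)); Y2 = np.zeros((n, n)); Y3 = np.zeros((m, n))
    for _ in range(n_iter):
        J = svt(Z + Y2 / mu, 1.0 / mu)
        Z = np.linalg.solve(np.eye(n) + X.T @ X,
                            X.T @ (D - E) + J + (X.T @ Y1 - Y2) / mu)
        # D-update: free on unobserved entries, pinned to M on observed ones
        D_free = (X @ Z + E + X + (Y3 - Y1) / mu) / 2.0
        D = mask * M + (1 - mask) * D_free
        E = col_shrink(D - X @ Z + Y1 / mu, lam / mu)
        # X-update: X (I + Z Z^T) = D + (D - E) Z^T + (Y1 Z^T - Y3) / mu
        S_mat = np.eye(n) + Z @ Z.T
        rhs = D + (D - E) @ Z.T + (Y1 @ Z.T - Y3) / mu
        X = np.linalg.solve(S_mat, rhs.T).T
        Y1 += mu * (D - X @ Z - E)
        Y2 += mu * (Z - J)
        Y3 += mu * (X - D)
        mu = min(rho * mu, mu_max)
    return Z, E, D
```

When the mask is all ones (no missing entries), the D-update pins D to M at every iteration, so the scheme degenerates toward plain LRR, as the text notes.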

Convergence.
When solving ILRR via the IALM method, the block variables J, Z, D, E, and X are updated alternately. Suppose instead that we update these five block variables simultaneously, namely,

(J, Z, D, E, X) := arg min L(J, Z, D, E, X, Y_1, Y_2, Y_3).

The modified method is called the exact ALM method. Since the objective function in problem (8) is continuous, the exact ALM method is convergent [18]. However, it is still difficult to prove the convergence of the IALM method. There are two reasons for this difficulty: one is the existence of nonconvex constraints in (8), and the other is that the number of block variables is greater than two. Nevertheless, the experimental results in Section 5 demonstrate the validity and effectiveness of Algorithm 1 in practice.

Model Extensions
In the ILRR model, the main aim of the term ‖E‖_{2,1} is to enhance robustness to noise and outliers. If outliers are not considered, then ‖E‖_{2,1} can be replaced with ‖E‖_1. For this new ILRR model, we can design an algorithm by only revising Step 4 of Algorithm 1 to

E := S_{λ/μ}(D − XZ + Y_1/μ).

If we substitute ‖E‖_F^2 for ‖E‖_{2,1} and set Z = Z^T, then problem (7) is the incomplete version of low-rank subspace clustering (LRSC) with uncorrupted data [9]. If we replace ‖Z‖_* and ‖E‖_{2,1} by ‖Z‖_1 and ‖E‖_1, respectively, and incorporate diag(Z) = 0 into the constraints, then problem (7) is the incomplete version of sparse subspace clustering (SSC) without dense errors [19].
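The practical difference between the two noise terms is easy to see numerically: the proximal operator of the ℓ2,1-norm zeroes out whole columns (sample-specific outliers), whereas the ℓ1 version zeroes individual entries. A small illustration (not from the paper):

```python
import numpy as np

def entry_shrink(A, eps):
    # l1 proximal operator: entrywise soft thresholding
    return np.sign(A) * np.maximum(np.abs(A) - eps, 0.0)

def col_shrink(A, eps):
    # l2,1 proximal operator: shrink each column by its l2 norm
    out = np.zeros_like(A)
    norms = np.linalg.norm(A, axis=0)
    keep = norms > eps
    out[:, keep] = A[:, keep] * (norms[keep] - eps) / norms[keep]
    return out

A = np.array([[0.3, 5.0],
              [0.2, 4.0]])
# column 0 has l2 norm below the threshold, so col_shrink removes it entirely,
# while entry_shrink treats every entry separately
print(col_shrink(A, 1.0))
print(entry_shrink(A, 1.0))
```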
ILRR uses the data D itself as the dictionary. We now extend the dictionary and the noise sparsity measure to more general cases. As a result, we obtain a comprehensive form of ILRR:

min_{Z,E,D} ‖Z‖_* + λ‖E‖_ℓ   s.t.  D = AZ + E,  P_Ω(D) = M,   (25)

where A ∈ R^{m×d} represents the dictionary, Z ∈ R^{d×n} is the coefficient matrix, and ‖E‖_ℓ indicates a certain norm of E. If A is the m-order identity matrix and ‖E‖_ℓ is chosen as ‖E‖_1, then problem (25) corresponds to the incomplete version of RPCA [20]. If we further enforce E = O, then problem (25) becomes an equivalent formulation of MC [16]. Moreover, if A is an unknown orthogonal matrix and E = O, then problem (25) is equivalent to the matrix decomposition method for MC [21]. Finally, if we let D and A be stacked by the testing samples and the training samples, respectively, replace ‖Z‖_* by ‖Z‖_1, and set E = O, then problem (25) changes into the incomplete version of robust pattern recognition via sparse representation [22].

Experiments
In this section, we validate the effectiveness and efficiency of the proposed method through experiments on synthetic data and real-world datasets. The experimental results of incomplete low-rank representation (ILRR) are compared with those of three state-of-the-art methods: sparse subspace clustering (SSC), low-rank subspace clustering (LRSC), and low-rank representation (LRR). For the latter three methods, the missing values are replaced by zeros. For SSC and LRSC, the parameters are tuned to achieve the best performance. The tolerance ε is set to 10^{-8} in all experiments.
5.1. Synthetic Data. We randomly generate an orthogonal matrix U_1 ∈ R^{200×4} and a rotation matrix T ∈ R^{200×200}. Four other orthogonal matrices are constructed as U_{k+1} = T U_k, k = 1, 2, 3, 4. Thus, five independent low-rank subspaces {S_k}_{k=1}^{5} ⊂ R^{200} are spanned by the columns of the U_k, respectively. We draw 40 data vectors from each subspace by M_k = U_k Q_k, k = 1, . . ., 5, where the entries of Q_k ∈ R^{4×40} are independent of each other and obey the standard normal distribution. We set M_0 = (M_1, . . ., M_5) and randomly choose 20 of its column vectors to be corrupted. Each chosen column vector m is contaminated by Gaussian noise with zero mean and standard deviation 0.1‖m‖_2. The resulting matrix is denoted by M_c.
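The synthetic data construction above can be reproduced along the following lines (a sketch; the random seed and the QR-based construction of the orthogonal matrices are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# U1: a 200 x 4 matrix with orthonormal columns
U = [np.linalg.qr(rng.standard_normal((200, 4)))[0]]
# T: a 200 x 200 rotation (orthogonal) matrix
T = np.linalg.qr(rng.standard_normal((200, 200)))[0]
# U_{k+1} = T @ U_k, k = 1, ..., 4
for _ in range(4):
    U.append(T @ U[-1])

# 40 standard normal coefficient vectors per subspace
blocks = [Uk @ rng.standard_normal((4, 40)) for Uk in U]
M0 = np.hstack(blocks)                      # 200 x 200, rank <= 20

# corrupt 20 random columns with Gaussian noise of std 0.1 * ||m||_2
Mc = M0.copy()
for j in rng.choice(200, size=20, replace=False):
    Mc[:, j] += rng.standard_normal(200) * 0.1 * np.linalg.norm(M0[:, j])
```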
We sample entries of M_c uniformly at random and denote by Ω the sampling index set. The sampling ratio (SR) is defined as SR = |Ω|/200^2, where |Ω| is the cardinality of Ω; SR = 1 means that no entry is missing. Hence, an incomplete matrix is generated by M = P_Ω(M_c). The trade-off parameter λ in both LRR and ILRR is set to 0.1. After running Algorithm 1, we obtain the optimal low-rank representation matrix Z. On the basis of Z, we construct the affinity matrix (|Z| + |Z^T|)/2, where |·| denotes the entrywise absolute value. Then we choose spectral clustering [23] as the clustering algorithm and evaluate the clustering performance by normalized mutual information (NMI). Let C be the set of true cluster labels and let C' be the set of clusters obtained from the spectral clustering algorithm. NMI is calculated as

NMI(C, C') = MI(C, C') / max(H(C), H(C')),

where MI(C, C') is the mutual information metric and H(·) is the entropy. NMI takes values in [0, 1], and a larger NMI value indicates a better clustering performance.
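For reference, NMI can be implemented with NumPy alone (here normalizing by the larger of the two entropies, one common convention):

```python
import numpy as np

def entropy(labels):
    # Shannon entropy of the empirical label distribution (natural log)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def mutual_info(a, b):
    # mutual information of two labelings of the same samples
    a, b = np.asarray(a), np.asarray(b)
    mi = 0.0
    for ua in np.unique(a):
        for ub in np.unique(b):
            p_ab = np.mean((a == ua) & (b == ub))
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (np.mean(a == ua) * np.mean(b == ub)))
    return mi

def nmi(a, b):
    # NMI = MI / max(H(a), H(b)); identical partitions give 1.0
    h = max(entropy(a), entropy(b))
    return mutual_info(a, b) / h if h > 0 else 1.0
```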
In the experimental implementation, we vary the SR from 0.2 to 1 with an interval of 0.1. For each fixed SR, we repeat the experiment 10 times and report the average NMI. We first compare the affinity matrices produced by SSC, LRSC, LRR, and ILRR, as shown partially in Figure 1. From Figure 1, we observe that our method exhibits obvious block structures, whereas SSC, LRSC, and LRR show no block-diagonal structure at SR = 0.3. This observation means that only ILRR keeps the representations compact within the same subspace and separated across different subspaces.

Face Clustering.
We carry out face clustering experiments on a subset of the Extended Yale Face Database B [24] with large corruptions. The whole database consists of 38 subjects, and each subject has about 64 images. We choose the first 10 subjects and resize each image to 32 × 32. Thus, the face image dataset is represented by a matrix of size 1024 × 640. Each column of the data matrix is normalized to unit ℓ2 length to account for variable illumination conditions and poses.
The sampling set Ω is generated in the same manner as in the previous part. We vary the SR from 0.1 to 1 with an interval of 0.1 and set λ = 1.2 in LRR and ILRR. NMI values are compared among SSC, LRSC, LRR, and ILRR in Figure 3, where each NMI value is the average over ten repeated experiments. It can be seen from Figure 3 that ILRR achieves relatively stable NMI values for SR ≥ 0.2, while the other three methods obtain much worse NMI values than ILRR for SR ≤ 0.9. This observation shows that SSC, LRSC, and LRR are more sensitive to the SR than ILRR.
Furthermore, ILRR has an advantage over the other three methods in recovering the low-rank components. In the following, we illustrate the recovery performance of ILRR. Here, we consider two values of SR, namely, SR = 0.3 and SR = 0.6. For these two values, the original images, sampled images, recovered images, and noise images are shown partially in Figures 4 and 5, respectively. In the sampled images, the unsampled entries are shown in white. From these two figures, we can see that ILRR not only automatically corrects the corruptions (shadows and noise) but also efficiently recovers the low-rank components.

Motion Segmentation.
In this part, we test the proposed ILRR method on the task of motion segmentation. We consider eight sequences of outdoor traffic scenes, a subset of the Hopkins 155 dataset [25]. These sequences were taken by a moving handheld camera and tracked two cars translating and rotating on a street, as shown partly in Figure 6. Each sequence contains two or three motions, where one motion corresponds to one subspace. There are 8 clustering tasks in total, since the motion segmentation of each sequence is a separate clustering task. The tasks are carried out on the feature points extracted and tracked across multiple frames of the above video sequences. The extracted feature points of each sequence can be reshaped into an approximately low-rank matrix with n columns, where n equals the number of feature points. In the detailed experimental implementation, we consider SR values varying from 0.2 to 1 with an interval of 0.2. The regularization parameter λ in LRR and ILRR is set to 2. For a fixed sequence and a fixed SR, each method is repeated 10 times and the mean NMI values are recorded. We report the mean and the standard deviation of NMI values over the 8 sequences in Table 1.
From Table 1, we can see that ILRR obtains very high NMI values for all SR ≥ 0.2, while the NMI values of the other three methods are unacceptable for SR ≤ 0.8. Although SSC segments each subspace exactly when SR = 1.0, a small SR has a fatal influence on its segmentation performance. In summary, ILRR is more robust to a large number of missing values than the other three methods.

Conclusions
In this paper, we investigated the model of low-rank representation for incomplete data, which can be regarded as a generalization of both low-rank representation and matrix completion. For the model of incomplete low-rank representation, we proposed an iterative scheme based on the augmented Lagrange multipliers method. Experimental results show that the proposed method is feasible and efficient in recovering the missing entries and the low-rank subspace structures.

Figure 3: NMI comparison on Extended Yale B.

Figure 4: Face images recovery of ILRR on Extended Yale B with SR = 0.3.

Figure 5: Face images recovery of ILRR on Extended Yale B with SR = 0.6.

Figure 6: Images from traffic sequences on the Hopkins 155 dataset.