A Generalized Robust Minimization Framework for Low-Rank Matrix Recovery

This paper considers the problem of recovering low-rank matrices that are heavily corrupted by outliers or large errors. To improve the robustness of existing recovery methods, the problem is formulated as a generalized nonsmooth nonconvex minimization functional by exploiting the Schatten p-norm (0 < p ≤ 1) and the ℓq (0 < q ≤ 1) seminorm. Two numerical algorithms are provided based on the augmented Lagrange multiplier (ALM) and accelerated proximal gradient (APG) methods together with efficient root-finding strategies. Experimental results demonstrate that the proposed generalized approach is more inclusive and effective than state-of-the-art methods, either convex or nonconvex.


1. Introduction
In many practical applications, such as removing shadows and specularities from face images, separating foregrounds from backgrounds in surveillance videos, ranking, and collaborative filtering, the observed data matrix D can naturally be decomposed into a low-rank matrix A and a corruption matrix E. That is, D = A + E, where the entries of E can be arbitrarily large but E is usually assumed to be sparse and unknown. The problem is whether it is possible to recover the low-rank matrix A from the observed D. Recently, it has been shown that the answer is affirmative as long as the corruption matrix E is sufficiently sparse and the rank of A is sufficiently low [1, 2]. That is to say, under certain conditions one can exactly recover the low-rank matrix A from D with high probability by solving the following convex optimization problem, the idealization of robust principal component analysis (RPCA):

min_{A,E} ‖A‖_* + λ‖E‖_1, s.t. D = A + E, (1)

where ‖A‖_* denotes the nuclear norm of the low-rank matrix A, that is, the sum of its singular values, ‖E‖_1 denotes the ℓ1 norm of the matrix E viewed as a vector, and λ is a positive tuning parameter. Recently, a lot of research has focused on solving the RPCA problem, for example, [3-6].
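For concreteness, the convex baseline (1) can be solved by the well-known inexact ALM scheme of the kind used in [3]. The following is a minimal NumPy sketch (not the code of any cited paper), alternating singular value thresholding for A with entrywise soft thresholding for E; default parameter values follow the common conventions for this scheme and are assumptions of this sketch:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: soft-threshold the singular values of M by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def rpca_alm(D, lam=None, mu=None, rho=1.5, tol=1e-7, max_iter=500):
    """Convex RPCA (1) by inexact ALM: min ||A||_* + lam*||E||_1  s.t.  D = A + E."""
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 1.25 / np.linalg.norm(D, 2)  # spectral norm of D
    Y = np.zeros_like(D)  # Lagrange multipliers
    E = np.zeros_like(D)
    for _ in range(max_iter):
        # A-step: singular value thresholding
        A = svt(D - E + Y / mu, 1.0 / mu)
        # E-step: entrywise soft thresholding
        T = D - A + Y / mu
        E = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        R = D - A - E
        Y = Y + mu * R   # multiplier update
        mu = rho * mu    # continuation on the penalty parameter
        if np.linalg.norm(R, 'fro') <= tol * np.linalg.norm(D, 'fro'):
            break
    return A, E
```

With a sufficiently low-rank A and sparse E, this baseline recovers both components to high accuracy, which is the behavior the generalized model of this paper aims to retain in harder regimes.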
In spite of the great theoretical success and wide practical application of RPCA (1), a major limitation should be noted, arising from the use of the nuclear and ℓ1 norms as regularizers. Specifically, compared with the intrinsic rank constraint rank(A) < r0, the nuclear norm regularizer not only penalizes the large singular values of A too heavily but also shrinks the perturbed small singular values too weakly. A similar analysis holds for the ℓ1 norm regularizer. Consequently, the performance of RPCA (1) in dimensionality reduction and outlier separation will not be as good as expected in some scenarios. Section 3 empirically demonstrates that RPCA (1) is not robust when either the matrix A is not sufficiently low-rank or the matrix E is more grossly corrupted.
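A small numeric illustration of this shrinkage bias (the spectrum below is hypothetical, not taken from the paper's experiments):

```python
import numpy as np

# Soft-thresholding the singular values (the proximal step behind the nuclear
# norm) subtracts the SAME amount tau from every singular value, so a large,
# informative singular value is distorted exactly as much as a small one.
s = np.array([10.0, 5.0, 0.3, 0.2])   # hypothetical spectrum: two signal values, two noise values
tau = 0.25
shrunk = np.maximum(s - tau, 0.0)
# The dominant value 10.0 is biased down to 9.75, a distortion the hard rank
# constraint would not introduce, while the small noise value 0.3 survives as
# 0.05 instead of being removed. A Schatten p-norm with p < 1 shrinks large
# values less and small values more.
```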
To improve the robustness of RPCA [3-6], this paper proposes a generalized nonsmooth nonconvex minimization framework for low-rank matrix recovery by exploiting the Schatten p-norm (0 < p ≤ 1) and the ℓq (0 < q ≤ 1) seminorm. Two numerical algorithms are deduced based on the augmented Lagrange multiplier (ALM) and accelerated proximal gradient (APG) methods together with efficient root-finding strategies. Experimental results show that the proposed generalized approach is more inclusive and effective than state-of-the-art methods [3-6]. Notice that, very recently, a nonconvex relaxation approach for low-rank matrix recovery [7] was proposed, exploiting a nonconvex penalty called minimax concave plus as well as a nonconvex loss function. However, our approach is different from [7] and is better in terms of recovery accuracy and robustness than [7] as well as the other two nonconvex methods [8, 9]. The paper is organized as follows. Section 2 provides the generalized nonsmooth nonconvex minimization framework, including the problem formulation and two numerical algorithms based on the ALM and APG methods. Section 3 verifies the recovery performance of the proposed method and compares it against state-of-the-art methods. Finally, the paper is concluded in Section 4.

2. Proposed Model and Algorithms
2.1. Problem Formulation. Taking into account both recovery robustness and computational efficiency, the Schatten p-norm is used to better approximate the intrinsic rank constraint rank(A) < r0; similarly, the ℓq seminorm is exploited to replace the ℓ1 norm of a matrix viewed as a vector. It is now natural to generalize RPCA as follows:

min_{A,E} ‖A‖_{S_p}^p + λ‖E‖_q^q, s.t. D = A + E, (2)

where 0 < p, q ≤ 1 and ‖A‖_{S_p} and ‖E‖_q are defined as follows. Assume D, A, E ∈ R^{m×n}; then the ℓq seminorm of a matrix E viewed as a vector can be defined as

‖E‖_q^q = Σ_{i,j} |E_{i,j}|^q, (3)

where E_{i,j} is the (i, j)th element of E, and the Schatten p-norm of a matrix A can be defined as

‖A‖_{S_p}^p = Σ_i σ_i^p, (4)

where σ_i is the ith singular value of A and the singular value decomposition (SVD) of A is A = UΣV^T. Clearly, when p = q = 1, (2) reduces to the convex RPCA problem (1); when 0 < p, q < 1, (2) corresponds to a constrained nonsmooth nonconvex minimization problem.

It now remains to study numerical iteration schemes for (2). In recent years, the signal processing and computational mathematics communities have shown growing interest in efficient algorithms for nonlinear nonsmooth optimization problems [10], such as iterative soft thresholding, split Bregman iteration, accelerated proximal gradient, and augmented Lagrange multiplier methods, which have significantly simplified sparse optimization problems including RPCA (1). In a similar spirit to [1, 3, 4, 7], this paper exploits the accelerated proximal gradient (APG) and augmented Lagrange multiplier (ALM) methods to solve the generalized minimization problem (2), considering that APG and ALM are currently the two most popular numerical algorithms. As for ALM, it has been shown [11] that, under fairly general conditions, ALM converges Q-linearly to the optimal solution. As for APG, though little is known about the convergence of the sequences it produces, the O(1/k²) rate of convergence of the objective function that it achieves is optimal [10]. However, note that the above convergence results are not applicable to the new problem (2) because of its nonsmooth, nonconvex properties. In spite of that, the empirical studies in Section 3 demonstrate that the two algorithms deduced in the following can both solve (2) very well, with empirically fast convergence.
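For reference, the two regularizers in (3) and (4) can be evaluated in a few lines of NumPy (a sketch, not the paper's code; the function names are this sketch's own):

```python
import numpy as np

def schatten_p(A, p):
    """||A||_{S_p}^p = sum_i sigma_i^p: the Schatten p-norm of A, raised to p."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(s ** p))

def lq_seminorm(E, q):
    """||E||_q^q = sum_{i,j} |E_{ij}|^q: the entrywise l_q seminorm, raised to q."""
    return float(np.sum(np.abs(E) ** q))
```

For p = q = 1 these reduce to the nuclear norm and the ℓ1 norm, recovering the convex RPCA objective (1).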

2.2. ALM-Based Algorithm. This subsection exploits ALM to solve problem (2), which is a nonconvex extension of [3]. First of all, define the functions f(A, E) and h(A, E) as

f(A, E) = ‖A‖_{S_p}^p + λ‖E‖_q^q, h(A, E) = D − A − E. (5)

According to ALM, the augmented Lagrange function for (2) is given as

L(A, E, Y, μ) = f(A, E) + ⟨Y, h(A, E)⟩ + (μ/2)‖h(A, E)‖_F^2, (6)

where Y is a matrix of Lagrange multipliers and μ ≥ 0 is the augmented Lagrange penalty parameter. It is seen that A and E can be solved for iteratively by alternating minimization of L(A, E, Y, μ). In the meanwhile, a continuation strategy is applied to μ ≥ 0 in order to improve both the accuracy and the efficiency of low-rank matrix recovery. Specifically, the iteration process is described in Algorithm 1.
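In each alternating step, the entrywise subproblems (and, via the SVD, the singular-value-wise subproblems) reduce to the scalar proximal problem min_x ½(x − y)² + τ|x|^q. The specific root-finder of [13] is not reproduced here; the following is a generic bisection-based sketch of such a proximal root-finder, where the function name and the bisection strategy are this sketch's own assumptions:

```python
import numpy as np

def prox_power(y, tau, q, iters=60):
    """Scalar proximal operator of tau*|x|^q (0 < q <= 1):
    argmin_x 0.5*(x - y)**2 + tau*|x|**q.
    For q < 1 the stationarity condition x + tau*q*x**(q-1) = |y| (x > 0) may
    have two roots; the minimizer is either 0 or the larger root, found here
    by bisection on the increasing branch of g."""
    a = abs(y)
    if q == 1.0:                      # reduces to ordinary soft thresholding
        return np.sign(y) * max(a - tau, 0.0)
    g = lambda x: x + tau * q * x ** (q - 1.0)
    # g attains its minimum at x_min; it decreases on (0, x_min], increases after
    x_min = (tau * q * (1.0 - q)) ** (1.0 / (2.0 - q))
    if x_min >= a or g(x_min) > a:    # no stationary point in (0, a]: prox is 0
        return 0.0
    lo, hi = x_min, a                 # g is increasing here and g(a) >= a
    for _ in range(iters):            # bisection for g(x) = a
        mid = 0.5 * (lo + hi)
        if g(mid) < a:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    # compare the objective at the candidate root against the objective at 0
    if 0.5 * (x - a) ** 2 + tau * x ** q < 0.5 * a ** 2:
        return np.sign(y) * x
    return 0.0
```

Applying this operator entrywise handles the ℓq step, and applying it to the singular values handles the Schatten p step; small inputs are thresholded to exactly zero, which is what enforces sparsity and low rank.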

2.3. APG-Based Algorithm. This subsection exploits APG together with the continuation technique to solve problem (2), which is a nonconvex extension of [4]. First of all, a relaxed minimization problem is produced from (2); that is,

min_{A,E} F(A, E) = ν(‖A‖_{S_p}^p + λ‖E‖_q^q) + g(A, E), (15)

where g(A, E) = ‖h(A, E)‖_F^2/2, h(A, E) = D − A − E, and ν ≥ 0 is a relaxation parameter. Obviously, F(A, E) is different from the Lagrange function of (2) in Section 2.2. However, instead of directly minimizing F(A, E), a sequence of separable quadratic approximations to F(A, E) is minimized, denoted Q(A, E, Λ, Γ; ν), where

Q(A, E, Λ, Γ; ν) = ν(‖A‖_{S_p}^p + λ‖E‖_q^q) + g(Λ, Γ) + ⟨∇_A g(Λ, Γ), A − Λ⟩ + ⟨∇_E g(Λ, Γ), E − Γ⟩ + (L/2)(‖A − Λ‖_F^2 + ‖E − Γ‖_F^2), (16)

with (Λ, Γ) the current proximal (extrapolation) point and L the Lipschitz constant of ∇g. To assure both the accuracy and efficiency of minimizing Q(A, E, Λ, Γ; ν), two key strategies are deliberately taken into account. For one thing, (Λ, Γ) are determined by the iterative smoothed computation suggested in [14]. For another, the continuation technique is also applied to ν ≥ 0, just as in Algorithm 1. Moreover, the stopping criterion is identical to the one proposed in [15] and utilized in [4]. The iteration scheme is presented specifically in Algorithm 2.
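A minimal sketch of the APG iteration with Nesterov extrapolation and continuation, written for the convex instance p = q = 1 so that the two proximal maps are plain thresholdings (for p, q < 1 they would be replaced by the root-finder-based proximal maps); the parameter names and default values below belong to this sketch, not to Algorithm 2:

```python
import numpy as np

def apg_rpca(D, lam, nu0=None, eta=0.9, nu_bar_ratio=1e-9, max_iter=300, tol=1e-7):
    """APG sketch for min nu*(||A||_* + lam*||E||_1) + 0.5*||D - A - E||_F^2
    (the p = q = 1 instance). Continuation shrinks nu geometrically toward
    nu_bar; (Lam, Gam) is the Nesterov extrapolation point."""
    nu = nu0 if nu0 is not None else np.linalg.norm(D, 2)
    nu_bar = nu * nu_bar_ratio
    A = np.zeros_like(D); Ap = A
    E = np.zeros_like(D); Ep = E
    t, tp = 1.0, 1.0
    L = 2.0                                     # Lipschitz constant of grad g
    for _ in range(max_iter):
        # extrapolation point
        Lam = A + ((tp - 1.0) / t) * (A - Ap)
        Gam = E + ((tp - 1.0) / t) * (E - Ep)
        G = Lam + Gam - D                       # gradient of g at (Lam, Gam), each block
        Ap, Ep = A, E
        # A-step: singular value soft thresholding at level nu/L
        U, s, Vt = np.linalg.svd(Lam - G / L, full_matrices=False)
        A = U @ np.diag(np.maximum(s - nu / L, 0.0)) @ Vt
        # E-step: entrywise soft thresholding at level lam*nu/L
        T = Gam - G / L
        E = np.sign(T) * np.maximum(np.abs(T) - lam * nu / L, 0.0)
        # momentum and continuation updates
        tp, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        nu = max(eta * nu, nu_bar)
        if np.linalg.norm(D - A - E, 'fro') <= tol * np.linalg.norm(D, 'fro'):
            break
    return A, E
```

The geometric decrease of ν plays the same role as the continuation on μ in the ALM scheme: early iterations are heavily regularized and stable, while late iterations approach the constrained problem.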
In the following, A_{k+1} and E_{k+1} can be solved for, respectively, by the separable subproblems (19) and (20). Similar to (12) and (13) in Section 2.2, both (19) and (20) are instances of the root-finding problem (14) and hence can be solved efficiently by borrowing the numerical idea in [13].

3.1. Experimental Settings
In this section, simulation experiments are designed and conducted to show the validity of the proposed approach. First, the available data are produced using D = A* + E*, in which A* and E* are, respectively, the true low-rank and sparse matrices that we wish to recover. Without loss of generality, A* is generated as the product X Y^T of two m × r matrices X and Y whose entries are sampled i.i.d. from the Gaussian distribution N(0, 1/m), and the sparse matrix E* is constructed by setting a proportion of its entries to ±1 and the rest to zero. More specifically, if r and spr denote, respectively, the matrix rank and the sparsity ratio, then A* and E* can be generated by a few lines of MATLAB (v7.0) script. In the following experiments, m is set to 500, r to 50, 100, 150, and 200, and spr to 5%, 10%, 15%, and 20%. Note that the matrix recovery problem (2) roughly changes from easy to hard as r or spr grows. To assess the accuracy of low-rank matrix recovery, the relative squared error (RSE) is used, defined as

RSE = ‖Ã − A*‖_F / ‖A*‖_F,

where Ã is the recovered low-rank matrix. The number of SVDs is used to evaluate computational efficiency, since the running time of Algorithms 1 and 2 as well as of [1, 3, 4, 7] is dominated by the SVD in each iteration. The experiments in this paper are conducted on a Lenovo computer equipped with an Intel(R) Core i5-3470 CPU (3.20 GHz) and 8 GB of RAM.

In the literature, although several different numerical algorithms for solving RPCA have been reported [3-6], the ALM method [3] has been shown to possess the best performance in both accuracy and efficiency. Hence, the next subsection compares Algorithm 1 with its convex, reduced version, that is, RPCA [3].
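The data-generation and evaluation protocol above can be sketched in Python (the paper's own scripts are MATLAB; the function names here are illustrative):

```python
import numpy as np

def make_problem(m=500, r=50, spr=0.05, rng=None):
    """Generate D = A* + E*: A* = X @ Y.T with X, Y in R^{m x r}, entries
    i.i.d. N(0, 1/m); E* has a proportion spr of entries set to +/-1."""
    if rng is None:
        rng = np.random.default_rng(0)
    X = rng.normal(0.0, np.sqrt(1.0 / m), size=(m, r))
    Y = rng.normal(0.0, np.sqrt(1.0 / m), size=(m, r))
    A_star = X @ Y.T
    E_star = np.where(rng.random((m, m)) < spr,
                      np.sign(rng.standard_normal((m, m))), 0.0)
    return A_star + E_star, A_star, E_star

def rse(A_hat, A_star):
    """Relative squared error of the recovered low-rank matrix."""
    return np.linalg.norm(A_hat - A_star, 'fro') / np.linalg.norm(A_star, 'fro')
```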

3.2. Comparison between Algorithm 1 and RPCA [3]
When implementing Algorithm 1, the parameters ρ, λ, ε, and the maximum iteration number are uniformly set as ρ = 1.5, λ = 1/√500, ε = 1e−7, and 100, and the continuation ceiling μ̄ is set as μ̄ = μ0 · 1e7, where μ0 is set as 1.25/σ1 and σ1 is the largest singular value of D. Besides, Y0 and E0 are both set as zero matrices. As for the values of p and q, we set both to 0.85 based on empirical studies, despite the fact that choices adaptive to different r and spr might produce more accurate recovery. Experimental results of Algorithm 1 and [3] are provided in Tables 1, 2, 3, and 4, corresponding to the different settings. When the sparsity ratio spr equals 5%, it is clearly observed that Algorithm 1 performs perfectly in recovering the true rank of A* and is better than [3] in terms of RSE. It is also noticed that the recovery accuracy of both Algorithm 1 and [3] decreases as the sparsity ratio spr becomes larger. Even then, Algorithm 1 still behaves better than [3] in terms of both RSE and true rank recovery.
One more point should be noted: slightly lower RSEs can be achieved by Algorithm 1 when the maximum iteration number is set to 200. However, since the improvement in recovery accuracy is very limited, we choose 100 for computational efficiency.

3.3. Comparison between Algorithm 2 and APG [4]

When running Algorithm 2, the parameters η, λ, ε, and the maximum iteration number are uniformly set as η = 0.9 < 1, λ = 1/√500, ε = 1e−7, and 200, and the continuation floor ν̄ is set as ν̄ = ν0 · 1e−9, where ν0 is set to the largest singular value σ1 of D. In addition, A0, A−1, E0, E−1, Λ0, and Γ0 are all set as zero matrices. As for p and q, in a similar manner they are both set to 0.9 based on intensive empirical studies. Experimental results of Algorithm 2 and [4] are provided in Tables 5, 6, 7, and 8, corresponding to the different settings. It is remarkable that Algorithm 2 recovers the true rank of A* in almost all scenarios, in which respect it is much superior to Algorithm 1. Its second advantage over Algorithm 1 is that it achieves slightly more robust recovery when E* is very grossly corrupted and A* is not sufficiently low-rank, for example, when rank(A*) = 200. In spite of that, in the majority of the other cases Algorithm 1 outperforms Algorithm 2 by roughly an order of magnitude in terms of RSE. Therefore, it can be concluded that both algorithms possess their own advantages and disadvantages; on the whole, Algorithm 1 shows better performance in terms of both recovery accuracy and efficiency.

3.4. Comparison between the Proposed Approach and [7]

In the literature, several nonconvex approaches for low-rank matrix recovery have also been proposed, for example, [7-9]. However, only [7] claims to outperform the ALM-based RPCA [3] in terms of recovery accuracy. Table 9 presents the RSE, the number of SVDs, and the recovered rank achieved by [7] with sparsity ratios equal to 5% and 20%. Comparing Tables 1, 4, 5, 8, and 9, we can claim that both Algorithms 1 and 2 outperform [7] in terms of RSE and true rank recovery when A* is not sufficiently low-rank or E* is very grossly corrupted. In the meanwhile, we should also note that our method is computationally less efficient than [7] because of the slightly larger number of SVDs used in each iteration; improving this is one of the future works to be studied.

3.5. Empirical Analysis on the Convergence of Algorithms 1 and 2
As mentioned earlier, the existing convergence results on ALM and APG in [10, 11] are not applicable to problem (2) owing to the use of the nonconvex ℓq seminorm and Schatten p-norm, which also makes a theoretical convergence analysis of the proposed algorithms difficult. In spite of that, an empirical analysis can be made by plotting the residual error curve against the iteration number for each algorithm. Specifically, the residual error curves for both Algorithms 1 and 2 with the sparsity ratio equal to 20% are shown, respectively, in Figures 1 and 2. It is obvious that the two deduced algorithms converge empirically fast in each recovery scenario. Actually, this observation also holds for the other, easier recovery cases with lower sparsity ratios.
In addition, the number of iterations required for each recovery problem can be read off from the residual error curves.

4. Conclusions and Discussions
In this paper, a generalized robust minimization framework is proposed for low-rank matrix recovery by exploiting the Schatten p-norm (0 < p ≤ 1) and the ℓq (0 < q ≤ 1) seminorm. Two numerical algorithms are deduced based on the ALM and APG methods together with efficient root-finding techniques. Experimental results demonstrate that the proposed algorithms possess their own advantages and disadvantages and that both perform more effectively than state-of-the-art methods, either convex or nonconvex, in terms of both RSE and true rank recovery.
Note that this paper does not consider the influence of additive noise on the proposed algorithms, which corresponds to the problem of noisy RPCA [16, 17]. As claimed in [17], noisy RPCA is intrinsically different from the RPCA problem that is the focus of this paper. Indeed, the proposed algorithms are not very robust to additive noise, just like many existing approaches to RPCA, for example, [1-4, 6-9]. To some degree, this observation coincides with the investigations in [18, 19]; that is, the ℓq seminorm as a sparsity-enforcing penalty is vulnerable to additive noise in the data, since it resembles the ℓ0 seminorm as q approaches 0, despite the fact that q in Algorithms 1 and 2 is chosen as 0.85 and 0.9, respectively. Our future research topic is to extend the proposed algorithms to the noisy RPCA problem, with applications to the field of image and vision computing.