A Proximal Fully Parallel Splitting Method for Stable Principal Component Pursuit

As a special case of three-block separable convex programming, the stable principal component pursuit (SPCP) problem arises in many different disciplines, such as statistical learning, signal processing, and web data ranking. In this paper, we propose a proximal fully parallel splitting method (PFPSM) for solving SPCP, in which all of the resulting subproblems admit closed-form solutions and can be solved in a distributed manner. Compared with other similar algorithms in the literature, PFPSM attaches a Glowinski relaxation factor η ∈ (√3/2, 2/√3) to the updating formula for its Lagrange multiplier, which can be used to accelerate the convergence of the generated sequence. Under mild conditions, the global convergence of PFPSM is proved. Preliminary computational results show that the proposed algorithm works well in practice.


Introduction
Recovering the low-rank and sparse components of a given matrix is a fundamental task in many scientific fields, which can be modeled as the following two-block separable nonconvex and nonsmooth program:

min_{L,S} rank(L) + τ‖S‖_0, s.t. L + S = M, (1)

where M ∈ R^{m×n} is a given matrix, rank(L) is the rank of the matrix L ∈ R^{m×n}, ‖S‖_0 denotes the ℓ0-norm of S, defined as the number of nonzero entries of S, and τ > 0 is a trade-off parameter balancing low rank against sparsity. Recent studies [1-3] indicate that the NP-hard task (1) can be accurately accomplished by solving one of its convex relaxations: the principal component pursuit (PCP) problem, in which the rank and the ℓ0-norm are replaced by their tightest convex relaxations, namely the nuclear norm and the ℓ1-norm, respectively. The PCP problem can be mathematically modeled as the following optimization problem:

min_{L,S} ‖L‖_* + τ‖S‖_1, s.t. L + S = M, (2)

where the nuclear norm ‖L‖_* is the sum of all singular values of L, which induces the low-rank component of the given matrix M, and the ℓ1-norm ‖S‖_1 is the sum of the absolute values of all entries of S, which induces the sparse component of M.
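To make the two convex surrogates concrete, the following minimal Python sketch (illustrative only; not part of the paper's experiments, which were run in Matlab) evaluates the nuclear norm and the ℓ1-norm on a toy rank-1-plus-sparse matrix:

```python
import numpy as np

def nuclear_norm(L):
    # Nuclear norm: sum of singular values, the convex surrogate of rank(L)
    return np.linalg.svd(L, compute_uv=False).sum()

def l1_norm(S):
    # l1-norm: sum of absolute entries, the convex surrogate of ||S||_0
    return np.abs(S).sum()

# Toy decomposition: a rank-1 matrix plus a 1-sparse matrix
L = np.outer([1.0, 2.0], [3.0, 4.0])    # rank(L) = 1
S = np.array([[0.0, 5.0], [0.0, 0.0]])  # exactly one nonzero entry
M = L + S
```

Here ‖L‖_* equals 5√5 (the single nonzero singular value of the rank-1 matrix) and ‖S‖_1 equals 5, matching the definitions above.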
To capture even more applications, Zhou et al. [4] considered the case in which the observation matrix M is corrupted by both small entry-wise noise and gross sparse errors. A more general case was then studied by Tao and Yuan [5], wherein only a fraction of the entries of the given matrix M can be observed; they suggested recovering the low-rank and sparse components of M by solving the following stable principal component pursuit (SPCP) problem:

min_{L,S} ‖L‖_* + τ‖S‖_1, s.t. ‖P_Ω(M − L − S)‖ ≤ δ, (3)

where δ > 0 is related to the impulsive or Gaussian noise level, ‖·‖ denotes the Frobenius norm, Ω ⊆ {1, 2, ..., m} × {1, 2, ..., n} is the index set of the observable entries of M, and P_Ω : R^{m×n} → R^{m×n} is the projection operator defined by

[P_Ω(X)]_{ij} = X_{ij} if (i, j) ∈ Ω, and 0 otherwise. (4)

In [6], Hou et al. studied the following penalized version of (3):

min_{L,S} ‖L‖_* + τ‖S‖_1 + (1/(2μ))‖P_Ω(M − L − S)‖², (5)

where μ > 0 is a penalty parameter. Problems (3) and (5) are both convex optimization programs, which can be equivalently reformulated as semidefinite programming (SDP) problems and then solved in polynomial time by state-of-the-art SDP solvers, such as SeDuMi [7] and SDPT3 [8]. However, these solvers are not applicable to high-dimensional instances of (3) and (5), because a linear system needs to be solved (in)exactly at each iteration, which may be costly and require a very large amount of memory in actual implementations [9]. By introducing an auxiliary variable Z ∈ R^{m×n}, model (5) can be reformulated as the three-block separable problem

min_{L,S,Z} ‖L‖_* + τ‖S‖_1 + (1/(2μ))‖P_Ω(Z)‖², s.t. L + S + Z = M, (6)

which decouples the term ‖P_Ω(M − L − S)‖² of (5). Recently, there has been an intensive study of iterative methods for solving (6) [5, 6, 10-12]. Interestingly, these methods usually exploit only the first-order information of the objective function of (6), and most of them are based on the well-known alternating direction method of multipliers (ADMM), such as the sequential ADMM [11], the partially parallel ADMMs [5, 6, 10, 12], and the fully parallel ADMMs [13, 14]. As an influential first-order iterative method in optimization, ADMM was independently proposed by Glowinski and Marrocco [15] and Gabay and Mercier [16] about forty years ago, and due to its high efficiency in the numerical
simulation, ADMM has received considerable attention in recent years. ADMM can be viewed as an application of the Douglas-Rachford splitting method to the dual of the two-block separable convex program (2) [17], as a special case of the proximal point method for general convex programming [18], or as a generalization of the classical Uzawa method for solving saddle-point problems [19]. Chen et al. [20] showed by a simple counterexample that the sequential ADMM cannot be directly extended to three-block separable convex programming, and similarly He et al. [13] showed that the fully parallel ADMM may diverge even for two-block separable convex programming. There are three strategies to deal with this issue: (1) the substitution procedure [13, 21]; (2) the technique of regularization [14, 22]; (3) the technique of small step sizes [23]. For other recent developments of ADMM, including (sub)linear convergence rates, various acceleration techniques, indefinite regularization matrices, larger step sizes, and its generalization to nonconvex and nonsmooth multiblock separable programming, we refer to [24-28].
In this paper, we focus our attention on the fully parallel ADMM, which is based on the substitution procedure and is more suitable for distributed computation. Following the full Jacobian decomposition of the augmented Lagrangian method (FJDALM) in [13], we propose a new fully parallel ADMM, termed the proximal fully parallel splitting method (PFPSM) in the following analysis. Compared to other similar methods in the literature, the new iterative method has three properties. The first is that all of its subproblems are regularized by proximal terms, which can enhance its numerical efficiency in practice. The second is that a Glowinski relaxation factor η ∈ (√3/2, 2/√3) is attached to the updating formula for its Lagrange multiplier, and larger values of η can often accelerate the convergence of the proposed method (as reported in Section 4). To the best of our knowledge, this is the first fully parallel ADMM-type method with the Glowinski relaxation factor for solving problem (6). The third is that it dynamically updates the step size α_k.
The remainder of this paper is organized as follows. In Section 2, we present some necessary notations and useful lemmas. In Section 3, we propose the proximal fully parallel splitting method for solving problem (6) and prove its global convergence step by step. Some numerical results for the proposed method are shown and discussed in Section 4. Finally, in Section 5, some concluding remarks are drawn and several future research directions are also discussed.

Preliminaries
In this section, we briefly introduce some necessary notations, definitions, and lemmas. For any matrices A, B ∈ R^{m×n}, ⟨A, B⟩ := tr(A^⊤B) denotes the standard inner product and ‖A‖ := √⟨A, A⟩ the Frobenius norm. For a convex function f : R^n → R, we have the following basic inequality:

f(y) ≥ f(x) + ⟨ξ, y − x⟩, for all ξ ∈ ∂f(x) and all y ∈ R^n, (8)

where ∂f(x) = {ξ ∈ R^n : f(y) ≥ f(x) + ⟨ξ, y − x⟩ for all y ∈ R^n} denotes the subdifferential of f(·) at the point x. The above definition and property (8) extend easily to functions of matrix variables.
For the convenience of our analysis, let us relabel L as X1, S as X2, and Z as X3, and denote θ1(X1) = ‖X1‖_*, θ2(X2) = τ‖X2‖_1, and θ3(X3) = (1/(2μ))‖P_Ω(X3)‖². We make the following standard assumption about problem (6) in this paper.
Assumption 1. The generalized Slater condition holds; that is, there is a point (X̄1, X̄2, X̄3) ∈ ri(dom(θ1) × dom(θ2) × dom(θ3)) ∩ F, where the feasible set F is defined by F = {(X1, X2, X3) : X1 + X2 + X3 = M}.

Under Assumption 1, it follows from Theorems 3.22 and 3.23 of [29] that (X1*, X2*, X3*) is an optimal solution of problem (6) if and only if there exists a matrix Λ* ∈ R^{m×n} such that (X1*, X2*, X3*, Λ*) is a solution of the following KKT system:

Λ* ∈ ∂θ1(X1*), Λ* ∈ ∂θ2(X2*), Λ* ∈ ∂θ3(X3*), X1* + X2* + X3* = M. (10)

The following lemma is the basis for reformulating (10) as a mixed variational inequality problem.
Furthermore, we set W = (X1, X2, X3, Λ), θ(X) = θ1(X1) + θ2(X2) + θ3(X3), and

F(W) = (−Λ, −Λ, −Λ, X1 + X2 + X3 − M). (17)

Obviously, (10) can be written as the following mixed variational inequality problem, abbreviated as VI(W, F, θ): find a matrix W* = (X1*, X2*, X3*, Λ*) ∈ W such that

θ(X) − θ(X*) + ⟨W − W*, F(W*)⟩ ≥ 0, for all W ∈ W,

where W = R^{m×n} × R^{m×n} × R^{m×n} × R^{m×n}. The set of solutions of VI(W, F, θ), denoted by W*, is nonempty by Assumption 1 and Remark 3. Because F(W) is a linear mapping and its coefficient matrix is skew-symmetric, it satisfies the following nice property:

⟨W − W̄, F(W) − F(W̄)⟩ = 0, for all W, W̄ ∈ W.

The Proximal Fully Parallel Splitting Method
In this section, we shall present the proximal fully parallel splitting method for solving (6) and discuss its convergence results.
Firstly, the augmented Lagrangian function of the convex program (6) can be written as

L_β(X1, X2, X3, Λ) = θ1(X1) + θ2(X2) + θ3(X3) − ⟨Λ, X1 + X2 + X3 − M⟩ + (β/2)‖X1 + X2 + X3 − M‖², (20)

where β > 0 is the penalty parameter and Λ ∈ R^{m×n} is the Lagrange multiplier for the linear constraint of (6). To simplify notation in the later analysis, we define a block diagonal, symmetric positive definite matrix G whose first three diagonal blocks are R_i + βI, where R_i (i = 1, 2, 3) are symmetric positive semidefinite matrices, and whose last diagonal block is determined by β and the predefined parameter η ∈ (√3/2, 2/√3). Note that the matrix G is symmetric positive definite.
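For illustration, the augmented Lagrangian above can be evaluated numerically. The following Python sketch is an illustration only, with θ1, θ2, θ3 as defined in Section 2, a boolean array standing in for the index set Ω, and all function names our own:

```python
import numpy as np

def theta1(X1):
    # theta_1(X1) = ||X1||_* (nuclear norm)
    return np.linalg.svd(X1, compute_uv=False).sum()

def theta2(X2, tau):
    # theta_2(X2) = tau * ||X2||_1
    return tau * np.abs(X2).sum()

def theta3(X3, mask, mu):
    # theta_3(X3) = (1/(2*mu)) * ||P_Omega(X3)||_F^2
    return np.sum(np.where(mask, X3, 0.0) ** 2) / (2.0 * mu)

def aug_lagrangian(X1, X2, X3, Lam, M, mask, tau, mu, beta):
    # L_beta = theta(X) - <Lam, X1+X2+X3-M> + (beta/2)*||X1+X2+X3-M||_F^2
    R = X1 + X2 + X3 - M  # residual of the linear constraint
    return (theta1(X1) + theta2(X2, tau) + theta3(X3, mask, mu)
            - np.sum(Lam * R) + 0.5 * beta * np.sum(R ** 2))
```

At any feasible point (R = 0) the multiplier and penalty terms vanish, so L_β reduces to the objective value θ1 + θ2 + θ3, as expected.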
Step 3 (correction step). Generate the new iterate by

W^{k+1} = W^k − α_k H^{-1} G (W^k − W̃^k), (26)

where H is any symmetric positive definite matrix and α_k is the step size defined by

α_k = φ(W^k, W̃^k) / ‖G(W^k − W̃^k)‖²_{H^{-1}}, (27)

with φ(W^k, W̃^k) given by (28). Set k := k + 1 and go to Step 1.
Remark 5.In PFPSM, we apply the classical proximal point algorithm (PPA) to regularize its subproblems.Compared to the iteration method in [13], the new iteration method PFPSM extends the scope of  from 1 to the interval ( √ 3/2, 2/ √ 3), and larger values of  are often more preferred in practice; see Figure 2. Furthermore, compared to the iteration method in [14], the new iteration method PFPSM removes the restriction imposed on the regularized matrices   ( = 1, 2, 3); see condition (2.10) in [14].
The following lemma provides a lower bound on φ(W^k, W̃^k), that is, the numerator of the step size α_k; this lower bound plays an important role in the proof of the global convergence of PFPSM.

Lemma 6. Let W̃^k = (X̃1^k, X̃2^k, X̃3^k, Λ̃^k) be the matrix generated by PFPSM from the given W^k, and let φ(W^k, W̃^k) be defined by (28). Then we have

φ(W^k, W̃^k) ≥ α_0 ‖W^k − W̃^k‖²_G, (29)

where α_0 > 0 is a constant depending only on the relaxation factor η.

Proof. By the matrix form of the Cauchy-Schwarz inequality, it follows from (23) and (28) that (29) holds, where the first inequality comes from (31) and the second inequality follows from η ∈ (√3/2, 2/√3). The proof is completed.
Remark 7. It follows from (27) and (29) that α_k is bounded below by a positive quantity involving λ_min(H), where λ_min(H) is the minimum eigenvalue of the positive definite matrix H. Therefore the sequence {α_k} generated by PFPSM is bounded away from zero. In particular, if we set H = G, then α_k ≥ α_0. Now, we shall show that the stopping criterion (25) is reasonable.
Lemma 8. If W^k = W̃^k, then W̃^k = (X̃1^k, X̃2^k, X̃3^k, Λ̃^k) is a solution of VI(W, F, θ).

Proof. Deriving the first-order optimality condition of the X1-subproblem in (24) and substituting into it the updating formula for Λ in (24), we obtain a variational inequality characterizing X̃1^k. Similarly, we obtain the corresponding variational inequalities for the other two variables X2 and X3. Furthermore, the updating formula for Λ in (24) can itself be rewritten as a variational inequality. Adding the above four inequalities and rearranging, one has (41). Therefore, substituting X_i^k = X̃_i^k (i = 1, 2, 3) and Λ^k = Λ̃^k into (41), we get (42), which indicates that (X̃1^k, X̃2^k, X̃3^k, Λ̃^k) is a solution of VI(W, F, θ). This completes the proof.
If the iteration method PFPSM terminates at Step 2, then the iterate W^k is an approximate solution of VI(W, F, θ). Thus, it is assumed, without loss of generality, that PFPSM generates an infinite sequence {W^k}. Now, we intend to prove that −H^{-1}G(W^k − W̃^k) is a descent direction of the merit function (1/2)‖W − W*‖²_H at the point W = W^k, where W* ∈ W*.

Lemma 9. Let {W^k} and {W̃^k} be the two sequences generated by PFPSM. Then one has

⟨W^k − W*, G(W^k − W̃^k)⟩ ≥ φ(W^k, W̃^k), for all W* ∈ W*. (44)

Proof. The claim follows by combining the optimality conditions of the prediction step with the skew-symmetry property of F, where the second inequality in the resulting chain follows from W* ∈ W*, and the second equality comes from (24). Therefore, we have (44), which completes the proof of the lemma.
From inequalities (29) and (44), one has

⟨W^k − W*, G(W^k − W̃^k)⟩ ≥ α_0 ‖W^k − W̃^k‖²_G > 0,

which indicates that −H^{-1}G(W^k − W̃^k) is a descent direction of the merit function (1/2)‖W − W*‖²_H at W = W^k. Therefore, it is natural to generate the new iterate W^{k+1} along this direction, as in the iterative scheme (26). Now, let us investigate how to choose a suitable step size to achieve as much progress as possible. The new iterate with an undetermined step size α is denoted by W^{k+1}(α), so one has W^{k+1}(α) = W^k − α H^{-1} G (W^k − W̃^k). Let

Θ_k(α) = ‖W^k − W*‖²_H − ‖W^{k+1}(α) − W*‖²_H

be the profit function of the k-th iteration. However, we cannot maximize Θ_k(α) directly because W* ∈ W* is unknown. Motivated by the strategy in [13], we derive a tight lower bound of Θ_k(α) in the next lemma, which is independent of the unknown matrix W*, to obtain a suboptimal step size.
It follows from Lemma 9 that Φ_k(α) := 2αφ(W^k, W̃^k) − α²‖G(W^k − W̃^k)‖²_{H^{-1}} is a lower bound of Θ_k(α) for any α > 0. Therefore, we can maximize Φ_k(α) to accelerate the convergence of PFPSM. Note that Φ_k(α) is a quadratic function of α and reaches its maximum at the α_k defined by (27). The following theorem shows that the sequence {W^k} generated by PFPSM is Fejér monotone with respect to the solution set W* of VI(W, F, θ).
where the second equality follows from the definition of α_k in (27), and the second inequality comes from (29) and (33). Assertion (52) is thus proved.
Remark 12. For very large-scale instances of (6), computing the step size α_k defined by (27) is tedious and computationally demanding. For such instances, one can set H = G in PFPSM, and the step size α_k can then be fixed as the constant α_0 defined in Lemma 6.
In this case, the new iterate can be generated by

W^{k+1} = W^k − α_0 (W^k − W̃^k).

Then, referring to the proof of the above theorem, we obtain the corresponding contraction inequality (56). Based on the Fejér monotonicity of the sequence {W^k} generated by PFPSM, the global convergence of PFPSM can be proved by arguments similar to those in [13], which are omitted here.

Numerical Results
In this section, we run a series of numerical experiments, applying the proposed PFPSM to some concrete examples of SPCP (6). We compare it with the splitting method for separable convex programming (denoted by HTY) in [30], the full Jacobian decomposition of the augmented Lagrangian method (denoted by FJDALM) in [13], and the fully parallel ADMM (denoted by PADMM) in [14]. Through these numerical experiments, we want to illustrate that (1) suitable proximal terms can enhance PFPSM's numerical efficiency in practice, (2) larger values of the Glowinski relaxation factor η can often accelerate PFPSM's convergence speed, and (3) compared with the other three ADMM-type methods, the dynamically updated step size α_k defined in (27) can accelerate the convergence of PFPSM. All codes were written in Matlab R2010a and run on a ThinkPad notebook with a Pentium(R) Dual-Core CPU T4400 @ 2.2 GHz and 4 GB of memory.
Now, let us first introduce two shrinkage operators, which are used to solve the subproblems of the four tested methods. Let τ > 0 be a constant, let Y ∈ R^{m×n} be a given matrix, and let Y = UΣV^⊤ be the singular value decomposition (SVD) of Y. Then the solution of the optimization problem

min_X { τ‖X‖_* + (1/2)‖X − Y‖² }

is given by the singular value shrinkage operator D_τ(Y) ∈ R^{m×n}, which is defined by [31]

D_τ(Y) = U max{Σ − τI, 0} V^⊤, (58)

where the maximum is taken entrywise. The solution of the optimization problem

min_X { τ‖X‖_1 + (1/2)‖X − Y‖² }

is given by the soft shrinkage operator S_τ(Y) ∈ R^{m×n}, which is defined componentwise by [32]

(S_τ(Y))_{ij} = sign(Y_{ij}) max{|Y_{ij}| − τ, 0}, (60)

where sign(·) is the sign function.
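Both shrinkage operators are short to implement; the following sketch (illustrative Python/NumPy, rather than the Matlab used in the original experiments) implements D_τ and S_τ:

```python
import numpy as np

def svt(Y, tau):
    # D_tau: singular value shrinkage, the prox of tau*||.||_* at Y
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft(Y, tau):
    # S_tau: entrywise soft shrinkage, the prox of tau*||.||_1 at Y
    return np.sign(Y) * np.maximum(np.abs(Y) - tau, 0.0)
```

For example, svt(diag(5, 1), 2) shrinks the singular values {5, 1} to {3, 0}, and soft thresholds each entry toward zero by τ.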
In the following, let us discuss how to solve the three subproblems of PFPSM. In fact, these three subproblems are simple enough to admit closed-form solutions. For simplicity, we set R_i = νI (i = 1, 2, 3) with ν ≥ 0 in PFPSM. Based on the two shrinkage operators (58) and (60), the closed-form solutions of the three subproblems in the prediction step of PFPSM are as follows: (i) the X1-subproblem is solved by the singular value shrinkage operator D; (ii) the X2-subproblem is solved by the soft shrinkage operator S; (iii) the X3-subproblem is a strongly convex quadratic program, and its explicit solution follows directly from its first-order optimality condition.

4.1. Synthetic Data. In the experiment, following [5], we used the following synthetic data: the true low-rank matrix X1* = L* = L1 L2^⊤ is generated using normally distributed L1 ∈ R^{m×r} and L2 ∈ R^{n×r}. Hence, the rank of L* is r, and the low-rank ratio of L* is defined by rr = r/m. The nonzero entries of the sparse matrix X2* = S* are generated uniformly in [−500, 500]; its sparsity ratio is denoted by spr, defined as ‖S*‖_0/(mn), where ‖S*‖_0 represents the number of nonzero entries of S*. Then we form the matrix M = L* + S*. Here, we consider that only a fraction of the entries of M can be observed; the indices of the observed entries are collected in the index set Ω, which is determined randomly with a certain sample ratio sr = (100|Ω|/(mn))%, where |Ω| is the cardinality of Ω. We then obtain the matrix P_Ω(M), which is further corrupted by additive Gaussian noise with standard deviation σ = 10^{−3}; that is, the observed matrix is P_Ω(M) + σ·randn(m, n). We test the following scenarios: m = n ∈ {100, 200}, sr ∈ {0.9, 0.8, 0.7}, rr ∈ {0.05, 0.1}, and spr ∈ {0.05, 0.1}. The parameters τ and μ in (6) are set as τ = 1/√m and μ = (√m + √(8m))/10, respectively.
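The synthetic-data generation described above can be sketched as follows (a Python illustration; the fixed seed and the use of a boolean mask for Ω are our assumptions, not details from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed, assumed for reproducibility
m = n = 100
rr, spr, sr, sigma = 0.05, 0.05, 0.8, 1e-3

r = int(rr * m)                      # target rank r, so rr = r/m
L_star = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # low-rank part

S_star = np.zeros((m, n))            # sparse part, nonzeros uniform in [-500, 500]
idx = rng.random((m, n)) < spr
S_star[idx] = rng.uniform(-500.0, 500.0, idx.sum())

mask = rng.random((m, n)) < sr       # Omega: observed entries, sample ratio ~sr
M = L_star + S_star
M_obs = np.where(mask, M, 0.0) + sigma * rng.standard_normal((m, n))
```

With these settings L_star has rank r = 5, and M_obs plays the role of the noisy, partially observed data fed to the tested methods.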
Let us first test the sensitivity of PFPSM with respect to the proximal parameter ν and the Glowinski relaxation factor η, so as to clarify the numerical advantage of attaching the parameters ν and η to the iterative scheme of PFPSM. We choose ν = 0 : 0.1 : 1 and fix β = 1.5, η = 1.15, m = n = 200, rr = spr = 0.05, and sr = 0.8. The maximum number of iterations is set to 500. The numerical results are plotted in Figure 1, in which "RLR" denotes the relative error of the recovered low-rank matrix (‖X̃1 − X1*‖/‖X1*‖) and "RSP" denotes the relative error of the recovered sparse matrix (‖X̃2 − X2*‖/‖X2*‖) when the stopping criterion (65) is satisfied. Figure 1 shows that small values of ν > 0 can improve the accuracy of the solutions, while large values of ν > 0 can accelerate the convergence speed of PFPSM. Taking into consideration the numerical performance of the other compared methods, in the following we set ν = 0.9. Next, let us analyze the sensitivity of the parameter η. We choose η = 0.86 : 0.02 : 1.15 and fix β = 1.5, ν = 0.9, m = n = 100, rr = spr = 0.05, and sr = 0.8. The maximum number of iterations is also set to 500. The numerical results are plotted in Figure 2. The two subplots in Figure 2 indicate that the number of iterations decreases and the elapsed CPU time has a decreasing tendency as the parameter η increases, which verifies the numerical advantage of PFPSM with large values of η. Based on this observation, we set η = 1.15 in the following experiments. Now, we compare PFPSM with HTY, FJDALM, and PADMM to examine its numerical advantage in terms of the number of iterations (denoted by "Iter."), the elapsed CPU time in seconds (denoted by "Time"), the relative error of the recovered low-rank matrix, and the relative error of the recovered sparse matrix when the stopping criterion (65) is satisfied. Some numerical results are listed in Tables 1 and 2, which report the average results over 20 trials for each setting of parameters.
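The error measures RLR and RSP used above are plain relative Frobenius-norm errors; a minimal sketch (illustrative Python, with hypothetical inputs):

```python
import numpy as np

def rel_err(X_hat, X_star):
    # e.g. RLR = ||X1_tilde - X1*||_F / ||X1*||_F, RSP analogously for X2
    return np.linalg.norm(X_hat - X_star) / np.linalg.norm(X_star)

X_star = np.eye(3)
X_hat = X_star + 1e-3 * np.ones((3, 3))  # a hypothetical near-exact recovery
```

A value near zero indicates accurate recovery; the quantity is dimensionless, so it is comparable across problem sizes and scalings.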
The numerical results in Tables 1 and 2 indicate the following: (1) from the "RLR" and "RSP" criteria, the tested methods all successfully recover the given matrix; (2) from the "Iter." and "Time" criteria, our new method PFPSM performs better than the other three tested methods. It is almost the fastest in all cases, and the reason may be that a suitable choice of the parameter η can accelerate the convergence of PFPSM; (3) compared to HTY, PFPSM is a fully parallel splitting method, for which we only recorded the maximum time taken by its three subproblems. Thus, we observe in Tables 1 and 2 that PFPSM takes less CPU time than HTY, even though it needs to compute a more involved correction step at each iteration and/or takes more iterations to terminate than HTY. In fact, taking the case m = n = 100, rr = spr = 0.05, sr = 0.9 as an example, we find that the correction step of PFPSM takes about 7.26% of its whole CPU time, while the computation of the recovered sparse matrix and the simple correction step of HTY take about 13.10% of its whole CPU time. In conclusion, the proposed PFPSM can be used as a surrogate for HTY, FJDALM, and PADMM.
In Figures 3 and 4, we display the evolution of ‖X̃1^k − X1*‖/‖X1*‖ and ‖X̃2^k − X2*‖/‖X2*‖ with respect to the iteration counter k, respectively, from which we can see that PFPSM performs better than FJDALM and PADMM and slightly worse than HTY in terms of accuracy. The reason may be that PFPSM, FJDALM, and PADMM are fully parallel methods, while HTY is a partially parallel method; therefore, HTY can fully utilize the latest iteration information to speed up its convergence. The relationship is just like that between the Jacobi and Gauss-Seidel iterative methods for linear systems of equations. However, as illustrated in Table 1, the CPU time of PFPSM is almost always smaller than that of HTY, which is the main advantage of fully parallel methods over partially parallel or sequential methods.
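The Jacobi/Gauss-Seidel analogy can be made concrete on a tiny linear system. In the sketch below (illustrative only, unrelated to the paper's code), the Jacobi step updates all components in parallel from the previous iterate, while the Gauss-Seidel step immediately reuses freshly updated components:

```python
import numpy as np

def jacobi_step(A, b, x):
    # All components updated in parallel from the previous iterate x
    D = np.diag(A)
    return (b - A @ x + D * x) / D

def gauss_seidel_step(A, b, x):
    # Components updated one at a time, reusing the latest values
    x = x.copy()
    for i in range(len(b)):
        x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # diagonally dominant: both schemes converge
b = np.array([1.0, 2.0])
xj = np.zeros(2)
xg = np.zeros(2)
for _ in range(50):
    xj = jacobi_step(A, b, xj)
    xg = gauss_seidel_step(A, b, xg)
```

Both iterates converge to the solution of Ax = b; Gauss-Seidel typically converges in fewer iterations because it uses the latest information, while the Jacobi updates can all be computed simultaneously — exactly the trade-off between HTY and the fully parallel methods discussed above.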

Background Extraction from Surveillance Video with Missing and Noisy Data

In this subsection, we show the efficiency of the proposed method on a real-world problem: background extraction from surveillance video with missing and noisy data. In the experiment, following [6], we focus on a well-tested video taken in an airport, which consists of 50 grayscale frames, each with 144 × 176 pixels. We vectorize all frames of the video and obtain a matrix M ∈ R^{25344×50} in which each column represents a frame; the matrix is further contaminated with additive Gaussian noise of mean zero and standard deviation σ = 10^{−3}. Similar to the last experiment, only a fraction of the entries of M can be observed, and the indices of the observed entries are collected in the index set Ω. In the experiment, we set τ = 1/√m and μ = 0.01, and the initial iterate is set to zero. The other parameters remain the same as those used in the last experiment for all the tested algorithms. The numerical results are given in Table 3, where "Obj." represents the value of the objective function of (6) when the tested methods terminate.
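Stacking video frames into the data matrix M can be sketched as follows (a Python illustration with random stand-in frames; the real experiment uses the airport video):

```python
import numpy as np

frames = np.random.default_rng(1).random((50, 144, 176))  # stand-in grayscale frames
M = frames.reshape(50, -1).T   # vectorize: each column of M is one frame
# M has shape (144*176, 50) = (25344, 50), as in the experiment
```

The low-rank component of M then captures the (nearly static) background shared across columns, while the sparse component captures the moving foreground.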
Table 3 shows that although PFPSM takes more iterations than HTY on all test instances, PFPSM performs much better than the other two methods in recovering the low-rank and sparse components. In Figure 5, we exhibit a frame of the tested video and the separation results with 30% random missing data. Clearly, all the tested methods can extract the background from the corrupted frames quite well.

Conclusion
In this paper, we have proposed a new fully parallel splitting method for the stable principal component pursuit problem, which inherits the merits of both ADMM and parallel methods.
Compared to other similar methods in the literature, the new method attaches a Glowinski relaxation factor η to the updating formula for the Lagrange multiplier, and large values of η are often beneficial to its convergence. Preliminary numerical results indicate that the new method is quite efficient.
To ensure the convergence of the proposed iterative method, the range of the involved parameter η is (√3/2, 2/√3), which is a proper subset of the interval (0, (1 + √5)/2), the range of the Glowinski relaxation factor in the sequential ADMM-type methods for two-block separable convex programming [25, 33]. Therefore, it is worth extending the range of the parameter η and studying its optimal choice.

Figure 5 :
Figure 5: The 10th frame of the clean video and the corresponding corrupted frame with 30% random missing data (a); the extracted background and foreground by HTY (b), PADMM (c), and PFPSM (d).

Table 3 :
Numerical results for background extraction problem.