Sensitivity Analysis of the Proximal-Based Parallel Decomposition Methods

The proximal-based parallel decomposition methods were recently proposed to solve structured convex optimization problems. These algorithms are amenable to parallel computation and can be used efficiently for solving large-scale separable problems. In this paper, we show that, compared with the previous theoretical results, the range of the involved parameters can be enlarged while convergence is still guaranteed. Preliminary numerical tests on the stable principal component pursuit problem testify to the advantages of the enlargement.

To solve (1), the classical alternating direction method (ADM) generates the new iterate via the following scheme:

x^{k+1} = argmin_{x∈X} { θ1(x) − ⟨λ^k, Ax + By^k − b⟩ + (β/2)‖Ax + By^k − b‖² },
y^{k+1} = argmin_{y∈Y} { θ2(y) − ⟨λ^k, Ax^{k+1} + By − b⟩ + (β/2)‖Ax^{k+1} + By − b‖² },
λ^{k+1} = λ^k − β(Ax^{k+1} + By^{k+1} − b),

where λ ∈ R^m is the Lagrange multiplier associated with the linear constraint and β > 0 is a penalty parameter for the violation of the linear constraint. At each iteration, ADM essentially splits the subproblem of the augmented Lagrangian method into two subproblems in a Gauss-Seidel fashion. The subproblems can be solved in consecutive order, which makes it possible for ADM to exploit the individual structures of θ1 and θ2. The decomposed subproblems in (2) are often easy when A and B in (1) are both identity matrices and the resolvent operators of θ1 and θ2 have closed-form solutions or can be solved efficiently to high precision. Here, the resolvent operator of a function (say, θ : R^n → R) is defined by

(I + β^{-1}∂θ)^{-1}(z) := argmin_u { θ(u) + (β/2)‖u − z‖² },

where z ∈ R^n and β > 0. However, in some cases, A and B are both not identity matrices; the two subproblems in ADM (2) are then difficult to solve because the evaluation of the following type of minimization can be costly:

min_u { θ(u) + (β/2)⟨u − z, M(u − z)⟩ },

where M is a given nonidentity matrix, for example, M = AᵀA or M = BᵀB.
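As a minimal numerical sketch of the three-step ADM scheme above, consider a toy instance with θ1(x) = ½(x − p)², θ2(y) = ½(y − q)², A = B = 1, and constraint x + y = b, so both subproblems have closed-form solutions obtained by setting the gradient to zero. The values p, q, b, β, and the iteration count below are hypothetical choices for illustration only:

```python
# ADM sketch for: min 0.5*(x-p)^2 + 0.5*(y-q)^2  s.t.  x + y = b  (A = B = 1).

def adm(p, q, b, beta=1.0, iters=100):
    x = y = lam = 0.0
    for _ in range(iters):
        # x-subproblem: argmin_x 0.5*(x-p)^2 - lam*(x + y^k - b) + 0.5*beta*(x + y^k - b)^2
        x = (p + lam + beta * (b - y)) / (1.0 + beta)
        # y-subproblem uses the fresh x (Gauss-Seidel order)
        y = (q + lam + beta * (b - x)) / (1.0 + beta)
        # multiplier update penalizes the residual of the linear constraint
        lam = lam - beta * (x + y - b)
    return x, y, lam

x, y, lam = adm(p=1.0, q=3.0, b=2.0)
```

For p = 1, q = 3, b = 2 the KKT point is (x*, y*, λ*) = (0, 2, −1), and with β = 1 the iterates above converge to it linearly.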
For the purpose of parallel and easy computing, the first parallel decomposition method [8] (abbreviated as FPDM) generates the new iterate via scheme (5), in which the decomposed subproblems are regularized by proximal terms with parameters r and s and solved in parallel, and the parameters r, s are required to satisfy r > 2‖AᵀA‖ and s > 2‖BᵀB‖. Here, ‖M‖ denotes the largest eigenvalue of the matrix M. It is easy to verify that the proximal-based decomposition method proposed in [9] is a special case of the FPDM. When (4) is easy to evaluate for M = AᵀA and M = BᵀB, the second parallel decomposition method [8] (abbreviated as SPDM) can be used, which generates the new iterate via scheme (6), where the parameters r, s are required to satisfy r > ‖AᵀA‖ and s > ‖BᵀB‖.
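Since ‖AᵀA‖ is the largest eigenvalue of AᵀA, that is, the squared spectral norm of A, the step-size conditions quoted above can be checked numerically. A small sketch, with arbitrary illustrative matrices A and B and a hypothetical 1% safety margin:

```python
import numpy as np

def lam_max(M):
    # largest eigenvalue of M^T M, i.e. the squared largest singular value of M
    return np.linalg.norm(M, 2) ** 2

A = np.array([[2.0, 0.0], [1.0, 1.0]])
B = np.array([[1.0, 1.0], [0.0, 1.0]])

# FPDM in [8] requires r > 2*||A^T A|| and s > 2*||B^T B||;
# SPDM only requires   r >   ||A^T A|| and s >   ||B^T B||.
r_fpdm, s_fpdm = 2.0 * lam_max(A) * 1.01, 2.0 * lam_max(B) * 1.01
r_spdm, s_spdm = lam_max(A) * 1.01, lam_max(B) * 1.01
```

Note that the SPDM thresholds are half the FPDM ones, which is why SPDM is preferred whenever the M-weighted minimization (4) stays tractable.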
Note that the subproblems in FPDM and SPDM can be processed in a parallelized fashion because the first subproblem, which involves x, is independent of the second subproblem, which involves y. Thus, FPDM and SPDM are suitable for solving large-scale distributed machine learning and big-data-related optimization problems.
ADM was first described in [10] and is closely related to many other algorithms, such as augmented Lagrangian methods, the proximal point algorithm [11], and split Bregman methods [12]. Recently, the convergence of ADM has been analyzed under certain assumptions (see, e.g., [13-16]), and the direct extension of ADM to multiblock convex minimization problems has been proved not necessarily convergent [17].
In this paper, we study the proximal-based parallel decomposition methods from the perspective of variational inequalities. We show that the admissible ranges of the parameters r, s, and β can be significantly enlarged. Our contributions are as follows.
(i) For the FPDM, we show that the requirements on the step sizes r, s, and β can be uniformly relaxed to condition (7).

(ii) For the SPDM, we show analogously that the requirements on the step sizes r, s, and β can be relaxed to condition (8).

(iii) We provide a new application example in machine learning, namely, the stable principal component pursuit problem. Preliminary numerical experiments testify to the advantages of the enlargement.
The rest of this paper is organized as follows. In Section 2 we derive a variational reformulation of (1) and summarize some preliminaries on variational inequalities. In Section 3, we describe our main theoretical results and analyze convergence. We report numerical results in Section 4 and draw conclusions in Section 5.

Variational Inequality Characterization.
In this section, we derive a variational reformulation of (1) which will be used in subsequent analysis.
Since the functions θ1 and θ2 are both assumed to be convex, by invoking the first-order optimality condition for (1), we can easily verify that solving (1) amounts to finding a vector w* ∈ Ω of the variational inequality (VI):

θ(u) − θ(u*) + ⟨w − w*, F(w*)⟩ ≥ 0, ∀w ∈ Ω,

with

w = (x, y, λ)ᵀ, u = (x, y)ᵀ, θ(u) = θ1(x) + θ2(y),

where

F(w) = (−Aᵀλ, −Bᵀλ, Ax + By − b)ᵀ, Ω = X × Y × R^m.

Problem (9) is referred to as a structured variational inequality (SVI) and has been studied extensively, both in theoretical frameworks and in applications. Recently, He et al. [18, 19] proposed a unified framework of proximal-like contraction methods for monotone VIs. They also established the O(1/t) convergence rate of projection and contraction methods for VIs with Lipschitz continuous monotone operators [20]. Xu et al. [21] proposed two classes of correction methods for SVIs in which the mapping F does not have an explicit form. Yuan and Li [22] developed a logarithmic-quadratic proximal (LQP) based decomposition method by applying LQP terms to regularize the ADM subproblems. Tao and Yuan [23] established the O(1/t) convergence rate of ADM with LQP regularization. Bnouhachem et al. [24] studied a new inexact LQP alternating direction method that solves a series of related systems of nonlinear equations.
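The VI reformulation can be sanity-checked numerically. For the toy instance min ½x² + ½y² subject to x + y = 1 (hypothetical stand-ins for θ1, θ2, A = B = 1, b = 1, with solution x* = y* = ½ and multiplier λ* = ½), the inequality θ(u) − θ(u*) + ⟨w − w*, F(w*)⟩ ≥ 0 holds at every sampled point w:

```python
import random

def theta(x, y):          # theta(u) = theta1(x) + theta2(y)
    return 0.5 * x * x + 0.5 * y * y

def F(x, y, lam):         # F(w) = (-A^T lam, -B^T lam, Ax + By - b) with A = B = 1, b = 1
    return (-lam, -lam, x + y - 1.0)

xs, ys, ls = 0.5, 0.5, 0.5          # KKT point of the toy instance
Fs = F(xs, ys, ls)

random.seed(0)
violations = 0
for _ in range(1000):
    x, y, lam = (random.uniform(-5, 5) for _ in range(3))
    gap = (theta(x, y) - theta(xs, ys)
           + (x - xs) * Fs[0] + (y - ys) * Fs[1] + (lam - ls) * Fs[2])
    if gap < -1e-12:
        violations += 1
```

Indeed, for this instance the left-hand side equals ((x − ½)² + (y − ½)²)/2, which is nonnegative for every w, so no violations occur.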

Some Properties of Variational Inequalities.
In this section, we summarize some basic knowledge and related definitions of variational inequalities.
Let G be a symmetric positive definite matrix; the G-norm of a vector u is denoted by ‖u‖_G := √⟨u, Gu⟩. In particular, when G = I, ‖u‖ := √⟨u, u⟩ is the Euclidean norm of u. Let P_{Ω,G}(⋅) be the projection operator onto Ω under the G-norm; that is,

P_{Ω,G}(v) := argmin { ‖u − v‖_G : u ∈ Ω }.

From the above definition, we have the following well-known properties:

⟨v − P_{Ω,G}(v), G(u − P_{Ω,G}(v))⟩ ≤ 0, ∀v ∈ R^n, ∀u ∈ Ω,

‖P_{Ω,G}(v) − P_{Ω,G}(w)‖_G ≤ ‖v − w‖_G, ∀v, w ∈ R^n.

The mapping F is said to be monotone with respect to Ω if

⟨w − w̃, F(w) − F(w̃)⟩ ≥ 0, ∀w, w̃ ∈ Ω.

The following lemma [25, page 267] states an important result which characterizes a VI by a projection equation.

Lemma 1.
Let Ω be a closed convex set in R^n and let G be any positive definite matrix; then w* is a solution of VI(Ω, F) if and only if it satisfies

w* = P_{Ω,G}[w* − G^{-1}F(w*)].
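Both the projection inequality from the preliminaries and the projection-equation characterization of Lemma 1 can be verified numerically on small examples. The box Ω, the diagonal G, and the mapping F(w) = w − 1 below are all hypothetical test choices; note that for a diagonal G the G-norm projection onto a box separates coordinatewise and reduces to ordinary clipping:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
g = rng.uniform(0.5, 3.0, n)            # diagonal of a symmetric positive definite G
lo, hi = 0.0, 1.0                       # Omega = [0, 1]^n (a box)

def proj_box(v):
    # diagonal G => the weighted least-distance problem separates per coordinate,
    # so the G-projection onto the box is plain clipping
    return np.clip(v, lo, hi)

# projection property:  <v - P(v), G (u - P(v))> <= 0  for every u in Omega
worst = -np.inf
for _ in range(1000):
    v = rng.uniform(-2.0, 3.0, n)
    u = rng.uniform(lo, hi, n)
    p = proj_box(v)
    worst = max(worst, float(np.dot(v - p, g * (u - p))))

# Lemma 1 in one dimension: w solves VI(Omega, F) iff w == proj(w - F(w));
# for F(w) = w - 1 on Omega = [0, 2] (G = I) the solution is w* = 1
def residual(w):
    return w - min(max(w - (w - 1.0), 0.0), 2.0)
```

The fixed-point residual vanishes exactly at w* = 1 and is nonzero at every other point of Ω, matching the "if and only if" of the lemma.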

Theoretical Results of the Relaxation
In this section, we show that the ranges of the parameters r, s, and β in FPDM and SPDM can be enlarged beyond the previous theoretical results. We also establish the global convergence of FPDM and SPDM under the new conditions on the parameters.
then for any w* = (x*, y*, λ*)ᵀ, one has

Proof. Since Ax* + By* = b and

On the other hand, by setting w = w* in (16), we have

Using the fact that F is a monotone operator, we have

Rearranging the terms in (26) and using (27), we derive

which completes the proof.
Remark 4. Compared to the requirements on the parameters r, s, β in [8], we now allow the step sizes r, s, β to be chosen according to rule (7). In fact, the restriction on r and s proposed in [8] is r > 2‖AᵀA‖ and s > 2‖BᵀB‖; that is, 2‖AᵀA‖/r ∈ (0, 1) and 2‖BᵀB‖/s ∈ (0, 1). Hence, the requirement on the parameters is significantly relaxed.
Lemma 5. For a given w^k = (x^k, y^k, λ^k)ᵀ,

Proof. Analogously, we have

Applying the Cauchy-Schwarz inequality to the right-hand term, we get

On the other hand, by setting w = w* in (33), we have

By using the monotonicity of F, we have

Rearranging the terms in (43), we derive

which completes the proof.
Remark 7. Compared to the requirements on the parameters r, s, β in [8], we now allow the step sizes r, s, β to be chosen according to rule (8). In fact, the restriction on r and s proposed in [8] is r > ‖AᵀA‖ and s > ‖BᵀB‖; that is, ‖AᵀA‖/r ∈ (0, 1) and ‖BᵀB‖/s ∈ (0, 1). Hence, the requirement on the parameters is significantly relaxed.

The Convergence.
In this subsection, we give the main convergence theorem for FPDM and SPDM under the newly required parameter conditions.
Proof. Theorem 3 (resp., Theorem 6) implies that the sequence {w^k} generated is Fejér monotone with respect to the solution set Ω*, and the assertion follows immediately from the properties of Fejér monotonicity.

Numerical Experiments
In this section, we report on the sensitivity of the involved parameters r, s, β of FPDM on the stable principal component pursuit problem (SPCP). Since SPDM is the extended version of FPDM and the sensitivity results of SPDM are similar to those of FPDM, we omit the numerical results for SPDM for the sake of succinctness. The problem tested is from Example 2 of [26]. All codes were written in Matlab 2009b, and all programs were run on an HP notebook with an Intel Core 2.0 GHz CPU and 2 GB of memory. SPCP, arising from compressed sensing, seeks to decompose a given observation matrix C into the sum of three matrices: C := L + S + Z, where L is a nonnegative and low-rank matrix, S is a sparse matrix, and Z is a noise matrix. The model of SPCP can be cast as

min_{L,S,Z} { ‖L‖* + τ‖S‖₁ + I(⋅) : L + S + Z = C },   (50)

where ‖⋅‖* is the so-called nuclear norm (the sum of all singular values), ‖⋅‖₁ is the ℓ₁ norm, and I(⋅) is an indicator function.
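The ℓ₁ term in the model is handled through its proximal operator, which is the standard entrywise soft-thresholding (shrinkage) map; the paper does not spell this subproblem out here, so the following is a generic sketch with an arbitrary test matrix:

```python
import numpy as np

def soft_threshold(X, tau):
    # prox of tau*||.||_1: shrink each entry toward zero by tau
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

X = np.array([[3.0, -0.5], [-2.0, 0.1]])
S = soft_threshold(X, 1.0)   # entries with |x| <= 1 are zeroed out
```

This is the matrix analogue, applied entrywise, of the scalar shrinkage sign(x)·max(|x| − τ, 0).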
Following the procedure described in [26], by introducing an auxiliary variable, grouping it with L as one big block, and grouping S and Z as another big block, (50) can be reformulated in the standard form of (1). There are two main advantages of FPDM applied to SPCP. First, all the generated minimization subproblems in (52a)-(52d) have closed-form solutions. Second, the subproblems (52a)-(52d) are highly parallel, making FPDM appealing for parallel or distributed computing. Now, we elaborate on the strategy for solving the resulting subproblems at each iteration.
(i) The L-subproblem (52a) amounts to evaluating the proximal operator of the nuclear norm function and is given by the matrix shrinkage operation

MatShrink(C, τ) := U Diag(max{σ − τ, 0}) Vᵀ,   τ > 0,

where U Diag(σ) Vᵀ is the SVD of the matrix C.
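A sketch of the matrix shrinkage operator, assuming the standard singular-value soft-thresholding definition U Diag(max{σ − τ, 0}) Vᵀ stated above (the nonnegativity constraint on L in (52a) would need additional handling not shown here):

```python
import numpy as np

def mat_shrink(C, tau):
    # U diag(sigma) V^T  ->  U diag(max(sigma - tau, 0)) V^T
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

C = 3.0 * np.outer([1.0, 0.0], [1.0, 0.0])   # rank-1 matrix, singular value 3
L = mat_shrink(C, 1.0)                        # singular value shrinks from 3 to 2
```

Singular values at or below τ are set to zero, which is why this operator promotes low rank, exactly as entrywise soft thresholding promotes sparsity.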
In our experiment, we generated the data for (50) randomly in the same way as [26]. For given n and r < n, the n × n rank-r matrix L* was generated as L* = L₁L₂, where L₁ and L₂ are n × r and r × n random matrices whose entries are independently and identically distributed (i.i.d.) uniformly in [0, 1]. Note that, in this experiment, L* is the componentwise nonnegative and low-rank matrix we want to recover. The support of the sparse matrix S* was chosen uniformly at random, and the nonzero entries of S* were drawn i.i.d. uniformly from the interval [−500, 500]. The entries of the noise matrix Z* were generated as i.i.d. Gaussian with standard deviation 10⁻⁴.
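The data generation just described can be sketched as follows; the seed, n = 50, r = 5, and the 1% sparsity level are hypothetical choices for illustration (the instance sizes actually tested are those of Table 1):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 5

# nonnegative low-rank part: product of two uniform [0,1] factors
L1 = rng.uniform(0.0, 1.0, (n, r))
L2 = rng.uniform(0.0, 1.0, (r, n))
L_star = L1 @ L2

# sparse part: support chosen uniformly at random,
# nonzero entries i.i.d. uniform on [-500, 500]
S_star = np.zeros((n, n))
support = rng.choice(n * n, size=n * n // 100, replace=False)
S_star.flat[support] = rng.uniform(-500.0, 500.0, support.size)

# noise part: i.i.d. Gaussian with standard deviation 1e-4
Z_star = 1e-4 * rng.standard_normal((n, n))

C = L_star + S_star + Z_star   # observation matrix
```

The product of two entrywise-nonnegative factors guarantees L* ≥ 0 componentwise, and with continuous random factors its rank equals r almost surely.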
As in [26], we set C := L* + S* + Z* and chose τ := 1/√n. The initial iterate for FPDM is chosen as in [26]. For each instance, we randomly created ten examples, so the results were averaged over ten runs. The computational results are presented in Table 1. For each instance, the values of r, s, and β were chosen to satisfy condition (7). It can be seen that if we choose r a little smaller than 2‖AᵀA‖ and s a little larger than 2‖BᵀB‖, the numerical performance of FPDM with the newly selected parameters is better than in the case where r = 2‖AᵀA‖ and s = 2‖BᵀB‖.

Table 1: Numerical results for the stable principal component pursuit problem (settings: rank ratio 0.01 with cardinality ratio 0.01, and rank ratio 0.03 with cardinality ratio 0.03).

Conclusions

In this paper, we have shown that the ranges of the parameters involved in the proximal-based parallel decomposition methods can be enlarged, and we have established the global convergence of the new scheme under the new conditions. Preliminary numerical experiments on the stable principal component pursuit problem testify to the advantages of the enlargement.