We design a novel preconditioned alternating direction method for solving a class of bilinear programming problems, in which each subproblem is solved after adding a positive-definite regularization term weighted by a proximal parameter. With the aid of the variational inequality framework, the global convergence of the proposed method is analyzed and a worst-case O(1/t) convergence rate in an ergodic sense is established. Several preliminary numerical examples, including the Markowitz portfolio optimization problem, are tested to verify the performance of the proposed method.
1. Introduction
Let R, R^n, and R^{m×n} be the set of real numbers, the set of n-dimensional real column vectors, and the set of m×n real matrices, respectively. For any x, y ∈ R^n, we use the symbol ⟨x, y⟩ = x^T y to denote their inner product and the symbol ‖x‖ = √⟨x, x⟩ to stand for the Euclidean norm of x, where the superscript T denotes the transpose. The symbol I_n is the n×n identity matrix, simply denoted by I when its dimension is clear from the context. Consider the following generalized bilinear programming problem:

(1) min { f(x) | A_1 x = b_1, A_2 x = b_2, x ∈ X ∩ Y },

where f(x): R^n → R is a closed proper convex function (possibly nonsmooth); A_i ∈ R^{m_i×n} and b_i ∈ R^{m_i} (i = 1, 2) are given matrices and vectors, respectively; and X and Y are closed convex sets with nonempty intersection. Throughout this article, we assume that the solution set of (1) is nonempty.
The bilinear programming problem (1) arises in many applications, for instance, the Linear-Max-Min problem, the Location-Allocation problem, and the classic Markowitz portfolio optimization problem; see, for example, [1, 2] for more details. As an extension of linear programming, problem (1) can be recast as the following existing model:

(2) min { f(x) | Ax = b, x ∈ Ω }

with

(3) A = [A_1; A_2] ∈ R^{(m_1+m_2)×n}, b = [b_1; b_2] ∈ R^{m_1+m_2}, Ω = X ∩ Y,

which can be solved by some classical optimization methods when the constraint set Ω is simple, such as the proximal point algorithm [3] and the augmented Lagrangian method [4]. However, such a transformation leads to a large-scale coefficient matrix and increases the computational complexity and storage requirements. In particular, when the sets X and Y are complicated or the objective function is not well behaved, it is better to treat the variables separately and make the best of the structural properties of the given sets.
Next, we focus on developing a preconditioned alternating direction method of multipliers (P-ADMM) for solving (1), in which each subproblem can be solved after adding a proximal regularization term and the Lagrange multipliers are updated by using symmetric positive-definite (SPD) preconditioning matrices. With the aid of new variables x_1 and x_2, problem (1) is first reformulated as

(4) min f_1(x_1) + f_2(x_2) s.t. A_1 x_1 = b_1, A_2 x_2 = b_2, x_1 − x_2 = 0, x_1 ∈ X, x_2 ∈ Y,

where f_1(x_1) = f(x_1) and f_2(x_2) = f(x_2). An obvious advantage of this reformulation is that the sets X and Y can be treated separately instead of being regarded as a whole. Given SPD matrices P_1 ∈ R^{m_1×m_1} and P_2 ∈ R^{m_2×m_2}, the augmented Lagrangian function of (4) is

(5) L_β(x_1, x_2, λ) = L(x_1, x_2, λ) + (β/2) ( ‖A_1 x_1 − b_1‖_{P_1}² + ‖A_2 x_2 − b_2‖_{P_2}² + ‖x_1 − x_2‖² ),

where β > 0 is a penalty parameter, ‖v‖_P² := v^T P v, and

(6) L(x_1, x_2, λ) = f_1(x_1) + f_2(x_2) − ⟨λ_1, A_1 x_1 − b_1⟩ − ⟨λ_2, A_2 x_2 − b_2⟩ − ⟨λ_3, x_1 − x_2⟩

is the Lagrangian function of problem (4) with the multiplier λ := (λ_1, λ_2, λ_3) ∈ R^{m_1} × R^{m_2} × R^n. The introduced P-ADMM then obeys the following iterative scheme:

(7)
x_1^{k+1} = argmin_{x_1∈X} { L_β(x_1, x_2^k, λ^k) + (σ_1 β/2) ‖x_1 − x_1^k‖² },
x_2^{k+1} = argmin_{x_2∈Y} { L_β(x_1^k, x_2, λ^k) + (σ_2 β/2) ‖x_2 − x_2^k‖² },
λ_1^{k+1} = λ_1^k − β P_1 (A_1 x_1^{k+1} − b_1),
λ_2^{k+1} = λ_2^k − β P_2 (A_2 x_2^{k+1} − b_2),
λ_3^{k+1} = λ_3^k − β (x_1^{k+1} − x_2^{k+1}),

where σ_1 ∈ (1, +∞) and σ_2 ∈ (1, +∞) are two independent proximal parameters that control the proximity of the new iterate to the previous one; see, for example, [5] for more explanations.
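To make scheme (7) concrete, the following sketch implements one P-ADMM iteration for the unconstrained quadratic case f(x) = ½xᵀHx + cᵀx with X = Y = Rⁿ, so that both subproblems reduce to linear systems. The function name and the default choice P₁ = P₂ = I are ours, not the paper's; this is a minimal illustration, not the paper's implementation.

```python
import numpy as np

def p_admm_step(H, c, A1, b1, A2, b2, x1, x2, lam1, lam2, lam3,
                beta=1.0, sigma1=2.0, sigma2=2.0, P1=None, P2=None):
    """One iteration of scheme (7) for f(x) = 0.5*x'Hx + c'x with X = Y = R^n.

    P1, P2 default to the identity (a common, simple preconditioner choice).
    Both subproblems are strongly convex quadratics, so each update solves
    a linear system obtained by setting the gradient to zero.
    """
    n = H.shape[0]
    if P1 is None:
        P1 = np.eye(A1.shape[0])
    if P2 is None:
        P2 = np.eye(A2.shape[0])
    # x1-subproblem: gradient of L_beta(., x2^k, lam^k) + proximal term = 0.
    M1 = H + beta * A1.T @ P1 @ A1 + beta * np.eye(n) + sigma1 * beta * np.eye(n)
    r1 = -c + A1.T @ lam1 + lam3 + beta * A1.T @ P1 @ b1 + beta * x2 + sigma1 * beta * x1
    x1_new = np.linalg.solve(M1, r1)
    # x2-subproblem: note it uses the OLD x1 (a Jacobi-type update, as in (7)).
    M2 = H + beta * A2.T @ P2 @ A2 + beta * np.eye(n) + sigma2 * beta * np.eye(n)
    r2 = -c + A2.T @ lam2 - lam3 + beta * A2.T @ P2 @ b2 + beta * x1 + sigma2 * beta * x2
    x2_new = np.linalg.solve(M2, r2)
    # Preconditioned multiplier updates.
    lam1_new = lam1 - beta * P1 @ (A1 @ x1_new - b1)
    lam2_new = lam2 - beta * P2 @ (A2 @ x2_new - b2)
    lam3_new = lam3 - beta * (x1_new - x2_new)
    return x1_new, x2_new, lam1_new, lam2_new, lam3_new
```

Since the subproblems are solved exactly, the gradient of each regularized subproblem objective vanishes at the new iterate, which is a convenient correctness check.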
Scheme (7) can be regarded as an extended alternating direction method of multipliers (ADMM) in the sense of [6], because the two parameters σ_1 and σ_2 are independent rather than equal, and the preconditioning matrices need not be identity matrices. For excellent reviews of the ADMM, we refer the reader to, for example, [6–14] and the references therein, and also to the recently published symmetric ADMM with larger step sizes in [15], which is an all-sided work on the two-block separable convex minimization problem. Besides, Goldstein et al. [16] developed a Nesterov-accelerated ADMM for problem (2) with the partition

(8) f(x) = f_1(x_1) + f_2(x_2); x = (x_1, x_2)^T; A = (A_1, A_2),

where global convergence bounds were derived in terms of the dual objective under the assumption that the two objectives are strongly convex. In 2016, He et al. [6] proved that the Jacobian decomposition of the augmented Lagrangian method and the proximal point method are equivalent for solving the multiblock separable convex programming problem with one linear equality constraint, that is, problem (2) with the partition

(9) f(x) = f_1(x_1) + ⋯ + f_m(x_m); x = (x_1, …, x_m)^T; A = (A_1, …, A_m).

Although there are many results on the ADMM, most of the problems concerned involve only one linear equality constraint, and, to the best of our knowledge, results on a preconditioned ADMM for (1) are few.
The contributions of this paper are twofold. First, we introduce a novel preconditioned alternating direction method for solving the bilinear programming problem (1) and prove that the constructed method is globally convergent with a worst-case O(1/t) convergence rate in an ergodic sense. Second, several large-scale practical examples are tested to show the effectiveness of the proposed method. In Section 2, we analyze the convergence of the proposed method in detail. In Section 3, we first introduce a linearization technique for solving the subproblems of the P-ADMM (7) and then carry out numerical experiments on large-scale quadratic programming with two linear equality constraints. Finally, we conclude the paper in Section 4.
2. Convergence Analysis
By making use of the variational inequality, this section presents a unified framework to characterize the solution set of the reformulated problem (4) and the optimality conditions of the subproblems in (7). Moreover, the global convergence and the worst-case O(1/t) convergence rate of the proposed method are analyzed in detail. We begin with a basic lemma given in [15].
Lemma 1.
Let f(x): R^m → R and h(x): R^m → R be convex functions defined on a closed convex set Ω ⊂ R^m, and let h(x) be differentiable. Assume that the solution set of the problem min_{x∈Ω} { f(x) + h(x) } is nonempty. Then we have

(10) x* = argmin { f(x) + h(x) | x ∈ Ω } ⟺ x* ∈ Ω, f(x) − f(x*) + (x − x*)^T ∇h(x*) ≥ 0, ∀x ∈ Ω.
Any tuple (x_1*, x_2*, λ*) is called a saddle point of the Lagrangian function (6) if it satisfies

(11) L(x_1*, x_2*, λ) ≤ L(x_1*, x_2*, λ*) ≤ L(x_1, x_2, λ*),

which implies that finding a saddle point of L(x_1, x_2, λ) is equivalent to finding a point

(12) w* = (x_1*, x_2*, λ_1*, λ_2*, λ_3*) ∈ M = X × Y × R^{m_1} × R^{m_2} × R^n

such that

(13)
f_1(x_1) − f_1(x_1*) + ⟨x_1 − x_1*, −A_1^T λ_1* − λ_3*⟩ ≥ 0, ∀x_1 ∈ X,
f_2(x_2) − f_2(x_2*) + ⟨x_2 − x_2*, −A_2^T λ_2* + λ_3*⟩ ≥ 0, ∀x_2 ∈ Y,
⟨λ_1 − λ_1*, A_1 x_1* − b_1⟩ ≥ 0, ∀λ_1 ∈ R^{m_1},
⟨λ_2 − λ_2*, A_2 x_2* − b_2⟩ ≥ 0, ∀λ_2 ∈ R^{m_2},
⟨λ_3 − λ_3*, x_1* − x_2*⟩ ≥ 0, ∀λ_3 ∈ R^n.

Rewriting the above inequalities as a compact variational inequality (VI), we have

(14) VI(f, F, M): f(u) − f(u*) + ⟨w − w*, F(w*)⟩ ≥ 0, ∀w ∈ M,

where

(15) f(u) = f_1(x_1) + f_2(x_2), u = (x_1; x_2), w = (x_1; x_2; λ_1; λ_2; λ_3), F(w) = ( −A_1^T λ_1 − λ_3; −A_2^T λ_2 + λ_3; A_1 x_1 − b_1; A_2 x_2 − b_2; x_1 − x_2 ).

Note that the mapping F(w) is affine with a skew-symmetric linear part, so the following fundamental property holds:

(16) ⟨w − ŵ, F(w) − F(ŵ)⟩ = 0, ∀w, ŵ ∈ M.

By the assumption that the solution set of (1) is nonempty, the solution set M* of VI(f, F, M) is also nonempty. The next theorem describes a concrete way of characterizing the set M*; its proof is the same as that of Theorem 2 in [10] and is omitted here.
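The skew-symmetry behind property (16) can be checked numerically: from (15), F(w) − F(ŵ) = K(w − ŵ) for the block matrix K assembled below, and dᵀKd = 0 for every vector d. The dimensions and random data here are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m1, m2 = 4, 3, 2                       # small illustrative dimensions
A1 = rng.standard_normal((m1, n))
A2 = rng.standard_normal((m2, n))

# Linear part K of the affine mapping F(w), with blocks ordered as
# w = (x1, x2, lam1, lam2, lam3); then F(w) - F(w_hat) = K (w - w_hat).
Z = np.zeros
K = np.block([
    [Z((n, n)),  Z((n, n)),  -A1.T,       Z((n, m2)),  -np.eye(n)],
    [Z((n, n)),  Z((n, n)),  Z((n, m1)),  -A2.T,        np.eye(n)],
    [A1,         Z((m1, n)), Z((m1, m1)), Z((m1, m2)),  Z((m1, n))],
    [Z((m2, n)), A2,         Z((m2, m1)), Z((m2, m2)),  Z((m2, n))],
    [np.eye(n),  -np.eye(n), Z((n, m1)),  Z((n, m2)),   Z((n, n))],
])
# K is skew-symmetric, hence <w - w_hat, F(w) - F(w_hat)> = d'Kd = 0.
assert np.allclose(K, -K.T)
d = rng.standard_normal(K.shape[0])
assert abs(d @ K @ d) < 1e-10
```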
Theorem 2.
The solution set of VI(f, F, M) in (14) is convex and can be expressed as

(17) M* = ⋂_{w∈M} { ŵ ∈ M | f(u) − f(û) + ⟨w − ŵ, F(w)⟩ ≥ 0 }.
For any ŵ ∈ M, define K(ŵ) = { w ∈ M | ‖w − ŵ‖ ≤ 1 }. Theorem 2 shows that if

(18) sup_{w∈K(ŵ)} { f(û) − f(u) + ⟨ŵ − w, F(w)⟩ } ≤ ϵ,

then ŵ ∈ M is called an ϵ-approximate solution of VI(f, F, M), where ϵ > 0 is a prescribed accuracy; below we establish ϵ = O(1/t).
Lemma 3.
Let the sequence {w^{k+1}} be generated by the algorithm P-ADMM. Then we have

(19) f(u) − f(u^{k+1}) + ⟨w − w^{k+1}, F(w^{k+1}) + G(w^{k+1} − w^k)⟩ ≥ 0, ∀w ∈ M,

where w^k = (x_1^k; x_2^k; λ_1^k; λ_2^k; λ_3^k) and

(20) G =
[ σ_1 β I    β I        0             0             0         ]
[ β I        σ_2 β I    0             0             0         ]
[ 0          0          (1/β) P_1⁻¹   0             0         ]
[ 0          0          0             (1/β) P_2⁻¹   0         ]
[ 0          0          0             0             (1/β) I   ].
Proof.
Applying Lemma 1, the optimality conditions of the two subproblems in (7) are

(21)
f_1(x_1) − f_1(x_1^{k+1}) + ⟨x_1 − x_1^{k+1}, −A_1^T λ_1^k − λ_3^k + β A_1^T P_1 (A_1 x_1^{k+1} − b_1) + β (x_1^{k+1} − x_2^k) + σ_1 β (x_1^{k+1} − x_1^k)⟩ ≥ 0, ∀x_1 ∈ X, x_1^{k+1} ∈ X,
f_2(x_2) − f_2(x_2^{k+1}) + ⟨x_2 − x_2^{k+1}, −A_2^T λ_2^k + λ_3^k + β A_2^T P_2 (A_2 x_2^{k+1} − b_2) − β (x_1^k − x_2^{k+1}) + σ_2 β (x_2^{k+1} − x_2^k)⟩ ≥ 0, ∀x_2 ∈ Y, x_2^{k+1} ∈ Y.

Since the update of the Lagrange multipliers in (7) gives

(22)
λ_1^k = λ_1^{k+1} + β P_1 (A_1 x_1^{k+1} − b_1),
λ_2^k = λ_2^{k+1} + β P_2 (A_2 x_2^{k+1} − b_2),
λ_3^k = λ_3^{k+1} + β (x_1^{k+1} − x_2^{k+1}),

substituting (22) into (21) we obtain

(23)
f_1(x_1) − f_1(x_1^{k+1}) + ⟨x_1 − x_1^{k+1}, −A_1^T λ_1^{k+1} − λ_3^{k+1} + σ_1 β (x_1^{k+1} − x_1^k) + β (x_2^{k+1} − x_2^k)⟩ ≥ 0,
f_2(x_2) − f_2(x_2^{k+1}) + ⟨x_2 − x_2^{k+1}, −A_2^T λ_2^{k+1} + λ_3^{k+1} + β (x_1^{k+1} − x_1^k) + σ_2 β (x_2^{k+1} − x_2^k)⟩ ≥ 0.

Notice that the multiplier updates in (7) can also be rewritten as

(24)
⟨λ_1 − λ_1^{k+1}, A_1 x_1^{k+1} − b_1 + (1/β) P_1⁻¹ (λ_1^{k+1} − λ_1^k)⟩ ≥ 0, ∀λ_1 ∈ R^{m_1}, λ_1^{k+1} ∈ R^{m_1},
⟨λ_2 − λ_2^{k+1}, A_2 x_2^{k+1} − b_2 + (1/β) P_2⁻¹ (λ_2^{k+1} − λ_2^k)⟩ ≥ 0, ∀λ_2 ∈ R^{m_2}, λ_2^{k+1} ∈ R^{m_2},
⟨λ_3 − λ_3^{k+1}, x_1^{k+1} − x_2^{k+1} + (1/β) (λ_3^{k+1} − λ_3^k)⟩ ≥ 0, ∀λ_3 ∈ R^n, λ_3^{k+1} ∈ R^n,

since each inner product is in fact zero by (22). Combining (23) and (24) and recalling the definitions of F(w) and G, we immediately complete the proof.
Note that the matrix G in Lemma 3 is SPD, because the upper-left 2×2 block matrix is SPD for any σ_1 > 1, σ_2 > 1, and the lower-right block-diagonal part is SPD by the symmetric positive definiteness of the matrices P_1, P_2 and β > 0. Comparing inequalities (14) and (19), the key to proving the convergence of the algorithm P-ADMM is to verify that the cross term of (19) vanishes in the limit, that is,

(25) lim_{k→∞} ⟨w − w^{k+1}, G(w^{k+1} − w^k)⟩ = 0, ∀w ∈ M.

In other words, the sequence {‖w^k − w*‖} should be contractive under the weighting matrix G. In what follows, we show this assertion by using the G-norm ‖w‖_G = √(w^T G w) for any w ∈ R^{m_1+m_2+3n}.
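As a sanity check on this positive definiteness claim, the following snippet assembles G for small illustrative dimensions (with P₁ = P₂ = I for simplicity) and verifies that all of its eigenvalues are positive whenever σ₁ > 1 and σ₂ > 1.

```python
import numpy as np

# Small illustrative dimensions and parameters satisfying sigma1, sigma2 > 1.
n, m1, m2 = 3, 2, 2
beta, sigma1, sigma2 = 0.5, 1.5, 2.5
Id = np.eye
Z = np.zeros
# Blocks ordered as w = (x1, x2, lam1, lam2, lam3), matching (20).
G = np.block([
    [sigma1 * beta * Id(n), beta * Id(n),          Z((n, m1)),   Z((n, m2)),   Z((n, n))],
    [beta * Id(n),          sigma2 * beta * Id(n), Z((n, m1)),   Z((n, m2)),   Z((n, n))],
    [Z((m1, n)),            Z((m1, n)),            Id(m1) / beta, Z((m1, m2)), Z((m1, n))],
    [Z((m2, n)),            Z((m2, n)),            Z((m2, m1)),  Id(m2) / beta, Z((m2, n))],
    [Z((n, n)),             Z((n, n)),             Z((n, m1)),   Z((n, m2)),   Id(n) / beta],
])
assert np.allclose(G, G.T)                  # symmetric
assert np.linalg.eigvalsh(G).min() > 0      # positive-definite
```

The upper-left 2×2 block is positive-definite precisely when σ₁σ₂ > 1, which automatically holds for σ₁, σ₂ > 1.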
Lemma 4.
For any w* ∈ M*, the sequence {w^{k+1}} generated by the algorithm P-ADMM satisfies

(26) ‖w^{k+1} − w*‖_G² ≤ ‖w^k − w*‖_G² − ‖w^k − w^{k+1}‖_G².
Proof.
Setting w = w* in (19), it follows that

(27) ⟨w* − w^{k+1}, G(w^{k+1} − w^k)⟩ ≥ f(u^{k+1}) − f(u*) + ⟨w^{k+1} − w*, F(w^{k+1})⟩.

By making use of (16) and (14), we get

(28) f(u^{k+1}) − f(u*) + ⟨w^{k+1} − w*, F(w^{k+1})⟩ = f(u^{k+1}) − f(u*) + ⟨w^{k+1} − w*, F(w*)⟩ ≥ 0,

which leads to

(29) ⟨w^{k+1} − w*, G(w^k − w^{k+1})⟩ ≥ 0.

Based on (29) and the symmetric positive definiteness of G, we obtain

(30) ‖w^k − w*‖_G² = ‖(w^k − w^{k+1}) + (w^{k+1} − w*)‖_G² = ‖w^k − w^{k+1}‖_G² + ‖w^{k+1} − w*‖_G² + 2⟨w^{k+1} − w*, G(w^k − w^{k+1})⟩ ≥ ‖w^k − w^{k+1}‖_G² + ‖w^{k+1} − w*‖_G².
Theorem 5.
Let the sequence {w^{k+1}} be generated by the algorithm P-ADMM. Then the following assertions hold:

(a) lim_{k→∞} ‖w^k − w^{k+1}‖ = 0;

(b) the sequence {w^{k+1}} is bounded;

(c) any accumulation point of {w^{k+1}} is a solution point of VI(f, F, M);

(d) there exists w^∞ ∈ M* such that lim_{k→∞} w^{k+1} = w^∞.
Proof.
Summing the inequality (26) over k = 0, 1, 2, …, we have

(31) Σ_{k=0}^∞ ‖w^k − w^{k+1}‖_G² ≤ ‖w^0 − w*‖_G²,

which implies lim_{k→∞} ‖w^k − w^{k+1}‖ = 0 because of the symmetric positive definiteness of the matrix G; that is, assertion (a) holds. Assertion (b) follows from (26), since the sequence {‖w^{k+1} − w*‖_G} is nonincreasing. By taking the limit of (19) along a convergent subsequence and using assertion (a), we get

(32) lim_{k→∞} { f(u) − f(u^{k+1}) + ⟨w − w^{k+1}, F(w^{k+1})⟩ } ≥ 0, ∀w ∈ M,

which shows that the corresponding limit point is a solution point of VI(f, F, M); that is, assertion (c) holds.
Let w^∞ be an accumulation point of {w^{k+1}}. Then assertion (c) implies that w^∞ ∈ M* and, by (26) with w* replaced by w^∞,

(33) ‖w^{k+1} − w^∞‖_G² ≤ ‖w^k − w^∞‖_G² − ‖w^k − w^{k+1}‖_G².

Using the above inequality together with assertion (a), the proof of (d) is completed.
Theorem 6.
For any integer t > 0, let the sequence {w^{k+1}} be generated by the P-ADMM (7) and define

(34) ŵ_t = (1/(t+1)) Σ_{k=0}^t w^{k+1}.

Then it holds that

(35) f(û_t) − f(u) + ⟨ŵ_t − w, F(w)⟩ ≤ (1/(2(t+1))) ‖w^0 − w‖_G², ∀w ∈ M.
Proof.
Clearly, ŵ_t ∈ M, since it is a convex combination of the iterates w^{k+1} (k = 0, 1, …, t). Substituting (16) into (19), we deduce that

(36) f(u) − f(u^{k+1}) + ⟨w − w^{k+1}, F(w)⟩ ≥ ⟨w^{k+1} − w, G(w^{k+1} − w^k)⟩, ∀w ∈ M.

By utilizing the identity

(37) 2⟨a − b, G(c − d)⟩ = ‖a − d‖_G² − ‖a − c‖_G² + ‖c − b‖_G² − ‖d − b‖_G²

with (a, b, c, d) = (w^{k+1}, w, w^{k+1}, w^k), we obtain

(38) ⟨w^{k+1} − w, G(w^{k+1} − w^k)⟩ = (1/2) ( ‖w^{k+1} − w‖_G² + ‖w^{k+1} − w^k‖_G² − ‖w^k − w‖_G² ) ≥ (1/2) ( ‖w^{k+1} − w‖_G² − ‖w^k − w‖_G² ),

which turns (36) into

(39) f(u) − f(u^{k+1}) + ⟨w − w^{k+1}, F(w)⟩ + (1/2) ‖w^k − w‖_G² ≥ (1/2) ‖w^{k+1} − w‖_G².

Summing the above inequality over k = 0, 1, …, t, we have

(40) (t+1) f(u) − Σ_{k=0}^t f(u^{k+1}) + ⟨(t+1) w − Σ_{k=0}^t w^{k+1}, F(w)⟩ + (1/2) ‖w^0 − w‖_G² ≥ 0 ⟺ (1/(t+1)) Σ_{k=0}^t f(u^{k+1}) − f(u) + ⟨ŵ_t − w, F(w)⟩ ≤ (1/(2(t+1))) ‖w^0 − w‖_G².

Since f(u) is convex and û_t = (1/(t+1)) Σ_{k=0}^t u^{k+1}, it holds that

(41) f(û_t) ≤ (1/(t+1)) Σ_{k=0}^t f(u^{k+1}).

Substituting this into (40) completes the proof.
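The identity (37) holds for any symmetric matrix G, as a quick numerical check confirms (the random data below are illustrative only):

```python
import numpy as np

# Numerical check of the identity, for symmetric G:
#   2<a-b, G(c-d)> = ||a-d||_G^2 - ||a-c||_G^2 + ||c-b||_G^2 - ||d-b||_G^2
rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
G = M @ M.T + n * np.eye(n)          # some SPD weighting matrix
a, b, c, d = (rng.standard_normal(n) for _ in range(4))

sq = lambda v: v @ G @ v             # squared G-norm ||v||_G^2
lhs = 2 * (a - b) @ G @ (c - d)
rhs = sq(a - d) - sq(a - c) + sq(c - b) - sq(d - b)
assert np.isclose(lhs, rhs)
```

Expanding each squared G-norm on the right shows that all pure quadratic terms cancel and only the cross terms 2(a − b)ᵀG(c − d) survive, which is exactly the left-hand side.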
Remark 7.
Theorem 5 shows that the P-ADMM (7) is globally convergent, and Theorem 6 tells us that, for any given compact set K ⊂ M and η := sup_{w∈K} ‖w^0 − w‖_G², the vector ŵ_t satisfies

(42) sup_{w∈K} { f(û_t) − f(u) + ⟨ŵ_t − w, F(w)⟩ } ≤ η / (2(t+1)),

which shows that the proposed method converges at a worst-case O(1/t) rate in an ergodic sense.
Remark 8.
The penalty parameter β in (7) can be updated by the formula β_{k+1} = τβ_k with τ > 0, which reduces to a constant when τ = 1. The preconditioning matrices P_1 and P_2 are usually chosen as the identity matrix, a diagonal matrix with positive diagonal entries, or a tridiagonal SPD matrix.
3. Numerical Experiments
In this section, we investigate the feasibility and efficiency of the proposed method by numerical experiments on the quadratic programming model with two linear equality constraints. The codes of the algorithm P-ADMM are written in MATLAB 7.10 (R2010a), and the experiments are carried out on a PC with an Intel Core i5 processor (3.3 GHz) and 4 GB of memory. Inspired by Theorem 5, we take an easily implementable stopping criterion for the proposed method, namely,

(43) ERR_k = max { ‖x_1^k − x_1^{k+1}‖_∞, ‖x_2^k − x_2^{k+1}‖_∞ } ≤ tol,

where tol is a given tolerance and x_i^k is the kth iterate generated by scheme (7).
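A minimal helper computing the stopping quantity ERR_k of (43) might look as follows (the function name is ours, for illustration):

```python
import numpy as np

def err(x1_old, x1_new, x2_old, x2_new):
    """Stopping quantity (43): the larger of the infinity norms of the
    successive differences of the two block variables."""
    return max(np.max(np.abs(x1_old - x1_new)),
               np.max(np.abs(x2_old - x2_new)))
```

The iteration stops once `err(...) <= tol` for the chosen tolerance.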
To avoid the case in which the subproblems of (7) have no explicit solution, the two preconditioning matrices are simply chosen as the identity matrix, and a linearized strategy is then used to accelerate the solution of the subproblems. Without loss of generality, take the x_1-subproblem as an example. In this case, the x_1-subproblem is equivalent to

(44) x_1^{k+1} = argmin_{x_1∈X} { f_1(x_1) + (β/2) ( ‖A_1 x_1 − b_1 − λ_1^k/β‖² + ‖x_1 − x_2^k − λ_3^k/β‖² + σ_1 ‖x_1 − x_1^k‖² ) } = argmin_{x_1∈X} { f_1(x_1) + (β/2) ‖𝒜 x_1 − a‖² },

where

(45) 𝒜 = [A_1; I; √σ_1 I] ∈ R^{(m_1+2n)×n}, a = [b_1 + λ_1^k/β; x_2^k + λ_3^k/β; √σ_1 x_1^k] ∈ R^{m_1+2n}.

By the well-known Taylor formula of mathematical analysis, the quadratic term (1/2)‖𝒜 x_1 − a‖² can be approximated by

(46) (1/2)‖𝒜 x_1 − a‖² ≈ (1/2)‖𝒜 x_1^k − a‖² + ⟨g^k, x_1 − x_1^k⟩ + (1/(2η))‖x_1 − x_1^k‖²,

where η > 0 is a proximal factor and g^k = 𝒜^T (𝒜 x_1^k − a) is the gradient of the quadratic term at x_1^k. Hence, up to a constant, the objective function in (44) takes the equivalent form

(47) f_1(x_1) + (β/(2η)) ‖x_1 − (x_1^k − η g^k)‖²,

which makes the x_1-subproblem admit a closed-form solution. The x_2-subproblem can be tackled in a similar way. For more cases analogous to (47), the explicit solution forms can be traced back to Lemmas 1 and 3 of [17].
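The linearized update (47) is a proximal (backward) step of f₁ at the forward point x₁ᵏ − ηgᵏ. The following sketch makes this explicit; the helper names and the ℓ₁ example for f₁ are illustrative assumptions, not the paper's choices.

```python
import numpy as np

def linearized_x1_step(x1k, A, a, beta, eta, prox_f1):
    """One linearized x1-update as in (46)-(47): the quadratic
    0.5*||A x1 - a||^2 is replaced by its Taylor expansion at x1k,
    which turns the subproblem into a proximal step of f1 with
    prox scale eta/beta evaluated at the gradient-step point."""
    g = A.T @ (A @ x1k - a)          # gradient of the quadratic at x1k
    return prox_f1(x1k - eta * g, eta / beta)

def soft_threshold(v, t):
    """Prox of t*||.||_1, included purely as a concrete example of f1
    with a closed-form prox; the paper's f1 may differ."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
```

For f₁ = τ‖·‖₁ one would call `linearized_x1_step(x1k, A, a, beta, eta, lambda v, s: soft_threshold(v, tau * s))`; for f₁ ≡ 0 the update reduces to a plain gradient step on the quadratic.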
In what follows, the penalty parameter β is updated by the formula β_{k+1} = 5β_k with β_0 = 0.5e−3, the proximal parameters are chosen as (σ_1, σ_2) = (2, 2) and η = 0.25e−2, and the iterative variables are initialized as (x_1^0, x_2^0) = (ones(n,1), ones(n,1)). Generally speaking, quadratic programming with two linear equality constraints takes the following form:

(48) min (1/2) x^T H x + c^T x s.t. A_1 x = b_1, A_2 x = b_2, x ∈ Ω,

where H ∈ R^{n×n} is a positive semidefinite matrix. Model (48) includes the classic Markowitz portfolio optimization problem as a special case; see, for example, [2] and Example 10.
Example 9.
Consider model (48) with the simple case Ω = R^n, where the given data are randomly generated by the following MATLAB code:

(49) H1 = randn(n); H = H1'*H1; c = rand(n,1); A1 = randn(n); A2 = randn(n); b1 = rand(n,1); b2 = rand(n,1).
For this example, Table 1 reports several experimental results with n = 200 obtained by the algorithm P-ADMM under different tolerances, including the number of iterations (denoted by "IT"), the CPU time in seconds (denoted by "CPU"), the iterative error of the solution (denoted by "ERR"), and the residual of the objective (denoted by "ROB"). Figure 1 depicts the convergence curves of the residual of the objective and of the iterative error of the solution under the tolerance tol = 1.0×10^{-15}.
Table 1. Experimental results of Example 9 by the P-ADMM.

tol     IT    CPU      ERR          ROB
1e-1     8    0.0265   0.0273       -11.9488
1e-3    12    0.0593   9.7241e-4    -12.0118
1e-5    19    0.0518   7.7188e-6    -12.0160
1e-7    26    0.0646   6.0438e-8    -12.0161
1e-9    32    0.0846   9.4608e-10   -12.0161
1e-11   39    0.0790   7.4072e-12   -12.0161
1e-13   46    0.1039   5.7954e-14   -12.0161
1e-15   61    0.1232   8.8818e-16   -12.0161
Figure 1. Convergence curves of the residual ROB (a) and the iterative error ERR (b).
From Table 1, we can see that when the P-ADMM is used to solve Example 9, the CPU time is less than 0.2 seconds and the number of iterations does not exceed 61. The obtained results, including the iterative error ERR listed in Table 1 and the convergence curves depicted in Figure 1, verify the feasibility and efficiency of the P-ADMM scheme for this small-scale problem. Besides, the last two columns of Table 1 imply that a moderate tolerance (e.g., tol = 1.0×10^{-5}) already yields nearly the same objective value while saving CPU time.
Example 10.
Consider model (48) with Ω = { x ∈ R^n | x ≥ 0 } and

(50) c = zeros(n,1); A1 = ones(1,n); b1 = 1; A2 = r ∈ R^{1×n}; b2 = p ∈ R_+.
Then model (48) immediately becomes the Markowitz portfolio optimization problem:

(51) min (1/2) x^T H x s.t. e^T x = 1, r^T x = p, x ≥ 0,

where e ∈ R^n is the vector of all ones, the matrix H ∈ R^{n×n} stands for the covariance matrix of the returns on the n assets in the portfolio, the variable x denotes the vector of portfolio weights representing the amount of capital to be invested in each asset, r is the vector of expected returns of the different assets, and p is a given total return. In this case, the solution of (51) is sparse, which also explains why some researchers try to find sparse solutions of problem (51); see, for example, [18].
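As a tiny illustration of model (51): for n = 2 assets the two equality constraints already pin down the portfolio (whenever the two expected returns differ), so the weights can be read off from a 2×2 linear system and only the sign constraint remains to check. The data below are made up purely for illustration.

```python
import numpy as np

# Two-asset Markowitz instance (51) with illustrative data.
r = np.array([0.10, 0.20])                  # expected returns of the assets
p = 0.15                                    # prescribed total return
H = np.array([[0.04, 0.01],
              [0.01, 0.09]])                # covariance matrix (SPD here)

# Equality constraints e'x = 1 and r'x = p as a square linear system.
E = np.vstack([np.ones(2), r])
x = np.linalg.solve(E, np.array([1.0, p]))  # unique feasible portfolio

risk = 0.5 * x @ H @ x                      # objective value of (51)
assert np.isclose(x.sum(), 1.0)             # budget constraint holds
assert np.isclose(r @ x, p)                 # return constraint holds
assert np.all(x >= 0)                       # nonnegativity holds here
```

For n > 2 the equality constraints no longer determine x uniquely and the quadratic objective together with x ≥ 0 becomes binding, which is exactly the regime the P-ADMM targets.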
For Example 10, we test eleven large-scale experiments, in which the matrix H is generated in the same way as in Example 9 and r, p are generated by the MATLAB built-in functions rand(1,n) and rand(1,1), respectively. Table 2 reports experimental results of this example with different dimensions n ∈ [800, 2800], where the tolerance of the algorithm is set to tol = 10^{-5}. The notations IT, CPU, ERR, and ROB have the same meanings as in Example 9, the number of nonzero entries of the solution x* is denoted by ‖x*‖_0, and the sparsity ratio is defined as ‖x*‖_0/n × 100%. The convergence curves of the residual of the objective and of the iterative error of the solution for Example 10 with n ∈ [800, 2800] are depicted in Figure 2.
Table 2. Experimental results of Example 10 by the P-ADMM.

n      IT    CPU       ERR         ROB         ‖x*‖_0   ‖x*‖_0/n
800    19     0.7516   9.5680e-6   2.5181e-4    444     55.50%
1000   19     1.2269   8.4815e-6   1.7696e-4    548     54.80%
1200   20     2.0222   7.4162e-6   3.1982e-4    651     54.25%
1400   22     3.4151   5.8193e-6   4.3066e-4    792     56.57%
1600   20     4.3842   6.1633e-6   0.0089       877     54.81%
1800   19     5.4484   7.5718e-6   0.0015       940     52.22%
2000   19     6.8253   8.5260e-6   0.0207      1070     53.50%
2200   19     8.8434   9.5873e-6   0.0039      1180     53.64%
2400   20    11.1692   9.6233e-6   0.0482      1302     54.25%
2600   20    14.0278   5.9587e-6   0.0289      1387     53.35%
2800   18    15.5509   6.4949e-6   0.0067      1507     53.82%
Figure 2. Convergence curves of the residual ROB (a) and the iterative error ERR (b).
A notable observation from Table 2 is that both the number of iterations (< 25) and the CPU time (< 16 s) are small, and the CPU time grows with the dimension n of x. Another observation is that the ratio of nonzero entries of the solution is only slightly above 50%, which implies that nearly half of the assets need not be invested in and thus provides some useful guidance for an investor in finance. Both Table 2 and Figure 2 show that the P-ADMM (7) is robust for solving the large-scale Markowitz portfolio optimization problem.
4. Conclusion
Instead of studying an optimization problem with one linear constraint, in this paper we concentrate on the generalized bilinear programming problem and develop a preconditioned alternating direction method. Based on the traditional proof technique of the ADMM, the global convergence of the proposed method is proved and a worst-case O(1/t) convergence rate in an ergodic sense is established. To avoid the case in which a subproblem has no explicit solution, we use a linearized strategy to tackle the involved subproblems approximately. Numerical results show that the proposed method is feasible and efficient.
Nowadays, many researchers are interested in the ADMM, which can be regarded as an alternating update of the variables and the Lagrange multipliers for separable convex programming. For a nonseparable convex optimization problem, the Taylor formula motivates us to use a first-order approximation to linearize the objective function, after which a corresponding ADMM can be designed. The method proposed in the current paper applies to the above scenarios and can also be used to solve matrix minimization problems with two linear constraints.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References

[1] Konno H., Thach P. T., Tuy H.
[2] Markowitz H. M.
[3] Moreau J.-J., Proximite et dualite dans un espace hilbertien.
[4] Hestenes M. R., Multiplier and gradient methods.
[5] Rockafellar R. T., Augmented Lagrangians and applications of the proximal point algorithm in convex programming.
[6] He B., Xu H.-K., Yuan X., On the proximal Jacobian decomposition of ALM for multiple-block separable convex minimization problems and its relationship to ADMM.
[7] Bai J. C., Zhang H. C., Li J. C., Xu F. M., Generalized symmetric alternating direction method for separable convex programming, http://www.optimization-online.org/DB_FILE/2016/10/5699.pdf.
[8] Deng W., Yin W., On the global and linear convergence of the generalized alternating direction method of multipliers.
[9] Fang E. X., He B., Liu H., Yuan X., Generalized alternating direction method of multipliers: new theoretical insights and applications.
[10] He B., Yuan X., On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method.
[11] He B., Yuan X., Block-wise alternating direction method of multipliers for multiple-block convex programming and beyond.
[12] Sun D., Toh K.-C., Yang L., A convergent 3-block semiproximal alternating direction method of multipliers for conic programming with 4-type constraints.
[13] Wang J. J., Song W., An algorithm twisted from generalized ADMM for multi-block separable convex minimization models.
[14] Yang J., Yin W., Zhang Y., Wang Y., A fast algorithm for edge-preserving variational multichannel image restoration.
[15] He B., Ma F., Yuan X., Convergence study on the symmetric version of ADMM with larger step sizes.
[16] Goldstein T., O'Donoghue B., Setzer S., Baraniuk R., Fast alternating direction optimization methods.
[17] Tao M., Yuan X., Recovering low-rank and sparse components of matrices from incomplete and noisy observations.
[18] Chen X., Xiang S., Sparse solutions of linear complementarity problems.