Preconditioned ADMM for a Class of Bilinear Programming Problems

We design a novel preconditioned alternating direction method for solving a class of bilinear programming problems, where each subproblem is solved by adding a positive-definite regularization term with a proximal parameter. By the aid of the variational inequality, the global convergence of the proposed method is analyzed and a worst-case O(1/t) convergence rate in an ergodic sense is established. Several preliminary numerical examples, including the Markowitz portfolio optimization problem, are also tested to verify the performance of the proposed method.


Introduction
Let R, R  , R × be the set of real numbers, the set of dimensional real column vectors, and the set of  ×  real matrices, respectively.For any ,  ∈ R  , we use the symbol ⟨, ⟩ =    to denote their inner product and the symbol ‖‖ = √⟨, ⟩ to stand for the Euclidean norm of , where the superscript  is the transpose.Symbol   is  ×  identity matrix, which is simply denoted by  with proper dimension in the context.Consider the following generalized bilinear programming: where () : R  → R is a closed proper convex function (possibly nonsmooth);   ∈ R   × ,   ∈ R   ( = 1, 2) are given matrices and vectors, respectively; X and Y are closed convex sets with nonempty intersection.Throughout this article, we assume that the solution set of (1) is nonempty.The bilinear programming (1) arises in many applications, for instance, the Linear-Max-Min problem, the Location-Allocation problem, and the classic Markowitz portfolio optimization problem; see, for example, [1,2] for more details.
As an extension of the linear programming, problem (1) can be recast as the following existing model: min { () |  = ,  ∈ Ω} (2) with which can be solved by some classical optimization methods if the constrained set Ω is easy, such as the proximal point algorithm [3] and the augmented Lagrangian method [4].However, using such transformations will lead to a large-scale coefficient matrix and increase the computational complexity and storage requirement.In particular, when sets X and Y are complex or the objective function is not well defined, we would better deal with the variables separately and make the best of the structure properties of the given sets.
Next, we focus on developing a preconditioned alternating direction method of multipliers (P-ADMM) for solving (1), where each subproblem can solved by adding a proximal regularization term and the Lagrange multipliers can be updated by using some preconditioned symmetric positivedefinite (SPD) matrices.By the aid of new variables  1 and  2 , problem (1) is firstly reformulated as follows: where  1 ( 1 ) = ( 1 ),  2 ( 2 ) = ( 2 ).An obvious advantage of such reformulation is that the given sets X and Y can be treated separately instead of regarding them as a whole.Given symmetric positive-definite matrices  1 ∈ R  1 × 1 and  2 ∈ R  2 × 2 , the augmented Lagrangian function of ( 4) is given by where  > 0 is a penalty parameter and is the Lagrangian function of problem (4) with a multiplier Then, the introduced P-ADMM obeys the following iterative scheme: where  1 ∈ (1, +∞),  2 ∈ (1, +∞) are two independent proximal parameters that control the proximity of the new iterative value to the last one; see, for example, [5] for more explanations.
Noting that scheme (7) can be regarded as an extended alternating direction method of multipliers (ADMM) of [6], because the two parameters  1 ,  2 are independent instead of the same value and all the preconditioned matrices are not always identity matrices.For excellent reviews of the ADMM, we refer the reader to, for example, [6][7][8][9][10][11][12][13][14] and the references therein, and also the recent published symmetric ADMM with larger step sizes in [15] which is an allsided work for the two-block separable convex minimization problem.Besides, Goldstein et al. [16] developed a Nesterov's accelerated ADMM for problem (2) with partitions: where the global convergence bounds were derived in terms of the dual objective to the original under the assumption that the two objectives were strongly convex.In 2016, He et al. [6] proved that both the Jacobian decomposition of the augmented Lagrangian method and the proximal point method were equivalent for solving the multiblock separable convex programming with one linear equality constraint, that is, problem (2) with partitions: Although there are lots of results about ADMM, most of concerned problems are still with one linear equality constraint, and research results of a preconditioned ADMM for (1) are few as far as we know.
Contributions of this paper are summarized as two aspects.One is that we introduce a novel preconditioned alternating direction method for solving the bilinear programming problem (1), and we prove that the constructed method is global convergent with a worst-case O(1/) convergence rate in an ergodic sense.Another contribution is that several large-scale practical examples are tested to show the effectiveness of the proposed method.In Section 2, we analyze the convergence of the proposed method in detail.In Section 3, we first introduce a linearization technique for solving the involved subproblems of the P-ADMM (7) and then carry out some numerical experiments about the large-scale quadratic programming with two linear equality constraints.Finally, we conclude the paper in Section 4.

Convergence Analysis
By making use of the variational inequality, this section presents a unified framework to characterize the solution set of the reformulated problem (4) and the optimality conditions of the involved subproblems in (7).Moreover, the global convergence and the worst-case O(1/t) convergence rate of the proposed method are analyzed in detail.We begin with a basic lemma given in [15].Lemma 1.Let () : R  → R and ℎ() : R  → R be convex functions defined on a closed convex set Ω ⊂ R  and ℎ() be differentiable.Assume that the solution set of the problem min ∈Ω {() + ℎ()} is nonempty.Then we have Any tuple which implies that finding a saddle point of ( 1 ,  2 , ) is equivalent to finding a point such that By rewriting the above inequalities as a compact variational inequality (VI), we have where ) ) ) , Note that the mapping () is skew-symmetric, so the following fundamental property holds: By the assumption that the solution set of ( 1) is nonempty, the solution set M * of VI(, , M) is also nonempty.The next theorem describes a concrete way to characterizing the set M * , whose proof is the same as that of Theorem 2 [10] and is omitted here.
Theorem 2. The solution set of (, , M) in ( 14) is convex and can be expressed as For any then the vector ŵ ∈ M is called an -approximate solution of VI(, , M), where  > 0 is an accuracy, especially  = O(1/).
Lemma 3. Let the sequence { +1 } be generated by the algorithm P-ADMM.Then we have where ) . (20) Proof.Applying Lemma 1, the optimality conditions of the two subproblems in (7) are Since the update of the Lagrange multipliers in (7) satisfies substituting ( 22) into (21) we obtain Notice that (22) can be rewritten as Combining ( 23) and ( 24), we immediately complete the proof.
Note that matrix  in Lemma 3 is strictly SPD, because the upper-left 2 × 2 block matrix is SPD for any  1 > 1,  2 > 1 and the lower-right 3 × 3 diagonal matrix is SPD from the symmetric positivity of the matrices  1 ,  2 and  > 0. Compared with the inequalities ( 14) and ( 19), the key to prove the convergence of the algorithm P-ADMM is to verify that the cross term of (19) converges to zero, that is, In other words, the sequence {  −  * } would be contractive under the weighted matrix .In what follows, we will show such assertion by using the definition ‖‖  = √    for any  ∈ R By making use of ( 16) and ( 14), we get which leads to (d) There exists  ∞ ∈ M * such that lim →∞  +1 =  ∞ .
Proof.Summing the inequality (26) over  = 0, 1, 2, . . ., ∞, we have which implies lim →∞ (  −  +1 ) = 0 because of the symmetric positivity of the matrix .The assertion (b) is evident followed by (a).By taking the limit of (19) and using the assertion (a), we get which shows that lim →∞  +1 is a solution point of VI(, , M), that is, the assertion (c) holds.Let  ∞ be an accumulation point of { +1 }.Then the third assertion implies that  ∞ ∈ M * and Using the above inequality together with the assertion (a), the proof of (d) is completed.
which shows that the proposed method converges in a worstcase O(1/) rate in an ergodic sense.

Mathematical Problems in Engineering
Remark 8.The penalty parameter  in ( 7) can be updated by the formula  +1 =   with  > 0, and it is a constant when taking  = 1.The preconditioned matrices  1 and  2 are usually chosen as the identity matrix, the diagonal matrix with positive diagonal entries, or the tridiagonal matrix.

Numerical Experiments
In this section, we investigate the feasibility and efficiency of the proposed method by some numerical experiments about the quadratic programming model with two linear equality constraints.The codes of the algorithm P-ADMM are written in MATLAB 7.10 (R2010a) and the experiments are carried out on a PC with Intel Core i5 processor (3.3 GHz) with 4 GB memory.Inspired by Theorem 5, we take an easily implementable stopping criterion for the proposed method, that is, where tol is the given tolerance and    is the th iteration generated by scheme (7).
To avoid the case that the subproblems of ( 7) have no explicit solution, the two preconditioned matrices are simply chosen as the identity matrix, and we then can use a linstrategy to accelerate the convergence of solving the subproblems.Without loss of generality, we take the  1subproblem, for example.In such case, the  1 -subproblem is equivalent to where By the well-known Taylor formula in mathematical analysis, the quadratic term ‖A 1 − a‖ 2 /2 can be approximated by where  > 0 is a proximal factor and   = A  (A  1 − a) is the gradient of the quadratic term at   1 .Hence, the objective function in (44) is of the equivalent form: which makes the  1 -subproblem have closed solution form.
The  2 -subproblem can be also tackled in a similar way as the  1 -subproblem.For more cases that are analogous to (47), the explicit solution form can be dated back to Lemmas 1 and 3 given in [17].
In what follows, the penalty parameter  is updated by the formula  +1 = 5  with  0 = 0.5 − 3, the proximal parameters are chosen as ( 1 ,  2 ) = (2, 2),  = 0.25 − 2, and the iterative variables are initialized as fixed values ( 0 1 ,  0 2 ) = (ones(, 1), ones(, 1)).Generally speaking, the quadratic programming with two linear equality constraints is of the following form: where  ∈ R × is a positive semidefinite matrix.Model (48) includes the classic Markowitz portfolio optimization problem as a special case; see, for example, [2] and Example 10.For this example, Table 1 reports several experimental results of Example 9 with  = 200 by the algorithm P-ADMM with different tolerance error, including the number of iterations (denoted by "IT"), the CPU time (denoted by "CPU"), the iterative error of the solution (denoted by "ERR"), and the residual of the objective (denoted by "ROB").Figure 1 still draws the convergence curves of the residual of the objective and the iterative error of the solution under the tolerance tol = 1.0 × 10 −15 , respectively.
From Table 1, we can see that when using the P-ADMM to solve Example 9, the CPU time cost is less than 0.2 seconds and the number of the iterations is not bigger than 65 steps.The obtained results, including the residual error ERR listed in Table 1 and the convergence curves depicted in Figure 1, verify the feasibility and efficiency of the P-ADMM scheme for solving the small-scale problem.Besides, the last two columns of Table 1 imply that we can choose a relatively tiny stopping criterion (e.g., tol = 1.0 × 10 −5 ) to obtain nearly the same value of the objective and to save the CPU time.
where the matrix  ∈ R × stands for the covariance matrix of the return on the  assets in the portfolio, the variable  denotes the vector of portfolio weights that represent the amount of capital to be invested in each asset,  is the vector of expected returns of the different assets, and  is a given total return.In such case, the solution of (51) is sparse, which also verifies why some researchers try to find the sparse solution of the problem (51), see, for example, [18].For Example 10, we test eleven large-scale experiments, in which matrix  is generated in the same way as that of Example 9 and ,  are, respectively, generated by the MATLAB inner functions rand(1, ) and rand(1, 1).Table 2 reports some experimental results of this example with different dimension  ∈ [800, 2800], where the tolerance error of the algorithm is set as tol = 10 −5 .The notations IT, CPU, ERR, and ROB are the same meanings as mentioned in Example 9, the number of the nonzero entries of the solution  * is denoted by ‖ * ‖ 0 , and the sparsity ratio is defined as ‖ * ‖ 0 /×100%.The convergence curves of the residual of the objective and the iterative error of the solution for Example 10 with  ∈ [800, 2800] are depicted in Figure 2.
An outstanding observation from Table 2 is that both the number of the iteration (<25) and the CPU time (<16 s) are small, and the CPU time increases along with the increase of the dimension  of .Another observation is that the sparsity ratio of the solution is over 50%, which implies that more than half of the assets are not necessary to be invested and also provides some useful suggestions for an investor in finance.Both Table 2 and Figure 2 show that the P-ADMM

Conclusion
Instead of studying the optimization problem with one linear constraint, in this paper, we concentrate our attentions on the generalized bilinear programming problem and develop a preconditioned alternating direction method.Based on the traditional proof of the ADMM, the global convergence of the proposed method is proved and the worst-case O(1/) convergence rate in an ergodic sense is established.In order to avoid the case that the subproblem has no explicit solution, we still use a linearized strategy to approximately tackle the involved subproblems in the proposed method.Numerical results show that the proposed method is feasible and efficient.
Nowadays, many researchers are interested in the ADMM which can be regarded as an alternating update method for the variables and Lagrange multipliers for the separable convex programming.For the nonseparable convex optimization problem, the famous Taylor formula motivates us to use the first-order approximation to linearly deal with the objective function of the problem, and then one can design the corresponding ADMM to solve it.Noticing that the proposed method in current paper can be applied to the above scenarios and can be also used to solve the matrix minimization problem with two linear constraints.

Example 9 .Figure 1 :
Figure 1: Convergence curves of the residual ROB (a) and the iterative error ERR (b).

Figure 2 :
Figure 2: Convergence curves of the residual ROB (a) and the iterative error ERR (b).

( 7 )
is robust for solving the large-scale Markowitz portfolio optimization problem.

Table 1 :
Experimental results of Example 9 by the P-ADMM.

Table 2 :
Experimental results of Example 10 by P-ADMM.