A General Self-Adaptive Relaxed-PPA Method for Convex Programming with Linear Constraints

. We present an efficient method for solving linearly constrained convex programming. Our algorithmic framework employs an implementable proximal step by a slight relaxation to the subproblem of proximal point algorithm (PPA). In particular, the stepsize choice condition of our algorithm is weaker than some elegant PPA-type methods. This condition is flexible and effective. Self-adaptive strategies are proposed to improve the convergence in practice. We theoretically show under mild conditions that our method converges in a global sense. Finally, we discuss applications and perform numerical experiments which confirm the efficiency of the proposed method. Comparisons of our method with some state-of-the-art algorithms are also provided.

Assumption 1.The solution set of (1) is denoted by X * , and it is assumed to be nonempty.
Assumption 2. The objective function is simple.This means that, for a given constant , the following proximal problem admits a closed-form solution or can be solved efficiently with high precision: where  is any given vector.At first sight, this assumption seems to be quite restrictive, but this is indeed for many problems in practice.For example, nuclear norm function in matrix completion problem,  1 -norm function in basis pursuit problem, and so forth.
Many fundamental methods have been developed over the past decades to solve problem (1).Proximal point algorithm (PPA) is one of the leading approaches for solving convex optimization problems.It is earlier used for regularized linear equations and has been applied to convex optimization by Martinet [11].There are some significant theoretical achievements [12][13][14][15][16][17][18][19] in the field of PPA for convex optimization and monotone variational inequalities (VIs).Nowadays, it is still the object of intensive investigation [20] and leads to a variety of primal and dual methods.Common to PPA and its variants is the difficulty of their subproblems; this restricts the practical interest.Augmented Lagrangian method (ALM) [21] is a powerful method for linearly constrained problems.It can be regarded as a variant of PPA applied to the dual problem of (1).However, with the additional regularized term ‖ − ‖ 2 , its subproblems require an inverse operator of the form (+(1/)  ) −1 which is hard to implement in some cases.Particularly,    is general or large scale, so the computation of inverse operator may fail.Hence, ALM is not sufficiently competitive when the objective function () is "simple." Extragradient method (EGM) [22] is a practical method for (1) which employs the information of current iteration.In fact, EGM is an explicit type method and requires two calls to the gradient per iteration; therefore, it might suffer slow convergence.Recently, He and Yuan [23] proposed a contraction method based on PPA (PPA-CM) to solve (1), which is elegant and simple.Inspired by PPA-CM, a Lagrangian PPA-based (LPPA) contraction method is presented in [24] which employs an asymmetrical proximal term [25].These two PPA-based methods have nice convergence properties that are similar in many ways to PPA, but they both require a quite restrictive condition for convergence and are sensitive to the initial choice of stepsizes.
In this paper, we focus on development of PPA-type method for solving (1).Based on LPPA, we propose a general self-adaptive relaxed-PPA method (SRPPA) which is simple yet efficient.Our algorithm capitalizes certain features of PPA, hence, we term it relaxed-PPA.The proposed algorithm has several nice fronts.First, our method is a PPA-type method with asymmetrical linear term, which is clearly a different nature to classical PPA.It relaxes the jointly structure of subproblem to a tractable one.Second, we provide two simple search directions for new iterate.Third, the stepsize choice is flexible, and the condition for convergence guarantee is weaker than both PPA-CM and LPPA.Finally, simple adaptive strategies are employed to choose stepsize, and this appealing aspect is significantly important in practice.We also demonstrate that our method is relevant for various applications whose practical success is made possible by our efficient algorithm.
This paper is organized as follows.In Section 2, we provide some notations and preliminaries which are useful for subsequent analysis.In Section 3, we review some related works.The general relaxed-PPA and its variant are formally presented in Section 4. Self-adaptive strategies to choose stepsize are also described.Next, in Section 5, the convergence analysis is provided.In Section 6, we present some concrete applications of (1) and elaborate on the implementation of our method; preliminary numerical results are also reported to verify the efficiency of our proposed method.Finally, in Section 7, we conclude the paper with a discussion about the future research directions.

Preliminaries
In this section, we first establish some important notations used throughout this paper.Then, we describe the variational inequality formulation of (1) which is convenient for the convergence analysis.
() denotes the subdifferential set of the convex function (): and  ∈ () is called a subgradient of (), see [26].Let () ∈ () and () ∈ (), by the convexity of the function , we have which indicates that the mapping  is monotone.Now, we show that (1) can be characterized by a variational inequality; see, for example, [27].By attaching a Lagrange multiplier vector  ∈ Λ to the linear constraint  =  (or  ≥ ), the Lagrangian function of (1) is here, and (, ) is defined on X × Λ.Then, by the optimality condition, we can easily see that (1) amounts to finding a pair of ( * ,  * ) which satisfies where ( * ) ∈ ( * ).
the system (7) can be characterized by the following variational inequality denoted by VI(Ω, ): Recalling the monotonicity of , it is easy to get that VI(Ω, ) (9) is monotone.Since the solution set of (1) is assumed to be nonempty, the solution set of VI(Ω, ), denoted by Ω * , is also nonempty.Our analysis will be built upon this equivalent VI formulation.

The Existing Related Methods
There are basically two lines of research for VI(Ω, ) (9), either deal with it by implicit methods that are in general computationally intractable or concentrate on relaxing it with explicit methods.In this section, we first briefly review the well-known classical PPA and EGM.And then, PPA-CM [23] and LPPA [24] are discussed, which will provide motivation for our general self-adaptive relaxed-PPA.
Then, the update step is taken as follows: PPA has a nice convergence property Although classical PPA is conceptually appealing, subproblem (10) presents certain computational challenges.More specifically, primal variable  and dual variable  are tied together, and their subproblems are treated as a joint problem.
In most cases, this joint subproblem may be as difficult as the original problem (9).As a result, PPA is "conceptual" rather than implementable.It lacks capability in exploiting potential decomposable/specific structure of subproblem.Variants of classical PPA have been explored in the literature, in order to solve the proximal subproblem (10), inexactly, see, for example, [14,15,17,19].Unfortunately, inexact variants take expensive computation for obtaining approximative solutions.

The Methods Based on the Simplest Relaxation.
To overcome the drawbacks of the classical PPA, it is instinctive to relax subproblem (10) to a solvable one.The most straightforward and simplest relaxation is to replace (ũ  ) with (  ) in the proximal subproblem (10), which amounts to the following subproblem: The solution of the relaxed problem ( 13) is given by ũ =  Ω [  − (1/)(  )].It is clear that methods with such relaxation are explicit type methods.However, ũ cannot be accepted directly as the new iterate.Using the terminology "predictor-corrector, " such point can be viewed as a predictor.
Here, we list two simple methods which employ predictor ũ to obtain corrector as the new iterate.

PPA-Type Contraction
Method.The algorithms that are closely related to ours are PPA-CM [23] and LPPA [24].The PPA-CM obtains the predictor ũ by solving the following subproblem: find ( x , λ ) ∈ Ω such that where And perform the update The framework of LPPA is as follows: where And the new iterate is defined by Both procedures are simple and can solve subproblem efficiently; but their nice convergence results require a quite restrictive condition, that is;  > ‖  ‖ in PPA-CM and  > (1/2)‖  ‖ in LPPA, respectively.The stepsizes ,  are directly determined by such condition; hence, it is important to estimate ‖  ‖.Overestimation may lead to poor stepsizes and slow convergence, while underestimation may result in divergence.In addition, they are both quite sensitive to the choice of , .To overcome those drawbacks, we propose a general self-adaptive relaxed-PPA, and as mentioned earlier, it can provide improved guarantee for convergence and has potential progress in the choice of stepsize.Furthermore, selfadaptive strategies are designed to improve performance.

The General Self-Adaptive Relaxed PPA-Method
In this section, we weave together the ideas of the previous section to present general self-adaptive relaxed-PPA method (SRPPA) which is mostly inspired by LPPA [24].At first sight, the predictor applied in SRPPA is much the same as LPPA, but the stepsize choice condition for convergence is quite different; moreover, we prove that it is weaker than LPPA.Selfadaptive strategies are elaborately designed to ensure the robustness of our algorithm.Two simple yet efficient constructions of new iterate are also presented which will provide some inspirations for designing various search directions.

General
Relaxed-PPA Method.The general primal-dual relaxed-PPA method with implementable structure for (1) is summarized in Algorithm 1.Note that the order of  and  can be changed to obtain a variant, which is summarized in Algorithm 2. Our relaxed-PPA is intended to blend the implementable properties of EGM (or PCM) with the fast convergence performance of PPA.Now, it is helpful to introduce additional notations that will be used in the rest of this paper.
Let  be a positive symmetry definite matrix (we will specify it later), The relaxed-PPA described here involves two steps.First, we solve the relaxed subproblem ( * ), ( * * ) to obtain predictor, which is nice and efficient for the nature of the problem under study.Note that the -predictor step ( * ) involves minimizing  plus a convex quadratic function, and under Assumption 2, it can be efficiently solved or it admits a closed form solution.
And then, -predictor step ( * * ) is just a projection onto Λ which is tractable and computationally efficient.It is clear that the prediction step employs a Gauss-Seidel manner to update information efficiently.The correction step ($) only involves matrix-vector multiplication which is very simple and straightforward.
Remark 3. We first make some insight into the correction step in Algorithm 1.The obtained ũ plays no direct role as the new iterate.Instead, we need some kind of "corrector" defined in ($).Although matrix  in ($) is just a required positive symmetry definite, our goal here is to fully integrate the information of   and ũ to construct effective, simple search direction  −1 (  − ũ ) for the corrector.Based on this consideration, we elaborately provide two simple choices of .
Case 1.It is natural to set  =  to induce a simple update form.Then, it is easy to get that Case 2. Let  =  −1   .This case is a little less intuitive, but it can lead to a simple update form as well as Case 1.The underlying derivation is a little more complicate.Applying  in ($), we get Recalling  is a lower triangular matrix, by the fact that its inverse is also a lower triangular, we have Plugging the previous relationship to (31), we have In fact, this is a scheme of Gaussian back substitution.
Both cases only involve one matrix-vector multiplication which makes the update form simple.And the computational cost is usually inexpensive.
and its compact form We observe that subproblem ( 32) is similar to (10) in PPA, except for the construction of asymmetry matrix .As mentioned before, (32) is the same as the prediction subproblem in [24].Even though they are closely related, the stepsize choice here is quite different.We provide more specific and weaker condition for stepsize , .It is clear that condition (#) does not need prior knowledge of matrix .Furthermore, it only involves matrix-vector multiplication, and so, it is easy to verify, and it is amenable to large-scale .If ,  fail to meet this convergence condition (#), one should appropriately increase , .In the following subsection, we will elaborate on the self-adaptive strategies to increase the stepsizes.At this point, condition (#) may be seen somewhat unmotivated.Some insight into this will be provided later, as we proceed with the convergence analysis.The convergence condition in [24] has a quite different feature: ,  satisfy It is stronger than condition (#).The following lemma is devoted to the proof of this result.
Case 2. If  =  −1   , by the definition of (  , ũ ), we obtain ( The first inequality follows from the Cauchy-Schwarz inequality, and the last one follows directly from condition (33).
Condition (33) is not only stronger than Condition (#), but it also requires that matrix  is positive semidefinite, while condition (#) does not.Furthermore, condition (33) may require the explicit expression of  or knowledge of ‖  ‖.Despite these drawbacks, condition (33) is appealing to the problems in which ‖  ‖ is known beforehand or easy to compute/obtain.For instance,  is small scale, an identity matrix or a projection operator, and so forth.It is clear that both condition (#) and (33) are more flexible than the one in PPA-CM [23].The most aggressive condition (#) may lead to further improvement in stepsize choice.Moreover, it is worthwhile to notice that condition (#) is elegantly designed and provides (  , ũ ) with favourable property.In fact, for general matrix , condition (#) also can guarantee convergence.Remark 6.The update stepsize   plays an important role here.In fact, it can be regarded as an optimal stepsize which will be illustrated in the following section.
Remark 7. We should restrict the adjustment in Step 5 of Algorithm 1 within a limited number to avoid divergence.
In Algorithm 1, we carry out the -predictor before performing -predictor.The roles of  and  are symmetric; hence, sweeping the order will not break the Gauss-Seidel structure.We switch  and  and obtain a variant of relaxed-PPA with the order of the -predictor step and -predictor step reversed.This variant is illustrated in Algorithm 2. However, there is no a priori information to know which algorithm is superior.Here, we let 4.2.Adaptive Enhancements.Both PPA-CM and LPPA employ fixed stepsizes , .Experiments reveal that they will suffer slow convergence when stepsizes ,  are chosen inappropriately.A natural question is, how to choose the proper initial stepsizes , .Here, we propose self-adaptive strategies with the goal of improving the convergence in practice, as well as making performance less dependent on the initial choice of stepsizes.Our strategies dynamically incorporate the information of the current iteration to perform more informative choice of stepsizes for the next iteration [31].When doing so, the algorithm will be adaptive and free from the initial choice.Denote and then, (31) can be rewritten as Under -norm, the quantity    can be viewed as a residual for the dual feasibility condition, and    can be viewed as a primal residual.These two residuals converge to zero as relaxed-PPA proceeds.Note that        (33) is stronger than condition (#).If one chooses condition (33), our RPPA also converges.It must have predetermined stepsizes satisfying  = ‖  ‖ (here,  ≥ 0.5).However, there is no priority knowledge of the choice of individual  or .Here, we can also adjust ,  automatically when choosing condition (33) It is worth noting that too many adjustments of stepsizes by Algorithm 4 might cause the algorithm to diverge in practice, and we therefore restrict these adaptations within a limited number of iterations.If one chooses Algorithm 4, there is no need to carry out Step 5 in Algorithm 1 (or Algorithm 2).These techniques embedded into relaxed-PPA automatically choose a "better" stepsize for the next iteration.

Convergence Analysis
In this section, we analyze convergence of our primal-dual relaxed-PPA.The convergence analysis of dual-primal scheme can follow a similar procedure.
Let  * = ( * ,  * ) be any solution point, setting  =  * in (32) yields Since ũ ∈ Ω, we have (ũ  −  * )  ( * ) ≥ 0. Consequently, by using the monotonicity of , the right hand side of ( 46) is nonnegative, and thus Now, we are writing the update as and Here, () can be viewed as a progress function.The following lemma shows that () is a lower bound of ().
Lemma 8. Let () and () be defined in (49); then one has Proof.Let  * be any solution, from the definition of (), we have Applying (47) to the first term of (51) gives Substituting (52) into (51), we immediately obtain the assertion.
We note that () is a quadratic function of  and it is natural to maximize () to obtain an "optimal" : We now show that the "optimal"  *  is bounded above from zero in the following Lemma.
The inequality follows from condition (#).
Setting  =   =  *   in (50) yields       +1 −  *      Proof.First, for each  ∈ Ω, we have It follows from (57) that {  } is a bounded sequence and lim Consequently, Because {ũ  } is bounded, it has at least a cluster point.Let  ∞ be a cluster point of {ũ  } and let the subsequence {ũ and consequently, This means that  ∞ is a solution of VI(Ω, ).Note that the inequality (57) is true for all solution points of VI(Ω, ), and hence, we have Since ũ  →  ∞ ( → ∞) and   − ũ → 0 ( → ∞), for any given  > 0, there exists an integer  > 0 such that This implies that the sequence {  } converges to  ∞ , which is a solution of VI(Ω, ) (9).

Applications and Preliminary Numerical Experiments
The general self-adaptive relaxed-PPA (SRPPA) offers a flexible framework for solving many interesting problems.We illustrate our algorithm with different applications: basis pursuit problem, nearest correlation matrix problem.In this section, we describe the results of experiments whose goal is to demonstrate the efficiency of general relaxed-PPA (RPPA) and its self-adaptive version.To that end, we compare RPPA with certain state-of-the-art algorithms on different problems.Our experiments focus on efficiency and speed of convergence and evaluate the methods in terms of their number of iterations and computational times.All the codes were written by Matlab R2009b version, and all the numerical experiments were performed on a Lenovo desktop computer with Intel (R) Core (TM) i5 CPU with 3.2 GHz and 3.5 GB RAM.(66) And here, ‖ ⋅ ‖ 1 denotes the  1 norm defined as ‖‖ 1 := ∑  =1 |  |.BP is a fundamental problem in image processing and modern statistical signal processing, particularly the theory of compressed sensing; see, for example, [1-4] for intensive study.We now discuss our approach to BP problem of over-complete representations.Our experiments in this subsection use synthetic data which were mainly designed to illustrate the nice performance of our RPPA.The synthetic problem that we test here is similar to the one employed in [32].We generate the data as follows: matrix  is a random  ×  matrix, with Gaussian i.i.d.entries of zero mean and variance 1, with  = /2. original ∈   is the original sparse signal, constructed with /5 nonzero values, randomly from standard normal distribution.We use  original to generate the measurements as  =  original .It is desirable to use test problems that have a precisely known solution.In fact, when  original is very sparse, it is the solution to the minimization problem (66).Hence, in our synthetic problem,  original is exactly the solution.
In our first experiment, we compared general RPPA using two different 's mentioned in Section 4.1.For BP problem, we use condition (#) and Algorithm 3. Since  constructed here is a general random matrix, and when  is large scale, ‖  ‖ might be obtained costly.A simple stopping criterion err .=        −  original      ≤ Tol (67) was used in this experiment, and the stopping tolerance Tol was set to 10 −10 .In all the tests, initial stepsizes were set as  = 10,  = 1, the primal variable  0 was initialized as zeros(, 1), and the dual  0 was ones(, 1) in Matlab.Basically, SRPPAs converge very quickly and achieved tight error 10 −10 in a few hundred iterations.For this experiment, one can see that SDPRPPAG1-I is fastest in all cases.Both SDPRPPAG1-I and SPDRPPAG1-I are Gaussian type methods, with  = , and they exhibit very similar performance.SDPRPPAG2-I and SPDRPPAG2-I with  =  −1   are Gaussian back substitute form methods and perform a little slower than Gaussian type methods.We also plot a figure to graphically illustrate the performance of four SRPPAs.Figure 1 shows the results from the test with  = 1000 and  = 6000, depicting error versus CPU time.Qualitywise, SPDRPPAG1-I was on par with SDPRPPAG1-I.
In the second experiment, we compare the performance of SPDRPPAG1-I with TFOCS (source code can be found at http://cvxr.com/tfocs/)[32], ADMM (source code can be found at http://www.stanford.edu/∼boyd/papers/admm/)[33], and PPA-CM.To make the comparison independent of the stopping criterion for each algorithm, we first run TFOCS to get its solution  TFOCS and set a benchmark error benchmark err .= Since we found that Tol = 10 −12 is small enough to guarantee very high accuracy, we set Tol = 10 −12 in TFOCS.The parameters of TFOCS and ADMM were taken with their defaults.
To guarantee the convergence, fixed stepsizes ,  were set to  = 100,  = 1.01 * ‖  ‖/ for PPA-CM.In SPDRPPAG1-I, we also choose the same convergence condition (#) and initial step size  = 10,  = 1 as the previous experiment.We varied the size of  from  = 512 ( = /2) to  = 8500.The results of this experiment are summarized in Table 2. There, we report the run time in seconds, the number of iterations, and the error of the recovery solution.In Table 2, "-" means "out of memory." We observe from Table 2 that four algorithms reach high accuracy around 10 −9 .SPDRPPAG1-I is about two times faster than the first-order method implemented in the TFOCS package, and moreover, it usually outperforms TFOCS in terms of iterations.For medium size problems, SPDRPPAG1-I is clearly faster than ADMM.Even for small size problems, SPDRPPAG1-I shows its superior performance.The main reason lies in that ADMM computed ( +   ) −1 to solve its subproblem exactly which would take expensive computational cost.Not surprisingly, the general SPDRPPAG1-I performs better than the primary PPA-CM.Here, the total iterations of SPDRPPAG1-I are less than 50% of PPA-CM.As we have mentioned, "optimal" update stepsize   and more flexible condition for convergence may provide SPDRPPAG1-I improved performance.SPDRPPAG1-I is faster than PPA-CM in terms of CPU times.However, the superiority of CPU time is not so significant as iteration number.For the cases  = 4300, it is just about 62% of PPA-CM.This is not particularly surprising; compared to PPA-CM, SPDRPPAG1-I has to take extra computation for convergence condition and "optimal"   in each iteration.
where  ∈ R  is the vector whose entries are all 1,   + denotes the cone of positive definite symmetric matrices, diag() is the vector of diagonal elements of , and ‖ ⋅ ‖  denotes the matrix Fröbenius norm ‖‖  = trace (  ) 1/2 .
Here, we apply PPA-CM, LPPA, and SDPRPPA1-II for solving (70).The standard Matlab Mex interface mexsvd is used to conduct the eigenvalue decomposition.We constructed test data sets and stopping criterion like those of [24].As mentioned in the prequel, we expect our SRPPA to produce robust performance.To assess the effectiveness of the adaptive strategies proposed in Section 6, we now move on to the description of experiments that focus on the consequences of the initial stepsizes.For investigating, we used dimensions  ∈ {500, 1000} and varied  from 0.05 to 100, and initial points were set to 0 in all cases.Note that  = ; we fixed  = 1.01/ for PPA-CM,  = 0.65/ for LPPA and chose  = 0.65/ as initial start for SDPRPPAG1-II.Since the experiments with other values of  give qualitatively similar results, we therefore do not plot those results to avoid clutter in the figures.The respective numerical results are plotted in Figure 2.
It is clear that, for PPA-CM and LPPA, the convergence performance was a result of the stepsize selection.They are both fairly sensitive to initial stepsize  (or ).The results confirm that, with inappropriate stepsizes, both PPA-CM and LPPA become significantly slow.SDPRPPAG1-II yields significantly robust performance with adaptive strategy.And it is independent of the initial stepsizes and illustrates its superior performance.Furthermore, SDPRPPAG1-II yields competitive results even when PPA-CM and LPPA chose the "good" initial stepsize.This underlies the importance of adaptive strategy in producing good performance.Of course, care should be taken.For instance, the cost of computing optimal stepsize   here is negligible, compared to the computation of SVD; when they are more costly, general LPPA will be expected to perform slower than PPA-CM.

Conclusions
In this paper, we proposed an efficient general self-adaptive relaxed-PPA method for linearly constrained convex programming and provided theoretical convergence analysis for this method.The stepsizes choice condition is flexible and simple.Self-adaptive strategies are provided to make our method more efficient and robust.Experiments of the method in comparison to other state-of-art methods are provided to confirm the efficiency of the proposed method.Our numerical results suggest that SRPPA is effective and simple to implement.There are a few directions for further research, but we list here only two.The first is the question of whether we may modify the algorithm to work with more general constrained convex problems.Second, we aim to provide various relaxations of the subproblem for the practical purpose.

Figure 2 :
Figure 2: CPU times as a function of the initial stepsize  for PPA-CM, LPPA, and SDPRPPAG1-II.The plot on the left is for  = 500, while the plot on the right is for  = 1000.

Table 1 :
Comparison of behaviours of four SRPPAs.