Convergence Analysis on an Accelerated Proximal Point Algorithm for Linearly Constrained Optimization Problems

The proximal point algorithm is a class of methods widely used in recent years for solving optimization problems and practical problems such as machine learning. In this paper, a framework of an accelerated proximal point algorithm is presented for convex minimization with linear constraints. The algorithm can be seen as an extension of Güler's methods for unconstrained optimization and linear programming problems. We prove that the sequence generated by the algorithm converges to a KKT solution of the original problem under appropriate conditions, with a convergence rate of O(1/k^2).


Introduction
The proximal point algorithm (abbr. PPA) is a class of methods widely used in solving optimization problems, fixed point problems, maximal monotone operator problems, and so on. The framework of the proximal point method is closely related to many algorithms, and it can even be used to interpret and generalize other methods and algorithms. In recent years, combining the idea of the proximal point method, or proximal terms, with existing algorithms has been shown to improve the performance of the original algorithms to a certain extent. The main step of the PPA is to solve a subproblem built on the proximal point operator. In some practical problems with suitable conditions, the proximal point subproblems can be expressed as convex optimization problems of smaller scale or with better properties, which may even have closed-form solutions. In recent years, the proximal point method, together with its related models and several types of PPAs, has been used in machine learning, image recognition, signal processing, and so on [1–3]. The original PPA can be traced to Martinet [4, 5] in the research of fixed point problems and variational inequalities. For any given x_0 in a Hilbert space, Martinet [4] generated an iterative sequence {x_k}_{k∈N} for the variational inequality problem by a PPA based on the inclusion relation 0 ∈ c_k T(x_{k+1}) + x_{k+1} − x_k, where c_k is a sequence of positive parameters and T is a maximal monotone operator. Later, for the unconstrained minimization of f(x), where f(·) is a proper lower semicontinuous convex function on a Hilbert space, Rockafellar [6, 7] presented a more practical PPA with the iteration x_{k+1} = argmin_x { f(x) + (1/(2c_k))‖x − x_k‖² }. For solving convex programming problems with Lipschitz continuous gradients, Nesterov [9] gave an accelerated gradient method, which adds an extrapolation step along the gradient descent direction in order to reduce the number of iterations, and established the iteration complexity estimate for general first-order methods.
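For illustration (this concrete example and the choice f(x) = |x| are ours, not taken from the references), Rockafellar's iteration above can be run directly whenever the proximal subproblem has a closed-form solution; for f(x) = |x| the subproblem is solved by soft-thresholding:

```python
# Hedged sketch of the classical proximal point algorithm (PPA) for the
# one-dimensional nonsmooth convex function f(x) = |x|.  For this f, the
# subproblem argmin_x |x| + (1/(2c)) * (x - y)**2 has the closed-form
# "soft-thresholding" solution implemented below.

def prox_abs(y, c):
    """Proximal map of f(x) = |x| with parameter c > 0 (soft-threshold)."""
    if y > c:
        return y - c
    if y < -c:
        return y + c
    return 0.0

def classical_ppa(x0, c=1.0, iters=10):
    x = x0
    for _ in range(iters):
        x = prox_abs(x, c)  # x_{k+1} = argmin f(x) + (1/(2c))|x - x_k|^2
    return x

print(classical_ppa(5.0))  # → 0.0, the minimizer of |x|
```

Each step moves the iterate a distance c toward the minimizer and then stays there, so the method reaches x* = 0 exactly after finitely many steps in this toy case.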
The idea of acceleration was then introduced into other optimization algorithms. For example, Güler [10] gave two new PPAs by adding an auxiliary point sequence and adopting an acceleration strategy; compared with (3), his accelerated algorithms obtained a better convergence rate. Güler [11] also applied this method to linear programming and proposed new augmented Lagrangian multiplier algorithms. Even when the iteration subproblems were solved inexactly, the algorithms not only maintained the global convergence property but also provided faster global convergence rates and could terminate in finitely many iterations to obtain the primal and dual optimal solutions. Birge et al. [12] further generalized Güler's accelerated proximal point method to unconstrained nonsmooth convex optimization problems. They gave a model algorithm and then presented a family of PPAs which generate x_{k+1} under a rule different from the classical PPA. Under weaker conditions, an estimation of the global convergence rate was obtained. They also discussed the application of PPA to stochastic programming and a variant proximal bundle method. More research on the proximal point method can be found in the review of Parikh and Boyd [13]. In recent years, much effort has been devoted to accelerating various first-order methods and to the convergence analysis for linearly constrained convex optimization. For example, Ke and Ma [14] proposed an accelerated augmented Lagrangian method for solving linearly constrained convex programming and showed that its convergence rate is O(1/k^2). Xu [15] proposed two accelerated methods for solving structured linearly constrained convex programming and discussed their convergence rates under different conditions. Zhang et al.
[16] applied the proximal method of multipliers to equality constrained optimization problems and proved that, under the linear independence constraint qualification and the second-order sufficient optimality condition, it converges linearly, and superlinearly when the penalty parameter increases to +∞. Now, we consider an accelerated PPA for solving the convex optimization problem with linear equality constraints

min_{x ∈ R^n} f(x) s.t. Ax = b, (5)

where f: R^n → R ∪ {+∞} is a proper lower semicontinuous (not necessarily smooth) convex function, A ∈ R^{m×n}, and b ∈ R^m. Our main work is as follows. (1) By using the Lagrange function, the KKT system, and the primal-dual relations, we construct appropriate auxiliary sequences by quadratic convex functions φ_k(x) and auxiliary points y_k. Then, we update x_k and the Lagrange multipliers, respectively, to extend the accelerated PPA to general convex optimization problems. (2) In the extended algorithm, the parameter α_k, which is related to the convergence rate, is updated with an introduced constant c. The update of α_k in Güler's algorithm can be seen as a special case with c = 1. (3) When the iteration subproblems are solved exactly, the algorithm has the convergence rate of O(1/k^2) in terms of the objective residual of the associated Lagrange function. The remaining parts of this paper are organized as follows. In Section 2, a framework of the accelerated PPA is presented for the constrained optimization problem (5). In Section 3, the global convergence is established under mild assumptions. In Section 4, the convergence rate based on the function values is given. In Section 5, we conclude this paper with some remarks.
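For the reader's convenience, the optimality objects associated with (5) can be collected in display form (reconstructed here in standard notation; the sign convention for the multiplier λ is an assumption):

```latex
% Problem (5) and its Lagrangian
\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad Ax = b,
\qquad
L(x,\lambda) = f(x) + \langle \lambda,\, Ax - b \rangle .
% Dual problem, with f^* the Fenchel conjugate of f:
\max_{\lambda \in \mathbb{R}^m} \; \bigl\{\, -f^*(-A^{\mathsf T}\lambda) - \langle b, \lambda \rangle \,\bigr\}.
% KKT system: x solves (5) iff there exists \lambda with
0 \in \partial f(x) + A^{\mathsf T}\lambda, \qquad Ax = b .
```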

An Accelerated PPA for Constrained Convex Optimization
For an unconstrained optimization problem, the classical PPA generates the next iteration point by

x_{k+1} = argmin_{x ∈ R^n} { f(x) + (1/(2c_k))‖x − x_k‖² }.

The main accelerating idea of Nesterov [9] and Güler [10] is to construct a sequence of auxiliary quadratic convex functions φ_k(x) which can be seen as estimations of f(x) but with better functional properties. As k increases, the difference between the auxiliary functions and the original objective function is compressed such that, for any x ∈ R^n, φ_k(x) ≤ (1 − λ_k)f(x) + λ_k φ_0(x) with λ_k → 0. Then, in each iteration, the algorithm produces x_k satisfying f(x_k) ≤ φ_k(v*_k), where v*_k is the minimal point of φ_k(x). Thus, while the minimal point x* exists, since f(x) is proper, it obtains f(x_k) − f(x*) ≤ φ_k(x*) − f(x*) ≤ λ_k(φ_0(x*) − f(x*)) → 0. The key steps in the accelerated algorithm are the construction of the auxiliary quadratic convex functions with a suitable estimation of f(x), the production of x_k satisfying (15), and the selection of the compressing factor α_k. Inspired by Nesterov's estimate sequences and the PPA given by Güler [10], we consider the constrained optimization problem (5). It is known that the dual problem of (5) is

max_{λ ∈ R^m} { −f*(−A^Tλ) − b^Tλ },

where f* is the Fenchel conjugate function of f. The augmented Lagrangian function associated with (5) is defined as

L_σ(x, λ) = f(x) + λ^T(Ax − b) + (σ/2)‖Ax − b‖²,

where λ ∈ R^m is the multiplier and σ > 0 is a penalty parameter. To simplify the discussion, we denote the proximal point of the function f at a given y as

prox_f(y) = argmin_{x ∈ R^n} { f(x) + (1/2)‖x − y‖² }.

Firstly, we construct a series of quadratic regular functions before the accelerated PPA is given. For given x_0, λ_0, and a_0 > 0, we define the initial quadratic function φ_0(x). Since f(x) is convex and φ_k(x) is a quadratic regular function, for any k ∈ N, φ_k(x) can be written in the canonical form

φ_k(x) = φ*_k + (β_k/2)‖x − v*_k‖², (16)

where φ*_k = φ_k(v*_k) and v*_k denotes the minimizer of φ_k(x).
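As an illustrative sketch of the acceleration mechanism (a FISTA-style momentum rule, a well-known relative of Güler's scheme; the function f and all parameter choices here are our assumptions for demonstration, not the paper's construction):

```python
import math

# Hedged sketch of an accelerated proximal point iteration in the
# Nesterov/Gueler style.  The target f(x) = 0.5*(x - A_TARGET)**2 is an
# illustrative choice whose proximal map has an explicit closed form.

A_TARGET = 3.0  # minimizer of f

def prox_f(y, c):
    # argmin_x 0.5*(x - a)^2 + (1/(2c))*(x - y)^2  =  (c*a + y)/(c + 1)
    return (c * A_TARGET + y) / (c + 1.0)

def accelerated_ppa(x0, c=1.0, iters=200):
    x_prev, x, t = x0, x0, 1.0
    for _ in range(iters):
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)  # extrapolated point
        x_prev, x, t = x, prox_f(y, c), t_next       # proximal step at y
    return x

x_final = accelerated_ppa(-2.0)
print(abs(x_final - A_TARGET) < 1e-3)  # → True
```

The extrapolation point y plays the role of the auxiliary points y_k in the text: the proximal step is taken at y rather than at x_k, which is what yields the O(1/k^2) estimate on the function values.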
On the other hand, from (15), we have (17). Comparing the second terms of (16) and (17) implies (18). It is not hard to see that the minimum values and the minimizers of two adjacent quadratic functions satisfy the relationships (19) and (20). If we generate the iteration points by x_{k+1} := y_k^p for k ≥ 0, the following lemma shows the relationship between the Lagrange function, the proximal point y_k^p, the constructed quadratic estimation function φ_k(x), and its minimizer v*_{k+1}.
Lemma 1. Suppose that α_k, β_k, v*_k, y_k, and λ_k are defined as previously mentioned, and let c ∈ (0, 2). Then, we have (22).

Proof. We prove it by induction. When k = 0, (22) is obvious. Now, assume (23) holds. By convexity, for any x ∈ R^n, the subgradient inequality holds. Then, by (23), we obtain (26). Substituting (18) and (26) into (20) implies (27). Notice that, in the last term of (27), by using (20), we gain (29). Substituting this formula into (27) deduces (30). From Lemma 1, φ_{k+1}(v*_{k+1}) can be seen as an upper bound estimation of the Lagrange function at (x_{k+1}, λ_k). Since v*_k, y_k, α_k, and λ_k can all be updated by explicit formulas, we give the accelerated PPA for the constrained optimization problem (5) as follows.
Accelerated PPA for constrained convex optimization (C-APPA):

(iii) Step 2: compute the proximal point and the updating x_{k+1} = argmin_{x ∈ R^n} f(x) + ⋯ (the subproblem (34)).

(iv) Step 3: let k := k + 1 and go to Step 1.
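Since the algorithm box survives only partially in this version, the following hedged sketch shows a simpler member of the same family — a proximal method of multipliers without the extrapolation point y_k or the factor α_k — applied to a toy instance of (5); the data A, b and the parameter σ are illustrative assumptions, and this is not the paper's C-APPA:

```python
import numpy as np

# Hedged sketch: proximal method of multipliers for
#   min 0.5*||x||^2  s.t.  Ax = b,
# x-step:   x_{k+1} = argmin_x f(x) + <lam, Ax-b> + (sigma/2)||Ax-b||^2
#                                   + 0.5*||x - x_k||^2,
# lam-step: lam_{k+1} = lam_k + sigma*(A x_{k+1} - b).
# For f(x) = 0.5*||x||^2 the x-step reduces to the linear system below.

A = np.array([[1.0, 1.0]])
b = np.array([1.0])
sigma = 1.0

x = np.zeros(2)
lam = np.zeros(1)
M = 2.0 * np.eye(2) + sigma * A.T @ A   # Hessian of the x-subproblem
for _ in range(300):
    rhs = x - A.T @ lam + sigma * A.T @ b
    x = np.linalg.solve(M, rhs)
    lam = lam + sigma * (A @ x - b)      # multiplier update

x_star = A.T @ np.linalg.solve(A @ A.T, b)   # true solution (0.5, 0.5)
print(np.allclose(x, x_star, atol=1e-6))     # → True
```

The accelerated variant studied in the paper additionally takes the proximal step at an extrapolated point y_k and updates α_k, which is what improves the rate from the classical method's to O(1/k^2).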
□

Remark 1. It is obvious that 0 < α_k < 1 by the definition. Actually, from (10), the convergence of the algorithm relates to (1 − α_k). In the later Theorem 1, we need to assume a nonzero lower bound such that α_k ≥ α > 0. In particular, when cα_kβ_k = 1, it obtains α_k = 1/(cβ_k).

Remark 2.
In the iterations, x_{k+1} can be computed from (34) or (35), which give the same result although they have different quadratic terms. We will use these two different subproblem forms for convenience in the later discussion on convergence. One is used to prove that the sequence converges to a KKT solution, and the other, (35), which is more suitable for measuring changes in the function values and the constraints, is used in the analysis of the convergence rate.

Mathematical Problems in Engineering
Remark 3. In problem (35), scaled proximal terms can also be used, such as ‖x − y_k‖²_Q, where Q is a positive semidefinite matrix. For some structured functions f(x) and an appropriately chosen Q, the augmented term can be linearized, which gives (35) a closed-form solution. Alternatively, in practice, we can solve the problem inexactly, but this may not retain the O(1/k^2) convergence rate that we establish in the following sections for exactly solved subproblems. Further discussion is beyond the scope of this paper, and we leave the inexactly solved cases and their numerical experiments to another paper.
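As a hedged illustration of the linearization trick in this remark (the choices f(x) = μ‖x‖₁, the data A, c, and the parameter τ are our assumptions, not from the paper), taking Q = τI − σAᵀA with τ ≥ σ‖AᵀA‖₂ collapses the quadratic terms so that the subproblem becomes a single soft-thresholding step, and the majorization guarantees descent of the true objective:

```python
import numpy as np

# Hedged sketch: linearized proximal step for the subproblem
#   min_x  mu*||x||_1 + (sigma/2)*||Ax - c||^2 + (1/2)*||x - y||_Q^2,
# with Q = tau*I - sigma*A^T A (positive semidefinite when
# tau >= sigma*||A^T A||_2).  The A^T A terms cancel, leaving a linear
# term plus (tau/2)*||x - y||^2, so the minimizer is a soft-threshold
# of a gradient step taken at y.

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))
c = rng.standard_normal(5)
mu, sigma = 0.1, 1.0
tau = sigma * np.linalg.norm(A.T @ A, 2)   # majorization constant

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(x):
    return mu * np.abs(x).sum() + 0.5 * sigma * np.sum((A @ x - c) ** 2)

y = rng.standard_normal(8)                 # current point
g = sigma * A.T @ (A @ y - c)              # gradient of the smooth part at y
x_new = soft_threshold(y - g / tau, mu / tau)

# Majorization implies the true objective does not increase:
print(objective(x_new) <= objective(y) + 1e-12)  # → True
```

The descent follows because adding (1/2)‖x − y‖²_Q with Q PSD builds a majorizer of the true objective that is exact at y; minimizing the majorizer therefore cannot increase the objective.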

Global Convergence
Assumption 1. Assume that f is a proper lower semicontinuous convex function and x_0 ∈ R^n is a feasible point of (5).
From Corollaries 28.2.2 and 28.3.1 of [17], under Assumption 1, the KKT set of (5) is nonempty. Moreover, x is a solution of (5) if and only if there exists λ ∈ R^m such that (x, λ) satisfies the KKT system

0 ∈ ∂f(x) + A^Tλ, Ax = b.

We have the following lemmas.

Lemma 2. Suppose that (x_k, λ_k) is generated by C-APPA and (x, λ) is a KKT solution of (5). Then, for k = 1, 2, . . ., it has (41).

Proof. By the first-order optimality condition of (34), together with (36), we have the inclusion below. Since f(x) is a proper lower semicontinuous convex function, by Theorem 12.17 of Rockafellar and Wets [18], ∂f is a maximal monotone operator, and there exists a self-adjoint positive semidefinite linear operator Σ_f such that, for any x, x′ ∈ dom(f) with u ∈ ∂f(x) and u′ ∈ ∂f(x′), ⟨u − u′, x − x′⟩ ≥ ⟨x − x′, Σ_f(x − x′)⟩. Notice that x and λ satisfy the KKT system, which implies the corresponding inclusion for (x, λ). By using (36) and (39), it is easy to derive the difference relations. Thus, by (36) and (39) in the algorithm, we obtain the multiplier update identities. Then, from these basic relations, it deduces (53). Since Ax = b, we have Ax_{k+1} − b = Ax^e_{k+1}, and we gain (41) by (53). □

Lemma 3. Suppose that (x_k, λ_k) is generated by C-APPA and (x, λ) is a KKT solution of (5). Then, for k = 1, 2, . . ., it has (54).

Proof. From the relations above, together with (42), (43), and (45), we have (55). Then, by (55), it deduces the next bound. From (50) and (55), we can obtain the intermediate estimate. Also, by using (50), it has
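As a quick sanity check of the KKT characterization (with the illustrative smooth choice f(x) = ½‖x‖², not from the paper), the system 0 ∈ ∂f(x) + Aᵀλ, Ax = b can be solved in closed form and verified numerically:

```python
import numpy as np

# Hedged numerical check of the KKT system for problem (5) with the
# illustrative choice f(x) = 0.5*||x||^2, so grad f(x) = x.
# KKT:  x + A^T lam = 0  and  A x = b  give (for full row rank A)
#   lam = -(A A^T)^{-1} b,   x = A^T (A A^T)^{-1} b.

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])

lam = -np.linalg.solve(A @ A.T, b)
x = -A.T @ lam

print(np.allclose(x + A.T @ lam, 0.0))  # stationarity → True
print(np.allclose(A @ x, b))            # feasibility  → True
```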

Since 0 < α_k < 1 for all k, and by (59), we obtain the required bound. Thus, (54) is true. Now, for k > 0, we construct the following ϕ_k, given in (62), and turn to the analysis of the convergence of the algorithm. The theorem below shows that the sequence {ϕ_k}_{k>0} decreases monotonically under appropriate conditions, and then C-APPA is convergent since ϕ_k ≥ 0 for all k ≥ 0. □

Theorem 1. Suppose the solution set of (5) is nonempty. Denote by (x, λ) one solution satisfying the KKT system. Let ϕ_k (k > 0) be defined as in (62) and (x_k, λ_k) be generated by C-APPA. Then, ϕ_{k+1} ≤ ϕ_k. Furthermore, for all k = 1, 2, . . ., if there exist α > 0 and β > 0 such that α_k ≥ α and β_k ≥ β, then lim_{k→∞} x_k = x,