An Accelerated Proximal Algorithm for the Difference of Convex Programming

In this paper, we propose an accelerated proximal point algorithm for the difference of convex (DC) optimization problem by combining the extrapolation technique with the proximal difference of convex algorithm. By making full use of the special structure of the DC decomposition and the information of the stepsize, we prove that the proposed algorithm converges at a rate of $O(1/k^2)$ under milder conditions. The given numerical experiments show the superiority of the proposed algorithm to some existing algorithms.

It is well known that the classical method to solve the DCP is the so-called difference of convex algorithm (DCA) [14], in which the concave part of the objective function is replaced by a linear majorant and a convex optimization subproblem needs to be solved at each iteration. Note that the difficulty of the involved subproblem relies heavily on the DC decomposition of the objective function, and it can be solved easily when the objective function can be written as the sum of a smooth convex function with Lipschitz gradient, a proper closed convex function, and a continuous concave function [15]. Motivated by this, Gotoh et al. [16] proposed the so-called proximal difference of convex algorithm (PDCA) for solving the DCP, in which not only is the concave part replaced by a linear majorant at each iteration but the smooth convex part is also replaced by a quadratic majorant. Furthermore, if the proximal mapping of the proper closed convex function can be computed easily, then the subproblem involved in the PDCA can be solved efficiently. However, when the concave part of the objective is void, the PDCA reduces to the proximal gradient algorithm, which may be slow in practice [17]. In fact, since the convergence rate of the PDCA depends heavily on the Łojasiewicz exponent of the objective function, the PDCA converges only linearly in general [18,19]. To accelerate the convergence of the PDCA, researchers have turned to the well-known extrapolation technique to design efficient algorithms [20–24]. This technique has been used extensively in accelerating proximal-type algorithms for convex programming [25,26], where it improves the convergence rate from $O(1/k)$ to $O(1/k^2)$. Motivated by this, Wen et al. [27] proposed the proximal difference of convex algorithm with extrapolation (PDCAE) for solving the DCP. The numerical experiments in [27] show that the PDCAE performs well, although it only converges linearly in theory [27]. A question is thus posed naturally: can we propose a new type of PDCA whose convergence rate can be improved in theory? This constitutes the motivation of the paper.
In this paper, inspired by the works in [20–23, 27], we establish an accelerated proximal DC programming algorithm (APDCA) for the DCP by combining the extrapolation technique with the PDCA. In the algorithm, the current iteration point is replaced by a linear combination of the previous two points, and the extrapolation technique is incorporated into the stepsize. By making full use of the special structure of the DC decomposition and the information of the stepsize, we prove that the APDCA converges at a rate of $O(1/k^2)$ under milder conditions. The given numerical experiments show its superiority to some existing algorithms. The remainder of the paper is organized as follows. In Section 2, we describe the DC optimization problem considered in this paper and present our newly designed algorithm. In Section 3, we establish the global convergence and the $O(1/k^2)$ convergence rate of the newly designed algorithm. Some numerical experiments are provided in Section 4. Some conclusions are drawn in Section 5.
To end this section, we recall some definitions used in the subsequent analysis [28–30].
For an extended real-valued function $f: \mathbb{R}^n \to [-\infty, +\infty]$, we denote its domain by $\operatorname{dom} f = \{x \in \mathbb{R}^n : f(x) < +\infty\}$. The function $f$ is said to be strongly convex if there exists an $a > 0$ such that $\nabla^2 f(x) \succeq aI$ for all $x \in S$, where $S$ is a convex set and $I$ is the identity matrix. The function $f$ is said to be proper if it never equals $-\infty$ and $\operatorname{dom} f \neq \emptyset$. Moreover, a proper function is closed if it is lower semicontinuous. A proper closed function $f$ is said to be level-bounded if the lower level sets of $f$ are bounded; that is, $\{x \in \mathbb{R}^n : f(x) \le r\}$ is bounded for any $r \in \mathbb{R}$. Given a proper closed function $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$, the limiting subdifferential of $f$ at $x \in \operatorname{dom} f$ is given as follows:
$$\partial f(x) = \left\{ v \in \mathbb{R}^n : \exists\, x^t \to x,\ v^t \to v \text{ with } f(x^t) \to f(x) \text{ and } \liminf_{y \to x^t} \frac{f(y) - f(x^t) - \langle v^t, y - x^t \rangle}{\|y - x^t\|} \ge 0 \ \text{for each } t \right\}.$$
It is well known that the (limiting) subdifferential reduces to the classical subdifferential in convex analysis when $f$ is a convex function; that is,
$$\partial f(x) = \left\{ v \in \mathbb{R}^n : f(y) \ge f(x) + \langle v, y - x \rangle, \ \forall y \in \mathbb{R}^n \right\}.$$
Furthermore, if $f$ is continuously differentiable, then the (limiting) subdifferential reduces to the gradient of $f$, denoted by $\nabla f$.
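As a simple illustration (our addition; the example itself is standard), consider $f(x) = |x|$ on $\mathbb{R}$. Since $f$ is convex, the limiting subdifferential coincides with the classical convex subdifferential:
$$\partial f(x) = \begin{cases} \{-1\}, & x < 0, \\ [-1, 1], & x = 0, \\ \{+1\}, & x > 0, \end{cases}$$
and at every $x \neq 0$, where $f$ is continuously differentiable, it reduces to the gradient $\nabla f(x) = \operatorname{sign}(x)$.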

Algorithms for DC Programming
Consider the following difference of convex programming:
$$\min_{x \in \mathbb{R}^n} F(x) := f(x) + g(x) - h(x), \qquad (3)$$
where $f: \mathbb{R}^n \to \mathbb{R}$ is a strongly convex function with modulus $a > 0$, $g: \mathbb{R}^n \to \mathbb{R}$ is a smooth convex function, $\nabla g$ is Lipschitz continuous with constant $L_g > 0$, $h: \mathbb{R}^n \to \mathbb{R}$ is a continuous convex function, and $\nabla h$ is Lipschitz continuous with constant $L_h > 0$. For the DCP, the classical DCA takes the following iterative scheme [14]:
$$x^{k+1} \in \arg\min_{x \in \mathbb{R}^n} \left\{ f(x) + g(x) - h\left(x^k\right) - \left\langle \nabla h\left(x^k\right), x - x^k \right\rangle \right\}.$$
By replacing the concave part in the objective function by a linear majorant and replacing the smooth convex part by a quadratic majorant, Gotoh et al. [16] proposed a proximal DCA for the DCP. For the sake of completeness, we list Algorithm 1 as follows.

Initial step. Take $\varepsilon > 0$, $\mu = 1/L_g$, and $x^0 \in \operatorname{dom} f$.
Iterative step. Compute the new iterate by the following iterative scheme:
$$x^{k+1} = \arg\min_{x \in \mathbb{R}^n} \left\{ f(x) + \left\langle \nabla g\left(x^k\right) - \nabla h\left(x^k\right), x - x^k \right\rangle + \frac{1}{2\mu}\left\|x - x^k\right\|^2 \right\}.$$
ALGORITHM 1: PDCA.
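Since the subproblem in Algorithm 1 is a proximal step, one PDCA iteration can be sketched in a few lines of Python (our rendering, not the authors' code). Here `prox_f(z, mu)` is a hypothetical helper returning $\arg\min_x \{f(x) + \|x - z\|^2/(2\mu)\}$:

```python
import numpy as np

def pdca_step(x, grad_g, grad_h, prox_f, mu):
    """One PDCA iteration: linearize g and h at x^k; the resulting
    subproblem reduces to a proximal step of f at a gradient point."""
    return prox_f(x - mu * (grad_g(x) - grad_h(x)), mu)
```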
Although only a simple subproblem is involved in the algorithm, the PDCA is potentially slow [19,27]. To accelerate the convergence of the PDCA, we incorporate the extrapolation technique into the PDCA and obtain the following algorithm (Algorithm 2).

Initial step. Take $\varepsilon > 0$, $\mu$ with $1/(2a) \le \mu \le 1/\max\{L_g, L_h\}$, $t_1 = 1$, and $x^1 = x^0 \in \operatorname{dom} f$.
Iterative step. Compute the new iterate by the following iterative scheme:
$$t_{k+1} = \frac{1 + \sqrt{1 + 4t_k^2}}{2}, \qquad y^k = x^k + \frac{t_k - 1}{t_{k+1}}\left(x^k - x^{k-1}\right), \qquad (7)$$
$$x^{k+1} = \arg\min_{x \in \mathbb{R}^n} \left\{ f(x) + \left\langle \nabla g\left(y^k\right) - \nabla h\left(y^k\right), x - y^k \right\rangle + \frac{1}{2\mu}\left\|x - y^k\right\|^2 \right\}. \qquad (8)$$
ALGORITHM 2: APDCA.
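Building on the same hypothetical `prox_f` helper as before, the whole APDCA loop can be sketched as follows. This is a minimal sketch of our reading of Algorithm 2, not the authors' code:

```python
import numpy as np

def apdca(x0, grad_g, grad_h, prox_f, mu, max_iter=5000, tol=1e-6):
    """Sketch of the APDCA (Algorithm 2) as reconstructed above.

    Subproblem (8) is equivalent to a proximal step at the
    extrapolated point y^k:
        x^{k+1} = prox_{mu*f}( y^k - mu*(grad_g(y^k) - grad_h(y^k)) ).
    """
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    t = 1.0  # t_1 = 1
    for _ in range(max_iter):
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0  # t_{k+1}^2 = t_k^2 + t_{k+1}
        y = x + ((t - 1.0) / t_next) * (x - x_prev)          # extrapolation step (7)
        x_new = prox_f(y - mu * (grad_g(y) - grad_h(y)), mu)  # subproblem (8)
        if np.linalg.norm(x_new - x) <= tol * max(1.0, np.linalg.norm(x_new)):
            return x_new
        x_prev, x, t = x, x_new, t_next
    return x
```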

Convergence Analysis of the APDCA
In this section, we establish the global convergence of the algorithm and its convergence rate. To continue, we first recall the following conclusions.
Lemma 1 (see [25]). Let $f$ be a continuously differentiable function whose gradient is Lipschitz continuous with Lipschitz constant $L(f) > 0$. Then, for any $L \ge L(f)$, it holds that
$$f(x) \le f(y) + \langle \nabla f(y), x - y \rangle + \frac{L}{2}\|x - y\|^2, \quad \forall x, y \in \mathbb{R}^n.$$

Lemma 2. Let $\{x^k\}$ and $\{y^k\}$ be the sequences generated by the APDCA. Then, for any $x \in \mathbb{R}^n$, the estimate (6) holds.

Proof. Since $f$ is a strongly convex function, there exists a constant $a > 0$ such that
$$f(x) \ge f\left(x^{k+1}\right) + \left\langle \xi^{k+1}, x - x^{k+1} \right\rangle + \frac{a}{2}\left\|x - x^{k+1}\right\|^2, \quad \forall x \in \mathbb{R}^n,$$
where $\xi^{k+1} \in \partial f(x^{k+1})$.
Connecting the fact that $\nabla h(x)$ is Lipschitz continuous with constant $L_h > 0$ with Lemma 1, we obtain (9). It follows from the convexity of $g$ that (10) holds. Connecting (7) and (9) with (10), we have (11). On the other hand, since $h$ is convex, it follows that (13) holds. Connecting the fact that $\nabla g(x)$ is Lipschitz continuous with constant $L_g > 0$ with Lemma 1, we have (14), where $0 < \mu \le 1/L_g$. Summing (13) and (14), we have (15). Adding $f(x)$ to both sides of (15) yields (16), and taking $x = x^{k+1}$ in (16) yields (17).

By the optimality conditions of (8), one has (19). Then, for $0 < \mu \le 1/\max\{L_g, L_h\}$, it follows from (11) and (17) that the estimate (6) holds, where the first equality follows from (19), the second equality follows from the fact that
$$2\langle a - b, a - c \rangle = \|a - c\|^2 + \|a - b\|^2 - \|b - c\|^2, \quad \forall a, b, c \in \mathbb{R}^n,$$
and the last inequality follows from $2a\mu \ge 1$. This proves conclusion (6). $\square$

Before proceeding further, we need the following conclusions.

Lemma 3. Let $\{t_k\}$ be the sequence generated by the APDCA. Then, $t_k \ge (k+1)/2$ for all $k \ge 1$.
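The bound in Lemma 3 follows from a short induction, which we spell out here for completeness (the argument is the standard one for FISTA-type stepsizes). Since $t_{k+1}^2 = t_k^2 + t_{k+1}$ gives $t_{k+1} = \big(1 + \sqrt{1 + 4t_k^2}\big)/2$, we have
$$t_{k+1} = \frac{1 + \sqrt{1 + 4t_k^2}}{2} \ \ge\ \frac{1 + 2t_k}{2} \ =\ t_k + \frac{1}{2},$$
so, starting from $t_1 = 1$, we obtain $t_k \ge 1 + (k-1)/2 = (k+1)/2$ for all $k \ge 1$.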

Lemma 4. Let $\{x^k\}$ and $\{y^k\}$ be the sequences generated by the APDCA. Then,
$$\mu t_{k+1}^2 v_{k+1} + \left\|u^{k+1}\right\|^2 \le \mu t_k^2 v_k + \left\|u^k\right\|^2,$$
where $v_k := F(x^k) - F(x^*)$, $u^k := t_k x^k - (t_k - 1)x^{k-1} - x^*$, and $x^*$ is a critical point of problem (3).
Proof. From (7) and (6), we have
$$y^k = x^k + \frac{t_k - 1}{t_{k+1}}\left(x^k - x^{k-1}\right).$$
Hence, to show the assertion, we only need to establish the recursion linking steps $k$ and $k+1$. In fact, by taking $x = x^k$, one has (25) from Lemma 2. Using Lemma 2 again, one has (27) from $x = x^*$. Multiplying (25) by $t_k^2$ and (27) by $t_{k+1}$, respectively, and summing them yield the assertion, where the first equality follows from the fact that $t_{k+1}^2 = t_k^2 + t_{k+1}$ and the last equality follows by some manipulation. The desired result follows. $\square$

Now, we are ready to show the convergence rate of the APDCA.

Theorem 1. For the sequence $\{x^k\}$ generated by the APDCA, it holds that $F(x^k) - F(x^*) = O(1/k^2)$, where $x^*$ is a stationary point of (3).
Proof. Using the notations of Lemma 4, let $k = 0$ in (27); together with $t_1 = 1$ and $u^1 = x^1 - x^*$, this bounds $\mu t_1^2 v_1 + \|u^1\|^2$ by $\|x^0 - x^*\|^2$. Then, from Lemma 4, we know that the sequence $\{\mu t_k^2 v_k + \|u^k\|^2\}$ is nonincreasing. Therefore, $\mu t_k^2 v_k \le \mu t_1^2 v_1 + \|u^1\|^2 \le \|x^0 - x^*\|^2$ for all $k \ge 1$. Then, it follows from Lemma 3 that
$$F\left(x^k\right) - F\left(x^*\right) = v_k \le \frac{\left\|x^0 - x^*\right\|^2}{\mu t_k^2} \le \frac{4\left\|x^0 - x^*\right\|^2}{\mu (k+1)^2}.$$
The desired result follows. $\square$
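As a quick numerical companion to the proof (our addition), the following snippet checks the stepsize recurrence and the growth bound $t_k \ge (k+1)/2$ from Lemma 3, which is what turns the monotonicity of $\mu t_k^2 v_k + \|u^k\|^2$ into the $O(1/k^2)$ rate:

```python
t = 1.0  # t_1 = 1
for k in range(1, 50):
    assert t >= (k + 1) / 2.0  # growth bound of Lemma 3
    # positive root of t_{k+1}^2 - t_{k+1} - t_k^2 = 0
    t = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
print("recurrence satisfies t_k >= (k+1)/2 for k = 1, ..., 49")
```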

Numerical Experiments
In this section, we evaluate the performance of the APDCA by applying it to DC regularized least squares problems. We compare the performance of the APDCA with the algorithm in [15] (PDCA) and GIST in [32]. For the APDCA and PDCA, we set $1/\mu = L_g = \lambda_{\max}(A^T A)$ and $c = L_g/2$. For GIST, we set $\sigma = 10^{-5}$, $m = 5$, $\eta = 2$, and $1/\alpha_{\min} = \alpha_{\max} = 10^{30}$. We initialize the three algorithms at the origin and terminate them when the prescribed stopping criterion with tolerance $\varepsilon$ is met. Furthermore, we terminate PDCA when the number of iterations exceeds 5000 (denoted by "max" in the report).

Example 1. Least squares problems with the $\ell_{1\text{-}2}$ regularizer are as follows:
$$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|Ax - b\|^2 + \lambda\left( \|x\|_1 - \|x\| \right),$$
where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $c > 0$, and $\lambda > 0$ is the regularization parameter. This problem takes the form of (3) with $f(x) = c\|x\|^2 + \lambda\|x\|_1$, $g(x) = (1/2)\|Ax - b\|^2$, and $h(x) = c\|x\|^2 + \lambda\|x\|$. Note that the purpose of adding $c\|x\|^2$ is to ensure the strong convexity of $f(x)$.
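For Example 1, the building blocks needed by the algorithms have closed forms. The sketch below is our reconstruction under the stated decomposition (the helper names are ours): the proximal mapping of $f(x) = c\|x\|^2 + \lambda\|x\|_1$ reduces to soft-thresholding followed by a shrinkage from the quadratic term, and $\nabla g$, $\nabla h$ are immediate.

```python
import numpy as np

def prox_f(z, mu, c, lam):
    """prox of mu*f with f(x) = c*||x||^2 + lam*||x||_1:
    soft-thresholding, then shrinkage from the c*||x||^2 term."""
    return np.sign(z) * np.maximum(np.abs(z) - lam * mu, 0.0) / (1.0 + 2.0 * c * mu)

def grad_g(x, A, b):
    """gradient of g(x) = 0.5*||Ax - b||^2"""
    return A.T @ (A @ x - b)

def grad_h(x, c, lam):
    """gradient of h(x) = c*||x||^2 + lam*||x||_2 (subgradient 0 taken at x = 0)"""
    nrm = np.linalg.norm(x)
    return 2.0 * c * x + (lam * x / nrm if nrm > 0.0 else np.zeros_like(x))
```

With these pieces and $\mu = 1/\lambda_{\max}(A^T A)$, the APDCA sketch given after Algorithm 2 can be run directly on random instances.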
To compare the performance of the three algorithms, we report the number of iterations (denoted by Iter), the CPU time in seconds (denoted by CPU time), the sparsity of the solution (denoted by sparsity), and the function value at termination (denoted by fval), averaged over 30 random instances. The numerical results are reported in Tables 1 and 2, from which we can see that the APDCA always outperforms PDCA and GIST. Specifically, from Table 1, we can see that the APDCA is about 2.5 times faster than GIST and about 5.2 times faster than PDCA for the parameter $\lambda = 5 \times 10^{-4}$. From Table 2, we can see that the APDCA is about 2.1 times faster than GIST and about 8.4 times faster than PDCA for the parameter $\lambda = 1 \times 10^{-3}$. Tables 1 and 2 also show that the APDCA requires fewer iterations than the other two algorithms. Specifically, from Table 1, the iteration count of the APDCA is about 53% of that of GIST for the parameter $\lambda = 5 \times 10^{-4}$. From Table 2, the iteration count of the APDCA is about 64% of that of GIST for the parameter $\lambda = 1 \times 10^{-3}$. Meanwhile, Tables 1 and 2 also show that the solution given by the APDCA is sparser than those given by GIST and PDCA.
Example 2. Least squares problems with a logarithmic regularizer are considered; the problem again takes the form of (3). Note that the purpose of adding $c\|x\|^2$ is to ensure the strong convexity of $f(x)$. For this example, we set $\epsilon = 0.5$.
As in Example 1, we report the number of iterations, the CPU time in seconds, the sparsity of the solution, and the function value at termination, averaged over 30 random instances.
The numerical results are reported in Tables 3 and 4, from which we can see that the APDCA always outperforms PDCA and GIST. Specifically, from Table 3, we can see that the APDCA is about 1.9 times faster than GIST and about 8.3 times faster than PDCA for the parameter $\lambda = 5 \times 10^{-4}$. From Table 4, we can see that the APDCA is about 1.6 times faster than GIST and about 11.3 times faster than PDCA for the parameter $\lambda = 1 \times 10^{-3}$. Tables 3 and 4 also show that the APDCA requires fewer iterations than the other two algorithms. Specifically, from Table 3, the iteration count of the APDCA is about 72% of that of GIST for the parameter $\lambda = 5 \times 10^{-4}$. From Table 4, the iteration count of the APDCA is about 83% of that of GIST and about 8.6% of that of PDCA for the parameter $\lambda = 1 \times 10^{-3}$. Meanwhile, Tables 3 and 4 also show that the solution given by the APDCA is sparser than those given by GIST and PDCA.

Conclusions
In this paper, we propose an accelerated proximal point algorithm for the difference of convex optimization problem by combining the extrapolation technique with the proximal difference of convex algorithm. By making full use of the special structure of the DC decomposition and the information of the stepsize, we prove that the proposed algorithm converges at a rate of $O(1/k^2)$ under milder conditions. The given numerical experiments show the superiority of the proposed algorithm to some existing algorithms.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.