Mathematical Problems in Engineering, vol. 2021, Article ID 9994015, Hindawi. DOI: 10.1155/2021/9994015

Research Article

An Accelerated Proximal Algorithm for the Difference of Convex Programming

Feichao Shen (1), Ying Zhang (2), Xueyong Wang (3), and Zhenbo Wang (1)

(1) School of Mathematics & Computing Science, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
(2) School of Basic Teaching, Shandong Water Conservancy Vocational College, Rizhao, Shandong 276800, China
(3) School of Management, Qufu Normal University, Rizhao, Shandong 276800, China

Copyright © 2021 Feichao Shen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In this paper, we propose an accelerated proximal point algorithm for the difference of convex (DC) optimization problem by combining the extrapolation technique with the proximal difference of convex algorithm. By making full use of the special structure of the DC decomposition and of the stepsize information, we prove that the proposed algorithm converges at a rate of O(1/k²) under mild conditions. Numerical experiments demonstrate the superiority of the proposed algorithm over some existing algorithms.

Funding: National Natural Science Foundation of China (Grant nos. 11801309, 11901343, and 12071249).
1. Introduction

The difference of convex problem (DCP) is an important class of nonlinear programming problems in which the objective function is expressed as the difference of convex (DC) functions. It finds numerous applications in digital communication systems, assignment and power allocation, compressed sensing, and so on.

It is well known that the classical method for solving the DCP is the so-called difference of convex algorithm (DCA), in which the concave part of the objective function is replaced by a linear majorant, so that a convex optimization subproblem is solved at each iteration. Note that the difficulty of this subproblem depends heavily on the chosen DC decomposition of the objective function; the subproblem is easy when the objective can be written as the sum of a smooth convex function with Lipschitz gradient, a proper closed convex function, and a continuous concave function. Motivated by this, Gotoh et al. proposed the proximal difference of convex algorithm (PDCA) for solving the DCP, in which, at each iteration, the concave part is replaced by a linear majorant and the smooth convex part is replaced by a quadratic majorant. Furthermore, if the proximal mapping of the proper closed convex function can be computed cheaply, then the subproblem involved in the PDCA can be solved efficiently. However, when the concave part of the objective is void, the PDCA reduces to the proximal gradient algorithm, which may be slow in practice. In fact, since the convergence rate of the PDCA depends heavily on the Lojasiewicz exponent of the objective function, the PDCA converges only linearly in general [18, 19]. To accelerate the PDCA, researchers have turned to the well-known extrapolation technique to design more efficient algorithms. This technique has been used extensively to accelerate proximal-type algorithms for convex programming [25, 26], where it improves the convergence rate from O(1/k) to O(1/k²). Motivated by this, Wen et al. proposed the proximal difference of convex algorithm with extrapolation (PDCAE) for solving the DCP. Numerical experiments show that the PDCAE performs well in practice, although it still converges only linearly in theory.
A question now arises naturally: can we design a new variant of the PDCA whose convergence rate can be improved in theory? This constitutes the motivation of this paper.

In this paper, inspired by the work in [20-23, 27], we establish an accelerated proximal DC programming algorithm (APDCA) for the DCP by combining the extrapolation technique with the PDCA. In the algorithm, the current iterate is replaced by a linear combination of the previous two iterates, and the extrapolation weight is built into the stepsize. By making full use of the special structure of the DC decomposition and of the stepsize information, we prove that the APDCA converges at a rate of O(1/k²) under mild conditions. Numerical experiments demonstrate its superiority over some existing algorithms.

The remainder of the paper is organized as follows. In Section 2, we describe the DC optimization problem considered in this paper and present the newly designed algorithm. In Section 3, we establish the global convergence and the O(1/k²) convergence rate of the new algorithm. Numerical experiments are provided in Section 4, and conclusions are drawn in Section 5.

To end this section, we recall some definitions used in the subsequent analysis.

For an extended real-valued function f: ℝ^n → (−∞, +∞], we denote its domain by dom f = {x ∈ ℝ^n : f(x) < +∞}. The function f is said to be strongly convex (with modulus a) if there exists an a > 0 such that ∇²f(x) ⪰ aI for all x ∈ S, where S is a convex set and I is the identity matrix. The function f is said to be proper if it never equals −∞ and dom f ≠ ∅. Moreover, a proper function is closed if it is lower semicontinuous. A proper closed function f is said to be level-bounded if the lower level sets of f are bounded; that is, {x ∈ ℝ^n : f(x) ≤ r} is bounded for any r ∈ ℝ. Given a proper closed function f: ℝ^n → ℝ ∪ {+∞}, the limiting subdifferential of f at x ∈ dom f is given by

(1) ∂f(x) = {v ∈ ℝ^n : ∃ x^k →_f x and v^k → v with lim inf_{y→x^k} [f(y) − f(x^k) − ⟨v^k, y − x^k⟩]/‖y − x^k‖ ≥ 0 for each k},

where x^k →_f x means x^k → x and f(x^k) → f(x). Note that dom ∂f = {x ∈ ℝ^n : ∂f(x) ≠ ∅}. It is well known that the limiting subdifferential reduces to the classical subdifferential of convex analysis when f is a convex function; that is,

(2) ∂f(x) = {v ∈ ℝ^n : f(u) − f(x) − ⟨v, u − x⟩ ≥ 0, ∀u ∈ ℝ^n}.

Furthermore, if f is continuously differentiable, then the limiting subdifferential reduces to the gradient of f, denoted by ∇f.
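The convex-case formula (2) can be checked numerically. The following sketch (ours, not from the paper) tests whether a given vector v satisfies the subgradient inequality for the convex function f(x) = ‖x‖₁ at x = 0 along random directions; any v with max_i |v_i| ≤ 1 qualifies.

```python
import numpy as np

def is_subgradient_at_zero(v, trials=1000, seed=0):
    # Test (2) for f(x) = ||x||_1 at x = 0:
    # f(u) - f(0) - <v, u> >= 0 for all u, checked on random u.
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        u = rng.standard_normal(v.shape)
        if np.abs(u).sum() - v @ u < -1e-12:
            return False
    return True
```

For example, v = (0.5, −1, 0) passes the test, whereas v = (2, 0, 0) violates the inequality at directions concentrated on the first coordinate.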

2. Algorithms for DC Programming

Consider the following difference of convex program:

(3) min_{x∈ℝ^n} F(x) := f(x) + g(x) − h(x),

where f: ℝ^n → ℝ is a strongly convex function with modulus a > 0, g: ℝ^n → ℝ is a smooth convex function whose gradient ∇g is Lipschitz continuous with constant L_g > 0, and h: ℝ^n → ℝ is a continuous convex function whose gradient ∇h is Lipschitz continuous with constant L_h > 0.

For the DCP, the classical DCA takes the following iterative scheme:

(4) x^{k+1} ∈ arg min_{x∈ℝ^n} { f(x) + g(x) − h(x^k) − ⟨∇h(x^k), x − x^k⟩ }.

By replacing the concave part in the objective function by a linear majorant and replacing the smooth convex part by a quadratic majorant, Gotoh et al.  proposed a proximal DCA for the DCP. For the sake of completeness, we list Algorithm 1 as follows.

Algorithm 1: PDCA.

Initial step. Take ε > 0, μ = 1/L_g, and x^0 ∈ dom f.

Iterative step. Compute the new iterate by the following scheme:

x^{k+1} = arg min_{x∈ℝ^n} { f(x) + g(x^k) − h(x^k) + ⟨∇g(x^k) − ∇h(x^k), x − x^k⟩ + (1/(2μ))‖x − x^k‖² }

until ‖x^{k+1} − x^k‖ ≤ ε is satisfied,

where L_g > 0 is the Lipschitz constant of ∇g.
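To make the scheme concrete: the PDCA subproblem has a closed-form soft-thresholding solution when f(x) = c‖x‖² + λ‖x‖₁, the choice used in Section 4. The following Python sketch is ours (the function names, the coordinate-wise prox derivation, and the toy data are not from the paper) and specializes Algorithm 1 to the ℓ_{1−2} problem of Example 1 below.

```python
import numpy as np

def prox_f(xbar, w, mu, c, lam):
    # Solves argmin_x c||x||^2 + lam||x||_1 + <w, x> + (1/(2 mu))||x - xbar||^2
    # coordinate-wise; this reduces to a soft-thresholding step (derivation ours).
    q = xbar / mu - w
    return np.sign(q) * np.maximum(np.abs(q) - lam, 0.0) / (2.0 * c + 1.0 / mu)

def pdca(A, b, c, lam, tol=1e-5, max_iter=5000):
    # Algorithm 1 (PDCA) specialized to f = c||x||^2 + lam||x||_1,
    # g = (1/2)||Ax - b||^2, h = c||x||^2 + lam||x||_2 (Example 1's decomposition).
    mu = 1.0 / np.linalg.norm(A.T @ A, 2)  # mu = 1 / Lg
    x = np.zeros(A.shape[1])
    for _ in range(max_iter):
        grad_g = A.T @ (A @ x - b)
        nx = np.linalg.norm(x)
        # (sub)gradient of h; at x = 0 we take 0 in the lam*||x||_2 part
        grad_h = 2.0 * c * x + (lam * x / nx if nx > 0 else 0.0)
        x_new = prox_f(x, grad_g - grad_h, mu, c, lam)
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

# Toy demo (arbitrary data): recover a sparse signal.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[:3] = [1.0, -2.0, 3.0]
b = A @ x_true
Lg = np.linalg.norm(A.T @ A, 2)
x_hat = pdca(A, b, c=Lg / 2.0, lam=0.1, max_iter=300)
```

Because each subproblem objective majorizes F at the current iterate, this specialization is a descent method: the objective value at x_hat never exceeds the value at the zero starting point.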

Although only a simple subproblem is involved at each iteration, the PDCA is potentially slow [19, 27]. To accelerate its convergence, we incorporate the extrapolation technique into the PDCA and obtain the following algorithm (Algorithm 2).

Algorithm 2: APDCA.

Initial step. Take 0 < μ ≤ 1/max{L_g, L_h}, β_k ∈ [0, 1) with 0 ≤ sup_k β_k < 1, ε > 0, t_0 = t_1 = 1, and x^{−1} = x^0 ∈ dom f.

Iterative step. Compute the new iterate by the following scheme:

t_{k+1} = (1 + √(1 + 4t_k²))/2 and β_k = (t_k − 1)/t_{k+1},

y^k = x^k + β_k (x^k − x^{k−1}),

x^{k+1} = arg min_{x∈ℝ^n} { f(x) + g(y^k) − h(y^k) + ⟨∇g(y^k) − ∇h(y^k), x − y^k⟩ + (1/(2μ))‖x − y^k‖² }

until ‖x^{k+1} − x^k‖ ≤ ε is satisfied.
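Under the same specialization as for Algorithm 1 (f(x) = c‖x‖² + λ‖x‖₁ and Example 1's decomposition; the code and the closed-form prox are our sketch, not the authors' implementation), Algorithm 2 reads:

```python
import numpy as np

def apdca(A, b, c, lam, tol=1e-5, max_iter=5000):
    # Algorithm 2 (APDCA) specialized to f = c||x||^2 + lam||x||_1,
    # g = (1/2)||Ax - b||^2, h = c||x||^2 + lam||x||_2 (Example 1's decomposition).
    mu = 1.0 / np.linalg.norm(A.T @ A, 2)  # mu = 1/Lg; assumes Lg >= Lh here
    x_prev = x = np.zeros(A.shape[1])
    t = 1.0  # t_0 = t_1 = 1
    for _ in range(max_iter):
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_next          # extrapolation weight beta_k
        y = x + beta * (x - x_prev)        # y^k = x^k + beta_k (x^k - x^{k-1})
        grad_g = A.T @ (A @ y - b)
        ny = np.linalg.norm(y)
        grad_h = 2.0 * c * y + (lam * y / ny if ny > 0 else 0.0)
        # closed-form proximal subproblem: soft-thresholding (derivation ours)
        q = y / mu - (grad_g - grad_h)
        x_new = np.sign(q) * np.maximum(np.abs(q) - lam, 0.0) / (2.0 * c + 1.0 / mu)
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x_prev, x, t = x, x_new, t_next
    return x

# Toy demo (arbitrary data), mirroring the paper's choice c = Lg/2.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[:3] = [1.0, -2.0, 3.0]
b = A @ x_true
Lg = np.linalg.norm(A.T @ A, 2)
x_hat = apdca(A, b, c=Lg / 2.0, lam=0.1, max_iter=400)
```

The only differences from the PDCA sketch are the extrapolated point y^k and the FISTA-type update of t_k, exactly as in the algorithm box above.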

3. Convergence Analysis of the APDCA

In this section, we establish the global convergence of the algorithm and its convergence rate. To continue, we first recall the following conclusions.

Lemma 1.

Let f be a continuously differentiable function whose gradient is Lipschitz continuous with constant L_f > 0. Then, for any L ≥ L_f, it holds that

(5) f(x) ≤ f(y) + ⟨∇f(y), x − y⟩ + (L/2)‖x − y‖², ∀x, y ∈ ℝ^n.

Lemma 2.

Let μ ≥ 1/(2a). For the sequence {(x^k, y^k)} generated by the APDCA, it holds that

(6) μ(F(x) − F(x^{k+1})) ≥ ‖x − x^{k+1}‖² − ‖x − y^k‖², ∀x ∈ ℝ^n, k ≥ 1.

Proof.

Since f is strongly convex with modulus a > 0, we have

(7) f(x) ≥ f(x^{k+1}) + ⟨ξ^{k+1}, x − x^{k+1}⟩ + a‖x − x^{k+1}‖²,

where ξ^{k+1} ∈ ∂f(x^{k+1}).

Combining the fact that ∇h is Lipschitz continuous with constant L_h > 0 with Lemma 1, we have

(8) h(x) ≤ h(y^k) + ⟨∇h(y^k), x − y^k⟩ + (1/(2μ))‖x − y^k‖²,

where 0 < μ ≤ 1/L_h, which means that

(9) −h(x) ≥ −h(y^k) − ⟨∇h(y^k), x − y^k⟩ − (1/(2μ))‖x − y^k‖².

Since g is a convex function,

(10) g(x) ≥ g(y^k) + ⟨∇g(y^k), x − y^k⟩.

Combining (7) and (9) with (10), we have

(11) f(x) + g(x) − h(x) ≥ f(x^{k+1}) + g(y^k) − h(y^k) + ⟨ξ^{k+1}, x − x^{k+1}⟩ + ⟨∇g(y^k) − ∇h(y^k), x − y^k⟩ + a‖x − x^{k+1}‖² − (1/(2μ))‖x − y^k‖².

On the other hand, since h is convex, it follows that

(12) h(x) ≥ h(y^k) + ⟨∇h(y^k), x − y^k⟩,

which means that

(13) −h(x) ≤ −h(y^k) − ⟨∇h(y^k), x − y^k⟩.

Combining the fact that ∇g is Lipschitz continuous with constant L_g > 0 with Lemma 1, we have

(14) g(x) ≤ g(y^k) + ⟨∇g(y^k), x − y^k⟩ + (1/(2μ))‖x − y^k‖²,

where 0 < μ ≤ 1/L_g. Summing (13) and (14), we have

(15) g(x) − h(x) ≤ g(y^k) − h(y^k) + ⟨∇g(y^k) − ∇h(y^k), x − y^k⟩ + (1/(2μ))‖x − y^k‖².

Adding f(x) to both sides of (15) yields

(16) f(x) + g(x) − h(x) ≤ f(x) + g(y^k) − h(y^k) + ⟨∇g(y^k) − ∇h(y^k), x − y^k⟩ + (1/(2μ))‖x − y^k‖².

Taking x = x^{k+1} in (16) yields

(17) f(x^{k+1}) + g(x^{k+1}) − h(x^{k+1}) ≤ f(x^{k+1}) + g(y^k) − h(y^k) + ⟨∇g(y^k) − ∇h(y^k), x^{k+1} − y^k⟩ + (1/(2μ))‖x^{k+1} − y^k‖².

By the optimality condition of the subproblem in Algorithm 2, one has

(18) ξ^{k+1} + ∇g(y^k) − ∇h(y^k) + (1/μ)(x^{k+1} − y^k) = 0,

that is,

(19) (1/μ)(y^k − x^{k+1}) = ξ^{k+1} + ∇g(y^k) − ∇h(y^k).

Then, for 0 < μ ≤ 1/max{L_g, L_h}, it follows from (11) and (17) that

(20) F(x) − F(x^{k+1}) ≥ ⟨ξ^{k+1} + ∇g(y^k) − ∇h(y^k), x − x^{k+1}⟩ + a‖x − x^{k+1}‖² − (1/(2μ))‖x − y^k‖² − (1/(2μ))‖y^k − x^{k+1}‖²
= (1/μ)⟨y^k − x^{k+1}, x − x^{k+1}⟩ + a‖x − x^{k+1}‖² − (1/(2μ))‖x − y^k‖² − (1/(2μ))‖y^k − x^{k+1}‖²
= (1/(2μ))(‖y^k − x^{k+1}‖² + ‖x − x^{k+1}‖² − ‖x − y^k‖²) + a‖x − x^{k+1}‖² − (1/(2μ))‖x − y^k‖² − (1/(2μ))‖y^k − x^{k+1}‖²
= (1/(2μ))((1 + 2aμ)‖x − x^{k+1}‖² − 2‖x − y^k‖²)
≥ (1/μ)(‖x − x^{k+1}‖² − ‖x − y^k‖²),

where the first equality follows from (19), the second equality follows from the identity 2⟨u − v, u − w⟩ = ‖u − v‖² + ‖u − w‖² − ‖v − w‖² for all u, v, w ∈ ℝ^n, and the last inequality follows from 2aμ ≥ 1. This gives conclusion (6).
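The three-point identity invoked in (20) can be sanity-checked numerically (a sketch of ours; the random vectors are arbitrary):

```python
import numpy as np

# Three-point identity used in (20):
# 2<u - v, u - w> = ||u - v||^2 + ||u - w||^2 - ||v - w||^2.
rng = np.random.default_rng(3)
u, v, w = rng.standard_normal((3, 6))
lhs = 2.0 * (u - v) @ (u - w)
rhs = np.sum((u - v) ** 2) + np.sum((u - w) ** 2) - np.sum((v - w) ** 2)
print(abs(lhs - rhs))  # ~0 up to floating-point error
```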

Before proceeding further, we need the following conclusions.

Lemma 3.

(see [25, 31]). Let t_0 = t_1 = 1. Then, the sequence {t_k} generated by the updating rule t_{k+1} = (1 + √(1 + 4t_k²))/2 in Algorithm 2 is increasing, and t_k ≥ (k + 1)/2.
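Lemma 3's bound, together with the identity t_{k+1}² − t_{k+1} = t_k² used in the proof of Lemma 4 below, can be verified numerically (our sketch):

```python
import math

def t_sequence(n):
    # t_1 = 1 and t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2, as in Algorithm 2.
    ts = [1.0]  # ts[k-1] holds t_k
    for _ in range(n - 1):
        ts.append((1.0 + math.sqrt(1.0 + 4.0 * ts[-1] ** 2)) / 2.0)
    return ts
```

Running t_sequence(200) and checking each consecutive pair confirms monotonicity, the lower bound t_k ≥ (k + 1)/2, and the quadratic identity.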

Lemma 4.

Let {(x^k, y^k)} be a sequence generated by the APDCA. Then,

(21) μ(t_k² v_k − t_{k+1}² v_{k+1}) ≥ ‖u^{k+1}‖² − ‖u^k‖²,

where u^k = t_k x^k − (t_k − 1)x^{k−1} − x*, v_k = F(x^k) − F(x*), and x* is a critical point of problem (3).

Proof.

From the definitions of y^k and β_k in Algorithm 2, we have y^k = x^k + ((t_k − 1)/t_{k+1})(x^k − x^{k−1}). Then, it follows that

(22) ‖u^{k+1}‖² − ‖u^k‖² = ‖t_{k+1}x^{k+1} − (t_{k+1} − 1)x^k − x*‖² − ‖t_k x^k − (t_k − 1)x^{k−1} − x*‖² = ‖t_{k+1}x^{k+1} − (t_{k+1} − 1)x^k − x*‖² − ‖t_{k+1}y^k − (t_{k+1} − 1)x^k − x*‖².

Hence, to show the assertion, we only need to show that

(23) μ(t_k² v_k − t_{k+1}² v_{k+1}) ≥ ‖t_{k+1}x^{k+1} − (t_{k+1} − 1)x^k − x*‖² − ‖t_{k+1}y^k − (t_{k+1} − 1)x^k − x*‖².

In fact, taking x = x^k in Lemma 2, one has

(24) μ(F(x^k) − F(x^{k+1})) ≥ ‖x^{k+1} − x^k‖² − ‖x^k − y^k‖².

Hence,

(25) μ(v_k − v_{k+1}) ≥ ‖x^{k+1} − x^k‖² − ‖x^k − y^k‖².

Using Lemma 2 again with x = x*, one has

(26) μ(F(x*) − F(x^{k+1})) ≥ ‖x^{k+1} − x*‖² − ‖y^k − x*‖²,

that is,

(27) −μ v_{k+1} ≥ ‖x^{k+1} − x*‖² − ‖y^k − x*‖².

Multiplying (25) by t_k² and (27) by t_{k+1}, respectively, and summing them yield

(28) μ(t_k² v_k − t_{k+1}² v_{k+1}) = μ(t_k²(v_k − v_{k+1}) − t_{k+1} v_{k+1})
≥ t_k²(‖x^{k+1} − x^k‖² − ‖x^k − y^k‖²) + t_{k+1}(‖x^{k+1} − x*‖² − ‖y^k − x*‖²)
= (t_{k+1}² − t_{k+1})(‖x^{k+1} − x^k‖² − ‖x^k − y^k‖²) + t_{k+1}(‖x^{k+1} − x*‖² − ‖y^k − x*‖²)
= ‖t_{k+1}x^{k+1} − (t_{k+1} − 1)x^k − x*‖² − ‖t_{k+1}y^k − (t_{k+1} − 1)x^k − x*‖²,

where the first equality follows from t_{k+1}² = t_k² + t_{k+1}, the second equality uses t_k² = t_{k+1}² − t_{k+1}, and the last equality follows by direct manipulation. The desired result follows.

Now, we are ready to show the convergence rate of the APDCA.

Theorem 1.

For the sequence {x^k} generated by the APDCA, it holds that

(29) F(x^k) − F(x*) ≤ 4‖x^0 − x*‖²/(μ(k + 1)²),

where x* is a stationary point of (3).

Proof.

With the notation of Lemma 4, taking k = 0 in (27) gives

(30) μv_1 ≤ ‖y^0 − x*‖² − ‖x^1 − x*‖².

Hence,

(31) μv_1 + ‖x^1 − x*‖² ≤ ‖y^0 − x*‖².

From Lemma 4, the sequence {μt_k² v_k + ‖u^k‖²} is nonincreasing. Therefore,

(32) μt_k² v_k ≤ μt_k² v_k + ‖u^k‖² ≤ μt_1² v_1 + ‖u^1‖² = μv_1 + ‖x^1 − x*‖² ≤ ‖y^0 − x*‖² = ‖x^0 − x*‖²,

where the equality follows from t_0 = t_1 = 1 (so that u^1 = x^1 − x*), the last inequality follows from (31), and the last equality follows from y^0 = x^0 (since x^{−1} = x^0).

Then, it follows from Lemma 3 that t_k ≥ (k + 1)/2, and hence

(33) F(x^k) − F(x*) ≤ ‖x^0 − x*‖²/(μt_k²) ≤ 4‖x^0 − x*‖²/(μ(k + 1)²).

The desired result follows.

4. Numerical Experiments

In this section, we evaluate the performance of the APDCA by applying it to DC regularized least squares problems. We compare the APDCA with the PDCA and with the general iterative shrinkage and thresholding algorithm (GIST).

For the APDCA and the PDCA, we set 1/μ = L_g = λ_max(AᵀA) and c = L_g/2. For GIST, we set σ = 10⁻⁵, m = 5, η = 2, and 1/α_min = α_max = 10³⁰. We initialize the three algorithms at the origin and terminate them when

(34) ‖x^k − x^{k−1}‖/max{1, ‖x^k‖} < 10⁻⁵.

Furthermore, we terminate the PDCA when the number of iterations exceeds 5000 (denoted by "Max" in the tables).

Example 1.

Least squares problems with the ℓ_{1−2} regularizer take the form

(35) min_{x∈ℝ^n} (1/2)‖Ax − b‖² + c‖x‖² + λ‖x‖₁ − c‖x‖² − λ‖x‖,

where A ∈ ℝ^{m×n}, b ∈ ℝ^m, c > 0, and λ > 0 is the regularization parameter.

This problem takes the form of (3) with f(x) = c‖x‖² + λ‖x‖₁, g(x) = (1/2)‖Ax − b‖², and h(x) = c‖x‖² + λ‖x‖. Note that the purpose of adding c‖x‖² is to ensure the strong convexity of f(x).
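The decomposition above can be sanity-checked numerically: f + g − h reproduces the ℓ_{1−2} objective in (35). The data and constants in this sketch (ours) are arbitrary, chosen only for the check:

```python
import numpy as np

rng = np.random.default_rng(1)
A, b = rng.standard_normal((5, 8)), rng.standard_normal(5)
c, lam = 2.0, 0.3
x = rng.standard_normal(8)

f = c * (x @ x) + lam * np.abs(x).sum()        # strongly convex part
g = 0.5 * np.sum((A @ x - b) ** 2)             # smooth convex part
h = c * (x @ x) + lam * np.linalg.norm(x)      # convex part being subtracted
obj = 0.5 * np.sum((A @ x - b) ** 2) + lam * (np.abs(x).sum() - np.linalg.norm(x))
print(abs((f + g - h) - obj))  # ~0
```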

To compare the performance of the three algorithms, we report the number of iterations (denoted by Iter), the CPU time in seconds (denoted by CPU time), the sparsity of the solution (denoted by sparsity), and the objective value at termination (denoted by fval), averaged over 30 random instances. The numerical results are reported in Tables 1 and 2, from which we can see that the APDCA always outperforms the PDCA and GIST. Specifically, from Table 1, the APDCA is about 2.5 times faster than GIST and about 5.2 times faster than the PDCA for λ = 5 × 10⁻⁴. From Table 2, the APDCA is about 2.1 times faster than GIST and about 8.4 times faster than the PDCA for λ = 1 × 10⁻³. Tables 1 and 2 also show that the APDCA requires fewer iterations than the other two algorithms: from Table 1, the iteration count of the APDCA is about 53% of that of GIST for λ = 5 × 10⁻⁴, and from Table 2 it is about 64% for λ = 1 × 10⁻³. Meanwhile, Tables 1 and 2 show that the solution given by the APDCA is sparser than those given by GIST and the PDCA.

Table 1: Solving (35) on random instances, λ = 5 × 10⁻⁴.

              Iter                 CPU time (s)
  m      n    GIST  APDCA  PDCA   GIST   APDCA   PDCA
  720   2560  1750    909  Max      3.57   1.38    7.37
 1440   5120  1629    802  Max     13.7    5.0    31.8
 2160   7680  1724    802  Max     28.5   10.0    62.2
 2880  10240  1742   1002  Max     52.8   22.3   112.2
 3600  12800  1799   1002  Max     83.8   34.3   174.7
 4320  15360  1739   1002  Max    113.7   48.9   246.5
 5040  17920  1778   1002  Max    160.7   66.9   334.5
 5760  20480  1826   1002  Max    178.3   71.5   366.1
 6480  23040  1778    975  Max    244.3  100.5   524.1
 7200  25600  1752    975  Max    317.4  130.9   692.6

              Sparsity               Fval
  m      n    GIST  APDCA   PDCA   GIST       APDCA      PDCA
  720   2560   783    761   1132   2.9755e-2  2.9743e-2  4.5442e-2
 1440   5120  1575   1614   2240   6.1144e-2  6.1122e-2  9.4466e-2
 2160   7680  2367   2424   3425   9.4648e-2  9.4612e-2  1.4594e-1
 2880  10240  3117   2910   4496   1.2312e-1  1.2308e-1  1.8319e-1
 3600  12800  3889   3644   5707   1.5896e-1  1.5890e-1  2.4309e-1
 4320  15360  4766   4376   6720   1.8879e-1  1.8869e-1  2.8401e-1
 5040  17920  5497   5141   7911   2.2523e-1  2.2512e-1  3.4175e-1
 5760  20480  6327   5931   9181   2.6870e-1  2.6859e-1  4.1224e-1
 6480  23040  7065   6716  10184   2.9070e-1  2.9098e-1  4.3889e-1
 7200  25600  7865   7449  11286   3.2206e-1  3.2191e-1  4.8588e-1

Table 2: Solving (35) on random instances, λ = 1 × 10⁻³.

              Iter                CPU time (s)
  m      n    GIST  APDCA  PDCA   GIST   APDCA   PDCA
  720   2560   972    591  Max      1.5    0.7     5.4
 1440   5120   968    602  Max      6.1    2.8    23.2
 2160   7680   993    602  Max     13.6    6.1    50.2
 2880  10240   835    602  Max     19.8   10.6    88.6
 3600  12800   973    602  Max     36.1   16.7   139.8
 4320  15360   931    602  Max     49.2   23.5   202.5
 5040  17920   941    602  Max     67.5   32.6   296.4
 5760  20480   979    602  Max    100.7   43.5   354.9
 6480  23040   992    602  Max    116.3   54.9   449.8
 7200  25600   939    602  Max    138.0   67.4   558.5

              Sparsity              Fval
  m      n    GIST  APDCA  PDCA   GIST       APDCA      PDCA
  720   2560   728    703   927   6.2438e-2  6.2430e-2  7.6433e-2
 1440   5120  1449   1381  1838   1.3160e-1  1.3159e-1  1.6346e-1
 2160   7680  2168   2086  2810   2.0060e-1  2.0058e-1  2.5146e-1
 2880  10240  2853   2745  3618   2.3976e-1  2.3973e-1  2.7654e-1
 3600  12800  3675   3557  4607   3.0264e-1  3.0260e-1  3.5620e-1
 4320  15360  4368   4195  5523   3.9802e-1  3.9798e-1  4.7740e-1
 5040  17920  5132   4925  6501   4.7413e-1  4.7407e-1  5.7676e-1
 5760  20480  5825   5656  7358   5.3208e-1  5.3202e-1  6.3891e-1
 6480  23040  6597   6311  8361   5.7707e-1  5.7699e-1  6.9385e-1
 7200  25600  7270   7052  9269   6.4648e-1  6.4640e-1  7.7325e-1
Example 2.

Least squares problems with the logarithmic regularizer take the form

(36) min_{x∈ℝ^n} (1/2)‖Ax − b‖² + c‖x‖² + Σ_{i=1}^n λ(log(|x_i| + ε) − log ε) − c‖x‖²,

where A ∈ ℝ^{m×n}, b ∈ ℝ^m, ε > 0 is a constant, and λ > 0 is the regularization parameter.

This problem takes the form of (3) with f(x) = c‖x‖² + (λ/ε)‖x‖₁, g(x) = (1/2)‖Ax − b‖², and h(x) = c‖x‖² + Σ_{i=1}^n λ(|x_i|/ε − log(|x_i| + ε) + log ε). Note that the purpose of adding c‖x‖² is to ensure the strong convexity of f(x). For this example, we set ε = 0.5.
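As for Example 1, the decomposition can be verified numerically: f + g − h reproduces the logarithmic-regularizer objective in (36). The data and constants in this sketch (ours) are arbitrary, chosen only for the check:

```python
import numpy as np

rng = np.random.default_rng(2)
A, b = rng.standard_normal((5, 8)), rng.standard_normal(5)
c, lam, eps = 2.0, 0.3, 0.5
x = rng.standard_normal(8)

f = c * (x @ x) + (lam / eps) * np.abs(x).sum()
g = 0.5 * np.sum((A @ x - b) ** 2)
h = c * (x @ x) + lam * np.sum(np.abs(x) / eps - np.log(np.abs(x) + eps) + np.log(eps))
obj = g + lam * np.sum(np.log(np.abs(x) + eps) - np.log(eps))
print(abs((f + g - h) - obj))  # ~0
```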

To compare the performance of the three algorithms, we again report the number of iterations (Iter), the CPU time in seconds (CPU time), the sparsity of the solution (sparsity), and the objective value at termination (fval), averaged over 30 random instances. The numerical results are reported in Tables 3 and 4, from which we can see that the APDCA always outperforms the PDCA and GIST. Specifically, from Table 3, the APDCA is about 1.9 times faster than GIST and about 8.3 times faster than the PDCA for λ = 5 × 10⁻⁴. From Table 4, the APDCA is about 1.6 times faster than GIST and about 11.3 times faster than the PDCA for λ = 1 × 10⁻³. Tables 3 and 4 also show that the APDCA requires fewer iterations than the other two algorithms: from Table 3, the iteration count of the APDCA is about 72% of that of GIST for λ = 5 × 10⁻⁴, and from Table 4 it is about 83% of that of GIST and about 8.6% of that of the PDCA for λ = 1 × 10⁻³. Meanwhile, Tables 3 and 4 show that the solution given by the APDCA is sparser than those given by GIST and the PDCA.

Table 3: Solving (36) on random instances, λ = 5 × 10⁻⁴.

              Iter                CPU time (s)
  m      n    GIST  APDCA  PDCA   GIST   APDCA   PDCA
  720   2560   843    596  Max      1.6    0.7     5.5
 1440   5120   672    602  Max      5.6    3.0    22.3
 2160   7680   873    602  Max     12.4    6.1    49.5
 2880  10240   876    602  Max     21.2   10.6    87.4
 3600  12800   871    602  Max     32.7   16.6   138.0
 4320  15360   845    602  Max     45.1   23.4   194.8
 5040  17920   872    602  Max     62.5   32.0   265.6
 5760  20480   846    602  Max     79.4   41.4   345.3
 6480  23040   877    602  Max    104.1   52.8   441.0
 7200  25600   816    602  Max    120.4   66.0   547.5

              Sparsity              Fval
  m      n    GIST  APDCA  PDCA   GIST       APDCA      PDCA
  720   2560   705    661   931   3.8979e-2  3.8973e-2  5.6815e-2
 1440   5120  1395   1345  1794   7.1306e-2  7.1293e-2  9.3006e-2
 2160   7680  2123   2011  2710   1.1455e-1  1.1453e-1  1.5861e-1
 2880  10240  2809   2705  3597   1.4878e-1  1.4876e-1  2.0601e-1
 3600  12800  3570   3418  4503   1.9187e-1  1.9182e-1  2.7236e-1
 4320  15360  4277   4103  5370   2.3163e-1  2.3159e-1  3.1699e-1
 5040  17920  5042   4729  6287   2.6491e-1  2.6486e-1  3.6295e-1
 5760  20480  5689   5501  7199   3.0649e-1  3.0643e-1  4.3049e-1
 6480  23040  6353   6093  8057   3.4115e-1  3.4110e-1  4.7749e-1
 7200  25600  7139   6089  8924   3.7435e-1  3.7427e-1  5.1004e-1

Table 4: Solving (36) on random instances, λ = 1 × 10⁻³.

              Iter                CPU time (s)
  m      n    GIST  APDCA  PDCA   GIST   APDCA   PDCA
  720   2560   497    329  4658    0.9    0.4     5.2
 1440   5120   468    402  4582    3.1    1.9    20.5
 2160   7680   496    402  4739    6.8    4.0    46.8
 2880  10240   472    402  4527   11.1    7.0    79.5
 3600  12800   494    402  4601   18.3   11.1   126.5
 4320  15360   505    402  4602   26.6   16.5   179.0
 5040  17920   451    402  4428   31.8   21.3   234.7
 5760  20480   448    402  4446   41.2   27.7   304.2
 6480  23040   459    402  4602   52.8   35.0   403.6
 7200  25600   487    402  4668   70.7   44.0   510.4

              Sparsity              Fval
  m      n    GIST  APDCA  PDCA   GIST       APDCA      PDCA
  720   2560   628    635   658   7.5032e-2  7.5032e-2  7.5053e-2
 1440   5120  1300   1248  1337   1.4892e-1  1.4891e-1  1.4896e-1
 2160   7680  1987   1865  1965   2.3348e-1  2.3347e-1  2.3354e-1
 2880  10240  2543   2462  2627   3.0410e-1  3.0410e-1  3.0416e-1
 3600  12800  3156   3072  3252   3.8829e-1  3.8828e-1  3.8837e-1
 4320  15360  3831   3703  3973   4.5346e-1  4.5344e-1  4.5348e-1
 5040  17920  4460   4300  4605   5.2664e-1  5.2662e-1  5.2676e-1
 5760  20480  5124   4991  5268   5.9404e-1  5.9402e-1  5.9417e-1
 6480  23040  5761   5540  5919   6.8740e-1  6.8737e-1  6.8756e-1
 7200  25600  6365   6231  6632   7.6681e-1  7.6678e-1  7.6700e-1
5. Conclusions

In this paper, we proposed an accelerated proximal point algorithm for the difference of convex optimization problem by combining the extrapolation technique with the proximal difference of convex algorithm. By making full use of the special structure of the DC decomposition and of the stepsize information, we proved that the proposed algorithm converges at a rate of O(1/k²) under mild conditions. The numerical experiments demonstrate the superiority of the proposed algorithm over some existing algorithms.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

The authors equally contributed to this paper and read and approved the final manuscript.

Acknowledgments

This project was supported by the National Natural Science Foundation of China (Grant nos. 11801309, 11901343, and 12071249).

References

[1] A. Alvarado, G. Scutari, and J.-S. Pang, "A new decomposition method for multiuser DC-programming and its applications," IEEE Transactions on Signal Processing, vol. 62, no. 11, pp. 2984-2998, 2014.
[2] M. Sanjabi, M. Razaviyayn, and Z.-Q. Luo, "Optimal joint base station assignment and beamforming for heterogeneous networks," IEEE Transactions on Signal Processing, vol. 62, no. 8, pp. 1950-1961, 2014.
[3] P. Yin, Y. Lou, Q. He, and J. Xin, "Minimization of ℓ_{1-2} for compressed sensing," SIAM Journal on Scientific Computing, vol. 37, no. 1, pp. A536-A563, 2015.
[4] G. Wang and Y. Wang, "Some Ostrowski-type bound estimations of spectral radius for weakly irreducible nonnegative tensors," Linear and Multilinear Algebra, vol. 68, no. 9, pp. 1817-1834, 2020.
[5] G. Wang, G. Zhou, and L. Caccetta, "Z-eigenvalue inclusion theorems for tensors," Discrete & Continuous Dynamical Systems - Series B, vol. 22, no. 1, pp. 187-198, 2017.
[6] G. Wang and Y. Zhang, "Z-eigenvalue exclusion theorems for tensors," Journal of Industrial & Management Optimization, vol. 16, no. 4, pp. 1987-1998, 2020.
[7] W. de Oliveira, "Proximal bundle methods for nonsmooth DC programming," Journal of Global Optimization, vol. 75, no. 2, pp. 523-563, 2019.
[8] D. Feng, M. Sun, and X. Wang, "A family of conjugate gradient methods for large-scale nonlinear equations," Journal of Inequalities and Applications, vol. 2017, article 236, 2017.
[9] H. A. Le Thi and T. Pham Dinh, "DC programming in communication systems: challenging problems and methods," Vietnam Journal of Computer Science, vol. 1, no. 1, pp. 15-28, 2014.
[10] H. A. Le Thi and T. Pham Dinh, "DC programming and DCA: thirty years of developments," Mathematical Programming, vol. 169, no. 1, pp. 5-68, 2018.
[11] Z. Lu and Z. Zhou, "Nonmonotone enhanced proximal DC algorithms for a class of structured nonsmooth DC programming," SIAM Journal on Optimization, vol. 29, no. 4, pp. 2725-2752, 2019.
[12] Y. Lou, T. Zeng, S. Osher, and J. Xin, "A weighted difference of anisotropic and isotropic total variation model for image processing," SIAM Journal on Imaging Sciences, vol. 8, no. 3, pp. 1798-1823, 2015.
[13] X. Wang, "Alternating proximal penalization algorithm for the modified multiple-sets split feasibility problems," Journal of Inequalities and Applications, vol. 2018, article 48, 2018.
[14] T. Pham Dinh and H. A. Le Thi, "Convex analysis approach to DC programming: theory, algorithms and applications," Acta Mathematica Vietnamica, vol. 22, no. 2, pp. 289-355, 1997.
[15] T. Pham Dinh and H. A. Le Thi, "A D.C. optimization algorithm for solving the trust-region subproblem," SIAM Journal on Optimization, vol. 8, pp. 476-505, 1998.
[16] J.-y. Gotoh, A. Takeda, and K. Tono, "DC formulations and algorithms for sparse optimization problems," Mathematical Programming, vol. 169, no. 1, pp. 141-176, 2018.
[17] B. O'Donoghue and E. J. Candès, "Adaptive restart for accelerated gradient schemes," Foundations of Computational Mathematics, vol. 15, pp. 715-732, 2015.
[18] H. A. Le Thi, V. N. Huynh, and T. Pham Dinh, "Convergence analysis of difference-of-convex algorithm with subanalytic data," Journal of Optimization Theory and Applications, vol. 179, no. 1, pp. 103-126, 2018.
[19] X. Wang, Y. Zhang, H. Chen, and X. Kou, "Convergence rate analysis of the proximal difference of convex algorithm," Mathematical Problems in Engineering, vol. 2021, Article ID 5629868, 2021.
[20] Y. Nesterov, "A method of solving a convex programming problem with convergence rate O(1/k²)," Proceedings of the USSR Academy of Sciences, vol. 269, pp. 543-547, 1983.
[21] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Springer, Berlin, Germany, 2004.
[22] Y. Nesterov, "Dual extrapolation and its applications to solving variational inequalities and related problems," Mathematical Programming, vol. 109, no. 2-3, pp. 319-344, 2007.
[23] Y. Nesterov, "Gradient methods for minimizing composite functions," Mathematical Programming, vol. 140, no. 1, pp. 125-161, 2013.
[24] X. Wang, Y. Wang, and G. Wang, "An accelerated augmented Lagrangian method for multi-criteria optimization problem," Journal of Industrial & Management Optimization, vol. 16, no. 1, pp. 1-9, 2020.
[25] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183-202, 2009.
[26] A. Moudafi and A. Gibali, "ℓ_{1-2} regularization of split feasibility problems," Numerical Algorithms, vol. 78, no. 3, pp. 739-757, 2018.
[27] B. Wen, X. Chen, and T. K. Pong, "A proximal difference-of-convex algorithm with extrapolation," Computational Optimization and Applications, vol. 69, no. 2, pp. 297-324, 2018.
[28] F. Facchinei and J.-S. Pang, Finite-Dimensional Variational Inequalities and Complementarity Problems, Springer, Berlin, Germany, 2003.
[29] R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, Springer, Berlin, Germany, 1998.
[30] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, UK, 2004.
[31] D. Goldfarb, S. Ma, and K. Scheinberg, "Fast alternating linearization methods for minimizing the sum of two convex functions," Mathematical Programming, vol. 141, no. 1-2, pp. 349-382, 2013.
[32] P. Gong, C. Zhang, Z. Lu, J. Huang, and J. Ye, "A general iterative shrinkage and thresholding algorithm for nonconvex regularized optimization problems," in Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 2013.