Mathematical Problems in Engineering, vol. 2020, Article ID 6489190, Hindawi. https://doi.org/10.1155/2020/6489190

Research Article

A Gradient Projection Algorithm with a New Stepsize for Nonnegative Sparsity-Constrained Optimization

Ye Li (1,2), Jun Sun (3), and Biao Qu (1)

(1) Qufu Normal University, Rizhao 276826, Shandong, China
(2) Shandong Women's University, Jinan 250300, Shandong, China
(3) Beijing Jiaotong University, Beijing 100044, China

Academic Editor: Chuanjun Chen

Received 26 May 2020; Revised 30 July 2020; Accepted 8 August 2020; Published 26 August 2020

Copyright © 2020 Ye Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The nonnegative sparsity-constrained optimization problem arises in many fields, such as the linear compressed sensing problem and the regularized logistic regression cost function. In this paper, we introduce a new stepsize rule and establish a gradient projection algorithm. We also obtain some convergence results under milder conditions.

Funding: Natural Science Foundation of Shandong Province (ZR2018MA019).
1. Introduction

In this paper, we are mainly concerned with the nonnegative sparsity-constrained optimization problem (NN-SCO):

$$\min f(x) \quad \text{s.t.} \quad x \in S \cap R_+^n, \tag{1}$$

where $f: R^n \to R$ is a continuously differentiable function with a lower bound, $S := \{x \in R^n : \|x\|_0 \le s\}$ is a sparse set, $s < n$ is a given integer regulating the sparsity level of $x$, and $R_+^n$ is the nonnegative orthant in $R^n$. Here, $\|x\|_0$ is the $\ell_0$ norm of $x$, counting the number of nonzero elements of $x$. Many application problems can be cast in the form of problem (1), such as the widely studied linear compressed sensing problem with $f(x) = \frac{1}{2}\|Ax - b\|^2$, where $A \in R^{m \times n}$ is a sensing matrix, $b \in R^m$ is the observation vector, and $\|\cdot\|$ is the Euclidean norm in $R^n$ [1]. Problem (1) has also been applied to the regularized logistic regression cost function [2].
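As a concrete illustration (our own synthetic example, not taken from the paper), the compressed-sensing objective $f(x)=\frac{1}{2}\|Ax-b\|^2$ and its gradient $A^T(Ax-b)$ can be set up as follows; the matrix `A`, vector `b`, and sparse signal `x_true` are demonstration data:

```python
import numpy as np

# Synthetic instance of the compressed-sensing objective
# f(x) = (1/2)||Ax - b||^2 with gradient A^T (Ax - b).
rng = np.random.default_rng(0)
m, n, s = 20, 50, 5                      # s < n is the sparsity level
A = rng.standard_normal((m, n))          # sensing matrix
x_true = np.zeros(n)
x_true[:s] = rng.uniform(1.0, 2.0, s)    # nonnegative s-sparse signal
b = A @ x_true                           # noiseless observations

def f(x):
    r = A @ x - b
    return 0.5 * r @ r

def grad_f(x):
    return A.T @ (A @ x - b)
```

Since the observations are noiseless, `x_true` is a global minimizer of this instance with objective value zero.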

Recently, a great deal of work has been devoted to algorithms for sparsity-constrained optimization problems. Beck and Eldar [3] established the IHT algorithm, which converges to an L-stationary point under Lipschitz continuity of the gradient of the objective function. Beck and Hallak [4] generalized these results to sparse symmetric sets. Lu [5] designed a nonmonotone algorithm for symmetric set constrained problems. Pan, Xiu, and Zhou [6, 7] established B-stationarity, C-stationarity, and α-stationarity based on the Bouligand and Clarke tangent cones. Recently, Pan, Zhou, and Xiu [8] established an improved IHT algorithm (IIHT) for problem (1) by using an Armijo line search. They proved that any accumulation point of the generated sequence is an α-stationary point under the restricted strong smoothness of the objective function, which is weaker than Lipschitz continuity of the gradient.

Inspired by the above works, in this paper we establish a gradient projection algorithm with a new stepsize. The new algorithm removes the restricted strong smoothness condition on the objective function, which makes it more widely applicable. Meanwhile, we prove the convergence of the algorithm.

The rest of this paper is organized as follows. In Section 2, we present some notations, definitions, and lemmas. In Section 3, we give the algorithm of (1) and prove the convergence properties.

2. Preliminaries

2.1. Notations

To make the paper easier to read, we collect the notations used below:

$$\begin{aligned}
&S = \{x \in R^n : \|x\|_0 \le s\},\\
&I_1(x) := \{i \in \{1,\dots,n\} : x_i \ne 0\}, \qquad I_0(x) := \{i \in \{1,\dots,n\} : x_i = 0\},\\
&I_1(x - y) \subseteq I_1(x) \cup I_1(y),\\
&P_{S\cap R_+^n}(x) \in \arg\min_{y \in S\cap R_+^n}\|x - y\|,\\
&|I_1(x)|: \text{the cardinality of } I_1(x).
\end{aligned} \tag{2}$$
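In code, the index sets $I_1(x)$ and $I_0(x)$ are simple support computations; the following sketch (the function names are ours, not the paper's) makes the notation concrete:

```python
import numpy as np

def I1(x):
    """Support of x: indices i with x_i != 0."""
    return np.flatnonzero(x)

def I0(x):
    """Complement of the support: indices i with x_i = 0."""
    return np.flatnonzero(x == 0)

x = np.array([0.0, 2.0, 0.0, 3.0])
# I1(x) -> [1, 3], I0(x) -> [0, 2], and |I1(x)| = ||x||_0 = 2
```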

2.2. Definitions

Definition 1 (see [8]).

Let $x^* \in S \cap R_+^n$ be a given feasible point of (1). We say that $x^*$ is an α-stationary point if there exists $\alpha > 0$ such that

$$x^* \in P_{S\cap R_+^n}\left(x^* - \alpha\nabla f(x^*)\right). \tag{3}$$

Definition 2 (see [9]).

A function $f$ is called 2s-restricted strongly smooth (2s-RSS) with parameter $L_{2s} > 0$ if, for any $x, y \in R^n$ satisfying $|I_1(x - y)| \le 2s$, it holds that

$$f(y) \le f(x) + \langle\nabla f(x), y - x\rangle + \frac{L_{2s}}{2}\|y - x\|^2. \tag{4}$$

Definition 3 (see [9]).

A function $f$ is called 2s-restricted strongly convex (2s-RSC) with parameter $l_{2s} > 0$ if, for any $x, y \in R^n$ satisfying $|I_1(x - y)| \le 2s$, it holds that

$$f(y) \ge f(x) + \langle\nabla f(x), y - x\rangle + \frac{l_{2s}}{2}\|y - x\|^2. \tag{5}$$

This holds if and only if, for any $x, y \in R^n$ with $|I_1(x - y)| \le 2s$, we have

$$\left\|\left[\nabla f(x) - \nabla f(y)\right]_{I_1(x-y)}\right\| \ge l_{2s}\|x - y\|. \tag{6}$$

In particular, if $l_{2s} = 0$ in (5), the function $f$ is called 2s-restricted convex (2s-RC).

Definition 4 (see [10]).

The projected gradient $\nabla_{S\cap R_+^n} f(x)$ of $f$ is defined by

$$\nabla_{S\cap R_+^n} f(x) = P_{T^C_{S\cap R_+^n}(x)}\left(-\nabla f(x)\right) = \arg\min\left\{\|v + \nabla f(x)\| : v \in T^C_{S\cap R_+^n}(x)\right\}, \tag{7}$$

and its norm satisfies $\left\|\nabla_{S\cap R_+^n} f(x)\right\| = \max\left\{\langle -\nabla f(x), v\rangle : v \in T^C_{S\cap R_+^n}(x),\ \|v\| = 1\right\}$, where $T^C_{S\cap R_+^n}(x)$ denotes the Clarke tangent cone of $S \cap R_+^n$ at $x$.

2.3. Lemmas

Lemma 1 (see [8]).

For $\alpha > 0$, a vector $x^* \in S\cap R_+^n$ is an α-stationary point if and only if

$$\nabla_i f(x^*) \begin{cases} = 0, & i \in I_1(x^*), \\ \ge -\dfrac{M_s(x^*)}{\alpha}, & i \in I_0(x^*), \end{cases} \tag{8}$$

where $M_s(x^*)$ denotes the $s$-th largest entry of $x^*$.

In particular, when $\|x^*\|_0 = s$ and $x^* \ne 0$: $\nabla_i f(x^*) = 0$ for $i \in I_1(x^*)$, and $\nabla_i f(x^*) \ge -M_s(x^*)/\alpha$ for $i \in I_0(x^*)$.

When $\|x^*\|_0 < s$ and $x^* \ne 0$: $\nabla_i f(x^*) = 0$ for $i \in I_1(x^*)$, and $\nabla_i f(x^*) \ge 0$ for $i \in I_0(x^*)$.
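A direct numerical check of the conditions in Lemma 1 can be sketched as follows, under the assumption (consistent with the text's use of $M_s$) that $M_s(x)$ is the $s$-th largest entry of $x$; the function name and the quadratic toy problem are illustrative only:

```python
import numpy as np

def is_alpha_stationary(x, g, alpha, s, tol=1e-8):
    """Check Lemma 1 for a feasible x with gradient g = grad f(x):
    g_i = 0 on the support I_1(x), and g_i >= -M_s(x)/alpha on I_0(x)."""
    I1 = np.flatnonzero(x)
    I0 = np.flatnonzero(x == 0)
    Ms = np.sort(x)[-s]                   # s-th largest entry of x
    return bool(np.all(np.abs(g[I1]) <= tol)
                and np.all(g[I0] >= -Ms / alpha - tol))

# Toy problem f(x) = (1/2)||x - c||^2, whose gradient at x is x - c.
c = np.array([3.0, 2.0, 0.1, 0.0])
x_star = np.array([3.0, 2.0, 0.0, 0.0])   # keeps the two largest entries of c
```

Here `x_star` passes the check for $s = 2$ and $\alpha = 1$, while a point whose support misses the large entries of `c` does not.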

Lemma 2 (see [8]).

$$P_{S\cap R_+^n}(x) = P_S\left(P_{R_+^n}(x)\right).$$
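Lemma 2 says the projection factors into a nonnegativity clip followed by hard thresholding to the $s$ largest entries. A minimal sketch (the helper name is ours):

```python
import numpy as np

def proj_feasible(x, s):
    """P_{S ∩ R^n_+}(x) = P_S(P_{R^n_+}(x)): clip negatives to zero,
    then keep the s largest remaining entries (Lemma 2)."""
    y = np.maximum(x, 0.0)          # P_{R^n_+}: componentwise clip
    out = np.zeros_like(y)
    idx = np.argsort(y)[-s:]        # P_S: indices of the s largest entries
    out[idx] = y[idx]
    return out
```

For example, `proj_feasible(np.array([-1.0, 3.0, 0.5, 2.0]), 2)` zeroes the negative entry and keeps only 3.0 and 2.0; a point that is already feasible is left unchanged.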

Lemma 3 (see [8]).

For any $x^k \in S\cap R_+^n$, we have

$$T^C_{S\cap R_+^n}(x^k) = \mathrm{span}\{e_i,\ i \in I_1(x^k)\}, \tag{9}$$

where $e_i \in R^n$ is the vector whose $i$-th component is one and whose other components are zero.

3. Main Results

In this section, we establish a new algorithm, which improves the IIHT algorithm for (1), and then analyze its convergence properties. First, let us present the gradient projection algorithm with a new stepsize rule.

Algorithm 1.

Step 1. Initialize $x^0 \in S\cap R_+^n$, $0 < \theta < 1$, and $\varepsilon > 0$, and set $k := 0$.

Step 2. Compute
$$L_k = \sup_{\alpha > 0}\frac{\left\|\nabla f(x^k) - \nabla f\left(z^k(\alpha,\theta)\right)\right\|}{\left\|x^k - z^k(\alpha,\theta)\right\|},$$
where
$$z^k(\alpha,\theta) = x^k + \theta\left(x^k(\alpha) - x^k\right), \qquad x^k(\alpha) = P_{S\cap R_+^n}\left(x^k - \alpha\nabla f(x^k)\right). \tag{10}$$

Step 3. Compute $x^{k+1} = P_{S\cap R_+^n}\left(x^k - \alpha_k\nabla f(x^k)\right)$, where $\alpha_k$ satisfies $0 < \alpha_k \le \frac{1}{3L_k}$.

Step 4. If $\left\|\nabla_{\Gamma_k} f(x^k)\right\| \le \varepsilon$, then stop; otherwise, set $k := k + 1$ and go to Step 2.
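The four steps above can be sketched in code. One caveat: the supremum defining $L_k$ in Step 2 is not directly computable in general, so the sketch below substitutes a global Lipschitz bound `L_bound` on $\nabla f$ (our simplification, not the paper's rule); for the quadratic compressed-sensing objective this is valid with $L_k \le \|A\|_2^2$:

```python
import numpy as np

def proj_feasible(x, s):
    """P_{S ∩ R^n_+}(x) = P_S(P_{R^n_+}(x)) (Lemma 2)."""
    y = np.maximum(x, 0.0)
    out = np.zeros_like(y)
    idx = np.argsort(y)[-s:]
    out[idx] = y[idx]
    return out

def gpa_nn_sco(grad_f, x0, s, L_bound, eps=1e-8, max_iter=5000):
    """Sketch of Algorithm 1 with the stepsize bound L_k replaced by L_bound."""
    x = x0.copy()
    for _ in range(max_iter):
        alpha = 1.0 / (3.0 * L_bound)                  # Step 3: 0 < alpha_k <= 1/(3 L_k)
        x = proj_feasible(x - alpha * grad_f(x), s)    # projected gradient step
        support = np.flatnonzero(x)                    # Gamma_k = I_1(x^k)
        if np.linalg.norm(grad_f(x)[support]) <= eps:  # Step 4 stopping rule
            break
    return x
```

For $f(x) = \frac{1}{2}\|Ax - b\|^2$ one may take `L_bound = np.linalg.norm(A, 2)**2`, the largest eigenvalue of $A^TA$.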

Next, let us list the following assumptions for convenience:

(H1) For any $k > 0$, $L_k < +\infty$.

(H2) $f$ is bounded below on $S\cap R_+^n$.

Lemma 4.

Let the sequence $\{x^k\}$ be generated by Algorithm 1, and set $l_k = 3L_k$. Then, we have

$$f(x^k) \le h_{l_k}(x^k, x^{k+1}), \tag{11}$$

where $h_{l_k}(x^k, x^{k+1}) = f(x^{k+1}) + \langle\nabla f(x^{k+1}), x^k - x^{k+1}\rangle + \frac{l_k}{2}\|x^k - x^{k+1}\|^2$.

Proof.

Let

$$g(t) = f\left(x^{k+1} + t\left(x^k - x^{k+1}\right)\right). \tag{12}$$

Then,

$$g(0) = f(x^{k+1}), \quad g(1) = f(x^k), \quad g'(t) = \left(x^k - x^{k+1}\right)^T\nabla f\left(x^{k+1} + t\left(x^k - x^{k+1}\right)\right). \tag{13}$$

Thus,

$$\begin{aligned}
f(x^k) - f(x^{k+1}) &= g(1) - g(0) = \int_0^1 g'(t)\,dt \\
&= \int_0^1 \left(x^k - x^{k+1}\right)^T\nabla f\left(x^{k+1} + t\left(x^k - x^{k+1}\right)\right)dt \\
&= \int_0^1 \left(x^k - x^{k+1}\right)^T\nabla f(x^{k+1})\,dt + \int_0^1 \left(x^k - x^{k+1}\right)^T\left[\nabla f(x^k) - \nabla f(x^{k+1})\right]dt \\
&\quad + \int_0^1 \left(x^k - x^{k+1}\right)^T\left[\nabla f\left(x^{k+1} + t\left(x^k - x^{k+1}\right)\right) - \nabla f(x^k)\right]dt \\
&\le \left(x^k - x^{k+1}\right)^T\nabla f(x^{k+1}) + \|x^k - x^{k+1}\|\left\|\nabla f(x^k) - \nabla f(x^{k+1})\right\| \\
&\quad + \|x^k - x^{k+1}\|\int_0^1\left\|\nabla f\left(x^{k+1} + t\left(x^k - x^{k+1}\right)\right) - \nabla f(x^k)\right\|dt \\
&\le \left(x^k - x^{k+1}\right)^T\nabla f(x^{k+1}) + L_k\|x^k - x^{k+1}\|^2 + \|x^k - x^{k+1}\|\int_0^1 L_k(1-t)\|x^k - x^{k+1}\|\,dt \\
&= \left(x^k - x^{k+1}\right)^T\nabla f(x^{k+1}) + \frac{3L_k}{2}\|x^k - x^{k+1}\|^2 \\
&= \left(x^k - x^{k+1}\right)^T\nabla f(x^{k+1}) + \frac{l_k}{2}\|x^k - x^{k+1}\|^2.
\end{aligned} \tag{14}$$

Hence, (11) holds.

Lemma 5.

Suppose $l \ge l_k$. For $x^k \in S\cap R_+^n$ and $x^{k+1} \in P_{S\cap R_+^n}\left(x^k - \frac{1}{l}\nabla f(x^k)\right)$, we have

$$f(x^k) - f(x^{k+1}) \ge \sigma\|x^k - x^{k+1}\|^2, \tag{15}$$

where $\sigma = (l - l_k)/2$.

Proof.

Since $x^{k+1} \in P_{S\cap R_+^n}\left(x^k - \frac{1}{l}\nabla f(x^k)\right)$, by the definition of the projection, we get

$$x^{k+1} \in \arg\min_{x \in S\cap R_+^n}\left\|x - \left(x^k - \frac{1}{l}\nabla f(x^k)\right)\right\|^2. \tag{16}$$

Moreover,

$$\begin{aligned}
h_l(x, x^k) &= f(x^k) + \langle\nabla f(x^k), x - x^k\rangle + \frac{l}{2}\|x - x^k\|^2 \\
&= \frac{l}{2}\left\|x - \left(x^k - \frac{1}{l}\nabla f(x^k)\right)\right\|^2 + f(x^k) - \frac{1}{2l}\left\|\nabla f(x^k)\right\|^2.
\end{aligned} \tag{17}$$

Because $f(x^k) - \frac{1}{2l}\|\nabla f(x^k)\|^2$ is a constant independent of $x$, we get

$$x^{k+1} \in \arg\min_{x \in S\cap R_+^n} h_l(x, x^k). \tag{18}$$

Therefore,

$$h_l(x^{k+1}, x^k) \le h_l(x^k, x^k) = f(x^k). \tag{19}$$

By Lemma 4, we get

$$f(x^{k+1}) \le h_{l_k}(x^{k+1}, x^k). \tag{20}$$

Hence,

$$f(x^k) - f(x^{k+1}) \ge f(x^k) - h_{l_k}(x^{k+1}, x^k) \ge h_l(x^{k+1}, x^k) - h_{l_k}(x^{k+1}, x^k) = \frac{l - l_k}{2}\|x^{k+1} - x^k\|^2. \tag{21}$$

Letting $\sigma = (l - l_k)/2$, we get

$$f(x^k) - f(x^{k+1}) \ge \sigma\|x^k - x^{k+1}\|^2. \tag{22}$$

Lemma 6.

Let the sequence $\{x^k\}$ be generated by Algorithm 1. Then:

(1) $f(x^k) - f(x^{k+1}) \ge \frac{1}{2}\left(\frac{1}{\alpha_k} - l_k\right)\|x^k - x^{k+1}\|^2$;

(2) $\{f(x^k)\}$ is a nonincreasing sequence, and $f(x^k)$ converges as $k \to \infty$;

(3) $\|x^k - x^{k+1}\| \to 0$;

(4) for any $k = 0, 1, 2, \dots$, if $x^k \ne x^{k+1}$, then $f(x^{k+1}) < f(x^k)$.

Proof.

Since $0 < \alpha_k \le \frac{1}{3L_k}$, we get

$$\frac{1}{\alpha_k} \ge 3L_k = l_k. \tag{23}$$

Setting $l = 1/\alpha_k$ in (15), result (1) is obtained.

By (15), we easily see that $\{f(x^k)\}$ is nonincreasing. Moreover, by assumption (H2), $f(x^k)$ converges.

Let $\mu = \frac{1}{2}\left(\frac{1}{\alpha_k} - l_k\right)$ in result (1). We get

$$f(x^k) - f(x^{k+1}) \ge \mu\|x^k - x^{k+1}\|^2. \tag{24}$$

Summing both sides of this inequality, we get

$$\sum_{k=0}^{\infty}\|x^k - x^{k+1}\|^2 \le \sum_{k=0}^{\infty}\frac{1}{\mu}\left(f(x^k) - f(x^{k+1})\right) = \frac{1}{\mu}\left(f(x^0) - \lim_{k\to+\infty}f(x^k)\right). \tag{25}$$

Since $f$ is bounded below, we get

$$\|x^k - x^{k+1}\| \to 0. \tag{26}$$

Result (4) follows immediately from result (1).
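Results (2) and (3) of Lemma 6 can be observed numerically. This small experiment (synthetic data, and a global Lipschitz stepsize in place of the $L_k$ rule, which is our simplification) tracks the objective along the iteration and confirms that it never increases:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, s = 30, 50, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
L = np.linalg.norm(A, 2) ** 2        # global Lipschitz bound on grad f
alpha = 1.0 / (3.0 * L)              # satisfies 0 < alpha_k <= 1/(3 L_k)

def proj_feasible(x):                # P_S(P_{R^n_+}(x)), Lemma 2
    y = np.maximum(x, 0.0)
    out = np.zeros_like(y)
    idx = np.argsort(y)[-s:]
    out[idx] = y[idx]
    return out

x = np.zeros(n)
vals = []
for _ in range(50):
    vals.append(0.5 * np.linalg.norm(A @ x - b) ** 2)
    x = proj_feasible(x - alpha * A.T @ (A @ x - b))

# Lemma 6(2): the objective sequence {f(x^k)} is nonincreasing.
assert all(v2 <= v1 + 1e-9 for v1, v2 in zip(vals, vals[1:]))
```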

Lemma 7.

Let the sequence $\{x^k\}$ be generated by Algorithm 1, and suppose that the function $f$ is 2s-RC. Then we have

$$\left\|\nabla f(x^{k+1}) - \nabla f(x^k)\right\| \le l_k\|x^{k+1} - x^k\|. \tag{27}$$

Proof.

Because the sequence $\{x^k\}$ is generated by Algorithm 1, we get $|I_1(x^k - x^{k+1})| \le 2s$. By Lemmas 4 and 5 of reference [8], we obtain

$$\left\|\nabla f(x^{k+1}) - \nabla f(x^k)\right\| \le l_k\|x^{k+1} - x^k\|. \tag{28}$$

Theorem 1.

Let the sequence $\{x^k\}$ be generated by Algorithm 1. Then, the following results hold:

(1) Any accumulation point of the sequence $\{x^k\}$ is an α-stationary point.

(2) If $f$ is 2s-RC, the projected gradient sequence converges to zero, i.e.,

$$\lim_{k\to+\infty}\left\|\nabla_{\Gamma_k} f(x^k)\right\| = 0. \tag{29}$$

Proof.

Suppose that $x^*$ is an accumulation point of the sequence $\{x^k\}$. Then, there exists a subsequence $\{x^{k_n}\}$ converging to $x^*$.

Because $\|x^{k_n} - x^{k_n+1}\| \to 0$ by Lemma 6 and

$$\|x^{k_n+1} - x^*\| \le \|x^{k_n+1} - x^{k_n}\| + \|x^{k_n} - x^*\|, \tag{30}$$

we get

$$\lim_{n\to+\infty}x^{k_n+1} = \lim_{n\to+\infty}x^{k_n} = x^*. \tag{31}$$

Moreover,

$$x^{k_n+1} = P_{S\cap R_+^n}\left(x^{k_n} - \alpha_{k_n}\nabla f(x^{k_n})\right) = P_S\left(P_{R_+^n}\left(x^{k_n} - \alpha_{k_n}\nabla f(x^{k_n})\right)\right). \tag{32}$$

We consider the following two cases.

Case 1.

For $i \in I_1(x^*)$, there exist a sufficiently large index $N$ and a constant $c_0 > 0$ such that, for all $n \ge N$,

$$\min\left(x_i^{k_n}, x_i^{k_n+1}\right) \ge c_0 > 0. \tag{33}$$

By $P_{S\cap R_+^n} = P_S \circ P_{R_+^n}$ and (33), we get

$$x_i^{k_n+1} = x_i^{k_n} - \alpha_{k_n}\nabla_i f(x^{k_n}). \tag{34}$$

Since

$$\liminf_{n\to+\infty}\alpha_{k_n} > 0, \tag{35}$$

without loss of generality, we may suppose $\lim_{n\to+\infty}\alpha_{k_n} = c > 0$. Letting $n \to +\infty$, we get

$$x_i^* = x_i^* - c\nabla_i f(x^*), \tag{36}$$

i.e.,

$$\nabla_i f(x^*) = 0, \quad i \in I_1(x^*). \tag{37}$$

Case 2.

For $i \in I_0(x^*)$, we consider two subcases.

Subcase 1.

When $\|x^*\|_0 = s$, we get

$$0 = x_i^* = \lim_{n\to+\infty}x_i^{k_n+1} = \lim_{n\to+\infty}\left[P_S\left(P_{R_+^n}\left(x^{k_n} - \alpha_{k_n}\nabla f(x^{k_n})\right)\right)\right]_i. \tag{38}$$

Due to the properties of the projections $P_S$ and $P_{R_+^n}$, for all sufficiently large $n$, we have

$$\max\left(x_i^{k_n} - \alpha_{k_n}\nabla_i f(x^{k_n}),\ 0\right) \le M_s(x^*). \tag{39}$$

Thus,

$$x_i^{k_n} - \alpha_{k_n}\nabla_i f(x^{k_n}) \le M_s(x^*). \tag{40}$$

Taking limits on both sides, we obtain

$$\nabla_i f(x^*) \ge -\frac{1}{c}M_s(x^*). \tag{41}$$

Subcase 2.

When $\|x^*\|_0 < s$, suppose $\nabla_i f(x^*) < 0$. Then we have

$$\lim_{n\to+\infty}\left(x_i^{k_n} - \alpha_{k_n}\nabla_i f(x^{k_n})\right) = -c\nabla_i f(x^*) > 0. \tag{42}$$

For all sufficiently large $n$, we have

$$\left[P_{R_+^n}\left(x^{k_n} - \alpha_{k_n}\nabla f(x^{k_n})\right)\right]_i = x_i^{k_n} - \alpha_{k_n}\nabla_i f(x^{k_n}) > 0. \tag{43}$$

Since $\|x^*\|_0 < s$, for all sufficiently large $n$, we have

$$x_i^{k_n+1} = \left[P_S\left(P_{R_+^n}\left(x^{k_n} - \alpha_{k_n}\nabla f(x^{k_n})\right)\right)\right]_i = \left[P_{R_+^n}\left(x^{k_n} - \alpha_{k_n}\nabla f(x^{k_n})\right)\right]_i > 0, \tag{44}$$

which contradicts $i \in I_0(x^*)$. Thus, $\nabla_i f(x^*) \ge 0$.

Summarizing the two cases, we obtain

$$\nabla_i f(x^*) \begin{cases} = 0, & i \in I_1(x^*), \\ \ge -\dfrac{1}{c}M_s(x^*), & i \in I_0(x^*). \end{cases} \tag{45}$$

Thus, $x^*$ is an α-stationary point of (1).

Set $\Gamma_k = I_1(x^k)$. By Lemma 3, we have

$$T^C_{S\cap R_+^n}(x^k) = R^n_{\Gamma_k} = \mathrm{span}\{e_i,\ i \in I_1(x^k)\}. \tag{46}$$

By Definition 4, we have

$$\left\|P_{T^C_{S\cap R_+^n}(x^k)}\left(-\nabla f(x^k)\right)\right\| = \max\left\{\langle -\nabla f(x^k), v\rangle : v \in T^C_{S\cap R_+^n}(x^k),\ \|v\| = 1\right\} = \left\|\nabla_{\Gamma_k} f(x^k)\right\|. \tag{47}$$

Moreover, the maximum is attained at some $v$ with $\|v\| = 1$. Hence, for any $\varepsilon > 0$, there exists $v^k \in R^n_{\Gamma_k}$ with $\|v^k\| = 1$ satisfying

$$\left\|\nabla_{\Gamma_k} f(x^k)\right\| \le \langle\nabla f(x^k), v^k\rangle + \varepsilon. \tag{48}$$

Because $x^{k+1} = P_{S\cap R_+^n}\left(x^k - \alpha_k\nabla f(x^k)\right)$ and the projection leaves the entries on its own support $\Gamma_{k+1}$ unchanged, we get

$$x^{k+1}_{\Gamma_{k+1}} = \left(x^k - \alpha_k\nabla f(x^k)\right)_{\Gamma_{k+1}}, \tag{49}$$

i.e.,

$$x^{k+1}_{\Gamma_{k+1}} - \left(x^k - \alpha_k\nabla f(x^k)\right)_{\Gamma_{k+1}} = 0. \tag{50}$$

Thus, for any $\varpi^{k+1} \in R^n_{\Gamma_{k+1}}$, we get

$$\left\langle x^{k+1} - x^k + \alpha_k\nabla f(x^k),\ \varpi^{k+1} - x^{k+1}\right\rangle = 0. \tag{51}$$

Taking $\varpi^{k+1} = x^{k+1} + v^{k+1}$, we get

$$\left\langle x^{k+1} - x^k + \alpha_k\nabla f(x^k),\ v^{k+1}\right\rangle = 0. \tag{52}$$

By the Cauchy–Schwarz inequality, we get

$$\alpha_k\langle\nabla f(x^k), v^{k+1}\rangle = \langle x^k - x^{k+1}, v^{k+1}\rangle \le \|x^{k+1} - x^k\|, \tag{53}$$

i.e.,

$$\langle\nabla f(x^k), v^{k+1}\rangle \le \frac{\|x^{k+1} - x^k\|}{\alpha_k}. \tag{54}$$

By Lemma 7, we get

$$\langle\nabla f(x^{k+1}), v^{k+1}\rangle = \langle\nabla f(x^{k+1}) - \nabla f(x^k), v^{k+1}\rangle + \langle\nabla f(x^k), v^{k+1}\rangle \le l_k\|x^{k+1} - x^k\| + \frac{\|x^{k+1} - x^k\|}{\alpha_k}. \tag{55}$$

Taking limits on both sides and using Lemma 6, we have

$$\limsup_{k\to+\infty}\langle\nabla f(x^{k+1}), v^{k+1}\rangle \le 0. \tag{56}$$

Combining this with (48) and letting $\varepsilon \to 0$, we get

$$\lim_{k\to+\infty}\left\|\nabla_{\Gamma_k} f(x^k)\right\| = 0. \tag{57}$$

Theorem 2.

Let the sequence $\{x^k\}$ be generated by Algorithm 1, and let $x^*$ be an accumulation point of $\{x^k\}$. Suppose that $f$ is 2s-RC. Then the following results hold:

(1) If $\|x^*\|_0 < s$, then $x^*$ is a global minimizer of (1).

(2) If $\|x^*\|_0 = s$, then $x^*$ is a local minimizer of (1).

Proof.

For any $x \in S\cap R_+^n$, we have $|I_1(x - x^*)| \le |I_1(x)| + |I_1(x^*)| \le 2s$. Since $f$ is 2s-RC, by Definition 3, we have

$$f(x) \ge f(x^*) + \langle\nabla f(x^*), x - x^*\rangle = f(x^*) + \sum_{i\in I_1(x^*)}\nabla_i f(x^*)\left(x_i - x_i^*\right) + \sum_{i\in I_0(x^*)}\nabla_i f(x^*)\left(x_i - x_i^*\right). \tag{58}$$

Because $x^*$ is an accumulation point of the sequence $\{x^k\}$, by Theorem 1, $x^*$ is α-stationary. By Lemma 1, since $\|x^*\|_0 < s$, we have $\nabla_i f(x^*) = 0$ for $i \in I_1(x^*)$ and $\nabla_i f(x^*) \ge 0$ for $i \in I_0(x^*)$, while $x_i \ge 0 = x_i^*$ for $i \in I_0(x^*)$. Hence,

$$f(x) \ge f(x^*). \tag{59}$$

Thus, $x^*$ is a global minimizer of (1).

If $\|x^*\|_0 = s$, we first show that $I_1(x^*) = I_1(x^k)$ for all sufficiently large $k$.

In fact, taking $0 < \delta < \min\{x_i^* : i \in I_1(x^*)\}$, for all sufficiently large $k$ we get

$$\|x^k - x^*\| \le \delta. \tag{60}$$

For any $i \in I_1(x^*)$, we have

$$x_i^k = x_i^* - \left(x_i^* - x_i^k\right) \ge x_i^* - \left|x_i^* - x_i^k\right| > x_i^* - \delta > 0. \tag{61}$$

Thus,

$$I_1(x^*) \subseteq I_1(x^k). \tag{62}$$

By $\|x^k\|_0 \le s$ and $|I_1(x^*)| = \|x^*\|_0 = s$, we have

$$I_1(x^*) = I_1(x^k). \tag{63}$$

By the same argument, any $x \in S\cap R_+^n$ satisfying $\|x - x^*\| \le \delta$ satisfies $I_1(x) = I_1(x^*)$ and $|I_1(x - x^*)| \le 2s$. Since $f$ is 2s-RC, by Definition 3, Theorem 1, and Lemma 1, we have

$$f(x) \ge f(x^*) + \sum_{i\in I_1(x^*)}\nabla_i f(x^*)\left(x_i - x_i^*\right) + \sum_{i\in I_0(x^*)}\nabla_i f(x^*)\left(x_i - x_i^*\right) = f(x^*). \tag{64}$$

Thus, $x^*$ is a local minimizer of (1).

Theorem 3.

Let the sequence $\{x^k\}$ be generated by Algorithm 1, and let $x^*$ be the limit of the sequence $\{x^k\}$. Suppose that $f$ is 2s-RSC with parameter $l_{2s}$ and $\|x^*\|_0 = s$. Then, for all sufficiently large $k$, we have

$$\|x^{k+1} - x^*\|^2 \le \rho^2\|x^k - x^*\|^2, \qquad 0 < \rho < 1, \tag{65}$$

where $\rho^2 = 1 - \frac{2l_{2s}^2\underline{\alpha}}{L_k} + l_{2s}^2\underline{\alpha}^2$ and $\underline{\alpha}$ is the uniform stepsize lower bound constructed in the proof.

Proof.

By Theorem 2, we get $x^k \to x^*$. As $f$ is 2s-RSC with parameter $l_{2s}$, for any $x, y \in R^n$ with $|I_1(x - y)| \le 2s$, we have

$$\left\|\left[\nabla f(x) - \nabla f(y)\right]_{I_1(x-y)}\right\| \ge l_{2s}\|x - y\|. \tag{66}$$

Set $\Gamma_k = I_1(x^k)$ and $\Gamma = I_1(x^*)$. By Theorem 2, we get $\Gamma_k = \Gamma$ for all sufficiently large $k$. Moreover, we have

$$\nabla_\Gamma f(x^*) = \lim_{k\to+\infty}\nabla_{\Gamma_k}f(x^k) = 0, \qquad x^{k+1}_{\Gamma_{k+1}} = \left(x^k - \alpha_k\nabla f(x^k)\right)_{\Gamma_{k+1}}. \tag{67}$$

For all sufficiently large $k$, we have

$$\begin{aligned}
\|x^{k+1} - x^*\|^2 &= \left\|x^k_\Gamma - \alpha_k\nabla_\Gamma f(x^k) - x^*_\Gamma + \alpha_k\nabla_\Gamma f(x^*)\right\|^2 \\
&= \|x^k - x^*\|^2 - 2\alpha_k\left\langle x^k - x^*,\ \left[\nabla f(x^k) - \nabla f(x^*)\right]_\Gamma\right\rangle + \alpha_k^2\left\|\left[\nabla f(x^k) - \nabla f(x^*)\right]_\Gamma\right\|^2.
\end{aligned} \tag{68}$$

Because $\|\nabla f(x^k) - \nabla f(x^*)\| \le L_k\|x^k - x^*\|$, together with (66) we get

$$\|x^{k+1} - x^*\|^2 \le \|x^k - x^*\|^2 - \left(\frac{2\alpha_k}{L_k} - \alpha_k^2\right)\left\|\left[\nabla f(x^k) - \nabla f(x^*)\right]_\Gamma\right\|^2 \le \left(1 - \frac{2l_{2s}^2\alpha_k}{L_k} + l_{2s}^2\alpha_k^2\right)\|x^k - x^*\|^2. \tag{69}$$

Since $0 < \alpha_k \le \frac{1}{3L_k}$ and $\sigma = (l - l_k)/2 = (l - 3L_k)/2$ with $l = 1/\alpha_k$, we have $0 < \alpha_k \le \frac{1}{2\sigma + 3L_k}$. Thus, there exists $0 < \beta < 1$ such that

$$\frac{\beta}{2\sigma + 3L_k} \le \inf_k\alpha_k \le \frac{1}{2\sigma + 3L_k}. \tag{70}$$

Setting $\underline{\alpha} = \beta/(2\sigma + 3L_k)$, we get

$$\underline{\alpha} \le \alpha_k \le \frac{1}{3L_k}. \tag{71}$$

Thus,

$$1 - \frac{2l_{2s}^2\alpha_k}{L_k} + l_{2s}^2\alpha_k^2 = 1 + l_{2s}^2\left(\alpha_k - \frac{1}{L_k}\right)^2 - \frac{l_{2s}^2}{L_k^2} \le 1 + l_{2s}^2\left(\underline{\alpha} - \frac{1}{L_k}\right)^2 - \frac{l_{2s}^2}{L_k^2} = 1 - \frac{2l_{2s}^2\underline{\alpha}}{L_k} + l_{2s}^2\underline{\alpha}^2 = \rho^2. \tag{72}$$

By $l_{2s} \le L_k$ and $\rho^2 = 1 + l_{2s}^2\left(\underline{\alpha} - 1/L_k\right)^2 - l_{2s}^2/L_k^2$, we get $\rho > 0$.

From $0 < \beta < 1$ and $\rho^2 = 1 - \frac{2l_{2s}^2\underline{\alpha}}{L_k} + l_{2s}^2\underline{\alpha}^2 = 1 - l_{2s}^2\underline{\alpha}\left(\frac{2}{L_k} - \underline{\alpha}\right)$, we have $\rho < 1$.

Thus,

$$\limsup_{k\to+\infty}\frac{\|x^{k+1} - x^*\|}{\|x^k - x^*\|} \le \rho, \tag{73}$$

where $0 < \rho < 1$. Thus, the sequence $\{x^k\}$ converges Q-linearly to $x^*$.

4. Conclusions

In this paper, we are mainly concerned with the nonnegative sparsity-constrained optimization problem. We introduce a new stepsize rule and propose a new gradient projection algorithm to solve this problem. The new algorithm removes the restricted strong smoothness condition on the objective function, which makes it more widely applicable. Meanwhile, we prove the convergence of the algorithm.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

All authors contributed equally to the writing of this paper. All authors read and approved the final manuscript.

Acknowledgments

This project was supported by the Natural Science Foundation of Shandong Province (no. ZR2018MA019).

References

[1] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, Springer, Berlin, Germany, 2010.
[2] S. Bahmani, B. Raj, and P. Boufounos, "Greedy sparsity-constrained optimization," The Journal of Machine Learning Research, vol. 14, pp. 807–841, 2013.
[3] A. Beck and Y. C. Eldar, "Sparsity constrained nonlinear optimization: optimality conditions and algorithms," SIAM Journal on Optimization, vol. 23, no. 3, pp. 1480–1509, 2013. https://doi.org/10.1137/120869778
[4] A. Beck and N. Hallak, "On the minimization over sparse symmetric sets: projections, optimality conditions, and algorithms," Mathematics of Operations Research, vol. 41, no. 1, pp. 196–223, 2016. https://doi.org/10.1287/moor.2015.0722
[5] Z. Lu, "Optimization over sparse symmetric sets via a non-monotone projected gradient method," 2015, https://arxiv.org/abs/1509.08581.
[6] L.-L. Pan, N.-H. Xiu, and S.-L. Zhou, "On solutions of sparsity constrained optimization," Journal of the Operations Research Society of China, vol. 3, no. 4, pp. 421–439, 2015. https://doi.org/10.1007/s40305-015-0101-3
[7] L. Pan, N. Xiu, and S. Zhou, "Gradient support projection algorithm for affine feasibility problem with sparsity and nonnegativity," 2014, https://arxiv.org/abs/1406.7178.
[8] L. Pan, S. Zhou, N. Xiu, and H.-D. Qi, "A convergent iterative hard thresholding for nonnegative sparsity optimization," Pacific Journal of Optimization, vol. 13, pp. 325–353, 2017.
[9] S. Negahban, P. Ravikumar, M. Wainwright, and B. Yu, "A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers," in Proceedings of the Advances in Neural Information Processing Systems (NIPS), Vancouver, British Columbia, Canada, December 2009.
[10] C. Wang and B. Qu, "Convergence of the gradient projection method with a new stepsize rule," OR Transactions, vol. 6, pp. 36–44, 2002.