We propose an extended multivariate spectral gradient algorithm to solve nonsmooth convex optimization problems. First, by using the Moreau-Yosida regularization, we convert the original objective function to a continuously differentiable function; then we use approximate function and gradient values of the Moreau-Yosida regularization to substitute for the corresponding exact values in the algorithm. Global convergence is proved under suitable assumptions. Numerical experiments are presented to show the effectiveness of this algorithm.
1. Introduction
Consider the unconstrained minimization problem
(1) $\min_{x\in\mathbb{R}^n} f(x)$,
where $f:\mathbb{R}^n\to\mathbb{R}$ is a nonsmooth convex function. The Moreau-Yosida regularization [1] of $f$ at $x\in\mathbb{R}^n$ is defined by
(2) $F(x)=\min_{z\in\mathbb{R}^n}\Bigl\{f(z)+\frac{1}{2\lambda}\|z-x\|^2\Bigr\}$,
where $\|\cdot\|$ is the Euclidean norm and $\lambda$ is a positive parameter. The function minimized on the right-hand side is strongly convex, so it has a unique minimizer for every $x\in\mathbb{R}^n$. Under some reasonable conditions, the gradient of $F$ can be shown to be semismooth [2, 3], although $F$ is generally not twice differentiable. It is well known that the problem
(3) $\min_{x\in\mathbb{R}^n} F(x)$
and the original problem (1) are equivalent in the sense that their solution sets coincide. The following proposition collects some properties of the Moreau-Yosida regularization $F$.
Proposition 1 (see Chapter XV, Theorem 4.1.4, [1]).
The Moreau-Yosida regularization $F$ is convex, finite-valued, and differentiable everywhere, with gradient
(4) $g(x)\equiv\nabla F(x)=\frac{1}{\lambda}\bigl(x-p(x)\bigr)$,
where
(5) $p(x)=\arg\min_{z\in\mathbb{R}^n}\Bigl\{f(z)+\frac{1}{2\lambda}\|z-x\|^2\Bigr\}$
is the unique minimizer in (2). Moreover, for all $x,y\in\mathbb{R}^n$, one has
(6) $\|g(x)-g(y)\|\le\frac{1}{\lambda}\|x-y\|$.
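For intuition, a minimal sketch is given below for the one-dimensional case $f(x)=|x|$, where the minimizer (5) has the closed form of the soft-thresholding operator, $p(x)=\operatorname{sign}(x)\max(|x|-\lambda,0)$, so that $F$ and $g$ in (2) and (4) can be evaluated exactly (the Huber function and its derivative). The sketch is illustrative only; the paper's experiments were carried out in MATLAB, while this example uses Python.

```python
import numpy as np

def moreau_yosida_abs(x, lam):
    """Moreau-Yosida regularization of f(x) = |x| (a sketch for intuition).

    The proximal point (5) is the soft-thresholding operator, so F in (2)
    and its gradient g in (4) are available in closed form (Huber function).
    """
    p = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)   # p(x), the unique minimizer in (2)
    F = np.abs(p) + (p - x) ** 2 / (2.0 * lam)           # F(x) = f(p) + ||p - x||^2 / (2 lam)
    g = (x - p) / lam                                    # g(x) = (x - p(x)) / lam
    return F, g

# Example: for |x| <= lam the minimizer is p = 0, so F(x) = x^2/(2 lam) and g(x) = x/lam.
print(moreau_yosida_abs(0.3, lam=1.0))   # (0.045, 0.3)
print(moreau_yosida_abs(2.0, lam=1.0))   # (1.5, 1.0)
```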
This proposition shows that the gradient $g:\mathbb{R}^n\to\mathbb{R}^n$ is Lipschitz continuous with modulus $1/\lambda$. In this case, $g$ is differentiable almost everywhere by Rademacher's theorem, and the B-subdifferential [4] of $g$ at $x\in\mathbb{R}^n$ is defined by
(7) $\partial_B g(x)=\bigl\{V\in\mathbb{R}^{n\times n}:V=\lim_{x_k\to x}\nabla g(x_k),\ x_k\in D_g\bigr\}$,
where $D_g=\{x\in\mathbb{R}^n : g \text{ is differentiable at } x\}$, and the following BD-regularity property holds [4–6].
Proposition 2.
If g is BD-regular at x, then
(i) all matrices $V\in\partial_B g(x)$ are nonsingular;
(ii) there exist a neighborhood $N$ of $x$ and constants $\kappa_1>0$ and $\kappa_2>0$ such that, for all $y\in N$,
(8) $d^T V d\ge\kappa_1\|d\|^2, \quad \|V^{-1}\|\le\kappa_2, \quad \forall d\in\mathbb{R}^n,\ V\in\partial_B g(y)$.
Instead of the corresponding exact values, we often use approximate values of the function $F(x)$ and the gradient $g(x)$ in practical computation, because $p(x)$ is difficult, and sometimes impossible, to compute exactly. Suppose that, for any $\varepsilon>0$ and each $x\in\mathbb{R}^n$, there exists an approximate vector $p^a(x,\varepsilon)\in\mathbb{R}^n$ of the unique minimizer $p(x)$ in (2) such that
(9) $f\bigl(p^a(x,\varepsilon)\bigr)+\frac{1}{2\lambda}\bigl\|p^a(x,\varepsilon)-x\bigr\|^2\le F(x)+\varepsilon$.
Implementable algorithms for finding such an approximate vector $p^a(x,\varepsilon)\in\mathbb{R}^n$ can be found, for example, in [7, 8]. An existence result for the approximate vector $p^a(x,\varepsilon)$ is stated as follows.
Proposition 3 (see Lemma 2.1 in [7]).
Let $\{x_k\}$ be generated according to the formula
(10) $x_{k+1}=x_k-\alpha_k\upsilon_k, \quad k=1,2,\dots$,
where $\alpha_k>0$ is a stepsize and $\upsilon_k$ is an approximate subgradient at $x_k$; that is,
(11) $\upsilon_k\in\partial_{\varepsilon_k}f(x_k)=\bigl\{\upsilon : f(z)\ge f(x_k)+\langle\upsilon,z-x_k\rangle-\varepsilon_k\ \ \forall z\in\mathbb{R}^n\bigr\}, \quad k=1,2,\dots$.
(i) If $\upsilon_k$ satisfies
(12) $\upsilon_k\in\partial f(x_{k+1}), \quad k=1,2,\dots$,
then (11) holds with
(13) $\varepsilon_k=f(x_k)-f(x_{k+1})-\alpha_k\|\upsilon_k\|^2\ge0$.
(ii) Conversely, if (11) holds with $\varepsilon_k$ given by (13), then (12) holds and $x_{k+1}=p^a(x_k,\varepsilon_k)$.
We use the approximate vector $p^a(x,\varepsilon)$ to define approximate function and gradient values of the Moreau-Yosida regularization, respectively, by
(14) $F^a(x,\varepsilon)=f\bigl(p^a(x,\varepsilon)\bigr)+\frac{1}{2\lambda}\bigl\|p^a(x,\varepsilon)-x\bigr\|^2$,
(15) $g^a(x,\varepsilon)=\dfrac{x-p^a(x,\varepsilon)}{\lambda}$.
The following proposition is crucial in the convergence analysis; its proof can be found in [2].
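Before stating it, the following sketch illustrates one simple (and not particularly efficient) way to produce a vector in the spirit of (9): run a few subgradient steps on the strongly convex subproblem in (2) and return the best iterate, from which $F^a$ and $g^a$ in (14) and (15) are formed. It is only a stand-in for the implementable schemes of [7, 8]; the function names and the fixed iteration budget are illustrative choices made here.

```python
import numpy as np

def approx_prox(f, subgrad_f, x, lam, n_iters=200):
    """Crude approximation of p(x) in (5) by subgradient steps on the
    strongly convex subproblem phi(z) = f(z) + ||z - x||^2 / (2 lam).
    A sketch only; the paper relies on the implementable schemes of [7, 8]."""
    z = x.copy()
    best_z, best_val = z.copy(), f(z)            # phi(x) = f(x) since ||x - x|| = 0
    for t in range(1, n_iters + 1):
        g_phi = subgrad_f(z) + (z - x) / lam     # a subgradient of phi at z
        z = z - (1.0 / t) * g_phi                # diminishing stepsize
        val = f(z) + np.dot(z - x, z - x) / (2.0 * lam)
        if val < best_val:
            best_z, best_val = z.copy(), val
    return best_z

def approx_F_and_g(f, subgrad_f, x, lam):
    """Approximate values (14) and (15) built from the approximate minimizer."""
    pa = approx_prox(f, subgrad_f, x, lam)
    Fa = f(pa) + np.dot(pa - x, pa - x) / (2.0 * lam)
    ga = (x - pa) / lam
    return Fa, ga

# Example on f(x) = ||x||_1, whose exact prox is soft-thresholding.
f = lambda x: np.sum(np.abs(x))
subgrad_f = lambda x: np.sign(x)
x = np.array([2.0, -0.3])
print(approx_F_and_g(f, subgrad_f, x, lam=1.0))
```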
Proposition 4.
Let $\varepsilon$ be an arbitrary positive number and let $p^a(x,\varepsilon)$ be a vector satisfying (9). Then one has
(16) $F(x)\le F^a(x,\varepsilon)\le F(x)+\varepsilon$,
(17) $\|g^a(x,\varepsilon)-g(x)\|\le\sqrt{\dfrac{2\varepsilon}{\lambda}}$,
(18) $\|p^a(x,\varepsilon)-p(x)\|\le\sqrt{2\lambda\varepsilon}$.
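As a quick sketch of why (17) and (18) hold (the full argument is in [2]): the function $\varphi_x(z)=f(z)+\frac{1}{2\lambda}\|z-x\|^2$ is $\frac{1}{\lambda}$-strongly convex with minimizer $p(x)$, so
$$\varphi_x\bigl(p^a(x,\varepsilon)\bigr)\ \ge\ \varphi_x\bigl(p(x)\bigr)+\frac{1}{2\lambda}\bigl\|p^a(x,\varepsilon)-p(x)\bigr\|^2\ =\ F(x)+\frac{1}{2\lambda}\bigl\|p^a(x,\varepsilon)-p(x)\bigr\|^2.$$
Combining this with (9), that is, $\varphi_x\bigl(p^a(x,\varepsilon)\bigr)\le F(x)+\varepsilon$, gives $\|p^a(x,\varepsilon)-p(x)\|\le\sqrt{2\lambda\varepsilon}$, which is (18); dividing by $\lambda$ and using (4) and (15) then gives (17).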
Algorithms which combine proximal techniques with the Moreau-Yosida regularization for solving the nonsmooth problem (1) have proved to be effective [7, 9, 10], and trust region algorithms for solving (1) have been proposed in, for example, [5, 11, 12]. Recently, Yuan et al. [13, 14] and Li [15] extended the spectral gradient method and conjugate gradient-type methods, respectively, to solve (1).
The multivariate spectral gradient (MSG) method was first proposed by Han et al. [16] for unconstrained optimization problems. The method has the nice property that it converges quadratically for objective functions with positive definite diagonal Hessian matrices [16]. Further studies of this method for nonlinear equations and bound constrained optimization can be found, for instance, in [17, 18]. By using nonmonotone techniques, several effective spectral gradient methods have been presented in [13, 16, 17, 19]. In this paper, we extend the multivariate spectral gradient method, combining it with a nonmonotone line search technique and the Moreau-Yosida regularization, to solve the nonsmooth problem (1), and we carry out numerical experiments to test its efficiency.
The rest of this paper is organized as follows. In Section 2, we propose the multivariate spectral gradient algorithm for solving (1). In Section 3, we prove the global convergence of the proposed algorithm. Numerical results are presented in Section 4, and conclusions are given in Section 5.
2. Algorithm
In this section, we present the multivariate spectral gradient algorithm for solving the nonsmooth convex unconstrained optimization problem (1). Our approach uses the Moreau-Yosida regularization to smooth the nonsmooth objective and then employs approximate values of the function $F$ and the gradient $g$ within the multivariate spectral gradient framework.
We first recall the multivariate spectral gradient method [16] for the smooth optimization problem
(19) $\min\{f(x)\mid x\in\mathbb{R}^n\}$,
where $f:\mathbb{R}^n\to\mathbb{R}$ is continuously differentiable with gradient denoted by $g$. Let $x_k$ be the current iterate; the multivariate spectral gradient iteration is
(20) $x_{k+1}=x_k-\operatorname{diag}\Bigl\{\frac{1}{\lambda_k^1},\frac{1}{\lambda_k^2},\dots,\frac{1}{\lambda_k^n}\Bigr\}\,g_k$,
where $g_k$ is the gradient of $f$ at $x_k$ and $\operatorname{diag}\{\lambda_k^1,\lambda_k^2,\dots,\lambda_k^n\}$ is obtained by minimizing
(21) $\bigl\|\operatorname{diag}\{\lambda^1,\lambda^2,\dots,\lambda^n\}\,s_{k-1}-u_{k-1}\bigr\|^2$
with respect to $\{\lambda^i\}_{i=1}^n$, where $s_{k-1}=x_k-x_{k-1}$ and $u_{k-1}=g_k-g_{k-1}$.
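The minimization in (21) separates across coordinates, giving $\lambda_k^i=u_{k-1}^i/s_{k-1}^i$ whenever $s_{k-1}^i\ne0$, so the diagonal scaling matrix is a coordinatewise Barzilai-Borwein-type approximation of the Hessian. A minimal Python sketch of one iteration of (20)-(21) for a smooth $f$ follows; the simple safeguards for vanishing denominators and nonpositive ratios are illustrative choices made here, not taken from [16].

```python
import numpy as np

def msg_step(x_k, x_prev, g_k, g_prev):
    """One multivariate spectral gradient step (20)-(21) for smooth f.
    lam_i = u_{k-1}^i / s_{k-1}^i solves the separable problem (21);
    the fallback to 1.0 for tiny or nonpositive ratios is an illustrative safeguard."""
    s = x_k - x_prev
    u = g_k - g_prev
    lam = np.ones_like(x_k)
    ok = np.abs(s) > 1e-12
    lam[ok] = u[ok] / s[ok]
    lam[lam <= 1e-10] = 1.0          # keep the diagonal scaling positive
    return x_k - g_k / lam           # x_{k+1} = x_k - diag(1/lam) g_k

# Example: f(x) = 0.5 * x^T D x with D = diag(2, 5); its gradient is D x.
D = np.array([2.0, 5.0])
grad = lambda x: D * x
x_prev = np.array([1.0, 1.0]); x_k = np.array([0.9, 0.7])
x_next = msg_step(x_k, x_prev, grad(x_k), grad(x_prev))
print(x_next)   # for a diagonal quadratic, lam recovers D exactly, so x_next = 0
```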
Denote the $i$th components of $s_k$ and $y_k$ by $s_k^i$ and $y_k^i$, respectively. We present the following multivariate spectral gradient (MSG) algorithm.
Algorithm 5.
Set $x_0\in\mathbb{R}^n$, $\sigma\in(0,1)$, $\beta>0$, $\lambda>0$, $\gamma\ge0$, $\delta>0$, $\rho\in[0,1]$, $\epsilon\in(0,1)$, $E_0=1$, and $\tau_0\in(0,1]$, where $\{\tau_k\}$ is a strictly decreasing sequence with $\lim_{k\to\infty}\tau_k=0$. Set $k:=0$.
Step 1.
Set $\varepsilon_0=\tau_0$. Calculate $F^a(x_0,\varepsilon_0)$ by (14) and $g^a(x_0,\varepsilon_0)$ by (15). Let $J_0=F^a(x_0,\varepsilon_0)$ and $d_0=-g^a(x_0,\varepsilon_0)$.
Step 2.
Stop if $g^a(x_k,\varepsilon_k)=0$. Otherwise, go to Step 3.
Step 3.
Choose $\varepsilon_{k+1}$ satisfying $0<\varepsilon_{k+1}\le\min\bigl\{\tau_k,\tau_k\|g^a(x_k,\varepsilon_k)\|^2\bigr\}$; find $\alpha_k$ which satisfies
(22) $F^a(x_k+\alpha_k d_k,\varepsilon_{k+1})-J_k\le\sigma\alpha_k g^a(x_k,\varepsilon_k)^T d_k$,
where $\alpha_k=\beta 2^{-i_k}$ and $i_k$ is the smallest nonnegative integer such that (22) holds.
Step 4.
Let $x_{k+1}=x_k+\alpha_k d_k$. Stop if $g^a(x_{k+1},\varepsilon_{k+1})=0$.
Step 5.
Update $J_{k+1}$ by the following formula:
(23) $E_{k+1}=\rho E_k+1, \qquad J_{k+1}=\dfrac{\rho E_k J_k+F^a(x_k+\alpha_k d_k,\varepsilon_{k+1})}{E_{k+1}}$.
Step 6.
Compute the search direction dk+1 by the following:
If $y_k^i/s_k^i>0$, then set $\lambda_{k+1}^i=y_k^i/s_k^i$; otherwise set $\lambda_{k+1}^i=s_k^T y_k/s_k^T s_k$, for $i=1,2,\dots,n$, where $y_k=g^a(x_{k+1},\varepsilon_{k+1})-g^a(x_k,\varepsilon_k)+\gamma s_k$ and $s_k=x_{k+1}-x_k$.
If $\lambda_{k+1}^i\le\epsilon$ or $\lambda_{k+1}^i\ge1/\epsilon$, then set $\lambda_{k+1}^i=\delta$, for $i=1,2,\dots,n$.
Let $d_{k+1}=-\operatorname{diag}\bigl\{1/\lambda_{k+1}^1,1/\lambda_{k+1}^2,\dots,1/\lambda_{k+1}^n\bigr\}\,g^a(x_{k+1},\varepsilon_{k+1})$.
Step 7.
Set $k:=k+1$; go back to Step 2.
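For concreteness, the following Python sketch implements Algorithm 5 under simplifying assumptions: it is specialized to $f(x)=\|x\|_1$, whose proximal subproblem (2) is solved exactly by soft thresholding, so the exact values $F$ and $g$ play the roles of $F^a$ and $g^a$ (i.e., $\varepsilon_k\equiv0$ and the choice of $\varepsilon_{k+1}$ in Step 3 is omitted), and $\delta$ is held fixed rather than chosen adaptively as in Section 4. The paper's experiments were implemented in MATLAB; this sketch is illustrative only, and all parameter values and helper names are choices made here.

```python
import numpy as np

def prox_l1(x, lam):
    """Exact minimizer p(x) of (2) for f = ||.||_1 (componentwise soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def F_and_g(x, lam):
    """Exact Moreau-Yosida value (2) and gradient (4) for f = ||.||_1."""
    p = prox_l1(x, lam)
    return np.sum(np.abs(p)) + np.dot(p - x, p - x) / (2.0 * lam), (x - p) / lam

def msg_nonsmooth(x0, lam=1.0, sigma=0.9, beta=1.0, gamma=0.01, rho=0.75,
                  eps=1e-10, delta=1.0, tol=1e-10, max_iter=500):
    """A sketch of Algorithm 5 specialized to f = ||.||_1 with exact prox values."""
    x = np.asarray(x0, dtype=float)
    F, g = F_and_g(x, lam)
    J, E = F, 1.0                              # Step 1: J_0 = F(x_0), E_0 = 1
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:           # Step 2: stationarity test
            break
        alpha = beta                           # Step 3: nonmonotone Armijo search (22)
        F_new, g_new = F_and_g(x + alpha * d, lam)
        while F_new - J > sigma * alpha * np.dot(g, d):
            alpha *= 0.5
            F_new, g_new = F_and_g(x + alpha * d, lam)
        x_new = x + alpha * d                  # Step 4
        E_new = rho * E + 1.0                  # Step 5: update reference value (23)
        J = (rho * E * J + F_new) / E_new
        E = E_new
        s = x_new - x                          # Step 6: spectral parameters and direction
        y = g_new - g + gamma * s
        ratio = np.divide(y, s, out=np.full_like(s, -1.0), where=(s != 0))
        lam_vec = np.where(ratio > 0, ratio, np.dot(s, y) / np.dot(s, s))
        lam_vec = np.where((lam_vec <= eps) | (lam_vec >= 1.0 / eps), delta, lam_vec)
        d = -g_new / lam_vec
        x, g = x_new, g_new                    # Step 7
    return x

print(msg_nonsmooth(np.array([3.0, -2.0, 0.5])))   # approaches the minimizer 0 of ||x||_1
```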
Remarks. (i) The choice of $\varepsilon_{k+1}$ in Step 3 of Algorithm 5 gives $\varepsilon_{k+1}=o(\|g^a(x_k,\varepsilon_k)\|^2)$; together with (15) and Proposition 3, this implies that
(24) $\varepsilon_{k+1}=o\bigl(\|x_k-p^a(x_k,\varepsilon_k)\|^2\bigr)=o\bigl(\|x_k-x_{k+1}\|^2\bigr)=o\bigl(\alpha_k^2\|d_k\|^2\bigr)$;
then, with the decreasing property of $\varepsilon_{k+1}$, the condition $\varepsilon_k=o(\alpha_k^2\|d_k\|^2)$ assumed in Lemma 7 holds.
(ii) From the nonmonotone line search technique (22), we can see that $J_{k+1}$ is a convex combination of the function value $F^a(x_{k+1},\varepsilon_{k+1})$ and $J_k$. Since $J_0=F^a(x_0,\varepsilon_0)$, it follows that $J_k$ is a convex combination of the function values $F^a(x_k,\varepsilon_k),\dots,F^a(x_1,\varepsilon_1),F^a(x_0,\varepsilon_0)$. The parameter $\rho\in[0,1]$ plays an important role in controlling the degree of nonmonotonicity of the line search: $\rho=0$ yields a strictly monotone scheme, while $\rho=1$ yields $J_k=C_k$, where
(25) $C_k=\frac{1}{k+1}\sum_{i=0}^{k}F^a(x_i,\varepsilon_i)$
is the average function value.
(iii) From Step 6, we can obtain that
(26) $\min\bigl\{\epsilon,\tfrac{1}{\delta}\bigr\}\|g^a(x_k,\varepsilon_k)\|\le\|d_k\|\le\max\bigl\{\tfrac{1}{\epsilon},\tfrac{1}{\delta}\bigr\}\|g^a(x_k,\varepsilon_k)\|$,
since every diagonal entry $1/\lambda_{k}^i$ lies in $[\epsilon,1/\epsilon]\cup\{1/\delta\}$. For the same reason there is a positive constant $\mu$ (one may take $\mu=\min\{\epsilon,1/\delta\}$) such that, for all $k$,
(27) $g^a(x_k,\varepsilon_k)^T d_k\le-\mu\|g^a(x_k,\varepsilon_k)\|^2$,
which shows that the proposed multivariate spectral gradient algorithm possesses the sufficient descent property.
3. Global Convergence
In this section, we provide a global convergence analysis for the multivariate spectral gradient algorithm. To begin with, we make the following assumptions which have been given in [5, 12–14].
Assumption A.
(i) F is bounded from below.
(ii) The sequence $\{V_k\}$, $V_k\in\partial_B g(x_k)$, is bounded; that is, there exists a constant $M>0$ such that, for all $k$,
(28) $\|V_k\|\le M$.
The following two lemmas play crucial roles in establishing the convergence theorem for the proposed algorithm. Using (26), (27), and Assumption A, and arguing as in Lemma 1.1 of [20], we obtain the next lemma, which shows that Algorithm 5 is well defined; since its proof follows that of Lemma 1.1 in [20], it is omitted.
Lemma 6.
Let $\{F^a(x_k,\varepsilon_k)\}$ be the sequence generated by Algorithm 5. Suppose that Assumption A holds and that $C_k$ is defined by (25). Then one has $F^a(x_k,\varepsilon_k)\le J_k\le C_k$ for all $k$. Moreover, there exists a stepsize $\alpha_k$ satisfying the nonmonotone line search condition (22).
Lemma 7.
Let $\{(x_k,\varepsilon_k)\}$ be the sequence generated by Algorithm 5. Suppose that Assumption A and $\varepsilon_k=o(\alpha_k^2\|d_k\|^2)$ hold. Then, for all $k$, one has
(29) $\alpha_k\ge m_0$,
where $m_0>0$ is a constant.
Proof (by contradiction).
Let $\alpha_k$ satisfy the nonmonotone Armijo-type line search (22), and assume on the contrary that $\liminf_{k\to\infty}\alpha_k=0$ holds; then there exists a subsequence $\{\alpha_k\}_{K'}$ such that $\alpha_k\to0$ as $k\to\infty$, $k\in K'$. For all sufficiently large $k\in K'$ we have $\alpha_k<\beta$, so $i_k\ge1$ and the previously tried stepsize $\alpha_k'=2\alpha_k$ fails the line search rule (22); that is,
(30) $F^a(x_k+\alpha_k' d_k,\varepsilon_{k+1})-J_k>\sigma\alpha_k' g^a(x_k,\varepsilon_k)^T d_k$.
Together with $F^a(x_k,\varepsilon_k)\le J_k\le C_k$ from Lemma 6, we have
(31) $F^a(x_k+\alpha_k' d_k,\varepsilon_{k+1})-F^a(x_k,\varepsilon_k)\ge F^a(x_k+\alpha_k' d_k,\varepsilon_{k+1})-J_k>\sigma\alpha_k' g^a(x_k,\varepsilon_k)^T d_k$.
By (28), (31), Proposition 4, and a Taylor-type expansion of $F$, there is a point $u_k$ on the segment between $x_k$ and $x_k+\alpha_k' d_k$ such that
(32) $\sigma\alpha_k' g^a(x_k,\varepsilon_k)^T d_k<F^a(x_k+\alpha_k' d_k,\varepsilon_{k+1})-F^a(x_k,\varepsilon_k)\le F(x_k+\alpha_k' d_k)-F(x_k)+\varepsilon_{k+1}=\alpha_k' d_k^T g(x_k)+\frac{1}{2}\alpha_k'^2 d_k^T V_{u_k} d_k+\varepsilon_{k+1}\le\alpha_k' d_k^T g(x_k)+\frac{M}{2}\alpha_k'^2\|d_k\|^2+\varepsilon_{k+1}$,
where $V_{u_k}\in\partial_B g(u_k)$. Rearranging (32) and dividing by $\frac{M}{2}\alpha_k'\|d_k\|^2$ yields
(33) $2\alpha_k=\alpha_k'>\frac{2}{M\|d_k\|^2}\Bigl[\bigl(g^a(x_k,\varepsilon_k)-g(x_k)\bigr)^T d_k-(1-\sigma)g^a(x_k,\varepsilon_k)^T d_k-\frac{\varepsilon_{k+1}}{\alpha_k'}\Bigr]\ge\frac{2}{M}\Bigl[\frac{\mu(1-\sigma)\|g^a(x_k,\varepsilon_k)\|^2}{\|d_k\|^2}-\sqrt{\frac{2\varepsilon_k}{\lambda}}\frac{1}{\|d_k\|}-\frac{\varepsilon_k}{\alpha_k'\|d_k\|^2}\Bigr]=\frac{2}{M}\Bigl[\frac{\mu(1-\sigma)}{\max\{1/\epsilon,1/\delta\}^2}-\frac{o(\alpha_k)}{\sqrt{\lambda}}-o(\alpha_k)\Bigr]$,
where the second inequality follows from (17), (27), and $\varepsilon_{k+1}\le\varepsilon_k$, and the last expression follows from (26) and $\varepsilon_k=o(\alpha_k^2\|d_k\|^2)$. Dividing both sides by $\alpha_k$ and letting $k\to\infty$, $k\in K'$ (so that the $o(\alpha_k)/\alpha_k$ terms tend to zero), we deduce that
(34) $2\ge\lim_{k\to\infty,\,k\in K'}\frac{2\mu(1-\sigma)}{M\max\{1/\epsilon,1/\delta\}^2}\cdot\frac{1}{\alpha_k}=+\infty$,
which is impossible, so the conclusion is obtained.
By using the above lemmas, we are now ready to prove the global convergence of Algorithm 5.
Theorem 8.
Let $\{x_k\}$ be generated by Algorithm 5 and suppose that the conditions of Lemma 7 hold. Then one has
(35) $\lim_{k\to\infty}g(x_k)=0$;
the sequence $\{x_k\}_{k=0}^{\infty}$ has accumulation points, and every accumulation point of $\{x_k\}_{k=0}^{\infty}$ is an optimal solution of problem (1).
Proof.
Suppose, on the contrary, that there exist $\epsilon_0>0$ and $k_0>0$ such that
(36) $\|g^a(x_k,\varepsilon_k)\|\ge\epsilon_0, \quad \forall k>k_0$.
From (22), (27), and (29), we get
(37) $F^a(x_k+\alpha_k d_k,\varepsilon_{k+1})-J_k\le\sigma\alpha_k g^a(x_k,\varepsilon_k)^T d_k\le-\sigma\alpha_k\min\bigl\{\epsilon,\tfrac{1}{\delta}\bigr\}\|g^a(x_k,\varepsilon_k)\|^2\le-\sigma m_0\epsilon_0^2\min\bigl\{\epsilon,\tfrac{1}{\delta}\bigr\}, \quad \forall k>k_0$.
Therefore, it follows from the definition of $J_{k+1}$ in (23) that
(38) $J_{k+1}=\dfrac{\rho E_k J_k+F^a(x_k+\alpha_k d_k,\varepsilon_{k+1})}{E_{k+1}}\le\dfrac{\rho E_k J_k+J_k-\sigma m_0\epsilon_0^2\min\{\epsilon,1/\delta\}}{E_{k+1}}\le J_k-\dfrac{\sigma m_0\epsilon_0^2\min\{\epsilon,1/\delta\}}{E_{k+1}}$.
By Assumption A, $F$ is bounded from below; further, by Proposition 4, $F(x_k)\le F^a(x_k,\varepsilon_k)$ for all $k$, so $F^a(x_k,\varepsilon_k)$ is bounded from below. Together with $F^a(x_k,\varepsilon_k)\le J_k$ for all $k$ from Lemma 6, this shows that $J_k$ is also bounded from below. Summing (38) over $k$, we obtain
(39) $\sum_{k=k_0}^{\infty}\dfrac{\sigma m_0\epsilon_0^2\min\{\epsilon,1/\delta\}}{E_{k+1}}<\infty$.
On the other hand, the definition of $E_{k+1}$ implies that $E_{k+1}\le k+2$, and it follows that
(40) $\sum_{k=k_0}^{\infty}\dfrac{\sigma m_0\epsilon_0^2\min\{\epsilon,1/\delta\}}{E_{k+1}}\ge\sum_{k=k_0}^{\infty}\dfrac{\sigma m_0\epsilon_0^2\min\{\epsilon,1/\delta\}}{k+2}=+\infty$.
This is a contradiction. Therefore, we have
(41) $\lim_{k\to\infty}\|g^a(x_k,\varepsilon_k)\|=0$.
From (17) in Proposition 4, together with $\varepsilon_k\to0$ as $k\to\infty$ (which follows from the choice of $\varepsilon_k$ in Step 3 and $\lim_{k\to\infty}\tau_k=0$ in Algorithm 5), we obtain
(42) $\lim_{k\to\infty}g(x_k)=0$.
Let $x^*$ be an accumulation point of the sequence $\{x_k\}_{k=0}^{\infty}$; then there is a convergent subsequence $\{x_{k_l}\}_{l=0}^{\infty}$ such that
(43) $\lim_{l\to\infty}x_{k_l}=x^*$.
From (4) we know that $g(x_k)=(x_k-p(x_k))/\lambda$. Since $p(\cdot)$ is continuous (indeed nonexpansive), (42) and (43) show that $x^*=p(x^*)$. Hence $x^*$ is an optimal solution of problem (1).
4. Numerical Results
This section presents numerical results obtained with the proposed multivariate spectral gradient algorithm on nonsmooth test problems taken from [21]. We also list the results of [14] (modified Polak-Ribière-Polyak gradient method, MPRP) and [22] (proximal bundle method, PBL) for comparison with Algorithm 5. All codes were written in MATLAB R2010a and run on a PC with a 2.8 GHz CPU, 2 GB of memory, and Windows 8. We set $\beta=\lambda=1$, $\sigma=0.9$, $\epsilon=10^{-10}$, and $\gamma=0.01$, and the parameter $\delta$ is chosen as
(44) $\delta=\begin{cases}1 & \text{if }\|g^a(x_k,\varepsilon_k)\|>1,\\ \|g^a(x_k,\varepsilon_k)\|^{-1} & \text{if }10^{-5}\le\|g^a(x_k,\varepsilon_k)\|\le1,\\ 10^{-5} & \text{if }\|g^a(x_k,\varepsilon_k)\|<10^{-5}.\end{cases}$
We adopt the termination condition $\|g^a(x_k,\varepsilon_k)\|\le10^{-10}$. The subproblem (5) is solved by the classical PRP conjugate gradient method (called the subalgorithm), which stops if $\|\partial f(x_k)\|\le10^{-4}$ or $f(x_{k+1})-f(x_k)+\|\partial f(x_{k+1})\|^2-\|\partial f(x_k)\|^2\le10^{-3}$ holds, where $\partial f(x_k)$ denotes a subgradient of $f(x)$ at the point $x_k$; the subalgorithm also stops if its iteration number exceeds fifteen. In its Armijo line search, the step length is accepted once the number of search trials exceeds five. Table 1 contains the problem names, the problem dimensions, and the optimal values.
Table 1: Test problems.

    Nr.   Problem        Dim.   f_opt(x)
    1     Rosenbrock     2       0
    2     Crescent       2       0
    3     CB2            2       1.9522245
    4     CB3            2       2.0
    5     DEM            2      −3
    6     QL             2       7.20
    7     LQ             2      −1.4142136
    8     Mifflin 1      2      −1.0
    9     Mifflin 2      2      −1.0
    10    Wolfe          2      −8.0
    11    Rosen-Suzuki   4      −44
    12    Shor           5       22.600162
The test results are summarized in Tables 2 and 3, where “Nr.” denotes the number of the tested problem (as in Table 1), “NF” denotes the number of function evaluations, “NI” denotes the number of iterations, and “f(x)” denotes the function value at the final iteration.
Table 2: Results on Rosenbrock with different ρ and ε.

    τ_k              ρ = 0 (NI/NF/f(x))      Time     ρ = 0.75 (NI/NF/f(x))   Time
    1/(2k^2)         30/46/1.581752e−9       1.794    29/30/7.778992e−9       1.076
    1/(3(k+2)^3)     28/38/5.207744e−9       1.420    26/27/6.541087e−9       1.023
    1/(4(k+2)^4)     29/37/1.502034e−9       1.388    27/28/5.112699e−9       1.030
    1/(5(k+2)^5)     27/37/1.903969e−9       1.451    27/28/6.329141e−9       1.092
    1/(6(k+2)^6)     27/36/4.859901e−9       1.376    27/28/6.073222e−9       1.025
Table 3: Numerical results for MSG/MPRP/PBL on problems 1–12.

    Nr.   MSG (NI/NF/f(x))       MPRP (NI/NF/f(x))      PBL (NI/NF/f(x))        f_opt(x)
    1     29/30/7.778992e−9      46/48/7.091824e−7      42/45/0.381e−6           0
    2     9/10/1.450669e−5       11/13/6.735123e−5      18/20/0.679e−6           0
    3     9/10/1.9522245         12/14/1.952225         32/34/1.9522245          1.9522245
    4     4/9/2.000009           2/6/2.000098           14/16/2.0000000          2.0
    5     3/4/−2.999949          4/6/−2.999866          17/19/−3.0000000        −3
    6     11/12/7.200000         10/12/7.200011         13/15/7.2000015          7.20
    7     3/4/−1.4142136         2/3/−1.414214          11/12/−1.4142136        −1.4142136
    8     9/10/−0.9999638        4/6/−0.9919815         66/68/−0.9999994        −1.0
    9     12/13/−0.9999978       20/23/−0.9999925       13/15/−1.0000000        −1.0
    10    5/6/−7.999999          —                      43/46/−8.0000000        −8.0
    11    6/7/−43.99797          28/58/−43.99986        43/45/−43.999999        −44
    12    12/13/2.260017         33/91/22.60023         27/29/22.600162          22.600162
The value of ρ controls the nonmonotonicity of the line search, which may affect the performance of the MSG algorithm. Table 2 shows the results for different values of the parameter ρ and for values of the parameter $\tau_k$ ranging from $1/(6(k+2)^6)$ to $1/(2k^2)$ on the Rosenbrock problem. We can conclude from the table that the proposed algorithm works reasonably well for all the test cases. The table also illustrates that the value of ρ can influence the performance of the algorithm significantly when ε is within a certain range, and that the choice ρ = 0.75 is better than ρ = 0.
We then compare the performance of MSG with that of the algorithms MPRP and PBL. In this test, we fix $\tau_k=1/(2k^2)$ and $\rho=0.75$. To illustrate the performance of each algorithm more specifically, Table 3 reports the number of iterations, the number of function evaluations, and the final objective function value.
The numerical results indicate that Algorithm 5 can successfully solve the test problems. From the iteration counts in Table 3, we see that Algorithm 5 usually requires the fewest iterations among the three methods, and the final function values it obtains are generally closer to the optimal values than those obtained by MPRP and PBL. In summary, the numerical experiments show that the proposed algorithm provides an efficient approach for solving nonsmooth problems.
5. Conclusions
We extend the multivariate spectral gradient algorithm to solve nonsmooth convex optimization problems. The proposed algorithm combines a nonmonotone line search technique and the idea of Moreau-Yosida regularization. The algorithm satisfies the sufficient descent property and its global convergence can be established. Numerical results show the efficiency of the proposed algorithm.
Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The author would like to thank the anonymous referees for their valuable comments and suggestions, which helped to improve the paper greatly. The author also thanks Professor Gong-lin Yuan for kindly providing the source BB codes for nonsmooth problems. This work is supported by the National Natural Science Foundation of China (Grant no. 11161003).
References
[1] J. B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms, Springer, Berlin, Germany, 1993.
[2] M. Fukushima and L. Qi, "A globally and superlinearly convergent algorithm for nonsmooth convex minimization," SIAM Journal on Optimization, vol. 6, no. 4, pp. 1106–1120, 1996. doi: 10.1137/s1052623494278839
[3] L. Q. Qi and J. Sun, "A nonsmooth version of Newton's method," Mathematical Programming, vol. 58, no. 3, pp. 353–367, 1993. doi: 10.1007/bf01581275
[4] F. H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, NY, USA, 1983.
[5] S. Lu, Z. Wei, and L. Li, "A trust region algorithm with adaptive cubic regularization methods for nonsmooth convex minimization," Computational Optimization and Applications, vol. 51, no. 2, pp. 551–573, 2012. doi: 10.1007/s10589-010-9363-1
[6] L. Q. Qi, "Convergence analysis of some algorithms for solving nonsmooth equations," Mathematics of Operations Research, vol. 18, no. 1, pp. 227–244, 1993. doi: 10.1287/moor.18.1.227
[7] R. Correa and C. Lemaréchal, "Convergence of some algorithms for convex minimization," Mathematical Programming, vol. 62, no. 1–3, pp. 261–275, 1993. doi: 10.1007/bf01585170
[8] M. Fukushima, "A descent algorithm for nonsmooth convex optimization," Mathematical Programming, vol. 30, no. 2, pp. 163–175, 1984. doi: 10.1007/bf02591883
[9] J. R. Birge, L. Qi, and Z. Wei, "Convergence analysis of some methods for minimizing a nonsmooth convex function," Journal of Optimization Theory and Applications, vol. 97, no. 2, pp. 357–383, 1998. doi: 10.1023/a:1022630801549
[10] Z. Wei, L. Qi, and J. R. Birge, "A new method for nonsmooth convex optimization," Journal of Inequalities and Applications, vol. 2, no. 2, pp. 157–179, 1998. doi: 10.1155/s1025583498000101
[11] N. Sagara and M. Fukushima, "A trust region method for nonsmooth convex optimization," Journal of Industrial and Management Optimization, vol. 1, no. 2, pp. 171–180, 2005. doi: 10.3934/jimo.2005.1.171
[12] G. Yuan, Z. Wei, and Z. Wang, "Gradient trust region algorithm with limited memory BFGS update for nonsmooth convex minimization," Computational Optimization and Applications, vol. 54, no. 1, pp. 45–64, 2013. doi: 10.1007/s10589-012-9485-8
[13] G. Yuan and Z. Wei, "The Barzilai and Borwein gradient method with nonmonotone line search for nonsmooth convex optimization problems," Mathematical Modelling and Analysis, vol. 17, no. 2, pp. 203–216, 2012. doi: 10.3846/13926292.2012.661375
[14] G. Yuan, Z. Wei, and G. Li, "A modified Polak-Ribière-Polyak conjugate gradient algorithm for nonsmooth convex programs," Journal of Computational and Applied Mathematics, vol. 255, pp. 86–96, 2014. doi: 10.1016/j.cam.2013.04.032
[15] Q. Li, "Conjugate gradient type methods for the nondifferentiable convex minimization," Optimization Letters, vol. 7, no. 3, pp. 533–545, 2013. doi: 10.1007/s11590-011-0437-5
[16] L. Han, G. Yu, and L. Guan, "Multivariate spectral gradient method for unconstrained optimization," Applied Mathematics and Computation, vol. 201, no. 1-2, pp. 621–630, 2008. doi: 10.1016/j.amc.2007.12.054
[17] G. Yu, S. Niu, and J. Ma, "Multivariate spectral gradient projection method for nonlinear monotone equations with convex constraints," Journal of Industrial and Management Optimization, vol. 9, no. 1, pp. 117–129, 2013. doi: 10.3934/jimo.2013.9.117
[18] Z. Yu, J. Sun, and Y. Qin, "A multivariate spectral projected gradient method for bound constrained optimization," Journal of Computational and Applied Mathematics, vol. 235, no. 8, pp. 2263–2269, 2011. doi: 10.1016/j.cam.2010.10.023
[19] Y. Xiao and Q. Hu, "Subspace Barzilai-Borwein gradient method for large-scale bound constrained optimization," Applied Mathematics and Optimization, vol. 58, no. 2, pp. 275–290, 2008. doi: 10.1007/s00245-008-9038-9
[20] H. Zhang and W. W. Hager, "A nonmonotone line search technique and its application to unconstrained optimization," SIAM Journal on Optimization, vol. 14, no. 4, pp. 1043–1056, 2004. doi: 10.1137/s1052623403428208
[21] L. Lukšan and J. Vlček, Test Problems for Nonsmooth Unconstrained and Linearly Constrained Optimization, Technical Report 798, Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague, Czech Republic, 2000.
[22] L. Lukšan and J. Vlček, "A bundle-Newton method for nonsmooth unconstrained minimization," Mathematical Programming, vol. 83, no. 3, pp. 373–391, 1998. doi: 10.1016/s0025-5610(97)00108-1