Recently, Zhang et al. proposed a sufficient descent Polak-Ribière-Polyak (SDPRP) conjugate gradient method for large-scale unconstrained optimization problems and proved its global convergence, in the sense that lim inf_{k→∞} ∥∇f(xk)∥ = 0, when an Armijo-type line search is used. In this paper, motivated by the line searches proposed by Shi et al. and Zhang et al., we propose two new Armijo-type line searches and show that the SDPRP method has strong convergence, in the sense that lim_{k→∞} ∥∇f(xk)∥ = 0, under the two new line searches. Numerical results are reported to show the efficiency of the SDPRP method with the new Armijo-type line searches in practical computation.
1. Introduction
In this paper, we are concerned with the following unconstrained minimization problem:
(1)minf(x),x∈Rn,
where f:Rn→R is a smooth function whose gradient ∇f(x) is often denoted by g(x). Problem (1) is called a large-scale minimization problem when its dimension n is very large (e.g., n>10^6). For solving large-scale minimization problems, matrix-free methods are quite efficient. Among such methods, the conjugate gradient method is well known for its excellent numerical performance in practical computation. Much progress has been achieved in the study of the global convergence of various conjugate gradient methods, such as the Polak-Ribière-Polyak (PRP) [1, 2], the Fletcher-Reeves (FR) [3], the Hestenes-Stiefel (HS) [4, 5], and the Dai-Yuan (DY) [6] methods, among others.
Recently, Zhang et al. [7] presented a sufficient descent Polak-Ribière-Polyak (SDPRP) conjugate gradient method for solving the large-scale problem (1). Its most important property is that the generated direction is always a sufficient descent direction for the objective function, and this property is independent of the line search used. The method reduces to the classical PRP method when the exact line search is used. The iterative process of the SDPRP method is given by
(2)xk+1=xk+αkdk,k=0,1,…,
where xk is the current iterate, αk>0 is called the stepsize which can be obtained by some line search techniques, such as the Armijo line search, the Goldstein line search, and the (strong) Wolfe line search, and dk is the search direction determined by
(3) dk = -gk, if k = 0; dk = -gk + βkPRP dk-1 - θk yk-1, if k ≥ 1,
with
(4) βkPRP = gk⊤yk-1/∥gk-1∥2, θk = gk⊤dk-1/∥gk-1∥2,
where yk-1=gk-gk-1. It is easy to deduce from (3) and (4) that
(5)gk⊤dk=-∥gk∥2,
which indicates that dk is a sufficient descent direction of f(x) at the current iterate xk whenever ∥gk∥≠0, that is, whenever xk is not a stationary point of the objective function f(x). It has been proved that the SDPRP method has global convergence under an Armijo-type line search [7] in the sense that
(6) lim inf_{k→∞} ∥g(xk)∥ = 0,
which means that at least one cluster point of the sequence {xk} is a stationary point if it is bounded.
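The sufficient descent identity (5) is purely algebraic: substituting (4) into (3) makes the two extra terms cancel in the inner product with gk. The following Python sketch (the paper's experiments used Matlab; the function names here are our own, purely illustrative) checks this cancellation numerically for random vectors.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sdprp_direction(g, g_prev, d_prev):
    """Three-term SDPRP direction (3)-(4): d = -g + beta*d_prev - theta*y."""
    y = [a - b for a, b in zip(g, g_prev)]
    denom = dot(g_prev, g_prev)
    beta = dot(g, y) / denom        # beta_k^PRP in (4)
    theta = dot(g, d_prev) / denom  # theta_k in (4)
    return [-gi + beta * di - theta * yi for gi, di, yi in zip(g, d_prev, y)]

# The identity g_k^T d_k = -||g_k||^2 in (5) holds for arbitrary g, g_prev, d_prev:
random.seed(0)
for _ in range(100):
    g = [random.uniform(-1, 1) for _ in range(5)]
    g_prev = [random.uniform(-1, 1) for _ in range(5)]
    d_prev = [random.uniform(-1, 1) for _ in range(5)]
    d = sdprp_direction(g, g_prev, d_prev)
    assert abs(dot(g, d) + dot(g, g)) < 1e-10
```

The check passes for any inputs because beta*(g⊤d_prev) and theta*(g⊤y) are the same number, independently of the line search, which is exactly the property emphasized above.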
In another recent paper, Shi and Shen [8] showed that the classical PRP method in [1] has strong convergence and linear convergence rate under a customized Armijo-type line search, which is somewhat complicated. The new Armijo-type line search ensures that the search direction generated by the classical PRP method possesses the sufficient descent property, which is helpful to prove the global convergence.
In this paper, motivated by the Armijo-type line search in [8], we first propose a similar but simpler line search, which ensures that the SDPRP method has strong global convergence in the sense that
(7) lim_{k→∞} ∥g(xk)∥ = 0,
that is, any cluster point of the sequence {xk} is a stationary point of the objective function f(x). Noting that this new line search requires an estimate of the Lipschitz constant, which is not easy to obtain even when the gradient g(x) is linear, we present another Armijo-type line search, motivated by the line search in [7]. This second line search also guarantees the global convergence of the SDPRP method in the above sense.
The remainder of the paper is organized as follows. In Section 2 we introduce the two new Armijo-type line searches and present the strongly convergent SDPRP method. The global convergence is established under the above two new Armijo-type line searches in Section 3. Some numerical results are presented in Section 4, and in the last section, we conclude the paper with some remarks.
2. Strongly Convergent SDPRP Method
First, we give the following basic assumptions on the objective function f(x).
Assumption 1.
(H1) The objective function f(x) has a lower bound on the level set L0={x∈Rn∣f(x)≤f(x0)}, where x0 is the starting point.
(H2) The gradient g(x) is Lipschitz continuous on an open convex set B that contains L0; that is, there exists a constant L>0 such that
(8) ∥g(x)-g(y)∥ ≤ L∥x-y∥, for any x,y∈B.
(H3) The level set L0 is bounded.
Although g(x) is Lipschitz continuous, the Lipschitz constant L is usually unknown in practice, even when g(x) is linear. Therefore, we need to estimate it. Here, we adopt one of the three estimating approaches proposed in [9]. More specifically, for k≥1 we set
(9) L ≈ Lk = max{Lk-1, ∥yk-1∥/∥sk-1∥},
with L0>0 and sk-1 = xk-xk-1.
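The update (9) is monotone nondecreasing and, since ∥yk-1∥/∥sk-1∥ never exceeds the true Lipschitz constant under (H2), the estimates stay in a band [L0, L], as Lemma 3 below formalizes. A minimal Python sketch (the matrix, iterates, and helper names are our own illustrative choices):

```python
def norm(v):
    return sum(x * x for x in v) ** 0.5

def update_lipschitz_estimate(L_prev, s, y):
    """One step of estimate (9): L_k = max(L_{k-1}, ||y_{k-1}|| / ||s_{k-1}||)."""
    return max(L_prev, norm(y) / norm(s))

# For the linear gradient g(x) = A x with A = diag(1, 4), the true Lipschitz
# constant is 4 and ||y||/||s|| = ||A s||/||s|| never exceeds it, so the
# estimates stay in the band [L_0, 4].
A = [1.0, 4.0]
L = 0.5  # L_0 > 0
x_old = [1.0, 1.0]
for k in range(20):
    x_new = [0.9 * x_old[0], 0.7 * x_old[1]]   # some trial iterates
    s = [a - b for a, b in zip(x_new, x_old)]
    y = [ai * si for ai, si in zip(A, s)]      # y = g(x_new) - g(x_old) = A s
    L = update_lipschitz_estimate(L, s, y)
    x_old = x_new
assert 0.5 <= L <= 4.0
```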
Armijo-Type Line Search I. Set μ∈(0,1), ρ∈(0,1), c∈(0,1), and the initial stepsize δk=(1-c)∥gk∥2/(Lk∥dk∥2), where Lk is determined by (9). Let αk be the largest α in {δk,ρδk,ρ2δk,…} such that
(10)f(xk+αdk)-f(xk)≤-μα∥gk∥2.
Armijo-Type Line Search II. Set μ>0, ρ∈(0,1). Let αk be the largest α in {1,ρ,ρ2,…} such that
(11)f(xk+αdk)≤f(xk)-μα2∥dk∥4.
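Line search II is the simpler of the two, since it needs no Lipschitz estimate; its acceptance test (11) can be sketched as a short backtracking loop (a minimal Python sketch; the function name and the cap on backtracks are our own assumptions, not part of the method as stated):

```python
def armijo_type_II(f, x, d, mu=1e-4, rho=0.5, max_backtracks=60):
    """Armijo-type line search II (11): the largest alpha in {1, rho, rho^2, ...}
    with f(x + alpha*d) <= f(x) - mu * alpha^2 * ||d||^4."""
    d_norm4 = sum(di * di for di in d) ** 2
    fx = f(x)
    alpha = 1.0
    for _ in range(max_backtracks):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) <= fx - mu * alpha * alpha * d_norm4:
            return alpha
        alpha *= rho
    return alpha  # safeguard; Lemma 5 guarantees termination for a descent d

# Example: f(x) = x1^2 + x2^2 at x = (1, 1) with the steepest descent direction.
f = lambda x: x[0] ** 2 + x[1] ** 2
alpha = armijo_type_II(f, [1.0, 1.0], [-2.0, -2.0])
assert alpha == 0.5  # alpha = 1 overshoots past the minimizer and is rejected
```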
Now we begin to describe the strongly convergent SDPRP method.
Algorithm 2 (strongly convergent SDPRP method)
Step 0. Given an initial point x0∈Rn and parameters μ∈(0,1/2), ρ∈(0,1), and c∈(0,1), set d0=-g0 and k:=0.
Step 1. If ∥gk∥=0 then stop; otherwise go to Step 2.
Step 2. Compute the descent direction dk by (3) and (4). Determine the stepsize αk by the Armijo-type line search (10) or (11).
Step 3. Set xk+1=xk+αkdk, and k:=k+1; go to Step 1.
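Steps 0-3 can be assembled into a compact sketch. The following Python code (our own illustrative translation, not the authors' Matlab implementation) runs Algorithm 2 with line search I, including the initial stepsize δk and the Lipschitz estimate (9), on a small convex quadratic:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return dot(v, v) ** 0.5

def sdprp(f, grad, x0, mu=1e-4, rho=0.5, c=0.2, L0=1.0, tol=1e-6, max_iter=50000):
    """A sketch of Algorithm 2 with the Armijo-type line search I."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]            # Step 0: d0 = -g0
    L = L0                           # initial Lipschitz estimate L_0 > 0
    for _ in range(max_iter):
        gnorm2 = dot(g, g)
        if gnorm2 ** 0.5 < tol:      # Step 1
            break
        # Step 2, line search I: start from delta_k = (1-c)||g_k||^2/(L_k ||d_k||^2)
        # and backtrack until condition (10) holds.
        fx = f(x)
        alpha = (1.0 - c) * gnorm2 / (L * dot(d, d))
        for _ in range(60):
            trial = [xi + alpha * di for xi, di in zip(x, d)]
            if f(trial) <= fx - mu * alpha * gnorm2:
                break
            alpha *= rho
        g_new = grad(trial)
        y = [a - b for a, b in zip(g_new, g)]
        s = [a - b for a, b in zip(trial, x)]
        L = max(L, norm(y) / norm(s))       # Lipschitz estimate (9)
        beta = dot(g_new, y) / gnorm2       # beta_k^PRP in (4)
        theta = dot(g_new, d) / gnorm2      # theta_k in (4)
        d = [-gi + beta * di - theta * yi for gi, di, yi in zip(g_new, d, y)]
        x, g = trial, g_new                 # Step 3
    return x, norm(g)

# Convex quadratic f(x) = 0.5*(x1^2 + 4*x2^2) with minimizer (0, 0).
f = lambda x: 0.5 * (x[0] ** 2 + 4.0 * x[1] ** 2)
grad = lambda x: [x[0], 4.0 * x[1]]
x_star, gnorm = sdprp(f, grad, [1.0, 1.0])
assert gnorm < 1e-5
```

On this toy problem the iterates drive ∥gk∥ below the tolerance, consistent with the strong convergence established in Section 3; the iteration and backtrack caps are practical safeguards we added, not part of the algorithm's statement.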
Lemma 3.
Assume that (H1) and (H2) hold. Then there exist m0>0 and M0>0 such that, for any k≥0, one has
(12)m0≤Lk≤M0,
where Lk is defined by (9).
Proof.
See [9, Lemma 2.1].
Lemma 4.
Assume that (H1) and (H2) hold. If ∥gk∥>0, then the new Armijo-type line search I is well-defined for the index k.
Proof.
The proof is simple, but we give it here for completeness. We argue by contradiction. Suppose that the conclusion does not hold; then for this k, the inequality (10) fails for every nonnegative integer m; that is,
(13)f(xk)-f(xk+δkρmdk)<μδkρm∥gk∥2,∀m.
Thus,
(14) [f(xk)-f(xk+δkρmdk)]/(δkρm) < μ∥gk∥2, ∀m.
Letting m→+∞ and using the differentiability of f(x) together with -gk⊤dk=∥gk∥2 (see (5)), we obtain
(15) ∥gk∥2 ≤ μ∥gk∥2.
Since μ∈(0,1), this yields
(16) ∥gk∥ = 0,
which contradicts ∥gk∥>0. The proof is completed.
Lemma 5.
Assume that (H2) and (H3) hold. If ∥gk∥>0, then the new Armijo-type line search II is well-defined for the index k.
Proof.
The lemma is also proved by contradiction. Suppose that the conclusion does not hold; then for this k, the inequality (11) fails for every nonnegative integer m; that is,
(17)f(xk+ρmdk)>f(xk)-μρ2m∥dk∥4,∀m.
That is,
(18) [f(xk+ρmdk)-f(xk)]/ρm > -μρm∥dk∥4, ∀m.
Letting m→+∞ and using the differentiability of f(x), we obtain gk⊤dk ≥ 0, which together with -gk⊤dk=∥gk∥2 (see (5)) gives
(19) -∥gk∥2 ≥ 0;
that is,
(20) ∥gk∥ = 0,
which contradicts ∥gk∥>0. The proof is completed.
3. Strong Global Convergence
Throughout this section, we assume that ∥gk∥>0, for all k≥0; otherwise a stationary point of the objective function f(x) has been found.
3.1. Global Convergence of SDPRP Method with the Line Search I
We first prove the global convergence of the SDPRP method with the Armijo-type line search I.
Lemma 6.
For all k≥0, one has
(21) ∥dk∥ ≤ (1 + 2L(1-c)/m0)∥gk∥, ∀k,
where m0 is defined in Lemma 3.
Proof.
If k=0, then
(22) ∥dk∥ = ∥gk∥ ≤ (1 + 2L(1-c)/m0)∥gk∥.
If k≥1 then, from (3), (4), and (H2), we get
(23) ∥dk+gk∥ = ∥βkPRPdk-1 - θkyk-1∥ ≤ (2∥gk-gk-1∥∥dk-1∥/∥gk-1∥2)∥gk∥ ≤ (2Lαk-1∥dk-1∥2/∥gk-1∥2)∥gk∥ ≤ (2Lδk-1∥dk-1∥2/∥gk-1∥2)∥gk∥ ≤ (2L(1-c)/m0)∥gk∥,
which together with the triangle inequality implies that
(24) ∥dk∥ ≤ ∥dk+gk∥ + ∥gk∥ ≤ (1 + 2L(1-c)/m0)∥gk∥.
This completes the proof.
The following lemma shows that the stepsize sequence {αk} generated by the Armijo-type line search I is bounded from below.
Lemma 7.
For all k≥0, there exists a constant C>0, such that
(25)αk≥C,
in which αk is generated by the Armijo-type line search I.
Proof.
We divide the proof into two cases: αk=δk and αk<δk. In the first case, by (12) and (21), we get
(26) αk ≥ ((1-c)/M0)(1 + 2L(1-c)/m0)^(-2).
In the second case, αk<δk, which indicates that αk/ρ does not satisfy (10); that is,
(27) f(xk + (αk/ρ)dk) > f(xk) - μ(αk/ρ)∥gk∥2.
Using the mean value theorem in the above inequality, we obtain θk∈(0,1) such that
(28) [g(xk + θk(αk/ρ)dk) - gk]⊤dk > (1-μ)∥gk∥2.
This inequality, (H2), and (21) show that
(29) L(αk/ρ) ≥ ∥g(xk + θk(αk/ρ)dk) - gk∥/∥dk∥ = ∥g(xk + θk(αk/ρ)dk) - gk∥·∥dk∥/∥dk∥2 ≥ [g(xk + θk(αk/ρ)dk) - gk]⊤dk/∥dk∥2 > (1-μ)∥gk∥2/∥dk∥2 ≥ (1-μ)(1 + 2L(1-c)/m0)^(-2).
Therefore, we have that
(30) αk ≥ ((1-μ)ρ/L)(1 + 2L(1-c)/m0)^(-2).
Obviously, (26) and (30) show that (25) holds with
(31) C = min{(1-c)/M0, (1-μ)ρ/L}(1 + 2L(1-c)/m0)^(-2).
This completes the proof.
We are now ready to establish the strong convergence of SDPRP method using the Armijo-type line search I.
Theorem 8.
Suppose that (H1) and (H2) hold. Then
(32)limk→∞∥gk∥=0.
Proof.
Since the generated sequence {xk}⊆L0 and the objective function f(x) is bounded below on the level set L0, by (10) and (25), we have
(33) ∑k=0∞ Cμ∥gk∥2 ≤ ∑k=0∞ (fk - fk+1) = f0 - lim_{k→∞} fk < +∞.
Thus
(34)limk→∞∥gk∥=0.
This completes the proof.
3.2. Global Convergence of SDPRP Method with the Line Search II
We now prove the strong global convergence of the SDPRP method with the Armijo-type line search II. It is obvious that xk∈L0 for all k≥0. Since f(x) is continuous on the bounded level set L0 by (H3), it is bounded below there, so summing the line search inequality (11) over k gives ∑k=0∞ μαk2∥dk∥4 < +∞; hence αk2∥dk∥4 → 0 and
(35) lim_{k→∞} αk∥dk∥2 = 0.
This together with (5) and the Cauchy-Schwarz inequality (which give ∥gk∥ ≤ ∥dk∥) implies that
(36) lim_{k→∞} αk∥gk∥2 ≤ lim_{k→∞} αk∥dk∥2 = 0.
In addition, (H3) implies that there is a constant M>0 such that
(37)∥gk∥≤M,∀k≥0.
Lemma 9.
Suppose that (H2) and (H3) hold. Then for all k≥0, one has
(38) αk ≥ min{1, ρ∥gk∥2/((L + μ∥dk∥2)∥dk∥2)}.
Proof.
If αk≠1, then αk′=αk/ρ does not satisfy (11); that is
(39)f(xk+αk′dk)>f(xk)-μ(αk′)2∥dk∥4.
From the mean value theorem and (H2), there exists a constant θk∈(0,1), such that
(40) f(xk+αk′dk) - f(xk) = αk′g(xk+θkαk′dk)⊤dk = αk′gk⊤dk + αk′(g(xk+θkαk′dk)-gk)⊤dk ≤ -αk′∥gk∥2 + (αk′)2L∥dk∥2,
which together with (39) shows that (38) holds. This completes the proof.
We are now ready to establish the strong convergence of SDPRP method using the Armijo-type line search II. The proof is motivated by the proof of Theorem 2.2 in [10].
Theorem 10.
Suppose that (H2) and (H3) hold. Then
(41)limk→∞∥gk∥=0.
Proof.
For the sake of contradiction, we suppose that the conclusion does not hold. Then there exist a constant ϵ>0 and an infinite index set K such that
(42)∥gk-1∥≥ϵ,∀k∈K.
Moreover, the facts αk≤1, (35), and (H2) imply that
(43) ∥gk-gk-1∥2 ≤ L2αk-1^2∥dk-1∥2 ≤ L2αk-1∥dk-1∥2 ⟶ 0.
This and (42) indicate that there exists a positive constant ϵ1 such that for sufficiently large k∈K, we have
(44)∥gk∥≥ϵ1.
Then by (36) and (44), we can get
(45)limk→∞,k∈Kαk=0.
By (3), (4), and (H2), for all k∈K, we have
(46) ∥dk∥ ≤ ∥gk∥ + |βkPRP|∥dk-1∥ + |θk|∥yk-1∥ ≤ ∥gk∥ + (2Lαk-1∥dk-1∥2/∥gk-1∥2)∥gk∥ ≤ (1 + 2Lαk-1∥dk-1∥2/ϵ2)∥gk∥.
From (35), there exists a constant r>0 such that, for all sufficiently large k∈K,
(47)αk-1∥dk-1∥2≤r.
Therefore ∥dk∥ ≤ M2∥gk∥ with M2 = 1 + 2Lr/ϵ2. Thus, for sufficiently large k∈K, this and (37) imply that
(48) ∥dk∥ ≤ MM2.
Thus, from (38), (44), and (48), we get
(49) αk ≥ min{1, ρϵ1^2/((L + μ(MM2)^2)(MM2)^2)}, ∀k∈K,
which contradicts (45). The proof is then completed.
4. Numerical Results
In this section, we present some numerical results to compare the performance of SDPRP method with the two new Armijo-type line searches I and II and the three-term PRP method in [7].
SDPRPI: the SDPRP method with the line search (10), with μ=10^-4, ρ=0.5, c=0.2;
SDPRPII: the SDPRP method with the line search (11), with μ=10^-4, ρ=0.5.
TTPRP: the three-term PRP method in [7] with the following Armijo-type line search: let αk be the largest α in {1,ρ,ρ2,…} such that
(50)f(xk+αdk)≤f(xk)-μα2∥dk∥2,
where μ=10^-4, ρ=0.5.
All codes were written in Matlab 7.1 and run on a portable computer. We stopped the iteration when the number of iterations exceeded 10000 or ∥gk∥<10^-5; the symbol "F" in the tables indicates that the stopping criterion was not met within this limit. Tables 1 and 2 list the numerical results for solving some test problems numbered from 1 to 30 in [11] with different dimensions n. The numerical results are listed in the form NI/NF/CPU, where NI, NF, and CPU denote the number of iterations, the number of function evaluations, and the CPU time in seconds, respectively.
Table 1: The results for the methods on the tested problems (NI/NF/CPU).

P | n | SDPRPI | SDPRPII | TTPRP
FREUROTH | 50 | 45/265/0.0313 | 152/1778/0.1719 | 62/633/0.0625
Extended trigonometric | 1000 | 30/112/0.2188 | 218/2870/4.4375 | 26/122/0.2344
Extended trigonometric | 3000 | 72/116/0.8906 | F | 82/227/1.1875
SROSENBR | 500 | 850/2472/0.5625 | 1567/4536/0.9375 | 1151/3288/0.7344
Extended White and Holst | 1000 | 125/485/0.3906 | 170/2170/1.2031 | 71/760/0.4531
Extended White and Holst | 5000 | 133/561/1.9688 | 411/6741/18.1563 | 65/701/2.0625
BEALE | 1000 | 131/453/0.2813 | 49/282/0.1406 | 64/380/0.1563
BEALE | 5000 | 116/370/1.1875 | 41/233/0.6250 | 55/329/0.7813
Extended penalty | 1000 | 38/352/0.1094 | F | 26/250/0.0781
Extended penalty | 3000 | 29/328/0.2813 | F | 26/277/0.2500
Perturbed quadratic | 1000 | 340/2761/0.6875 | 350/3850/0.8750 | 283/3062/0.6719
Perturbed quadratic | 5000 | 699/6693/6.2656 | 1161/19049/16.5625 | 725/9442/8.2500
Raydan 1 | 500 | 168/611/0.2188 | 214/1177/0.3125 | 186/1017/0.2813
Raydan 2 | 1000 | 5/6/0.0313 | 4/6/0.0625 | 5/6/0.0625
Raydan 2 | 5000 | 5/6/0.2969 | 5/8/0.3281 | 5/6/0.2969
Raydan 2 | 10000 | 5/6/1.1250 | 6/13/1.1563 | 5/6/1.1406
Diagonal1 | 100 | 87/353/0.0625 | 88/451/0.0625 | 92/476/0.0781
Diagonal2 | 1000 | 6755/6756/10.9531 | 6753/6754/10.5625 | 6753/6754/10.4063
Diagonal3 | 100 | 84/435/0.0938 | 103/678/0.1250 | 103/678/0.1250
Hager | 500 | 55/221/0.1875 | 61/280/0.2188 | 48/218/0.1563
Generalized tridiagonal-1 | 1000 | 46/236/0.3125 | 41/231/0.3281 | 46/262/0.3438
Generalized tridiagonal-1 | 5000 | 43/213/1.7656 | 42/234/1.8438 | 49/277/2.1094
Extended tridiagonal-1 | 1000 | 58/104/0.1406 | 64/120/0.2031 | 58/105/0.1563
Extended tridiagonal-1 | 5000 | 60/110/0.9219 | 214/366/2.1406 | 68/121/0.9688
Extended three exponential | 1000 | 25/81/0.0938 | 39/160/0.1563 | 39/160/0.1719
Extended three exponential | 3000 | 29/95/0.3438 | 38/146/0.4844 | 38/146/0.4688
Generalized tridiagonal-2 | 1000 | 47/278/0.3438 | 55/441/0.6250 | 76/614/0.8438
Table 2: The results for the methods on the tested problems (NI/NF/CPU, continued).

P | n | SDPRPI | SDPRPII | TTPRP
Diagonal4 function | 5000 | 195/1606/8.7656 | 66/533/3.8125 | 188/1764/9.3438
Diagonal4 function | 1000 | 36/156/0.0625 | 57/401/0.0781 | 60/418/0.1094
Diagonal4 function | 5000 | 36/154/0.4219 | 74/596/0.7031 | 78/532/0.5781
Diagonal5 | 1000 | 3/4/0.0469 | 3/4/0.0156 | 3/4/0.0469
Diagonal5 | 5000 | 4/5/0.4688 | 4/5/0.4688 | 4/5/0.5000
HIMMELBC | 1000 | 48/295/0.0938 | 47/329/0.0781 | 48/313/0.0938
HIMMELBC | 5000 | 53/326/0.7344 | 88/716/0.9844 | 52/338/0.7344
Generalized PSC1 | 1000 | 292/540/0.5313 | 326/726/0.7188 | 422/756/0.7656
Generalized PSC1 | 5000 | 373/733/3.6094 | 352/1355/7.5469 | 373/733/3.6094
Extended PSC1 | 1000 | 28/80/0.0938 | 28/138/0.1094 | 18/59/0.0625
Extended PSC1 | 5000 | 27/78/0.6250 | 56/502/1.9063 | 18/59/0.5781
Extended Powell | 1000 | 213/1548/1.1094 | 415/5098/2.7188 | 207/1498/1.0781
Extended Powell | 5000 | 133/979/3.5781 | F | 145/1055/3.8281
Extended block diagonal | 1000 | 25/128/0.0625 | 37/213/0.1094 | 37/213/0.1094
Extended block diagonal | 5000 | 30/151/0.6406 | 41/216/0.7656 | 41/216/0.7813
Extended Maratos | 500 | 36/257/0.0469 | 48/1451/0.1563 | 48/466/0.0625
Extended Cliff | 1000 | 41/235/0.1406 | 633/3098/1.7031 | 71/363/0.2188
Extended Cliff | 5000 | 47/255/1.0625 | 963/3585/10.2188 | 68/357/1.2344
Quadratic diagonal perturbed | 1000 | 842/4828/1.2969 | 496/5415/1.1406 | 709/7359/1.5313
Quadratic diagonal perturbed | 5000 | 1049/7363/5.3906 | 1031/15570/9.2500 | 767/9203/5.5156
Extended Wood | 1000 | 237/1519/0.3594 | F | 299/2940/0.5313
Extended Wood | 5000 | 222/1483/1.7188 | F | 174/1738/1.8750
Extended Hiebert | 1000 | 582/2850/0.7188 | F | 59/763/0.1406
Extended Hiebert | 5000 | 609/3056/3.4063 | F | 73/932/1.0000
Quadratic function | 1000 | 343/2436/0.5313 | 360/3700/0.6719 | 321/3122/0.5938
Quadratic function | 5000 | 808/7176/6.5781 | F | 746/8948/7.1875
Figures 1 and 2 show the performance of these methods relative to the number of function evaluations and the CPU time, respectively, evaluated using the performance profiles of Dolan and Moré [12]. That is, for each method, we plot the fraction P of problems for which the method is within a factor τ of the best result. The left side of each figure gives the percentage of the test problems for which a method is fastest, while the right side gives the percentage of the test problems that are successfully solved by each method. The top curve corresponds to the method that solved the most problems within a factor τ of the best result. Figures 1 and 2 show that the SDPRPI method performs a little better than the TTPRP method and clearly better than the SDPRPII method. It solves about 72% and 63% of the problems with the smallest number of function evaluations and the smallest CPU time, respectively. The performance of the SDPRPII method is not as good, and in future work we will further study the corresponding line search. Of course, more numerical experiments should be carried out to test the proposed methods.
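The profile fraction described above can be computed directly from a cost matrix like Tables 1 and 2. A minimal Python sketch (the function name and the toy cost matrix are our own, purely illustrative; failures such as "F" are encoded as None):

```python
def performance_profile(costs, taus):
    """Dolan-More performance profile: for each solver, the fraction of problems
    solved within a factor tau of the best solver's cost on that problem.
    costs[p][s] is the cost of solver s on problem p, or None on failure."""
    n_problems = len(costs)
    n_solvers = len(costs[0])
    profiles = []
    for s in range(n_solvers):
        fractions = []
        for tau in taus:
            count = 0
            for row in costs:
                best = min(c for c in row if c is not None)
                if row[s] is not None and row[s] <= tau * best:
                    count += 1
            fractions.append(count / n_problems)
        profiles.append(fractions)
    return profiles

# Toy cost matrix: 3 problems, 2 solvers; None marks a failure.
costs = [[1.0, 2.0],
         [3.0, 1.5],
         [2.0, None]]
profiles = performance_profile(costs, taus=[1.0, 2.0, 4.0])
assert profiles[0] == [2/3, 1.0, 1.0]  # solver 0: fastest on 2 of 3 problems
assert profiles[1] == [1/3, 2/3, 2/3]  # solver 1: never solves problem 3
```

The value at τ=1 is the "fastest" percentage read off the left side of a profile plot, and the limiting value for large τ is the fraction of problems solved at all, as described above.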
Figure 1: Performance profiles of the three methods with respect to the number of function evaluations.
Figure 2: Performance profiles of the three methods with respect to CPU time.
5. Conclusion
In this paper, we have proposed two new Armijo-type line searches and proved that the sufficient descent PRP method proposed by Zhang et al. is strongly globally convergent with the two new line searches. Numerical results show that the SDPRP method with the proposed line searches is efficient for the test problems.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors would like to thank the referees for their many valuable suggestions and comments, which improved this paper greatly. This work is supported by the Natural Science Foundation of Shandong Province (ZR2012AL08).
References
[1] E. Polak and G. Ribière, "Note sur la convergence de méthodes de directions conjuguées," Revue Française d'Informatique et de Recherche Opérationnelle, vol. 16, pp. 35-43, 1969.
[2] B. T. Polyak, "The conjugate gradient method in extremal problems," USSR Computational Mathematics and Mathematical Physics, vol. 9, no. 4, pp. 94-112, 1969.
[3] R. Fletcher and C. Reeves, "Function minimization by conjugate gradients," The Computer Journal, vol. 7, pp. 149-154, 1964.
[4] M. R. Hestenes and E. L. Stiefel, "Methods of conjugate gradients for solving linear systems," Journal of Research of the National Bureau of Standards, vol. 49, pp. 409-432, 1952.
[5] L. Zhang, W. J. Zhou, and D. H. Li, "Some descent three-term conjugate gradient methods and their global convergence," Optimization Methods and Software, vol. 22, no. 4, pp. 697-711, 2007.
[6] Y. H. Dai and Y. Yuan, "A nonlinear conjugate gradient method with a strong global convergence property," SIAM Journal on Optimization, vol. 10, no. 1, pp. 177-182, 1999.
[7] L. Zhang, W. Zhou, and D. H. Li, "A descent modified Polak-Ribière-Polyak conjugate gradient method and its global convergence," IMA Journal of Numerical Analysis, vol. 26, no. 4, pp. 629-640, 2006.
[8] Z. J. Shi and J. Shen, "Convergence of the Polak-Ribière-Polyak conjugate gradient method," Nonlinear Analysis: Theory, Methods & Applications, vol. 66, no. 6, pp. 1428-1441, 2007.
[9] Z.-J. Shi and J. Shen, "Convergence of the Liu-Storey conjugate gradient method," European Journal of Operational Research, vol. 182, no. 2, pp. 552-560, 2007.
[10] W. J. Zhou and Y. H. Zhou, "On the strong convergence of a modified Hestenes-Stiefel method for nonconvex optimization," Journal of Industrial and Management Optimization, vol. 9, no. 4, pp. 893-899, 2013.
[11] N. Andrei, unconstrained optimization test functions, http://camo.ici.ro/neculai/UNO/UNO.FOR.
[12] E. D. Dolan and J. J. Moré, "Benchmarking optimization software with performance profiles," Mathematical Programming, vol. 91, no. 2, pp. 201-213, 2002.