A mixed spectral CD-DY conjugate descent method for solving unconstrained optimization problems is proposed, which combines the advantages of the spectral conjugate gradient method, the CD method, and the DY method. Under the Wolfe line search, the proposed method generates a descent direction in each iteration, and its global convergence can also be guaranteed. Numerical results show that the new method is efficient and stable compared with the CD (Fletcher 1987) method, the DY (Dai and Yuan 1999) method, and the SFR (Du and Chen 2008) method, so it can be widely used in scientific computation.
1. Introduction
The purpose of this paper is to study the global convergence properties and practical computational performance of a mixed spectral CD-DY conjugate gradient method for unconstrained optimization without restarts and under appropriate conditions.
Consider the following unconstrained optimization problem:
$$\min_{x \in \mathbb{R}^n} f(x), \tag{1.1}$$
where $f:\mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and its gradient $g(x) = \nabla f(x)$ is available. Generally, iterative methods are used to solve (1.1), with the iterative formula given by
$$x_{k+1} = x_k + \alpha_k d_k, \tag{1.2}$$
where $x_k$ is the current iterate, $\alpha_k$ is a positive scalar called the step-size, which is determined by some line search, and $d_k$ is the search direction defined by
$$d_k = \begin{cases} -g_k, & \text{for } k = 1, \\ -g_k + \beta_k d_{k-1}, & \text{for } k \geq 2, \end{cases} \tag{1.3}$$
where $g_k = \nabla f(x_k)$ and $\beta_k$ is a scalar whose choice determines the particular conjugate gradient method [1, 2].
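To make the framework concrete, the following is a minimal Python sketch of the generic iteration (1.2)-(1.3). The `beta_rule` and `line_search` callables are hypothetical placeholders for whichever $\beta_k$ formula and step-size routine are chosen; they are not part of the paper.

```python
import numpy as np

def cg_sketch(f_grad, x, beta_rule, line_search, tol=1e-6, max_iter=9999):
    """Minimal sketch of the generic conjugate gradient iteration (1.2)-(1.3).

    f_grad returns g(x); beta_rule computes the scalar beta_k from
    (g_k, g_{k-1}, d_{k-1}); line_search returns a step-size alpha_k.
    """
    g = f_grad(x)
    d = -g                              # d_1 = -g_1
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:    # stop when the gradient is small
            break
        alpha = line_search(x, d)       # step-size from some line search
        x = x + alpha * d               # iterate update (1.2)
        g_new = f_grad(x)
        beta = beta_rule(g_new, g, d)   # scalar that fixes the CG variant
        d = -g_new + beta * d           # direction update (1.3)
        g = g_new
    return x
```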
Common iterative methods include the steepest descent method, Newton's method, and the conjugate gradient method. The conjugate gradient method is a commonly used and effective method in optimization, and it needs only first-derivative information. It thus overcomes the slow convergence of the steepest descent method and avoids the storage and second-derivative computations required by Newton's method.
The original CD method was proposed by Fletcher [3], in which βk is defined by
$$\beta_k^{CD} = -\frac{\|g_k\|^2}{d_{k-1}^T g_{k-1}}, \tag{1.4}$$
where $\|\cdot\|$ denotes the Euclidean norm of vectors. An important property of the CD method is that it produces a descent direction in each iteration under the strong Wolfe line search:
$$f(x_k + \alpha_k d_k) \leq f(x_k) + \delta \alpha_k g_k^T d_k, \tag{1.5}$$
$$\left|g(x_k + \alpha_k d_k)^T d_k\right| \leq -\sigma g_k^T d_k, \tag{1.6}$$
where $0 < \delta < \sigma < 1$. Dai and Yuan [4] first proposed the DY method, in which $\beta_k$ is defined by
$$\beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^T (g_k - g_{k-1})}. \tag{1.7}$$
Dai and Yuan [4] also strictly proved that the DY method produces a descent direction in each iteration under the Wolfe line search, that is, (1.5) and
$$g(x_k + \alpha_k d_k)^T d_k \geq \sigma g_k^T d_k. \tag{1.8}$$
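For illustration, a small Python helper (a sketch, not from the original paper) that evaluates conditions (1.5), (1.6), and (1.8) at a trial step-size is given below; note that the strong Wolfe pair (1.5)-(1.6) implies the Wolfe pair (1.5) and (1.8).

```python
import numpy as np

def wolfe_flags(f, grad, x, d, alpha, delta=0.01, sigma=0.1):
    """Evaluate the line-search conditions (1.5), (1.6), and (1.8)
    at a trial step-size alpha along a descent direction d."""
    g0_d = grad(x).dot(d)                 # g_k^T d_k, negative for descent d
    g1_d = grad(x + alpha * d).dot(d)     # g(x_k + alpha_k d_k)^T d_k
    armijo = f(x + alpha * d) <= f(x) + delta * alpha * g0_d   # (1.5)
    strong_curvature = abs(g1_d) <= -sigma * g0_d              # (1.6)
    curvature = g1_d >= sigma * g0_d                           # (1.8)
    return armijo, curvature, strong_curvature
```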
Some good results about the CD method and DY method have also been reported in recent years [5–11].
Quite recently, Birgin and Martinez [12] proposed a spectral conjugate gradient method by combining the conjugate gradient method and the spectral gradient method. Unfortunately, the spectral conjugate gradient method [12] is not guaranteed to generate descent directions. So, based on the FR formula, Zhang et al. [13] modified the FR method so that the generated direction is always a descent direction. Based on the modified FR conjugate gradient method [13], Du and Chen [14] proposed a new spectral conjugate gradient method:
$$d_k = \begin{cases} -g_k, & \text{for } k = 1, \\ -\theta_k g_k + \beta_k^{FR} d_{k-1}, & \text{for } k \geq 2, \end{cases} \tag{1.9}$$
where
$$\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \qquad \theta_k = \frac{d_{k-1}^T (g_k - g_{k-1})}{\|g_{k-1}\|^2}. \tag{1.10}$$
They proved the global convergence of the modified spectral FR method (in this paper, we call it the SFR method) under mild conditions.
The observation of the above formulas motivates us to construct a new formula, which combines the advantages of the spectral gradient method, the CD method, and the DY method, as follows:
$$d_k = \begin{cases} -g_k, & \text{for } k = 1, \\ -\theta_k g_k + \beta_k d_{k-1}, & \text{for } k \geq 2, \end{cases} \tag{1.11}$$
where $\beta_k$, $\theta_k$, and $\varphi_k$ are specified by
$$\beta_k = \beta_k^{CD} + \min\left\{0, \varphi_k \cdot \beta_k^{CD}\right\}, \tag{1.12}$$
$$\theta_k = 1 - \frac{g_k^T d_{k-1}}{g_{k-1}^T d_{k-1}}, \tag{1.13}$$
$$\varphi_k = -\frac{g_k^T d_{k-1}}{d_{k-1}^T (g_k - g_{k-1})}. \tag{1.14}$$
Under some mild conditions, we establish the global convergence of the mixed spectral CD-DY conjugate gradient method with the Wolfe line search.
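For concreteness, a direct Python transcription of the direction update (1.11)-(1.14) might look as follows. This is only a sketch; it assumes the Wolfe conditions hold at each step so that, by Lemma 2.3 and (2.3) below, the denominators $g_{k-1}^T d_{k-1}$ and $d_{k-1}^T(g_k - g_{k-1})$ are nonzero.

```python
import numpy as np

def cd_dy_direction(g, g_prev, d_prev):
    """Mixed spectral CD-DY search direction (1.11), with beta_k, theta_k,
    and phi_k computed by (1.12)-(1.14); the arguments are g_k, g_{k-1},
    and d_{k-1}, respectively."""
    gd_prev = g_prev.dot(d_prev)               # g_{k-1}^T d_{k-1} (< 0 by Lemma 2.3)
    gd = g.dot(d_prev)                         # g_k^T d_{k-1}
    yd = d_prev.dot(g - g_prev)                # d_{k-1}^T (g_k - g_{k-1})
    beta_cd = -g.dot(g) / gd_prev              # beta_k^CD, see (1.4)
    phi = -gd / yd                             # (1.14)
    beta = beta_cd + min(0.0, phi * beta_cd)   # (1.12)
    theta = 1.0 - gd / gd_prev                 # (1.13)
    return -theta * g + beta * d_prev          # (1.11)
```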
This paper is organized as follows. In Section 2, we propose the corresponding algorithm and give some assumptions and lemmas, which are usually used in the proof of the global convergence properties of nonlinear conjugate gradient methods. In Section 3, global convergence analysis is provided with suitable conditions. Preliminary numerical results are presented in Section 4.
2. Algorithm and Lemmas
In order to establish the global convergence of the proposed method, we need the following assumption on the objective function, which has often been used in the literature to analyze the global convergence of nonlinear conjugate gradient methods with inexact line searches.
Assumption 2.1.
(i) The level set $\Omega = \{x \mid f(x) \leq f(x_1)\}$ is bounded, where $x_1$ is the starting point.
(ii) In some neighborhood $N$ of $\Omega$, the objective function is continuously differentiable and its gradient is Lipschitz continuous; that is, there exists a constant $L > 0$ such that
$$\|g(x) - g(y)\| \leq L \|x - y\|, \quad \forall x, y \in N. \tag{2.1}$$
Now we give the mixed spectral CD-DY conjugate gradient method as follows.
Algorithm 2.2.
Step 1.
Data: $x_1 \in \mathbb{R}^n$, $\varepsilon \geq 0$. Set $d_1 = -g_1$; if $\|g_1\| \leq \varepsilon$, then stop.
Step 2.
Compute $\alpha_k$ by the Wolfe line search (1.5) and (1.8).
Step 3.
Let $x_{k+1} = x_k + \alpha_k d_k$ and $g_{k+1} = g(x_{k+1})$; if $\|g_{k+1}\| \leq \varepsilon$, then stop.
Step 4.
Compute $\beta_{k+1}$ by (1.12), and generate $d_{k+1}$ by (1.11).
Step 5.
Set $k := k + 1$; go to Step 2.
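A compact Python sketch of Algorithm 2.2 is given below, using SciPy's `line_search`, which enforces the strong Wolfe conditions (these imply (1.5) and (1.8)); `c1` and `c2` play the roles of $\delta$ and $\sigma$. The parameter values and the Rosenbrock usage example are illustrative choices, not prescriptions from the paper.

```python
import numpy as np
from scipy.optimize import line_search  # strong Wolfe line search

def mixed_cd_dy(f, grad, x1, eps=1e-6, max_iter=9999, delta=0.01, sigma=0.1):
    """Sketch of Algorithm 2.2 (mixed spectral CD-DY conjugate gradient)."""
    x = np.asarray(x1, dtype=float)
    g = grad(x)
    d = -g                                             # Step 1
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:                   # stopping test
            break
        alpha = line_search(f, grad, x, d, gfk=g,
                            c1=delta, c2=sigma)[0]     # Step 2
        if alpha is None:                              # line search failed
            break
        x = x + alpha * d                              # Step 3
        g_new = grad(x)
        beta_cd = -g_new.dot(g_new) / d.dot(g)         # (1.4)
        phi = -g_new.dot(d) / d.dot(g_new - g)         # (1.14)
        beta = beta_cd + min(0.0, phi * beta_cd)       # Step 4: (1.12)
        theta = 1.0 - g_new.dot(d) / g.dot(d)          # (1.13)
        d = -theta * g_new + beta * d                  # (1.11)
        g = g_new                                      # Step 5: next k
    return x

# Illustrative usage on the Rosenbrock function (the ROSE test problem):
#   from scipy.optimize import rosen, rosen_der
#   x_star = mixed_cd_dy(rosen, rosen_der, np.array([-1.2, 1.0]))
```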
The following lemma shows that Algorithm 2.2 produces a descent direction in each iteration with the Wolfe line search.
Lemma 2.3.
Let the sequences $\{g_k\}$ and $\{d_k\}$ be generated by Algorithm 2.2, and let the step-size $\alpha_k$ be determined by the Wolfe line search (1.5) and (1.8). Then
$$g_k^T d_k < 0. \tag{2.2}$$
Proof.
The conclusion can be proved by induction. Since $g_1^T d_1 = -\|g_1\|^2 < 0$, the conclusion holds for $k = 1$. Now we assume that the conclusion is true for $k - 1$, $k \geq 2$. Then from (1.8), we have
$$d_{k-1}^T (g_k - g_{k-1}) \geq (\sigma - 1) g_{k-1}^T d_{k-1} > 0. \tag{2.3}$$
Note that $\beta_k^{CD} > 0$, since $g_{k-1}^T d_{k-1} < 0$ by the induction hypothesis. If $g_k^T d_{k-1} \leq 0$, then from (1.14) and (2.3) we have $\varphi_k \geq 0$, so (1.12) gives $\beta_k = \beta_k^{CD}$.
From (1.4), (1.11), and (1.13), we have
$$g_k^T d_k = -\left(1 - \frac{g_k^T d_{k-1}}{g_{k-1}^T d_{k-1}}\right) \|g_k\|^2 - \frac{\|g_k\|^2}{g_{k-1}^T d_{k-1}} \, g_k^T d_{k-1} = -\|g_k\|^2 < 0. \tag{2.4}$$
If $g_k^T d_{k-1} > 0$, then from (1.14) and (2.3) we have $\varphi_k < 0$, so $\beta_k = \beta_k^{CD} + \varphi_k \cdot \beta_k^{CD} = \beta_k^{DY}$.
From (1.11), (1.7), and (1.13), we have
$$\begin{aligned} g_k^T d_k &= -\theta_k \|g_k\|^2 + \beta_k^{DY} g_k^T d_{k-1} = \beta_k^{DY} \left[-\theta_k d_{k-1}^T (g_k - g_{k-1}) + g_k^T d_{k-1}\right] \\ &= \beta_k^{DY} \left[g_{k-1}^T d_{k-1} + \frac{g_k^T d_{k-1}}{g_{k-1}^T d_{k-1}} \, d_{k-1}^T (g_k - g_{k-1})\right] \leq \beta_k^{DY} g_{k-1}^T d_{k-1} < 0, \end{aligned} \tag{2.5}$$
where the last two inequalities hold because $\beta_k^{DY} > 0$ by (2.3), and the second bracketed term is negative since $g_k^T d_{k-1} > 0$, $g_{k-1}^T d_{k-1} < 0$, and $d_{k-1}^T (g_k - g_{k-1}) > 0$.
From (2.4) and (2.5), we obtain that the conclusion holds for $k$.
The conclusion of the following lemma, often called the Zoutendijk condition, is used to prove the global convergence properties of nonlinear conjugate gradient methods. It was originally given by Zoutendijk [15].
Lemma 2.4 (see [15]).
Suppose that Assumption 2.1 holds. Let the sequences $\{g_k\}$ and $\{d_k\}$ be generated by Algorithm 2.2, and let the step-size $\alpha_k$ be determined by the Wolfe line search (1.5) and (1.8), so that Lemma 2.3 holds. Then
$$\sum_{k \geq 1} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < +\infty. \tag{2.6}$$
Lemma 2.5.
Let the sequences $\{g_k\}$ and $\{d_k\}$ be generated by Algorithm 2.2, and let the step-size $\alpha_k$ be determined by the Wolfe line search (1.5) and (1.8), so that Lemma 2.3 holds. Then
$$\beta_k \leq \frac{g_k^T d_k}{g_{k-1}^T d_{k-1}}. \tag{2.7}$$
Proof.
If $g_k^T d_{k-1} \leq 0$, then from the proof of Lemma 2.3, we have $\beta_k = \beta_k^{CD}$. From (1.11), (1.4), and (1.13), we have
$$g_k^T d_k = -\left(1 - \frac{g_k^T d_{k-1}}{g_{k-1}^T d_{k-1}}\right) \|g_k\|^2 + \beta_k^{CD} g_k^T d_{k-1} = \beta_k^{CD} \left(g_{k-1}^T d_{k-1} - g_k^T d_{k-1}\right) + \beta_k^{CD} g_k^T d_{k-1} = \beta_k^{CD} g_{k-1}^T d_{k-1}. \tag{2.8}$$
From the above equation, we have
$$\beta_k^{CD} = \frac{g_k^T d_k}{g_{k-1}^T d_{k-1}}. \tag{2.9}$$
If $g_k^T d_{k-1} > 0$, then dividing (2.5) by $g_{k-1}^T d_{k-1} < 0$ gives
$$\beta_k^{DY} \leq \frac{g_k^T d_k}{g_{k-1}^T d_{k-1}}. \tag{2.10}$$
From (1.12), (2.9), and (2.10), we obtain that the conclusion (2.7) holds.
3. Global Convergence Property
The following theorem proves the global convergence of the mixed spectral CD-DY conjugate gradient method with the Wolfe line search.
Theorem 3.1.
Suppose that Assumption 2.1 holds. Let the sequences $\{g_k\}$ and $\{d_k\}$ be generated by Algorithm 2.2, and let the step-size $\alpha_k$ be determined by the Wolfe line search (1.5) and (1.8), so that Lemma 2.3 holds. Then
$$\liminf_{k \to +\infty} \|g_k\| = 0. \tag{3.1}$$
Proof.
Suppose by contradiction that there exists a constant $\rho > 0$ such that
$$\|g_k\| \geq \rho \tag{3.2}$$
holds for all $k \geq 1$.
From (1.11), we have $d_k + \theta_k g_k = \beta_k d_{k-1}$; squaring both sides, we get
$$\|d_k\|^2 = \beta_k^2 \|d_{k-1}\|^2 - 2\theta_k g_k^T d_k - \theta_k^2 \|g_k\|^2. \tag{3.3}$$
From the above equation, Lemma 2.5, and the fact that $\beta_k > 0$ (see the proof of Lemma 2.3), we have
$$\|d_k\|^2 \leq \left(\frac{g_k^T d_k}{g_{k-1}^T d_{k-1}}\right)^2 \|d_{k-1}\|^2 - 2\theta_k g_k^T d_k - \theta_k^2 \|g_k\|^2. \tag{3.4}$$
Dividing the above inequality by $(g_k^T d_k)^2$, we have
$$\begin{aligned} \frac{\|d_k\|^2}{(g_k^T d_k)^2} &\leq \frac{\|d_{k-1}\|^2}{(g_{k-1}^T d_{k-1})^2} - \frac{2\theta_k}{g_k^T d_k} - \frac{\theta_k^2 \|g_k\|^2}{(g_k^T d_k)^2} \\ &= \frac{\|d_{k-1}\|^2}{(g_{k-1}^T d_{k-1})^2} - \left(\frac{\theta_k \|g_k\|}{g_k^T d_k} + \frac{1}{\|g_k\|}\right)^2 + \frac{1}{\|g_k\|^2} \\ &\leq \frac{\|d_{k-1}\|^2}{(g_{k-1}^T d_{k-1})^2} + \frac{1}{\|g_k\|^2}. \end{aligned} \tag{3.5}$$
Using (3.5) recursively and noting that $\|d_1\|^2 = -g_1^T d_1 = \|g_1\|^2$, we get
$$\frac{\|d_k\|^2}{(g_k^T d_k)^2} \leq \sum_{i=1}^k \frac{1}{\|g_i\|^2}. \tag{3.6}$$
Then from (3.2) and (3.6), we have
$$\frac{(g_k^T d_k)^2}{\|d_k\|^2} \geq \frac{\rho^2}{k}, \tag{3.7}$$
which indicates
$$\sum_{k \geq 1} \frac{(g_k^T d_k)^2}{\|d_k\|^2} \geq \sum_{k \geq 1} \frac{\rho^2}{k} = +\infty. \tag{3.8}$$
This contradicts the conclusion of Lemma 2.4. Therefore, the conclusion (3.1) holds.
4. Numerical Experiments
In this section, we report some numerical results. We used MATLAB 7.0 to test problems from [16] and compared the performance of the mixed spectral CD-DY method (Algorithm 2.2) with the CD method, the DY method, and the SFR method. The global convergence of the CD method has still not been proved under the Wolfe line search, so our line search subroutine computes $\alpha_k$ such that the strong Wolfe conditions hold with $\delta = 0.01$ and $\sigma = 0.1$. We use $\|g_k\| \leq 10^{-6}$ or It-max $> 9999$ as the stopping criterion, where It-max denotes the maximal number of iterations.
The numerical results of our tests are reported in Table 1. The first column, “Problem,” gives the name of the tested problem in [16], and “Dim” denotes the dimension of the tested problem. The detailed numerical results are listed in the form NI/NF/NG, where NI, NF, and NG denote the number of iterations, function evaluations, and gradient evaluations, respectively. If the limit of 9999 function evaluations was exceeded, the run was stopped; this is indicated by “—”.
In order to rank the average performance of all the above methods, one can compute the total number of function and gradient evaluations by the formula
$$N_{\text{total}} = NF + l \cdot NG, \tag{4.1}$$
where $l$ is some integer. According to the results on automatic differentiation [17, 18], the value of $l$ can be set to 5:
$$N_{\text{total}} = NF + 5 \cdot NG. \tag{4.2}$$
That is to say, one gradient evaluation is equivalent to five function evaluations if automatic differentiation is used.
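For instance, for the ROSE problem in Table 1, the CD method requires $N_{\text{total}} = 250 + 5 \times 223 = 1365$ equivalent function evaluations, while the mixed spectral CD-DY method requires $188 + 5 \times 168 = 1028$, a ratio of about $1365/1028 \approx 1.33$ in favor of the new method.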
By making use of (4.2), we compare the mixed spectral CD-DY method with the other methods as follows: for the $i$th problem, compute the total number of function and gradient evaluations required by the CD method, the DY method, the SFR method, and the mixed spectral CD-DY method by formula (4.2), and denote them by $N_{\text{total},i}(\text{CD})$, $N_{\text{total},i}(\text{DY})$, $N_{\text{total},i}(\text{SFR})$, and $N_{\text{total},i}(\text{CD-DY})$; then calculate the ratios
$$\gamma_i(\text{CD}) = \frac{N_{\text{total},i}(\text{CD})}{N_{\text{total},i}(\text{CD-DY})}, \quad \gamma_i(\text{DY}) = \frac{N_{\text{total},i}(\text{DY})}{N_{\text{total},i}(\text{CD-DY})}, \quad \gamma_i(\text{SFR}) = \frac{N_{\text{total},i}(\text{SFR})}{N_{\text{total},i}(\text{CD-DY})}. \tag{4.3}$$
From Table 1, we know that some problems cannot be solved by some of the methods. So, if the $i_0$th problem cannot be solved by a given method, we use the constant $\tau = \max\{\gamma_i(\text{the given method}) \mid i \in S_1\}$ in place of $\gamma_{i_0}(\text{the given method})$, where $S_1$ denotes the set of test problems that can be solved by the given method.
Table 1: The performance of the CD method, DY method, CD-DY method, and SFR method (NI/NF/NG).

Problem   Dim        CD                DY                 CD-DY             SFR
ROSE      2          88/250/223        64/190/172         60/188/168        64/190/172
FROTH     2          —                 42/168/138         38/151/123        —
BADSCP    2          682/1885/1637     —                  2704/6550/6438    —
BADSCB    2          272/941/800       —                  726/2051/1786     —
BEALE     2          73/177/155        75/186/164         68/175/145        75/186/164
JENSAM    2 (m=6)    17/61/43          10/49/26           26/80/57          15/48/32
HELIX     3          56/157/132        37/118/98          50/145/120        37/118/98
BRAD      3          75/224/189        66/208/177         37/120/98         66/208/177
SING      4          454/1074/1009     2286/4555/4545     850/1894/1863     1476/2901/2891
WOOD      4          184/438/399       100/291/240        139/396/337       100/291/240
KOWOSB    4          173/516/449       536/1449/1271      144/421/365       504/1386/1211
BD        4          43/169/132        39/158/121         28/144/113        37/152/116
WATSON    5          89/279/239        127/348/299        158/438/373       128/352/304
BIGGS     6          200/579/509       294/824/712        236/680/599       288/812/703
OSB2      11         3243/5413/5398    7006/11059/11048   584/1262/1226     7013/11102/11091
VAEDIM    5          6/57/38           6/57/38            6/57/38           6/57/38
          10         7/81/52           7/81/52            7/81/52           7/81/52
PEN1      50         2209/2565/2536    1727/2117/2043     116/221/190       1727/2117/2043
          100        62/223/182        31/157/121         31/167/131        31/157/121
TRIG      100        —                 305/399/398        88/145/144        305/399/398
          500        —                 343/424/423        109/189/188       344/427/425
ROSEX     500        92/267/238        68/207/186         65/205/182        68/207/186
          1000       98/287/255        68/207/186         65/205/182        68/207/186
SINGX     100        682/1593/1517     1488/3139/3073     1159/2366/2614    2411/5326/5084
          1000       511/1245/1135     2092/4451/4321     1374/3104/3064    5213/10042/10032
BV        500        1950/2543/2542    4796/6823/6822     1311/2131/2130    4784/6793/6792
          1000       632/833/832       414/449/448        414/449/448       414/449/448
IE        500        7/15/8            7/15/8             7/15/8            7/15/8
          1000       7/15/8            7/15/8             7/15/8            7/15/8
TRID      500        52/112/107        49/106/101         43/94/89          49/106/101
          1000       70/149/145        64/137/133         55/119/115        64/137/133
The geometric mean of these ratios for the CD method, the DY method, and the SFR method over all the test problems is defined by
$$\gamma(\text{CD}) = \left(\prod_{i \in S} \gamma_i(\text{CD})\right)^{1/|S|}, \quad \gamma(\text{DY}) = \left(\prod_{i \in S} \gamma_i(\text{DY})\right)^{1/|S|}, \quad \gamma(\text{SFR}) = \left(\prod_{i \in S} \gamma_i(\text{SFR})\right)^{1/|S|}, \tag{4.4}$$
where $S$ denotes the set of the test problems and $|S|$ denotes the number of elements in $S$. One advantage of the above rule is that the comparison is relative and hence is not dominated by a few problems for which a method requires a great number of function and gradient evaluations.
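As a small illustration (not from the original paper), the geometric mean (4.4) can be computed in log space, which avoids overflow or underflow when the product runs over many problems; the ratios in the usage comment are hypothetical.

```python
import numpy as np

def geometric_mean(ratios):
    """Geometric mean of the per-problem ratios gamma_i, as in (4.4)."""
    r = np.asarray(ratios, dtype=float)
    return float(np.exp(np.log(r).mean()))  # exp(mean(log r)) = (prod r)^(1/|S|)

# Hypothetical ratios for three problems:
#   geometric_mean([1.33, 0.95, 1.80])  # ~1.31
```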
According to the above rule, it is clear that $\gamma(\text{CD-DY}) = 1$. From Table 2, we can see that the average performance of the mixed spectral CD-DY conjugate gradient method (Algorithm 2.2) is the best of the four methods. So, the mixed spectral CD-DY conjugate gradient method has practical value.
Table 2: Relative efficiency of the CD, DY, SFR, and the mixed spectral CD-DY methods.

CD        DY        SFR       CD-DY
1.3956    1.6092    1.6580    1
Acknowledgments
The authors wish to express their heartfelt thanks to the referees and the editor for their detailed and helpful suggestions for revising the paper. This work was supported by the Natural Science Foundation of Chongqing Education Committee (KJ091104).
References
[1] G. H. Yu, Ph.D. thesis, Sun Yat-Sen University, Guangzhou, China, 2007.
[2] G. Yuan and Z. Wei, “New line search methods for unconstrained optimization,” Journal of the Korean Statistical Society, vol. 38, no. 1, pp. 29–39, 2009.
[3] R. Fletcher, Practical Methods of Optimization, 2nd edition, John Wiley & Sons, New York, NY, USA, 1987.
[4] Y. H. Dai and Y. Yuan, “A nonlinear conjugate gradient method with a strong global convergence property,” SIAM Journal on Optimization, vol. 10, no. 1, pp. 177–182, 1999.
[5] B. Qu, G. F. Hu, and X. C. Zhang, “A global convergence result for the conjugate descent method,” vol. 4, no. 2, pp. 113–116, 2002.
[6] X. W. Du, L. Q. Ye, and C. X. Xu, “Global convergence of a class of unconstrained optimal methods include the conjugate descent method,” vol. 18, no. 2, pp. 120–122, 2001.
[7] C. Y. Pan and L. P. Chen, “A class of efficient new descent methods,” vol. 30, no. 1, pp. 88–98, 2007.
[8] Y. Dai and Y. Yuan, “Convergence properties of the conjugate descent method,” vol. 25, no. 6, pp. 552–562, 1996.
[9] Y. H. Dai, “New properties of a nonlinear conjugate gradient method,” Numerische Mathematik, vol. 89, no. 1, pp. 83–98, 2001.
[10] Y. H. Dai and Y. Yuan, “A class of globally convergent conjugate gradient methods,” Tech. Rep. ICM-98-030, Institute of Computational Mathematics and Scientific/Engineering Computing, Chinese Academy of Sciences, Beijing, China, 1998.
[11] Y.-H. Dai, “Convergence of nonlinear conjugate gradient methods,” vol. 19, no. 5, pp. 539–548, 2001.
[12] E. G. Birgin and J. M. Martinez, “A spectral conjugate gradient method for unconstrained optimization,” Applied Mathematics and Computation, vol. 180, pp. 46–52, 2006.
[13] L. Zhang, W. Zhou, and D. Li, “Global convergence of a modified Fletcher-Reeves conjugate gradient method with Armijo-type line search,” Numerische Mathematik, vol. 104, no. 4, pp. 561–572, 2006.
[14] S.-Q. Du and Y.-Y. Chen, “Global convergence of a modified spectral FR conjugate gradient method,” Applied Mathematics and Computation, vol. 202, no. 2, pp. 766–770, 2008.
[15] G. Zoutendijk, “Nonlinear programming, computational methods,” in Integer and Nonlinear Programming, J. Abadie, Ed., pp. 37–86, North-Holland, 1970.
[16] J. J. Moré, B. S. Garbow, and K. E. Hillstrom, “Testing unconstrained optimization software,” ACM Transactions on Mathematical Software, vol. 7, no. 1, pp. 17–41, 1981.
[17] Y.-H. Dai and Q. Ni, “Testing different conjugate gradient methods for large-scale unconstrained optimization,” vol. 21, no. 3, pp. 311–320, 2003.
[18] A. Griewank, “On automatic differentiation,” in Mathematical Programming: Recent Developments and Applications, M. Iri and K. Tanabe, Eds., pp. 84–108, Kluwer Academic Publishers, 1989.