The conjugate gradient method is one of the most effective algorithms for solving unconstrained optimization problems. In this paper, a modified conjugate gradient method, obtained as a hybridization of the known LS and CD conjugate gradient algorithms, is presented and analyzed. Under some mild conditions, a Wolfe-type line search guarantees the global convergence of the LS-CD method. Numerical results show that the algorithm is efficient.
1. Introduction
Consider the following unconstrained optimization problem:
(1)minx∈Rnf(x),
where Rn denotes the n-dimensional Euclidean space and f(x) is a continuously differentiable function.
As is well known, the conjugate gradient method is a line search method of the following form:
(2)xk+1=xk+αkdk,k=0,1,2…,
where dk is a descent direction of f(x) at xk and αk is a step size obtained by some one-dimensional line search. If xk is the current iterate, we denote f(xk)≜fk, ∇f(xk)≜gk, and ∇2f(xk)≜Gk, respectively. If Gk is available and invertible, then dk=-Gk^{-1}gk leads to the Newton method, and dk=-gk results in the steepest descent method [1]. The search direction dk is generally required to satisfy gkTdk≤0, which guarantees that dk is a descent direction of f(x) at xk [2]. In order to guarantee global convergence, we sometimes require dk to satisfy the following sufficient descent condition:
(3)gkTdk≤-c∥gk∥2,
where c>0 is a constant and ∥·∥ is the Euclidean norm. Among line search methods, the well-known conjugate gradient method computes the search direction as follows:
(4) dk = -gk if k = 0;  dk = -gk + βkdk-1 if k ≥ 1.
Different conjugate gradient algorithms correspond to different choices for the parameter βk, where βk can be defined by
(5) βkFR = ∥gk∥2/∥gk-1∥2,   βkPRP = gkT(gk-gk-1)/∥gk-1∥2,
    βkDY = ∥gk∥2/(dk-1T(gk-gk-1)),   βkLS = -gkT(gk-gk-1)/(dk-1Tgk-1),
    βkCD = -∥gk∥2/(dk-1Tgk-1),   βkHS = gkT(gk-gk-1)/(dk-1T(gk-gk-1)),
or by other formulae. The corresponding methods are called the FR (Fletcher-Reeves) [3], PRP (Polak-Ribière-Polyak) [4, 5], DY (Dai-Yuan) [6], CD (conjugate descent) [7], LS (Liu-Storey) [8], and HS (Hestenes-Stiefel) [9] conjugate gradient methods, respectively.
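For concreteness, the choices in (5) can be computed directly from gk, gk-1, and dk-1. The following is a minimal NumPy sketch (the function name beta_values and its arguments are ours, introduced only for illustration):

    import numpy as np

    def beta_values(g_new, g_old, d_old):
        """Illustrative computation of the classical CG parameters in (5).
        g_new, g_old: gradients g_k and g_{k-1}; d_old: previous direction d_{k-1}."""
        y = g_new - g_old                      # gradient difference g_k - g_{k-1}
        fr = g_new.dot(g_new) / g_old.dot(g_old)
        prp = g_new.dot(y) / g_old.dot(g_old)
        dy = g_new.dot(g_new) / d_old.dot(y)
        ls = -g_new.dot(y) / d_old.dot(g_old)
        cd = -g_new.dot(g_new) / d_old.dot(g_old)
        hs = g_new.dot(y) / d_old.dot(y)
        return {"FR": fr, "PRP": prp, "DY": dy, "LS": ls, "CD": cd, "HS": hs}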
Although the above-mentioned conjugate gradient algorithms are equivalent for minimizing strongly convex quadratic functions under exact line search, they perform differently when applied to nonquadratic functions or when inexact line searches are used. For a general objective function, the FR, DY, and CD methods have strong convergence properties, but they may show modest practical performance due to jamming. On the other hand, the PRP, LS, and HS methods may not converge in general, but they often perform better computationally.
Touati-Ahmed and Storey [10] gave the first hybrid conjugate gradient algorithm, which combines different conjugate gradient algorithms and was proposed mainly to avoid the jamming phenomenon. Recently, several new hybrid conjugate gradient methods have been given in [11–17]. Motivated by these works, we focus on hybrid conjugate gradient methods and analyze the global convergence of the resulting method with a Wolfe-type line search.
The rest of this paper is organized as follows. The algorithm is presented in Section 2. In Section 3 the global convergence is analyzed. We give the numerical experiments in Section 4.
2. Description of the Algorithm
Algorithm 1.
Step 0. Initialization:
given a starting point x0∈Rn, choose parameters
(6) 0 < ε ≪ 1,  0 < δ < 1/2,  δ < σ < 1,  d0 = -g0.
Set k=0.
Step 1. If ∥gk∥<ε, stop; else go to Step 2.
Step 2. Compute step size αk, such that
(7) f(xk) - f(xk+αkdk) ≥ -δαkgkTdk,
    σgkTdk ≤ g(xk+αkdk)Tdk ≤ 0.
Step 3. Let xk+1=xk+αkdk; if ∥gk+1∥<ε, stop; otherwise, go to Step 4.
Step 4. Compute the search direction
(8)dk+1=-gk+1+βk+1LS-CDdk,
where βkLS-CD=max{0,min{βkLS,βkCD}}.
Step 5. Let k:=k+1, and go to Step 2.
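The following Python sketch illustrates Algorithm 1. It is not the implementation used in Section 4 (which was written in MATLAB); scipy.optimize.line_search, a standard strong Wolfe search, is used here as a stand-in for the Wolfe-type conditions (7), with δ and σ playing the roles of its c1 and c2 constants, and the function name ls_cd is ours:

    import numpy as np
    from scipy.optimize import line_search

    def ls_cd(f, grad, x0, eps=1e-6, max_iter=5000, delta=0.3, sigma=0.7):
        """Sketch of Algorithm 1 (hybrid LS-CD conjugate gradient method)."""
        x = np.asarray(x0, dtype=float)
        g = grad(x)
        d = -g                                    # Step 0: d_0 = -g_0
        for _ in range(max_iter):
            if np.linalg.norm(g) < eps:           # Steps 1 and 3: stopping test
                break
            alpha = line_search(f, grad, x, d, gfk=g, c1=delta, c2=sigma)[0]
            if alpha is None:                     # line search failed; fall back
                alpha = 1e-3
            x_new = x + alpha * d                 # Step 3: x_{k+1} = x_k + alpha_k d_k
            g_new = grad(x_new)
            denom = d.dot(g)                      # d_k^T g_k, negative by Lemma 2
            beta_ls = -g_new.dot(g_new - g) / denom
            beta_cd = -g_new.dot(g_new) / denom
            beta = max(0.0, min(beta_ls, beta_cd))    # beta_{k+1}^{LS-CD}
            d = -g_new + beta * d                 # Step 4: new search direction
            x, g = x_new, g_new
        return x, f(x)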
Throughout this paper, the following basic assumptions on the objective function are made; they have been widely used in the literature to analyze the global convergence of conjugate gradient methods.
(H2.1) The objective function f(x) is continuously differentiable and bounded below on the level set L0={x∈Rn∣f(x)≤f(x0)}, where x0 is the starting point.
(H2.2) The gradient g(x) of f(x) is Lipschitz continuous in some neighborhood U of L0; that is, there exists a constant L>0 such that
(9)∥g(x)-g(y)∥≤L∥x-y∥,∀x,y∈U.
Since {f(xk)} is decreasing, it is clear that the sequence {xk} generated by Algorithm 1 is contained in L0.
3. Global Convergence of Algorithm
Now we analyze the global convergence of Algorithm 1.
Lemma 2.
Suppose that assumptions (H2.1) and (H2.2) hold and that the sequences {gk} and {dk} are generated by Algorithm 1. If gk≠0 for all k≥0, then
(10)gkTdk<0.
Proof.
If k=0, then d0=-g0, and we get
(11)g0Td0=-∥g0∥2<0.
When k≥1, multiplying
(12)dk=-gk+βkLS-CDdk-1
by gkT, we obtain
(13)gkTdk=-∥gk∥2+βkLS-CDgkTdk-1.
It follows from βkLS-CD≥0 and gkTdk-1≤0 that
(14)gkTdk≤-∥gk∥2<0.
Therefore, the result is true.
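Inequality (14) is also easy to check numerically. The snippet below is only an informal sanity check, not part of the analysis: it samples random vectors, enforces gkTdk-1≤0 (which the line search (7) guarantees inside the algorithm), and verifies that the hybrid direction satisfies gkTdk ≤ -∥gk∥2:

    import numpy as np

    rng = np.random.default_rng(0)
    for _ in range(1000):
        g_old, g_new, d_old = rng.normal(size=(3, 5))
        if g_new.dot(d_old) > 0:     # the line search (7) gives g_k^T d_{k-1} <= 0,
            d_old = -d_old           # so flip the sample to respect that assumption
        denom = d_old.dot(g_old)
        beta_ls = -g_new.dot(g_new - g_old) / denom
        beta_cd = -g_new.dot(g_new) / denom
        beta = max(0.0, min(beta_ls, beta_cd))
        d_new = -g_new + beta * d_old
        # inequality (14): g_k^T d_k <= -||g_k||^2
        assert g_new.dot(d_new) <= -g_new.dot(g_new) + 1e-12
    print("sufficient descent condition (14) held in all trials")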
Lemma 3.
Suppose that assumptions (H2.1) and (H2.2) hold, and consider any iteration of the form (2), where dk is a descent direction and αk satisfies the Wolfe conditions (7). Then, the Zoutendijk condition
(15) ∑k=0∞ (gkTdk)2/∥dk∥2 < +∞
holds.
Proof.
From (7), we have
(16)(σ-1)gkTdk≤(gk+1-gk)Tdk.
In addition, the assumption (H2.2) gives
(17)Lαk∥dk∥2≥(gk+1-gk)Tdk.
Combining these two relations, we have
(18) αk ≥ (σ-1)gkTdk/(L∥dk∥2),
which with f(xk+αkdk)≤f(xk)+δαkgkTdk implies that
(19) fk - fk+1 ≥ δ(1-σ)(gkTdk)2/(L∥dk∥2).
Thus,
(20) ∑k=0∞ (fk - fk+1) ≥ ∑k=0∞ δ(1-σ)(gkTdk)2/(L∥dk∥2).
Noting that f is bounded below, (15) holds.
Furthermore, combining Lemma 2 with the Zoutendijk condition (15), we can easily obtain the following condition:
(21) ∑k=0∞ ∥gk∥4/∥dk∥2 < +∞.
Theorem 4.
Suppose that x0 is a starting point for which assumptions (H2.1) and (H2.2) hold. Consider Algorithm 1; then, one has either gk=0 for some finite k or
(22) lim infk→∞ ∥gk∥ = 0.
Proof.
The first statement is easy to show, since the algorithm stops only when the gradient vanishes (Steps 1 and 3). Thus, assume that the algorithm generates an infinite sequence {gk}. If the second statement is false, then there exists a constant ε>0 such that
(23)∥gk∥≥ε,∀k≥0.
From (8), we have
(24)dk=-gk+βkLS-CDdk-1.
Rewriting the above equation as dk + gk = βkLS-CDdk-1 and squaring both sides, we get
(25)∥dk∥2+2gkTdk+∥gk∥2=(βkLS-CD)2∥dk-1∥2,
that is,
(26)∥dk∥2=(βkLS-CD)2∥dk-1∥2-2gkTdk-∥gk∥2.
From the definition of βkLS-CD and the fact that βkCD≥0 (which follows from dk-1Tgk-1<0), we have
(27)0≤βkLS-CD≤βkCD.
Thus, we can get
(28)∥dk∥2≤(βkCD)2∥dk-1∥2-2gkTdk-∥gk∥2.
On the other hand, multiplying
(29)dk=-gk+βkLS-CDdk-1
by gkT, we obtain
(30)gkTdk=-∥gk∥2+βkLS-CDgkTdk-1.
Considering that βkLS-CD≥0 and gkTdk-1≤0, we have
(31)gkTdk≤-∥gk∥2<0,
which indicates that
(32)(gkTdk)2≥∥gk∥4.
Dividing the above inequality (28) by (gkTdk)2, we obtain
(33) ∥dk∥2/(gkTdk)2 ≤ (βkCD)2∥dk-1∥2/(gkTdk)2 - 2gkTdk/(gkTdk)2 - ∥gk∥2/(gkTdk)2
   = (-∥gk∥2/dk-1Tgk-1)2∥dk-1∥2/(gkTdk)2 - 2gkTdk/(gkTdk)2 - ∥gk∥2/(gkTdk)2
   ≤ (-∥gk∥2/dk-1Tgk-1)2∥dk-1∥2/∥gk∥4 - 2gkTdk/∥gk∥4 - ∥gk∥2/∥gk∥4
   ≤ ∥dk-1∥2/(dk-1Tgk-1)2 + 2/∥gk∥2 - 1/∥gk∥2
   = ∥dk-1∥2/(dk-1Tgk-1)2 + 1/∥gk∥2.
Using the above inequality recursively and noting that
(34)∥d0∥2=-g0Td0=∥g0∥2,
we have
(35) ∥dk∥2/(gkTdk)2 ≤ ∑l=0k 1/∥gl∥2.
Then, from (23) and (35), it holds that
(36) (gkTdk)2/∥dk∥2 ≥ ε2/(k+1).
Thus, it is easy to obtain
(37) ∑k=0∞ (gkTdk)2/∥dk∥2 = +∞.
This contradicts the Zoutendijk condition (15). Therefore, the conclusion holds.
4. Numerical Experiments
In this section, we give the numerical results of Algorithm 1 to show that the method is efficient for unconstrained optimization problems. We set the parameters δ=0.3 and σ=0.7 and use MATLAB 7.0 to test the chosen problems on a PC with a 2.10 GHz CPU, 1.0 GB of RAM, and the Linux operating system. We use the condition ∥gk∥≤10^-6 or It-max > 5000 as the stopping criterion (It-max denotes the maximal number of iterations). When the limit of 5000 function evaluations was exceeded, the run was stopped, which is indicated by “NaN.” The problems that we tested are from [17, 19].
Prob 1: f(x) = (x1 + 3x2 + x3)^2 + 4(x1 - x2)^2,
Prob 2: f(x) = sin(x1 + x2) + (x1 - x2)^2,
Prob 3: f(x) = ln(1 + x1^2) + x2^2 + x2,
Prob 4: f(x) = sin(πx1/12)cos(πx2/16),
Prob 5: f(x) = e^(x1) + x1^2 + 2x1x2 + 4x2^2,
Prob 6: f(x) = x1^2 + x2^2 + 2x3^2 + x4^2 - 5(x1 + x2) - 21x3 + 7x4.
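To reproduce experiments of this kind, the listed objectives can be coded directly. The definitions below transcribe Prob 2 and Prob 5 (with their analytic gradients); the commented call shows how they could be passed to the ls_cd sketch given after Algorithm 1, using the starting point reported for Prob 2 in the tables below:

    import numpy as np

    # Prob 2: f(x) = sin(x1 + x2) + (x1 - x2)^2
    def f2(x):
        return np.sin(x[0] + x[1]) + (x[0] - x[1]) ** 2

    def grad_f2(x):
        c = np.cos(x[0] + x[1])
        return np.array([c + 2 * (x[0] - x[1]), c - 2 * (x[0] - x[1])])

    # Prob 5: f(x) = e^(x1) + x1^2 + 2*x1*x2 + 4*x2^2
    def f5(x):
        return np.exp(x[0]) + x[0] ** 2 + 2 * x[0] * x[1] + 4 * x[1] ** 2

    def grad_f5(x):
        return np.array([np.exp(x[0]) + 2 * x[0] + 2 * x[1],
                         2 * x[0] + 8 * x[1]])

    # example run on Prob 2 (requires the ls_cd sketch from Section 2):
    # x_star, f_star = ls_cd(f2, grad_f2, np.array([6.5, 23.0]))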
Tables 1, 2, and 3 show the computation results.
Test results for the CD algorithm.

Prob | x0 | xk | f* | NI
1 | (4, 5, 10.1) | (NaN, NaN, NaN) | NaN | 5000
2 | (6.5, 23) | (14.92256489843370, 14.92256511253481) | −0.99999999999993 | 18
3 | (7, 11) | (0.00000007501213, −0.49999972373559) | −0.24999999999992 | 16
4 | (2.5, 11.9) | (5.99999999984147, 15.99997571516633) | −0.99999999998863 | 160
5 | (7, 9.8) | (NaN, NaN) | NaN | 5000
6 | (−3, −1, −3, −1) | (NaN, NaN, NaN, NaN) | NaN | 5000

x0: the initial point; xk: the final point; f*: the final value of the objective function; NI: the number of iterations for each problem.
Because conjugate gradient algorithms are devised for solving large-scale unconstrained optimization problems, we chose some large-scale problems from [18] and compared the performance of the hybrid LS-CD method (Algorithm 1 in Section 2) with that of the LS and CD methods.
From Tables 1, 2, 3, and 4, we see that the performance of Algorithm 1 is better than that of the CD and the LS methods for some problems. Therefore, our numerical experiments show that the algorithm is efficient.
The performance of the LS method, CD method, and LS-CD method.

Prob | Dim | LS (NI/NF/NG) | CD (NI/NF/NG) | LS-CD (NI/NF/NG)
PEN 1 | 100 | 51/142/92 | 62/223/182 | 51/168/125
PEN 1 | 1000 | 33/125/83 | 52/181/165 | 33/164/117
PEN 1 | 10000 | 21/118/72 | 31/157/121 | 21/132/102
TRIG | 100 | 305/399/398 | NaN | 305/399/398
TRIG | 500 | 343/424/423 | NaN | 343/424/423
ROSEX | 500 | 52/112/107 | 92/267/238 | 50/186/157
ROSEX | 1000 | 70/149/145 | 98/287/255 | 70/246/183

Prob: the test problem name from [18]; Dim: the problem dimension; NI: the number of iterations; NF: the number of function evaluations; NG: the number of gradient evaluations.
Acknowledgments
The authors would like to thank the anonymous referee for the careful reading and for the helpful comments and suggestions, which led to an improved version of this paper. This work was supported in part by the Foundation of the Hunan Provincial Education Department under Grants nos. 12A077 and 13C453 and by the Educational Reform Research Fund of Hunan University of Humanities, Science, and Technology (no. RKJGY1320).
References
[1] J. Nocedal and S. J. Wright, Numerical Optimization, Springer, New York, NY, USA, 1999.
[2] Y. Yuan, Numerical Methods for Nonlinear Programming, Shanghai Scientific & Technical Publishers, Shanghai, China, 1993.
[3] R. Fletcher and C. Reeves, "Function minimization by conjugate gradients," The Computer Journal, vol. 7, no. 2, pp. 149–154, 1964.
[4] E. Polak and G. Ribière, "Note sur la convergence de méthodes de directions conjuguées," Revue Française d'Informatique et de Recherche Opérationnelle, vol. 16, pp. 35–43, 1969.
[5] B. T. Polyak, "The conjugate gradient method in extreme problems," USSR Computational Mathematics and Mathematical Physics, vol. 9, pp. 94–112, 1969.
[6] Y. H. Dai and Y. Yuan, "A nonlinear conjugate gradient method with a strong global convergence property," SIAM Journal on Optimization, vol. 10, no. 1, pp. 177–182, 1999.
[7] R. Fletcher, Practical Methods of Optimization, Vol. 1: Unconstrained Optimization, 2nd edition, John Wiley & Sons, New York, NY, USA, 1987.
[8] Y. Liu and C. Storey, "Efficient generalized conjugate gradient algorithms, part 1: theory," Journal of Optimization Theory and Applications, vol. 69, no. 1, pp. 129–137, 1991.
[9] M. R. Hestenes and E. Stiefel, "Methods of conjugate gradients for solving linear systems," Journal of Research of the National Bureau of Standards, vol. 49, pp. 409–436, 1952.
[10] D. Touati-Ahmed and C. Storey, "Efficient hybrid conjugate gradient techniques," Journal of Optimization Theory and Applications, vol. 64, no. 2, pp. 379–397, 1990.
[11] Y. H. Dai and Y. Yuan, "An efficient hybrid conjugate gradient method for unconstrained optimization," Annals of Operations Research, vol. 103, pp. 33–47, 2001.
[12] N. Andrei, "A scaled BFGS preconditioned conjugate gradient algorithm for unconstrained optimization," Applied Mathematics Letters, vol. 20, no. 6, pp. 645–650, 2007.
[13] N. Andrei, "A hybrid conjugate gradient algorithm for unconstrained optimization as a convex combination of Hestenes-Stiefel and Dai-Yuan," vol. 17, no. 4, pp. 55–70, 2008.
[14] Y.-H. Dai and C.-X. Kou, "A nonlinear conjugate gradient algorithm with an optimal property and an improved Wolfe line search," SIAM Journal on Optimization, vol. 23, no. 1, pp. 296–320, 2013.
[15] S. Babaie-Kafaki and N. Mahdavi-Amiri, "Two modified hybrid conjugate gradient methods based on a hybrid secant equation," Mathematical Modelling and Analysis, vol. 18, no. 1, pp. 32–52, 2013.
[16] W. Jia, J. H. Zong, and X. D. Wang, "An improved mixed conjugate gradient method," Systems Engineering Procedia, vol. 4, pp. 219–225, 2012.
[17] M. Sun and J. Liu, "A new conjugate method and its global convergence," vol. 8, no. 1, pp. 75–80, 2013.
[18] J. J. Moré, B. S. Garbow, and K. E. Hillstrom, "Testing unconstrained optimization software," ACM Transactions on Mathematical Software, vol. 7, no. 1, pp. 17–41, 1981.
[19] W. Hock and K. Schittkowski, "Test examples for nonlinear programming codes," vol. 30, no. 1, pp. 127–129, 1981.