The conjugate gradient (CG) method is used to find optimal solutions of large-scale unconstrained optimization problems. Owing to its simple algorithm, low memory requirements, and speed in obtaining a solution, the method is widely used in many fields, such as engineering, computer science, and medical science. In this paper, we modify the CG method to achieve global convergence under various line searches. In addition, the method satisfies the sufficient descent condition independently of any line search. Numerical computations under the weak Wolfe-Powell line search show that the efficiency of the new method is superior to that of other conventional methods.
1. Introduction
The nonlinear CG method is a useful tool for finding the minimum of a function in unconstrained optimization problems. Consider the problem
$$\min\{f(x) : x \in \mathbb{R}^n\}, \tag{1}$$
where $f:\mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and its gradient is denoted by $g(x) = \nabla f(x)$. Starting from an initial point $x_0 \in \mathbb{R}^n$, the method generates a sequence of points $\{x_k\}$ by the iterative formula
$$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, \ldots, \tag{2}$$
where $x_k$ is the current iterate and $\alpha_k > 0$ is the step size obtained by some line search. The search direction $d_k$ is defined by
$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k d_{k-1}, & k \ge 1, \end{cases} \tag{3}$$
where $g_k = g(x_k)$ and $\beta_k$ is known as the conjugate gradient coefficient.
The strong Wolfe-Powell (SWP) line search is the most popular inexact line search; it requires a reduction in the function value while narrowing the interval in which the step length is sought. In addition, it forces the step length to be close to a stationary point or local minimizer of the function, so it is a useful way to determine the step size:
$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k, \tag{4}$$
$$|g(x_k + \alpha_k d_k)^T d_k| \le -\sigma g_k^T d_k, \tag{5}$$
where $0 < \delta < \sigma < 1$. In fact, the SWP line search is a modification of the weak Wolfe-Powell (WWP) line search, in which the step length satisfies (4) and
$$g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k. \tag{6}$$
However, the WWP line search may accept a step length far from a stationary point or local minimizer of the function. Dai [1] proposed two Armijo-type line searches: the first guarantees global convergence of methods (2) and (3) for any $\beta_k \ge 0$, and with it the global convergence of the FR, nonnegative PRP, and CD methods has been established. To obtain global convergence of the original PRP method, he designed a second line search, stated as follows.
Given a constant $\lambda \in (0,1)$, determine the smallest integer $m \ge 0$ such that, with $\alpha_k = \lambda^m$, the vectors $x_{k+1}$ and $d_{k+1}$ given by (2) and (3) satisfy (4) together with
$$g(x_k + \alpha_k d_k)^T d_{k+1} \le -\sigma \|d_{k+1}\|^2, \tag{7}$$
where $\delta \in (0, 1/2)$ and $\sigma \in (\delta, 1)$ are two constants.
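Before moving on, it may help to make the WWP conditions (4) and (6) concrete. The following is a minimal Python sketch, assuming a standard bisection strategy (illustrative scaffolding, not the routine used in the paper); `f` and `grad` are caller-supplied callables.

```python
import numpy as np

def wwp_line_search(f, grad, x, d, delta=0.001, sigma=0.1,
                    alpha=1.0, max_iter=50):
    """Bisection sketch of a weak Wolfe-Powell line search.

    Seeks alpha satisfying
      f(x + alpha*d) <= f(x) + delta*alpha*g^T d     (condition (4))
      grad(x + alpha*d)^T d >= sigma * g^T d         (condition (6))
    under the usual assumptions 0 < delta < sigma < 1 and g^T d < 0.
    """
    lo, hi = 0.0, np.inf
    fx, gtd = f(x), grad(x).dot(d)
    for _ in range(max_iter):
        if f(x + alpha * d) > fx + delta * alpha * gtd:
            hi = alpha                    # sufficient decrease (4) fails: shrink
        elif grad(x + alpha * d).dot(d) < sigma * gtd:
            lo = alpha                    # curvature (6) fails: grow
        else:
            return alpha                  # both WWP conditions hold
        alpha = 0.5 * (lo + hi) if np.isfinite(hi) else 2.0 * lo
    return alpha                          # fallback: best effort
```

The defaults $\sigma = 0.1$ and $\delta = 0.001$ match the values used in the numerical section.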
The most popular formulas for $\beta_k$ are Hestenes-Stiefel (HS) [2], Fletcher-Reeves (FR) [3], Polak-Ribière-Polyak (PRP) [4], Conjugate Descent (CD) [5], Liu-Storey (LS) [6], Dai-Yuan (DY) [7], Wei et al. (WYL) [8], and Hager-Zhang (HZ) [9]:
$$\beta_k^{HS} = \frac{g_k^T y_{k-1}}{d_{k-1}^T (g_k - g_{k-1})}, \quad \beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \quad \beta_k^{PRP} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}, \quad \beta_k^{CD} = -\frac{\|g_k\|^2}{d_{k-1}^T g_{k-1}},$$
$$\beta_k^{LS} = -\frac{g_k^T y_{k-1}}{d_{k-1}^T g_{k-1}}, \quad \beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^T (g_k - g_{k-1})}, \quad \beta_k^{WYL} = \frac{g_k^T \left( g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1} \right)}{\|g_{k-1}\|^2}, \quad \beta_k^{HZ} = \max\{\beta_k^N, \eta_k\}, \tag{8}$$
where $\beta_k^N = \frac{1}{d_k^T y_k} \left( y_k - 2 d_k \frac{\|y_k\|^2}{d_k^T y_k} \right)^T g_{k+1}$, $\eta_k = \frac{-1}{\|d_k\| \min\{\eta, \|g_k\|\}}$, $y_k = g_{k+1} - g_k$, and $\eta > 0$ is a constant.
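For reference, the classical coefficients in (8) can be transcribed directly; a minimal Python sketch (omitting $\beta_k^{HZ}$, whose indexing uses $y_k = g_{k+1} - g_k$, and omitting safeguards against zero denominators) is:

```python
import numpy as np

def beta_coefficients(g, g_prev, d_prev):
    """Classical CG coefficients from (8); arguments are 1-D arrays."""
    y = g - g_prev                                    # y_{k-1} = g_k - g_{k-1}
    return {
        "HS":  g.dot(y) / d_prev.dot(y),
        "FR":  g.dot(g) / g_prev.dot(g_prev),
        "PRP": g.dot(y) / g_prev.dot(g_prev),
        "CD":  -g.dot(g) / d_prev.dot(g_prev),
        "LS":  -g.dot(y) / d_prev.dot(g_prev),
        "DY":  g.dot(g) / d_prev.dot(y),
        "WYL": g.dot(g - (np.linalg.norm(g) / np.linalg.norm(g_prev)) * g_prev)
               / g_prev.dot(g_prev),
    }
```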
The global convergence of the FR method with exact line search was established by Zoutendijk [10]. Al-Baali [11] proved that the FR method is globally convergent under the strong Wolfe condition when $\sigma < 1/2$, and Liu et al. [12] later extended the result to $\sigma \le 1/2$. Its numerical behavior, however, is unpredictable: in a few cases it is as efficient as the PRP method, but in general it is very slow. The DY and CD methods have the same performance as the FR method under exact line search, with strong global convergence. Global convergence of the PRP method for convex objective functions under exact line search was proved by Polak and Ribière in 1969 [4]. Later, Powell gave a counterexample showing that there exist nonconvex functions on which the PRP method does not converge globally, even when the exact line search is used. Powell stressed the importance of achieving global convergence for the PRP method and suggested that $\beta_k$ should not be negative. Gilbert and Nocedal [13] proved that the nonnegative PRP method is globally convergent under the Wolfe-Powell line search. The HS and LS methods have the same performance as PRP with exact line search. Therefore, the PRP method is the most efficient method compared with the other conjugate gradient methods. For more details, the reader may consult [14–19].
In 2006, Wei et al. [8] gave a new nonnegative CG coefficient resembling the original PRP method; it has been studied under both exact and inexact line searches, and many modifications have since appeared, such as [20–23].
With a small modification of $\beta_k^{WYL}$, Zhang [21] presented the following CG coefficient:
$$\beta_k^{NPRP} = \frac{\|g_k\|^2 - \frac{\|g_k\|}{\|g_{k-1}\|} |g_k^T g_{k-1}|}{\|g_{k-1}\|^2}. \tag{9}$$
In the same manner, $\beta_k^{DPRP}$ is constructed by modifying the denominator of $\beta_k^{NPRP}$:
$$\beta_k^{DPRP} = \frac{\|g_k\|^2 - \frac{\|g_k\|}{\|g_{k-1}\|} |g_k^T g_{k-1}|}{w |g_k^T d_{k-1}| + \|g_{k-1}\|^2}. \tag{10}$$
In addition, $\beta_k^{MLS*}$ is constructed from the numerator of $\beta_k^{WYL}$:
$$\beta_k^{MLS*} = \frac{g_k^T \left( g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1} \right)}{-g_{k-1}^T d_{k-1} + m |g_k^T d_{k-1}|}, \tag{11}$$
where $m \ge 0$ and $w \ge 1$.
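Read this way (with the absolute values as reconstructed above), the two comparison coefficients used later in the numerical section can be sketched in Python as follows; the default $w = 2$ matches the value used in Section 4:

```python
import numpy as np

def beta_nprp(g, g_prev):
    """Coefficient beta_k^{NPRP} from (9)."""
    num = g.dot(g) - (np.linalg.norm(g) / np.linalg.norm(g_prev)) \
          * abs(g.dot(g_prev))
    return num / g_prev.dot(g_prev)

def beta_dprp(g, g_prev, d_prev, w=2.0):
    """Coefficient beta_k^{DPRP} from (10); requires w >= 1."""
    num = g.dot(g) - (np.linalg.norm(g) / np.linalg.norm(g_prev)) \
          * abs(g.dot(g_prev))
    return num / (w * abs(g.dot(d_prev)) + g_prev.dot(g_prev))
```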
The descent condition plays an important role in CG methods and is given by
$$g_k^T d_k < 0, \quad k \ge 0. \tag{12}$$
If we strengthen (12) to the form
$$g_k^T d_k \le -c \|g_k\|^2, \quad k \ge 0, \ c > 0, \tag{13}$$
then the search direction satisfies the sufficient descent condition.
In this paper, we present the new formula and its algorithm in Section 2. We then establish the global convergence of our method under several line searches in Section 3. Numerical results and conclusions are presented in Sections 4 and 5, respectively.
2. The Modified Formula
In this section, the coefficient $\beta_k^{HZ*}$ is presented, which extends the $\beta_k^{MLS*}$ and $\beta_k^{NPRP}$ methods; that is,
$$\beta_k^{HZ*} = \frac{\|g_k\|^2 - \frac{\|g_k\|}{\|g_{k-1}\|} |g_k^T g_{k-1}|}{-g_{k-1}^T d_{k-1} + \theta |g_k^T d_{k-1}|}, \tag{14}$$
where $\|\cdot\|$ denotes the Euclidean norm and $\theta > 1$.
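A direct transcription of (14) into Python might read as follows (the function name is ours; the default $\theta = 2$ is the value used in the numerical section):

```python
import numpy as np

def beta_hz_star(g, g_prev, d_prev, theta=2.0):
    """Coefficient beta_k^{HZ*} of (14); theta > 1, Euclidean norms."""
    num = g.dot(g) - (np.linalg.norm(g) / np.linalg.norm(g_prev)) \
          * abs(g.dot(g_prev))
    den = -g_prev.dot(d_prev) + theta * abs(g.dot(d_prev))
    return num / den
```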
Algorithm 1.
Step 1 (initialization). Given $x_0$, set $k = 0$.
Step 2. Compute $\beta_k$ based on (14).
Step 3. Compute $d_k$ based on (3). If $g_k = 0$, then stop.
Step 4. Compute $\alpha_k$ by some line search; in the numerical section we use the WWP line search with $\sigma = 0.1$ and $\delta = 0.001$.
Step 5. Update the new point based on (2).
Step 6 (convergence test and stopping criteria). If $f(x_{k+1}) < f(x_k)$ and $\|g_{k+1}\| \le 10^{-6}$, then stop; otherwise, set $k = k + 1$ and go to Step 2.
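Putting the pieces together, a runnable Python sketch of Algorithm 1 (reusing the hypothetical `beta_hz_star` and `wwp_line_search` helpers sketched earlier; the quadratic at the end is only a usage illustration) is:

```python
import numpy as np

def cg_hz_star(f, grad, x0, theta=2.0, tol=1e-6, max_iter=10000):
    """Sketch of Algorithm 1 with the beta^{HZ*} coefficient (14)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                        # d_0 = -g_0 from (3)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:              # stopping criterion
            return x
        alpha = wwp_line_search(f, grad, x, d)    # WWP step size
        x_new = x + alpha * d                     # iterate update (2)
        g_new = grad(x_new)
        beta = beta_hz_star(g_new, g, d, theta)   # coefficient (14)
        d = -g_new + beta * d                     # direction update (3)
        x, g = x_new, g_new
    return x

# Usage illustration on a simple strictly convex quadratic:
f = lambda x: 0.5 * x.dot(x)
grad = lambda x: x
print(cg_hz_star(f, grad, np.array([3.0, -4.0])))   # approaches the origin
```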
3. The Global Convergence Analysis for the $\beta_k^{HZ*}$ Method
The following assumption is needed in the theorems below.
Assumption 2.
(I) $f(x)$ is bounded from below on the level set $\Omega = \{x \in \mathbb{R}^n : f(x) \le f(x_1)\}$, where $x_1$ is the starting point.
(II) In some neighborhood $N$ of $\Omega$, $f$ is continuously differentiable and its gradient is Lipschitz continuous; that is, for any $x, y \in N$, there exists a constant $L > 0$ such that $\|g(x) - g(y)\| \le L \|x - y\|$.
Lemma 3.
Let Assumption 2 hold. Consider any method of the form (2) and (3) in which the search direction is descent and $\alpha_k$ satisfies the WWP line search conditions (4) and (6). Then the following condition holds:
$$\sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty. \tag{15}$$
Since (13) gives $(g_k^T d_k)^2 \ge c^2 \|g_k\|^4$, substituting (13) into (15) yields
$$\sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} < \infty. \tag{16}$$
3.1. The Sufficient Descent Condition with Convergence Properties for the SWP Line Search
Theorem 4.
Let the sequences $\{g_k\}$ and $\{d_k\}$ be generated by the method (2), (3), and (14); then (13) holds with $c \in (0,1)$.
Proof.
We use proof by induction. From (3), the claim holds for $k = 0$, since $g_0^T d_0 = -\|g_0\|^2$. Suppose that it is true up to $k - 1$; that is,
$$g_{k-1}^T d_{k-1} \le -c \|g_{k-1}\|^2; \tag{17}$$
then
$$\frac{1}{-g_{k-1}^T d_{k-1}} \le \frac{1}{c \|g_{k-1}\|^2}. \tag{18}$$
Now multiply (3) by $g_k^T$:
$$g_k^T d_k = g_k^T (-g_k + \beta_k d_{k-1}) = -\|g_k\|^2 + \beta_k g_k^T d_{k-1} \le -\|g_k\|^2 + |g_k^T d_{k-1}| \frac{\|g_k\|^2}{\theta |g_k^T d_{k-1}|} = -\left(1 - \frac{1}{\theta}\right) \|g_k\|^2, \tag{19}$$
where the inequality uses $0 \le \beta_k^{HZ*} \le \|g_k\|^2 / (\theta |g_k^T d_{k-1}|)$, which follows from (14) since the numerator is at most $\|g_k\|^2$ and, by the induction hypothesis, $-g_{k-1}^T d_{k-1} \ge 0$. Since $\theta > 1$, taking $c = 1 - 1/\theta$ completes the proof.
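As a quick numerical sanity check of Theorem 4 (purely illustrative, with random vectors rather than iterates of the method), one can verify the bound $g_k^T d_k \le -(1 - 1/\theta) \|g_k\|^2$ whenever the previous direction is a descent direction, reusing the `beta_hz_star` sketch above:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0
c = 1 - 1 / theta                           # the constant from the proof
for _ in range(1000):
    g_prev = rng.normal(size=5)
    g = rng.normal(size=5)
    d_prev = -g_prev + 0.5 * rng.normal(size=5)
    if g_prev.dot(d_prev) >= 0:             # keep only descent directions
        continue
    beta = beta_hz_star(g, g_prev, d_prev, theta)
    d = -g + beta * d_prev                  # direction update (3)
    assert g.dot(d) <= -c * g.dot(g) + 1e-10   # sufficient descent (13)
```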
3.2. Global Convergence under WWP Line Search
Gilbert and Nocedal [13] presented an important theorem for establishing the global convergence of the nonnegative PRP method, summarized here as Theorem 5. In addition, [13] introduced a useful property, called Property ∗, which plays a strong role in studies of CG methods.
Property ∗. Consider a method of the form (2) and (3), and suppose that $0 < \gamma \le \|g_k\| \le \bar{\gamma}$ for all $k$. We say that the method possesses Property ∗ if there exist constants $b > 1$ and $\lambda > 0$ such that, for all $k \ge 1$, $|\beta_k| \le b$, and, if $\|x_k - x_{k-1}\| \le \lambda$, then $|\beta_k| \le 1/(2b)$.
Theorem 5 (see [13]).
Consider any CG method of the form (2) and (3), and suppose that the following conditions hold:
(i) $\beta_k \ge 0$;
(ii) the sufficient descent condition (13) holds;
(iii) the Zoutendijk condition (15) holds;
(iv) the method possesses Property ∗;
(v) Assumption 2 holds.
Then the iterates are globally convergent.
Lemma 6.
Suppose that Assumption 2 holds for Algorithm 1; then $\beta_k^{HZ*}$ satisfies Property ∗.
Proof.
Since $\beta_k^{HZ*} \le \beta_k^{MLS*}$ and $\beta_k^{MLS*}$ satisfies Property ∗, $\beta_k^{HZ*}$ also possesses Property ∗; for details, we refer the reader to Lemma 3.6 of [24]. The proof is completed.
The following corollary follows from Theorem 5 and Lemma 3.
Corollary 7.
Let the sequence $\{x_k\}$ be generated by Algorithm 1. If Assumption 2 holds, then, since the line search satisfies the Zoutendijk condition, we have $\liminf_{k \to \infty} \|g_k\| = 0$.
3.3. Global Convergence Properties for the Armijo-Type Line Search
Theorem 8.
Suppose that Assumption 2 holds. Consider the method of the form (2) and (3) with $\beta_k^{HZ*}$, where $\alpha_k$ is obtained by (4) and (7). Then $\liminf_{k \to \infty} \|g_k\| = 0$.
Proof.
By Lemma 2.8 in [1], we have
$$\alpha_k > c, \quad c \in (0,1). \tag{20}$$
Using (7) and the Cauchy-Schwarz inequality, we get
$$\|d_k\| \le \sigma^{-1} \|g_k\|. \tag{21}$$
From (2), (4), (7), and (20), we have
$$\lim_{k \to \infty} \|d_k\| = 0. \tag{22}$$
From Assumption 2 and (21), we obtain
$$\|g_{k+1}\| \le \left(1 + \frac{L}{\sigma}\right) \|g_k\|. \tag{23}$$
From (3),
$$\|g_{k+1}\| \le \|d_{k+1}\| + \beta_{k+1} \|d_k\|. \tag{24}$$
By (13), (14), and (23), $\beta_{k+1}^{HZ*} \le \|g_{k+1}\|^2 / (-g_k^T d_k) \le \|g_{k+1}\|^2 / (c \|g_k\|^2) \le (1 + L/\sigma)^2 / c$, so (24) gives
$$\|g_{k+1}\| \le \|d_{k+1}\| + \frac{(1 + L/\sigma)^2}{c} \|d_k\|, \tag{25}$$
where $c \in (0,1)$. Taking the limit and using (22), we obtain $\liminf_{k \to \infty} \|g_k\| = 0$. The proof is completed.
4. Numerical Results and Discussions
To analyze the efficiency of the new method, we selected the test functions in Table 1 from CUTEr [25], Andrei [26], and Adorio and Diliman [24]. We compared the new method with other CG methods, namely the NPRP and DPRP methods, using the weak Wolfe-Powell line search with $\delta = 0.001$. The tolerance is set to $\epsilon = 10^{-6}$ for all algorithms to investigate how rapidly each method approaches the optimum; the gradient value is used as the stopping criterion, that is, $\|g_k\| \le 10^{-6}$. The NPRP, DPRP, and modified HZ∗ parameters are all tested under the weak Wolfe-Powell line search with $\sigma = 0.1$ and $\delta = 0.001$. In addition, the values $\theta = 2$ and $w = 2$ are used for the HZ∗ and DPRP parameters, respectively.
Table 1: The test functions.

No.  Function                     Dimension(s)
1    EXTENDED WHITE & HOLST       500, 1000, 5000, 10000
2    EXTENDED ROSENBROCK          500, 1000, 5000, 10000
3    EXTENDED BEALE               500, 1000, 5000, 10000
4    EXTENDED HIMMELBLAU          500, 1000, 5000, 10000
5    EXTENDED DENSCHNB            500, 1000, 5000, 10000
6    SIX HUMP                     2
7    THREE HUMP                   2
8    BOOTH                        2
9    SHALLOW                      500, 1000, 5000, 10000
10   DIXMAANA                     1500, 3000, 6000, 9000
11   DIXMAANB                     1500, 3000, 6000, 9000
12   NONDIA (Shanno-78)           500, 1000, 5000, 1000
13   DQDRTIC                      500, 1000, 5000, 10000
14   RAYDAN 1                     500, 1000, 5000, 10000
15   EXTENDED TRIDIAGONAL 1       500, 1000, 5000, 1000
16   GENERALIZED QUARTIC GQ1      500, 1000, 5000, 10000
17   DIAGONAL 4                   500, 1000, 5000, 10000
18   EXTENDED POWELL              4
19   PERTURBED QUADRATIC          500, 1000, 5000
20   EXTENDED CLIFF               10, 20, 30, 40
21   A QUADRATIC FUNCTION QF2     500, 1000, 5000, 10000
22   DIAGONAL 2                   500, 1000, 5000, 10000
23   SUM SQUARES                  500, 1000, 5000, 10000
24   ZETTL                        2
25   DIXMAANC                     1500, 3000, 6000, 9000
26   NONDIA                       500, 1000, 5000, 10000
All computations were carried out with a Matlab 7.9 subroutine on a PC with an Intel(R) Core(TM) i3 CPU and 2 GB of DDR2 RAM, under the weak Wolfe-Powell line search. The performance results are shown in Figures 1 and 2, respectively, using the performance profile introduced by Dolan and Moré [27]. This measure compares a set of solvers $S$ on a set of problems $\rho$. Assuming $n_s$ solvers and $n_p$ problems in $S$ and $\rho$, respectively, the measure $t_{p,s}$ is defined as the computational cost (e.g., the number of iterations or the CPU time) required for solver $s$ to solve problem $p$.
Performance profile based on the CPU time with weak Wolfe-Powell line search.
Performance profile based on the number of iterations with the weak Wolfe-Powell line search.
To create a baseline for comparison, the performance of solver $s$ on problem $p$ is scaled by the best performance of any solver in $S$ on that problem, using the ratio
$$r_{p,s} = \frac{t_{p,s}}{\min\{t_{p,s} : s \in S\}}. \tag{26}$$
Let a parameter $r_M \ge r_{p,s}$ for all $p, s$ be chosen, and assume that $r_{p,s} = r_M$ if and only if solver $s$ does not solve problem $p$. Since we would like an overall assessment of the performance of each solver, we define the measure
$$P_s(t) = \frac{1}{n_p} \operatorname{size}\{p \in \rho : r_{p,s} \le t\}. \tag{27}$$
Thus, $P_s(t)$ is the probability for solver $s \in S$ that the performance ratio $r_{p,s}$ is within a factor $t \in \mathbb{R}$ of the best possible ratio. The function $P_s$ is the cumulative distribution function of the performance ratio; as such, $P_s : \mathbb{R} \to [0,1]$ is nondecreasing, piecewise constant, and continuous from the right. The value of $P_s(1)$ is the probability that the solver achieves the best performance of all the solvers. In general, a solver with high values of $P_s(t)$, whose curve appears toward the upper right of the figure, is preferable.
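The profile in (26) and (27) is straightforward to compute; the following is a minimal Python sketch with a fabricated, purely illustrative cost matrix (rows are problems, columns are solvers, and `np.inf` marks a failure):

```python
import numpy as np

def performance_profile(T, ts):
    """Dolan-More performance profile.

    T  : (n_problems, n_solvers) array of costs, np.inf for failures.
    ts : 1-D array of factors t at which to evaluate P_s(t).
    Returns a (len(ts), n_solvers) array of profile values.
    """
    n_p, n_s = T.shape
    best = T.min(axis=1, keepdims=True)       # best solver per problem
    r = T / best                              # performance ratios (26)
    # P_s(t) = fraction of problems solved within factor t     (27)
    return np.array([(r <= t).sum(axis=0) / n_p for t in ts])

# Usage sketch with illustrative timings for two solvers on three problems:
T = np.array([[1.0, 2.0], [3.0, 1.5], [np.inf, 4.0]])
print(performance_profile(T, ts=np.array([1.0, 2.0, 4.0])))
```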
It is clear from Figures 1, 2, 3, and 4, which report the number of iterations, CPU time, gradient evaluations, and function evaluations, that the HZ∗ parameter is strongly competitive with the NPRP parameter and slightly better in some cases. On the other hand, the HZ∗ parameter clearly outperforms the DPRP parameter in all performance profiles.
Performance profile based on the number of gradient evaluations with weak Wolfe-Powell line search.
Performance profile based on the function evaluations with weak Wolfe-Powell line search.
5. Conclusion
In this paper, we proposed a new modification of the conjugate gradient method, extended from the NPRP method. Our numerical results show that the new coefficient is competitive with other conventional CG methods. The method converges globally under several line searches and produces descent directions. In future work, we will focus on improving speed using hybrid methods, and we will compare several line searches with modern CG methods.
Competing Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References

[1] Y.-H. Dai, "Conjugate gradient methods with Armijo-type line searches," Acta Mathematicae Applicatae Sinica, English Series, vol. 18, no. 1, pp. 123–130, 2002. doi: 10.1007/s102550200010.
[2] M. R. Hestenes and E. Stiefel, "Methods of conjugate gradients for solving linear systems," Journal of Research of the National Bureau of Standards, vol. 49, pp. 409–436, 1952.
[3] R. Fletcher and C. M. Reeves, "Function minimization by conjugate gradients," The Computer Journal, vol. 7, no. 2, pp. 149–154, 1964. doi: 10.1093/comjnl/7.2.149.
[4] E. Polak and G. Ribière, "Note sur la convergence de méthodes de directions conjuguées," Revue Française d'Informatique et de Recherche Opérationnelle, vol. 3, no. 16, pp. 35–43, 1969.
[5] R. Fletcher, Practical Methods of Optimization, Wiley-Interscience, John Wiley & Sons, New York, NY, USA, 2nd edition, 2001.
[6] Y. Liu and C. Storey, "Efficient generalized conjugate gradient algorithms, part 1: theory," Journal of Optimization Theory and Applications, vol. 69, no. 1, pp. 129–137, 1991. doi: 10.1007/BF00940464.
[7] Y. H. Dai and Y. Yuan, Nonlinear Conjugate Gradient Methods, Shanghai Science and Technology Publisher, Shanghai, China, 2000.
[8] Z. Wei, S. Yao, and L. Liu, "The convergence properties of some new conjugate gradient methods," Applied Mathematics and Computation, vol. 183, no. 2, pp. 1341–1350, 2006. doi: 10.1016/j.amc.2006.05.150.
[9] W. W. Hager and H. Zhang, "A new conjugate gradient method with guaranteed descent and an efficient line search," SIAM Journal on Optimization, vol. 16, no. 1, pp. 170–192, 2005. doi: 10.1137/030601880.
[10] G. Zoutendijk, "Nonlinear programming, computational methods," in Integer and Nonlinear Programming, pp. 37–86, North-Holland, Amsterdam, The Netherlands, 1970.
[11] M. Al-Baali, "Descent property and global convergence of the Fletcher-Reeves method with inexact line search," IMA Journal of Numerical Analysis, vol. 5, no. 1, pp. 121–124, 1985. doi: 10.1093/imanum/5.1.121.
[12] G. H. Liu, J. Y. Han, and H. X. Yin, "Global convergence of the Fletcher-Reeves algorithm with inexact linesearch," Applied Mathematics, A Journal of Chinese Universities, vol. 10, no. 1, pp. 75–82, 1995. doi: 10.1007/BF02663897.
[13] J. C. Gilbert and J. Nocedal, "Global convergence properties of conjugate gradient methods for optimization," SIAM Journal on Optimization, vol. 2, no. 1, pp. 21–42, 1992. doi: 10.1137/0802003.
[14] A. Alhawarat, M. Mamat, M. Rivaie, and I. Mohd, "A new modification of nonlinear conjugate gradient coefficients with global convergence properties," World Academy of Science, Engineering and Technology, International Science Index 85, vol. 8, no. 1, pp. 54–60, 2014.
[15] A. Alhawarat, M. Mamat, M. Rivaie, and Z. Salleh, "An efficient hybrid conjugate gradient method with the strong Wolfe-Powell line search," Mathematical Problems in Engineering, vol. 2015, Article ID 103517, 7 pages, 2015. doi: 10.1155/2015/103517.
[16] Z. Salleh and A. Alhawarat, "An efficient modification of the Hestenes-Stiefel nonlinear conjugate gradient method with restart property," Journal of Inequalities and Applications, vol. 2016, article 110, 2016. doi: 10.1186/s13660-016-1049-5.
[17] A. Alhawarat, Z. Salleh, M. Mamat, and M. Rivaie, "An efficient modified Polak-Ribière-Polyak conjugate gradient method with global convergence properties," Optimization Methods and Software, pp. 1–14, 2016. doi: 10.1080/10556788.2016.1266354.
[18] M. Al-Baali, Y. Narushima, and H. Yabe, "A family of three-term conjugate gradient methods with sufficient descent property for unconstrained optimization," Computational Optimization and Applications, vol. 60, no. 1, pp. 89–110, 2015. doi: 10.1007/s10589-014-9662-z.
[19] W. W. Hager and H. Zhang, "The limited memory conjugate gradient method," SIAM Journal on Optimization, vol. 23, no. 4, pp. 2150–2168, 2013. doi: 10.1137/120898097.
[20] Y. Shengwei, Z. Wei, and H. Huang, "A note about WYL's conjugate gradient method and its applications," Applied Mathematics and Computation, vol. 191, no. 2, pp. 381–388, 2007. doi: 10.1016/j.amc.2007.02.094.
[21] L. Zhang, "An improved Wei-Yao-Liu nonlinear conjugate gradient method for optimization computation," Applied Mathematics and Computation, vol. 215, no. 6, pp. 2269–2274, 2009. doi: 10.1016/j.amc.2009.08.016.
[22] Z. Dai and F. Wen, "Another improved Wei-Yao-Liu nonlinear conjugate gradient method with sufficient descent property," Applied Mathematics and Computation, vol. 218, no. 14, pp. 7421–7430, 2012. doi: 10.1016/j.amc.2011.12.091.
[23] H. Huang and S. Lin, "A modified Wei-Yao-Liu conjugate gradient method for unconstrained optimization," Applied Mathematics and Computation, vol. 231, pp. 179–186, 2014. doi: 10.1016/j.amc.2014.01.012.
[24] E. P. Adorio and U. P. Diliman, "MVF-multivariate test functions library in C for unconstrained global optimization," 2005.
[25] I. Bongartz, A. R. Conn, N. Gould, and P. L. Toint, "CUTE: constrained and unconstrained testing environment," ACM Transactions on Mathematical Software, vol. 21, no. 1, pp. 123–160, 1995. doi: 10.1145/200979.201043.
[26] N. Andrei, "An unconstrained optimization test functions collection," Advanced Modeling and Optimization, vol. 10, no. 1, pp. 147–161, 2008.
[27] E. D. Dolan and J. J. Moré, "Benchmarking optimization software with performance profiles," Mathematical Programming, vol. 91, no. 2, pp. 201–213, 2002. doi: 10.1007/s101070100263.