Mathematical Problems in Engineering, Hindawi Publishing Corporation, Volume 2009, Article ID 875097. doi:10.1155/2009/875097

Research Article

A Truncated Descent HS Conjugate Gradient Method and Its Global Convergence

Wanyou Cheng (College of Software, Dongguan University of Technology, Dongguan 523000, China) and Zongguo Zhang (College of Mathematics and Physics, Shandong Institute of Light Industry, Jinan 250353, China)

Academic Editor: Ekaterina Pavlovskaia

Received 2 December 2008; Accepted 8 April 2009

Copyright © 2009. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Recently, Zhang (2006) proposed a three-term modified HS (TTHS) method for unconstrained optimization problems. An attractive property of the TTHS method is that the direction generated by the method is always descent, independently of the line search used. In order to obtain the global convergence of the TTHS method, Zhang proposed a truncated TTHS method. A drawback is that the numerical performance of the truncated TTHS method is not ideal. In this paper, we prove that the TTHS method with the standard Armijo line search is globally convergent for uniformly convex problems. Moreover, we propose a new truncated TTHS method. Under suitable conditions, global convergence is obtained for the proposed method. Extensive numerical experiments show that the proposed method is very efficient for the test problems from the CUTE library.

1. Introduction

Consider the unconstrained optimization problem
$$\min f(x), \quad x \in \mathbb{R}^n, \tag{1.1}$$
where $f$ is continuously differentiable. Conjugate gradient methods are very important methods for solving (1.1), especially when the dimension $n$ is large. The methods are of the form
$$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, \ldots, \tag{1.2}$$
$$d_k = \begin{cases} -g_k, & \text{if } k = 0, \\ -g_k + \beta_k d_{k-1}, & \text{if } k > 0, \end{cases} \tag{1.3}$$
where $g_k$ denotes the gradient of $f$ at $x_k$, $\alpha_k$ is the step length obtained by a line search, and $\beta_k$ is a scalar. The strong Wolfe line search finds a step length $\alpha_k$ such that
$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k, \tag{1.4}$$
$$|g(x_k + \alpha_k d_k)^T d_k| \le -\sigma g_k^T d_k, \tag{1.5}$$
where $\delta \in (0, 1/2)$ and $\sigma \in (\delta, 1)$. In the conjugate gradient methods field, it is also possible to use the Wolfe line search [1, 2], which calculates an $\alpha_k$ satisfying (1.4) and
$$g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k. \tag{1.6}$$
In particular, some conjugate gradient methods admit the Armijo line search; namely, the step length $\alpha_k = \max\{\beta\rho^j,\ j = 0, 1, 2, \ldots\}$ is the largest such value satisfying
$$f(x_k + \beta\rho^j d_k) \le f(x_k) + \delta_1 \beta\rho^j g_k^T d_k, \tag{1.7}$$
where $0 < \beta \le 1$, $0 < \rho < 1$, and $0 < \delta_1 < 1$. Varieties of this method differ in the way of selecting $\beta_k$. In this paper, we are interested in the HS method [3], namely,
$$\beta_k^{HS} = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}. \tag{1.8}$$
Here and throughout the paper, without specification, $\|\cdot\|$ denotes the Euclidean norm of vectors, $y_{k-1} = g_k - g_{k-1}$, and $s_k = \alpha_k d_k$.
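The Armijo backtracking rule (1.7) can be sketched as follows. This is a minimal illustration, not the authors' code: it returns the largest $\alpha$ in $\{\beta\rho^j : j = 0, 1, \ldots\}$ satisfying the sufficient decrease condition, and the cap `max_backtracks` is an added safeguard.

```python
# Backtracking (Armijo) line search of (1.7): a minimal sketch.
# Returns the largest alpha in {beta * rho^j} with
# f(x + alpha*d) <= f(x) + delta1 * alpha * g^T d.
def armijo(f, x, g, d, beta=1.0, rho=0.5, delta1=1e-4, max_backtracks=60):
    gtd = sum(gi * di for gi, di in zip(g, d))   # g_k^T d_k, negative for a descent d
    fx = f(x)
    alpha = beta
    for _ in range(max_backtracks):
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        if f(x_new) <= fx + delta1 * alpha * gtd:
            break
        alpha *= rho
    return alpha
```

For example, for $f(x) = \|x\|^2$ at $x = (1,1)$ with $d = -g = (-2,-2)$, the full step $\alpha = 1$ overshoots and is rejected, and the rule returns $\alpha = 0.5$.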

We refer to the book [4] and the recent review paper [5] for progress on the global convergence of conjugate gradient methods. The study of the HS method has made great progress. In practical computation, the HS method is generally believed to be one of the most efficient conjugate gradient methods. Theoretically, the HS method has the property that the conjugacy condition $d_k^T y_{k-1} = 0$ always holds, independently of the line search used. Expecting fast convergence, Dai and Liao [6] modified the numerator of the HS method by using the secant condition of quasi-Newton methods, obtaining the DL method. Due to Powell's example [7], the DL method may not converge with the exact line search for general functions. Similar to the PRP+ method [8], Dai and Liao [6] proposed the DL+ method from the viewpoint of global convergence. In a further development of this update strategy, Yabe and Takano [9] used the modified secant condition of [10, 11] and proposed the YT and YT+ methods. Recently, Hager and Zhang [15] modified the HS method to obtain a new conjugate gradient method called the CG_DESCENT method. A good property of the CG_DESCENT method is that its direction $d_k$ satisfies the sufficient descent property $g_k^T d_k \le -(7/8)\|g_k\|^2$, independently of the line search used. Hager and Zhang [15] proved that the CG_DESCENT method with the Wolfe line search is globally convergent even for nonconvex problems. Zhang [12] proposed the TTHS method, whose sufficient descent property is also independent of the line search used. In order to obtain the global convergence of the TTHS method, Zhang truncated its search direction; however, the numerical experiments in [12] show that the truncated TTHS method is not very effective. In this paper, we study the TTHS method further. We prove that the TTHS method with the standard Armijo line search is globally convergent for uniformly convex problems. To improve the efficiency of the truncated TTHS method, we propose a new truncation strategy for the TTHS method.
Under suitable conditions, global convergence is obtained for the proposed method. Numerical experiments show that the proposed method outperforms the well-known CG_DESCENT method.

The paper is organized as follows. In Section 2, we propose our algorithm. Convergence analysis is provided under suitable conditions. Preliminary numerical results are presented in Section 3.

2. Global Convergence Analysis

Recently, Zhang [12] proposed a three-term modified HS (TTHS) method, in which
$$d_k = \begin{cases} -g_k, & \text{if } k = 0, \\ -g_k + \beta_k^{HS} d_{k-1} - \theta_k y_{k-1}, & \text{if } k > 0, \end{cases} \tag{2.1}$$
where $\theta_k = g_k^T d_{k-1}/d_{k-1}^T y_{k-1}$. An attractive property of the TTHS method is that the direction always satisfies
$$g_k^T d_k = -\|g_k\|^2, \tag{2.2}$$
independently of the line search used. In order to obtain the global convergence of the TTHS method, Zhang truncated the TTHS direction as follows:
$$d_k = \begin{cases} -g_k, & \text{if } s_k^T y_k < \varepsilon_1 \|g_k\|^r s_k^T s_k, \\ -g_k + \beta_k^{HS} d_{k-1} - \theta_k y_{k-1}, & \text{if } s_k^T y_k \ge \varepsilon_1 \|g_k\|^r s_k^T s_k, \end{cases} \tag{2.3}$$
where $\varepsilon_1$ and $r$ are positive constants. Zhang proved that the truncated TTHS method converges globally with the Wolfe line search (1.4) and (1.6). However, numerical results show that the truncated TTHS method is not very effective. In this paper, we study the TTHS method again. In the rest of this section, we establish two preliminary convergence results for the TTHS method.

Uniformly convex functions: the method converges globally with the standard Armijo line search (1.7).

General functions: the method converges globally with the strong Wolfe line search (1.4) and (1.5), by applying a new truncation strategy to the TTHS method.
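The TTHS direction (2.1) and its descent identity (2.2) can be sketched as follows. This is a minimal illustration (not the authors' code); the identity $g_k^T d_k = -\|g_k\|^2$ holds by construction for any $g$, $d_{k-1}$, $y_{k-1}$ with $d_{k-1}^T y_{k-1} \ne 0$.

```python
# TTHS direction (2.1): d_k = -g_k + beta_HS * d_{k-1} - theta_k * y_{k-1},
# with theta_k = g_k^T d_{k-1} / d_{k-1}^T y_{k-1}.  A minimal sketch.
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def tths_direction(g, d_prev, y_prev):
    denom = dot(d_prev, y_prev)            # d_{k-1}^T y_{k-1}, assumed nonzero
    beta_hs = dot(g, y_prev) / denom       # beta^{HS} of (1.8)
    theta = dot(g, d_prev) / denom         # theta_k of (2.1)
    return [-gi + beta_hs * di - theta * yi
            for gi, di, yi in zip(g, d_prev, y_prev)]
```

For instance, with $g = (1,2)$, $d_{k-1} = (3,-1)$, $y_{k-1} = (0.5, 1)$, the direction is $(13, -9)$ and $g^T d = -5 = -\|g\|^2$, as (2.2) predicts.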

In order to establish the global convergence of our method, we need the following assumption.

Assumption 2.1.

(i) The level set $\Omega = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded.

(ii) In some neighborhood $N$ of $\Omega$, $f$ is continuously differentiable and its gradient is Lipschitz continuous; namely, there exists a constant $L > 0$ such that
$$\|g(x) - g(y)\| \le L\|x - y\|, \quad \forall x, y \in N. \tag{2.4}$$

Under Assumption 2.1, it is clear that there exist positive constants $B$ and $\gamma$ such that
$$\|x - y\| \le B, \quad \forall x, y \in \Omega, \tag{2.5}$$
$$\|g(x)\| \le \gamma, \quad \forall x \in \Omega. \tag{2.6}$$

Lemma 2.2.

Suppose that Assumption 2.1 holds. Let $\{x_k\}$ be generated by the TTHS method, where $\alpha_k$ is obtained by the Armijo line search (1.7). Then
$$\sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} < \infty. \tag{2.7}$$

Proof.

If $\alpha_k = \beta$, then
$$\delta_1 \|g_k\|^2 = -\delta_1 g_k^T d_k \le \frac{1}{\beta}\left[f(x_k) - f(x_{k+1})\right]. \tag{2.8}$$
Combining this with
$$(g_k^T d_k)^2 \le \|g_k\|^2 \|d_k\|^2 \tag{2.9}$$
yields
$$\frac{\|g_k\|^4}{\|d_k\|^2} \le \|g_k\|^2 \le \frac{1}{\beta\delta_1}\left(f(x_k) - f(x_{k+1})\right). \tag{2.10}$$
On the other hand, if $\alpha_k \ne \beta$, then by the line search rule $\rho^{-1}\alpha_k$ does not satisfy (1.7). This implies
$$f(x_k + \rho^{-1}\alpha_k d_k) > f(x_k) + \delta_1 \rho^{-1}\alpha_k g_k^T d_k. \tag{2.11}$$
By the mean-value theorem, there exists $\mu_k \in (0,1)$ such that
$$f(x_k + \rho^{-1}\alpha_k d_k) = f(x_k) + \rho^{-1}\alpha_k\, g(x_k + \mu_k \rho^{-1}\alpha_k d_k)^T d_k. \tag{2.12}$$
This together with (2.11) implies
$$\left(g(x_k + \mu_k \rho^{-1}\alpha_k d_k) - g_k\right)^T d_k \ge -(1 - \delta_1) g_k^T d_k. \tag{2.13}$$
Since $g$ is Lipschitz continuous, the last inequality shows
$$\alpha_k \ge -\frac{(1 - \delta_1)\rho\, g_k^T d_k}{L\|d_k\|^2} = \frac{(1 - \delta_1)\rho \|g_k\|^2}{L\|d_k\|^2}. \tag{2.14}$$
That is,
$$f(x_{k+1}) - f(x_k) \le \delta_1 \alpha_k g_k^T d_k \le -\frac{(1 - \delta_1)\delta_1 \rho}{L}\,\frac{\|g_k\|^4}{\|d_k\|^2}. \tag{2.15}$$
This implies that there is a constant $M_1 > 0$ such that
$$\frac{\|g_k\|^4}{\|d_k\|^2} \le M_1\left(f(x_k) - f(x_{k+1})\right). \tag{2.16}$$
Inequality (2.10) together with (2.16) shows that
$$\frac{\|g_k\|^4}{\|d_k\|^2} \le M_2\left(f(x_k) - f(x_{k+1})\right) \tag{2.17}$$
with some constant $M_2 > 0$. Summing these inequalities over $k$, we obtain (2.7).

The following theorem establishes the global convergence of the TTHS method with the standard Armijo line search (1.7) for uniformly convex problems.

Theorem 2.3.

Suppose that Assumption 2.1 holds and $f$ is a uniformly convex function. Let $\{x_k\}$ be generated by the TTHS method, where $\alpha_k$ is obtained by the Armijo line search (1.7). Then
$$\liminf_{k \to \infty} \|g_k\| = 0. \tag{2.18}$$

Proof.

We proceed by contradiction. If (2.18) does not hold, then there exists a positive constant $\varepsilon$ such that
$$\|g_k\| \ge \varepsilon \quad \text{for all } k. \tag{2.19}$$
From Lemma 2.2, we get
$$\sum_{k=0}^{\infty} \frac{1}{\|d_k\|^2} < \infty. \tag{2.20}$$
Since $f$ is a uniformly convex function, there exists a constant $\mu > 0$ such that
$$\left(g(x) - g(y)\right)^T (x - y) \ge \mu \|x - y\|^2, \quad \forall x, y \in N. \tag{2.21}$$
This means
$$d_{k-1}^T y_{k-1} \ge \mu \alpha_{k-1} \|d_{k-1}\|^2. \tag{2.22}$$

By (2.1), (2.4), (2.6), and (2.22), one has
$$\|d_k\| \le \|g_k\| + |\beta_k^{HS}| \|d_{k-1}\| + |\theta_k| \|y_{k-1}\| \le \|g_k\| + \frac{2\|g_k\| \|d_{k-1}\| \|y_{k-1}\|}{|d_{k-1}^T y_{k-1}|} \le \|g_k\| + \frac{2L\|g_k\| \|s_{k-1}\| \|d_{k-1}\|}{\mu \alpha_{k-1} \|d_{k-1}\|^2} \le \frac{\mu + 2L}{\mu}\gamma. \tag{2.23}$$
This implies
$$\sum_{k=0}^{\infty} \frac{1}{\|d_k\|^2} \ge \sum_{k=0}^{\infty} \frac{\mu^2}{(\mu + 2L)^2 \gamma^2} = \infty, \tag{2.24}$$
which contradicts (2.20).
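A numerical illustration of Theorem 2.3 (a sketch under stated assumptions, not the authors' code): on the uniformly convex quadratic $f(x) = x_1^2 + 5x_2^2$, the TTHS direction (2.1) with the Armijo search (1.7) drives $\|g_k\|$ to zero.

```python
# TTHS + Armijo on a uniformly convex quadratic: a minimal demo sketch.
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def f(x):
    return x[0] ** 2 + 5.0 * x[1] ** 2

def grad(x):
    return [2.0 * x[0], 10.0 * x[1]]

def armijo(x, g, d, beta=1.0, rho=0.5, delta1=1e-4):
    # Largest alpha in {beta * rho^j} with the sufficient decrease (1.7).
    gtd, fx, alpha = dot(g, d), f(x), beta
    while f([xi + alpha * di for xi, di in zip(x, d)]) > fx + delta1 * alpha * gtd:
        alpha *= rho
    return alpha

x = [3.0, -2.0]
g = grad(x)
d = [-gi for gi in g]                      # d_0 = -g_0
for k in range(500):
    if dot(g, g) ** 0.5 <= 1e-10:
        break
    alpha = armijo(x, g, d)
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    g_new = grad(x_new)
    y = [a - b for a, b in zip(g_new, g)]  # y_k = g_{k+1} - g_k
    denom = dot(d, y)                      # positive here, by uniform convexity
    beta_hs = dot(g_new, y) / denom        # beta^{HS} of (1.8)
    theta = dot(g_new, d) / denom          # theta of (2.1)
    d = [-gn + beta_hs * di - theta * yi
         for gn, di, yi in zip(g_new, d, y)]
    x, g = x_new, g_new
```

At every iteration the descent identity (2.2) gives $g_k^T d_k = -\|g_k\|^2$, so the Armijo search always terminates, and the gradient norm falls below the tolerance well within the iteration budget.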

We are going to investigate the global convergence of the TTHS method with the strong Wolfe line search (1.4) and (1.5). Similar to the PRP+ method [8], we restrict $\beta_k^{HS+} = \max\{\beta_k^{HS}, 0\}$. In this case, the search direction (2.1) may not be a descent direction. Note that the search direction (2.1) can be rewritten as
$$d_k = \begin{cases} -g_k, & \text{if } k = 0, \\ -g_k + \beta_k d_{k-1} - \beta_k \dfrac{g_k^T d_{k-1}}{g_k^T y_{k-1}} y_{k-1}, & \text{if } k > 0, \end{cases} \tag{2.25}$$
where $\beta_k = \beta_k^{HS}$. Since the term $g_k^T y_{k-1}$ may be zero in practical computation, we consider the following search direction:
$$d_k = \begin{cases} -g_k, & \text{if } |g_k^T y_{k-1}| < c\|g_k\|^2, \\ -g_k + \beta_k^{HS+} d_{k-1} - \beta_k^{HS+} \dfrac{g_k^T d_{k-1}}{g_k^T y_{k-1}} y_{k-1}, & \text{if } |g_k^T y_{k-1}| \ge c\|g_k\|^2, \end{cases} \tag{2.26}$$
where $c$ is a positive constant and $\beta_k^{HS+} = \max\{\beta_k^{HS}, 0\}$. It is clear that the relation (2.2) always holds. For simplicity, we refer to the method defined by (1.2) and (2.26) as the method (2.26).
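The truncated direction (2.26) can be sketched as follows (a minimal illustration, not the authors' Fortran implementation): fall back to $-g_k$ when $|g_k^T y_{k-1}| < c\|g_k\|^2$, and otherwise use the rewritten form with $\beta_k^{HS+} = \max\{\beta_k^{HS}, 0\}$; the identity (2.2) holds in both branches.

```python
# Truncated direction (2.26): a minimal sketch of the rule.
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def mhs_plus_direction(g, d_prev, y_prev, c=1e-8):
    gty = dot(g, y_prev)                   # g_k^T y_{k-1}
    if abs(gty) < c * dot(g, g):
        return [-gi for gi in g]           # truncation: restart with -g_k
    beta = max(dot(g, y_prev) / dot(d_prev, y_prev), 0.0)  # beta^{HS+}
    t = beta * dot(g, d_prev) / gty        # coefficient of y_{k-1} in (2.26)
    return [-gi + beta * di - t * yi
            for gi, di, yi in zip(g, d_prev, y_prev)]
```

In the non-truncated branch, $g_k^T d_k = -\|g_k\|^2 + \beta_k^{HS+} g_k^T d_{k-1} - \beta_k^{HS+} g_k^T d_{k-1} = -\|g_k\|^2$, so sufficient descent is preserved exactly.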

Now, we describe a lemma for the search directions, which shows that they change slowly, asymptotically. The lemma is similar to [8, Lemma  3.4].

Lemma 2.4.

Suppose that Assumption 2.1 holds. Let $\{x_k\}$ be generated by the method (2.26), where $\alpha_k$ is obtained by the strong Wolfe line search (1.4) and (1.5). If there exists a constant $\varepsilon > 0$ such that
$$\|g_k\| \ge \varepsilon \quad \text{for all } k, \tag{2.27}$$
then $d_k \ne 0$ and
$$\sum_{k \ge 0} \|u_{k+1} - u_k\|^2 < \infty, \tag{2.28}$$
where $u_k = d_k / \|d_k\|$.

Proof.

Note first that $d_k \ne 0$, for otherwise (2.2) would imply $g_k = 0$; therefore $u_k$ is well defined. Now, let us define
$$r_k = \frac{v_k}{\|d_k\|}, \qquad \delta_k = \beta_k^{HS+} \frac{\|d_{k-1}\|}{\|d_k\|}, \tag{2.29}$$
where
$$v_k = -\left(1 + \beta_k^{HS+} \frac{g_k^T d_{k-1}}{g_k^T y_{k-1}}\right) g_k. \tag{2.30}$$
From (2.26), we have
$$u_k = r_k + \delta_k u_{k-1}. \tag{2.31}$$
Since the $u_k$ are unit vectors, we have
$$\|r_k\| = \|u_k - \delta_k u_{k-1}\| = \|\delta_k u_k - u_{k-1}\|. \tag{2.32}$$
Since $\delta_k \ge 0$, it follows that
$$\|u_k - u_{k-1}\| \le (1 + \delta_k)\|u_k - u_{k-1}\| \le \|u_k - \delta_k u_{k-1}\| + \|\delta_k u_k - u_{k-1}\| = 2\|r_k\|. \tag{2.33}$$
Then we have $\|u_k - u_{k-1}\|^2 \le 4\|r_k\|^2$.

Now, we evaluate the quantity $\|v_k\|$. If $|g_k^T y_{k-1}| \ge c\|g_k\|^2$, then by (1.5) and (2.2) we have
$$d_{k-1}^T y_{k-1} = d_{k-1}^T (g_k - g_{k-1}) \ge (\sigma - 1) g_{k-1}^T d_{k-1} = (1 - \sigma)\|g_{k-1}\|^2. \tag{2.34}$$

By the strong Wolfe condition (1.5) and the relation (2.2), we obtain
$$|g_k^T d_{k-1}| \le \sigma |g_{k-1}^T d_{k-1}| = \sigma \|g_{k-1}\|^2. \tag{2.35}$$
Inequalities (2.34) and (2.35) yield
$$\frac{|g_k^T d_{k-1}|}{|d_{k-1}^T y_{k-1}|} \le \frac{\sigma}{1 - \sigma}. \tag{2.36}$$
This implies
$$\|v_k\| \le \left(1 + \beta_k^{HS+}\left|\frac{g_k^T d_{k-1}}{g_k^T y_{k-1}}\right|\right)\|g_k\| \le \left(1 + \left|\frac{g_k^T d_{k-1}}{d_{k-1}^T y_{k-1}}\right|\right)\|g_k\| \le \frac{1}{1 - \sigma}\|g_k\|. \tag{2.37}$$
If $|g_k^T y_{k-1}| < c\|g_k\|^2$, then $d_k = -g_k$, so $v_k = -g_k$ and $\delta_k = 0$; the relation (2.37) also holds. It follows from the definition of $r_k$, Lemma 2.2, (2.27), and (2.37) that
$$\sum_{k \ge 0} \|r_k\|^2 \le \sum_{k \ge 0} \frac{\|g_k\|^4}{(1 - \sigma)^2 \varepsilon^2 \|d_k\|^2} < \infty. \tag{2.38}$$
By (2.33), we get the conclusion (2.28).

The next theorem establishes the global convergence of method (2.26) with the strong Wolfe line search (1.4) and (1.5). The proof of the theorem is similar to [15, Theorem  3.2].

Theorem 2.5.

Suppose that Assumption 2.1 holds. Let $\{x_k\}$ be generated by the method (2.26), where $\alpha_k$ is obtained by the strong Wolfe line search (1.4) and (1.5). Then
$$\liminf_{k \to \infty} \|g_k\| = 0. \tag{2.39}$$

Proof.

We assume that the conclusion (2.39) is not true. Then there exists a constant $\varepsilon > 0$ such that
$$\|g_k\| \ge \varepsilon \quad \text{for all } k. \tag{2.40}$$
The proof is divided into the following three steps.

Step 1.

A bound for $\beta_k^{HS+}$. From (2.4), (2.6), and (2.34), we get
$$|\beta_k^{HS+}| \le \left|\frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}\right| \le \frac{L\|g_k\| \|s_{k-1}\|}{(1 - \sigma)\|g_{k-1}\|^2} \le \frac{L\gamma \|s_{k-1}\|}{(1 - \sigma)\varepsilon^2} =: C_1 \|s_{k-1}\|. \tag{2.41}$$

Step 2.

A bound on the steps $s_k$. This is a modified version of [8, Theorem 4.3]. Observe that for any $l \ge k$,
$$x_l - x_k = \sum_{j=k}^{l-1} (x_{j+1} - x_j) = \sum_{j=k}^{l-1} \|s_j\| u_j = \sum_{j=k}^{l-1} \|s_j\| u_k + \sum_{j=k}^{l-1} \|s_j\|(u_j - u_k). \tag{2.42}$$
Taking norms and applying the triangle inequality to the last equality, we get from (2.5) that
$$\sum_{j=k}^{l-1} \|s_j\| \le \|x_l - x_k\| + \sum_{j=k}^{l-1} \|s_j\| \|u_j - u_k\| \le B + \sum_{j=k}^{l-1} \|s_j\| \|u_j - u_k\|. \tag{2.43}$$
Let $\Delta$ be a positive integer, chosen large enough that
$$\Delta \ge 4BC, \tag{2.44}$$
where $C = \left(1 + LB\sigma\gamma^2/(c\varepsilon^2)\right) C_1$. By Lemma 2.4, we can choose $k_0$ large enough that
$$\sum_{i \ge k_0} \|u_{i+1} - u_i\|^2 \le \frac{1}{4\Delta}. \tag{2.45}$$
If $j > k \ge k_0$ and $j - k \le \Delta$, then by (2.45) and the Cauchy-Schwarz inequality, we have
$$\|u_j - u_k\| \le \sum_{i=k}^{j-1} \|u_{i+1} - u_i\| \le \sqrt{j - k}\left(\sum_{i=k}^{j-1} \|u_{i+1} - u_i\|^2\right)^{1/2} \le \sqrt{\Delta}\left(\frac{1}{4\Delta}\right)^{1/2} = \frac{1}{2}. \tag{2.46}$$
Combining this with (2.43) yields
$$\sum_{j=k}^{l-1} \|s_j\| \le 2B, \tag{2.47}$$
where $l > k \ge k_0$ and $l - k \le \Delta$.

Step 3.

A bound on the direction $d_l$ determined by (2.26). If $|g_l^T y_{l-1}| \ge c\|g_l\|^2$, then from (2.26), (2.27), (2.35), and (2.41), we have
$$\|d_l\|^2 \le \left(\|g_l\| + \beta_l^{HS+}\|d_{l-1}\| + \beta_l^{HS+}\frac{|g_l^T d_{l-1}|}{c\|g_l\|^2}\|y_{l-1}\|\right)^2 \le \left(\|g_l\| + \left(1 + \frac{LB\sigma\gamma^2}{c\varepsilon^2}\right)\beta_l^{HS+}\|d_{l-1}\|\right)^2 \le 2\gamma^2 + 2\left(1 + \frac{LB\sigma\gamma^2}{c\varepsilon^2}\right)^2 C_1^2 \|s_{l-1}\|^2 \|d_{l-1}\|^2. \tag{2.48}$$
If $|g_l^T y_{l-1}| < c\|g_l\|^2$, then $d_l = -g_l$, and the relation (2.48) also holds. Define $S_i = 2C^2\|s_i\|^2$. We conclude that for $l > k_0$,
$$\|d_l\|^2 \le 2\gamma^2\left(\sum_{i=k_0+1}^{l} \prod_{j=i}^{l-1} S_j\right) + \|d_{k_0}\|^2 \prod_{j=k_0}^{l-1} S_j. \tag{2.49}$$
Proceeding as in Case III of the proof of [15, Theorem 3.2], we get the conclusion.

3. Numerical Experiments

In this section, we report some numerical results. We tested 111 problems from the CUTE [13] library. We compared the performance of the method (2.26) with that of the CG_DESCENT method. The CG_DESCENT code can be obtained from Hager's web page at http://www.math.ufl.edu/hager/papers/CG.

In the numerical experiments, we used the latest version of the code, Fortran 77 Version 1.4 (November 14, 2005), with default parameters. We implemented the method (2.26) with the approximate Wolfe line search of [15]; namely, the method (2.26) used the same line search and parameters as the CG_DESCENT method. The stopping criterion is that the inequality $\|g(x)\| \le \max\{10^{-8}, 10^{-12} f(x_0)\}$ is satisfied or the iteration number exceeds $4 \times 10^4$. All codes were written in Fortran 77 and run on a PC with a PIII 866 processor, 192 MB of RAM, and the Linux operating system. Detailed results are posted at the following web site: http://hi.814e.com/wanyoucheng/results.htm.
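The stopping rule above can be sketched as follows. This is an illustrative sketch of the rule as stated in the text (the name `should_stop` and the parameter names are our own), not part of the authors' Fortran code.

```python
# Stopping rule of the experiments: stop once ||g(x)|| falls below
# max(1e-8, 1e-12 * f(x0)), or once the iteration count reaches 4*10^4.
def should_stop(gnorm, f0, k, tol_abs=1e-8, rel=1e-12, max_iter=40000):
    return gnorm <= max(tol_abs, rel * f0) or k >= max_iter
```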

We adopt the performance profiles of Dolan and Moré [14] to compare the performance of the different methods. That is, for each method, we plot the fraction $P$ of problems for which the method is within a factor $\tau$ of the best time. The left side of each figure gives the percentage of the test problems for which a method is the fastest; the right side gives the percentage of the test problems that are successfully solved by each of the methods. The top curve is the method that solved the most problems in a time that is within a factor $\tau$ of the best time.
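The profile metric plotted in Figures 1-4 can be computed as follows. This is a sketch with made-up timings, not the paper's data: for each solver $s$, $P_s(\tau)$ is the fraction of problems whose time ratio to the best solver on that problem is at most $\tau$, with `float('inf')` marking a failure.

```python
# Dolan-More performance profile: a minimal sketch.
def performance_profile(times, tau):
    # times: {solver: [time per problem]}, aligned by problem index;
    # float('inf') marks a failure on that problem.
    solvers = list(times)
    n = len(next(iter(times.values())))
    best = [min(times[s][p] for s in solvers) for p in range(n)]
    return {s: sum(times[s][p] / best[p] <= tau for p in range(n)) / n
            for s in solvers}
```

For example, with hypothetical timings `{"mhs+": [1.0, 2.0, 4.0], "cg-descent": [2.0, 1.0, float("inf")]}`, the value at $\tau = 1$ is each solver's win fraction (2/3 versus 1/3), and the right tail records robustness, since failed runs never satisfy the ratio test.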

The curves in Figures 1, 2, 3, and 4 have the following meanings:

Performance based on the number of iterations.

Performance based on the number of function evaluations.

Performance based on the number of gradient evaluations.

Performance based on CPU time.

cg-descent: the CG_DESCENT method with the approximate Wolfe line search proposed by Hager and Zhang [15];

mhs+: the method (2.26) with the same line search as “cg-descent” and c=10-8.

From Figures 1-4, it is clear that the "mhs+" method outperforms the "cg-descent" method.

Acknowledgments

The authors are indebted to the anonymous referee for helpful suggestions which improved the quality of this paper. The authors are also very grateful to Professor W. W. Hager and Dr. H. Zhang for their CG_DESCENT code and line search code. This work was supported by the NSF of China via Grant 10771057.

References

[1] P. Wolfe, "Convergence conditions for ascent methods," SIAM Review, vol. 11, no. 2, pp. 226-235, 1969.
[2] P. Wolfe, "Convergence conditions for ascent methods. II: some corrections," SIAM Review, vol. 13, no. 2, pp. 185-188, 1971.
[3] M. R. Hestenes and E. Stiefel, "Methods of conjugate gradients for solving linear systems," Journal of Research of the National Bureau of Standards, vol. 49, pp. 409-436, 1952.
[4] Y.-H. Dai and Y. Yuan, Nonlinear Conjugate Gradient Methods, Shanghai Science and Technology, Shanghai, China, 2000.
[5] W. W. Hager and H. Zhang, "A survey of nonlinear conjugate gradient methods," Pacific Journal of Optimization, vol. 2, no. 1, pp. 35-58, 2006.
[6] Y.-H. Dai and L.-Z. Liao, "New conjugacy conditions and related nonlinear conjugate gradient methods," Applied Mathematics and Optimization, vol. 43, no. 1, pp. 87-101, 2001.
[7] M. J. D. Powell, "Convergence properties of algorithms for nonlinear optimization," SIAM Review, vol. 28, no. 4, pp. 487-500, 1986.
[8] J. C. Gilbert and J. Nocedal, "Global convergence properties of conjugate gradient methods for optimization," SIAM Journal on Optimization, vol. 2, no. 1, pp. 21-42, 1992.
[9] H. Yabe and M. Takano, "Global convergence properties of nonlinear conjugate gradient methods with modified secant condition," Computational Optimization and Applications, vol. 28, no. 2, pp. 203-225, 2004.
[10] J. Z. Zhang, N. Y. Deng, and L. H. Chen, "New quasi-Newton equation and related methods for unconstrained optimization," Journal of Optimization Theory and Applications, vol. 102, no. 1, pp. 147-167, 1999.
[11] J. Z. Zhang and C. Xu, "Properties and numerical performance of quasi-Newton methods with modified quasi-Newton equations," Journal of Computational and Applied Mathematics, vol. 137, no. 2, pp. 269-278, 2001.
[12] L. Zhang, Nonlinear conjugate gradient methods for optimization problems, Ph.D. thesis, College of Mathematics and Econometrics, Hunan University, Changsha, China, 2006.
[13] I. Bongartz, A. R. Conn, N. Gould, and P. L. Toint, "CUTE: constrained and unconstrained testing environment," ACM Transactions on Mathematical Software, vol. 21, no. 1, pp. 123-160, 1995.
[14] E. D. Dolan and J. J. Moré, "Benchmarking optimization software with performance profiles," Mathematical Programming, vol. 91, no. 2, pp. 201-213, 2002.
[15] W. W. Hager and H. Zhang, "A new conjugate gradient method with guaranteed descent and an efficient line search," SIAM Journal on Optimization, vol. 16, no. 1, pp. 170-192, 2005.