International Journal of Mathematics and Mathematical Sciences (IJMMS), ISSN 1687-0425, 0161-1712, Hindawi Publishing Corporation, 2009
Article ID 329623, doi:10.1155/2009/329623
Research Article

A Conjugate Gradient Method for Unconstrained Optimization Problems

Contributors: Gonglin Yuan, Petru Jebelean
Affiliation: College of Mathematics and Information Science, Guangxi University, Nanning, Guangxi 530004, China (gxu.edu.cn)
Dates: 19 October 2009; 5 July 2009; 28 August 2009; 1 September 2009

Copyright © 2009. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

A hybrid method combining the FR conjugate gradient method and the WYL conjugate gradient method is proposed for unconstrained optimization problems. The presented method possesses the sufficient descent property under the strong Wolfe-Powell (SWP) line search rule with the relaxed parameter restriction $\sigma<1$. Under suitable conditions, global convergence with the SWP line search rule and with the weak Wolfe-Powell (WWP) line search rule is established for nonconvex functions. Numerical results show that this method is better than the FR method and the WYL method.

1. Introduction

Consider the following unconstrained optimization problem in $n$ variables:

$$\min_{x\in\mathbb{R}^n} f(x), \tag{1.1}$$

where $f:\mathbb{R}^n\to\mathbb{R}$ is smooth and its gradient $g(x)$ is available. The nonlinear conjugate gradient (CG) method for (1.1) is defined by the iteration

$$x_{k+1}=x_k+\alpha_k d_k,\quad k=0,1,2,\ldots, \tag{1.2}$$

where $x_k$ is the $k$th iterate, $\alpha_k>0$ is a steplength, and $d_k$ is the search direction defined by

$$d_k=\begin{cases}-g_k+\beta_k d_{k-1}, & \text{if } k\ge 1,\\ -g_k, & \text{if } k=0,\end{cases} \tag{1.3}$$

where $\beta_k$ is a scalar which determines the different conjugate gradient methods [1, 2], and $g_k$ is the gradient of $f(x)$ at the point $x_k$. There are many well-known formulas for $\beta_k$, such as the Fletcher-Reeves (FR), Polak-Ribière-Polyak (PRP), Hestenes-Stiefel (HS), Conjugate-Descent (CD), Liu-Storey (LS), and Dai-Yuan (DY) formulas. The CG method is a powerful line search method for solving optimization problems, and it remains very popular among engineers and mathematicians who are interested in solving large-scale problems. Like the steepest descent method, it avoids the computation and storage of matrices associated with the Hessian of the objective function. Many new formulas have since been studied by many authors.

The following formula for $\beta_k$ is the famous FR method:

$$\beta_k^{FR}=\frac{\|g_{k+1}\|^2}{\|g_k\|^2}, \tag{1.4}$$

where $g_k$ and $g_{k+1}$ are the gradients $\nabla f(x_k)$ and $\nabla f(x_{k+1})$ of $f(x)$ at the points $x_k$ and $x_{k+1}$, respectively, and $\|\cdot\|$ denotes the Euclidean norm of vectors. Throughout this paper, we also denote $f(x_k)$ by $f_k$. Under the exact line search, Powell analyzed the small-stepsize property of the FR conjugate gradient method and its global convergence, and Zoutendijk proved its global convergence for nonconvex functions. Al-Baali proved the sufficient descent condition and the global convergence of the FR conjugate gradient method with the SWP line search by restricting the parameter $\sigma<1/2$. Liu et al. extended the result to the parameter $\sigma=1/2$. Wei et al. (WYL) proposed a new conjugate gradient formula:

$$\beta_k^{WYL}=\frac{g_{k+1}^T\left(g_{k+1}-\dfrac{\|g_{k+1}\|}{\|g_k\|}\,g_k\right)}{\|g_k\|^2}. \tag{1.5}$$

The numerical results show that this method is competitive with the PRP method, and its global convergence with the exact line search and the Grippo-Lucidi line search conditions has been proved. Huang et al. proved that, by restricting the parameter $\sigma<1/4$ under the SWP line search rule, this method has the sufficient descent property. It is therefore an interesting task to extend the bound on the parameter $\sigma$ while keeping the sufficient descent condition.
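The two formulas (1.4) and (1.5) can be sketched in a few lines of plain Python. This is only an illustration: the helper names `dot`, `beta_fr`, and `beta_wyl` are hypothetical and not taken from the paper, and gradients are assumed to be passed as lists of floats with a nonzero old gradient.

```python
import math

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def beta_fr(g_new, g_old):
    """Fletcher-Reeves scalar (1.4): ||g_{k+1}||^2 / ||g_k||^2."""
    return dot(g_new, g_new) / dot(g_old, g_old)

def beta_wyl(g_new, g_old):
    """WYL scalar (1.5): g_{k+1}^T (g_{k+1} - (||g_{k+1}||/||g_k||) g_k) / ||g_k||^2."""
    ratio = math.sqrt(dot(g_new, g_new)) / math.sqrt(dot(g_old, g_old))
    y = [a - ratio * b for a, b in zip(g_new, g_old)]
    return dot(g_new, y) / dot(g_old, g_old)
```

When the new gradient is a positive multiple of the old one, the WYL scalar vanishes while the FR scalar does not, which is one way to see how the two formulas differ.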

The sufficient descent condition

$$g_k^T d_k\le -c\|g_k\|^2,\quad \forall k\ge 0, \tag{1.6}$$

where $c>0$ is a constant, is crucial to ensure the global convergence of the nonlinear conjugate gradient method [23, 26-28]. In order to obtain better results for conjugate gradient methods, Andrei [29, 30] proposed hybrid conjugate gradient algorithms as convex combinations of other conjugate gradient algorithms. Motivated by the ideas of Andrei [29, 30] and the above observations, we give a hybrid method combining the FR method and the WYL method. Under the SWP line search technique with the relaxed restriction $\sigma<1$, the proposed method satisfies the sufficient descent condition (1.6). The global convergence of our method with the SWP line search and with the WWP line search is established for nonconvex functions. Numerical results show that the presented method is competitive with the FR and the WYL methods.

In the following section, the algorithm is stated. The properties and the global convergence of the new method are proved in Section 3. Numerical results are reported in Section 4, and conclusions are given in Section 5.

2. Algorithm

Now we describe our algorithm as follows.

Algorithm 2.1 (the hybrid method).

Step 1.

Choose an initial point $x_0\in\mathbb{R}^n$, $\varepsilon\in(0,1)$, $\lambda_1\ge 0$, $\lambda_2\ge 0$. Set $d_0=-g_0=-\nabla f(x_0)$, $k:=0$.

Step 2.

If $\|g_k\|\le\varepsilon$, then stop; otherwise go to the next step.

Step 3.

Compute step size αk by some line search rules.

Step 4.

Let $x_{k+1}=x_k+\alpha_k d_k$. If $\|g_{k+1}\|\le\varepsilon$, then stop.

Step 5.

Calculate the search direction

$$d_{k+1}=-g_{k+1}+\beta_k^{H} d_k, \tag{2.1}$$

where $\beta_k^{H}=\lambda_1\beta_k^{WYL}+\lambda_2\beta_k^{FR}$.

Step 6.

Set k:=k+1, and go to Step 3.
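Steps 1 to 6 can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used in the paper: the names `hybrid_cg` and `armijo_step` are hypothetical, Step 3 is filled with a plain Armijo backtracking rule as a stand-in for "some line search rule", and a restart along $-g_{k+1}$ is added as a safeguard whenever the new direction is not a descent direction (the paper instead obtains descent from the Wolfe-Powell conditions).

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def armijo_step(f, x, d, gd, delta=0.1, shrink=0.5, max_halvings=60):
    """Backtracking stand-in for Step 3 (an assumption, not the SWP/WWP rule)."""
    alpha, fx = 1.0, f(x)
    for _ in range(max_halvings):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) <= fx + delta * alpha * gd:
            break
        alpha *= shrink
    return alpha

def hybrid_cg(f, grad, x0, eps=1e-6, lam1=0.5, lam2=0.5, max_iter=1000):
    """Return (x, iterations) for the hybrid direction beta^H = lam1*WYL + lam2*FR."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]                                  # Step 1
    if norm(g) <= eps:                                     # Step 2
        return x, 0
    for k in range(1, max_iter + 1):
        alpha = armijo_step(f, x, d, dot(g, d))            # Step 3
        x = [xi + alpha * di for xi, di in zip(x, d)]      # Step 4
        g_new = grad(x)
        if norm(g_new) <= eps:
            return x, k
        gg = dot(g, g)
        ratio = norm(g_new) / norm(g)
        b_wyl = dot(g_new, [a - ratio * b for a, b in zip(g_new, g)]) / gg
        b_fr = dot(g_new, g_new) / gg
        beta = lam1 * b_wyl + lam2 * b_fr                  # Step 5
        d = [-a + beta * b for a, b in zip(g_new, d)]
        if dot(g_new, d) >= 0.0:                           # safeguard (assumption)
            d = [-a for a in g_new]
        g = g_new                                          # Step 6
    return x, max_iter
```

For example, on $f(x)=\tfrac12\|x\|^2$ started from $(3,4)$ the unit step along $-g_0$ lands on the minimizer, so the sketch stops after one iteration.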

3. The Properties and the Global Convergence

In the following, we assume that $g_k\ne 0$ for all $k$; otherwise a stationary point has been found. The following assumptions are often used to prove the convergence of nonlinear conjugate gradient methods (see [3, 8, 16, 17, 27]).

Assumption 3.1.

(i) The function $f(x)$ has a lower bound on the level set $\Omega=\{x\in\mathbb{R}^n \mid f(x)\le f(x_0)\}$, where $x_0$ is a given point.

(ii) In an open convex set $\Omega_0$ that contains $\Omega$, $f$ is differentiable and its gradient $g$ is Lipschitz continuous; namely, there exists a constant $L>0$ such that

$$\|g(x)-g(y)\|\le L\|x-y\|,\quad \forall x,y\in\Omega_0. \tag{3.1}$$

3.1. The Properties with the Strong Wolfe-Powell Line Search

The strong Wolfe-Powell (SWP) search rule is to find a step length αk such that

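$$f(x_k+\alpha_k d_k)\le f_k+\delta\alpha_k g_k^T d_k, \tag{3.2}$$

$$\left|g(x_k+\alpha_k d_k)^T d_k\right|\le\sigma\left|g_k^T d_k\right|, \tag{3.3}$$

where $\delta\in(0,1/2)$, $\sigma\in(0,1)$.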
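As an illustration, the two SWP conditions (3.2) and (3.3) can be checked for a candidate step length as below. The function name `satisfies_swp` and its argument layout are hypothetical; the defaults $\delta=0.1$, $\sigma=0.2$ are the values used in Section 4.

```python
def satisfies_swp(f, grad, x, d, alpha, delta=0.1, sigma=0.2):
    """Return True if alpha satisfies both strong Wolfe-Powell conditions."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    gd = dot(grad(x), d)                                   # g_k^T d_k
    trial = [xi + alpha * di for xi, di in zip(x, d)]
    decrease = f(trial) <= f(x) + delta * alpha * gd       # condition (3.2)
    curvature = abs(dot(grad(trial), d)) <= sigma * abs(gd)  # condition (3.3)
    return decrease and curvature
```

On $f(x)=\tfrac12\|x\|^2$ with $x=(1,0)$ and $d=(-1,0)$, the exact step $\alpha=1$ passes, $\alpha=0.5$ fails the curvature condition, and $\alpha=2$ fails the decrease condition.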

The following theorem shows that the hybrid algorithm with the SWP line search possesses the sufficient descent condition (1.6) under the sole restriction $\sigma\in(\delta,1)$ on the line search parameter.

Theorem 3.2.

Let the sequences $\{g_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1, and let the stepsize $\alpha_k$ be determined by the SWP line search (3.2) and (3.3). If $\sigma\in(0,1)$ and $2\lambda_1+\lambda_2\in(0,1/(2\sigma))$, then the sufficient descent condition (1.6) holds.

Proof.

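Since $\lambda_1\ge 0$ and $\lambda_2\ge 0$, the formulae (1.4) and (1.5) together with the Cauchy-Schwarz inequality give

$$\beta_k^{H}=\lambda_1\beta_k^{WYL}+\lambda_2\beta_k^{FR}\ge\lambda_1\frac{\|g_{k+1}\|^2-(\|g_{k+1}\|/\|g_k\|)\|g_{k+1}\|\|g_k\|}{\|g_k\|^2}+\lambda_2\frac{\|g_{k+1}\|^2}{\|g_k\|^2}=\lambda_2\frac{\|g_{k+1}\|^2}{\|g_k\|^2}, \tag{3.4}$$

$$|\beta_k^{H}|=|\lambda_1\beta_k^{WYL}+\lambda_2\beta_k^{FR}|\le\lambda_1\frac{\|g_{k+1}\|^2+(\|g_{k+1}\|/\|g_k\|)\|g_{k+1}\|\|g_k\|}{\|g_k\|^2}+\lambda_2\frac{\|g_{k+1}\|^2}{\|g_k\|^2}=(2\lambda_1+\lambda_2)\frac{\|g_{k+1}\|^2}{\|g_k\|^2}. \tag{3.5}$$

Using (3.3) and the above inequality, we get

$$|\beta_k^{H}g_{k+1}^T d_k|\le(2\lambda_1+\lambda_2)\frac{\|g_{k+1}\|^2}{\|g_k\|^2}\,\sigma\,|g_k^T d_k|. \tag{3.6}$$

By (2.1), we have

$$\frac{g_{k+1}^T d_{k+1}}{\|g_{k+1}\|^2}=-1+\frac{\beta_k^{H}g_{k+1}^T d_k}{\|g_{k+1}\|^2}. \tag{3.7}$$

We prove the descent property of $\{d_k\}$ by induction. Since $g_0\ne 0$, we have $g_0^T d_0=-\|g_0\|^2<0$. Now suppose that $d_i$, $i=1,2,\ldots,k$, are all descent directions, that is, $d_i^T g_i<0$.

By (3.6), we get

$$|\beta_k^{H}g_{k+1}^T d_k|\le\sigma(2\lambda_1+\lambda_2)\frac{\|g_{k+1}\|^2}{\|g_k\|^2}\left(-g_k^T d_k\right), \tag{3.8}$$

that is,

$$\frac{\|g_{k+1}\|^2}{\|g_k\|^2}(2\lambda_1+\lambda_2)\sigma g_k^T d_k\le\beta_k^{H}g_{k+1}^T d_k\le-\frac{\|g_{k+1}\|^2}{\|g_k\|^2}(2\lambda_1+\lambda_2)\sigma g_k^T d_k. \tag{3.9}$$

From (3.7) together with (3.9), we deduce

$$-1+(2\lambda_1+\lambda_2)\sigma\frac{g_k^T d_k}{\|g_k\|^2}\le\frac{g_{k+1}^T d_{k+1}}{\|g_{k+1}\|^2}\le-1-(2\lambda_1+\lambda_2)\sigma\frac{g_k^T d_k}{\|g_k\|^2}. \tag{3.10}$$

Repeating this process and using the fact $d_0^T g_0=-\|g_0\|^2$ imply

$$-\sum_{i=0}^{k}\left[(2\lambda_1+\lambda_2)\sigma\right]^i\le\frac{g_{k+1}^T d_{k+1}}{\|g_{k+1}\|^2}\le-2+\sum_{i=0}^{k}\left[(2\lambda_1+\lambda_2)\sigma\right]^i. \tag{3.11}$$

By the restrictions $\sigma\in(0,1)$ and $2\lambda_1+\lambda_2\in(0,1/(2\sigma))$, we have $(2\lambda_1+\lambda_2)\sigma\in(0,1/2)$. So

$$\sum_{i=0}^{k}\left[(2\lambda_1+\lambda_2)\sigma\right]^i<\sum_{i=0}^{\infty}\left[(2\lambda_1+\lambda_2)\sigma\right]^i=\frac{1}{1-(2\lambda_1+\lambda_2)\sigma}. \tag{3.12}$$

Then (3.11) can be rewritten as

$$-\frac{1}{1-(2\lambda_1+\lambda_2)\sigma}\le\frac{g_{k+1}^T d_{k+1}}{\|g_{k+1}\|^2}\le-2+\frac{1}{1-(2\lambda_1+\lambda_2)\sigma}. \tag{3.13}$$

Thus, by induction, $g_k^T d_k<0$ holds for all $k\ge 0$.

Denote $c=2-1/(1-(2\lambda_1+\lambda_2)\sigma)$; then $c\in(0,1)$ and (3.13) becomes

$$c-2\le\frac{g_k^T d_k}{\|g_k\|^2}\le-c, \tag{3.14}$$

which implies that (1.6) holds. The proof is complete.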
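To make the constant in the proof concrete, the following sketch evaluates $c=2-1/(1-(2\lambda_1+\lambda_2)\sigma)$ after checking the hypotheses of Theorem 3.2. The function name `descent_constant` is hypothetical. With the parameter values reported in Section 4 ($\sigma=0.2$, $\lambda_1=\lambda_2=0.5$) it gives $c=4/7$.

```python
def descent_constant(lam1, lam2, sigma):
    """Constant c of Theorem 3.2; raises ValueError if its hypotheses fail."""
    if not (0.0 < sigma < 1.0):
        raise ValueError("sigma must lie in (0, 1)")
    weight = 2.0 * lam1 + lam2
    if not (0.0 < weight < 1.0 / (2.0 * sigma)):
        raise ValueError("2*lam1 + lam2 must lie in (0, 1/(2*sigma))")
    # Under the hypotheses, weight * sigma is in (0, 1/2), so c is in (0, 1).
    return 2.0 - 1.0 / (1.0 - weight * sigma)
```

For instance, $\sigma=0.9$ with $\lambda_1=\lambda_2=0.5$ violates the bound $2\lambda_1+\lambda_2<1/(2\sigma)$, so the theorem does not apply to that choice.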

Lemma 3.3.

Suppose that Assumption 3.1 holds. Let the sequences $\{g_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1, let the stepsize $\alpha_k$ be determined by the SWP line search (3.2) and (3.3), and let the conditions in Theorem 3.2 hold. Then the Zoutendijk condition

$$\sum_{k=0}^{\infty}\frac{(g_k^T d_k)^2}{\|d_k\|^2}<+\infty \tag{3.15}$$

holds.

In the same way, if Assumption 3.1 and the condition $g_k^T d_k<0$ (for all $k$) hold, then (3.15) also holds for the exact line search, the Armijo-Goldstein line search, and the weak Wolfe-Powell line search. The proofs can be found in [31, 32]. Now, we prove the global convergence theorem of Algorithm 2.1 with the SWP line search.

Theorem 3.4.

Suppose that Assumption 3.1 holds. Let the sequences $\{g_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1, let the stepsize $\alpha_k$ be determined by the SWP line search (3.2) and (3.3), let the conditions in Theorem 3.2 hold, and let the parameters satisfy $2\lambda_1+\lambda_2\le 1$. Then

$$\liminf_{k\to\infty}\|g_k\|=0. \tag{3.16}$$

Proof.

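By (1.6), (3.3), and the Zoutendijk condition (3.15), we get

$$\sum_{k=0}^{\infty}\frac{\|g_k\|^4}{\|d_k\|^2}<+\infty. \tag{3.17}$$

Denote

$$t_k=\frac{\|d_k\|^2}{\|g_k\|^4}, \tag{3.18}$$

so (3.17) can be rewritten as

$$\sum_{k=0}^{\infty}\frac{1}{t_k}<+\infty. \tag{3.19}$$

We prove the result of this theorem by contradiction. Assume that the theorem is not true; then there exists a constant $\gamma>0$ such that

$$\|g_k\|\ge\gamma,\quad \forall k\ge 0. \tag{3.20}$$

Taking the squared norm of both sides of (2.1), we obtain

$$\|d_k\|^2=\|g_k\|^2-2\beta_{k-1}^{H}g_k^T d_{k-1}+\left(\beta_{k-1}^{H}\right)^2\|d_{k-1}\|^2. \tag{3.21}$$

Dividing both sides by $\|g_k\|^4$ and applying (3.5), (3.18), and $2\lambda_1+\lambda_2\le 1$, we get

$$t_k\le t_{k-1}+\frac{1}{\|g_k\|^2}\left(1+\frac{2|g_k^T d_{k-1}|}{\|g_{k-1}\|^2}\right). \tag{3.22}$$

Using (3.3) and (1.6), we have

$$t_k\le t_{k-1}+\frac{1}{\|g_k\|^2}\left(1+\frac{2\sigma|g_{k-1}^T d_{k-1}|}{\|g_{k-1}\|^2}\right)\le t_{k-1}+\left[1+2\sigma(2-c)\right]\frac{1}{\|g_k\|^2}. \tag{3.23}$$

Repeating this process and using the fact $t_0=1/\|g_0\|^2$, we get

$$t_k\le\left[1+2\sigma(2-c)\right]\sum_{i=0}^{k}\frac{1}{\|g_i\|^2}. \tag{3.24}$$

Now, combining (3.24) and (3.20), we get

$$t_k\le\left[1+2\sigma(2-c)\right]\frac{k+1}{\gamma^2}. \tag{3.25}$$

Thus

$$\sum_{k=0}^{\infty}\frac{1}{t_k}=+\infty, \tag{3.26}$$

which contradicts condition (3.19). Therefore, the conclusion of this theorem is correct.

The following theorem shows that Algorithm 2.1 with the SWP line search is also globally convergent under only the descent condition

$$g_k^T d_k<0,\quad \forall k\ge 0. \tag{3.27}$$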

Theorem 3.5.

Suppose that Assumption 3.1 holds. Let the sequence $\{g_k,d_k\}$ be generated by Algorithm 2.1, let the stepsize $\alpha_k$ be determined by the SWP line search (3.2) and (3.3), let the parameters satisfy $2\lambda_1+\lambda_2\le 1$, and let (3.27) hold. Then, for all $k\ge 0$, the following inequality holds:

$$\min\{r_k,r_{k+1}\}\ge\frac{1}{2}, \tag{3.28}$$

where

$$r_k=-\frac{g_k^T d_k}{\|g_k\|^2}; \tag{3.29}$$

furthermore, (3.16) holds.

Proof.

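By (2.1), (3.3), (3.27), and $2\lambda_1+\lambda_2\le 1$, we can deduce that (3.10) holds for all $\sigma<1$, with $\sigma^{*}=(2\lambda_1+\lambda_2)\sigma<1$. According to the second inequality of (3.10), we have

$$\sigma^{*}r_k+r_{k+1}\ge 1. \tag{3.30}$$

By a similar argument to the one used for the FR method, it is not difficult to get (3.28).

Secondly, using (3.30) and the Cauchy-Schwarz inequality, we obtain

$$r_k^2+r_{k+1}^2\ge\left(1+(\sigma^{*})^2\right)^{-1}. \tag{3.31}$$

Moreover, repeating the process of the first inequality of (3.10) and using $r_0=1$, we get

$$r_{k+1}<(1-\sigma^{*})^{-1}. \tag{3.32}$$

By (3.3), (3.22), and the above inequality, we have

$$t_{k+1}\le\frac{1+\sigma^{*}}{1-\sigma^{*}}\sum_{i=0}^{k+1}\frac{1}{\|g_i\|^2}. \tag{3.33}$$

Suppose, to the contrary, that (3.16) does not hold, so that (3.20) is true. Then

$$t_{k+1}\le b_1(k+2), \tag{3.34}$$

where $b_1=(1+\sigma^{*})/\left((1-\sigma^{*})\gamma^2\right)$. By the Zoutendijk condition (3.15), (3.31), and (3.34), we obtain

$$+\infty>\sum_{k=0}^{\infty}\frac{(g_k^T d_k)^2}{\|d_k\|^2}=\sum_{k=0}^{\infty}\frac{r_k^2}{t_k}\ge\sum_{k=1}^{\infty}\left(\frac{r_{2k-1}^2}{t_{2k-1}}+\frac{r_{2k}^2}{t_{2k}}\right)\ge\sum_{k=1}^{\infty}\frac{r_{2k-1}^2+r_{2k}^2}{2b_1(k+1)}\ge\sum_{k=1}^{\infty}\frac{1}{2b_1\left(1+(\sigma^{*})^2\right)(k+1)}=+\infty. \tag{3.35}$$

This contradiction shows that (3.16) is true.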
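The passage from (3.30) to (3.31) is stated without detail. One way to fill it in, given here only as a sketch, is to apply the Cauchy-Schwarz inequality to the vectors $(\sigma^{*},1)$ and $(r_k,r_{k+1})$; squaring is legitimate because the left-hand side of (3.30) is at least $1>0$:

```latex
1 \le \left(\sigma^{*} r_k + r_{k+1}\right)^2
  \le \left((\sigma^{*})^2 + 1\right)\left(r_k^2 + r_{k+1}^2\right)
\quad\Longrightarrow\quad
r_k^2 + r_{k+1}^2 \ge \left(1 + (\sigma^{*})^2\right)^{-1}.
```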

3.2. The Properties with the Weak Wolfe-Powell (WWP) Line Search

The weak Wolfe-Powell line search is to find a step length $\alpha_k$ satisfying (3.2) and

$$g(x_k+\alpha_k d_k)^T d_k\ge\sigma g_k^T d_k, \tag{3.36}$$

where $\delta\in(0,1/2)$, $\sigma\in(\delta,1)$.
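A step length satisfying (3.2) and (3.36) can be found, for example, by a simple bracketing and bisection scheme. The sketch below is one common textbook approach and is not claimed to be the procedure used in the paper's experiments; the name `wwp_line_search` and the iteration cap are assumptions, and `d` is assumed to be a descent direction.

```python
def wwp_line_search(f, grad, x, d, delta=0.1, sigma=0.2, max_iter=100):
    """Bisection search for a step satisfying the weak Wolfe-Powell conditions."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    fx, gd = f(x), dot(grad(x), d)
    lo, hi, alpha = 0.0, None, 1.0
    for _ in range(max_iter):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) > fx + delta * alpha * gd:       # (3.2) fails: step too long
            hi = alpha
        elif dot(grad(trial), d) < sigma * gd:       # (3.36) fails: step too short
            lo = alpha
        else:
            return alpha                             # both conditions hold
        # Expand while no upper bracket is known, otherwise bisect.
        alpha = 2.0 * lo if hi is None else 0.5 * (lo + hi)
    return alpha                                     # last trial if the cap is hit
```

On the one-dimensional quadratic $f(x)=5x^2$ with $x=1$ and $d=-10$, the trial steps $1, 0.5, 0.25$ are rejected by (3.2) and the step $0.125$ is accepted.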

From the computational point of view, one of the best-known formulas for $\beta_k$ is the PRP method. Its global convergence with the exact line search was proved by Polak and Ribière when the objective function is convex. Powell gave a counterexample showing that there exist nonconvex functions on which the PRP method does not converge globally even if the exact line search is used. He suggested that $\beta_k$ should not be less than zero. Following this suggestion, and under the assumption of the sufficient descent condition, Gilbert and Nocedal proved that the modified PRP method with $\beta_k^{+}=\max\{0,\beta_k^{PRP}\}$ is globally convergent with the WWP line search technique. For the new formula $\beta_k^{H}$, we know from (3.4) that it is always nonnegative. Hence we can also obtain the global convergence of the hybrid method with the WWP line search.

Lemma 3.6 (see [31, Lemma 3.3.1]).

Let Assumption 3.1 hold and let the sequences $\{g_k,d_k\}$ be generated by Algorithm 2.1. The stepsize $\alpha_k$ is determined by (3.2) and (3.36). Suppose that (3.20) is true and the sufficient descent condition (1.6) holds. Then we have $d_k\ne 0$ and

$$\sum_{k=0}^{\infty}\|u_{k+1}-u_k\|^2<\infty, \tag{3.37}$$

where $u_k=d_k/\|d_k\|$.

The following Property 1 was introduced by Gilbert and Nocedal and pertains to the PRP method under the sufficient descent condition. The WYL method also has this property. We now prove that Property 1 also holds for the new method.

Property 1.

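Suppose that

$$0<\gamma_1\le\|g_k\|\le\gamma_2. \tag{3.38}$$

We say that the method has Property 1 if, for all $k$, there exist constants $b>1$ and $\lambda>0$ such that $|\beta_k|\le b$ and

$$\|s_k\|\le\lambda\ \Longrightarrow\ |\beta_k|\le\frac{1}{2b}, \tag{3.39}$$

where $s_k=x_{k+1}-x_k$.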

Lemma 3.7.

Let Assumption 3.1 hold, let the sufficient descent condition (1.6) hold, and let the sequences $\{g_k,d_k\}$ be generated by Algorithm 2.1. Suppose that there exists a constant $M>0$ such that $\|d_k\|\le M$ for all $k$. Then this method possesses Property 1.

Proof.

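By (3.36), (1.6), (3.1), and (3.38), we have

$$\gamma_1 c(1-\sigma)\|g_k\|\le c(1-\sigma)\|g_k\|^2\le-(1-\sigma)g_k^T d_k\le(g_{k+1}-g_k)^T d_k\le L\|s_k\|\|d_k\|\le LM\|s_k\|. \tag{3.40}$$

Then we get

$$\|g_k\|\le\frac{LM}{\gamma_1 c(1-\sigma)}\|s_k\|. \tag{3.41}$$

By (3.1) again, we obtain

$$\|g_{k+1}\|-\|g_k\|\le\|g_{k+1}-g_k\|\le L\|s_k\|. \tag{3.42}$$

Combining the above inequality and (3.41) implies that

$$\|g_{k+1}\|\le\|g_k\|+L\|s_k\|\le\left(\frac{LM}{\gamma_1 c(1-\sigma)}+L\right)\|s_k\|. \tag{3.43}$$

By (3.5) and (3.38), we have

$$|\beta_k^{H}|\le(2\lambda_1+\lambda_2)\frac{\|g_{k+1}\|^2}{\|g_k\|^2}\le(2\lambda_1+\lambda_2)\frac{\gamma_2^2}{\gamma_1^2}. \tag{3.44}$$

Let $b=\max\{2,(2\lambda_1+\lambda_2)\gamma_2^2/\gamma_1^2\}>1$ and $\lambda=\gamma_1^2/\left(2b\gamma_2 L\left(2\lambda_1+M/(\gamma_1 c(1-\sigma))+1\right)\right)$. If $\|s_k\|\le\lambda$, then using (3.1), (3.38), (3.43), and the above bound, we obtain

$$\begin{aligned}|\beta_k^{H}|&\le\|g_{k+1}\|\,\frac{\lambda_1\left\|g_{k+1}-(\|g_{k+1}\|/\|g_k\|)g_k\right\|+\lambda_2\|g_{k+1}\|}{\|g_k\|^2}\\&\le\gamma_2\,\frac{\lambda_1\left\|g_{k+1}-g_k+g_k-(\|g_{k+1}\|/\|g_k\|)g_k\right\|+\lambda_2\|g_{k+1}\|}{\|g_k\|^2}\\&\le\gamma_2\,\frac{\lambda_1\|g_{k+1}-g_k\|+\lambda_1\left|\|g_k\|-\|g_{k+1}\|\right|+\lambda_2\|g_{k+1}\|}{\|g_k\|^2}\\&\le\gamma_2\,\frac{2\lambda_1 L\|s_k\|+\left(LM/(\gamma_1 c(1-\sigma))+L\right)\|s_k\|}{\|g_k\|^2}\\&\le\frac{\gamma_2 L\left(2\lambda_1+M/(\gamma_1 c(1-\sigma))+1\right)}{\gamma_1^2}\,\lambda=\frac{1}{2b}.\end{aligned}$$

Therefore, the conclusion of this lemma holds.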

Lemma 3.8 (see [31, Lemma 3.3.2]).

Let the sequences $\{g_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1 and let the conditions in Lemma 3.7 hold. If $\beta_k^{H}\ge 0$ and the method has Property 1, then there exists $\lambda>0$ such that, for any $\Delta\in N$ and any index $k_0$, there is an index $k>k_0$ satisfying

$$\left|\kappa_{k,\Delta}^{\lambda}\right|>\frac{\Delta}{2},$$

where $\kappa_{k,\Delta}^{\lambda}=\{i\in N: k\le i\le k+\Delta-1,\ \|s_i\|>\lambda\}$, $N$ denotes the set of positive integers, and $|\kappa_{k,\Delta}^{\lambda}|$ denotes the number of elements in $\kappa_{k,\Delta}^{\lambda}$.

Finally, by Lemmas 3.6 and 3.8, we present the global convergence theorem of Algorithm 2.1 with the WWP line search. Similar to [31, Theorem 3.3.3], it is not difficult to prove the result, so we omit it.

Theorem 3.9.

Let the sequences {gk} and {dk} be generated by Algorithm 2.1 with the weak Wolfe-Powell line search and the conditions in Lemma 3.7 hold. Then (3.16) holds.

4. Numerical Results

In this section, we report some results of the numerical experiments. It is well known that there exist many new conjugate gradient methods (see [1, 13-16, 18, 19, 29, 30]) which have good properties and good numerical performance. Since the given formula is a hybrid of the FR formula and the WYL formula, we only test Algorithm 2.1 under the WWP line search on the test problems with the given initial points and dimensions, and compare its performance with that of the FR and the WYL methods. The parameters are chosen as follows: $\delta=0.1$, $\sigma=0.2$, $\lambda_1=\lambda_2=0.5$. The following Himmelblau stopping rule is used.

If $|f(x_k)|>e_1$, let $stop1=|f(x_k)-f(x_{k+1})|/|f(x_k)|$; otherwise, let $stop1=|f(x_k)-f(x_{k+1})|$.

If $\|g(x)\|<\varepsilon$ or $stop1<e_2$ is satisfied, we stop the program, where $e_1=e_2=10^{-6}$. We also stop the program if the iteration number exceeds one thousand. All codes were written in MATLAB and run on a PC with a 2.60 GHz CPU, 256 MB of memory, and the Windows XP operating system. The detailed numerical results are listed at http://210.36.18.9:8018/publication.asp?id=36990.
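The stopping test described above can be sketched as a small Python function. The function name `himmelblau_stop` is hypothetical, and the default value of `eps` is an assumption, since the text does not state the value of $\varepsilon$ used in the experiments.

```python
def himmelblau_stop(f_old, f_new, grad_norm, eps=1e-6, e1=1e-6, e2=1e-6):
    """Return True if the Himmelblau-type stopping rule is satisfied.

    f_old, f_new: f(x_k) and f(x_{k+1}); grad_norm: norm of the current gradient.
    eps is an assumed default; e1 = e2 = 1e-6 as stated in the text.
    """
    if abs(f_old) > e1:
        stop1 = abs(f_old - f_new) / abs(f_old)   # relative decrease
    else:
        stop1 = abs(f_old - f_new)                # absolute decrease
    return grad_norm < eps or stop1 < e2
```

The relative test is used while $|f(x_k)|$ is not tiny, and the absolute test takes over near zero function values so that the ratio does not blow up.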

Dolan and Moré gave a new tool to analyze the efficiency of algorithms. They introduced the notion of a performance profile as a means to evaluate and compare the performance of a set of solvers $S$ on a test set $P$. Assuming that there exist $n_s$ solvers and $n_p$ problems, for each problem $p$ and solver $s$, they defined:

$t_{p,s}$ = computing time (the number of function evaluations or others) required to solve problem $p$ by solver $s$.

Requiring a baseline for comparisons, they compared the performance on problem $p$ by solver $s$ with the best performance by any solver on this problem, that is, using the performance ratio

$$r_{p,s}=\frac{t_{p,s}}{\min\{t_{p,s}: s\in S\}}.$$

Suppose that a parameter $r_M\ge r_{p,s}$ for all $p,s$ is chosen, and $r_{p,s}=r_M$ if and only if solver $s$ does not solve problem $p$.

The performance of solver $s$ on any given problem might be of interest, but we would like to obtain an overall assessment of the performance of the solver, so they defined

$$\rho_s(t)=\frac{1}{n_p}\,\mathrm{size}\{p\in P: r_{p,s}\le t\}.$$

Thus $\rho_s(t)$ is the probability for solver $s\in S$ that a performance ratio $r_{p,s}$ is within a factor $t\in\mathbb{R}$ of the best possible ratio. The function $\rho_s$ is the (cumulative) distribution function for the performance ratio. The performance profile $\rho_s:\mathbb{R}\to[0,1]$ for a solver is a nondecreasing, piecewise constant function, continuous from the right at each breakpoint. The value of $\rho_s(1)$ is the probability that the solver will win over the rest of the solvers.

According to the above rules, we know that one solver whose performance profile plot is on top right will win over the rest of the solvers.
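The performance ratio and the profile $\rho_s(t)$ described above can be computed as in the following sketch. The function name `performance_profile` and the data layout (a dictionary mapping each solver to a list of positive costs, with `None` marking a failure) are assumptions made for illustration; a failed run is treated as having ratio $r_M$, which is assumed to be larger than any $t$ of interest.

```python
def performance_profile(times, t):
    """Return {solver: rho_s(t)} for costs times[solver][problem] (None = failed)."""
    solvers = list(times)
    n_p = len(times[solvers[0]])
    counts = {s: 0 for s in solvers}
    for p in range(n_p):
        solved = [times[s][p] for s in solvers if times[s][p] is not None]
        if not solved:
            continue                       # no solver succeeded on this problem
        best = min(solved)                 # baseline: best cost on problem p
        for s in solvers:
            cost = times[s][p]
            if cost is not None and cost / best <= t:
                counts[s] += 1             # ratio r_{p,s} is within factor t
    return {s: counts[s] / n_p for s in solvers}
```

With two hypothetical solvers `A` and `B` and costs `{"A": [10, 20, None], "B": [20, 10, 30]}`, the value at $t=1$ is $1/3$ for `A` and $2/3$ for `B`, and at $t=2$ it is $2/3$ for `A` and $1$ for `B`.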

Figures 1 and 2 show the performance of these methods relative to the iteration number (NI) and the number of function and gradient evaluations (NFN), respectively, where "FR" denotes the FR formula with the WWP rule, "WYL" denotes the WYL formula with the WWP rule, and "Algorithm 2.1" denotes the new method with the WWP rule.

Figure 1: Performance profiles of these three methods (NI).

Figure 2: Performance profiles of these three methods (NFN).

From Figures 1 and 2, it is easy to see that Algorithm 2.1 performs best among the three methods, and that the WYL method is much better than the FR method. Notice that the global convergence of the FR method with the WWP line search has not been established yet. In other words, the given method is competitive with the other two methods, and the hybrid formula is worth noting.

5. Conclusions

This paper gives a hybrid conjugate gradient method for solving unconstrained optimization problems. Under the SWP line search, this method satisfies the sufficient descent condition with only the restriction $\sigma<1$ on the line search parameter. Global convergence with the SWP line search and with the WWP line search is established for nonconvex functions. Numerical results show that the given method is competitive with the other two conjugate gradient methods.

For further research, we should study the new method with the nonmonotone line search technique. Moreover, more numerical experiments on large practical problems should be done, and the given method should be compared with other well-known formulas in the future. How to choose the parameters $\lambda_1$ and $\lambda_2$ in the algorithm is another aspect of future investigation.

Acknowledgments

The authors are very grateful to the anonymous referees and the editors for their valuable suggestions and comments, which improved our paper greatly. This work is supported by China NSF grants 10761001 and the Scientific Research Foundation of Guangxi University (Grant no. X081082).