Abstract and Applied Analysis, Hindawi Publishing Corporation, Volume 2014, Article ID 279891, doi:10.1155/2014/279891. ISSN 1085-3375 (print), 1687-0409 (online).

Research Article

A Hybrid of DL and WYL Nonlinear Conjugate Gradient Methods

Shengwei Yao (1,2) and Bin Qin (1); Academic Editor: Abdon Atangana

1 School of Information and Statistics, Guangxi University of Finance and Economics, Nanning 530003, China
2 School of Science, East China University of Science and Technology, Shanghai 200237, China

Received 29 November 2013; Accepted 27 January 2014; Published 25 March 2014

Copyright © 2014 Shengwei Yao and Bin Qin. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The conjugate gradient method is an efficient method for solving large-scale nonlinear optimization problems. In this paper, we propose a nonlinear conjugate gradient method which can be considered as a hybrid of DL and WYL conjugate gradient methods. The given method possesses the sufficient descent condition under the Wolfe-Powell line search and is globally convergent for general functions. Our numerical results show that the proposed method is very robust and efficient for the test problems.

1. Introduction

The nonlinear conjugate gradient (CG) method has played a special role in solving large-scale nonlinear optimization problems due to the simplicity of its iterations and its very low memory requirements. Although the CG method is not among the fastest or most robust optimization algorithms available today, it remains very popular with engineers and mathematicians who are interested in solving large-scale problems. The nonlinear conjugate gradient method is an extension of the linear conjugate gradient method: the first linear conjugate gradient method was proposed by Hestenes and Stiefel in 1952 for solving linear systems of equations, and in 1964 Fletcher and Reeves extended it to nonlinear problems, obtaining the first nonlinear conjugate gradient method (the FR method).

In this paper, we focus on solving the following nonlinear unconstrained optimization problem by the conjugate gradient method:
(1) $\min\{f(x) : x \in \mathbb{R}^n\}$,
where $f : \mathbb{R}^n \to \mathbb{R}$ is a smooth nonlinear function whose gradient is denoted by $g(x)$. The iterative formula of the conjugate gradient method is given by
(2) $x_k = x_{k-1} + \alpha_{k-1} d_{k-1}$,
where $d_{k-1}$ is the search direction at $x_{k-1}$ and $\alpha_{k-1}$ is the step length. For the nonlinear conjugate gradient method, $d_k$ is computed by
(3) $d_k = -g_k + \beta_k d_{k-1}$, with $d_1 = -g_1$,
where $\beta_k$ is a scalar and $g_k = \nabla f(x_k)$ denotes the gradient. Different conjugate gradient methods correspond to different ways of computing $\beta_k$. Some well-known formulae for $\beta_k$ are the Fletcher-Reeves (FR), Polak-Ribière (PR), Hestenes-Stiefel (HS), Dai-Yuan (DY), and CG-DESCENT choices, given, respectively, by
(4) $\beta_k^{FR} = \dfrac{\|g_k\|^2}{\|g_{k-1}\|^2}$,
(5) $\beta_k^{PR} = \dfrac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}$,
(6) $\beta_k^{HS} = \dfrac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}$,
(7) $\beta_k^{DY} = \dfrac{\|g_k\|^2}{d_{k-1}^T y_{k-1}}$,
(8) $\beta_k^{N} = \left(y_{k-1} - 2 d_{k-1} \dfrac{\|y_{k-1}\|^2}{d_{k-1}^T y_{k-1}}\right)^T \dfrac{g_k}{d_{k-1}^T y_{k-1}}$,
where $y_{k-1} = g_k - g_{k-1}$ denotes the gradient change.
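As an illustration, the iteration (2)-(3) and the choices (4)-(7) for $\beta_k$ can be sketched in a few lines. This is only a minimal sketch with a fixed step length (a practical implementation would use a Wolfe-Powell line search, discussed later in the paper); the solver name and the test setup are our own assumptions.

```python
import numpy as np

# The beta formulas (4)-(7); y = g_k - g_{k-1} is the gradient change.
def beta_fr(g, g_prev, d_prev, y):      # (4): ||g_k||^2 / ||g_{k-1}||^2
    return (g @ g) / (g_prev @ g_prev)

def beta_pr(g, g_prev, d_prev, y):      # (5): g_k^T y_{k-1} / ||g_{k-1}||^2
    return (g @ y) / (g_prev @ g_prev)

def beta_hs(g, g_prev, d_prev, y):      # (6): g_k^T y_{k-1} / d_{k-1}^T y_{k-1}
    return (g @ y) / (d_prev @ y)

def beta_dy(g, g_prev, d_prev, y):      # (7): ||g_k||^2 / d_{k-1}^T y_{k-1}
    return (g @ g) / (d_prev @ y)

def cg_minimize(grad, x0, beta=beta_pr, alpha=0.05, tol=1e-8, max_iter=10000):
    """Nonlinear CG iteration (2)-(3) with a fixed step length alpha.
    A real solver would choose alpha by a (strong) Wolfe-Powell line search."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                               # d_1 = -g_1
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        x = x + alpha * d                # (2): x_k = x_{k-1} + alpha_{k-1} d_{k-1}
        g_new = grad(x)
        y = g_new - g                    # y_{k-1} = g_k - g_{k-1}
        d = -g_new + beta(g_new, g, d, y) * d   # (3)
        g = g_new
    return x
```

Swapping the `beta` argument switches between the FR, PR, HS, and DY variants without touching the iteration itself.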

Although all these methods are equivalent in the linear case, namely, when $f$ is a strictly convex quadratic function and the $\alpha_k$ are determined by exact line search, their behaviors for general objective functions may be far different. For general functions, Zoutendijk proved the global convergence of the FR method with exact line search. (Here and throughout this paper, by global convergence we mean that the sequence generated by the corresponding method will either terminate after finitely many steps or contain a subsequence converging to a stationary point of the objective function from any given initial point.) Although one would be satisfied with its global convergence properties, the FR method performs much worse than the PR (HS) method in real computations. Powell analyzed a major numerical drawback of the FR method: if a small step is generated away from the solution point, the subsequent steps may also be very short. On the other hand, in practical computation the HS method resembles the PR method, and both are generally believed to be the most efficient conjugate gradient methods, since these two methods essentially perform a restart if a bad direction occurs. However, Powell constructed a counterexample showing that the PR and HS methods can cycle infinitely without approaching the solution; this suggests that these two methods have the drawback of not being globally convergent for general functions. Similar counterexamples were also constructed by Dai and Yuan. Therefore, over the past few years, much effort has been devoted to finding new formulae for conjugate gradient methods that are not only globally convergent for general functions but also perform well numerically.

From the structure of the above formulae for $\beta_k$, we know that $\beta_k^{FR}$ and $\beta_k^{DY}$ have the common numerator $\|g_k\|^2$. They are globally convergent if the gradient of the objective function is Lipschitz continuous and the level set $\{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded. For inexact line search, Al-Baali proved the global convergence of the FR method under the strong Wolfe-Powell line search with the restriction $\sigma < 1/2$. Based on Al-Baali's result, Liu et al. extended the global convergence of the FR method to the case $\sigma = 1/2$. Dai and Yuan proved that the sufficient descent condition must hold for at least one of the directions $d_{k+1}$ and $d_k$, and established the global convergence of the FR method with general Wolfe line searches.

$\beta_k^{PR}$ and $\beta_k^{HS}$ share the common numerator $g_k^T y_{k-1}$, and they possess a built-in restart feature that avoids jamming: when the step $x_k - x_{k-1}$ is small, the factor $y_{k-1}$ in the numerator of $\beta_k$ tends to zero, so the next search direction $d_k$ is essentially the steepest descent direction $-g_k$. Hence, the numerical performance of these methods is better than that of the methods with $\|g_k\|^2$ in the numerator of $\beta_k$. Polak and Ribière proved that if the objective function is strongly convex and the line search is exact, the PR method is globally convergent. For general functions, Powell [7, 8] analyzed the convergence properties of the PR method and constructed an example showing that the PR method may cycle infinitely between nonstationary points. To obtain global convergence, Gilbert and Nocedal imposed the following nonnegativity restriction on $\beta_k$:
(9) $\beta_k^{PR+} = \max\{\beta_k^{PR}, 0\}$.

Generally speaking, methods with numerator $\|g_k\|^2$ possess better convergence properties than methods with numerator $g_k^T y_{k-1}$, while from the numerical performance point of view, methods with numerator $g_k^T y_{k-1}$ outperform those with numerator $\|g_k\|^2$. Hence, over the past decades, much effort has been made to find methods that have both nice convergence properties and efficient numerical performance. A new conjugacy condition that makes use of not only gradient values but also function values has been proposed, and based on it a class of nonlinear conjugate gradient methods was derived. The PR method outperforms many methods in numerical experiments, but it does not possess the sufficient descent condition; therefore, some modified forms of the PR method have been studied in [15, 16], and the resulting methods possess the sufficient descent condition and are globally convergent for general functions.

2. Motivations and the New Formula

Since the PR method is considered one of the most efficient nonlinear conjugate gradient methods, much effort has been devoted to its convergence properties and its modifications. Under the sufficient descent assumption, Gilbert and Nocedal proved the global convergence of the PR+ method under the Wolfe line search. Grippo and Lucidi constructed an Armijo-type line search and proved that, under this line search, the directions $d_k$ generated by the PR method satisfy the sufficient descent condition.

Recently, Wei et al. and Huang et al. gave a modified formula for computing $\beta_k$:
(10) $\beta_k^{WYL} = \dfrac{g_k^T \tilde{y}_{k-1}}{\|g_{k-1}\|^2}$, where $\tilde{y}_{k-1} = g_k - \dfrac{\|g_k\|}{\|g_{k-1}\|} g_{k-1}$.

The method with formula $\beta_k^{WYL}$ not only gives nice numerical results but also possesses the sufficient descent condition and global convergence properties under the strong Wolfe-Powell line search. From the structure of $\beta_k^{WYL}$, we see that the method can also avoid jamming: when the step $x_k - x_{k-1}$ is small, $\|g_k\|/\|g_{k-1}\|$ tends to $1$ and the next search direction tends to the steepest descent direction, similarly to the PR method. But the WYL method has some additional advantages: under the strong Wolfe-Powell line search, $\beta_k^{WYL} \ge 0$, and if the parameter satisfies $\sigma < 1/4$, the WYL method possesses the sufficient descent condition, which yields its global convergence.

In [20, 21], Shengwei et al. and Huang et al. extended this modification to the HS method:
(11) $\beta_k^{MHS} = \dfrac{g_k^T y_{k-1}^*}{d_{k-1}^T y_{k-1}}$, where $y_{k-1}^* = g_k - \dfrac{\|g_k\|}{\|g_{k-1}\|} g_{k-1}$.
The formulae $\beta_k^{WYL}$ and $\beta_k^{MHS}$ can be considered modifications of $\beta_k^{PR}$ and $\beta_k^{HS}$, respectively, obtained by replacing $y_{k-1}$ with $y_{k-1}^*$. In [20, 21], the corresponding methods are proved to be globally convergent for general functions under the strong Wolfe-Powell line search and the Grippo-Lucidi line search. Based on the same approach, several authors have developed further discussions and modifications. In fact, $y_{k-1}^*$ itself was not our starting point; our purpose is to involve information about the angle between $g_k$ and $g_{k-1}$. From this point of view, $\beta_k^{WYL}$ has the following form:
(12) $\beta_k^{WYL} = \beta_k^{FR} (1 - \cos\theta_k)$,
where $\theta_k$ is the angle between $g_k$ and $g_{k-1}$. By multiplying $\beta_k^{FR}$ by $1 - \cos\theta_k$, the method not only has convergence properties similar to the FR method but also avoids jamming, similarly to the PR method.
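The identity (12) can be checked numerically: expanding $g_k^T \tilde{y}_{k-1}$ gives $\|g_k\|^2 - (\|g_k\|/\|g_{k-1}\|)\, g_k^T g_{k-1} = \|g_k\|^2 (1 - \cos\theta_k)$, so $\beta_k^{WYL} = \beta_k^{FR}(1 - \cos\theta_k)$. A small sketch (the random gradient vectors are placeholders of our own):

```python
import numpy as np

rng = np.random.default_rng(0)
g_k, g_prev = rng.standard_normal(5), rng.standard_normal(5)  # stand-ins for g_k, g_{k-1}

nk, nprev = np.linalg.norm(g_k), np.linalg.norm(g_prev)
y_tilde = g_k - (nk / nprev) * g_prev          # y~_{k-1} of formula (10)

beta_wyl = g_k @ y_tilde / nprev**2            # beta^WYL, formula (10)
cos_theta = g_k @ g_prev / (nk * nprev)        # cosine of the angle between g_k and g_{k-1}
beta_fr = nk**2 / nprev**2                     # beta^FR, formula (4)
# beta_wyl equals beta_fr * (1 - cos_theta), i.e. identity (12)
```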

Recently, Dai and Liao proposed a new conjugacy condition based on quasi-Newton techniques. According to this new conjugacy condition, the following formula is given:
(13) $\beta_k^{DL1} = \dfrac{g_k^T (y_{k-1} - t s_{k-1})}{d_{k-1}^T y_{k-1}}$,
where $t \ge 0$; for simplicity, we call the method with (13) the DL1 method. It is obvious that
(14) $\beta_k^{DL1} = \beta_k^{HS} - t \dfrac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}$.
If the line search is exact, the DL1 method has the same convergence properties as the PR method, which indicates that the DL1 method need not converge for general functions. To obtain global convergence, Dai and Liao replaced (13) by
(15) $\beta_k^{DL} = \max\{\beta_k^{HS}, 0\} - t \dfrac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}$.
Formula (15) can be considered a modified form of $\beta_k^{HS}$, obtained by adding the term $-t\, g_k^T s_{k-1} / d_{k-1}^T y_{k-1}$, which may carry some information about the Hessian $\nabla^2 f(x)$. From the convergence analysis, the nonnegativity restriction $\max\{\beta_k^{HS}, 0\}$ and the sufficient descent condition are essential for the global convergence results.

Motivated by the above discussion, in this paper we give the following formula for computing the parameter $\beta_k$:
(16) $\beta_k^{*} = \beta_k^{WYL} - t \dfrac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}$.

The formula $\beta_k^{*}$ can be considered a modification of $\beta_k^{WYL}$: by adding the term $-t\, g_k^T s_{k-1} / d_{k-1}^T y_{k-1}$, $\beta_k^{*}$ may capture some Hessian information. It can also be considered a modified form of $\beta_k^{DL}$ in which $\max\{\beta_k^{HS}, 0\}$ is replaced by $\beta_k^{WYL}$. We call the method defined by (2), (3), and (16) the WYLDL method and give the corresponding algorithm as follows.
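A direct transcription of (16) might look as follows. The function name and the test vectors in the usage below are our own illustrative choices, and no safeguards against a vanishing denominator are included.

```python
import numpy as np

def beta_wyldl(g, g_prev, d_prev, s_prev, t=0.1):
    """beta* of (16): the WYL term (10) minus the Dai-Liao-type correction
    t * g_k^T s_{k-1} / d_{k-1}^T y_{k-1}."""
    y = g - g_prev                                            # y_{k-1} = g_k - g_{k-1}
    y_tilde = g - (np.linalg.norm(g) / np.linalg.norm(g_prev)) * g_prev
    return g @ y_tilde / (g_prev @ g_prev) - t * (g @ s_prev) / (d_prev @ y)
```

For example, with $g_k = (1, 2)$, $g_{k-1} = (2, 0)$, $d_{k-1} = (-2, 0)$, and $s_{k-1} = \alpha_{k-1} d_{k-1}$ for $\alpha_{k-1} = 0.5$, the value works out by hand to $(5 - \sqrt{5})/4 + 0.05$.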

Algorithm 1 (WYLDL).


Step 1. Given $x_1 \in \mathbb{R}^n$ and $\varepsilon \ge 0$, set $d_1 = -g_1$ and $k = 1$; if $\|g_1\| \le \varepsilon$, stop.

Step 2. Compute $\alpha_k$ by the strong Wolfe-Powell line search.

Step 3. Let $x_{k+1} = x_k + \alpha_k d_k$ and $g_{k+1} = g(x_{k+1})$; if $\|g_{k+1}\| \le \varepsilon$, stop.

Step 4. Compute $\beta_{k+1}$ by (16) and generate $d_{k+1}$ by (3).

Step 5. Set $k = k + 1$ and go to Step 2.
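The five steps above can be sketched as a loop. This is only an illustrative sketch: Step 2 is replaced by simple Armijo backtracking rather than the strong Wolfe-Powell search the algorithm specifies, and the descent-restart safeguard is our own addition, not part of Algorithm 1.

```python
import numpy as np

def wyldl(f, grad, x1, t=0.1, eps=1e-5, max_iter=1000):
    """Sketch of Algorithm 1 (WYLDL) with a simplified line search."""
    x = np.asarray(x1, dtype=float)
    g = grad(x)
    d = -g                                    # Step 1: d_1 = -g_1
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:          # stopping test of Steps 1/3
            return x
        if g @ d >= 0:                        # safeguard (ours, not in the paper):
            d = -g                            # restart if d is not a descent direction
        # Step 2 (simplified): Armijo backtracking instead of strong Wolfe-Powell
        alpha = 1.0
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d) and alpha > 1e-12:
            alpha *= 0.5
        x_new = x + alpha * d                 # Step 3
        g_new = grad(x_new)
        y, s = g_new - g, x_new - x           # y_{k-1} and s_{k-1}
        # Step 4: beta* of (16), with the WYL term (10) written out
        y_tilde = g_new - (np.linalg.norm(g_new) / np.linalg.norm(g)) * g
        beta = g_new @ y_tilde / (g @ g) - t * (g_new @ s) / (d @ y)
        d = -g_new + beta * d                 # Step 4: d_{k+1} by (3)
        x, g = x_new, g_new                   # Step 5
    return x
```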

The convergence properties of Algorithm 1 will be discussed in Section 3.

3. Convergence Analysis

For conjugate gradient methods, the gradient of the objective function is required during the iteration process. We make the following basic assumptions on the objective function.

Assumption 2.

(i) The level set $\Gamma = \{x \in \mathbb{R}^n : f(x) \le f(x_1)\}$ is bounded; namely, there exists a constant $B > 0$ such that
(17) $\|x\| \le B$ for all $x \in \Gamma$.

(ii) In some neighborhood $N$ of $\Gamma$, $f$ is continuously differentiable and its gradient is Lipschitz continuous; namely, there exists a constant $L > 0$ such that
(18) $\|g(x) - g(y)\| \le L \|x - y\|$ for all $x, y \in N$.

Under the above assumptions on $f$, there exists a constant $\bar{\gamma} \ge 0$ such that
(19) $\|\nabla f(x)\| \le \bar{\gamma}$ for all $x \in \Gamma$.
Exact Line Search. Suppose that $d_k$ is a descent direction and the step length $\alpha_k$ is the solution of
(20) $\min_{\alpha > 0} f(x_k + \alpha d_k)$.
For exact line search,
(21) $g_k^T d_{k-1} = 0$.
From (16) and (21), we get that $\beta_k^{*} = \beta_k^{WYL}$. It has been proved that if the line search is exact, the method with $\beta_k^{WYL}$ is globally convergent for uniformly convex functions.

For conjugate gradient methods, the sufficient descent condition is significant for global convergence. We say the sufficient descent condition holds if there exists a constant $c > 0$ such that
(22) $g_k^T d_k \le -c \|g_k\|^2$ for all $k$.
In nonlinear optimization algorithms, the strong Wolfe-Powell conditions, namely,
(23) $f(x_k + \alpha_k d_k) - f(x_k) \le \delta \alpha_k g_k^T d_k$,
(24) $|g(x_k + \alpha_k d_k)^T d_k| \le \sigma |g_k^T d_k|$,
where $0 < \delta < \sigma < 1$, are often imposed on the line search.
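A trial step length can be tested against (23) and (24) directly. The helper below is a sketch; the default parameters $\delta = 10^{-4}$ and $\sigma = 10^{-1}$ are the values used in the numerical experiments of Section 4, and the quadratic in the usage is our own example.

```python
import numpy as np

def satisfies_swp(f, grad, x, d, alpha, delta=1e-4, sigma=0.1):
    """Return True if the trial step alpha satisfies the strong
    Wolfe-Powell conditions (23) and (24) at x along direction d."""
    g_d = grad(x) @ d
    armijo = f(x + alpha * d) - f(x) <= delta * alpha * g_d          # (23)
    curvature = abs(grad(x + alpha * d) @ d) <= sigma * abs(g_d)     # (24)
    return armijo and curvature
```

For $f(x) = \|x\|^2$, $x = (1, 0)$, and $d = -g(x)$, the curvature condition (24) forces $\alpha$ close to the exact minimizer $0.5$, while a short step such as $\alpha = 0.1$ satisfies only (23).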

Dai and Liao proved that if the directions $d_k$ satisfy the sufficient descent condition (22), the DL method is globally convergent under the strong Wolfe-Powell line search for general functions. In this section, we prove that the directions $d_k$ generated by Algorithm 1 satisfy the sufficient descent condition (22). Based on this result, the global convergence of Algorithm 1 is then established.

Lemma 3.

Suppose that the sequence $\{x_k\}$ is generated by Algorithm 1 and the step lengths $\alpha_k$ satisfy the strong Wolfe-Powell conditions (23) and (24) with $0 < \sigma < 1/4$. Then the generated directions $d_k$ satisfy the sufficient descent condition (22).

Proof.

We prove this result by induction. By (3), we have $g_k^T d_k = -\|g_k\|^2 + \beta_k^{WYLDL} g_k^T d_{k-1}$, and combining this equation with the definition of $\beta_k^{WYLDL}$ (using (12) and $s_{k-1} = \alpha_{k-1} d_{k-1}$), we can deduce that
(25) $\dfrac{g_k^T d_k}{\|g_k\|^2} = -1 + \dfrac{g_k^T d_{k-1}}{\|g_{k-1}\|^2} (1 - \cos\theta_k) - \dfrac{t \alpha_{k-1} (g_k^T d_{k-1})^2}{\|g_k\|^2 \, d_{k-1}^T y_{k-1}}$.
By the strong Wolfe-Powell condition (24), it follows that $d_{k-1}^T y_{k-1} \ge (\sigma - 1) g_{k-1}^T d_{k-1} > 0$, which means that
(26) $\dfrac{t \alpha_{k-1} (g_k^T d_{k-1})^2}{\|g_k\|^2 \, d_{k-1}^T y_{k-1}} \ge 0$.
From (25), the following inequality holds:
(27) $\dfrac{g_k^T d_k}{\|g_k\|^2} \le -1 + \dfrac{g_k^T d_{k-1}}{\|g_{k-1}\|^2} (1 - \cos\theta_k)$.
Inequality (27) and the Wolfe-Powell condition (24) give
(28) $\dfrac{g_k^T d_k}{\|g_k\|^2} \le -1 + \dfrac{-\sigma g_{k-1}^T d_{k-1}}{\|g_{k-1}\|^2} (1 - \cos\theta_k)$,
and, since $0 \le 1 - \cos\theta_k \le 2$,
(29) $\dfrac{g_k^T d_k}{\|g_k\|^2} \le -1 + \dfrac{-2\sigma g_{k-1}^T d_{k-1}}{\|g_{k-1}\|^2}$.
Repeated application of (29) yields
(30) $\dfrac{g_k^T d_k}{\|g_k\|^2} \le -2 + \sum_{j=0}^{k-1} (2\sigma)^j$.
Since
(31) $\sum_{j=0}^{k-1} (2\sigma)^j < \sum_{j=0}^{\infty} (2\sigma)^j = \dfrac{1}{1 - 2\sigma}$,
(30) can be expressed as
(32) $\dfrac{g_k^T d_k}{\|g_k\|^2} \le -2 + \dfrac{1}{1 - 2\sigma}$.
With the restriction $\sigma \in (0, 1/4)$ and $g_1^T d_1 = -\|g_1\|^2$, setting $c = (1 - 4\sigma)/(1 - 2\sigma) > 0$, inequality (32) gives
(33) $g_k^T d_k \le -c \|g_k\|^2$.

For the PR method, when a small step length occurs, $\beta_k^{PR}$ tends to zero, and the next search direction $d_k$ automatically approaches $-g_k$; in this way, the PR method automatically avoids jamming. This property was first studied by Gilbert and Nocedal and is called Property (*). We now show that the method with $\beta_k^{WYLDL}$ possesses Property (*).

Property 1.

(*) Consider a method of the form (2) and (3), and suppose that
(34) $0 < \gamma \le \|g_k\| \le \bar{\gamma}$ for all $k \ge 1$.
We say that the method has Property (*) if there exist constants $b > 1$ and $\lambda > 0$ such that $|\beta_k| \le b$ for all $k$, and if $\|s_{k-1}\| \le \lambda$, then $|\beta_k| \le 1/(2b)$.

Lemma 4.

Consider a method of the form (2) and (3). If $\beta_k$ is determined by $\beta_k^{WYLDL}$ and $\alpha_k$ satisfies the strong Wolfe-Powell condition (24), then the method possesses Property (*).

Proof.

By Lemma 3, we know that the sufficient descent condition (22) holds. Combining this with the Wolfe-Powell condition (24), we have
(35) $d_{k-1}^T y_{k-1} \ge (\sigma - 1) g_{k-1}^T d_{k-1} \ge (1 - \sigma) c \gamma^2$.
It follows from (17), (35), and the definition of $\beta_k^{WYLDL}$ that
(36) $|\beta_k^{WYLDL}| \le \dfrac{\|g_k\| \left( \|g_k - g_{k-1}\| + \left\| g_{k-1} - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1} \right\| \right)}{\|g_{k-1}\|^2} + \dfrac{t \|g_k\| \|s_{k-1}\|}{|d_{k-1}^T y_{k-1}|} \le \dfrac{2 \|g_k\| \|g_k - g_{k-1}\|}{\|g_{k-1}\|^2} + \dfrac{t \|g_k\| \|s_{k-1}\|}{c (1 - \sigma) \gamma^2} \le \dfrac{2 L \|g_k\| \|s_{k-1}\|}{\|g_{k-1}\|^2} + \dfrac{t \|g_k\| \|s_{k-1}\|}{c (1 - \sigma) \gamma^2} \le \dfrac{2 B \bar{\gamma} \left( 2 L c (1 - \sigma) + t \right)}{\gamma^2 c (1 - \sigma)}$.
Set
(37) $b = \dfrac{2 B \bar{\gamma} \left( 2 L c (1 - \sigma) + t \right)}{\gamma^2 c (1 - \sigma)}$,
so that $|\beta_k^{WYLDL}| \le b$. Setting
(38) $\lambda = \dfrac{\gamma^2 c (1 - \sigma)}{2 b \bar{\gamma} \left( 2 L c (1 - \sigma) + t \right)}$,
we have, whenever $\|s_{k-1}\| \le \lambda$,
(39) $|\beta_k^{WYLDL}| \le \dfrac{\|s_{k-1}\| \, \bar{\gamma} \left( 2 L c (1 - \sigma) + t \right)}{\gamma^2 c (1 - \sigma)} \le \dfrac{1}{2b}$.

For nonlinear conjugate gradient methods, Dai et al.  proposed the following general conclusion.

Lemma 5.

Suppose that Assumption 2 holds. Consider any conjugate gradient method in which $d_k$ is a descent direction and $\alpha_k$ is obtained by the strong Wolfe-Powell line search. If
(40) $\sum_{k \ge 1} \dfrac{1}{\|d_k\|^2} = \infty$,
then
(41) $\liminf_{k \to \infty} \|g_k\| = 0$.

By Lemma 3, we know that the method with β k WYLDL possesses the sufficient descent condition under the Wolfe-Powell line search. Combining with Lemma 5, we can have the following theorem.

Theorem 6.

Suppose that Assumption 2 holds. Consider the WYLDL method, where $\alpha_k$ is obtained by the strong Wolfe-Powell line search with $\sigma < 1/4$. If there exists a constant $\gamma > 0$ such that
(42) $\|g_k\| \ge \gamma$ for all $k \ge 1$,
then $d_k \ne 0$ and
(43) $\sum_{k \ge 2} \|u_k - u_{k-1}\|^2 < \infty$,
where $u_k = d_k / \|d_k\|$.

Proof.

Firstly, note that $d_k \ne 0$; otherwise, the sufficient descent condition (22) would fail. Therefore, $u_k$ is well defined. In addition, by relation (42) and Lemma 5, we have
(44) $\sum_{k \ge 1} \dfrac{1}{\|d_k\|^2} < \infty$.
Now, we divide the formula $\beta_k^{WYLDL}$ into two parts:
(45) $\beta_k^{1} = \beta_k^{WYL}$, $\quad \beta_k^{2} = -t \dfrac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}$,
and define
(46) $r_k = \dfrac{\vartheta_k}{\|d_k\|}$, $\quad \delta_k = \beta_k^{1} \dfrac{\|d_{k-1}\|}{\|d_k\|}$,
where $\vartheta_k = -g_k + \beta_k^{2} d_{k-1}$.

Then by (3), we have, for all $k \ge 2$,
(47) $u_k = r_k + \delta_k u_{k-1}$.
Using the identity $\|u_k\| = \|u_{k-1}\| = 1$ and (47), we obtain
(48) $\|r_k\| = \|u_k - \delta_k u_{k-1}\| = \|\delta_k u_k - u_{k-1}\|$.
Using the fact that $\delta_k = \beta_k^{WYL} \|d_{k-1}\| / \|d_k\| = (\|g_k\|^2 / \|g_{k-1}\|^2)(1 - \cos\theta_k) \|d_{k-1}\| / \|d_k\| \ge 0$, the triangle inequality, and (48), we obtain
(49) $\|u_k - u_{k-1}\| \le \|(1 + \delta_k) u_k - (1 + \delta_k) u_{k-1}\| \le \|u_k - \delta_k u_{k-1}\| + \|\delta_k u_k - u_{k-1}\| = 2 \|r_k\|$.
On the other hand, the line search condition (24) gives
(50) $y_{k-1}^T d_{k-1} \ge (\sigma - 1) g_{k-1}^T d_{k-1}$.
Relations (22), (24), and (50) imply that
(51) $\left| \dfrac{g_k^T d_{k-1}}{d_{k-1}^T y_{k-1}} \right| \le \dfrac{\sigma}{1 - \sigma}$.
It follows from the definition of $\vartheta_k$, (17), and (51) that
(52) $\|\vartheta_k\| \le \|g_k\| + t \left| \dfrac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}} \right| \|d_{k-1}\| = \|g_k\| + t \left| \dfrac{g_k^T d_{k-1}}{d_{k-1}^T y_{k-1}} \right| \|s_{k-1}\| \le \bar{\gamma} + \dfrac{2 B t \sigma}{1 - \sigma}$.
So we have
(53) $\sum_{k \ge 2} \|u_k - u_{k-1}\|^2 \le 4 \sum_{k \ge 2} \|r_k\|^2 \le 4 \sum_{k \ge 2} \dfrac{\|\vartheta_k\|^2}{\|d_k\|^2} \le 4 \left( \bar{\gamma} + \dfrac{2 B t \sigma}{1 - \sigma} \right)^2 \sum_{k \ge 2} \dfrac{1}{\|d_k\|^2} < \infty$.
Let $N^{*}$ denote the set of positive integers. For $\lambda > 0$ and a positive integer $\Delta$, denote
(54) $K_{k,\Delta}^{\lambda} = \{ i \in N^{*} : k \le i \le k + \Delta - 1, \ \|s_{i-1}\| > \lambda \}$,
and let $|K_{k,\Delta}^{\lambda}|$ denote the number of elements of $K_{k,\Delta}^{\lambda}$. Dai and Liao pointed out that, for a conjugate gradient method which satisfies

(i) Property (*);

(ii) the sufficient descent condition (22);

(iii) the conclusion of Theorem 6;

if (42) holds, then there cannot be too many small step sizes. This property is described as follows.

Lemma 7.

Suppose that Assumption 2 holds. Consider the WYLDL method, where $\alpha_k$ is obtained by the strong Wolfe-Powell line search with $\sigma < 1/4$. If (42) holds, then there exists $\lambda > 0$ such that, for any $\Delta \in N^{*}$ and any index $k_0$, there is an index $k \ge k_0$ such that
(55) $|K_{k,\Delta}^{\lambda}| > \dfrac{\Delta}{2}$.

Proof.

It follows from Lemmas 3 and 4 and Theorem 6 that the WYLDL method satisfies the above three conditions. So, according to Lemma 3.5 of Dai and Liao, Lemma 7 holds; we omit the detailed proof.

According to the above lemmas and theorems, we can prove the following convergence theorem for WYLDL method.

Theorem 8.

Suppose that Assumption 2 holds. Consider the WYLDL method, where $\alpha_k$ is obtained by the strong Wolfe-Powell line search with $\sigma < 1/4$. Then
(56) $\liminf_{k \to \infty} \|g_k\| = 0$.

Proof.

We prove this theorem by contradiction. If $\liminf_{k \to \infty} \|g_k\| > 0$, then (42) must hold, and the conditions of Theorem 6 and Lemma 7 are satisfied. Defining $u_i = d_i / \|d_i\|$, we have, for any indices $l, k$ with $l \ge k$,
(57) $x_l - x_{k-1} = \sum_{i=k}^{l} (x_i - x_{i-1}) = \sum_{i=k}^{l} \|s_{i-1}\| u_{i-1} = \sum_{i=k}^{l} \|s_{i-1}\| u_{k-1} + \sum_{i=k}^{l} \|s_{i-1}\| (u_{i-1} - u_{k-1})$.
Relation (57), $\|u_i\| = 1$, and (17) give
(58) $\sum_{i=k}^{l} \|s_{i-1}\| \le \|x_l - x_{k-1}\| + \sum_{i=k}^{l} \|s_{i-1}\| \|u_{i-1} - u_{k-1}\| \le 2B + \sum_{i=k}^{l} \|s_{i-1}\| \|u_{i-1} - u_{k-1}\|$.
Let $\lambda > 0$ be given by Lemma 7 and define $\Delta = \lceil 8B/\lambda \rceil$, the smallest integer not less than $8B/\lambda$. By Theorem 6, we can find an index $k_0 \ge 1$ such that
(59) $\sum_{i \ge k_0} \|u_i - u_{i-1}\|^2 \le \dfrac{1}{4\Delta}$.
With this $\Delta$ and $k_0$, Lemma 7 gives an index $k \ge k_0$ such that
(60) $|K_{k,\Delta}^{\lambda}| > \dfrac{\Delta}{2}$.
For any index $i \in [k, k + \Delta - 1]$, by the Cauchy-Schwarz inequality and (59),
(61) $\|u_i - u_{k-1}\| \le \sum_{j=k}^{i} \|u_j - u_{j-1}\| \le (i - k + 1)^{1/2} \left( \sum_{j=k}^{i} \|u_j - u_{j-1}\|^2 \right)^{1/2} \le \Delta^{1/2} \left( \dfrac{1}{4\Delta} \right)^{1/2} = \dfrac{1}{2}$.
From (60) and (61), taking $l = k + \Delta - 1$ in (58), we get
(62) $2B \ge \dfrac{1}{2} \sum_{i=k}^{k+\Delta-1} \|s_{i-1}\| > \dfrac{\lambda}{2} |K_{k,\Delta}^{\lambda}| > \dfrac{\lambda \Delta}{4}$.
Thus $\Delta < 8B/\lambda$, which contradicts the definition of $\Delta$. The proof is completed.

4. Numerical Experiments

In this section, we report the performance of Algorithm 1 (WYLDL) on a set of test problems. The codes were written in Fortran 77 in double precision arithmetic, and all tests were performed on the same PC. The experiments were performed on a set of 73 nonlinear unconstrained problems; some of the problems are from the CUTE library. For each test problem, we performed 10 numerical experiments with the number of variables $n = 1000, 2000, \ldots, 10000$.

In order to assess the reliability of the WYLDL algorithm, we also tested the DL and WYL methods on the same problems. All algorithms are terminated when $\|g_k\| \le 10^{-5}$. We also force-stopped the routines if the number of iterations exceeded 1000 or the number of function evaluations reached 2000. In the Wolfe-Powell line search conditions (23) and (24), the parameters are $\delta = 10^{-4}$ and $\sigma = 10^{-1}$. For the DL method, $t = 0.1$, the same value used by Dai and Liao. We also tested the WYLDL algorithm with $t = 0.1$, which turned out to be the best choice.

The comparison data comprise the number of iterations, the number of function and gradient evaluations, and the CPU time. To assess the performance of the WYLDL, WYL, and DL methods, we use the performance profiles of Dolan and Moré as an evaluation tool.

Dolan and Moré  gave a new tool to analyze the efficiency of algorithms. They introduced the notion of a performance profile as a means to evaluate and compare the performance of the set of solvers S on a test set P . Assuming that there exist n s solvers and n p problems, for each problem p and solver s , they defined that

$t_{p,s}$ = the computing cost (iterations, function and gradient evaluations, or CPU time) required to solve problem $p$ by solver $s$.

Requiring a baseline for comparisons, they compared the performance of solver $s$ on problem $p$ with the best performance by any solver on this problem, using the performance ratio
(63) $r_{p,s} = \dfrac{t_{p,s}}{\min\{t_{p,s} : s \in S\}}$.

Suppose that a parameter $M \ge r_{p,s}$ for all $p, s$ is chosen, and set $r_{p,s} = M$ if and only if solver $s$ does not solve problem $p$. Then they defined
(64) $\rho_s(\tau) = \dfrac{1}{n_p} \operatorname{size}\{ p \in P : r_{p,s} \le \tau \}$;
thus $\rho_s(\tau)$ is the probability for solver $s$ that the performance ratio $r_{p,s}$ is within a factor $\tau \ge 1$ of the best possible ratio. The function $\rho_s$ is the distribution function of the performance ratio; it is nondecreasing and piecewise constant. That is, for the subset of methods being analyzed, we plot the fraction of problems for which each method is within a factor $\tau$ of the best.
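The profile (63)-(64) is straightforward to compute from a table of costs. The following sketch is our own; it assumes costs are stored with `np.inf` marking failures (an infinite ratio exceeds any finite $\tau$, playing the role of $M$), and, as in the experiments below, problems on which every solver fails should be removed beforehand.

```python
import numpy as np

def performance_profile(T, taus):
    """T[p, s] = cost of solver s on problem p (np.inf for a failure).
    Returns rho[s, j] = fraction of problems with ratio r_{p,s} <= taus[j],
    i.e. rho_s(tau) of (64) with r_{p,s} from (63).
    Rows where every solver fails must be dropped before calling this."""
    best = T.min(axis=1, keepdims=True)   # best cost per problem
    R = T / best                          # performance ratios r_{p,s}
    return np.array([[np.mean(R[:, s] <= tau) for tau in taus]
                     for s in range(T.shape[1])])
```

For instance, with two solvers and three problems where solver 1 wins problem 1, solver 2 wins problem 2, and they tie on problem 3, each solver has $\rho(1) = 2/3$.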

For the test problems, if all three methods fail to terminate successfully, we discard the problem. If one method fails but the other methods terminate successfully, the performance ratio of the failed method is set to $M$, the maximum of the performance ratios. The performance profiles based on iterations, function and gradient evaluations, and CPU time of the three methods are plotted in Figures 1, 2, and 3, respectively.

Performance profiles based on iterations.

Performance profiles based on function and gradient evaluations.

Performance profiles based on CPU time.

Figure 1 plots the performance profile based on iterations. When $\tau = 1$, the DL method performs better than the WYL and WYLDL methods. As $\tau$ increases, for $\tau \ge 2.2$, the profiles of the WYLDL and WYL methods overtake that of the DL method. This means that, from the iteration point of view, the DL method is better on a subset of problems, but over all test problems the WYLDL method is more robust than the DL method.

Figure 2 plots the performance profile based on function and gradient evaluations; it can be seen that, for $\tau < 2$, the DL method performs better than the WYL and WYLDL methods. Compared with Figure 1, the differences between the methods are much smaller than in the iteration profile. A possible reason is that, for the WYLDL and WYL methods, the average number of function and gradient evaluations required per iteration is smaller than for the DL method. From this point of view, the CPU time consumed by the WYLDL or WYL method should be less than that of the DL method, since the CPU time mainly depends on function and gradient evaluations; Figure 3 validates this. From Figures 1 to 3, it is easy to see that the performances of the WYL and WYLDL methods are quite similar. A possible reason, we think, is that the second part of $\beta_k^{WYLDL}$, namely $-t\, g_k^T s_{k-1} / d_{k-1}^T y_{k-1}$ with $t = 0.1$, is very small compared with $\beta_k^{WYL}$. This may also be related to the Wolfe line search: since the line search used in this paper is based on the strategy of Lemaréchal, Fletcher, or Moré and Thuente, the directional derivative $|g_k^T d_{k-1}|$ may be very small.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This research was supported by the Guangxi Universities Foundation Grant no. 2013BYB210.

References

1. M. R. Hestenes and E. Stiefel, "Methods of conjugate gradients for solving linear systems," Journal of Research of the National Bureau of Standards, vol. 49, pp. 409-436, 1952.
2. R. Fletcher and C. M. Reeves, "Function minimization by conjugate gradients," The Computer Journal, vol. 7, pp. 149-154, 1964.
3. E. Polak and G. Ribière, "Note sur la convergence de méthodes de directions conjuguées," Revue Française d'Informatique et de Recherche Opérationnelle, vol. 3, no. 16, pp. 35-43, 1969.
4. Y. Dai, J. Han, G. Liu, D. Sun, H. Yin, and Y.-X. Yuan, "Convergence properties of nonlinear conjugate gradient methods," SIAM Journal on Optimization, vol. 10, no. 2, pp. 345-358, 2000.
5. W. W. Hager and H. C. Zhang, "A new conjugate gradient method with guaranteed descent and an efficient line search," SIAM Journal on Optimization, vol. 16, no. 1, pp. 170-192, 2005.
6. G. Zoutendijk, "Nonlinear programming, computational methods," in Integer and Nonlinear Programming, J. Abadie, Ed., pp. 37-86, North-Holland, Amsterdam, The Netherlands, 1970.
7. M. J. D. Powell, "Restart procedures for the conjugate gradient method," Mathematical Programming, vol. 12, no. 2, pp. 241-254, 1977.
8. M. J. D. Powell, "Nonconvex minimization calculations and the conjugate gradient method," in Numerical Analysis, vol. 1066 of Lecture Notes in Mathematics, pp. 122-141, Springer, Berlin, Germany, 1984.
9. Y. H. Dai and Y. Yuan, Nonlinear Conjugate Gradient Method, Shanghai Science and Technology Press, 2000.
10. M. Al-Baali, "Descent property and global convergence of the Fletcher-Reeves method with inexact line search," IMA Journal of Numerical Analysis, vol. 5, no. 1, pp. 121-124, 1985.
11. G. H. Liu, J. Y. Han, and H. X. Yin, "Global convergence of the Fletcher-Reeves algorithm with inexact linesearch," Applied Mathematics, vol. 10, no. 1, pp. 75-82, 1995.
12. Y. H. Dai and Y. Yuan, "Convergence properties of the Fletcher-Reeves method," IMA Journal of Numerical Analysis, vol. 16, no. 2, pp. 155-164, 1996.
13. J. C. Gilbert and J. Nocedal, "Global convergence properties of conjugate gradient methods for optimization," SIAM Journal on Optimization, vol. 2, no. 1, pp. 21-42, 1992.
14. G. Li, C. Tang, and Z. Wei, "New conjugacy condition and related new conjugate gradient methods for unconstrained optimization," Journal of Computational and Applied Mathematics, vol. 202, no. 2, pp. 523-539, 2007.
15. Z. X. Wei, G. Y. Li, and L. Q. Qi, "Global convergence of the Polak-Ribière-Polyak conjugate gradient method with an Armijo-type inexact line search for nonconvex unconstrained optimization problems," Mathematics of Computation, vol. 77, no. 264, pp. 2173-2193, 2008.
16. G. Yu, L. Guan, and G. Li, "Global convergence of modified Polak-Ribière-Polyak conjugate gradient methods with sufficient descent property," Journal of Industrial and Management Optimization, vol. 4, no. 3, pp. 565-579, 2008.
17. L. Grippo and S. Lucidi, "A globally convergent version of the Polak-Ribière conjugate gradient method," Mathematical Programming, vol. 78, no. 3, pp. 375-391, 1997.
18. Z. Wei, S. Yao, and L. Liu, "The convergence properties of some new conjugate gradient methods," Applied Mathematics and Computation, vol. 183, no. 2, pp. 1341-1350, 2006.
19. H. Huang, Z. Wei, and S. Yao, "The proof of the sufficient descent condition of the Wei-Yao-Liu conjugate gradient method under the strong Wolfe-Powell line search," Applied Mathematics and Computation, vol. 189, no. 2, pp. 1241-1245, 2007.
20. Y. Shengwei, Z. Wei, and H. Huang, "A note about WYL's conjugate gradient method and its applications," Applied Mathematics and Computation, vol. 191, no. 2, pp. 381-388, 2007.
21. H. Huang, S. Yao, and H. Lin, "A new conjugate gradient method based on HS-DY methods," Journal of Guangxi University of Technology, no. 4, pp. 63-66, 2008.
22. L. Zhang, "An improved Wei-Yao-Liu nonlinear conjugate gradient method for optimization computation," Applied Mathematics and Computation, vol. 215, no. 6, pp. 2269-2274, 2009.
23. L. Zhang and S. Jian, "Further studies on the Wei-Yao-Liu nonlinear conjugate gradient method," Applied Mathematics and Computation, vol. 219, no. 14, pp. 7616-7621, 2013.
24. Z. Dai and F. Wen, "Another improved Wei-Yao-Liu nonlinear conjugate gradient method with sufficient descent property," Applied Mathematics and Computation, vol. 218, no. 14, pp. 7421-7430, 2012.
25. Y.-H. Dai and L.-Z. Liao, "New conjugacy conditions and related nonlinear conjugate gradient methods," Applied Mathematics and Optimization, vol. 43, no. 1, pp. 87-101, 2001.
26. J. J. Moré, B. S. Garbow, and K. E. Hillstrom, "Testing unconstrained optimization software," ACM Transactions on Mathematical Software, vol. 7, no. 1, pp. 17-41, 1981.
27. I. Bongartz, A. R. Conn, N. Gould, and P. L. Toint, "CUTE: constrained and unconstrained testing environment," ACM Transactions on Mathematical Software, vol. 21, no. 1, pp. 123-160, 1995.
28. E. D. Dolan and J. J. Moré, "Benchmarking optimization software with performance profiles," Mathematical Programming, vol. 91, no. 2, pp. 201-213, 2002.
29. C. Lemaréchal, "A view of line-searches," in Optimization and Optimal Control, A. Auslander, W. Oettli, and J. Stoer, Eds., vol. 30 of Lecture Notes in Control and Information Science, pp. 59-78, Springer, Berlin, Germany, 1981.
30. R. Fletcher, Practical Methods of Optimization. Vol. 1: Unconstrained Optimization, John Wiley & Sons, New York, NY, USA, 1989.
31. J. Moré and D. J. Thuente, "Line search algorithms with guaranteed sufficient decrease," ACM Transactions on Mathematical Software, vol. 20, pp. 286-307, 1994.