A Hybrid of DL and WYL Nonlinear Conjugate Gradient Methods

Shengwei Yao^{1,2} (http://orcid.org/0000-0003-2369-6963) and Bin Qin^{1}

^{1} School of Information and Statistics, Guangxi University of Finance and Economics, Nanning 530003, China (gxufe.cn)

^{2} School of Science, East China University of Science and Technology, Shanghai 200237, China (ecust.edu.cn)

Research Article. Abstract and Applied Analysis, Volume 2014, Article ID 279891, Hindawi Publishing Corporation. ISSN 1687-0409, 1085-3375. DOI: 10.1155/2014/279891

Received 29 November 2013; Accepted 27 January 2014; Published 25 March 2014

Academic Editor: Abdon Atangana

Copyright 2014. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The conjugate gradient method is an efficient method for solving large-scale nonlinear optimization problems. In this paper, we propose a nonlinear conjugate gradient method which can be considered as a hybrid of DL and WYL conjugate gradient methods. The given method possesses the sufficient descent condition under the Wolfe-Powell line search and is globally convergent for general functions. Our numerical results show that the proposed method is very robust and efficient for the test problems.

1. Introduction

The nonlinear conjugate gradient (CG) method plays a special role in large-scale nonlinear optimization because of the simplicity of its iterations and its very low memory requirements. The CG method is not among the fastest or most robust optimization algorithms for nonlinear problems available today, but it remains very popular with engineers and mathematicians who are interested in solving large-scale problems. The nonlinear conjugate gradient method is an extension of the linear conjugate gradient method. The first linear conjugate gradient method was proposed by Hestenes and Stiefel in 1952 [1] for solving linear equations. In 1964, Fletcher and Reeves [2] extended it to nonlinear problems and obtained the first nonlinear conjugate gradient method (the FR method).

In this paper, we focus on solving the following unconstrained nonlinear optimization problem by the conjugate gradient method:
$$\min \{ f(x) : x \in \mathbb{R}^n \}, \tag{1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a smooth nonlinear function whose gradient is denoted by $g(x)$. The iterative formula of the conjugate gradient method is
$$x_k = x_{k-1} + \alpha_{k-1} d_{k-1}, \tag{2}$$
where $d_{k-1}$ is the search direction at $x_{k-1}$ and $\alpha_{k-1}$ is the step length. For the nonlinear conjugate gradient method, $d_k$ is computed by
$$d_k = -g_k + \beta_k d_{k-1}, \qquad d_1 = -g_1, \tag{3}$$
where $\beta_k$ is a scalar and $g_k = \nabla f(x_k)$ denotes the gradient. Different conjugate gradient methods correspond to different ways of computing $\beta_k$. Some well-known formulae for $\beta_k$ are the Fletcher-Reeves (FR), Polak-Ribière (PR), Hestenes-Stiefel (HS), Dai-Yuan (DY), and CG-DESCENT formulae, which are given, respectively, by
$$\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2} \quad \text{(see [2])}, \tag{4}$$
$$\beta_k^{PR} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2} \quad \text{(see [3])}, \tag{5}$$
$$\beta_k^{HS} = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}} \quad \text{(see [1])}, \tag{6}$$
$$\beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^T y_{k-1}} \quad \text{(see [4])}, \tag{7}$$
$$\beta_k^{N} = \left( y_{k-1} - 2 d_{k-1} \frac{\|y_{k-1}\|^2}{d_{k-1}^T y_{k-1}} \right)^T \frac{g_k}{d_{k-1}^T y_{k-1}} \quad \text{(see [5])}, \tag{8}$$
where $y_{k-1} = g_k - g_{k-1}$ denotes the gradient change.
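To make the formulae (4)-(8) concrete, the following minimal sketch evaluates all five values of $\beta_k$ from $g_k$, $g_{k-1}$, and $d_{k-1}$. The function name `cg_betas`, the helper `dot`, and the sample vectors are hypothetical choices made for illustration; the data are picked so that $g_k$ is orthogonal to both $g_{k-1}$ and $d_{k-1}$, as happens after an exact line search on a quadratic, in which case the five formulae coincide.

```python
def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def cg_betas(g_new, g_old, d_old):
    """Return beta_k for the FR, PR, HS, DY and CG-DESCENT (N) formulae (4)-(8)."""
    y = [a - b for a, b in zip(g_new, g_old)]      # y_{k-1} = g_k - g_{k-1}
    dy = dot(d_old, y)                             # d_{k-1}^T y_{k-1}
    gg_old = dot(g_old, g_old)                     # ||g_{k-1}||^2
    # Vector inside the parentheses of (8): y - 2 d ||y||^2 / (d^T y)
    w = [yi - 2.0 * di * dot(y, y) / dy for yi, di in zip(y, d_old)]
    return {
        "FR": dot(g_new, g_new) / gg_old,
        "PR": dot(g_new, y) / gg_old,
        "HS": dot(g_new, y) / dy,
        "DY": dot(g_new, g_new) / dy,
        "N": dot(w, g_new) / dy,
    }


# Illustrative data (hypothetical): first CG step, so d_old = -g_old,
# with g_new orthogonal to g_old and d_old.
g_old = [1.0, 0.0]
d_old = [-1.0, 0.0]
g_new = [0.0, 0.5]
betas = cg_betas(g_new, g_old, d_old)
for name, value in betas.items():
    print(name, value)
```

For general (non-orthogonal) data the five values differ, which is where the methods start to behave differently.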

Although all these methods are equivalent in the linear case, namely, when $f$ is a strictly convex quadratic function and the step lengths $\alpha_k$ are determined by exact line search, their behavior for general objective functions may differ considerably. For general functions, Zoutendijk [6] proved the global convergence of the FR method with exact line search. (Here and throughout this paper, global convergence means that the sequence generated by the method either terminates after finitely many steps or contains a subsequence that converges to a stationary point of the objective function from a given initial point.) Although its global convergence properties are satisfactory, the FR method performs much worse than the PR (HS) method in real computations. Powell [7] analyzed a major numerical drawback of the FR method: if a small step is generated away from the solution point, the subsequent steps may also be very short. On the other hand, in practical computation the HS method resembles the PR method, and both are generally believed to be the most efficient conjugate gradient methods, since they essentially perform a restart when a bad direction occurs. However, Powell [8] constructed a counterexample showing that the PR and HS methods can cycle infinitely without approaching the solution; similar counterexamples were also constructed by Dai and Yuan [9]. These examples show that the two methods are not globally convergent for general functions. Therefore, over the past few years, much effort has been devoted to finding new formulas for conjugate gradient methods that are not only globally convergent for general functions but also perform well numerically.

From the structure of the above formulae for $\beta_k$, we see that $\beta_k^{FR}$ and $\beta_k^{DY}$ share the numerator $\|g_k\|^2$. The corresponding methods are globally convergent if the gradient of the objective function is Lipschitz continuous and the level set $\{ x \in \mathbb{R}^n : f(x) < f(x_0) \}$ is bounded. For inexact line search, Al-Baali [10] proved the global convergence of the FR method under the strong Wolfe-Powell line search with the restriction $\sigma < 1/2$. Based on Al-Baali's result, Liu et al. [11] extended the global convergence of the FR method to the case $\sigma = 1/2$. Dai and Yuan [12] proved that the sufficient descent condition must hold for one of the directions $d_{k+1}$ and $d_k$, and established the global convergence of the FR method with general Wolfe line searches.

The formulae $\beta_k^{PR}$ and $\beta_k^{HS}$ share the numerator $g_k^T y_{k-1}$. They possess a built-in restart feature that avoids the jamming problem: when the step $x_k - x_{k-1}$ is small, the factor $y_{k-1}$ in the numerator of $\beta_k$ tends to zero, so the next search direction $d_k$ is essentially the steepest descent direction $-g_k$. Consequently, the numerical performance of these methods is better than that of the methods with $\|g_k\|^2$ in the numerator of $\beta_k$. In [3], Polak and Ribière proved that the PR method is globally convergent if the objective function is strongly convex and the line search is exact. For general functions, Powell [7, 8] analyzed the convergence properties of the PR method and constructed an example showing that the PR method may cycle infinitely between nonstationary points. To obtain global convergence, Gilbert and Nocedal [13] imposed the following nonnegativity restriction on $\beta_k$:
$$\beta_k^{PR+} = \max \{ \beta_k^{PR}, 0 \}. \tag{9}$$

Generally speaking, methods with numerator $\|g_k\|^2$ possess better convergence properties than methods with numerator $g_k^T y_{k-1}$, whereas, from the point of view of numerical performance, methods with numerator $g_k^T y_{k-1}$ outperform methods with numerator $\|g_k\|^2$. Hence, over the past decades, much effort has been devoted to finding methods that combine nice convergence properties with efficient numerical performance. In [14], the authors proposed a new conjugacy condition that makes use of not only gradient values but also function values, and, based on this condition, a class of nonlinear conjugate gradient methods. The PR method outperforms many methods in numerical experiments, but it does not satisfy the sufficient descent condition. Therefore, some modified forms of the PR method have been studied in [15, 16]; these methods satisfy the sufficient descent condition and are globally convergent for general functions.

2. Motivations and the New Formula

Since the PR method is considered one of the most efficient nonlinear conjugate gradient methods, much effort has been devoted to its convergence properties and its modifications. In [13], under the sufficient descent assumption, Gilbert and Nocedal proved the global convergence of the PR+ method with the Wolfe line search. Grippo and Lucidi [17] constructed an Armijo-type line search and proved that, under this line search, the directions $d_k$ generated by the PR method satisfy the sufficient descent condition.

Recently, Wei et al. [18] and Huang et al. [19] gave a modified formula for computing $\beta_k$ as follows:
$$\beta_k^{WYL} = \frac{g_k^T \tilde{y}_{k-1}}{\|g_{k-1}\|^2}, \tag{10}$$
where $\tilde{y}_{k-1} = g_k - \dfrac{\|g_k\|}{\|g_{k-1}\|} g_{k-1}$.

The method with formula $\beta_k^{WYL}$ not only has nice numerical results but also satisfies the sufficient descent condition and is globally convergent under the strong Wolfe-Powell (SWP) line search. From the structure of $\beta_k^{WYL}$, we see that the method can also avoid jamming: when the step $x_k - x_{k-1}$ is small, $\|g_k\| / \|g_{k-1}\|$ tends to 1 and the next search direction tends to the steepest descent direction, as in the PR method. In addition, the WYL method has some advantages of its own: under the strong Wolfe-Powell line search, $\beta_k^{WYL} \ge 0$, and if the parameter $\sigma \le 1/4$ in SWP, the WYL method satisfies the sufficient descent condition, from which its global convergence follows.

In [20, 21], Shengwei et al. and Huang et al. extended this modification to the HS method as follows:
$$\beta_k^{MHS} = \frac{g_k^T y_{k-1}^*}{d_{k-1}^T y_{k-1}}, \qquad y_{k-1}^* = g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1}. \tag{11}$$
The formulae $\beta_k^{WYL}$ and $\beta_k^{MHS}$ can be regarded as modifications of $\beta_k^{PR}$ and $\beta_k^{HS}$, respectively, obtained by replacing $y_{k-1}$ in the numerator with $y_{k-1}^*$ (the same vector as $\tilde{y}_{k-1}$ in (10)). In [20, 21], the corresponding methods are proved to be globally convergent for general functions under the strong Wolfe-Powell line search and the Grippo-Lucidi line search. Following the same approach, other authors presented further discussions and modifications in [22–24]. In fact, the vector $y_{k-1}^*$ was not our original aim; our purpose was to incorporate information about the angle between $g_k$ and $g_{k-1}$. From this point of view, $\beta_k^{WYL}$ has the following form:
$$\beta_k^{WYL} = \beta_k^{FR} \left( 1 - \cos \bar{\theta}_k \right), \tag{12}$$
where $\bar{\theta}_k$ is the angle between $g_k$ and $g_{k-1}$. By multiplying $\beta_k^{FR}$ by $1 - \cos \bar{\theta}_k$, the method not only has convergence properties similar to those of the FR method but also avoids jamming in a way similar to the PR method.
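The equivalence between the definition (10) and the angle form (12) can be checked numerically. The sketch below is only an illustration: the function names `beta_wyl` and `beta_wyl_angle` and the two sample gradients are hypothetical, and the check is carried out for one pair of vectors rather than in general.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def beta_wyl(g_new, g_old):
    """beta_k^WYL from (10): g_k^T (g_k - (||g_k||/||g_{k-1}||) g_{k-1}) / ||g_{k-1}||^2."""
    ratio = norm(g_new) / norm(g_old)
    y_tilde = [a - ratio * b for a, b in zip(g_new, g_old)]
    return dot(g_new, y_tilde) / dot(g_old, g_old)


def beta_wyl_angle(g_new, g_old):
    """beta_k^WYL from (12): beta_k^FR * (1 - cos(angle between g_k and g_{k-1}))."""
    cos_theta = dot(g_new, g_old) / (norm(g_new) * norm(g_old))
    beta_fr = dot(g_new, g_new) / dot(g_old, g_old)
    return beta_fr * (1.0 - cos_theta)


# Hypothetical sample gradients.
g_old = [3.0, 4.0]
g_new = [1.0, -2.0]
lhs = beta_wyl(g_new, g_old)
rhs = beta_wyl_angle(g_new, g_old)
print(lhs, rhs)
```

Since $1 - \cos \bar{\theta}_k \in [0, 2]$, the angle form also makes the bounds $0 \le \beta_k^{WYL} \le 2 \beta_k^{FR}$ visible at a glance.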

Recently, Dai and Liao [25] proposed a new conjugacy condition based on quasi-Newton techniques. According to this conjugacy condition, the following formula is given:
$$\beta_k^{DL1} = \frac{g_k^T ( y_{k-1} - t s_{k-1} )}{d_{k-1}^T y_{k-1}}, \tag{13}$$
where $t \ge 0$ and $s_{k-1} = x_k - x_{k-1}$. For simplicity, we call the method with (13) the DL1 method. Obviously,
$$\beta_k^{DL1} = \beta_k^{HS} - t \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}. \tag{14}$$
As shown in [25], if the line search is exact, the DL1 method has the same convergence properties as the PR method, which indicates that the DL1 method need not converge for general functions. To obtain global convergence, Dai and Liao replaced (13) by
$$\beta_k^{DL} = \max \{ \beta_k^{HS}, 0 \} - t \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}. \tag{15}$$
Formula (15) can be regarded as a modified form of $\beta_k^{HS}$, obtained by adding the term $-t \, g_k^T s_{k-1} / ( d_{k-1}^T y_{k-1} )$, which may contain some information about the Hessian $\nabla^2 f(x)$ [25]. In the convergence analysis of [25], the nonnegativity restriction $\max \{ \beta_k^{HS}, 0 \}$ and the sufficient descent condition are essential for the global convergence results.

Motivated by the above discussion, in this paper, we give the following formula to compute the parameter $\beta_k$:
$$\beta_k^* = \beta_k^{WYL} - t \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}. \tag{16}$$

The formula $\beta_k^*$ can be regarded as a modification of $\beta_k^{WYL}$: by adding the term $-t \, g_k^T s_{k-1} / ( d_{k-1}^T y_{k-1} )$, $\beta_k^*$ may contain some Hessian information [25]. It can also be regarded as a modified form of $\beta_k^{DL}$ in which $\max \{ \beta_k^{HS}, 0 \}$ is replaced by $\beta_k^{WYL}$. We call the method defined by (2), (3), and (16) the WYLDL method (in the analysis below, $\beta_k^*$ is also written $\beta_k^{WYLDL}$) and give the corresponding algorithm as follows.

Algorithm 1 (WYLDL).

Step 1. Given $x_1 \in \mathbb{R}^n$ and $\varepsilon \ge 0$, set $d_1 = -g_1$ and $k = 1$; if $\|g_1\| \le \varepsilon$, then stop;

Step 2. Compute $\alpha_k$ by the strong Wolfe-Powell line search;

Step 3. Let $x_{k+1} = x_k + \alpha_k d_k$ and $g_{k+1} = g(x_{k+1})$; if $\|g_{k+1}\| \le \varepsilon$, then stop;

Step 4. Compute $\beta_{k+1}$ by (16) and generate $d_{k+1}$ by (3);

Step 5. Set $k := k + 1$ and go to Step 2.

The convergence properties of Algorithm 1 will be discussed in Section 3.
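The steps of Algorithm 1 can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the Fortran 77 code used for the experiments: the line search `strong_wolfe` is a simple bracketing and bisection routine written for this sketch (the paper relies on other line search strategies), and all function names, the default parameters, and the quadratic test function are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def axpy(a, x, y):
    """Return a*x + y for lists x and y."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def strong_wolfe(f, grad, x, d, delta=1e-4, sigma=0.1, max_iter=50):
    """Bracketing/bisection search for a step satisfying (23) and (24).

    Assumes d is a descent direction; returns the last trial step if
    max_iter is exhausted (no guarantee in that case).
    """
    f0, slope0 = f(x), dot(grad(x), d)
    lo, hi, alpha = 0.0, None, 1.0
    for _ in range(max_iter):
        xa = axpy(alpha, d, x)
        slope = dot(grad(xa), d)
        if f(xa) - f0 > delta * alpha * slope0:
            hi = alpha                    # (23) fails: step too long
        elif abs(slope) <= sigma * abs(slope0):
            return alpha                  # (23) and (24) both hold
        elif slope < 0:
            lo = alpha                    # still descending: step too short
        else:
            hi = alpha                    # positive, too steep: step too long
        alpha = 2.0 * alpha if hi is None else 0.5 * (lo + hi)
    return alpha


def wyldl(f, grad, x, t=0.1, eps=1e-5, max_iter=1000):
    """Sketch of Algorithm 1; returns the final iterate and iteration count."""
    g = grad(x)
    d = [-gi for gi in g]                           # Step 1: d_1 = -g_1
    iterations = 0
    while norm(g) > eps and iterations < max_iter:
        alpha = strong_wolfe(f, grad, x, d)         # Step 2
        x_new = axpy(alpha, d, x)                   # Step 3
        g_new = grad(x_new)
        s = [alpha * di for di in d]                # s = x_new - x
        y = [a - b for a, b in zip(g_new, g)]       # y = g_new - g
        ratio = norm(g_new) / norm(g)
        y_tilde = [a - ratio * b for a, b in zip(g_new, g)]
        beta = dot(g_new, y_tilde) / dot(g, g) - t * dot(g_new, s) / dot(d, y)  # (16)
        d = axpy(beta, d, [-gi for gi in g_new])    # Step 4: direction update (3)
        x, g = x_new, g_new                         # Step 5
        iterations += 1
    return x, iterations


# Hypothetical test problem: a strictly convex quadratic with minimizer (1, -2).
def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


x_star, n_iter = wyldl(f, grad, [0.0, 0.0])
print(x_star, n_iter)
```

The default `sigma=0.1` respects the restriction $\sigma < 1/4$ required by the convergence analysis, which also guarantees $d_{k-1}^T y_{k-1} > 0$ in the denominator of (16) whenever the line search succeeds.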

3. Convergence Analysis

For conjugate gradient methods, during the iteration process, the gradient of the objective function is required. We make the following basic assumptions on the objective functions.

Assumption 2.

(i) The level set $\Gamma = \{ x \in \mathbb{R}^n : f(x) \le f(x_1) \}$ is bounded; namely, there exists a constant $B > 0$ such that
$$\|x\| \le B, \quad \forall x \in \Gamma. \tag{17}$$

(ii) In some neighborhood $N$ of $\Gamma$, $f$ is continuously differentiable and its gradient is Lipschitz continuous; namely, there exists a constant $L > 0$ such that
$$\|g(x) - g(y)\| \le L \|x - y\|, \quad \forall x, y \in N. \tag{18}$$

Under the above assumptions on $f$, there exists a constant $\bar{\gamma} \ge 0$ such that
$$\|\nabla f(x)\| \le \bar{\gamma}, \quad \forall x \in \Gamma. \tag{19}$$

Exact Line Search. Suppose that $d_k$ is a descent direction and that the step length $\alpha_k$ is the solution of
$$\min_{\alpha > 0} f(x_k + \alpha d_k). \tag{20}$$
For exact line search,
$$g_k^T d_{k-1} = 0. \tag{21}$$
From (16) and (21), we obtain $\beta_k^* = \beta_k^{WYL}$. In [18], the authors proved that, if the line search is exact, the method with $\beta_k^{WYL}$ is globally convergent for uniformly convex functions.
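The reduction of $\beta_k^*$ to $\beta_k^{WYL}$ under exact line search can be spelled out as follows, using only $s_{k-1} = x_k - x_{k-1} = \alpha_{k-1} d_{k-1}$ from (2) together with (21). The denominator is nonzero because, for an exact line search along a descent direction, $d_{k-1}^T y_{k-1} = g_k^T d_{k-1} - g_{k-1}^T d_{k-1} = -g_{k-1}^T d_{k-1} > 0$.

$$g_k^T s_{k-1} = \alpha_{k-1} \, g_k^T d_{k-1} = 0 \quad \Longrightarrow \quad \beta_k^* = \beta_k^{WYL} - t \, \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}} = \beta_k^{WYL}.$$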

For conjugate gradient methods, the sufficient descent condition is important for global convergence. We say that the sufficient descent condition holds if there exists a constant $c > 0$ such that
$$g_k^T d_k \le -c \|g_k\|^2, \quad \forall k. \tag{22}$$
In nonlinear optimization algorithms, the strong Wolfe-Powell conditions, namely,
$$f(x_k + \alpha_k d_k) - f(x_k) \le \delta \alpha_k g_k^T d_k, \tag{23}$$
$$| g(x_k + \alpha_k d_k)^T d_k | \le \sigma | g_k^T d_k |, \tag{24}$$
where $0 < \delta < \sigma < 1$, are often imposed on the line search.
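Conditions (23) and (24) are easy to test for a given trial step. The following sketch checks both; the function name `satisfies_strong_wolfe`, the default parameter values, and the one-dimensional example $f(x) = x^2$ are hypothetical and serve only as an illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def satisfies_strong_wolfe(f, grad, x, d, alpha, delta=1e-4, sigma=0.1):
    """Return True if the step alpha satisfies both (23) and (24)."""
    slope0 = dot(grad(x), d)                                  # g_k^T d_k
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    sufficient_decrease = f(x_new) - f(x) <= delta * alpha * slope0       # (23)
    curvature = abs(dot(grad(x_new), d)) <= sigma * abs(slope0)           # (24)
    return sufficient_decrease and curvature


# Hypothetical example: f(x) = x^2 from x = 1 along the steepest descent direction.
def f(x):
    return x[0] ** 2


def grad(x):
    return [2.0 * x[0]]


x, d = [1.0], [-2.0]
exact_step = satisfies_strong_wolfe(f, grad, x, d, alpha=0.5)   # lands on the minimizer
short_step = satisfies_strong_wolfe(f, grad, x, d, alpha=0.1)   # slope still too steep
long_step = satisfies_strong_wolfe(f, grad, x, d, alpha=1.0)    # no decrease in f
print(exact_step, short_step, long_step)
```

In this example the short step fails only the curvature condition (24) and the long step fails the sufficient decrease condition (23), which shows how the two conditions bound the step from below and from above.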

In [25], Dai and Liao proved that, if the directions $d_k$ satisfy the sufficient descent condition (22), the DL method is globally convergent for general functions under the strong Wolfe-Powell line search. In this section, we prove that the directions $d_k$ generated by Algorithm 1 satisfy the sufficient descent condition (22). Based on this result, the global convergence of Algorithm 1 is established.

Lemma 3.

Suppose that the sequence $\{x_k\}$ is generated by Algorithm 1 and that the step length $\alpha_k$ satisfies the strong Wolfe-Powell conditions (23) and (24). If $0 < \sigma < 1/4$, then the generated directions $d_k$ satisfy the sufficient descent condition (22).

Proof.

We prove this result by induction. By (3), we have $g_k^T d_k = -\|g_k\|^2 + \beta_k^{WYLDL} g_k^T d_{k-1}$. Combining this equation with the definition (16) of $\beta_k^{WYLDL}$ (written $\beta_k^*$ there), the form (12) of $\beta_k^{WYL}$, and $s_{k-1} = \alpha_{k-1} d_{k-1}$, we deduce that
$$\frac{g_k^T d_k}{\|g_k\|^2} = -1 + \frac{g_k^T d_{k-1}}{\|g_{k-1}\|^2} \left( 1 - \cos \bar{\theta}_k \right) - t \alpha_{k-1} \frac{( g_k^T d_{k-1} )^2}{\|g_k\|^2 \, d_{k-1}^T y_{k-1}}. \tag{25}$$
By the strong Wolfe-Powell condition (24) and the descent property of $d_{k-1}$ (the induction hypothesis), it follows that $d_{k-1}^T y_{k-1} \ge ( \sigma - 1 ) g_{k-1}^T d_{k-1} > 0$, which means that
$$t \alpha_{k-1} \frac{( g_k^T d_{k-1} )^2}{\|g_k\|^2 \, d_{k-1}^T y_{k-1}} \ge 0. \tag{26}$$
From (25) and (26), the following inequality holds:
$$\frac{g_k^T d_k}{\|g_k\|^2} \le -1 + \frac{g_k^T d_{k-1}}{\|g_{k-1}\|^2} \left( 1 - \cos \bar{\theta}_k \right). \tag{27}$$
Inequality (27) and the strong Wolfe-Powell condition (24) imply that
$$\frac{g_k^T d_k}{\|g_k\|^2} \le -1 + \frac{-\sigma g_{k-1}^T d_{k-1}}{\|g_{k-1}\|^2} \left( 1 - \cos \bar{\theta}_k \right), \tag{28}$$
and hence, since $1 - \cos \bar{\theta}_k \le 2$,
$$\frac{g_k^T d_k}{\|g_k\|^2} \le -1 + \frac{-2 \sigma g_{k-1}^T d_{k-1}}{\|g_{k-1}\|^2}. \tag{29}$$
Repeating (29) yields
$$\frac{g_k^T d_k}{\|g_k\|^2} \le -2 + \sum_{j=0}^{k-1} ( 2 \sigma )^j. \tag{30}$$
Since
$$\sum_{j=0}^{k-1} ( 2 \sigma )^j < \sum_{j=0}^{\infty} ( 2 \sigma )^j = \frac{1}{1 - 2 \sigma}, \tag{31}$$
(30) can be expressed as
$$\frac{g_k^T d_k}{\|g_k\|^2} \le -2 + \frac{1}{1 - 2 \sigma}. \tag{32}$$
With the restriction $\sigma \in ( 0, 1/4 )$ and $g_1^T d_1 = -\|g_1\|^2$, inequality (32) means that, for $c = ( 1 - 4 \sigma ) / ( 1 - 2 \sigma ) > 0$,
$$g_k^T d_k \le -c \|g_k\|^2. \tag{33}$$
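As a worked instance of the constant in (33), take the value $\sigma = 10^{-1}$ that is used for the line search in the numerical experiments of Section 4. The right-hand side of (32) then evaluates as follows, and the restriction $\sigma < 1/4$ is exactly what keeps $c$ positive.

$$-2 + \frac{1}{1 - 2 \sigma} = -\frac{1 - 4 \sigma}{1 - 2 \sigma}, \qquad \sigma = 0.1: \quad -2 + \frac{1}{0.8} = -0.75, \qquad c = \frac{1 - 0.4}{1 - 0.2} = 0.75.$$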

For the PR method, when a small step length occurs, $\beta_k^{PR}$ tends to zero, and the next search direction $d_k$ automatically approaches $-g_k$. In this way, the PR method automatically avoids jamming. This property, called Property (*), was first studied by Gilbert and Nocedal [13]. We now show that the method with $\beta_k^{WYLDL}$ possesses Property (*).

Property 1.

(Property (*)) Consider a method of the form (2) and (3), and suppose that
$$0 < \gamma \le \|g_k\| \le \bar{\gamma}, \quad \forall k \ge 1. \tag{34}$$
We say that the method has Property (*) if there exist constants $b > 1$ and $\lambda > 0$ such that, for all $k$, $|\beta_k| \le b$, and $\|s_{k-1}\| \le \lambda$ implies $|\beta_k| \le 1/(2b)$.

Lemma 4.

Consider a method of the form (2) and (3). If $\beta_k$ is determined by $\beta_k^{WYLDL}$ and $\alpha_k$ satisfies the strong Wolfe-Powell conditions (23) and (24) with $0 < \sigma < 1/4$, then the method possesses Property (*).

Proof.

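By Lemma 3, the sufficient descent condition (22) holds. Combining it with the strong Wolfe-Powell condition (24) and (34), we have
$$d_{k-1}^T y_{k-1} \ge ( \sigma - 1 ) g_{k-1}^T d_{k-1} \ge ( 1 - \sigma ) c \|g_{k-1}\|^2 \ge ( 1 - \sigma ) c \gamma^2. \tag{35}$$
It follows from (17), (18), (34), (35), and the definition of $\beta_k^{WYLDL}$ that
$$\begin{aligned} | \beta_k^{WYLDL} | &\le \frac{\|g_k\| \left( \|g_k - g_{k-1}\| + \left\| g_{k-1} - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1} \right\| \right)}{\|g_{k-1}\|^2} + t \frac{\|g_k\| \|s_{k-1}\|}{| d_{k-1}^T y_{k-1} |} \\ &\le \frac{\|g_k\| \left( 2 \|g_k - g_{k-1}\| \right)}{\|g_{k-1}\|^2} + t \frac{\|g_k\| \|s_{k-1}\|}{c ( 1 - \sigma ) \|g_{k-1}\|^2} \\ &\le \frac{\|g_k\| \left( 2 L \|s_{k-1}\| \right)}{\|g_{k-1}\|^2} + t \frac{\|g_k\| \|s_{k-1}\|}{c ( 1 - \sigma ) \|g_{k-1}\|^2} \\ &\le \frac{2 B \bar{\gamma} \left( c ( 1 - \sigma ) 2 L + t \right)}{\gamma^2 c ( 1 - \sigma )}. \end{aligned} \tag{36}$$
Set
$$b := \frac{2 B \bar{\gamma} \left( c ( 1 - \sigma ) 2 L + t \right)}{\gamma^2 c ( 1 - \sigma )}, \tag{37}$$
so that $| \beta_k^{WYLDL} | \le b$. By setting
$$\lambda := \frac{\gamma^2 c ( 1 - \sigma )}{2 b \bar{\gamma} \left( c ( 1 - \sigma ) 2 L + t \right)}, \tag{38}$$
we have, whenever $\|s_{k-1}\| \le \lambda$,
$$| \beta_k^{WYLDL} | \le \|s_{k-1}\| \frac{\bar{\gamma} \left( c ( 1 - \sigma ) 2 L + t \right)}{\gamma^2 c ( 1 - \sigma )} \le \frac{1}{2 b}. \tag{39}$$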

For nonlinear conjugate gradient methods, Dai et al. [4] established the following general result.

Lemma 5.

Suppose that Assumption 2 holds. Consider any conjugate gradient method in which $d_k$ is a descent direction and $\alpha_k$ is obtained by the strong Wolfe-Powell line search. If
$$\sum_{k \ge 1} \frac{1}{\|d_k\|^2} = \infty, \tag{40}$$
then
$$\liminf_{k \to \infty} \|g_k\| = 0. \tag{41}$$

By Lemma 3, the method with $\beta_k^{WYLDL}$ satisfies the sufficient descent condition under the strong Wolfe-Powell line search. Combining this with Lemma 5, we obtain the following theorem.

Theorem 6.

Suppose that Assumption 2 holds. Consider the WYLDL method, where $\alpha_k$ is obtained by the strong Wolfe-Powell line search with $\sigma < 1/4$. If there exists a constant $\gamma > 0$ such that
$$\|g_k\| \ge \gamma, \quad \forall k \ge 1, \tag{42}$$
then $d_k \ne 0$ and
$$\sum_{k \ge 2} \|u_k - u_{k-1}\|^2 < \infty, \tag{43}$$
where $u_k = d_k / \|d_k\|$.

Proof.

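First, note that $d_k \ne 0$; otherwise, the sufficient descent condition (22) fails. Therefore, $u_k$ is well defined. In addition, by relation (42) and Lemma 5, we have
$$\sum_{k \ge 1} \frac{1}{\|d_k\|^2} < \infty, \tag{44}$$
since otherwise Lemma 5 would give $\liminf_{k \to \infty} \|g_k\| = 0$, contradicting (42). Now, we divide the formula $\beta_k^{WYLDL}$ into two parts as follows:
$$\beta_k^1 = \beta_k^{WYL}, \qquad \beta_k^2 = -t \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}, \tag{45}$$
and define
$$r_k := \frac{\vartheta_k}{\|d_k\|}, \qquad \delta_k := \beta_k^1 \frac{\|d_{k-1}\|}{\|d_k\|}, \tag{46}$$
where $\vartheta_k = -g_k + \beta_k^2 d_{k-1}$.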

Then by (3), we have, for all $k \ge 2$,
$$u_k = r_k + \delta_k u_{k-1}. \tag{47}$$
Using the identity $\|u_k\| = \|u_{k-1}\| = 1$ and (47), we obtain
$$\|r_k\| = \|u_k - \delta_k u_{k-1}\| = \|\delta_k u_k - u_{k-1}\|. \tag{48}$$
Using the condition $\delta_k = \beta_k^{WYL} ( \|d_{k-1}\| / \|d_k\| ) = ( \|g_k\|^2 / \|g_{k-1}\|^2 ) ( 1 - \cos \bar{\theta}_k ) ( \|d_{k-1}\| / \|d_k\| ) \ge 0$, the triangle inequality, and (48), we obtain
$$\|u_k - u_{k-1}\| \le \| ( 1 + \delta_k ) u_k - ( 1 + \delta_k ) u_{k-1} \| \le \|u_k - \delta_k u_{k-1}\| + \|\delta_k u_k - u_{k-1}\| = 2 \|r_k\|. \tag{49}$$
On the other hand, the line search condition (24) gives
$$y_{k-1}^T d_{k-1} \ge ( \sigma - 1 ) g_{k-1}^T d_{k-1}. \tag{50}$$
Relations (22), (24), and (50) imply that
$$\left| \frac{g_k^T d_{k-1}}{d_{k-1}^T y_{k-1}} \right| \le \frac{\sigma}{1 - \sigma}. \tag{51}$$
It follows from the definition of $\vartheta_k$, (17), (19), and (51) that
$$\|\vartheta_k\| \le \|g_k\| + t \left| \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}} \right| \|d_{k-1}\| = \|g_k\| + t \left| \frac{g_k^T d_{k-1}}{d_{k-1}^T y_{k-1}} \right| \|s_{k-1}\| \le \bar{\gamma} + t \frac{\sigma}{1 - \sigma} 2 B. \tag{52}$$
So we have
$$\sum \|u_k - u_{k-1}\|^2 \le 4 \sum \|r_k\|^2 \le 4 \sum \frac{\|\vartheta_k\|^2}{\|d_k\|^2} \le 4 \left( \bar{\gamma} + t \frac{\sigma}{1 - \sigma} 2 B \right)^2 \sum \frac{1}{\|d_k\|^2} < \infty. \tag{53}$$

Let $N^*$ denote the set of positive integers. For $\lambda > 0$ and a positive integer $\Delta$, denote
$$K_{k, \Delta}^{\lambda} := \{ i \in N^* : k \le i \le k + \Delta - 1, \ \|s_{i-1}\| > \lambda \}. \tag{54}$$
Let $| K_{k, \Delta}^{\lambda} |$ denote the number of elements in $K_{k, \Delta}^{\lambda}$. Dai and Liao [25] pointed out that, for a conjugate gradient method that satisfies

(i) Property (*);

(ii) the sufficient descent condition;

(iii) the conclusion of Theorem 6;

if (42) holds, then small step sizes cannot occur too often. This property is described as follows.

Lemma 7.

Suppose that Assumption 2 holds. Consider the WYLDL method, where $\alpha_k$ is obtained by the strong Wolfe-Powell line search with $\sigma < 1/4$. If (42) holds, then there exists $\lambda > 0$ such that, for any $\Delta \in N^*$ and any index $k_0$, there is an index $k \ge k_0$ such that
$$| K_{k, \Delta}^{\lambda} | > \frac{\Delta}{2}. \tag{55}$$

Proof.

It follows from Lemmas 3 and 4 and Theorem 6 that the WYLDL method satisfies the above three conditions from [25]. Hence, by Lemma 3.5 in [25], Lemma 7 holds. We omit the detailed proof.

According to the above lemmas and theorems, we can prove the following convergence theorem for WYLDL method.

Theorem 8.

Suppose that Assumption 2 holds. Consider the WYLDL method. If $\alpha_k$ is obtained by the strong Wolfe-Powell line search with $\sigma < 1/4$, then
$$\liminf_{k \to \infty} \|g_k\| = 0. \tag{56}$$

Proof.

We prove this theorem by contradiction. If $\liminf_{k \to \infty} \|g_k\| > 0$, then (42) must hold, and so the conditions of Theorem 6 and Lemma 7 hold. Defining $u_i = d_i / \|d_i\|$, we have, for any indices $l, k$ with $l \ge k$,
$$x_l - x_{k-1} = \sum_{i=k}^{l} ( x_i - x_{i-1} ) = \sum_{i=k}^{l} \alpha_{i-1} d_{i-1} = \sum_{i=k}^{l} \|s_{i-1}\| u_{i-1} = \sum_{i=k}^{l} \|s_{i-1}\| u_{k-1} + \sum_{i=k}^{l} \|s_{i-1}\| ( u_{i-1} - u_{k-1} ). \tag{57}$$
Relation (57), $\|u_i\| = 1$, and (17) give
$$\sum_{i=k}^{l} \|s_{i-1}\| \le \|x_l - x_{k-1}\| + \sum_{i=k}^{l} \|s_{i-1}\| \|u_{i-1} - u_{k-1}\| \le 2 B + \sum_{i=k}^{l} \|s_{i-1}\| \|u_{i-1} - u_{k-1}\|. \tag{58}$$
Let $\lambda > 0$ be given by Lemma 7 and define $\Delta := \lceil 8 B / \lambda \rceil$ to be the smallest integer not less than $8 B / \lambda$. By Theorem 6, we can find an index $k_0 \ge 2$ such that
$$\sum_{i \ge k_0} \|u_i - u_{i-1}\|^2 \le \frac{1}{4 \Delta}. \tag{59}$$
With this $\Delta$ and $k_0$, Lemma 7 gives an index $k \ge k_0$ such that
$$| K_{k, \Delta}^{\lambda} | > \frac{\Delta}{2}. \tag{60}$$
For any index $i \in [ k, k + \Delta - 1 ]$, by the Cauchy-Schwarz inequality and (59),
$$\|u_i - u_{k-1}\| \le \sum_{j=k}^{i} \|u_j - u_{j-1}\| \le ( i - k + 1 )^{1/2} \left( \sum_{j=k}^{i} \|u_j - u_{j-1}\|^2 \right)^{1/2} \le \Delta^{1/2} \left( \frac{1}{4 \Delta} \right)^{1/2} = \frac{1}{2}. \tag{61}$$
From (61) and (60), taking $l = k + \Delta - 1$ in (58), we get
$$2 B \ge \frac{1}{2} \sum_{i=k}^{k + \Delta - 1} \|s_{i-1}\| > \frac{\lambda}{2} | K_{k, \Delta}^{\lambda} | > \frac{\lambda \Delta}{4}. \tag{62}$$
Thus $\Delta < 8 B / \lambda$, which contradicts the definition of $\Delta$. The proof is complete.

4. Numerical Experiments

In this section, we report the performance of Algorithm 1 (WYLDL) on a set of test problems. The codes were written in Fortran 77 in double precision arithmetic. All the tests were performed on the same PC. The experiments were performed on a set of 73 nonlinear unconstrained problems from [26]. Some of the problems are from the CUTE [27] library. For each test problem, we performed 10 numerical experiments with the number of variables $n = 1000, 2000, \ldots, 10000$.

In order to assess the reliability of the WYLDL algorithm, we also tested it against the DL method and the WYL method on the same problems. All these algorithms are terminated when $\|g_k\| \le 10^{-5}$. We also stopped a run if the number of iterations exceeded 1000 or the number of function evaluations reached 2000. In the Wolfe-Powell line search conditions (23) and (24), the parameters are $\delta = 10^{-4}$ and $\sigma = 10^{-1}$. For the DL method, $t = 0.1$, as in [25]. We also test the WYLDL algorithm with $t = 0.1$, which is the best choice.

The comparison data comprise the iterations, the function and gradient evaluations, and the CPU time. To assess the performance of the WYLDL, WYL, and DL methods, we use the performance profile of Dolan and Moré [28] as an evaluation tool.

Dolan and Moré [28] introduced the performance profile as a tool for evaluating and comparing the performance of a set of solvers $S$ on a test set $P$. Assuming that there are $n_s$ solvers and $n_p$ problems, for each problem $p$ and solver $s$, they defined

$t_{p,s}$ = the computing cost (iterations, function and gradient evaluations, or CPU time) required to solve problem $p$ by solver $s$.

To obtain a baseline for comparison, they compared the performance of solver $s$ on problem $p$ with the best performance of any solver on this problem, using the performance ratio
$$r_{p,s} = \frac{t_{p,s}}{\min \{ t_{p,s} : s \in S \}}. \tag{63}$$

Let $M$ be a parameter such that $M \ge r_{p,s}$ for all $p, s$, and set $r_{p,s} = M$ if and only if solver $s$ does not solve problem $p$. Then they defined
$$\rho_s ( \tau ) = \frac{1}{n_p} \, \mathrm{size} \{ p \in P : r_{p,s} \le \tau \}, \tag{64}$$
so that $\rho_s ( \tau )$ is the probability for solver $s$ that a performance ratio $r_{p,s}$ is within a factor $\tau \ge 1$ of the best possible ratio. The function $\rho_s$ is the distribution function of the performance ratio; it is a nondecreasing, piecewise constant function. That is, for the subset of methods being analyzed, we plot the fraction of the problems for which any given method is within a factor $\tau$ of the best.
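The definitions (63) and (64) translate directly into code. The sketch below is a minimal illustration, not the tool used to produce the figures: the function name `performance_profile`, the use of `None` to mark a failed run, the use of infinity in the role of $M$, and the cost table for two fictitious solvers are all assumptions of this example. It also assumes that every problem is solved by at least one solver.

```python
import math


def performance_profile(costs, taus):
    """Evaluate rho_s(tau) of (64) for each solver s and each tau in taus.

    costs[s][p] is the cost t_{p,s} of solver s on problem p, or None if
    solver s failed on problem p (its ratio is then set to infinity, which
    plays the role of M).
    """
    solvers = list(costs)
    n_p = len(costs[solvers[0]])
    profile = {}
    for s in solvers:
        ratios = []
        for p in range(n_p):
            best = min(costs[q][p] for q in solvers if costs[q][p] is not None)
            t = costs[s][p]
            ratios.append(math.inf if t is None else t / best)   # r_{p,s} of (63)
        profile[s] = [sum(r <= tau for r in ratios) / n_p for tau in taus]
    return profile


# Hypothetical iteration counts of two fictitious solvers on four problems.
costs = {
    "A": [10, 20, None, 40],
    "B": [20, 10, 30, 40],
}
profile = performance_profile(costs, taus=[1.0, 2.0])
print(profile)
```

The value at $\tau = 1$ is the fraction of problems on which a solver is (jointly) the best, while the value for large $\tau$ is the fraction of problems it solves at all, so the left end of a profile reflects efficiency and the right end reflects robustness.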

Test problems on which all three methods fail to terminate successfully were discarded. If one method fails on a problem but other methods terminate successfully, the performance ratio of the failed method is set to $M$ (the maximum of the performance ratios). The performance profiles of the three methods based on iterations, function and gradient evaluations, and CPU time are plotted in Figures 1, 2, and 3, respectively.

Figure 1: Performance profiles based on iterations.

Figure 2: Performance profiles based on function and gradient evaluations.

Figure 3: Performance profiles based on CPU time.

Figure 1 plots the performance profile based on iterations. When $\tau = 1$, the DL method performs better than the WYL and WYLDL methods. As $\tau$ increases, for $\tau \ge 2.2$, the profiles of the WYLDL and WYL methods lie above that of the DL method. This means that, in terms of iterations, the DL method is better than the WYL and WYLDL methods on a subset of the problems, but over all the test problems the WYLDL method is more robust than the DL method.

Figure 2 plots the performance profile based on function and gradient evaluations. It can be seen that, for $\tau < 2$, the DL method performs better than the WYL and WYLDL methods. Compared with Figure 1, the difference between the methods is much smaller than in the iterations profile. One possible reason is that, for the WYLDL and WYL methods, the average number of function and gradient evaluations required per iteration is smaller than for the DL method. From this point of view, the CPU time consumed by the WYLDL or WYL method should be less than that of the DL method, since the CPU time depends mainly on function and gradient evaluations. Figure 3 confirms this. From Figures 1 to 3, it is easy to see that the performances of the WYL method and the WYLDL method are quite similar. A possible reason, we think, is that the second part of $\beta_k^{WYLDL}$, namely $-t ( g_k^T s_{k-1} / d_{k-1}^T y_{k-1} )$ with $t = 0.1$, is very small compared with $\beta_k^{WYL}$. This may be related to the Wolfe line search: since the line search used in this paper is based on the strategy of Lemarechal [29], Fletcher [30], or Moré and Thuente [31], the directional derivative $| g_k^T d_{k-1} |$ may become very small.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This research was supported by the Guangxi Universities Foundation Grant no. 2013BYB210.

References

[1] M. R. Hestenes and Stiefel, "Methods of conjugate gradients for solving linear systems," Journal of Research of the National Bureau of Standards, vol. 49, pp. 409–436, 1952. MR0060307. Zbl 0048.09901.

[2] Fletcher and C. M. Reeves, "Function minimization by conjugate gradients," The Computer Journal, vol. 7, pp. 149–154, 1964. doi:10.1093/comjnl/7.2.149. MR0187375. Zbl 0132.11701.

[3] Polak and Ribière, "Note sur la convergence de méthodes de directions conjuguées," Revue Française d'Informatique et de Recherche Opérationnelle, vol. 3, no. 16, pp. 35–43, 1969. MR0255025. Zbl 0174.48001.

[4] Dai, Han, Liu, Sun, Yin, and Y.-X. Yuan, "Convergence properties of nonlinear conjugate gradient methods," SIAM Journal on Optimization, vol. 10, no. 2, pp. 345–358, 2000. doi:10.1137/S1052623494268443. MR1740949.

[5] W. W. Hager and H. C. Zhang, "A new conjugate gradient method with guaranteed descent and an efficient line search," SIAM Journal on Optimization, vol. 16, no. 1, pp. 170–192, 2005. doi:10.1137/030601880.

[6] Zoutendijk, "Nonlinear programming, computational methods," in Integer and Nonlinear Programming, J. Abadie, Ed., pp. 37–86, North-Holland, Amsterdam, The Netherlands, 1970. MR0437081. Zbl 0336.90057.

[7] M. J. D. Powell, "Restart procedures for the conjugate gradient method," Mathematical Programming, vol. 12, no. 2, pp. 241–254, 1977. doi:10.1007/BF01593790. MR0478622. Zbl 0396.90072.

[8] M. J. D. Powell, "Nonconvex minimization calculations and the conjugate gradient method," in Numerical Analysis, vol. 1066 of Lecture Notes in Mathematics, pp. 122–141, Springer, Berlin, Germany, 1984. doi:10.1007/BFb0099521. MR760460. Zbl 0531.65035.

[9] Y. H. Dai and Yuan, Nonlinear Conjugate Gradient Method, Shanghai Science and Technology Press, 2000.

[10] Al-Baali, "Descent property and global convergence of the Fletcher-Reeves method with inexact line search," IMA Journal of Numerical Analysis, vol. 5, no. 1, pp. 121–124, 1985. doi:10.1093/imanum/5.1.121. MR777963. Zbl 0578.65063.

[11] G. H. Liu, J. Y. Han, and H. X. Yin, "Global convergence of the Fletcher-Reeves algorithm with inexact linesearch," Applied Mathematics, vol. 10, no. 1, pp. 75–82, 1995. doi:10.1007/BF02663897. MR1335968. Zbl 0834.90122.

[12] Y. H. Dai and Yuan, "Convergence properties of the Fletcher-Reeves method," IMA Journal of Numerical Analysis, vol. 16, no. 2, pp. 155–164, 1996. doi:10.1093/imanum/16.2.155. MR1382713. Zbl 0851.65049.

[13] J. C. Gilbert and Nocedal, "Global convergence properties of conjugate gradient methods for optimization," SIAM Journal on Optimization, vol. 2, no. 1, pp. 21–42, 1992. doi:10.1137/0802003. MR1147881. Zbl 0767.90082.

[14] Tang and Wei, "New conjugacy condition and related new conjugate gradient methods for unconstrained optimization," Journal of Computational and Applied Mathematics, vol. 202, no. 2, pp. 523–539, 2007. doi:10.1016/j.cam.2006.03.005. MR2319974. Zbl 1116.65069.

[15] Z. X. Wei, G. Y. Li, and L. Q. Qi, "Global convergence of the Polak-Ribière-Polyak conjugate gradient method with an Armijo-type inexact line search for nonconvex unconstrained optimization problems," Mathematics of Computation, vol. 77, no. 264, pp. 2173–2193, 2008. doi:10.1090/S0025-5718-08-02031-0. MR2429880. Zbl 1198.65091.

[16] Guan, "Global convergence of modified Polak-Ribière-Polyak conjugate gradient methods with sufficient descent property," Journal of Industrial and Management Optimization, vol. 4, no. 3, pp. 565–579, 2008. doi:10.3934/jimo.2008.4.565. MR2417521. Zbl 1168.65030.

[17] Grippo and Lucidi, "A globally convergent version of the Polak-Ribière conjugate gradient method," Mathematical Programming, vol. 78, no. 3, pp. 375–391, 1997. doi:10.1016/S0025-5610(97)00002-6. MR1466138. Zbl 0887.90157.

[18] Wei, Yao, and Liu, "The convergence properties of some new conjugate gradient methods," Applied Mathematics and Computation, vol. 183, no. 2, pp. 1341–1350, 2006. doi:10.1016/j.amc.2006.05.150. MR2294093. Zbl 1116.65073.

[19] Huang, Wei, and Yao, "The proof of the sufficient descent condition of the Wei-Yao-Liu conjugate gradient method under the strong Wolfe-Powell line search," Applied Mathematics and Computation, vol. 189, no. 2, pp. 1241–1245, 2007. doi:10.1016/j.amc.2006.12.006. MR2331795. Zbl 1131.65049.

[20] Shengwei, Wei, and Huang, "A note about WYL's conjugate gradient method and its applications," Applied Mathematics and Computation, vol. 191, no. 2, pp. 381–388, 2007. doi:10.1016/j.amc.2007.02.094. MR2385539. Zbl 1193.90213.

[21] Huang, Yao, and Lin, "A new conjugate gradient method based on HS-DY methods," Journal of Guangxi University of Technology, vol. 4, pp. 63–66, 2008.

[22] Zhang, "An improved Wei-Yao-Liu nonlinear conjugate gradient method for optimization computation," Applied Mathematics and Computation, vol. 215, no. 6, pp. 2269–2274, 2009. doi:10.1016/j.amc.2009.08.016. MR2557113. Zbl 1181.65089.

[23] Zhang and Jian, "Further studies on the Wei-Yao-Liu nonlinear conjugate gradient method," Applied Mathematics and Computation, vol. 219, no. 14, pp. 7616–7621, 2013. doi:10.1016/j.amc.2013.01.048. MR3032601.

[24] Dai and Wen, "Another improved Wei-Yao-Liu nonlinear conjugate gradient method with sufficient descent property," Applied Mathematics and Computation, vol. 218, no. 14, pp. 7421–7430, 2012. doi:10.1016/j.amc.2011.12.091. MR2892710. Zbl 1254.65074.

[25] Y.-H. Dai and L.-Z. Liao, "New conjugacy conditions and related nonlinear conjugate gradient methods," Applied Mathematics and Optimization, vol. 43, no. 1, pp. 87–101, 2001. doi:10.1007/s002450010019. MR1804396. Zbl 0973.65050.

[26] J. J. Moré, B. S. Garbow, and K. E. Hillstrom, "Testing unconstrained optimization software," ACM Transactions on Mathematical Software, vol. 7, no. 1, pp. 17–41, 1981. doi:10.1145/355934.355936. MR607350. Zbl 0454.65049.

[27] Bongartz, A. R. Conn, Gould, and P. L. Toint, "CUTE: constrained and unconstrained testing environment," ACM Transactions on Mathematical Software, vol. 21, no. 1, pp. 123–160, 1995.

[28] E. D. Dolan and J. J. Moré, "Benchmarking optimization software with performance profiles," Mathematical Programming, vol. 91, no. 2, pp. 201–213, 2002. doi:10.1007/s101070100263. MR1875515. Zbl 1049.90004.

[29] Lemarechal, "A view of line-searches," in Optimization and Optimal Control, Auslander, Oettli, and Stoer, Eds., vol. 30 of Lecture Notes in Control and Information Science, pp. 59–78, Springer, Berlin, Germany, 1981. MR618474. Zbl 0458.65054.

[30] Fletcher, Practical Methods of Optimization. Vol. 1: Unconstrained Optimization, John Wiley and Sons, New York, NY, USA, 1989. MR585160.

[31] Moré and D. J. Thuente, "Line search algorithms with guaranteed sufficient decrease," ACM Transactions on Mathematical Software, vol. 20, pp. 286–307, 1994. doi:10.1145/192115.192132.