A Conjugate Gradient Method with Global Convergence for Large-Scale Unconstrained Optimization Problems

The conjugate gradient (CG) method has played a special role in solving large-scale nonlinear optimization problems due to the simplicity of its iterations and its very low memory requirements. This paper proposes a conjugate gradient method which is similar to the Dai-Liao conjugate gradient method (Dai and Liao, 2001) but has stronger convergence properties. The given method possesses the sufficient descent condition and is globally convergent under the strong Wolfe-Powell (SWP) line search for general functions. Our numerical results show that the proposed method is very efficient for the test problems.


Introduction
The conjugate gradient (CG) method has played a special role in solving large-scale nonlinear optimization problems due to the simplicity of its iterations and its very low memory requirements. In fact, the CG method is not among the fastest or most robust optimization algorithms for nonlinear problems available today, but it remains very popular with engineers and mathematicians who are interested in solving large problems. The conjugate gradient method is designed to solve the following unconstrained optimization problem:
$$\min \{ f(x) \mid x \in \mathbb{R}^n \}, \tag{1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a smooth, nonlinear function whose gradient will be denoted by $g(x)$. The iterative formula of the conjugate gradient method is given by
$$x_{k+1} = x_k + \alpha_k d_k, \tag{2}$$
where $\alpha_k$ is a step length which is computed by carrying out a line search, and $d_k$ is the search direction defined by
$$d_k = \begin{cases} -g_k, & k = 1, \\ -g_k + \beta_k d_{k-1}, & k \ge 2, \end{cases} \tag{3}$$
where $\beta_k$ is a scalar and $g_k$ denotes the gradient $\nabla f(x_k)$. If $f$ is a strictly convex quadratic function, namely,
$$f(x) = \tfrac{1}{2} x^{T} H x + b^{T} x, \tag{4}$$
where $H$ is a positive definite matrix, and if $\alpha_k$ is the exact one-dimensional minimizer along the direction $d_k$, then the method defined by (2) and (3) is called the linear conjugate gradient method. Otherwise, it is called the nonlinear conjugate gradient method. The most important feature of the linear conjugate gradient method is that the search directions satisfy the following conjugacy condition:
$$d_i^{T} H d_j = 0, \quad i \ne j. \tag{5}$$
For nonlinear conjugate gradient methods and general objective functions, (5) does not hold, since the Hessian $\nabla^2 f(x)$ changes from point to point.
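Some well-known formulas for $\beta_k$ are the Fletcher-Reeves (FR), Polak-Ribière (PR), Hestenes-Stiefel (HS), and Dai-Yuan (DY) formulas:
$$\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \tag{6}$$
$$\beta_k^{PR} = \frac{g_k^{T}(g_k - g_{k-1})}{\|g_{k-1}\|^2}, \tag{7}$$
$$\beta_k^{HS} = \frac{g_k^{T}(g_k - g_{k-1})}{d_{k-1}^{T}(g_k - g_{k-1})}, \tag{8}$$
$$\beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^{T}(g_k - g_{k-1})}, \tag{9}$$
where $\|\cdot\|$ denotes the Euclidean norm. The corresponding conjugate gradient methods are abbreviated as the FR, PR, HS, and DY methods. Although all these methods are equivalent in the linear case, namely, when $f$ is a strictly convex quadratic function and the step lengths $\alpha_k$ are determined by exact line search, their behavior for general objective functions may be very different.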
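The equivalence of the four classical formulas in the linear case can be checked numerically. The following is a minimal sketch, not the authors' code: the helper names `dot` and `classical_betas` are hypothetical, and the sample vectors are chosen by hand so that $g_k$ is orthogonal to both $g_{k-1}$ and $d_{k-1}$, which is what exact line search on a strictly convex quadratic would produce.

```python
def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def classical_betas(g_new, g_old, d_old):
    """Return the FR, PR, HS and DY values of beta_k.

    g_new = g_k, g_old = g_{k-1}, d_old = d_{k-1} (all plain lists).
    """
    y = [a - b for a, b in zip(g_new, g_old)]  # gradient change g_k - g_{k-1}
    return {
        "FR": dot(g_new, g_new) / dot(g_old, g_old),
        "PR": dot(g_new, y) / dot(g_old, g_old),
        "HS": dot(g_new, y) / dot(d_old, y),
        "DY": dot(g_new, g_new) / dot(d_old, y),
    }


# Hand-picked data (an assumption for illustration): g_k is orthogonal to
# g_{k-1} and to d_{k-1} = -g_{k-1}, as in the linear CG setting.
betas = classical_betas(g_new=[0.0, 2.0], g_old=[1.0, 0.0], d_old=[-1.0, 0.0])
print(betas)
```

Once the orthogonality is lost, as happens with inexact line searches on general functions, the four values differ, which is where the methods start to behave differently.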
For general functions, Zoutendijk [1] proved the global convergence of the FR method with exact line search. (Here and throughout this paper, by global convergence we mean that, from a given initial point, the sequence generated by the corresponding method either terminates after finitely many steps or contains a subsequence that converges to a stationary point of the objective function.) Although its global convergence properties are satisfactory, the FR method performs much worse than the PR (HS) method in real computations. Powell [2] analyzed a major numerical drawback of the FR method; namely, if a small step is generated away from the solution point, the subsequent steps may also be very short. On the other hand, in practical computation, the HS method resembles the PR method, and both methods are generally believed to be the most efficient conjugate gradient methods, since they essentially perform a restart if a bad direction occurs. However, Powell [3] constructed a counterexample showing that the PR method and the HS method can cycle infinitely without approaching the solution. This example suggests that these two methods have the drawback that they are not globally convergent for general functions. Therefore, in the past two decades, much effort has been devoted to finding new formulas for conjugate gradient methods that are globally convergent for general functions and also have good numerical performance.
Recently, using a new conjugacy condition, Dai and Liao [4] proposed two new methods. Interestingly, one of their methods is not only globally convergent for general functions but also performs better than the HS and PR methods. In this paper, similar to Dai and Liao's approach, we propose another formula for $\beta_k$, analyze the convergence properties of the given method, and carry out numerical experiments which show that the given method is robust and efficient.
The remainder of this paper is organized as follows. In Section 2, we first state the corresponding formula proposed by Dai and Liao [4] and the motivations of this paper, and then we propose the new nonlinear conjugate gradient method. In Section 3, convergence analysis for the given method is presented. Numerical results are reported in Section 4. Finally, some conclusions are given in Section 5.

Motivations and New Nonlinear Conjugate Gradient Method
For a general nonlinear function $f$, we know by the mean value theorem that there exists some $t \in (0, 1)$ such that
$$d_k^{T} y_{k-1} = \alpha_{k-1} d_k^{T} \nabla^2 f(x_{k-1} + t \alpha_{k-1} d_{k-1}) d_{k-1}, \tag{11}$$
where $y_{k-1} = g_k - g_{k-1}$. Therefore, it is reasonable to replace (5) with the following conjugacy condition:
$$d_k^{T} y_{k-1} = 0. \tag{12}$$
Recently, an extension of (12) has been studied by Dai and Liao in [4]. Their approach is based on quasi-Newton techniques. Recall that, in the quasi-Newton method, an approximation matrix $B_{k-1}$ of the Hessian $\nabla^2 f(x_{k-1})$ is updated such that the new matrix $B_k$ satisfies the following quasi-Newton equation:
$$B_k s_{k-1} = y_{k-1}, \tag{13}$$
where $s_{k-1} = x_k - x_{k-1}$. The search direction $d_k$ in the quasi-Newton method is calculated by
$$d_k = -B_k^{-1} g_k. \tag{14}$$
Combining these two equations, we obtain
$$d_k^{T} y_{k-1} = d_k^{T} (B_k s_{k-1}) = -g_k^{T} s_{k-1}. \tag{15}$$
The previous relation implies that (12) holds if the line search is exact, since in this case $g_k^{T} d_{k-1} = 0$. However, practical numerical algorithms normally adopt inexact line searches instead of exact line searches. For this reason, it seems more reasonable to replace the conjugacy condition (12) with the condition
$$d_k^{T} y_{k-1} = -t g_k^{T} s_{k-1}, \tag{16}$$
where $t \ge 0$ is a scalar.
To ensure that the search direction $d_k$ satisfies the conjugacy condition (16), one only needs to multiply (3) with $y_{k-1}$ and use (16), yielding
$$\beta_k^{DL1} = \frac{g_k^{T}(y_{k-1} - t s_{k-1})}{d_{k-1}^{T} y_{k-1}}. \tag{17}$$
It is obvious that
$$\beta_k^{DL1} = \beta_k^{HS} - t \frac{g_k^{T} s_{k-1}}{d_{k-1}^{T} y_{k-1}}. \tag{18}$$
For simplicity, we call the method with (2), (3), and (17) the DL1 method. Dai and Liao also proved that the conjugate gradient method with DL1 is globally convergent for uniformly convex functions. For general functions, Powell [3] constructed an example showing that the PR method may cycle without approaching any solution point if the step length $\alpha_k$ is chosen to be the first local minimizer along $d_k$.
Since the DL1 method reduces to the PR method in the case that $g_k^{T} d_{k-1} = 0$ holds, this implies that the method with (17) need not converge for general functions. To obtain global convergence, like Gilbert and Nocedal [5], who proved the global convergence of the PR method with the restriction that $\beta_k^{PR} \ge 0$, Dai and Liao replaced (17) by
$$\beta_k^{DL} = \max\left\{ \frac{g_k^{T} y_{k-1}}{d_{k-1}^{T} y_{k-1}}, 0 \right\} - t \frac{g_k^{T} s_{k-1}}{d_{k-1}^{T} y_{k-1}}. \tag{19}$$
We call the method with (2), (3), and (19) the DL method. Dai and Liao showed that the DL method is globally convergent for general functions under the sufficient descent condition (21) and some suitable conditions. Besides, some numerical experiments in [4] indicate the efficiency of this method. Similar to Dai and Liao's approach, Li et al. [6] proposed another conjugacy condition and related conjugate gradient methods, and they also proved that the proposed methods are globally convergent under some assumptions.
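The two Dai-Liao parameters can be sketched in a few lines. This is an illustrative sketch only: the function name `beta_dl` and its `nonneg` flag are hypothetical, the vectors are made-up sample data, and $t = 0.1$ is an arbitrary choice. With `nonneg=False` the function evaluates the DL1 formula (17), with `nonneg=True` the DL formula (19). The last lines check that the DL1 direction satisfies the conjugacy condition (16).

```python
def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def beta_dl(g_new, g_old, d_old, s_old, t, nonneg=True):
    """Dai-Liao parameter: DL (19) if nonneg is True, DL1 (17) otherwise."""
    y = [a - b for a, b in zip(g_new, g_old)]   # y_{k-1} = g_k - g_{k-1}
    denom = dot(d_old, y)                       # d_{k-1}^T y_{k-1}
    hs = dot(g_new, y) / denom                  # Hestenes-Stiefel part
    correction = t * dot(g_new, s_old) / denom  # second part of (18)/(19)
    return (max(hs, 0.0) if nonneg else hs) - correction


# Sample data (assumed for illustration): s_{k-1} = 0.5 * d_{k-1}.
g_old, d_old, s_old, t = [2.0, 0.0], [-2.0, 0.0], [-1.0, 0.0], 0.1

# Case 1: the HS part is positive, so DL1 and DL coincide.
g_new = [1.0, 2.0]
b_dl1 = beta_dl(g_new, g_old, d_old, s_old, t, nonneg=False)
b_dl = beta_dl(g_new, g_old, d_old, s_old, t, nonneg=True)

# Conjugacy condition (16) for the DL1 direction d_k = -g_k + beta * d_{k-1}.
y = [a - b for a, b in zip(g_new, g_old)]
d_new = [-gi + b_dl1 * di for gi, di in zip(g_new, d_old)]
lhs, rhs = dot(d_new, y), -t * dot(g_new, s_old)

# Case 2: the HS part is negative, so the truncation in (19) matters.
g_neg = [1.0, 0.0]
b_dl1_neg = beta_dl(g_neg, g_old, d_old, s_old, t, nonneg=False)
b_dl_neg = beta_dl(g_neg, g_old, d_old, s_old, t, nonneg=True)
```

In the second case the truncation replaces a negative HS part by zero, so only the correction term survives in the DL value.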

Motivations.
From the above discussion, Dai and Liao's approach is effective; the main reason is that the search directions $d_k$ generated by the DL1 method or the DL method contain not only the gradient information but also some information about the Hessian $\nabla^2 f(x)$. From (18) and (19), $\beta_k^{DL1}$ and $\beta_k^{DL}$ are formed by two parts; the first part is $\beta_k^{HS}$, and the second part is $-t\, g_k^{T} s_{k-1} / (d_{k-1}^{T} y_{k-1})$. So, we can also consider the DL1 and DL methods as modified forms of the HS method, obtained by adding some Hessian information contained in the second part. The convergence properties of the HS method are similar to those of the PR method; it does not converge for general functions even if the line search is exact. In order to obtain convergence, one also needs the nonnegative restriction $\beta_k = \max\{\beta_k^{HS}, 0\}$ and the sufficient descent assumption (21). From the above discussion, the descent condition or sufficient descent condition and the nonnegativity of $\beta_k$ play important roles in the convergence analysis. We say that the descent condition holds if, for each search direction $d_k$,
$$g_k^{T} d_k < 0. \tag{20}$$
In addition, we say that the sufficient descent condition holds if there exists a constant $c > 0$ such that, for each search direction $d_k$, we have
$$g_k^{T} d_k \le -c \|g_k\|^2. \tag{21}$$
Motivated by the above ideas, in this paper we focus on finding a new conjugate gradient method which possesses the following properties: (1) the nonnegative property $\beta_k \ge 0$; (2) the new formula contains not only the gradient information but also some Hessian information; (3) the search directions $d_k$ generated by the proposed method satisfy the sufficient descent condition (21).

The New Conjugate Gradient Method.
From the structure of (6), (7), (8), and (9), the PR and HS methods have the common numerator $g_k^{T} y_{k-1}$, and the FR and DY methods have the common numerator $\|g_k\|^2$; this different choice gives them different properties. Generally speaking, the FR and DY methods have better convergence properties, and the PR and HS methods have better numerical performance. Powell [3] pointed out that the FR method, with exact line search, is susceptible to jamming. That is, the algorithm can take many short steps without making significant progress toward the minimum. If the line search is exact, that is, $g_k^{T} d_{k-1} = 0$, then the DY method reduces to the FR method. So, these two methods have the same disadvantage. The PR and HS methods, which share the common numerator $g_k^{T} y_{k-1}$, possess a built-in restart feature that avoids the jamming problem: when the step $x_k - x_{k-1}$ is small, the factor $y_{k-1}$ in the numerator of $\beta_k$ tends to zero. Hence, the next search direction $d_k$ is essentially the steepest descent direction $-g_k$. So, the numerical performance of these methods is better than that of the methods with $\|g_k\|^2$ in the numerator of $\beta_k$.
As discussed above, great attention has been given to finding methods which not only are globally convergent but also have good numerical performance.
Recently, Wei et al. [7] proposed a new formula
$$\beta_k^{WYL} = \frac{g_k^{T}\left(g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1}\right)}{\|g_{k-1}\|^2}. \tag{22}$$
The method with formula $\beta_k^{WYL}$ not only has good numerical results but also possesses the sufficient descent condition and global convergence properties under the strong Wolfe-Powell line search. From the structure of $\beta_k^{WYL}$, we know that the method with $\beta_k^{WYL}$ can also avoid jamming: when the step $x_k - x_{k-1}$ is small, $\|g_k\| / \|g_{k-1}\|$ tends to 1 and the next search direction tends to the steepest descent direction, which is similar to the PR method. But the WYL method has some additional advantages: under the strong Wolfe-Powell line search, $\beta_k^{WYL} \ge 0$, and if the parameter $\sigma \le 1/4$ in SWP, the WYL method possesses the sufficient descent condition, which implies the global convergence of the WYL method.
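In [8,9], Shengwei et al. extended such a modification to the HS method as follows:
$$\beta_k^{MHS} = \frac{g_k^{T}\left(g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1}\right)}{d_{k-1}^{T}(g_k - g_{k-1})} = \frac{g_k^{T} y_{k-1}^{*}}{d_{k-1}^{T} y_{k-1}}, \quad y_{k-1}^{*} = g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1}. \tag{23}$$
The previous formulae $\beta_k^{WYL}$ and $\beta_k^{MHS}$ can be considered as modified forms of $\beta_k^{PR}$ and $\beta_k^{HS}$, respectively, obtained by using $y_{k-1}^{*}$ to replace $y_{k-1}$. In [8,9], the corresponding methods are proved to be globally convergent for general functions under the strong Wolfe-Powell line search and the Grippo-Lucidi line search. Based on the same approach, some authors give other discussions and modifications in [10][11][12].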
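In fact, $y_{k-1}^{*}$ is not our starting point; our purpose is to involve the information of the angle between $g_k$ and $g_{k-1}$. From this point of view, $\beta_k^{WYL}$ has the following form:
$$\beta_k^{WYL} = \beta_k^{FR} (1 - \cos\theta_k), \tag{24}$$
where $\theta_k$ is the angle between $g_k$ and $g_{k-1}$. By multiplying $\beta_k^{FR}$ with $1 - \cos\theta_k$, the method not only has convergence properties similar to those of the FR method but also avoids jamming, as the PR method does. The above analysis motivates us to propose the following formula to compute $\beta_k$:
$$\beta_k = \max\left\{ \frac{g_k^{T} y_{k-1}^{*}}{d_{k-1}^{T} y_{k-1}}, 0 \right\} - t \frac{g_k^{T} s_{k-1}}{d_{k-1}^{T} y_{k-1}}, \tag{25}$$
where $y_{k-1}^{*} = g_k - (\|g_k\| / \|g_{k-1}\|)\, g_{k-1}$. Since $\beta_k^{MHS}$ is nonnegative under the strong Wolfe-Powell line search, we omit the nonnegative restriction and propose the following formula:
$$\beta_k^{MDL} = \frac{g_k^{T} y_{k-1}^{*}}{d_{k-1}^{T} y_{k-1}} - t \frac{g_k^{T} s_{k-1}}{d_{k-1}^{T} y_{k-1}} = \beta_k^{MHS} - t \frac{g_k^{T} s_{k-1}}{d_{k-1}^{T} y_{k-1}}. \tag{26}$$
From (25) and (26), we see that we only substitute $y_{k-1}$ in the first part of the numerator of $\beta_k^{DL1}$ or $\beta_k^{DL}$ by $y_{k-1}^{*}$. The reason is that we want the formulae (25) and (26) to contain the angle information between $g_k$ and $g_{k-1}$. In fact, $\beta_k^{MDL}$ can be expressed as
$$\beta_k^{MDL} = \frac{\|g_k\|^2 (1 - \cos\theta_k)}{d_{k-1}^{T} y_{k-1}} - t \frac{g_k^{T} s_{k-1}}{d_{k-1}^{T} y_{k-1}}. \tag{27}$$
For simplicity, we call the method generated by (2), (3), and (26) the MDL method and give the algorithm as follows.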
Step 2. Compute $\alpha_k$ by some line search.
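The MDL parameter and its angle form can be compared numerically. The sketch below is only an illustration under stated assumptions: `beta_mdl` and `beta_mdl_angle_form` are hypothetical helper names, the vectors are made-up sample data, and $t = 0.1$ is an arbitrary choice. The first function evaluates $\beta_k^{MDL}$ through $y_{k-1}^{*}$, the second through the angle $\theta_k$ between $g_k$ and $g_{k-1}$; the two should agree up to rounding.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm."""
    return math.sqrt(dot(u, u))


def beta_mdl(g_new, g_old, d_old, s_old, t):
    """MDL parameter written with y*_{k-1} = g_k - (|g_k|/|g_{k-1}|) g_{k-1}."""
    ratio = norm(g_new) / norm(g_old)
    y = [a - b for a, b in zip(g_new, g_old)]
    y_star = [a - ratio * b for a, b in zip(g_new, g_old)]
    return (dot(g_new, y_star) - t * dot(g_new, s_old)) / dot(d_old, y)


def beta_mdl_angle_form(g_new, g_old, d_old, s_old, t):
    """Same quantity written with the angle between g_k and g_{k-1}."""
    cos_theta = dot(g_new, g_old) / (norm(g_new) * norm(g_old))
    y = [a - b for a, b in zip(g_new, g_old)]
    first = dot(g_new, g_new) * (1.0 - cos_theta)
    return (first - t * dot(g_new, s_old)) / dot(d_old, y)


# Sample data (assumed for illustration): d_{k-1} = -g_{k-1}, s_{k-1} = 0.2 * d_{k-1}.
args = ([3.0, 4.0], [5.0, 0.0], [-5.0, 0.0], [-1.0, 0.0], 0.1)
b_direct = beta_mdl(*args)
b_angle = beta_mdl_angle_form(*args)
```

In this sample the first part $\|g_k\|^2(1 - \cos\theta_k)$ is nonnegative by construction, which is the structural reason the nonnegative restriction can be dropped once the denominator is positive.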
We make the following basic assumptions on the objective functions.
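The step length $\alpha_k$ in Algorithm 1 (MDL) is obtained by some line search scheme. In conjugate gradient methods, the strong Wolfe-Powell (SWP) conditions, namely,
$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^{T} d_k, \tag{31}$$
$$\left| g(x_k + \alpha_k d_k)^{T} d_k \right| \le -\sigma g_k^{T} d_k, \tag{32}$$
where $0 < \delta < \sigma < 1$, are often imposed on the line search.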
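The iteration (2)-(3) with $\beta_k^{MDL}$ from (26) and a step length satisfying the strong Wolfe-Powell conditions can be sketched as follows. This is a simplified illustration, not the authors' Algorithm 1: it assumes a strictly convex quadratic objective so that the exact minimizer along $d_k$ is available in closed form (such a step satisfies (31) and (32) whenever $\delta \le 1/2$), and the names `strong_wolfe_holds`, `mdl_on_quadratic`, the test matrix, and $t = 0.1$ are all hypothetical choices.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def axpy(a, u, v):
    """Return the list a*u + v."""
    return [a * ui + vi for ui, vi in zip(u, v)]


def strong_wolfe_holds(f, grad, x, d, alpha, delta=0.01, sigma=0.1):
    """Check the sufficient decrease and curvature conditions for one step."""
    gd = dot(grad(x), d)
    x_new = axpy(alpha, d, x)
    sufficient_decrease = f(x_new) <= f(x) + delta * alpha * gd
    curvature = abs(dot(grad(x_new), d)) <= -sigma * gd
    return sufficient_decrease and curvature


def mdl_on_quadratic(H, b, x, t=0.1, tol=1e-10, max_iter=50):
    """MDL iteration on f(x) = 0.5 x^T H x + b^T x with exact line search."""
    matvec = lambda v: [dot(row, v) for row in H]
    f = lambda z: 0.5 * dot(z, matvec(z)) + dot(b, z)
    grad = lambda z: axpy(1.0, matvec(z), b)
    g = grad(x)
    d = [-gi for gi in g]                          # first direction: -g_1
    wolfe_ok = True
    for _ in range(max_iter):
        if norm(g) <= tol:
            break
        alpha = -dot(g, d) / dot(d, matvec(d))     # exact minimizer along d
        wolfe_ok = wolfe_ok and strong_wolfe_holds(f, grad, x, d, alpha)
        x_new = axpy(alpha, d, x)
        g_new = grad(x_new)
        if norm(g_new) > tol:
            s = [alpha * di for di in d]           # s = x_new - x
            y = axpy(-1.0, g, g_new)               # y = g_new - g
            y_star = axpy(-norm(g_new) / norm(g), g, g_new)
            beta = (dot(g_new, y_star) - t * dot(g_new, s)) / dot(d, y)
            d = axpy(beta, d, [-gi for gi in g_new])
        x, g = x_new, g_new
    return x, g, wolfe_ok


x_min, g_min, wolfe_ok = mdl_on_quadratic(
    [[4.0, 1.0], [1.0, 3.0]], [-1.0, -2.0], [2.0, 1.0]
)
```

With exact line search the term $g_k^{T} s_{k-1}$ vanishes up to rounding, so on a quadratic this sketch behaves like linear CG; for general functions an inexact strong Wolfe-Powell line search would be needed in place of the closed-form step.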

Convergence Analysis
Under Assumption A, based on the Zoutendijk condition in [1], for any conjugate gradient method with the strong Wolfe-Powell line search, Dai et al. in [13] proved the following general result.

Lemma 2. Suppose that
Proof.By SWP condition (32), we have The proof is completed.
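In addition, we can also prove that, in conjugate gradient methods of the form (2)-(3), if $\beta_k$ is computed by $\beta_k^{MDL}$ (26) and $\alpha_k$ is determined by the strong Wolfe-Powell line search, then the search direction $d_k$ satisfies the sufficient descent condition (21).
Proof. We prove this theorem by induction. First, we prove the descent condition $g_k^{T} d_k < 0$ as follows. Since $g_1^{T} d_1 = -\|g_1\|^2 < 0$, suppose that $g_i^{T} d_i < 0$ holds for $i \le k - 1$; we deduce that the descent condition holds by proving that $g_i^{T} d_i < 0$ also holds for $i = k$. By the SWP condition (32), together with (3) and (26), we obtain (41), which means that the descent condition holds.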
Secondly, we prove the following sufficient descent condition.
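By Theorem 5, we can prove the following Lemma 6.
Lemma 6. Suppose that Assumption A holds. Consider the MDL method, where $\alpha_k$ is obtained by the strong Wolfe-Powell line search with $\sigma < 1/3$. If there exists a constant $\gamma > 0$ such that
$$\|g_k\| \ge \gamma \quad \text{for all } k \ge 1, \tag{42}$$
then $d_k \ne 0$ and
$$\sum_{k \ge 2} \|u_k - u_{k-1}\|^2 < \infty, \tag{43}$$
where $u_k = d_k / \|d_k\|$.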
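Proof. First, note that $d_k \ne 0$; otherwise, (21) is false. Therefore, $u_k$ is well defined. In addition, by relation (42) and Lemma 2, we have
$$\sum_{k \ge 1} \frac{1}{\|d_k\|^2} < \infty. \tag{44}$$
Now, we divide the formula $\beta_k^{MDL}$ into two parts as follows:
$$\beta_k^{1} = \beta_k^{MHS}, \qquad \beta_k^{2} = -t \frac{g_k^{T} s_{k-1}}{d_{k-1}^{T} y_{k-1}}, \tag{45}$$
and define
$$r_k = \frac{v_k}{\|d_k\|}, \qquad \delta_k = \beta_k^{1} \frac{\|d_{k-1}\|}{\|d_k\|}, \tag{46}$$
where $v_k = -g_k + \beta_k^{2} d_{k-1}$. Then by (3) we have, for all $k \ge 2$,
$$u_k = r_k + \delta_k u_{k-1}. \tag{47}$$
Using the identity $\|u_k\| = \|u_{k-1}\| = 1$ and (47), we can obtain
$$\|r_k\| = \|u_k - \delta_k u_{k-1}\| = \|\delta_k u_k - u_{k-1}\|. \tag{48}$$
Using the condition $\delta_k = \beta_k^{MHS} (\|d_{k-1}\| / \|d_k\|) \ge 0$, the triangle inequality, and (48), it follows that
$$\|u_k - u_{k-1}\| \le \|(1 + \delta_k) u_k - (1 + \delta_k) u_{k-1}\| \le \|u_k - \delta_k u_{k-1}\| + \|\delta_k u_k - u_{k-1}\| = 2 \|r_k\|. \tag{49}$$
On the other hand, the line search condition (32) gives
$$d_{k-1}^{T} y_{k-1} \ge -(1 - \sigma)\, g_{k-1}^{T} d_{k-1}. \tag{50}$$
Equation (50), together with (21), (42), and Assumption A, shows that $\|v_k\|$ is bounded, so that by (44) and (49) the sum $\sum_{k \ge 2} \|u_k - u_{k-1}\|^2$ is finite. The proof is completed.
Gilbert and Nocedal [5] introduced property (∗), which is very important for the convergence properties of conjugate gradient methods. We are going to show that the method with $\beta_k^{MDL}$ possesses property (∗).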
Property (∗). Consider a method of the form (2) and (3). Suppose that
$$0 < \gamma \le \|g_k\| \le \bar{\gamma}.$$
We say that the method has property (∗) if, for all $k$, there exist constants $b > 1$ and $\lambda > 0$ such that
$$|\beta_k| \le b \quad \text{and} \quad \|s_{k-1}\| \le \lambda \implies |\beta_k| \le \frac{1}{2b}.$$
For the MDL method, property (∗) can be verified from (50), (21), and (42). The proof of this lemma is similar to the proof of Lemma 3.5 in [4]. In [4], the authors proved that the method with (19) has this property if the search direction $d_k$ satisfies the sufficient descent condition (21). In our paper, we do not need this assumption, since the directions generated by the MDL method with the strong Wolfe-Powell line search always satisfy the sufficient descent condition (21). So, we omit the proof of this lemma.
According to the previous lemmas and theorems, we can prove the following convergence theorem for the MDL method.
Let $\lambda > 0$ be given by Lemma 7, and define Δ := ⌈8/⌉ to be the smallest integer not less than 8/. By Lemma 6, we can find an index $k_0 \ge 1$ such that

Numerical Results
The step length $\alpha_k$ in all methods is determined such that the strong Wolfe-Powell conditions (31) and (32) hold with $\delta = 0.01$ and $\sigma = 0.1$.
The test problems are drawn from [14]. The numerical results of our tests are reported in Table 1.
The column "Problem" gives the problem name in [14], and "Dim" gives the dimension of the problem. The numerical results are given as a triple separated by slashes, whose entries denote the numbers of iterations, function evaluations, and gradient evaluations, respectively. The stopping condition is $\|g_k\| \le 10^{-6}$. Since we want to compare the performance of the different methods, we omit from the numerical results the problems on which all four methods perform equally. A failure mark in the table means that the corresponding method fails on that problem.

Conclusions
In this paper, based on $\beta_k^{DY}$ and $\beta_k^{DL}$, a new formula is proposed to compute the parameter $\beta_k$ of conjugate gradient methods. The main motivations are to improve both the convergence properties and the numerical behavior of the conjugate gradient method. For general conjugate gradient methods, in order to obtain global convergence results, the methods are required to possess the following major properties: (1) the generated directions $d_k$ are descent directions; (2) the parameters $\beta_k$ are nonnegative.
In addition, to ensure that the methods have robust and efficient numerical behavior, the parameter $\beta_k$ needs to approach zero when a small step occurs.
From the convergence analysis of this paper, we know that the directions $d_k$ generated by the MDL method are descent directions, which is not true for the DY or DL methods, and that the proposed MDL method is globally convergent for general functions. In the previous section, we compared the numerical performance of the MDL method with that of the DY, MHS, and DL methods. From the convergence analysis and numerical results, we can draw the following comparisons with the DL, DY, and MHS methods.
(a) MDL method versus DL method: from the computational point of view, for most of the test problems, the MDL method performs quite similarly to the DL method. There are 15 problems on which the MDL method outperforms the DL method and 18 problems on which the DL method outperforms the MDL method. But, from the point of view of convergence, the MDL method outperforms the DL method.
(b) MDL method versus DY method: the convergence properties of the MDL method are similar to those of the DY method. Comparing the numerical results, there are 27 test problems on which the MDL method outperforms the DY method and only 4 test problems on which the DY method outperforms the MDL method. Therefore, we could say that the MDL method is much better than the DY method in numerical behavior.
(c) MDL method versus MHS method: they possess similar convergence properties; the numerical results show that the MDL method performs slightly better than the MHS method.

Theorem 5. In any conjugate gradient method in which the parameter $\beta_k$ is computed by (26), namely, $\beta_k = \beta_k^{MDL}$, and $\alpha_k$ is determined by the strong Wolfe-Powell line search (31) and (32), if $\sigma < 1/3$, then the search direction $d_k$ satisfies the sufficient descent condition (21).
Dai-Liao's Methods.
It is well known that the linear conjugate gradient methods generate a sequence of search directions $d_k$ such that the conjugacy condition (5) holds. Denote by $y_{k-1}$ the gradient change, which means that
$$y_{k-1} = g_k - g_{k-1}. \tag{10}$$
In order to prove the convergence of the MDL method, we need to state some properties of  MHS In any conjugate gradient methods, if the parameter   is computed by (23), namely,   =  MHS  , and   is determined by strong Wolfe-Powell line search of (31) and (32), then which implies the truth of (33).Therefore, by Lemma 2 we have (34), which is equivalent to (36) for uniformly convex functions.The proof is completed.
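Let $|K^{\lambda}_{k,\Delta}|$ denote the number of elements in $K^{\lambda}_{k,\Delta}$. From the previous property (∗), we can prove the following lemma.
Lemma 7. Suppose that Assumption A holds. Consider the MDL method, where $\alpha_k$ is obtained by the strong Wolfe-Powell line search in which $\sigma < 1/3$. Then, if (42) holds, there exists $\lambda > 0$ such that, for any positive integer $\Delta$ and any index $k_0$, there is an index $k \ge k_0$ such that
$$\left| K^{\lambda}_{k,\Delta} \right| > \frac{\Delta}{2}.$$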

With this Δ and  0 , Lemma 7 gives an index  ≥  0 such that ∑ =