Several Guaranteed Descent Conjugate Gradient Methods for Unconstrained Optimization

This paper investigates a general form of guaranteed descent conjugate gradient methods which satisfies the descent condition $g_k^T d_k \le -(1 - 1/(4\theta_k))\|g_k\|^2$ ($\theta_k > 1/4$) and which is strongly convergent whenever the weak Wolfe line search is fulfilled. Moreover, we present several specific guaranteed descent conjugate gradient methods and give their numerical results for large-scale unconstrained optimization.


Introduction
Consider the following unconstrained optimization problem:
$$\min \{ f(x) : x \in \mathbb{R}^n \}, \tag{1}$$
where $\mathbb{R}^n$ is the $n$-dimensional Euclidean space, $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable, and its gradient $g(x)$ is available.
Conjugate gradient methods are very efficient for solving problem (1) because of their simple iteration and low memory requirements. For any given starting point $x_0 \in \mathbb{R}^n$, they generate a sequence $\{x_k\}$ by the following recursive relation:
$$x_{k+1} = x_k + \alpha_k d_k, \tag{2}$$
$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k d_{k-1}, & k \ge 1, \end{cases} \tag{3}$$
where $g_k = g(x_k)$, $\alpha_k$ is a step length obtained by means of a one-dimensional search, and $\beta_k$ is a scalar that characterizes the method. In general, the step length $\alpha_k$ in (2) is obtained by fulfilling the following weak Wolfe conditions [1,2]:
$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k, \qquad g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k, \tag{4}$$
where $0 < \delta \le \sigma < 1$. Different choices for the scalar $\beta_k$ in (3) result in different nonlinear conjugate gradient methods. Well-known formulas for $\beta_k$ are the Fletcher-Reeves (FR), Hestenes-Stiefel (HS), Polak-Ribière-Polyak (PRP), Dai-Yuan (DY), and Liu-Storey (LS) formulas (see [3], [4], [5], [6], [7], and [8], resp.) and are given by
$$\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \quad \beta_k^{HS} = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, \quad \beta_k^{PRP} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}, \quad \beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^T y_{k-1}}, \quad \beta_k^{LS} = -\frac{g_k^T y_{k-1}}{g_{k-1}^T d_{k-1}}, \tag{5}$$
where $\|\cdot\|$ means the Euclidean norm and $y_k = g_{k+1} - g_k$. The corresponding conjugate gradient methods are viewed as basic conjugate gradient methods. Among them, the PRP and HS methods perform very similarly and perform better than the other basic conjugate gradient methods [9]. However, Powell [10] gave an example showing that the PRP and HS methods may cycle without approaching any solution point, and modified versions of the PRP and HS methods were subsequently presented by many researchers (see, e.g., [11-16]).
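As an illustration, the five basic formulas for $\beta_k$ and the weak Wolfe test can be sketched in Python. This is a minimal sketch rather than the code used in the experiments: the function names `basic_betas` and `satisfies_weak_wolfe` are hypothetical, the formulas are the standard textbook forms of FR, HS, PRP, DY, and LS, and the default Wolfe parameters 0.1 and 0.9 are simply the values quoted in the numerical section.

```python
import numpy as np

def basic_betas(g_new, g_old, d_old):
    """Return the five classical beta_k values for one iteration.

    g_new = g_k, g_old = g_{k-1}, d_old = d_{k-1}, and y = g_k - g_{k-1}.
    """
    y = g_new - g_old
    dy = d_old @ y
    return {
        "FR": (g_new @ g_new) / (g_old @ g_old),
        "HS": (g_new @ y) / dy,
        "PRP": (g_new @ y) / (g_old @ g_old),
        "DY": (g_new @ g_new) / dy,
        "LS": -(g_new @ y) / (g_old @ d_old),
    }

def satisfies_weak_wolfe(f, grad, x, d, alpha, delta=0.1, sigma=0.9):
    """Check the two weak Wolfe inequalities for a trial step alpha."""
    slope = grad(x) @ d                      # directional derivative at x
    x_new = x + alpha * d
    sufficient_decrease = f(x_new) <= f(x) + delta * alpha * slope
    curvature = grad(x_new) @ d >= sigma * slope
    return bool(sufficient_decrease and curvature)

# Demo on the convex quadratic f(x) = 0.5 x^T A x (an illustrative choice).
A = np.diag([1.0, 10.0])

def f(x):
    return 0.5 * x @ A @ x

def grad(x):
    return A @ x

x0 = np.array([1.0, 1.0])
g0 = grad(x0)
d0 = -g0                                     # steepest descent start
alpha_exact = (g0 @ g0) / (d0 @ A @ d0)      # exact minimizer along d0
betas = basic_betas(grad(x0 + alpha_exact * d0), g0, d0)
```

On a strictly convex quadratic with an exact line search from a steepest descent start, the new gradient is orthogonal to the previous one, so the five formulas coincide; the entries of `betas` agree up to rounding. The exact step also passes the weak Wolfe test, while a very small step such as `1e-6` fails the curvature inequality.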

Journal of Applied Mathematics
The following (sufficient) descent condition,
$$g_k^T d_k \le -c \|g_k\|^2, \quad \forall k \ge 0, \; c > 0, \tag{6}$$
is very important for conjugate gradient methods, so we are particularly interested in conjugate gradient methods that satisfy it. Many descent conjugate gradient methods have been proposed; see [12, 16-19] and references therein. One well-known guaranteed descent conjugate gradient method was proposed by Hager and Zhang [16,20,21] with
$$\beta_k^{HZ} = \frac{1}{d_{k-1}^T y_{k-1}} \left( y_{k-1} - 2 d_{k-1} \frac{\|y_{k-1}\|^2}{d_{k-1}^T y_{k-1}} \right)^T g_k. \tag{7}$$
The method is designed based on the HS method and satisfies the sufficient descent condition (6) with $c = 7/8$ for any (inexact) line search. In [18], Zhang and Li proposed a general case of the HZ method, with the scalar $\beta_k^{ZL}$ given by formula (8), where $h > 0$ and a further scalar is to be specified. It also satisfies the sufficient descent condition (6) with $c = 7/8$, and it is globally convergent in the sense of $\liminf_{k \to \infty} \|g_k\| = 0$. When that scalar is chosen as $\|g_{k-1}\|^2$ or as $-g_{k-1}^T d_{k-1}$, the method becomes a descent PRP type method or a descent LS type method, respectively.
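The claim that the Hager-Zhang direction satisfies (6) with constant 7/8 regardless of the line search can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names `beta_hz` and `descent_ratio` are hypothetical, and the vectors are random draws rather than iterates of an actual optimization run, which is legitimate here because the bound holds for arbitrary vectors with $d_{k-1}^T y_{k-1} \ne 0$.

```python
import numpy as np

def beta_hz(g_new, d_old, y):
    """Hager-Zhang scalar: (y - 2 d ||y||^2 / (d^T y))^T g / (d^T y)."""
    dy = d_old @ y
    return ((y - 2.0 * d_old * (y @ y) / dy) @ g_new) / dy

def descent_ratio(g_new, d_new):
    """Return g^T d / ||g||^2; the descent condition needs this <= -7/8."""
    return (g_new @ d_new) / (g_new @ g_new)

rng = np.random.default_rng(0)
worst = -np.inf
for _ in range(1000):
    # Random stand-ins for g_k, g_{k-1}, d_{k-1} (not real iterates).
    g_new, g_old, d_old = rng.standard_normal((3, 5))
    y = g_new - g_old
    d_new = -g_new + beta_hz(g_new, d_old, y) * d_old
    worst = max(worst, descent_ratio(g_new, d_new))
```

After the loop, `worst` holds the largest observed ratio $g_k^T d_k / \|g_k\|^2$; it stays at or below $-7/8$ up to rounding, which is what the sufficient descent property predicts.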
A more general form of the scalar $\beta_k$ was suggested by Dai [22] and is defined by formula (9), which involves a real scalar, a vector $v_k \in \mathbb{R}^n$, and a parameter larger than 1/4; its convergence was not established in [22]. More recently, Nakamura et al. [19] proved that the method is globally convergent in the sense of $\liminf_{k \to \infty} \|g_k\| = 0$ under the weak Wolfe conditions. Moreover, we say that a conjugate gradient method is strongly convergent if $\lim_{k \to \infty} g_k = 0$. Obviously, the latter is stronger than the former: global convergence indicates that there exists at least one cluster point which is a stationary point of $f$, while strong convergence means that every cluster point of $\{x_k\}$ is a stationary point of $f$.
Comparing formulas (8) and (9), we find that although $\beta_k^{ZL}$ is a special case of formula (9), it has its own feature: its denominator is bounded below by $h^2$ times a squared norm from iteration $k-1$. Motivated by this, we modify the general formula (9) into formula (10), denoted by $\beta_k^{CGM}$, where $\theta_k > 1/4$ and a further constant is positive, and we prove that the general conjugate gradient method with $\beta_k^{CGM}$ has better convergence properties; that is, it is strongly convergent. Another difference between formulas (9) and (10) lies in their choices of $v_k$. In order to guarantee convergence, the choices of $v_k$ and of the scalar parameter in (9) must satisfy the assumption that, for all $k \ge 0$, a bound involving two positive constants holds. Whether this assumption is satisfied can be difficult to verify, while $v_k$ in (10) is only required to be norm-bounded.
The rest of this paper is organized as follows. In Section 2, we describe the general form of guaranteed descent conjugate gradient methods with (10) and establish that the corresponding search directions always satisfy the descent condition $g_k^T d_k \le -(1 - 1/(4\theta_k))\|g_k\|^2$, independently of the choices of $v_k$ and of the scalar parameter. Under some mild conditions, we prove strong convergence with the weak Wolfe conditions. Moreover, we design several efficient descent conjugate gradient methods that combine (10) with features of the basic conjugate gradient methods above. In Section 3, we test the proposed conjugate gradient methods on large-scale unconstrained problems from the CUTEr test library and compare them with the ZL method. Finally, we give some conclusions in Section 4.

Algorithm and Convergence
In this section, we describe the conjugate gradient method with (10) and show its strong convergence. We also give several specific conjugate gradient methods by combining formula (10) with some basic conjugate gradient methods. First, we make the following assumption.
Assumption 1 implies that there exists a positive constant $\gamma$ such that $\|g(x)\| \le \gamma$ on the level set associated with the starting point.

Algorithm 2.
Next, we analyze the convergence properties of Algorithm 2. Under Assumption 1, we state the following Zoutendijk condition, which was originally given by Zoutendijk [23] and Wolfe [1,2] and is used to prove global convergence of nonlinear conjugate gradient methods.

Theorem 3. Suppose that $x_0$ is a starting point for which Assumption 1 holds. Consider any iterative method in the form (2), where $d_k$ is a descent direction and $\alpha_k$ satisfies the weak Wolfe conditions (4); then
$$\sum_{k \ge 0} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < +\infty.$$

The following lemma shows that the directions of Algorithm 2 satisfy the sufficient descent condition.
The lemma above is similar to Theorem 1.1 in [16]. From this lemma, we can see that the descent property is independent of any line search and of the choices of $v_k$ and of the scalar parameter in (10), while different choices of these two parameters and of $\theta_k$ may yield very different numerical behaviors.

Theorem 5. Consider Algorithm 2, where $\alpha_k$ satisfies the weak Wolfe conditions (4) and $\beta_k$ is defined by (10) with $\|v_k\|$ bounded. Then, either $g_k = 0$ for some $k$ or
$$\lim_{k \to \infty} g_k = 0. \tag{16}$$

Proof. Suppose that $g_k \ne 0$ for all $k$. Utilizing (13) and (14), we obtain inequality (17). Since $\|v_k\|$ is bounded, there exists a finite constant that bounds $\|v_k\|$ for all $k$. By using the definition of $\beta_k$, we obtain an upper bound for $\beta_k$. Inserting this upper bound in (17) yields an estimate which implies (16).

Now, we propose several specific versions of Algorithm 2. Since hybrid conjugate gradient methods are regarded as performing better in practice, the specific methods are designed as hybrid versions based on some basic conjugate gradient methods. As mentioned in Section 1, the PRP and HS methods are two efficient methods, so the first specific hybrid method is designed using the features of the PRP and HS methods. Since the LS method has a similar structure to the PRP method, the second hybrid method is based on the PRP and LS methods. The third one is derived from the FR and DY methods. The last one uses a modified vector (marked with an asterisk) that adds a norm-scaled correction term to its unmodified counterpart at iteration $k-1$; this modification is similar to that of [24] and utilizes some secant condition. In addition, many conjugate gradient methods have been proposed based on different secant conditions; please refer to [15, 25-28] for further information.
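The descent lemma referred to above can be made concrete with a short worked derivation. The following is a sketch under an explicit assumption, not a restatement of the paper's own proof: it assumes that the scalar has the two-term structure shown in the first line, with a generic nonzero denominator written here as $s_k$ (our notation, not the paper's). Under that assumption the bound stated in the abstract follows from the elementary inequality $a^T b \le \tfrac{1}{2}(\|a\|^2 + \|b\|^2)$.

```latex
% Assumed structure (s_k \neq 0 is our placeholder for the denominator):
\[
\beta_k = \frac{g_k^T v_k}{s_k}
        - \theta_k \frac{\|v_k\|^2 \, g_k^T d_{k-1}}{s_k^2},
\qquad \theta_k > \tfrac14 .
\]
% Substitute into d_k = -g_k + \beta_k d_{k-1} and take the inner product with g_k:
\[
g_k^T d_k = -\|g_k\|^2
          + \frac{(g_k^T v_k)(g_k^T d_{k-1})}{s_k}
          - \theta_k \frac{\|v_k\|^2 (g_k^T d_{k-1})^2}{s_k^2}.
\]
% Apply a^T b \le (\|a\|^2 + \|b\|^2)/2 with
% a = g_k / \sqrt{2\theta_k},  b = \sqrt{2\theta_k}\,(g_k^T d_{k-1})\, v_k / s_k :
\[
\frac{(g_k^T v_k)(g_k^T d_{k-1})}{s_k}
\le \frac{\|g_k\|^2}{4\theta_k}
  + \theta_k \frac{\|v_k\|^2 (g_k^T d_{k-1})^2}{s_k^2}.
\]
% The last terms cancel, leaving
\[
g_k^T d_k \le -\Bigl(1 - \frac{1}{4\theta_k}\Bigr)\|g_k\|^2 .
\]
```

The argument uses neither the line search nor any property of $v_k$ or of the denominator beyond its being nonzero, which is consistent with the independence noted after the lemma.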
From Assumption 1 and inequality (19), the vectors used in the four hybrid methods are norm-bounded for all $k$; hence the global convergence properties of the four new hybrid descent conjugate gradient methods follow from the convergence proof for Algorithm 2.
Here, the parameter $\theta_k$ in (10) is chosen to be the constant 2. Other choices are possible, such as the maximum of 1/4 plus a positive constant and a ratio with denominator $\|v_k\|^2$; in most cases, however, $\theta_k = 2$ performs better than other choices.
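To see how the choice of $\theta_k$ affects the guaranteed descent constant $1 - 1/(4\theta_k)$, the following Python sketch may help. It is illustrative only and rests on assumptions: the two-term form used in `beta_two_term` is an assumed structure consistent with the descent condition in the abstract, the denominator `s` is a generic nonzero number, and all function names are hypothetical. The inputs are random vectors, not iterates from the experiments.

```python
import numpy as np

def descent_constant(theta):
    """Guaranteed constant c = 1 - 1/(4*theta) in g^T d <= -c ||g||^2."""
    if theta <= 0.25:
        raise ValueError("theta must exceed 1/4")
    return 1.0 - 1.0 / (4.0 * theta)

def beta_two_term(g_new, d_old, v, s, theta):
    """Assumed two-term scalar: g^T v / s - theta ||v||^2 g^T d / s^2."""
    return (g_new @ v) / s - theta * (v @ v) * (g_new @ d_old) / s**2

def worst_ratio(theta, trials=2000, n=6, seed=0):
    """Largest observed g^T d / ||g||^2 over random g, d_old, v, and s."""
    rng = np.random.default_rng(seed)
    worst = -np.inf
    for _ in range(trials):
        g, d_old, v = rng.standard_normal((3, n))
        # Generic nonzero denominator of either sign, kept away from zero.
        s = rng.uniform(0.5, 2.0) * rng.choice([-1.0, 1.0])
        d = -g + beta_two_term(g, d_old, v, s, theta) * d_old
        worst = max(worst, (g @ d) / (g @ g))
    return worst
```

With `theta = 2.0` the constant is 7/8, matching the Hager-Zhang value; values of `theta` close to 1/4 give a constant close to zero, and larger values push it toward 1. In each case `worst_ratio(theta)` should not exceed `-descent_constant(theta)` beyond rounding error.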

Numerical Experiments
In this section, we report numerical experiments that test the performance of the proposed methods and compare them with the ZL method. Numerical results reported in [18] showed that the ZL method with the scalar in (8) chosen as $-g_{k-1}^T d_{k-1}$, denoted the TDLS method, performs better than the HZ method and the descent PRP type method, so we only compared the proposed methods with the TDLS method. All codes were written in Matlab and run on a desktop computer with an Intel(R) Xeon(R) 2.40 GHz CPU, 6.00 GB of RAM, and the Linux operating system Ubuntu 8.04. All test problems were drawn from the CUTEr test library [29,30] and were accessed from within Matlab R2012a through the Matlab interface. We were particularly interested in large-scale problems, so the dimension of each test problem was at least 100.
For all the implemented methods, the step size $\alpha_k$ satisfied the weak Wolfe conditions (4) with $\delta = 0.1$ and $\sigma = 0.9$, and its initial guess was generated by the rules in [21]; the value of $h$ in the TDLS method was taken to be $10^{-5}$ following [18]; and the stopping criterion was
$$\|g_k\|_\infty \le \max\{\epsilon, \epsilon(1 + |f_k|)\},$$
where $\epsilon = 10^{-6}$ and $f_k = f(x_k)$. The numerical results are reported in Table 1, where Problem, Dim, Iter, Nf, Ng, and CPU represent the name of the test problem, the dimension, the number of iterations, the number of function evaluations, the number of gradient evaluations, and the CPU time elapsed in seconds, respectively, and "−" means that the method failed to achieve the prescribed accuracy, either because the number of iterations exceeded 50,000 or because the cost function generated a "NaN."

The performance of all methods was evaluated using the profiles of Dolan and Moré [31]. That is, we plotted the fraction of the test problems for which each method was within a factor $\tau$ of the best performance. The top curve represents the most robust method for a given factor $\tau$, and the leftmost curve represents the fastest method for solving a given percentage of the test problems. Figures 1, 2, and 3 show the performance profiles with respect to the number of function evaluations, the number of gradient evaluations, and the CPU time, respectively. These figures reveal that all the tested methods were efficient and that the CGM1, CGM2, and CGM4 methods were comparable with the TDLS method, while the CGM3 method performed relatively poorly. It is worth noting that the CGM1, CGM2, and CGM4 methods are hybrid versions related to the PRP method, so they inherit the good numerical performance of the PRP method. Among these three methods and the TDLS method, the CGM1 method performed more efficiently than the CGM2 method and more robustly than the TDLS and CGM4 methods, so the CGM1 method performed best among the tested methods.
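For readers unfamiliar with performance profiles, the computation behind such plots can be sketched in a few lines of Python. This is a generic illustration of the Dolan and Moré construction, not the script used to produce Figures 1 to 3: the function name `performance_profile` is hypothetical, the cost table in the demo is made-up data, failures are encoded as `np.inf`, and the sketch assumes positive costs and at least one successful solver per problem.

```python
import numpy as np

def performance_profile(costs, taus):
    """Dolan-More performance profile.

    costs: array of shape (n_problems, n_solvers); np.inf marks a failure.
           Assumes positive costs and at least one finite entry per row.
    taus:  1-D array of factors tau >= 1.
    Returns an array of shape (len(taus), n_solvers) whose entry [i, s] is
    the fraction of problems that solver s solves within taus[i] times the
    best cost recorded for that problem.
    """
    costs = np.asarray(costs, dtype=float)
    taus = np.asarray(taus, dtype=float)
    best = costs.min(axis=1, keepdims=True)   # best solver on each problem
    ratios = costs / best                     # performance ratios, >= 1
    within = ratios[None, :, :] <= taus[:, None, None]
    return within.mean(axis=1)                # average over problems

# Made-up costs for two hypothetical solvers on three problems.
demo_costs = np.array([[10.0, 20.0],
                       [30.0, 15.0],
                       [5.0, np.inf]])        # second solver fails here
demo_profile = performance_profile(demo_costs, taus=[1.0, 2.0])
```

In the demo, the first solver is best on two of the three problems and is within a factor 2 of the best on all of them, while the second solver is best on one problem and never solves the third. The value at $\tau = 1$ measures how often a method is the fastest, and the limiting value for large $\tau$ measures robustness, which is how the top and leftmost curves are read in the discussion above.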

Conclusions
This paper has studied a general form of guaranteed descent conjugate gradient methods and has proven that, whenever the weak Wolfe conditions are fulfilled, it is strongly convergent with $\lim_{k \to \infty} g_k = 0$. We then gave several specific guaranteed descent conjugate gradient methods and investigated their numerical behavior using test problems from the CUTEr library. From the numerical results, we can conclude that the specific methods are efficient for solving unconstrained nonlinear problems. More recently, a class of conjugate gradient methods [28] was proposed based on different secant conditions. These methods follow the form of the HZ method and satisfy the sufficient descent condition. However, global convergence has not been established for all of them for a general objective function, so our further investigation will aim to improve these methods in both theoretical analysis and numerical efficiency.

Figure 1: Performance profile based on the number of function evaluations.

Figure 3: Performance profile based on the CPU time. 

Table 1: Numerical results for test problems from the CUTEr library.