A New Conjugate Gradient Algorithm with Sufficient Descent Property for Unconstrained Optimization

A new nonlinear conjugate gradient formula, which satisfies the sufficient descent condition, for solving unconstrained optimization problem is proposed. The global convergence of the algorithm is established under weak Wolfe line search. Some numerical experiments show that this new WWPNPRP


Introduction
In this paper, we consider the following unconstrained optimization problem: where () : R  → R is a twice continuously differentiable function whose gradient is denoted by () : R  → R  .Its iterative formula is given by where and   is a step size which is computed by carrying out a line search,   is a scalar, and   denotes (  ).There are at least six famous formulas for   , which are given below: To establish the global convergence results of the above conjugate gradient (CG) methods, it is usually required that the step size   should satisfy some line search conditions, such as the weak Wolfe-Powell (WWP) line search where  ∈ (0, 1/2) and  ∈ (, 1), and strong Wolfe-Powell (SWP) line search (5) where  ∈ (0, 1/2) and  ∈ (, 1).Wolfe-Powell is referred to as Wolfe.
Considerable attentions have been made on the global convergence behaviors for the above methods.Zoutendijk [1] proved that the FR method with exact line search is globally convergent.Al-Baali [2] extended this result to the strong Wolfe line search conditions.In [3], Dai and Yuan proposed the DY method which produces a descent search direction at every iteration and converges globally provided that the line search satisfies the weak Wolfe conditions.In [4], Wei et   .
Under the Wolfe line search, the method possesses global convergence and efficient numerical performance.
On some studies of the conjugate gradient methods, the sufficient descent condition is often used to analyze the global convergence of the nonlinear conjugate gradient method with the inexact line search techniques.For instance, Touati-Ahmed and Storey [9], Al-Baali [2], Gilbert and Nocedal [10], and Hu and Storey [11] hinted that the sufficient descent condition may be crucial for conjugate gradient methods.Unfortunately, this condition is hard to hold.It has been showed that the PRP method with the strong Wolfe Powell line search does not ensure this condition at each iteration.So, Grippo and Lucidi [12] managed to find some line searches which ensure the sufficient descent condition, and they presented a new line search which ensures this condition.The convergence of the PRP method with this line search had been established.Yu et al. [13] analyzed the global convergence of modified PRP CGM with sufficient descent property.Gilbert and Nocedal [10] gave another way to discuss the global convergence of the PRP method with the weak Wolfe line search.By using a complicated line search, they were able to establish the global convergence result of the PRP and HS methods by restricting the parameter   in (3), not allowed to be negative; that is, which yields a globally convergent CG method, being also computationally efficient [14].In spite of the numerical efficiency of the PRP method, as an important defect, the method lacks the following descent property: even for uniformly convex objective functions [15].This motivated the researchers to pay much attention to finding some extensions of the PRP method with descent property.In this context, Yu et al. [16] proposed a modified form of  PRP  as follows: with a constant  ≥ 1/4, leading to a CG method with the sufficient descent property.Dai and Kou [17] propose a family of conjugate gradient methods and an improved Wolfe line search; meanwhile, to accelerate the algorithm, an adaptive restart along negative gradients method is introduced.Jiang and Jian [18] proposed two modified CGMs with disturbance factors based on a variant of PRP method; the two proposed methods not only generate sufficient descent direction at each iteration but also converge globally for nonconvex minimization if the strong Wolfe line search is used.A new hybrid conjugate gradient method was presented for unconstrained optimization.The proposed method can generate decent directions at every iteration; moreover, this property is independent of the steplength line search.Under the Wolfe line search, the proposed method possesses global convergence [19].
The main purpose of this paper is to design an efficient algorithm which possesses the properties of global convergence, sufficient descent, and good numerical results.In next section, we present a new CG formula and give its properties.In Section 3, the new algorithm and its global convergence result will be established.To test and compare the numerical performance of the proposed method, in the last part of this work, a large amount of medium-scale numerical experiments are reported by tables and performance profiles.

The Formula and Its Property
Because sufficient descent condition ( 9) is a very nice and important property to analyze the global convergence of the CG methods, we hope to find   such that   satisfies (9).In the following, we propose a sequence {  } and prove that it has such property.Firstly, we give a definition of a descent sequence (or a sufficient descent sequence): a sequence {  } is called a descent sequence (or a sufficient descent sequence) for the CG methods if there exists a constant  ∈ (0, 1) (or  ∈ [0, 1)) such that, for all  ≥ 2, By using (3), we have, for all  ≥ 2, From the above discussion, we require that The above inequality implies (13).
In [20], the authors proposed a variation of the FR formula: where  1 ∈ (0, +∞),  2 ∈ [ 1 +  1 , +∞),  3 ∈ (0, +∞), and  1 is any given positive constant.It is easy to prove that { VFR  } is a descent sequence (with  =  1 / 2 ) for CG methds if      ≤ 0. Formula ( 16) possesses the sufficient descent property and proved that there exist some nonlinear conjugate gradient formulae possessing the sufficient descent property without any line searches, where By restricting the parameter  ≤ 1/4 under the SWP line search condition, the WYL method possessed the sufficient descent condition [21].
In [22], the authors designed the following variation of the PRP formula which possesses the sufficient descent property without any line searches: in which  ≥ 1.
Motivated by the ideas in [20,22] without any line search and sufficient descent, and taking into account the good convergence properties of [10] and the good numerical performance in [14], we propose a class new formula about   as follows: where the definitions of  1 ,  2 are the same as those in formula (16);  ∈ (0, 1).
In order to ensure the nonnegative of the parameter   , we define Thus if a negative of  NPRP  occurs, this strategy will restart the iteration along the steepest direction.
The following two propositions show that the { NPRP +

𝑘
} is a descent sequence, so that   can make sufficient descent condition (9) hold.
By the proof of Proposition 2, we can know that the formula  1 / 2 < 1 is necessary; otherwise, the sufficient descent condition can not be held.

Global Convergence
In this section, we propose an algorithm related to  NPRP +  and then we study the global convergence property of this algorithm.Firstly, we make the following two assumptions, which have been widely used in the literature to analyze the global convergence of the CG methods with the inexact line searches.
Assumption A. The level set Now we give the algorithm.
Since {(  )} is decreasing sequence, it is clear that the sequence {  } is contained in Ω, and there exists a constant  * , such that lim By using Assumptions A and B, we can deduce that there exists  > 0 such that The following important result was obtained by Zoutendijk [1] and Wolfe [23,24].
Lemma 4. Suppose () is bounded below, and () satisfies the Lipschitz condition.Consider any iteration method of formula (2), where   satisfies      < 0 and   is obtained by the weak Wolf line search.Then The following lemma was obtained by Dai and Yuan [25].
Lemma 5. Assume that a positive series {  } satisfies the following inequality for all : where  > 0 and  are constant.Then one has Theorem 6. Suppose that Assumptions A and B hold; {  } is a sequence generated by Algorithm 3. Then one has Proof.Equation (3) indicates that, for all  ≥ 2, Squaring both sides of (33), we obtain Suppose that We have where ℎ  = ‖  ‖ which contradicts to Zoutendijk condition (29).This shows that (32) holds.The proof of the theorem is complete.
From the proof of the above theorem, we can conclude that any conjugate gradient method with the formula  NPRP +  and some certain step size technique which ensures that Zoutendijk condition (29) holds is globally convergent.In particular, the formula  NPRP +  with the weak Wolfe conditions can generate a globally convergent result.

Numerical Results
All methods above are tested on 56 test problems, where the former test problems 1-48 (from arwhead to woods) in Table 1 are taken from the CUTE library in Bongartz et al. [26] and the others are taken from Moré et al. [27]; is generated by Grippo and Lucidi [12].All codes were written in Matlab 7.5 and run on a HP with 1.87 GB RAM and Windows XP operating system.The parameters are  = 0.1,  = 0.01,  1 = 1,  2 = 2, and  = 0.3.Stop the iteration if criterion ‖  ‖ ≤  = 10 −6 is satisfied.
In Table 1, "Name" denotes the abbreviation of the test problems, "" denotes the dimension of the test problems, "Itr/NF/NG" denote the number of iteration, function evaluations, and gradient evaluations, respectively, and "Tcpu" denotes the computing time of CPU for computing a test problem (units: second).
On the other hand, to show the performance difference clearly between the hJHJ, hAN, hDY, and hHuS method, we adopt the performance profiles given by Dolan and Moré [28] to compare the performance according to Itr, NF, NG, and Tcpu, respectively.Benchmark results are generated by running a solver on a set P of problems and recording information of interest such as NF and Tcpu.Let S be the set of solvers in comparison.
where size  means the number of elements in set ; then   () is the probability for solver  ∈ S that a performance  ratio  , is within a factor  ∈   .The   is the (cumulative) distribution function for the performance ratio.The value of   (1) is the probability that the solver will win over the rest of the solvers.
Based on the theory of the performance profile above, four performance figures, that is, Figures 1-4, can be generated according to Table 1.From the four figures, we can see that the NPRP is superior to the other three CGMs on the testing problems.

Conclusion
In this paper, we carefully studied the combination of the variations of the formulas  FR  and  PRP  .We have found that the new formula possesses the following features: (1)  NPRP +  is a descent sequence without any line search; (2) the new method possesses the sufficient descent property and converge globally; (3) the strategy will restart the iteration automatically along the steepest descent direction if a negative value of  NPRP  occurs; (4) the initial numerical results are promising.
2/‖  ‖ 4 and   = −     /‖  ‖ 2 .Note that ℎ 1 = 1/‖ 1 ‖ 2 and  1 = 1.It follows from (36) that Assume that S consists of   and P consists of   problems.For each problem  ∈ P and solver  ∈ S, denote  , by the computing time (or the number of function evaluation, etc.) required to solve problem  ∈ P by solver  ∈ S, and the comparison between different solvers is based on the performance ratio defined by Assume that a large enough parameter   ≥  , for all ,  is chosen, and  , =   if and only if solvers  do not solve problem .Define solvers size { ∈ P : log  , ≤ } ,