A Descent Dai-Liao Conjugate Gradient Method Based on a Modified Secant Equation and Its Global Convergence

We propose a conjugate gradient method which is based on the study of the Dai-Liao conjugate gradient method. An important property of our proposed method is that it ensures sufficient descent independent of the accuracy of the line search. Moreover, it achieves a high-order accuracy in approximating the second-order curvature information of the objective function by utilizing the modified secant condition proposed by Babaie-Kafaki et al. (2010). Under mild conditions, we establish that the proposed method is globally convergent for general functions provided that the line search satisfies the Wolfe conditions. Numerical experiments are also presented.


Introduction
We consider the unconstrained optimization problem

min f(x), x ∈ R^n,   (1)

where f : R^n → R is a continuously differentiable function.
Conjugate gradient methods are probably the most famous iterative methods for solving the optimization problem (1), especially when the dimension is large, since their iterations are simple and their memory requirements are low. These methods generate a sequence of points {x_k}, starting from an initial point x_0 ∈ R^n, using the iterative formula

x_{k+1} = x_k + α_k d_k,   (2)

where α_k > 0 is the stepsize obtained by some line search and d_k is the search direction defined by

d_0 = −g_0,  d_k = −g_k + β_k d_{k−1},  k ≥ 1,   (3)

where g_k is the gradient of f at x_k and β_k is a scalar. Well-known formulas for β_k include those of Hestenes-Stiefel (HS) [1], Fletcher-Reeves (FR) [2], Polak-Ribière (PR) [3], Liu-Storey (LS) [4], Dai-Yuan (DY) [5], and conjugate descent (CD) [6]. They are specified by

β_k^{HS} = g_k^T y_{k−1} / d_{k−1}^T y_{k−1},  β_k^{FR} = ‖g_k‖² / ‖g_{k−1}‖²,  β_k^{PR} = g_k^T y_{k−1} / ‖g_{k−1}‖²,
β_k^{LS} = −g_k^T y_{k−1} / g_{k−1}^T d_{k−1},  β_k^{DY} = ‖g_k‖² / d_{k−1}^T y_{k−1},  β_k^{CD} = −‖g_k‖² / g_{k−1}^T d_{k−1},   (4)

respectively, where s_{k−1} = x_k − x_{k−1}, y_{k−1} = g_k − g_{k−1}, and ‖·‖ denotes the Euclidean norm. If f is a strictly convex quadratic function and the performed line search is exact, all these methods are equivalent; but for a general function, different choices of β_k give rise to distinct conjugate gradient methods with quite different computational efficiency and convergence properties. We refer to the books [7,8], the survey paper [9], and the references therein for the numerical performance and the convergence properties of conjugate gradient methods. During the last decade, much effort has been devoted to developing new conjugate gradient methods which are not only globally convergent for general functions but also computationally superior to classical methods; these methods can be classified into two classes.
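As a concrete illustration of the scheme (2)-(3), here is a minimal Python sketch (not code from the paper) that runs the FR variant on a strictly convex quadratic, for which the exact stepsize is available in closed form:

```python
import numpy as np

def nonlinear_cg_fr(A, b, x0, max_iter=50, tol=1e-10):
    """Minimize f(x) = 0.5 x^T A x - b^T x (A symmetric positive definite)
    with the Fletcher-Reeves choice of beta_k. For such a quadratic with
    exact line search, all the classical beta formulas coincide."""
    x = x0.astype(float)
    g = A @ x - b                              # gradient of the quadratic
    d = -g                                     # d_0 = -g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = -(g @ d) / (d @ (A @ d))       # exact stepsize for quadratics
        x = x + alpha * d                      # x_{k+1} = x_k + alpha_k d_k
        g_new = A @ x - b
        beta_fr = (g_new @ g_new) / (g @ g)    # beta^FR = ||g_k||^2 / ||g_{k-1}||^2
        d = -g_new + beta_fr * d               # d_k = -g_k + beta_k d_{k-1}
        g = g_new
    return x

# Usage: a 2x2 SPD system; the minimizer solves A x = b.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_star = nonlinear_cg_fr(A, b, np.zeros(2))
```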
The first class utilizes the second-order information of the objective function to improve the efficiency and robustness of conjugate gradient methods. Dai and Liao [10] proposed a new conjugate gradient method by exploiting a new conjugacy condition based on the standard secant equation, in which β_k in (3) is defined by

β_k^{DL} = g_k^T y_{k−1} / d_{k−1}^T y_{k−1} − t g_k^T s_{k−1} / d_{k−1}^T y_{k−1},   (5)

where t ≥ 0 is a scalar. Moreover, from the viewpoint of global convergence for general functions, Dai and Liao also suggested a modification of (5) obtained by restricting the first term to be nonnegative, namely,

β_k^{DL+} = max{ g_k^T y_{k−1} / d_{k−1}^T y_{k−1}, 0 } − t g_k^T s_{k−1} / d_{k−1}^T y_{k−1}.   (6)

Along this line, many researchers [11-15] proposed variants of the Dai-Liao method based on modified secant conditions with higher orders of accuracy in the approximation of the curvature. Under proper conditions, these methods are globally convergent and are sometimes competitive with classical conjugate gradient methods. However, these methods are not guaranteed to generate descent directions; therefore, the descent condition is usually assumed in their analysis and implementations.
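For illustration, the Dai-Liao update parameter with the nonnegativity restriction can be sketched in a few lines of Python; the value t = 0.1 below is an arbitrary choice for the example, not a recommendation from the paper:

```python
import numpy as np

def beta_dl_plus(g_k, s_prev, d_prev, y_prev, t=0.1):
    """Dai-Liao beta with its first term restricted to be nonnegative.
    t >= 0 is the Dai-Liao parameter (t = 0.1 is illustrative only)."""
    denom = d_prev @ y_prev                   # d_{k-1}^T y_{k-1}
    return max((g_k @ y_prev) / denom, 0.0) - t * (g_k @ s_prev) / denom
```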
The second class focuses on generating conjugate gradient methods which ensure sufficient descent independent of the accuracy of the line search. On the basis of this idea, Hager and Zhang [16] modified the parameter β_k in (3) and proposed a new conjugate gradient method, called CG-DESCENT, in which the update parameter is defined as follows:

β_k^{HZ} = max{ β_k^N, η_k },  β_k^N = (1 / d_{k−1}^T y_{k−1}) ( y_{k−1} − 2 d_{k−1} ‖y_{k−1}‖² / d_{k−1}^T y_{k−1} )^T g_k,   (7)

where

η_k = −1 / ( ‖d_{k−1}‖ min{ η, ‖g_{k−1}‖ } )   (8)

and η > 0 is a constant. An important feature of the CG-DESCENT method is that the generated direction satisfies

g_k^T d_k ≤ −(7/8) ‖g_k‖².   (9)

Moreover, Hager and Zhang [16] established that CG-DESCENT is globally convergent for general functions under the Wolfe line search conditions.
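The CG-DESCENT update parameter recalled above can be sketched as follows; eta = 0.01 is an illustrative constant, since the method only requires eta > 0:

```python
import numpy as np

def beta_hz(g_k, g_prev, d_prev, eta=0.01):
    """Hager-Zhang (CG-DESCENT) update parameter, truncated from below.
    With d_k = -g_k + beta * d_{k-1}, the generated direction satisfies
    the sufficient descent bound g_k^T d_k <= -(7/8) ||g_k||^2."""
    y_prev = g_k - g_prev
    dy = d_prev @ y_prev                      # d_{k-1}^T y_{k-1}
    beta_n = ((y_prev - 2.0 * d_prev * (y_prev @ y_prev) / dy) @ g_k) / dy
    eta_k = -1.0 / (np.linalg.norm(d_prev) * min(eta, np.linalg.norm(g_prev)))
    return max(beta_n, eta_k)
```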
Quite recently, Zhang et al. [17] considered a different approach: they modified the search direction so that the generated direction satisfies g_k^T d_k = −‖g_k‖², independently of the line search used. More analytically, they proposed a modified FR method in which the search direction is given by

d_k = −(1 + β_k^{FR} (g_k^T d_{k−1}) / ‖g_k‖²) g_k + β_k^{FR} d_{k−1}.

This method reduces to the standard FR method in case the performed line search is exact. Furthermore, in case β_k is specified by another existing conjugate gradient formula, the property g_k^T d_k = −‖g_k‖² is still satisfied. Along this line, many related conjugate gradient methods have been extensively studied [17-24], with strong convergence properties and good average performance.
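The key algebraic point, that this construction forces g_k^T d_k = −‖g_k‖² for any scalar β_k, is easy to verify numerically; a minimal sketch:

```python
import numpy as np

def modified_direction(g_k, d_prev, beta_k):
    """Zhang et al.-style modified direction: expanding g_k^T d_k shows
    that the beta-dependent terms cancel, leaving exactly -||g_k||^2."""
    gg = g_k @ g_k
    return -(1.0 + beta_k * (g_k @ d_prev) / gg) * g_k + beta_k * d_prev
```

The identity holds regardless of how beta_k is chosen, which is why the same trick transfers to other β formulas.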
In this work, we propose a new conjugate gradient method which has the characteristics of both previously discussed classes. More analytically, our method ensures sufficient descent independent of the accuracy of the line search and achieves a high-order accuracy in approximating the second-order curvature information of the objective function by utilizing the modified secant condition proposed by Babaie-Kafaki et al. [11]. Under mild conditions, we establish the global convergence of our proposed method. The numerical experiments indicate that the proposed method is promising.
The remainder of this paper is organized as follows. In Section 2, we present our motivation and our proposed conjugate gradient method. In Section 3, we present the global convergence analysis of our method. The numerical experiments are reported in Section 4 using the performance profiles of Dolan and Moré [25]. Finally, Section 5 presents our concluding remarks and our proposals for future research.
Throughout this paper, we denote f(x_k) and ∇²f(x_k) by f_k and G_k, respectively.

Algorithm
Firstly, we recall that in quasi-Newton methods, an approximation matrix B_{k−1} to the Hessian ∇²f_{k−1} is updated so that the new matrix B_k satisfies the following secant condition:

B_k s_{k−1} = y_{k−1}.   (10)

Zhang et al. [26] and Zhang and Xu [27] extended this condition and derived a class of modified secant conditions with a vector parameter, of the form

B_k s_{k−1} = z_{k−1},  z_{k−1} = y_{k−1} + (θ_{k−1} / s_{k−1}^T u) u,   (11)

where u is any vector satisfying s_{k−1}^T u > 0, and θ_{k−1} is defined by

θ_{k−1} = 6(f_{k−1} − f_k) + 3(g_{k−1} + g_k)^T s_{k−1}.   (12)

Observe that this new quasi-Newton equation contains not only gradient information but also function value information at the present and the previous iterates. Moreover, in [26], Zhang et al. proved that if ‖s_{k−1}‖ is sufficiently small, then

s_{k−1}^T G_k s_{k−1} − s_{k−1}^T y_{k−1} = O(‖s_{k−1}‖³),  s_{k−1}^T G_k s_{k−1} − s_{k−1}^T z_{k−1} = O(‖s_{k−1}‖⁴).   (13)

Clearly, these estimates imply that the quantity s_{k−1}^T z_{k−1} approximates the second-order curvature s_{k−1}^T G_k s_{k−1} with higher precision than the quantity s_{k−1}^T y_{k−1} does. However, for values of ‖s_{k−1}‖ greater than one (i.e., ‖s_{k−1}‖ > 1), the standard secant equation (10) is expected to be more accurate than the modified secant equation (11). Recently, Babaie-Kafaki et al.
[11], to overcome this difficulty, considered the following extension of the modified secant equation (11):

B_k s_{k−1} = y*_{k−1},  y*_{k−1} = y_{k−1} + ρ_{k−1} (θ_{k−1} / s_{k−1}^T u) u,   (14)

where the parameter ρ_{k−1} is restricted to the values {0, 1} and adaptively switches between the standard secant equation (10) and the modified secant equation (14), by setting ρ_{k−1} = 1 when ‖s_{k−1}‖ ≤ 1 and setting ρ_{k−1} = 0 otherwise. In the same way as Dai and Liao [10], they obtained an expression for β_k, of the form

β_k = (g_k^T y*_{k−1} − t g_k^T s_{k−1}) / d_{k−1}^T y*_{k−1},   (15)

where t ≥ 0 and y*_{k−1} is defined by (14). Furthermore, following Dai and Liao's approach, in order to ensure global convergence for general functions, they modified formula (15) as follows:

β_k^+ = max{ g_k^T y*_{k−1} / d_{k−1}^T y*_{k−1}, 0 } − t g_k^T s_{k−1} / d_{k−1}^T y*_{k−1}.   (16)

Motivated by the theoretical advantages of the modified secant equation (14) and the technique of the modified FR method [17], we propose a new conjugate gradient method as follows. Let the search direction be defined by

d_k = −(1 + β_k (g_k^T d_{k−1}) / ‖g_k‖²) g_k + β_k d_{k−1},  d_0 = −g_0,   (17)

where β_k is defined by (15). It is easy to see that the sufficient descent condition

g_k^T d_k = −‖g_k‖²   (18)

holds, using any line search. Moreover, if f is a convex quadratic function and the performed line search is exact, then θ_{k−1} = 0, y*_{k−1} = y_{k−1}, and g_k^T s_{k−1} = 0; hence, the conjugate gradient method (2), (17) reduces to the standard conjugate gradient method. Now, based on the above discussion, we present our proposed algorithm, called the modified Dai-Liao conjugate gradient algorithm (MDL-CG).
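The proposed direction can be sketched in Python as follows, taking u = s_{k−1} (the choice made later in the experiments) and an arbitrary illustrative t = 0.1; this is an illustrative sketch, not the paper's implementation:

```python
import numpy as np

def mdl_cg_direction(g_k, g_prev, d_prev, s_prev, f_k, f_prev, t=0.1):
    """Search direction (17) with beta_k from (15) and y* from (14),
    taking u = s_{k-1}. By construction g_k^T d_k = -||g_k||^2."""
    y_prev = g_k - g_prev
    theta = 6.0 * (f_prev - f_k) + 3.0 * ((g_prev + g_k) @ s_prev)   # (12)
    rho = 1.0 if np.linalg.norm(s_prev) <= 1.0 else 0.0              # secant switch
    y_star = y_prev + rho * (theta / (s_prev @ s_prev)) * s_prev     # (14), u = s_{k-1}
    beta = ((g_k @ y_star) - t * (g_k @ s_prev)) / (d_prev @ y_star) # (15)
    gg = g_k @ g_k
    return -(1.0 + beta * (g_k @ d_prev) / gg) * g_k + beta * d_prev  # (17)
```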

Convergence Analysis
In order to establish the global convergence analysis, we make the following assumptions on the objective function f.

Assumption 1. The level set L = {x ∈ R^n : f(x) ≤ f(x_0)} is bounded.   (19)

Assumption 2. In some neighborhood N of L, f is differentiable and its gradient g is Lipschitz continuous; namely, there exists a constant L > 0 such that

‖g(x) − g(y)‖ ≤ L‖x − y‖,  for all x, y ∈ N.   (20)

It follows directly from Assumptions 1 and 2 that there exists a constant M > 0 such that

‖g(x)‖ ≤ M,  for all x ∈ L.   (21)

In order to guarantee the global convergence of Algorithm 1, we impose that the steplength α_k satisfies either the Armijo condition or the Wolfe conditions. The Armijo line search finds a steplength α_k = max{λ^j : j = 0, 1, 2, ...} such that

f(x_k + α_k d_k) ≤ f_k + δ α_k g_k^T d_k,   (22)

where δ, λ ∈ (0, 1) are constants. In the Wolfe line search, the steplength α_k satisfies

f(x_k + α_k d_k) ≤ f_k + δ α_k g_k^T d_k,   (23)
g(x_k + α_k d_k)^T d_k ≥ σ g_k^T d_k,   (24)

where 0 < δ < σ < 1. Next, we present some lemmas which are very important for the global convergence analysis.
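The Armijo rule (22) amounts to simple backtracking; a minimal sketch (the values of delta and lam below are common textbook choices, not parameters prescribed by the paper):

```python
import numpy as np

def armijo_stepsize(f, g_k, x_k, d_k, delta=1e-4, lam=0.5, max_back=50):
    """Return the largest alpha = lam^j, j = 0, 1, 2, ..., satisfying
    f(x_k + alpha d_k) <= f(x_k) + delta * alpha * g_k^T d_k."""
    f_k = f(x_k)
    slope = g_k @ d_k              # negative for a descent direction
    alpha = 1.0
    for _ in range(max_back):
        if f(x_k + alpha * d_k) <= f_k + delta * alpha * slope:
            break
        alpha *= lam
    return alpha
```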
Lemma 1 (see [11]). Suppose that Assumptions 1 and 2 hold. For θ_{k−1} and y*_{k−1} defined by (12) and (14), respectively, one has

Lemma 2. Suppose that Assumptions 1 and 2 hold. Let {x_k} be generated by Algorithm MDL-CG, where the line search satisfies the Armijo condition (22). Then there exists a constant c > 0 such that

for all k ≥ 0.

ISRN Computational Mathematics
Proof. From the Armijo condition (22) and Assumptions 1 and 2, we have

Using this together with (18) implies that

We now prove (26) by considering the following cases.
From inequalities (26) and (28), we can easily obtain the following lemma.

Lemma 3. Suppose that Assumptions 1 and 2 hold. Let {x_k} be generated by Algorithm MDL-CG, where the line search satisfies the Armijo condition (22). Then

Next, we establish the global convergence of Algorithm MDL-CG for uniformly convex functions.
Theorem 1. Suppose that Assumptions 1 and 2 hold and f is uniformly convex on L; namely, there exists a constant γ > 0 such that

(g(x) − g(y))^T (x − y) ≥ γ ‖x − y‖²,  for all x, y ∈ L.   (33)

Let {x_k} and {d_k} be generated by Algorithm MDL-CG and let α_k satisfy the Armijo condition (22). Then either g_k = 0 for some k, or

lim_{k→∞} ‖g_k‖ = 0.

Proof. Suppose that g_k ≠ 0 for all k ≥ 0. By the convexity assumption (33), we have

Combining the previous relation with Lemma 1, we obtain

Therefore, by the definition of the search direction d_k in (17) together with the previous inequality, we obtain an upper bound for ‖d_k‖:

Inserting this upper bound for ‖d_k‖ in (32) yields ∑_{k≥0} ‖g_k‖² < ∞, which completes the proof.
Next, we present a lemma which shows that, asymptotically, the search directions change slowly.

Lemma 5. Suppose that Assumptions 1 and 2 hold. Let {x_k} and {d_k} be generated by Algorithm MDL+-CG and let α_k be obtained by the Wolfe line search (23) and (24). Then d_k ≠ 0 and

where

Proof. Firstly, note that d_k ≠ 0, for otherwise (18) would imply g_k = 0. Therefore, w_k is well defined. Next, we split the formula for β_k^+ into two parts as follows:

where

Moreover, let us define a vector r_k and a scalar δ_k by

where

Therefore, from (17), for k ≥ 1, we obtain

Using this relation with the identity ‖w_k‖ = ‖w_{k−1}‖ = 1, we have

In addition, using this with the condition δ_k ≥ 0 and the triangle inequality, we get

Now, we evaluate the quantity υ_k. It follows from the definition of υ_k in (52) and from (21), (39), (40), and (45) that there exists a constant D > 0 such that

From the previous relation and Lemma 3, we obtain

Therefore, using this with (54), we complete the proof.

Let Z^+ denote the set of positive integers. For λ > 0 and a positive integer Δ, we define the set

The following lemma shows that if the gradients are bounded away from zero and Lemma 4 holds, then a certain fraction of the steps cannot be too small. This lemma is equivalent to Lemma 3.5 in [10] and Lemma 4.2 in [28].

Lemma 6. Suppose that all assumptions of Lemma 5 hold. Then there exists a constant λ > 0 such that, for any Δ ∈ Z^+ and any index k_0, there exists an index k ≥ k_0 such that

Next, making use of Lemmas 4, 5, and 6, we can establish the global convergence of Algorithm MDL+-CG under the Wolfe line search for general functions. The proof is similar to that of Theorem 3.6 in [10] and Theorem 4.3 in [28]; thus, we omit it.
Theorem 2. Suppose that Assumptions 1 and 2 hold. If {x_k} is generated by Algorithm MDL+-CG and α_k is obtained by the Wolfe line search (23) and (24), then either g_k = 0 for some k, or

lim inf_{k→∞} ‖g_k‖ = 0.

Numerical Experiments
In this section, we report numerical experiments performed on a set of 73 unconstrained optimization problems. These test problems, with the given initial points, can be found on Andrei Neculai's web site (http://camo.ici.ro/neculai/SCALCG/testuo.pdf). Each test function was tested with 1000, 5000, and 10000 variables, respectively. We compare the performance of our proposed conjugate gradient method MDL+-CG with that of the CG-DESCENT method [16]. The CG-DESCENT code, written by Hager and Zhang, was obtained from Hager's web page (http://www.math.ufl.edu/∼hager/papers/CG/). The implementation code was written in Fortran and compiled with the Intel Fortran compiler ifort (with compiler settings -O2 -double-size 128) on a PC (2.66 GHz Quad-Core processor, 4 Gbyte RAM) running the Linux operating system. All algorithms were implemented with the Wolfe line search proposed by Hager and Zhang [16], and the parameters were set to their default values. In our experiments, the termination criterion is ‖g_k‖ ≤ 10^{−6}, and we set u = s_{k−1} as in [11]. In the sequel, we focus on the experimental determination of the best value of parameter t; to this end, we tested values of t ranging from 0 to 1 in steps of 0.005. The detailed numerical results can be found at http://www.math.upatras.gr/∼livieris/Results/MDL results.zip.
Figure 1 presents the percentage of the test problems that were successfully solved by Algorithm MDL+-CG for each value of parameter t; the choice t = 0.995 exhibits the worst performance in terms of computational cost and success rate. We conclude our analysis by considering the performance profiles of Dolan and Moré [25] for the worst and the best choices of parameter t. Performance profiles provide a wealth of information, such as solver efficiency, robustness, and probability of success, in compact form; they also eliminate the influence of a small number of problems on the benchmarking process and the sensitivity of the results to the ranking of solvers [25]. The performance profile plots the fraction P of problems for which any given method is within a factor τ of the best method. The value of P at τ = 1 gives the percentage of the test problems for which a method is the fastest (efficiency), while the limiting value of P for large τ gives the percentage of the test problems that were successfully solved by the method (robustness). The curves in Figures 3 and 4 have the following meaning.
(iv) "MDL + 3 " stands for Algorithm MDL + -CG with t = 0.995.Figures 3-5 present the performance profiles of CG-DESCENT, MDL + 1 MDL + 2 , and MDL + 3 relative to the function evaluations, gradient evaluations, and CPU time (in seconds), respectively.Obviously, MDL + 1 exhibits the best overall performance, significantly outperforming all other conjugate gradient methods, relative to all performance metrics.More analytically, MDL + 1 solves about 64.4% and 66.2% of the test problems with the least number of function and gradient evaluations, respectively, while CG-DESCENT solves about 48.8% and 47%, in the same situations.Moreover, MDL + 2 is more robust than the CG-DESCENT since it solves 55.3% and 53% of the test problems with the least number of function and gradient evaluations, respectively.As regarding the CPU time metric, the interpretation in Figure 5 illustrates that MDL + 1 reports the best performance, followed by MDL + 2 .More specifically, MDL + 1 solves 68.4% of the test problems with the least CPU time while MDL + 2 solves about 58% of the test problems.In terms of efficiency, MDL +  2 and CG-DESCENT exhibit the best performance, successfully solving 216 out of 219 of the test problems.MDL +  3 presents the worst performance, since its curves lie under the curves of the other conjugate gradient methods, regarding all performance metrics.In summary, based on the performance of MDL + 1 , MDL + 2 and MDL + 3 , we point out that the choice of parameter t crucially affects the efficiency of Algorithm MDL + -CG.

Conclusions and Future Research
In this paper, we proposed a conjugate gradient method which is a modification of the Dai-Liao method. An important property of our proposed method is that it ensures sufficient descent independent of the accuracy of the line search. Moreover, it achieves a high-order accuracy in approximating the second-order curvature information of the objective function by utilizing the modified secant condition proposed by Babaie-Kafaki et al. [11]. Under mild conditions, we established that the proposed method is globally convergent for general functions under the Wolfe line search conditions.
The preliminary numerical results show that if a good value of parameter t is chosen, our proposed algorithm performs very well. However, we have not yet theoretically established an optimal choice of parameter t, which constitutes our motivation for future research. Moreover, an interesting idea is to apply our proposed method to a variety of challenging real-world problems, such as protein folding problems [29].

Figure 5: Performance profiles of CG-DESCENT, MDL+1, MDL+2, and MDL+3 relative to CPU time.

Figure 1: Percentage of successfully solved problems by Algorithm MDL+-CG for each value of parameter t.