A Simple Sufficient Descent Method for Unconstrained Optimization

We develop a sufficient descent method for solving large-scale unconstrained optimization problems. At each iteration, the search direction is a linear combination of the gradients at the current and the previous steps. An attractive property of this method is that the generated directions are always descent directions. Under appropriate conditions, we show that the proposed method converges globally. Numerical experiments on unconstrained minimization problems from the CUTEr library are reported, which illustrate that the proposed method is promising.


Introduction
In this paper, we consider the unconstrained optimization problem

min f(x), x ∈ R^n, (1.1)

where f : R^n → R is a continuously differentiable function whose gradient at a point x_k is denoted by g(x_k), or g_k for simplicity, and n is the number of variables, which is assumed to be large. Large-scale optimization is an important research area in both optimization theory and algorithm design. Several classes of effective methods are available for solving (1.1), for instance, inexact Newton, limited memory quasi-Newton, conjugate gradient, spectral gradient, and subspace methods [1-4].
The iterative formula of the method is given by

x_{k+1} = x_k + α_k d_k, (1.2)

where α_k is a steplength and d_k is a search direction. Generally, we say that d_k is a descent direction of f at x_k if g_k^T d_k < 0. Furthermore, if there exists a constant C > 0 such that g_k^T d_k ≤ -C ||g_k||^2, then d_k is called a sufficient descent direction. The descent property is very important for the global convergence of an iterative method, especially for the conjugate gradient method [5]. Unfortunately, for the earliest spectral gradient method [6] and the PRP conjugate gradient method, the generated directions are not always descent. Therefore, in recent years, much effort has been devoted to overcoming this drawback and developing new methods; see, for example, [7].
As is well known, if the Armijo line search is used, the standard PRP method can cycle infinitely without approaching any optimal point when solving nonconvex minimization problems. To overcome this drawback, Hager and Zhang [8] made a slight modification to the PRP formula. The resulting method owns a remarkable property: for any line search, the generated direction always satisfies the sufficient descent condition g_k^T d_k ≤ -(7/8) ||g_k||^2. Moreover, Zhang et al. [9] further studied the PRP formula and proposed a three-term modified PRP method, in which the generated directions satisfy

g_k^T d_k = -||g_k||^2, (1.3)

and this property is independent of the line search. Additionally, to improve the numerical performance of the standard FR method, Zhang et al. [10] developed a modified FR method whose directions also satisfy (1.3).
Although much progress has been made in designing sufficient descent directions for conjugate gradient methods, the issue has seemingly received little attention in other methods. Very recently, An et al. [5] proposed a robust technique to construct a sufficient descent direction for unconstrained optimization. The descent direction is a linear combination of the steepest descent direction and the projection of an original direction, that is,

d_k = -g_k + λ_k (I - g_k g_k^T / ||g_k||^2) d̄_k, (1.4)

where λ_k is a scalar and d̄_k is an original direction. From the definition of d_k, it is easy to deduce that the sufficient descent condition (1.3) holds. This technique has been applied to the PSB update formula, and the resulting method was shown to converge globally and superlinearly for uniformly convex functions. Moreover, the direction in [11] is truly a special case of (1.4). We note that (1.4) can be considered as a general form of a nonlinear conjugate gradient method, in which

d_k = -g_k + β_k d_{k-1}, (1.5)

where β_k is a scalar. Comparing (1.5) with (1.4), we see that the scalar β_k takes the place of the coefficient matrix of d̄_k. In this paper, we further study the sufficient descent technique (1.4) and propose a simple sufficient descent method for solving unconstrained optimization problems. Our motivation is simple: we choose d̄_k in (1.4) to be the gradient at the previous step. The search direction of the proposed method always satisfies a sufficient descent condition.
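As a quick sanity check of the projection technique (1.4), the following sketch (our own NumPy illustration with hypothetical names, not code from the cited papers) verifies numerically that the resulting direction satisfies g_k^T d_k = -||g_k||^2 for an arbitrary scalar λ_k and an arbitrary original direction d̄_k:

```python
import numpy as np

def projected_descent_direction(g, d_bar, lam=1.0):
    """Direction of (1.4): d = -g + lam * (I - g g^T / ||g||^2) d_bar.
    The projection removes the component of d_bar along g, so the
    inner product g^T d equals -||g||^2 whatever lam and d_bar are."""
    g = np.asarray(g, dtype=float)
    d_bar = np.asarray(d_bar, dtype=float)
    # (I - g g^T / ||g||^2) d_bar, computed without forming the matrix
    proj = d_bar - g * (g @ d_bar) / (g @ g)
    return -g + lam * proj

# Any original direction d_bar yields the sufficient descent property:
rng = np.random.default_rng(0)
g = rng.standard_normal(5)
d_bar = rng.standard_normal(5)
d = projected_descent_direction(g, d_bar, lam=3.7)
print(np.isclose(g @ d, -(g @ g)))  # True
```

The matrix-free form above is cheap: one extra inner product per iteration instead of an n x n matrix-vector multiplication.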
Under some conditions, we show that the algorithm converges globally when a special line search is used. Its performance on some CUTEr test problems is encouraging.
We organize this paper as follows. In the next section, we construct the sufficient descent direction. In Section 3, we state the steps of our new algorithm with a special line search. In Section 4, we report some experiments on large-scale unconstrained optimization problems. Finally, we conclude this paper with some remarks in the last section. Throughout this paper, the symbol ||·|| denotes the Euclidean norm of a vector.

New Search Direction
This section states the new direction formula and investigates its properties. If we take g_{k-1} as d̄_k and set λ_k ≡ 1 in (1.4), we get

d_k = -g_k + (I - g_k g_k^T / ||g_k||^2) g_{k-1}. (2.1)

Obviously, the direction is a linear combination of the gradients at the current and the previous steps. Additionally, to obtain the global convergence of the PRP method, Cheng [11] introduced a descent direction defined as

d_k = -g_k + β_k^{PRP} (I - g_k g_k^T / ||g_k||^2) d_{k-1}. (2.2)

We note that (2.2) is only a special choice of (1.4), in which d̄_k is replaced by d_{k-1} and λ_k is chosen as the scalar β_k^{PRP}. If we denote

H_k = I - g_k g_k^T / ||g_k||^2, (2.3)

then it is easy to see that, for any y ≠ 0, we have

y^T H_k y = ||y||^2 - (g_k^T y)^2 / ||g_k||^2 ≥ 0 (2.4)

by the Cauchy-Schwarz inequality, which indicates that H_k is a symmetric positive semidefinite matrix. Moreover, since H_k g_k = 0, the direction (2.1) satisfies g_k^T d_k = -||g_k||^2; that is, the sufficient descent condition (1.3) always holds.
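The stated properties of H_k are easy to confirm numerically. The sketch below (our own NumPy illustration, not the paper's code) checks that H_k is symmetric positive semidefinite, annihilates g_k, and therefore makes the direction (2.1) a sufficient descent direction:

```python
import numpy as np

# H_k = I - g_k g_k^T / ||g_k||^2 from (2.3): a rank-one projector update.
rng = np.random.default_rng(1)
g_k = rng.standard_normal(6)      # current gradient (arbitrary test vector)
g_prev = rng.standard_normal(6)   # previous gradient (arbitrary test vector)

H = np.eye(6) - np.outer(g_k, g_k) / (g_k @ g_k)

print(np.allclose(H, H.T))                      # H_k is symmetric
print(np.all(np.linalg.eigvalsh(H) >= -1e-12))  # eigenvalues are 0 or 1: PSD
print(np.allclose(H @ g_k, 0.0))                # H_k g_k = 0

# Direction (2.1): d_k = -g_k + H_k g_{k-1}
d_k = -g_k + H @ g_prev
print(np.isclose(g_k @ d_k, -(g_k @ g_k)))      # sufficient descent (1.3)
```

Since H_k is the orthogonal projector onto the complement of span{g_k}, its eigenvalues are exactly 0 (once) and 1 (n - 1 times), which is why the positive semidefiniteness check above holds to machine precision.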

Sufficient Descent Method
As stated in the previous section, the directions in (2.1) satisfy a sufficient descent condition. In this section, we list our algorithm and establish its global convergence. To achieve global convergence, we consider the backtracking line search of Grippo and Lucidi (GL) [12]: for given δ ∈ (0, 1), β > 0, and ρ ∈ (0, 1), find the smallest j_k ∈ {1, 2, . . .} such that α_k = βρ^{j_k} satisfies

f(x_k + α_k d_k) ≤ f(x_k) - δ α_k^2 ||d_k||^2. (3.1)

Now, we are ready to state the steps of the Simple Sufficient Descent (SSD) method.
Algorithm 3.1 (SSD). We have the following steps.

Step 1. Given x_0 ∈ R^n, set k := 0.

Step 2. If g_k = 0, then stop.

Step 3. Compute d_k by (2.1); for k = 0, set d_0 = -g_0.

Step 4. Find a steplength α_k satisfying the GL line search (3.1).

Step 5. Set x_{k+1} = x_k + α_k d_k, k := k + 1, and go to Step 2.
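The method can be sketched compactly in a few lines. The following is a minimal Python/NumPy illustration of the SSD iteration with the GL backtracking line search (3.1), using the parameter values reported in the experiments (β = 1, ρ = 0.1, δ = 10^{-4}); the function and variable names are our own, and this is not the authors' Fortran 77 implementation:

```python
import numpy as np

def ssd(f, grad, x0, beta=1.0, rho=0.1, delta=1e-4, tol=1e-5, max_iter=10000):
    """Sketch of the Simple Sufficient Descent (SSD) method,
    Algorithm 3.1, with the Grippo-Lucidi line search (3.1)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    g_prev = None
    for _ in range(max_iter):
        if np.linalg.norm(g, np.inf) <= tol:   # stopping rule (4.1)
            break
        if g_prev is None:
            d = -g                             # first iteration: d_0 = -g_0
        else:
            # d_k = -g_k + (I - g_k g_k^T/||g_k||^2) g_{k-1}, formula (2.1)
            d = -g + g_prev - g * (g @ g_prev) / (g @ g)
        # GL search: first alpha = beta * rho^j (j = 1, 2, ...) with
        # f(x + alpha d) <= f(x) - delta * alpha^2 * ||d||^2
        fx, dd = f(x), d @ d
        alpha = beta * rho
        while f(x + alpha * d) > fx - delta * alpha**2 * dd:
            alpha *= rho
        g_prev = g
        x = x + alpha * d
        g = grad(x)
    return x

# Usage on a simple convex quadratic f(x) = ||x||^2 (our own test problem):
x_star = ssd(lambda x: x @ x, lambda x: 2.0 * x, np.array([3.0, -4.0]))
print(np.linalg.norm(x_star, np.inf) <= 1e-4)  # True: near the minimizer 0
```

Note that, apart from the function and gradient evaluations, each iteration costs only a few inner products; no matrix needs to be stored, which is why the method suits large-scale problems.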
The remainder of this section is devoted to investigating the global convergence of Algorithm 3.1. We first state some assumptions.

Assumption 3.2. The function f is continuously differentiable, and the level set Ω = {x ∈ R^n : f(x) ≤ f(x_0)} is bounded.

Assumption 3.3. The gradient of f is Lipschitz continuous on Ω; that is, there exists a constant L > 0 such that

||g(x) - g(y)|| ≤ L ||x - y|| for all x, y ∈ Ω. (3.2)
We first prove the following lemma.

Lemma 3.4. Let the sequence {x_k} be generated by Algorithm 3.1, and suppose that Assumptions 3.2 and 3.3 hold. Then

Σ_{k=0}^∞ ||g_k||^4 / ||d_k||^2 < ∞. (3.3)

Proof. We have the following cases.

Case 1. If α_k ≠ β, then ρ^{-1}α_k does not satisfy (3.1); that is,

f(x_k + ρ^{-1}α_k d_k) > f(x_k) - δ ρ^{-2} α_k^2 ||d_k||^2. (3.4)

By the mean value theorem and Assumption 3.3, there exists θ_k ∈ (0, 1) such that

f(x_k + ρ^{-1}α_k d_k) - f(x_k) = ρ^{-1}α_k g(x_k + θ_k ρ^{-1}α_k d_k)^T d_k ≤ ρ^{-1}α_k g_k^T d_k + L ρ^{-2} α_k^2 ||d_k||^2, (3.5)

where L > 0 is the Lipschitz constant of g. Substituting the last inequality into (3.4) and using g_k^T d_k = -||g_k||^2, we get

α_k > ρ (L + δ)^{-1} ||g_k||^2 / ||d_k||^2. (3.6)

Together with (3.1), this implies that there is a constant M > 0 such that

f(x_k) - f(x_{k+1}) ≥ M^{-1} ||g_k||^4 / ||d_k||^2. (3.7)

Case 2. If α_k = β, then, since ||g_k||^2 = -g_k^T d_k ≤ ||g_k|| ||d_k|| implies ||g_k|| ≤ ||d_k||, we have from (3.1)

f(x_k) - f(x_{k+1}) ≥ δ β^2 ||d_k||^2 ≥ δ β^2 ||g_k||^4 / ||d_k||^2, (3.8)

which shows that (3.7) holds with M = δ^{-1} β^{-2}. By Assumption 3.2, f is bounded below on the level set, so summing (3.7) over k yields the claim (3.3). The proof is complete.

Theorem 3.5. Let {x_k} be generated by Algorithm 3.1 with the GL backtracking line search (3.1). Then

lim inf_{k→∞} ||g_k|| = 0. (3.9)

Proof. From Assumptions 3.2 and 3.3, we know that there exists a positive constant γ such that

||g_k|| ≤ γ for all k. (3.10)

By the definition of d_k in (2.1), we have

||d_k|| ≤ ||g_k|| + ||H_k|| ||g_{k-1}|| ≤ 2γ, (3.11)

which shows that d_k is bounded. We get from (3.1) that

Σ_{k=0}^∞ δ α_k^2 ||d_k||^2 ≤ f(x_0) - lim_{k→∞} f(x_k) < ∞. (3.12)

Consequently, we have

lim_{k→∞} α_k ||d_k|| = 0. (3.13)

For the sake of contradiction, we suppose that (3.9) does not hold. Then there is a constant ε > 0 such that ||g_k|| ≥ ε for all k ≥ 0. If lim inf_{k→∞} α_k > 0, we get from (3.13) that lim inf_{k→∞} ||d_k|| = 0, which, together with ||d_k|| ≥ ||g_k||, shows lim inf_{k→∞} ||g_k|| = 0. This contradicts the assumption ||g_k|| ≥ ε. Otherwise, there is an infinite index set K such that

lim_{k∈K, k→∞} α_k = 0. (3.14)

It follows from the line search step in Algorithm 3.1 that, when k ∈ K is sufficiently large, ρ^{-1}α_k does not satisfy (3.1). Then, from the first part of the proof of Lemma 3.4, we see that (3.6) holds. Since d_k is bounded and lim_{k∈K, k→∞} α_k = 0, (3.6) implies lim inf_{k→∞} ||g_k|| = 0. This also yields a contradiction. The proof is complete.

Numerical Results
In this section, we test the feasibility and effectiveness of Algorithm SSD. The algorithm is implemented in Fortran 77 using double precision arithmetic. All runs are performed on a PC (Intel Pentium Dual E2140, 1.6 GHz, 256 MB SDRAM) with a Linux operating system. The algorithm stops if the infinity norm of the final gradient is below 10^{-5}, that is,

||∇f(x)||_∞ ≤ 10^{-5}. (4.1)
The iteration is also stopped if the number of iterations exceeds 10000 or the number of function evaluations reaches 20000. Our experiments are performed on a subset of the nonlinear unconstrained problems from the CUTEr collection [13]. The second-order derivatives of all the selected problems are available. Since we are interested in large problems, we refined this selection by considering only problems with at least 50 variables. The parameters in line search (3.1) are taken as β = 1, ρ = 0.1, and δ = 10^{-4}. The numerical results of algorithm SSD are listed in Table 1. In addition, we also present extensive numerical results of the state-of-the-art algorithm CG_DESCENT [8, 14]. CG_DESCENT is a conjugate gradient descent method for solving large-scale unconstrained optimization problems; a new nonmonotone line search used in CG_DESCENT makes this algorithm very efficient. The Fortran code can be obtained from http://www.math.ufl.edu/∼hager/. When running CG_DESCENT, default values were used for all parameters. The results are summarized in Table 2.
Observing the tables, we see that SSD works well: it reached a stationary point, in the sense of the stopping criterion (4.1), for almost all of these test problems. Although SSD requires a larger number of iterations or function evaluations, it is comparatively less time consuming. The numerical comparisons show that the state-of-the-art algorithm CG_DESCENT performs a little better than SSD overall, but for some specific problems the advantage of SSD is still noticeable. The numerical experiments also indicate that SSD can potentially be used to solve unconstrained optimization problems of higher dimensions.

Concluding Remarks
In this paper, motivated by a descent technique, we have proposed a simple sufficient descent method for solving large-scale unconstrained optimization problems. At each iteration, the generated direction depends only on the gradient information at two successive points. Under some mild conditions, we have shown that this method is globally convergent. The numerical results indicate that it works well on some selected problems from the CUTEr library. We think there are several other issues that could lead to improvements. A first point that should be considered is the choice of a nonmonotone line search technique.
We have adopted the monotone scheme (3.1) of Grippo and Lucidi, but this is not the only possible choice, and the frameworks developed in [15, 16] may allow one to achieve better performance. As in [5], another important point that should be further investigated is a modification of the search direction that does not violate the nice sufficient descent property. To this end, it is interesting to note that this method is capable of solving nonlinear systems of equations, as it requires no Jacobian information and is computationally cheap.

In the tables, the column Norm-2 gives the l_2-norm of the final gradient, and the column Norm-0 gives the ∞-norm of the final gradient.

Table 1 :
Test results for SSD method.

Table 2 :
Test results for CG DESCENT.