Global Convergence of a Nonlinear Conjugate Gradient Method

A modified PRP nonlinear conjugate gradient method for solving unconstrained optimization problems is proposed. An important property of the proposed method is that the sufficient descent property is guaranteed independently of any line search. Using the Wolfe line search, the global convergence of the proposed method is established for nonconvex minimization. Numerical results show that the proposed method is effective and promising in comparison with the VPRP, CG-DESCENT, and DL methods.


Introduction
The nonlinear conjugate gradient method is one of the most efficient methods for solving unconstrained optimization problems. It comprises a class of unconstrained optimization algorithms characterized by low memory requirements and simplicity.
Consider the unconstrained optimization problem

$$\min_{x \in \mathbb{R}^n} f(x), \tag{1.1}$$

where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable, and its gradient $g$ is available.
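The iterates of the conjugate gradient method for solving (1.1) are given by

$$x_{k+1} = x_k + \alpha_k d_k, \tag{1.2}$$

where the stepsize $\alpha_k$ is positive and computed by a certain line search, and the search direction $d_k$ is defined by

$$d_k = \begin{cases} -g_k, & \text{at the initial iterate}, \\ -g_k + \beta_k d_{k-1}, & \text{otherwise}, \end{cases} \tag{1.3}$$

where $g_k = \nabla f(x_k)$ and $\beta_k$ is a scalar. Some well-known conjugate gradient methods include the Polak-Ribière-Polyak (PRP) method [1, 2], the Hestenes-Stiefel (HS) method [3], the Hager-Zhang (HZ) method [4], and the Dai-Liao (DL) method [5]. The parameters $\beta_k$ of these methods are specified as follows:

$$\beta_k^{PRP} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}, \qquad \beta_k^{HS} = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}},$$
$$\beta_k^{HZ} = \left( y_{k-1} - 2 d_{k-1} \frac{\|y_{k-1}\|^2}{d_{k-1}^T y_{k-1}} \right)^T \frac{g_k}{d_{k-1}^T y_{k-1}}, \qquad \beta_k^{DL} = \frac{g_k^T (y_{k-1} - t s_{k-1})}{d_{k-1}^T y_{k-1}} \quad (t \ge 0), \tag{1.4}$$

where $\|\cdot\|$ is the Euclidean norm, $y_{k-1} = g_k - g_{k-1}$, and $s_{k-1} = x_k - x_{k-1}$. If $f$ is a strictly convex quadratic function, the above methods are equivalent when an exact line search is used. If $f$ is nonconvex, their behaviors may differ considerably.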
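The four parameters above can be compared numerically. The following is a minimal sketch in plain Python, assuming the standard forms of the PRP, HS, HZ, and DL parameters; the helper names `dot` and `cg_betas` and the small quadratic test function are illustrative choices, not part of the original paper. It takes one exact line search step on a strictly convex quadratic and evaluates all four parameters, which should then agree up to rounding.

```python
def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def cg_betas(g_new, g_old, d_old, s_old, t=0.1):
    """Return the PRP, HS, HZ and DL parameters (standard forms) as a dict."""
    y = [a - b for a, b in zip(g_new, g_old)]          # y_{k-1} = g_k - g_{k-1}
    dy = dot(d_old, y)                                 # d_{k-1}^T y_{k-1}
    prp = dot(g_new, y) / dot(g_old, g_old)
    hs = dot(g_new, y) / dy
    w = [yi - 2.0 * di * dot(y, y) / dy for yi, di in zip(y, d_old)]
    hz = dot(w, g_new) / dy
    dl = dot(g_new, [yi - t * si for yi, si in zip(y, s_old)]) / dy
    return {"PRP": prp, "HS": hs, "HZ": hz, "DL": dl}


# Illustrative test function (an assumption): f(x) = 0.5 * (x0^2 + 10 * x1^2).
x0 = [1.0, 1.0]
g0 = [x0[0], 10.0 * x0[1]]
d0 = [-g0[0], -g0[1]]                                  # steepest descent start
# Exact minimizer of f along d0: alpha = -g0^T d0 / (d0^T A d0).
alpha = dot(g0, g0) / (d0[0] ** 2 + 10.0 * d0[1] ** 2)
s0 = [alpha * di for di in d0]
x1 = [xi + si for xi, si in zip(x0, s0)]
g1 = [x1[0], 10.0 * x1[1]]

betas = cg_betas(g1, g0, d0, s0)
for name, value in betas.items():
    print(name, value)
```

With an inexact line search or a nonquadratic objective, the four values generally differ, which is why the choice of $\beta_k$ matters in practice.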
In the past few years, the PRP method has been regarded as the most efficient conjugate gradient method in practical computation. One remarkable property of the PRP method is that it essentially performs a restart if a bad direction occurs (see [6]). Powell [7] constructed an example showing that the PRP method can cycle infinitely without approaching any stationary point, even if an exact line search is used. This counterexample also indicates that the PRP method may fail to be globally convergent when the objective function is nonconvex. Powell [8] suggested that the parameter $\beta_k$ should not be negative in the PRP method and defined $\beta_k$ as

$$\beta_k = \max\{\beta_k^{PRP}, 0\}. \tag{1.5}$$

Gilbert and Nocedal [9] considered Powell's suggestion and proved the global convergence of the modified PRP method for nonconvex functions under an appropriate line search. In addition, there is a large body of research on the convergence properties of the PRP method (see [10-12]).
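The restart behavior and Powell's nonnegativity safeguard can be seen on two small hand-picked gradient pairs. This is a sketch only: the function names `beta_prp` and `beta_prp_plus` and the sample vectors are assumptions made for illustration.

```python
def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def beta_prp(g_new, g_old):
    """Standard PRP parameter g_k^T (g_k - g_{k-1}) / ||g_{k-1}||^2."""
    y = [a - b for a, b in zip(g_new, g_old)]
    return dot(g_new, y) / dot(g_old, g_old)


def beta_prp_plus(g_new, g_old):
    """Powell's safeguard: truncate negative PRP values to zero."""
    return max(beta_prp(g_new, g_old), 0.0)


# Case 1: a tiny step barely changes the gradient, so beta is close to zero
# and the next direction is close to -g_k (an automatic restart).
beta_small = beta_prp([1.0 + 1e-8, -2.0], [1.0, -2.0])

# Case 2: the gradient shrinks along the same direction, so PRP is negative
# and the truncated variant falls back to steepest descent.
beta_negative = beta_prp([0.5, 0.0], [1.0, 0.0])
beta_truncated = beta_prp_plus([0.5, 0.0], [1.0, 0.0])

print(beta_small, beta_negative, beta_truncated)
```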
In recent years, much effort has been devoted to creating new methods that not only possess global convergence properties for general functions but are also superior to the original methods from a computational point of view. For example, Yu et al. [13] proposed a new nonlinear conjugate gradient method in which the parameter $\beta_k$ is defined on the basis of $\beta_k^{PRP}$ by (1.6), where $\nu > 1$ (in this paper, we call this method the VPRP method). They proved the global convergence of the VPRP method with the Wolfe line search.

Hager and Zhang [4] discussed the global convergence of the HZ method for strongly convex functions under the Wolfe line search and the Goldstein line search. In order to prove global convergence for general functions, Hager and Zhang modified the parameter $\beta_k^{HZ}$ as

$$\bar{\beta}_k^{HZ} = \max\{\beta_k^{HZ}, \eta_k\}, \tag{1.7}$$

where

$$\eta_k = \frac{-1}{\|d_{k-1}\| \min\{\eta, \|g_{k-1}\|\}}, \quad \eta > 0. \tag{1.8}$$

The method corresponding to (1.7) is the well-known CG-DESCENT method.

Dai and Liao [5] proposed a new conjugacy condition, that is,

$$d_k^T y_{k-1} = -t\, g_k^T s_{k-1}, \quad t \ge 0, \tag{1.9}$$

where $s_{k-1} = x_k - x_{k-1}$. Under the new conjugacy condition, they proved the global convergence of the DL conjugate gradient method for uniformly convex functions. Following Powell's suggestion, Dai and Liao gave the modified parameter

$$\beta_k = \max\left\{ \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, 0 \right\} - t\, \frac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}. \tag{1.10}$$

The method corresponding to (1.10) is the well-known DL method. Under the strong Wolfe line search, they studied the global convergence of the DL method for general functions.

This paper is organized as follows. In the next section, we propose a modified PRP method and prove its sufficient descent property. In Section 3, the global convergence of the method with the Wolfe line search is given. In Section 4, numerical results are reported. Conclusions are given in the last section.
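The Dai-Liao conjugacy condition discussed in this section can be checked numerically. The sketch below assumes the standard untruncated DL parameter $\beta_k = g_k^T (y_{k-1} - t s_{k-1}) / (d_{k-1}^T y_{k-1})$ and uses arbitrary sample vectors; the function name `dl_direction` is hypothetical. By construction, the resulting direction satisfies $d_k^T y_{k-1} = -t\, g_k^T s_{k-1}$ regardless of how the step was chosen.

```python
def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def dl_direction(g_new, g_old, d_old, s_old, t):
    """Return (d_k, y_{k-1}) for the untruncated Dai-Liao parameter."""
    y = [a - b for a, b in zip(g_new, g_old)]
    dy = dot(d_old, y)                       # assumed nonzero
    beta = dot(g_new, [yi - t * si for yi, si in zip(y, s_old)]) / dy
    d_new = [-gi + beta * di for gi, di in zip(g_new, d_old)]
    return d_new, y


# Arbitrary sample data (not taken from the paper).
g_old = [1.0, -2.0, 0.5]
g_new = [0.3, 0.4, -1.0]
d_old = [-1.0, 2.0, -0.5]
s_old = [0.1 * di for di in d_old]           # s_{k-1} = alpha_{k-1} d_{k-1}
t = 0.1

d_new, y = dl_direction(g_new, g_old, d_old, s_old, t)
lhs = dot(d_new, y)                          # d_k^T y_{k-1}
rhs = -t * dot(g_new, s_old)                 # -t g_k^T s_{k-1}
print(lhs, rhs)
```

Setting `t = 0` recovers the HS parameter and the classical conjugacy condition $d_k^T y_{k-1} = 0$.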

Modified PRP Method
In this section, we propose a modified PRP conjugate gradient method in which the parameter $\beta_k$ is defined on the basis of $\beta_k^{PRP}$ by (2.1), in which $m \in (0, 1)$. We introduce the modified PRP method as follows.
Step 5. Set $k := k + 1$, and go to Step 2.
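In the convergence analyses and implementations of conjugate gradient methods, one often requires the inexact line search to satisfy the Wolfe conditions or the strong Wolfe conditions. The Wolfe line search is to find $\alpha_k$ such that

$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k, \tag{2.2}$$

$$g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k, \tag{2.3}$$

where $0 < \delta < \sigma < 1$. The strong Wolfe line search consists of (2.2) and the following strengthened version of (2.3):

$$|g(x_k + \alpha_k d_k)^T d_k| \le -\sigma g_k^T d_k. \tag{2.4}$$

Moreover, most references rely on the sufficient descent condition

$$g_k^T d_k \le -c \|g_k\|^2, \quad c > 0. \tag{2.5}$$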

(2.8)
Secondly, if $g_k^T d_{k-1} > 0$, then from (2.7), we also have (2.9). From the above, the conclusion (2.6) holds under any line search.
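The Wolfe conditions, the strong Wolfe conditions, and the sufficient descent condition used in this section can be tested for a given step and direction as follows. This is a minimal sketch assuming the standard forms of these conditions; the function names, the quadratic test function, and the trial stepsizes are illustrative assumptions.

```python
def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def satisfies_wolfe(f, grad, x, d, alpha, delta=0.01, sigma=0.1, strong=False):
    """Check the (strong) Wolfe conditions for stepsize alpha along d."""
    g0d = dot(grad(x), d)                              # g_k^T d_k
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    decrease_ok = f(x_new) <= f(x) + delta * alpha * g0d
    g1d = dot(grad(x_new), d)                          # g(x_k + alpha d_k)^T d_k
    if strong:
        curvature_ok = abs(g1d) <= -sigma * g0d
    else:
        curvature_ok = g1d >= sigma * g0d
    return decrease_ok and curvature_ok


def sufficient_descent(g, d, c):
    """Check g^T d <= -c ||g||^2 for a given constant c > 0."""
    return dot(g, d) <= -c * dot(g, g)


# Illustrative test function (an assumption): f(x) = 0.5 * (x0^2 + 10 * x1^2).
def f(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad(x):
    return [x[0], 10.0 * x[1]]


x = [1.0, 1.0]
d = [-gi for gi in grad(x)]                            # steepest descent direction

near_exact = satisfies_wolfe(f, grad, x, d, 0.1, strong=True)
long_weak = satisfies_wolfe(f, grad, x, d, 0.19)
long_strong = satisfies_wolfe(f, grad, x, d, 0.19, strong=True)
too_short = satisfies_wolfe(f, grad, x, d, 1e-3)
descent = sufficient_descent(grad(x), d, 0.5)
print(near_exact, long_weak, long_strong, too_short, descent)
```

The stepsize 0.19 overshoots the one-dimensional minimizer: it still passes the Wolfe conditions but fails the strong version, which bounds the directional derivative from both sides.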

Global Convergence of the Modified PRP Method
In order to prove the global convergence of the modified PRP method, we assume that the objective function $f(x)$ satisfies the following assumption.
Assumption H. (ii) In a neighborhood $V$ of $\Omega$, $f$ is continuously differentiable and its gradient $g$ is Lipschitz continuous; namely, there exists a constant $L > 0$ such that

$$\|g(x) - g(y)\| \le L \|x - y\|, \quad \forall x, y \in V. \tag{3.1}$$

Under these assumptions on $f$, there exists a constant $\gamma > 0$ such that

$$\|g(x)\| \le \gamma, \quad \forall x \in V. \tag{3.2}$$

The conclusion of the following lemma, often called the Zoutendijk condition, is used to prove the global convergence properties of nonlinear conjugate gradient methods. It was originally given by Zoutendijk
where
Proof. From (2.1) and (3.4), we get
By (2.6) and (3.6), we know that $d_k \neq 0$ for each $k$. Define the quantities

(3.8)
Since $u_k$ is a unit vector, we get (3.9). From $\delta_k \ge 0$ and the above equation, one has

(3.13)
By (3.10) and the above inequality, one has
Proof. From Assumption H(ii), we know that (3.2) holds. By (2.1), (3.2), and (3.4), one has
Proof. To obtain this result, we proceed by contradiction. Suppose that (3.18) does not hold, which means that there exists $r > 0$ such that (3.19) holds, so we know that Lemmas 3.2 and 3.4 hold. We also define $u_k = d_k / \|d_k\|$; then for all $l, k \in Z$ ($l \ge k$), one has

(3.21)
From Assumption H, we know that there exists a constant $\xi > 0$ such that $\|x\| \le \xi$ for $x \in V$.

(3.22)
From (3.21) and the above inequality, one has

(3.23)
Let $\Delta$ be a positive integer with $\Delta \in [8\xi/\lambda, 8\xi/\lambda + 1)$, where $\lambda$ has been defined in Lemma 3.4. From Lemma 3.2, we know that there exists $k_0$ such that (3.24) holds. From the Cauchy-Schwarz inequality and (3.24), for all $i \in [k, k + \Delta - 1]$, one has

(3.25)
By Lemma 3.4, we know that there exists $k \ge k_0$ such that

Numerical Results
In this section, we compare the modified PRP conjugate gradient method, denoted the MPRP method, with the VPRP, CG-DESCENT, and DL methods under the strong Wolfe line search on the test problems in [20] with the given initial points and dimensions. The parameters are chosen as follows: $\delta = 0.01$, $\sigma = 0.1$, $\nu = 1.25$, $\eta = 0.01$, and $t = 0.1$. The program stops when $\|g_k\| \le 10^{-6}$ is satisfied, or when the number of iterations exceeds ten thousand. All codes were written in Matlab 7.0 and run on a PC with a 2.0 GHz CPU, 512 MB of memory, and the Windows XP operating system.
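The experimental protocol above (line search parameters, gradient tolerance, iteration cap) can be mirrored in a small driver. The following plain-Python sketch is not the authors' Matlab code and makes several assumptions that are stated loudly here: it uses the PRP+ parameter (1.5) purely for illustration, a simple bisection search for the (weak) Wolfe conditions in place of the strong Wolfe search used in the experiments, a steepest descent restart whenever the new direction is not a descent direction, and a toy quadratic test function. All function names are hypothetical.

```python
from math import sqrt


def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def step(x, alpha, d):
    """Return x + alpha * d."""
    return [xi + alpha * di for xi, di in zip(x, d)]


def wolfe_step(f, grad, x, d, delta=0.01, sigma=0.1, max_trials=60):
    """Bisection/doubling search for a stepsize meeting the weak Wolfe conditions."""
    fx, g0d = f(x), dot(grad(x), d)
    lo, hi, alpha = 0.0, float("inf"), 1.0
    for _ in range(max_trials):
        x_new = step(x, alpha, d)
        if f(x_new) > fx + delta * alpha * g0d:        # sufficient decrease fails
            hi = alpha
        elif dot(grad(x_new), d) < sigma * g0d:        # curvature condition fails
            lo = alpha
        else:
            break
        alpha = 2.0 * lo if hi == float("inf") else 0.5 * (lo + hi)
    return alpha


def cg_minimize(f, grad, x0, tol=1e-6, max_iter=10000):
    """Nonlinear CG with the PRP+ parameter; returns (x, iterations used)."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]
    for k in range(max_iter):
        if sqrt(dot(g, g)) <= tol:                     # stopping test ||g_k|| <= tol
            return x, k
        x = step(x, wolfe_step(f, grad, x, d), d)
        g_new = grad(x)
        y = [a - b for a, b in zip(g_new, g)]
        beta = max(dot(g_new, y) / dot(g, g), 0.0)     # PRP+ stand-in, see lead-in
        d = [-gi + beta * di for gi, di in zip(g_new, d)]
        if dot(g_new, d) >= 0.0:                       # assumed safeguard: restart
            d = [-gi for gi in g_new]
        g = g_new
    return x, max_iter                                 # iteration limit reached


# Toy problem (an assumption): f(x) = 0.5 * (x0^2 + 10 * x1^2).
def f(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad(x):
    return [x[0], 10.0 * x[1]]


solution, iterations = cg_minimize(f, grad, [1.0, 1.0])
final_gradient_norm = sqrt(dot(grad(solution), grad(solution)))
print(solution, iterations, final_gradient_norm)
```

Swapping in a different $\beta_k$ only changes the single line that computes `beta`, which is what makes comparisons between conjugate gradient variants under a common line search straightforward.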
The numerical results of our tests with respect to the MPRP, VPRP, CG-DESCENT, and DL methods are reported in Tables 1, 2, 3, and 4, respectively. In the tables, the column "Problem" gives the problem's name in [20], and "CPU," "NI," "NF," and "NG" denote the CPU time in seconds, the number of iterations, the number of function evaluations, and the number of gradient evaluations, respectively. "Dim" denotes the dimension of the tested problem. If the iteration limit was exceeded, the run was stopped, and this is indicated by NaN.
In this paper, we adopt the performance profiles of Dolan and Moré [21] to compare the MPRP method with the VPRP, CG-DESCENT, and DL methods in terms of CPU time, the number of iterations, the number of function evaluations, and the number of gradient evaluations (see Figures 1-4). Figures 1-4 show the performance of the four methods relative to CPU time, the number of iterations, the number of function evaluations, and the number of gradient evaluations, respectively. For example, the performance profile with respect to CPU time means that, for each method, we plot the fraction $P$ of problems for which the method is within a factor $\tau$ of the best time. The left side of the figure gives the percentage of the test problems for which a method is the fastest; the right side gives the percentage of the test problems that are successfully solved by each of the methods. The top curve corresponds to the method that solved the most problems in a time that was within a factor $\tau$ of the best time.
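The performance profile described above can be computed in a few lines. The sketch below follows the Dolan-Moré definition: for each problem, every solver's cost is divided by the best cost on that problem, and the profile value at $\tau$ is the fraction of problems whose ratio is at most $\tau$. The solver labels "A" and "B" and the cost values are made-up illustrative numbers, not results from this paper; failures are encoded as infinity, and costs are assumed to be positive.

```python
def performance_profile(costs, taus):
    """Return {solver: [fraction of problems with ratio <= tau, for each tau]}.

    costs maps each solver name to a list of positive costs, one per problem;
    float('inf') marks a failed run.
    """
    solvers = list(costs)
    n_problems = len(costs[solvers[0]])
    # Best (smallest) cost achieved by any solver on each problem.
    best = [min(costs[s][p] for s in solvers) for p in range(n_problems)]
    profile = {}
    for s in solvers:
        ratios = [costs[s][p] / best[p] for p in range(n_problems)]
        profile[s] = [sum(r <= tau for r in ratios) / n_problems for tau in taus]
    return profile


# Made-up CPU times for two hypothetical solvers on four problems.
INF = float("inf")
costs = {
    "A": [1.0, 2.0, INF, 4.0],   # solver A fails on the third problem
    "B": [2.0, 1.0, 3.0, 4.0],
}
profile = performance_profile(costs, taus=[1.0, 2.0, 4.0])
print(profile)
```

The value at $\tau = 1$ is the share of problems on which a solver is (jointly) fastest, and the value for large $\tau$ approaches the share of problems it solves at all, matching the reading of the left and right sides of the figures.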
Figure 1 shows that the MPRP method outperforms the VPRP, CG-DESCENT, and DL methods on the given test problems in terms of CPU time. Figures 2-4 show that the MPRP method also has the best performance with respect to the number of iterations, function evaluations, and gradient evaluations, since it corresponds to the top curve. Thus, the MPRP method is computationally efficient.

Conclusions
We have proposed a modified PRP method on the basis of the PRP method, which generates sufficient descent directions with an inexact line search. Moreover, we proved that the proposed method converges globally for general nonconvex functions. The performance profiles showed that the proposed method is also very efficient.

Figure 1: Performance profiles of the conjugate gradient methods with respect to CPU time.
Figure 2: Performance profiles with respect to the number of iterations.

Figure 3: Performance profiles with respect to the number of function evaluations.

Zhang et al. [14] proposed a modified DL conjugate gradient method and proved its global convergence. Moreover, some researchers have been studying a new type of method called the spectral conjugate gradient method (see [15-17]).
[18]. Suppose that Assumption H holds. Consider any iteration of (1.2)-(1.3), where $d_k$ satisfies $g_k^T d_k < 0$ for $k \in N$ and $\alpha_k$ satisfies the Wolfe line search; then

$$\sum_{k \in N} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < +\infty.$$

Lemma 3.2. Suppose that Assumption H holds. Consider the method (1.2)-(1.3), where $\beta_k = \beta_k^{MPRP}$, and $\alpha_k$ satisfies the Wolfe line search and (2.6). If there exists a constant $r > 0$ such that

denotes the number of elements of $R^{\lambda}_{k,\Delta}$.
Suppose that Assumption H holds. Let $\{x_k\}$ and $\{d_k\}$ be generated by (1.2)-(1.3), in which $\alpha_k$ satisfies the Wolfe line search and (2.6), $\beta_k = \beta_k^{MPRP}$

Table 1: The numerical results of the modified PRP method.

Table 2: The numerical results of the VPRP method.

Figure 4: Performance profiles with respect to the number of gradient evaluations.

Table 3: The numerical results of the CG-DESCENT method.

Table 4: The numerical results of the DL method.