A Conjugate Gradient Algorithm under Yuan-Wei-Lu Line Search Technique for Large-Scale Minimization Optimization Models

This paper presents a modified Hestenes-Stiefel (HS) conjugate gradient algorithm under the Yuan-Wei-Lu inexact line search technique for large-scale unconstrained optimization problems. The proposed algorithm has the following properties: (1) the new search direction possesses not only a sufficient descent property but also a trust region feature; (2) the algorithm is globally convergent for nonconvex functions; (3) numerical experiments show that the new algorithm is more effective than similar algorithms.


Introduction
Consider the minimization model

$$\min \{ f(x) \mid x \in \mathbb{R}^n \}, \tag{1}$$

where the function $f : \mathbb{R}^n \to \mathbb{R}$ and $f \in C^2$. There exist many good algorithms for (1), such as the quasi-Newton methods [1] and the conjugate gradient methods [2][3][4][5]. The iterative formula of the conjugate gradient algorithm for (1) is

$$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, 2, \ldots, \tag{2}$$

where $x_k$ is the $k$th iterative point, $\alpha_k$ is the steplength, and $d_k$ is the so-called conjugate gradient search direction with

$$d_0 = -g_0, \qquad d_{k+1} = -g_{k+1} + \beta_k d_k, \quad k \ge 0, \tag{3}$$

where $\beta_k$ is a scalar determined by the particular conjugate gradient formula. The HS method [3] is one of the most well-known conjugate gradient methods; it uses

$$\beta_k^{HS} = \frac{g_{k+1}^T y_k}{d_k^T y_k}, \tag{4}$$

where $y_k = g_{k+1} - g_k$, $g_{k+1} = \nabla f(x_{k+1})$, and $g_k = \nabla f(x_k)$. The HS method has good numerical results for (1); however, its convergence theory is unsatisfactory, especially for nonconvex functions. At present, there exist many good conjugate gradient methods (see [6][7][8], etc.).

Yuan, Wei, and Lu [9] gave a modified weak Wolfe-Powell (YWL) line search technique, which determines $\alpha_k$ such that

$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k + \alpha_k \min\left[ -\delta_1 g_k^T d_k, \ \delta \frac{\alpha_k}{2} \|d_k\|^2 \right] \tag{5}$$

and

$$g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k + \min\left[ -\delta_1 g_k^T d_k, \ \delta \alpha_k \|d_k\|^2 \right], \tag{6}$$

where $\delta \in (0, 1/2)$, $\delta_1 \in (0, \delta)$, $\sigma \in (\delta, 1)$, and $\|\cdot\|$ denotes the Euclidean norm. It is well known that there exist two open problems: the global convergence of the normal BFGS method and the global convergence of the PRP method for nonconvex functions under inexact line search techniques. The first problem is regarded as one of the most difficult one thousand mathematical problems of the 20th century [10]. Yuan et al. [9] partly solved these two open problems under the YWL technique, and their numerical results show that the YWL technique is more competitive than the normal weak Wolfe-Powell technique. Further study can be found in their paper [11]. By (5), it is not difficult to see that the YWL conditions are equivalent to the weak Wolfe-Powell (WWP) conditions if $-\delta_1 g_k^T d_k < \delta (\alpha_k / 2) \|d_k\|^2$ holds, which implies that the YWL technique includes the WWP technique in some sense.

Motivated by the above observations, we make a further study and propose a new algorithm for (1). The main features of this paper are as follows:
(i) A modified HS conjugate gradient formula is given, which has not only a sufficient descent property but also a trust region feature.
(ii) The global convergence of the given HS conjugate gradient algorithm for nonconvex functions is established.
(iii) Numerical results show that the new HS conjugate gradient algorithm performs better under the YWL line search technique than under the normal weak Wolfe-Powell technique.
This paper is organized as follows. In Section 2, a modified HS conjugate gradient algorithm is introduced. The global convergence of the given algorithm for nonconvex functions is established in Section 3, and numerical results are reported in Section 4.
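As an illustration of how the two YWL conditions (5) and (6) can be tested for a trial steplength, the following Python sketch evaluates both inequalities in the form given in [9]. The function name `ywl_conditions_hold` and the default values `delta=0.2`, `delta1=0.1`, `sigma=0.8` are assumptions chosen only so that the three parameters lie in the required intervals; they are not taken from the paper.

```python
import numpy as np

def ywl_conditions_hold(f, grad, x, d, alpha,
                        delta=0.2, delta1=0.1, sigma=0.8):
    """Check a trial steplength against the YWL conditions (illustrative sketch).

    (5): f(x + a d) <= f(x) + delta a g^T d + a min[-delta1 g^T d, delta (a/2) ||d||^2]
    (6): g(x + a d)^T d >= sigma g^T d + min[-delta1 g^T d, delta a ||d||^2]
    Parameter defaults are assumed values, not the paper's settings.
    """
    g = grad(x)
    gd = g @ d          # directional derivative g_k^T d_k
    dd = d @ d          # ||d_k||^2
    x_new = x + alpha * d

    # Condition (5): sufficient decrease with the additional min[...] term.
    decrease_ok = f(x_new) <= (f(x) + delta * alpha * gd
                               + alpha * min(-delta1 * gd, delta * alpha / 2 * dd))
    # Condition (6): curvature condition with the additional min[...] term.
    curvature_ok = grad(x_new) @ d >= sigma * gd + min(-delta1 * gd, delta * alpha * dd)
    return bool(decrease_ok and curvature_ok)

# Toy example (assumed test function): f(x) = 0.5 ||x||^2 along d = -g.
f = lambda x: 0.5 * float(x @ x)
grad = lambda x: x.copy()
x = np.array([1.0, 0.0])
d = -grad(x)

unit_step = ywl_conditions_hold(f, grad, x, d, alpha=1.0)    # True
tiny_step = ywl_conditions_hold(f, grad, x, d, alpha=1e-3)   # False: (6) fails
long_step = ywl_conditions_hold(f, grad, x, d, alpha=2.0)    # False: (5) fails
```

On this toy quadratic the unit step passes both tests, a very short step is rejected by the curvature condition, and an overlong step is rejected by the decrease condition, which is the usual role of the two inequalities in a Wolfe-type line search.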

Motivation and Algorithm
The nonlinear conjugate gradient algorithm is simple, has low memory requirements, and is very effective for large-scale optimization problems; the HS method is one of the most effective of these methods. However, although the normal HS method has good numerical performance, it may fail to converge for nonconvex functions under inexact line search techniques. In order to overcome this shortcoming, a modified HS formula, referred to as (7), is defined; it contains three positive constants. This formula is inspired by the ideas of the two papers [6, 8]. In recent years, many scholars have studied three-term conjugate gradient formulas because of their good properties [7]. In the next section, we will prove that the new formula possesses not only a sufficient descent property but also a trust region feature. The sufficient descent property is good for convergence, and the trust region feature makes the convergence easy to prove. The steps of the proposed method are referred to as Algorithm 1.
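The general shape of a nonlinear conjugate gradient iteration of this kind can be sketched in Python as follows. This is a minimal illustration and not Algorithm 1 itself: the direction uses the classical HS coefficient rather than the modified formula (7), the line search is plain Armijo backtracking rather than the YWL technique, and the steepest descent restart is an added safeguard. The names `hs_beta`, `backtracking`, and `cg_minimize`, the tolerance `tol`, and the stopping test on the gradient norm are assumptions; the caps of 6 line search trials and 800 iterations follow the experimental notes in Section 4.

```python
import numpy as np

def hs_beta(g_new, g_old, d_old):
    """Classical HS coefficient g_{k+1}^T y_k / (d_k^T y_k), with y_k = g_{k+1} - g_k."""
    y = g_new - g_old
    denom = d_old @ y
    # Fall back to beta = 0 (steepest descent) when the denominator is negligible.
    return (g_new @ y) / denom if abs(denom) > 1e-12 else 0.0

def backtracking(f, x, d, g, delta=0.2, max_trials=6):
    """Armijo backtracking (stand-in for the YWL search).

    After max_trials failed trials the halved step is accepted without a further test.
    """
    alpha, fx, gd = 1.0, f(x), g @ d
    for _ in range(max_trials):
        if f(x + alpha * d) <= fx + delta * alpha * gd:
            break
        alpha *= 0.5
    return alpha

def cg_minimize(f, grad, x0, tol=1e-6, max_iter=800):
    """Generic nonlinear CG loop; returns the final point and the iteration count."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                  # first direction: steepest descent
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:        # assumed stopping test
            return x, k
        alpha = backtracking(f, x, d, g)
        x = x + alpha * d                   # iterate update of the form x + alpha * d
        g_new = grad(x)
        d = -g_new + hs_beta(g_new, g, d) * d
        if g_new @ d >= 0:                  # safeguard: restart if d is not a descent direction
            d = -g_new
        g = g_new
    return x, max_iter
```

A direction rule with a built-in sufficient descent property, such as the one this paper proposes, would make the restart test unnecessary, which is one practical reason such properties are of interest.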

Sufficient Descent Property, Trust Region Feature, and Global Convergence
This section will prove some properties of Algorithm 1.
Lemma 2. Let the search direction $d_k$ be defined by (7). Then the sufficient descent relation (8) and the trust region relation (9) hold, where the starred constant in (9) is positive.
Relation (8) shows that the new formula has a sufficient descent property, and inequality (9) shows that the new formula possesses a trust region feature. Both properties are desirable theoretical features, and they play an important role in the global convergence of a conjugate gradient algorithm, as the following convergence analysis shows.
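To make the two properties concrete, the following Python sketch checks them numerically for a simple three-term direction. The formula used here is a hypothetical stand-in chosen for illustration and is not claimed to be formula (7): it takes $d_{k+1} = -g_{k+1} + \big((g_{k+1}^T y_k) d_k - (d_k^T g_{k+1}) y_k\big) / D_k$ with $D_k = \max\{c_1 \|d_k\| \|y_k\|, |d_k^T y_k|\}$, where the constant `c1` and the function name `three_term_direction` are assumptions. For any direction of this shape, the two correction terms cancel in $g_{k+1}^T d_{k+1}$, so $g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2$, and the lower bound on $D_k$ gives $\|d_{k+1}\| \le (1 + 2/c_1) \|g_{k+1}\|$.

```python
import numpy as np

def three_term_direction(g_new, g_old, d_old, c1=0.5):
    """Hypothetical three-term HS-type direction (illustration only, not formula (7))."""
    y = g_new - g_old
    denom = max(c1 * np.linalg.norm(d_old) * np.linalg.norm(y), abs(d_old @ y))
    if denom == 0.0:
        # Degenerate case (d_old or y is zero): use steepest descent.
        return -g_new
    return -g_new + ((g_new @ y) * d_old - (d_old @ g_new) * y) / denom

# Random vectors stand in for gradients and the previous direction.
rng = np.random.default_rng(0)
g_old, g_new, d_old = rng.standard_normal((3, 5))
c1 = 0.5
d_new = three_term_direction(g_new, g_old, d_old, c1)

# Sufficient descent: g^T d + ||g||^2 vanishes up to rounding error.
descent_gap = g_new @ d_new + g_new @ g_new
# Trust region feature: ||d|| is bounded by a constant multiple of ||g||.
bound = (1.0 + 2.0 / c1) * np.linalg.norm(g_new)
within_bound = np.linalg.norm(d_new) <= bound
```

The check holds for arbitrary inputs, without any line search condition, which is what distinguishes a built-in descent property from one that depends on the steplength.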
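The following general assumptions are needed.
Assumption A. The objective function $f(x)$ is bounded below and twice continuously differentiable, and its gradient $g(x) = \nabla f(x)$ is Lipschitz continuous; namely, the inequality

$$\|g(x) - g(y)\| \le L \|x - y\|, \quad \forall x, y \in \mathbb{R}^n, \tag{11}$$

is true, where $L > 0$ is the Lipschitz constant.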
By Lemma 2 and Assumption A, and similarly to [9], it is not difficult to show that the YWL line search technique is well posed and that Algorithm 1 is well defined; the details are omitted here. We now prove the global convergence of Algorithm 1 for nonconvex functions, that is, relation (12).
Proof. By (5), (8), and (9), we obtain inequality (13). Summing these inequalities for $k = 0$ to $\infty$ and using Assumption A (ii) yields (14). Inequality (14) implies that the limit (15) holds. By (6) and (8) again, we get (16). Thus, inequality (17) holds, where the first inequality follows from (8) and the last inequality follows from (11). Then we have (18). By (15) and (18), we obtain the limit (19). Therefore, we get (12), and the proof is complete.

Numerical Results Performance
This section reports numerical results for Algorithm 1 and compares it with two similar algorithms, referred to as Algorithm 2 and Algorithm 3.

Problems and Experiment.
The following are some notes.
Test Problems. The problems and their initial points are listed in Table 1; the detailed problems can be found in Andrei [12], and other papers also use these problems [13].
Experiments. The codes are written in MATLAB R2009a and run on an Intel(R) Xeon(R) E5507 CPU @ 2.27 GHz with 6.00 GB of memory under the Windows 7 operating system.
Dimension. Large-scale dimensions $n$ = 3000, 6000, 12000, and 30000 are used.
Other Cases. The line search technique accepts $\alpha_k$ if the number of line search trials exceeds 6, and the algorithm stops if the total number of iterations exceeds 800.
The numerical results are listed in Table 2, where "Number" is the index of the test problem, "Dim." is the dimension of the problem, "NI" is the total number of iterations, "CPU" is the system CPU time in seconds, and "NFG" is the total number of function and gradient evaluations.

Results and Discussion. We use the performance profile tool of Dolan and Moré [14] to analyze the efficiency of the three given algorithms. Figures 1 and 2 show that Algorithm 1 performs best and is the most robust of the three methods, and that Algorithm 2 is better than Algorithm 3. This indicates that the given formula (7) is competitive with the normal three-term conjugate gradient formula (20) and that the YWL line search technique is more effective than the normal WWP technique; these conclusions agree with the results of [9]. In Figure 3, Algorithm 1 remains competitive with the other two algorithms and is the most robust, although its advantage in CPU time is less pronounced. We think the reason is that formula (7) or the YWL technique needs more information and hence more CPU time.
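For readers unfamiliar with the Dolan and Moré tool, a performance profile can be computed as in the Python sketch below. For each problem $p$ and solver $s$ with cost $t_{p,s}$ (for example NI, NFG, or CPU time), the performance ratio is $r_{p,s} = t_{p,s} / \min_s t_{p,s}$, and the profile $\rho_s(\tau)$ is the fraction of problems with $r_{p,s} \le \tau$. The function name `performance_profile`, its arguments, and the small cost matrix are assumptions for illustration; the numbers are made up and are not results from Table 2.

```python
import numpy as np

def performance_profile(costs, taus):
    """Dolan-More performance profile (illustrative sketch).

    costs: array of shape (n_problems, n_solvers) with positive costs;
           np.inf marks a failed run. Assumes at least one solver
           succeeds on every problem.
    taus:  1-D array of ratio thresholds (each >= 1).
    Returns an array of shape (len(taus), n_solvers) with rho_s(tau).
    """
    costs = np.asarray(costs, dtype=float)
    taus = np.asarray(taus, dtype=float)
    best = costs.min(axis=1, keepdims=True)      # best cost per problem
    ratios = costs / best                        # performance ratios r_{p,s}
    # Fraction of problems whose ratio is within each threshold tau.
    return (ratios[None, :, :] <= taus[:, None, None]).mean(axis=1)

# Made-up costs for 3 problems and 2 solvers (not data from the paper).
costs = np.array([[1.0, 2.0],
                  [4.0, 2.0],
                  [3.0, 3.0]])
profile = performance_profile(costs, taus=[1.0, 2.0])
# profile[0] is rho at tau = 1 (share of problems where each solver is best or tied);
# profile[1] is rho at tau = 2 (share solved within twice the best cost).
```

The value at $\tau = 1$ measures how often a solver is the fastest, while the limit for large $\tau$ measures robustness, which is how the curves in the figures are read.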

Conclusions
This paper proposes a modified HS three-term conjugate gradient algorithm for large-scale optimization problems. The given algorithm has the following good features.
(1) The modified HS three-term conjugate gradient formula possesses a trust region property, which makes the global convergence for general functions easy to obtain. The normal HS formula, like many other conjugate gradient formulas, does not have this feature, which may be the crucial point for global convergence for general functions.
(2) The largest dimension of the test problems is 30000 variables, and the numerical results show that the presented algorithm is competitive with other similar methods. More experiments will be done in the future to further assess the performance of the proposed algorithm.

Figure 2: NFG performance of these methods.

Figure 3: CPU time performance of these methods.