A New Modified Three-Term Conjugate Gradient Method with Sufficient Descent Property and Its Global Convergence

A new modified three-term conjugate gradient (CG) method is proposed for solving large-scale optimization problems. The idea is based on the well-known Polak-Ribière-Polyak (PRP) formula. Although the numerator of the PRP formula plays a vital role in its strong numerical performance and freedom from jamming, the PRP method is not globally convergent. The new three-term CG method therefore keeps the PRP numerator and combines it with the denominator of a CG formula that is known to perform well. The new modification possesses the sufficient descent condition independent of any line search. The novelty is that, under the Wolfe-Powell line search, the new modification possesses global convergence properties for both convex and nonconvex functions. Numerical computation with the Wolfe-Powell line search on standard optimization test functions shows the efficiency and robustness of the new modification.


Introduction
The conjugate gradient method is an efficient and well-organized tool for solving large-scale nonlinear optimization problems, due to its simplicity and low memory requirements. The method is very popular among mathematicians, engineers, and anyone interested in solving large-scale optimization problems [1][2][3].
In the new three-term modification, we focus on the numerator of the PRP method, whose parameter is given as

\beta_k^{PRP} = \frac{g_k^T (g_k - g_{k-1})}{\|g_{k-1}\|^2}.

The PRP method is among the most efficient and reliable conjugate gradient methods due to its good numerical performance. The global convergence of PRP is established when the objective function f is strongly convex and the line search is exact [6]. On the other hand, Powell [33] showed through his analysis that there exist nonconvex functions for which the PRP method does not converge globally. Gilbert and Nocedal [34] established the so-called PRP+ method, in which the parameter is restricted to be nonnegative: \beta_k^{PRP+} = \max\{0, \beta_k^{PRP}\}. If the standard Wolfe line search (10) is used, then the PRP+ method attains global convergence and the sufficient descent condition is also satisfied.
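To make the two parameters concrete, the following is a minimal Python sketch (hypothetical helper functions, not code from the paper) of the PRP parameter and its PRP+ restriction:

```python
import numpy as np

def beta_prp(g_new, g_old):
    """PRP parameter: beta_k = g_k^T (g_k - g_{k-1}) / ||g_{k-1}||^2."""
    y = g_new - g_old                      # gradient difference y_{k-1}
    return float(g_new @ y) / float(g_old @ g_old)

def beta_prp_plus(g_new, g_old):
    """PRP+ restriction of Gilbert and Nocedal: max(0, beta_PRP)."""
    return max(0.0, beta_prp(g_new, g_old))
```

Note that when consecutive gradients are nearly equal (a very small step), the numerator g_k^T(g_k - g_{k-1}) vanishes, so \beta_k^{PRP} \approx 0 and the search direction falls back to steepest descent, which is the anti-jamming behavior described above.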
Recently, Sun and Liu [32] proposed a new conjugate gradient method, called the TMPRP1 method, by using the VFR formula from Wei et al. [35], in which a modified three-term search direction is defined. This method has the attractive property of satisfying the sufficient descent condition g_k^T d_k = -\|g_k\|^2 independent of any line search, and it attains global convergence if the standard Wolfe line search is used. Compared with the strong Wolfe line search, the standard Wolfe line search takes less computation to obtain an acceptable step size at each iteration. Hence the standard Wolfe line search increases the effectiveness of the conjugate gradient method [32].
The rest of the paper is organized as follows. In Section 2, the motivation and formula for the construction of the three-term conjugate gradient method are given. In Section 3 we present Algorithm 1.1, the general form of the three-term conjugate gradient method. In Section 4 the sufficient descent condition and the global convergence properties for convex and nonconvex functions are proven. In Section 5, detailed numerical results testing the proposed method are reported.

Motivation and Formula
Wei et al. [35] proposed three new formulas. Among them is an efficient conjugate gradient parameter whose denominator plays an important role in satisfying the sufficient descent condition and performs well in terms of global convergence and numerical results. This motivated us to take the denominator from that formula. Secondly, the PRP [6,7] method is considered one of the most proficient CG parameters due to the properties of its numerator g_k^T(g_k - g_{k-1}). If the step taken becomes very small, then g_k - g_{k-1} approaches zero, so that \beta_k^{PRP} \approx 0 and the search direction reduces to the steepest descent direction. Thus the numerator of the PRP method works efficiently and does not jam.
This motivated us to construct a new modified three-term conjugate gradient method. Further, Powell [36] showed that the PRP method can cycle infinitely without approaching a minimum point, even if the step size \alpha_k is chosen as the least positive minimizer.
To overcome this, Gilbert and Nocedal [34] presented their analysis. In \beta^{BZAU}, \beta^{BZAU+}, and the corresponding methods, the two parameters play an important role: when the first is made smaller, the numbers of iterations, function evaluations, and gradient evaluations decrease, and when the second is made larger, these numbers also decrease. We therefore observe that the best values for the two parameters are 1 and 2, respectively.
Step 2. Compute the search direction d_k by (19), using \beta^{BZAU} (or \beta^{BZAU+}).

Step 3. Determine the step size \alpha_k > 0 by the Wolfe line search (10).

Step 4. Compute x_{k+1} = x_k + \alpha_k d_k, where \alpha_k is given in Step 3 and d_k is given in Step 2.
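The loop structure of Algorithm 1.1 can be sketched in Python. Since the exact BZAU direction formula (19) is not reproduced in this extraction, the sketch below substitutes the classical three-term PRP direction of Zhang, Zhou, and Li, which shares the sufficient descent property g_k^T d_k = -\|g_k\|^2; a backtracking Armijo search also stands in for the Wolfe line search (10) for brevity. This is an illustrative stand-in, not the paper's method.

```python
import numpy as np

def three_term_cg(f, grad, x0, tol=1e-6, max_iter=500):
    """Sketch of a three-term CG loop (stand-in for Algorithm 1.1).

    Direction: d_k = -g_k + beta_k * d_{k-1} - theta_k * y_{k-1},
    with beta_k  = g_k^T y_{k-1} / ||g_{k-1}||^2 and
         theta_k = g_k^T d_{k-1} / ||g_{k-1}||^2,
    which yields g_k^T d_k = -||g_k||^2 at every iteration.
    """
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                   # initial steepest descent step
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # Step 3 (simplified): backtracking line search, Armijo condition.
        alpha, rho, c1 = 1.0, 0.5, 1e-4
        fx, gTd = f(x), float(g @ d)
        while f(x + alpha * d) > fx + c1 * alpha * gTd:
            alpha *= rho
        # Step 4: update the iterate.
        x_new = x + alpha * d
        g_new = grad(x_new)
        # Step 2 (stand-in direction for (19)).
        y = g_new - g
        denom = float(g @ g)                 # ||g_{k-1}||^2
        beta = float(g_new @ y) / denom
        theta = float(g_new @ d) / denom
        d = -g_new + beta * d - theta * y
        x, g = x_new, g_new
    return x

# Usage: minimize the convex quadratic f(x) = x^T A x / 2 - b^T x,
# whose minimizer solves A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = three_term_cg(lambda x: 0.5 * x @ A @ x - b @ x,
                       lambda x: A @ x - b, np.zeros(2))
```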

Global Convergence of the Modified Three-Term Method
Assumptions (A1) and (A2), together with [32,34], imply that there exist positive constants bounding the gradient and the iterates. Since f(x_k) is decreasing as k \to +\infty, Assumption (A1) shows that the sequence (x_k) generated by Algorithm 1.1 is contained in a bounded region; hence the sequence (x_k) is convergent. We now prove the sufficient descent condition independent of the line search, g_k^T d_k = -\|g_k\|^2, and also that \|d_k\| \le c\|g_k\| for a positive constant c. From (19), multiplying the search direction by g_k^T, the remaining terms cancel and we obtain g_k^T d_k = -\|g_k\|^2. Hence the sufficient descent condition holds independent of the line search. We now prove the bound on \|d_k\|.
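The cancellation that yields g_k^T d_k = -\|g_k\|^2 can be checked numerically. The snippet below uses the classical three-term PRP direction as a stand-in (the exact BZAU form of (19) is not reproduced here) and verifies the identity on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
# One CG step from random stand-in data: previous gradient g_prev,
# previous direction d_prev, current gradient g.  The three-term
# direction below satisfies g^T d = -||g||^2 identically, because the
# beta and theta terms cancel when multiplied by g^T.
for _ in range(100):
    g_prev, g, d_prev = rng.normal(size=(3, 5))
    y = g - g_prev
    denom = float(g_prev @ g_prev)
    beta = float(g @ y) / denom
    theta = float(g @ d_prev) / denom
    d = -g + beta * d_prev - theta * y
    assert np.isclose(g @ d, -(g @ g)), "sufficient descent violated"
```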
Lemma 1. Suppose Assumptions (A1) and (A2) hold and x_0 is an initial point. Consider any method of the form (2) in which d_k is a descent direction and \alpha_k satisfies the Wolfe condition (10) or the strong Wolfe condition (11). Then the Zoutendijk condition

\sum_{k \ge 0} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < +\infty

holds, which is normally used to prove the global convergence of CG methods. By the sufficient descent condition (29), the Zoutendijk condition is equivalent to the inequality

\sum_{k \ge 0} \frac{\|g_k\|^4}{\|d_k\|^2} < +\infty.

Definition 2 (see [32]). The function f is called uniformly convex on R^n if there exists a positive constant \mu such that

d^T \nabla^2 f(x)\, d \ge \mu \|d\|^2 \quad \text{for all } x, d \in R^n,

where \nabla^2 f(x) is the Hessian matrix of f.
We now show the global convergence of Algorithm 1.1 for uniformly convex functions.

Lemma 3. Let the sequences (x_k) and (d_k) be generated by Algorithm 1.1 and suppose that (32) holds; then a step-size bound holds with constant c_1 = (1 - \sigma)^{-1}(\mu/2), where \mu is a positive constant and \sigma, with 0 < \delta < \sigma < 1, comes from the Wolfe line search (10).
Proof. Details of the proof can be seen in Lemma 2.1 of [37].
The proof has the following two parts.
Part 1. Proceeding as in Step 1 of the proof of Theorem 2.2 in [32], assume at the outset that \liminf_{k \to \infty} \|g_k\| \ne 0; then there exists a constant \epsilon > 0 such that \|g_k\| \ge \epsilon for all k. This contradicts (A2), (31), and (51). Hence it is proved that \liminf_{k \to \infty} \|g_k\| = 0.

Numerical Results
In this part we compare the numerical results and performance of the proposed three-term BZAU (Bakhtawar, Zabidin, Ahmad and Ummu) method with the recently developed TMPRP1 method. The Wolfe line search (10) is used, and the parameter values for the BZAU and TMPRP1 methods are, respectively, 2 and 10^{-4}; 1 and 0; 0.1 and 0.1; and 0.5 and 0.5. The code was written in Matlab 7.1 and run on an i5 computer with a 2.40 GHz CPU and 2.0 GB of RAM. We test functions taken from [38] with dimensions ranging from 2 to 5000. The main purpose of selecting a large number of test functions is to test unconstrained optimization algorithms properly. Dantzig (1914-2005) said that the final test of a theory is its capacity to solve the problems which originated it. This is one of the main reasons we select large-scale unconstrained optimization problems: to test the theoretical progress numerically through mathematical programming [38]. Moré et al. [39] argued that judging the efficiency of a method or algorithm on a small number of test functions is not suitable, because this can lead to the choice of an unfavorable algorithm. Testing a method on a large number of test functions yields a large amount of data, from which we can judge which method is more efficient and robust. But the number of test functions should be neither very large nor very small, so a benchmark of 75 test functions is chosen to test the efficiency of any method.
Practically, optimizers need to evaluate nonlinear optimization methods: theory alone, even with proven global convergence properties, is not enough to determine the reliability and efficiency of a method. As a result, the robustness of any method is established by testing it on a large number of test problems [38].
In the global convergence analysis, \beta^{BZAU} is used for the convex case and \beta^{BZAU+} for the nonconvex case. In the numerical part, \beta^{BZAU+} is used for the comparison with the TMPRP1 method. The TMPRP1 method possesses the sufficient descent property without any line search, is theoretically well established, and converges globally; in numerical computation, tested on the benchmark of 75 test functions, it shows promising results. Hence the TMPRP1 method is compared with our BZAU method.
In Table 1, the number of iterations, number of function evaluations, number of gradient evaluations, and CPU time are denoted by NI, NF, GE, and CT, respectively. If the CT exceeds 500 seconds or the NI exceeds 10000 iterations, the run is marked as a failure (F); this stopping standard is widely followed. For most functions the result is obtained within these limits, and a function that does not finish within them is marked F.
The performance profiles are those of Dolan and Moré [40]. In Figures 1-4 we compare performance based on NI, NF, GE, and CT. For every method, we plot the fraction P of problems for which the method is within a factor \tau of the best. Comparing Figures 1-4 shows that the BZAU method outperforms the TMPRP1 method in every case. The top curve corresponds to the most efficient method, so the new modified three-term CG method is also efficient in terms of numerical results.
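The Dolan-Moré profile used in Figures 1-4 can be sketched as follows. The cost matrix below is hypothetical illustrative data, not the paper's results:

```python
import numpy as np

def performance_profile(T):
    """Dolan-Moré performance profile from a cost matrix T.

    T[p, s] = cost (e.g. NI, NF, GE, or CT) of solver s on problem p;
    np.inf marks a failure.  Returns the evaluation points taus and, for
    each solver s, rho_s(tau) = fraction of problems solved within a
    factor tau of the best solver.
    """
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1, keepdims=True)            # best cost per problem
    ratios = T / best                              # performance ratios r_{p,s}
    taus = np.unique(ratios[np.isfinite(ratios)])  # sorted evaluation points
    profiles = [(ratios[:, s][:, None] <= taus).mean(axis=0)
                for s in range(T.shape[1])]
    return taus, profiles

# Hypothetical iteration counts for two solvers on four problems;
# the first solver fails on the last problem.
T = [[10, 12], [20, 40], [30, 30], [np.inf, 50]]
taus, (rho_a, rho_b) = performance_profile(T)
```

At \tau = 1 the curve reads off the fraction of problems on which a solver is fastest, and as \tau grows it approaches the fraction of problems the solver eventually solves, matching the left-side/right-side reading of the figures described in the text.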

Conclusion
In this paper, we have proposed a new modified three-term conjugate gradient method for unconstrained optimization.
The new modified three-term BZAU method possesses the sufficient descent property independent of any line search. Global convergence is shown for both convex and nonconvex functions using the Wolfe line search. In the numerical experiments we compared the three-term BZAU method with the TMPRP1 method [32]. In [32], the TMPRP1 method was shown to be numerically efficient in comparison with two other robust methods, the CG Descent method [41] and the DTPRP method [42]; that is the reason for comparing our BZAU method with the TMPRP1 method.

Figure 1: Performance profiles based on the number of iterations.

Figure 2: Performance profiles based on the CPU time.

Part 2. Obtaining a bound on the direction d_k.

Table 1: A list of test problem functions.

Table 1: Continued.

The left-hand side of each figure represents the percentage of test problems on which a method is the most robust and fastest; the right-hand side shows the percentage of test problems solved successfully by either the BZAU or the TMPRP1 method. Each graph has two axes, \tau and P. Because the raw values of \tau span a wide range and make the graph hard to read, the \tau-axis is converted to the natural logarithm of \tau, so its values appear as e^0, e^1, e^2, e^3, while the values on the P-axis remain linear.