An Efficient Hybrid Conjugate Gradient Method with the Strong Wolfe-Powell Line Search

1School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu, 21030 Kuala Terengganu, Terengganu, Malaysia
2Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin, 21300 Kuala Terengganu, Terengganu, Malaysia
3Department of Computer Science and Mathematics, Universiti Teknologi MARA (UITM) Terengganu, Campus Kuala Terengganu, 21080 Kuala Terengganu, Terengganu, Malaysia


Introduction
The nonlinear conjugate gradient (CG) method is a useful tool for finding the minimum of unconstrained optimization problems. Consider the problem
$$\min f(x), \quad x \in \mathbb{R}^n, \tag{1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function whose gradient is denoted by $g(x) = \nabla f(x)$. The CG method generates a sequence of points $x_k$, $k \ge 1$, starting from an initial point $x_1 \in \mathbb{R}^n$, by the iterative formula
$$x_{k+1} = x_k + \alpha_k d_k, \tag{2}$$
where $x_k$ is the current iteration point and $\alpha_k > 0$ is the step length obtained by some line search. The search direction $d_k$ is defined by
$$d_k = \begin{cases} -g_k, & k = 1, \\ -g_k + \beta_k d_{k-1}, & k \ge 2, \end{cases} \tag{3}$$
where $g_k = g(x_k)$ and $\beta_k$ is known as the CG formula or coefficient. To find the step length $\alpha_k$ we can use the exact line search, which is given by
$$f(x_k + \alpha_k d_k) = \min_{\alpha \ge 0} f(x_k + \alpha d_k). \tag{4}$$
In practice, (4) is not an effective line search, since it requires many function and gradient evaluations. Therefore, an inexpensive inexact line search is preferred. The strong Wolfe-Powell (SWP) line search [1, 2], which is given by
$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k, \tag{5}$$
$$|g(x_k + \alpha_k d_k)^T d_k| \le \sigma |g_k^T d_k|, \tag{6}$$
where $0 < \delta < \sigma < 1$, finds an approximation of $\alpha_k$ for which the descent property (see (14)) is satisfied, and it stops searching along the direction when $x_k$ is far from the solution. Thus, the SWP line search inherits the advantages of the exact line search at low computational cost. However, different choices of $\alpha_k$ and $\beta_k$ imply different CG methods. In fact, the SWP line search is a strong version of the weak Wolfe-Powell (WWP) line search, where the latter is given by (5) and
$$g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k. \tag{7}$$
The CG method has attracted much recent development because of its simplicity, numerical efficiency, and low memory requirements. Thus, it is widely used in engineering, medical science, and other fields. As an application in engineering, the CG method can be used to solve real-life problems similar to the one mentioned in [3]. The CG method is limited to functions whose gradient is available. Thus, a heuristic algorithm [4] can be used as an alternative method to find the solution for general functions. A heuristic algorithm finds an approximate solution for the objective function within an acceptable time. In addition, heuristic algorithms can be applied without using computers. We refer the reader to [5-7] for some applications of such algorithms.
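The SWP conditions (5) and (6) are easy to test for a given trial step. The following minimal Python sketch is an illustration only: the function name `satisfies_swp`, the default values `delta=1e-4` and `sigma=0.1`, and the quadratic test function are assumptions made for this example and are not taken from the paper.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def satisfies_swp(f, grad, x, d, alpha, delta=1e-4, sigma=0.1):
    """Return True if alpha satisfies the SWP conditions (5) and (6).

    delta and sigma are illustrative defaults with 0 < delta < sigma < 1.
    """
    slope0 = dot(grad(x), d)                       # g_k^T d_k
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    sufficient_decrease = f(x_new) <= f(x) + delta * alpha * slope0   # (5)
    strong_curvature = abs(dot(grad(x_new), d)) <= sigma * abs(slope0)  # (6)
    return sufficient_decrease and strong_curvature


# Hypothetical test problem: f(x) = x1^2 + 2*x2^2, steepest descent direction.
f = lambda x: x[0] ** 2 + 2.0 * x[1] ** 2
grad = lambda x: [2.0 * x[0], 4.0 * x[1]]
x1 = [1.0, 1.0]
d1 = [-gi for gi in grad(x1)]

near_exact = satisfies_swp(f, grad, x1, d1, 0.27)  # close to the exact minimizer 5/18
too_short = satisfies_swp(f, grad, x1, d1, 0.01)   # fails the curvature condition (6)
too_long = satisfies_swp(f, grad, x1, d1, 1.0)     # fails the decrease condition (5)
```

In this example the exact minimizer along $d_1$ is $\alpha = 5/18$, so a step near it is accepted, while a very short step violates (6) and a very long step violates (5).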
The most popular formulas for $\beta_k$ are Hestenes-Stiefel (HS) [8], Fletcher-Reeves (FR) [9], Polak-Ribière-Polyak (PRP) [10], and Wei et al. (WYL) [11], given respectively by
$$\beta_k^{HS} = \frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T (g_k - g_{k-1})}, \tag{8}$$
$$\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \tag{9}$$
$$\beta_k^{PRP} = \frac{g_k^T (g_k - g_{k-1})}{\|g_{k-1}\|^2}, \tag{10}$$
$$\beta_k^{WYL} = \frac{g_k^T \left( g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1} \right)}{\|g_{k-1}\|^2}. \tag{11}$$
Hestenes and Stiefel [8] proposed the first formula, (8), for solving quadratic functions in 1952. Fletcher and Reeves [9] presented the first formula for nonlinear functions, (9), in 1964. The convergence properties of the FR method with the exact line search were obtained by Zoutendijk [12]. Al-Baali [13] proved that the FR method is globally convergent with the SWP line search when $\sigma < 1/2$. Later on, Guanghui et al. [14] extended the result to $\sigma \le 1/2$. The global convergence of the PRP method (10) with the exact line search was proved by Polak and Ribière in [10]. Powell [15] gave a counterexample showing that there exists a nonconvex function for which the PRP method does not converge globally, even when the exact line search is used. Powell suggested using the nonnegative PRP method to remedy this problem. Gilbert and Nocedal [16] proved that the nonnegative PRP (PRP+) method, that is, $\beta_k = \max\{\beta_k^{PRP}, 0\}$, is globally convergent under complicated line searches. However, there is no guarantee that PRP+ is convergent with the SWP line search for general nonlinear functions.
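The classical coefficients named in this paragraph differ only in how they combine $g_k$, $g_{k-1}$, and $d_{k-1}$. The sketch below, in plain Python, writes them out in their standard textbook forms; the function names and the small test vectors are hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm."""
    return math.sqrt(dot(u, u))


def beta_hs(g, g_prev, d_prev):
    """Hestenes-Stiefel coefficient."""
    y = [a - b for a, b in zip(g, g_prev)]
    return dot(g, y) / dot(d_prev, y)


def beta_fr(g, g_prev):
    """Fletcher-Reeves coefficient."""
    return dot(g, g) / dot(g_prev, g_prev)


def beta_prp(g, g_prev):
    """Polak-Ribiere-Polyak coefficient."""
    y = [a - b for a, b in zip(g, g_prev)]
    return dot(g, y) / dot(g_prev, g_prev)


def beta_wyl(g, g_prev):
    """Wei et al. coefficient; nonnegative by the Cauchy-Schwarz inequality."""
    ratio = norm(g) / norm(g_prev)
    scaled = [a - ratio * b for a, b in zip(g, g_prev)]
    return dot(g, scaled) / dot(g_prev, g_prev)


def beta_prp_plus(g, g_prev):
    """Nonnegative PRP (PRP+): the PRP value truncated at zero."""
    return max(beta_prp(g, g_prev), 0.0)


# Hypothetical gradients: here PRP is negative, so PRP+ truncates it to zero.
g_new, g_old = [1.0, 0.0], [2.0, 0.0]
prp_value = beta_prp(g_new, g_old)        # (1 - 2) / 4
prp_plus_value = beta_prp_plus(g_new, g_old)
```

The last two lines show the situation discussed by Powell: the raw PRP value can become negative, and PRP+ replaces it by zero, which turns the next direction into a steepest descent step.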
Touati-Ahmed and Storey [17] suggested the following hybrid method:
$$\beta_k^{TS} = \begin{cases} \beta_k^{PRP}, & \text{if } 0 \le \beta_k^{PRP} \le \beta_k^{FR}, \\ \beta_k^{FR}, & \text{otherwise}. \end{cases} \tag{12}$$
In 2006 Wei et al. [11] presented a new positive CG method (11), which is quite similar to the original PRP method and has been studied under both exact and inexact line searches. Many modifications have appeared, such as the formulas (13) proposed in [18-20], which involve a parameter bounded below by 1. Recently, many CG formulas have been constructed in order to achieve efficiency and robustness. For more about the latest CG methods we refer the reader to [21, 22]. One of the important rules in CG methods is the descent condition; that is, if one can prove
$$g_k^T d_k < 0, \tag{14}$$
then (5) guarantees $f(x_{k+1}) < f(x_k)$. If we extend (14) to the form
$$g_k^T d_k \le -c \|g_k\|^2, \quad c > 0, \tag{15}$$
then (15) is called the sufficient descent condition. This paper is organized as follows. In Section 2 we present the current problem for the PRP and nonnegative PRP methods with the SWP line search, and then we suggest the new hybrid CG formula and its simplifications. In Section 3 we establish the global convergence properties with the SWP line search. Numerical results and the conclusion are presented in Sections 4 and 5, respectively.

Motivation and the Hybrid Formula
The PRP formula is among the most efficient CG methods in practice. However, as mentioned before, this method fails to solve some standard test problems for nonconvex functions, even when the exact line search is used. Thus, the main contribution of this paper is to extend the use of the PRP formula to several cases with the SWP line search under a mild condition, and to restart the CG algorithm with the NPRP CG formula when PRP fails to satisfy that condition.
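The following discussion illustrates the cases in which the PRP method succeeds or fails to obtain the convergence properties with the SWP line search. The PRP formula (10) can be simplified as follows:
$$\beta_k^{PRP} = \frac{\|g_k\|^2 - g_k^T g_{k-1}}{\|g_{k-1}\|^2}.$$
Therefore we have the following cases.
Case A. If $g_k^T g_{k-1} > 0$, then we have the following two possibilities.
Case A1. If $\|g_k\|^2 > g_k^T g_{k-1}$, then $0 < \beta_k^{PRP} < \beta_k^{FR}$. In this case, the PRP method is efficient and has global convergence properties.
Case A2. If $\|g_k\|^2 < g_k^T g_{k-1}$, then $\beta_k^{PRP} < 0$. In this case, based on [16], we fail to obtain the global convergence properties for nonconvex functions.
Case B. If $g_k^T g_{k-1} < 0$, then $\beta_k^{PRP} > \beta_k^{FR}$. In Case B there is no guarantee that this method will satisfy the sufficient descent condition.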
Next, we discuss the nonnegative PRP method, which is given by
$$\beta_k^{PRP+} = \max\{\beta_k^{PRP}, 0\}.$$
This method still has a problem in Case B. To solve this problem, Gilbert and Nocedal [16] used another line search to obtain the convergence properties. In addition, if $\beta_k^{PRP+} = 0$, the CG method reduces to the steepest descent method, which is sometimes a weak tool for finding the optimum point of a function. Consequently, from the PRP method we can use only Case A1. To improve on the above ideas, we suggest the following hybrid method:
$$\beta_k^{PRP**} = \begin{cases} \beta_k^{PRP}, & \text{if Case A1 holds (Case 1)}, \\ \beta_k^{NPRP}, & \text{otherwise (Case 2)}. \end{cases}$$
Thus, the NPRP formula supplies a suitable nonnegative value in the remaining cases.
The following algorithm describes the CG method with the new coefficient $\beta_k^{PRP**}$. Algorithm 1. Consider the following.
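To make the structure of such a method concrete, the following Python sketch combines a hybrid coefficient with an SWP line search inside the iteration (2) and (3). It is a sketch under stated assumptions, not the authors' implementation: the switching test `gg > inner > 0` (use PRP only when $0 < g_k^T g_{k-1} < \|g_k\|^2$), the NPRP expression $(\|g_k\|^2 - \frac{\|g_k\|}{\|g_{k-1}\|} |g_k^T g_{k-1}|) / \|g_{k-1}\|^2$, the bracket-and-bisect line search, the steepest descent safeguard, and all function names and parameter values are assumptions introduced here for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def swp_step(f, grad, x, d, delta=1e-4, sigma=0.1, max_iter=60):
    """Bracket-and-bisect search for a step satisfying the SWP conditions (5)-(6)."""
    phi0, slope0 = f(x), dot(grad(x), d)
    lo, hi, alpha = 0.0, None, 1.0
    for _ in range(max_iter):
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        slope = dot(grad(x_new), d)
        armijo = f(x_new) <= phi0 + delta * alpha * slope0
        if armijo and abs(slope) <= sigma * abs(slope0):
            return alpha
        if not armijo or slope > 0:
            hi = alpha          # step too long: shrink the bracket from above
        else:
            lo = alpha          # step too short: slope is still steeply negative
        alpha = 2.0 * alpha if hi is None else 0.5 * (lo + hi)
    return alpha                # fallback after max_iter trials


def beta_hybrid(g, g_prev):
    """Assumed hybrid rule: PRP when 0 < g^T g_prev < ||g||^2, NPRP otherwise."""
    gg, gp, inner = dot(g, g), dot(g_prev, g_prev), dot(g, g_prev)
    if gg > inner > 0:
        return (gg - inner) / gp                          # PRP, lies in (0, FR)
    return (gg - math.sqrt(gg / gp) * abs(inner)) / gp    # NPRP (assumed form), >= 0


def hybrid_cg(f, grad, x, tol=1e-6, max_iter=1000):
    """CG iteration (2)-(3); stops when ||g_k|| <= tol or after max_iter steps."""
    g = grad(x)
    d = [-gi for gi in g]
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) <= tol:
            return x, k
        alpha = swp_step(f, grad, x, d)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = grad(x)
        beta = beta_hybrid(g_new, g)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        if dot(g_new, d) >= 0:          # safeguard (assumption): restart if not descent
            d = [-gn for gn in g_new]
        g = g_new
    return x, max_iter


# Hypothetical test problem: a small convex quadratic with minimizer at the origin.
f = lambda x: x[0] ** 2 + 2.0 * x[1] ** 2
grad = lambda x: [2.0 * x[0], 4.0 * x[1]]
x_star, iters = hybrid_cg(f, grad, [1.0, 1.0])
```

The stopping tolerance and iteration cap mirror the values reported in the numerical section; the remaining defaults are arbitrary choices for the example.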

The Global Convergence Properties for $\beta_k^{PRP**}$ with SWP Line Search
The following standard assumptions are necessary for this work.
In some open convex neighborhood $N$ of Χ, $f$ is continuous and differentiable, and its gradient is Lipschitz continuous; that is, there exists a constant $L > 0$ such that $\|g(x) - g(y)\| \le L \|x - y\|$ for all $x, y \in N$.
The following lemma is one of the most important tools for proving the global convergence properties.
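Lemma 2 (see [12]). Suppose Assumptions 1 and 2 hold. Consider any method of the form (2) and (3), where $\alpha_k$ is computed by the WWP line search and the direction $d_k$ is a descent direction for all $k \ge 1$; then
$$\sum_{k=1}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty. \tag{25}$$
Condition (25) is known as the Zoutendijk condition.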
We now discuss the global convergence properties of $\beta_k^{PRP**}$ with the SWP line search.

When $\beta_k^{PRP**} \le \beta_k^{FR}$, the proof of the global convergence properties is similar to that for $\beta_k^{FR}$. We refer the reader to [13, 14]. Dividing both sides by $\|g_{k-1}\|^2$, and using Case 2 and (6), we obtain
Gilbert and Nocedal [16] presented an important theorem (Theorem 4) that gives the global convergence properties of the nonnegative PRP method when the descent condition is satisfied. Furthermore, [16] introduced a useful property, called Property *, as follows.
(III) The Zoutendijk condition (25) is satisfied by the line search.
Then the iterates are globally convergent.
The next lemma shows that if the gradients are bounded away from zero and Property * holds, then a certain fraction of the steps cannot be too small. The proof is given in [16]. However, we state it for readability. Lemma 5. Consider a CG algorithm as defined in (2) and (3) ... The proof is complete.
Lemma 6. The CG formula presented in Case 2 has the following properties: (1) (20) to be nonnegative.
(3) $\beta_k^{PRP**}$ satisfies the sufficient descent condition based on Theorem 3 and  ∈ (0, 1).
By using Theorems 3 and 4 and Lemma 6 we have the following convergence result. The proof is similar to that of Theorem 4.3 in [16].

Numerical Results and Discussions
To evaluate the efficiency of the new method, we selected some of the test functions in Table 1 from CUTEr [24], Neculai [23], and Adorio and Diliman [25]. We compared several CG methods, namely the VHS, NPRP, PRP+, FR, and PRP** formulas. The tolerance $\epsilon$ is set to $10^{-6}$ for all algorithms to investigate how rapidly the iterations approach the optimal solution. The gradient norm is used as the stopping criterion; that is, the iteration stops when $\|g_k\| \le 10^{-6}$. A method is considered to have failed if the number of iterations exceeds 1000.
In Table 1 we selected different initial points for every function. This suggests that the method can be applied to real-life functions from other fields, such as engineering and medical science, as mentioned before. In addition, dimensions ranging from 500 to 10000 are used, and the functions are chosen from different groups. We used a Matlab 7.9 subroutine program on an Intel(R) Core(TM) i3 CPU with 2 GB of DDR2 RAM, under the SWP line search, to find the optimum point. The efficiency comparison results are shown in Figures 1 and 2, using the performance profile introduced by Dolan and Moré [26].
This performance measure was introduced to compare a set of solvers $S$ on a set of problems $P$. Assuming $n_s$ solvers and $n_p$ problems in $S$ and $P$, respectively, the measure $t_{p,s}$ is defined as the computation cost (e.g., the number of iterations or the CPU time) required to solve problem $p$ by solver $s$.
To create a baseline for comparison, the performance of solver $s$ on problem $p$ is scaled by the best performance of any solver in $S$ on that problem, using the ratio
$$r_{p,s} = \frac{t_{p,s}}{\min\{t_{p,s} : s \in S\}}.$$
Define
$$\rho_s(\tau) = \frac{1}{n_p} \, \mathrm{size}\{p \in P : r_{p,s} \le \tau\}.$$
Thus, $\rho_s(\tau)$ is the probability for solver $s \in S$ that the performance ratio $r_{p,s}$ is within a factor $\tau \in \mathbb{R}$ of the best possible ratio. The function $\rho_s$ is the cumulative distribution function of the performance ratio, so the performance profile $\rho_s : \mathbb{R} \to [0, 1]$ of a solver is nondecreasing, piecewise constant, and continuous from the right. The value of $\rho_s(1)$ is the probability that the solver has the best performance of all of the solvers. In general, a solver with high values of $\rho_s(\tau)$, whose curve appears toward the top of the figure, is preferable. On the left side of Figures 1 and 2 the PRP** curve lies above the other curves. Therefore, it is the best of the related PRP methods in terms of efficiency and robustness. In Figure 2 the curve of PRP** is still the best, but its advantage is smaller than for the number of iterations, since the more complicated hybrid algorithm leads to higher CPU time. Thus, on faster processors the method will be more efficient, since the number of iterations decreases rapidly under the PRP** method.
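The performance profile described above can be computed directly from a table of costs. The Python sketch below is a minimal illustration of the Dolan and Moré construction; the function name `performance_profile`, the convention that `math.inf` marks a failed run, and the cost table are hypothetical and do not reproduce the paper's data. It assumes all recorded costs are positive.

```python
import math


def performance_profile(costs, taus):
    """Compute rho_s(tau) for each solver.

    costs maps a solver name to a list of positive costs t_{p,s}, one per
    problem, with math.inf marking a failure. Returns a dict that maps each
    solver to the list [rho_s(tau) for tau in taus].
    """
    solvers = list(costs)
    n_p = len(costs[solvers[0]])
    # Best (smallest) cost achieved by any solver on each problem.
    best = [min(costs[s][p] for s in solvers) for p in range(n_p)]
    profile = {}
    for s in solvers:
        ratios = [costs[s][p] / best[p] for p in range(n_p)]   # r_{p,s}
        profile[s] = [sum(r <= tau for r in ratios) / n_p for tau in taus]
    return profile


# Made-up iteration counts for two solvers on three problems.
costs = {
    "A": [10.0, 20.0, math.inf],   # solver A fails on the third problem
    "B": [20.0, 10.0, 30.0],
}
profile = performance_profile(costs, taus=[1.0, 2.0])
```

Here `profile["B"][0]` is the fraction of problems on which solver B is the best, and the last entry of each list approaches the fraction of problems the solver eventually solves as the largest `tau` grows.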

Conclusion
In this paper, we proposed a hybrid conjugate gradient method that uses the nonnegative PRP and NPRP formulas with the SWP line search, which extends the cases in which the PRP method can be used under a mild condition. The global convergence property is established, and its proof is simple. Our numerical results show that the hybrid method is the best when compared to other related PRP CG methods.

Figure 1: Performance profile based on the number of iterations.

Figure 2: Performance profile based on the CPU time.