Nonlinear Conjugate Gradient Coefficients with Exact and Strong Wolfe Line Search Techniques

Abstract. Nonlinear conjugate gradient (CG) methods are very important for solving unconstrained optimization problems, and they have been the subject of extensive research aimed at enhancing them. Exact and strong Wolfe line search techniques are usually used in practice for the analysis and implementation of conjugate gradient methods. Seeking better results, several studies have modified classical CG methods. The method of Fletcher and Reeves (FR) is one of the most well-known CG methods: it has strong convergence properties, but it gives poor numerical results in practice. The main goal of this paper is to enhance the numerical performance of this method via a convexity-type modification of its coefficient β_k. We show that, with this modification, the method still achieves the sufficient descent condition and global convergence under both exact and strong Wolfe line searches. The numerical results show that the modified FR method is more robust and effective.


Introduction
An unconstrained optimization problem is solved using the nonlinear conjugate gradient method to obtain the minimal value of a given function:

min f(x), x ∈ R^n, (1)

where f : R^n → R, f ∈ C^1(R^n), and g(x) = ∇f(x) denotes its gradient. To apply CG methods to (1), we start from an initial point x_1 ∈ R^n and use the iterative form

x_{k+1} = x_k + α_k d_k, (2)

where x_k is the current iterate, α_k > 0 is a step-size determined by some line search, and d_k is the search direction defined by

d_1 = −g_1,  d_{k+1} = −g_{k+1} + β_k d_k, (3)

where g_k = ∇f(x_k) and β_k is a scalar. Different choices of the scalar β_k yield different conjugate gradient algorithms. Over the years, several versions of this approach have been presented, some of which are now extensively utilized. There are at least six well-known formulas for β_k; with y_k = g_{k+1} − g_k, they are

β_k^{DY} = ‖g_{k+1}‖² / (d_k^T y_k)  (Dai and Yuan [1]), (4)
β_k^{CD} = −‖g_{k+1}‖² / (d_k^T g_k)  (conjugate descent [2]), (5)
β_k^{FR} = ‖g_{k+1}‖² / ‖g_k‖²  (Fletcher and Reeves [3]), (6)
β_k^{LS} = −g_{k+1}^T y_k / (d_k^T g_k)  (Liu and Storey [4]), (7)
β_k^{HS} = g_{k+1}^T y_k / (d_k^T y_k)  (Hestenes and Stiefel [5]), (8)
β_k^{PRP} = g_{k+1}^T y_k / ‖g_k‖²  (Polak–Ribière–Polyak [6]). (9)

Many authors have examined the global convergence behavior of these β_k formulas with several line searches (see, for example, [1-4, 7-18]). When the objective function is a strongly convex quadratic and the line search is exact, these methods are identical, since the gradients are mutually orthogonal and the scalars β_k in these methods are equal. When applied to general nonlinear functions with inexact line searches, the behavior of these methods is clearly distinct (see [11, 17, 19-21]). One of the most important properties of CG methods is global convergence. Zoutendijk [22] proved the global convergence of the FR method via the exact line search. Although the FR, DY, and CD methods have strong convergence properties, they may not perform well in practice [14]. The CD and DY methods were proved to be globally convergent under the strong Wolfe line search [3, 15].
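As a concrete illustration of iterations (2) and (3), the following sketch runs the FR coefficient on a strongly convex quadratic, for which the exact line search has the closed form α_k = −g_k^T d_k / (d_k^T A d_k). The 2×2 problem data are illustrative choices, not test problems from the paper.

```python
# Fletcher-Reeves CG on f(x) = 0.5 x^T A x - b^T x, where the exact
# line search step has a closed form. The matrix A and vector b below
# are illustrative assumptions for a small 2D example.

def mat_vec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fr_cg(A, b, x, tol=1e-10, max_iter=100):
    """FR iteration: d_1 = -g_1, d_{k+1} = -g_{k+1} + beta_k d_k,
    with beta_k^FR = ||g_{k+1}||^2 / ||g_k||^2."""
    g = [gi - bi for gi, bi in zip(mat_vec(A, x), b)]   # gradient A x - b
    d = [-gi for gi in g]
    for _ in range(max_iter):
        if dot(g, g) <= tol ** 2:
            break
        Ad = mat_vec(A, d)
        alpha = -dot(g, d) / dot(d, Ad)                  # exact step for a quadratic
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * adi for gi, adi in zip(g, Ad)]
        beta = dot(g_new, g_new) / dot(g, g)             # beta^FR
        d = [-gi + beta * di for gi, di in zip(g_new, d)]
        g = g_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
sol = fr_cg(A, b, [0.0, 0.0])   # minimizer solves A x = b
```

On a strongly convex quadratic with the exact line search, this recovers the classical linear CG method and terminates in at most n steps.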
Moreover, to the best of our knowledge, the global convergence and sufficient descent properties of some CG methods, such as PRP and HS, have not been established under the exact and strong Wolfe line searches [2, 14]. Andrei [9] classified CG methods into groups: scaled CG methods, classical CG methods, and hybrid and parameterized CG methods.
Formulas (4) to (9) belong to the classical group. One of the most important among them is the FR method. Its known defect is that when a bad direction and a tiny step from x_{k−1} to x_k are generated, the next search direction and the next step s_k = x_k − x_{k−1} = α_{k−1} d_{k−1} are also likely to be poor unless a restart along the gradient direction is performed. Despite this defect, it is well known that the FR method is globally convergent for general nonlinear functions with exact or inexact line searches [18].
Al-Baali [7] proved the global convergence of the FR method when the strong Wolfe line search is used and the parameter σ is restricted to (0, 1/2). Liu and Li [17] extended Al-Baali's result to the case σ = 1/2. Moreover, Gilbert and Nocedal [14] investigated the global convergence properties of FR-dependent conjugate gradient methods with β_k satisfying |β_k| ≤ β_k^{FR}, provided that the line search satisfies the strong Wolfe conditions. For β_k satisfying |β_k| ≤ c β_k^{FR} with c > 1, they gave an example showing that even the exact line search cannot guarantee global convergence of the FR-dependent conjugate gradient method. Nosratipour and Amini [23] proved that the FR-dependent conjugate gradient method is globally convergent with the strong Wolfe line search if β_k satisfies a certain bound involving l_i = |β_i / β_i^{FR}| and constants 0 < σ < 1/2 and c > 0. Zhang and Li [24] proposed a modified FR method (called MFR) in which the direction d_k is redefined so that it is a descent direction independent of the line search. Competitive numerical and global convergence results have been obtained by newly introduced or modified techniques; see, for instance, [10] and the references therein.
This paper is organized as follows. Section 2 obtains the modification and introduces the algorithm for the modified FR method. In Section 3, we establish the sufficient descent and global convergence properties under the exact and strong Wolfe line searches. Section 4 provides preliminary numerical results and considerations. Section 5 gives the conclusions.

Motivation and Properties
Several authors have attempted to modify classical CG methods such as FR, PRP, HS, and LS in order to produce new variants with sufficient descent and global convergence properties; in addition, the new variants are expected to perform efficiently in practice. It is well known that the FR method has strong convergence properties but poor numerical performance. The main aim of this paper is to overcome this flaw, using the following convexity-type modification of the FR formula, where 0 < θ < 1 is a scalar parameter to be determined later. Note that if θ = 1, then β_k^{NMFR} reduces to β_k^{FR}, where ‖·‖ denotes the Euclidean norm on R^n. The coefficient (12) satisfies the bounds in (13). The new algorithm is as follows.

Step 1. Given an initial point x_1 ∈ R^n and a tolerance ε > 0, set k = 1 and d_1 = −g_1.

Step 2. Evaluate β_k according to (12).

Journal of Mathematics
Step 3. Evaluate d_k according to (3); if ‖g_k‖ ≤ ε, then stop; otherwise, go to the next step.

Step 4. Compute the step-size α_k by the exact line search (14) or the strong Wolfe line search (15) and (16).
Step 5. Renew the point according to (2); if ‖g_k‖ ≤ ε, then stop.

Step 6. Set k = k + 1, and go to Step 2.
The ideal way to choose the step length is via the exact line search. Since the exact line search is too expensive in practice, approximate methods, called inexact line searches, such as the strong Wolfe line search, are used to define a step length that gives a suitable reduction in the objective function at minimal cost. The convergence properties of some CG methods, such as FR, RMIL, and RMIL+, have been established under the exact line search (see [3, 6, 25]).
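The strong Wolfe conditions referred to above can be implemented, for example, with a simple bisection-style search. This is a minimal sketch with illustrative parameter values δ = 10⁻⁴ and σ = 0.4 and a toy quadratic objective; it is not the line search routine used in the paper's experiments.

```python
# Bisection-style search for a step alpha satisfying the strong Wolfe
# conditions: f(x+a d) <= f(x) + delta*a*g^T d (sufficient decrease)
# and |grad(x+a d)^T d| <= -sigma * g^T d (curvature). The parameter
# values and test function are illustrative assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def strong_wolfe(f, grad, x, d, delta=1e-4, sigma=0.4, max_iter=100):
    f0, gd0 = f(x), dot(grad(x), d)   # gd0 < 0 for a descent direction
    lo, hi, alpha = 0.0, float('inf'), 1.0
    for _ in range(max_iter):
        xa = [xi + alpha * di for xi, di in zip(x, d)]
        gd = dot(grad(xa), d)
        if f(xa) > f0 + delta * alpha * gd0:
            hi = alpha            # sufficient decrease fails: step too long
        elif gd < sigma * gd0:
            lo = alpha            # slope too negative: step too short
        elif gd > -sigma * gd0:
            hi = alpha            # slope too positive: overshot the minimizer
        else:
            return alpha          # both strong Wolfe conditions hold
        alpha = (lo + hi) / 2.0 if hi < float('inf') else 2.0 * alpha
    return alpha

f = lambda x: x[0] ** 2 + 2.0 * x[1] ** 2
grad = lambda x: [2.0 * x[0], 4.0 * x[1]]
x0, d0 = [1.0, 1.0], [-2.0, -4.0]   # d0 = -grad(x0), a descent direction
alpha = strong_wolfe(f, grad, x0, d0)
```

The bracket [lo, hi] shrinks toward an acceptable step; for a smooth function bounded below along the ray, such a step always exists when 0 < δ < σ < 1.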

Convergent Analysis
In this section, we examine the convergence properties of β_k^{NMFR}. The main feature of Algorithm 1 is that it achieves the sufficient descent condition and global convergence properties under both exact and strong Wolfe line searches.

Convergent Analysis via the Exact Line Search

In this subsection, we show that our modification (12) possesses the sufficient descent condition and global convergence properties under the exact line search.

Theorem 1. Let the sequences {x_k} and {d_k} be generated by (2) and (3) via Algorithm 1, with α_k determined by the exact line search (14) and β_k = β_k^{NMFR} given by (12). Then there exists c > 0 such that the sufficient descent condition

g_{k+1}^T d_{k+1} ≤ −c ‖g_{k+1}‖²  (17)

holds.

Proof. The proof of Theorem 1 is straightforward. Multiplying (3) by g_{k+1}^T gives g_{k+1}^T d_{k+1} = −‖g_{k+1}‖² + β_k g_{k+1}^T d_k. Since the exact line search yields g_{k+1}^T d_k = 0, condition (17) holds for all k ≥ 1 and becomes g_{k+1}^T d_{k+1} = −‖g_{k+1}‖².

3.1.1. Global Convergence Properties. In this subsection, we show that our modified coefficient (12) yields global convergence under the exact line search.

Assumption 2.
(i) There exists a positive constant c such that ‖∇f(x)‖ ≤ c for all x ∈ R^n, and f ∈ C^1(N) for some neighborhood N of the level set Γ = {x ∈ R^n | f(x) ≤ f(x_0)}. Also, assume that there exists a positive constant C_0 such that ‖∇f(x) − ∇f(y)‖ ≤ C_0 ‖x − y‖ for all x, y ∈ N.

(ii) There exists a positive constant C_1 such that ‖x‖ ≤ C_1 for all x ∈ Γ.

From the above assumption, we can easily see that the gradients generated by the method remain bounded. The following lemma will be used in our analysis (see Zoutendijk [22]).

Lemma 3.
Let Assumption 2 hold, and consider any conjugate gradient method of the form (2) such that d_k is a descent search direction and α_k satisfies the exact line search. Then

∑_{k≥1} (g_k^T d_k)² / ‖d_k‖² < ∞.

Theorem 4. Let Assumption 2 hold, and consider the conjugate gradient method (2) and (3) with α_k obtained via the exact line search. Furthermore, assume that the sufficient descent condition is satisfied. Then

lim inf_{k→∞} ‖g_k‖ = 0.

Proof. The proof is by contradiction. Assume that the statement of Theorem 4 is false; then there exists a positive constant c such that ‖g_k‖ ≥ c for all k. We rewrite (3) as d_{k+1} + g_{k+1} = β_k d_k. Taking the square of both sides of this equation, we obtain ‖d_{k+1}‖² = β_k² ‖d_k‖² − 2 g_{k+1}^T d_{k+1} − ‖g_{k+1}‖². Dividing both sides by ‖g_{k+1}‖⁴ and using (19) and (13), we obtain a recursive bound on ‖d_{k+1}‖² / ‖g_{k+1}‖⁴. Applying this bound recursively and noticing that ‖d_1‖ = ‖g_1‖, we obtain (32). As a result of (32) and (25), it follows that

∑_{k≥1} (g_k^T d_k)² / ‖d_k‖² = ∞,

which contradicts the Zoutendijk condition of Lemma 3; hence, the proof is completed.

Convergent Analysis via the Strong Wolfe Line Search

In this subsection, we show that our modified coefficient (12) satisfies the sufficient descent condition and global convergence under the strong Wolfe line search (15) and (16).

Sufficient Descent Condition
Theorem 5. Let Assumption 2 hold, with σ < 1/2 and θ = 0.3. Then Algorithm 1 assures the sufficient descent condition (17).

Proof. Multiplying (3) by g_{k+1}^T and using (13), and taking the absolute value of the second term in (36), we obtain a bound on g_{k+1}^T d_{k+1}. From (16) and the Cauchy–Schwarz inequality, dividing both sides of the resulting inequality by ‖g_{k+1}‖² and repeating this process with the fact that ‖d_1‖ = ‖g_1‖, we arrive at (39). Therefore, from (39), we can deduce that (17) holds for k ≥ 0. The proof is completed.

Global Convergence. This subsection is devoted to global convergence in the case of the inexact line search technique. The following lemma is collected from [22].

Lemma 6. Let Assumption 2 hold, and let any conjugate gradient method be of the form (2), where d_{k+1} is a descent direction and α_k satisfies the strong Wolfe line search (15) and (16). Then the Zoutendijk condition holds.

Theorem 7. Let Assumption 2 hold, and let the sequence {x_k} be generated by Algorithm 1 with the strong Wolfe line search (15) and (16). Then

lim inf_{k→∞} ‖g_k‖ = 0.

Proof. The proof is by contradiction. Assume that the statement of Theorem 7 is not true; then there exists a constant ϵ > 0 such that ‖g_k‖ ≥ ϵ for all k. Equation (3) can be written as (45); multiplying both sides of (45) by d_{k+1}, we obtain (46). Dividing both sides of (46) by ‖g_{k+1}‖⁴ with the help of (13), we get (47). From the Cauchy–Schwarz inequality, we come to (49). Referring to (39) and using the Cauchy–Schwarz inequality again, we get (50) and then (51). Substituting (51) into (49), we get (52), where γ = 1/2 − 1/c. Using (52) recursively and noticing that ‖d_1‖² = ‖g_1‖², we obtain (53). As a result of (53) and (44), it can be concluded that ∑_{k≥1} (g_k^T d_k)² / ‖d_k‖² = ∞, which contradicts (43); hence, the proof is completed.

Numerical Results and Discussion
Most of the test problems used in this study are taken from Andrei [8]. They are used to compare the efficiency of the NMFR method with that of FR and PRP under the exact line search, and with that of FR, PRP, and CD under the strong Wolfe line search. The step-size is computed using the exact and strong Wolfe line search techniques, and the numerical results are compared based on the number of iterations and the CPU time. For all test problems, the stopping criterion is ‖g_k‖ ≤ ϵ with ϵ = 10^{−6}; for each test problem, various starting points are used, as suggested by Hillstrom [26]. All runs are performed on an ACER PC (Intel® Core™ i3-3217U CPU @ 1.8 GHz, 4.00 GB RAM, Windows 7 Ultimate). Every problem listed in Table 1 is solved using a MATLAB subroutine. The performance results are shown in Figures 1-4 using the performance profile introduced by Dolan and Moré [13].
In the list of problem functions, IN indicates the number of iterations and CPU indicates the CPU time, as shown in Table 1. F indicates that the solver failed on a test problem; in some cases, this means that the computation came to a halt because the line search failed to locate a positive step-size, which was deemed a failure.
The performance profile offers a way of evaluating and examining the effectiveness of a set of solvers S on a test set P. Assuming there are n_s solvers and n_p problems, Dolan and Moré characterized t_{p,s} as the computing cost (number of iterations, CPU time, or other factors) needed by solver s to tackle problem p. They used the performance ratio r_{p,s} = t_{p,s} / min{t_{p,s} : s ∈ S} to compare the performance of solver s on problem p with the best performance by any solver on that problem.
A fixed parameter r_M ≥ r_{p,s} for all p and s is chosen, and r_{p,s} = r_M is set whenever solver s fails to solve problem p. To obtain an overall assessment of the performance of solver s, we define P_s(t) = (1/n_p) size{p ∈ P : r_{p,s} ≤ t}, where P_s(t) is the probability for solver s ∈ S that its performance ratio is within a factor t of the best, so that P_s is the cumulative distribution function of r_{p,s}. The value P_s(1) is the probability that the solver will win over the rest of the solvers. In general, a solver with high values of P_s(t), i.e., one at the top right of the figures, is preferable and robust.
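The quantities r_{p,s} and P_s(t) described above can be computed as in the following sketch; the timing table and the value of r_M are illustrative assumptions, not the paper's experimental data.

```python
# Dolan-More performance profile: from a cost table times[p][s]
# (inf = failure), compute the ratios r_{p,s} and the cumulative
# fractions P_s(tau). The 4x3 table below is an illustrative example.

def performance_profile(times, taus):
    """Return, for each solver s, the list P_s(tau) for tau in taus."""
    n_p, n_s = len(times), len(times[0])
    r_M = 1e6   # large ratio assigned when solver s fails on problem p
    profiles = []
    for s in range(n_s):
        ratios = []
        for p in range(n_p):
            best = min(times[p])                       # best solver on problem p
            r = times[p][s] / best if times[p][s] < float('inf') else r_M
            ratios.append(r)
        # P_s(tau) = fraction of problems with ratio within factor tau of best
        profiles.append([sum(r <= t for r in ratios) / n_p for t in taus])
    return profiles

times = [[1.0, 2.0, 4.0],
         [2.0, 2.0, float('inf')],   # solver 2 fails on problem 1
         [3.0, 6.0, 3.0],
         [1.0, 1.5, 2.0]]
prof = performance_profile(times, [1.0, 2.0, 4.0])
```

Here prof[s][i] is the height of solver s's profile curve at τ = taus[i]; a curve that reaches 1.0 corresponds to a solver that eventually solves all problems.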
From Figures 1 and 2, we see that NMFR has the best performance, since it solved all the test problems and reached 100%. The CG coefficients can be divided into three categories: the first consists of NMFR, the second of FR, and the third of PRP. Although the performance curve of the third category at times appears competitive with NMFR, it could only solve 86% of the problems, whereas the second category reached only 89%. Hence, we consider NMFR an efficient and robust method compared with the others, because it can solve all the problems. Figures 3 and 4 likewise show that the curve of NMFR is higher than those of PRP, CD, and FR. This implies that the NMFR approach significantly outperforms the other three methods. Furthermore, the NMFR approach solves all problems, while the PRP method solves about 86 percent of the problems and the CD and FR methods solve about 89 percent. As a result, we can infer that NMFR is the preferred approach, because it has the highest curve and solves all problems.

Conclusion
In this article, we have proposed a new and simple modification of β_k^{FR}, denoted β_k^{NMFR}, that is easy to implement. Numerical results show that β_k^{NMFR} performs efficiently compared with other standard CG methods; in contrast to β_k^{FR}, it shows good numerical performance at each step. We have also proved that the method with β_k^{NMFR} converges globally under both the exact and strong Wolfe line searches.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.