A Derivative-Free Conjugate Gradient Method and Its Global Convergence for Solving Symmetric Nonlinear Equations

We suggest a conjugate gradient (CG) method for solving symmetric systems of nonlinear equations without computing the Jacobian or the gradient, by exploiting the special structure of the underlying function. This derivative-free feature gives the proposed method an advantage in solving relatively large-scale problems (up to 500,000 variables) with lower storage requirements than some existing methods. Under appropriate conditions, the global convergence of our method is established. Numerical results on some benchmark test problems show that the proposed method is practically effective.


Introduction
Let us consider the system of nonlinear equations
$$F(x) = 0, \quad (1)$$
where $F : \mathbb{R}^n \to \mathbb{R}^n$ is a nonlinear mapping. Often, the mapping $F$ is assumed to satisfy the following assumptions:
(A1) There exists $x^* \in \mathbb{R}^n$ such that $F(x^*) = 0$.
(A2) $F$ is a continuously differentiable mapping in a neighborhood of $x^*$.
The prominent method for finding a solution of (1) is the classical Newton method, which generates a sequence of iterates $\{x_k\}$ from a given initial point $x_0$ via
$$x_{k+1} = x_k - (F'(x_k))^{-1} F(x_k), \quad k = 0, 1, 2, \ldots \quad (2)$$
The attractive features of this method are rapid convergence and easy implementation. Nevertheless, Newton's method requires computing the Jacobian matrix, which in turn requires the first-order derivatives of the system.
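As a point of reference, here is a minimal Python sketch of iteration (2); the test function `F` and its Jacobian `JF` are illustrative placeholders, not from the paper:

```python
import numpy as np

def newton_system(F, JF, x0, tol=1e-8, max_iter=100):
    """Classical Newton iteration x_{k+1} = x_k - JF(x_k)^{-1} F(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) <= tol:
            break
        # Solve the Newton linear system JF(x) d = -F(x) rather than inverting.
        d = np.linalg.solve(JF(x), -Fx)
        x = x + d
    return x

# Toy example: F(x) = x - cos(x) componentwise, with symmetric (diagonal) Jacobian.
F = lambda x: x - np.cos(x)
JF = lambda x: np.diag(1.0 + np.sin(x))
print(newton_system(F, JF, np.zeros(3)))
```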
In practice, computing the derivatives of some functions is quite costly, and sometimes the derivatives are not available or cannot be obtained precisely. In such cases Newton's method cannot be applied directly.
In this work we are interested in large-scale problems for which the Jacobian is either unavailable or too expensive to compute and store; for such problems the CG approach is a natural choice. It is vital to mention that conjugate gradient methods are among the most popular methods for unconstrained optimization problems. They are particularly efficient for handling large-scale problems because of their convergence properties, simple implementation, and low storage requirements [1]. Notwithstanding, the study of conjugate gradient methods for large-scale symmetric nonlinear systems of equations is scanty, and this is what motivated this paper.
In general, CG methods for solving nonlinear systems of equations generate iterates $\{x_k\}$ from a given initial point $x_0$ using
$$x_{k+1} = x_k + \alpha_k d_k, \quad (3)$$
where $\alpha_k > 0$ is a step length attained via a line search and the direction $d_k$ is obtained using
$$d_k = -F(x_k) + \beta_k d_{k-1}, \qquad d_0 = -F(x_0), \quad (4)$$
where $\beta_k$ is termed the conjugate gradient parameter. The problems under study may arise from an unconstrained optimization problem, a saddle point problem, the Karush-Kuhn-Tucker (KKT) conditions of an equality constrained optimization problem, a discretized two-point boundary value problem, a discretized elliptic boundary value problem, and so forth.
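For concreteness, here is a minimal sketch of the generic scheme (3)-(4), with the step-length rule and the CG parameter left as pluggable choices; the fixed step and PRP-style parameter used in the demo are illustrative assumptions, not the method proposed in this paper:

```python
import numpy as np

def cg_nonlinear(F, beta, line_search, x0, tol=1e-6, max_iter=1000):
    """Generic CG scheme (3)-(4): x_{k+1} = x_k + alpha_k d_k,
    d_k = -F(x_k) + beta_k d_{k-1}, with d_0 = -F(x_0)."""
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    d = -Fx
    for _ in range(max_iter):
        if np.linalg.norm(Fx) <= tol:
            break
        alpha = line_search(F, x, d)   # step length alpha_k > 0
        x_new = x + alpha * d
        Fx_new = F(x_new)
        b = beta(Fx_new, Fx, d)        # conjugate gradient parameter beta_k
        d = -Fx_new + b * d
        x, Fx = x_new, Fx_new
    return x

# Illustrative choices only: a fixed step and a PRP-style parameter
# beta_k = F_{k+1}^T (F_{k+1} - F_k) / ||F_k||^2.
fixed_step = lambda F, x, d: 0.5
prp = lambda Fn, Fo, d: float(Fn @ (Fn - Fo)) / float(Fo @ Fo)
F = lambda x: x - np.cos(x)
print(cg_nonlinear(F, prp, fixed_step, np.zeros(3)))
```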
Equation (1) is the first-order necessary condition for the unconstrained optimization problem
$$\min f(x), \quad x \in \mathbb{R}^n, \quad (5)$$
when $F$ is the gradient mapping of some function $f : \mathbb{R}^n \to \mathbb{R}$. For the equality constrained problem
$$\min f(x) \quad \text{subject to} \quad h(x) = 0,$$
where $h$ is a vector-valued function, the KKT conditions can be represented as system (1) with $z = (x, v)$ and
$$F(x, v) = \begin{pmatrix} \nabla f(x) + \nabla h(x) v \\ h(x) \end{pmatrix},$$
where $v$ is the vector of Lagrange multipliers. Notice that the Jacobian $\nabla F(x, v)$ is symmetric for all $(x, v)$ (see, e.g., [2]). Problem (1) can also be converted to the global optimization problem (5) with the merit function $f$ defined by
$$f(x) = \frac{1}{2}\|F(x)\|^2. \quad (6)$$
A large number of efficient solvers for large-scale symmetric nonlinear equations have been proposed, analyzed, and tested in the last decade. Among them, the most classic one is due to Li and Fukushima [3], in which a Gauss-Newton-based BFGS method is developed and its global and superlinear convergence are established. Its performance was subsequently improved by Gu et al. [4], where norm descent BFGS methods are designed. Since then, norm descent BFGS-type methods, especially those cooperating with a trust region strategy, have been presented in the literature and have shown moderate effectiveness experimentally [5]. Still, matrix storage and the solution of linear systems of equations are required by the BFGS-type methods in the literature. The recently designed nonmonotone spectral gradient algorithm [6] falls within the framework of matrix-free methods.
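To see why this Jacobian is symmetric, differentiate the map above (a standard computation, sketched here for completeness):
$$\nabla F(x, v) = \begin{pmatrix} \nabla^2 f(x) + \sum_i v_i \nabla^2 h_i(x) & \nabla h(x) \\ \nabla h(x)^T & 0 \end{pmatrix},$$
whose (1,1) block is a sum of Hessians (hence symmetric) and whose off-diagonal blocks are transposes of each other.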
Conjugate gradient methods for symmetric nonlinear equations have received good attention and made steady progress. In particular, Li and Wang [7] proposed a modified Fletcher-Reeves conjugate gradient method based on the work of Zhang et al. [8], and their results illustrate that the proposed conjugate gradient method is promising.
In line with this development, further studies on conjugate gradient methods for solving large-scale symmetric nonlinear equations have been inspired. Zhou and Shen [9] extended the descent three-term Polak-Ribière-Polyak method of Zhang et al. [10] to solve (1) by combining it with the work of Li and Fukushima [3]. Meanwhile, the classic Polak-Ribière-Polyak method was successfully used to solve the symmetric equation (1) by Zhou and Shen [1].
Subsequently, Xiao et al. [11] proposed a method based on the well-known conjugate gradient method of Hager and Zhang [12], and the proposed method converges globally. Extensive numerical experiments showed that each of the above-mentioned methods performs quite well. The combination of conjugate gradient algorithms and the Newton method was presented for the first time by Andrei [13, 14]. Hence, in this paper, we intend to present a new enhanced CG parameter $\beta_k$ that is both matrix-free and derivative-free. This is made possible by combining the Birgin and Martínez direction with the classical Newton direction.
We organize the paper as follows. In the next section, we present the details of the proposed method. Convergence results are presented in Section 3. Some numerical results are reported in Section 4. Finally, conclusions are drawn in Section 5.

Derivation of the Method
In this section we present a new CG parameter $\beta_k$, obtained by combining the Birgin and Martínez direction with the classical Newton direction. Recall that the Birgin and Martínez direction in [15] is defined by
$$d_k = -\theta_k g_k + \beta_k d_{k-1}, \quad (9)$$
where $\theta_k = s_{k-1}^T s_{k-1} / s_{k-1}^T y_{k-1}$ is the spectral parameter, with $s_{k-1} = x_k - x_{k-1}$ and $y_{k-1} = F(x_k) - F(x_{k-1})$; see Raydan [16] for details. In [2] Ortega and Rheinboldt used the term
$$g_k = \frac{F(x_k + \alpha_k F(x_k)) - F(x_k)}{\alpha_k} \quad (10)$$
to approximate the gradient $\nabla f(x_k)$, which avoids computing the exact gradient, with $\alpha_k$ updated via a line search method. It is clear that when $\|\alpha_k F(x_k)\|$ is small, then $g_k \approx \nabla f(x_k)$.
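As an illustration of the approximation (10), here is a minimal sketch, assuming the finite-difference form reconstructed above; the test function is a toy example with a symmetric (diagonal) Jacobian, not one of the paper's test problems:

```python
import numpy as np

def approx_gradient(F, x, alpha):
    """Derivative-free approximation (10): for symmetric Jacobians,
    grad f(x) = J(x) F(x) ~ (F(x + alpha*F(x)) - F(x)) / alpha."""
    Fx = F(x)
    return (F(x + alpha * Fx) - Fx) / alpha

# Check against the exact gradient J(x)F(x) for F(x) = x - cos(x),
# whose Jacobian diag(1 + sin(x)) is symmetric.
F = lambda x: x - np.cos(x)
x = np.array([0.5, 1.0])
exact = np.diag(1.0 + np.sin(x)) @ F(x)
print(approx_gradient(F, x, 1e-6), exact)  # the two should nearly agree
```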
Recall the Newton direction
$$d_k = -(F'(x_k))^{-1} F(x_k). \quad (11)$$
Combining (9) and (11), we have
$$-(F'(x_k))^{-1} F(x_k) = -\theta_k g_k + \beta_k d_{k-1}. \quad (12)$$
Multiplying both sides of (12) by $F'(x_k)$ leads to
$$-F(x_k) = -\theta_k F'(x_k) g_k + \beta_k F'(x_k) d_{k-1}. \quad (13)$$
After a little linear algebra, (13) transforms to
$$\beta_k F'(x_k) d_{k-1} = \theta_k F'(x_k) g_k - F(x_k). \quad (14)$$
To ensure a good approximation, we multiply both sides of (14) by $s_{k-1}^T$ to obtain
$$\beta_k s_{k-1}^T F'(x_k) d_{k-1} = \theta_k s_{k-1}^T F'(x_k) g_k - s_{k-1}^T F(x_k). \quad (15)$$
Equation (15) can be rewritten as
$$\beta_k = \frac{\theta_k s_{k-1}^T F'(x_k) g_k - s_{k-1}^T F(x_k)}{s_{k-1}^T F'(x_k) d_{k-1}}. \quad (16)$$
From the secant condition, we have
$$F'(x_k) s_{k-1} = y_{k-1}, \quad (17)$$
that is,
$$s_{k-1}^T F'(x_k)^T = y_{k-1}^T. \quad (18)$$
It is vital to note that, for this work, we claim that $F'(x_k)$ is a symmetric matrix for all $k$. Hence, (18) can also be written as
$$s_{k-1}^T F'(x_k) = y_{k-1}^T. \quad (19)$$
Substituting (19) into the numerator of (16) yields
$$\beta_k = \frac{\theta_k y_{k-1}^T g_k - s_{k-1}^T F(x_k)}{s_{k-1}^T F'(x_k) d_{k-1}}. \quad (20)$$
After simplification (applying (19) to the denominator as well), we obtain our CG parameter $\beta_k$ as
$$\beta_k = \frac{\theta_k y_{k-1}^T g_k - s_{k-1}^T F(x_k)}{y_{k-1}^T d_{k-1}}. \quad (21)$$
Motivated by [3, 7] and using the derivative-free approximation (10) for $g_k$, we arrive at our CG parameter
$$\beta_k = \frac{\theta_k y_{k-1}^T g_k - s_{k-1}^T F(x_k)}{y_{k-1}^T d_{k-1}}, \quad g_k = \frac{F(x_k + \alpha_k F(x_k)) - F(x_k)}{\alpha_k}. \quad (22)$$
Having derived the CG parameter $\beta_k$ in (22) and using (9), we then present our direction as
$$d_k = \begin{cases} -F(x_k), & k = 0, \\ -\theta_k g_k + \beta_k d_{k-1}, & k \geq 1, \end{cases} \quad (23)$$
where $\theta_k = s_{k-1}^T s_{k-1} / s_{k-1}^T y_{k-1}$. Finally, we present our scheme as
$$x_{k+1} = x_k + \alpha_k d_k. \quad (24)$$
The direction $d_k$ given by (23) may not be a descent direction of the merit function $f$ in (6), so the standard Wolfe and Armijo line searches cannot be used to compute the step size directly. Zhang et al. [8] proved the global convergence of the PRP method for general nonconvex optimization using a nondescent line search. Motivated by this, we use the nonmonotone line search proposed by Li and Fukushima [3] to compute our step size $\alpha_k$. Let $\sigma_1 > 0$, $\sigma_2 > 0$, and $r \in (0, 1)$ be constants, and let $\{\eta_k\}$ be a given positive sequence such that
$$\sum_{k=0}^{\infty} \eta_k < \infty. \quad (25)$$
Let $\alpha_k = \max\{1, r, r^2, \ldots\}$ satisfying
$$\|F(x_k + \alpha_k d_k)\|^2 - \|F(x_k)\|^2 \leq -\sigma_1 \|\alpha_k d_k\|^2 - \sigma_2 \|\alpha_k F(x_k)\|^2 + \eta_k \|F(x_k)\|^2. \quad (26)$$
Now, we can describe the algorithm for our proposed method as follows.
Step 2. Compute $g_k$ using (10) and test the stopping criterion. If it is satisfied, then stop; otherwise continue with Step 3.
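To make the overall procedure concrete, here is a hedged end-to-end sketch in Python, assembling (10), (22), (23), (24), and the line search (26) as reconstructed above; the parameter values and the choice $\eta_k = 1/(k+1)^2$ are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def dfcg(F, x0, sigma1=1e-4, sigma2=1e-4, r=0.5, tol=1e-3, max_iter=1000):
    """Sketch of the derivative-free CG scheme with the Li-Fukushima
    nonmonotone line search (26); eta_k = 1/(k+1)^2 is summable, as (25) requires."""
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    d = -Fx                                       # d_0 = -F(x_0), per (23)
    for k in range(max_iter):
        if np.linalg.norm(Fx) <= tol:
            break
        eta = 1.0 / (k + 1) ** 2
        alpha = 1.0
        # Backtrack over {1, r, r^2, ...} until the nonmonotone condition (26) holds.
        while True:
            x_new = x + alpha * d
            F_new = F(x_new)
            lhs = F_new @ F_new - Fx @ Fx
            rhs = (-sigma1 * alpha**2 * (d @ d)
                   - sigma2 * alpha**2 * (Fx @ Fx) + eta * (Fx @ Fx))
            if lhs <= rhs or alpha < 1e-12:
                break
            alpha *= r
        s, y = x_new - x, F_new - Fx
        g = (F(x_new + alpha * F_new) - F_new) / alpha        # gradient approx (10)
        if abs(s @ y) > 1e-12 and abs(y @ d) > 1e-12:
            theta = (s @ s) / (s @ y)                         # spectral parameter
            beta = (theta * (y @ g) - (s @ F_new)) / (y @ d)  # (22), as reconstructed
            d = -theta * g + beta * d                         # direction (23)
        else:
            d = -F_new                                        # safeguard restart
        x, Fx = x_new, F_new
    return x

F = lambda x: x - np.cos(x)   # toy problem with a symmetric (diagonal) Jacobian
print(dfcg(F, np.zeros(5)))
```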

Convergence Results
This section presents the global convergence results for the derivative-free conjugate gradient method. To begin with, let us define the level set
$$\Omega = \{x \in \mathbb{R}^n \mid \|F(x)\| \leq e^{\eta} \|F(x_0)\|\}, \quad (27)$$
where $\eta = \sum_{k=0}^{\infty} \eta_k$. In order to analyze the convergence of our method, we make the following assumptions on the nonlinear system $F$.
Case (ii): consider $\limsup_{k \to \infty} \alpha_k = 0$. Since $\alpha_k \geq 0$, this case implies that $\lim_{k \to \infty} \alpha_k = 0$ and, by the definition of $g_k$ in (10), $g_k \to \nabla f(x^*)$. By the mean-value theorem, there exists $\xi_k \in (0, 1)$ such that the difference quotient in (10) equals the directional derivative at an intermediate point. Since $\{x_k\} \subset \Omega$ is bounded, without loss of generality we assume $x_k \to x^*$. By (10) and (23), the desired limit follows, where we use (45) and (26) and the fact that the sequence $\{d_{k+1}\}$ is bounded.
The code for the DFCG method was written in Matlab 7.4 (R2010a) and run on a personal computer with a 1.8 GHz CPU and 4 GB of RAM. We stopped the iteration when the total number of iterations exceeded 1000 or when $\|F(x_k)\| \leq 10^{-3}$. We tested the two methods on nine test problems with different initial points and problem dimensions. Problems 6-9 are from [9].
In Table 1 we list the numerical results, where "Iter" and "Time" stand for the total number of iterations and the CPU time in seconds, respectively, and $\|F(x_k)\|$ is the norm of the residual at the stopping point. The numerical results indicate that the proposed DFCG method requires fewer iterations and less CPU time than IPRP. Figures 1 and 2 are performance profiles, in the sense of Dolan and Moré [17], which justify this claim: DFCG needs less CPU time and fewer iterations on each test problem.
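For readers unfamiliar with Dolan-Moré profiles, here is a minimal sketch of how such a profile is computed; the timing matrix below is made-up illustrative data, not the paper's results:

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profile: T[p, s] is the metric (e.g. CPU time)
    of solver s on problem p; returns rho[s, i], the fraction of problems
    whose performance ratio r_{p,s} = T[p,s] / min_s T[p,s] is <= taus[i]."""
    ratios = T / T.min(axis=1, keepdims=True)
    return np.array([[np.mean(ratios[:, s] <= tau) for tau in taus]
                     for s in range(T.shape[1])])

# Hypothetical timings for two solvers on four problems (illustrative only).
T = np.array([[1.0, 2.0], [3.0, 2.5], [0.5, 1.5], [2.0, 2.0]])
print(performance_profile(T, taus=[1.0, 2.0, 4.0]))
```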
Moreover, on average, our $\|F(x_k)\|$ at termination is much smaller, which signifies that the solution obtained by DFCG is a closer approximation to the exact solution than that obtained by IPRP.

Conclusions
Many practical problems possess symmetry. In this paper we presented a derivative-free conjugate gradient (DFCG) method for symmetric nonlinear equations and compared its performance with that of an inexact PRP (IPRP) conjugate gradient method [1] through numerical experiments. We also proved the global convergence of the proposed method using a backtracking-type line search, and the numerical results show that our method is efficient.

Figure 1: Comparison of the performance of DFCG and IPRP methods as the dimension increases (in terms of CPU time).

Figure 2: Comparison of the performance of DFCG and IPRP methods as the dimension increases (in terms of number of iterations).
Properties (i) and (ii) imply that there exist positive constants $\lambda_1$, $\lambda_2$, and $M_1$ such that the corresponding bounds in the analysis hold. Let the sequence $\{x_k\}$ be generated by the algorithm above. Then the sequence $\{\|F(x_k)\|\}$ converges and $x_k \in \Omega$ for all $k \geq 0$. From this and the fact that $\{\eta_k\}$ satisfies (25), the series $\sum_{k=0}^{\infty} \|\alpha_k d_k\|^2$ is convergent. This implies (31). In a similar way, we can prove that (32) holds.