A Limited Memory BFGS Method for Solving Large-Scale Symmetric Nonlinear Equations

A limited memory BFGS (L-BFGS) algorithm is presented for solving large-scale symmetric nonlinear equations, where a line search technique without derivative information is used. The global convergence of the proposed algorithm is established under suitable conditions. Numerical results show that the given method is competitive with the normal BFGS method.


Introduction
Consider the system of nonlinear equations

h(x) = 0, x ∈ R^n, (1)

where h : R^n → R^n is continuously differentiable, the Jacobian ∇h(x) of h is symmetric for all x ∈ R^n, and n denotes the large-scale dimension. It is not difficult to see that if h is the gradient mapping of some function f : R^n → R, then problem (1) is the first-order necessary condition for the problem min_{x∈R^n} f(x). Furthermore, consider the equality constrained problem

min f(x) subject to c(x) = 0, (2)

where c is a vector-valued function; then the KKT conditions can be represented as the system (1) with z = (x, v) and h(z) = (∇f(x) + ∇c(x)v, c(x)), where v is the vector of Lagrange multipliers. The above two cases show that problem (1) can come from an unconstrained problem or an equality constrained optimization problem in theory. Moreover, there are other practical problems that can also take the form of (1), such as the discretized two-point boundary value problem, the saddle point problem, and the discretized elliptic boundary value problem (see Chapter 1 of [1] for details). Let θ be the norm function θ(x) = (1/2)‖h(x)‖², where ‖⋅‖ is the Euclidean norm; then problem (1) is equivalent to the following global optimization problem:

min θ(x), x ∈ R^n. (3)

In this paper we focus on line search methods for (1), whose normal iterative formula is defined by

x_{k+1} = x_k + α_k d_k, (4)

where d_k is the so-called search direction and α_k is a steplength along d_k. To begin with, we briefly review some techniques for determining α_k.

(i) Normal Line Search (Brown and Saad [2]). The stepsize α_k is determined by

θ(x_k + α_k d_k) − θ(x_k) ≤ σ α_k ∇θ(x_k)^T d_k, (5)

where σ ∈ (0, 1) and ∇θ(x) = ∇h(x)h(x). The convergence is proved and some good results are obtained. It is well known that a nonmonotone strategy is more flexible than the normal technique in many cases; motivated by this, a nonmonotone line search technique was presented by Zhu [3].
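To make the notation concrete, the following self-contained sketch (our own toy example, not from the paper) builds the norm function θ(x) = (1/2)‖h(x)‖² for a small symmetric system h(x) = Ax − b and runs the iteration x_{k+1} = x_k + α_k d_k with an Armijo-type backtracking line search on θ; the matrix A, the vector b, and the constants σ, r are illustrative assumptions:

```python
import numpy as np

# Toy symmetric system h(x) = A x - b: the Jacobian is the constant symmetric
# matrix A, so the gradient of theta is grad_theta(x) = grad_h(x) h(x) = A h(x).
A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric, as problem (1) requires
b = np.array([1.0, 2.0])

def h(x):
    return A @ x - b

def theta(x):
    return 0.5 * np.dot(h(x), h(x))      # theta(x) = (1/2)||h(x)||^2

def normal_line_search(x, d, sigma=0.1, r=0.5):
    # Backtrack until theta(x + a d) - theta(x) <= sigma * a * grad_theta(x)^T d.
    g = A @ h(x)                          # grad_theta(x) for this toy problem
    a = 1.0
    while theta(x + a * d) - theta(x) > sigma * a * np.dot(g, d):
        a *= r
    return a

x = np.zeros(2)
for _ in range(500):
    if theta(x) < 1e-18:
        break
    d = -A @ h(x)                         # steepest descent direction for theta
    a = normal_line_search(x, d)
    x = x + a * d                         # the iteration x_{k+1} = x_k + a_k d_k
```

Steepest descent is used here only so the line search has something simple to work on; the directions discussed later in the introduction replace it.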
(ii) Nonmonotone Line Search (Zhu [3]). The stepsize α_k is determined by

θ(x_k + α_k d_k) ≤ max_{0≤j≤m(k)} θ(x_{k−j}) + σ α_k ∇θ(x_k)^T d_k, (6)

where σ ∈ (0, 1) and m(k) is a nonnegative integer. The global convergence and the superlinear convergence are established under mild conditions, respectively. It is not difficult to see that, for the above two line search techniques, the Jacobian matrix ∇h(x_k) must be computed at each iteration, which obviously increases the workload and the CPU time consumed. In order to avoid this drawback, Yuan and Lu [4] presented a new backtracking inexact technique.
(iii) A New Line Search (Yuan and Lu [4]). The stepsize α_k is determined by

‖h(x_k + α_k d_k)‖² ≤ ‖h_k‖² + σ α_k h_k^T d_k, (7)

where σ ∈ (0, 1) and h_k = h(x_k). They established the global convergence and the superlinear convergence, and their numerical tests showed that the new line search technique is more effective than the normal line search technique. However, these three line search techniques cannot directly ensure that the sequence {‖h_k‖} is descent. Thus more interesting line search techniques have been studied.
(iv) Approximate Monotone Line Search (Li and Fukushima [5]). The stepsize α_k is determined by

‖h(x_k + α_k d_k)‖² ≤ ‖h_k‖² − σ₁‖α_k d_k‖² − σ₂‖α_k h_k‖² + ε_k‖h_k‖², (8)

where α_k = r^{i_k}, r ∈ (0, 1), i_k is the smallest nonnegative integer i such that α = r^i satisfies (8), σ₁ > 0 and σ₂ > 0 are constants, and ε_k is a positive sequence such that

Σ_{k=0}^∞ ε_k < ∞. (9)

The line search (8) can be rewritten as

‖h(x_k + α_k d_k)‖² ≤ (1 + ε_k)‖h_k‖² − σ₁‖α_k d_k‖² − σ₂‖α_k h_k‖²; (10)

it is straightforward to see that, as α_k → 0, the right-hand side of the above inequality goes to (1 + ε_k)‖h_k‖², so an acceptable stepsize always exists. Then it is not difficult to see that the sequence {‖h_k‖} generated by an algorithm with line search (8) is approximately norm descent. In order to ensure that the sequence {‖h_k‖} is norm descent, Gu et al. [6] presented the following line search:

‖h(x_k + α_k d_k)‖² ≤ ‖h_k‖² − σ₁‖α_k d_k‖² − σ₂‖α_k h_k‖². (11)
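A minimal sketch of the backtracking loop behind the Li–Fukushima condition (8); the constants sigma1, sigma2, r and the summable sequence ε_k = (k+1)⁻² are arbitrary illustrative choices, not values from the paper:

```python
import numpy as np

def li_fukushima_stepsize(h, x, d, k, sigma1=1e-4, sigma2=1e-4, r=0.5, max_i=60):
    # Find the smallest nonnegative integer i such that alpha = r**i satisfies (8).
    hk = h(x)
    hk2 = np.dot(hk, hk)
    eps_k = 1.0 / (k + 1) ** 2         # any positive sequence with finite sum works
    for i in range(max_i):
        alpha = r ** i
        trial = h(x + alpha * d)
        if np.dot(trial, trial) <= (hk2
                                    - sigma1 * alpha ** 2 * np.dot(d, d)
                                    - sigma2 * alpha ** 2 * hk2
                                    + eps_k * hk2):
            return alpha
    return r ** max_i                   # fallback; see the remark below
```

No derivative of h appears in the test, which is the point of this line search. Because the right-hand side tends to (1 + ε_k)‖h_k‖² as α → 0 while the left-hand side tends to ‖h_k‖², the loop is guaranteed to terminate with an accepted stepsize whenever ε_k > 0, regardless of the direction d.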
In the following, we present some techniques for determining the search direction d_k.
(i) Newton Method. The search direction d_k is defined by

∇h(x_k) d_k = −h_k. (12)

The Newton method is one of the most effective methods, since it normally requires the fewest function evaluations and is very good at handling ill-conditioning. However, its efficiency largely depends on the ability to solve the linear system (12) efficiently at each iteration. Moreover, the exact solution of the system (12) can be too burdensome, and it is not necessary when x_k is far from a solution [7]. Thus quasi-Newton methods were proposed.
(ii) Quasi-Newton Method. The search direction d_k is defined by

B_k d_k = −h_k, (13)

where B_k is the quasi-Newton update matrix. Quasi-Newton methods represent the basic approach underlying most Newton-type large-scale algorithms (see [3, 4, 8], etc.), and the famous BFGS method is one of the most effective quasi-Newton methods, generated by the following formula:

B_{k+1} = B_k − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (y_k y_k^T)/(y_k^T s_k), (14)

where s_k = x_{k+1} − x_k and y_k = h_{k+1} − h_k with h_k = h(x_k) and h_{k+1} = h(x_k + α_k d_k). By (11) and (14), Yuan and Yao [9] proposed a BFGS method for nonlinear equations and obtained some good results. Denote H_k = B_k^{−1}; then (14) has the inverse update formula

H_{k+1} = (I − (s_k y_k^T)/(s_k^T y_k)) H_k (I − (y_k s_k^T)/(s_k^T y_k)) + (s_k s_k^T)/(s_k^T y_k). (15)

Unfortunately, both the Newton method and the quasi-Newton method require a large amount of space to store an n × n matrix at every iteration, which limits the efficiency of the algorithm, especially for large-scale problems. Therefore a method with low matrix storage is necessary.
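A direct transcription of the BFGS update (14); the pair (s_k, y_k) here is arbitrary illustrative data, subject only to the curvature condition y_k^T s_k > 0 so that the update is well defined:

```python
import numpy as np

def bfgs_update(B, s, y):
    # Formula (14): B+ = B - (B s s^T B)/(s^T B s) + (y y^T)/(y^T s)
    Bs = B @ s
    return B - np.outer(Bs, Bs) / np.dot(s, Bs) + np.outer(y, y) / np.dot(y, s)

B = np.eye(3)                         # symmetric positive definite start
s = np.array([1.0, 0.5, -0.25])
y = np.array([2.0, 1.0, 0.25])        # y^T s = 2.4375 > 0
B_next = bfgs_update(B, s, y)
```

The updated matrix satisfies the secant equation B_{k+1} s_k = y_k and remains symmetric, which is easy to check numerically.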
(iii) Limited Memory Quasi-Newton Method. The search direction d_k is defined by

d_k = −H_k h_k, (16)

where H_k is generated by a limited memory quasi-Newton method, the most famous of which is the so-called limited memory BFGS (L-BFGS) method. The L-BFGS method is an adaptation of the BFGS method for large-scale problems (see [10] for details), which often requires minimal storage and provides a fast rate of linear convergence. The L-BFGS method has the following form:

H_{k+1} = V_k^T H_k V_k + ρ_k s_k s_k^T, (17)

where ρ_k = 1/(s_k^T y_k), V_k = I − ρ_k y_k s_k^T, m̃ > 0 is an integer, and I is the unit matrix. Applying (17) recursively shows that the matrix H_{k+1} is obtained by updating the basic matrix H_k^0 m̃ times using the BFGS formula with the pairs (s_i, y_i) from the previous m̃ iterations. By (17), together with (7) and (8), Yuan et al. [11] presented an L-BFGS method for nonlinear equations and obtained the global convergence. At present, there are many methods proposed for (1) (see [6, 12–15], etc.).
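In implementations, the matrix in (17) is rarely formed explicitly; the classical two-loop recursion evaluates d = −H q directly from the m̃ stored pairs (s_i, y_i), which is exactly the storage saving mentioned above. A sketch (the basic matrix is taken as H^0 = γI with an assumed scaling γ):

```python
import numpy as np

def lbfgs_direction(q, pairs, gamma=1.0):
    # Computes d = -H q, where H is the L-BFGS matrix (17) built from the
    # stored pairs [(s_0, y_0), ..., (s_{m-1}, y_{m-1})] (oldest first) with
    # basic matrix H^0 = gamma * I. Storage is O(m n); no n x n matrix appears.
    q = np.array(q, dtype=float)
    coeffs = []
    for s, y in reversed(pairs):                 # newest pair first
        rho = 1.0 / np.dot(y, s)                 # rho_k = 1/(s_k^T y_k)
        a = rho * np.dot(s, q)
        coeffs.append(a)
        q = q - a * y
    r = gamma * q                                # apply the basic matrix H^0
    for (s, y), a in zip(pairs, reversed(coeffs)):   # oldest pair first
        rho = 1.0 / np.dot(y, s)
        r = r + (a - rho * np.dot(y, r)) * s
    return -r
```

With a single stored pair and γ = 1 this reproduces −(V^T H^0 V + ρ s s^T) q from (17), which can be verified by forming the small matrix explicitly.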
In order to solve large-scale nonlinear equations effectively while retaining good theoretical properties, based on the above discussions of α_k and d_k, we combine (11) and (16) and present an L-BFGS method for (1), since (11) makes the norm function descent and (16) needs only low storage. The main attributes of the new algorithm are stated as follows.
(i) An L-BFGS method with (11) is presented.
(ii) The norm function is descent.
(iii) The global convergence is established under appropriate conditions.
(iv) Numerical results show that the given algorithm is more competitive than the normal algorithm for large-scale nonlinear equations.
This paper is organized as follows. In the next section, the backtracking inexact L-BFGS algorithm is stated. Section 3 presents the global convergence of the algorithm under some reasonable conditions. Numerical experiments testing the performance of the algorithms are reported in Section 4.

Algorithms
This section will state the L-BFGS method in association with the new backtracking line search technique (11) for solving (1).
Step 3. If (18) holds, then take α_k = 1 and go to Step 5. Otherwise go to Step 4.
Step 4. Let i_k be the smallest nonnegative integer i such that (11) holds for α = r^i. Let α_k = r^{i_k}.
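Step 4 above is a plain backtracking loop on condition (11); a sketch, with r and the constants σ₁, σ₂ as assumed illustrative values:

```python
import numpy as np

def step4_stepsize(h, x, d, sigma1=1e-4, sigma2=1e-4, r=0.5, max_i=60):
    # Smallest nonnegative integer i with alpha = r**i satisfying (11):
    # ||h(x+alpha d)||^2 <= ||h(x)||^2 - sigma1||alpha d||^2 - sigma2||alpha h(x)||^2
    hk = h(x)
    hk2 = np.dot(hk, hk)
    for i in range(max_i):
        alpha = r ** i
        trial = h(x + alpha * d)
        if np.dot(trial, trial) <= (hk2
                                    - sigma1 * alpha ** 2 * np.dot(d, d)
                                    - sigma2 * alpha ** 2 * hk2):
            return alpha
    raise RuntimeError("no acceptable stepsize found; d may be unsuitable")
```

Unlike (8), condition (11) has no ε_k slack term, so the accepted step is always norm descent: ‖h_{k+1}‖ < ‖h_k‖.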
In the following, to conveniently analyze the global convergence, we assume that the algorithm updates H_k (the inverse of B_k) with a basically bounded and positive definite matrix H_k^0 (the inverse of B_k^0). Then Algorithm 1, stated in terms of B_k, has the following steps.
Remark 3. Algorithms 1 and 2 are mathematically equivalent. Algorithm 2 is given only for the purpose of the theoretical analysis, so we discuss Algorithm 2 in theory, while in the experiments we implement Algorithm 1.
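Putting the pieces together, here is a hedged end-to-end sketch of our reading of Algorithm 1: an L-BFGS direction computed by the two-loop form of (17) with H^0 = I, followed by derivative-free backtracking in the spirit of (8), trying α = 1 first as in Step 3. The memory size m, the constants, the ε_k sequence, and the stopping rule are all illustrative assumptions, not the paper's parameter choices:

```python
import numpy as np

def solve_symmetric(h, x0, m=5, sigma1=1e-4, sigma2=1e-4, r=0.5,
                    tol=1e-8, max_iter=200):
    x = np.asarray(x0, dtype=float)
    pairs = []                                   # at most m stored pairs (s_i, y_i)
    for k in range(max_iter):
        hk = h(x)
        if np.linalg.norm(hk) <= tol:
            break
        # two-loop recursion: d = -H_k h_k without ever forming H_k
        q = hk.copy()
        coeffs = []
        for s, y in reversed(pairs):             # newest pair first
            rho = 1.0 / np.dot(y, s)
            a = rho * np.dot(s, q)
            coeffs.append(a)
            q = q - a * y
        d = q
        for (s, y), a in zip(pairs, reversed(coeffs)):   # oldest pair first
            rho = 1.0 / np.dot(y, s)
            d = d + (a - rho * np.dot(y, d)) * s
        d = -d
        # backtracking: the eps_k slack guarantees some alpha is accepted
        eps_k = 1.0 / (k + 1) ** 2
        hk2 = np.dot(hk, hk)
        alpha = 1.0
        while alpha > 1e-20:
            trial = h(x + alpha * d)
            if np.dot(trial, trial) <= ((1 + eps_k) * hk2
                                        - sigma1 * alpha ** 2 * np.dot(d, d)
                                        - sigma2 * alpha ** 2 * hk2):
                break
            alpha *= r
        s_k = alpha * d
        y_k = h(x + s_k) - hk
        if np.dot(s_k, y_k) > 1e-12 * np.dot(s_k, s_k):  # keep H positive definite
            pairs.append((s_k, y_k))
            if len(pairs) > m:
                pairs.pop(0)
        x = x + s_k
    return x
```

On a small symmetric positive definite test system the sketch converges in a handful of iterations; discarding pairs with nonpositive curvature s_k^T y_k is what preserves positive definiteness of the implicit H_k.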

Global Convergence
Define the level set Ω by

Ω = {x ∈ R^n : ‖h(x)‖ ≤ ‖h(x₀)‖}. (21)

In order to establish the global convergence of Algorithm 2, similar to [4, 11], we need the following assumptions.
Assumption A. h is continuously differentiable on an open convex set Ω₁ containing Ω. Moreover, the Jacobian of h is symmetric, bounded, and positive definite on Ω₁; namely, there exist positive constants M ≥ m > 0 satisfying

m‖d‖² ≤ d^T ∇h(x) d ≤ M‖d‖², ∀x ∈ Ω₁, d ∈ R^n. (22)

Assumption B. B_k is a good approximation to ∇h(x_k); that is,

‖(B_k − ∇h(x_k)) d_k‖ ≤ ε‖h_k‖, (23)

where ε ∈ (0, 1) is a small quantity.

Remark 4. Assumption A implies

m‖s_k‖² ≤ s_k^T y_k ≤ M‖s_k‖². (24)
The relations in (24) ensure that H_{k+1} generated by (20) inherits the symmetry and positive definiteness of H_k. Thus (19) has a unique solution d_k for each k. Moreover, the following lemma holds.
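This inheritance property is easy to illustrate numerically. The sketch below uses the standard inverse BFGS formula from the introduction as a stand-in for the update, since the conclusion is the same for any such update whenever the curvature condition s_k^T y_k > 0 (guaranteed here by (24)) holds; the random data is purely illustrative:

```python
import numpy as np

def inverse_bfgs(H, s, y):
    # H+ = (I - rho s y^T) H (I - rho y s^T) + rho s s^T, rho = 1/(s^T y)
    rho = 1.0 / np.dot(s, y)
    n = len(s)
    V = np.eye(n) - rho * np.outer(y, s)
    return V.T @ H @ V + rho * np.outer(s, s)

rng = np.random.default_rng(0)
H = np.eye(3)                         # symmetric positive definite start
for _ in range(5):
    s = rng.standard_normal(3)
    y = s + 0.1 * rng.standard_normal(3)
    if np.dot(s, y) > 0:              # the condition supplied by (24)
        H = inverse_bfgs(H, s, y)
```

After any number of such updates, H stays symmetric and all its eigenvalues stay positive, which is exactly why (19) keeps a unique solution.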
Remark 9. The above lemma shows that Algorithm 2 is well defined. In a way similar to Lemma 3.2 and Corollary 3.4 in [5], it is not difficult to deduce that if (32) holds, then every accumulation point of {x_k} is a solution of (1). Assumption A means that (1) has only one solution. Moreover, since Ω is bounded, {x_k} ⊆ Ω has at least one accumulation point. Therefore {x_k} itself converges to the unique solution of (1), and it suffices to verify (32). If (18) holds for infinitely many k's, then (32) is trivial. Otherwise, if (18) holds for only finitely many k's, then Step 4 is executed for all k sufficiently large. By (11), we have

σ₁‖α_k d_k‖² + σ₂‖α_k h_k‖² ≤ ‖h_k‖² − ‖h_{k+1}‖².

Since {‖h_k‖} is bounded, by adding these inequalities we get

Σ_{k=0}^∞ (σ₁‖α_k d_k‖² + σ₂‖α_k h_k‖²) < ∞.

Then we have

lim_{k→∞} ‖α_k d_k‖ = 0 and lim_{k→∞} ‖α_k h_k‖ = 0,

which together with (31) implies (32). This completes the proof.
In the experiments, the parameters in Algorithm 1 and the normal BFGS method were chosen as = 0.1, = 0.5,