EXTRA MULTISTEP BFGS UPDATES IN QUASI-NEWTON METHODS

This note develops quasi-Newton methods that combine $m + 1$ multistep and single-step updates within a single iteration in order to construct the new approximation to the Hessian matrix that is used on the next iteration to compute the search direction. The approach exploits the merits of both the multistep methods and the extra update methods of El-Baali (1999) to create a hybrid technique. The numerical results are encouraging, and the new methods compete well with El-Baali's extra update algorithms (1999).


Introduction
This note considers methods that efficiently solve unconstrained minimization problems of the form
$$\text{minimize } f(x), \quad \text{where } f : \mathbb{R}^n \to \mathbb{R},\ x \in \mathbb{R}^n. \tag{1.1}$$
Quasi-Newton methods require only the function and its first partial derivatives (the gradient) to be available. An approximation to the Hessian is maintained and updated throughout the iterations to reflect the changes in the function and its gradient. Let $g$ and $G$ denote the gradient and the Hessian of $f$, respectively. Given $B_i$, the current approximation to the Hessian, we need to find a new approximating matrix $B_{i+1}$. In standard quasi-Newton methods, $B_{i+1}$ satisfies the so-called secant equation
$$B_{i+1} s_i = y_i,$$
where
$$s_i = x_{i+1} - x_i, \qquad y_i = g(x_{i+1}) - g(x_i).$$
In general, updating formulas are of the form $B_{i+1} = B_i + C_i$, where $C_i$ is a correction matrix.
The most successful rank-two formula, developed independently by Broyden, Fletcher, Goldfarb, and Shanno, is known as the BFGS formula. This formula is given by
$$B_{i+1} = B_i - \frac{B_i s_i s_i^T B_i}{s_i^T B_i s_i} + \frac{y_i y_i^T}{s_i^T y_i}.$$
Numerical results indicate that the BFGS method is superior to other updating formulas, especially when inexact line searches are used [8,10,11].
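As an illustration of the standard BFGS formula, the following minimal Python sketch applies one update and checks that the result satisfies the secant equation $B_{i+1} s_i = y_i$. It uses plain lists so that it has no dependencies; the function name `bfgs_update` and the small quadratic test data are assumptions made for this example only and do not come from the note.

```python
def bfgs_update(B, s, y):
    """Return the BFGS update of a symmetric matrix B for step s and gradient change y."""
    n = len(s)
    # Because B is symmetric, B s s^T B equals the outer product (B s)(B s)^T.
    Bs = [sum(B[i][j] * s[j] for j in range(n)) for i in range(n)]
    sBs = sum(s[i] * Bs[i] for i in range(n))
    sy = sum(s[i] * y[i] for i in range(n))
    if sy <= 0.0:
        raise ValueError("curvature condition s^T y > 0 is violated")
    return [[B[i][j] - Bs[i] * Bs[j] / sBs + y[i] * y[j] / sy
             for j in range(n)] for i in range(n)]


# Hypothetical data: for f(x) = 0.5 x^T G x the gradient change is y = G s.
G = [[4.0, 1.0], [1.0, 3.0]]
s = [1.0, -2.0]
y = [sum(G[i][j] * s[j] for j in range(2)) for i in range(2)]

B_new = bfgs_update([[1.0, 0.0], [0.0, 1.0]], s, y)
# The secant equation requires B_new s to reproduce y.
B_new_s = [sum(B_new[i][j] * s[j] for j in range(2)) for i in range(2)]
```

The check on `B_new_s` reflects the defining property of the update: whatever the previous matrix was, the new one maps the latest step onto the latest gradient change.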
The note begins with a brief account of the successful multistep algorithms that are used in the numerical comparisons of the last section. The new implicit-update algorithms are then derived. Finally, we present the numerical comparisons.

Multistep quasi-Newton methods
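Let $\{x(\tau)\}$, or simply $X$, denote a differentiable path in $\mathbb{R}^n$, where $\tau \in \mathbb{R}$. Applying the chain rule to the gradient vector $g(x(\tau))$ gives the derivative of $g$ with respect to $\tau$:
$$\frac{dg}{d\tau} = G(x(\tau))\,\frac{dx}{d\tau}. \tag{2.1}$$
Therefore, at any point on the path $X$, the Hessian $G$ must satisfy (2.1) for any value of $\tau$, and in particular for $\tau = \tau_c$, where $\tau_c \in \mathbb{R}$ is a constant. This results in the following equation, called the "Newton equation" [4,8]:
$$G(x(\tau_c))\,\frac{dx}{d\tau}\bigg|_{\tau=\tau_c} = \frac{dg}{d\tau}\bigg|_{\tau=\tau_c}.$$
Since we wish to derive a relation satisfied by the Hessian at $x_{i+1}$, we choose the value $\tau_m$ of $\tau$ that corresponds to the most recent iterate in the Newton equation:
$$G(x_{i+1})\,\frac{dx}{d\tau}\bigg|_{\tau=\tau_m} = \frac{dg}{d\tau}\bigg|_{\tau=\tau_m},$$
or, with $B_{i+1}$ approximating $G(x_{i+1})$ and the derivatives estimated by polynomial interpolation of the $m + 1$ most recent iterates and gradients,
$$B_{i+1} r_i = w_i,$$
where
$$r_i = \sum_{k=0}^{m} L_k'(\tau_m)\, x_{i-m+k+1}, \qquad w_i = \sum_{k=0}^{m} L_k'(\tau_m)\, g(x_{i-m+k+1}), \tag{2.5}$$
and $\{L_k\}_{k=0}^{m}$ are the standard Lagrange polynomials. The parameters $\tau_k$, for $k = 0, 1, 2, \ldots, m$, are chosen to depend on a metric of the following general form:
$$\phi_M(z_1, z_2) = \left\{(z_1 - z_2)^T M (z_1 - z_2)\right\}^{1/2},$$
where $M$ is a symmetric positive-definite matrix. This metric is used to define the values $\{\tau_k\}_{k=0}^{m}$ used in computing the vectors $r_i$ and $w_i$. Choices based on this metric are numerically better than the unit-spaced method because they take into account the spacing between the iterates, as measured by a norm (see [5]).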
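The construction of the vectors $r_i$ and $w_i$ can be sketched in a few lines of Python. The sketch assumes the usual multistep definitions, in which $r_i$ and $w_i$ are the derivatives at $\tau_m$ of the polynomials interpolating the $m + 1$ most recent iterates and gradients, that is, weighted sums with weights $L_k'(\tau_m)$. The function names, the unit-spaced values $\tau_k = k$, and the quadratic test data are illustrative assumptions, not part of the note.

```python
def lagrange_derivs(taus, t):
    """Derivatives L_k'(t) of the Lagrange basis polynomials on the nodes taus."""
    count = len(taus)
    derivs = []
    for k in range(count):
        total = 0.0
        for l in range(count):
            if l == k:
                continue
            # Differentiate the factor with node l, keep the remaining factors.
            term = 1.0 / (taus[k] - taus[l])
            for j in range(count):
                if j != k and j != l:
                    term *= (t - taus[j]) / (taus[k] - taus[j])
            total += term
        derivs.append(total)
    return derivs


def multistep_vectors(xs, gs, taus):
    """r = sum_k L_k'(tau_m) x_k and w = sum_k L_k'(tau_m) g_k (oldest point first)."""
    d = lagrange_derivs(taus, taus[-1])
    n = len(xs[0])
    r = [sum(d[k] * xs[k][i] for k in range(len(xs))) for i in range(n)]
    w = [sum(d[k] * gs[k][i] for k in range(len(gs))) for i in range(n)]
    return r, w


# Hypothetical data for m = 2: three iterates and the gradients g(x) = G x.
G = [[4.0, 1.0], [1.0, 3.0]]
xs = [[0.0, 0.0], [1.0, 0.0], [1.5, 1.0]]
gs = [[sum(G[i][j] * x[j] for j in range(2)) for i in range(2)] for x in xs]
r, w = multistep_vectors(xs, gs, [0.0, 1.0, 2.0])
```

With two points only ($m = 1$), the weights reduce to $-1/(\tau_1 - \tau_0)$ and $1/(\tau_1 - \tau_0)$, so $r_i$ and $w_i$ are scalar multiples of $s_i$ and $y_i$ and the single-step secant equation is recovered.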
Several choices were considered for the metric matrix $M$. For instance, if $M = I$, we obtain (for $m = 2$) the accumulative algorithm A1 (see [5]). The new B-version BFGS formula is given by
$$B_{i+1} = B_i - \frac{B_i r_i r_i^T B_i}{r_i^T B_i r_i} + \frac{w_i w_i^T}{w_i^T r_i}. \tag{2.8}$$
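The following Python sketch puts the pieces together for $m = 2$: it measures the spacing of the iterates with the metric $\phi_M$, forms $r_i$ and $w_i$, and applies the BFGS formula with $(s_i, y_i)$ replaced by $(r_i, w_i)$. The placement of the origin at $x_i$ with distances accumulated on either side is an assumed convention chosen for illustration; the note's own parameter values for A1 should be taken from [5]. All function names and data are hypothetical.

```python
import math


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi(M, z1, z2):
    """Metric phi_M(z1, z2) = sqrt((z1 - z2)^T M (z1 - z2))."""
    d = [a - b for a, b in zip(z1, z2)]
    return math.sqrt(dot(d, matvec(M, d)))


def accumulative_taus(M, x_prev, x_cur, x_next):
    """ASSUMED convention: tau_1 = 0 at x_i, metric distances on either side."""
    return [-phi(M, x_cur, x_prev), 0.0, phi(M, x_next, x_cur)]


def two_step_vectors(taus, s_prev, s, y_prev, y):
    """r_i and w_i for m = 2, using L_0' + L_1' + L_2' = 0 at tau_2."""
    t0, t1, t2 = taus
    d0 = (t2 - t1) / ((t0 - t1) * (t0 - t2))    # L_0'(tau_2)
    d2 = 1.0 / (t2 - t0) + 1.0 / (t2 - t1)      # L_2'(tau_2)
    r = [d2 * a - d0 * b for a, b in zip(s, s_prev)]
    w = [d2 * a - d0 * b for a, b in zip(y, y_prev)]
    return r, w


def multistep_bfgs(B, r, w):
    """B-version BFGS update with (s_i, y_i) replaced by (r_i, w_i)."""
    Br = matvec(B, r)
    rBr, rw = dot(r, Br), dot(r, w)
    if rw <= 0.0:
        return B  # skip the update so that B stays positive definite
    n = len(r)
    return [[B[i][j] - Br[i] * Br[j] / rBr + w[i] * w[j] / rw
             for j in range(n)] for i in range(n)]


# Hypothetical data: gradients of f(x) = 0.5 x^T G x, metric M = I.
G = [[4.0, 1.0], [1.0, 3.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
x_prev, x_cur, x_next = [0.0, 0.0], [1.0, 0.0], [1.5, 1.0]
s_prev = [a - b for a, b in zip(x_cur, x_prev)]
s = [a - b for a, b in zip(x_next, x_cur)]
taus = accumulative_taus(I2, x_prev, x_cur, x_next)
r, w = two_step_vectors(taus, s_prev, s, matvec(G, s_prev), matvec(G, s))
B_new = multistep_bfgs(I2, r, w)   # satisfies B_new r = w
```

Because the three weights sum to zero, $r_i$ and $w_i$ can be written purely in terms of the two most recent steps and gradient differences, which is why the sketch never touches the iterates themselves once the $\tau_k$ are known.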

Extra BFGS updates
The BFGS formula corrects the eigenvalues of the Hessian approximation $B_i$, although this correction is found to be inadequate in practice when the eigenvalues are large (see Liu and Nocedal [9]). It is therefore desirable, as El-Baali [3] states, to correct those values further. Such values are not readily available, however, and they can only be estimated, since computing them exactly is expensive. A well-known formula can be employed to detect the presence of large eigenvalues, as will be specified in the next section. It is thus envisaged that extra updates applied to $B_i$ will introduce the desired corrections to the large eigenvalues. We now present the ingredients of the extra update methods. Byrd et al. [2] propose, for some $m \le n$ (where $n$ is the problem dimension), the following extra update:
where $\{u^{(t)}\}_{t=1}^{m}$ and $\{v^{(t)}\}_{t=1}^{m}$ are any sequences of independent vectors. The authors were able to prove, under certain assumptions, global and superlinear convergence on convex functions. They use finite differences to approximate the left-hand side of this last relation, since the Hessian itself is not explicitly available. The convergence results do not necessarily hold, since the above relation is approximated.
One specific choice for the vectors $u^{(t)}$ and $v^{(t)}$, made by Liu and Nocedal [9] for limited-memory BFGS and also adopted by El-Baali [3], is to use vectors that are retained from the latest $m - 1$ iterations. El-Baali conjectured that the curvature information obtained from (3.1) may be employed $m$ times ($m > 1$) to improve the Hessian approximations. The methods we develop here (like those of El-Baali [3]) generate, at each iteration, matrices built using only $B_i$, and the sequences $\{u^{(t)}\}_{t=1}^{m}$ and $\{v^{(t)}\}_{t=1}^{m}$ are readily available. We proceed at each iteration $i$ by first doing a standard single-step quasi-Newton update as follows:
The newly obtained Hessian approximation is then used in the subsequent $m - 1$ subiterations to do multistep updates as follows:
where $m$ is a prescribed constant and the vectors $r_i$ and $w_i$ are as in (2.5). The $(m + 1)$th update is done as
For $m = 1$, the proposed approach is equivalent to the standard one-step BFGS method. We now state a theorem that addresses the convergence properties of the derived technique.
The following theorem shows that the sequence (3.2)-(3.4) possesses the superlinear convergence property.
... (3.2)-(3.4) is positive definite and that the actual Hessian $G$ satisfies a Lipschitz condition ...
It remains to address how $m$, the number of extra updates, should be chosen. We have tried two options. One is to determine a "good" value for $m$ based on the numerical test results. The other option, supported by the results obtained, is to make the choice on the basis of an estimate of the determinant of the updated Hessian approximation, $B_{i+1}$, as in (3.14) (where $m$ denotes the number of extra updates at a given iteration). Our numerical tests have favored the choice $m = 2$ over that made in (3.14). The improvement obtained with our choice and with that in (3.14) has not exceeded 8.4%, based on our experimentation with the new methods.

Numerical results and conclusions
Our numerical experiments with the new approach have focused on algorithms that satisfy (3.2)-(3.4), with values of $\{\tau_j\}_{j=0}^{2}$ chosen to be consistent with successful choices published earlier (see [5,6]). In particular, we have chosen algorithm A1 (see [5]), which corresponds to the most successful accumulative approach, as a benchmark, along with the standard single-step BFGS method. Both are compared with the new algorithm under the same parameter and metric choices as those made for A1. The new algorithm is referred to as EA1.
It should be noted that, in our implementation, the matrix $B_i$ is maintained in the factored form $LL^T$.
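One benefit of the factored form is that the search direction can be obtained from $B_i p_i = -g_i$ by two triangular solves. The Python sketch below illustrates this under a simplifying assumption: the factor $L$ is recomputed from $B$ by a textbook Cholesky factorization, whereas an implementation that maintains the factors would update $L$ directly. The function names and the data are hypothetical.

```python
import math


def cholesky(B):
    """Lower-triangular L with B = L L^T, for B symmetric positive definite."""
    n = len(B)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = B[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(acc)
            else:
                L[i][j] = acc / L[j][j]
    return L


def search_direction(L, g):
    """Solve (L L^T) p = -g by forward and then backward substitution."""
    n = len(g)
    z = [0.0] * n
    for i in range(n):                        # forward solve: L z = -g
        z[i] = (-g[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    p = [0.0] * n
    for i in reversed(range(n)):              # backward solve: L^T p = z
        p[i] = (z[i] - sum(L[k][i] * p[k] for k in range(i + 1, n))) / L[i][i]
    return p


# Hypothetical Hessian approximation and gradient.
B = [[4.0, 1.0], [1.0, 3.0]]
g = [1.0, 2.0]
L = cholesky(B)
p = search_direction(L, g)
```

A successful factorization also certifies that $B_i$ is positive definite, which guarantees that $p_i$ is a descent direction.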
Sixty functions classified into subsets of "low" ($2 \le n \le 15$), "medium" ($16 \le n \le 45$), and "high" ($46 \le n \le 80$) dimensions (as in [7]) were tested with four different starting points each. Some of the problems have variable dimensions, and we tested our algorithms on several dimensions, depending on the specific nature of the problem. This resulted in a total of 876 problems. The overall numerical results are given in Table 4
Step 1: Set $B_0 = I$ and $i = 0$; evaluate $f(x_0)$ and $g(x_0)$;
Repeat
Step 2: Compute the search direction $p_i$ from $B_i p_i = -g_i$;
Step 3: Compute $x_{i+1}$ by means of a line search from $x_i$ along $p_i$, using safeguarded cubic interpolation (see (3.7)-(3.8));
Step 4
Step 5: If $i = 0$ and $n \ge 10$, then scale $B_0$ by the method of Shanno and Phua [11];
Step 6: Update $B_i$ to produce $B_{i+1}$, using (3.2)-(3.4), with $m = 2$ (the number of extra updates); increment $i$.
Until $\|g(x_i)\|_2 < \varepsilon$ (where $\varepsilon$ is a problem-dependent tolerance).
The numerical evidence provided by the tests reported in Tables 4.1-4.5 demonstrates clearly that the new method EA1 shows significant improvements when compared with the standard single-step BFGS method and A1. In particular, it has yielded, on average, improvements in the range 26%-30% on the problems with the highest dimensions. The results reported in Table 4.2 indicate that, while EA1 does appear to offer some improvement over the method from which it was developed (namely, A1), it is not as clearly superior on that class of problems. We are currently investigating whether the numerical performance of similar methods can be improved further. The convergence properties of such methods also need to be explored.
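The repeat/until structure of the algorithm in this section can be sketched in Python as follows. This is not the EA1 method: as labeled in the comments, the safeguarded cubic interpolation of Step 3 is replaced by a simple Armijo backtracking search, the scaling of Step 5 is omitted, and Step 6 uses a plain single-step BFGS update as a stand-in for (3.2)-(3.4). All names, tolerances, and the quadratic test problem are assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(B, rhs):
    """Solve B p = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    A = [row[:] + [rhs[i]] for i, row in enumerate(B)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, n):
            fac = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= fac * A[c][k]
    p = [0.0] * n
    for i in reversed(range(n)):
        p[i] = (A[i][n] - sum(A[i][k] * p[k] for k in range(i + 1, n))) / A[i][i]
    return p


def bfgs_update(B, s, y):
    """Single-step BFGS update; skipped when the curvature s^T y is too small."""
    sy = dot(s, y)
    if sy <= 1e-10 * math.sqrt(dot(s, s) * dot(y, y)):
        return B
    n = len(s)
    Bs = [dot(row, s) for row in B]
    sBs = dot(s, Bs)
    return [[B[i][j] - Bs[i] * Bs[j] / sBs + y[i] * y[j] / sy
             for j in range(n)] for i in range(n)]


def quasi_newton(f, grad, x0, eps=1e-8, max_iter=200):
    n = len(x0)
    B = [[float(i == j) for j in range(n)] for i in range(n)]   # Step 1: B_0 = I
    x, g = list(x0), grad(x0)
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < eps:                          # Until test
            break
        p = solve(B, [-v for v in g])                           # Step 2
        slope, fx, alpha = dot(g, p), f(x), 1.0
        # Step 3 (ASSUMED simplification): Armijo backtracking line search.
        while alpha > 1e-12 and \
                f([a + alpha * b for a, b in zip(x, p)]) > fx + 1e-4 * alpha * slope:
            alpha *= 0.5
        x_new = [a + alpha * b for a, b in zip(x, p)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        B = bfgs_update(B, s, y)                                # Step 6 stand-in
        x, g = x_new, g_new
    return x


# Hypothetical test problem: f(x) = 0.5 x^T G x - b^T x, minimized where G x = b.
G = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
f = lambda x: 0.5 * dot(x, [dot(row, x) for row in G]) - dot(b, x)
grad = lambda x: [dot(row, x) - bi for row, bi in zip(G, b)]
x_star = quasi_newton(f, grad, [0.0, 0.0])
```

Swapping `bfgs_update` for a routine that performs the extra multistep updates would leave the rest of the loop unchanged, which is the sense in which the new methods modify only Step 6.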