A Newton-Like Trust Region Method for Large-Scale Unconstrained Nonconvex Minimization

We present a new Newton-like method for large-scale unconstrained nonconvex minimization. A new straightforward limited memory quasi-Newton updating, based on the modified quasi-Newton equation, is derived to construct the trust region subproblem; it uses both function value and gradient information to construct the approximate Hessian. The global convergence of the algorithm is proved. Numerical results indicate that the proposed method is competitive and efficient on some classical large-scale nonconvex test problems.


Introduction
We consider the following unconstrained optimization problem:

min f(x), x ∈ ℝⁿ,  (1)

where f : ℝⁿ → ℝ is continuously differentiable.
Trust region methods [1-14] are robust, can be applied to ill-conditioned problems, and have strong global convergence properties. Another advantage of trust region methods is that the approximate Hessian of the trust region subproblem is not required to be positive definite. Hence, trust region methods are important and efficient for nonconvex optimization problems [6-8, 10, 12, 14]. For a given iterate x_k ∈ ℝⁿ, the main computation of a trust region algorithm is solving the following quadratic subproblem:

min m_k(d) = g_k^T d + (1/2) d^T B_k d,  s.t. ‖d‖ ≤ Δ_k,  (2)

where g_k = ∇f(x_k) is the gradient of f(x) at x_k, B_k is the true Hessian ∇²f(x_k) or an approximation of it, Δ_k > 0 is the trust region radius, and ‖⋅‖ denotes the Euclidean norm on ℝⁿ. For a trial step d_k generated by solving subproblem (2), the agreement between the predicted reduction and the actual variation of the objective function is measured by the ratio

ρ_k = (f(x_k) − f(x_k + d_k)) / (m_k(0) − m_k(d_k)).

The trust region radius Δ_k is then updated according to the value of ρ_k. Trust region methods that ensure at least a Cauchy (steepest-descent-like) decrease on each iteration satisfy an evaluation complexity bound of the same order as steepest descent under identical conditions [11]. It follows that Newton's method globalized by trust region regularization satisfies the same O(ε⁻²) evaluation upper bound; this bound can also be shown to be tight [12], provided additionally that the Hessian is Lipschitz continuous along the path of iterates at which pure Newton steps are taken. Newton's method has been efficiently safeguarded to ensure global convergence to first- and even second-order critical points in the presence of local nonconvexity of the objective, using line search [3], trust region [4], or other regularization techniques [9, 13]. Many variants of these globalization techniques have been proposed. They generally retain fast local convergence under some nondegeneracy assumptions, are often suitable for large-scale problems, and sometimes allow approximate rather than true Hessians to be employed.
Solving large-scale problems requires expensive computation and storage, so many researchers have studied limited memory techniques [15-24]. Limited memory techniques were first applied to line search methods. Liu and Nocedal [15, 16] proposed a limited memory BFGS method (L-BFGS) for solving unconstrained optimization and proved its global convergence. Byrd et al. [17] gave compact representations of the limited memory BFGS and SR1 formulas, which made it possible to combine limited memory techniques with trust region methods. Observing that the L-BFGS updating formula uses only gradient information and ignores the available function value information, Yang and Xu [19] derived a modified quasi-Newton formula with a limited memory compact representation, based on the modified quasi-Newton equation with a vector parameter [18]. Recently, some researchers have combined limited memory techniques with trust region methods for solving large-scale unconstrained and constrained optimizations [20-24].
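The generic trust region mechanics described above (solve the subproblem, test the ratio ρ_k, update the radius Δ_k) can be sketched as follows. This is a minimal illustration only: it uses a Cauchy-point subproblem solver and assumed parameter values (acceptance threshold 0.1, shrink/expand factors 0.25 and 2), not the algorithm proposed in this paper.

```python
import numpy as np

def trust_region(f, grad, hess, x, delta=1.0, delta_max=100.0,
                 eta=0.1, tol=1e-8, max_iter=200):
    """Generic trust region loop.  For illustration the subproblem is
    solved only to the Cauchy point (model minimizer along -g)."""
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        B = hess(x)
        # Cauchy point: minimize the quadratic model along -g in the region.
        gBg = g @ B @ g
        tau = 1.0 if gBg <= 0 else min(np.linalg.norm(g)**3 / (delta * gBg), 1.0)
        d = -tau * delta / np.linalg.norm(g) * g
        pred = -(g @ d + 0.5 * d @ B @ d)        # predicted reduction
        ared = f(x) - f(x + d)                   # actual reduction
        rho = ared / pred if pred > 0 else -1.0
        if rho < 0.25:
            delta *= 0.25                        # poor agreement: shrink
        elif rho > 0.75 and np.isclose(np.linalg.norm(d), delta):
            delta = min(2.0 * delta, delta_max)  # good step at boundary: expand
        if rho > eta:
            x = x + d                            # accept the trial step
    return x
```

On a strictly convex quadratic this loop reaches the minimizer in a handful of iterations, since the Cauchy step is then exact once the region is large enough.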
In this paper, we deduce a new straightforward limited memory quasi-Newton updating based on the modified quasi-Newton equation, which uses both available gradient and function value information, to construct the trust region subproblem. Then the corresponding trust region method is proposed for large-scale unconstrained nonconvex minimization. The global convergence of the new algorithm is proved under some appropriate conditions.
The rest of the paper is organized as follows. In the next section, we deduce a new straightforward limited memory quasi-Newton updating. In Section 3, a Newton-like trust region method for large-scale unconstrained nonconvex minimization is proposed and its convergence is proved under some reasonable assumptions. Some numerical results are given in Section 4.

The Modified Limited Memory Quasi-Newton Formula

In this section, we deduce a straightforward limited memory quasi-Newton updating based on the modified quasi-Newton equation, which employs both gradients and function values to construct the approximate Hessian and compensates for the information discarded by limited memory techniques. We then apply the derived formula within a trust region method. Consider the following modified quasi-Newton equation [18]:

B_{k+1} s_k = ŷ_k,  (4)

where s_k = x_{k+1} − x_k, y_k = g_{k+1} − g_k, ŷ_k = (1 + θ_k/(s_k^T y_k)) y_k = t_k y_k, and θ_k = 6(f(x_k) − f(x_{k+1})) + 3(g_k + g_{k+1})^T s_k. The quasi-Newton updating matrix constructed from (4) achieves a higher order of accuracy in approximating the Hessian. Based on (4), the modified BFGS (MBFGS) updating is as follows:

B_{k+1} = B_k − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (ŷ_k ŷ_k^T)/(ŷ_k^T s_k).

For a twice continuously differentiable function, if x_k converges to a point x* at which ∇f(x*) = 0 and ∇²f(x*) is positive definite, then lim_{k→∞} θ_k = 0, and hence lim_{k→∞} t_k = 1. Moreover, when k is sufficiently large, the MBFGS updating approaches the BFGS updating.
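A single dense MBFGS update can be written out directly from the definitions of θ_k and ŷ_k above; the following is a minimal sketch (the limited memory variant derived below avoids ever forming B explicitly).

```python
import numpy as np

def mbfgs_update(B, s, y, fk, fk1, gk, gk1):
    """One modified BFGS (MBFGS) update: replace y by
    y_hat = (1 + theta/(s'y)) y, where theta combines function-value
    and gradient information, then apply the usual BFGS formula."""
    theta = 6.0 * (fk - fk1) + 3.0 * (gk + gk1) @ s
    y_hat = (1.0 + theta / (s @ y)) * y
    # Standard BFGS rank-two correction, with y_hat in place of y.
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y_hat, y_hat) / (y_hat @ s)
```

For a quadratic objective θ_k vanishes identically, so ŷ_k = y_k and the update reduces to plain BFGS, satisfying the secant equation B_{k+1} s_k = y_k exactly.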
Notice that the only difference between the limited memory quasi-Newton method and the standard quasi-Newton method is in the matrix updating. Instead of storing the matrices B_k, we store m pairs of vectors {s_i, ŷ_i} to define B_k implicitly. The product B_k v or v^T B_k v is obtained by performing a sequence of inner products involving v and the m most recent vector pairs {s_i, ŷ_i}.
In the following, we discuss the computation of the products B_k v and v^T B_k v for v ∈ ℝⁿ. As in the situation of (11), we need 4mn multiplications to obtain B_k v. If B_k v has already been computed, only one further inner product, costing n multiplications, is needed to obtain v^T B_k v. If B_k v has not been computed, we compute v^T B_k v directly by using (9); the whole computation then only requires (2m + 1)n + 4m multiplications. Thus, about 2mn multiplications are saved in contrast to the previous method.
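Since formula (9) and its compact representation are not reproduced here, the following sketch illustrates the same idea in a simpler (and somewhat more expensive) way: the m most recent pairs define B_k implicitly, and B_k v is formed by applying the stored rank-two corrections to δI at O(mn) multiplications per product. The class name and the rebuild strategy are illustrative assumptions, not the paper's formula (9) or Algorithm 2.

```python
import numpy as np
from collections import deque

class LimitedMemoryB:
    """Hold the m most recent (s, y_hat) pairs and form B @ v on the
    fly, never storing B itself.  The m rank-two BFGS corrections are
    applied sequentially to delta * I."""

    def __init__(self, m=3, delta=1.0):
        self.delta = delta
        self.pairs = deque(maxlen=m)   # oldest pair dropped automatically
        self.aux = []

    def update(self, s, y_hat):
        if s @ y_hat > 1e-12:          # preserve the curvature condition
            self.pairs.append((s, y_hat))
            self._rebuild()

    def _rebuild(self):
        # Precompute B_i s_i for every stored pair so that each later
        # matvec costs O(mn); rebuilding costs O(m^2 n) per update.
        self.aux = []
        for i, (s, yh) in enumerate(self.pairs):
            Bs = self._apply(s, i)
            self.aux.append((Bs, s @ Bs, yh @ s))

    def _apply(self, v, k):
        # Apply the first k rank-two corrections to (delta * I) v.
        r = self.delta * v
        for (s, yh), (Bs, sBs, ys) in zip(list(self.pairs)[:k], self.aux[:k]):
            r = r - Bs * (Bs @ v) / sBs + yh * (yh @ v) / ys
        return r

    def matvec(self, v):
        return self._apply(v, len(self.pairs))
```

By the BFGS secant property, the resulting operator reproduces the most recent pair exactly: matvec(s) returns ŷ for the last stored (s, ŷ).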
Let x_k be the current iteration point, and suppose the vectors s_{k−1}, ŷ_{k−1}, g_k and the matrices S_{k−1}, Ŷ_{k−1} have been obtained in the previous iteration. We use the form of (9) to store B_k. Instead of updating B_k into B_{k+1}, we update S_k, Ŷ_k into S_{k+1}, Ŷ_{k+1}.

Newton-Like Trust Region Method
In this section, we present a Newton-like trust region method for large-scale unconstrained nonconvex minimization.
Step 3. Compute the ratio ρ_k.
Step 4. Compute the next iterate: set x_{k+1} := x_k + d_k if the trial step is accepted; otherwise set x_{k+1} := x_k.
Step 5. Update the trust region radius Δ_{k+1} according to the value of ρ_k.
Step 6. Implement Algorithm 1 to update S_k, Ŷ_k into S_{k+1}, Ŷ_{k+1}, and hence B_k into B_{k+1}; set k := k + 1; go to Step 1.
In Step 2, the CG-Steihaug algorithm in [3] is used to solve the subproblem (2), which makes the algorithm suitable for solving large-scale unconstrained optimization. In the solving process, the products B_k v and v^T B_k v are computed by Algorithm 2, so each step of solving the subproblem only requires O(n) multiplications.
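The CG-Steihaug method needs only matrix-vector products with B_k, which is exactly what the limited memory representation provides cheaply. A minimal sketch, following the standard description of Steihaug's truncated CG (with an assumed residual tolerance), is:

```python
import numpy as np

def cg_steihaug(matvec, g, delta, tol=1e-10, max_iter=None):
    """Approximately minimize g'd + 0.5 d'Bd subject to ||d|| <= delta,
    using only products B @ v supplied by `matvec`."""
    n = g.shape[0]
    max_iter = max_iter or 2 * n
    d = np.zeros(n)
    r, p = g.copy(), -g.copy()
    if np.linalg.norm(r) < tol:
        return d
    for _ in range(max_iter):
        Bp = matvec(p)
        pBp = p @ Bp
        if pBp <= 0:                            # negative curvature: hit boundary
            return d + _to_boundary(d, p, delta)
        alpha = (r @ r) / pBp
        if np.linalg.norm(d + alpha * p) >= delta:
            return d + _to_boundary(d, p, delta)  # step leaves region: truncate
        d = d + alpha * p
        r_new = r + alpha * Bp
        if np.linalg.norm(r_new) < tol:
            return d
        beta = (r_new @ r_new) / (r @ r)
        p = -r_new + beta * p
        r = r_new
    return d

def _to_boundary(d, p, delta):
    # Positive tau solving ||d + tau * p|| = delta (quadratic in tau).
    a, b, c = p @ p, 2 * d @ p, d @ d - delta**2
    tau = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return tau * p
```

With B = I the routine returns the exact Newton step -g when the region is large enough, and the scaled steepest-descent step on the boundary otherwise.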
To give the convergence result, we need the following assumptions.
The proof is similar to Theorem 4.7 in [3] and is omitted.

Numerical Results
In this section, we apply Algorithm 3 to solve nonconvex programming problems and report preliminary numerical results to illustrate its performance; the implementation of Algorithm 3 is denoted by NLMTR. The contrast algorithm, called NTR, is the same as NLMTR except that B_k is updated by the BFGS formula. All tests are implemented in Matlab R2008a on a PC with a 2.00 GHz CPU and 2.00 GB RAM. The test problems for nonconvex unconstrained minimization are taken from Moré et al. [25] and the CUTEr collection [26, 27]. These problems are listed in Table 1.
All numerical results are listed in Table 2, in which iter stands for the number of iterations, which equals the number of gradient evaluations; nf stands for the number of objective function evaluations; Prob stands for the problem label; Dim stands for the number of variables of the tested problem; cpu denotes the CPU time for solving the problem; ‖g_k‖ is the final gradient norm; and f* denotes the optimal value. We compare NLMTR with NTR. The trial step d_k is computed by the CG-Steihaug algorithm [3]. The matrix B_k of NLMTR is updated by the straightforward modified L-MBFGS formula (9), with parameter values 0.1 (the acceptance threshold) and m = 3 (the memory size). The matrix B_k of NTR is updated by the BFGS formula in [3]. The iteration is terminated when ‖g_k‖ ≤ ε or ‖d_k‖ ≤ ε, where ε = 10⁻⁸. The related figures are listed in Table 2.
From Table 2, we can see that for small-scale problems, the optimal values and the gradient norms of NTR are more accurate than those of NLMTR. For middle-scale problems, the accuracy of NTR is higher, but the cpu time of NLMTR is shorter. For large-scale problems, the cpu time of NTR is much longer than that of NLMTR, and NTR fails on some problems, especially when n = 5000. So NLMTR is well suited to solving large-scale nonconvex problems.