International Journal of Mathematics and Mathematical Sciences, Hindawi Publishing Corporation, Volume 2009, Article ID 435851, doi:10.1155/2009/435851

Research Article: Newton-Krylov Type Algorithm for Solving Nonlinear Least Squares Problems

Mohammedi R. Abdel-Aziz¹ and Mahmoud M. El-Alem¹,²

¹ Department of Mathematics and Computer Science, Faculty of Science, Kuwait University, P.O. Box 5969, Safat 13060, Kuwait City, Kuwait
² Department of Mathematics, Faculty of Science, Alexandria University, Alexandria, Egypt

Academic Editor: Irena Lasiecka

Received 15 December 2008; Accepted 2 February 2009

Copyright © 2009. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The minimization of a quadratic function within an ellipsoidal trust region is an important subproblem for many nonlinear programming algorithms. When the number of variables is large, one of the most widely used strategies is to project the original problem onto a small dimensional subspace. In this paper, we introduce an algorithm for solving nonlinear least squares problems. This algorithm is based on constructing a basis for the Krylov subspace in conjunction with a model trust region technique to choose the step. The computational step on the small dimensional subspace lies inside the trust region. The Krylov subspace is terminated once a termination condition that guarantees a decrease in the norm of the gradient is satisfied. A convergence theory for this algorithm is presented; it is shown that the algorithm is globally convergent.

1. Introduction

Nonlinear least squares (NLS) problems are unconstrained optimization problems with special structure. These problems arise in many applications, such as the solution of overdetermined systems of nonlinear equations, scientific experiments, pattern recognition, and maximum likelihood estimation; for more details about these problems, see the cited references. The general formulation of the NLS problem is to determine a solution $x \in \mathbb{R}^n$ that minimizes the function
$$f(x) = \|R(x)\|_2^2 = \sum_{i=1}^{m}(R_i(x))^2, \tag{1.1}$$
where $R(x) = (R_1(x), R_2(x), \ldots, R_m(x))^t$, $R_i : \mathbb{R}^n \to \mathbb{R}$, $1 \le i \le m$, and $m \ge n$.
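To make the formulation concrete, the following dependency-free Python sketch evaluates the residual vector $R(x)$ and the objective $f(x) = \|R(x)\|_2^2$ for a small illustrative instance (the linear model and the data are my own, not taken from the paper): $R_i(x) = x_1 t_i + x_2 - y_i$ fits a line to $m = 4$ points, so that $m > n = 2$.

```python
# Hedged sketch: evaluating the NLS objective f(x) = ||R(x)||_2^2.
# The model R_i(x) = x[0]*t_i + x[1] - y_i and the data are illustrative only.

def residuals(x, data):
    # R(x) = (R_1(x), ..., R_m(x)): one residual per data point (t_i, y_i)
    return [x[0] * t + x[1] - y for (t, y) in data]

def objective(x, data):
    # f(x) = sum_i R_i(x)^2 = ||R(x)||_2^2
    return sum(r * r for r in residuals(x, data))

data = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]  # m = 4 > n = 2
print(objective([2.0, 1.0], data))                        # f at x = (2, 1)
```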

There are two general types of algorithms for solving the NLS problem (1.1). The first type is most closely related to solving systems of nonlinear equations, and it leads to the Gauss-Newton and Levenberg-Marquardt methods. The second type is closely related to unconstrained minimization techniques. Of course, the two types of algorithms are closely related to each other.

The presented algorithm is a Newton-Krylov type algorithm. It requires a fixed-size limited storage proportional to the size of the problem and relies only upon matrix-vector products. It is based on the implicitly restarted Arnoldi method (IRAM) to construct a basis for the Krylov subspace and to reduce the Jacobian to a Hessenberg matrix, in conjunction with a trust region strategy to control the step on that subspace.

Trust region methods for unconstrained minimization are blessed with both strong theoretical convergence properties and accurate results in practice. The trial computational step in these methods is to find an approximate minimizer of some model of the true objective function, for which a suitable norm of the correction lies inside a given bound. This restriction is known as the trust region constraint, and the bound on the norm is its radius. The radius is adjusted so that successive model problems minimize the true objective function within the trust region.

The trust region subproblem is the problem of finding $s(\Delta)$ so that
$$\phi(s(\Delta)) = \min\{\phi(s) : \|s\| \le \Delta\}, \tag{1.2}$$
where $\Delta$ is some positive constant, $\|\cdot\|$ is the Euclidean norm in $\mathbb{R}^n$, and
$$\phi(s) = g^t s + \frac{1}{2}s^t H s,$$
where $g \in \mathbb{R}^n$ and $H \in \mathbb{R}^{n \times n}$ are, respectively, the gradient vector and the Hessian matrix or their approximations.
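For intuition, a solution of (1.2) satisfies $(H + \mu I)s = -g$ for some multiplier $\mu \ge 0$, with $\|s\| = \Delta$ whenever the unconstrained minimizer lies outside the region (see Lemma 2.1). The following Python sketch, my own illustration rather than anything from the paper, finds $\mu$ by bisection; to stay dependency-free it assumes $H$ is diagonal with positive entries.

```python
# Minimal trust-region-subproblem sketch for diagonal positive definite H:
# solve (H + mu*I) s = -g with mu >= 0 chosen so that ||s|| <= Delta.

def trs_diagonal(h, g, delta):
    # h: diagonal of H (all > 0), g: gradient, delta: trust region radius
    def step(mu):
        return [-gi / (hi + mu) for hi, gi in zip(h, g)]
    def norm(s):
        return sum(si * si for si in s) ** 0.5
    if norm(step(0.0)) <= delta:           # interior solution: mu = 0
        return step(0.0), 0.0
    lo, hi = 0.0, 1.0
    while norm(step(hi)) > delta:          # bracket the multiplier
        hi *= 2.0
    for _ in range(200):                   # bisect on ||s(mu)|| = Delta
        mid = 0.5 * (lo + hi)
        if norm(step(mid)) > delta:
            lo = mid
        else:
            hi = mid
    return step(hi), hi

s, mu = trs_diagonal([1.0, 4.0], [2.0, 2.0], 0.5)   # boundary case: mu > 0
```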

There are two different approaches to solving (1.2). These approaches are based either on solving several linear systems or on approximating the curve $s(\Delta)$ by a piecewise linear approximation ("dogleg strategies"). In large-scale optimization, solving linear systems is computationally expensive. Moreover, it is not clear how to define the dogleg curves when the matrix $H$ is singular.
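A sketch of the classical dogleg idea mentioned above (my own minimal version for a 2-by-2 positive definite $H$; the paper does not prescribe this particular construction): the step follows the steepest descent direction to the Cauchy point and then bends toward the Newton point, truncated at the trust region boundary.

```python
# Dogleg step for a 2x2 symmetric positive definite H (illustrative sketch).

def dogleg(g, H, delta):
    norm = lambda v: (v[0] ** 2 + v[1] ** 2) ** 0.5
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    # Newton step sn = -H^{-1} g via the explicit 2x2 inverse
    sn = [-(H[1][1] * g[0] - H[0][1] * g[1]) / det,
          -(-H[1][0] * g[0] + H[0][0] * g[1]) / det]
    if norm(sn) <= delta:
        return sn                              # full Newton step fits
    gHg = (H[0][0] * g[0] ** 2 + (H[0][1] + H[1][0]) * g[0] * g[1]
           + H[1][1] * g[1] ** 2)
    gg = g[0] ** 2 + g[1] ** 2
    sc = [-gg / gHg * g[0], -gg / gHg * g[1]]  # Cauchy point
    if norm(sc) >= delta:
        scale = delta / norm(g)
        return [-scale * g[0], -scale * g[1]]  # truncated steepest descent
    # walk the leg from sc toward sn until ||s|| = delta
    d = [sn[0] - sc[0], sn[1] - sc[1]]
    a = d[0] ** 2 + d[1] ** 2
    b = 2.0 * (sc[0] * d[0] + sc[1] * d[1])
    c = sc[0] ** 2 + sc[1] ** 2 - delta ** 2
    tau = (-b + (b * b - 4.0 * a * c) ** 0.5) / (2.0 * a)
    return [sc[0] + tau * d[0], sc[1] + tau * d[1]]

s = dogleg([1.0, 1.0], [[1.0, 0.0], [0.0, 10.0]], 0.5)   # boundary step
```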

Several authors have studied inexact Newton methods for solving NLS problems. Xiaofang et al. introduced stable factorized quasi-Newton methods for solving large-scale NLS problems. Dennis et al. proposed a convergence theory for the structured BFGS secant method with an application to NLS. The Newton-Krylov method is an example of an inexact Newton method. Krylov techniques inside a Newton iteration in the context of systems of equations have been proposed in the literature. The recent work of Sorensen provides an algorithm which is based on recasting the trust region subproblem into a parameterized eigenvalue problem. This algorithm provides a superlinearly convergent scheme to adjust the parameter and find the optimal solution from the eigenvector of the parameterized problem, as long as the hard case does not occur.

This contribution is organized as follows. In Section 2, we introduce the structure of the NLS problem and give a general view of the theory of trust region strategies and their convergence. The statement of the algorithm and its properties are developed in Section 3. Section 4 states the assumptions and presents the role of the restarting mechanism in controlling the dimension of the Krylov subspace. The global convergence analysis is presented in Section 5. Concluding remarks and future ideas are given in the last section.

2. Structure of the Problem

The subspace technique plays an important role in solving the NLS problem (1.1). Assume that the current iterate is $x_k$ and that $S_k$ is a Krylov subspace. Denote the dimension of $S_k$ by $k_j$ and let $\{v_1^{(k)}, v_2^{(k)}, \ldots, v_{k_j}^{(k)}\}$ be a set of linearly independent vectors in $S_k$. The next iterate $x_{k+1}$ is chosen such that the increment $x_{k+1} - x_k \in S_k$. Thus, we seek $R_j(x_k + V_k y) = 0$ for $j = 1, 2, \ldots, m$ and $y \in \mathbb{R}^{k_j}$, where $V_k = [v_1^{(k)}, v_2^{(k)}, \ldots, v_{k_j}^{(k)}]$. The second-order Taylor expansion of $\|R(x)\|_2^2$ at $x_k$ is
$$\|R(x_k) + J_k s\|_2^2 + s^t Q_k s, \tag{2.1}$$
where $Q_k \in \mathbb{R}^{n \times n}$ is defined by $Q_k = \sum_{j=1}^{m} R_j(x_k)\nabla^2 R_j(x_k)$ and the Jacobian matrix is $J_k = J(x_k) = [\nabla R_1(x_k), \nabla R_2(x_k), \ldots, \nabla R_m(x_k)]^t$. If we restrict $s \in S_k$, we get the quadratic model
$$q_k(y) = \|R(x_k) + J_k V_k y\|_2^2 + y^t A_k y,$$
where $A_k \in \mathbb{R}^{k_j \times k_j}$ approximates the reduced matrix $V_k^t Q_k V_k = \sum_{j=1}^{m} R_j(x_k) V_k^t \nabla^2 R_j(x_k) V_k$.

The first-order necessary condition for (2.1) is
$$(J_k^t J_k + Q_k)s + J_k^t R_k = 0.$$
Thus, a solution of the quadratic model is a solution to an equation of the form
$$(J_k^t J_k + Q_k)s = -J_k^t R_k.$$
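If the second-order term $Q_k$ is dropped, which is the classical Gauss-Newton approximation (a standard simplification, not a step of the paper's algorithm), the equation above becomes $J_k^t J_k s = -J_k^t R_k$. A dependency-free Python sketch for an illustrative 2-variable linear model $R_i(x) = x_1 t_i + x_2 - y_i$, whose Jacobian has rows $(t_i, 1)$:

```python
# Gauss-Newton step: solve (J^tJ) s = -J^tR for a 2-parameter linear model.
# Model and data are illustrative; for linear residuals one step is exact.

def gauss_newton_step(x, data):
    a = b = c = d0 = d1 = 0.0
    for t, y in data:
        r = x[0] * t + x[1] - y           # residual R_i(x)
        a += t * t; b += t; c += 1.0      # J^tJ = [[a, b], [b, c]]
        d0 += t * r; d1 += r              # J^tR = (d0, d1)
    det = a * c - b * b
    return [-(c * d0 - b * d1) / det,     # s = -(J^tJ)^{-1} J^tR
            -(-b * d0 + a * d1) / det]

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # points on y = 2t + 1
s = gauss_newton_step([0.0, 0.0], data)        # s = [2, 1]: x + s fits exactly
```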

The model trust region algorithm generates a sequence of points $x_k$; at the $k$th stage of the iteration, the quadratic model of problem (1.2) has the form
$$\phi_k(s) = (J_k^t R_k)^t s + \frac{1}{2}s^t(J_k^t J_k + Q_k)s. \tag{2.2}$$
At this stage, an initial value for the trust region radius $\Delta_k$ is also available. An inner iteration is then performed, which consists of using the current trust region radius $\Delta_k$ and the information contained in the quadratic model to compute a step $s(\Delta_k)$. Then a comparison of the actual reduction of the objective function
$$\mathrm{ared}_k(\Delta_k) = \|R(x_k)\|_2^2 - \|R(x_k + s_k(\Delta_k))\|_2^2$$
and the reduction predicted by the quadratic model
$$\mathrm{pred}_k(\Delta_k) = \|R(x_k)\|_2^2 - q_k(y)$$
is performed. If there is a satisfactory reduction, then the step can be taken, or a possibly larger trust region used. If not, then the trust region is reduced and the inner iteration is repeated. For now, we leave unspecified which algorithm is used to form the step $s(\Delta)$ and how the trust region radius $\Delta$ is changed. We also leave unspecified the selection of $(J_k^t J_k + Q_k)$, except to restrict it to be symmetric. Details on these points will be addressed in our forthcoming paper.
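The accept/reject logic of this inner iteration can be sketched as follows; the parameter values are illustrative placeholders (the algorithm in Section 3 only requires $\xi_1 < \xi_2$, $\eta_1 \in (0,1)$, and $\eta_2 > 1$):

```python
# Trust region acceptance test based on the ratio ared/pred (sketch).

def update(ared, pred, delta, xi1=0.25, xi2=0.75, eta1=0.5, eta2=2.0):
    rho = ared / pred                 # agreement between model and objective
    if rho < xi1:
        return False, eta1 * delta    # reject the step, shrink the radius
    if rho > xi2:
        return True, eta2 * delta     # accept the step, expand the radius
    return True, delta                # accept the step, keep the radius

accept, delta = update(ared=0.9, pred=1.0, delta=1.0)   # -> (True, 2.0)
```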

The solution $s$ of the quadratic model (2.2) within the trust region is a solution to an equation of the form
$$[(J^t J + Q) + \mu I]s = -J^t R,$$
with $\mu \ge 0$, $\mu(s^t s - \Delta^2) = 0$, and $[(J^t J + Q) + \mu I]$ positive semidefinite. This represents the first-order necessary conditions on the pair $\mu$ and $s$, where $s$ is a solution to model (2.2) and $\mu$ is the Lagrange multiplier associated with the constraint $s^t s \le \Delta^2$. The sufficient conditions that ensure that $s$ is a solution to (2.2) are established in the following lemma, whose proof may be found in the cited references.

Lemma 2.1.

Let $\mu$ and $s \in \mathbb{R}^n$ satisfy $[(J^tJ+Q)+\mu I]s = -J^tR$ with $[(J^tJ+Q)+\mu I]$ positive semidefinite.

(i) If $\mu = 0$ and $\|s\| \le \Delta$, then $s$ solves (2.2).

(ii) If $\|s\| = \Delta$, then $s$ solves $\min\{\phi(s) : \|s\| = \Delta\}$.

(iii) If $\mu \ge 0$ with $\|s\| = \Delta$, then $s$ solves (2.2).

Of course, if $[(J^tJ+Q)+\mu I]$ is positive definite, then $s$ is the unique solution.

The main result used to prove that the sequence of gradients tends to zero for modified Newton methods is
$$f - \phi(s) \ge \frac{1}{2}\|J^tR\|\min\left\{\Delta, \frac{\|J^tR\|}{\|J^tJ+Q\|}\right\},$$
where $s$ is a solution to (2.2). The geometric interpretation of this inequality is that, for a quadratic function, any solution $s$ to (2.2) produces a decrease $f - \phi(s)$ that is at least as large as the decrease along the steepest descent direction $-J^tR$. A proof of this result may be found in the cited references.

3. Algorithmic Framework

The algorithm we discuss here requires that $f$ be twice differentiable at any point $x$ in the domain of $f$. This algorithm involves two levels. In the first level we use the IRA method to reduce the Hessian to a tridiagonal matrix and to construct an orthonormal basis for an invariant subspace of the Hessian matrix. The second level is used to compute the step and update the trust region radius for the reduced local model (a model defined on the subspace). The desired properties of this algorithm are:

the algorithm should be well defined for a sufficiently general class of functions and should be globally convergent;

the algorithm should be invariant under linear affine scalings of the variables; that is, if we replace $f(x)$ by $\hat{f}(y) = f(s + Vy)$, where $V \in \mathbb{R}^{n \times k}$ is an orthonormal matrix, $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^k$, then applying the iteration to $\hat{f}$ with initial guess $y_0$ satisfying $x_0 = s + Vy_0$ should produce a sequence $\{y_l\}$ related to the sequence $\{x_l\}$ by $x_l = Vy_l + s$, where $\{x_l\}$ is produced by applying the algorithm to $f$ with initial guess $x_0$;

the algorithm should provide a decrease that is at least as large as a given multiple of the minimum decrease that would be provided by a quadratic search along the steepest descent direction;

the algorithm should give as good a decrease of the quadratic model as the direction of the negative gradient when the Hessian $(J^tJ+Q)$ is indefinite, and should force the direction to be equal to the Newton direction when $(J^tJ+Q)$ is symmetric positive definite.

The following describes a full iteration of a truncated Newton type method. Some of the previous characteristics will be obvious, and the others will be proved in the next section.

Algorithm 3.1.

Step 0 (initialization).

Given $x_0 \in \mathbb{R}^n$, compute $(J^tR)(x_0)$ and $(J^tJ+Q)(x_0)$.

Choose $\xi_1 < \xi_2$, $\eta_1 \in (0,1)$, $\eta_2 > 1$, $\Delta_1 > 0$, and set $k = 1$.

Step 1 (construction of a basis for the Krylov subspace).

Choose $\epsilon \in (0,1)$ and an initial guess $s_0$, which can be chosen to be zero. Form $r_0 = -(J^tR) - (J^tJ+Q)s_0$, where $J^tR = (J^tR)(x_k)$ and $(J^tJ+Q) = (J^tJ+Q)(x_k)$, and compute $\tau = \|r_0\|$ and $v_1 = r_0/\tau$.

For $j = 1, 2, \ldots$ until convergence, do:

Form $(J^tJ+Q)v_j$ and orthogonalize it against the previous vectors $v_1, v_2, \ldots, v_j$:

$h(i,j) = ((J^tJ+Q)v_j, v_i)$, $i = 1, 2, \ldots, j$,

$\hat{v}_{j+1} = (J^tJ+Q)v_j - \sum_{i=1}^{j} h(i,j)v_i$,

$h(j+1,j) = \|\hat{v}_{j+1}\|$,

$v_{j+1} = \hat{v}_{j+1}/h(j+1,j)$.

Compute the residual norm $\beta_j = \|J^tR + (J^tJ+Q)s_j\|$ of the solution $s_j$ that would be obtained if we stopped at this point.

If $\beta_j \le \epsilon_j\|J^tR\|$, where $\epsilon_j \in (0, \epsilon/\mathcal{D}_j]$ and $\mathcal{D}_j$ is an approximation to the condition number of $(J^tJ+Q)_j$, then set $k_j = j$ and go to Step 2; else continue.
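The orthogonalization loop of Step 1 can be sketched in dependency-free Python as follows; here `A` stands for a small, dense, illustrative matrix playing the role of $J^tJ+Q$, although the algorithm itself only needs matrix-vector products with it:

```python
# Arnoldi process (sketch of Step 1): build an orthonormal Krylov basis
# v_1, ..., v_{k+1} and the Hessenberg entries h(i,j) for a matrix A.

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def arnoldi(A, r0, k):
    norm = lambda v: sum(x * x for x in v) ** 0.5
    V = [[x / norm(r0) for x in r0]]              # v_1 = r_0 / ||r_0||
    H = [[0.0] * k for _ in range(k + 1)]         # Hessenberg entries h(i,j)
    for j in range(k):
        w = matvec(A, V[j])                       # form A v_j
        for i in range(j + 1):                    # orthogonalize against v_1..v_j
            H[i][j] = sum(wx * vx for wx, vx in zip(w, V[i]))
            w = [wx - H[i][j] * vx for wx, vx in zip(w, V[i])]
        H[j + 1][j] = norm(w)                     # h(j+1, j) = ||w_hat||
        if H[j + 1][j] < 1e-14:
            break                                 # invariant subspace found
        V.append([x / H[j + 1][j] for x in w])    # v_{j+1}
    return V, H

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
V, H = arnoldi(A, [1.0, 0.0, 0.0], 2)
```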

Step 2 (compute the trial step).

Construct the local model
$$\psi_k(y) = f_k + h_k^t y + \frac{1}{2}y^t H_k y,$$
where $h_k = V_k^t(J^tR)_k \in \mathbb{R}^{k_j}$ and $H_k = V_k^t(J^tJ+Q)_k V_k \in \mathbb{R}^{k_j \times k_j}$.

Compute the solution $y$ of the problem $\min\{\psi_k(y) : \|y\| \le \Delta_k\}$ and set $s_k = V_k y$.

Step 3 (test the step and update Δk).

(i) Evaluate $f_{k+1} = f(x_k + V_k y)$, $\mathrm{ared}_k = f_k - f(x_k + V_k y)$, and $\mathrm{pred}_k = f_k - \psi_k(y)$.

(ii) If $\mathrm{ared}_k/\mathrm{pred}_k < \xi_1$, then set $\Delta_k = \eta_1\Delta_k$ and return to Step 2. (Comment: do not accept the step $y$; reduce the trust region radius $\Delta_k$ and compute a new step.)

Else if $\xi_1 \le \mathrm{ared}_k/\mathrm{pred}_k \le \xi_2$, then set $x_{k+1} = x_k + V_k y$. (Comment: accept the step and keep the previous trust region radius.)

Else ($\mathrm{ared}_k/\mathrm{pred}_k > \xi_2$), set $x_{k+1} = x_k + V_k y$ and $\Delta_k = \eta_2\Delta_k$. (Comment: accept the step and increase the trust region radius.)

End if.

Step 4.

Set $k \leftarrow k + 1$ and go to Step 1.

4. The Restarting Mechanism

In this section, we discuss the important role of the restarting mechanism in guarding against singularity of the Hessian matrix, and we introduce the assumptions under which we prove global convergence.

Let $\{x_k\}$ be the sequence of iterates generated by Algorithm 3.1; for such a sequence we make the following global assumptions:

(G1) for all $k$, $x_k$ and $x_k + s_k \in \Omega$, where $\Omega \subseteq \mathbb{R}^n$ is a convex set;

(G2) $f \in C^2(\Omega)$ and $(J^tR) = \nabla f(x) \ne 0$ for all $x \in \Omega$;

(G3) $\nabla^2 f(x)$ is bounded in norm in $\Omega$ and $(J^tJ+Q)$ is nonsingular for all $x \in \Omega$;

(G4) for all $s_k$ such that $x_k + s_k \in \Omega$, the termination condition $\|(J^tR)_k + (J^tJ+Q)s_k\| \le \epsilon_k\|(J^tR)_k\|$, $\epsilon_k \in (0,1)$, is satisfied.

An immediate consequence of the global assumptions is that the Hessian matrix $(J^tJ+Q)(x)$ is bounded; that is, there exists a constant $\beta_1 > 0$ such that $\|(J^tJ+Q)(x)\| \le \beta_1$. Therefore, the condition numbers of the Hessian, $\mathcal{D}_k = \mathrm{cond}_2((J^tJ+Q)_k)$, are bounded.

Assumption G3, that $(J^tJ+Q)$ is nonsingular, is necessary since the IRA iteration is only guaranteed to converge for nonsingular systems. We first discuss the two ways in which the IRA method can break down. This can happen if either $\hat{v}_{k+1} = 0$, so that $v_{k+1}$ cannot be formed, or $H_{k_{\max}}$ is singular, which means that the maximum number of IRA steps has been taken but the final iterate cannot be formed. The first situation has often been referred to as a soft breakdown, since $\hat{v}_{k+1} = 0$ implies that $(J^tJ+Q)_k$ is nonsingular and $x_k$ is the solution. The second case is more serious in that it causes a convergence failure of the Arnoldi process. One possible recourse is to hope that $(J^tJ+Q)_k$ is nonsingular for some $k$ among $1, 2, \ldots, k_{\max}-1$. If such a $k$ exists, we can compute $x_k$ and then restart the algorithm using $x_k$ as the new initial guess $x_0$. It may not be possible to do this.

To handle the problem of a singular $H_k$, we can use a technique similar to the one used in the full dimensional space. If $H_k$ is singular, or its condition number is greater than $1/\nu$, where $\nu$ is the machine epsilon, then we perturb the quadratic model $\psi(y)$ to
$$\hat{\psi}(y) = \psi(y) + \frac{1}{2}\gamma y^t y = f + h^t y + \frac{1}{2}y^t(H_k + \gamma I)y,$$
where $\gamma = n\nu\|H_k\|_1$. The condition number of $(H_k + \gamma I)$ is then roughly $1/\nu$. For more details about this technique, see the cited references.
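The perturbation above can be sketched as follows; this is my own minimal rendering, using the 1-norm (maximum absolute column sum) for $\|H_k\|_1$:

```python
# Shift a (near-)singular reduced Hessian H_k to H_k + gamma*I with
# gamma = n * nu * ||H_k||_1, where nu is the machine epsilon (sketch).

import sys

def regularize(H):
    n = len(H)
    nu = sys.float_info.epsilon                              # machine epsilon
    norm1 = max(sum(abs(row[j]) for row in H) for j in range(n))
    gamma = n * nu * norm1                                   # gamma = n*nu*||H||_1
    return [[H[i][j] + (gamma if i == j else 0.0) for j in range(n)]
            for i in range(n)]

H_shift = regularize([[1.0, 1.0], [1.0, 1.0]])               # singular H_k
```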

We have discussed the possibility of stagnation in the linear IRA method, which results in a breakdown of the nonlinear iteration. Sufficient conditions under which stagnation of the linear iteration never occurs are the following:

we have to ensure that the steepest descent direction belongs to the subspace $\mathcal{S}_k$; indeed, in Algorithm 3.1 the Krylov subspace contains the steepest descent direction, because $v_1 = -J^tR/\|J^tR\|$ and the Hessian is nonsingular;

there is no difficulty if one requires the Hessian matrix $(J^tJ+Q)(x)$ to be positive definite for all $x \in \Omega$: then the termination condition can be satisfied, and convergence of the sequence of iterates follows. This will be the case for some problems, but requiring $(J^tJ+Q)(x)$ to be positive definite everywhere is very restrictive.

One of the main restrictions of most Newton-Krylov schemes is that the subspace onto which a given Newton step is projected must solve the Newton equations with a certain accuracy, which is monitored by the termination condition (assumption G4). This condition is enough to essentially guarantee convergence of the trust region algorithm. In practice, the main difficulty is that one does not know in advance whether the subspace chosen for projection will be good enough to guarantee this condition. Thus, $k_j$ can be chosen large enough that the termination condition will eventually be satisfied; but when $k_j$ is too large, the computational cost and the storage become too high. An alternative is to restart the algorithm, keeping $(J^tJ+Q)$ nonsingular and $k_j < n$. Moreover, preconditioning and scaling are essential for the successful application of these schemes.

5. Global Convergence Analysis

In this section, we establish some convergence properties possessed by Algorithm 3.1. The major differences between the proposed results and the preexisting ones arise from the fact that a lower dimensional quadratic model is used, rather than the full dimensional model used in the preexisting results.

Lemma 5.1.

Let the global assumptions hold. Then for any $s \in \Omega$ one has
$$\frac{|(J^tR)_k^t s|}{\|s\|} \ge \frac{\mathcal{D}_k - \epsilon_k\|(J^tJ+Q)_k\|}{(1+\epsilon_k)\mathcal{D}_k}\|(J^tR)_k\| > 0. \tag{5.1}$$

Proof.

Let $r_k$ be the residual associated with $s$, so that $r_k = (J^tR)_k + (J^tJ+Q)_k s$ and $(J^tR)_k \ne 0$. Then $\|r_k\| \le \epsilon_k\|(J^tR)_k\|$ and $s = (J^tJ+Q)_k^{-1}(r_k - (J^tR)_k)$. So
$$(J^tR)_k^t s = (J^tR)_k^t\left[(J^tJ+Q)_k^{-1}(r_k - (J^tR)_k)\right] = -(J^tR)_k^t(J^tJ+Q)_k^{-1}(J^tR)_k + (J^tR)_k^t r_k.$$
Hence,
$$\frac{|(J^tR)_k^t s|}{\|s\|} = \frac{|(J^tR)_k^t(J^tJ+Q)_k^{-1}(J^tR)_k - (J^tR)_k^t r_k|}{\|(J^tJ+Q)_k^{-1}((J^tR)_k - r_k)\|} \ge \frac{(J^tR)_k^t(J^tJ+Q)_k^{-1}(J^tR)_k - |(J^tR)_k^t r_k|}{\|(J^tJ+Q)_k^{-1}((J^tR)_k - r_k)\|}. \tag{5.3}$$
Next, $\|r_k\| \le \epsilon_k\|(J^tR)_k\|$ implies $|(J^tR)_k^t r_k| \le \epsilon_k\|(J^tR)_k\|^2$, which gives
$$(J^tR)_k^t(J^tJ+Q)_k^{-1}(J^tR)_k - |(J^tR)_k^t r_k| \ge \left(\|(J^tJ+Q)_k^{-1}\| - \epsilon_k\right)\|(J^tR)_k\|^2, \qquad \|(J^tJ+Q)_k^{-1}((J^tR)_k - r_k)\| \le (1+\epsilon_k)\|(J^tJ+Q)_k^{-1}\|\,\|(J^tR)_k\|. \tag{5.4}$$
Introducing (5.4) into (5.3), we obtain
$$\frac{|(J^tR)_k^t s|}{\|s\|} \ge \frac{\left(\|(J^tJ+Q)_k^{-1}\| - \epsilon_k\right)\|(J^tR)_k\|^2}{(1+\epsilon_k)\|(J^tJ+Q)_k^{-1}\|\,\|(J^tR)_k\|} = \left(\frac{\mathcal{D}_k - \epsilon_k\|(J^tJ+Q)_k\|}{(1+\epsilon_k)\mathcal{D}_k}\right)\|(J^tR)_k\|.$$

Inequality (5.1) can be written as
$$\frac{|(J^tR)_k^t s|}{\|s\|\,\|(J^tR)_k\|} \ge \frac{\mathcal{D}_k - \epsilon_k\|(J^tJ+Q)_k\|}{(1+\epsilon_k)\mathcal{D}_k}. \tag{5.6}$$
Condition (5.6) tells us that at each iteration of Algorithm 3.1 we require the termination condition to hold. Since the condition numbers $\mathcal{D}_k$ are bounded from above, condition (5.6) gives
$$\cos\theta \ge \frac{\mathcal{D} - \epsilon\|(J^tJ+Q)_k\|}{(1+\epsilon)\mathcal{D}}, \tag{5.7}$$
where $\theta$ is the angle between $(J^tR)_k$ and $s$, and $\mathcal{D}$ is the upper bound of the condition numbers $\mathcal{D}_k$. Inequality (5.7) shows that the acute angle $\theta$ is bounded away from $\pi/2$.

The following lemma shows that the termination condition in assumption G4 implies that the cosine of the angle between the gradient and the Krylov subspace is bounded below.

Lemma 5.2.

Let the global assumptions hold. Then
$$\|V_k^t(J^tR)_k\| \ge \frac{\mathcal{D}_k - \epsilon_k\|(J^tJ+Q)_k\|}{(1+\epsilon_k)\mathcal{D}_k}\|(J^tR)_k\|.$$

Proof.

Let $s = V_k y$ be a vector of $\mathcal{S}_k$ satisfying the termination condition $\|(J^tR)_k + (J^tJ+Q)_k s\| \le \epsilon_k\|(J^tR)_k\|$, $\epsilon_k \in (0,1)$. From Lemma 5.1 and the fact that $\|s\| = \|V_k y\| = \|y\|$, we obtain
$$\frac{|(J^tR)_k^t V_k y|}{\|y\|} \ge \frac{\mathcal{D}_k - \epsilon_k\|(J^tJ+Q)_k\|}{(1+\epsilon_k)\mathcal{D}_k}\|(J^tR)_k\|. \tag{5.9}$$
From the Cauchy-Schwarz inequality, we obtain
$$|(J^tR)_k^t V_k y| \le \|V_k^t(J^tR)_k\|\,\|y\|. \tag{5.10}$$
Combining (5.9) and (5.10), we obtain the result.

The following lemma establishes that Algorithm 3.1 achieves a decrease in the quadratic model on the lower dimensional space at least proportional to the decrease in the quadratic model on the full dimensional space.

Lemma 5.3.

Let the global assumptions hold and let $y$ be a solution of the minimization problem $\min_{\|y\| \le \Delta_k}\psi(y)$. Then
$$\|h_k\|\min\left\{\Delta_k, \frac{\|h_k\|}{\|H_k\|}\right\} \ge \alpha_k\|(J^tR)_k\|\min\left\{\Delta_k, \frac{\alpha_k\|(J^tR)_k\|}{\|(J^tJ+Q)_k\|}\right\},$$
where $\alpha_k = \left(\mathcal{D}_k - \epsilon_k\|(J^tJ+Q)_k\|\right)/\left((1+\epsilon_k)\mathcal{D}_k\right)$.

Proof.

Since $\psi(y) = \phi(V_k y) = f_k + (J^tR)_k^t V_k y + \frac{1}{2}(V_k y)^t(J^tJ+Q)_k(V_k y)$, we have $\nabla\psi(0) = V_k^t(J^tR)_k$. From Step 3 of Algorithm 3.1, we obtain
$$f_k - f_{k+1} \ge \xi_1\left(f_k - \phi_k(y)\right) \ge \frac{\xi_1}{2}\|V_k^t(J^tR)_k\|\min\left\{\Delta_k, \frac{\|V_k^t(J^tR)_k\|}{\|V_k^t(J^tJ+Q)_kV_k\|}\right\} = \frac{\xi_1}{2}\|h_k\|\min\left\{\Delta_k, \frac{\|h_k\|}{\|H_k\|}\right\}. \tag{5.14}$$
We have to convert the lower bound of this inequality to one involving $\|(J^tR)_k\|$ rather than $\|V_k^t(J^tR)_k\|$. Using Lemma 5.2, we obtain $\|h_k\| \ge \alpha_k\|(J^tR)_k\|$; moreover, $\|H_k\| = \|V_k^t(J^tJ+Q)_kV_k\| \le \|(J^tJ+Q)_k\|$. Substituting in (5.14), we obtain
$$\frac{\|h_k\|}{2}\min\left\{\Delta_k, \frac{\|h_k\|}{\|H_k\|}\right\} \ge \frac{\alpha_k}{2}\|(J^tR)_k\|\min\left\{\Delta_k, \frac{\alpha_k\|(J^tR)_k\|}{\|(J^tJ+Q)_k\|}\right\},$$
which completes the proof.

The following two facts will be used in the remainder of the analysis.

Fact 1.

By Taylor's theorem, for any $k$ and $\Delta > 0$, we have
$$|\mathrm{ared}_k(\Delta) - \mathrm{pred}_k(\Delta)| = \left|(f_k - f_{k+1}) + \left(h_k^t y_k(\Delta) + \frac{1}{2}y_k^t(\Delta)V_k^t(J^tJ+Q)_kV_ky_k(\Delta)\right)\right| = \left|\frac{1}{2}y_k^t(\Delta)H_ky_k(\Delta) - \int_0^1 y_k^t(\Delta)H_k(\eta)y_k(\Delta)(1-\eta)\,d\eta\right| \le \|y_k(\Delta)\|^2\int_0^1\|H_k - H_k(\eta)\|(1-\eta)\,d\eta,$$
where $H_k(\eta) = V_k^t(J^tJ+Q)(x_k + \eta s_k)V_k$ with $s_k = x_{k+1} - x_k = V_ky_k$. Thus,
$$\left|\frac{\mathrm{ared}_k(\Delta)}{\mathrm{pred}_k(\Delta)} - 1\right| \le \frac{\|y_k(\Delta)\|^2}{|\mathrm{pred}_k(\Delta)|}\int_0^1\|H_k - H_k(\eta)\|(1-\eta)\,d\eta. \tag{5.17}$$

Fact 2.

For any sequence $\{x_k\}$ generated by an algorithm satisfying the global assumptions, the related sequence $\{f_k\}$ is monotonically decreasing and bounded from below; that is, $f_k \to f_*$ as $k \to \infty$.

The next result establishes that every limit point of the sequence $\{x_k\}$ satisfies the first-order necessary conditions for a minimum.

Lemma 5.4.

Let the global assumptions hold and let Algorithm 3.1 be applied to $f(x)$, generating a sequence $\{x_k\}$, $x_k \in \Omega$, $k = 1, 2, \ldots$. Then $h_k \to 0$.

Proof.

Since $\epsilon_k \le \epsilon < 1$ for all $k$, we have
$$\alpha_k \ge \frac{\mathcal{D} - \epsilon\|(J^tJ+Q)_k\|}{(1+\epsilon)\mathcal{D}} = \alpha > 0 \quad \text{for all } k.$$
Consider any $x_l$ with $(J^tR)_l \ne 0$. Since $\|(J^tR)(x) - (J^tR)_l\| \le \beta_1\|x - x_l\|$, if $\|x - x_l\| \le \|(J^tR)_l\|/(2\beta_1)$, then $\|(J^tR)(x)\| \ge \|(J^tR)_l\| - \|(J^tR)(x) - (J^tR)_l\| \ge \|(J^tR)_l\|/2$. Let $\rho = \|(J^tR)_l\|/(2\beta_1)$ and $\mathcal{B}_\rho = \{x : \|x - x_l\| < \rho\}$. There are two cases: either $x_k \in \mathcal{B}_\rho$ for all $k \ge l$, or eventually $\{x_k\}$ leaves the ball $\mathcal{B}_\rho$. Suppose $x_k \in \mathcal{B}_\rho$ for all $k \ge l$; then $\|(J^tR)_k\| \ge \|(J^tR)_l\|/2 = \bar\sigma$ for all $k > l$. Thus, by Lemma 5.3, we have
$$\mathrm{pred}_k(\Delta) \ge \frac{\xi_1\alpha\bar\sigma}{2}\min\left\{\Delta_k, \frac{\alpha\bar\sigma}{\|H_k\|}\right\} \quad \text{for all } k \ge l. \tag{5.19}$$
Put $\alpha\bar\sigma = \sigma$, introduce inequality (5.19) into (5.17), and use $\|H_k\| \le \|(J^tJ+Q)_k\| \le \beta_1$ to obtain
$$\left|\frac{\mathrm{ared}_k(\Delta)}{\mathrm{pred}_k(\Delta)} - 1\right| \le \frac{\|y_k\|^2\int_0^1\|H_k - H_k(\eta)\|(1-\eta)\,d\eta}{\xi_1\sigma\min\{\Delta_k, \sigma/\beta_1\}} \le \frac{2\beta_1\Delta_k^2}{\xi_1\sigma\min\{\Delta_k, \sigma/\beta_1\}} \le \frac{2\beta_1\Delta_k}{\xi_1\sigma} \quad \text{for } \Delta_k \le \frac{\sigma}{\beta_1},\ k \ge l.$$
This gives, for $\Delta_k$ sufficiently small and $k \ge l$, that
$$\frac{\mathrm{ared}_k(\Delta)}{\mathrm{pred}_k(\Delta)} > \xi_2. \tag{5.21}$$
In addition, we have
$$\|(J^tJ+Q)_k^{-1}h_k\| \ge \frac{\|h_k\|}{\|H_k\|} \ge \frac{\alpha\|(J^tR)_k\|}{\beta_1} \ge \frac{\sigma}{\beta_1}. \tag{5.22}$$
Inequalities (5.21) and (5.22) tell us that, for $\Delta_k$ sufficiently small, we cannot get a decrease in $\Delta_k$ in Step 3(ii) of Algorithm 3.1. It follows that $\Delta_k$ is bounded away from 0; but since
$$f_k - f_{k+1} = \mathrm{ared}_k(\Delta) \ge \xi_1\mathrm{pred}_k(\Delta) \ge \xi_1\sigma\min\left\{\Delta_k, \frac{\sigma}{\beta_1}\right\}, \tag{5.23}$$
and since $f$ is bounded from below, we must have $\Delta_k \to 0$, which is a contradiction. Therefore, $\{x_k\}$ must be outside $\mathcal{B}_\rho$ for some $k > l$.

Let $x_{j+1}$ be the first term after $x_l$ that does not belong to $\mathcal{B}_\rho$. From inequality (5.23), we get
$$f(x_l) - f(x_{j+1}) = \sum_{k=l}^{j}(f_k - f_{k+1}) \ge \sum_{k=l}^{j}\xi_1\mathrm{pred}_k(\Delta) \ge \sum_{k=l}^{j}\xi_1\sigma\min\left\{\Delta_k, \frac{\sigma}{\beta_1}\right\}.$$
If $\Delta_k \le \sigma/\beta_1$ for all $l \le k \le j$, we have
$$f(x_l) - f(x_{j+1}) \ge \xi_1\sigma\sum_{k=l}^{j}\Delta_k \ge \xi_1\sigma\rho.$$
If there exists at least one $k$ with $\Delta_k > \sigma/\beta_1$, we get
$$f(x_l) - f(x_{j+1}) \ge \frac{\xi_1\sigma^2}{\beta_1}.$$
In either case, we have
$$f(x_l) - f(x_{j+1}) \ge \xi_1\sigma\min\left\{\rho, \frac{\sigma}{\beta_1}\right\} = \xi_1\frac{\alpha\|(J^tR)_l\|}{2}\min\left\{\frac{\|(J^tR)_l\|}{2\beta_1}, \frac{\alpha\|(J^tR)_l\|}{2\beta_1}\right\} \ge \frac{\xi_1\alpha^2}{4\beta_1}\|(J^tR)_l\|^2.$$
Since $f$ is bounded below and $\{f_k\}$ is monotonically decreasing (i.e., $f_k \downarrow f_*$), we obtain
$$\|(J^tR)_l\|^2 \le \frac{4\beta_1}{\xi_1\alpha^2}(f_l - f_*),$$
that is, $\|(J^tR)_k\| \to 0$ as $k \to \infty$. The proof is completed by using Lemma 5.2.

The following lemma proves that, under the global assumptions, if each member of the sequence of iterates generated by Algorithm 3.1 satisfies the termination condition of the algorithm, then this sequence converges to one of its limit points.

Lemma 5.5.

Let the global assumptions hold, let $H_k = V_k^t(J^tJ+Q)(x_k)V_k$ for all $k$, let $(J^tJ+Q)(x)$ be Lipschitz continuous with constant $L$, and let $x_*$ be a limit point of the sequence $\{x_k\}$ with $(J^tJ+Q)(x_*)$ positive definite. Then $\{x_k\}$ converges to $x_*$.

Proof.

Let $\{x_{k_m}\}$ be a subsequence of the sequence $\{x_k\}$ that converges to $x_*$. From Lemma 5.4, $(J^tR)(x_*) = 0$. Since $(J^tJ+Q)(x_*)$ is positive definite and $(J^tJ+Q)(x)$ is continuous, there exists $\delta_1 > 0$ such that if $\|x - x_*\| < \delta_1$, then $(J^tJ+Q)(x)$ is positive definite, and if $x \ne x_*$ then $(J^tR)(x) \ne 0$. Let $B_1 = \{x : \|x - x_*\| < \delta_1\}$.

Since $(J^tR)(x_*) = 0$, we can find $\delta_2 > 0$ with $\delta_2 < \delta_1/4$ and $\|(J^tJ+Q)^{-1}(J^tR)\| \le \delta_1/2$ for all $x \in B_2 = \{x : \|x - x_*\| < \delta_2\}$.

Find $l_0$ such that $f(x_{l_0}) < \inf\{f(x) : x \in B_1\setminus B_2\}$ and $x_{l_0} \in B_2$. Consider any $x_i$ with $i \ge l_0$ and $x_i \in B_2$.

We claim that $x_{i+1} \in B_2$, which implies that the entire sequence beyond $x_{l_0}$ stays in $B_2$. Suppose $x_{i+1} \notin B_2$. Since $f_{i+1} < f_{l_0}$, $x_{i+1} \notin B_1$ either. So
$$\Delta_i \ge \|x_{i+1} - x_i\| \ge \|x_{i+1} - x_*\| - \|x_i - x_*\| \ge \delta_1 - \frac{\delta_1}{4} = \frac{3\delta_1}{4} > \frac{\delta_1}{2} \ge \|(J^tJ+Q)_i^{-1}(J^tR)_i\| \ge \|H_i^{-1}h_i\|.$$
So the truncated-Newton step is within the trust region, and we obtain $y_i(\Delta_i) = -(V_i^t(J^tJ+Q)_iV_i)^{-1}V_i^t(J^tR)_i$. Since $\|y_i(\Delta_i)\| \le \delta_1/2$, it follows that $x_{i+1} \in B_1$, which is a contradiction. Hence, for all $k \ge l_0$, $x_k \in B_2$; and since $\{f(x_k)\}$ is a strictly decreasing sequence and $x_*$ is the unique minimizer of $f$ in $B_2$, we have $\{x_k\} \to x_*$.

The following lemma establishes that the rate of convergence of the sequence of iterates generated by Algorithm 3.1 is q-superlinear.

Lemma 5.6.

Let the assumptions of Lemma 5.5 hold. Then the rate of convergence of the sequence $\{x_k\}$ is q-superlinear.

Proof.

To show that the convergence rate is superlinear, we will show that eventually $\|H_k^{-1}h_k\|$ is always less than $\Delta_k$, and hence the truncated-Newton step is always taken. Since $(J^tJ+Q)(x_*)$ is positive definite, it then follows that the convergence rate of $\{x_k\}$ to $x_*$ is superlinear.

To show that eventually the truncated-Newton step is always shorter than the trust region radius, we need a particular lower bound on $\mathrm{pred}_k(\Delta)$. By the assumptions, for all $k$ large enough, $H_k$ is positive definite. Hence, either the truncated-Newton step is longer than the trust region radius, or $y_k(\Delta)$ is the truncated-Newton step. In either case,
$$\|y_k(\Delta)\| \le \|(V_k^t(J^tJ+Q)_kV_k)^{-1}V_k^t(J^tR)_k\|,$$
and so it follows that $\|V_k^t(J^tR)_k\| \ge \|y_k(\Delta)\|/\|(V_k^t(J^tJ+Q)_kV_k)^{-1}\|$. By Lemma 5.3, for all $k$ large enough, we have
$$\mathrm{pred}_k(\Delta) \ge \frac{\xi_1\alpha}{2}\frac{\|y_k(\Delta)\|}{\|(J^tJ+Q)_k^{-1}\|}\min\left\{\Delta_k, \frac{\alpha\|(J^tR)_k\|}{\|(J^tJ+Q)_k\|}\right\}.$$
Thus,
$$\mathrm{pred}_k(\Delta) \ge \frac{\xi_1\alpha}{2}\frac{\|y_k(\Delta)\|}{\|(J^tJ+Q)_k^{-1}\|}\min\left\{\|y_k(\Delta)\|, \frac{\alpha\|y_k(\Delta)\|}{\|(J^tJ+Q)_k\|\,\|(J^tJ+Q)_k^{-1}\|}\right\} \ge \frac{\xi_1\alpha^2}{2}\frac{\|y_k(\Delta)\|^2}{\mathcal{D}\|(J^tJ+Q)_k^{-1}\|}.$$
So, by the continuity of $(J^tJ+Q)(x)$, for all $k$ large enough, we obtain
$$\mathrm{pred}_k(\Delta) \ge \frac{\xi_1\alpha^2}{4}\frac{\|y_k(\Delta)\|^2}{\|(J^tJ+Q)^{-1}(x_*)\|}.$$
Finally, by the argument leading up to Fact 1 and the Lipschitz continuity assumption, we have
$$|\mathrm{ared}_k(\Delta) - \mathrm{pred}_k(\Delta)| \le \frac{\|y_k(\Delta)\|^3L}{2}.$$
Thus, for any $\Delta_k > 0$ and $k$ large enough, we obtain
$$\left|\frac{\mathrm{ared}_k(\Delta)}{\mathrm{pred}_k(\Delta)} - 1\right| \le \frac{\|y_k(\Delta)\|^3L}{2}\cdot\frac{4\|(J^tJ+Q)^{-1}(x_*)\|}{\xi_1\alpha^2\|y_k(\Delta)\|^2} = \frac{2L}{\xi_1\alpha^2}\|(J^tJ+Q)^{-1}(x_*)\|\,\|y_k(\Delta)\| \le \frac{2L\|(J^tJ+Q)^{-1}(x_*)\|}{\xi_1\alpha^2}\Delta_k.$$
Thus, by Step 3(ii) of Algorithm 3.1, there is a $\bar\Delta$ such that if $\Delta_{k-1} < \bar\Delta$, then $\Delta_k$ will be less than $\Delta_{k-1}$ only if $\Delta_k \ge \|(V_{k-1}^t(J^tJ+Q)_{k-1}V_{k-1})^{-1}V_{k-1}^t(J^tR)_{k-1}\|$. It follows from the superlinear convergence of the truncated-Newton method that, for $x_{k-1}$ close enough to $x_*$ and $k$ large enough,
$$\|(V_k^t(J^tJ+Q)_kV_k)^{-1}V_k^t(J^tR)_k\| < \|(V_{k-1}^t(J^tJ+Q)_{k-1}V_{k-1})^{-1}V_{k-1}^t(J^tR)_{k-1}\|.$$
Now, if $\Delta_k$ is bounded away from 0 for all large $k$, then we are done. Otherwise, if for an arbitrarily large $k$, $\Delta_k$ is reduced, that is, $\Delta_k < \Delta_{k-1}$, then we have
$$\Delta_k \ge \|(V_{k-1}^t(J^tJ+Q)_{k-1}V_{k-1})^{-1}V_{k-1}^t(J^tR)_{k-1}\| > \|(V_k^t(J^tJ+Q)_kV_k)^{-1}V_k^t(J^tR)_k\|,$$
and so the full truncated-Newton step is taken. Inductively, this occurs for all subsequent iterates, and superlinear convergence follows.

The next result shows that, under the global assumptions, every limit point of the sequence $\{x_k\}$ satisfies the second-order necessary conditions for a minimum.

Lemma 5.7.

Let the global assumptions hold, and assume $\mathrm{pred}_k(\Delta) \ge \bar\alpha_2(-\lambda_1(H))\Delta^2$ for every symmetric matrix $H \in \mathbb{R}^{k_j \times k_j}$, $h \in \mathbb{R}^{k_j}$, $\Delta > 0$, and some $\bar\alpha_2 > 0$. If $\{x_k\} \to x_*$, then $H(x_*)$ is positive semidefinite, where $\lambda_1(H)$ denotes the smallest eigenvalue of the matrix $H$.

Proof.

Suppose to the contrary that $\lambda_1(H(x_*)) < 0$. There exists $K$ such that if $k \ge K$, then $\lambda_1(H_k) < \frac{1}{2}\lambda_1(H(x_*))$. For all $k \ge K$ and all $\Delta_k > 0$, we obtain
$$\mathrm{pred}_k(\Delta) \ge \bar\alpha_2(-\lambda_1(H_k))\Delta_k^2 \ge \frac{\bar\alpha_2}{2}(-\lambda_1(H(x_*)))\Delta_k^2.$$
Using inequality (5.17), we obtain
$$\left|\frac{\mathrm{ared}_k(\Delta)}{\mathrm{pred}_k(\Delta)} - 1\right| \le \frac{\|y_k(\Delta)\|^2}{\mathrm{pred}_k(\Delta)}\int_0^1\|H_k(\eta) - H_k\|(1-\eta)\,d\eta \le \frac{2\|y_k(\Delta)\|^2}{\Delta_k^2}\cdot\frac{\int_0^1\|H_k(\eta) - H_k\|(1-\eta)\,d\eta}{\bar\alpha_2(-\lambda_1(H(x_*)))}.$$
Since the last quantity goes to zero as $\Delta_k \to 0$, and since a Newton step is never taken for $k > K$, it follows from Step 3(ii) of Algorithm 3.1 that, for $k \ge K$ and $\Delta_k$ sufficiently small, $\Delta_k$ cannot be decreased further. Thus, $\Delta_k$ is bounded below; but
$$\mathrm{ared}_k(\Delta) \ge \xi_1\mathrm{pred}_k(\Delta) \ge \frac{\xi_1\bar\alpha_2}{2}(-\lambda_1(H(x_*)))\Delta_k^2.$$
Since $f$ is bounded below, $\mathrm{ared}_k(\Delta) \to 0$, so $\Delta_k \to 0$, which is a contradiction. Hence $\lambda_1(H(x_*)) \ge 0$. This completes the proof of the lemma.

6. Concluding Remarks

In this paper, we have shown that the implicitly restarted Arnoldi method can be combined with the Newton iteration and a trust region strategy to obtain a globally convergent algorithm for solving large-scale NLS problems.

The main restriction of this scheme is that the subspace onto which a given Newton step is projected must solve the Newton equations with a certain accuracy, which is monitored by the termination condition. This is enough to guarantee convergence.

This theory is sufficiently general that it holds for any algorithm that projects the problem onto a lower dimensional subspace. The convergence results indicate promise for this approach for solving large-scale NLS problems. Our next step is to investigate the performance of this algorithm on some NLS problems. The results will be reported in our forthcoming paper.

Acknowledgment

The authors would like to thank the editor-in-chief and the anonymous referees for their comments and useful suggestions in the improvement of the final form of this paper.

References

1. G. Golub and V. Pereyra, "Separable nonlinear least squares: the variable projection method and its applications," Inverse Problems, vol. 19, no. 2, pp. R1–R26, 2003.
2. J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, NY, USA, 1970.
3. J. J. Moré, "The Levenberg-Marquardt algorithm: implementation and theory," in Numerical Analysis (Proc. 7th Biennial Conf., Univ. Dundee, 1977), Lecture Notes in Mathematics, vol. 630, pp. 105–116, Springer, Berlin, Germany, 1978.
4. J. E. Dennis Jr. and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall Series in Computational Mathematics, Prentice-Hall, Englewood Cliffs, NJ, USA, 1983.
5. M. Gulliksson, I. Söderkvist, and P.-Å. Wedin, "Algorithms for constrained and weighted nonlinear least squares," SIAM Journal on Optimization, vol. 7, no. 1, pp. 208–224, 1997.
6. M. R. Abdel-Aziz, "Globally convergent algorithm for solving large nonlinear systems of equations," Numerical Functional Analysis and Optimization, vol. 25, no. 3-4, pp. 199–219, 2004.
7. M. J. D. Powell, "Convergence properties of a class of minimization algorithms," in Nonlinear Programming, 2 (O. Mangasarian, R. Meyer, and S. Robinson, eds.), pp. 1–27, Academic Press, New York, NY, USA, 1975.
8. J. J. Moré and D. C. Sorensen, "Computing a trust region step," SIAM Journal on Scientific and Statistical Computing, vol. 4, no. 3, pp. 553–572, 1983.
9. J. J. Moré, "Recent developments in algorithms and software for trust region methods," in Mathematical Programming: The State of the Art (Bonn, 1982) (A. Bachem, M. Grötschel, and B. Korte, eds.), pp. 258–287, Springer, Berlin, Germany, 1983.
10. G. A. Shultz, R. B. Schnabel, and R. H. Byrd, "A family of trust-region-based algorithms for unconstrained minimization with strong global convergence properties," SIAM Journal on Numerical Analysis, vol. 22, no. 1, pp. 47–67, 1985.
11. M. R. Osborne, "Some special nonlinear least squares problems," SIAM Journal on Numerical Analysis, vol. 12, no. 4, pp. 571–592, 1975.
12. M. Xiaofang, F. R. Ying Kit, and X. Chengxian, "Stable factorized quasi-Newton methods for nonlinear least-squares problems," Journal of Computational and Applied Mathematics, vol. 129, no. 1-2, pp. 1–14, 2001.
13. J. E. Dennis Jr., H. J. Martínez, and R. A. Tapia, "Convergence theory for the structured BFGS secant method with an application to nonlinear least squares," Journal of Optimization Theory and Applications, vol. 61, no. 2, pp. 161–178, 1989.
14. P. N. Brown and Y. Saad, "Hybrid Krylov methods for nonlinear systems of equations," SIAM Journal on Scientific and Statistical Computing, vol. 11, no. 3, pp. 450–481, 1990.
15. D. C. Sorensen, "Minimization of a large scale quadratic function subject to an ellipsoidal constraint," Tech. Rep. TR94-27, Department of Computational and Applied Mathematics, Rice University, Houston, Tex, USA, 1994.
16. D. C. Sorensen, "Newton's method with a model trust region modification," SIAM Journal on Numerical Analysis, vol. 19, no. 2, pp. 409–426, 1982.
17. H. Mukai and E. Polak, "A second-order method for unconstrained optimization," Journal of Optimization Theory and Applications, vol. 26, no. 4, pp. 501–513, 1978.
18. Y. Saad, "Krylov subspace methods for solving large unsymmetric linear systems," Mathematics of Computation, vol. 37, no. 155, pp. 105–126, 1981.
19. S. G. Nash, "Preconditioning of truncated-Newton methods," SIAM Journal on Scientific and Statistical Computing, vol. 6, no. 3, pp. 599–616, 1985.