Nonlinear Lagrangian algorithms play an important role in solving constrained optimization problems. It is known that, under appropriate conditions, the sequence generated by the first-order multiplier iteration converges superlinearly. This paper analyzes the second-order multiplier iteration based on a class of nonlinear Lagrangians for solving nonlinear programming problems with inequality constraints. It is shown that the sequence generated by the second-order multiplier iteration converges superlinearly, with order at least two, if in addition the Hessians of the functions involved in the problem are Lipschitz continuous.
1. Introduction
Lagrangians play an important role in solving constrained optimization problems. Hestenes [1] and Powell [2] introduced the proximal augmented Lagrangian for problems with equality constraints, and Rockafellar [3] developed the proximal augmented Lagrangian for problems with both equality and inequality constraints.
Based on the above Lagrangians, Bertsekas [4, 5] discussed the convergence of the sequence generated by the second-order multiplier iteration and, in 1982, further improved the convergence and the convergence rate of the second-order multiplier iteration using Newton's method. Brusch [6] and Fletcher [7] independently proposed second-order multiplier iterations based on quasi-Newton methods, and Bertsekas [8] developed a new quasi-Newton framework in 1982.
Consider the following inequality constrained optimization problem:
(INP)  minimize  f_0(x)
subject to  f_i(x) ≥ 0,  i = 1, …, m,
where f_i : ℝ^n → ℝ, i = 0, …, m, are continuously differentiable functions.
Since nonlinear Lagrangians can be used to develop dual algorithms for nonlinear programming that require no restrictions on primal feasibility, many authors have made important contributions on this topic.
Polyak and Teboulle [9] discussed a class of Lagrange functions of the form
(1) H(x, u, c) = f_0(x) − c ∑_{i=1}^m u_i ψ(c⁻¹ f_i(x))
for solving (INP), where c > 0 is a penalty parameter and ψ is a twice continuously differentiable function. Furthermore, Polyak and Griva [10] proposed a general primal-dual nonlinear rescaling (PDNR) method for convex optimization with inequality constraints, and Griva and Polyak [11] developed a general primal-dual nonlinear rescaling method with a dynamic scaling parameter update. Besides the works by Polyak and his coauthors, Auslender et al. [12] and Ben-Tal and Zibulevsky [13] studied other nonlinear Lagrangians and also obtained interesting convergence results for convex programming problems. Under appropriate conditions, the sequence generated by the first-order multiplier iteration converges superlinearly.
Ren and Zhang [14] analysed the following nonlinear Lagrangians:
(2) H(x, u, k) = f_0(x) − k⁻¹ ∑_{i=1}^m u_i ψ(k f_i(x))
and constructed the dual algorithm based on minimizing H(x,u,k) as follows.
D-Algorithm
Step 1.
Given k > 0 large enough, ε ≥ 0 small enough, u^0 ∈ ℝ^m_{++}, and x^0 ∈ ℝ^n, set s = 0.
Step 2.
Solve (approximately)
(3) minimize H(x, u^s, k)
and obtain its (approximate) solution x^s.
Step 3.
If |u_i^s f_i(x^s)| ≤ ε, i = 1, …, m, stop; otherwise go to Step 4.
Step 4.
Update the multipliers by the first-order iteration
(4) u_i^{s+1} = ψ′(k f_i(x^s)) u_i^s,  i = 1, …, m,
set s := s + 1, and go to Step 2.
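To make the iteration concrete, the following is a minimal Python sketch of the D-Algorithm. It assumes the modifier ψ(t) = log(1 + t) (one admissible choice, checked against conditions (H1)–(H4) below; the choice is ours, not prescribed by the paper), uses scipy.optimize.minimize for the unconstrained subproblem in Step 2, and all problem data and tolerances are placeholders chosen for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def psi(t):
    # One admissible modifier (our illustrative choice): psi(t) = log(1 + t), b = -1.
    return np.log1p(t)

def psi_prime(t):
    return 1.0 / (1.0 + t)

def H(x, u, k, f0, fs):
    # Nonlinear Lagrangian (2): H(x, u, k) = f0(x) - k^{-1} sum_i u_i psi(k f_i(x)).
    t = k * np.array([fi(x) for fi in fs])
    if np.any(t <= -1.0):
        return np.inf          # crude guard: keep iterates inside the domain of psi
    return f0(x) - np.dot(u, psi(t)) / k

def d_algorithm(f0, fs, x0, u0, k=10.0, eps=1e-8, max_iter=100):
    # Step 1: k large, eps small, u0 with positive components.
    x, u = np.asarray(x0, dtype=float), np.asarray(u0, dtype=float)
    for _ in range(max_iter):
        # Step 2: minimize H(., u^s, k), warm-started at the previous x^s.
        x = minimize(H, x, args=(u, k, f0, fs), method="BFGS").x
        fvals = np.array([fi(x) for fi in fs])
        # Step 3: stopping test |u_i^s f_i(x^s)| <= eps.
        if np.all(np.abs(u * fvals) <= eps):
            break
        # Step 4: first-order multiplier update (4).
        u = psi_prime(k * fvals) * u
    return x, u

# Toy instance: minimize x1^2 + x2^2 subject to x1 + x2 - 2 >= 0,
# with solution x* = (1, 1) and multiplier u* = 2.
x, u = d_algorithm(lambda x: x[0]**2 + x[1]**2,
                   [lambda x: x[0] + x[1] - 2.0],
                   x0=[2.0, 2.0], u0=[1.0])
print(x, u)   # should approach x* = (1, 1), u* = 2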
It was shown that, under a set of conditions, the dual algorithm based on this class of Lagrangians is locally convergent when the penalty parameter is larger than a threshold.
In view of the interpretation of the multiplier iteration as a steepest ascent method, it is natural to consider Newton's method for maximizing the dual functional. Using known results for Newton's method, we expect the second-order iteration to yield a vector u^{s+1} that is closer to u* than u^s. This paper discusses the second-order multiplier iteration based on nonlinear Lagrangians of the form (2). It is shown that the sequence generated by the second-order multiplier iteration converges superlinearly, with order at least two, if ∇²f_i(x) (i = 0, …, m) are Lipschitz continuous.
We introduce the following notation to end this section:
(5) ∇f(x) = (∇f_1(x), …, ∇f_m(x)),  ∇f_(r)(x) = (∇f_1(x), …, ∇f_r(x)),
u* = (u_1*, …, u_m*) ∈ ℝ^m,  u_(r)* = (u_1*, …, u_r*) ∈ ℝ^r,  u_(m−r)* = (u_{r+1}*, …, u_m*) ∈ ℝ^{m−r},
S(y, ε) = {x ∈ ℝ^n : ‖x − y‖ ≤ ε},  ‖x‖ = ‖x‖_∞ = max_{1≤i≤n} |x_i|.
2. Preliminaries
Consider the inequality constrained optimization problem (INP). Let
(6) L(x, u) = f_0(x) − ∑_{i=1}^m u_i f_i(x)
denote the Lagrange function for problem (INP), and let I(x) = {i | f_i(x) = 0, i = 1, …, m} denote the active index set at x.
For convenience of reference in the sequel, we list the following assumptions, which will be invoked where needed.
(a) The functions f_i(x) (i = 0, …, m) are twice continuously differentiable.
(b) For convenience of statement, we assume I(x*) = {i | f_i(x*) = 0, i = 1, …, m} = {1, …, r}.
(c) The pair (x*, u*) ∈ ℝ^n × ℝ^m satisfies the Kuhn-Tucker conditions
(7) ∇_x L(x*, u*) = 0,  u* ≥ 0,  u_i* f_i(x*) = 0, i = 1, …, m.
(d) The strict complementarity condition holds; that is,
(8) u_i* > 0 for i ∈ I(x*).
(e) The set of vectors {∇f_i(x*) | i ∈ I(x*)} is linearly independent.
(f) For all y ≠ 0 satisfying ∇f_i(x*)^T y = 0, i ∈ I(x*), the following inequality holds:
(9) y^T ∇_x² L(x*, u*) y > 0.
Let the function ψ in H(x, u, k) defined in (2) and its derivatives satisfy the following conditions (a concrete choice of ψ is checked in the sketch after this list):
(H1) ψ(0) = 0;
(H2) ψ′(t) > 0 for all t ∈ (b, +∞), with −∞ ≤ b < 0, and ψ′(0) = 1;
(H3) ψ″(t) < 0 for all t ∈ (b, +∞);
(H4) kψ′(kt) is bounded on t ∈ (b, +∞) for k > 0 large enough.
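As a quick illustration, the choice ψ(t) = log(1 + t) (with b = −1) satisfies these conditions; the following sympy check sketches this. The choice of ψ here is ours, made for illustration only.

```python
import sympy as sp

t = sp.symbols('t')
psi = sp.log(1 + t)  # candidate modifier with b = -1 (an illustrative choice)

print(psi.subs(t, 0))                      # (H1): psi(0) = 0
print(sp.diff(psi, t))                     # (H2): psi'(t) = 1/(1 + t) > 0 on (-1, +oo)
print(sp.diff(psi, t).subs(t, 0))          #       and psi'(0) = 1
print(sp.simplify(sp.diff(psi, t, 2)))     # (H3): psi''(t) = -1/(1 + t)^2 < 0 on (-1, +oo)
# (H4): k * psi'(k t) = k / (1 + k t) <= 1/t remains bounded for t > 0.
```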
The following proposition concerns properties of H(x,u,k) at a Kuhn-Tucker point (x*,u*).
Proposition 1 (see [14]).
Assume that (a)–(f) and (H1)–(H3) hold. For any k>0 and any Kuhn-Tucker point (x*,u*) the following properties are valid:
(i) H(x*, u*, k) = L(x*, u*) = f_0(x*);
(ii) ∇_x H(x*, u*, k) = ∇_x L(x*, u*) = ∇f_0(x*) − ∑_{i=1}^m u_i* ∇f_i(x*) = 0;
(iii) ∇_x² H(x*, u*, k) = ∇_x² L(x*, u*) − kψ″(0) ∇f_(r)(x*) U* ∇f_(r)(x*)^T, where U* = diag_{1≤i≤r}(u_i*);
(iv) there exist k_0 > 0 and μ > 0 such that, for any k > k_0,
(10) ⟨∇_x² H(x*, u*, k) y, y⟩ ≥ μ⟨y, y⟩,  ∀y ∈ ℝ^n.
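Properties (iii) and (iv) can be checked numerically on a small instance. The sketch below uses the same illustrative assumptions as before (ψ(t) = log(1 + t), so ψ″(0) = −1, and the toy problem min x₁² + x₂² subject to x₁ + x₂ − 2 ≥ 0, whose Kuhn-Tucker point is x* = (1, 1), u* = 2) and compares a finite-difference Hessian of H with the formula in (iii).

```python
import numpy as np

# Toy instance (an assumed illustration): f0(x) = x1^2 + x2^2,
# f1(x) = x1 + x2 - 2, x* = (1, 1), u* = 2, psi(t) = log(1 + t).
k = 10.0
x_star, u_star = np.array([1.0, 1.0]), np.array([2.0])

def H(x):
    f1 = x[0] + x[1] - 2.0
    return x[0]**2 + x[1]**2 - u_star[0] * np.log1p(k * f1) / k

def hessian_fd(fun, x, h=1e-4):
    # Central finite-difference Hessian.
    n = len(x)
    Hm = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = h * np.eye(n)[i], h * np.eye(n)[j]
            Hm[i, j] = (fun(x + ei + ej) - fun(x + ei - ej)
                        - fun(x - ei + ej) + fun(x - ei - ej)) / (4 * h * h)
    return Hm

grad_f1 = np.array([[1.0], [1.0]])            # nabla f_(r)(x*), an n x r matrix
lhs = hessian_fd(H, x_star)                   # nabla_x^2 H(x*, u*, k)
rhs = 2 * np.eye(2) - k * (-1.0) * grad_f1 @ np.diag(u_star) @ grad_f1.T
print(np.allclose(lhs, rhs, atol=1e-3))       # property (iii) with psi''(0) = -1
print(np.all(np.linalg.eigvalsh(lhs) > 0))    # positive definiteness, cf. (iv)
```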
Let δ > 0 be small enough, 0 < ε < min{u_i* | i = 1, …, r}, and let k_0 be large enough to satisfy (iv) of Proposition 1. For any fixed k > k_0, define
(11) U_k^i(ε, δ) = {u_i | max{ε, u_i* − δk} ≤ u_i ≤ u_i* + δk},  i = 1, …, r,
U_k^i(ε, δ) = {u_i | 0 ≤ u_i ≤ δk},  i = r + 1, …, m,
U_k(ε, δ) = U_k^1(ε, δ) × ⋯ × U_k^r(ε, δ) × ⋯ × U_k^m(ε, δ).
For any k1>k0, we denote
(12) D(ε, δ) = {(u, k) | u ∈ U_k(ε, δ), k ∈ [k_0, k_1]}.
Let σ = min{f_i(x*) | r + 1 ≤ i ≤ m} > 0, let I_r be the r × r identity matrix, and let 0_r be the r × r zero matrix.
Theorem 2 (see [14]).
Assume that (a)–(f) and (H1)–(H4) hold. Then there exists k_0 > 0 large enough such that, for any k_1 > k_0, there exist ε_1 > 0 and δ > 0 such that, for any (u, k) ∈ D(ε, δ), the following statements hold.
(i) There exists a vector
(13) x̂ = x̂(u, k) ∈ arg min{H(x, u, k) | x ∈ S(x*, ε_1)}.
(ii) For x̂ in (i) and û = û(u, k) = diag_{1≤i≤m}(ψ′(k f_i(x̂))) u, the following estimate is valid:
(14) max{‖x̂ − x*‖, ‖û − u*‖} ≤ c k⁻¹ ‖u − u*‖,
where c>0 is a scalar independent of k0 and k1.
(iii) The function H(x, u, k) is strongly convex in a neighborhood of x̂.
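The contraction estimate (14) can be observed numerically: on the toy instance used in the earlier sketches (again an illustrative assumption, with ψ(t) = log(1 + t)), one inner minimization followed by the update û = diag(ψ′(k f_i(x̂))) u from several perturbed multipliers gives ratios of order k⁻¹.

```python
import numpy as np
from scipy.optimize import minimize

k = 100.0
x_star, u_star = np.array([1.0, 1.0]), 2.0
f1 = lambda x: x[0] + x[1] - 2.0

def H(x, u):
    t = k * f1(x)
    # crude domain guard for psi(t) = log(1 + t) in this sketch
    return np.inf if t <= -1.0 else x[0]**2 + x[1]**2 - u * np.log1p(t) / k

for u in (3.0, 2.5, 2.1):
    x_hat = minimize(H, x_star, args=(u,), method="BFGS").x
    u_hat = u / (1.0 + k * f1(x_hat))          # u_hat = psi'(k f_1(x_hat)) u
    err = max(np.abs(x_hat - x_star).max(), abs(u_hat - u_star))
    print(err / abs(u - u_star))               # should be O(k^{-1}) by (14)
```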
3. The Second-Order Multiplier Iteration
Based on the nonlinear Lagrange function H(x, u, k), we consider the dual function defined on S(x*, ε_1) × ℝ_+^m as follows:
(15) d_k(u) = inf{H(x, u, k) | x ∈ S(x*, ε_1)} − δ(u | U_k(ε, δ)),
where δ(u | U_k(ε, δ)), equal to 0 if u ∈ U_k(ε, δ) and +∞ if u ∉ U_k(ε, δ), is the indicator function of U_k(ε, δ).
Lemma 3.
Assume that conditions (a)–(f) and (H1)–(H4) hold. Then, for any fixed k ≥ k_0, the function d_k(u) is twice continuously differentiable and concave on U_k(ε, δ).
Proof.
Obviously, for k > 0, the function d_k(u) is concave. In view of Theorem 2, for any (u, k) ∈ D(ε, δ), the function H(x, u, k) is strongly convex in a neighborhood of x̂ = x̂(u, k). Hence x̂(u, k) is the unique minimizer of H(x, u, k) with respect to x in this neighborhood, and d_k(u) = H(x̂(u, k), u, k) is smooth on U_k(ε, δ); that is, the gradient of d_k(u) exists, and
(16) ∇_u d_k(u) = ∇_u x̂(u, k) ∇_x H(x̂(u, k), u, k) + ∇_u H(x̂(u, k), u, k) = (∇_{u_1} d_k(u), …, ∇_{u_m} d_k(u))^T.
For (u, k) ∈ D(ε, δ), the matrix ∇_x² H(x, u, k) is positive definite, and the system ∇_x H(x, u, k) = 0_n defines a unique vector-valued function x̂(u, k) satisfying x̂(u*, k) = x* and
(17) ∇_u x̂(u, k) = −∇_{xu}² H(x̂(u, k), u, k) (∇_x² H(x̂(u, k), u, k))⁻¹,  ∀(u, k) ∈ D(ε, δ).
In view of ∇_x H(x̂(u, k), u, k) = 0_n, we have
(18) ∇_u d_k(u) = ∇_u H(x̂(u, k), u, k) = −k⁻¹(ψ(k f_1(x̂(u, k))), …, ψ(k f_m(x̂(u, k))))^T.
Differentiating the expression for ∇_u H in (18) with respect to x, it follows that
(19) ∇_{ux}² H(x̂(u, k), u, k) = −(ψ′(k f_1(x̂(u, k))) ∇f_1(x̂(u, k)), …, ψ′(k f_m(x̂(u, k))) ∇f_m(x̂(u, k))) = −∇f(x̂(u, k)) ψ′(k f(x̂(u, k))),
where ψ′(k f(x)) denotes the diagonal matrix diag_{1≤i≤m}(ψ′(k f_i(x))),
which means
(20) ∇_{xu}² H(x̂(u, k), u, k) = (∇_{ux}² H(x̂(u, k), u, k))^T = −ψ′(k f(x̂(u, k))) ∇f(x̂(u, k))^T.
Thus,
(21) ∇_u² d_k(u) = ∇_u x̂(u, k) ∇_{ux}² H(x̂(u, k), u, k) = −∇_{xu}² H(x̂(u, k), u, k) (∇_x² H(x̂(u, k), u, k))⁻¹ ∇_{ux}² H(x̂(u, k), u, k) = −ψ′(k f(x̂(u, k))) ∇f(x̂(u, k))^T (∇_x² H(x̂(u, k), u, k))⁻¹ ∇f(x̂(u, k)) ψ′(k f(x̂(u, k))).
So,
(22) ∇_u² d_k(u*) = −ψ′(k f(x*)) ∇f(x*)^T (∇_x² H(x*, u*, k))⁻¹ ∇f(x*) ψ′(k f(x*)).
In summary, letting x̂(u, k) be the minimizer of H(x, u, k) in a neighborhood of x*, we obtain
(23) ∇_u d_k(u) = ∇_u H(x̂(u, k), u, k) = −k⁻¹(ψ(k f_1(x̂(u, k))), …, ψ(k f_m(x̂(u, k))))^T,
∇_u² d_k(u) = −ψ′(k f(x̂(u, k))) ∇f(x̂(u, k))^T (∇_x² H(x̂(u, k), u, k))⁻¹ ∇f(x̂(u, k)) ψ′(k f(x̂(u, k))).
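As a sanity check (our own sketch, on the same illustrative toy instance with ψ(t) = log(1 + t)), the gradient formula in (23) can be compared against a finite-difference derivative of d_k(u) = H(x̂(u, k), u, k):

```python
import numpy as np
from scipy.optimize import minimize

# Finite-difference check of the dual gradient formula (18)/(23); all problem
# data and the choice psi(t) = log(1 + t) are illustrative assumptions.
k = 10.0
f0 = lambda x: x[0]**2 + x[1]**2
f1 = lambda x: x[0] + x[1] - 2.0

def H(x, u):
    t = k * f1(x)
    return np.inf if t <= -1.0 else f0(x) - u[0] * np.log1p(t) / k

def x_hat(u):
    return minimize(H, np.array([1.0, 1.0]), args=(u,),
                    method="BFGS", options={"gtol": 1e-12}).x

def d(u):
    return H(x_hat(u), u)          # d_k(u) = H(x_hat(u, k), u, k)

u, h = np.array([1.5]), 1e-5
fd_grad = (d(u + h) - d(u - h)) / (2 * h)        # numerical gradient of d_k
formula = -np.log1p(k * f1(x_hat(u))) / k        # formula (18)
print(fd_grad, formula)                          # the two should agree closely
```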
In view of the interpretation of the multiplier iteration as a steepest ascent method, it is natural to consider Newton's method for maximizing the dual functional d_k, which is given by
(24) u^{s+1} = u^s − [∇_u² d_k(u^s)]⁻¹ ∇_u d_k(u^s).
In view of (23), this iteration can be written as
(25) u^{s+1} = u^s − B_k⁻¹ k⁻¹ ψ(k f(x(u^s, k))),
where
(26) B_k = ψ′(k f(x(u^s, k))) ∇f(x(u^s, k))^T [∇_x² H(x(u^s, k), u^s, k)]⁻¹ ∇f(x(u^s, k)) ψ′(k f(x(u^s, k))).
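For illustration, here is one step of the iteration (25)-(26) on the same assumed toy instance; since the single constraint is linear, ∇_x²L = 2I and the identity (41) below gives ∇_x²H in closed form. All concrete choices (problem data, ψ(t) = log(1 + t), k = 10) are ours.

```python
import numpy as np
from scipy.optimize import minimize

k = 10.0
f0 = lambda x: x[0]**2 + x[1]**2
f1 = lambda x: x[0] + x[1] - 2.0

def H(x, u):
    t = k * f1(x)
    return np.inf if t <= -1.0 else f0(x) - u[0] * np.log1p(t) / k

def second_order_step(u):
    # x(u^s, k): unconstrained minimizer of H(., u^s, k).
    x = minimize(H, np.array([1.0, 1.0]), args=(u,), method="BFGS").x
    grad_f = np.array([[1.0], [1.0]])              # nabla f(x), constant here
    psi_p = 1.0 / (1.0 + k * f1(x))                # psi'(k f(x))
    # nabla_x^2 H(x, u, k) for this instance: 2I + k u psi'(k f)^2 grad_f grad_f^T.
    hess_H = 2.0 * np.eye(2) + k * u[0] * psi_p**2 * (grad_f @ grad_f.T)
    # B_k of (26):
    B = psi_p * grad_f.T @ np.linalg.solve(hess_H, grad_f) * psi_p
    # Iteration (25): u^{s+1} = u^s - B_k^{-1} k^{-1} psi(k f(x)).
    return u - np.linalg.solve(B, np.array([np.log1p(k * f1(x)) / k]))

u = np.array([3.0])
for _ in range(4):
    u = second_order_step(u)
print(u)   # should approach u* = 2 rapidly
```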
We will provide a convergence and rate of convergence result for iteration (25) and (26).
For k > 0 and (x, u) ∈ ℝ^{n+m}, we define
(27) A_+(x, u) = {i | u_i ψ(k f_i(x)) > 0, i = 1, …, m},
A_−(x, u) = {i | i ∉ A_+(x, u), i = 1, …, m}.
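As a small helper (our own illustrative code; psi and the constraint values f_vals are assumed supplied by the caller), the partition (27) can be computed as follows.

```python
def split_indices(u, f_vals, k, psi):
    # Index partition (27): A_+ collects i with u_i * psi(k f_i(x)) > 0.
    plus = [i for i in range(len(u)) if u[i] * psi(k * f_vals[i]) > 0.0]
    minus = [i for i in range(len(u)) if i not in plus]
    return plus, minus
```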
For a given (x,u), assume (by reordering indices if necessary) that A+(x,u) contains the first r indices where r is an integer with 0≤r≤m. Define
(28) ψ_+(k f(x)) = (ψ(k f_1(x)), …, ψ(k f_r(x)))^T,  ψ_−(k f(x)) = (ψ(k f_{r+1}(x)), …, ψ(k f_m(x)))^T,
u_+ = (u_1, …, u_r)^T,  u_− = (u_{r+1}, …, u_m)^T,
H_+(x, u, k) = f_0(x) − k⁻¹ u_+^T ψ_+(k f(x)).
We note that r, ψ_+, ψ_−, u_+, u_−, and H_+ depend on (x, u), but to simplify notation we do not show this dependence explicitly. Now we consider Newton's method for solving the system of necessary conditions
(29) ∇_x H_+(x, u, k) = ∇f_0(x) − ∑_{i=1}^r u_i ψ′(k f_i(x)) ∇f_i(x) = 0,
k⁻¹ ψ(k f_i(x)) = 0,  i = 1, …, r.
To describe the extension of Newton's method, given (x, u), we denote the next iterate by (x̂, û), where û = (û_1, …, û_m)^T. We also write
(30) û_+ = (û_1, …, û_r)^T,  û_− = (û_{r+1}, …, û_m)^T.
The iteration, roughly speaking, consists of setting the multipliers of the inactive constraints (j ∈ A_−(x, u)) to zero and treating the remaining constraints as equalities. More precisely, we set û_− = 0_{m−r} and obtain x̂, û_+ by solving the system
(31)
⎛ ∇_x² H_+(x, u, k)             −∇f_(r)(x) ψ_+′(k f(x)) ⎞ ⎛ x̂ − x     ⎞   ⎛ −∇_x H_+(x, u, k) ⎞
⎝ ψ_+′(k f(x)) ∇f_(r)(x)^T       0                      ⎠ ⎝ û_+ − u_+ ⎠ = ⎝ −k⁻¹ ψ_+(k f(x))  ⎠
where ψ_+′(k f(x)) = diag_{1≤i≤r}(ψ′(k f_i(x))).
If ∇_x² H_+(x, u, k) is invertible and ∇f_(r)(x) has rank r, we can solve system (31) explicitly. It follows from (31) that
(32) ∇_x² H_+(x, u, k)(x̂ − x) − ∇f_(r)(x) ψ_+′(k f(x))(û_+ − u_+) = −∇_x H_+(x, u, k),
(33) ψ_+′(k f(x)) ∇f_(r)(x)^T (x̂ − x) = −k⁻¹ ψ_+(k f(x)).
Solving (32) for x̂ − x gives
(34) x̂ − x = [∇_x² H_+(x, u, k)]⁻¹ {∇f_(r)(x) ψ_+′(k f(x))(û_+ − u_+) − ∇_x H_+(x, u, k)},
from which, premultiplying with ψ_+′(k f(x)) ∇f_(r)(x)^T and using (33), we have
(35) û_+ = u_+ − {ψ_+′(k f(x)) ∇f_(r)(x)^T [∇_x² H_+(x, u, k)]⁻¹ ∇f_(r)(x) ψ_+′(k f(x))}⁻¹ k⁻¹ ψ_+(k f(x)).
Substitution in (32) yields
(36) x̂ = x − [∇_x² H_+(x, u, k)]⁻¹ ∇_x H_+(x, û, k).
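A schematic assembly of the Newton system (31) as one linear solve is sketched below; the callables f_r, grad_f_r, grad_H, hess_H, psi, and psi_prime are assumed supplied by the caller and are not part of the paper's notation.

```python
import numpy as np

def newton_step(x, u_plus, k, f_r, grad_f_r, grad_H, hess_H, psi, psi_prime):
    # Solve system (31) for (x_hat, u_hat_+) given the current (x, u_+).
    n, r = len(x), len(u_plus)
    A = grad_f_r(x)                           # n x r: columns are grad f_i(x), i = 1..r
    P = np.diag(psi_prime(k * f_r(x)))        # psi_+'(k f(x)), r x r diagonal
    # Block matrix on the left-hand side of (31):
    K = np.block([[hess_H(x, u_plus, k), -A @ P],
                  [P @ A.T,              np.zeros((r, r))]])
    rhs = np.concatenate([-grad_H(x, u_plus, k), -psi(k * f_r(x)) / k])
    step = np.linalg.solve(K, rhs)
    return x + step[:n], u_plus + step[n:]    # the next iterate (x_hat, u_hat_+)
```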
Returning to (25) and (26) and using the fact that ∇_x H_+(x(u, k), u, k) = 0, we see that the iteration (25)-(26) is of the form (35).
For a triple (x, u, k) for which the matrix on the left-hand side of (31) is invertible, we denote by x̂(x, u, k), û_+(x, u, k) the unique solution of (31) and say that x̂(x, u, k), û_+(x, u, k) are well defined.
Define
(37) u_+^{s+1} = û_+(x(u^s, k), u^s, k),  u_−^{s+1} = 0_{m−r}.
Proposition 4.
Let k be a scalar. For every triple (x, u, k), if ψ′ satisfies
(38) ψ_+′² − 2ψ_+′ + I = 0,
then the vectors x̂(x, u, k), û_+(x, u, k) are well defined if and only if the vectors x̂(x, ψ′(k f(x))u, 0), û_+(x, ψ′(k f(x))u, 0) are well defined; in that case,
(39) x̂(x, u, k) = x̂(x, ψ′(k f(x))u, 0),
(40) û_+(x, u, k) = û_+(x, ψ′(k f(x))u, 0).
Proof.
By direct calculation, we have
(41) ∇_x H_+(x, u, k) = ∇_x L(x, ψ′(k f(x))u),
∇_x² H_+(x, u, k) = ∇_x² L(x, ψ′(k f(x))u) − k ∑_{i=1}^r u_i ψ″(k f_i(x)) ∇f_i(x) ∇f_i(x)^T.
As a result, the system (31) can be written as
(42)
⎛ ∇_x² L(x, ψ′(k f(x))u) − k ∑_{i=1}^r u_i ψ″(k f_i(x)) ∇f_i(x) ∇f_i(x)^T    −∇f_(r)(x) ψ_+′(k f(x)) ⎞ ⎛ x̂ − x     ⎞   ⎛ −∇_x L(x, ψ′(k f(x))u) ⎞
⎝ ψ_+′(k f(x)) ∇f_(r)(x)^T                                                     0                      ⎠ ⎝ û_+ − u_+ ⎠ = ⎝ −k⁻¹ ψ_+(k f(x))        ⎠
The second equation yields
(43) ψ_+′(k f(x)) ∇f_(r)(x)^T (x̂ − x) = −k⁻¹ ψ_+(k f(x)).
If we form the second-order Taylor expansion of ψ around t_k,
(44) ψ(t) = ψ(t_k) + ψ′(t_k)(t − t_k) + (1/2) ψ″(t_k)(t − t_k)²,
we obtain
(45) ψ′(t) = ψ′(t_k) + ψ″(t_k)(t − t_k).
Taking t = k f_i(x̂) and t_k = k f_i(x), i = 1, …, r, linearizing f_i(x̂) ≈ f_i(x) + ∇f_i(x)^T(x̂ − x), and noting that ψ′(k f_i(x̂)) ≈ ψ′(0) = 1 for the constraints treated as active, it follows that
(46) ψ′(k f_i(x)) = 1 − k(x̂ − x)^T ∇f_i(x) ψ″(k f_i(x)),  i = 1, …, r.
Substituting (46) into (43), we have
(47) diag_{1≤i≤r}(1 − k(x̂ − x)^T ∇f_i(x) ψ″(k f_i(x))) ∇f_(r)(x)^T (x̂ − x) = −k⁻¹ ψ_+(k f(x)).
Substituting (46) into the first equation in (42) likewise yields
(48) ∇_x² L(x, ψ′(k f(x))u)(x̂ − x) − ∇f_(r)(x) ψ_+′(k f(x)) û_+ + 2∇f_(r)(x) ψ_+′(k f(x)) u_+ − ∇f_(r)(x) u_+ = −∇_x L(x, ψ′(k f(x))u).
Thus, in view of the condition ψ_+′² − 2ψ_+′ + I = 0, which permits writing 2∇f_(r)(x) ψ_+′(k f(x)) u_+ − ∇f_(r)(x) u_+ = ∇f_(r)(x) ψ_+′(k f(x)) ψ_+′(k f(x)) u_+, system (42) is equivalent to
(49)
⎛ ∇_x² L(x, ψ′(k f(x))u)        −∇f_(r)(x) ψ_+′(k f(x)) ⎞ ⎛ x̂ − x                   ⎞   ⎛ −∇_x L(x, ψ′(k f(x))u) ⎞
⎝ ψ_+′(k f(x)) ∇f_(r)(x)^T       0                      ⎠ ⎝ û_+ − ψ_+′(k f(x)) u_+ ⎠ = ⎝ −k⁻¹ ψ_+(k f(x))        ⎠
This shows (39) and (40).
In view of (40), we can write (37) as
(50) u_+^{s+1} = û_+(x(u^s, k), ũ(u^s, k), 0),  u_−^{s+1} = 0_{m−r},
where
(51) ũ(u^s, k) = ψ′(k f(x(u^s, k))) u^s.
This means that one can carry out the second-order multiplier iteration (25), (26) in two stages: first execute the first-order iteration (51), and then the second-order iteration (50), which is part of Newton's iteration at (x(u^s, k), ũ(u^s, k)) for solving the system of necessary conditions (29).
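Schematically (a sketch with assumed callables: x_of_u performs the inner minimization; f_r, grad_f_r, grad_L, hess_L evaluate the active constraints and the ordinary Lagrangian; psi_prime is vectorized; and u denotes the active block u_+), the two stages read as follows.

```python
import numpy as np

def two_stage_step(u, k, x_of_u, f_r, grad_f_r, grad_L, hess_L, psi_prime):
    x = x_of_u(u, k)                              # x(u^s, k)
    u_tilde = psi_prime(k * f_r(x)) * u           # first-order stage (51)
    # Second-order stage (50): Newton step at (x, u_tilde) for the system
    # nabla_x L(x, u) = 0, f_i(x) = 0 (i = 1..r), the k -> 0 form of (29);
    # compare with the coefficient matrix in (49) at psi_+' = I.
    n, r = len(x), len(u_tilde)
    A = grad_f_r(x)                               # n x r matrix of active gradients
    K = np.block([[hess_L(x, u_tilde), -A],
                  [A.T,                np.zeros((r, r))]])
    step = np.linalg.solve(K, np.concatenate([-grad_L(x, u_tilde), -f_r(x)]))
    return x + step[:n], u_tilde + step[n:]       # (x_hat, u_hat_+)
```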
Now, we know that (x(u^s, k), ũ(u^s, k)) is close to (x*, u*) for (u^s, k) in an appropriate region of ℝ^{m+1}. Therefore, using known results for Newton's method, we expect that (50) will yield a vector u^{s+1} which is closer to u* than u^s. This argument is the basis for the proof of the following proposition.
Proposition 5.
Assume that (a)–(f) and (H1)–(H4) hold, and let k_0 > 0, δ > 0 be as in Theorem 2. Then, given any scalar γ > 0, there exists a scalar δ_1 with 0 < δ_1 ≤ δ such that, for all (u, k) ∈ D_1 = {(u, k) : u ∈ U_k(ε, δ_1), k ≥ k_0}, there holds
(52) ‖(x̂(u, k), û(u, k)) − (x*, u*)‖ ≤ γ k⁻¹ ‖u − u*‖,
where û(u, k) is generated by the second-order iteration
(53) u^{s+1} = u^s − B_k⁻¹ k⁻¹ ψ(k f(x(u^s, k))),
with
(54) B_k = ψ′(k f(x(u^s, k))) ∇f(x(u^s, k))^T [∇_x² H(x(u^s, k), u^s, k)]⁻¹ ∇f(x(u^s, k)) ψ′(k f(x(u^s, k))).
If, in addition, ∇²f_i(x), i = 0, …, m, are Lipschitz continuous in a neighborhood of x*, then there exists a scalar γ_1 > 0 such that, for all (u, k) ∈ D_1, there holds
(55) ‖(x̂(u, k), û(u, k)) − (x*, u*)‖ ≤ γ_1 k⁻² ‖u − u*‖².
Proof.
In view of Theorem 2, given any γ > 0, there exist ε_1 > 0, ε_2 > 0, and M > 0 such that, if x(u, k) ∈ S(x*, ε_1) and ũ(u, k) ∈ S(u*, ε_2), there holds
(56) ‖(x̂(x(u, k), ũ(u, k), 0), û(x(u, k), ũ(u, k), 0)) − (x*, u*)‖ ≤ (γ/M) ‖(x(u, k), ũ(u, k)) − (x*, u*)‖
(compare with Proposition 1.17 of Bertsekas [8]). Take δ_1 sufficiently small so that, for all (u, k) ∈ D_1, we have x(u, k) ∈ S(x*, ε_1), ũ(u, k) ∈ S(u*, ε_2), and
(57) ‖(x(u, k), ũ(u, k)) − (x*, u*)‖ ≤ M k⁻¹ ‖u − u*‖.
From (50), we have
(58) ‖(x̂(x(u, k), ũ(u, k), 0), û(x(u, k), ũ(u, k), 0)) − (x*, u*)‖ ≤ (γ/M) · M k⁻¹ ‖u − u*‖ = γ k⁻¹ ‖u − u*‖.
If ∇²f_i(x) (i = 0, …, m) are Lipschitz continuous, then there exists a γ_1 > 0 such that, for x(u, k) ∈ S(x*, ε_1) and ũ(u, k) ∈ S(u*, ε_2), we have
(59) ‖(x̂(x(u, k), ũ(u, k), 0), û(x(u, k), ũ(u, k), 0)) − (x*, u*)‖ ≤ (γ_1/(2M²)) ‖(x(u, k), ũ(u, k)) − (x*, u*)‖² ≤ (γ_1/(2M²)) ((M k⁻¹)² ‖u − u*‖² + (M k⁻¹)² ‖u − u*‖²) = γ_1 k⁻² ‖u − u*‖².
From the above analysis, we conclude that the sequence generated by the second-order multiplier iteration converges superlinearly, with order at least two, if the Hessians of the functions involved in the problem are Lipschitz continuous.
Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This project is supported by the National Natural Science Foundation of China (Grant no. 11171138).
References
[1] M. R. Hestenes, "Multiplier and gradient methods," Journal of Optimization Theory and Applications, vol. 4, no. 5, pp. 303–320, 1969.
[2] M. J. D. Powell, "A method for nonlinear constraints in minimization problems," in Optimization, R. Fletcher, Ed., pp. 283–298, Academic Press, New York, NY, USA, 1969.
[3] R. T. Rockafellar, "Augmented Lagrange multiplier functions and duality in nonconvex programming," SIAM Journal on Control, vol. 12, pp. 268–285, 1974.
[4] D. P. Bertsekas, "Multiplier methods: a survey," Automatica, vol. 12, no. 2, pp. 133–145, 1976.
[5] D. P. Bertsekas, "On the convergence properties of second-order multiplier methods," Journal of Optimization Theory and Applications, vol. 25, no. 3, pp. 443–449, 1978.
[6] R. B. Brusch, "A rapidly convergent method for equality constrained function minimization," in Proceedings of the IEEE Conference on Decision and Control, pp. 80–81, San Diego, Calif, USA, 1973.
[7] R. Fletcher, "An ideal penalty function for constrained optimization," in Nonlinear Programming 2, pp. 121–163, Academic Press, New York, NY, USA, 1975.
[8] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Academic Press, New York, NY, USA, 1982.
[9] R. A. Polyak and M. Teboulle, "Nonlinear rescaling and proximal-like methods in convex optimization," Mathematical Programming, vol. 76, no. 2, pp. 265–284, 1997.
[10] R. A. Polyak and I. Griva, "Primal-dual nonlinear rescaling method for convex optimization," Journal of Optimization Theory and Applications, vol. 122, no. 1, pp. 111–156, 2004.
[11] I. Griva and R. A. Polyak, "Primal-dual nonlinear rescaling method with dynamic scaling parameter update," Mathematical Programming, vol. 106, no. 2, pp. 237–259, 2006.
[12] A. Auslender, R. Cominetti, and M. Haddou, "Asymptotic analysis for penalty and barrier methods in convex and linear programming," Mathematics of Operations Research, vol. 22, no. 1, pp. 43–62, 1997.
[13] A. Ben-Tal and M. Zibulevsky, "Penalty/barrier multiplier methods for convex programming problems," SIAM Journal on Optimization, vol. 7, no. 2, pp. 347–366, 1997.
[14] Y.-H. Ren and L.-W. Zhang, "The dual algorithm based on a class of nonlinear Lagrangians for nonlinear programming," in Proceedings of the 6th World Congress on Intelligent Control and Automation (WCICA '06), pp. 934–938, 2006.