A generalized gradient projection filter algorithm for inequality constrained optimization is presented. It has three merits. First, its computational cost is low, since the gradient matrix needs to be computed only once at each iterate. Second, the filter technique is used in place of a penalty function to handle the constraints. Third, the algorithm is globally convergent and locally superlinearly convergent under mild conditions.
1. Introduction
Optimization problems arise frequently in management, engineering design, transportation, national defence, and many other fields, so efficient algorithms for them are important. We consider the following nonlinear inequality constrained optimization problem:
(1)minf(x)s.t.cj(x)≤0,j∈I,
where I={1,2,…,m} and x∈Rn; assume that f:Rn→R and cj (j∈I):Rn→R are continuously differentiable.
In 2002, Fletcher and Leyffer [1] proposed a filter method for nonlinear inequality constrained optimization which does not require a penalty function. The main idea is that a trial point is accepted if it improves either the objective function or the constraint violation. Fletcher et al. [2, 3] and Gonzaga et al. [4] proved that the method is globally convergent. More recently, the method has been extended by Wächter and Biegler [5, 6] and Chin [7] to line search methods and by Su [8] to the SQP method.
In this paper, we modify the method of Wang et al. [9] and propose a generalized gradient projection filter algorithm for inequality constrained optimization that allows an arbitrary initial point. The paper is organized as follows. In Section 2, we review the filter method and some definitions related to generalized gradient projection and then state an algorithm for problem (1). The global convergence and the rate of convergence of the algorithm are discussed in Sections 3 and 4, respectively. The last section reports numerical tests.
2. Preliminaries and a Filter Algorithm
Let h(x) be a violation function; that is,
(2)h(x)=max{0,cj(x),j∈I}.
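Concretely, h(x) is just the largest positive constraint value at x. A minimal numpy sketch (the function name is ours, not the paper's):

```python
import numpy as np

def violation(c_vals):
    """Constraint violation h(x) = max{0, c_j(x), j in I}.

    c_vals: array of constraint values c_j(x), j = 1, ..., m.
    """
    return max(0.0, float(np.max(c_vals)))
```

So h(x) = 0 exactly when x is feasible, and otherwise equals the worst constraint violation.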
Definition 1.
A pair (h(xk),f(xk)) obtained on iteration k dominates another pair (h(xl),f(xl)) if and only if h(xk)≤h(xl) and f(xk)≤f(xl) hold.
Definition 2.
A filter is a list of pairs (h(xk),f(xk)) such that no pair dominates any other. A pair (h(xk),f(xk)) is said to be acceptable for the filter if it is not dominated by any point in the filter.
We use F(k) to denote the set of iteration indices j (j<k) such that (h(xj),f(xj)) is an entry in the current filter. A point x is said to be acceptable for the filter if and only if
(3)h(x)≤(1-α2η)h(xj)orf(x)≤f(xj)-γh(xj)
holds for all j∈F(k), where γ,η∈(0,1) are close to zero and α is the step size. We may also "update the filter," which means that the pair (h(x),f(x)) is added to the list of pairs in the filter and any pairs in the filter that are dominated by (h(x),f(x)) are removed.
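The acceptability test (3) and the filter update can be stated as short routines. The following sketch uses hypothetical names and stores the filter as a list of (h, f) pairs:

```python
def acceptable(h_x, f_x, filter_pairs, alpha, eta=0.1, gamma=0.1):
    """Criterion (3): x is acceptable if, for every (h_j, f_j) in the filter,
    h(x) <= (1 - alpha^2 * eta) * h_j  or  f(x) <= f_j - gamma * h_j."""
    return all(h_x <= (1.0 - alpha**2 * eta) * h_j or f_x <= f_j - gamma * h_j
               for h_j, f_j in filter_pairs)

def update_filter(h_x, f_x, filter_pairs):
    """Add (h(x), f(x)) to the filter and remove every pair it dominates,
    i.e. every (h_j, f_j) with h(x) <= h_j and f(x) <= f_j."""
    kept = [(h_j, f_j) for h_j, f_j in filter_pairs
            if not (h_x <= h_j and f_x <= f_j)]
    kept.append((h_x, f_x))
    return kept
```

The parameter values eta = gamma = 0.1 match those used in the numerical tests of Section 5.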
However, relying solely on this criterion could result in convergence to a feasible but nonoptimal point. To prevent this, we employ the following sufficient reduction criterion.
We denote Δfk=f(xk)-f(xk+αdk) and Δlk=-α∇f(xk)Tdk as actual reduction and linear reduction, respectively, at f(xk). The sufficient reduction condition for f(xk) takes the form
(4)Δlk≥0,Δfk≥σΔlk,
where σ∈(0,1/2) is a preassigned parameter.
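Condition (4) is a straightforward check; a small sketch (hypothetical names, numpy for the inner product):

```python
import numpy as np

def sufficient_reduction(f, x, d, alpha, grad_f_x, sigma=0.01):
    """Check condition (4): the linear reduction Δl_k = -α ∇f(x_k)^T d_k
    is nonnegative and the actual reduction Δf_k = f(x_k) - f(x_k + α d_k)
    achieves at least the fraction σ of it."""
    delta_l = -alpha * float(np.dot(grad_f_x, d))
    delta_f = f(x) - f(x + alpha * d)
    return delta_l >= 0.0 and delta_f >= sigma * delta_l
```

For a descent direction (∇f(xk)Tdk < 0) the linear reduction is positive, and (4) then requires the actual decrease of f to be comparable to it.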
At the current iterate xk, define J(xk)={j∈I:-ϵ≤cj(xk)-h(xk)≤0}, Ak=(∇cj(xk),j∈J(xk)), and cJk=(cj(xk),j∈J(xk))T; then
(5)dk0=-Pk∇f(xk)-BkTcJk,λk=-Bk∇f(xk)+(AkTHkAk)-1cJk=λk1+λk2,
where Hk is a given symmetric positive definite matrix, λk1=-Bk∇f(xk), λk2=(AkTHkAk)-1cJk, Bk=(AkTHkAk)-1AkTHk, and Pk=Hk-HkAkBk.
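The quantities in (5) are plain linear-algebra operations; a sketch with numpy, assuming Ak has full column rank so that AkTHkAk is invertible (function and variable names are ours):

```python
import numpy as np

def projection_quantities(grad_f, A, c_J, H):
    """Compute B_k, P_k, d_k^0, and lambda_k from (5), where
    A = A_k (n x |J|), c_J = c_J(x_k), and H = H_k is symmetric
    positive definite."""
    M = A.T @ H @ A                      # A_k^T H_k A_k
    M_inv = np.linalg.inv(M)
    B = M_inv @ A.T @ H                  # B_k = (A^T H A)^{-1} A^T H
    P = H - H @ A @ B                    # P_k = H_k - H_k A_k B_k
    d0 = -P @ grad_f - B.T @ c_J         # d_k^0
    lam = -B @ grad_f + M_inv @ c_J      # lambda_k = lambda_k^1 + lambda_k^2
    return d0, lam, P, B
```

For well-conditioned problems one would solve the systems with a factorization rather than forming the explicit inverse; the code above mirrors the formulas of (5) for clarity.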
Let Uk=(ukj,j∈J(xk))T, where ukj=λkj1 if λkj1<0 and ukj=0 if λkj1≥0. Set dk1=-Pk∇f(xk)+BkTUk and dk2=-Pk∇f(xk)+BkT∥dk1∥e, where e=(1,…,1)T. Then
(6)dk=(1-ρk)dk1+ρkdk2,
where ρk=max{ρ∈(0,1]:∇f(xk)T((1-ρ)dk1+ρdk2)≤θ∇f(xk)Tdk1}, θ∈(1/2,1). We use correction direction dk if a trial point has been rejected.
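The directions dk1, dk2, and dk of (6) can be assembled as follows. Since the defining condition for ρk is linear in ρ, the sketch below locates an admissible ρ by simple halving rather than computing the exact maximizer (an illustrative simplification; names are ours):

```python
import numpy as np

def correction_direction(grad_f, P, B, lam1, theta=0.75):
    """Build d_k^1, d_k^2 and the correction direction d_k of (6).
    lam1 = lambda_k^1; U_k keeps only its negative components."""
    U = np.where(lam1 < 0, lam1, 0.0)                    # u_kj
    d1 = -P @ grad_f + B.T @ U
    d2 = -P @ grad_f + B.T @ (np.linalg.norm(d1) * np.ones_like(U))
    # find rho in (0,1] with grad_f^T((1-rho)d1 + rho*d2) <= theta*grad_f^T d1
    rho = 1.0
    while rho > 1e-12:
        d = (1.0 - rho) * d1 + rho * d2
        if grad_f @ d <= theta * (grad_f @ d1):
            return d, rho
        rho *= 0.5
    return d1, 0.0
```

Because ∇f(xk)Tdk1 < 0 away from KKT points and θ < 1, the halving loop always terminates with an admissible ρ.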
The following is the algorithm.
Algorithm
S0 (Initialization). Given a start point x0∈Rn, ϵ0,ϵ1>0, μ=h(x0), η,γ∈(0,1), and β,t,σ∈(0,1/2), initialize the filter Φ0={(μ,+∞)}⊂R2 and F(0)=∅. Set k=0.
S1 (Inner loop A).
S1.1. Set i=0 and ϵk0=ϵ0.
S1.2. If det(AkiTAki)≥ϵki, where Aki=(∇cj(xk):j∈Jki) and Jki={j∈I:-ϵki≤cj(xk)-h(xk)≤0}, then set J(xk)=Jki, Ak=Aki, and ϵk=ϵki, and go to S2.
S1.3. Let i=i+1, ϵki=ϵki-1/2, and go to S1.2.
S2. Compute dk0 and λk by (5). If dk0=0 and λk≥0, then stop.
S3 (Test direction dk0).
S3.1. If λkj≥ϵ1 and xk+dk0 is acceptable for the filter, go to S3.2; otherwise, go to S4.
S3.2. If h(xk)>0, let xk+1=xk+dk0 and go to S7; otherwise, go to S3.3.
S3.3. If xk+dk0 satisfies the sufficient reduction condition (4), let xk+1=xk+dk0 and go to S7; otherwise, go to S4.
S4. Compute dk by (6) and set α=1.
S5 (Inner loop B).
S5.1. If xk+αdk is acceptable for the filter, go to S5.2; otherwise, go to S5.3.
S5.2. If Δfk<σΔlk, go to S5.3; otherwise, go to S6.
S5.3. Set α=tα and go to S5.1.
S6. Set αk=α and xk+1=xk+αkdk.
S7. Update the filter F(k) to F(k+1) and update Hk to Hk+1 by a quasi-Newton method. Set k=k+1 and go back to S1.
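Steps S5.1–S5.3 amount to a backtracking line search. A self-contained sketch of inner loop B (hypothetical names; f, h, and the filter are passed in as callables and data):

```python
import numpy as np

def inner_loop_b(f, h, x, d, filter_pairs, grad_f_x,
                 t=0.5, sigma=0.01, eta=0.1, gamma=0.1, alpha_min=1e-10):
    """Backtracking inner loop B (S5.1-S5.3): shrink alpha by the factor t
    until x + alpha*d is acceptable for the filter and satisfies the
    sufficient reduction condition (4)."""
    alpha = 1.0
    fx = f(x)
    while alpha > alpha_min:
        x_new = x + alpha * d
        h_new, f_new = h(x_new), f(x_new)
        ok_filter = all(h_new <= (1.0 - alpha**2 * eta) * h_j
                        or f_new <= f_j - gamma * h_j
                        for h_j, f_j in filter_pairs)           # S5.1
        delta_l = -alpha * float(grad_f_x @ d)
        if ok_filter and fx - f_new >= sigma * delta_l:          # S5.2
            return alpha, x_new                                  # -> S6
        alpha *= t                                               # S5.3
    raise RuntimeError("step length fell below alpha_min")
```

Lemma 8 below guarantees that, under the stated assumptions, this loop terminates after finitely many reductions of alpha.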
3. Global Convergence of the Algorithm
In this section, we assume that the following conditions hold.
(A1) {∇cj(x),j∈J(x)} is linearly independent for any x∈Rn.
(A2) For any k and d, a∥d∥2≤dTHk-1d≤b∥d∥2 holds, where 0<a≤b are constants.
(A3) The sequence {xk} generated by the algorithm remains in a closed, bounded subset Ω⊂Rn.
(A4) f(x) and ci(x) (i=1,2,…,m) are twice continuously differentiable in Ω, with Mminf≤λ(∇2f(x))≤Mmaxf and Mminc≤λ(∇2ci(x))≤Mmaxc, where λ(·) denotes any eigenvalue.
Similar to [9], the following theorem and lemma hold.
Theorem 3.
If dk0=0 and λk≥0 hold, then xk is a KKT point of problem (1).
Lemma 4.
Consider
(7)dk0=0,λk≥0⟺dk1=0.
According to [8], the following lemma holds.
Lemma 5.
The inner loop A terminates after finitely many steps.
Lemma 6.
If xk is not a KKT point of problem (1), then ∇f(xk)Tdk<0 and ∇cj(xk)Tdk<0, j∈J(xk).
Proof.
Since xk is not a KKT point, we have either dk0≠0 or there exists j∈J(xk) such that λkj1<0. Thus
(8)∇f(xk)Tdk≤θ∇f(xk)Tdk1≤θ[-(dk0)THk-1dk0-∑λkj1<0(λkj1)2]<0
holds. From Lemma 4, we know that dk1≠0. Therefore
(9)AJkTdk1=Uk≤0,AJkTdk2=-∥dk1∥e<0.
That is, ∇cj(xk)Tdk<0 (j∈J(xk)) holds.
Lemma 7.
Let x∞ be a cluster point of {xk} generated by the algorithm. If x∞ is not a KKT point of problem (1), then there exists ᾱ>0 such that Δfk≥σΔlk holds whenever α≤ᾱ.
Proof.
From the definition of ρ and the assumption (A2), we have
(10)Δlk≥-αθ∇f(xk)Tdk1=-αθ[-(dk0)THk-1dk0-∑λkj1<0(λkj1)2]≥αθ(dk0)THk-1dk0≥αθa2∥dk0∥2.
Since
(11)|Δfk-Δlk|=|f(xk)-f(xk+αdk)+α∇f(xk)Tdk|≤12α2Mmaxf∥dk∥2,
we have
(12)|Δfk-ΔlkΔlk|≤(1/2)α2Mmaxf∥dk∥2(αθa/2)∥dk0∥2=αMmaxf∥dk∥2θa∥dk0∥2.
It follows that Δfk≥σΔlk holds when α≤ᾱ=(1-σ)θa∥dk0∥2/(Mmaxf∥dk∥2).
Lemma 8.
The inner loop B terminates after finitely many steps.
Proof.
From Lemma 7, Δfk≥σΔlk holds when α≤ᾱ. Suppose, for contradiction, that the conclusion is false. Then the algorithm cycles infinitely between S5.1 and S5.3, so α→0 while xk+αdk is never acceptable for the filter. We consider the following two cases.
Case 1 (h(xk)=0). From Lemma 6, we have ∇f(xk)Tdk<0 and ∇cj(xk)Tdk<0,j∈J(xk). So when
(13)α≤min{-∇f(xk)Tdk(1/2)Mmaxf∥dk∥2,minj∈J(xk){-∇cj(xk)Tdk(1/2)Mmaxc∥dk∥2}},
it is easy to get that
(14)f(xk+αdk)≤f(xk)+α∇f(xk)Tdk+12α2Mmaxf∥dk∥2≤f(xk)=f(xk)-γh(xk),h(xk+αdk)≤max{0,cj(xk)+α∇cj(xk)Tdk+12α2Mmaxc∥dk∥2}≤max{0,cj(xk)}=(1-α2η)max{0,cj(xk)}=(1-α2η)h(xk).
It proves that xk+αdk is acceptable for the filter.
Case 2 (h(xk)>0). Similarly, when
(15)α≤minj∈J(xk){-∇cj(xk)Tdk(1/2)Mmaxc∥dk∥2+ηcj(xk)},
it is easy to learn that
(16)h(xk+αdk)≤max{0,cj(xk)+α∇cj(xk)Tdk+12α2Mmaxc∥dk∥2}≤(1-α2η)h(xk).
Since xk is acceptable for the filter, for all j∈F(k-1) either h(xk)≤h(xj) or f(xk)≤f(xj)-γh(xj) holds. Since xk+αdk is not acceptable for the filter, we have that
(17)h(xk+αdk)>(1-α2η)h(xj),
(18)f(xk+αdk)>f(xj)-γh(xj)
hold. If h(xk)≤h(xj), then
(19)h(xk+αdk)≤(1-α2η)h(xk)≤(1-α2η)h(xj),
which contradicts (17). If f(xk)≤f(xj)-γh(xj), then when α≤-∇f(xk)Tdk/((1/2)Mmaxf∥dk∥2), it follows that
(20)f(xk+αdk)≤f(xk)+α∇f(xk)Tdk+12α2Mmaxf∥dk∥2≤f(xk)≤f(xj)-γh(xj),
which contradicts (18).
Based on the above analysis, we can see that the claim holds.
The above statements show that the algorithm is implementable. We now turn to proving its global convergence.
Theorem 9.
Let the assumptions hold with Mminf>0, and let x∞ be a cluster point of {xk} generated by the algorithm. Then one of two cases occurs: (i) the iteration terminates at a KKT point; (ii) every accumulation point of {xk} is a KKT point.
Proof.
We only need to prove case (ii). Since x∞ is a cluster point of the sequence generated by the algorithm, let {xk}k∈K be any subsequence converging to x∞.
We first show that x∞ is a feasible point. Assume, for contradiction, that h(x∞)>0, so that h(xk)→h(x∞)>0 for k∈K. Let i and j be any two adjacent indices in K with i<j. Then there exists k′∈K such that, for all i≥k′, the acceptability of xj for the filter yields
(21)f(xj)≤f(xi)-γh(xi).
Since {f(xk)}k∈K is a monotonically decreasing subsequence for k≥k′ and is bounded below, therefore for i,j∈K, i,j≥k′, and i<j,
(22)∑i,j∈KΔfij=∑i,j∈K(f(xi)-f(xj))
is bounded above. However, since f(xj)≤f(xi)-γh(xi), therefore by summing over all indices i,j∈K, i,j≥k′, and i<j,
(23)∑i,j∈KΔfij≥γ∑i∈Kh(xi)⟶+∞,
which contradicts the fact that ∑i,j∈KΔfij is bounded above. Thus h(x∞)=0, hence x∞ is feasible.
Next we show that x∞ is a KKT point. By the construction of the algorithm there are two cases: the sequence {xk} is generated either by xk+1=xk+dk0 or by xk+1=xk+αdk. We prove the claim in each case.
Case 1. Suppose that infinitely many points are generated by xk+1=xk+dk0. Since Δfk≥σΔlk, we have
(24)f(xk)-f(xk+dk0)=-∇f(xk)Tdk0-12(dk0)T∇2f(y)dk0≥-σ∇f(xk)Tdk0.
Thus ∇f(xk)Tdk0≤-(1/2)(dk0)T∇2f(y)dk0/(1-σ) holds. Since f is bounded below, then
(25)+∞>∑k=0∞f(xk)-f(xk+1)≥-∑k=0∞∇f(xk)Tdk0≥12∑k=0∞(dk0)T∇2f(y)dk01-σ≥Mminf2(1-σ)∑k=0∞∥dk0∥2.
Thus ∑k=0∞∥dk0∥2<+∞, which means ∥dk0∥→0. Since x∞ is a feasible point, x∞ is a KKT point.
Case 2. Suppose that infinitely many points are generated by xk+1=xk+αdk. Since Δfk≥σΔlk, we have
(26)0=limk→∞[f(xk)-f(xk+αdk)]≥-limk→∞σα∇f(xk)Tdk≥-limk→∞σᾱ∇f(xk)Tdk≥0,
which means that ∇f(xk)Tdk→0. Since
(27)∇f(xk)Tdk≤θ∇f(xk)Tdk1≤θ[-(dk0)THk-1dk0-∑λkj1<0(λkj1)2]<0
we have ∥dk0∥→0 and the negative components of λk1 vanish in the limit; since x∞ is a feasible point, x∞ is a KKT point.
Combining Cases 1 and 2, the claim holds.
4. The Rate of Convergence
In this section, we discuss the convergence rate of the algorithm. We need the following stronger assumptions.
(A5) The second-order sufficient conditions hold; that is, dT∇xx2L(x∞,λ∞)d>0 for all d∈ker∇cĴ(x∞)∖{0}, where L(x,λ)=f(x)+λTc(x), c(x)=(c1(x),…,cm(x))T, Ĵ(x∞)={j∈J(x∞):(λ∞)j>0}, and (x∞,λ∞) is the KKT pair of problem (1).
(A6) ∥(Hk-1-∇xx2L(x∞,λ∞))dk0∥=o(∥dk0∥).
Theorem 10.
Suppose that assumptions (A1)–(A6) hold. Then xk+1=xk+dk0 for all sufficiently large k, and therefore the algorithm is superlinearly convergent.
Proof.
Suppose that xk is acceptable for the filter; we show that, for sufficiently large k, xk+1=xk+dk0 is acceptable for the filter and satisfies the sufficient reduction condition.
First we prove that xk+dk0 is acceptable for the filter. If h(xk+dk0)≤(1-η)h(xk), then xk+dk0 is already acceptable for the filter. Otherwise, we show that f(xk+dk0)≤f(xk)-γh(xk). Let sk=f(xk+dk0)-f(xk)+γh(xk); then
(28)sk≤∇f(xk)Tdk0+12(dk0)T∇2f(xk)dk0+γh(xk+dk0)1-η+o(∥dk0∥2)≤∇f(xk)Tdk0+12(dk0)T∇2f(xk)dk0+γ2(1-η)∑j=1m(dk0)T∇2cj(xk)dk0+o(∥dk0∥2).
From ∇f(xk)Tdk0=λkTcJk-(dk0)THk-1dk0, we have
(29)sk≤λkTcJk-(dk0)THk-1dk0+12(dk0)T∇2f(xk)dk0+γ2(1-η)∑j=1m(dk0)T∇2cj(xk)dk0+o(∥dk0∥2).
Since λkj≥ϵ1, set ϵ1=γ/(1-η), and then
(30)sk≤λkTcJk-(dk0)THk-1dk0+12(dk0)T∇2f(xk)dk0+12∑j=1mλkj(dk0)T∇2cj(xk)dk0+o(∥dk0∥2)=-(dk0)THk-1dk0+λkTcJk+12(dk0)T∇xx2L(xk,λk)dk0+o(∥dk0∥2).
According to xk→x∞, λk→λ∞≥0, and cj(xk)→cj(x∞)≤0 and assumptions (A2), (A3), and (A5), then
(31)sk≤-a2∥dk0∥2+12(dk0)T(∇xx2L(xk,λk)-∇xx2L(x∞,λ∞))dk0+o(∥dk0∥2)+12(dk0)T(∇xx2L(x∞,λ∞)-Hk-1)dk0≤-a2∥dk0∥2+o(∥dk0∥2)≤0.
Hence, for large enough k, xk+1=xk+dk0 is acceptable for the filter.
Now we are going to show that when k is large enough, xk+1=xk+dk0 satisfies the sufficient reduction condition Δfk≥σΔlk. Let tk=f(xk)-f(xk+dk0)-σΔlk; then we have
(32)tk=(σ-1)(λkTcJk-(dk0)THk-1dk0)-12(dk0)T∇2f(xk)dk0-o(∥dk0∥2)≥(σ-1)(λkTcJk-(dk0)THk-1dk0)-12(dk0)T∇xx2L(xk,λk)dk0-o(∥dk0∥2).
Since cj(xk)→cj(x∞)≤0 and assumptions (A3), and (A5), then
(33)tk≥(σ-1)λkTcJk-(σ-12)(dk0)THk-1dk0-12(dk0)T(∇xx2L(xk,λk)-∇xx2L(x∞,λ∞))dk0-12(dk0)T(∇xx2L(x∞,λ∞)-Hk-1)dk0-o(∥dk0∥2)≥a2(12-σ)∥dk0∥2-o(∥dk0∥2)≥0.
Hence, for large enough k, xk+1=xk+dk0 satisfies the sufficient reduction condition.
By Theorem 10, when k is large enough the algorithm always accepts the full step xk+1=xk+dk0; thus the algorithm is superlinearly convergent.
5. Numerical Test
In this section, we report numerical results for our algorithm. The matrix Hk is updated by the BFGS formula, and the parameters are set as H0=I∈Rn×n, γ=0.1, η=0.1, and σ=0.01.
Example 11.
One has
(34)minf(x)=0.1{0.44x13x22+10x1+0.592x1x23}s.t.-1+8.62x23x1≤0,
where x0=(2.5,2.5) and x∞=(1.2867,0.5305), obtained in 16 iterations.
Example 12 (see [8]).
Consider
(35)minf(x)=x12+x22+x32+x42s.t.6-x12-x22-x32-x42≤0,
where x0=(2,2,2,2) and x∞=(1.2247,1.2247,1.2247,1.2247), obtained in 14 iterations.
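The reported solution of Example 12 can be checked independently of the algorithm: the point with xi=√1.5 is feasible and satisfies the KKT stationarity condition with multiplier λ=1 (a numpy verification, not part of the paper's code):

```python
import numpy as np

# Example 12: min sum(x_i^2) s.t. 6 - sum(x_i^2) <= 0
x_star = np.full(4, np.sqrt(1.5))        # (1.2247, 1.2247, 1.2247, 1.2247)

grad_f = 2.0 * x_star                    # gradient of the objective
grad_c = -2.0 * x_star                   # gradient of c(x) = 6 - sum x_i^2

feasible = 6.0 - float(x_star @ x_star) <= 1e-12
lam = 1.0                                # multiplier with grad_f + lam*grad_c = 0
kkt_residual = np.linalg.norm(grad_f + lam * grad_c)
```

The objective value at x∞ is sum xi² = 6, consistent with the constraint being active at the optimum.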
Example 13 (see [10]).
One has
(36)minf(x)=-50(x12+x22+x32+x42+x52)-10.5x1-7.5x2-3.5x3-2.5x4-1.5x5-10x6s.t.6x1+3x2+3x3+2x4+x5≤6.510x1+10x3+x6≤200≤xi≤1,i=1,2,3,4,5;x6≥0.
x∞=(0,1,0,1,1,20) is a minimizer with objective value f*=-361.5. We choose the initial point x0=(1,1,1,1,1,10); the algorithm terminates in 6 iterations.
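The reported optimum of Example 13 is easy to verify by direct evaluation (a numpy check, not part of the paper's code):

```python
import numpy as np

def f13(x):
    """Objective of Example 13."""
    x1, x2, x3, x4, x5, x6 = x
    return (-50.0 * (x1**2 + x2**2 + x3**2 + x4**2 + x5**2)
            - 10.5 * x1 - 7.5 * x2 - 3.5 * x3 - 2.5 * x4 - 1.5 * x5
            - 10.0 * x6)

x_star = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 20.0])
c1 = 6.0 * x_star[0] + 3.0 * x_star[1] + 3.0 * x_star[2] \
     + 2.0 * x_star[3] + x_star[4]                      # must be <= 6.5
c2 = 10.0 * x_star[0] + 10.0 * x_star[2] + x_star[5]    # must be <= 20
```

Both linear constraints hold at x∞ (with the second active), and f13(x∞) = -361.5 as reported.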
We choose the initial point x0=(1,1,1,1). x∞=(0.2896,0.9150,2.1798,0.6265) is a minimizer with an objective value f*=-50.1192, iterate = 40.
Acknowledgment
This research was supported by the National Natural Science Foundation of China (no. 11271128).
References
[1] R. Fletcher and S. Leyffer, "Nonlinear programming without a penalty function," Mathematical Programming, vol. 91, no. 2, pp. 239–269, 2002.
[2] R. Fletcher, S. Leyffer, and P. L. Toint, "On the global convergence of a filter-SQP algorithm," SIAM Journal on Optimization, vol. 13, no. 1, pp. 44–59, 2002.
[3] R. Fletcher, N. I. M. Gould, S. Leyffer, P. L. Toint, and A. Wächter, "Global convergence of a trust-region SQP-filter algorithm for general nonlinear programming," SIAM Journal on Optimization, vol. 13, no. 3, pp. 635–659, 2002.
[4] C. C. Gonzaga, E. Karas, and M. Vanti, "A globally convergent filter method for nonlinear programming," SIAM Journal on Optimization, vol. 14, no. 3, pp. 646–669, 2003.
[5] A. Wächter and L. T. Biegler, "Line search filter methods for nonlinear programming: motivation and global convergence," SIAM Journal on Optimization, vol. 16, no. 1, pp. 1–31, 2005.
[6] A. Wächter and L. T. Biegler, "Line search filter methods for nonlinear programming: local convergence," SIAM Journal on Optimization, vol. 16, no. 1, pp. 32–48, 2005.
[7] C. M. Chin, "A global convergence theory of a filter line search method for nonlinear programming," 2002.
[8] K. Su, "A globally and superlinearly convergent modified SQP-filter method," Journal of Global Optimization, vol. 41, no. 2, pp. 203–217, 2008.
[9] W. Wang, L.-S. Zhang, and Y.-F. Xu, "A revised conjugate gradient projection algorithm for inequality constrained optimizations," vol. 23, no. 2, pp. 217–224, 2005.
[10] C. A. Floudas, P. M. Pardalos, C. S. Adjiman, W. R. Esposito, Z. H. Gümüş, S. T. Harding, J. L. Klepeis, C. A. Meyer, and C. A. Schweiger, Handbook of Test Problems in Local and Global Optimization, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1999.
[11] K. Schittkowski, More Test Examples for Nonlinear Programming Codes, vol. 282, Springer, Berlin, Germany, 1987.