A Filter Algorithm with Inexact Line Search

A filter algorithm with inexact line search is proposed for solving nonlinear programming problems. The filter is constructed by adding the norm of the gradient of the Lagrangian function to the infeasibility measure. The transition to superlinear local convergence is shown for the proposed filter algorithm without second-order correction. Under mild conditions, global convergence can also be derived. Numerical experiments show the efficiency of the algorithm.


Introduction
Fletcher and Leyffer [1] proposed filter methods in 2002 as an alternative to traditional merit functions for solving nonlinear programming problems (NLPs). The underlying concept is that a trial point is accepted if it provides a sufficient decrease of either the objective function or the constraint violation. Filter methods avoid the difficulty of determining a suitable value of the penalty parameter in a merit function. The promising numerical results in [1] led to a growing interest in filter methods in recent years. Two variants of the trust-region filter sequential quadratic programming (SQP) method were proposed by Fletcher et al. [2, 3]. Chin and Fletcher [4] extended the filter method to a sequential linear programming strategy that takes equality-constrained quadratic programming steps. Ribeiro et al. [5] proposed a general filter algorithm that does not depend on the particular method used to compute the step. Ulbrich [6] argued superlinear local convergence of a filter SQP method.

The Algorithm Mechanism
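We describe and analyze the line-search filter method for NLPs with equality constraints, stated as

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad c_i(x) = 0, \quad i \in E, \tag{2.1}$$

where the objective function $f: \mathbb{R}^n \to \mathbb{R}$ and the constraints $c_i$ are assumed to be continuously differentiable, and $E = \{i \mid i = 1, 2, \ldots, m\}$.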
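The corresponding Lagrangian function is

$$L(x, \lambda) = f(x) + \lambda^T c(x), \tag{2.2}$$

where the vector $\lambda$ contains the Lagrange multipliers. The Karush-Kuhn-Tucker (KKT) conditions for (2.1) are

$$\nabla f(x) + \nabla c(x)^T \lambda = 0, \qquad c(x) = 0. \tag{2.3}$$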
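For a given initial estimate $x_0$, the line-search algorithm generates a sequence of iterates $x_{k+1} = x_k + \alpha_k d_k$ as estimates of the solution of (2.1). Here, the search direction $d_k$ is computed from the linearization at $x_k$ of the KKT conditions (2.3):

$$\begin{pmatrix} W_k & \nabla c_k^T \\ \nabla c_k & 0 \end{pmatrix} \begin{pmatrix} d_k \\ \lambda_{k+1} \end{pmatrix} = -\begin{pmatrix} \nabla f_k \\ c_k \end{pmatrix}, \tag{2.4}$$

where $\nabla f_k := \nabla f(x_k)$, $c_k := c(x_k)$, $\nabla c_k := \nabla c(x_k)$ is the constraint Jacobian, and the symmetric matrix $W_k$ denotes the Hessian $\nabla^2_{xx} L(x_k, \lambda_k)$ of (2.2) or a positive definite approximation to it.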
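As an illustration of how the step in (2.4) can be computed, the following Python sketch assembles the KKT matrix densely and solves it. It assumes that $\nabla c_k$ is stored as the $m \times n$ Jacobian and uses a small hand-written Gaussian elimination instead of a linear-algebra library; the names `solve_linear` and `kkt_step` are hypothetical, and a practical code would typically use a sparse symmetric indefinite factorization.

```python
def solve_linear(M, b):
    """Solve the small dense system M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]  # augmented copy; inputs untouched
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) < 1e-14:
            # Treated here as a sign that (2.4) cannot be solved (restoration would start).
            raise ValueError("KKT matrix is numerically singular")
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= factor * A[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (A[i][n] - sum(A[i][k] * x[k] for k in range(i + 1, n))) / A[i][i]
    return x


def kkt_step(W, J, g, c):
    """Solve [[W, J^T], [J, 0]] [d; lam] = -[g; c] for the step d and multipliers lam.

    W: n x n Hessian (approximation), J: m x n constraint Jacobian,
    g: gradient of f (length n), c: constraint values (length m).
    """
    n, m = len(g), len(c)
    K = [W[i][:] + [J[r][i] for r in range(m)] for i in range(n)]  # rows [W  J^T]
    K += [J[r][:] + [0.0] * m for r in range(m)]                   # rows [J  0]
    sol = solve_linear(K, [-v for v in g] + [-v for v in c])
    return sol[:n], sol[n:]


# Toy problem: min x1^2 + x2^2 s.t. x1 + x2 - 1 = 0, linearized at x0 = (0, 0).
d, lam = kkt_step(W=[[2.0, 0.0], [0.0, 2.0]], J=[[1.0, 1.0]], g=[0.0, 0.0], c=[-1.0])
print(d, lam)  # [0.5, 0.5] [-1.0]
```

Because the toy problem is a convex quadratic program, this single full step already lands on its solution $(0.5, 0.5)$ with multiplier $-1$.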
After a search direction $d_k$ has been computed, the step size $\alpha_k \in (0, 1]$ is determined by a backtracking line-search procedure, in which a decreasing sequence of step sizes is tried until some acceptance criteria are satisfied. Generally, the acceptance criteria require that the trial point provide a sufficient reduction of a merit function. The filter method proposed by Fletcher and Leyffer [1] offers an alternative to merit functions. In this paper, the filter notion is defined as follows.
Here, we define

$$V(x) := \|c(x)\|_2 + \|\nabla L(x)\|_2$$

as the infeasibility measure, where $\nabla L$ denotes the gradient of the Lagrangian (2.2) with respect to $x$. That is, we modify the infeasibility measure used in the filter; with this modification, superlinear convergence can be derived. Strictly speaking, $V$ is a stationarity measure; however, we still call it the infeasibility measure according to its function. In the rest of the paper, all norms are $l_2$ norms unless otherwise noted.
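The measure $V$ can be evaluated as in the following Python sketch. It assumes that $V$ is the sum $\|c(x)\|_2 + \|\nabla L(x)\|_2$, with $\nabla L(x) = \nabla f(x) + \nabla c(x)^T \lambda$, and that the multiplier estimate $\lambda$ is supplied by the caller (for example, the $\lambda_{k+1}$ returned by (2.4)); the function names `norm2` and `measure_V` are hypothetical.

```python
import math


def norm2(v):
    """Euclidean (l2) norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))


def measure_V(c_val, grad_f, jac_c, lam):
    """Modified infeasibility measure V = ||c||_2 + ||grad_x L||_2.

    c_val: constraint values c(x) (length m)
    grad_f: gradient of f at x (length n)
    jac_c: m x n Jacobian of c at x (list of rows)
    lam: multiplier estimate (length m)
    """
    n, m = len(grad_f), len(c_val)
    # grad_x L(x, lam) = grad f(x) + J(x)^T lam
    grad_L = [grad_f[i] + sum(jac_c[r][i] * lam[r] for r in range(m)) for i in range(n)]
    return norm2(c_val) + norm2(grad_L)


# min x1^2 + x2^2 s.t. x1 + x2 - 1 = 0: V vanishes only at a KKT pair.
print(measure_V([0.0], [1.0, 1.0], [[1.0, 1.0]], [-1.0]))  # 0.0 at x* = (0.5, 0.5), lam* = -1
print(measure_V([-1.0], [0.0, 0.0], [[1.0, 1.0]], [0.0]))  # 1.0 at x = (0, 0), lam = 0
```

The second call shows the feasibility part of $V$; a feasible point with a poor multiplier estimate would instead be penalized through the $\|\nabla L\|$ term.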

Definition 2.2.
A filter is a list of pairs $(V_l, f_l)$ such that no pair dominates any other. A point $(V_k, f_k)$ is said to be acceptable for inclusion in the filter if it is not dominated by any point in the filter.
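Definition 2.2 can be mirrored by a small container class. In the Python sketch below, a pair is taken to dominate another when it is no worse in both $V$ and $f$ (a common convention that the definition leaves implicit); the class name `Filter` and its methods are hypothetical, and the acceptance margins $\beta$ and $\gamma$ introduced next are deliberately left out.

```python
class Filter:
    """List of (V, f) pairs in which no pair dominates another (Definition 2.2)."""

    def __init__(self):
        self.pairs = []

    @staticmethod
    def dominates(a, b):
        # (V_a, f_a) dominates (V_b, f_b) if it is no worse in both measures.
        return a[0] <= b[0] and a[1] <= b[1]

    def acceptable(self, pair):
        # A pair is acceptable for inclusion if no stored pair dominates it.
        return not any(self.dominates(p, pair) for p in self.pairs)

    def add(self, pair):
        if not self.acceptable(pair):
            raise ValueError("pair is dominated by the filter")
        # Remove entries that the new pair dominates, keeping the list non-dominated.
        self.pairs = [p for p in self.pairs if not self.dominates(pair, p)]
        self.pairs.append(pair)


F = Filter()
F.add((1.0, 5.0))
F.add((0.5, 6.0))
print(F.acceptable((0.8, 5.5)))  # True: trades feasibility against objective
print(F.acceptable((1.2, 5.1)))  # False: dominated by (1.0, 5.0)
```

Pruning dominated entries on insertion is a design choice that keeps the stored list short; it does not change which points are acceptable.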
When a pair $(V_k, f_k)$ is acceptable to the filter, we also say that the iterate $x_k$ is acceptable to the filter. In the filter notion, a trial point $x_{k+1}$ is accepted if it improves either feasibility or the objective function. Hence, the filter criterion is less demanding than a traditional penalty function. As optimality improves, the norm of the gradient of the Lagrangian function tends to zero, so it offers a more precise analysis of the objective function.
However, this simple filter concept is not sufficient to guarantee global convergence. Fletcher et al. [3] replace this condition by requiring that the next iterate provide at least as much progress in one of the measures $V$ or $f$ as corresponds to a small fraction of the current infeasibility measure. Here, we apply a similar technique to our filter. Formally, we say that a trial point $x$ can be accepted to the current iterate $x_k$ or the filter if

$$V(x) \le \beta V(x_l) \tag{2.5a}$$

or

$$f(x) \le f(x_l) - \gamma V(x) \tag{2.5b}$$

for some fixed constants $\beta, \gamma \in (0, 1)$, where $(V(x_l), f(x_l))$ are the points in the current filter. In practical implementations, $\beta$ is chosen close to 1 and $\gamma$ close to 0. However, criteria (2.5a) and (2.5b) may accept trial points that always provide sufficient reduction of the infeasibility measure alone, and not of the objective function. To prevent this, we apply a technique proposed by Wächter and Biegler [10] that uses a different sufficient reduction criterion. The switching condition is

$$\nabla f_k^T d_k < 0 \quad \text{and} \quad \left[-\alpha_k \nabla f_k^T d_k\right]^{e_1} > \delta \left[V(x_k)\right]^{e_2}, \tag{2.6}$$

where $\delta > 0$, $e_2 > 1$, and $e_1 \ge 2e_2$. If condition (2.6) holds, we replace the filter condition (2.5b) by an inexact line-search condition, namely the Armijo-type condition

$$f(x_k + \alpha_k d_k) \le f(x_k) + \eta \alpha_k \nabla f_k^T d_k, \tag{2.7}$$

where $\eta \in (0, 1/2)$ is a constant. If (2.6) holds but (2.7) does not, the trial points are still determined by (2.5a) and (2.5b). If a trial point is accepted at a step size $\alpha_k$ by (2.7), we refer to it as an f-type iterate and to $\alpha_k$ as an f-step size.
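The three tests can be written as plain predicates, which makes their roles in the line search explicit. The Python sketch below assumes that (2.5a) and (2.5b) read $V(x) \le \beta V(x_l)$ or $f(x) \le f(x_l) - \gamma V(x)$ for every stored pair, that (2.6) reads $\nabla f_k^T d_k < 0$ and $[-\alpha_k \nabla f_k^T d_k]^{e_1} > \delta V(x_k)^{e_2}$, and that (2.7) is the Armijo condition. The default parameter values are illustrative choices respecting $\beta, \gamma \in (0,1)$, $\delta > 0$, $e_2 > 1$, $e_1 \ge 2e_2$, and $\eta \in (0, 1/2)$, not values from the paper, and the function names are hypothetical.

```python
def filter_acceptable(V_trial, f_trial, pairs, beta=0.99, gamma=1e-5):
    """(2.5a) or (2.5b) must hold against every (V_l, f_l) in `pairs`."""
    return all(V_trial <= beta * V_l or f_trial <= f_l - gamma * V_trial
               for V_l, f_l in pairs)


def switching_condition(gTd, V_k, alpha, delta=1.0, e1=2.3, e2=1.1):
    """(2.6) with gTd = grad f_k^T d_k: descent whose predicted decrease beats a power of V."""
    return gTd < 0 and (-alpha * gTd) ** e1 > delta * V_k ** e2


def armijo(f_trial, f_k, gTd, alpha, eta=1e-4):
    """(2.7): Armijo-type sufficient decrease, eta in (0, 1/2)."""
    return f_trial <= f_k + eta * alpha * gTd


print(filter_acceptable(0.5, 10.0, [(1.0, 5.0)]))  # True: V reduced by more than the margin
print(filter_acceptable(1.0, 5.0, [(1.0, 5.0)]))   # False: neither margin is met
print(switching_condition(-1.0, 0.01, 1.0))        # True: nearly feasible, clear descent
print(switching_condition(-0.01, 1.0, 1.0))        # False: far from feasible, tiny descent
```

The last two calls show the intent of (2.6): the objective takes over as the acceptance criterion only when the predicted decrease is large relative to the current infeasibility.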

The Algorithm Analysis
The right part of the switching condition (2.6) ensures that the improvement of the objective function required by the Armijo condition (2.7) is sufficiently large compared with the current infeasibility measure $V(x_k)$. Thus, if the iterates are far from the feasible region, a correspondingly large decrease of the objective function is required. By setting $e_2 > 1$, the progress predicted by the linear model of $f$, $[-\alpha_k \nabla f_k^T d_k]^{e_1}$, is compared with a power of the infeasibility measure $V(x)$. The choice $e_1 \ge 2e_2$ makes it possible for a full step to be accepted by (2.7) close to a local solution.
In this paper, we denote by $F_k$ the filter, that is, the set containing all iterates accepted by (2.5a) and (2.5b). During the optimization, if the f-type switching condition (2.6) holds and the Armijo condition (2.7) is satisfied, the trial point is determined by (2.7), not by (2.5a) and (2.5b), and the value of the objective function strictly decreases. This prevents cycling of the algorithm (see [10]).
If the linear system (2.4) is incompatible, no search direction $d_k$ can be found, and the algorithm switches to a feasibility restoration phase. In this phase, we try to decrease the infeasibility measure to find a new iterate $x_{k+1}$ for which (2.4) is compatible and which is acceptable to the current filter. Similarly, in case $\nabla f_k^T d_k < 0$, the sufficient decrease criterion for the objective function in (2.5b) may not be satisfied. To detect when no admissible step size $\alpha$ can be found and the feasibility restoration phase has to be invoked, we consider reducing $\alpha$ and define

(2.8)

If the trial step size satisfies $\alpha < \alpha_k^{\min}$, the algorithm turns to the feasibility restoration phase.
The definition of $\alpha_k^{\min}$ ensures that the algorithm does not switch to the feasibility restoration phase as long as (2.6) holds for a step size $\alpha < \alpha_k$, and that the backtracking line-search procedure is finite. Thus, for a trial point $x_k$, the algorithm eventually either delivers a new iterate $x_{k+1}$ or reverts to the feasibility restoration phase. Once a feasible direction is found, the algorithm resumes the normal iteration.
Of course, the feasibility restoration phase may not always be able to find a point satisfying the filter acceptance criteria and the compatibility condition. It may converge to a local minimizer of the infeasibility measure with a nonzero value, which indicates that the algorithm fails. In this paper, we do not specify a particular procedure for the feasibility restoration phase; any method for solving a nonlinear algebraic system can be used to implement it.

The Algorithm
We are now in a position to state the overall algorithm.
Step 4. Compute the search direction $d_k$ from (2.4). If the system (2.4) is incompatible, go to the feasibility restoration phase in Step 7.
Step 5. (1) If $\alpha_k < \alpha_k^{\min}$, go to Step 7. Otherwise, compute the new trial point $x_{k+1} = x_k + \alpha_k d_k$.
(2) If conditions (2.6) and (2.7) hold, accept the trial step and go to Step 6; otherwise, set $x_k = x_{k+1}$ and go to Step 5(3).
(3) If no $\alpha_k$ makes (2.7) hold and $x_{k+1}$ is acceptable to the filter, augment the filter by $F_{k+1} := F_k \cup \{(V_k, f_k)\}$.
(4) Set $\alpha_k := \tau \alpha_k$ and go back to Step 5(1).
Step 6. Increase the iteration counter $k \leftarrow k + 1$ and go back to Step 4.
Step 7. Feasibility restoration phase: decrease the infeasibility measure $V$ to find a new iterate $x_{k+1}$ such that (2.4) is compatible. If (2.7) holds at $x_{k+1}$, continue with the normal algorithm in Step 6; if $x_{k+1}$ is acceptable to the filter by (2.5a) and (2.5b), augment the filter by $F_{k+1} := F_k \cup \{(V_k, f_k)\}$ and then continue with the normal algorithm in Step 6; if the feasibility restoration phase cannot find such a point, stop with failure.
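To make the control flow of Step 5 concrete, the following Python sketch runs the inner backtracking loop for a single iterate. It is an illustration under stated assumptions rather than the paper's implementation: the threshold $\alpha_k^{\min}$ of (2.8) is passed in by the caller because its formula is not reproduced here, the trial point is compared against the stored filter pairs together with the current pair $(V_k, f_k)$, the tests take the forms $V \le \beta V_l$ or $f \le f_l - \gamma V$ for (2.5a)/(2.5b), $[-\alpha \nabla f_k^T d_k]^{e_1} > \delta V_k^{e_2}$ for (2.6), and the Armijo form for (2.7), and all names and default parameters are hypothetical.

```python
def backtracking_step(f, V, x, d, gTd, filter_pairs, alpha_min,
                      tau=0.5, beta=0.99, gamma=1e-5, delta=1.0,
                      e1=2.3, e2=1.1, eta=1e-4):
    """Sketch of the Step 5 inner loop for one iterate x and direction d.

    f, V        : callables giving the objective and the measure V at a point
    gTd         : directional derivative grad f(x)^T d
    filter_pairs: (V_l, f_l) pairs currently stored in the filter
    alpha_min   : positive lower bound on alpha, standing in for (2.8)
    Returns (outcome, alpha) with outcome "f-type", "filter" or "restoration".
    """
    if alpha_min <= 0:
        raise ValueError("alpha_min must be positive so the loop terminates")
    V_k, f_k = V(x), f(x)
    pairs = filter_pairs + [(V_k, f_k)]  # trial is also compared with the current iterate
    alpha = 1.0
    while alpha >= alpha_min:                                              # Step 5(1)
        x_trial = [xi + alpha * di for xi, di in zip(x, d)]
        V_t, f_t = V(x_trial), f(x_trial)
        switching = gTd < 0 and (-alpha * gTd) ** e1 > delta * V_k ** e2   # (2.6)
        if switching and f_t <= f_k + eta * alpha * gTd:                    # (2.7)
            return "f-type", alpha  # Step 5(2): accepted, filter unchanged
        if all(V_t <= beta * V_l or f_t <= f_l - gamma * V_t for V_l, f_l in pairs):
            return "filter", alpha  # Step 5(3): accepted, caller augments the filter
        alpha *= tau                # Step 5(4): backtrack
    return "restoration", alpha     # Step 7


def f_toy(z):
    return z[0] ** 2 + z[1] ** 2


def V_toy(z):
    return abs(z[0] + z[1] - 1.0)  # simplified: constraint violation only


print(backtracking_step(f_toy, V_toy, [0.0, 0.0], [0.5, 0.5], 0.0, [], 1e-8))  # ('filter', 1.0)
```

In the usage line, the full step from $(0, 0)$ reaches the feasible set, so it passes (2.5a) and is accepted as a filter step; the toy $V$ omits the $\|\nabla L\|$ term only to keep the example short.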
Remark 2.4. In contrast to the SQP method with trust-region technique, the actual step does not necessarily satisfy the linearization of the constraints.
Remark 2.5. Practical experience shows that the filter allows a large degree of nonmonotonicity, which can be advantageous for some problems.
Remark 2.6. To prevent the acceptance of a sequence of f-type iterates with $V_k \to \infty$, we impose an upper bound $V_{\max}$ on the infeasibility measure $V$.
For further specific implementation details of Algorithm 2.3, see Section 4.

Global Convergence
In this section, we give a global convergence analysis of Algorithm 2.3. In some places we follow the global convergence analysis of Wächter and Biegler [10]. We first state the necessary assumptions.
Assumption A1. All iterates $x_k$ lie in a nonempty closed and bounded set $S$ of $\mathbb{R}^n$.

Assumption A2. The functions $f$ and $c$ are twice continuously differentiable on an open set containing $S$.
Assumption A3. The matrix $W_k$ is positive definite on the null space of the Jacobian $\nabla c(x_k)$ and uniformly bounded for all $k$, and the Lagrange multiplier $\lambda_k$ is bounded for all $k$.
Assumptions A1 and A2 are standard. Assumption A3 plays an important role in obtaining the convergence result and ensures that the algorithm is implementable.
For convenience, we define the set $J := \{i \mid x_i \text{ is accepted to the filter}\}$. In addition, it is sometimes necessary to modify $W_k$ to keep it positive definite, for example by the damped BFGS formula [15] or a revised Broyden method [16].
From Assumptions A1-A3, we obtain

(3.1)

where $M_d$, $M_\lambda$, $M_W$, and $m_W$ are constants.
Proof. By the linear system (2.4) and (3.1),

(3.3)

Lemma 3.1 shows that the search direction is a descent direction for the objective function when the trial points are sufficiently close to the feasible region.
Lemma 3.2. Suppose Assumptions A1-A3 hold, and that there exists an infinite subsequence $\{x_{k_i}\}$ of $\{x_k\}$ such that conditions (2.6) and (2.7) hold. Then
Proof. From Assumptions A1 and A2, we know that $\nabla f$ is bounded. Hence, by (3.1), there exists a constant $M_m > 0$ such that
As $1 - 1/e_1 > 0$, we have

(3.8)

Hence, for $c_1 := \eta\delta^{1/e_1}(\delta/M_m^{e_1})^{1-1/e_1}$, an integer $K$, and all $j = 1, 2, \ldots$,

(3.9)

Since $f(x_{K+j})$ is bounded below as $j \to \infty$, the series on the right-hand side in the last line of (3.8) is bounded, which implies the conclusion.
Proof. Here, we follow the proof of [2, Lemma 3.3]. If the conclusion is not true, there exists an infinite subsequence $\{k_j\} \subset \{k_i\} \subset J$ such that (3.11) holds for all $j$ and for some $\varepsilon > 0$. This means that no other $(V, f)$ pair can be added to the filter at a later stage within the associated region, or within the intersection of this region with the set determined by $f_{\min}$, for some constant $f_{\min} \le f(x_k)$. Now, the area of each of these regions is $(1-\beta)\gamma\varepsilon^2$. Hence, the set $S_0 \cup \{(V, f) \mid f \le M_f\}$ is completely covered by at most a finite number of such regions, for any $M_f \ge f_{\min}$. Since the pairs $(V_{k_j}, f_{k_j})$ keep on being added to the filter, $f_{k_j}$ tends to infinity as $j$ tends to infinity. Without loss of generality, assume that $f_{k_{j+1}} \ge f_{k_j}$ for all sufficiently large $j$. But (2.5a) and (3.11) imply that

$$V_{k_{j+1}} \le \beta V_{k_j}, \tag{3.14}$$

so $V_{k_j} \to 0$, which contradicts (3.11). Hence, this assumption is false and the conclusion follows.
Next, we show that if $\{x_k\}$ is bounded, at least one limit point of the iterates is a first-order optimal point for (2.1).
Lemma 3.4. Suppose Assumptions A1-A3 hold. Let $\{x_{k_i}\}$ be a subsequence with $\nabla f_{k_i}^T d_{k_i} < -\varepsilon_2$ for a constant $\varepsilon_2 > 0$ independent of $k_i$. Then there exists a constant $\bar\alpha > 0$ such that for all $k_i$ and $\alpha_{k_i} < \bar\alpha$,

(3.15)

Proof. From Assumptions A1 and A2, $d^T \nabla^2 f(x) d \le c_f \|d\|^2$ for some constant $c_f > 0$. Thus, it follows from Taylor's theorem and (3.1) that
where $y_1$ denotes some point on the line segment from $x_{k_i}$ to $x_{k_i} + \alpha_{k_i} d_{k_i}$. Then the conclusion follows.
Lemma 3.5. Suppose Assumptions A1-A3 hold, and the filter is augmented only a finite number of times. Then

(3.17)

Proof. Since the filter is augmented only a finite number of times, there exists an integer $K_1$ such that the filter is not augmented for any iterate $x_k$ with $k > K_1$. If the claim is not true, there must exist a subsequence $\{x_{k_i}\}$ and a constant $\varepsilon > 0$ such that
From Lemmas 3.2 and 3.4, we obtain a bound on $V(x_{k_i})$ and

(3.18)

Since $f(x_{k_i})$ is bounded below and monotonically decreasing for all $k \ge K_2$, we conclude that $\lim_{i \to \infty} \alpha_{k_i} = 0$. This means that for $k_i > K_2$ the step size $\alpha = 1$ has not been accepted. So there is an $\alpha_{k_i} < 1$ such that a trial point

(3.21)

for some constant $c_V$, where $y_2$ denotes some point on the line segment from $x_{k_i}$ to $x_{k_i} + \alpha d_{k_i}$. Since $\lim_{i \to \infty} \alpha_{k_i} = 0$ and $\lim_{i \to \infty} V(x_{k_i}) = 0$ by Lemmas 3.2 and 3.3, we have $V(x_{k_i+1}) < V_{\min}$ for sufficiently large $k_i$, so (3.19) is not true. In case (3.20), since $\alpha_{k_i} \to 0$, we have $\alpha_{k_i} \le \bar\alpha$ for sufficiently large $k_i$, with $\bar\alpha$ from Lemma 3.4; that is, (3.20) cannot be satisfied. Then the conclusion follows.
Lemma 3.6. Suppose Assumptions A1-A3 hold. Let $\{x_{k_i}\}$ be a subsequence of $\{x_k\}$ with $\nabla f_{k_i}^T d_{k_i} \le -\varepsilon_2$ for a constant $\varepsilon_2 > 0$ independent of $k_i$. Then there exist trial points that can be accepted to the filter.
Proof. The mechanism of Algorithm 2.3 ensures that the first iterate can be accepted to the filter. Next, we can assume that $(V(x_k), f(x_k))$ is acceptable to the $k$th filter and

(3.24)

Hence, we have

(3.25)

The claim then follows from (3.25).
Lemma 3.6 shows that, in the case $V(x_k) > 0$, Algorithm 2.3 either accepts a new iterate to the filter or switches to the feasibility restoration phase. In the case $V(x_k) = 0$, if the algorithm does not stop at a KKT point, then $\nabla f_k^T d_k < 0$, $\alpha_k^{\min} = 0$, and the Armijo condition (2.7) is satisfied for a sufficiently small step size $\alpha_k$, so an f-type iterate is accepted. Hence, the inner loop in Step 5 always terminates after a finite number of trial steps, and Algorithm 2.3 is well defined.
Lemma 3.7. Suppose Assumptions A1-A3 hold. Let $\{x_{k_i}\}$ be a subsequence with $\|d_{k_i}\| \ge \varepsilon$ for a constant $\varepsilon > 0$ independent of $k_i$. Then there exists a sufficiently large integer $K$ such that for all $k_i > K$ the algorithm generates trial points that are either accepted to the filter or f-type steps.
If $V(x_{k_i}) = 0$, the f-type switching condition (2.6) holds, so f-type iterates must exist. For the remaining iterates with $V(x_{k_i}) > 0$, if $\ldots < \tau c_3 V(x_{k_i})$, (3.28) as well as

(3.29)

Now choose an arbitrary $k_i \ge K$ with $V(x_{k_i}) > 0$ and define

(3.30)

Lemmas 3.4 and 3.6 then imply that a trial step size $\alpha_{k_i} \le c_4$ satisfies both

(3.31)

Since $\alpha > \tau\alpha_k > \tau\alpha_k^{\min}$ by the definition of $\alpha_k^{\min}$, the method does not switch to the feasibility restoration phase for those trial step sizes. Then the claim follows.
Based on the above lemmas, we can give the main global convergence result.

Theorem 3.8. Suppose Assumptions A1-A3 hold. Then
that is, there exists a limit point $\bar x$ of $\{x_k\}$ which is a first-order optimal point for (2.1).
Proof. (3.32) follows from Lemmas 3.2 and 3.3. If the filter is augmented only a finite number of times, then claim (3.33) holds by Lemma 3.5. Otherwise, there exists a subsequence $\{x_{k_i}\}$ such that $k_i \in J$ for all $i$. Suppose conclusion (3.33) is not true; then there must exist a subsequence $\{x_{k_j}\}$ of $\{x_{k_i}\}$ such that $\|d_{k_j}\| \ge \varepsilon$ for some constant $\varepsilon > 0$. Hence, by Lemmas 3.1 and 3.3, $\nabla f_{k_j}^T d_{k_j} < -\varepsilon_2$ and $\lim_{j \to \infty} V(x_{k_j}) = 0$. Then, by Lemma 3.7, when $\alpha < \min\{\bar\alpha, c_2, c_3 V(x_{k_j})\}$, the algorithm generates an f-type iterate; that is, the filter is not augmented. This contradicts the choice of $\{x_{k_j}\}$, so (3.33) holds.

Local Convergence
In this section, we show the local convergence of Algorithm 2.3.
Assumption A4. The iterates $x_k$ converge to a point $x^*$ that satisfies (3.34), and the constraint Jacobian $\nabla c(x^*)$ has full row rank.

Assumption A5. There is a neighborhood
Remark 3.9. Under Assumption A4, the point $x^*$ is a strict local minimum of (2.1).
Remark 3.10. Under Assumptions A4 and A5, it is well known that with the choice $x_{k+1} = x_k + d_k$, the sequence $\{x_k\}$ converges q-superlinearly to $x^*$, and that the convergence is q-quadratic if $\nabla_{xx} f$ and $\nabla_{xx} c_i$ are Lipschitz continuous in a neighborhood of $x^*$. That is, for any given $\zeta \in (0, 1)$, if $x_j \in N(x^*)$, $j = k, k+1, \ldots$, and $x_{j+1} = x_j + d_j$, then

(3.35)
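The full-step behavior described in Remark 3.10 can be observed numerically. The Python sketch below applies the pure step $x_{k+1} = x_k + d_k$ from (2.4), with the exact Hessian $W_k = \nabla^2_{xx} L(x_k, \lambda_k)$, to a hypothetical toy problem, $\min\ x_1 + x_2$ subject to $x_1^2 + x_2^2 - 2 = 0$, whose solution is $x^* = (-1, -1)$ with $\lambda^* = 1/2$. It illustrates only the local Newton-type iteration assumed in the remark, not the globalized Algorithm 2.3, and the helper names are hypothetical.

```python
import math


def solve_linear(M, b):
    """Gaussian elimination with partial pivoting for a small dense system M x = b."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[p] = A[p], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= factor * A[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (A[i][n] - sum(A[i][k] * x[k] for k in range(i + 1, n))) / A[i][i]
    return x


def full_step_errors(x, lam, iters=6):
    """Distances ||x_k - x*|| for full steps on min x1 + x2 s.t. x1^2 + x2^2 = 2."""
    errors = [math.dist(x, (-1.0, -1.0))]
    for _ in range(iters):
        g = [1.0, 1.0]                       # gradient of f
        c = x[0] ** 2 + x[1] ** 2 - 2.0      # constraint value
        J = [2.0 * x[0], 2.0 * x[1]]         # single Jacobian row
        w = 2.0 * lam                        # Hessian of L is w * I
        K = [[w, 0.0, J[0]], [0.0, w, J[1]], [J[0], J[1], 0.0]]
        d0, d1, lam = solve_linear(K, [-g[0], -g[1], -c])  # (2.4) gives lambda_{k+1}
        x = [x[0] + d0, x[1] + d1]
        errors.append(math.dist(x, (-1.0, -1.0)))
    return errors


for k, e in enumerate(full_step_errors([-1.1, -0.9], 0.55)):
    print(k, f"{e:.3e}")
```

Running it, the error roughly squares from one iteration to the next until it reaches rounding level, which is the q-quadratic rate the remark describes for smooth problem data.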
We use the local convergence proof techniques of [6]. In the proof, define $l_\rho(x, \lambda) := L(x, \lambda) + (\rho/2)\|c(x)\|_2^2$ and $\tilde l_\rho(x, \lambda) := f(x) + (\rho/2)V(x)$, where $\rho$ is a parameter.

(3.37)
Proof. Let $x \in N(x^*)$. Using Taylor's theorem and $\nabla_x l_\rho(x^*, \lambda^*) = 0$, $\nabla_\lambda l_\rho(x^*, \lambda^*) = 0$, we have, with some $(\hat x, \hat\lambda)$ on the line segment between $(x, \lambda)$ and $(x^*, \lambda^*)$,

(3.38)

Obviously,

(3.40)

with a constant $r > 0$; see [15, Theorem 17.5]. Suppose that $\lambda(x)$ is Lipschitz continuous with Lipschitz constant $L_y$, and that $\rho > 0$ is a constant. Let $\rho_0 := \max\{\rho, 4L_y^2/r\}$. Then, for all $x$ with $\|x - x^*\| \le \sigma$, we have by continuity

(3.41)

Thus, for all $x \in N_\sigma(x^*)$, we obtain by (3.38), (3.39), and (3.41)

(3.42)
It is obvious that, for all $s > 0$,

(3.44)

Since $c(x^*) = 0$, we have $l_\rho(x^*, \lambda^*) = l_{\rho_0}(x^*, \lambda^*) = f(x^*)$, and by (3.44)

(3.45)

Mathematical Problems in Engineering
which is the left inequality in (3.36). For the right inequality in (3.36), it is obvious from (3.38) that for all $x \in N(x^*)$,
with a constant $t > 0$. This proves inequality (3.36).
Since $c(x_j) \to 0$, we can choose $K$ large enough that $\|c(x_j)\| < 1$ for all $j \ge k \ge K$. Then, if $\rho_0$ is large enough, it follows from (3.44) and $\tilde l_\rho(x^*, \lambda^*) = \tilde l_{\rho_0}(x^*, \lambda^*) = f(x^*)$ that

(3.47)

Since an analogue of (3.45) holds for $\tilde l_\rho$, this proves the left inequality in (3.37). On the other hand,

(3.48)

Since $f$ and $c$ are twice continuously differentiable on the closed set $S$, we have
This shows the right inequality in (3.37), possibly after increasing $t$.
Lemma 3.12. Let $x^*$ satisfy Assumptions A4 and A5. Then for any $\zeta \in (0, 1)$ and $M \ge 1$, there is an index $K$ such that for all $k \ge K$ with

(3.49)

the points $x_{j+1}$, $j = k, k+1, \ldots$, with $\alpha = 1$ are acceptable to

(3.50)
Proof. Let $N(x^*)$ be as in Assumption A5, let $\zeta = 1/2$, and let $\rho_0$ and $N_\sigma(x^*) \subset N(x^*)$ be given by Lemma 3.11. Let $K_1$ be a sufficiently large integer, and for all $k \ge K_1$ choose $\rho \ge \rho_0$ so large that

(3.51)

Then, from (3.37) in Lemma 3.11,

(3.53)

Since $\bar V > 0$, by continuity there exists $0 < \sigma_1 < \sigma$ such that $V(x) \le \beta \bar V$ for all $x \in N_{\sigma_1}(x^*)$, so the point $x^*$ is acceptable to $F_{K_1}$. Since $x_k \to x^*$, there is an integer $K_2 > K_1$ such that $x_k \in N_{\sigma_1}(x^*)$ for all $k \ge K_2$. By (3.35), we can choose $\sigma_1$ so small that for all $k \ge K_2$, the sequence $\{x_j\}_{j \ge k}$ with $\alpha = 1$ converges linearly with a contraction factor of at least

(3.54)

Suppose $k_2 \ge K_2$ is arbitrary such that (3.49) holds, and set $\sigma_2 := \|x_k - x^*\|$. By (3.54), it has $x$, so by (3.52)

(3.55)
Moreover, by (3.52) and (3.54),

(3.56)

Next, suppose that $(x, \lambda)$ with $x \in N_{\sigma_2}(x^*)$ is not acceptable to $(V_l, f_l)$; then

$$V(x) > \beta V_l, \qquad f(x) + \gamma V(x) > f_l. \tag{3.57}$$

Thus, with (3.51) and (3.52), we have

(3.58)

This shows with (3.55) that

This contradicts (3.56), so $x_{k+1}$ is acceptable to $F_k \cup \{(V_k, f_k)\}$. The acceptability of $(V_{j+1}, f_{j+1})$ for $F_l$ then follows by induction.
Next, we show that the sequence $\{x_j\}_{j \ge k}$ with $\alpha = 1$ satisfies the sufficient decrease condition (2.7).
Lemma 3.13. Suppose Assumptions A1-A3 hold. Let $x^*$ satisfy Assumptions A4 and A5 and let $K$ be as in Lemma 3.11. Then for all $k > K$ the sequence $\{x_j\}_{j \ge K}$ with $\alpha_j = 1$ satisfies

$$f(x_{j+1}) \le f(x_j) + \eta \alpha_j \nabla f_j^T d_j. \tag{3.60}$$

Proof. Suppose $\alpha_j \nabla f_j^T d_j < 0$ and $[-\alpha_j \nabla f_j^T d_j]^{e_1} > \delta [V(x_j)]^{e_2}$ hold. Since $\alpha_j = 1$,

$$\nabla f_j^T d_j < -\delta^{1/e_1} [V(x_j)]^{e_2/e_1}. \tag{3.61}$$
On the other hand, with $\alpha_j = 1$ the assertion $f(x_{j+1}) \le f(x_j) + \eta \nabla f_j^T d_j$ yields
where $\hat x$ lies on the line segment between $x_j$ and $x_{j+1}$.
Clearly, the conclusion follows if, with $K > 0$ large enough, the following holds for all $j \ge k \ge K$:

(3.64)

Since $c_j + \nabla c_j d_j = 0$, we have $d_j = -\nabla c_j^T (\nabla c_j \nabla c_j^T)^{-1} c_j$. By Assumption A4, $\nabla c_j$ has full row rank, so there exists $c_d > 0$ such that

(3.66)

Choose $K$ large enough that $V(x_j) \le 1$ for all $j \ge k \ge K$. By $e_1 > 2e_2$ and (3.66), we have
for a constant $c_5 := 2\delta^{1/e_1}(1 - \eta)/(c_f c_d^2)$, and we can choose suitable parameters such that the last inequality holds. Thus,

(3.68)

This completes the proof.
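For a single constraint ($m = 1$), the formula for $d_j$ in the proof of Lemma 3.13 reduces to $d_j = -c_j \nabla c_j^T / \|\nabla c_j\|^2$. The following Python sketch checks numerically that this step satisfies the linearized constraint $c_j + \nabla c_j d_j = 0$ and that its length is $|c_j| / \|\nabla c_j\|$, so it scales linearly with the constraint violation when $\nabla c_j$ stays away from zero. Restricting to $m = 1$ is an assumption made to keep the example free of a linear solver, and the function name is hypothetical.

```python
import math


def min_norm_step(c, grad_c):
    """d = -grad_c^T (grad_c grad_c^T)^{-1} c for one constraint (grad_c is a 1 x n row)."""
    scale = c / sum(gi * gi for gi in grad_c)  # (grad_c grad_c^T)^{-1} c is a scalar here
    return [-scale * gi for gi in grad_c]


d = min_norm_step(2.0, [3.0, 4.0])
residual = 2.0 + sum(gi * di for gi, di in zip([3.0, 4.0], d))  # c + grad_c d
print(d, residual, math.hypot(*d))  # step, residual near 0, length approx 0.4 = |c| / ||grad_c||
```

Halving $c$ to 1.0 halves the step length to about 0.2, which is the linear dependence on the constraint value used in the argument.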
Theorem 3.14. Suppose Assumptions A1-A5 hold. Then there exists $K > 0$ such that Algorithm 2.3 takes steps with $\alpha_k = 1$ for all $k \ge K$, that is,

$$x_{k+1} = x_k + d_k, \qquad k \ge K. \tag{3.69}$$

In particular, $\{x_k\}$ converges q-superlinearly to $x^*$. If $\nabla_{xx} f$ and $\nabla_{xx} c_i$ are Lipschitz continuous in a neighborhood of $x^*$, then $\{x_k\}$ converges q-quadratically.
Proof. Since Assumptions A4 and A5 hold, $x_k \to x^*$ with $x^*$ satisfying A4. By Lemmas 3.12 and 3.13, the iterate $(x_{k+1}, \lambda_{k+1}) = (x_k + d_k, \lambda(x_k + d_k))$ is acceptable to the filter $F_k \cup \{(V_k, f_k)\}$ and satisfies the sufficient decrease condition (2.7). Thus, the trial iterate $(x_k + d_k, \lambda(x_k + d_k))$ is accepted by the algorithm. That is, in both cases the algorithm takes steps with $\alpha_k = 1$. By Remark 3.10, $\{x_k\}$ converges q-superlinearly to $x^*$.

Numerical Experience
In this section, we give some numerical results of Algorithm 2.3. We use some CUTE problems [17], which are freely available on NEOS, to test our algorithm. The test codes are written in MATLAB. The implementation details are as follows.
(b) The optimal residual is defined as

$$\text{res} = \max\left\{\|\nabla f(x_k) + \nabla c(x_k)^T \lambda_k\|,\ V(x_k)\right\}. \tag{4.1}$$

That is, the algorithm terminates when $\text{res} < \varepsilon$ for a given tolerance $\varepsilon > 0$.
(c) $W_k$ is updated by the damped BFGS formula [15].
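Item (c) does not spell out the update, so the Python sketch below assumes the standard Powell-damped BFGS formula with the usual thresholds 0.2 and 0.8, which keeps $W_k$ positive definite even when $s^T y \le 0$; it also evaluates the residual (4.1) from a precomputed value of $V$. Both function names are hypothetical, and dense lists stand in for matrices.

```python
import math


def optimality_residual(grad_f, jac_c, lam, V_val):
    """res = max(||grad f + J^T lam||, V) as in (4.1); jac_c is the m x n Jacobian."""
    n, m = len(grad_f), len(lam)
    grad_L = [grad_f[i] + sum(jac_c[r][i] * lam[r] for r in range(m)) for i in range(n)]
    return max(math.sqrt(sum(v * v for v in grad_L)), V_val)


def damped_bfgs(B, s, y):
    """Powell-damped BFGS update of a symmetric positive definite B (n x n lists).

    s = x_{k+1} - x_k, y = change in the gradient of the Lagrangian.
    """
    n = len(s)
    Bs = [sum(B[i][j] * s[j] for j in range(n)) for i in range(n)]
    sBs = sum(si * bi for si, bi in zip(s, Bs))
    sy = sum(si * yi for si, yi in zip(s, y))
    # Damping: blend y with Bs so that s^T r >= 0.2 * s^T B s > 0.
    theta = 1.0 if sy >= 0.2 * sBs else 0.8 * sBs / (sBs - sy)
    r = [theta * yi + (1.0 - theta) * bi for yi, bi in zip(y, Bs)]
    sr = sum(si * ri for si, ri in zip(s, r))
    return [[B[i][j] - Bs[i] * Bs[j] / sBs + r[i] * r[j] / sr for j in range(n)]
            for i in range(n)]


# Negative curvature along s would break plain BFGS; the damped update stays positive definite.
B_new = damped_bfgs([[1.0, 0.0], [0.0, 1.0]], s=[1.0, 0.0], y=[-1.0, 0.0])
print(B_new)  # approximately [[0.2, 0.0], [0.0, 1.0]]
print(optimality_residual([1.0, 1.0], [[1.0, 1.0]], [-1.0], 0.0))  # 0.0 at the toy KKT point
```

When $s^T y \ge 0.2\, s^T B s$ the damping factor is 1 and the formula is the ordinary BFGS update; otherwise the update satisfies $B_{\text{new}} s = r$ instead of the secant equation $B_{\text{new}} s = y$.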
The detailed results of the numerical tests on small-scale problems are summarized in Table 2. For comparison purposes, Table 2 also gives the numerical results of the tridimensional line-search filter solver Tri-filter of Shen et al. [12]. The headers of Tables 2 and 3 are described in Table 1.
The results in Table 2 indicate that Algorithm 2.3 performs well on these problems.
In addition, we test some mid-scale problems. Since no mid-scale results are reported for the Tri-filter solver, we compare Algorithm 2.3 with the SNOPT solver of Gill et al. [18]; the results are summarized in Table 3.
From Table 3, we find that the efficiency of Algorithm 2.3 is also significantly improved. From both Tables 2 and 3, the behavior of the proposed algorithm is, in general, rather stable. Finally, as far as our limited computational experience is concerned, we may conclude that the proposed algorithm compares well with the Tri-filter and SNOPT solvers.

Ulbrich et al. [7] and Wächter and Biegler [8] applied the filter technique to interior-point methods and achieved global convergence to first-order critical points. Wächter and Biegler [9, 10] proposed a line-search filter method and applied it to different algorithmic frameworks. Gould et al. [11] and Shen et al. [12] developed new multidimensional filter techniques.

Table 1: Description of the headers.

Table 2: Numerical results of small-scale problems.

Table 3: Numerical results of mid-scale problems.