A Projected Hessian Gauss-newton Algorithm for Solving Systems of Nonlinear Equations and Inequalities

Solving systems of nonlinear equations and inequalities is of critical importance in many engineering problems. In general, the existence of inequalities in the problem adds to its difficulty. We propose a new projected Hessian Gauss-Newton algorithm for solving general nonlinear systems of equalities and inequalities. The algorithm uses the projected Gauss-Newton Hessian in conjunction with an active set strategy that identifies active inequalities and a trust-region globalization strategy that ensures convergence from any starting point. We also present a global convergence theory for the proposed algorithm. 1. Introduction. The solution of systems of nonlinear equations and inequalities is the goal of many scientific and engineering applications. In constrained optimization problems this task is often required to ensure that the feasibility set is not empty. This is critical specially in algorithms that are based on the assumption that the feasibility region is not empty, for example, interior-point methods, (cf. El-Bakry, Tapia, Tsuchiya, and Zhang [6]). In this paper, we provide a new algorithm for effectively solving such systems. In this paper, we consider the problem of finding a solution x that satisfies the following set of nonlinear equalities and inequalities:


Introduction.
The solution of systems of nonlinear equations and inequalities is the goal of many scientific and engineering applications.In constrained optimization problems this task is often required to ensure that the feasibility set is not empty.This is critical specially in algorithms that are based on the assumption that the feasibility region is not empty, for example, interior-point methods, (cf.El-Bakry, Tapia, Tsuchiya, and Zhang [6]).In this paper, we provide a new algorithm for effectively solving such systems.
In this paper, we consider the problem of finding a solution x that satisfies the following set of nonlinear equalities and inequalities: where x ∈ n , F = (f 1 ,f 2 ,...,f m ) T , and G = (g 1 ,g 2 ,...,g p ) T .We assume that m ≤ n and no restriction is imposed on p.The functions f i , i = 1,...,m and g j , j = 1,...,p are assumed to be twice continuously differentiable.We define the indicator matrix W (x) ∈ p×p , whose diagonal entries are Using this matrix, problem (1.1) can be transformed to the following equivalent equality constrained optimization problem: minimize W (x)G(x) 2 2 , subject to F(x) = 0. (1. 3) The matrix W (x) is discontinuous; however, the function W (x)G(x) is Lipschitz continuous and the function G(x) T W (x)G(x) is continuously differentiable (Dennis, El-Alem, and Williamson [3]).Our proposed algorithm is iterative.We start with a point x 0 ∈ n .Then the algorithm generates a sequence of iterates {x k }.One iteration of the algorithm produces, at the point x k , a point x k+1 = x k + s k that is a better approximation to a solution x than x k .We measure the progress towards a solution using the 2 exact penalty function associated with problem (1.3); Φ(x; r ) = G(x) T W (x)G(x) + r F (x) T F(x), (1.4) where r > 0 is a parameter.Our algorithm uses the trust-region globalization strategy to ensure that, from any starting point x 0 , the sequence of iterates {x k } generated by the algorithm converges to a solution of problem (1.3).The basic idea of the trustregion algorithms is as follows.Approximate problem (1.3) by a model trust-region subproblem.The trial step is obtained by solving this subproblem.The trial step is tested and the trust-region radius is updated accordingly.If the step is accepted, then we set x k+1 = x k + s k .Otherwise, the radius of the trust-region is decreased and another trial step is computed using the new value of the trust-region radius.Detailed description of this algorithm is given in Section 2.
The following notations are used throughout the rest of the paper.Subscripted functions are function values at particular points; for example, F k = F(x k ), G k = G(x k ) and so on.However, the arguments of the functions are not abbreviated when emphasizing the dependence of the functions on their arguments.The ith component of a vector V (x k ) is denoted by either (V k ) i or V i (x k ).We use the same symbol 0 to denote the real number zero, the zero vector, and the zero matrix.Finally, all norms used in this paper are 2 -norms.

Algorithmic framework.
This section is devoted to present the detailed description of our trust-region algorithm for solving problem (1.1).The reduced Hessian approach is used to compute a trial step s k .In particular, the trial step s k is decomposed into two orthogonal components; the normal component u k and the tangential component v k .The trial step s k has the form s k = u k +v k = u k +Z k vk , where Z k is a matrix whose columns form an orthonormal basis for the null space of ∇F T k .The matrix Z(x) can be obtained from the QR factorization of ∇F(x) as follows: where Y (x) is the matrix whose columns form an orthonormal basis for the column space of ∇F(x) and R(x) is an m × m upper triangular matrix.When ∇F(x) has full column rank, the matrix Y (x) ∈ n×m , Z(x) ∈ n×(n−m) , and R(x) is nonsingular.Using this factorization, the first-order necessary conditions for a point x to be a stationary point of problem (1.3) can be written in the following form: We obtain the normal component u k by solving the following trust-region subproblem: for some σ ∈ (0, 1), where ∆ k is the trust-region radius.Given the normal component u k , we compute the tangential component v k by solving the following trust-region subproblem: Once the trial step is computed, we test it to determine whether it is accepted.To do that, we compare the actual reduction in the merit function in moving from x k to x k + s k versus the predicted reduction.We define the actual reduction as (2.5) The predicted reduction in the merit function is defined to be After computing s k , the parameter r k is updated to ensure that Pred k ≥ 0. Our way of updating r k is presented in Step 3 of Algorithm 2.1.
After updating r k , the step is tested for acceptance.If Ared k / Pred k < η 1 , where η 1 ∈ (0, 1) is a small fixed constant, then the step is rejected.In this case, the radius of the trust-region ∆ k is decreased by setting ∆ k = α 1 s k , where α 1 ∈ (0, 1), and another trial step is computed using the new trust-region radius.
If Ared k / Pred k ≥ η 1 , then the step is accepted and the trust-region radius is updated.Our way of updating ∆ k is presented in Step 4 of Algorithm 2.1.
Finally, we use the first-order necessary conditions (2.2) to terminate the algorithm.The algorithm is terminated when, for some ε tol > 0, Now, we present the algorithmic framework of our proposed method.
Step 2 (compute a trial step). If Step 3 (update the parameter r k ). (a Step 4 (test the step and update Step 5. Set k = k + 1 and go to Step 1. It is worth mentioning that either direct or iterative methods can be used to solve the trust-region subproblems arising in the above algorithm.Recently Abdel-Aziz [2] proposed a new method for computing the projection using implicitly restarted Lanczos method which proved to be very successful in solving nonlinear structural eigensystems (see Abdel-Aziz [1]).

Convergence analysis.
In this section, we present our global convergence theory.However, for the global convergence results to follow, we require some assumptions to be satisfied by the problem.Let {x k } be the sequence of points generated by the algorithm and let Ω be a convex subset of n that contains all iterates x k and x k +s k , for all trial steps s k examined in the course of the algorithm.On the set Ω, the following assumptions are imposed.

Important lemmas.
We present some important lemmas that are needed in the subsequent proofs.Lemma 3.1.Let assumptions (A1), (A2), and (A3) hold, then at any iteration k, where K 1 is a positive constant independent of k.
Proof.The proof is similar to the proof of [4, Lemma 7.1].
In the following lemma, we prove that, at any iteration k, the predicted reduction in the 2-norm of the linearized constraints is at least equal to the decrease obtained by the Cauchy step.

Lemma 3.2. Let assumptions (A1), (A2), and (A3) hold, then the normal component u k of the trial step satisfies
where K 2 is a positive constant independent of k.

Lemma 3.3. Let assumptions (A1), (A2), and (A3) hold. Then the predicted decrease obtained by the trial step satisfies
where K 2 is as in Lemma 3.2.
Proof.The proof follows directly from our way of updating the parameter r k and Lemma 3.2.
The following lemma provides a lower bound on the decrease obtained by the step
Proof.From the way of computing the step v k , it satisfies the fraction of Cauchy decrease condition.Hence, there exists a constant K 3 > 0 independent of k, such that (3.5) See Moré [7] for a proof.We also have, The proof follows from inequalities (3.5) and (3.6).
Lemma 3.5.At any iteration k, W k+1 = W k + D k , where D(x k ) ∈ p×p is a diagonal matrix whose diagonal entries are Proof.The proof is straightforward.
Lemma 3.6.Let assumptions (A1) and (A3) hold.At any iteration k, there exists a positive constant K 4 independent of k, such that where D k ∈ p×p is the diagonal matrix whose diagonal entries are given in (3.7).
The following two lemmas give upper bounds on the difference between the actual reduction and the predicted reduction.
Proof.The proof follows directly from Lemma 3.7, the fact that r k ≥ 1, and the problem assumptions.
The following lemma gives a lower bound on the predicted decrease obtained by the trial step.Lemma 3.9.Let assumptions (A1), (A2), and (A3) hold, then there exists a constant K 7 that does not depend on k such that, for all k, where K 3 is as in Lemma 3.4.
Proof.The proof follows directly from the definition of Pred k and Lemmas 3.1 and 3.4.Now, we prove several results that are crucial to our global convergence theory.The following lemma demonstrate that as long as F k > 0, an acceptable step must be found.In other words, the algorithm cannot loop infinitely without finding an acceptable step.To state this result, we need to introduce one more notation that is used throughout the rest of the paper.The jth trial iterate of iteration k is denoted by k j .Lemma 3.10.Let assumptions (A1), (A2), and (A3) hold.If F k ≥ ε, where ε is any positive constant, then an acceptable step is found after finitely many trials.
Proof.The proof is by contradiction.Assume that F k ≥ ε > 0 and there is no finite j that satisfies the condition Ared k j / Pred k j ≥ η 1 .Hence, for all j, we have

.14)
A contradiction arises if we let ∆ k j goes to zero.This completes the proof.
Lemma 3.11.Let assumptions (A1), (A2), and (A3) hold.For all trial steps j of any iteration k generated by the algorithm, ∆ k j satisfies where K 8 is a positive constant that does not depend on k or j.
Proof.Consider any iterate k j .If the previous step was accepted; that is, if j = 1, then ∆ k ≥ ∆ min .If we take ν = sup x∈Ω F(x) , then we have Now, assume that j > 1.We have from the way of updating the trust-region radius, δ k j ≥ α 1 s k j−1 .But the trial step s k j−1 was rejected.Hence, This contradiction implies that it must be the case that s k j−1 ≥ min{(1 − η 1 )K 2 /4K 6 , 1} F k .Therefore, we have Inequalities (3.16) and (3.18) prove the lemma.
Lemma 3.12.Let assumptions (A1), (A2), and (A3) hold.Then, for any iterate k j at which the parameter r k j is increased, where K 9 is a positive constant that does not depend on k or j.

Lemma 3.13. Let assumptions (A1), (A2), and (A3
, where ε 0 is a positive constant, then there exists two constants K 10 and K 11 that depend on ε 0 but do not depend on k or j, such that, if Proof.The proof is similar to the proof of Lemmas 7.7 and 7.8 of Dennis, El-Alem, and Maciel [4].

Global convergence theorem.
In this section, we prove our global convergence result for Algorithm 2.1.The following lemma proves that for the iteration sequence at which r k is increased the corresponding sequence of F k converges to zero.Lemma 3.14.Let assumptions (A1), (A2), and (A3) hold.If r k → ∞, then where {k i } indexes the iterates at which the parameter r k is increased.
Proof.The proof follows directly from Lemma 3.12 and assumption (A3).
In the following theorem, we prove that the sequence { F k } converges to zero.Proof.Suppose that there exists an infinite subsequence of indices {k j } indexing iterates that satisfy F k j ≥ ε.From Lemma 3.10, there exists an infinite sequence of acceptable steps.Without loss of generality, we assume that all members of the sequence {k j } are acceptable iterates.
We consider two cases.If {r k } is bounded, then there exists an integer k such that for all k ≥ k, r k = r .Therefore, from Lemmas 3.3 and 3.11, we have for any k ∈ {k j } and k ≥ k, Since all the iterates of {k j } are acceptable, then for any k ∈ {k j }, we have Hence, from the above two inequalities, we have This gives a contradiction with the fact that {Φ k } is bounded when {r k } is bounded.Now, consider the case when {r k } is unbounded.Hence, there exists an infinite number of iterates {k i } at which the parameter r k is increased.From Lemma 3.14, for k sufficiently large, the two sequences {k i } and {k j } do not have common elements.Let k and k be two consecutive iterates at which the parameter r k is increased and k < k j < k, where k j ∈ {k j }.The parameter r k is the same for all iterates k that satisfy k ≤ k < k.Since all the iterates of {k j } are acceptable, then for all k j ∈ {k j }, (3.26) From Lemmas 3.3 and 3.11 and inequality (3.26), we have If we sum over all iterates that lie between k and k, we have k−1 Therefore, Since r k → ∞, then for k sufficiently large, we have But this contradicts Lemma 3.14.Hence, in both cases, we have a contradiction.Thus, the supposition is wrong and the theorem is proved.
Theorem 3.16.Let assumptions (A1), (A2), and (A3) hold.Then the iteration sequence satisfies lim inf Proof.The proof is by contradiction.Suppose that Z T k ∇G k W k G k > ε holds for all k generated by the algorithm.
Assume that there exists an infinite subsequence {k i } such that F k i > α∆ k i , where α is a positive constant.For later use, we take α = K 10 .Since F k → 0, we have lim (3.32) Consider any iterate k j ∈ {k i }.There are two cases to consider.First, consider the case where the sequence {r k } is unbounded.We have for the rejected trial step j − 1 of iteration k, F k > K 10 ∆ k j = α 1 K 10 s k j−1 .Using inequalities (3.3) and (3.9) and the fact that the trial step s k j−1 was rejected, we have (3.33) Because {r k } is unbounded, there exists an iterate k sufficiently large such that for all k ≥ k, we have This implies that for all k ≥ k, From the way of updating the trust-region radius, we have This gives a contradiction.So ∆ k j cannot go to zero in this case.Second, consider the case when the sequence {r k } is bounded.There exists an integer k and a constant r such that for all k ≥ k, r k = r .Let k ≥ k and consider a trial step j of iteration k, such that F k > K 10 ∆ k j , where K 10 is as in Lemma 3.13.
If j = 1, then from our way of updating the trust-region radius, we have ∆ k j ≥ ∆ min .Hence ∆ k j is bounded in this case.If j > 1 and for l = 1,...,j, then for the rejected trial step j − 1 of iteration k, we have Hence ∆ k j is bounded in this case too.If j > 1 and (3.37) does not hold for all l, there exists an integer m such that (3.37) holds for l = m + 1,...,j and for l = 1,...,m.As in the above case, we can write But from our way of updating the trust-region radius, we have Now, using (3.40),Lemma 3.13, and the fact that the trial step s k m is rejected, we can write

.44)
This implies that ∆ k j is bounded in this case too.Hence ∆ k j is bounded in all cases.
This contradiction implies that for k j sufficiently large, all the iterates satisfy F k ≤ K 10 ∆ k j .This implies, using Lemma 3.13, that {r k } is bounded.Letting k j ≥ k and using Lemma 3.13, we have This implies that the radius of the trust-region is not bounded below.
If we consider an iteration k j > k and the previous step was accepted; that is, j = 1, then ∆ k 1 ≥ ∆ min .Hence ∆ k j is bounded in this case.
Assume that j > 1, that is, there exists at least one rejected trial step.For the rejected trial step s k j−1 , using Lemmas 3.8 and 3.13, we must have (3.47) From the way of updating the trust-region radius, we have

.48)
Hence ∆ k j is bounded.But this contradicts (3.46).The supposition is wrong.This completes the proof of the theorem.
From Theorems 3.15 and 3.16, we conclude that, given any ε tol > 0, the algorithm terminates because Z T k ∇G k W k G k + F k < ε tol , for some k finite.

Concluding remarks.
We have introduced a new trust-region algorithm for solving nonlinear systems of equalities and inequalities.Our active set is simple and natural.It is similar to Dennis, El-Alem, and Williamson's approach for treating the active constraints [3].At every iteration, the step is computed by solving two simple trustregion subproblems similar to those for unconstrained optimization.
Our theory proves that the algorithm is globally convergent in the sense that, in the limit, a subsequence {k j } of the iteration sequence generated by the algorithm satisfies the first-order conditions (2.2).We point out here that our convergence result only gives first-order stationary-point convergence.This means that there may be a case where a subsequence of the iteration sequence satisfies (2.2) but does not necessarily satisfies (1.1).This can only happen when the corresponding sequence of matrices {Z T k j ∇G k j W k j } does not have full column rank in the limit.