Convergence of a short-step primal-dual algorithm based on the Gauss-Newton direction

We prove the theoretical convergence of a short-step, approximate 
path-following, interior-point primal-dual algorithm for 
semidefinite programs based on the Gauss-Newton direction 
obtained from minimizing the norm of the perturbed optimality 
conditions. This is the first proof of convergence for the 
Gauss-Newton direction in this context. It assumes strict 
complementarity and uniqueness of the optimal solution as well as 
an estimate of the smallest singular value of the Jacobian.


The Gauss-Newton direction
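The purpose of this paper is to develop a convergence proof for an infeasible interior-point algorithm based on the Gauss-Newton direction introduced in [3]. This is the first proof of convergence for this direction, although an algorithm based on a projected and scaled Gauss-Newton direction was demonstrated in [1]. The approach is novel in that the proof relies only on classical results of nonlinear optimization. As a result, the iterates are not explicitly maintained feasible, nor even positive definite; we rather maintain the weaker condition that the Jacobian of the optimality conditions is full rank. Moreover, our measure of distance to the central path combines feasibility and complementarity. The main result appears in Theorem 3.3.

We consider the primal-dual pair of semidefinite programs

\[ \min\ \langle C, X \rangle \quad \text{s.t.} \quad \mathcal{A}(X) = b,\ X \in \mathcal{S}^n_+, \tag{1.1} \]

\[ \max\ b^t y \quad \text{s.t.} \quad \mathcal{A}^*(y) + Z = C,\ Z \in \mathcal{S}^n_+, \tag{1.2} \]

where $C \in \mathcal{S}^n$, $b \in \mathbb{R}^m$, and $\mathcal{S}^n \subset \mathbb{R}^{n \times n}$ is the vector space of symmetric matrices of order $n$ equipped with the inner product $\langle X, Y \rangle := \operatorname{trace}(XY)$. For $M, N \in \mathbb{R}^{m \times n}$, the inner product is $\langle M, N \rangle := \operatorname{trace}(M^t N)$, and the corresponding (Frobenius) matrix norm is denoted by $\|M\| = \|M\|_F = \sqrt{\operatorname{trace}(M^t M)}$. The operator $\mathcal{A}$ is linear and defined as

\[ \mathcal{A}(X) := \big( \langle A_1, X \rangle, \ldots, \langle A_m, X \rangle \big)^t, \qquad \mathcal{A}^*(y) = \sum_{i=1}^m y_i A_i, \]

for matrices $A_1, \ldots, A_m \in \mathcal{S}^n$. Finally, $\mathcal{S}^n_+$ represents the cone of positive semidefinite matrices and $\mathcal{S}^n_{++}$ the cone of positive definite matrices.

We assume the existence of a point $(X_0, y_0, Z_0)$ such that

\[ \mathcal{A}(X_0) = b, \qquad \mathcal{A}^*(y_0) + Z_0 = C, \qquad X_0, Z_0 \in \mathcal{S}^n_{++}. \]

If such a point exists, it is well known that both the primal and dual problems have optimal solutions and that the optimal values are equal. We write the perturbed optimality conditions for the primal-dual pair (1.1) and (1.2) as a function of a continuation parameter $\mu \ge 0$:

\[ F_\mu(X, y, Z) := \begin{pmatrix} \mathcal{A}^*(y) + Z - C \\ \mathcal{A}(X) - b \\ ZX - \mu I \end{pmatrix} = 0. \tag{1.5} \]

To simplify the statements of the algorithm and of the following results, we write $s := (X, y, Z)$ and use $F_{\tau\mu}(s)$ as the central path defining function and $\|F_{\tau\mu}(s)\|$ as the merit function (1.6b).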
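The notation above can be checked numerically. The following is a minimal sketch, assuming the usual definition of the operator as the vector of trace inner products with $A_1, \ldots, A_m$; the helper names `inner`, `A_op`, `A_adj`, and `random_sym` are hypothetical and chosen only for this illustration. It verifies the adjoint identity $\langle \mathcal{A}(X), y \rangle = \langle X, \mathcal{A}^*(y) \rangle$ and that the Frobenius norm equals $\sqrt{\operatorname{trace}(X^t X)}$ on random symmetric data.

```python
import numpy as np

def inner(X, Y):
    """Trace inner product <X, Y> = trace(X^t Y)."""
    return float(np.trace(X.T @ Y))

def A_op(A_mats, X):
    """Assumed form of the linear operator: X -> (<A_1, X>, ..., <A_m, X>)."""
    return np.array([inner(Ai, X) for Ai in A_mats])

def A_adj(A_mats, y):
    """Adjoint of A_op: y -> sum_i y_i A_i."""
    return sum(yi * Ai for yi, Ai in zip(y, A_mats))

def random_sym(rng, n):
    """Random symmetric n x n matrix (illustrative data only)."""
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

rng = np.random.default_rng(0)
n, m = 3, 2
A_mats = [random_sym(rng, n) for _ in range(m)]
X = random_sym(rng, n)
y = rng.standard_normal(m)

lhs = float(A_op(A_mats, X) @ y)   # <A(X), y>
rhs = inner(X, A_adj(A_mats, y))   # <X, A^*(y)>
frob = np.sqrt(inner(X, X))        # sqrt(trace(X^t X))
print(abs(lhs - rhs), abs(frob - np.linalg.norm(X, "fro")))
```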

S. Kruk and H. Wolkowicz 519
Assumption 1.1. The following assumptions hold throughout the paper.
(ii) The operator A is surjective.
Under Assumption 1.1, for every $\mu > 0$, there is a unique solution of $F_\mu(X, y, Z) = 0$ in $\mathcal{S}^n_{++} \times \mathbb{R}^m \times \mathcal{S}^n_{++}$, which we denote by $(X_\mu, y_\mu, Z_\mu)$. This set of solutions is called the central path. The limit point of the central path as $\mu \to 0$ is the solution of the semidefinite pair (1.1) and (1.2).
The algorithm described in this paper approximately follows the central path by attempting to solve $F_\mu(X, y, Z) = 0$ for decreasing values of $\mu$. This is common to all path-following algorithms. The novelty of the approach described here is to treat this approximation subproblem as a nonlinear equation and to apply classical tools.
One major difference from standard practice resulting from this point of view is the relation between the iterates and the barrier parameter. The scalar $\mu$ is not updated from the iterates, as is usually the case ($\mu = \tau \langle Z, X \rangle / n$); it is instead reduced by a factor $\tau < 1$ at every step ($\mu \leftarrow \tau\mu$). In consequence, the initial point $(X_0, y_0, Z_0)$ depends on $\mu_0$ rather than the reverse. Another important difference is that no attempt is made to dampen the step to maintain the iterates within the cone of positive definite matrices. The algorithm only maintains the weaker full-rank condition on the Jacobian.
The function $F_\mu$ is nonlinear. We can find its zeroes by transforming the problem into minimizing the Frobenius norm, namely, $\min \|F_\mu(X, y, Z)\|$, to which we apply the Gauss-Newton method: from a well-centered point $(X, y, Z)$ with initial $\mu > 0$, we fix a target $\tau\mu$, for some $\tau \in (0, 1)$, and reduce $\|F_{\tau\mu}(X, y, Z)\|$ by finding the least squares solution of the Gauss-Newton equation, namely, the least squares solution of

\[ F'_{\tau\mu}(X, y, Z)\,(dX, dy, dZ) = -F_{\tau\mu}(X, y, Z) \]

for a direction $(dX, dy, dZ)$. We use this direction as the step to obtain the next iterate. For more details, see Algorithm 1.1. We explain later the requirement on the initial point and the choice of $\tau$.

Algorithm 1.1.
    Given $\mu_0 > 0$ (initial barrier parameter)
    Given $\varepsilon > 0$ (merit function tolerance)
    Find $X_0, y_0, Z_0$ (must satisfy (3.16))
    $X = X_0$, $y = y_0$, $Z = Z_0$ (initial iterate)
    $\mu = \mu_0$ (initial barrier parameter)
    Choose $0 < \tau < 1$ (chosen according to (3.10))
    while $\max\{\tau\mu, \|F_{\tau\mu}(s)\|\} > \varepsilon$ do
        Find the least squares solution $(dX, dy, dZ)$ of $F'_{\tau\mu}(s)\,(dX, dy, dZ) = -F_{\tau\mu}(s)$
        $X \leftarrow X + dX$, $y \leftarrow y + dy$, $Z \leftarrow Z + dZ$
        $\mu \leftarrow \tau\mu$
    end while

Here $s = (X, y, Z)$. We denote the Jacobian of $F_\mu$ at $(X, y, Z)$ by $F'_\mu(X, y, Z)$, and $\|F'_\mu(X, y, Z)\|$ is the operator norm on the underlying vector space.
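The loop of Algorithm 1.1 can be sketched on a toy instance. This is an illustration under stated assumptions, not the authors' implementation: the residual is assumed to be $(\mathcal{A}^*(y) + Z - C,\ \mathcal{A}(X) - b,\ ZX - \mu I)$, the data ($C = \operatorname{diag}(1, 2)$, the single constraint $\operatorname{trace}(X) = 1$), the value $\tau = 0.8$, and all function names are hypothetical, and the starting point is placed exactly on the central path using a closed form valid only for this toy problem.

```python
import numpy as np

# Hypothetical toy data: min <C, X> s.t. trace(X) = 1, X psd (n = 2, m = 1).
C = np.diag([1.0, 2.0])
A_mats = [np.eye(2)]
b = np.array([1.0])
n, m = 2, 1

# Basis of the symmetric n x n matrices, used as coordinates for X and Z.
BASIS = []
for i in range(n):
    for j in range(i, n):
        E = np.zeros((n, n))
        E[i, j] = E[j, i] = 1.0
        BASIS.append(E)

def unpack(v):
    """Split a coordinate vector into (X, y, Z) with X and Z symmetric."""
    k = len(BASIS)
    X = sum(c * E for c, E in zip(v[:k], BASIS))
    y = v[k:k + m]
    Z = sum(c * E for c, E in zip(v[k + m:], BASIS))
    return X, y, Z

def linear_part(X, y, Z):
    """Dual and primal blocks without their constant terms (linear in s)."""
    dual = sum(yi * Ai for yi, Ai in zip(y, A_mats)) + Z
    primal = np.array([np.sum(Ai * X) for Ai in A_mats])
    return dual, primal

def F(v, mu):
    """Assumed residual: dual feasibility, primal feasibility, ZX - mu*I."""
    X, y, Z = unpack(v)
    dual, primal = linear_part(X, y, Z)
    comp = Z @ X - mu * np.eye(n)
    return np.concatenate([(dual - C).ravel(), primal - b, comp.ravel()])

def jacobian(v):
    """Jacobian of F in the chosen coordinates (independent of mu)."""
    X, _, Z = unpack(v)
    cols = []
    for j in range(len(v)):
        e = np.zeros(len(v))
        e[j] = 1.0
        dX, dy, dZ = unpack(e)
        dual, primal = linear_part(dX, dy, dZ)
        comp = dZ @ X + Z @ dX
        cols.append(np.concatenate([dual.ravel(), primal, comp.ravel()]))
    return np.column_stack(cols)

def gauss_newton_path(mu0=1.0, tau=0.8, eps=1e-6, max_iter=500):
    """Short-step loop: fixed reduction mu <- tau*mu, full Gauss-Newton steps."""
    # Closed-form central-path point for this toy problem only.
    a = ((2 * mu0 - 1) + np.sqrt(4 * mu0 ** 2 + 1)) / 2
    v = np.array([mu0 / a, 0.0, mu0 / (a + 1), 1 - a, a, 0.0, a + 1])
    mu = mu0
    for k in range(max_iter):
        target = tau * mu
        if max(target, np.linalg.norm(F(v, target))) <= eps:
            return v, mu, k
        # Least squares solution of F'(s) ds = -F_target(s).
        step, *_ = np.linalg.lstsq(jacobian(v), -F(v, target), rcond=None)
        v = v + step
        mu = target
    raise RuntimeError("tolerance not reached within max_iter steps")

v, mu, iters = gauss_newton_path()
X, y, Z = unpack(v)
print(iters, np.sum(C * X), y[0])
```

Note that the step is never damped and positive definiteness is never enforced, in line with the description above; the sketch relies on $\tau$ being close enough to 1 for this small instance.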
The following result, shown in [3], is stated here for convenience.
Lemma 1.2. Under Assumption 1.1, the Jacobian $F'_\mu(X, y, Z)$ has full column rank for all $X, Z \in \mathcal{S}^n_{++}$. Moreover, it has full column rank at the optimal solution of (1.1) and (1.2).

Merit function and central path
This section describes some relations between the value of our chosen merit function $\|F_{\tau\mu}\|$ and the distance of the iterate to the central path. Note that we do not assume that the iterates are primal or dual feasible. Our measure of distance to the central path combines estimates of both infeasibility and complementarity. The section also describes the progress of the Gauss-Newton direction in minimizing $\|F_{\tau\mu}\|$. The results are of a technical nature and are used as building blocks of the convergence proof given in Section 3.

We begin this section with a well-known result about approximations of inverses, often referred to as the Banach lemma. For a proof, see [2].
Lemma 2.1. Suppose that $M \in \mathbb{R}^{n \times n}$ and $\|M\| < 1$. Then $I - M$ is nonsingular and

\[ \|(I - M)^{-1}\| \le \frac{1}{1 - \|M\|}. \]

Since the Gauss-Newton direction is obtained from an overdetermined system of equations, pseudoinverses allow succinct expressions of the solution. Namely, the least squares solution to $F'_{\tau\mu}(s)\,ds = -F_{\tau\mu}(s)$ is $ds = -F'_{\tau\mu}(s)^\dagger F_{\tau\mu}(s)$. To generalize to the Gauss-Newton method some well-known results about Newton's method, we require a bound on the norm of the pseudoinverse.
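Both statements are easy to check numerically. The sketch below assumes the usual form of the Banach lemma, $\|(I - M)^{-1}\| \le 1/(1 - \|M\|)$ in the spectral norm, and uses random matrices as stand-ins for the Jacobian and residual; the variable names are illustrative only. It also confirms that the pseudoinverse expression and a least squares solver return the same direction for a full column rank system.

```python
import numpy as np

rng = np.random.default_rng(1)

# Banach lemma: scale a random matrix so that its spectral norm is 0.5.
M = rng.standard_normal((4, 4))
M *= 0.5 / np.linalg.norm(M, 2)
inv_norm = np.linalg.norm(np.linalg.inv(np.eye(4) - M), 2)
bound = 1.0 / (1.0 - np.linalg.norm(M, 2))

# Overdetermined system J d = -r: pseudoinverse versus least squares solver.
J = rng.standard_normal((6, 3))   # stand-in for the Jacobian (full column rank)
r = rng.standard_normal(6)        # stand-in for the residual
d_pinv = -np.linalg.pinv(J) @ r
d_lstsq, *_ = np.linalg.lstsq(J, -r, rcond=None)

print(inv_norm <= bound, np.allclose(d_pinv, d_lstsq))
```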
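Lemma 2.2. Suppose that $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times m}$, where $m \ge n$, and assume that $BA$ is nonsingular. Then the bound (2.2) on the norm of the Moore-Penrose inverse holds.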

Proof. Define the singular value decompositions of $A$ and $B$, and let $\bar{\Sigma}_A$ and $\bar{\Sigma}_B$ be the nonzero diagonal blocks of $\Sigma_A$ and $\Sigma_B$, respectively. Then (2.3) holds. This implies that all the singular values of $Q_1$ are at most 1 and all the singular values of $Q_1^{-1}$ are at least 1. Therefore, we obtain the required bound on the norm of the Moore-Penrose inverse.
From Lemmas 2.1 and 2.2, we can obtain the following result about approximation of pseudoinverses.
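Lemma 2.3. Suppose that $\hat{A}$ is an approximation to the pseudoinverse of $A$ in the sense that $\|I - \hat{A}A\| < 1$. Then the norm of $A^\dagger$ is bounded in terms of $\|\hat{A}\|$ and $\|I - \hat{A}A\|$.

Proof. Consider that $\|I - \hat{A}A\| < 1$ is the required condition of Lemma 2.1, so that $\hat{A}A$ is nonsingular. Therefore, we can write a chain of inequalities bounding $\|A^\dagger\|$, where the first inequality is obtained from Lemma 2.2 and the second from Lemma 2.1.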
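One chain of inequalities consistent with this proof can be written out as follows. This is a sketch under an assumption: the bound is taken to be the pseudoinverse analogue of the classical Banach-lemma estimate, in the operator norm, with $\hat{A}$ denoting the approximate pseudoinverse. The grouping of terms is illustrative and may differ from the authors' statement.

```latex
% Assumption: operator (spectral) norm; \hat{A} is the approximate pseudoinverse.
\begin{align*}
\|I - \hat{A}A\| < 1
  &\;\Longrightarrow\; \hat{A}A \text{ is nonsingular and }
     \|(\hat{A}A)^{-1}\| \le \frac{1}{1 - \|I - \hat{A}A\|}, \\
A^\dagger
  &= (\hat{A}A)^{-1}(\hat{A}A)A^\dagger
   = (\hat{A}A)^{-1}\hat{A}\,(AA^\dagger), \\
\|A^\dagger\|
  &\le \|(\hat{A}A)^{-1}\|\,\|\hat{A}\|\,\|AA^\dagger\|
   \le \frac{\|\hat{A}\|}{1 - \|I - \hat{A}A\|}.
\end{align*}
```

The last step uses the fact that $AA^\dagger$ is an orthogonal projector, so its operator norm is at most 1.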
Essentially, from this bound on the norm of approximate pseudoinverses, we can establish a relation between the distance to the central path of an iterate $(X, y, Z)$ and the current value of our merit function $\|F_{\tau\mu}(X, y, Z)\|$. To simplify the result, we first establish Lipschitz continuity of the first derivative.
Lemma 2.4. The operator $F'_{\tau\mu}(s)$ is Lipschitz continuous with constant 1.
From the last inequality in (2.13), we obtain (2.14). Then, from Lemma 2.3 with the identification $A = F'_{\tau\mu}(s)$ and $\hat{A} = F'_{\tau\mu}(s_{\tau\mu})^\dagger$, and from (2.14), we obtain our second required inequality. For the third inequality (2.8c), we use the fundamental theorem of calculus to express $F_{\tau\mu}(s)$ as an integral of its derivative, and take norms on both sides to get (2.17).

From these relations between the central path and our merit function, we obtain a radius of quadratic convergence to a point on the central path as well as a decrease of the merit function.
Theorem 2.7. Let $\sigma_{\min}$ and $\sigma_{\max}$ be, respectively, the smallest and largest singular values of $F'_{\tau\mu}(s_{\tau\mu})$. Under Assumption 1.1, there is a $\delta > 0$ such that, for all $s_c$ with $\|s_c - s_{\tau\mu}\| < \delta$, the Gauss-Newton step is well defined and converges to $s_{\tau\mu}$ at a quadratic rate. Moreover, any $\delta < \sigma_{\min}/2$ can be chosen.
Proof. Let $\delta$ be small enough so that the hypothesis (2.9) of Lemma 2.5 holds, that is, $\delta < \sigma_{\min}/2$. First, we express the error on the iterate both before and after the step; then, by the fundamental theorem of calculus and the fact that $F'_{\tau\mu}(s_c)$ has full column rank (and hence that $F'_{\tau\mu}(s_c)^\dagger F'_{\tau\mu}(s_c) = I$), we obtain (2.23).

Convergence of the algorithm
At this point, we have established all the necessary relations between our merit function and the distance between an iterate and the central path.
The current section describes the convergence of Algorithm 1.1. For easy reference, we repeat the definitions of the two canonical points $s_\mu$ and $s_{\tau\mu}$ on the central path. They satisfy

\[ F_\mu(s_\mu) = 0, \qquad F_{\tau\mu}(s_{\tau\mu}) = 0. \tag{3.1} \]

The general idea of the algorithm is that from an iterate $s_k$, close enough to $s_\mu$, we can choose a target $s_{\tau\mu}$ on the central path in such a way that the next iterate $s_{k+1}$, obtained from the Gauss-Newton direction, is now close enough to $s_{\tau\mu}$ for the process to be repeated (see Figure 3.1). The proof is in three parts. First, we estimate the distance between two points on the central path in terms of the required radius of convergence.
Lemma 3.1.
(1) If we choose $0 < \tau < 1$ such that (3.2) holds, then $s_\mu$ is within half the radius of quadratic convergence of $s_{\tau\mu}$.
(2) In the case covered by (3.5), $s_\mu$ is within half the radius of guaranteed constant decrease of the merit function in (2.27).
Proof. First, note that a straightforward calculation based on the definition of $s_\mu$ in (3.1) gives $\|F_{\tau\mu}(s_\mu)\|$ explicitly. Combining this with inequality (2.8d) and letting $\tau$ satisfy (3.2), we obtain a bound on $\|s_\mu - s_{\tau\mu}\|$ which, by Theorem 2.7, is one half of the radius of quadratic convergence. The proof of part (2) of the lemma is similar.
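The straightforward calculation can be made explicit. The following worked equation is a sketch that assumes the complementarity block of $F_\mu$ has the form $ZX - \mu I$ and that the norm is the Frobenius norm on each block; under these assumptions, only the third block of $F_{\tau\mu}$ is nonzero at $s_\mu$.

```latex
% Assumption: F_mu(X, y, Z) = (A^*(y) + Z - C, A(X) - b, ZX - mu I).
\begin{align*}
F_{\tau\mu}(s_\mu)
  &= \big( \mathcal{A}^*(y_\mu) + Z_\mu - C,\;
           \mathcal{A}(X_\mu) - b,\;
           Z_\mu X_\mu - \tau\mu I \big)
   = \big( 0,\; 0,\; (1 - \tau)\mu I \big), \\
\|F_{\tau\mu}(s_\mu)\|
  &= (1 - \tau)\,\mu\,\|I\|_F
   = (1 - \tau)\,\mu\,\sqrt{n}.
\end{align*}
```

The first line uses $F_\mu(s_\mu) = 0$ from (3.1), which gives feasibility of $s_\mu$ and $Z_\mu X_\mu = \mu I$. The resulting value shrinks linearly as $\tau \to 1$, which is why a short step keeps $s_\mu$ close to the next target.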
We now estimate the distance to the new target after a Gauss-Newton step.
Lemma 3.2. Let $\sigma_{\min}$ and $\sigma_{\max}$ be, respectively, the smallest and largest singular values of $F'_{\tau\mu}(s_{\tau\mu})$. Let $s_\mu$ and $s_{\tau\mu}$ satisfy (3.1). Suppose that the point $s_c$ is well centered, in the sense that it lies within half the radius of quadratic convergence of $s_\mu$, and choose $\tau$ as in Lemma 3.1. Then, after one Gauss-Newton step, the new point $s_+$ will be within half the radius of convergence of $s_{\tau\mu}$. Moreover, the merit function is reduced by a constant factor.

Proof. By hypothesis and by Lemma 3.1, both $\|s_c - s_\mu\|$ and $\|s_\mu - s_{\tau\mu}\|$ are at most half the radius of quadratic convergence. Therefore, by the triangle inequality,

\[ \|s_c - s_{\tau\mu}\| \le \|s_c - s_\mu\| + \|s_\mu - s_{\tau\mu}\|, \]

which is within the radius of quadratic convergence of $s_{\tau\mu}$. After one Gauss-Newton step, Theorem 2.7 gives the corresponding bound on $\|s_+ - s_{\tau\mu}\|$. Therefore, the new point is within half the radius of convergence of $s_{\tau\mu}$, and the procedure can be repeated. The constant reduction of the merit function follows from Corollary 2.8.
We now present the main result of the paper, the convergence proof for Algorithm 1.1.

Theorem 3.3. Suppose there exist a tolerance $\varepsilon > 0$, an initial barrier parameter $\mu_0 > \varepsilon$, and $Z_0, X_0 \in \mathcal{S}^n_{++}$ such that $s_0 = (X_0, y_0, Z_0)$ is a well-centered starting point within half the quadratic convergence radius of $s_{\mu_0}$ in Theorem 2.7. Suppose, moreover, that $s_0$ is within half the radius of guaranteed constant decrease of the merit function given in Corollary 2.8, where $0 < \sigma_{\min}$ (resp., $\sigma_{\max}$) is smaller than the smallest (resp., larger than the largest) singular value of $F'_{\omega\mu_0}(s_{\omega\mu_0})$, for all $\varepsilon/\mu_0 < \omega < 1$.
If $\tau$ is chosen satisfying (3.10), with $\tau \ge \max\{0, 1 - \alpha\}$ and $0 < \tau < 1$, then Algorithm 1.1 produces a sequence $s_k$ converging to a point $s$ that is $\varepsilon$-optimal, and the number of iterations $k$ depends on whether $0 < \tau \le 1/2$ or $1/2 < \tau < 1$.

In the proof, the desired bound on $\|s_k - s_{\tau^k \mu_0}\|$ follows if $\|F_{\tau^k \mu_0}(s_k)\| \le \varepsilon$. From the constant decrease guarantee, we get (by adding and subtracting the multiple of the identity in the third term in the norm) the bound (3.24). We will bound each of the two terms in brackets in the last line of (3.24) by $\varepsilon/2$. From here onward, $\log$ denotes $\log_2$. For the first term, we get (3.25), where $\lceil x \rceil$ is the ceiling operator, which produces the smallest integer larger than or equal to $x$. For the second term in the brackets, we use the form in (3.23) while considering the case $\tau \le 1/2$, and we get (3.28). For the case $\tau > 1/2$, we use the form (3.24) to get (3.30), where the direction of the inequality changed since $\tau < 1$. Therefore, we can obtain $\|F_{\tau^k \mu_0}(s_k)\| \le \varepsilon$ by choosing $k$ using each of the lower bounds given in (3.25), (3.28), and (3.30). This guarantees that we are close to the central path.
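We finally need to be close to optimality, $\mu_0 \tau^k \le \varepsilon$. Since $\log \tau < 0$, this is equivalent to

\[ k \ge \frac{\log(\varepsilon/\mu_0)}{\log \tau} = \frac{\log(\mu_0/\varepsilon)}{\log(1/\tau)}. \]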
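The optimality requirement $\mu_0 \tau^k \le \varepsilon$ translates directly into a step count. The sketch below computes the smallest such $k$, assuming $\mu_0 > \varepsilon > 0$ and $0 < \tau < 1$ as in Theorem 3.3; the function name is hypothetical, and only this barrier-parameter condition is covered, not the additional bounds on the merit function.

```python
import math

def barrier_iterations(mu0, eps, tau):
    """Smallest integer k with mu0 * tau**k <= eps (barrier condition only)."""
    if not (0.0 < tau < 1.0):
        raise ValueError("tau must lie in (0, 1)")
    if not (0.0 < eps < mu0):
        raise ValueError("need 0 < eps < mu0")
    # log base 2, matching the convention used in the text.
    return math.ceil(math.log2(mu0 / eps) / math.log2(1.0 / tau))

k = barrier_iterations(mu0=1.0, eps=1e-6, tau=0.8)
print(k, 0.8 ** k)
```

The count grows like $1/\log(1/\tau)$, so values of $\tau$ close to 1, as required by a short-step analysis, trade a guaranteed decrease per step for more iterations.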

Towards a long-step algorithm
The algorithm, as presented, is not practical. The assumptions that the initial iterate satisfies the conditions of Theorem 3.3 and that we need an estimate of the smallest singular values are significant. But the singular values are used, throughout the paper, only to show the existence of a radius of convergence. A practical version of the algorithm would more likely try some value for $\tau$, compute the step and the value of the merit function, then reduce $\tau$ if the merit function reduction is not sufficient. Since we have shown the existence of a radius where the merit function is halved, (3.12), such a scheme will necessarily converge. We presented the algorithm without these practical encumbrances to clarify the presentation.

The Gauss-Newton direction for solving semidefinite programs was introduced in [3] without a proof of convergence, but with experimental results that warranted more research. Then, in [1], a scaled version of the direction was used in an algorithm shown to be polynomially convergent. The algorithm and the convergence proof presented in this paper are new in that the direction is used without any scaling and the algorithm never explicitly forces the iterates to remain within the positive definite cone. Moreover, the measure used to quantify the distance of the iterates to the central path (1.6b) estimates both the infeasibility and the complementarity and seems perfectly adapted to infeasible interior-point algorithms. It would be interesting to see how this measure can be used for different directions.
The dependence on the smallest singular value of the Jacobian for choosing τ, though unsurprising in the context, should be relaxed to some other more easily estimated function of the data (possibly some condition measure [4]).But the ultimate goal of this avenue of research is to establish polynomial convergence of an infeasible algorithm using long steps, that is, not restricted to a narrow neighbourhood of the central path.Both experimental data and preliminary results suggest the possibility of such an algorithm.
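Take norms on both sides and use the Lipschitz continuity of $F'_{\tau\mu}$ to get

Corollary 2.8. Let $\sigma_{\min}$ and $\sigma_{\max}$ be, respectively, the smallest and largest singular values of $F'_{\tau\mu}(s_{\tau\mu})$. Under Assumption 1.1, there is a $\delta > 0$, given in (2.27), such that, for all $s_c$ with $\|s_c - s_{\tau\mu}\| < \delta$, the Gauss-Newton step from $s_c$ decreases the merit function by a guaranteed constant factor.

Proof. Consider inequality (2.8d) at the point $s_+$ to obtain $\|F_{\tau\mu}(s_+)\| \le 2\sigma_{\max} \|s_+ - s_{\tau\mu}\|$. Therefore, we need $\|s_c - s_{\tau\mu}\| < \delta$, with $\delta$ as defined in (2.27), to obtain the required decrease.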
Proof. First, we note, by Corollary 2.6, that the required constant $\sigma_{\min}$ exists. By Lemma 2.5,