An Interior Point Method for Solving Semideﬁnite Programs Using Cutting Planes and Weighted Analytic Centers

We investigate solving semideﬁnite programs (cid:2) SDPs (cid:3) with an interior point method called SDP-CUT, which utilizes weighted analytic centers and cutting plane constraints. SDP-CUT iteratively reﬁnes the feasible region to achieve the optimal solution. The algorithm uses Newton’s method to compute the weighted analytic center. We investigate di ﬀ erent stepsize determining techniques. We found that using Newton’s method with exact line search is generally the best implementation of the algorithm. We have also compared our algorithm to the SDPT3 method and found that SDP-CUT initially gets into the neighborhood of the optimal solution in less iterations on all our test problems. SDP-CUT also took less iterations to reach optimality on many of the problems. However, SDPT3 required less iterations on most of the test problems and less time on all the problems. Some theoretical properties of the convergence of SDP-CUT are also discussed.


Introduction
We consider the semidefinite programming problem SDP as in 1 : Several interior point methods have been developed for solving SDPs see 2, 4, 5 . When developing our interior point method in this paper, we assume a strictly feasible interior point x 0 is known. There exist methods for finding feasible or near feasible points see [6][7][8] . The algorithm we develop uses cutting planes and weighted analytic centers, and it iteratively refines the feasible region until the weighted analytic center approaches the optimal solution. We call our algorithm SDP-CUT. The cutting plane technique was pioneered by Gomory 9 , Kelley 10 , and also Cheney and Goldstein 11 for solving integer programming problems. An algorithm similar to our SDP-CUT for linear programs LPs was proposed by Renegar in his 1988 paper 12 .
SDP-CUT is implemented using Newton's method and different line search techniques. We found that Newton's method with exact line search is the best implementation of our algorithm. The effects of a weight vector w ∈ R on the algorithm are also studied. A larger weight yields a faster and more accurate solution in theory, but in practice, too large a weight may cause numerical errors. We also experienced numerical errors in computing the Hessian matrices in SDP-CUT, when solving very large problems.
Since finding an interior point for an SDP problem is equivalent to solving another SDP problem, we decided to consider test problems with a known interior point and used that point as a starting point for SDP-CUT. We find SDPT3 to be an ideal method for comparison with SDP-CUT because it is known to be efficient and it allows the user to input a starting point. We found SDP-CUT was closer to the actual solution than SDPT3 for the initial iterations on all our test problems. SDP-CUT seems to slow down during later iterations to reach optimality. On the other hand, SDPT3 took less time on all the problems and less iterations on most of them. All codes were written in MATLAB version 7.9.0.

Weighted Analytic Center
This section discusses weighted analytic center as given in 13 and how to compute it. Other notions of weighted center for semidefinite programming are described in 14 .
Given a weight vector ω ∈ R n , the weighted barrier function for our system of LMIs 1.2 is defined as follows:

2.1
Note that as x approaches the boundary of the feasible region, φ ω x approaches ∞. We assume the set {diag A The function φ ω is analytic and strictly convex over R see 13,15 . The weighted analytic center is defined to be x ac ω argmin {φ ω x | x ∈ R n }. We call x ac 1 the analytic center, where 1 1, 1, . . . , 1 . In the case of linear constraints and some other more general LMIs, each weight pushes the analytic center away from the boundary of the corresponding constraint. The gradient ∇φ ω x and the Hessian ∇ 2 φ ω x are given by the following: for i 1, . . . , n and k 1, . . . , n

2.2
We will use the following specific example, SDP 2.3 , to describe our method: Weighted analytic centers for the LMI system 1.2 can be computed with the WAC-NEWTON algorithm described below. Algorithm 1 .
An efficient way to calculate the search vector s l in WAC-NEWTON is by finding the Cholesky factorization of ∇ 2 φ ω y l and using it to solve the linear system ∇ 2 φ ω y l s l −∇φ ω y l . This is how s l is calculated in the SDP-CUT algorithm to be discussed in the next section and throughout this paper. Figure 1 shows the contours of φ ω x , the effect of the weight vector ω 3, 1 on the barrier function, and the weighted analytic center. The figure uses the feasible region defined by SDP 2.3 and the associated barrier function. The weight of 3 on constraint 1 pushes the analytic center away from the boundary of constraint 1 .
INPUT: point y 0 ∈ int R , weight vector ω ∈ R q , tolerance WTOL > 0, and maximum number of iterations MAX Set l 0 while l < MAX do 1. Compute the direction vector s l − ∇ 2 φ ω y l −1 ∇φ ω y l 2. Compute the Newton decrement d s T l ∇ 2 φ ω y l s l 3. Compute stepsize α l 4. y l 1 y l α l s l if d < WTOL then break while 5. l ← l 1 OUTPUT: x ac ω y l Algorithm 1: WAC-NEWTON.

The SDP-CUT Algorithm
This section describes the development of SDP-CUT. We also discuss WAC-NEWTON * , which implements Newton's method for finding the weighted analytic center for the new system defined in SDP-CUT. The section finishes with algorithms for computing the Newton stepsize CONSTANT, ELS, and BACKTRACKING . Refer again to the example SDP 2.3 . Here, we illustrate one iteration of SDP-CUT. Figure 2 shows the setup. We have the feasible region, an initial point x * x ac 1 , the weighted analytic center of our new feasible region made up of the the cutting constraint and the original LMI constraints. Figure 3 also has the contour lines of the barrier function. The weight on the cutting constraint can be changed from 1 to other larger values.
Given a system of LMI's 1.2 and an objective function as in the primal SDP problem 1.1 , we can numerically solve the problem by iteratively reducing the feasible set. Denote the current feasible region determined by our system of LMIs by R k , and suppose we know a strictly feasible point x * k in R k . Initially, we set R 0 R. We can find a new feasible region R k 1 ⊂ R k , where for all x ∈ R k 1 , c T x ≤ c T x * k . We do this by adding a new constraint cut The is added to c T x * k to ensure that x * k is "strictly feasible" in the new system. Given weight w ∈ R , we define a new barrier function to account for this new constraint: The gradient ∇φ * w x and the Hessian ∇ 2 φ * w x for this new barrier function are given by the following: for i 1, . . . , n and k 1, . . . , n: 3.3 Remark 3.1. Note that in the definition of φ * w x in 3.1 , weight 1, 1, . . . , 1 is used on the original LMI constraints. The weight w is used only on the new cutting constraint in order to push our point toward the optimal solution. see Algorithm 2 .
If successful, SDP-CUT terminates with an optimal solution x * cut and optimal objective function value p * cut . Rather than moving the plane in Step 1 of SDP-CUT, the point x * k could also be moved in the direction of − c to obtain a different starting point x * k − γc for some small γ > 0, instead of x * k . However, this can be problematic as we approach the optimal solution and the feasible region gets small. The point x * k − γc may pass over the entire remaining feasible region and thus fail to be feasible. If instead, we move the plane as originally suggested in SDP-CUT, x * k will always be strictly feasible. Note that since the objective is to maximize c T x and x * k is an interior point, then − c is a feasible direction from x * k . We denote by WAC-NEWTON * , the WAC-NEWTON algorithm applied to φ * w x 3.1 for determining x * k 1 in the SDP-CUT algorithm. WAC-NEWTON * will return the weighted analytic center of the current feasible region, which will be our next iterate for SDP-CUT. In WAC-NEWTON * , the stepsize α l can be computed in a variety of different ways as discussed in the next section.

Line Searches: Computing the Newton Stepsize in WAC-NEWTON * Algorithm
We describe different options for computing the Newton stepsizes in the WAC-NEWTON * algorithm. The algorithm first computes the direction vector s l . The Newton stepsize α l determines how far we should move in the direction of s l from the point y l . The pure Newton's method uses a constant stepsize α l 1. We will refer to this method of choosing the stepsize as "CONSTANT." The CONSTANT algorithm has the advantage of not using computational time in deciding what stepsize to use. The disadvantage of Newton's method with CONSTANT is that it usually results in the need to perform more iterations of WAC-NEWTON * , and it is possible to move out the feasible region. To get α l with exact line search, we solve the one-dimensional optimization problem: Journal of Applied Mathematics 7 where g α φ * w y l αs l . Solving 3.4 is relatively easy and may cut the number of WAC-NEWTON * iterations in SDP-CUT, which are computationally harder in comparison 3.9 The function g α is convex function over R, and the exact stepsize α l can be found using the Newton's method. Equations 3.7 and 3.9 give two different ways of computing g α , when finding the exact stepsize. If one decides to use 3.7 , the first and second derivatives of g α are required and given by

3.11
If 3.9 is used, the derivatives are simply given by 3.13 We will use ELS-MAT to denote the exact line search computations done using 3.7 , and ELS-EIG if computations are done from 3.9 . In addition to constant stepsize 1 and exact line search, another way that was considered in computing the Newton stepsize was backtracking. This method involves starting with a stepsize > 1 and then, decreasing the stepsize until a stopping condition is met see 5,16 . This technique guarantees a sufficient decrease in g α , often starting from α 1, in practice. see Algorithm 3 . It is known that Newton's method converges quadratically close to the solution. In our case, WAC-NEWTON * converges rapidly for starting points that are not close to the boundary of the feasible region R k . Sometimes, we encounter difficulty when the starting point is too close to the boundary. Note that each time we make a new cut, our starting point is near the boundary. It is also the case, when SDP-CUT iterates approaches optimality. When close to the boundary, our direction vector s often is very small, and thus a stepsize of α 1 may not make much progress. So, in these cases, the CONSTANT stepsize may spend many iterations, while making little progress, and each iteration wasting gradient and Hessian computations. Using the ELS algorithm, we find the proper stepsize and move out INPUT: 0 < β < 1 and 0 < γ < 0.5 Set α 1 while g α > g 0 αγ ∇ϕ w y l T s l do α ←− βα OUTPUT: α of this problem area in just one iteration. Consider the plots in Figure 4, which show the situation described above. Here, again, we are using SDP Example 2.3 and the point shown in Figure 2.
As we can see from Figure 4 on the left, the best choice of stepsize is much greater than 1. In Figure 4 on the right, we can see that we have now moved into an area where a stepsize of 1 is reasonable. BACKTRACKING will not help with this problem. BACKTRACKING only helps when the optimal stepsize is less than 1. If we try to adapt BACKTRACKING to help in the circumstance described above, we must initialize the BACKTRACKING stepsize to a large number, which creates two problems. First, when the optimal stepsize is close to 1, or smaller, we will waste time backtracking. Secondly, if we use a large initial stepsize, this may cause BACKTRACKING to send our point outside the feasible region, causing our algorithm to diverge. As it will be further shown in the next section, it does appear that ELS has a positive effect on SDP-CUT, while BACKTRACKING does not. Figure 5 has plots comparing how far CONSTANT and ELS move our point towards the optimal solution at each iteration. The plots highlight the "wasted iteration" problem with CONSTANT, which occurs when the optimal stepsize ismuch great than 1.

Experiment I: SDP-CUT Implementations
We have implemented SDP-CUT in four ways, varying in how the Newton stepsize is computed: CONSTANT, ELS-MAT, ELS-EIG, and BACKTRACKING. We will compare the performance of these four implementations against various test problems. For each problem, the SDP-CUT parameters were set at 10 −6 , STOL 10 −12 , WTOL 10 −10 , w 7 and MAX 100.

Performance as q Varies
Here, we left the number of variables constant at n 2 and varied the number of constraints q from 2 to 10. The matrices have their sizes fixed at m j 5 for all j. For each value of q, 10 SDP problems were randomly generated, and SDP-CUT was used to solve each problem. The number of SDP-CUT iterations which we also call WAC-NEWTON * iterations and run time for SDP-CUT were recorded for each problem. Figures 6 and 7 give plots of the averages with q on the horizontal axis. The vertical axis in Figure 6 contains the total SDP-CUT iterations. In Figure 7, the vertical axis gives the total run time of SDP-CUT in seconds. Figure 6 shows there is not a correlation between the number of LMI constraints and the number of SDP-CUT iterations WAC-NEWTON * iterations needed to find the optimal solution. Figure 6 also shows that ELS-EIG and ELS-MAT both effectively reduce the number of SDP-CUT iterations needed, while BACKTRACKING has about the same number of iterations as CONSTANT. Figure 7 shows that in the case of n 2, ELS-MAT and ELS-EIG run the fastest, while BACKTRACKING runs far slower than anything else. Time increases as q increases due to the gradient and Hessian computations. As we can see in formulas 3.2 and 3.3 , for each constraint, we must perform matrix inverses, dot products, and multiplications. This makes each iteration computationally harder, but does not affect the iterations needed as is seen in Figure 6. BACKTRACKING's time increases the most with q due to the eigenvalue computation needed for the stopping condition. BACKTRACKING performs these extra computations, but SDP-CUT does not benefit from them, and consequently BACKTRACKING has the highest run time.

Performance as n Varies
Here, we left the number of constraints constant at q 3 and varied the number of variables n from 2 to 10. The matrices sizes were set at m j 5 for all j. For each value of n, 10 SDP problems were randomly generated, and SDP-CUT was used to solve each problem. The number of SDP-CUT iterations WAC-NEWTON * iterations and run time for SDP-CUT were recorded for each problem. Figures 8, 9, and 10 are plots of the averages with n on the horizontal axis. The vertical axis contains the SDP-CUT iterations WAC-NEWTON * iterations needed in Figure 8. The vertical gives the run time of SDP-CUT in seconds in Figure 9. There is also a third plot, Figure 10, which shows n on the horizontal axis and the average time required to perform an iteration of SDP-CUT on the vertical axis.   Figure 8 also shows a positive correlation between the number of SDP-CUT iterations WAC-NEWTON * iterations and n. From Figure 9, we see that the exact line search reduces the total time needed for our algorithm, especially as n gets larger. This is important, because as n grows, the number of SDP-CUT iterations and the time required per iteration both increase see Figure 10 . For small n, CONSTANT is quicker per SDP-CUT iteration, but as n grows there is no noticeable difference in the time per iteration between CONSTANT or either ELS implementation. This is because the computations needed for the optimization problem 3.4 becomes so small compared to the computations need to compute ∇φ * w x and ∇ 2 φ * w x .

Performance as m Varies
Here of the averages with m on the horizontal axis. The vertical axis in Figure 11 contains the total SDP-CUT iterations WAC-NEWTON * iterations used to solve the problem. In Figure 12, the vertical axis gives the total run time of SDP-CUT in seconds. There is also a third plot, Figure 13, which shows m on the horizontal axis and the time required to perform an iteration of SDP-CUT on the vertical axis. Figure 11, again, shows that ELS reduces the required number of SDP-CUT iterations WAC NEWTON * iterations . However, as the size of the constraint matrices becomes large, the computations needed for the exact line search grow. Figure 12 shows the ELS-MAT is the fastest in the range of matrix sizes we tested. Unlike our other experiments, CONSTANT beats ELS-EIG. The reason for this is that the time per iteration of ELS-EIG grows very rapidly as is seen in Figure 13. This occurs because the computation of eigenvalues becomes increasingly difficult as the matrix grows in size.
ELS appears to be the best implementation of SDP-CUT. ELS is very effective in reducing the number of WAC NEWTON * iterations needed. ELS-MAT outperformed ELS-EIG in general. The eigenvalues allow for fast computation of the gradient and Hessian of g α once the eigenvalues are obtained see 3.12 and 3.13 . However, the large cost of computing the eigenvalues outweighs this benefit, especially as the matrices become large. ELS-MAT uses a slower means of computing the gradient and Hessian of g α , but has no eigenvalue calculation overhead see 3.10 and 3.11 . The slowest operation needed is matrix inversion. Matrix inversion becomes harder as the matrices get large, but it scales much better than finding the eigenvalues. It is conceivable that very large matrix situations could arise in which both ELS-EIG and ELS-MAT are too expensive; in this case it may be best to use CONSTANT. We were unable to find any cases in our test problems in which BACKTRACKING was beneficial. From now onwards, SDP-CUT will always refer to SDP-CUT with ELS-MAT implementation.

Convergence and the Weight w, Leveling Off Effect
Increasing the weight w will decrease the number of SDP-CUT iterations needed to find the optimal solution. A larger weight will also allow us to get closer to the optimal solution. However, if w becomes too large, numerical errors will prevent convergence. Here, we will discuss the role w plays in the rate of convergence, as well as how large we can make w before SDP-CUT fails. Figure 4 graphically demonstrates the convergence of SDP-CUT on our example SDP 2.3 as the weight w varies. From the plots in Figure 14, we can visibly see that by increasing the weight, SDP-CUT moves toward the optimal solution faster. In attempts to see how fast SDP-CUT converges and what effect the weight has, we performed the following experiment. We ran SDP-CUT on the example and kept track of the iterates x * k at each iteration. Below is a plot of x * − x * k shown on a log scale against iterations as we varied the weight w 1, 3, 5, 7, 9, where x * is the optimal solution. We see in Figure 15 that the distance from the estimate to the actual solution decreases exponentially, and linearly when shown on the log scale. We found that in most cases our implementation of SDP-CUT breaks down around w 10.
As we can see, increasing the weight means quicker convergence. However, Figure 15 only shows the first 10 iterations. As we approach the optimal solution, the rate of convergence slows and "levels off." This is caused by the feasible region becoming very small and by moving the cutting constraint back by , and thus the SDP-CUT iterates stay in approximately the same place. This problem is demonstrated in Figure 16. We see that larger weights allow us to get closer to the optimal solution, but even with a larger weight our convergence "levels off." One way to get closer to the optimal solution before the weighted analytic centers start to "levels off," is to use a smaller . The difficulty with decreasing is if becomes too small, our SDP-CUT iterates come too close to the boundary, giving rise to "numerical problems," especially in computing the Hessian matrices. We found that SDP-CUT worked with 10 −6 , but failed when 10 −7 in example SDP 2.3 . The "numerical problems" are due to the fact that near the boundary, one of the LMI constraint matrices is near-singular, and we need to compute matrix inverses. Figure 17 shows what the "leveling off" effect looks like over the feasible region of SDP problem 2.3 . Notice that the sequence of iterates x * k , from SDP-CUT, reaches a limit before it reaches the boundary of the region.

Experiment II: SDP-CUT versus SDPT3
In this section, the algorithms SDP-CUT and the well-known SDPT3 17 method are compared on a variety of SDP test problems. Since SDPT3 is known to be efficient, we also used it to find the best possible estimation of the actual optimal objective function values. Another reason for using SDPT3 is its flexibility to allow the user to input a starting point.
For the experiment, we tested 20 SDP problems and solved each with SDP-CUT and SDPT3. Each SDP problem was randomly generated, where the dimension n and number of constraints q are random integers on the intervals 2, 20 and 1, 20 respectively. The size m j of each LMI is a random integer on 1, 10  for all of them. For all problems, we used tolerances STOL 10 −3 , WTOL 10 −6 , 10 −7 , and a weight of w 7 was used with SDP-CUT. For the first 10 test problems, the iteration limit was MAX 5. For the remaining 10 problem, the iteration limit is raised to MAX 100. We record the total iterations performed for each algorithm as well as the time taken. The last two columns of the table below show the error in the calculated optimal objective function value compared to the actual value p * actual obtained from SDPT3 with no restriction on the number of iterations. In the table below, the suffixes cut and SDPT3 are used to distinguish between values pertaining to SDP-CUT and SDPT3, respectively. N, T , and E are used to denote the number of iterations, time, and absolute error, respectively. For example, E cut |p * cut − p * actual |. We see in Table 1 that after 5 iterations, SDP-CUT gives a more accurate answer in all the first 10 test problems. However, if allowed to run for more iterations all the algorithms successfully found the optimal solutions, but SDPT3 took less number of iterations to reach optimality in 6 out of the 10 problems. Also SDPT3, took less time in all the 20 test problems. We believe the difference in times is partly because SDPT3 has over the years been optimized to run efficiently, while SDP-CUT is at a beginning of its development. It is interesting to note that SDP-CUT took fewer iterations in 4 out of the 10 problems. We need to do more work to make SDP-CUT more efficient. We noticed that the Hessian matrix ∇ 2 φ * w x in SDP-CUT is prone to errors in the case of problems that are very large. In those cases, ∇ 2 φ * w x loses its positive definiteness and thus its Cholesky factorization is not possible. That is the main reason why we could not run SDP-CUT on the SDP test problems at SDPLIB 18 . For example, SDP-CUT was not successfully in solving truss2 of SDPLIB, due to numerical problems when computing the Hessian matrices. This problem has n 58 variables and q 133 constraints. However, SDP-CUT was successful in solving truss1, but with w 5.

Convergence of SDP-CUT
Let {x * k } be the sequence of estimates for x * generated by SDP-CUT. Let R k be the feasible region after k-cuts. Thus, x * k is the weighted analytic center of R k for all k > 0. Let {∂ k } be the sequence defined by ∂ k c T x * k − x * k 1 . The following theorem shows that when ||c|| and ||x * k − x * k 1 || are both small, then ∂ k is also small. Note that when ∂ k becomes very small, SDP-CUT is no longer making progress on the objective function values. This is especially true in the case of "leveling off" effect discussed in Section 4. This also indicates that our stopping criterion x * k − x * k 1 < STOL does not always mean convergence to the optimal solution. Theorem 6.1. If x * k − x * k 1 < STOL, then ∂ k < c STOL. We see that the weighted analytic center x 1 , x 2 converges to the optimal solution 0, −1 as w → ∞. This example suggests the following conjecture.

Scaling the Vector c
Theorem 6.1 shows the dependence of ∂ k on ||c||. Here, we will consider the effect of scaling c on SDP-CUT. If we scale c by ν > 1 such that we are now trying to optimize νc T x, we do not change the optimal solution x * . We will compare how close SDP-CUT gets when optimizing for various values of ν. Figure 19 is a plot of the results ||x * − x * cut || on the vertical axis and the scaling factor ν is on the horizontal axis. We used example SDP 2.3 with STOL 10 −12 , MAX 100, w 10. We can see in Figure 19 that scaling the objective vector c by a factor ν > 1 does have a positive effect on the accuracy of SDP-CUT.

Conclusion
We have developed and studied a new interior point method, SDP-CUT, for solving SDPs. We found that implementing SDP-CUT using Newton's method with exact line search, in general, gives the best performance. SDP-CUT appears to be very effective in getting into the neighborhood of the optimal solution as seen in problems 1-10 in Table 1. However, there is the "leveling off" effect shown in Figure 16. Future work on SDP-CUT can involve investigating Conjecture 6.2. Implementations that can handle a larger weight w or smaller will result in better performance.
Another area for improvement is the stopping criteria. Currently, the stopping criteria is ||x * k − x * k−1 || < STOL, which lets us know if we are no longer making progress. However, this does not always guarantee optimality of the solution. We have also shown that scaling the objective vector c by some factor ν > 1 could allow SDP-CUT to give a better solution. Also, it is important to investigate how to make all the other components of SDP-CUT work well and efficiently. For example, we would like to find a way to compute the Hessian ∇ 2 φ * w x ik more efficiently and without errors. The way we compute the stepsizes in Newton's method is an area that requires further attention. The exact line search we preferred over the others is probably too expensive and a better way may be possible. These improvements would allow us to test SDP-CUT on larger problems and compare it with other methods. It is also of interest to study what kind of problems SDP-CUT would work well or fail.