An Efficient Approach to Solve the Large-Scale Semidefinite Programming Problems

Solving the large-scale problems with semidefinite programming SDP constraints is of great importance in modeling and model reduction of complex system, dynamical system, optimal control, computer vision, and machine learning. However, existing SDP solvers are of large complexities and thus unavailable to deal with large-scale problems. In this paper, we solve SDP using matrix generation, which is an extension of the classical column generation. The exponentiated gradient algorithm is also used to solve the special structure subproblem of matrix generation. The numerical experiments show that our approach is efficient and scales very well with the problem dimension. Furthermore, the proposed algorithm is applied for a clustering problem. The experimental results on real datasets imply that the proposed approach outperforms the traditional interior-point SDP solvers in terms of efficiency and scalability.


Introduction
Semidefinite programming SDP is a technique widely used in modeling of complex systems and some important issues in computer vision and machine learning.Examples of application include model reduction 1 , modeling of nonlinear systems 2, 3 , optimal control 4 , clustering 5-7 , robust Euclidean embedding 8 , kernel matrix learning 9 , and metric learning 10 .It can be seen in 5, 7 that SDP relaxation can produce more accurate estimates than spectral methods.However, the SDP optimization suffers extremely high time complexity.Current SDP solvers utilizing interior-point methods scale as O n 4.5 or worse Mathematical Problems in Engineering 1, 6, 10 , where n is the number of rows or columns of the semidefinite matrix.As a result, SDP can only run on small datasets.It is meaningful to propose a scalable SDP solver.
Inspired by column generation 11 , which is a classical technology in optimization literature for solving large scale problem, Shen et al. proposed matrix generation in 10 for solving a semidefinite metric learning problem.By using matrix generation, the particular semidefinite program in metric learning was converted into a sequence of linear programs, and it is possible to use the well-developed linear programming technology.
In this paper, we propose the matrix generation-based iteration approach to solve general SDP optimization problems.At each iteration, the exponentiated gradient EG algorithm 12 is applied to solve the special structure subproblem.Experiments are also carried out on some real datasets to evaluate the proposed method's efficacy and efficiency.
The method proposed here can be seen as an extension of column generation to solve SDP problems.The proposed matrix generation method also has the drawback of column generation.It converges slowly of one wants to achieve high accuracy.This is the so-called "tailing effect".In practice, for many applications, we do not need a very accurate solution.Typically, a moderately accurate solution suffices.On the other hand, the proposed method can also be considered as a generalization of EG to the matrix case.EG is used to solve problems with a vector optimization variable, and the vector must be on a simplex.Here, we generalize EG in the sense that the proposed method solves optimization with a semidefinite matrix whose eigenvalues are on a simplex.
We present our main results in the next section.

Notation
The following notations will be used throughout the paper:

Solving Optimization Problem with SDP Constraints Using Matrix Generation
We are interested in the following general convex problem: Here, f • is a convex function defined in Ë .We need to extend the Fenchel conjugate 13 to matrices.For a function f is the lower semicontinuous and convex function Now, we impose the following two conditions on f • : ii the gradient f x is continuous and monotonically increasing at each direction.
In other words, f • is differentiable and strictly convex.In this case, the Fenchel conjugate is also called Legendre transform and has the following important result: By analogy, we can define a Fenchel conjugate for a matrix as in the vector space Ê n .f * • defined in 2.4 must be lower semicontinuous and convex, because f * • is a supremum of linear continuous functions of U.
We are ready to reformulate our problem 2.1 .It is proved in 10 that a p.s.d.matrix can always be decomposed as a linear convex combination of a set of rank-one matrices, we decompose X ∈ Ë into a convex combination of a set of dyads

2.6
Note that here, θ and X are the optimization variables.Although X is redundant in 2.6 , we need to keep it to arrive at the important Lagrange dual problem later.
This above problems is still very hard to solve, since it has nonconvex rank constraints, and the variable J is indefinite, because there are an indefinite number of rank-one matrices.However, if we somehow know matrices Z j j 1, . . .a priori, then all the constraints imposed on Z j j 1, . . .can be dropped, and the problem becomes a linear program.
We apply matrix generation to solve the problem.Matrix generation is an extension of column generation to nonpolyhedral semidefinite constraints for solving difficult large-scale optimization problems 10 .It is a method to avoid considering all variables of a problem explicitly, and only a small subset of the entire variable set is considered.Once the problem with the small subset variables is solved, the question of if there are any other variables that can be included to improve the solution should be answered.For convex programs, the primal variables correspond to the dual constraints.The dual can be used to solve the above subproblem.
The Lagrangian is with p 0, and the dual is where we have defined ϕ with Essentially the dual is max We now only consider a small subset of the variables in the primal; that is, only a subset of Z denoted by Z is used.The LP solved using Z is usually termed restricted master problem RMP .Because the primal variables correspond to the dual constraints, solving RMP is equivalent to solve a relaxed version of the dual problem 10 .If all the constraints that we have not added to the dual problem are kept, solving the restricted problem is equivalent to solve the original problem.Otherwise, if there exists at least one constraint that is violated, the violated constraints correspond to variables in primal that are not in RMP.Adding these variables to RMP leads to a new RMP problem that needs to be reoptimized.Here, we have a iterative algorithm that either finds a new Z such that where t is the solution of the current restricted problem, or it guarantees that such a Z does not exist.To make convergence fast, we find the one by solving the following optimization problem:

2.12
The optimal value Z is exactly vv , where v is the eigenvector of U that corresponds to the smallest eigenvalue.For the time being, Z is added to the dyads, and the only variable to estimate is θ.The primal can be writhen as We generalize the EG algorithm to the matrix case and use it to solve the above problem.At optimality, because of 2.3 , we have the connection between the primal and dual variables U f X .

2.14
We also know that at least one of the dual constraints takes the equality at optimality, which enable us to obtain t t min t can be used as the iteration stopping criteria.

The Matrix Generation
We propose the implementation of the algorithm for the general convex problem 2.1 .To be more general, we extend the SDP constraint where k is a positive integer.The detailed implementation of the algorithm is proposed in Algorithm 1.We can observe, at each iteration, that a new matrix is generated, hence the algorithm is named matrix generation.Besides, at each iteration, only the most significant eigenvalue and its corresponding eigenvector are needed.This makes the algorithm efficient.

The EG Algorithm
In 12 , it is shown that EG style updates can converge very quickly compared to other methods.It has been successfully applied to structured prediction problems such as structured support vector machines and conditional random field.We generalize the EG algorithm to matrix to solve the special structure convex optimization problem in MG algorithm.We first give a brief introduction to the EG algorithm 12, 14, 15 .The EG algorithm efficiently solves the convex optimization problem 16

17
Initialize: i The maximum number of integrations J max .
ii The pre-set tolerance value ε e.g., 10 −5 .iii t * 0 0. Iteration 1: While iteration J 2, 3, . . ., J max do 1 If abs t * J − t * J−1 < ε then break problem solved ; 2 Find a new Z J by finding the eigenvector v corresponding to the smallest eigenvalue of , by solving the following spectral structure problem by EG algorithm as described in Section 2.3.3: where f θ is obtained from f X by substituting X with Output: The p.s.d.matrix X J j 1 θ j Z j and minimal value f X .

Evaluation of the Algorithm
In this section, we evaluate the convergence, running speed, and memory consumption of the algorithm by solving min where X F ij X 2 ij is the Frobenius norm, K is a n × n symmetric p.s.d.matrix, and k is a positive integer.This problem is important in the issue of the affinity matrix normalization of spectral clustering 17 .
The above problem is approximate to min where δ is a scalar, for example, δ 10 3 .Before reporting the results, we compare the proposed algorithm with the convex optimization solver, namely, SDPT3 18 which is used as internal solvers in the disciplined convex programming software CVX 19, 20 .The CVX toolbox can solve standard problems ii The maximum number of integrations i max While iteration i 1, 2, . . ., i max do 1 Generate the sequence {θ i } with: where τ i is step-size, which can be determined by τ i 2 log J/L f 1/ √ i following 15 , some stopping criteria are met.Output: θ i and f θ .

Algorithm 2:
The EG algorithm: solving the special structure problem shown in 2.17 .such as linear programs LPs , quadratic programs QPs , second-order cone programs SOCPs , and semidefinite programs SDPs .

Convergence
We randomly generate a 50 × 50 p.s.d.matrix K in 2.18 ; let k be 5.In Figure 1, we plot the optimal value obtained by the proposed algorithm at each iteration the blue curve .The red dash line shows the ground truth obtained by directly solving the original problem in 2.16 using the CVX toolbox.We can see that the algorithm converges to the near-optimal solutions quickly.

Running Time
We solve the problem in 2.16 using the proposed algorithm and the CVX toolbox, respectively, and the running time is shown in Figure 2. The blue curve is the running time of the proposed algorithm versus the matrix dimension, and the red curve is the running time of CVX versus the matrix dimension.We can see that the running time of CVX increases quickly with the matrix dimension and the running time of the proposed algorithm is a fraction of the CVX and scales very well with the matrix dimension.The proposed algorithm has a clear advantage on computational efficiency.One reason is that at each iteration, only the most significant eigenvalue and its corresponding eigenvector are needed in the proposed algorithm.

Memory Consumption
Figure 3 shows the memory consumption of the proposed algorithm and the CXV toolbox.The blue curve is the approximate memory consumption of the proposed algorithm versus the matrix dimension, and the red curve is the approximate memory consumption of CVX versus the matrix dimension.It can be seen that the memory consumption of CVX increases quickly with the matrix dimension n and the memory consumption of the proposed algorithm is a fraction of the CVX and scales very well with the dimension.In summary, the proposed algorithm can converge to the near-optimal solutions quickly and scale very well with the dimension, which are very important for solving largescale SDP problems.As a result, the proposed method is potential to be applied to computer vision to deal with large matrix of 3D visual data 21 and clustering in data analysis.In next section, we will show how to apply the proposed method to clustering problems.

The Application
The problem of partitioning data points into a number of distinct sets, known as the clustering problem, is one of the most fundamental and significant topics in data analysis and machine learning.From an optimization viewpoint, clustering aims at the minimal sumof-squares MSSC based on some definition of similarly or distance.But the variable is a 0-1 discrete indicator matrix Y ∈ Ê m×n n is the number of samples, and m is the number of clusters , thus the optimization becomes a mixed integer programming with nonlinear objective, which is NP-hard.The computation time of NP-hard problems is too heavy to use in practical engineering applications and should be accelerated by some approximation methods.K-means and spectral clustering are the two most well-known algorithms to approximately solve the clustering problem.However, the former is merely able to achieve local optimum through an indirect way, and the later, which can obtain a global optimum through eigenvalue decomposition, is unfortunately a weak relaxation of the N-cut problem 7 .Recent research of semidefinite programming SDP relaxation on clustering proposed that the minimal sum-of-squares MSSC 22 is possible to be relaxed as SDP problems, which is more direct than K-means and tighter than spectral clustering 7 .We solve the problem in 3.2 using the algorithm described in Algorithm 1.To test the algorithm, we carry out some experiments on four real datesets collected from UCI machine learning repository http://www.ics.uci.edu/∼mlearn/MLRepository.html.The datasets are listed in Table 1 together with some their characteristics.k, Size, and Dim are the class number, the number of instances, and the number of attributes of each dataset, respectively.All the experiments are done by using Matlab on a personal computer with a 2.93 GHz processor and a 2 G memory.The accuracy of clustering each dataset of our algorithm is compared with K-means, spectral clustering, and CVX here CVX means the SDP problem in 3.2 is solved using the CVX toolbox .The results are shown in Table 1.We can see that i the proposed algorithm performs better than K-means, ii the proposed algorithm achieves higher accuracy than the spectral clustering that is because the proposed algorithm is the SDP relaxation of the clustring problem and is tighter than the spectral relaxation, iii the proposed algorithm is as good as CVX.This means it can solve the SDP problem as accurately as the CVX toolbox.
We also compare the speed of the proposed algorithm with CVX, and the results are shown in Table 2.It shows that the proposed algorithm is much faster than CVX.
As observed from the experiments, it can be found that the proposed algorithm performs very well on these datasets and can efficiently solve the 0-1 SDP problem for clustering.Both speed and accuracy are improved as compared with traditional interior-point SDP solvers.We plan to further extend the approach to more general problems and look for other applications such as modeling 23, 24 and stabilizing some kinds of dynamical systems 25 model reduction.

Conclusion
This paper proposed an approach to solve the optimization problem with SDP constraints using matrix generation and exponentiated gradient.The process is faster and more scalable than the traditional SDP approach.The results of numerical experiments show that the method scales very well with the problem dimensions.We also applied the approach to solve the SDP relaxation clustering problem.Experimental results on real datasets show that the algorithm outperforms much in both speed and accuracy as compared with interior-point SDP solvers.
i a bold lower-case letter x : a column vector, ii an upper-case letter X : a matrix, iii A 0: a positive semidefinite p.s.d.matrix, iv a b: the component-wise inequality between two vectors, v Ê m×n : the vector space of real matrices of size m × n, vi Ë: the space of real matrices, vii Ë n : the space of symmetric matrices of size n × n, viii Ë n : the space of symmetric positive semidefinite matrices of size n × n, ix A, B Tr A B : the inner product defined on the above spaces, x Tr • : the trace of a matrix, xi Δ n : x ∈ Ê n | x 0, 1 x 1: the n-dimensional simplex set for vectors, xii P n : {X ∈ Ë n | X 0, Tr X 1}: the density matrix set, xiii Q n : {X ∈ Ë n | X 0, Tr X 1, rank X 1}: the dyad set of matrices.

Algorithm 1 :
The matrix generation.under the assumption that the object function f • is a convex Lipschitz continuous function with Lipschitz constant L f w.r.t. a fixed given norm • .The mathematical definition of L f is that |f x − f z | ≤ L f x − z holds for any x, z in the domain of f • .The detail of the EG algorithm is shown in Algorithm 2.

Figure 1 :
Figure 1:The convergence of the proposed algorithm.The blue curve shows the optimal value obtained by the proposed algorithm at each iteration, and the dash line shows the ground truth obtained by directly solving the original problem in 2.16 using the CVX toolbox.

Figure 2 :
Figure 2: Running time of the proposed algorithm.The blue curve shows the running time of the proposed algorithm versus the matrix dimension, and the red curve shows the running time of CVX versus the matrix dimension.

Figure 3 :
Figure 3: Memory consumption of the proposed algorithm.The blue curve shows the approximate memory consumption of the proposed algorithm versus the matrix dimension, and the red curve shows the approximate memory consumption of CVX versus the matrix dimension.

Table 1 :
UCI datasets used together with some characteristics and the clustering results achieved using the different methods.

Table 2 :
Time consumptions seconds of the proposed algorithm and CVX on clustering the selected UCI datasets.is the so-called affinity matrix whose entries represent the similarities or closeness among the entities in the dataset, and m is the class number of the dataset.According to 22 , it is reasonable to relax the nonnegative constraint in 3.1