An Efficient Method for Convex Constrained Rank Minimization Problems Based on DC Programming

The constrained rankminimization problemhas various applications inmany fields includingmachine learning, control, and signal processing. In this paper, we consider the convex constrained rank minimization problem. By introducing a new variable and penalizing an equality constraint to objective function, we reformulate the convex objective function with a rank constraint as a difference of convex functions based on the closed-form solutions, which can be reformulated as DC programming. A stepwise linear approximative algorithm is provided for solving the reformulatedmodel.Theperformance of ourmethod is tested by applying it to affine rank minimization problems and max-cut problems. Numerical results demonstrate that the method is effective and of high recoverability and results on max-cut show that the method is feasible, which provides better lower bounds and lower rank solutions compared with improved approximation algorithm using semidefinite programming, and they are close to the results of the latest researches.


Introduction
In recent decades, with the increase of the system acquisition and data services, the explosion of data poses challenges to storage, transmission, and processing, as well as device design.The burgeoning theory of sparse recovery reveals the outstanding sensing ability on the large scales of highdimensional data.Recently, a new theory called compressed sensing (or compressive sampling (CS)) has appeared.And this theory is praised as a big deal in signal processing area.This approach can acquire the signals while compressing data properly.Its sampling frequency is lower than that of Nyquist, which makes the collections of high resolution signal possible.One noticeable merit of this approach is its ability to combine traditional data collection and data compression into a union when facing sparse representation signals.This merit means that sampling frequency of signals, time and computational cost consuming on data processing, and expense for data storage and transmission are all greatly reduced.Thus this approach leads signal processing to a new age.However, same with CS, Low Rank Matrix theory can also solve signal reconstruction problems that are related to theory and practice.It is also carried out through exerting dimension reduction treatments on unknown multidimension signals validly.And then use sparse concepts in a more broad sense to the problems.That is to say, reconstruct the original high-dimensional signals array completely or approximately by a small number of values measured by sparse (or reducing dimensions) sampling.Considering that both of these two theories are closely related to intrinsic sparsity of signals, we generally view them as sparse recovery problems.The main goal of sparse recovery problems is using less linear measurement data to recovery high-dimensional sparse signal.Its core mainly includes three steps: signal sparse representation, linear dimension reduction measure, and the nonlinear reconstruction of signal.The three parts for the sparse recovery theory are closely related; each part will have a significant impact on the quality of signal reconstruction.However, the nonlinear reconstruction of signal can be seen as a special case of rank minimization problem to be solved.
Constrained rank minimization problems have attracted a great deal of attention over recent years, which aim to find sparse solutions of a system.The problems are widely applicable in a variety of fields including machine learning [1,2], control [3,4], Euclidean embedding [5], image processing [6], and finance [7,8], to name but a few.There are some typical examples of constrained rank minimization problems; for example, one is affine matrix rank minimization problems [5,9,10] where  ∈  × is the decision variable and the linear map A :  × →   and vector  ∈   are known.
Another is max-cut problems [25] in combinatorial optimization min Tr () where  =   ,  = ( 1 , . . .,   )  ,  = (1/4)(diag() − ),  = (1, 1, . . ., 1)  ,  is the weight matrix, and so on.As a matter of fact, these problems can be written as a common convex constrained rank minimization model; that is, min  () where  :  × →  is a continuously differentiable convex function, rank() is the rank of a matrix ,  ≥ 0 is a given integer, Γ is a closed convex set, and Ω is a closed unitarily invariant convex set.According to [26], a set  ⊆  × is a unitarily invariant set if where   denotes the set of all unitary matrices in  × .Various types of methods have been proposed to solve (3) or its special cases with different types of Γ ∩ Ω.
When Γ ∩ Ω is  × , (3) is the unconstrained rank minimization problem.One approach is to solve this case by convex relaxation of rank(⋅).Cai et al. [9] proposed singular value thresholding (SVT) algorithm; Ma et al. [10] proposed fixed point continuation (FPC) algorithm.Another approach is to solve rank constraint directly.Jain et al. [27] proposed simple and efficient algorithm Singular Value Projection (SVP) based on the projected gradient algorithm.Haldar and Hernando [28] proposed an alternating least squares approach.Keshavan et al. [29] proposed an algorithm based on optimization over a Grassmann manifold.
When Γ ∩ Ω is a convex set, (3) is a convex constrained rank minimization problem.One approach is to solve it by replacing rank constraint with other constraints, such as trace constraint and norm constraint.Based on this, several heuristic methods have been proposed in literature; for example, Nesterov et al. [30] proposed interior-point polynomial algorithms; Weimin [31] proposed an adaptive seminuclear norm regularization approach; in addition, semidefinite programming (SDP) has also been applied; see [32].Another approach is to solve rank constraint directly.Grigoriadis and Beran [33] proposed alternating projection algorithms for linear matrix inequalities problems.Mohseni et al. [24] proposed penalty decomposition (PD) method based on penalty function.Gao and Sun [34] recently proposed the majorized penalty approach.Burer et al. [35] proposed nonlinear programming (NLP) reformulation approach.
In this paper, we consider problem (3).We introduce a new variable and penalize equality constraints to objective function.By this technique, we can obtain closed-form solution and use it to reformulate the objective function in problem (3) as a difference of convex functions; then (3) is reformulated as DC programming which can be solved via linear approximation method.A function is called DC if it can be represented as the difference of two convex functions.Mathematical programming problems dealing with DC functions are called DC programming problems.Our method is different from original PD [26]; for one thing we penalize equality constraints to objective function and keep other constraints, while PD penalizes all except rank constraint to objective function, including equality and inequality constraints; for another each subproblem of PD is approximately solved by a block coordinate descent method, while we approximate problem (3) to a convex optimization to be solved finally.Compared with PD method, our method uses the closed-form solutions to remove rank constraint, while PD needs to consider rank constraint in each iteration.It is known that rank constraint is a nonconvex and discontinuous constraint, which is not easy to deal.And due to the remaining convex constraints, the formulated programming can be solved by solving a series of convex programming.In our method, we use linear function instead of the last convex function to formulate the programming to convex programming.We test the performance of our method by applying it to affine rank minimization problems and maxcut problems.Numerical results show that the algorithm is feasible and effective.
The rest of this paper is organized as follows.In Section 1.1, we introduce the notation that is used throughout the paper.Our method is shown in Section 2. In Section 3, we apply it to some practical problems to test its performance.
1.1.Notations.In this paper, the symbol   denotes the dimensional Euclidean space, and the set of all × matrices with real entries is denoted by  × .Given matrices  and  in  × , the standard inner product is defined by ⟨, ⟩ fl Tr(  ), where Tr(⋅) denotes the trace of a matrix.The Frobenius norm of a real matrix  is defined as ‖‖  fl √ Tr(  ).‖‖ * denotes the nuclear norm of , that is, the sum of singular values of .The rank of a matrix  is denoted by rank().We use   to denote the identity matrix, whose dimension is .Throughout this paper, we always assume that the singular values are arranged in nonincreasing order, that is,  1 ≥  2 ≥ ⋅ ⋅ ⋅ ≥   > 0 =  +1 = ⋅ ⋅ ⋅ =  min{,} . denotes the subdifferential of the function .We denote the adjoint of  by  * .‖  ‖ () is used to denote Ky Fan -norms.Let  Ω be the projection onto the closed convex set Ω.

An Efficient Algorithm Based on DC Programming
In this section, we first reformulated problem (3) as a DC programming problem with the convex constrained set, which can be solved via stepwise linear approximation method.
Then, the convergence is provided.
Note that   = (  )  ; thus they have the same nonzero singular values, and then (  ) = (  ) holds.Together with (12), we have It is known that for arbitrary matrices ,  ∈  × , Further, That is, where   and   are symmetric matrices.It is also known that, for arbitrary Hermitian matrices   ,   ∈  × ,  = 1, 2, . . ., , the following holds (see Exercise II.1.15 in [24]) Thus Together with ( 13) and ( 16), (18) yields Hence, and we have Moreover where According to above theorem, (10) can be rewritten as Since the objective function of ( 23) is a difference of convex functions, the constrained set Γ ∩ Ω is a closed convex constraint, so ( 23) is DC programming.Next, we will use stepwise linear approximate method to solve it.
Stepwise linear approximative algorithm for solving model ( 23) can be described as follows.

Remark
(1) Subproblem ( 26) can be done by solving a series of convex subproblems.The efficient method for convex programming is multiple and it has perfect theory guarantees.
(a) When Γ ∩ Ω is  × , closed-form solutions can be obtained; for example, when solving model (1),  +1 = ( *  + ) −1 (  (  )    ) are obtained by the first-order optimality conditions (b) When Γ ∩ Ω is a general convex set, the closedform solution is hard to get, but subproblem ( 26) can be solved by some convex optimization software programs (such as CVX and CPLEX).
According to different closed convex constraint sets, we can pick and choose suitable convex programming algorithm to combine with our method.The toolbox listed in this paper is just an alternative choice.
(2) The stop rule that we adopted here is ‖ +1 −   ‖  ≤  with some  small enough.

Convergence Analysis.
In this subsection, we analyze the convergence properties of the sequence {  } generated by above algorithm.Before we prove the main convergence result, we give the following lemma which is analogous to Lemma 1 in [10].
Lemma 2. For any  1 and where   is the pre- singular values approximate of   ( = 1, 2).
Proof.Without loss of generality, we assume  ≤  and the singular value decomposition where And we note that where Applying (32) to above three terms, we have Mathematical Problems in Engineering Hence, without loss of generality, assuming  ≤  ≤ , we have which implies that Next, we claim the value of iterations of stepwise linear approximative algorithm is nonincreasing.Theorem 3. Let {  } be the iterative sequence generated by stepwise linear approximative algorithm.Then {(  )} is monotonically nonincreasing.
Let  =  +1 ,  =   in above inequality; we have Therefore, Consequently, We immediately obtain the conclusion as desired.
Next, we show the convergence of iterative sequence generated by stepwise linear approximative algorithm when solving (23).Theorem 4 shows the convergence when solving (23) with Γ ∩ Ω =  × .Theorem 4. Let Υ be the set of stable points of problem (23) with no constraints; then the iterative sequence {  } generated by linear approximative algorithm converges to some   ∈ Υ when  → ∞.
Proof.Generally speaking, subproblem (26) can be solved by projecting a solution obtained via solving a problem with no constraints onto a closed convex set; that is,  +1 =  Γ∩Ω ( +1 ), where  +1 ∈ arg min{(,   )}.According to Theorem 4, the sequence {‖  −  opt ‖  } is monotonically nonincreasing and convergent when  → ∞; together with the well known fact that  is a nonexpansive operator, we have where  opt is the projection of  opt onto Γ ∩ Ω.Clearly,  opt ∈ .

Numerical Results
In this section, we demonstrate the performance of our method proposed in Section 2 by applying it to solve affine rank minimization problems (image reconstruction and matrix completion) and max-cut problems.The codes implemented in this section are written in MATLAB and all experiments are performed in MATLAB R2013a running Windows 7.

Image Reconstruction Problems.
In this subsection, we apply our method to solve one case of affine rank minimization problems.It can be formulated as where  ∈  × is the decision variable and the linear map A :  × →   and vector  ∈   are known.Clearly, (47) is a special case of problem (3).Hence, our method can be suitably applied to (47).Recht et al. [39] have demonstrated that when we sample linear maps from a class of probability distributions, then they would obey the Restricted Isometry Property.There are two ingredients for a random linear map to be nearly isometric.First, it must be isometric in expectation.Second, the probability of large distortions of length must be exponentially small.In our numerical experiments, we use four nearly isometric measurements.The first one is the ensemble with independent, identically distributed (i.i.d.) Gaussian entries.The second one has entries sampled from an i.i.d.symmetric Bernoulli distribution, which we call Bernoulli.The third one has zeros in two-thirds of the entries and others sampled from an i.i.d.symmetric Bernoulli distribution, which we call Sparse Bernoulli.The last one has an orthonormal basis, which we call Random Projection.Now we conduct numerical experiments to test the performance of our method for solving (47).To illustrate the scaling of low rank recovery for a particular matrix , consider the MIT logo presented in [39].The image has 46 rows and 81 columns (the total number of pixels is 46 × 81 = 3726).Since the logo only has 5 distinct rows, it has rank 5. We sample it using Gaussian i.i.d., Bernoulli i.i.d., Sparse Bernoulli i.i.d., and Random Projection measurements matrix with the number of linear constraints ranging between 700 and 2400.We declared MIT logo  to be recovered if ‖ −  opt ‖  /‖‖  ≤ 10 −3 ; that is, SNR ≥60 dB (SNR = −20 log(‖ −  opt ‖  /‖‖  )).Figures 1 and 2 show the results of our numerical experiments.1200 constraints.Binary requires the minimal constraints about 950.The Gauss and Sparse Binary measurements reach their maximal SNR of 220 dB at around 1500 constraints; the Binary measurements can reach its maximal SNR of 180 dB at around 1400 constraints.Random Projection has more large phase transition point but has the best recovery error.We observe a sharp transition to perfect recovery near 1000 measurements lower than 2( +  − ).Compared with Figure 2, which shows results of nuclear norm heuristic method in [39] and the method that is a kind of convex relaxations, there is a sharp transition to perfect recovery near 1200 measurements which is approximately equal to 2(  − ).The comparison shows our method has better performance that it can recover an image with less constraints; namely, it needs less samples.In addition, we observe that SNR of nuclear norm heuristic method achieves 120 dB when the number of measurements approximately equals 1500, while our method just needs 1200 measurements, which decrease by 20% compared with 1500.If the number of measurements is 1500, accuracy of our method would be higher.Among precisions of four different measurement matrixes, the smallest one is precision of Random Projection.However, the precision value is also higher than 120 dB.The performance illustrates that our method is more accurate.Observing the curve of between 1200 and 1500, we can find that our curve rises smoothly compared with the curve in the same interval in Figure 2; this shows that our method is stable for reconstructing image.
Figure 3 shows recovered images under four different measurement matrices with 800, 900, and 1000 linear constraints.And Figure 4 shows recovered images by our method and that by SeDuMi.We chose to use this interior-point method because it yields the highest accuracy.Distinctly, the recovered images by SeDuMi have some shadow, while ours is clear.Moreover, as far as time is concerned, our method is more fast and the computational time is not extremely sensitive to  compared to SeDuMi.The reason is that closedform solutions can be obtained when we solve this problem.
Comparisons are shown in Table 1.

Matrix Completion Problems.
In this subsection, we apply our method to matrix completion problems.It can be reformulated as where  and  are both  ×  matrices and Ω is a subset of index pairs (, ).Up to now, numerous methods were proposed to solve the matrix completion problem (see, e.g., [9,10,29,31]).It is easy to see that problem (48) is a special case of general rank minimization problem (3) with In our experiments, we first created random matrices   ∈  × and   ∈  × with i.i.d.Gaussian entries and then set  =      .We sampled a subset Ω of  entries uniformly at random.We compare against singular value thresholding (SVT) method [9], FPC, FPCA [10], and the OptSpace (OPT) method [29] using as a stop criterion.The maximum number of iterations is no more than 1500 in all methods; other parameters are default.We report result averaged over 10 runs.We use SR = /() to denote sampling ratio and FR = ( +  − )/ to represent the dimension of the set of rank  matrices divided by the number of measurements.NS and AT denote number of matrices that are recovered successfully and average times (seconds) that are successfully solved, respectively.RE represents the average relative error of the successful recovered matrices.We use the relative error      −  *     \ ‖‖  (50) to estimate the closeness of  * to , where  * is the optimal solution to (48) produced by our method.We declared  to be recovered if the relative error was less than 10 −3 .Table 2 shows that our method has the close performance with OptSpace in column of NS.These performances are better than those of SVT, FPC, and FPCA.However, SVT, FPC, and FPCA cannot obtain the optimal matrix within 1500 iterations which shows that our method has advantage in finding the neighborhood of optimal solution.The results do not mean that these algorithms are invalid, but the maximum   number of iterations is limited to 1500.When the rank of matrix is large (i.e.,  ≥ 5), the average time of our algorithm is less than that of OptSpace.In some cases, the relative errors of DC are minimal.When the dimension of matrix becomes larger, DC method has a better performance of NS and similar relative errors with that of OptSpace.In terms of time, our time is longer than that of OptSpace, but shorter than that of FPC and SVT.Tables 3 and 4 show the above results.In conclusion, our method has better performance in "hard" problems; that is, FR > 0.34.The problems involved matrices that are not of very low rank, according to [10].Table 5 gives the numerical results for problems with large dimensions.In these numerical experiments, the maximum number of iterations is chosen by default of each compared method.We observe that our method can recover problems successfully as other four methods, while our method is  slower with the increase of dimension.However, our method is faster than FPC.Though SVT, FPCA, and OptSpace outperform our method in terms of speed, they are specifically developed method for matrix completion problems while our method can be applied to more class of problems.To prove this, we will apply our method to max-cut problems.We now address the initialization and termination criteria for our method when applied to (51).We set  0 =   ; in addition, we choose the initial penalty parameter  to be 0.01 and set the parameter  to be 1.1.And we use ‖ +1 −   ‖  ≤  as the termination criteria with  = 10 −5 .
In our numerical experiments, we first test several small scale max-cut problems; then we perform our method on the standard "" set problems written by Professor Ginaldi [40].In all, we consider 10 problems ranging in size from 800 variables to 1000 variables.The results are shown in Tables 6 and 7, in which "" and "" represent corresponding dimension of graph and the rank of solution, respectively.The "obv" denotes objective values of optimal matrix solutions operated by our method.
In Table 6, "KV" denotes known optimal objective values and "UB" presents upper bounds for max-cut by the semidefinite relaxation.It is easy to observe that our method is effective for solving these problems and it has computed the global optimal solutions of all problems.Moreover, we obtain lower rank solution compared with semidefinite relaxation method.
Table 7 shows the numerical results for standard "" set problems."SUB" denotes upper bounds by spectral bundle method proposed by Helmberg and Rendl [41]."GW-cut" denotes the approximative algorithm proposed by Goemans and Williamson [42]."DC" denotes results generated by our method, DC + is the results of DC with neighborhood approximation algorithm, and the right most column, titled BK, gives best results as we have known according to [43,44].We use "-" to indicate that we do not obtain the rank.From Table 3 we can note that our optimal values are close to the results of latest researches, better than values given by GW-cut, and they are all less than the corresponding upper bounds.For the problems, we all get the lower rank solution compared with GW-cut; moreover, most solutions satisfy the constraints rank = 1.The results prove that our algorithm is effective when used to solve max-cut problems.

Concluding Remarks
In this paper we used closed-form of penalty function to reformulate convex constrained rank minimization problems as DC programming, which could be solved by stepwise linear approximative method.Under some suitable assumptions, we showed that accumulation point of the sequence generated by the stepwise linear approximative method was a stable point.The numerical results on affine rank minimization and max-cut problems demonstrate that our method is generally comparable or superior to some existing methods in terms of solution quality, and it can be applied to a class of problems.However, the speed of our method needs to be improved.

Figure 1 :
Figure 1: Performance curves under four different measurement matrices: SNR versus the number of linear constraints.

Figure 1 TheFigure 2 :
Figure 2: Performance curves of nuclear norm heuristic method under four different measurement matrices: SNR versus the number of linear constraints.

Figure 3 :
Figure 3: Recovered images under four measurement matrices with different numbers of linear constraints.From left to right: the original, results under Gauss, Binary, Sparse Binary, and Random Projection measurement.From top to bottom:  = 800,  = 900, and  = 1000.

Figure 4 :
Figure 4: Recovered images under different numbers of linear constraints.From left to right: the original, results under our method, and results by SeDuMi.From top to bottom:  = 800,  = 900, and  = 1000.

Table 5 :
Numerical results for problems with different dimension.

Table 6 :
Numerical results for famous max-cut problems.

Table 7 :
Numerical results for G problems.