A Symmetric Rank-one Quasi Newton Method for Non-negative Matrix Factorization

As we all known, the nonnegative matrix factorization (NMF) is a dimension reduction method that has been widely used in image processing, text compressing and signal processing etc. In this paper, an algorithm for nonnegative matrix approximation is proposed. This method mainly bases on the active set and the quasi-Newton type algorithm, by using the symmetric rank-one and negative curvature direction technologies to approximate the Hessian matrix. Our method improves the recent results of those methods in [Pattern Recognition, 45(2012)3557-3565; SIAM J. Sci. Comput., 33(6)(2011)3261-3281; Neural Computation, 19(10)(2007)2756-2779, etc.]. Moreover, the object function decreases faster than many other NMF methods. In addition, some numerical experiments are presented in the synthetic data, imaging processing and text clustering. By comparing with the other six nonnegative matrix approximation methods, our experiments confirm to our analysis.


Introduction
An NMF problem is to decompose a nonnegative matrix V ∈ R n×m into two nonnegative matrix W ∈ R n×k and H ∈ R k×m , such that the W H approximate to V as well as possible. To measure the distance between W H and V , there are many methods such as the Kullback Leibler divergence, Bregman divergence, Frobenius divergence, etc. Due to the favorable property of the Frobinius divergence, many methods are presented based on it. The Frobenius divergence is as follows: subject to W ij ≥ 0 ,H ij ≥ 0 for all i and j.
In the last decade, numerous methods have been proposed to deal with the NMF problem in Eq.(1.1). Most of these methods can be classified into two classes i.e., alternating one-step gradient descent and alternating least squares.
• The alternating one-step gradient is to alternating W and H with one step. The most well known one is Lee's multiplicative update algorithm [11], which alternates W and H by the following rules: Suppose we have obtained the lth matrix W l and H l , then • The frame work of the alternating least squares [4] is as follows: (1) Initialize H ∈ R k×m with a nonnegative matrix; (2) Solving the following problem repeatedly until a stopping criterion is satisfied: where H is fixed,and where W is fixed.
Obviously, the above two methods both satisfied  [3,5,6], active set method [7,8,9]. Most of the methods are based on the solution of nonnegative least squares.
Note that the Frobenius norm of a matrix is just the sum of the Euclidiean distance over columns (or rows). Then solving Eq(1.5) can be boiled down to solve a series of Nonnegative least squares (NNLS) problems of the following forms: where h i , V i , i = 1, 2, · · · , m are the columns of H and V , respectively.
Next, for convenience, we write the least squares by omitting their subscripts. For example, If x is the optimal solution of Eq.(1.9), then x satisfies the following equation: x ≥ 0, (1.10) which is called the KKT optimal condition [26].
This tells us that we can try to find a decent method which after finite iterations the numerical solution will satisfy the condition (1.10).
The remained part of this paper is organized as follows. In the second part we will propose our algorithm for a single right hand vector non-negative least squares. In addition, the symmetric rank-one quasi-Newton method for NMF will be discussed in the third part. In the fourth part we present the numerical experiments on both synthetic data and real world data. And the last part is to be the conclusion part, in this part we will forecast the future work for the NMF.

A New Method for Nonnegative Least Squares Problems
The non-negative least squares can be regarded as a quadratic programming with box constrains, whose supper constrains can be considered infinite. Recently, C. Lin proposed the projected algorithm for NNLS in [2]. Later in [5], D. Kim, S. Sra and I. S. Ddillon have proposed the fnmae method for non-negative least square programming based on the quasi-Newton and active set method. Few years later, PingHua Gong [14] improved the fnmae algorithm by using the symmetric property of the Hessian matrix of the object function f (W, H). In this paper we will use the symmetric rank-one quasi-Newton line search technology approximate the Hessian matrix. In addition, we will combine the block active method with the quasi Newton method, then use the symmetric rank-one technology to modify the BFGS in Lin [2].

A Symmetric Rank-one Quasi Newton Method
For the NNLS problem Eq.(1.9), our method is iterative and in each iteration we partition the variables into two part, namely the inactive set and the active set. Suppose the current iteration is l, then we are going to compute the l + 1th iteration.
Based on the idea of fixed set in [5] and the theory in [25], we denote an inactive set as follows.
where ǫ l = min(ǫ, ||h l − ∇g(h l )|| 2 F ), ǫ is a small positive scalar. And [∇g(h)] i denotes the ith element of the gradient ∇g(h) of the object function g(h).
For convenience, in some places we will slightly abuse notations, and say that h l i ∈ I l + whenever i ∈ I k + .
Denote the active variables and the inactive variables at the current iteration by ξ l and η l , respectively. Assuming that the h l and ∇g(h l ) are partitioned as follows: where ξ l / ∈ I l + and η l ∈ I l + . That is to say ξ l is the active part of the current iteration. Then we compute the projection ξ by the following equation: where α ≥ 0 is the iteration step-size,D l is a gradient scaling matrix, P [a] is the positive projection, i.e., Then we update the current result h l by the following rules where the right equation uses the fact that η l is fixed to zeros, it can be comprehended as that this part satisfies the KKT optimal condition Eq.(1.10), then in the next iteration we need not to update this part. After updating the h l+1 we can compute the ∇g(h l+1 ) and updating the active set I l+1 + to obtain the next iterate result. In the whole iteration we will adhere to the iteration rule until the result satisfies the stop criterion.

The Approximate toD l
As the size of ξ l and η l changes at each iteration, the computation of the matrixD l is not an easy task. Note that the curvature information of ξ l is received from the curvature information of h l . Due to this fact, the gradient matrixD l can be obtained by taking the proper sub-matrix of the matrixD to avoid this task. WhereD is a matrix that carry the curvature information of the vector h l . In realization of the method we can try to eliminate the curvature information of η l . Then the update rule (2.5) can be regarded as follow: (2.6) In [2], C. Lin used the BFGS update method to approximate the Hessian matrix of the object function g(h). The BFGS method is well-established, and only uses the gradient information of the object function. But BFGS is time consuming. Many researchers have experimentally observed that the symmetric rank-one(SR1) rule performs better than BFGS quasi-Newton update rules [21]. In this paper we will use the SR1 rule to approximate the Hessian matrix of g(h) to improve the the converge speed.
Suppose H l is the current approximate of the Hessian matrix, then using the SR1 to approximate the next Hessian matrix, we have that where ω l = ∇g(h l+1 ) − ∇g(h l ) and v l = h l+1 − h l . Let D l be the inverse of the H l , then it can be achieved by using the Sherman-Morrison-Woodbury formula [27]. We can obtain that

Line Search Strategy for
Step-size From (2.5), α > 0, we are considering to find α with the largest function reduction. For this purpose, we choose the searching rule as follows: Thus, we obtain the following symmetric rank-one NNLS algorithm (see Algorithm I).
compute the active set using the rule (2.1); Step 3. % update the current solution compute the active set using Eq. Step 4.
if the current result reach the stop criterion then stop, the current solution is the final solution; else goto Step 2 and Step 3.

A Symmetric Rank-one Quasi Newton Method for NMF
In the former part, we have discussed the algorithm for NNLS, then in this part we are going to talk about the algorithm for NMF. Note that, in Section 1, we have analysed that the NMF problem can be resolved into several non-negative least squares. Due to this property, we can apply the symmetric rank-one method to our NMF problem.

Applying the SR1 Directly to NNLS Subproblems
For Eq.(1.5), we can do vectorization of the matrix H, and obtain the where tr(B) is the trace of matrix B. Then using the quasi-Newton idea, the iteration of vec(H) is where α l is the step-size of the current iteration,D is the gradient scaling matrix and ∇g(H) is the gradient of (3.1).
As we all known, Eq.(3.1) is actually a non-negative least square problem, then we can apply the method discussed in the former part to this problem directly. First let the inactive set of (3.1) be defined as where ǫ l can be obtained by the same method of the former part. In Eq.(3.2), the gradient scaling matrixD l is approximated by the symmetric rank-one quasi Newton method. The step-size α l is done by the line search. For Eq.(1.4), we can do by the same method. The object function corresponding to W is

The Algorithm for NMF
Next, based on the Algorithm I, we give the symmetric Rank-one quasi-Newton algorithm for NMF as follows.

Numerical Experiments
In this part, we present some experiment results from both rand data and real world data and compare our method to the other six following algorithms.
In our experiments, we initialize all the method randomly, and show plots of the relative error against the number of iteration or the time of iteration, where the relative error of approximation is ||V − W H|| F /||V || F .

Synthetic Data Experiment
We generate the synthetic data randomly, in our method we test the randomly generated matrix V ∈ R n×m (n = 200, m = 40), the approximate rank is k = 10 and another matrix V ∈ R n×m (n = 2000, m = 800), which we set the approximate rank k = 5, 10, 20, respectively. In this experiment we all done ten times of the random matrix, then we take the average result of the ten experiment data.   Figure 1 explains the random matrix V ∈ R 200×40 for the rank k = 10, and from the figure we can learn that, comparing with fnmae [5], the object function decreases more per iteration, and our method costs less time to reach the same relative error. As a whole the relative errors of approximate of the SR1, fnmae and pnm are very similar.     The numerical results show that our symmetric rank-one quasi-Newton method improves the efficiency of the quasi-Newton method every iteration, and cuts down the time of each iteration. Comparing with other methods, our SR1 method decreases    faster in each iteration. And the nnmalq method decreases slow in every iteration, so become less competitive. Since in our experiment the error of nmf method is much lager than the other six methods mentioned above, we did not plot the error curve of this method.

Application to imaging processing
NMF was originally motivated by Lee and Seung [11] using an image processing application. Many others have also considered NMF to image processing, face recognition application, model recognition application and signal processing application. In this part we are going to do some numerical experiments of image processing application. Our experiment is done on four random choosen faces 1 and the size of each face image is 92 × 112 with 256 gray per pixel. We give out the the reconstructed image of running 10, 20, 50 iterations, respectively. In this experiment, we set the approximate rank k = 14, and the images are taken randomly from the 5 random initial matrix in each each iteration. The first image in each row is the original image , followed by reconstructions obtained by SR1, fnmae, pnm, AS, alsq, nnmalq and nmf. From figures 7 and 8 , we can learn that, after 10 and 20 iterations SR1 can reconstruct better than fnmae, AS, pnm procedures, and much better than alsq, nnmalq and nmf procedures. Figure 9 illustrates that, the images obtained via SR1, AS, fnmae, pnm methods have similar quality, and the image obtained by alsq is still vague.

Application to Text Clustering
In order to test the application to text analysis, we are going to apply our method to text clustering, and compare with the other methods. We show the numerical results of the above methods except the nmf method on four text datasets, which are high dimensional and sparse. The text data is collected from newspapers, such us LA times, San Jose Mercury and so on 2 . In the numerical experiments we all set the approximate rank k to the number of classes, and we all initialize the matrix randomly. As the same with the random data, we present the relative error of approximate against the number of iterations. We present the comparison of the six methods mentioned above except the nmf method in the left hand plot. The right hand plot is to outstand the comparison of the SR1, AS and pnm methods. From figures 10-13, we can known that with the same random initial matrix, at the beginning several iterations our method is much better than pnm, and decrease much faster than the other methods. After 5 iterations, the quality of the methods are more and more similar.

Conclusion
In this paper we present an algorithm for NMF, it is different from the fnmae method [5] in the following aspects: (1) The active set in our method is a relax form, while the active set in [5] is an hard form. The fact that an active set of hard for exhibits undesirable discontinuity at the boundary of the constraint set has been showed in Bertsekas [25]. This is harm to the convergence rate, so we use the relax form to avoid this problem.
(2) In addition, the fnmae method [5] using the BFGS method to approximate the Hessian matrix, we use the symmetric rank one method to approximate the hessian matrix. The symmetric rank one method has been experimentally shown better than the BFGS method. What is more, our method is also different from the pnm method in [14] for the aspect that the approach of the approximating the Hessian matrix. In [14], using the symmetric property of the Hessian matrix approximates the inverse of the Hessian matrix by the Choleskey factorization, while in our method we approximate the inverse of the Hessian matrix by the symmetric rank-one method. Numerical experiments show that the object function decreases more per iteration in our method than in the pnm method [14] in some cases.
All in all, we propose the symmetric rank one quasi Newton method for the NMF. This method maintains the decrease speed per iteration and decrease the computational time. From the experimental results both the synthetic data and real world data, our method performs very well. Due to the wide use of the non-negative matrix factorization, to learn more effective methods is necessary. In addition, there are many factors which influence the efficiency of the algorithm for NMF, such as the approximate rank, sparseness and so on. So researching more efficiency algorithm for NMF by approximating rank and the stop criterion is necessary in the future.