Graph Regularized Nonnegative Matrix Factorization with Sparse Coding

In this paper, we propose a sparseness-constrained NMF method, named graph regularized nonnegative matrix factorization with sparse coding (GRNMF_SC). By combining manifold learning and sparse coding techniques, GRNMF_SC can efficiently extract basis vectors from the data space that preserve both the intrinsic manifold structure and the local features of the original data. The objective function of our method is easy to state, but solving it is nontrivial; we give a detailed derivation of the solution and a rigorous proof of its convergence, which is a key contribution of the paper. Compared with sparseness-constrained NMF and GNMF algorithms, GRNMF_SC can learn a much sparser representation of the data while also preserving its geometrical structure, which endows it with strong discriminating ability. Furthermore, GRNMF_SC is generalized into supervised and unsupervised models to meet different demands. Experimental results on image recognition and clustering show that GRNMF_SC compares favorably with other state-of-the-art NMF methods.


Introduction
Previous studies have shown that there is psychological and physiological evidence for parts-based representation in the human brain [1][2][3][4][5]. NMF is one such parts-based matrix factorization method, which can extract local features from the original data under a nonnegativity constraint. Indeed, the nonnegativity constraint leads to a parts-based representation because it allows only additive, not subtractive, combinations of components. In general, NMF seeks two nonnegative matrices whose product provides a good approximation to the original matrix; at the same time it learns the parts of objects, which makes it very important in real applications such as face recognition [6][7][8][9] and document clustering [10,11].
However, the standard NMF algorithm [5,6] has several limitations, which have been discussed extensively. One notable limitation is that it does not always produce a completely parts-based representation. Researchers have tried to solve this problem by incorporating sparseness constraints [12][13][14]. These approaches extend the NMF framework with an adjustable sparseness parameter in order to learn a more localized representation. However, previous sparseness-constrained NMF approaches focus mainly on the sparseness property and ignore the intrinsic geometric structure of the original data, which is vital for classification and clustering.
Recent research shows that when the data are sampled from a probability distribution supported on or near a submanifold of the ambient space, manifold learning [15][16][17][18][19] can be used to preserve the intrinsic (geometrical) structure. To preserve the intrinsic structure of the original data, He and Cai proposed graph regularized NMF (GNMF) [20,21], which incorporates the locality preserving projection (LPP) technique [22] into the NMF framework.
The experimental results showed that GNMF achieved higher recognition rates and better clustering performance than previous sparseness-constrained NMF on some popular face databases (e.g., the ORL and YALE databases) [20,21]. This indicates that GNMF works well for datasets with an apparent geometrical structure. However, GNMF still has a disadvantage: it cannot ensure the sparseness of the factorization results, which limits its discriminative ability and increases the computational expense and memory requirements. Hence, we are motivated to combine the advantages of the manifold and sparseness constraints and propose the GRNMF_SC algorithm, which not only preserves the geometrical structure but also learns much sparser representations of the input data. It should be emphasized that solving an objective function that simultaneously incorporates the Laplacian regularization and the sparseness constraint into the NMF framework is nontrivial, because sophisticated L1-norm solvers cannot be adopted directly. We start from the original design of GNMF and incorporate the sparseness constraint smoothly. Concretely, we first construct an objective function, convex in each factor separately, by imposing the above constraints; we then develop an optimization algorithm with multiplicative update rules to minimize it; finally, we prove that the algorithm converges to a local minimum. Furthermore, we implement GRNMF_SC in both a supervised version (S-GRNMF_SC) for image recognition and an unsupervised version (GRNMF_SC) for clustering, where class labels are not available. Experimental results demonstrate that supervised GRNMF_SC achieves higher recognition rates than typical sparseness-based and manifold-based NMF methods, especially in occluded face recognition, and that unsupervised GRNMF_SC obtains better clustering performance than popular clustering algorithms.
The rest of the paper is organized as follows. Section 2 gives a brief review of standard NMF and its typical sparse variants. Section 3 presents the proposed GRNMF_SC method and a proof of its convergence. Section 4 reports experimental results on image recognition and clustering. Section 5 concludes the paper and outlines future work.

Reviews of Standard NMF and Its Sparse Variants
In this section, we briefly describe the standard NMF algorithm [5] and two typical sparseness-constrained NMF algorithms [8,13]; we introduce these two algorithms because our method is inspired by them. GNMF is introduced together with GRNMF_SC in Section 3.

Nonnegative Matrix Factorization (NMF).
First, standard NMF [5,6] is introduced. Given a data matrix $X = [x_1, \ldots, x_N] \in \mathbb{R}^{M \times N}$, each column of $X$ is an $M$-dimensional sample vector with nonnegative values. NMF aims to find two nonnegative matrices $U \in \mathbb{R}^{M \times K}$ and $V \in \mathbb{R}^{N \times K}$ whose product approximates the original matrix well, $X \approx UV^T$. That is, it minimizes the following cost function:
$$O = \|X - UV^T\|_F^2,$$
where $\|\cdot\|_F$ represents the Frobenius norm. Since the objective function is not convex in $U$ and $V$ together, we cannot expect to find the global minimum of $O$. Lee and Seung [5] presented the following iterative update rules:
$$u_{ik} \leftarrow u_{ik}\frac{(XV)_{ik}}{(UV^TV)_{ik}}, \qquad v_{jk} \leftarrow v_{jk}\frac{(X^TU)_{jk}}{(VU^TU)_{jk}}.$$
It was proven that the objective function $O$ is nonincreasing under these two update rules, which find a local minimum of $O$.

Local Nonnegative Matrix Factorization (LNMF).
LNMF [8] is aimed at learning local features by imposing the following three additional constraints on the NMF basis:
(1) Maximum orthogonality of U. Different bases should be as orthogonal as possible, so as to minimize the redundancy between them. This can be imposed by $\sum_{i \neq j} (U^TU)_{ij} = \min$.
(2) Maximum expressiveness of U. Only the components carrying the most important information should be retained. This is imposed by $\sum_{i} (V^TV)_{ii} = \max$.
(3) Maximum sparseness in the encoding matrix V. The coefficient matrix V should contain as many zero elements as possible; in other words, the number of basis components required to represent the data matrix X is minimized. This can be imposed by $\sum_{i} (U^TU)_{ii} = \min$.
The incorporation of the above constraints leads to the following constrained divergence as the objective function for LNMF:
$$D(X \,\|\, UV^T) = \sum_{i,j}\left(x_{ij}\log\frac{x_{ij}}{(UV^T)_{ij}} - x_{ij} + (UV^T)_{ij}\right) + \alpha\sum_{i,j}(U^TU)_{ij} - \beta\sum_{i}(V^TV)_{ii},$$
where $\alpha, \beta > 0$ are some constants. The solution to the above constrained minimization is given by the following multiplicative updating rules:
$$v_{jk} \leftarrow \sqrt{v_{jk}\sum_{i} x_{ij}\frac{u_{ik}}{(UV^T)_{ij}}}, \qquad u_{ik} \leftarrow \frac{u_{ik}\sum_{j} x_{ij}v_{jk}/(UV^T)_{ij}}{\sum_{j} v_{jk}}, \qquad u_{ik} \leftarrow \frac{u_{ik}}{\sum_{i} u_{ik}}. \quad (4)$$
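As an illustration of the Lee and Seung multiplicative updates for standard NMF reviewed in this section, the following NumPy sketch factorizes a nonnegative matrix as $X \approx UV^T$, with $V$ stored as an $N \times K$ matrix. The function names `nmf_objective` and `nmf_multiplicative`, the random initialization, the iteration count, and the small constant `eps` that guards the divisions are assumptions made for this example and are not taken from [5].

```python
import numpy as np


def nmf_objective(X, U, V):
    """Squared Frobenius reconstruction error ||X - U V^T||_F^2."""
    return float(np.linalg.norm(X - U @ V.T, "fro") ** 2)


def nmf_multiplicative(X, K, n_iter=200, eps=1e-10, seed=0):
    """Factorize a nonnegative M x N matrix X as U (M x K) times V^T (K x N).

    Hypothetical helper: alternates the two multiplicative updates and
    records the objective after every sweep.
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    U = rng.random((M, K)) + eps  # strictly positive start
    V = rng.random((N, K)) + eps
    history = [nmf_objective(X, U, V)]
    for _ in range(n_iter):
        # u_ik <- u_ik * (X V)_ik / (U V^T V)_ik
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        # v_jk <- v_jk * (X^T U)_jk / (V U^T U)_jk
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
        history.append(nmf_objective(X, U, V))
    return U, V, history
```

Because every factor in the updates is nonnegative, $U$ and $V$ stay nonnegative without any explicit projection, and the recorded objective values can be inspected to check that the error does not increase from one sweep to the next.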

Sparse Nonnegative Matrix Factorization (SNMF).
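Similar to the LNMF algorithm, Liu et al. [13] incorporated linear sparse coding into NMF. The core idea of SNMF is to add a sparseness constraint to the encoding matrix during matrix factorization. SNMF can learn a parts-based representation via fully multiplicative updates because it adopts a generalized Kullback-Leibler divergence, instead of the mean square error, as the approximation error. Thus, the sparse NMF functional is
$$D(X \,\|\, UV^T) = \sum_{i,j}\left(x_{ij}\log\frac{x_{ij}}{(UV^T)_{ij}} - x_{ij} + (UV^T)_{ij}\right) + \beta\sum_{j,k} v_{jk},$$
where $\beta \geq 0$. SNMF ensures sparseness by minimizing the sum of all $v_{jk}$. The multiplicative update rules for the matrices U and V are
$$v_{jk} \leftarrow \frac{v_{jk}\sum_{i} x_{ij}u_{ik}/(UV^T)_{ij}}{1 + \beta}, \qquad u_{ik} \leftarrow \frac{u_{ik}\sum_{j} x_{ij}v_{jk}/(UV^T)_{ij}}{\sum_{j} v_{jk}}, \qquad u_{ik} \leftarrow \frac{u_{ik}}{\sum_{i} u_{ik}}.$$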

Graph Regularized Nonnegative Matrix Factorization with Sparse Coding (GRNMF_SC)
As mentioned earlier, He and Cai [20,21] proposed the GNMF algorithm. By adding Laplacian regularization to NMF, GNMF can efficiently preserve the intrinsic structure of the data. GNMF has stronger discriminative power than classic NMF when the dataset has an apparent geometrical structure. However, since GNMF does not impose any sparseness constraint on the basis matrix U or the encoding matrix V, it cannot learn a sufficiently sparse representation.
In this section, we incorporate sparse coding into GNMF and propose the GRNMF_SC algorithm.

Graph Regularized Nonnegative Matrix Factorization (GNMF).
GNMF first constructs an affinity graph to encode the geometrical information and then seeks a nonnegative matrix factorization that respects the graph structure. The procedure can be stated as follows.
Step 1. Consider a graph with $N$ vertices, where each vertex corresponds to a data point. The edge weight matrix W is defined as follows:
$$W_{ij} = \begin{cases} 1, & \text{if } x_i \in N_p(x_j) \text{ or } x_j \in N_p(x_i), \\ 0, & \text{otherwise,} \end{cases}$$
where $N_p(x_i)$ denotes the set of $p$ nearest neighbors of $x_i$. Define $L = D - W$, where D is a diagonal matrix whose entries are the column sums of W, $D_{ii} = \sum_{j} W_{ij}$.
Step 2. Let $f_k(x_i) = v_{ik}$ be the function that maps the original data point $x_i$ onto the axis $u_k$. GNMF then uses $\|f_k\|_{\mathcal{M}}^2$ to measure the smoothness of the function $f_k$ along the geodesics in the intrinsic geometry of the data. When the data lie on a compact submanifold $\mathcal{M} \subset \mathbb{R}^M$, the discrete approximation of $\|f_k\|_{\mathcal{M}}^2$ is computed as follows:
$$R_k = \frac{1}{2}\sum_{i,j=1}^{N}\left(f_k(x_i) - f_k(x_j)\right)^2 W_{ij} = \sum_{i=1}^{N} f_k(x_i)^2 D_{ii} - \sum_{i,j=1}^{N} f_k(x_i) f_k(x_j) W_{ij} = f_k^T L f_k,$$
where $f_k = [v_{1k}, \ldots, v_{Nk}]^T$. By minimizing $R_k$, we get a mapping function $f_k$ that is sufficiently smooth on the data manifold. Intuitively, if two data points $x_i$ and $x_j$ are close, then $f_k(x_i)$ and $f_k(x_j)$ are similar to each other.
Step 3. Finally, GNMF incorporates this smoothness term and minimizes the new objective function
$$O = \|X - UV^T\|_F^2 + \lambda\,\mathrm{Tr}(V^T L V),$$
with the constraint that $u_{ik}$ and $v_{jk}$ are nonnegative. Here $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix, and $\lambda \geq 0$ is a regularization parameter.
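The graph construction in Steps 1 and 2 can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: 0-1 edge weights, Euclidean distances, and a symmetric rule that connects two points when either is among the $p$ nearest neighbors of the other. The function names `knn_affinity`, `graph_laplacian`, and `smoothness_pairwise` are hypothetical and introduced only for illustration.

```python
import numpy as np


def knn_affinity(X, p):
    """0-1 affinity matrix for the columns of X (one data point per column).

    W[i, j] = 1 if x_i is among the p nearest neighbors of x_j or vice versa.
    """
    N = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # squared distances
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :p]
    W = np.zeros((N, N))
    W[np.repeat(np.arange(N), p), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)  # symmetrize with the "or" rule


def graph_laplacian(W):
    """Return L = D - W together with the diagonal degree matrix D."""
    D = np.diag(W.sum(axis=1))
    return D - W, D


def smoothness_pairwise(V, W):
    """0.5 * sum_ij W_ij * ||v_i - v_j||^2, where v_i is the i-th row of V."""
    diff = V[:, None, :] - V[None, :, :]
    return 0.5 * float(np.sum(W * np.sum(diff ** 2, axis=2)))
```

Summing the per-axis smoothness terms over all $K$ axes gives the pairwise form computed by `smoothness_pairwise`, which for a symmetric $W$ agrees numerically with `np.trace(V.T @ L @ V)`, the regularizer used in Step 3.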

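GRNMF_SC.
In order to improve the sparseness of the coefficient matrix ($V^T$) while preserving the intrinsic structure of the high-dimensional data, we add an L1-norm regularization on the coefficient matrix. In this way, we expect each sample in X to be represented by a linear combination of only a few basis vectors in U, so that sparseness is promoted. The new objective function is as follows:
$$O = \|X - UV^T\|_F^2 + \lambda\,\mathrm{Tr}(V^T L V) + \beta\sum_{j,k} v_{jk}, \qquad u_{ik} \geq 0,\; v_{jk} \geq 0,$$
where $L = D - W$ and $D_{ii} = \sum_{j} W_{ij}$. For S-GRNMF_SC, we set $W_{ij} = 1$ if $x_i \in N_p(x_j)$ or $x_j \in N_p(x_i)$ and $c_i = c_j$, and $W_{ij} = 0$ otherwise; $c_i$ and $c_j$ denote the class labels of $x_i$ and $x_j$. For unsupervised GRNMF_SC, the definition of $W_{ij}$ is the same as GNMF's in Section 3.1. Finally, the multiplicative updating rules for the above objective function can be written as
$$u_{ik} \leftarrow u_{ik}\frac{(XV)_{ik}}{(UV^TV)_{ik}}, \qquad v_{jk} \leftarrow v_{jk}\frac{(X^TU + \lambda WV)_{jk}}{(VU^TU + \lambda DV)_{jk} + \beta/2}.$$
GRNMF_SC has fully multiplicative update rules with two parameters. When $\lambda = 0$, GRNMF_SC reduces to NNSC [12]. When $\beta = 0$, GRNMF_SC reduces to GNMF. We also find that the updating rule of the encoding matrix V can be rewritten in the following gradient descent form:
$$v_{jk} \leftarrow v_{jk} - \eta_{jk}\frac{\partial O}{\partial v_{jk}}, \qquad \eta_{jk} = \frac{v_{jk}}{2(VU^TU + \lambda DV)_{jk} + \beta}.$$
In order to preserve the nonnegativity of the coefficient matrix V, we should choose the parameters $\lambda$ and $\beta$ so that $\eta_{jk}$ is positive and small.
The proof of convergence of our optimization scheme is given next. The core idea of the proof is to use the auxiliary function technique, as in the EM algorithm, and to update the basis matrix U and the coefficient matrix V alternately. We begin with the definition of the auxiliary function $G(v, v')$.
Definition 1. $G(v, v')$ is an auxiliary function for $F(v)$ if the following conditions are satisfied:
$$G(v, v') \geq F(v), \qquad G(v, v) = F(v).$$
The auxiliary function is vital for the proof owing to the following theorem.
Theorem 2. If $G$ is an auxiliary function of $F$, then $F$ is nonincreasing under the update
$$v^{(t+1)} = \arg\min_{v} G(v, v^{(t)}).$$
We first derive the multiplicative update steps of the encoding matrix V. The objective function of GRNMF_SC can be rewritten as
$$O = \mathrm{Tr}(XX^T) - 2\,\mathrm{Tr}(XVU^T) + \mathrm{Tr}(UV^TVU^T) + \lambda\,\mathrm{Tr}(V^TLV) + \beta\sum_{j,k} v_{jk}.$$
Considering any element $v_{ab}$ in V, we use $F_{ab}$ to denote the part of the objective function that depends on $v_{ab}$, and $F'_{ab}$ and $F''_{ab}$ to denote its first-order and second-order derivatives:
$$F'_{ab} = \left(-2X^TU + 2VU^TU + 2\lambda LV\right)_{ab} + \beta, \qquad F''_{ab} = 2(U^TU)_{bb} + 2\lambda L_{aa}.$$
We then define the auxiliary function as
$$G(v, v_{ab}^{(t)}) = F_{ab}(v_{ab}^{(t)}) + F'_{ab}(v_{ab}^{(t)})(v - v_{ab}^{(t)}) + \frac{(VU^TU)_{ab} + \lambda(DV)_{ab} + \beta/2}{v_{ab}^{(t)}}(v - v_{ab}^{(t)})^2. \quad (17)$$
We need to prove that $G(v, v) = F_{ab}(v)$ and $G(v, v_{ab}^{(t)}) \geq F_{ab}(v)$. The first condition is immediate. For the second, we derive the Taylor series expansion of $F_{ab}(v)$:
$$F_{ab}(v) = F_{ab}(v_{ab}^{(t)}) + F'_{ab}(v_{ab}^{(t)})(v - v_{ab}^{(t)}) + \left[(U^TU)_{bb} + \lambda L_{aa}\right](v - v_{ab}^{(t)})^2.$$
Comparing this expansion with (17), the second condition is equivalent to
$$\frac{(VU^TU)_{ab} + \lambda(DV)_{ab} + \beta/2}{v_{ab}^{(t)}} \geq (U^TU)_{bb} + \lambda L_{aa}.$$
The above holds because
$$(VU^TU)_{ab} = \sum_{l} v_{al}^{(t)}(U^TU)_{lb} \geq v_{ab}^{(t)}(U^TU)_{bb}, \qquad \lambda(DV)_{ab} = \lambda\sum_{j} D_{aj}v_{jb}^{(t)} \geq \lambda D_{aa}v_{ab}^{(t)} \geq \lambda(D - W)_{aa}v_{ab}^{(t)} = \lambda L_{aa}v_{ab}^{(t)},$$
and $\beta/2 \geq 0$. Finally we obtain $G(v, v_{ab}^{(t)}) \geq F_{ab}(v)$. According to Theorem 2, by taking the derivative of (17) with respect to $v$ and setting the result to zero, the updating rule of V can be expressed as
$$v_{ab}^{(t+1)} = v_{ab}^{(t)} - v_{ab}^{(t)}\frac{F'_{ab}(v_{ab}^{(t)})}{2(VU^TU)_{ab} + 2\lambda(DV)_{ab} + \beta} = v_{ab}^{(t)}\frac{(X^TU + \lambda WV)_{ab}}{(VU^TU + \lambda DV)_{ab} + \beta/2}.$$
Similarly, we can also obtain the updating rule of U for the regularized objective; since neither regularization term depends on U, it coincides with the NMF rule:
$$u_{ik}^{(t+1)} = u_{ik}^{(t)}\frac{(XV)_{ik}}{(UV^TV)_{ik}}.$$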
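To make the alternating scheme concrete, the following NumPy sketch runs multiplicative updates for a graph regularized, L1-penalized factorization. It is a sketch under explicit assumptions rather than the reference implementation: the objective is taken to be $\|X - UV^T\|_F^2 + \lambda\,\mathrm{Tr}(V^TLV) + \beta\sum_{j,k} v_{jk}$ with $V$ of size $N \times K$, the symbol names `lam` and `beta`, the function names, the random initialization, and the guard constant `eps` are all hypothetical, and the constant $\beta/2$ in the denominator follows from this particular scaling of the objective.

```python
import numpy as np


def grnmf_sc_objective(X, U, V, L, lam, beta):
    """Assumed objective: fit term + graph smoothness + L1 penalty on V."""
    fit = np.linalg.norm(X - U @ V.T, "fro") ** 2
    graph = lam * np.trace(V.T @ L @ V)
    sparse = beta * np.sum(V)  # L1 norm of a nonnegative matrix
    return float(fit + graph + sparse)


def grnmf_sc(X, W, K, lam=0.1, beta=0.1, n_iter=200, eps=1e-10, seed=0):
    """Alternating multiplicative updates for the assumed objective.

    X: M x N nonnegative data, W: N x N symmetric nonnegative affinity.
    Returns U (M x K), V (N x K) and the objective value after each sweep.
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    D = np.diag(W.sum(axis=1))
    L = D - W
    U = rng.random((M, K)) + eps
    V = rng.random((N, K)) + eps
    history = [grnmf_sc_objective(X, U, V, L, lam, beta)]
    for _ in range(n_iter):
        # Basis update: the penalties do not involve U, so this is the NMF rule.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        # Coefficient update: graph term splits into W (numerator) and
        # D (denominator); the L1 penalty adds beta / 2 to the denominator.
        numer = X.T @ U + lam * (W @ V)
        denom = V @ (U.T @ U) + lam * (D @ V) + 0.5 * beta + eps
        V *= numer / denom
        history.append(grnmf_sc_objective(X, U, V, L, lam, beta))
    return U, V, history
```

Calling `grnmf_sc` with `lam=0` drops the graph term and leaves a sparse nonnegative factorization, while `beta=0` leaves a GNMF-style update; the returned `history` can be used to check empirically that the assumed objective does not increase.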

Experimental Results
The face recognition experiments were performed on two benchmarks, the ORL database (48 × 48 images) and the YALE database (32 × 32 images), to test the recognition rates of the proposed S-GRNMF_SC algorithm. The ORL database contains 400 images: 10 different images for each of 40 individuals. For some individuals, the images were taken at different times. There are variations in facial expression (open or closed eyes, smiling or not smiling) and facial details (glasses or no glasses). The YALE database, which is more challenging than ORL, contains 165 gray-scale images of 15 individuals. The images show variations in lighting condition (left-light, center-light, and right-light), facial expression (normal, happy, sad, sleepy, and surprised), and facial details (glasses or no glasses). NMF [5], LNMF [8], SNMF [13], GNMF [21], and FMD-NMF [23] are used for comparison. LNMF and SNMF are two classical sparseness-based NMF methods, and GNMF and FMD-NMF are two manifold-based NMF methods; they are therefore natural baselines for S-GRNMF_SC, which merges the merits of both kinds. For all face recognition experiments, the nearest neighbor (NN) classifier with the Euclidean distance was used.
The clustering experiment was performed on the COIL20 image library. The COIL20 database contains gray-scale images of 20 objects viewed from varying angles, with 72 images per object. Four popular clustering algorithms, K-means, NMF [5], SNMF [13], and GNMF [21], were used for comparison. After the matrix factorization, we have a lower-dimensional representation of each image, and the clustering is then performed in this lower-dimensional space. K-means is considered a baseline method, which simply performs clustering in the original feature space.

Recognition Results on ORL and YALE Database.
In this part, the face recognition experiments are carried out on the ORL and YALE databases. First, to evaluate how all methods deal with different Gm/Pn splits (m training images and n testing images per person), for the ORL database a random subset of m = 3, 4, 5, 6, 7 labeled images per person is taken to form the training set, and the corresponding remaining n = 7, 6, 5, 4, 3 labeled images form the testing set. For the YALE database, m = 4, 5, 6, 7, 8 images per person are taken to form the training set, and the corresponding remaining n = 7, 6, 5, 4, 3 images form the testing set. For each given Gm/Pn, we averaged the results over 10 random splits. Tables 1 and 2 show the optimal average recognition rates obtained by NMF, LNMF, SNMF, GNMF, FMD-NMF, and S-GRNMF_SC with the same feature dimension for the different Gm/Pn splits.
Figure 1 shows the average recognition rates versus feature dimension for all the competing algorithms. Note that the optimal average recognition rates are obtained over the whole range of feature dimensions. The feature dimension is varied from 0 to 100 in steps of 10, with the G5/P5 split for the ORL database and the G7/P4 split for the YALE database.
From Tables 1 and 2 and Figure 1, S-GRNMF_SC improves the average recognition rate by about 2% over FMD-NMF and by nearly 4% over GNMF and SNMF. FMD-NMF obtains the next best results, because it also utilizes the class label information and preserves the manifold structure, as S-GRNMF_SC does. GNMF and SNMF perform comparatively close to FMD-NMF. NMF is slightly worse than SNMF, while LNMF performs the worst: LNMF focuses on keeping the bases orthogonal and ignores the quality of the coefficients, which leads to poor classification performance.

Learning Basis Images from the ORL and YALE Database.
In this subsection, we use NMF, LNMF, GNMF, FMD-NMF, and the proposed S-GRNMF_SC algorithm to learn 25 and 49 basis images from the ORL and YALE databases, respectively. We then use the sparseness metric (SP) [24] to measure the sparseness of the basis matrix as well as the coefficient matrix. The sparseness measure, which is based on the relationship between the L1 norm and the L2 norm, is formulated as
$$\mathrm{SP}(h_k) = \frac{\sqrt{n} - \|h_k\|_1 / \|h_k\|_2}{\sqrt{n} - 1},$$
where $h_k$ is a column vector of the matrix H and $n$ is its dimension. If all elements of $h_k$ are equal, the sparseness is equal to zero; if $h_k$ contains only a single nonzero element, the sparseness is equal to unity. The basis images are shown in Figure 2. The sparseness of the basis matrices and coefficient matrices is shown in Tables 3 and 4.
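A sparseness measure of this kind, based on the ratio of the L1 norm to the L2 norm, can be computed as in the sketch below. The per-vector formula follows the common definition of this measure; the function names `vector_sparseness` and `mean_column_sparseness`, and in particular the choice to summarize a matrix by averaging over its columns, are assumptions made for illustration.

```python
import numpy as np


def vector_sparseness(h):
    """Sparseness of a nonzero vector with more than one element.

    Returns 0 when all entries are equal and 1 when only one entry is nonzero.
    """
    h = np.abs(np.asarray(h, dtype=float)).ravel()
    n = h.size
    l1 = h.sum()
    l2 = np.sqrt(np.sum(h ** 2))
    return float((np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0))


def mean_column_sparseness(H):
    """Assumed matrix-level summary: average sparseness of the columns of H."""
    H = np.asarray(H, dtype=float)
    return float(np.mean([vector_sparseness(H[:, k]) for k in range(H.shape[1])]))
```

For example, `vector_sparseness(np.ones(16))` evaluates to zero up to rounding, and a vector with a single nonzero entry gives one, which matches the two limiting cases described in the text.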
From the results, it is clear that the bases obtained by NMF and GNMF are additive but not spatially localized for facial representations. In contrast, the bases learned by S-GRNMF_SC and FMD-NMF are better than those of NMF and GNMF, since they not only exhibit the additive property but also capture the discriminant information and preserve the manifold structure. In addition, although LNMF can obtain spatially localized bases, it does not take the intrinsic structure of the data into account, which is also important in the classification task.

Face Reconstruction on the ORL Database.
In this subsection, the reconstruction experiments of the NMF, LNMF, GNMF, FMD-NMF, and S-GRNMF_SC algorithms are performed on the ORL database. We selected four different types of images from the ORL database: men's frontal view with no facial expression (MFV NFE), men's frontal view wearing glasses (MFV WG), men's lateral view with smiling facial expression (MLV SFE), and women's frontal view with no facial expression (WFV NFE). The reconstructed images are compared with the original ones in Figure 3, where the leftmost images are the originals and the images to the right are the reconstructions obtained by NMF, LNMF, GNMF, GRNMF_SC, and FMD-NMF, respectively. The reconstruction quality of GRNMF_SC is visibly better than that of the others. This observation is validated by computing each method's reconstruction residual error, that is, the difference in pixel gray values between the original and reconstructed images. The residual error is measured by the norm $\|\cdot\|_2$ of the residual matrix. The results are shown in Table 5. The reconstruction residual error of GRNMF_SC is the smallest in all cases except the first type. These results indicate that the reconstruction quality of GRNMF_SC is robust to gender and to different facial expressions. GNMF and FMD-NMF are next.
Figure 4 shows the occluded face images. Figure 5 shows the recognition accuracies versus feature dimension with occluding patch sizes of 10 × 10, 15 × 15, 20 × 20, and 25 × 25 pixels on the ORL database, respectively. Table 6 presents the optimal average recognition rates on the occluded ORL database for the different sizes of the sheltering patch. From Figure 5 and Table 6, we can see that S-GRNMF_SC performs much better than the other algorithms, with an improvement of about 5% to 20%. The significant improvements are owing to the sparseness and manifold constraints in S-GRNMF_SC. Specifically, S-GRNMF_SC obtains a sparser representation than FMD-NMF; furthermore, it maintains more local information than GNMF and more discriminant and geometrical information than standard NMF, LNMF, and SNMF. The occlusion experiment strongly supports the necessity of imposing the sparseness and manifold constraints simultaneously.
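One way to generate occluded test images of the kind used in this experiment is sketched below. The paper specifies only the patch sizes and that the patch position is random; the fill value of zero (a black patch), the uniform sampling of the position, and the function name `occlude` are assumptions made for this example.

```python
import numpy as np


def occlude(image, patch_size, rng, fill_value=0.0):
    """Return a copy of a 2-D image with one square patch overwritten.

    The top-left corner is drawn uniformly so that the patch fits entirely
    inside the image; the input array is left unchanged.
    """
    h, w = image.shape
    top = rng.integers(0, h - patch_size + 1)
    left = rng.integers(0, w - patch_size + 1)
    out = image.astype(float)  # astype returns a new array
    out[top:top + patch_size, left:left + patch_size] = fill_value
    return out
```

A call such as `occlude(face, 15, np.random.default_rng(0))` on a 48 × 48 array would cover 225 of its 2304 pixels, which gives a sense of how severe the larger patches are.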

Clustering Experiment on COIL20 Database.
In this subsection, we conduct a clustering experiment on the COIL20 image library. To randomize the experiments, we evaluate the clustering performance with different numbers of clusters (K = 6, 8, ..., 18, 20). For each given cluster number K (except 20), 10 tests were conducted on different randomly chosen classes, and the average performance as well as the standard deviation was computed over these 10 tests. Regarding the parameter configuration, the four matrix factorization based methods (NMF, SNMF, GNMF, and GRNMF_SC) are run with the number of basis vectors equal to the number of clusters. There are three parameters in the GRNMF_SC algorithm: the number of nearest neighbors $p$ in the graph and the regularization parameters $\lambda$ and $\beta$. We empirically set $p = 4$, $\lambda = 0.1$, and $\beta = 10$.
The clustering result is evaluated by comparing the obtained label of each sample with the label provided by the dataset. Two metrics, the accuracy (AC) and the normalized mutual information (NMI), are used to measure the clustering performance. Please see [10,21] for detailed definitions of these two metrics. Table 7 shows the clustering results on COIL20; the mean and standard error of the performance are reported in the table.
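For readers who want a concrete reference point for the second metric, the sketch below computes a normalized mutual information between two label vectors. It assumes the mutual information is normalized by the larger of the two label entropies; the authoritative definitions are those in [10,21], and the function names and the value returned when both entropies are zero are choices made only for this example.

```python
import numpy as np


def label_entropy(labels):
    """Shannon entropy (base 2) of the empirical label distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def normalized_mutual_information(true_labels, pred_labels):
    """Mutual information divided by max(H(true), H(pred)) (assumed form)."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    n = true_labels.size
    mi = 0.0
    for c in np.unique(true_labels):
        in_c = true_labels == c
        for k in np.unique(pred_labels):
            in_k = pred_labels == k
            joint = np.sum(in_c & in_k) / n
            if joint > 0:
                mi += joint * np.log2(joint / (in_c.mean() * in_k.mean()))
    denom = max(label_entropy(true_labels), label_entropy(pred_labels))
    # Both partitions trivial: treated as a perfect match in this sketch.
    return float(mi / denom) if denom > 0 else 1.0
```

The measure is invariant to a relabeling of the clusters, so a prediction that swaps the names of two otherwise correct clusters still scores one, while a prediction that is statistically independent of the true labels scores zero.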
As shown in Table 7, GRNMF_SC achieves the best performance in all cases. GRNMF_SC aims to make the learned basis preserve the intrinsic manifold structure of the original data while ensuring the sparseness of the new representations under that basis. These properties allow each image to be represented by a linear combination of only a few key basis vectors, which makes GRNMF_SC particularly suitable for image clustering. Consequently, by simply applying K-means to the low-dimensional sparse representation, GRNMF_SC achieves very good clustering performance. In addition, GNMF obtains better clustering results than SNMF and NMF, while K-means performs the worst among the compared methods.

Conclusions and Future Work
In this paper, GRNMF_SC is proposed, which combines the advantages of manifold learning and sparse coding. GRNMF_SC explicitly adds a sparseness constraint on the coefficient matrix $V^T$, which naturally leads to a sparse representation. The objective function of our method is easy to state, but solving it is nontrivial; we give a detailed derivation of the solution and a rigorous proof of its convergence, which is a key contribution of the paper. We implement GRNMF_SC in both supervised (S-GRNMF_SC) and unsupervised versions. S-GRNMF_SC increases the recognition rates compared with standard NMF, SNMF, LNMF, GNMF, and FMD-NMF, especially in occluded face recognition. Unsupervised GRNMF_SC also improves the clustering performance compared with recent popular clustering algorithms.
It should be noted that, because the proposed GRNMF_SC method has fully multiplicative update rules and two parameters, the two parameters should be selected carefully to balance sparseness against discrimination according to the demands of the application.
Future work may further investigate different NMF variants and seek general rules among them. Advanced NMF methods [25] should also be applied to problems in other real applications. For example, in [26], researchers successfully used the NMF algorithm to solve a problem in neurorehabilitation engineering. Although NMF emerged years ago, it remains a very important research direction for the next few years.

Figure 2: Basis images learned on the ORL and YALE databases.
(a) 5 × 5 and 7 × 7 basis images of NMF on the ORL and YALE databases.
(b) 5 × 5 and 7 × 7 basis images of LNMF on the ORL and YALE databases.
(c) 5 × 5 and 7 × 7 basis images of GNMF on the ORL and YALE databases.
(d) 5 × 5 and 7 × 7 basis images of FMD-NMF on the ORL and YALE databases.
(e) 5 × 5 and 7 × 7 basis images of S-GRNMF_SC on the ORL and YALE databases.

Figure 3: Reconstruction effects by NMF, SNMF, LNMF, GNMF, S-GRNMF_SC, and FMD-NMF. From top to bottom are the MFV NFE, MFV WG, MLV SFE, and WFV NFE images. The leftmost column shows the original face images, and the second to last columns show the reconstructions.

Figure 5: Recognition accuracies versus feature dimension on the ORL database with different occluding patch sizes.

Table 1: Optimal average recognition rates ± $\sigma^2$ on the ORL database with different numbers of training samples per person (dimensions = 50).

Table 2: Optimal average recognition rates ± $\sigma^2$ on the YALE database with different numbers of training samples per person (dimensions = 50).

Table 5: Reconstruction residual error of NMF, LNMF, GNMF, FMD-NMF, and GRNMF_SC with 4 kinds of selected images on the ORL database.
In order to highlight the advantage of S-GRNMF_SC over other NMF algorithms on the occluded face database, we give the experimental results in what follows. The size of the sheltering patch is 10 × 10, 15 × 15, 20 × 20, and 25 × 25 pixels, respectively, and its position is randomly selected.

Table 6: Optimal average recognition rates ± $\sigma^2$ on the occluded ORL database with different sizes of the sheltering patch (dimensions = 50).