Convex nonnegative matrix factorization (CNMF) is a variation of nonnegative matrix factorization (NMF) in which each cluster is expressed as a linear combination of the data points and each data point is represented as a linear combination of the cluster centers. When the manifold structure is nonlinear, neither NMF nor CNMF can characterize the geometric structure of the data. This paper introduces neighborhood preserving convex nonnegative matrix factorization (NPCNMF), which imposes an additional constraint on CNMF: each data point can be represented as a linear combination of its neighbors. Our method thus combines the benefits of nonnegative data factorization with the preservation of manifold structure. An efficient multiplicative updating procedure is derived, and its convergence is guaranteed theoretically. The feasibility and effectiveness of NPCNMF are verified on several standard data sets with promising results.
This nonnegative matrix factorization (NMF) [
One of the most important drawbacks of NMF and its variants is that these methods have to be performed in the original feature space of the data points, so they cannot be kernelized and the powerful idea of the kernel method cannot be applied to them. Ding et al. [
Recently, there has been a lot of interest in geometrically motivated approaches to data analysis in high dimensional spaces. When the data lives on or close to a nonlinear low dimensional manifold which is embedded in the high dimensional ambient space [
In this paper, we introduce a novel matrix factorization algorithm, called neighborhood preserving convex nonnegative matrix factorization (NPCNMF), which is based on the assumption that if a data point can be reconstructed from its neighbors in the input space, then it can be reconstructed from its neighbors with the same reconstruction coefficients in the low dimensional subspace, that is, the local linear embedding assumption [
The rest of this paper is organized as follows. In Section
Nonnegative matrix factorization (NMF) factorizes the data matrix into one nonnegative basis matrix and one nonnegative coefficient matrix. Given a nonnegative data
The objective function is a joint optimization problem of the basis matrix
It has been proved that the above update steps will find a local minimum of the objective function in (
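To make the multiplicative update scheme concrete, the following is a minimal sketch of the standard Lee and Seung style updates for the Frobenius-norm NMF objective. The notation is an assumption made for illustration: the data matrix `X` is factorized as `U @ V.T`, and the function name `nmf_multiplicative`, the small constant `eps`, and the random initialization are hypothetical choices rather than details taken from this paper.

```python
import numpy as np


def nmf_multiplicative(X, k, n_iter=200, eps=1e-10, seed=0):
    """Approximate a nonnegative matrix X (m x n) by U @ V.T.

    U is (m x k) and V is (n x k); both stay nonnegative because every
    update multiplies the current entry by a nonnegative ratio.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Assumed initialization: strictly positive random entries.
    U = rng.random((m, k)) + eps
    V = rng.random((n, k)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative update of the basis matrix.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        # Multiplicative update of the coefficient matrix.
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
        # Track the squared Frobenius reconstruction error.
        errors.append(np.linalg.norm(X - U @ V.T, "fro") ** 2)
    return U, V, errors
```

Plotting `errors` against the iteration index is a simple way to observe the nonincreasing behavior of the objective; the result is a local minimum and therefore depends on the initialization.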
In reality, we have
Equation (
In this section, we introduce our neighborhood preserving convex nonnegative matrix factorization method, which takes the local linear embedding constraint as an additional requirement. The method presented in this paper is motivated by neighborhood preserving embedding (NPE).
Many real world data are actually sampled from a nonlinear low dimensional manifold which is embedded in the high dimensional ambient space. Both NMF and CF perform the factorization in the Euclidean space. They fail to discover the local geometrical structure of the data space, which is essential to the clustering problem. NPE aims at preserving the local manifold structure. Specifically, for each data point, it is represented as a linear combination of the neighboring data points and the combination coefficients are specified in the weight matrix. We can find an optimal embedding such that the combination coefficients can be preserved in the low dimensional subspace.
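The local reconstruction step described above can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure of this paper: samples are stored as rows of `X`, neighbors are found by Euclidean distance, the weights of each point are constrained to sum to one as in locally linear embedding, and the names `reconstruction_weights`, `k`, and `reg` (a small regularizer for the local Gram matrix) are hypothetical.

```python
import numpy as np


def reconstruction_weights(X, k=5, reg=1e-3):
    """Return an (n x n) weight matrix W for X of shape (n_samples, n_features).

    Row i of W holds the coefficients that best reconstruct X[i] from its
    k nearest neighbors; each row sums to one and is zero elsewhere.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbor
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D[i])[:k]
        Z = X[nbrs] - X[i]           # neighbors centered on X[i]
        C = Z @ Z.T                  # local Gram matrix (k x k)
        trace = np.trace(C)
        # Regularize so that C is invertible even when k > n_features.
        C = C + (reg * trace if trace > 0 else reg) * np.eye(k)
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs] = w / w.sum()     # enforce the sum-to-one constraint
    return W
```

Given such a `W`, the quantity `np.linalg.norm(X - W @ X)` measures how well each point is reconstructed from its neighbors; the neighborhood preserving idea is to keep the same weights valid for the low dimensional representation.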
For each data point, we find its
Then
With the neighborhood preserving constraint, CNMF incorporates (
We introduce an iterative algorithm to find a local minimum for the optimization problem. By defining
This is a typical constrained optimization problem and can be solved using the Lagrange multiplier method. Let
The partial derivatives of
Using the Karush-Kuhn-Tucker conditions
The corresponding equivalent formulas are as follows:
Introduce
The equations lead to the following updating formulas:
Note that the solution to minimizing the criterion function
In this section, we will investigate the convergence of the updating formula in (
If
Consider
For any nonnegative matrices
The correctness and convergence of the algorithm are addressed in the following.
For given
One rewrites
Then the following function
From its minima and setting
The function
We find upper bounds for each of the two positive terms and lower bounds for each of the two negative terms. For the third term in
The second term of
To obtain lower bounds for the two remaining terms, we use the inequality
The last term in
Collecting all bounds, we obtain
To find the minimum of
To find the minimum of
Thus,
Updating
By Lemma
For given
One rewrites
Then the following function
From its minima and setting
The function
We find upper bounds for each of the three positive terms and lower bounds for each of the three negative terms. For the third term in
The second term of
For the fifth term in
To obtain lower bounds for the three remaining terms, we use the inequality
The fourth term in
The last term in
Collecting all bounds, we obtain
To find the minimum of
We have
Therefore
The Hessian matrix containing the second derivatives
Thus,
Updating
By Lemma
In this section, we show the performance of the proposed method on face recognition and compare our proposed method with the popular subspace learning algorithms: four unsupervised ones which are principal component analysis [
The experiments are conducted on three data sets: the Cambridge ORL face database, the Yale database, and the CMU PIE face database. The important statistics of these data sets are described below.
The Yale database contains 165 gray scale images of 15 individuals. All images demonstrate variations in lighting condition (left-light, center-light, right-light), facial expression (normal, happy, sad, sleepy, surprised, and wink), and with/without glasses.
The ORL database contains ten different images of each of 40 distinct subjects, thus 400 images in total. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).
The CMU PIE face database contains more than 40 000 facial images of 68 people. The images were acquired over different poses, under variable illumination conditions, and with different facial expressions. In our experiment, we choose the images from the frontal pose (C27) and each subject has around 49 images from varying illuminations and facial expressions.
In all the experiments, images are preprocessed so that faces are located. Original images are first normalized in scale and orientation such that the two eyes are aligned at the same position. Then the facial areas are cropped into the final images used for recognition. Each image is of
For each data set, we randomly divide it into training and testing sets, and evaluate the recognition accuracy on the testing set. In detail, for each individual in the ORL and Yale data sets, we randomly select 2, 3, and 4 images per individual, respectively, as training samples and use the remaining images as test samples, while for each individual in the PIE data set, we randomly select 5, 10, and 20 images per individual as training samples. For each partition, we repeat each experiment 20 times and calculate the average recognition accuracy. In general, the recognition rate varies with the dimension of the face subspace. The best result obtained in the optimal subspace and the corresponding dimensionality for each method are shown.
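The evaluation protocol above can be sketched as follows. This is only an illustration: the helper names `split_per_individual` and `average_accuracy`, the `evaluate` callback (which would wrap a dimensionality reduction method and a classifier), and the seeding scheme are hypothetical and not taken from this paper.

```python
import numpy as np


def split_per_individual(labels, n_train, rng):
    """Randomly pick n_train images of each individual for training.

    Returns two index arrays (train, test); the remaining images of each
    individual go to the test set.
    """
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for person in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == person))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)


def average_accuracy(labels, n_train, evaluate, n_repeats=20, seed=0):
    """Average evaluate(train_idx, test_idx) over repeated random splits."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_repeats):
        train_idx, test_idx = split_per_individual(labels, n_train, rng)
        scores.append(evaluate(train_idx, test_idx))
    return float(np.mean(scores))
```

For example, with 40 individuals and 10 images each, `split_per_individual(np.repeat(np.arange(40), 10), 2, np.random.default_rng(0))` yields 80 training indices and 320 test indices.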
For the face recognition experiments, several parameters need to be decided beforehand. For LDA, we use PCA as a first step dimensionality reduction algorithm to avoid the singularity problem. The dimension of the PCA step is fixed as
Each testing sample
The tables below report the recognition accuracy of each method on the ORL, Yale, and PIE data sets. It is clear that the use of dimensionality reduction is beneficial in face recognition. There is a significant increase in performance from using LDA, NPE, NMF, LNMF, and CNMF. However, PCA fails to gain improvement over the baseline. This is because PCA does not encode the discriminative information. The performances of the nonnegative algorithms NMF, LNMF, and CNMF are generally worse than that of the supervised algorithm LDA, which shows that without considering the labeled data, nonnegative algorithms cannot guarantee good discriminating power. Our NPCNMF algorithm outperforms all the other methods. The reason is that NPCNMF takes the geometrical structure of the data into account. This shows that by leveraging the power of both the parts-based representation and the intrinsic geometrical structure of the data, NPCNMF can learn a better compact representation in the sense of semantic structure.
Face recognition accuracy on the ORL data set. The number in brackets is the corresponding projection dimensionality.
Method | 2 Train | 3 Train | 4 Train |
---|---|---|---|
Baseline | 69.32% | 77.56% | 83.48% |
PCA | 69.32% (79) | 77.56% (118) | 83.48% (152) |
LDA | 72.80% (25) | 83.79% (39) | 90.13% (39) |
NPE | 73.19% (36) | 84.29% (54) | 91.06% (73) |
NMF | 70.87% (97) | 78.98% (81) | 84.48% (95) |
LNMF | 71.73% (178) | 81.09% (168) | 86.31% (195) |
CNMF | 72.23% (138) | 83.58% (143) | 89.56% (111) |
NPCNMF | 77.31% (143) | 86.73% (153) | 93.35% (145) |
Face recognition accuracy on the Yale data set. The number in brackets is the corresponding projection dimensionality.
Method | 2 Train | 3 Train | 4 Train |
---|---|---|---|
Baseline | 46.04% | 49.96% | 55.62% |
PCA | 46.04% (29) | 49.96% (44) | 55.62% (58) |
LDA | 42.81% (11) | 60.33% (14) | 68.10% (13) |
NPE | 48.19% (13) | 62.00% (19) | 69.00% (73) |
NMF | 44.11% (112) | 49.00% (195) | 52.19% (164) |
LNMF | 44.00% (157) | 48.84% (198) | 53.57% (197) |
CNMF | 49.72% (125) | 59.50% (168) | 65.77% (129) |
NPCNMF | 63.45% (124) | 71.83% (148) | 81.38% (153) |
Face recognition accuracy on the PIE data set. The number in brackets is the corresponding projection dimensionality.
Method | 5 Train | 10 Train | 20 Train |
---|---|---|---|
Baseline | 43.02% | 62.90% | 83.19% |
PCA | 42.87% (199) | 62.51% (195) | 82.84% (200) |
LDA | 84.39% (67) | 90.47% (67) | 93.98% (67) |
NPE | 84.71% (166) | 91.48% (200) | 94.33% (200) |
NMF | 78.66% (200) | 88.98% (200) | 92.52% (200) |
LNMF | 76.47% (200) | 87.91% (200) | 92.61% (196) |
CNMF | 83.72% (176) | 90.89% (187) | 93.78% (159) |
NPCNMF | 88.43% (147) | 94.86% (158) | 98.58% (133) |
In this paper, we have presented a novel matrix factorization method called NPCNMF for dimensionality reduction, which respects the local geometric structure. As a result, NPCNMF has more discriminating power than the ordinary NMF and CNMF approaches, which only consider the Euclidean structure of the data. Experimental results on face data sets show that NPCNMF provides a better representation in the sense of semantic structure.
Several challenges remain to be investigated in our future work. NPCNMF is currently limited to linear projections, and nonlinear techniques (e.g., kernel tricks) may further boost the algorithmic performance. Another research direction is how to extend the current framework to tensor-based nonnegative data decomposition. The NPCNMF algorithm is iterative and sensitive to the initialization of
The authors declare that there is no conflict of interests regarding the publication of this paper.