Nonnegative matrix factorization (NMF) has been successfully applied in signal processing as a simple two-layer nonnegative neural network. Projective NMF (PNMF), which has fewer parameters, was later proposed; it projects high-dimensional nonnegative data onto a lower-dimensional nonnegative subspace. Although PNMF overcomes the out-of-sample problem of NMF, it does not consider the nonlinear characteristics of data and remains a narrow, purely linear signal decomposition method. In this paper, we combine PNMF with deep learning and nonlinear fitting to propose a bidirectional nonnegative deep learning (BNDL) model and its optimization algorithm, which can obtain nonlinear, multilayer, deep nonnegative feature representations. Experiments show that the proposed model not only solves the out-of-sample problem of NMF but also learns hierarchical nonnegative feature representations with better clustering performance than the classical NMF, PNMF, and Deep Semi-NMF algorithms.
In machine learning, pattern recognition, computer vision, and image processing, finding effective representations of an input data matrix with nonnegative elements and very high dimensionality is an important problem. In 1999, Lee and Seung proposed a classical feature representation method named nonnegative matrix factorization (NMF) [...].
Given a nonnegative data matrix $X \in \mathbb{R}_{+}^{m \times n}$, NMF seeks two nonnegative factors, a basis matrix $W \in \mathbb{R}_{+}^{m \times r}$ and a coefficient matrix $H \in \mathbb{R}_{+}^{r \times n}$ with $r \ll \min(m, n)$, such that $X \approx WH$.
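To make the factorization concrete, here is a minimal NumPy sketch of the standard Lee-Seung multiplicative updates for this objective (the iteration count and random initialization are illustrative):

```python
import numpy as np

def nmf(X, r, n_iter=200, eps=1e-9):
    """Lee-Seung multiplicative updates for X ~ W @ H under the
    Frobenius-norm reconstruction error, keeping W, H nonnegative."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, r))                     # nonnegative basis
    H = rng.random((r, n))                     # nonnegative coefficients
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # coefficient update
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # basis update
    return W, H
```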
Although NMF is effective for learning the parts of objects, it suffers from the out-of-sample problem [...]: it provides no explicit mapping from the data space to the coefficient space, so it cannot directly encode new samples. To address this, projective NMF (PNMF) was proposed, which seeks a nonnegative basis $W$ such that $X \approx WW^{\top}X$, so that $W^{\top}$ acts as an explicit projection onto the nonnegative subspace.
The PNMF has fewer parameters than NMF, is widely used for linear dimensionality reduction, and resolves the out-of-sample deficiency. Like NMF, however, PNMF is a linear dimensionality reduction method, while many data exhibit nonlinear characteristics [...].
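A sketch of the projective factorization follows; the per-layer update used later in this paper may differ in details, so this simply follows the standard PNMF multiplicative rule of Yang and Oja:

```python
import numpy as np

def pnmf(X, r, n_iter=500, eps=1e-9):
    """Multiplicative update for projective NMF, X ~ W @ W.T @ X,
    following the standard Yang-Oja rule. W.T is the projection."""
    rng = np.random.default_rng(0)
    W = rng.random((X.shape[0], r))
    A = X @ X.T                               # precompute X X^T once
    for _ in range(n_iter):
        AW = A @ W
        # W <- W * 2(AW) / (W W^T A W + A W W^T W)
        W *= (2.0 * AW) / (W @ (W.T @ AW) + AW @ (W.T @ W) + eps)
    return W
```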
To obtain deep nonnegative feature representations, Trigeorgis et al. applied the concept of Semi-NMF [...] and proposed the Deep Semi-NMF model, which factorizes the data matrix into multiple factor matrices so that hierarchical features can be learned.
Although Deep Semi-NMF uses a multilayer model to obtain richer features, it can only deal with semi-nonnegative data and performs a linear transformation with weak representation capacity. Moreover, the Deep Semi-NMF model still suffers from the out-of-sample problem.
Based on the above analysis, PNMF computes only one projection matrix and cannot learn richer features, especially when the data lie on or near a nonlinear manifold or are hierarchically generated. Motivated by the ideas of PNMF, Deep Semi-NMF, and deep learning (especially the autoencoder [...]), we propose a bidirectional nonnegative deep learning model in this paper.
The particular attraction of the NMF algorithm is its nonnegativity constraints, which make it useful for data representation in clustering. However, NMF is a simple linear coding algorithm using a single-layer network with nonnegativity constraints, and it suffers from the out-of-sample deficiency: it cannot directly obtain the codes of new incoming examples [...].
The PNMF algorithm, in contrast, uses the transpose of the learned basis matrix as the projection matrix, which yields nonnegative coefficients for any new incoming example [...].
On the other hand, existing deep network models rarely consider nonnegativity constraints; even the most closely related model, Deep Semi-NMF [...], constrains only the coefficient matrices to be nonnegative while leaving the basis matrices unconstrained.
In this paper, we propose a nonnegative hierarchical data representation model, named the bidirectional nonnegative deep learning (BNDL) model, which applies the concept of PNMF to train an initial multilayer nonlinear structure capable of learning complete deep hidden representations of the original data.
Different from other deep architectures, BNDL first constructs a pretrained deep network by independently stacking two-layer nonnegative networks, where the learning of each layer combines PNMF with a designed nonlinear mapping. That is, each step performs a one-step decomposition; the basis matrix of the two-layer BNDL is regarded as the weight matrix of the deep network, and the output of this step, passed through a sigmoid function, serves as the input of the next layer. Iterating this process upward yields a deep network; going downward, we can reconstruct the original sample data. Because BNDL learns only one layer in each step, a deep network can be built quickly. This hierarchical feature extraction strategy learns more meaningful and helpful features and higher-order nonnegative nonlinear characteristics than one-step learning. Finally, fine-tuning is applied to improve the reconstruction performance and the deep features of the network under the nonnegative weight constraints.
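A rough sketch of this layer-wise construction is given below, reusing the `pnmf` routine sketched earlier; the layer sizes are hypothetical, and the logistic function stands in for the S-shaped transfer described above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_bndl(X, layer_sizes):
    """Greedy layer-wise pretraining: one PNMF decomposition per step;
    the basis becomes the layer's weights, and the sigmoid of the
    hidden representation becomes the next layer's input."""
    weights, H = [], X
    for r in layer_sizes:
        W = pnmf(H, r)            # basis of this two-layer network
        weights.append(W)
        H = sigmoid(W.T @ H)      # nonlinear output -> next input
    return weights, H             # weights per layer, deep features
```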
Let $X \in \mathbb{R}_{+}^{m \times n}$ denote the nonnegative input data matrix of the current two-layer network, $W \in \mathbb{R}_{+}^{m \times r}$ its bottom-to-top weight matrix, and $f(\cdot)$ the elementwise S-shaped transfer function, so that the bottom-to-top output is $H = f(W^{\top}X)$.
Because it preserves the same S-shaped nonlinear mapping as the bottom-to-top operator, the top-to-bottom reconstruction basis recovers an approximation of the input from the hidden representation $H$.
Thus, the two-layer structure for constructing BNDL can be illustrated as in the figure below.
The two-layer bidirectional network structure for BNDL.
For simplicity, the bottom-to-top weights $W$ are initialized by the basis matrix obtained from the PNMF decomposition of the layer's input.
Each layer of BNDL is learned by minimizing the squared reconstruction error of a projective factorization of that layer's input under nonnegativity constraints. Obviously, this objective is not jointly convex, so we minimize it with respect to one set of variables at a time. When fixing the other variables, the minimization with respect to the weight matrix $W$ becomes a constrained optimization problem. Under the nonnegativity constraints $W \ge 0$, we form the Lagrangian of the objective. Based on the KKT conditions [...], setting the gradient of the Lagrangian to zero and applying complementary slackness yields a fixed-point condition on $W$. From this condition we obtain a Multiplicative Update Rule (MUR) that preserves nonnegativity at every iteration. Similarly, the MUR [...] for the top-to-bottom reconstruction weights follows the same derivation. Moreover, by referring to the proof of Theorem 1 in the literature [...], it can be shown that the objective is nonincreasing under these update rules. The Euclidean metric is bounded below by zero, so the objective value converges, and there must exist an optimal solution $W^{\ast}$ to which the updates can converge.
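As a sketch, assuming each layer minimizes the standard PNMF cost $F(W) = \|X - WW^{\top}X\|_F^2$ with $A = XX^{\top}$ (the exact per-layer objective may differ in details), the gradient and the resulting MUR take the following well-known form:

```latex
% Per-layer cost, assuming the standard PNMF objective with A = X X^T:
%   F(W) = || X - W W^T X ||_F^2 ,   W >= 0
\nabla_W F = -4AW + 2\,WW^{\top}AW + 2\,AWW^{\top}W
% Complementary slackness (\nabla_W F)_{ij} W_{ij} = 0 gives the MUR:
W_{ij} \leftarrow W_{ij}\,
  \frac{2\,(AW)_{ij}}{\left(WW^{\top}AW + AWW^{\top}W\right)_{ij}}
```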
After decomposing the input of one layer, the hidden representation is passed through the nonlinear transfer function and serves as the input of the next layer. For minimizing the squared objective function of each layer, the transfer function must keep the signals nonnegative and bounded.
If the sigmoid transfer function for the nonnegative neural network is the hyperbolic tangent, $f(z) = \tanh(z)$, then nonnegative inputs are mapped into $[0, 1)$ and nonnegativity is preserved. To obtain stability in our deep model, the input signals are normalized into a bounded interval. If instead the transfer function is the ReLU, $f(z) = \max(z, 0)$, it acts as the identity on nonnegative pre-activations and likewise preserves nonnegativity.
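A minimal sketch of these choices follows; the $[0, 1]$ normalization interval and the inverse map used for the top-to-bottom direction are our assumptions based on the description above:

```python
import numpy as np

def normalize(X, eps=1e-12):
    """Scale a nonnegative input matrix into [0, 1] for stability."""
    return X / (X.max() + eps)

def tanh_transfer(z):
    # On nonnegative inputs, tanh maps into [0, 1), preserving nonnegativity.
    return np.tanh(z)

def tanh_inverse(a, eps=1e-6):
    # Inverse map for the top-to-bottom reconstruction direction.
    return np.arctanh(np.clip(a, 0.0, 1.0 - eps))

def relu_transfer(z):
    # ReLU is the identity on nonnegative pre-activations.
    return np.maximum(z, 0.0)
```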
Though the weight matrices of all layers have been learned, they are only efficient within each two-layer network and are not optimal for the whole network; the weights of higher layers may not be optimal for the lower layers [...].
The unrolling illustration of BNDL.
Summarizing the above analysis and unrolling each two-layer nonnegative network, the bottom-to-top weights form the encoding part and the top-to-bottom reconstruction weights form the decoding part of one deep network, which is then fine-tuned as a whole by the improved BP algorithm under the nonnegative weight constraints.
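One simple way to realize fine-tuning under nonnegativity constraints is projected gradient descent: take an ordinary backpropagation step and then clip negative weight entries to zero. The sketch below is hypothetical (the improved BP algorithm is not specified in detail here; the gradients are assumed to come from standard BP on the unrolled reconstruction error):

```python
import numpy as np

def finetune_step(weights, grads, lr=1e-3):
    """One projected-gradient fine-tuning step: apply BP gradients to
    the unrolled network, then project each weight matrix back onto
    the nonnegative orthant so the constraints stay satisfied."""
    for W, g in zip(weights, grads):
        W -= lr * g                   # ordinary gradient step
        np.maximum(W, 0.0, out=W)     # project onto W >= 0
    return weights
```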
The main steps of the BNDL algorithm are summarized below.
Learning algorithm for BNDL includes the following steps:
(1) Initialize the first layer.
(2) Repeatedly update the reconstruction weights of each two-layer network with the MUR until convergence, layer by layer.
(3) Modify the top-to-bottom reconstruction weights by fine-tuning the whole unrolled network.
In this section, we carry out experiments to verify the validity of BNDL on three datasets, COIL20, COIL100, and CMU PIE, summarized in the table below.
The characteristics of the datasets.
Datasets | Dimension | Total number | Number of classes
---|---|---|---
COIL20 | 1024 | 1440 | 20
COIL100 | 1024 | 7200 | 100
CMU PIE | 1024 | 2856 | 68
The COIL20 dataset [...] contains 1440 gray-scale images of 20 objects; each object contributes 72 images taken at pose intervals of 5 degrees. The COIL100 dataset [...] contains 7200 images of 100 objects, again with 72 views per object. CMU PIE [...] is a face dataset; the subset used here contains 2856 images of 68 subjects under varying pose, illumination, and expression. All images are resized to 32 × 32 pixels, giving the 1024-dimensional inputs in the table above.
The related literature has demonstrated that NMF methods perform well for clustering, especially in image clustering tasks. Therefore, we mainly conduct clustering experiments, comparing the most related methods, including NMF [...], PNMF [...], and Deep Semi-NMF [...].
In this subsection, "2nd layer" denotes the second decomposition layer of the deep network; "rec. 1st layer" denotes the first reconstruction layer of the deep network, and so on.
In the clustering experiments, we use the features from all learned layers and reconstruction layers of BNDL for comparison with the other methods. All algorithms use the same number of iterations (5000), the same initialization method (random initialization), and the same termination condition (error less than $10^{-6}$). The clustering performance in terms of accuracy (AC) and normalized mutual information (NMI) is shown in the table below.
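For reference, AC and NMI can be computed as in the following sketch, using SciPy's Hungarian solver for the label-permutation step of AC and scikit-learn for NMI; running k-means on the learned features (samples as rows) is our assumption about the clustering step:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """AC: best one-to-one matching of predicted to true labels
    (Hungarian algorithm on the contingency table)."""
    k = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1
    rows, cols = linear_sum_assignment(-cost)      # maximize matches
    return cost[rows, cols].sum() / len(y_true)

def evaluate(features, y_true, n_clusters):
    # features: one sample per row (e.g., the transposed representation H)
    y_pred = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    return (clustering_accuracy(y_true, y_pred),
            normalized_mutual_info_score(y_true, y_pred))
```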
Performance comparison on different datasets.
Algorithms | COIL20 AC | COIL20 NMI | COIL100 AC | COIL100 NMI | CMU PIE AC | CMU PIE NMI
---|---|---|---|---|---|---
NMF | 0.6069 | 0.7216 | 0.4483 | 0.7236 | 0.4321 | 0.7197
PNMF | 0.6340 | 0.7356 | 0.4701 | 0.7409 | 0.1796 | 0.4326
Deep Semi-NMF 2nd layer | 0.2889 | 0.4124 | 0.3114 | 0.5502 | 0.6218 | 0.7868
Deep Semi-NMF 3rd layer | 0.3688 | 0.4304 | 0.4064 | 0.6695 | 0.6047 | 0.7720
Deep Semi-NMF 4th layer | 0.4375 | 0.5131 | 0.4065 | 0.6367 | 0.5018 | 0.7313
Deep Semi-NMF 5th layer | 0.6063 | 0.7141 | 0.4018 | 0.6674 | 0.8074 | 0.9402
BNDL 2nd layer | | | | | |
BNDL 3rd layer | | | | | |
BNDL 4th layer | | | | | |
BNDL 5th layer | | | | | |
From the results in the table above, we can draw the following conclusions:
(1) Compared with single-layer matrix factorization networks, BNDL learns richer features while the clustering effect is not reduced.
(2) Compared with the deep matrix factorization network (Deep Semi-NMF), each layer of BNDL achieves a better clustering effect.
(3) Each layer of BNDL shows stable clustering performance, which benefits data representation and downward transmission.
From the results on COIL20 and COIL100, we can see that BNDL has better clustering performance on large-scale samples, which is more conducive to the feature expression of complex data. The experimental results on the CMU PIE face images are somewhat disappointing. Compared with our model, NMF and Deep Semi-NMF suffer from the out-of-sample problem, whereas both our BNDL model and the classical PNMF avoid this deficiency; moreover, BNDL performs better than PNMF on the CMU PIE face dataset. In addition, face clustering and classification usually achieve better results with the cosine distance metric, so graph regularization with cosine similarity will be introduced to improve BNDL in future work.
BNDL also achieves excellent reconstruction performance. To compare the reconstruction performance of each algorithm, we compare the clustering performance of the reconstructed data, shown in the table below.
Performance comparison of reconstruction data on different datasets.
Algorithms | COIL20 AC | COIL20 NMI | COIL100 AC | COIL100 NMI | CMU PIE AC | CMU PIE NMI
---|---|---|---|---|---|---
NMF rec. | 0.6267 | 0.7490 | 0.4468 | 0.7200 | 0.4391 | 0.7569
PNMF rec. | 0.6528 | 0.7630 | 0.4686 | 0.7373 | 0.1866 | 0.4698
Deep Semi-NMF rec. 1st layer | 0.5563 | 0.6304 | 0.3689 | 0.5783 | 0.8277 | 0.9506
Deep Semi-NMF rec. 2nd layer | 0.5535 | 0.6590 | 0.4125 | 0.6471 | 0.7962 | 0.9156
Deep Semi-NMF rec. 3rd layer | 0.6708 | 0.7263 | 0.4169 | 0.6568 | 0.8319 | 0.9447
Deep Semi-NMF rec. 4th layer | 0.5271 | 0.5701 | 0.3114 | 0.5262 | 0.5963 | 0.7627
BNDL rec. 1st layer | | | | | |
BNDL rec. 2nd layer | | | | | |
BNDL rec. 3rd layer | | | | | |
BNDL rec. 4th layer | | | | | |
The reason why BNDL achieves better reconstruction results is that, after all layers are decomposed, it connects them into one deep network and fine-tunes the whole network with the improved BP algorithm to minimize the reconstruction error. To verify the effect of fine-tuning on the whole network, we compare the performance before and after fine-tuning, as shown in the figures below.
NMI before and after fine-tuning of BNDL on COIL20.
NMI before and after fine-tuning of BNDL on COIL100.
This paper proposes a bidirectional nonnegative deep learning model for effective feature representation, which automatically learns a deep hierarchy of nonlinear, nonnegative feature representations from a given nonnegative dataset; such representations are shown to be well suited for clustering. PNMF is a linear dimensionality reduction and feature representation method that factorizes the original data in only one step. Deep Semi-NMF constructs a multilayer linear factorization to learn more features, but it is still a linear dimensionality reduction method and not a fully nonnegative treatment. Our BNDL model combines the advantages of PNMF and deep belief networks, inspired by Deep Semi-NMF, and overcomes the above-mentioned shortcomings. At the same time, we designed an effective learning algorithm to optimize the parameters of the BNDL model. Finally, we showed its better clustering performance compared with the single-layer NMF, PNMF, and Deep Semi-NMF. In addition, unlike Deep Semi-NMF, our method avoids both the out-of-sample problem and negative feature representations.
The authors declare that they have no competing interests.
This work was supported by the National Natural Science Foundation of China (Grant nos. 61672120, 61379114) and the Chongqing Natural Science Foundation Program (Grant no. cstc2015jcyjA40036).