Incremental Nonnegative Matrix Factorization for Face Recognition

Nonnegative matrix factorization (NMF) is a promising approach for local feature extraction in face recognition tasks. However, almost all existing NMF-based methods have two major drawbacks. One is that the computational cost of decomposing a large matrix is expensive. The other is that learning must be repeated whenever the training samples or classes are updated. To overcome these two limitations, this paper proposes a novel incremental nonnegative matrix factorization (INMF) for face representation and recognition. The proposed INMF approach is based on a novel constraint criterion and our previous block strategy. It thus has several good properties, such as low computational complexity and a sparse coefficient matrix. Also, the coefficient column vectors of different classes are orthogonal. In particular, it can be applied to incremental learning. Two face databases, namely the FERET and CMU PIE face databases, are selected for evaluation. Compared with PCA and some state-of-the-art NMF-based methods, our INMF approach gives the best performance.


Introduction
Face recognition has been one of the most challenging problems in computer science and information technology since 1990 [1, 2]. Face recognition approaches can be mainly categorized into two groups, namely geometric feature-based and appearance-based [3]. Geometric features are based on short-range phenomena of face images, such as the eyes, eyebrows, nose, and mouth. These local facial features are learnt to form a geometric feature vector for face recognition. The appearance-based approach relies on global facial features, which form an entire facial feature vector for face classification. Nonnegative matrix factorization (NMF) [4, 5] belongs to the geometric feature-based category, while principal component analysis (PCA) [6] is based on whole-face features. Both NMF and PCA are unsupervised learning methods for face recognition. The basic idea of both approaches is to find basis images using different criteria, so that all face images can be reconstructed from the basis images. The basis images of PCA are called eigenfaces; they are the eigenvectors corresponding to the large eigenvalues of the total scatter matrix. NMF performs a nonnegative decomposition of the training image matrix V such that V ≈ WH, where W and H are the basis image matrix and the coefficient matrix, respectively. The local image features are learnt and stored in W as column vectors. Following the success of NMF in learning the parts of objects [4], many researchers have investigated NMF in depth, and different NMF-based approaches have been developed [7-19]. Li et al. [7] proposed a local NMF method by adding spatial constraints. Wild et al. [8] used spherical K-means clustering to produce a structured initialization for NMF. Buciu and Pitas [9] presented a DNMF method for learning facial expressions in a supervised manner. However, DNMF does not guarantee convergence to a stationary limit point. Kotsia et al. [15] thus presented a modified DNMF method using projected gradients. Similar supervised methods built on NMF were developed to enhance its classification power [11-13, 19]. Hoyer [10] added sparseness constraints to NMF to find solutions with desired degrees of sparseness. Lin [16, 17] modified the traditional NMF updates using the projected gradient method and discussed their convergence. Recently, Zhang et al. [18] proposed a topology structure preservation constraint in NMF to improve its performance.
However, to the best of our knowledge, almost all existing NMF-based approaches encounter two major problems, namely high computational cost and the lack of incremental learning. In most cases, the training image matrix V is very large, which makes NMF-based schemes computationally expensive. Also, when the training samples or classes are updated, NMF must repeat the whole learning process. These drawbacks greatly restrict the practical application of NMF-based methods to face recognition. To avoid these two problems, this paper, motivated by our previous work on incremental learning [19], proposes a supervised incremental NMF (INMF) approach under a novel constraint NMF criterion, which aims to cluster within-class samples tightly and enlarge the between-class distance simultaneously. Our incremental strategy uses supervised local features, which are considered short-range phenomena of face images, for face classification. Two publicly available face databases, namely the FERET and CMU PIE face databases, are selected for evaluation. Experimental results show that our INMF method outperforms the PCA [6], NMF [4], and BNMF [19] approaches in both nonincremental and incremental learning for face recognition.
The rest of this paper is organized as follows. Section 2 briefly reviews related work. Theoretical analysis and the INMF algorithm design are given in Section 3. Experimental results are reported in Section 4. Finally, Section 5 draws the conclusions.

Related work
This section briefly introduces the PCA [6], NMF [4], and BNMF [19] methods. Details are as follows.

PCA
Principal component analysis (PCA), also called the eigenface method, is a popular statistical, appearance-based linear method for dimensionality reduction in face recognition. PCA is based on the Karhunen-Loeve transform. It performs an eigenvalue decomposition of the total scatter matrix $S_t$ and then selects the principal components (eigenfaces) with the largest eigenvalues, which account for most of the data variance. All face images can be expressed as linear combinations of these basis images (eigenfaces). However, PCA cannot fully exploit the class information that is useful for classification, and how to choose the principal components is still an open problem. Therefore, PCA may not give satisfactory performance in pattern recognition tasks.
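The following is a minimal NumPy sketch of the eigenface computation described above. It assumes the faces are stored one vectorized image per column of a matrix `X`. The function names `eigenfaces` and `pca_project` are illustrative and not from the paper. The sketch forms the total scatter matrix $S_t$ explicitly, which is affordable for small wavelet faces (for 30 × 25 faces, $S_t$ is 750 × 750).

```python
import numpy as np


def eigenfaces(X, k):
    """Return the mean face, the top-k eigenvectors (eigenfaces) of S_t, and their eigenvalues.

    X is a d x n array with one vectorized face image per column.
    """
    mu = X.mean(axis=1, keepdims=True)        # mean face (d x 1)
    Xc = X - mu                               # centered faces
    S_t = Xc @ Xc.T                           # total scatter matrix (d x d)
    evals, evecs = np.linalg.eigh(S_t)        # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(evals)[::-1][:k]       # indices of the k largest eigenvalues
    return mu, evecs[:, order], evals[order]


def pca_project(X, mu, U):
    """Coordinates of the faces in X with respect to the eigenfaces U."""
    return U.T @ (X - mu)


# Usage on random stand-in data: 8 "faces" of dimension 10
rng = np.random.default_rng(0)
X = rng.random((10, 8))
mu, U, lam = eigenfaces(X, k=3)
Y = pca_project(X, mu, U)                     # 3 x 8 coordinates
```

For full-resolution images, one would usually diagonalize the smaller $n \times n$ matrix `Xc.T @ Xc` and map its eigenvectors back through `Xc`, rather than forming the $d \times d$ scatter matrix.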

NMF
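NMF aims to find nonnegative matrices $W \in \mathbb{R}^{m\times r}$ and $H \in \mathbb{R}^{r\times n}$ such that
$$V \approx WH, \tag{2.1}$$
where $V \in \mathbb{R}^{m\times n}$ is also a nonnegative matrix, whose $n$ columns are the vectorized training images. Each column of $W$ is called a basis image, while $H$ is the coefficient matrix. The number of bases $r$ is usually chosen smaller than $n$ for dimensionality reduction. The divergence between $V$ and $WH$ is defined as
$$D(V\,\|\,WH) = \sum_{i=1}^{m}\sum_{j=1}^{n}\left(V_{ij}\log\frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij}\right). \tag{2.2}$$
NMF (2.1) is equivalent to the following optimization problem:
$$\min_{W,H}\ D(V\,\|\,WH) \quad \text{s.t.}\quad W \ge 0,\ H \ge 0,\ \sum_{i=1}^{m} W_{ia} = 1\ \ (a = 1, \ldots, r). \tag{2.3}$$
The minimization problem (2.3) can be solved using the following iterative formulae, which converge to a local minimum:
$$W_{ia} \leftarrow W_{ia}\sum_{j=1}^{n}\frac{V_{ij}}{(WH)_{ij}}H_{aj}, \qquad W_{ia} \leftarrow \frac{W_{ia}}{\sum_{k=1}^{m} W_{ka}}, \qquad H_{aj} \leftarrow H_{aj}\sum_{i=1}^{m} W_{ia}\frac{V_{ij}}{(WH)_{ij}}. \tag{2.4}$$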
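The multiplicative updates can be sketched in NumPy as follows. This is a hedged sketch, not the authors' code. It uses the Lee and Seung divergence updates with explicit denominators. After normalizing the columns of W to unit sum, it rescales the rows of H so that the product WH, and hence the divergence, is unchanged by the normalization. With that choice the divergence does not increase from one iteration to the next (up to the tiny `EPS` guard). The names `nmf_kl` and `kl_divergence`, the constant `EPS`, and the random initialization are our assumptions.

```python
import numpy as np

EPS = 1e-12  # guards divisions and log(0)


def kl_divergence(V, W, H):
    """Divergence D(V || WH) of (2.2)."""
    WH = W @ H + EPS
    return float(np.sum(V * np.log((V + EPS) / WH) - V + WH))


def nmf_kl(V, r, n_iter=200, seed=0):
    """Multiplicative KL-NMF sketch; returns W (unit column sums), H, and the divergence history."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + 0.1
    H = rng.random((r, n)) + 0.1
    history = [kl_divergence(V, W, H)]
    for _ in range(n_iter):
        # W update with H fixed: W_ia <- W_ia * sum_j V_ij/(WH)_ij * H_aj / sum_j H_aj
        W *= ((V / (W @ H + EPS)) @ H.T) / H.sum(axis=1)
        # Normalize columns of W to sum to 1 and rescale rows of H so that WH is unchanged
        s = W.sum(axis=0)
        W /= s
        H *= s[:, None]
        # H update with W fixed (column sums of W are 1, so no denominator is needed)
        H *= W.T @ (V / (W @ H + EPS))
        history.append(kl_divergence(V, W, H))
    return W, H, history


rng = np.random.default_rng(1)
V = rng.random((30, 20)) + 0.05
W, H, hist = nmf_kl(V, r=5, n_iter=100)
```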

BNMF
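The basic idea of BNMF is to perform NMF on $c$ small matrices $V_i \in \mathbb{R}^{m\times n_0}$ ($i = 1, 2, \ldots, c$), namely
$$V_i \approx W_i H_i, \quad W_i \in \mathbb{R}^{m\times r_0},\ H_i \in \mathbb{R}^{r_0\times n_0},\quad i = 1, 2, \ldots, c, \tag{2.5}$$
where $V_i$ contains the $n_0$ training images of the $i$th class and $c$ is the number of classes. BNMF is obtained from (2.5) as follows:
$$V = [V_1, V_2, \ldots, V_c] \approx WH, \tag{2.6}$$
where $W = [W_1, W_2, \ldots, W_c]$, $H = \operatorname{diag}(H_1, H_2, \ldots, H_c)$, and $n = cn_0$ is the total number of training images.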
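A one-line check shows why the block strategy is consistent. With $W = [W_1, \ldots, W_c]$ and $H = \operatorname{diag}(H_1, \ldots, H_c)$, the product splits into the per-class products. Because the divergence (2.2) is a sum over matrix entries, the total divergence splits in the same way:

```latex
[W_1, W_2, \ldots, W_c]
\begin{bmatrix}
H_1 &        &     \\
    & \ddots &     \\
    &        & H_c
\end{bmatrix}
= [W_1 H_1, W_2 H_2, \ldots, W_c H_c],
\qquad
D(V \,\|\, WH) = \sum_{i=1}^{c} D(V_i \,\|\, W_i H_i).
```

Hence, solving the $c$ small problems independently minimizes the total divergence over all block-diagonal coefficient matrices. It does not search over general (non-block) coefficient matrices.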

Proposed INMF
To overcome the drawbacks of existing NMF-based methods, this section proposes a novel incremental NMF (INMF) approach, which is based on a new constraint NMF criterion and our previous block technique [19]. Details are discussed below.

Constraint NMF criterion
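The objective of our INMF is to impose supervised class information on NMF such that between-class distances increase while within-class distances decrease simultaneously. To this end, we define the within-class scatter matrix $S_w^{i}$ of the $i$th coefficient matrix $H_i = [h_1^{(i)}, \ldots, h_{n_0}^{(i)}]$ as
$$S_w^{i} = \sum_{j=1}^{n_0}\left(h_j^{(i)} - U_i\right)\left(h_j^{(i)} - U_i\right)^{T}, \tag{3.1}$$
where $U_i = \frac{1}{n_0}\sum_{j=1}^{n_0} h_j^{(i)}$ is the mean column vector of the $i$th class. The within-class samples of the $k$th class cluster tightly as $\operatorname{tr}(S_w^{k})$ becomes small. Assume $\tilde U_i$ is an enlarged version of $U_i$ for every class, that is, $\tilde U_i = (1+t)U_i$ with $t > 0$. Then, for any two classes $i \ne j$,
$$\|\tilde U_i - \tilde U_j\|_2 = (1+t)\,\|U_i - U_j\|_2 \ge \|U_i - U_j\|_2. \tag{3.2}$$
Inequality (3.2) implies that between-class distances increase as the mean vectors of the classes in $H$ are enlarged.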
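A short expansion makes the link between the two goals explicit. It assumes $S_w^{k}$ is the unnormalized sum of the outer products $(h_j^{(k)} - U_k)(h_j^{(k)} - U_k)^{T}$ over the $n_0$ columns of class $k$. Because the deviations from the mean sum to zero ($\sum_j h_j^{(k)} = n_0 U_k$), the trace of the scatter equals the total energy of the coefficient columns minus $n_0$ times the squared norm of the class mean:

```latex
\begin{aligned}
\operatorname{tr}(S_w^{k})
  &= \sum_{j=1}^{n_0}\bigl\|h_j^{(k)} - U_k\bigr\|_2^2
   = \sum_{j=1}^{n_0}\bigl\|h_j^{(k)}\bigr\|_2^2 - 2\,U_k^{T}\sum_{j=1}^{n_0}h_j^{(k)} + n_0\|U_k\|_2^2 \\
  &= \sum_{j=1}^{n_0}\bigl\|h_j^{(k)}\bigr\|_2^2 - n_0\|U_k\|_2^2 .
\end{aligned}
```

So, for a fixed total coefficient energy, shrinking $\operatorname{tr}(S_w^{k})$ and enlarging $\|U_k\|_2$ push in the same direction.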
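Based on the above analysis, we define a constraint divergence criterion function $F_k(W_k, H_k)$ for the $k$th class (3.3). It augments the divergence $D(V_k\,\|\,W_kH_k)$ with an $\alpha$-weighted term $\operatorname{tr}(S_w^{k})$ that penalizes within-class scatter and a $\beta$-weighted term that rewards a large class mean vector $U_k$, where the parameters $\alpha, \beta > 0$ and $k = 1, 2, \ldots, c$. Our entire INMF criterion function is then designed as
$$F = \sum_{k=1}^{c} F_k(W_k, H_k). \tag{3.4}$$
Based on criterion (3.4), the constraint NMF (CNMF) update rules (3.5)-(3.7) are derived in the next subsection, where we show that they also converge to a local minimum. Our entire INMF is then performed as
$$V \approx WH, \qquad W = [W_1, W_2, \ldots, W_c], \qquad H = \operatorname{diag}(H_1, H_2, \ldots, H_c), \tag{3.8}$$
where each block is obtained by CNMF,
$$V_k \approx W_k H_k, \quad k = 1, 2, \ldots, c. \tag{3.9}$$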
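The exact formula (3.3) is not reproduced in this text. The following sketch therefore evaluates one *hypothetical* concrete form that matches the verbal description: $F_k = D(V_k\,\|\,W_kH_k) + \alpha\,\operatorname{tr}(S_w^{k}) - \beta\,\|U_k\|_2^2$, summed over classes. Treat `class_criterion`, `total_criterion`, and this particular $\beta$ term as assumptions for illustration only. The defaults $\alpha = 10^{-4}$ and $\beta = 10^{-3}$ are the values used in the experiments.

```python
import numpy as np

EPS = 1e-12  # guards divisions and log(0)


def kl_divergence(V, W, H):
    """Divergence D(V || WH) (generalized Kullback-Leibler)."""
    WH = W @ H + EPS
    return float(np.sum(V * np.log((V + EPS) / WH) - V + WH))


def scatter_trace_and_mean(Hk):
    """tr(S_w^k) and the mean column U_k of a class coefficient block Hk (r0 x n0)."""
    U = Hk.mean(axis=1)
    deviations = Hk - U[:, None]
    # The trace of a sum of outer products equals the sum of squared deviation norms
    return float(np.sum(deviations ** 2)), U


def class_criterion(Vk, Wk, Hk, alpha, beta):
    """HYPOTHETICAL per-class criterion: D + alpha * tr(S_w^k) - beta * ||U_k||^2."""
    tr_sw, U = scatter_trace_and_mean(Hk)
    return kl_divergence(Vk, Wk, Hk) + alpha * tr_sw - beta * float(U @ U)


def total_criterion(Vs, Ws, Hs, alpha=1e-4, beta=1e-3):
    """Sum of the per-class criteria over all c classes."""
    return sum(class_criterion(V, W, H, alpha, beta) for V, W, H in zip(Vs, Ws, Hs))
```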

Convergence of proposed constraint NMF
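This subsection reports how to derive the iterative formulae (3.5)-(3.7) and discusses their convergence under the constraint NMF criterion (3.3). We use the auxiliary function technique: a function $G(Q, Q')$ is called an auxiliary function for $F(Q)$ if
$$G(Q, Q') \ge F(Q), \qquad G(Q, Q) = F(Q), \tag{3.10}$$
where $Q$ and $Q'$ are matrices of the same size.

Lemma 3.2. If $G$ is an auxiliary function for $F$, then $F$ is nonincreasing under the update
$$Q^{t+1} = \arg\min_{Q}\, G(Q, Q^{t}). \tag{3.11}$$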
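The standard argument behind Lemma 3.2 is a single chain of inequalities. We spell it out because the convergence claims for (3.5)-(3.7) rest on it:

```latex
F(Q^{t+1}) \;\le\; G(Q^{t+1}, Q^{t}) \;\le\; G(Q^{t}, Q^{t}) \;=\; F(Q^{t}).
```

The first inequality uses $G(Q, Q') \ge F(Q)$. The second holds because $Q^{t+1}$ minimizes $G(\cdot, Q^{t})$. The final equality uses $G(Q, Q) = F(Q)$.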
To obtain iterative rule (3.7) and prove its convergence, one first constructs an auxiliary function $G(H, H')$ for $F$ with $W$ fixed.

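Proof. It can be directly verified that $G(H, H) = F(H)$. It remains to show that $G(H, H') \ge F(H)$. To this end, we use the convex function $y = -\log x$. For all $i$, $j$, and any $\sigma_{ijl} \ge 0$ with $\sum_l \sigma_{ijl} = 1$, Jensen's inequality gives
$$-\log\sum_{l} W_{il}H_{lj} \le -\sum_{l}\sigma_{ijl}\log\frac{W_{il}H_{lj}}{\sigma_{ijl}}.$$
Substituting $\sigma_{ijl} = W_{il}H'_{lj}\big/\sum_{k} W_{ik}H'_{kj}$ into the above inequality, we have
$$-\log (WH)_{ij} \le -\sum_{l}\frac{W_{il}H'_{lj}}{(WH')_{ij}}\left(\log W_{il}H_{lj} - \log\frac{W_{il}H'_{lj}}{(WH')_{ij}}\right), \tag{3.15}$$
and hence $G(H, H') \ge F(H)$. From the above, the iterative formula (3.7) follows directly, and Lemma 3.2 shows that (3.7) converges to a local minimum. For update rules (3.5)-(3.6), the proof is similar to that of update rule (3.7), using an auxiliary function constructed in the same way with $H$ fixed.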
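For the divergence part of the criterion, the Jensen construction above can be checked numerically. The sketch below builds the auxiliary function $G(H, H')$ for $D(V\,\|\,WH)$ with $W$ fixed. It verifies $G(H, H') \ge F(H)$ and $G(H, H) = F(H)$. It then minimizes $G$ in closed form, which yields the familiar multiplicative $H$ update. It deliberately omits the $\alpha$ and $\beta$ constraint terms of the paper's criterion, and the names `F`, `G`, and `argmin_G` are ours.

```python
import numpy as np


def F(V, W, H):
    """D(V || WH) as a function of H with W fixed; all entries are assumed strictly positive."""
    WH = W @ H
    return float(np.sum(V * np.log(V / WH) - V + WH))


def G(V, W, H, Hp):
    """Jensen-based auxiliary function for F, built at the current estimate Hp (H' in the text)."""
    WHp = W @ Hp
    # sigma[i, j, l] = W_il * H'_lj / (W H')_ij; nonnegative and summing to 1 over l
    sigma = W[:, None, :] * Hp.T[None, :, :] / WHp[:, :, None]
    terms = W[:, None, :] * H.T[None, :, :]          # terms[i, j, l] = W_il * H_lj
    base = np.sum(V * np.log(V) - V + W @ H)          # the part of F that needs no bound
    return float(base - np.sum(V[:, :, None] * sigma * (np.log(terms) - np.log(sigma))))


def argmin_G(V, W, Hp):
    """Closed-form minimizer of G(., Hp): H_lj = H'_lj * sum_i W_il V_ij / (W H')_ij / sum_i W_il."""
    return Hp * (W.T @ (V / (W @ Hp))) / W.sum(axis=0)[:, None]


rng = np.random.default_rng(0)
V = rng.random((6, 5)) + 0.1
W = rng.random((6, 3)) + 0.1
H = rng.random((3, 5)) + 0.1
Hp = rng.random((3, 5)) + 0.1
H_next = argmin_G(V, W, Hp)
```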

Incremental learning
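From the above analysis, our incremental learning algorithm is designed as follows.

(i) Sample incremental learning. When a new training sample $x_0$ of the $i$th class is added to the training set, we denote $\tilde V_i = [V_i, x_0]$. Thus the training image matrix becomes
$$\tilde V = [V_1, \ldots, V_{i-1}, \tilde V_i, V_{i+1}, \ldots, V_c]. \tag{3.17}$$
In this case, CNMF only needs to be performed on the matrix $\tilde V_i$, that is, $\tilde V_i \approx \tilde W_i \tilde H_i$. The remaining decompositions $V_k \approx W_k H_k$ ($k \ne i$) need not be recomputed. So, sample incremental learning can be performed as follows:
$$\tilde V \approx [W_1, \ldots, \tilde W_i, \ldots, W_c]\,\operatorname{diag}(H_1, \ldots, \tilde H_i, \ldots, H_c).$$

(ii) Class incremental learning. When a new class, denoted by the matrix $V_{c+1}$, is added to the current training set, it forms a new training image matrix
$$\tilde V = [V_1, V_2, \ldots, V_c, V_{c+1}]. \tag{3.19}$$
As in item (i), none of the decompositions $V_k \approx W_k H_k$ ($k = 1, 2, \ldots, c$) needs to be computed again; we only need to perform CNMF on the new class, $V_{c+1} \approx W_{c+1} H_{c+1}$. Hence, class incremental learning can be implemented as below:
$$\tilde V \approx [W_1, \ldots, W_c, W_{c+1}]\,\operatorname{diag}(H_1, \ldots, H_c, H_{c+1}). \tag{3.20}$$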
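The bookkeeping of both incremental cases can be sketched as below. The CNMF update rules (3.5)-(3.7) are not reproduced in this text, so `factorize_block` substitutes plain divergence-based multiplicative updates (without the $\alpha$/$\beta$ constraint terms) as a stand-in. The class `IncrementalBlockNMF` and its method names are hypothetical. The point the sketch illustrates is structural:

- adding a sample refits only block $i$;
- adding a class fits only the new block;
- the global factors are reassembled as $W = [W_1, \ldots, W_c]$ and $H = \operatorname{diag}(H_1, \ldots, H_c)$.

```python
import numpy as np

EPS = 1e-12  # guards divisions


def factorize_block(Vi, r0, n_iter=200, seed=0):
    """Stand-in for CNMF on one class block: plain divergence-based multiplicative updates.

    The paper's alpha/beta constraint terms are NOT included here.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((Vi.shape[0], r0)) + 0.1
    H = rng.random((r0, Vi.shape[1])) + 0.1
    for _ in range(n_iter):
        W *= ((Vi / (W @ H + EPS)) @ H.T) / H.sum(axis=1)
        H *= (W.T @ (Vi / (W @ H + EPS))) / W.sum(axis=0)[:, None]
    return W, H


def block_diag(blocks):
    """Place the given 2-D blocks along the diagonal of a zero matrix."""
    out = np.zeros((sum(b.shape[0] for b in blocks), sum(b.shape[1] for b in blocks)))
    row = col = 0
    for b in blocks:
        out[row:row + b.shape[0], col:col + b.shape[1]] = b
        row += b.shape[0]
        col += b.shape[1]
    return out


class IncrementalBlockNMF:
    """Keeps one (V_i, W_i, H_i) triple per class; hypothetical helper for illustration."""

    def __init__(self, r0):
        self.r0 = r0
        self.V, self.W, self.H = [], [], []

    def add_class(self, V_new):
        """(ii) Class incremental learning: factorize only the new class block V_{c+1}."""
        W_new, H_new = factorize_block(V_new, self.r0)
        self.V.append(V_new)
        self.W.append(W_new)
        self.H.append(H_new)

    def add_sample(self, i, x0):
        """(i) Sample incremental learning: refit only block i after appending x0."""
        self.V[i] = np.hstack([self.V[i], x0.reshape(-1, 1)])
        self.W[i], self.H[i] = factorize_block(self.V[i], self.r0)

    def global_factors(self):
        """W = [W_1, ..., W_c] and H = diag(H_1, ..., H_c)."""
        return np.hstack(self.W), block_diag(self.H)
```

Initial (nonincremental) training is simply $c$ calls to `add_class`. The cost of an update therefore depends only on the size of the affected class block, not on the total number of training images.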

INMF algorithm design
Based on the above discussion, this subsection gives a detailed design of our INMF algorithm for face recognition. The algorithm involves two stages, namely a training stage and a recognition stage. Details are as follows.

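Training stage
Step 1. For each class $i = 1, 2, \ldots, c$, perform CNMF on the class matrix $V_i \in \mathbb{R}^{m\times n_0}$ using update rules (3.5)-(3.7):
$$V_i \approx W_i H_i, \quad i = 1, 2, \ldots, c. \tag{3.21}$$
Step 2. INMF is obtained as
$$V \approx WH, \tag{3.22}$$
where $r = cr_0$, $n = cn_0$, and
$$W = [W_1, W_2, \ldots, W_c] \in \mathbb{R}^{m\times r}, \qquad H = \operatorname{diag}(H_1, H_2, \ldots, H_c) \in \mathbb{R}^{r\times n}. \tag{3.23}$$
If a new training sample or class is added to the current training set, the incremental learning algorithm presented in the Incremental learning subsection is applied in this stage.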

Recognition stage
Step 3. Calculate the coordinates of a testing sample $v$ in the feature space spanned by the columns of $W$ by $h = W^{+}v$, where $W^{+}$ is the Moore-Penrose inverse of $W$.
Step 4. Compute the mean column vector $v_i$ of the training images of class $i$ and its coordinate vector $h_i = W^{+}v_i$ ($i = 1, 2, \ldots, c$). The testing image $v$ is classified to class $k$ if $d(h, h_k) = \min_{1\le i\le c} d(h, h_i)$, where $d(h, h_i)$ denotes the Euclidean distance between vectors $h$ and $h_i$.
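Steps 3 and 4 can be sketched with NumPy's `np.linalg.pinv` (Moore-Penrose inverse). The function name `classify` and the layout of `class_means` (one mean training image $v_i$ per column) are our assumptions. Class indices are 0-based here, whereas the text counts from 1.

```python
import numpy as np


def classify(v, W, class_means):
    """Nearest class mean in the coefficient space of W (Steps 3 and 4)."""
    W_pinv = np.linalg.pinv(W)                   # Moore-Penrose inverse W^+
    h = W_pinv @ v                               # Step 3: h = W^+ v
    H_means = W_pinv @ class_means               # Step 4: h_i = W^+ v_i, one column per class
    distances = np.linalg.norm(H_means - h[:, None], axis=0)   # Euclidean d(h, h_i)
    return int(np.argmin(distances))
```

In practice `W_pinv` and `H_means` would be computed once after training and reused for every test image. Only the projection `W_pinv @ v` depends on the test sample.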

Sparseness of coefficient matrix H
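For a nonzero vector $h \in \mathbb{R}^n$ ($n > 1$), define the sparseness function based on the $L_1$ and $L_2$ norms [7] by
$$f_{\text{sparse}}(h) = \frac{\sqrt{n} - \|h\|_1/\|h\|_2}{\sqrt{n} - 1}.$$
Since $\|h\|_2 \le \|h\|_1 \le \sqrt{n}\,\|h\|_2$, the sparseness function $f_{\text{sparse}}: \mathbb{R}^n \to \mathbb{R}$ has range $[0, 1]$. It equals 1 when $h$ has a single nonzero entry and 0 when all entries of $h$ have the same magnitude.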
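Below is a direct NumPy transcription of the sparseness function, with the two extreme cases as usage examples. The name `sparseness` is ours.

```python
import numpy as np


def sparseness(h):
    """f_sparse(h) = (sqrt(n) - ||h||_1 / ||h||_2) / (sqrt(n) - 1) for a nonzero h with n > 1 entries."""
    h = np.asarray(h, dtype=float)
    n = h.size
    ratio = np.abs(h).sum() / np.linalg.norm(h)   # lies in [1, sqrt(n)]
    return float((np.sqrt(n) - ratio) / (np.sqrt(n) - 1))


s_single = sparseness([0.0, 0.0, 3.0, 0.0])   # a single nonzero entry: maximally sparse
s_flat = sparseness(np.ones(5))               # all entries equal: not sparse at all
```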
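For the INMF method, we have the following theorem for each column $h_i$ of $H$.

Theorem 3.4. The sparseness of each column $h_i \in \mathbb{R}^{r}$ of $H$ in INMF satisfies
$$\frac{\sqrt{r} - \sqrt{r_0}}{\sqrt{r} - 1} \le f_{\text{sparse}}(h_i) \le 1,$$
where $h_i$ belongs to class $i$ in $H$.

Proof. Since $H = \operatorname{diag}(H_1, \ldots, H_c)$, each column $h_i$ has at most $r_0$ nonzero entries, all within the block of its class. Obviously, by the Cauchy-Schwarz inequality,
$$\|h_i\|_1 \le \sqrt{r_0}\,\|h_i\|_2. \tag{3.29}$$
It concludes for $r = cr_0$ that
$$f_{\text{sparse}}(h_i) \ge \frac{\sqrt{cr_0} - \sqrt{r_0}}{\sqrt{cr_0} - 1}. \tag{3.30}$$

In the experimental section, the parameters are selected as $r_0 = 4$ and $c = 120$ for INMF on the FERET database. It can be calculated that
$$0.9522 \le f_{\text{sparse}}(h_i) \le 1. \tag{3.31}$$
On the CMU PIE database, we select $r_0 = 4$ and $c = 68$ and calculate that
$$0.9355 \le f_{\text{sparse}}(h_i) \le 1. \tag{3.32}$$
These results demonstrate that each column of $H$ in INMF is highly sparse. Moreover, because columns from different classes have disjoint supports in the block-diagonal $H$, the coefficient column vectors of different classes are automatically orthogonal.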
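The two numerical bounds can be reproduced from the lower bound of Theorem 3.4. The bound can also be spot-checked on a random column whose nonzeros are confined to one block of $r_0$ coordinates, as the block-diagonal $H$ guarantees. The helper names are illustrative.

```python
import numpy as np


def sparseness(h):
    """Sparseness (sqrt(n) - ||h||_1/||h||_2) / (sqrt(n) - 1) of a nonzero vector h."""
    h = np.asarray(h, dtype=float)
    n = h.size
    return float((np.sqrt(n) - np.abs(h).sum() / np.linalg.norm(h)) / (np.sqrt(n) - 1))


def inmf_sparseness_lower_bound(r0, c):
    """Lower bound (sqrt(r) - sqrt(r0)) / (sqrt(r) - 1) with r = c * r0."""
    r = c * r0
    return float((np.sqrt(r) - np.sqrt(r0)) / (np.sqrt(r) - 1))


feret = inmf_sparseness_lower_bound(r0=4, c=120)   # about 0.9522
pie = inmf_sparseness_lower_bound(r0=4, c=68)      # about 0.9355

# Spot check: a column of H for class 7 of 120, nonzero only in its own block of r0 = 4 rows
rng = np.random.default_rng(0)
h = np.zeros(4 * 120)
h[7 * 4:(7 + 1) * 4] = rng.random(4) + 0.01
```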

Computational complexity
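This subsection discusses the computational complexity of the proposed INMF approach. For each class $i$, one iteration of INMF consists of two parts, namely the updates of $W_i$ and $H_i$. For each matrix $V_i$, the update of $W_i$ needs $mr_0(n_0r_0 + 2n_0 + 2)$ multiplications, while the update of $H_i$ needs $n_0r_0(mr_0 + 2m + 10)$ multiplications. Therefore, the total number of multiplications per iteration of INMF is
$$T_{\text{INMF}} = c\left[mr_0(n_0r_0 + 2n_0 + 2) + n_0r_0(mr_0 + 2m + 10)\right] = 2mnr_0^2 + 4mnr_0 + 2mr + 10nr_0. \tag{3.33}$$

Similarly, the number of multiplications per iteration of NMF is $T_{\text{NMF}} = 2mnr^2 + 4mnr + 2mr + 2nr$. For the same number of bases $r = cr_0$, the leading term $2mnr_0^2$ of $T_{\text{INMF}}$ is $c^2$ times smaller than the leading term $2mnr^2$ of $T_{\text{NMF}}$. The computational complexity of our INMF method is therefore much lower than that of NMF.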
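The two operation counts can be compared directly. The sketch below encodes the per-class counts stated above and the NMF count, and checks the closed form of (3.33). The function names and the FERET-like sizes in the usage lines are our illustrative choices: $m = 750$ for 30 × 25 wavelet faces, 3 training images per person, 120 people, $r_0 = 4$, and $r = cr_0$ for NMF.

```python
def t_nmf(m, n, r):
    """Multiplications per NMF iteration: 2mnr^2 + 4mnr + 2mr + 2nr."""
    return 2 * m * n * r ** 2 + 4 * m * n * r + 2 * m * r + 2 * n * r


def t_inmf(m, n0, r0, c):
    """Multiplications per INMF iteration: c blocks, each updating W_i and H_i."""
    per_block_w = m * r0 * (n0 * r0 + 2 * n0 + 2)
    per_block_h = n0 * r0 * (m * r0 + 2 * m + 10)
    return c * (per_block_w + per_block_h)


m, n0, r0, c = 750, 3, 4, 120
n, r = c * n0, c * r0
speedup = t_nmf(m, n, r) / t_inmf(m, n0, r0, c)   # per-iteration ratio; iteration counts may differ
```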

Experimental results
In this section, the FERET and CMU PIE databases are selected to evaluate the performance of our INMF method along with the BNMF, NMF, and PCA methods. All images in the two databases are aligned by the centers of the eyes and mouth and then normalized to a resolution of 112 × 92. These images are reduced to wavelet feature faces with resolution 30 × 25 after a two-level D4 wavelet decomposition. If a wavelet face has negative pixels, we transform it into a nonnegative face with a simple translation. A nearest neighbor classifier with Euclidean distance is used. In the following experiments, the parameters are set to $r = 120$ for NMF, $r_0 = 4$ for BNMF and INMF, and $\alpha = 10^{-4}$, $\beta = 10^{-3}$ for INMF. The iterative update is stopped by condition (4.1), where $F_n$ is the value of the criterion function defined in (3.3) after the $n$th update, and the threshold $\delta$ is set to $10^{-12}$. We stop the iteration if stopping condition (4.1) is met or if the number of iterations exceeds 1000.
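Two small pieces of this protocol can be sketched. The exact stopping condition (4.1) is not reproduced in this text, so `iterate_until_stop` uses a *hypothetical* relative-change test on the criterion, $|F_n - F_{n-1}| \le \delta\,|F_{n-1}|$. It uses $\delta = 10^{-12}$ and at most 1000 iterations, as stated above. Likewise, `translate_nonnegative` assumes the "simple translation" is a per-face shift by the most negative pixel; the paper does not spell this out.

```python
import numpy as np


def translate_nonnegative(X):
    """Shift each column (one wavelet face) so that its smallest pixel becomes 0 if it was negative.

    Assumption: one translation per face; faces without negative pixels are left unchanged.
    """
    shift = np.minimum(X.min(axis=0), 0.0)   # 0 for nonnegative faces, the negative minimum otherwise
    return X - shift


def iterate_until_stop(update, criterion, state, delta=1e-12, max_iter=1000):
    """Apply `update` until a HYPOTHETICAL relative-change rule holds or max_iter updates are done.

    Returns the final state and the number of updates performed.
    """
    f_prev = criterion(state)
    for n in range(1, max_iter + 1):
        state = update(state)
        f = criterion(state)
        if abs(f - f_prev) <= delta * abs(f_prev):   # assumed form of (4.1), not the paper's
            return state, n
        f_prev = f
    return state, max_iter
```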

Face databases
From the FERET database, we select 120 people with 6 images per individual. The six images are drawn from 4 different sets, namely Fa, Fb, Fc, and duplicate. Fa and Fb are sets of images taken with the same camera on the same day but with different facial expressions. Fc is a set of images taken with a different camera on the same day. Duplicate is a set of images taken around 6-12 months after the Fa and Fb photos. Details of the characteristics of each set can be found in [3]. Images of one individual are shown in Figure 1. The CMU PIE database includes 68 people in total. There are 13 pose variations, ranging from a full right profile to a full left profile, and 43 different lighting conditions (21 flashes with ambient light on or off). In our experiment, for each person we select 56 images, comprising 13 poses with neutral expression and 43 different lighting conditions in frontal view. Some images of one person are shown in Figure 2.

Basis face images
This section shows the basis images of the training set learnt by the PCA, NMF, BNMF, and INMF approaches. Figure 3 shows 25 basis images of each approach on the CMU PIE database. It can be seen that the bases of all methods except PCA are nonnegative, so faces are built from them additively. PCA extracts holistic facial features, while INMF learns more local features than NMF and BNMF. Moreover, the greater the number of basis images, the more localized the features learnt by all NMF-based approaches.

Results on FERET database
This section reports the experimental results of nonincremental and incremental learning on the FERET database. All methods use the same training and testing face images. The experiments are repeated 10 times, and the average accuracies for different numbers of training images, along with the mean running times, are recorded.

Nonincremental learning
We randomly select $n$ ($n = 2, 3, 4, 5$) images of each person for training and use the remaining $6 - n$ images of each individual for testing. The average accuracies for 2 to 5 training samples are recorded in Table 1 and plotted in Figure 4(a). With 2 training images, the recognition accuracies of INMF, BNMF, NMF, and PCA are 66.73%, 66.07%, 64.44%, and 34.33%, respectively. The performance of each method improves as the number of training images increases. With 5 training images, the recognition accuracies of INMF, BNMF, NMF, and PCA are 83.08%, 81.67%, 80.25%, and 37.58%, respectively. In addition, Table 2 compares the average running times of the three NMF-based approaches. It can be seen that our INMF method gives the best performance in all cases of nonincremental learning on the FERET database.
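The random per-person split can be sketched as follows. `split_per_class` is an illustrative helper, and calling it with 10 different seeds mirrors the 10 repetitions described above.

```python
import numpy as np


def split_per_class(labels, n_train, rng):
    """Pick n_train random images per person for training; that person's other images go to testing."""
    labels = np.asarray(labels)
    train, test = [], []
    for person in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == person))
        train.extend(idx[:n_train].tolist())
        test.extend(idx[n_train:].tolist())
    return np.array(train), np.array(test)


labels = np.repeat(np.arange(120), 6)          # 120 people, 6 images each (FERET setting)
rng = np.random.default_rng(0)
train_idx, test_idx = split_per_class(labels, n_train=3, rng=rng)
```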

Class incremental learning
For 119 people, we randomly select 3 images of each individual for training and then add a new class to the training set. NMF must repeat the whole learning process, while BNMF and INMF only need incremental training on the newly added class. The average accuracies and the mean running times are recorded in Table 3 (plotted in Figure 6(a)) and Table 4, respectively. Compared with the NMF and BNMF approaches, the proposed method improves accuracy by around 5% and 1.5%, respectively. INMF runs around 2 times faster than NMF when training on 119 individuals, and around 219 times faster when class-incremental learning extends the training set to 120 individuals. Overall, our INMF gives the best performance on the FERET database.

Results on CMU PIE database
The experimental setting on the CMU PIE database is similar to that on the FERET database. It also includes two parts, namely nonincremental and incremental learning. The experiments are repeated 10 times, and the average accuracies for different numbers of training images, along with the mean running times, are recorded for comparison. Details are as follows.

Figure 1: Images of one person from FERET database.

Figure 2: Part images of one person from CMU PIE database.

Figure 3: Comparisons of basis images of PCA, NMF, BNMF, and INMF (from left to right) on the CMU PIE database.

Figure 4: Accuracy comparisons on (a) FERET and (b) CMU PIE databases.
This concludes the theorem immediately. Obviously, the function $G(H, H') = \sum_{k} G_k(H_k, H_k')$ is also an auxiliary function for the entire constraint NMF criterion $F(H) = \sum_{k} F_k(H_k)$. Lemma 3.2 indicates that $F(H)$ is nonincreasing under update rule (3.11). Let $\partial G(H, H')/\partial H = 0$.