Face Recognition Using Double Sparse Local Fisher Discriminant Analysis

Local Fisher discriminant analysis (LFDA) was proposed for dealing with the multimodal problem. It combines the idea of locality preserving projections (LPP), which preserves the local structure of high-dimensional data, with the idea of Fisher discriminant analysis (FDA), which provides discriminant power. However, like many dimensionality reduction methods, LFDA suffers from the undersampled problem, and its projection matrix is not sparse. In this paper, we propose double sparse local Fisher discriminant analysis (DSLFDA) for face recognition. The proposed method first constructs a sparse, data-adaptive graph under a nonnegativity constraint. Then, DSLFDA reformulates the objective function as a regression-type optimization problem. The undersampled problem is thus avoided naturally, and a sparse solution can be obtained by adding an ℓ1 penalty to the regression-type problem. Experiments on the Yale, ORL, and CMU PIE face databases demonstrate the effectiveness of the proposed method.


Introduction
Dimensionality reduction transforms high-dimensional data into a lower-dimensional space while preserving as much useful information as possible. It has a wide range of applications in pattern recognition, machine learning, and computer vision. A well-known approach for supervised dimensionality reduction is linear discriminant analysis (LDA) [1]. It finds a projection by maximizing the between-class distance and minimizing the within-class distance simultaneously. In practical applications, LDA suffers from several limitations. First, LDA usually suffers from the undersampled problem [2]; that is, the dimension of the data is larger than the number of training samples. Second, LDA can only uncover the global Euclidean structure. Third, the solution of LDA is not sparse, which prevents a physical interpretation.
Many methods have been proposed to deal with the first problem. Belhumeur et al. [3] proposed a two-stage principal component analysis (PCA) [4] + LDA method, which uses PCA to reduce the dimensionality so that the within-class scatter matrix becomes nonsingular, followed by LDA for recognition; however, some useful information may be lost in the PCA stage. Chen et al. [5] extracted the most discriminant information from the null space of the within-class scatter matrix; however, the discriminant information in the non-null space of the within-class scatter matrix is discarded. Huang et al. [6] proposed an efficient null-space approach, which first removes the null space of the total scatter matrix. This method is based on the observation that the null space of the total scatter matrix is the intersection of the null spaces of the between-class and within-class scatter matrices. Qin et al. [7] proposed a generalized null-space uncorrelated Fisher discriminant analysis technique that integrates uncorrelated discriminant analysis and a weighted pairwise Fisher criterion to solve the undersampled problem. Yu and Yang [8] proposed direct LDA (DLDA), which removes the null space of the between-class scatter matrix and extracts the discriminant information corresponding to the smallest eigenvalues of the within-class scatter matrix. Zhang et al. [9] proposed an exponential discriminant analysis (EDA) method to extract the most discriminant information contained in the null space of the within-class scatter matrix.
To deal with the second problem, many methods that focus on the local structure of the original data space have been developed. Locality preserving projections (LPP) [10] finds an embedding subspace that preserves local information. One limitation of LPP is that it is unsupervised. Because discriminant information is important for classification tasks, several locality-preserving discriminant methods have been proposed. Discriminant locality preserving projection (DLPP) [11] was proposed to improve the performance of LPP. Laplacian linear discriminant analysis (LapLDA) [12] captures the global and local structure of the data simultaneously by integrating LDA with a locality-preserving regularizer. Local Fisher discriminant analysis (LFDA) [13] was proposed to deal with the multimodal problem; it combines the ideas of Fisher discriminant analysis (FDA) [1] and LPP, maximizing between-class separability while preserving the within-class local structure. In LDA, the dimension of the embedding space must be less than the number of classes; LFDA removes this limitation.
To deal with the third problem, many dimensionality reduction methods that integrate sparse representation theory have been proposed. These methods fall into two categories. The first category focuses on finding a subspace spanned by sparse vectors; the sparse projection vectors reveal which elements or regions of the patterns are important for recognition tasks. Sparse PCA (SPCA) [14] uses least angle regression and the elastic net to produce sparse principal components. Sparse discriminant analysis (SDA) [15] and sparse linear discriminant analysis (SLDA) [16] learn a sparse discriminant subspace for feature extraction and classification in biological and medical data analysis. Both methods transform the original objective into a regression-type problem and add a lasso penalty to obtain sparse projection axes. One disadvantage of these methods is that the number of sparse vectors is at most c − 1, where c is the number of classes. The second category focuses on the sparse reconstructive weights among the training samples. The graph embedding framework views many dimensionality reduction methods as graph constructions [17]. The k-nearest-neighbor and ε-ball based methods are two popular ways of constructing graphs. Instead of these, Cheng et al. built the ℓ1-graph based on sparse representation [18]; the ℓ1-graph has proved efficient and robust to data noise. ℓ1-graph based subspace learning methods include sparsity preserving projections (SPP) [19] and discriminant sparse neighborhood preserving embedding (DSNPE) [20].
Motivated by the ℓ1-graph and sparse subspace learning, in this paper we propose double sparse local Fisher discriminant analysis (DSLFDA) for the multimodal problem. It measures similarity on the graph by integrating sparse representation with a nonnegativity constraint. To obtain sparse projection vectors, the objective function is transformed into a regression-type problem; furthermore, the space spanned by the solution of the regression-type problem is identical to that spanned by the solution of the original problem. The proposed DSLFDA has two advantages: (1) it retains the sparsity of the ℓ1-graph; (2) the label information is used in the definition of the local scatter matrices, which enhances the discriminant power. Meanwhile, the projection vectors are sparse, which makes the physical meaning of the patterns clear. The proposed method is applied to face recognition and is evaluated on the Yale, ORL, and CMU PIE face databases. Experimental results show that it enhances the performance of LFDA effectively.
The rest of this paper is organized as follows. Section 2 reviews the LFDA algorithm. The double sparse local Fisher discriminant analysis algorithm is proposed in Section 3. In Section 4, experiments are conducted to evaluate the proposed algorithm. Conclusions are given in Section 5.

Related Work
In this section, we give a brief review of LDA and LFDA. Given a data set X = [x_1, x_2, ..., x_n] ∈ R^{d×n} with each column corresponding to a data sample x_i ∈ R^d (1 ≤ i ≤ n), the class label of x_i is y_i ∈ {1, 2, ..., c}, where c is the number of classes. We denote by n_l the number of samples in the l-th class. Dimensionality reduction maps each point x_i ∈ R^d to z_i ∈ R^r (r ≪ d) by the linear transformation z_i = W^T x_i. In matrix form, Z = W^T X, where W = [w_1, w_2, ..., w_r] ∈ R^{d×r}.
2.1. Linear Discriminant Analysis. Linear discriminant analysis seeks discriminant vectors by the Fisher criterion; that is, the within-class distance is minimized and the between-class distance is maximized simultaneously. The within-class scatter matrix S_w and between-class scatter matrix S_b are, respectively, defined as

S_w = Σ_{l=1}^{c} Σ_{x_i ∈ X_l} (x_i − μ_l)(x_i − μ_l)^T,
S_b = Σ_{l=1}^{c} n_l (μ_l − μ)(μ_l − μ)^T,

where X_l is the data set of class l, μ_l is the mean of the samples in class l, and μ is the mean of the total data. LDA seeks the optimal projection matrix by maximizing the Fisher criterion

J(W) = tr((W^T S_w W)^{-1} (W^T S_b W)).

This optimization is equivalent to solving the generalized eigenvalue problem S_b w = λ S_w w; W consists of the eigenvectors of S_w^{-1} S_b corresponding to the r largest eigenvalues.
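As a concrete illustration of the criterion above, the following sketch computes the two scatter matrices and solves the generalized eigenvalue problem with SciPy (function and variable names are our own, not taken from the paper):

```python
import numpy as np
from scipy.linalg import eigh

def lda(X, y, r):
    """Classical LDA sketch: X is d x n, y holds class labels, r is the
    target dimension.  Assumes S_w is nonsingular (no undersampling)."""
    d, n = X.shape
    mu = X.mean(axis=1, keepdims=True)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[:, y == c]                                 # samples of class c
        mu_c = Xc.mean(axis=1, keepdims=True)
        Sw += (Xc - mu_c) @ (Xc - mu_c).T                 # within-class scatter
        Sb += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T   # between-class scatter
    # generalized eigenproblem S_b w = lambda S_w w; eigh returns ascending order
    vals, vecs = eigh(Sb, Sw)
    return vecs[:, ::-1][:, :r]                           # top-r eigenvectors
```

A usage note: `eigh(Sb, Sw)` requires `Sw` to be positive definite, which is exactly the undersampled-problem issue discussed in the introduction.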

2.2. Local Fisher Discriminant Analysis.
Local Fisher discriminant analysis (LFDA) is also a discriminant analysis method; it aims to deal with the multimodal problem. The local within-class scatter matrix S^(lw) and the local between-class scatter matrix S^(lb) are defined as

S^(lw) = (1/2) Σ_{i,j=1}^{n} W^(lw)_{ij} (x_i − x_j)(x_i − x_j)^T,
S^(lb) = (1/2) Σ_{i,j=1}^{n} W^(lb)_{ij} (x_i − x_j)(x_i − x_j)^T,

where

W^(lw)_{ij} = A_{ij}/n_l if y_i = y_j = l, and 0 if y_i ≠ y_j,
W^(lb)_{ij} = A_{ij}(1/n − 1/n_l) if y_i = y_j = l, and 1/n if y_i ≠ y_j,

and A is the affinity matrix with A_{ij} = exp(−‖x_i − x_j‖²/(σ_i σ_j)). Here σ_i is the local scaling of x_i defined by σ_i = ‖x_i − x_i^(k)‖, where x_i^(k) is the k-th nearest neighbor of x_i.
The objective function of LFDA is formulated as

max_W tr((W^T S^(lw) W)^{-1} (W^T S^(lb) W)),

where tr(⋅) is the trace of a matrix. The projection matrix can be obtained from the eigenvectors of the generalized eigenvalue problem S^(lb) w = λ S^(lw) w. Because of the definition of the affinity matrix A, LFDA can effectively preserve the local structure of the data.
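The local-scaling affinity and the two weight matrices can be sketched as follows. This is a naive O(n²) illustration with our own names; k = 7 is the common default for local scaling rather than a value fixed by this paper:

```python
import numpy as np

def lfda_weights(X, y, k=7):
    """Build the local-scaling affinity A and the LFDA within/between
    weight matrices W_lw, W_lb for data X (d x n) with labels y."""
    n = X.shape[1]
    D = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)  # pairwise distances
    # local scale sigma_i = distance from x_i to its k-th nearest neighbor
    sigma = np.sort(D, axis=1)[:, min(k, n - 1)]
    A = np.exp(-D**2 / (sigma[:, None] * sigma[None, :]))
    W_lw = np.zeros((n, n))
    W_lb = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if y[i] == y[j]:
                nc = np.sum(y == y[i])                 # class size n_l
                W_lw[i, j] = A[i, j] / nc
                W_lb[i, j] = A[i, j] * (1.0 / n - 1.0 / nc)
            else:
                W_lb[i, j] = 1.0 / n                   # different classes
    return W_lw, W_lb
```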

Double Sparse Local Fisher Discriminant Analysis
3.1. Constructing the Sparse Graph. For each sample x_i, DSLFDA computes a sparse, nonnegative reconstruction weight vector by solving the ℓ1-minimization problem

min_{s_i} 1^T s_i  subject to  x_i = X s_i, s_i ≥ 0,   (10)

where s_i = [s_{i1}, ..., s_{i,i−1}, 0, s_{i,i+1}, ..., s_{in}]^T is an n-dimensional vector in which the i-th element is equal to zero, and 1 ∈ R^n is a vector of all ones (under the nonnegativity constraint, 1^T s_i = ‖s_i‖_1). The ℓ1-minimization problem (10) can be solved by many efficient numerical algorithms; in this paper, the LARS algorithm [21] is used. Collecting the weight vectors into a matrix S, a symmetric similarity measurement is obtained by setting S = (S + S^T)/2. The new local scatter matrices are then defined as

S^(lw) = (1/2) Σ_{i,j=1}^{n} W^(lw)_{ij} (x_i − x_j)(x_i − x_j)^T,   (11)
S^(lb) = (1/2) Σ_{i,j=1}^{n} W^(lb)_{ij} (x_i − x_j)(x_i − x_j)^T,   (12)

where the weight matrices W^(lw) and W^(lb) are defined as in LFDA, with the sparse similarity S in place of the Gaussian affinity A:

W^(lw)_{ij} = S_{ij}/n_l if y_i = y_j = l, and 0 otherwise;
W^(lb)_{ij} = S_{ij}(1/n − 1/n_l) if y_i = y_j = l, and 1/n otherwise.

The final objective function is

max_W tr((W^T S^(lw) W)^{-1} (W^T S^(lb) W)),

and the optimal projection W can be obtained by solving the generalized eigenvalue problem

S^(lb) w = λ S^(lw) w.   (15)

When the matrix S^(lw) is nonsingular, the eigenvectors are obtained by the eigendecomposition of (S^(lw))^{-1} S^(lb). However, the projection matrix is not sparse.
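Because the weights are constrained to be nonnegative, ‖s_i‖₁ reduces to 1ᵀs_i, so each column of the graph can be found with a small linear program. The sketch below uses an LP solver purely for illustration, whereas the paper solves problem (10) with LARS [21]; all names are our own:

```python
import numpy as np
from scipy.optimize import linprog

def sparse_nonneg_weights(X):
    """For each x_i, minimise 1^T s subject to x_i = X s, s >= 0, s_i = 0,
    then symmetrise the resulting weight matrix as DSLFDA does."""
    d, n = X.shape
    S = np.zeros((n, n))
    for i in range(n):
        bounds = [(0, None)] * n
        bounds[i] = (0, 0)            # force s_i = 0: no self-reconstruction
        res = linprog(np.ones(n), A_eq=X, b_eq=X[:, i],
                      bounds=bounds, method="highs")
        if res.success:               # infeasible columns stay all-zero
            S[:, i] = res.x
    return 0.5 * (S + S.T)            # symmetrised similarity measurement
```

Note that the exact equality constraint can be infeasible for real, noisy data; a practical implementation would relax it (e.g., penalise the residual), which is one reason LARS-style solvers are used instead.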
Mathematical Problems in Engineering

3.2. Finding the Sparse Solution. We first reformulate formulas (11) and (12) in matrix form. Consider

S^(lb) = X L^(lb) X^T,

where L^(lb) = D^(lb) − W^(lb) and D^(lb) is the diagonal matrix whose i-th diagonal element is Σ_j W^(lb)_{ij}. Similarly, formula (12) can be expressed as

S^(lw) = X L^(lw) X^T,

where L^(lw) = D^(lw) − W^(lw) and D^(lw) is the diagonal matrix whose i-th diagonal element is Σ_j W^(lw)_{ij}. The matrices L^(lb) and L^(lw) are always symmetric and positive semidefinite; therefore, their eigendecompositions can be expressed as

L^(lb) = U_b Σ_b U_b^T,  L^(lw) = U_w Σ_w U_w^T,

where Σ_b and Σ_w are diagonal matrices whose diagonal elements are the eigenvalues of L^(lb) and L^(lw), respectively. So S^(lb) and S^(lw) can be rewritten as

S^(lb) = Ψ_b Ψ_b^T,  S^(lw) = Ψ_w Ψ_w^T,

where Ψ_b = X U_b Σ_b^{1/2} and Ψ_w = X U_w Σ_w^{1/2}. The following result, inspired by [14, 16], gives the relationship between problem (15) and a regression-type problem.
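The matrix reformulation above rests on the standard graph-Laplacian identity (1/2) Σ_ij W_ij (x_i − x_j)(x_i − x_j)ᵀ = X (D − W) Xᵀ, which the following snippet verifies numerically on random data:

```python
import numpy as np

# Numerical check of the Laplacian identity used in the matrix reformulation.
rng = np.random.default_rng(1)
X = rng.normal(size=(3, 6))                        # toy data, d = 3, n = 6
W = rng.uniform(size=(6, 6))
W = 0.5 * (W + W.T)                                # symmetric weight matrix
D = np.diag(W.sum(axis=1))                         # degree matrix
L = D - W                                          # graph Laplacian
S_pair = sum(0.5 * W[i, j] * np.outer(X[:, i] - X[:, j], X[:, i] - X[:, j])
             for i in range(6) for j in range(6))  # pairwise-sum form
S_lap = X @ L @ X.T                                # Laplacian form
assert np.allclose(S_pair, S_lap)
```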
Theorem 1. Suppose that S^(lw) is positive definite; its Cholesky decomposition can be expressed as S^(lw) = R_w R_w^T, where R_w ∈ R^{d×d} is a lower triangular matrix. Let V = [v_1, v_2, ..., v_r] be the eigenvectors of problem (15), and for λ > 0 let

(Â, B̂) = arg min_{A,B} Σ_j ‖R_w^{-1} Ψ_b(:, j) − A B^T Ψ_b(:, j)‖² + λ Σ_{k=1}^{r} β_k^T S^(lw) β_k  subject to  A^T A = I,   (20)

where β_k is the k-th column of B and Ψ_b(:, j) is the j-th column of Ψ_b. Then the columns of B̂ span the same linear space as those of V.
To obtain sparse projection vectors, we add an ℓ1 penalty to the objective function (20):

(Â, B̂) = arg min_{A,B} Σ_j ‖R_w^{-1} Ψ_b(:, j) − A B^T Ψ_b(:, j)‖² + λ Σ_{k=1}^{r} β_k^T S^(lw) β_k + Σ_{k=1}^{r} λ_{1,k} ‖β_k‖_1  subject to  A^T A = I.   (21)

Generally speaking, it is difficult to compute the optimal A and B simultaneously, so an iterative algorithm is usually used for solving problem (21). For a fixed A, there exists an orthogonal matrix P such that [A, P] is a column-orthogonal matrix; the first term of (21) then decouples, and each column β_k of B can be found by solving a lasso-type problem. For a fixed B, minimizing the first term of (21) subject to A^T A = I is a reduced-rank Procrustes problem: the optimal solution is obtained by computing a singular value decomposition U D V^T and setting A = U V^T. The algorithm procedure of DSLFDA is summarized as follows.
Input: the data matrix X.
Output: the sparse projection matrix B.
(3) Initialize the matrix A as an arbitrary column-orthogonal matrix.
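The two alternating updates described above can be sketched as follows. Soft-thresholding stands in for the LARS-based ℓ1 step, and both helper names are our own, not the paper's:

```python
import numpy as np

def procrustes_A(M):
    """For fixed B, the optimal column-orthogonal A solves a reduced-rank
    Procrustes problem: if M = U D V^T (thin SVD), then A = U V^T."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt

def soft_threshold(Z, lam):
    """Elementwise soft-thresholding, the proximal step of the l1 penalty
    used to sparsify the columns of B (a simplification of LARS)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - lam, 0.0)
```

In a full implementation these two steps would be iterated until the columns of B stop changing, then each column normalized to unit length.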

Experimental Results
In this section, we use the proposed DSLFDA method for face recognition. Three face image databases, namely, Yale [22], ORL [23], and PIE [24], are used in the experiments. We compare the proposed algorithm with PCA, LDA, LPP, LFDA, SPCA, SPP, DSNPE, and SLDA. For simplicity, the nearest neighbor classifier with the Euclidean distance is used for classification. In our experiments, the face region of each original image was cropped based on the location of the eyes, and each cropped image was resized to 32 × 32 pixels. Figure 1 shows the cropped sample images of two individuals from the Yale database.
In the first experiment, we randomly select l (l = 2, 3, 4, 5, 6) images per subject for training and use the remaining images for testing. Ten runs were performed for stable performance, and the average rates are reported as the final recognition accuracies. For LFDA, the parameter is set to − 1 for simplicity. LPP is implemented in supervised mode. For SPCA, we manually choose the sparse principal components to obtain the best performance. Table 1 shows the recognition accuracies of the different methods with the corresponding dimensions.
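The protocol of random per-subject splits, 1-NN classification with the Euclidean distance, and averaging over 10 runs can be sketched as below. The `transform` argument is a placeholder for any learned projection matrix, not a method from the paper:

```python
import numpy as np

def evaluate(X, y, transform, n_train, n_runs=10, seed=0):
    """Average 1-NN accuracy over random splits: pick n_train samples per
    class for training, project with `transform` (d x r), test on the rest."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_runs):
        tr = np.zeros(len(y), dtype=bool)
        for c in np.unique(y):
            idx = np.flatnonzero(y == c)
            tr[rng.choice(idx, size=n_train, replace=False)] = True
        Ztr = transform.T @ X[:, tr]
        Zte = transform.T @ X[:, ~tr]
        # squared Euclidean distances: rows = test points, cols = train points
        d2 = ((Zte[:, :, None] - Ztr[:, None, :]) ** 2).sum(axis=0)
        pred = y[tr][np.argmin(d2, axis=1)]        # 1-NN label
        accs.append(np.mean(pred == y[~tr]))
    return float(np.mean(accs))
```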
In the second experiment, we vary the dimensionality of the projected space. Five images per individual were randomly selected for training, and the remaining images were used for testing. Figure 2 shows the performance of the different methods. The original size of the images is 243 × 320 pixels; the images were manually cropped and resized to 32 × 32 pixels. Figure 3 shows the cropped sample images of two individuals from the ORL database.

Experiment on the ORL Face Database
In the first experiment, we randomly select l (l = 2, 3, 4, 5, 6) images per subject for training and use the remaining images for testing. Ten runs were performed for stable performance, and the average rates are reported as the final recognition accuracies. The experimental parameters were set as in Section 4.1.
In the second experiment, we vary the dimensionality of the projected space. Five images per individual were randomly selected for training, and the remaining images were used for testing. Figure 4 shows the performance of the different methods (Table 2).

Experiment on the PIE Face Database.
The CMU PIE face database contains 41368 images of 68 individuals. The images were captured under 13 different poses, under 43 different illumination conditions, and with 4 different expressions. In our experiments, we choose a subset (C29) that contains 1632 images of 68 individuals. The images were manually cropped and resized to 32 × 32 pixels. Figure 5 shows the cropped sample images of two individuals from the CMU PIE database. In the first experiment, we randomly select l (l = 3, 6, 9, 12, 15) images per subject for training and use the remaining images for testing. Ten runs were performed for stable performance, and the average rates are reported as the final recognition accuracies. The experimental parameters were set as in Section 4.1. Table 3 shows the recognition accuracies of the different methods with the corresponding dimensions.
In the second experiment, we vary the dimensionality of the projected space. Fifteen images per individual were randomly selected for training, and the remaining images were used for testing. Figure 6 shows the performance of the different methods. Unlike LDA, DSLFDA can project the data into a low-dimensional subspace whose dimensionality is larger than the number of classes.

Discussion and Conclusion
(3) From Table 3, LPP and SLDA outperform LFDA on the CMU PIE database; however, DSLFDA achieves better performance than the other methods. This shows that DSLFDA improves not only on LFDA but also on sparsity-based methods such as SLDA. The proposed DSLFDA algorithm constructs the graph on the original data and obtains a nonnegative similarity measurement, which is different from SPP and DSNPE.
(4) From the experimental results, we observe that SPP achieves competitive performance on the CMU PIE database, but not on the ORL and Yale databases. The reason may be that sparse representation needs abundant training samples. Conversely, the nonnegative similarity measurement in DSLFDA is data-adaptive and can overcome this drawback of sparse representation.
(5) DSNPE can be regarded as an extension of SPP. It extracts discriminant information and performs better than SPP. On the Yale database, DSNPE achieves the best recognition performance when the number of training samples per individual is four or five.

Conclusion.
In this paper, we proposed a sparse projection method, called DSLFDA, for face recognition. It defines a novel affinity matrix that describes the relationships of the points in the original high-dimensional data. The sparse projection vectors are obtained by solving an ℓ1-optimization problem. Experiments on the Yale, ORL, and CMU PIE face databases indicate that DSLFDA achieves competitive performance compared with other dimensionality reduction methods. We focus only on supervised learning in this paper. Because a large amount of unlabeled data is available in practical applications, semisupervised learning has attracted much attention in recent years [25-27]; one of our future works is to extend our approach to the semisupervised learning framework. On the other hand, DSLFDA requires the local within-class scatter matrix to be positive definite, and we add an identity matrix to it for regularization. This motivates us to seek a regularization method that approximates the local within-class scatter matrix well.

Figure 1: Sample images of two individuals from the Yale database.

Figure 2: The recognition performance versus different dimensions on the Yale database.
The ORL database contains 400 images of 40 individuals; each individual has 10 images. The images were captured at different times, under various lighting conditions, and with different facial expressions.

Figure 3: Sample images of two individuals from the ORL database.

Figure 4: The recognition performance versus different dimensions on the ORL database.

Figure 5: Sample images of two individuals from the CMU PIE database.

Figure 6: The recognition performance versus different dimensions on the CMU PIE database.

Table 1: The top recognition rates (%) and the corresponding dimensions on the Yale database by different methods (mean ± std).

Table 2: The top recognition rates (%) and the corresponding dimensions on the ORL database by different methods (mean ± std).

Table 3: The top recognition rates (%) and the corresponding dimensions on the CMU PIE database by different methods (mean ± std).