Direct Neighborhood Discriminant Analysis for Face Recognition

Face recognition is a challenging problem in computer vision and pattern recognition. Recently, many techniques based on local geometrical structure have been presented to obtain low-dimensional representations of face images with enhanced discriminatory power. However, these methods suffer from the small sample size (SSS) problem or from the high computational complexity of high-dimensional data. To overcome these problems, we propose a novel local manifold structure learning method for face recognition, named direct neighborhood discriminant analysis (DNDA), which separates nearby samples of different classes and preserves the local within-class geometry in two successive steps. In addition, DNDA does not need PCA preprocessing to reduce the dimension substantially, which avoids the loss of discriminative information. Experiments conducted on the ORL, Yale, and UMIST face databases show the effectiveness of the proposed method.


Introduction
Many pattern recognition and data mining problems involve data in very high-dimensional spaces. In the past few decades, face recognition (FR) has become one of the most active topics in machine vision and pattern recognition, where the feature dimension of the data is usually very large and hard to handle directly. To obtain a high recognition rate for FR, numerous feature extraction and dimension reduction methods have been proposed to find a low-dimensional feature representation with enhanced discriminatory power. Among these methods, two state-of-the-art FR methods, principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2], have proved to be useful tools for dimensionality reduction and feature extraction.
LDA is a popular supervised feature extraction technique for pattern recognition, which seeks a set of projective directions that maximize the between-class scatter matrix $S_b$ and minimize the within-class scatter matrix $S_w$ simultaneously. Although successful in many cases, many LDA-based algorithms suffer from the so-called "small sample size" (SSS) problem, which arises when the number of available samples is much smaller than the dimensionality of the samples and is particularly problematic in FR applications. To solve this problem, many extensions of LDA have been developed. Generally, the approaches that address the SSS problem can be divided into three categories, namely, Fisherface methods, regularization methods, and subspace methods. Fisherface methods incorporate PCA into the LDA framework as a preprocessing step; LDA is then performed in the lower-dimensional PCA subspace [2], where the within-class scatter matrix is no longer singular. Regularization methods [3, 4] add a scaled identity matrix to the scatter matrix so that the perturbed scatter matrix becomes nonsingular. However, Chen et al. [5] proved that the null space of $S_w$ contains the most discriminative information when the SSS problem occurs, and proposed the null space LDA (NLDA) method, which extracts only the discriminant features present in the null space of $S_w$. Later, Yu and Yang [6] utilized the discriminatory information of both $S_b$ and $S_w$ and proposed the direct LDA (DLDA) method to solve the SSS problem.
Recently, the search for manifold structure in high-dimensional data has driven the wide application of manifold learning in data mining and machine learning. Among these methods, Isomap [7], LLE [8], and Laplacian eigenmaps [9, 10] are representative techniques. Based on the locality preserving concept, some excellent local embedding analysis techniques have been proposed to find the manifold structure from local nearby data [11, 12]. However, these methods are designed to preserve the local geometrical structure of the original high-dimensional data in the lower-dimensional space rather than to provide good discrimination ability. To obtain better classification, some supervised learning techniques have been proposed that incorporate discriminant information into the locality preserving learning techniques [13-15]. Moreover, Yan et al. [15] explain the manifold learning techniques and the traditional dimensionality reduction methods within a unified framework that can be defined in a graph embedding way instead of a kernel view [16]. However, the SSS problem still exists in the graph embedding-based discriminant techniques. To deal with this problem, PCA is usually performed as a preprocessing step to reduce the dimension [11, 15]. In this paper, we present a two-stage feature extraction technique named direct neighborhood discriminant analysis (DNDA). Compared to other geometrical structure learning work, our method does not require the PCA step. Thus, more discriminant information can be kept for FR, and as a result improved performance is expected. The rest of the paper is structured as follows: we give a brief review of LDA and DLDA in Section 2. We then introduce in Section 3 the proposed method for dimensionality reduction and feature extraction in FR. The effectiveness of our method is evaluated in a set of FR experiments in Section 4. Finally, we give concluding remarks in Section 5.

LDA
LDA is a very popular technique for linear feature extraction and dimensionality reduction [2], which chooses the basis vectors of the transformed space as those directions of the original space that maximize the ratio of the between-class scatter to the within-class scatter. Formally, the goal of LDA is to seek the optimal projection matrix $W$ that maximizes the following quotient, the Fisher criterion:
$$W^{*} = \arg\max_{W} \frac{\left| W^{T} S_b W \right|}{\left| W^{T} S_w W \right|},$$
where $S_b$ is the between-class scatter matrix and $S_w$ is the within-class scatter matrix. $W$ can be formed by the set of generalized eigenvectors $w_i$ of the following eigenanalysis problem:
$$S_b w_i = \lambda_i S_w w_i.$$
When the inverse of $S_w$ exists, the generalized eigenvectors can be obtained by eigenvalue decomposition of $S_w^{-1} S_b$. However, in FR one usually confronts the difficulty that the within-class scatter matrix $S_w$ is singular (the SSS problem). The so-called PCA plus LDA approach [2] is a very popular technique intended to overcome this difficulty.
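The generalized eigenproblem above can be illustrated numerically. The following is a minimal NumPy sketch, assuming $S_w$ is invertible (no SSS problem); the function name `lda_directions` and the synthetic three-class data are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def lda_directions(X, labels, n_components):
    """Fisher LDA for X of shape (d, N), one sample per column.

    Assumes S_w is invertible, i.e. the SSS problem does not occur.
    """
    d = X.shape[0]
    mean_all = X.mean(axis=1, keepdims=True)
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mean_c = Xc.mean(axis=1, keepdims=True)
        diff = mean_c - mean_all
        S_b += Xc.shape[1] * (diff @ diff.T)      # between-class scatter
        S_w += (Xc - mean_c) @ (Xc - mean_c).T    # within-class scatter
    # Eigenvectors of S_w^{-1} S_b solve S_b w = lambda S_w w.
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(-evals.real)[:n_components]
    return evecs[:, order].real, evals[order].real, S_b, S_w

# Synthetic data (an assumption for illustration): 3 classes in 2 dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 4.0, 0.0], [0.0, 0.0, 4.0]])
labels = np.repeat(np.arange(3), 50)
X = centers[:, labels] + rng.normal(size=(2, 150))
W, evals, S_b, S_w = lda_directions(X, labels, n_components=2)
```

With far more samples than dimensions, as here, $S_w$ is well conditioned; for face images with $d \gg N$ the call to `np.linalg.solve` would fail or be unstable, which is the situation the methods reviewed in this paper address.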

DLDA
To take the discriminant information of both $S_b$ and $S_w$ into account without conducting PCA, a direct LDA (DLDA) technique has been presented by Yu and Yang [6]. The basic idea behind the approach is that no significant information will be lost if the null space of $S_b$ is discarded. Based on this assumption, it can be concluded that the optimal discriminant features exist in the range space of $S_b$.
Consider a multiclass classification problem with a data matrix $X \in \mathbb{R}^{d \times N}$, where each column $x_i$ represents a sample. Suppose $X$ is composed of $c$ classes, the $i$th class consists of $N_i$ samples, and the total number of samples is $\sum_{i=1}^{c} N_i = N$. Then, the between-class scatter matrix is defined as
$$S_b = \sum_{i=1}^{c} N_i (m_i - m)(m_i - m)^{T},$$
where
$$m_i = \frac{1}{N_i} \sum_{j=1}^{N_i} x_j^{(i)}, \qquad m = \frac{1}{N} \sum_{j=1}^{N} x_j$$
are the class mean sample and the total mean sample, respectively, and $x_j^{(i)}$ denotes the $j$th sample of the $i$th class. Similarly, the within-class scatter matrix is defined as
$$S_w = \sum_{i=1}^{c} \sum_{j=1}^{N_i} \left(x_j^{(i)} - m_i\right)\left(x_j^{(i)} - m_i\right)^{T}.$$
In DLDA, eigenvalue decomposition is first performed on the between-class scatter matrix $S_b$. Suppose the rank of $S_b$ is $t$, let $D_b = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_t)$ be the diagonal matrix with the $t$ largest eigenvalues on the main diagonal in descending order, and let $Y = [v_1, v_2, \ldots, v_t]$ be the eigenvector matrix that consists of the $t$ corresponding eigenvectors. Then, the dimensionality of a sample $x$ is reduced by using the projection matrix
$$Z = Y D_b^{-1/2},$$
and eigenvalue decomposition is performed on the within-class scatter matrix of the projected samples, $\tilde{S}_w = Z^{T} S_w Z$. Let $D_w = \mathrm{diag}(\eta_1, \eta_2, \ldots, \eta_t)$ be the eigenvalue matrix of $\tilde{S}_w$ in ascending order and $U = [u_1, u_2, \ldots, u_t]$ be the corresponding eigenvector matrix. The final transformation matrix is then given by
$$W = Y D_b^{-1/2} U D_w^{-1/2}.$$
To address the computational complexity of high-dimensional data, the eigenanalysis method presented by Turk and Pentland [1] is applied in DLDA, which allows the eigenanalysis of the scatter matrices to be carried out efficiently. For the eigenvalue decomposition of any symmetric matrix $A$ of the form $A = BB^{T}$, we can consider the eigenvectors $v_i$ of $B^{T}B$ such that
$$B^{T} B v_i = \lambda_i v_i. \tag{2.9}$$
Premultiplying both sides by $B$, we have
$$BB^{T}(B v_i) = \lambda_i (B v_i),$$
from which it can be concluded that the eigenvectors of $A$ are $Bv_i$ with the corresponding eigenvalues $\lambda_i$.
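The two eigenanalysis stages and the Turk and Pentland device can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' implementation: the factorizations $S_b = \Phi_b \Phi_b^{T}$ and $S_w = \Phi_w \Phi_w^{T}$, the names `scatter_factors` and `dlda`, the tolerance `tol`, and the floor applied to the eigenvalues $\eta_i$ are all assumptions made so that the example runs on small synthetic data.

```python
import numpy as np

def scatter_factors(X, labels):
    """Return Phi_b, Phi_w with S_b = Phi_b Phi_b^T and S_w = Phi_w Phi_w^T."""
    m = X.mean(axis=1, keepdims=True)
    class_means = np.zeros_like(X)
    cols = []
    for c in np.unique(labels):
        idx = labels == c
        m_c = X[:, idx].mean(axis=1, keepdims=True)
        class_means[:, idx] = m_c
        cols.append(np.sqrt(idx.sum()) * (m_c - m))
    return np.hstack(cols), X - class_means

def dlda(X, labels, tol=1e-10):
    Phi_b, Phi_w = scatter_factors(X, labels)
    # Turk-Pentland device: decompose the small c x c matrix Phi_b^T Phi_b
    # instead of the d x d matrix S_b.
    lam, V = np.linalg.eigh(Phi_b.T @ Phi_b)
    keep = lam > tol * lam.max()                    # drop the null space of S_b
    lam, V = lam[keep][::-1], V[:, keep][:, ::-1]   # descending order
    # Y = Phi_b V / sqrt(lam) has unit-norm columns, so Z = Y D_b^{-1/2}.
    Z = (Phi_b @ V) / lam
    G = Z.T @ Phi_w                                 # projected within-class factor
    eta, U = np.linalg.eigh(G @ G.T)                # ascending order
    eta = np.maximum(eta, tol)                      # guard (assumption) against zeros
    return Z @ U / np.sqrt(eta)                     # W = Z U D_w^{-1/2}

# Synthetic SSS setting (assumption): d = 200 features, N = 30 samples.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 10)
X = rng.normal(size=(200, 30)) + 3.0 * rng.normal(size=(200, 3))[:, labels]
W = dlda(X, labels)
Phi_b, Phi_w = scatter_factors(X, labels)
```

In this sketch the returned $W$ whitens the within-class scatter ($W^{T} S_w W = I$) while leaving $W^{T} S_b W$ diagonal, and no $d \times d$ matrix is ever formed.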

Direct neighborhood discriminant analysis
Instead of mining the statistical discriminant information, manifold learning techniques try to discover the local manifold structure of the data. Derived from the locality preserving idea [10, 11], graph embedding framework-based techniques extract local discriminant features for classification. For a general pattern classification problem, we expect to find a linear transformation such that, in the transformed space, the compactness of samples that belong to the same class and the separation of samples from different classes are both enhanced.
As an example, a simple multiclass classification problem is illustrated in Figure 1. Suppose the two nearest inter- and intraclass neighbors are searched for classification. The inter- and intraclass nearby data points of five data points A-E are shown in Figures 1(b) and 1(c), respectively. For data point A, the distance from its interclass neighbors should be maximized to reduce their adverse influence on classification. On the other hand, the distance between data point A and its intraclass neighbors should be minimized so that A is classified correctly.
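The neighbor search in this example can be sketched as follows. The function name `class_neighbors` and the one-dimensional toy data are assumptions for illustration; the sketch also assumes that every class has at least $k + 1$ samples and that at least $k$ samples belong to other classes.

```python
import numpy as np

def class_neighbors(X, labels, k):
    """For each column of X, return the indices of its k nearest
    interclass neighbors and its k nearest intraclass neighbors."""
    sq = np.sum(X**2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2 * X.T @ X     # squared Euclidean
    same = labels[:, None] == labels[None, :]
    inter = np.where(same, np.inf, dist)               # other classes only
    intra = np.where(same, dist, np.inf)               # same class only
    np.fill_diagonal(intra, np.inf)                    # a point is not its own neighbor
    return np.argsort(inter, axis=1)[:, :k], np.argsort(intra, axis=1)[:, :k]

# Toy data (assumption): two classes on a line.
X = np.array([[0.0, 1.0, 2.0, 10.0, 11.0, 12.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
inter, intra = class_neighbors(X, labels, k=2)
```

For the first point (at 0), the two intraclass neighbors are the points at 1 and 2, and the two interclass neighbors are the points at 10 and 11.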
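Based on this consideration, two graphs, the between-class graph $G_b$ and the within-class graph $G_w$, are constructed to discover the local discriminant structure [13, 15]. For each data point $x_i$, its sets of inter- and intraclass neighbors are denoted by $kNN_b(x_i)$ and $kNN_w(x_i)$, respectively. Then, the weight $W^{b}_{ij}$ of the edge between $x_i$ and $x_j$ in the between-class graph is defined as
$$W^{b}_{ij} = \begin{cases} 1, & x_i \in kNN_b(x_j) \text{ or } x_j \in kNN_b(x_i), \\ 0, & \text{otherwise}, \end{cases}$$
and the within-class affinity weight is defined similarly as
$$W^{w}_{ij} = \begin{cases} 1, & x_i \in kNN_w(x_j) \text{ or } x_j \in kNN_w(x_i), \\ 0, & \text{otherwise}. \end{cases}$$
Let the transformation matrix be denoted by $P \in \mathbb{R}^{d \times d'}$ ($d' \ll d$), which transforms the original data $x$ from the high-dimensional space $\mathbb{R}^{d}$ into a low-dimensional space $\mathbb{R}^{d'}$ by $y = P^{T}x$. The separability of interclass samples in the transformed low-dimensional space can be defined as
$$\sum_{i,j} \left\| P^{T}x_i - P^{T}x_j \right\|^{2} W^{b}_{ij} = \mathrm{tr}\left( P^{T} X (2D^{b} - 2W^{b}) X^{T} P \right),$$
where $\mathrm{tr}(\cdot)$ is the trace of a matrix, $X = [x_1, x_2, \ldots, x_N]$ is the data matrix, and $D^{b}$ is a diagonal matrix whose entries are the column (or row, since $W^{b}$ is symmetric) sums of $W^{b}$, $D^{b}_{ii} = \sum_j W^{b}_{ij}$. Similarly, the compactness of intraclass samples can be characterized as
$$\sum_{i,j} \left\| P^{T}x_i - P^{T}x_j \right\|^{2} W^{w}_{ij} = \mathrm{tr}\left( P^{T} X (2D^{w} - 2W^{w}) X^{T} P \right). \tag{3.4}$$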
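Here, $D^{w}$ is a diagonal matrix whose entries are the column (or row) sums of $W^{w}$, $D^{w}_{ii} = \sum_j W^{w}_{ij}$. Then, the optimal transformation matrix $P$ can be obtained by solving the following problem:
$$P^{*} = \arg\max_{P} \frac{P^{T} X (2D^{b} - 2W^{b}) X^{T} P}{P^{T} X (2D^{w} - 2W^{w}) X^{T} P} = \arg\max_{P} \frac{P^{T} S_s P}{P^{T} S_c P}, \tag{3.5}$$
where $S_s = X(2D^{b} - 2W^{b})X^{T}$ and $S_c = X(2D^{w} - 2W^{w})X^{T}$.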
Here, $S_c$ is always singular when the training set is small, which makes it difficult to obtain the projection matrix $P$ directly; previous local discriminant techniques therefore still suffer from the curse of dimensionality. PCA is usually performed as a preprocessing step to reduce the dimension in such situations [15]; however, potentially discriminant information may be discarded. Inspired by DLDA, we perform eigenanalysis on $S_s$ and $S_c$ successively to extract the complete local geometrical structure directly, without PCA preprocessing. To alleviate the computational burden, we reformulate $S_s$ and $S_c$ so that Turk's eigenanalysis method can be employed. For each nonzero element $W^{b}_{ij}$ of $W^{b}$, we build an $N$-dimensional interclass index vector $h_{i,j}$ whose elements are all zero except the $i$th and the $j$th, which are set to 1 and −1, respectively:
$$h_{i,j} = (0, \ldots, 0, \underset{i}{1}, 0, \ldots, 0, \underset{j}{-1}, 0, \ldots, 0)^{T}.$$
Let $H_s$ denote the interclass index matrix whose columns are these index vectors, so that $S_s = X H_s H_s^{T} X^{T}$, and let $N_b$ be the number of nonzero elements of $W^{b}$. As each column of $H_s$ has only two nonzero elements, 1 and −1, the first row of $H_s$ can be turned into a null row by adding all the other rows to it. On the other hand, for each column $h_{i,j}$ of $H_s$, there is another column $h_{j,i}$ with the opposite sign. Then, it is clear that
$$\mathrm{rank}(H_s) \le \min\{N - 1, N_b/2\}.$$
In many FR cases, the number of pixels in a facial image is much larger than the number of available samples, that is, $d \gg N$. It follows that the rank of $S_s$ is at most $\min\{N - 1, N_b/2\}$. Similarly, $S_c$ can also be reformulated as
$$S_c = X(2D^{w} - 2W^{w})X^{T} = X H_c H_c^{T} X^{T}. \tag{3.10}$$
Here, $H_c \in \mathbb{R}^{N \times N_w}$ is the intraclass index matrix consisting of all the $N_w$ intraclass index vectors as columns, which is constructed according to the $N_w$ nonzero elements of $W^{w}$. As with $S_s$, the rank of $S_c$ is at most $\min\{N - 1, N_w/2\}$. Based on the modified formulation, the optimal transformation matrix $P$ can be obtained as
$$P^{*} = \arg\max_{P} \frac{P^{T} S_s P}{P^{T} S_c P} = \arg\max_{P} \frac{P^{T} X H_s H_s^{T} X^{T} P}{P^{T} X H_c H_c^{T} X^{T} P}. \tag{3.11}$$
As the null space of $S_s$ contributes little to classification, it is feasible to remove this subspace by projecting $S_s$ into its range space. We apply eigenvalue decomposition to $S_s$ through Turk's eigenanalysis method and unitize it, discarding those eigenvectors whose corresponding eigenvalues are zero, since they carry little power for discriminant analysis. Then, the discriminant information in $S_c$ can be obtained by performing eigenanalysis on $\tilde{S}_c$, which is obtained by transforming $S_c$ into the range subspace of $S_s$. This algorithm can be implemented by the pseudocode shown in Algorithm 1.
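The two-stage procedure described in this section can be sketched in NumPy as below. This is an illustrative sketch under stated assumptions, not a transcription of Algorithm 1: the helper names `knn_graph`, `index_matrix`, and `dnda`, the 0/1 symmetric weights, and the tolerance `tol` are assumptions, and the second stage (keeping the eigenvectors of $\tilde{S}_c$ with the smallest eigenvalues, without further rescaling) is assumed by analogy with DLDA. Each class is assumed to contain more than $k$ samples.

```python
import numpy as np

def knn_graph(X, labels, k, same_class):
    """Symmetric 0/1 weights: W[i, j] = 1 if j is among the k nearest
    same-class (or different-class) neighbors of i, or vice versa."""
    N = X.shape[1]
    sq = np.sum(X**2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2 * X.T @ X
    mask = (labels[:, None] == labels[None, :]) == same_class
    np.fill_diagonal(mask, False)
    nn = np.argsort(np.where(mask, dist, np.inf), axis=1)[:, :k]
    W = np.zeros((N, N))
    W[np.repeat(np.arange(N), k), nn.ravel()] = 1.0
    return np.maximum(W, W.T)

def index_matrix(W):
    """One column e_i - e_j per nonzero W[i, j], so H @ H.T = 2D - 2W."""
    rows, cols = np.nonzero(W)
    H = np.zeros((W.shape[0], rows.size))
    H[rows, np.arange(rows.size)] = 1.0
    H[cols, np.arange(rows.size)] = -1.0
    return H

def dnda(X, labels, k=2, n_components=None, tol=1e-10):
    B_s = X @ index_matrix(knn_graph(X, labels, k, same_class=False))
    B_c = X @ index_matrix(knn_graph(X, labels, k, same_class=True))
    # Stage 1: range space of S_s = B_s B_s^T via the small matrix B_s^T B_s.
    lam, V = np.linalg.eigh(B_s.T @ B_s)
    keep = lam > tol * lam.max()
    lam, V = lam[keep], V[:, keep]
    P_s = (B_s @ V) / lam                  # P_s = U D_s^{-1/2}
    # Stage 2 (assumption, by analogy with DLDA): eigenanalysis of the
    # projected S_c, keeping the directions with the smallest eigenvalues.
    G = P_s.T @ B_c
    eta, Q = np.linalg.eigh(G @ G.T)       # ascending order
    if n_components is not None:
        Q = Q[:, :n_components]
    return P_s @ Q

# Synthetic SSS setting (assumption): d = 50 features, N = 30 samples.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 10)
X = rng.normal(size=(50, 30)) + 2.0 * rng.normal(size=(50, 3))[:, labels]
P = dnda(X, labels, k=2)
```

In this sketch the first stage whitens $S_s$, so $P^{T} S_s P = I$ for the returned $P$, and the number of retained directions cannot exceed $N - 1$, in line with the rank bound on $S_s$.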

Experiments
In this section, we investigate the performance of the proposed DNDA for face recognition. Three popular face databases, ORL, Yale, and UMIST, are used in the experiments. To verify the performance of DNDA, it is compared in each experiment with classical approaches: Eigenface [1], Fisherface [2], DLDA [6], LPP [11], and MFA [15]. A three-nearest-neighbor classifier with the Euclidean distance metric is employed to find the image in the database with the best match.
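A three-nearest-neighbor classifier of this kind can be sketched as follows. The function name `knn_predict`, the majority-vote rule, and the tie-breaking (smallest label wins) are assumptions; the paper does not specify how ties are resolved.

```python
import numpy as np

def knn_predict(train_feats, train_labels, test_feats, k=3):
    """Classify each test column by majority vote among its k nearest
    training columns under the Euclidean distance."""
    sq_tr = np.sum(train_feats**2, axis=0)
    sq_te = np.sum(test_feats**2, axis=0)
    dist = sq_te[:, None] + sq_tr[None, :] - 2 * test_feats.T @ train_feats
    nn = np.argsort(dist, axis=1)[:, :k]
    preds = []
    for votes in train_labels[nn]:
        vals, counts = np.unique(votes, return_counts=True)
        preds.append(vals[np.argmax(counts)])   # ties go to the smallest label
    return np.array(preds)

# Toy one-dimensional features (assumption), one sample per column.
train_feats = np.array([[0.0, 0.1, 0.2, 5.0, 5.1, 5.2]])
train_labels = np.array([0, 0, 0, 1, 1, 1])
test_feats = np.array([[0.05, 4.9]])
pred = knn_predict(train_feats, train_labels, test_feats, k=3)
```

In practice the feature columns would be the projected images $P^{T}x$ produced by whichever method is being evaluated.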

ORL database
In the ORL database [18], there are 10 different images for each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). The original images have a size of 92 × 112 pixels with 256 gray levels; one such subject is illustrated in Figure 2(a). The experiments are performed with different numbers of training samples. As there are 10 images for each subject, $n$ ($n = 3, 4, 5$) of them are randomly selected for training and the remaining ones are used for testing. For each $n$, the random selection of the training set is repeated 20 times and the average recognition rate is calculated. Figure 3 plots the recognition rate versus the number of features used in the matching for Eigenface, Fisherface, DLDA, LPP, MFA, and DNDA. The best performance obtained by each method, with the corresponding dimension of the reduced space in brackets, is shown in Table 1.
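The random-split protocol can be sketched as follows for an ORL-like label layout (40 subjects, 10 images each). The function name `random_split` and the fixed random seed are assumptions for illustration.

```python
import numpy as np

def random_split(labels, n_train, rng):
    """Pick n_train images per subject for training; the rest are for testing."""
    train = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        train.extend(rng.choice(idx, size=n_train, replace=False))
    train = np.sort(np.array(train))
    test = np.setdiff1d(np.arange(labels.size), train)
    return train, test

# ORL-like layout: 40 subjects with 10 images each.
labels = np.repeat(np.arange(40), 10)
rng = np.random.default_rng(0)
splits = [random_split(labels, 3, rng) for _ in range(20)]   # 20 repetitions
train, test = splits[0]
```

The recognition rate reported for a given $n$ would then be the mean of the 20 per-split accuracies.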

Yale database
The experimental implementation is the same as before. For each individual, $n$ ($n = 3, 4, 5$) images are randomly selected for training and the rest are used for testing. For each given $n$, we average the results over 20 independently repeated experiments. Figure 4 plots the recognition rate versus the number of features used in the matching for Eigenface, Fisherface, DLDA, LPP, MFA, and DNDA. The best results obtained in the experiments and the corresponding reduced dimension for each method are shown in Table 2.

UMIST database
The UMIST face database [20] consists of 564 images of 20 people. For simplicity, the pre-cropped version of the UMIST database is used in this experiment, where each subject covers a range of poses from profile to frontal views and the subjects cover a range of race/sex/appearance. The size of each cropped image is 92 × 112 pixels with 256 gray levels. The facial images of one subject with different views are illustrated in Figure 2(c).
For each individual, we chose 8 images of different views distributed uniformly in the range 0-90° for training, and the rest are used for testing. Figure 5 plots the recognition rate versus the number of features used in the matching for Eigenface, Fisherface, DLDA, LPP, MFA, and DNDA. The best performance and the corresponding dimensionalities of the projected spaces for each method are shown in Table 3.
From the experimental results, DNDA achieves higher accuracy than the other methods. This is probably because DNDA is a two-stage local discriminant technique, unlike LPP and MFA. Moreover, the PCA step is removed in DNDA, which preserves more discriminant information than the other methods.

Conclusions
Inspired by DLDA, we propose in this paper a novel local discriminant feature extraction method called direct neighborhood discriminant analysis (DNDA). To avoid the SSS problem, DNDA performs a two-stage eigenanalysis, which can be implemented efficiently by using Turk's method. Unlike other methods, DNDA omits PCA preprocessing while remaining immune to the SSS problem. Experiments on the ORL, Yale, and UMIST face databases show the effectiveness and robustness of the proposed method for face recognition. Improvements and extensions of DNDA that yield better classification results will be considered in our future work.

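A. Proof of $2D - 2W = HH^{T}$
Given the graph weight matrix $W$ with $l$ nonzero elements, consider two matrices $M, N \in \mathbb{R}^{N \times l}$. For each nonzero element of $W$, there is a corresponding column in $M$ and in $N$ at the same position. Let $Z = \{(i, j) \mid W_{ij} \ne 0\}$ be the index set of nonzero elements of $W$. For the $k$th ($1 \le k \le l$) nonzero element $W_{ij}$ of $W$, the $k$th columns of $M$ and $N$ are represented as
$$M_{:,k} = (0, \ldots, 0, \underset{i}{1}, 0, \ldots, 0)^{T}, \qquad N_{:,k} = (0, \ldots, 0, \underset{j}{1}, 0, \ldots, 0)^{T}.$$
Then, it is easy to get
$$(M - N)(M - N)^{T} = 2D - 2W. \tag{A.5}$$
It is easy to check that $H = M - N$, which completes the proof.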
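The identity can also be checked numerically. The following sketch builds $M$ and $N$ for a small random symmetric 0/1 weight matrix; the variable names `M_mat` and `N_mat` and the random graph are assumptions for illustration.

```python
import numpy as np

# Random symmetric 0/1 weight matrix with a zero diagonal (assumption).
rng = np.random.default_rng(0)
n = 6
A = rng.random((n, n)) < 0.4
W = np.triu(A, 1)
W = (W | W.T).astype(float)

# One column per nonzero W[i, j]: M has a 1 in row i, N has a 1 in row j.
rows, cols = np.nonzero(W)
l = rows.size
M_mat = np.zeros((n, l))
N_mat = np.zeros((n, l))
M_mat[rows, np.arange(l)] = 1.0
N_mat[cols, np.arange(l)] = 1.0

D = np.diag(W.sum(axis=1))     # row sums of W on the diagonal
H = M_mat - N_mat              # index matrix with entries 1 and -1
lhs = H @ H.T
rhs = 2 * D - 2 * W
```

Here `lhs` and `rhs` agree entry by entry, and the intermediate products behave as the proof requires: `M_mat @ M_mat.T` and `N_mat @ N_mat.T` both equal $D$, while `M_mat @ N_mat.T` equals $W$.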

Figure 1: Local discriminant neighbors. (a) Multiclass classification; (b) two interclass neighbors; (c) two intraclass neighbors.

Figure 3: Recognition rate against the number of features used in the matching on the ORL database: (a) 3 training samples, (b) 4 training samples, and (c) 5 training samples.
The Yale Face Database [19] contains 165 grayscale images of 15 individuals. There are 11 images per subject, one for each lighting condition (left-light, center-light, right-light), facial expression (normal, happy, sad, sleepy, surprised, wink), and with/without glasses. Each image used in the experiments is 92 × 112 pixels with 256 gray levels. The facial images of one individual are illustrated in Figure 2(b).

Figure 4: Recognition rate against the number of features used in the matching on the Yale database: (a) 3 training samples, (b) 4 training samples, and (c) 5 training samples.

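For $1 \le a, b \le N$,
$$M_{a,:} N_{b,:}^{T} = 1 \text{ for } (a, b) \in Z, \qquad N_{a,:} M_{b,:}^{T} = 1 \text{ for } (b, a) \in Z,$$
and
$$M_{a,:} N_{b,:}^{T} = 0 \text{ for } (a, b) \notin Z, \qquad N_{a,:} M_{b,:}^{T} = 0 \text{ for } (b, a) \notin Z, \tag{A.3}$$
where $M_{k,:}$ and $N_{k,:}$ denote the $k$th row of $M$ and $N$, respectively. Therefore, we can get
$$\sum_{k} M_{ik} M_{jk} = \delta_{ij} D_{ii}, \quad \sum_{k} N_{ik} N_{jk} = \delta_{ij} D_{ii}, \quad \sum_{k} M_{ik} N_{jk} = W_{ij}, \quad \sum_{k} N_{ik} M_{jk} = W_{ji}, \tag{A.4}$$
where $\delta_{ij}$ is the Kronecker delta. Note that both $D$ and $W$ are symmetric matrices; based on the above equations, it is easy to find that
$$(M - N)(M - N)^{T} = MM^{T} + NN^{T} - MN^{T} - NM^{T} = D + D - W - W = 2D - 2W.$$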

Suppose there are $N_b$ nonzero elements in $W^{b}$, and let $H_s = [h_1, h_2, \ldots, h_{N_b}]$ be the interclass index matrix made up of the $N_b$ interclass index vectors. It can be easily obtained that $2D^{b} - 2W^{b} = H_s H_s^{T}$, which we prove in Appendix A. Therefore, $S_s$ can be reformulated as
$$S_s = X(2D^{b} - 2W^{b})X^{T} = X H_s H_s^{T} X^{T}.$$

Algorithm 1
Input: Data matrix $X \in \mathbb{R}^{d \times N}$, class label $L$
Output: Transformed matrix $P^{*}$
1. Construct the between-class and the within-class affinity weight matrices $W^{b}$, $W^{w}$.
2. Construct the interclass and the intraclass index matrices $H_s$, $H_c$ according to the nonzero elements of $W^{b}$, $W^{w}$.
3. Apply eigenvalue decomposition to $S_s$ and keep the largest $t$ nonzero eigenvalues $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_t)$ and the corresponding eigenvectors $U = [u_1, u_2, \ldots, u_t]$ after sorting in decreasing order, where $t = \mathrm{rank}(S_s)$.
4. Compute $P_s$ as $P_s = U D_s^{-1/2}$, where $D_s = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_t)$ is the diagonal matrix with $\lambda_i$ on the main diagonal.

Table 3: Comparison of recognition rates of Eigenface, Fisherface, DLDA, LPP, MFA, and DNDA on the UMIST database.
$N_b$ is the number of nonzero elements in $W^{b}$, as it follows a procedure similar to DLDA, $O(c^{3})$. Compared with Eigenface, $O(N^{2}d)$, and Fisherface, $O(N^{2}d)$, DNDA is still more efficient for feature extraction in high dimensionality if $d \gg N$.