Geometric Distribution Weight Information Modeled Using Radial Basis Function with Fractional Order for Linear Discriminant Analysis Method

Fisher linear discriminant analysis (FLDA) is a classic linear feature extraction and dimensionality reduction approach for face recognition. It is known that the geometric distribution weight information of image data plays an important role in machine learning approaches. However, FLDA does not employ the geometric distribution weight information of facial images in the training stage, and its recognition accuracy is affected as a result. In order to enhance the classification power of the FLDA method, this paper utilizes a radial basis function (RBF) with fractional order to model the geometric distribution weight information of the training samples and proposes a novel geometric distribution weight information based Fisher discriminant criterion. Subsequently, a geometric distribution weight information based LDA (GLDA) algorithm is developed and applied to face recognition. Two publicly available face databases, namely, the ORL and FERET databases, are selected for evaluation. Experimental results show that our GLDA approach outperforms the LDA-based algorithms it is compared with.


Introduction
Over the past two decades, face recognition (FR) has made great progress with the increasing computational power of computers and has become one of the most important biometric-based authentication technologies. The key issue of an FR algorithm is dimensionality reduction for facial feature extraction. According to the process of facial feature extraction, face recognition algorithms can generally be divided into two classes, namely, (local) geometric feature based and (holistic) appearance based [1]. The geometric feature-based approach relies on the shape and the location of facial components (such as eyes, eyebrows, nose, and mouth), which are extracted to form a geometric feature vector of the face. The appearance-based approach, in contrast, depends on the global facial pixel features, which are exploited to form a whole facial feature vector for face classification. Principal component analysis (PCA) [2] and linear discriminant analysis (LDA) [3] are two famous appearance-based approaches for linear feature extraction and dimensionality reduction. In face recognition they are also called the Eigenface method and the Fisherface method, respectively. The objective of PCA is to find the orthogonal principal component (PC) directions and preserve the maximum variance information of the training data along the PC directions. PCA can reconstruct each facial image using all Eigenfaces. Since PCA takes no account of the discriminant information, it is not well suited to classification tasks. LDA is a supervised learning method and seeks the optimal projection under the Fisher criterion such that the ratio of interdistance to intradistance attains its maximum. Therefore, from the classification point of view, LDA should give better performance than PCA.
LDA is theoretically sound. However, it still has two issues. First, LDA often encounters the small sample size (3S) problem, which occurs whenever the dimension of the input sample space is greater than the number of training facial images. In this situation, LDA cannot be performed directly. To solve the 3S problem, a large number of LDA-based approaches have been proposed [4][5][6][7][8][9][10][11][12][13][14][15][16]. Among them, the Fisher linear discriminant analysis (FLDA) method, also called the Fisherface method in FR, is a two-stage algorithm. It first employs PCA for dimensionality reduction to guarantee that the within-class scatter matrix is full rank, and then LDA can be implemented in the PCA-mapped low dimensional feature space. Direct LDA [6] (DLDA) is another LDA-based approach, which uses the simultaneous diagonalization technique [17] to solve the 3S problem. The basic idea of DLDA is to first discard the null space of the between-class scatter matrix $S_b$ and then keep the null space of the within-class scatter matrix $S_w$. Although DLDA is computationally efficient, it suffers from a performance limitation, especially when the number of training images increases. This is because discarding the null space of $S_b$ would also indirectly discard the null space of $S_w$. Literature [5] shows that the null space of $S_w$ contains the most discriminant information. Second, these LDA-based methods are based on the classic Fisher criterion, which does not consider the geometric distribution weight information of the training data, so their recognition performance is degraded.
To enhance the discriminant power of the LDA-based approach, this paper presents a novel Fisher criterion that takes into account the geometric distribution weight information of the training facial data. It is natural to think that the intraclass data near its class center is more important for representing the features of the class. So, the proposed method imposes a penalty weight (a small weight) on an intraclass sample if it is far from its own class center. Meanwhile, if two different class centers are close to each other, they are given a small weight as well. To this end, we need to extract the geometric distribution weight information of the training data. In recent years, many fractional order based methods [18][19][20][21][22][23][24][25] have been proposed in the areas of dynamic systems, image processing, face recognition, and so on. This paper adopts a radial basis function (RBF) with fractional order [21][22][23] to model the geometric distribution weight information of the training samples, and thus we are able to establish a new Fisher criterion that incorporates this information. Based on the modified Fisher discriminant criterion, a geometric distribution weight information based linear discriminant analysis (GLDA) method is proposed for face recognition. Our GLDA approach is tested on two face databases, namely, the ORL database and the FERET database. Experimental results show that the proposed GLDA method outperforms both the FLDA and DLDA methods.
The rest of this paper is organized as follows. Section 2 briefly introduces the related works. In Section 3, the RBF with fractional order is exploited to model the data geometric distribution weight information. The new Fisher criterion is then established using the geometric distribution weight information of the training data, and the GLDA algorithm is designed. Experimental results on two face databases are reported in Section 4. Finally, Section 5 draws the conclusions.

Related Works
In this section, we will introduce some related linear feature extraction and dimensionality reduction algorithms for face recognition.
2.1. Some Notations. Let $d$ be the dimension of the original sample space and let $c$ be the number of sample classes. The $i$th class $C_i = \{x_1^{(i)}, \ldots, x_{N_i}^{(i)}\}$ contains $N_i$ training samples, and the total number of training samples is $N = \sum_{i=1}^{c} N_i$. The mean of class $C_i$ is $m_i = (1/N_i) \sum_{j=1}^{N_i} x_j^{(i)}$, and the entire mean is $m = (1/N) \sum_{i=1}^{c} N_i m_i$. In the PCA algorithm, the total scatter matrix $S_t$, also called the covariance matrix, is defined by (1). In the LDA algorithm, the within-class scatter matrix $S_w$ and the between-class scatter matrix $S_b$ are defined by (2). The radial basis function with fractional order is given by (3); it can be viewed as the normalized radial kernel of that fractional order.
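To make the notation concrete, the following NumPy sketch computes the three scatter matrices from a labelled sample matrix. It is an illustration under stated assumptions rather than the paper's exact definitions: the function name `scatter_matrices` is ours, samples are stored as rows, and a common 1/N normalization is assumed, so the matrices may differ from (1) and (2) by a constant factor, which does not change the resulting projection directions.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return (S_t, S_w, S_b) for samples in the rows of X with labels y.

    Assumption of this sketch: all three matrices are divided by N.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    N, d = X.shape
    m = X.mean(axis=0)                      # entire mean
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for label in np.unique(y):
        Xi = X[y == label]                  # samples of one class
        mi = Xi.mean(axis=0)                # class mean
        D = Xi - mi                         # samples centered on their class mean
        S_w += D.T @ D
        diff = mi - m
        S_b += len(Xi) * np.outer(diff, diff)
    Xc = X - m                              # samples centered on the entire mean
    S_t = Xc.T @ Xc / N
    return S_t, S_w / N, S_b / N
```

With this normalization the identity $S_t = S_w + S_b$ holds exactly, which is a convenient sanity check on any implementation.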

2.3. Fisher LDA. The goal of linear discriminant analysis is to find a low dimensional feature space in which the intraclass data are tightly clustered and the interclass data are far from each other. Therefore, LDA seeks an optimal projection matrix $W_{LDA}$ that maximizes the ratio of the between-class scatter to the within-class scatter, as expressed by criterion (5). This problem is equivalent to solving the eigensystem (6), where $\Lambda$ is a $d \times d$ diagonal eigenvalue matrix with its eigenvalues sorted in decreasing order. The projection matrix $W_{LDA}$ is formed with the eigenvectors corresponding to the largest $c - 1$ eigenvalues. In face recognition, the column vectors of $W_{LDA}$ are also called Fisherfaces. However, LDA often suffers from the small sample size problem when the number of training samples is smaller than the dimension of the sample vector. In this situation, the within-class scatter matrix is singular, and the eigensystem (6) cannot be solved. This means that LDA cannot be performed directly. So, Fisher LDA (FLDA) uses PCA for dimensionality reduction in advance.
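The two-stage FLDA procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `flda` is hypothetical, the PCA stage keeps at most $N - c$ directions (the usual Fisherface choice, assumed here), and the eigensystem is taken in the form $S_w^{-1} S_b w = \lambda w$. Constant normalization factors are omitted because they do not affect the eigenvectors.

```python
import numpy as np

def flda(X, y):
    """Two-stage Fisher LDA sketch: PCA to at most N - c dimensions, then LDA."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    N, c = X.shape[0], len(classes)
    Xc = X - X.mean(axis=0)
    # PCA stage via SVD; the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = Vt[: N - c].T
    Z = Xc @ W_pca                          # reduced data, overall mean is zero
    k = Z.shape[1]
    S_w = np.zeros((k, k))
    S_b = np.zeros((k, k))
    for label in classes:
        Zi = Z[y == label]
        mi = Zi.mean(axis=0)
        S_w += (Zi - mi).T @ (Zi - mi)
        S_b += len(Zi) * np.outer(mi, mi)   # valid because the mean of Z is zero
    # LDA stage: eigenvectors of S_w^{-1} S_b, largest eigenvalues first.
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(-evals.real)[: c - 1]
    W_lda = evecs[:, order].real
    return W_pca @ W_lda                    # maps original samples to c - 1 features
```

A sample `x` is then represented by `(x - mean) @ W`, or simply `x @ W` when only distances between projected samples matter.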
2.4. Direct LDA. Yu and Yang [6] proposed a direct LDA (DLDA) approach using the simultaneous diagonalization technique [17]. Direct LDA is a subspace approach to overcome the 3S problem of LDA. It attempts to obtain the optimal projection matrix $W$ in the subspace $\bar{N}(S_b) \cap N(S_w)$ that satisfies the equations in (7), where $N(S_b)$ means the null space of $S_b$, $\bar{N}(S_b)$ denotes the complement subspace of $N(S_b)$, and $I$ is an identity matrix. The diagonal matrix $\Lambda$ may contain 0s and some small eigenvalues on its diagonal. Details can be found in [6].
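The simultaneous diagonalization idea can be illustrated with a short NumPy sketch. It follows the commonly described outline of direct LDA (diagonalize $S_b$, drop its null space, then diagonalize the projected $S_w$) and is not taken from [6] verbatim; the function name `dlda` and the tolerance `tol` are assumptions of this sketch.

```python
import numpy as np

def dlda(S_b, S_w, tol=1e-10):
    """Return W with W^T S_b W = I and W^T S_w W diagonal, plus that diagonal."""
    # 1. Diagonalize S_b and discard its (numerical) null space.
    eb, Vb = np.linalg.eigh(S_b)
    keep = eb > tol * eb.max()
    Z = Vb[:, keep] / np.sqrt(eb[keep])     # now Z^T S_b Z = I
    # 2. Diagonalize S_w restricted to the retained subspace.
    ew, U = np.linalg.eigh(Z.T @ S_w @ Z)   # eigenvalues in ascending order
    W = Z @ U                               # columns with least within-class scatter first
    return W, ew
```

Directions with the smallest entries of the returned diagonal are the ones closest to the null space of $S_w$, which is why some of those entries may be zero or very small.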
We can see that some useful discriminant information will be discarded in the intermediate PCA stage of FLDA or in the simultaneous diagonalization stage of DLDA. Moreover, neither the FLDA method nor the DLDA method exploits the geometric distribution weight information of the training samples. These factors will affect their recognition performance.

Proposed GLDA Method
This section proposes a novel discriminant criterion that uses the geometric distribution weight information of the training samples. Based on the new discriminant criterion, our GLDA method is proposed. Details are discussed as follows.

3.1. Proposed Discriminant Criterion. To take advantage of the geometric distribution weight information of the face pattern space, we redefine the within-class scatter matrix $\tilde{S}_w$ and the between-class scatter matrix $\tilde{S}_b$ as in (8), where each within-class term is weighted by a radial basis function of $x_j^{(i)} - m_i$ and each between-class term is weighted by a radial basis function of $m_i - m$, both defined by (3). The two fractional order parameters can be adjusted flexibly to obtain their optimal values. It can be seen from (8) that if the distance between the sample $x_j^{(i)}$ and its class center $m_i$ is large, a penalty (small) weight is imposed. Similarly, if the class center $m_i$ is near the overall center $m$, it is also given a small weight; otherwise, it receives a large weight.
Based on the previous analysis, our geometric distribution weight information based Fisher criterion function $J(W)$ is defined by (9). To obtain the optimal projection matrix $W_{GLDA}$ in (10), we can equivalently solve the eigensystem (11), where $\Lambda$ is a diagonal eigenvalue matrix with its eigenvalues sorted in decreasing order. The projection matrix $W_{GLDA}$ is formed with the eigenvectors corresponding to the largest $c - 1$ eigenvalues.
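The weighting scheme can be illustrated with the following sketch, which builds weighted scatter matrices from two user-supplied weight functions and then solves the resulting eigensystem. Several things here are assumptions for illustration only: the names `weighted_scatter`, `leading_discriminants`, `toy_decay`, and `toy_growth` are hypothetical, the 1/N normalization and the class-size factor in the between-class term are our choices, and the two toy weight functions are simple stand-ins that merely reproduce the qualitative behavior described above. They are not the fractional order radial basis function of (3).

```python
import numpy as np

def weighted_scatter(X, y, w_within, w_between):
    """Weighted within- and between-class scatter for rows of X with labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    N, d = X.shape
    m = X.mean(axis=0)
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for label in np.unique(y):
        Xi = X[y == label]
        mi = Xi.mean(axis=0)
        D = Xi - mi
        weights = w_within(np.linalg.norm(D, axis=1))   # one weight per sample
        S_w += (D * weights[:, None]).T @ D
        diff = mi - m
        S_b += len(Xi) * w_between(np.linalg.norm(diff)) * np.outer(diff, diff)
    return S_w / N, S_b / N

def leading_discriminants(S_w, S_b, k):
    """Eigenvectors of S_w^{-1} S_b for the k largest eigenvalues (S_w nonsingular)."""
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(-evals.real)[:k]
    return evecs[:, order].real

def toy_decay(r, order=0.5):
    """Stand-in weight: small for samples far from their class center."""
    return np.exp(-np.asarray(r, dtype=float) ** (2 * order))

def toy_growth(r, order=0.5):
    """Stand-in weight: small for class centers close to the overall center."""
    return 1.0 - toy_decay(r, order)
```

Passing constant weights of 1 to both arguments recovers the unweighted scatter matrices, so the classic criterion is a special case of this construction.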

3.2. Algorithm Design. This subsection develops our GLDA algorithm based on the geometric distribution weight information Fisher discriminant criterion (9). Details are as follows.
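It is easily seen that the two scatter matrices $\tilde{S}_w$ and $\tilde{S}_b$ can be rewritten in the factored forms $\tilde{S}_w = \Phi_w \Phi_w^T$ and $\tilde{S}_b = \Phi_b \Phi_b^T$, where the columns of $\Phi_w \in R^{d \times N}$ are the weighted centered training samples and the columns of $\Phi_b \in R^{d \times c}$ are the weighted centered class means. Since the total scatter matrix is $\tilde{S}_t = \tilde{S}_w + \tilde{S}_b$, if we define $\Phi_t = [\Phi_w, \Phi_b] \in R^{d \times (N+c)}$, then $\tilde{S}_t$ can be written as $\tilde{S}_t = \Phi_t \Phi_t^T$. To solve the eigensystem (11) and to compare the proposed GLDA with the FLDA algorithm under the same conditions, this paper also uses PCA for dimensionality reduction, which guarantees that the geometric information based within-class scatter matrix $\tilde{S}_w$ is nonsingular. This means that GLDA can be carried out in the PCA-transformed low dimensional feature space. Thereby, our GLDA algorithm is designed as follows.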
Step 4. Perform an eigenvalue decomposition Ŝ =   Λ     , where Λ  is a diagonal eigenvalue matrix of Ŝ with its diagonal elements in a decreasing order and   is an orthonormal eigenvector matrix.Let   =   Λ −1    .
Step 5.The final GLDA optimal projection matrix is

Experimental Results
This section evaluates the performance of the proposed GLDA method for face recognition. Two LDA-based algorithms, namely, FLDA [3] and DLDA [6], are chosen for comparison under the same experimental conditions. In the following experiments, the two fractional order parameters are set to 0.25 and 0.0125. These values were selected by a full search over candidate values.
4.1. Human Face Image Databases. Two popular and publicly available databases, namely, the ORL database and the FERET database, are selected for the evaluation. The ORL database contains 40 persons, and each person has 10 images with different facial expressions and small variations in scale and orientation. The resolution of each image is 112 × 92 with 256 gray levels per pixel. Image variations of one person in the database are shown in Figure 1. For the FERET database, we select 120 people with 6 images for each individual. The six images are extracted from 4 different sets, namely, Fa, Fb, Fc, and duplicate. Fa and Fb are sets of images taken with the same camera on the same day but with different facial expressions. Fc is a set of images taken with different cameras on the same day. Duplicate is a set of images taken around 6-12 months after the day the Fa and Fb photos were taken. Details of the characteristics of each set can be found in [26]. All images are aligned by the centers of eyes and mouth and then normalized to a resolution of 112 × 92, the same as in the ORL database. Images from two individuals are shown in Figure 2. For all facial images, the following preprocessing steps are performed.
(i) All images are aligned with the centers of eyes and mouth. The orientation of the face is adjusted (on-the-plane rotation) such that the line joining the centers of the eyes is parallel to the $x$-axis.
(ii) The dimension of the images is reduced by one-fourth using Daubechies' D4 wavelet filter. The resolution for all images in the following experiments is 30 × 25.
(iii) Each facial image sample $x \in C_i$ is normalized using a fixed normalization formula.
In the recognition stage, the nearest neighbor approach is employed for face classification, which is based on the Euclidean distance between the testing image and the class center.
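The recognition rule described above, assigning a test sample to the class whose center is nearest in Euclidean distance, can be sketched as follows. The function names `fit_class_centers` and `predict_nearest_center` are hypothetical, and the inputs are assumed to be feature vectors that have already been projected (for example by a learned projection matrix), stored as rows.

```python
import numpy as np

def fit_class_centers(F, y):
    """Return the sorted class labels and the mean feature vector of each class."""
    F = np.asarray(F, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    centers = np.stack([F[y == label].mean(axis=0) for label in classes])
    return classes, centers

def predict_nearest_center(F, classes, centers):
    """Assign each row of F to the class with the closest center."""
    F = np.asarray(F, dtype=float)
    # Squared Euclidean distances, shape (n_samples, n_classes).
    d2 = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d2, axis=1)]
```

Because only the ordering of distances matters, the squared distance is used and the square root is skipped.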

4.2. Comparisons on ORL Database. The experimental setting on the ORL database is as follows. We randomly select $n$ ($n = 2, 3, \ldots, 9$) images from each individual for training, and the remaining $10 - n$ images are used for testing. In order to have a fair comparison, all methods use the same training and testing facial images. Moreover, the experiments are repeated 10 times, and the average accuracies are calculated to reduce statistical variation. The average accuracies are tabulated in Table 1 and plotted in Figure 3. TN in Table 1 means the number of training samples. It can be seen that the recognition accuracy of each approach rises as the number of training images increases. The recognition accuracy of the GLDA method increases from 79.98% with 2 training images to 99.00% with 9 training images. For the FLDA and DLDA methods, the accuracies increase from 66.13% and 78.69% with 2 training images to 97.75% and 96.25% with 9 training images, respectively. Experimental results show that our GLDA method gives the best performance on the ORL database.
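The evaluation protocol above (random per-class selection of training images, repeated runs, averaged accuracy) can be sketched as below. The function name `repeated_split_accuracy`, its parameters, and the fixed random seed are assumptions of this sketch; `fit_predict` stands for any callable that trains on the training split and returns predicted labels for the test split.

```python
import numpy as np

def repeated_split_accuracy(X, y, n_train, fit_predict, repeats=10, seed=0):
    """Average test accuracy over random splits with n_train images per class."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = []
    for _ in range(repeats):
        train = np.zeros(len(y), dtype=bool)
        for label in np.unique(y):
            idx = np.flatnonzero(y == label)
            # Pick n_train distinct samples of this class for training.
            train[rng.choice(idx, size=n_train, replace=False)] = True
        pred = fit_predict(X[train], y[train], X[~train])
        scores.append(np.mean(pred == y[~train]))
    return float(np.mean(scores))
```

To compare several methods fairly, the same `seed` can be passed for each method so that every method sees identical training and testing splits.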
We would also like to see the detailed performance of every method, which is graphically illustrated using the cumulative match characteristic (CMC) curve and the receiver operating characteristic (ROC) curve. The CMC curve shows the recognition accuracy against the rank, and the ROC curve displays the false acceptance rate (FAR) versus the genuine acceptance rate (GAR). High accuracy or high GAR with low FAR means good performance.
For each number of training images, the CMC curves and the ROC curves are plotted in Figure 4 ((TN = 2)-(TN = 9)) and Figure 5 ((TN = 2)-(TN = 9)), respectively. It can be seen that our method gives the best performance for all cases.
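As an illustration of how a CMC curve is obtained, the sketch below computes the rank-k accuracy from a matrix of distances between test samples and class centers. The function name `cmc_curve` and the input layout (one row per test sample, one column per class, with the true class given as a column index) are assumptions of this sketch.

```python
import numpy as np

def cmc_curve(dist, true_idx):
    """Rank-k recognition accuracy for k = 1..n_classes.

    dist[i, j] is the distance from test sample i to class j;
    true_idx[i] is the column index of the correct class of sample i.
    """
    dist = np.asarray(dist, dtype=float)
    true_idx = np.asarray(true_idx)
    order = np.argsort(dist, axis=1)                     # classes sorted by closeness
    # Zero-based position of the true class in each sorted list.
    ranks = np.argmax(order == true_idx[:, None], axis=1)
    n_classes = dist.shape[1]
    return np.array([(ranks < k).mean() for k in range(1, n_classes + 1)])
```

The first entry of the returned array is the rank 1 accuracy, and the curve is nondecreasing and reaches 1 at the last rank.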

4.3. Comparisons on FERET Database. The experimental setting for the FERET database is similar to that of the ORL database. As the number of images for each person is 6, the number of training images ranges from 2 to 5. The experiments are repeated 10 times, and the average accuracy is then calculated. The average accuracy is tabulated in Table 2 and plotted in Figure 3. When 2 training images are used for training, the recognition rate of our method is 72.94%, while those of the FLDA and DLDA methods are 62.85% and 70.25%, respectively. The performance of each method improves as the number of training images increases. When the number of training images is equal to 5, the accuracy of the GLDA method increases to 89.83%, while those of the FLDA and DLDA methods are 89.42% and 85.58%, respectively. It can be seen that the proposed method outperforms the FLDA and DLDA methods on the FERET database as well.
As with the ORL database, the detailed performance of each approach is shown using CMC and ROC curves. They are plotted in Figure 6 and Figure 7, respectively, with the number of training images ranging from 2 to 5. Figures 6 and 7 demonstrate that our GLDA method has superior performance on the FERET database.

Conclusions
In order to enhance the discriminant power of the traditional LDA-based FR algorithms, this paper proposed to integrate the geometric distribution weight information of the training samples into the Fisher criterion and developed a novel geometric distribution weight information based LDA (GLDA) face recognition approach. The geometric distribution weight information is learnt using a radial basis function with fractional order. The proposed GLDA method is tested using two face databases, namely, the ORL and FERET face databases. Compared with the FLDA and DLDA methods, experimental results demonstrate that our GLDA method has the best performance.

Figure 1: Images of one person from ORL database.

Figure 2: Images of two persons from FERET database.

Figure 3: Rank 1 accuracy versus training number on the ORL face database (b) and FERET face database (a).

Figure 4: CMC curve comparisons on the ORL database.

Figure 5: ROC curve comparisons on the ORL database.

Figure 7: ROC curve comparisons on the FERET database.

Table 1: Recognition rates on ORL database.

Table 2: Recognition rates on FERET database.
2.2. PCA. The principal component analysis algorithm is also known as the Karhunen-Loeve transformation. It aims to find orthogonal principal component directions such that the scatter of all projected samples along the leading principal component directions is maximal. PCA is theoretically based on the total scatter matrix $S_t$, which can be calculated via formula (1). The PCA projection matrix $W_{PCA}$ is determined by criterion (4).
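A minimal NumPy sketch of PCA, assuming the total scatter matrix is normalized by 1/N and samples are stored as rows, is given below. The function name `pca_projection` is hypothetical. For face images, where the dimension greatly exceeds the number of samples, an SVD of the centered data is usually preferred over forming the full scatter matrix; the direct form is shown here only because it mirrors the description above.

```python
import numpy as np

def pca_projection(X, k):
    """Return the k leading principal directions (as columns) and their variances."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # center on the entire mean
    S_t = Xc.T @ Xc / X.shape[0]            # total scatter (covariance) matrix
    evals, evecs = np.linalg.eigh(S_t)      # ascending eigenvalues, orthonormal vectors
    order = np.argsort(evals)[::-1][:k]     # indices of the k largest eigenvalues
    return evecs[:, order], evals[order]
```

The variance of the data projected onto each returned direction equals the corresponding eigenvalue, which is what "preserving maximum variance" means in practice.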