Face Recognition Method Based on Fuzzy 2DPCA

2DPCA, which is one of the most important face recognition methods, is relatively sensitive to substantial variations in light direction, face pose, and facial expression. In order to improve the recognition performance of the traditional 2DPCA, a new 2DPCA algorithm based on fuzzy theory, namely, the fuzzy 2DPCA (F2DPCA), is proposed in this paper. In this method, the membership degree matrix of the training samples is calculated by applying the fuzzy K-nearest neighbor (FKNN) and is then used to get the fuzzy mean of each class. The average of the fuzzy means is incorporated into the definition of the general scatter matrix in the expectation that it improves the classification results. Comprehensive experiments on the ORL, the YALE, and the FERET face databases show that the proposed method can improve the classification rates and reduce the sensitivity to variations between face images caused by changes in illumination, facial expression, and face pose.


Introduction
Face recognition is a branch of biometrics that aims to capture and use behavioral or physiological characteristics for individual verification or personal identification. Over the last several decades, it has been an important topic in image processing and pattern recognition, and it plays an important role in many application fields, such as access control, card identification, mug shot searching, and security monitoring.
In the field of face recognition, researchers have proposed many feature extraction methods [1][2][3][4][5][6][7][8], of which the Eigenfaces method based on principal component analysis (PCA), introduced by Turk and Pentland [4], is one of the most popular approaches and a basic classification technique. Recently, many improved methods have been developed [9][10][11]. The PCA performs a linear projection of a high-dimensional face image space onto a new low-dimensional feature space by finding a set of orthogonal basis images (called Eigenfaces) [2]. In this new basis, the image coordinates (the PCA coefficients) are uncorrelated.
However, there is an important problem in applying the abovementioned methods that should be taken into account. In the PCA-based face recognition technique, the 2-D face image matrices must first be transformed into 1-D image vectors [4], and the resulting image vectors usually lead to a high-dimensional image vector space. It is therefore difficult to evaluate the covariance matrix accurately, because of its large size and the relatively small number of training samples. In addition, the matrix-to-vector transform may cause the loss of some useful structural information embedded in the original images. To overcome these problems, a straightforward image projection technique, called two-dimensional principal component analysis (2DPCA), was developed by Yang et al. [12] for image feature extraction. As opposed to the conventional PCA, the 2DPCA is based on 2-D matrices rather than 1-D vectors; that is, the image matrix does not need to be transformed into a vector beforehand. Instead, an image covariance matrix can be constructed directly from the original image matrices. As a result, the 2DPCA has two important advantages over the PCA. First, the image covariance matrix of the 2DPCA is much smaller, so it is easier to evaluate accurately. Second, the 2DPCA computes the corresponding eigenvectors more quickly than the PCA, so less computation time is required [12].
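As a rough illustration of the size argument above, the following sketch compares the covariance matrix dimensions of the two methods. The helper name `covariance_sizes` is hypothetical, and the 112 × 92 image size is an assumption taken from the ORL images used later in the experiments.

```python
def covariance_sizes(m, n):
    """Side lengths of the PCA and 2DPCA covariance matrices for m x n images."""
    pca_side = m * n   # PCA vectorizes each image, so its covariance is (m*n) x (m*n)
    twod_side = n      # 2DPCA uses (X - mean)^T (X - mean), which is n x n
    return pca_side, twod_side


pca_side, twod_side = covariance_sizes(112, 92)
print(pca_side, twod_side)               # 10304 92
print(pca_side ** 2 // twod_side ** 2)   # 12544 times more entries for PCA
```

With only a few hundred training images, the 10304 × 10304 matrix is heavily rank deficient, whereas the 92 × 92 image covariance matrix is far easier to estimate.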
However, the 2DPCA method has an important problem of its own. In the 2DPCA model, the mean matrix, which is generally estimated by the sample average of all training samples, is used to characterize the total scatter matrix, so the average of the training samples plays a critical role in the construction of the total scatter matrix and ultimately affects the projection directions of the 2DPCA. Since face recognition is typically a small sample size problem, in which only a few image samples per class are available for training, it is difficult to estimate the mean accurately from the sample average, in particular when there are outliers in the sample set (e.g., images affected by noise, occlusion, etc.) [13]. An inaccurate estimate of the mean has a negative effect on the robustness of the 2DPCA model. Another major problem with the Eigenface technique is that it can be affected by variations in illumination, facial expression, and pose. All these nonideal conditions make the distribution of the samples uncertain. To improve the recognition performance of the 2DPCA and address these uncertainties, taking advantage of fuzzy technology [14] is a good choice.
So far, a number of scholars have applied fuzzy theory to face recognition algorithms [14][15][16][17][18][19]; for example, Keller et al. [14] proposed a fuzzy fisherface method, and Yang et al. [17] extended it to a 2-D space. Zhai et al. [18] applied the fuzzy rough set to select the optimal discriminant features, and Liu and Shi [19] incorporated local features into a fuzzy weight scheme to improve the recognition performance of the traditional 2DPCA. Inspired by these successful applications, we expect that the fusion of fuzzy theory and 2DPCA can improve the recognition rate to some extent.
Based on the above, in order to make full use of the class information and the distribution information of the samples and so obtain an accurate estimate of the training sample mean in the definition of the 2DPCA model, we incorporate fuzzy theory and the class information into the computation of the mean matrix. We call this method the fuzzy 2DPCA (F2DPCA) algorithm. In this method, the fuzzy membership degree matrix based on the Euclidean distances between all training samples is calculated first, and then the fuzzy mean of each class is obtained. Finally, we average these fuzzy means to get the center of all training samples. In the new definition of the 2DPCA model, the average of all training samples is replaced with this center. Comprehensive experiments on the ORL, the YALE, and the FERET face databases show that the proposed method can improve classification rates and reduce sensitivity to variations between face images caused by changes in illumination and viewing direction.
The organization of this paper is as follows. Section 2 briefly reviews the theory of 2DPCA. Section 3 first introduces the fuzzy K-nearest neighbor and then describes the idea of fuzzy 2DPCA in detail. Section 4 presents experiments on face image databases to demonstrate the effectiveness of the new method. Conclusions are summarized in Section 5.

2DPCA
Suppose $A$ is an $n$-dimensional unitary column vector. The idea of 2DPCA is to project a given image $X$, an $m \times n$ matrix, onto $A$ to get an $m$-dimensional vector $Y$ by the following linear transformation [12]:

$$Y = XA.$$

We call $Y$ the projected feature vector of $X$.
The procedure of 2DPCA to obtain the vector $A$ can be characterized as follows.
Suppose there are $c$ pattern classes in the $m \times n$ image space and a sample set of $N$ face images $\{X_1, X_2, \ldots, X_N\}$, where $X_j \in \mathbb{R}^{m \times n}$ and each sample belongs to one of the classes $1, 2, \ldots, c$. The total scatter matrix can be defined as

$$S_t = \frac{1}{N} \sum_{j=1}^{N} (X_j - \bar{X})^T (X_j - \bar{X}),$$

where $N$ denotes the number of training samples and $\bar{X} = (1/N) \sum_{j=1}^{N} X_j$ is the mean of all training samples. In the 2DPCA algorithm, the optimal projection $A_{\mathrm{opt}}$ satisfies

$$A_{\mathrm{opt}} = \arg\max_{A} \operatorname{tr}\left(A^T S_t A\right)$$

with

$$A_{\mathrm{opt}} = [A_1, A_2, \ldots, A_d],$$

where $\{A_i \mid i = 1, 2, \ldots, d\}$ are the $d$ orthogonal eigenvectors of $S_t$ corresponding to the $d$ largest eigenvalues.
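The 2DPCA procedure described in this section can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the images are assumed to be stacked in an array of shape (N, m, n), and the scatter matrix is assumed to carry the 1/N normalization.

```python
import numpy as np


def total_scatter(images, center=None):
    """n x n scatter matrix (1/N) * sum_j (X_j - C)^T (X_j - C).

    `center` defaults to the plain sample mean of the images.
    """
    images = np.asarray(images, dtype=float)
    if center is None:
        center = images.mean(axis=0)
    diffs = images - center
    # Sum over samples k and image rows i of diffs[k, i, j] * diffs[k, i, l].
    return np.einsum('kij,kil->jl', diffs, diffs) / len(images)


def top_eigenvectors(scatter, d):
    """Columns are the d eigenvectors with the largest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(scatter)   # eigh: symmetric input, ascending order
    order = np.argsort(eigvals)[::-1][:d]
    return eigvecs[:, order]


def project(images, A):
    """Project each m x n image onto the n x d matrix A, giving m x d features."""
    return np.asarray(images, dtype=float) @ A


# Usage on random stand-in data (not real face images).
rng = np.random.default_rng(0)
imgs = rng.normal(size=(6, 5, 4))
S = total_scatter(imgs)
A = top_eigenvectors(S, 2)
features = project(imgs, A)   # shape (6, 5, 2)
```

Because the scatter matrix is only n × n, the eigendecomposition stays cheap even when the number of pixels per image is large.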

Fuzzy 2DPCA
3.1. Fuzzy K-Nearest Neighbor (FKNN). In our method, the fuzzy membership degrees and the class centers are obtained through the FKNN algorithm [14]. With FKNN, the membership degrees are computed through the following steps.
Step 1. Compute the Euclidean distance matrix between pairs of image matrices in the training set.
Step 2. Set the diagonal elements of the Euclidean distance matrix to infinity.
Step 3. Sort the distance matrix (treating each of its columns separately) in ascending order. Collect the class labels of the patterns located in the closest neighborhood of the pattern under consideration (as we are concerned with $k$ neighbors, this returns a list of $k$ integers).
Step 4. Compute the membership degree to class $i$ for the $j$th pattern, using the method proposed in [14], according to the following equation to construct the membership degree matrix $U$:

$$u_{ij} = \begin{cases} 0.51 + 0.49\,(n_{ij}/k), & \text{if } i \text{ is the class label of the } j\text{th pattern}, \\ 0.49\,(n_{ij}/k), & \text{otherwise}, \end{cases}$$

where $n_{ij}$ is the number of the neighbors of the $j$th pattern that belong to the $i$th class. For $c$ classes and $N$ training samples, the matrix $U = [u_{ij}]$, $i = 1, 2, \ldots, c$; $j = 1, 2, \ldots, N$, satisfies two obvious properties:

$$\sum_{i=1}^{c} u_{ij} = 1, \qquad 0 < \sum_{j=1}^{N} u_{ij} < N.$$

Examining the membership allocation formula, we conclude that the method attempts to "fuzzify" or refine the membership grades of the labeled patterns [14].
Intuitively, if very few neighbors of the pattern belong to its own category, the membership grade is kept close to 0.51. Conversely, if $n_{ij} = k$, meaning that all neighbors are in the same class as the pattern under consideration, then $u_{ij} = 1$ [14].
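The four FKNN steps can be sketched as below. This is an illustrative reading of the procedure, not the authors' code: the function name `fknn_membership` is hypothetical, the membership rule is the 0.51/0.49 allocation of [14], and the Euclidean distance between image matrices is taken as the distance between their flattened pixel vectors.

```python
import numpy as np


def fknn_membership(images, labels, k):
    """Return U of shape (c, N); U[i, j] is the membership of sample j in class i."""
    X = np.asarray(images, dtype=float).reshape(len(images), -1)
    labels = np.asarray(labels)
    classes = np.unique(labels)

    # Step 1: pairwise Euclidean distances between flattened images.
    sq = (X ** 2).sum(axis=1)
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))

    # Step 2: a sample must not count as its own neighbor.
    np.fill_diagonal(dist, np.inf)

    # Step 3: labels of the k nearest neighbors of every sample.
    # The distance matrix is symmetric, so sorting rows matches sorting columns.
    neighbor_labels = labels[np.argsort(dist, axis=1)[:, :k]]

    # Step 4: 0.49 * n_ij / k for every class, plus 0.51 for the sample's own class.
    U = np.empty((len(classes), len(X)))
    for i, cls in enumerate(classes):
        n_ij = (neighbor_labels == cls).sum(axis=1)
        U[i] = 0.49 * n_ij / k
        U[i, labels == cls] += 0.51
    return U


# Usage: two well-separated toy classes with three 2 x 2 "images" each.
offsets = 0.1 * np.arange(3)[:, None, None]
toy = np.concatenate([np.zeros((3, 2, 2)) + offsets, 10.0 + np.zeros((3, 2, 2)) + offsets])
toy_labels = np.array([0, 0, 0, 1, 1, 1])
U2 = fknn_membership(toy, toy_labels, k=2)   # all neighbors share the sample's class
U5 = fknn_membership(toy, toy_labels, k=5)   # 2 own-class and 3 other-class neighbors
```

In the k = 2 case every sample receives membership 1 in its own class, while in the k = 5 case the own-class membership drops to 0.51 + 0.49 · 2/5 = 0.706, which shows how neighbors from other classes soften the label.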

The Algorithm of Fuzzy 2DPCA.
The key step of fuzzy 2DPCA is how to incorporate the contribution of each training sample into the total scatter matrix. Based on fuzzy set theory, each sample can belong to several classes with fuzzy membership degrees, instead of receiving a binary classification, so the redefinition of the fuzzy total scatter matrix should consider both the membership degree of each sample (its contribution to each class) and the class information. The idea of the fuzzy 2DPCA is first to calculate the mean of each class from the fuzzy membership degree matrix of all training samples by (6) and then to average all class means to get the center of all training samples. In the definition of the proposed method, the original mean $\bar{X}$ of all samples is replaced with the newly obtained fuzzy mean $\tilde{X}$. Based on the above, the algorithm of the proposed supervised fuzzy 2DPCA can be summarized as follows.
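Step 1 (FKNN). Compute the fuzzy membership degree matrix $U$ with the FKNN algorithm in the original training image space.
Step 2. Based on $U$, compute the fuzzy mean $\tilde{X}_i$ of each class; then average all fuzzy class means to get the fuzzy average $\tilde{X}$ of all training samples according to (6) and (7):

$$\tilde{X}_i = \frac{\sum_{j=1}^{N} u_{ij} X_j}{\sum_{j=1}^{N} u_{ij}}, \quad i = 1, 2, \ldots, c, \tag{6}$$

$$\tilde{X} = \frac{1}{c} \sum_{i=1}^{c} \tilde{X}_i. \tag{7}$$

Step 3. Redefine the total scatter matrix according to the newly obtained fuzzy average $\tilde{X}$. The optimal projection matrix can then be obtained by solving the following optimization problem:

$$\tilde{S}_t = \frac{1}{N} \sum_{j=1}^{N} (X_j - \tilde{X})^T (X_j - \tilde{X}),$$

$$\tilde{A}_{\mathrm{opt}} = \arg\max_{A} \operatorname{tr}\left(A^T \tilde{S}_t A\right).$$

Step 4 (classify). Project all samples onto the obtained optimal discriminant matrix $\tilde{A}_{\mathrm{opt}}$ and classify the testing samples with a nearest neighbor classifier.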
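Steps 2 to 4 can be sketched as follows, taking the membership matrix U from Step 1 as an input. This is a minimal sketch under stated assumptions: the function names are hypothetical, U has shape (c, N), the fuzzy class mean is the membership-weighted average of the samples, and the scatter matrix keeps the same 1/N form as the conventional 2DPCA with only the center replaced.

```python
import numpy as np


def fuzzy_center(images, U):
    """Average of the fuzzy class means, used in place of the plain sample mean."""
    images = np.asarray(images, dtype=float)
    U = np.asarray(U, dtype=float)
    # Fuzzy class means: membership-weighted average of all samples, per class.
    class_means = np.einsum('ij,jmn->imn', U, images) / U.sum(axis=1)[:, None, None]
    # Fuzzy average: unweighted mean over the c class means.
    return class_means.mean(axis=0)


def f2dpca_fit(images, U, d):
    """Return the n x d matrix of leading eigenvectors of the fuzzy scatter matrix."""
    images = np.asarray(images, dtype=float)
    diffs = images - fuzzy_center(images, U)
    scatter = np.einsum('kij,kil->jl', diffs, diffs) / len(images)
    eigvals, eigvecs = np.linalg.eigh(scatter)
    return eigvecs[:, np.argsort(eigvals)[::-1][:d]]


def nearest_neighbor_predict(train_images, train_labels, test_images, A):
    """Label each test image by its closest projected training image (Euclidean)."""
    train = (np.asarray(train_images, dtype=float) @ A).reshape(len(train_images), -1)
    test = (np.asarray(test_images, dtype=float) @ A).reshape(len(test_images), -1)
    dists = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=2)
    return np.asarray(train_labels)[dists.argmin(axis=1)]


# Usage on random stand-in data with crisp (one-hot) memberships for two classes.
rng = np.random.default_rng(1)
train_imgs = rng.normal(size=(6, 5, 4))
train_lbls = np.array([0, 0, 0, 1, 1, 1])
U_crisp = np.stack([(train_lbls == 0), (train_lbls == 1)]).astype(float)
A_fuzzy = f2dpca_fit(train_imgs, U_crisp, d=2)
predicted = nearest_neighbor_predict(train_imgs, train_lbls, train_imgs, A_fuzzy)
```

With crisp memberships and equally sized classes, the fuzzy center coincides with the ordinary sample mean, so the sketch reduces to the conventional 2DPCA; the two differ once FKNN assigns partial memberships to samples near other classes.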

Experiments
We compare the proposed algorithm (F2DPCA) with the traditional PCA [4] and the 2DPCA [12] on three face image databases, namely, the ORL database, the YALE database, and the FERET database, to illustrate the effectiveness of our method. To further show the performance of the F2DPCA, we also compare it with other subspace methods that have been combined with fuzzy theory, for example, the FKF [15], the CFLDA [16], the F2DLDA [17], the 2DPCAF [18], and the FW2DPCA [19]. The face image databases and the experiment results are presented in the following subsections.

Experiment on the ORL Database.
The ORL database [20] is a basic face database for testing face recognition methods. It includes 400 face images from 40 subjects, each providing 10 different images with a size of 112 × 92 pixels and 256 grey levels per pixel. Some images were taken at different times and contain variations in lighting, facial expression, and facial details. Figure 1 shows some sample images of different individuals.
In our experiments, 7 random images of each individual are used for training, and the remaining 3 images are used for testing. The PCA, 2DPCA, and F2DPCA are each used for feature extraction. In order to make full use of the available image samples and to evaluate the above algorithms more accurately, we adopt a cross-validation strategy and run the recognition procedure 10 times with different training and testing sample sets [17]. The number of principal components is chosen to keep nearly 95 percent of the image energy. The FKNN parameter K is set to one less than the number of training samples per class. Finally, a nearest neighbor classifier with the Euclidean distance is employed for classification. The recognition result versus the dimension is shown in Table 1, from which we can see that our method outperforms the PCA and the 2DPCA. Figure 2 demonstrates that, as the number of training samples increases, the maximal recognition rates of the PCA, 2DPCA, and F2DPCA all increase greatly, but, for the same number of training samples, the recognition rate of the proposed method is the best of the three algorithms. Table 2 gives the maximum recognition rate and time complexity comparison of the different face recognition algorithms. The value of the time complexity is the time taken to recognize the remaining testing sample images. From Table 2, we can see that the recognition rate of the F2DPCA is slightly lower than that of the CFLDA and the F2DLDA, but its time complexity is much lower. The ideas of the CFLDA and the F2DLDA are similar to that of the F2DPCA, but the CFLDA and the F2DLDA include a complex matrix inversion, so their computational complexities are higher. The core of the 2DPCAF and the FW2DPCA is still the 2DPCA algorithm, so their overall performance is poorer than that of the F2DPCA.
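Two pieces of this protocol can be sketched in code: the random per-class split repeated over several runs, and the choice of the number of components from an energy threshold. Both helpers are hypothetical names, and reading "image energy" as the cumulative fraction of the eigenvalue sum is an assumption, since the text does not define it.

```python
import numpy as np


def components_for_energy(eigenvalues, energy=0.95):
    """Smallest number of leading eigenvalues whose sum reaches the energy fraction."""
    vals = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(vals) / vals.sum()
    # First index whose cumulative fraction is >= energy, converted to a count.
    return int(np.searchsorted(cumulative, energy) + 1)


def random_split_per_class(labels, n_train, rng):
    """Pick n_train random samples of every class for training; the rest are for testing."""
    labels = np.asarray(labels)
    train_idx = []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        train_idx.extend(idx[:n_train])
    train_idx = np.sort(np.array(train_idx))
    test_idx = np.setdiff1d(np.arange(len(labels)), train_idx)
    return train_idx, test_idx


# ORL-like layout: 40 subjects with 10 images each, 7 for training and 3 for testing.
orl_labels = np.repeat(np.arange(40), 10)
rng = np.random.default_rng(0)
splits = [random_split_per_class(orl_labels, 7, rng) for _ in range(10)]
train_idx, test_idx = splits[0]
print(len(train_idx), len(test_idx))              # 280 120
print(components_for_energy([5, 3, 1, 1], 0.95))  # 4
```

The final recognition rate would then be the average over the 10 splits of the accuracy obtained by the chosen feature extractor and classifier.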

Experiment on YALE Database.
The YALE face database [21] contains 165 images of 15 individuals (each person has 11 different images) under various lighting conditions and facial expressions. In our experiment, each image was manually cropped and resized to 100 × 100 pixels. The processed sample images of one person are shown in Figure 3.
The same experiment procedure as that on the ORL database is performed on the YALE database, using 6 random images per class for training and the remaining 5 images for testing. The PCA, 2DPCA, and the proposed F2DPCA are each used for feature extraction. The recognition rate versus the dimension is illustrated in Table 3.
Table 3 shows that the F2DPCA outperforms the PCA and the 2DPCA: the maximal recognition rate of the F2DPCA is 89%, while that of the 2DPCA is 87%.
Besides, as illustrated in Figure 4, the recognition rates of the PCA, the 2DPCA, and the F2DPCA all increase significantly as the number of training samples increases, but the F2DPCA performs better than the others on the whole and is more robust to outliers than the PCA and the 2DPCA.
Table 4 shows that the overall performance of the F2DPCA is better than that of the other methods.

4.3. Experiment on FERET Database.
The proposed method was also tested on a subset of the FERET database. The FERET face image database is a standard database for testing and evaluating state-of-the-art face recognition algorithms [22,23]. This subset includes 1400 images with variations in illumination, facial expression, and pose. All these images are from 200 people (each person has 7 images). In our experiment, the facial portion of each original image is cropped manually based on the location of the eyes and resized to 40 × 40 pixels, without histogram equalization. Some processed images of one person are shown in Figure 5.
In this experiment, we applied the same experiment procedure as that on the ORL database: 4 random images of each individual are used for training, and the remaining 3 images are used for testing. The PCA, the 2DPCA, and the F2DPCA are each used for feature extraction. The recognition results are obtained by a nearest neighbor classifier with the Euclidean distance. After repeating the recognition procedure 10 times with different training and testing sets, the average result is used as the final recognition rate. The recognition results versus the dimension for the three methods are listed in Table 5. The data in the table indicate that both the PCA and the 2DPCA are inferior to the F2DPCA in terms of recognition performance.
Figure 6 gives the recognition rate versus the number of training samples on the FERET database for the three feature extraction methods. The recognition performance of the proposed method is the best of the three. The experiment results in Figures 2, 4, and 6 also show that the advantage of the proposed method is larger on the YALE and the FERET databases, whose samples contain more variations in expression, illumination, and pose; this indicates that the F2DPCA is robust to these nonideal conditions.
Table 6 shows the comprehensive analysis data of the 6 methods.From the table we can see that the overall performance of the F2DPCA is better than that of other methods.

Conclusion
In order to improve the recognition performance of the 2DPCA under nonideal conditions, a new method for feature extraction, namely, the fuzzy 2DPCA (F2DPCA), is proposed in this paper. Considering the important role that fuzzy set theory plays in processing uncertainty, in this method we fuse the 2DPCA and the fuzzy K-nearest neighbor. First, according to the fuzzy K-nearest neighbor algorithm, the fuzzy membership degree of every sample to each class is calculated to construct the membership degree matrix. Then the average of the fuzzy means of all classes is computed and used in the redefinition of the total scatter matrix. The proposed method thus makes full use of both the class information and the distribution information of the samples, so it can improve recognition results. Experiments on the ORL, the YALE, and the FERET face databases showed that the new method works effectively.

Figure 1: Sample images of some persons in the ORL database.

Figure 2: Recognition rate versus number of training samples on the ORL database.

Figure 3: Sample images of some persons in the YALE database.

Figure 4: Recognition rate versus number of training samples on the YALE database.

Figure 5: Sample images of some persons in the FERET database.

Table 1: Maximum recognition rate on the ORL database with 7 training samples.

Table 2: Maximal recognition rate and time complexity comparison on the ORL database with 7 training samples.

Table 3: Maximum recognition rate on the YALE database with 6 training samples.

Table 4: Maximal recognition rate and time complexity comparison on the YALE database with 6 training samples.

Table 5: Maximum recognition rate on the FERET database with 4 training samples.

Table 6: Maximal recognition rate and time complexity comparison on the FERET database with 4 training samples.