A Fuzzy Kernel Maximum Margin Criterion for Image Feature Extraction

Based on kernel principal component analysis, fuzzy set theory, and the maximum margin criterion, a novel image feature extraction and recognition method, called the fuzzy kernel maximum margin criterion (FKMMC), is proposed. In the proposed method, two new fuzzy scatter matrices are defined. The new fuzzy scatter matrices fully reflect the relation between the fuzzy membership degree and the offset of a training sample from its subclass center. In addition, a concise and reliable method for computing the fuzzy between-class scatter matrix is provided. Experimental results on four face databases (AR, extended Yale B, GTFD, and FERET) demonstrate that the proposed method outperforms the compared methods.


Introduction
Dimensionality reduction has been an important research topic in computer vision and pattern recognition for many years [1][2][3]. It is well known that many methods become inefficient or run into limitations in the high-dimensional case. Data transformation is an essential approach to dimensionality reduction: it maps high-dimensional data to a relatively low-dimensional space according to a certain criterion, so that the problem can be solved by an existing method in the low-dimensional space. A variety of approaches have been proposed to achieve this goal. The most famous are probably Principal Component Analysis (PCA) [1] and Linear Discriminant Analysis (LDA) [2]. On this basis, a number of improved algorithms have been proposed.
PCA is an unsupervised learning algorithm that reflects the overall variability of the data. Each axis contributes differently to this variability. It is well known that the axes corresponding to the larger eigenvalues contribute more, while the axes corresponding to the smaller eigenvalues often reflect noise or details. Therefore, the axes corresponding to the larger eigenvalues are chosen as the transformational operator, which not only retains the most useful information of the original image but also has a smoothing and denoising effect. Because PCA is a linear method based on the Gaussian distribution, it is not suitable for the non-Gaussian case. For this reason, kernel principal component analysis (KPCA) [3] was proposed, which is nonlinearly related to the input space. For the purpose of dimensionality reduction and data interpretation, a number of principal axis selection and sparse methods have been proposed, such as the recent kernel entropy principal component analysis (KECA) [4], which chooses the principal axes by maximizing the Renyi entropy of the sample density.
LDA is a traditional supervised learning method for dimensionality reduction, which seeks a transformation operator that maximizes the between-class distance while minimizing the within-class distance. The algorithm needs to compute a matrix inverse in its solution process, and the within-class scatter matrix is often singular, especially in the small size sample (SSS) situation, which causes LDA to fail. To solve the singularity problem, a large number of improved algorithms have been proposed.
LDA+PCA [5] is a well-known null subspace method, which calculates only the maximum eigenvectors to form the transformation matrix when the within-class scatter matrix is of full rank; otherwise, it first runs PCA and then performs LDA. Regularized discriminant analysis (RDA) [6] tries to obtain more reliable estimates of the eigenvalues by correcting the eigenvalue distortion with ridge-type regularization. Penalized discriminant analysis (PDA) [7] aims not only to overcome the small size sample problem but also to smooth the coefficients of the discrimination vectors for better interpretation. Inverse Fisher discriminant analysis (IFDA) [8] modifies the procedure of PCA and derives regular and irregular information from the within-class scatter matrix by the inverse Fisher discrimination criterion. Locality Preserving Projections (LPP) [9] is a linear subspace learning method derived from the Laplacian Eigenmap; its significant advantage is that it generates an explicit map, and it minimizes the local scatter of the projected data. The local geometrical structure based tensor subspace analysis (TSA) [10] captures an optimal linear approximation to the face manifold in the sense of local isometry. The maximum margin criterion (MMC) [11] uses the difference between the between-class scatter and the within-class scatter as the discrimination criterion. Linear Laplacian discrimination (LLD) [12] formulates the within-class scatter matrix and the between-class scatter matrix by means of similarity-weighted criteria. The similarities are computed from the exponential function of pairwise distances in the original sample space, which is independent of the particular form of the metric, so LLD is applicable to any linear space for classification. Kernel linear discriminant analysis (KLDA) [13] is equivalent to kernel principal component analysis (KPCA) plus Fisher linear discriminant analysis (LDA). The optimal solution of KLDA is obtained by solving a generalized eigenvalue problem, but the within-class scatter matrix is often singular. Fuzzy inverse Fisher discriminant analysis (FIFDA) [14] is built on the inverse Fisher discrimination criterion and the fuzzy membership degree. In this method, a membership degree matrix is calculated using FKNN, and the membership degree is then incorporated into the definitions of the between-class and within-class scatter matrices to obtain the fuzzy between-class scatter matrix and the fuzzy within-class scatter matrix.
Two-dimensional linear discriminant analysis (2DLDA) [15] is based on 2D image matrices; that is, the image matrix does not need to be transformed into a vector. Instead, the image between-class scatter matrix and within-class scatter matrix are constructed directly from the image matrices, and their eigenvectors are computed for image feature extraction. The Laplacian bidirectional maximum margin criterion (LBMMC) [16] formulates the image total Laplacian matrix, the image within-class Laplacian matrix, and the image between-class Laplacian matrix using the sample similarity weight that is widely used in machine learning. Two-dimensional MMC (2DMMC) [17] aims to find two orthogonal projection matrices that project the original image matrices to a low-dimensional matrix subspace, in which a sample is close to those in the same class but far from those in different classes. The blockwise two-dimensional maximum margin criterion (B2D-MMC) [18] introduces a blockwise model for face recognition, performing one-sided subspace projection inside each block manifold, in which a block is close to those belonging to the same class but far from those belonging to different classes. The unilateral projection and the blockwise learning avoid the iterations and alternations of current bilateral projection based two-dimensional feature extraction approaches and have advantages in complexity and locality.
In recent years, representation-based face recognition methods [19,20] have attracted wide attention in pattern recognition, but they focus only on classification techniques. In this paper, we pay close attention to feature extraction rather than to classification techniques.
In particular, the more recent methods focus on embedding weights into the scatter matrices to improve the performance of an algorithm. We consider this idea highly significant because the class attribution of a training sample is often ambiguous: training samples are not completely separable among the subclasses and often partially overlap. Kernel and fuzzy approaches are well-suited mathematical tools for such problems.
In existing methods, kernel and fuzzy techniques have not been combined with each other, and it is unclear how to select the bandwidth of the kernel. In the improved LDA algorithms, the weighted scatter matrices do not sufficiently reflect the relation between a training sample and its subclass prototype. In addition, the eigendecomposition of a scatter matrix is influenced by how the matrix is calculated, since the summation operator introduces computational error.
In this paper, we propose a fuzzy kernel maximum margin criterion (FKMMC) for feature extraction and recognition. The method is accomplished by means of a two-stage procedure. First, the data are transformed into the kernel subspace by kernel principal component analysis (KPCA), retaining the eigenvectors that account for 98% of the eigenvalue sum. Second, in order to simplify the calculation, we construct the fuzzy between-class scatter matrix and the fuzzy within-class scatter matrix on the kernel subspace using the basic Euclidean distance based fuzzy membership. The algorithm then maximizes the difference between the fuzzy between-class scatter matrix and the fuzzy within-class scatter matrix to obtain the transformational operator. It efficiently integrates kernel feature analysis and classification information to achieve dimensionality reduction. Our main work can be summarized in the following aspects.
(1) The proposed algorithm replaces the local Laplacian factor in LBMMC with the fuzzy membership degree of the training samples to the sample subclasses. It fully embeds the fuzzy membership degree into the between-class scatter matrix and the within-class scatter matrix by transforming the between-class scatter matrix, which is different from the fuzzy embedding used in FIFDA.
(2) The variance of the training samples is used as the bandwidth of the Gaussian kernel, which effectively avoids the uncertainty of parameter settings and is consistent with the properties of the Gaussian distribution. In order to remove noise in the training samples and retain the initial data information in the kernel subspace, we discard only those eigenvectors corresponding to the smallest eigenvalues that together account for 2% of the sum of all eigenvalues. In the kernel subspace generated by KPCA, the algorithm performs the MMC algorithm with the embedded fuzzy factor. Finally, we obtain a succinct kernel transformational operator. Unlike other feature decomposition-based algorithms, the proposed algorithm does not need an iterative procedure and does not need to compute an inverse matrix. Since it is not necessary to compute the inverse matrix, the small size sample (SSS) problem of traditional LDA and its improvements is avoided.
The organization of this paper is as follows. In Section 2, KPCA, MMC, LLD, and FIFDA are reviewed briefly. In Section 3, a new method of embedding the fuzzy factor into the scatter matrices is presented, and new succinct computing formulas for the two scatter matrices are described in detail. In Section 4, the proposed algorithm and its computational complexity (including training time and testing time) are discussed in detail. In Section 5, experiments are presented to demonstrate the effectiveness of the proposed algorithm. Conclusions are drawn in Section 6.

Related Works
For convenience of description, in this section we briefly introduce several algorithms related to our research.

Maximum Margin Criterion.
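Suppose that there are $c$ known pattern classes in the training data set $X$, and $N_i$ is the number of training samples in the $i$th class. The between-class scatter matrix and the within-class scatter matrix can be written as (3) and (4), respectively,

$$S_b = \frac{1}{N}\sum_{i=1}^{c} N_i (m_i - m_0)(m_i - m_0)^T, \quad (3)$$

$$S_w = \frac{1}{N}\sum_{i=1}^{c}\sum_{j=1}^{N_i} (x_j^i - m_i)(x_j^i - m_i)^T, \quad (4)$$

where $N$ is the total number of training samples, $N = \sum_{i=1}^{c} N_i$, $x_j^i$ denotes the $j$th training sample in the $i$th class, $m_i$ is the mean of the training samples in the $i$th class, and $m_0$ refers to the mean of all training samples.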
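As a rough illustration of the quantities reviewed in this subsection, the following NumPy sketch computes the two scatter matrices and the MMC projection obtained from the eigendecomposition of their difference. The $1/N$ normalization, the function names, and the synthetic data are assumptions made for the example; this is not the authors' implementation.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return (S_b, S_w) for samples in the rows of X with integer labels y."""
    N, n = X.shape
    m0 = X.mean(axis=0)                      # mean of all training samples
    S_b = np.zeros((n, n))
    S_w = np.zeros((n, n))
    for label in np.unique(y):
        Xi = X[y == label]                   # samples of one class
        mi = Xi.mean(axis=0)                 # class mean
        d = (mi - m0)[:, None]
        S_b += Xi.shape[0] * (d @ d.T)       # N_i (m_i - m_0)(m_i - m_0)^T
        C = Xi - mi
        S_w += C.T @ C                       # sum_j (x_j - m_i)(x_j - m_i)^T
    return S_b / N, S_w / N                  # assumed 1/N normalization

def mmc_projection(X, y, d):
    """Top-d eigenvectors of S_b - S_w as columns, largest eigenvalue first."""
    S_b, S_w = scatter_matrices(X, y)
    M = S_b - S_w
    M = 0.5 * (M + M.T)                      # enforce exact symmetry
    vals, vecs = np.linalg.eigh(M)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:d]
    return vecs[:, order], vals[order]

# Synthetic two-class data, for illustration only
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])
y = np.repeat([0, 1], 20)
W, vals = mmc_projection(X, y, d=2)
Z = X @ W                                    # projected samples, one row each
```

No matrix inverse appears anywhere in the sketch, which is the practical advantage of the difference-based criterion over the ratio-based Fisher criterion.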
In classical Fisher discriminant analysis, the discrimination criterion maximizes the ratio of the between-class scatter to the within-class scatter. MMC instead takes the difference between the between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$ as the discriminant rule and obtains a transformational matrix $W = [w_1, w_2, \ldots, w_d]$ with $w_k^T w_k = 1$. The problem reduces to the following constrained optimization:

$$\max_{W} \sum_{k=1}^{d} w_k^T (S_b - S_w) w_k \quad \text{subject to } w_k^T w_k = 1, \; k = 1, \ldots, d.$$

Solving this optimization problem amounts to the eigendecomposition of $S_b - S_w$. The resulting eigenvectors are sorted in descending order of the corresponding eigenvalues, and $W$ consists of the first $d$ eigenvectors. Compared with LDA, the main merit of MMC is that it avoids calculating the inverse of the within-class scatter matrix, which is often singular.

Linear Laplacian Discrimination (LLD) [12].
Inspired by the application of Laplacian Eigenmaps in manifold learning and of its linearization LPP in clustering and recognition, Zhao et al. [12] proposed Linear Laplacian Discrimination. Its basic theory can be described as follows. Suppose that $R^n$ is an $n$-dimensional sample space and $\|\cdot\|_{R^n}$ is the Euclidean norm in the original sample space. The weight $w_j^i$ of the $j$th sample in the $i$th class is defined by

$$w_j^i = \exp\left(-\frac{\|x_j^i - m_i\|_{R^n}^2}{t}\right).$$

Let $W_i = \mathrm{diag}(w_1^i, w_2^i, \ldots, w_{N_i}^i)$, and let $1_{N_i}$ denote an all-one column vector of length $N_i$. Let $E_i$ be a 0-1 indicator matrix of the $i$th class satisfying $X E_i = X_i$, where $X_i$ is the training sample set of the $i$th class. With these quantities, the within-class scatter matrix can be calculated in matrix form. For the between-class scatter matrix, a weight $w_i$ is defined for each class, and the between-class scatter matrix is again expressed in matrix form. Finally, the transformational operator $W$ is solved from the resulting discriminant criterion. As the authors pointed out, LLD, like LDA, encounters computational trouble when the within-class scatter matrix is singular. Although the authors proposed some methods to address this issue, the problem was not resolved in essence. Moreover, how to assign the parameter $t$ in the expression for the weight is an open problem, and the weights used in the within-class scatter matrix and in the between-class scatter matrix are inconsistent with each other.

Fuzzy Inverse Fisher Discriminant Analysis (FIFDA).
In FIFDA, the fuzzy membership degrees and the class centers are obtained through the FKNN algorithm. The fuzzy membership degree of a training sample is computed as follows:

$$u_{ij} = \begin{cases} 0.51 + 0.49\,(n_{ij}/k), & \text{if } i \text{ is the same as the label of the } j\text{th pattern}, \\ 0.49\,(n_{ij}/k), & \text{if } i \text{ is not the same as the label of the } j\text{th pattern}, \end{cases} \quad (14)$$

where $k$ denotes the neighbor size and $n_{ij}$ is the number of neighbors of the $j$th sample that belong to the $i$th class. $u_{ij}$ satisfies two obvious properties: $\sum_{i=1}^{c} u_{ij} = 1$ and $0 < \sum_{j=1}^{N} u_{ij} < N$. The mean vector $m_i$ of each class is computed with the membership degrees as weights, and the corresponding fuzzy within-class scatter matrix and fuzzy between-class scatter matrix are defined as (17) and (18), respectively, where $p$ is a constant which controls the influence of the fuzzy membership degree. Finally, the fuzzy inverse Fisher criterion function is defined from these two matrices. In this method, the fuzzy between-class scatter matrix and the fuzzy within-class scatter matrix are redefined according to FKNN. The method reduces the sensitivity to substantial variations between face images caused by varying illumination, viewing conditions, and facial expression. However, the way in which the fuzzy membership degree is embedded in the definition of the fuzzy between-class scatter matrix is not very appropriate. The main reason is that the fuzzy membership degree reflects the relation between a training sample and a class center, whereas $(m_i - m_0)$ expresses the difference between the fuzzy mean of class $i$ and the mean of all training samples. It is therefore improper to take $u_{ij}^p$ as the weight of $(m_i - m_0)$. In addition, the parameter $k$ in KNN also affects the performance of FIFDA.
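The FKNN membership assignment can be sketched as follows. The constants 0.51 and 0.49 are those of the commonly used fuzzy k-nearest-neighbor initialization and are an assumption here, as are the function name and the toy data; the sketch is illustrative rather than the FIFDA authors' code.

```python
import numpy as np

def fknn_membership(X, y, k):
    """Membership matrix U (classes x samples) from the k nearest neighbours."""
    classes = np.unique(y)
    N = X.shape[0]
    # pairwise squared Euclidean distances between all samples
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D, np.inf)              # a sample is not its own neighbour
    nn = np.argsort(D, axis=1)[:, :k]        # indices of the k nearest neighbours
    U = np.zeros((classes.size, N))
    for j in range(N):
        for i, c in enumerate(classes):
            n_ij = np.sum(y[nn[j]] == c)     # neighbours of sample j in class c
            U[i, j] = 0.49 * n_ij / k
            if y[j] == c:
                U[i, j] += 0.51              # bonus for the sample's own label
    return U

# Two well-separated toy classes
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])
U = fknn_membership(X, y, k=2)
```

Each column of `U` sums to one by construction, since the neighbor counts of a sample add up to $k$. The dependence of the result on the choice of $k$ is the sensitivity noted in the text.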

Fuzzy Maximum Margin Criterion (FMMC)
Based on the review in Section 2, several problems merit our attention. In the LLD method, the weight is a Gaussian function, whose important property is smoothing and denoising, but it is unclear how to assign the parameter $t$. In particular, the weight cannot provide further classification information when the training samples overlap, whereas fuzzy theory deals better with this problem. The FIFDA method defines the fuzzy membership degree through the adjacency relations between training samples, but there are some artificial factors in this definition. Moreover, the fuzzy between-class scatter matrix in FIFDA does not tightly integrate the fuzzy membership degree with the samples from which the membership degree is calculated. In this paper, in order to avoid the uncertainty of the fuzzy membership in FIFDA, we employ the traditional Euclidean distance based fuzzy membership and redefine the fuzzy between-class scatter matrix and the fuzzy within-class scatter matrix. Finally, we give a succinct way of computing the new fuzzy between-class scatter matrix.
Let $u_{ij}$ denote the Euclidean distance based fuzzy membership degree of the $j$th training sample to the $i$th class. The corresponding fuzzy within-class scatter matrix $S_{fw}$ is defined as in (23). Let $U_i = \mathrm{diag}\{u_{i1}, u_{i2}, \ldots, u_{iN_i}\}$, and let $1_{N_i}$ denote an all-one row vector of length $N_i$. The indicator matrix $E_i$ satisfies $X E_i = X_i$, where $X_i$ is a matrix with the samples in class $i$ as columns. With these quantities, the fuzzy between-class scatter matrix $S_{fb}$ is defined as in (25). In order to reduce the computational complexity, this fuzzy between-class scatter matrix can be further simplified. Let $U = \{u_{ij}\}_{c \times N}$, let $M = [m_1, m_2, \ldots, m_c]$, and let $D = \mathrm{diag}\{\sqrt{N_1/N}, \sqrt{N_2/N}, \ldots, \sqrt{N_c/N}\}$; we then obtain the matrix form (28). Therefore, the fuzzy maximum margin problem can be translated into the optimization problem

$$\max_{W} \sum_{k=1}^{d} w_k^T (S_{fb} - S_{fw}) w_k \quad \text{subject to } w_k^T w_k = 1, \; k = 1, \ldots, d.$$

We obtain the transformational operator from the eigendecomposition of $S_{fb} - S_{fw}$. That is, we first eigendecompose $S_{fb} - S_{fw}$ and then sort the eigenvalues in descending order, denoted by $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$, with corresponding eigenvectors $w_1, w_2, \ldots, w_n$. Finally, the transformational operator consists of the first $d$ eigenvectors, according to the dimensionality reduction requirement.
In the above derivation, the standard between-class scatter matrix is expanded in terms of the difference $x_j - m_i$ between a training sample and its subclass mean, which makes it convenient to embed the fuzzy membership degree into the between-class scatter matrix. Since the fuzzy membership degree $u_{ij}$ reflects the affiliation of sample $x_j$ to subclass $i$, the most efficient way is to let $u_{ij}$ act directly on $x_j - m_i$, so that the membership constrains the sample deviation. We regard this as an advantage over other fuzzy scatter matrix definitions. The later experiments demonstrate the flexibility of the definition.
In calculating the fuzzy between-class scatter matrix, the triple accumulation is transformed into a succinct matrix operation, which can effectively exploit the matrix computation strengths of MATLAB. In particular, (28) shows that the fuzzy between-class scatter matrix can be computed as the product of a matrix and its transpose. This effectively prevents the fuzzy between-class scatter matrix from becoming an inexactly symmetric real matrix because of machine precision and computational errors, whereas a scatter matrix obtained by summation is usually only inexactly symmetric. This phenomenon emerges more easily when the number of training samples is very large. Exact real symmetry, however, is very important to ensure that the eigenvalues obtained in the subsequent eigendecomposition are real. Indeed, in experiments on the AR face database, we found that some eigenvalues were complex when the fuzzy between-class scatter matrix was calculated directly according to (25). Such a situation is undesirable in the feature extraction process. Therefore, our approach offers a concise and reliable method of computing the fuzzy between-class scatter matrix for other researchers who wish to embed a fuzzy factor (or other weights) into the scatter matrix.
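The numerical point made here can be illustrated with a generic weighted scatter matrix. The sketch below is not formula (28) itself: the deviation matrix `A` and the weights `w` are hypothetical stand-ins, and the example only shows that the factorized form $HH^T$ agrees with the accumulated sum and leads to a real spectrum when passed to a symmetric eigensolver.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 6, 40
A = rng.normal(size=(n, N))          # columns stand in for sample deviations
w = rng.uniform(0.1, 1.0, size=N)    # hypothetical non-negative weights

# Accumulated form: sum_j w_j a_j a_j^T, one outer product per sample
S_sum = np.zeros((n, n))
for j in range(N):
    S_sum += w[j] * np.outer(A[:, j], A[:, j])

# Factorized form: H H^T with H = A diag(sqrt(w))
H = A * np.sqrt(w)                   # scales column j by sqrt(w_j)
S_fac = H @ H.T
S_fac = 0.5 * (S_fac + S_fac.T)      # guard against round-off asymmetry

# A symmetric solver returns real eigenvalues in ascending order
vals = np.linalg.eigh(S_fac)[0]
```

Using a symmetric eigensolver on an explicitly symmetrized matrix is one simple way to rule out complex eigenvalues; a general-purpose solver applied to a slightly asymmetric matrix offers no such guarantee.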

Fuzzy Kernel Maximum Margin Criterion Based Algorithm for Feature Extraction
In this section, we present a novel fuzzy kernel maximum margin criterion (FKMMC) for feature extraction, which combines KPCA and FMMC. In the KPCA step, we obtain the transformational operator $W_{KPCA}$ by retaining the eigenvectors that account for 98% of the eigenvalue sum. The other eigenvectors are discarded for the purpose of denoising. The algorithm can be described as follows.
Step 1. Compute the standard deviation of the training samples and denote it by $\sigma$.
Step 4. Project the original samples into the kernel subspace with the KPCA transformational operator; for convenience, the projected samples are still denoted by $X$.
Step 6. Compute the fuzzy within-class scatter matrix $S_{fw}$ and the fuzzy between-class scatter matrix $S_{fb}$ according to (23) and (28).
From the above algorithm we can see that it involves only matrix products, matrix transposition, diagonalization, and eigendecomposition. The matrices to be eigendecomposed are all real symmetric and therefore, by matrix theory, have real eigenvalues. Thus, all of them can be eigendecomposed, and the algorithm does not need to calculate any inverse matrix, so the whole computational process is feasible. In Algorithm 1, $W_{KPCA}$ is an $N \times r$ matrix and $W_{FMMC}$ is an $r \times d$ matrix, so the output of Algorithm 1, $W_{FKMMC}$, is an $N \times d$ matrix. In Algorithm 2, $K_x$ is a $1 \times N$ vector, and its output is a $d \times 1$ vector. In image recognition, $d \ll N \ll n$ commonly holds, where $n$ is the sample dimension. Therefore, the proposed algorithm reduces the data dimension efficiently. Since the kernel map makes the data easier to separate in the kernel space, FMMC can provide more classification information there. The proposed algorithm thus retains more classification information while the data dimension is reduced.
On the other hand, the computational complexity of the sample standard deviation is $O(N \times n)$, and that of the kernel matrix $K$ is $O(N^2 \times n)$. The computational complexity of the eigendecomposition of the kernel matrix $K$ is $O(N^2)$. The computational complexity of both $S_{fw}$ and $S_{fb}$ is $O(r^3)$. Since $r < N \ll n$, the computational complexity of the proposed algorithm is $O(N^2 \times n)$, that is, the computational complexity of computing the kernel matrix.
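The KPCA stage of the algorithm can be sketched as follows. Several details are assumptions made for the example and are not stated in the text: the bandwidth is taken as the standard deviation over all entries of the training matrix, the Gaussian kernel is written as $\exp(-\|x - y\|^2 / (2\sigma^2))$, and the kernel matrix is centered before eigendecomposition. The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sqA = np.sum(A**2, axis=1)[:, None]
    sqB = np.sum(B**2, axis=1)[None, :]
    D = np.maximum(sqA + sqB - 2.0 * A @ B.T, 0.0)
    return np.exp(-D / (2.0 * sigma**2))

def kpca_fit(X, energy=0.98):
    """Return (sigma, W, Kc); W is N x r and keeps `energy` of the eigenvalue sum."""
    N = X.shape[0]
    sigma = np.std(X)                         # assumption: std over all entries
    K = gaussian_kernel(X, X, sigma)
    J = np.eye(N) - np.ones((N, N)) / N
    Kc = J @ K @ J                            # centre the kernel matrix
    Kc = 0.5 * (Kc + Kc.T)                    # exact symmetry for eigh
    vals, vecs = np.linalg.eigh(Kc)
    vals, vecs = vals[::-1], vecs[:, ::-1]    # descending order
    vals = np.clip(vals, 0.0, None)           # remove round-off negatives
    ratio = np.cumsum(vals) / np.sum(vals)
    r = int(np.searchsorted(ratio, energy) + 1)   # smallest r reaching the ratio
    W = vecs[:, :r] / np.sqrt(vals[:r])       # unit-norm axes in feature space
    return sigma, W, Kc

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 10))                 # 30 synthetic samples of dimension 10
sigma, W, Kc = kpca_fit(X)
Y = Kc @ W                                    # N x r projected training samples
```

The dominant cost in this sketch is forming the $N \times N$ kernel matrix from $n$-dimensional samples, in line with the complexity discussion above. The fuzzy scatter matrices would then be built from the rows of `Y` rather than from the original images.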

Experiments on Georgia Tech Face Database.
The face image database used in our experiments is the Georgia Tech Face Database (GTFD) [21,22], which consists of 50 subjects with 15 face images available for each subject. These face images vary in size, facial expression, illumination, and rotation both in the image plane and perpendicular to the image plane. In our experiments, all images in the database were manually cropped and resized to 60 × 40. After cropping, most of the complex background was excluded. In-plane rotation was also partially eliminated, but out-of-plane rotation was left untouched. The images were further converted to gray level images for both training and testing purposes.
In our first experiment, we choose the first $l$ samples per individual for training and the remaining samples for testing, with $l = 2, 3, 4, 5, 6$, respectively. For each $l$, KPCA, KLDA, LLD, 2DLDA, LPP, FIFDA, TSA, LBMMC, 2DMMC, B2D-MMC, and the proposed FKMMC are used for feature extraction. In the PCA stage of LLD and FIFDA, the eigenvectors selected as the transformational operator keep nearly 99% of the image energy. In FIFDA, we let $p = 2$, and the FKNN parameter $k$ is set as $k = l - 1$. In the LBMMC algorithm, the parameter $t$ of the similarity is set as $t = 100$. In the LPP and TSA algorithms, $t$ is set to its default value. In the 2DLDA, LPP, TSA, and LBMMC algorithms, the selected eigenvectors (projection vectors) are of full rank. In KPCA, KLDA, LLD, and the proposed FKMMC, the number of selected eigenvectors (projection vectors) is 20% of the total number of training samples. In the TSA algorithm, the number of iterations is taken to be 10. In the B2D-MMC algorithm, the number of layers is set as 6. Finally, a nearest neighbor classifier with Euclidean distance is employed. The final results are given in Figure 1. From Figure 1, we can see that the proposed method achieves the best recognition rate. Although the results of TSA and 2DLDA are close to that of the proposed algorithm, TSA needs 20 eigendecompositions for 10 recursions, and 2DLDA needs to calculate the Moore-Penrose pseudoinverse of a matrix. The proposed algorithm needs neither a matrix inverse nor recursion, and moreover, its stability and true recognition rate are also higher than those of 2DLDA and TSA.
In the second experiment, we randomly choose $l = 2, 3, 4, 5$ samples from every individual for training, while the remaining samples are used for testing. The settings of the first experiment are retained. The test results are reported in Table 1, which lists the average recognition rates across 20 runs of each algorithm under the nearest neighbor classifier with the Euclidean distance metric and the corresponding standard deviations (std). Table 1 shows that the result of our method is a little better than those of TSA and 2DLDA and much better than those of the other methods. The small std shows that our method is more stable. This result further supports the conclusion of the first experiment.
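The evaluation protocol of the second experiment (random per-class splits, a nearest neighbor classifier with Euclidean distance, and mean and std over 20 runs) can be sketched as follows. The function names and the synthetic data are hypothetical, and the feature extraction step is omitted, so the classifier works on the raw features; in the actual experiments it would be applied to the extracted features.

```python
import numpy as np

def nearest_neighbor_predict(train_X, train_y, test_X):
    """Label each test row with the label of its nearest training row."""
    d = (np.sum(test_X**2, axis=1)[:, None]
         + np.sum(train_X**2, axis=1)[None, :]
         - 2.0 * test_X @ train_X.T)          # squared Euclidean distances
    return train_y[np.argmin(d, axis=1)]

def random_split(y, l, rng):
    """Pick l random training indices per class; the rest are test indices."""
    train_idx = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        train_idx.extend(rng.choice(idx, size=l, replace=False))
    train_idx = np.array(sorted(train_idx))
    test_idx = np.setdiff1d(np.arange(y.size), train_idx)
    return train_idx, test_idx

def evaluate(X, y, l, runs=20, seed=0):
    """Mean and std of the recognition rate over `runs` random splits."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(runs):
        tr, te = random_split(y, l, rng)
        pred = nearest_neighbor_predict(X[tr], y[tr], X[te])
        rates.append(np.mean(pred == y[te]))
    return float(np.mean(rates)), float(np.std(rates))

# Synthetic data: 5 well-separated classes with 15 samples each
rng = np.random.default_rng(0)
centers = 10.0 * np.arange(5)[:, None] * np.ones((1, 8))
X = np.vstack([c + rng.normal(size=(15, 8)) for c in centers])
y = np.repeat(np.arange(5), 15)
mean_rate, std_rate = evaluate(X, y, l=3)
```

Reporting the std alongside the mean, as in Table 1, makes it possible to compare the stability of methods whose average rates are close.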

Experiments on Extended Yale B Database.
The extended Yale face database B contains 2535 images of 39 human subjects (each person providing 65 different images) under various poses and illumination conditions. In our experiment, we use its cropped version, which was prepared by Lee et al. [23]. All images were resized to 60 × 40.
In the experiment, we randomly choose $l = 2, 3, 4, 5$ samples from every individual for training and use the remaining images for testing. KPCA, KLDA, LLD, 2DLDA, LPP, FIFDA, TSA, LBMMC, 2DMMC, B2D-MMC, and the proposed FKMMC are used for feature extraction. In the PCA stage of LLD and FIFDA, the eigenvectors selected as the transformational operator keep nearly 99% of the image energy. In the LBMMC algorithm, the parameter $t$ of the similarity is set as $t = 100$, and in the LPP and TSA algorithms, $t$ is set to its default value. In the 2DLDA, LPP, TSA, and LBMMC algorithms, the selected eigenvectors (projection vectors) are of full rank. In KPCA, KLDA, LLD, and the proposed FKMMC, the number of selected eigenvectors (projection vectors) is 15% of the total number of training samples. In the TSA algorithm, the number of iterations is taken to be 10. In the B2D-MMC algorithm, the number of layers is set as 6. Finally, a nearest neighbor classifier with Euclidean distance is employed. The final results are given in Table 2 and Figure 2. As can be seen, the proposed method has the best recognition rate and performs better than the other algorithms on the extended Yale B database. The results of 2DLDA and KLDA are the closest to that of our algorithm, but 2DLDA needs to compute the Moore-Penrose pseudoinverse of a matrix, which costs more computation time than matrix multiplication. At the same time, TSA does not perform well on the extended Yale B database although it performs well on the Georgia Tech face database, so our algorithm is more stable than TSA.

Experiments on AR Database and FERET Face Database.
The AR face database [24] was created by Aleix Martinez and Robert Benavente in the Computer Vision Center (CVC) at the U.A.B. It contains over 3300 color images corresponding to 126 people's faces (70 men and 56 women). The images feature frontal view faces with different facial expressions, illumination conditions, and occlusions (sunglasses and scarves). The pictures were taken at the CVC under strictly controlled conditions. No restrictions on wear (clothes, glasses, etc.), makeup, hair style, and so forth were imposed on participants. Each person participated in two sessions, separated by two weeks (14 days). The same pictures were taken in both sessions. In our experiments, each image was manually cropped and resized to 60 × 40.
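The FERET face database [25] is from the FERET Program sponsored by the US Department of Defense's Counterdrug Technology Development Program through the Defense Advanced Research Projects Agency (DARPA), and it has become the de facto standard for evaluating state-of-the-art face recognition algorithms. The whole database contains 13,539 face images of 1565 subjects taken during different photo sessions with variations in size, pose, illumination, facial expression, and even age. The subset we use in our experiments includes 200 subjects, each with four different images. All images were obtained by cropping.
We repeat the second experiment in Section 5.2 and randomly choose $l = 2, 3, 4, 5$ samples from every individual for training on the AR face database and three samples from every individual for training on the FERET face database. The results are reported in Tables 3 and 4, respectively, and are shown in Figures 3 and 4. On the AR face database, the proposed algorithm has no obvious advantage over KLDA and TSA in true recognition rate, but it has a lower std. This suggests that our algorithm is more likely to achieve a high recognition rate when the number of testing samples is larger. On the FERET face database, our algorithm is evidently superior to the other algorithms in both mean true recognition rate and std.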

Friedman Test and Nemenyi Test.
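In order to compare the proposed method with the related recognition methods, we use the Friedman test and the Nemenyi test [26,27]. The Friedman test is a nonparametric equivalent of the repeated-measures ANOVA [27]. It ranks the algorithms for each data set separately: the best performing algorithm gets rank 1, the second best gets rank 2, and so on, as shown in Table 5. Let $r_i^j$ be the rank of the $j$th of $k$ algorithms on the $i$th of $N$ data sets. The Friedman test compares the average ranks of the algorithms, $R_j = (1/N)\sum_i r_i^j$. Under the null hypothesis, which states that all the algorithms are equivalent and so their ranks $R_j$ should be equal, the Friedman statistic

$$\chi_F^2 = \frac{12N}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]$$

is distributed according to $\chi_F^2$ with $k - 1$ degrees of freedom when $N$ and $k$ are big enough (as a rule of thumb, $N > 10$ and $k > 5$), and a derived statistic is

$$F_F = \frac{(N-1)\chi_F^2}{N(k-1) - \chi_F^2},$$

which is distributed according to the $F$ distribution with $k - 1$ and $(k-1)(N-1)$ degrees of freedom.
In our experiments, the ranks of each method and the average ranks are listed in Table 5. The number of methods is $k = 11$ and the number of experiments is $N = 16$ in this paper, so $F_F$ is distributed according to the $F$ distribution with $11 - 1 = 10$ and $(11 - 1) \times (16 - 1) = 150$ degrees of freedom. The critical value of $F(10, 150)$ for $\alpha = 0.01$ is 2.4412. Since $F_F$ is 16.2241, the Friedman test rejects the null hypothesis. In order to check whether the performance of two methods is significantly different, we use the Nemenyi test, in which two methods differ significantly if their average ranks differ by at least the critical difference $\mathrm{CD} = q_\alpha\sqrt{k(k+1)/(6N)}$. We use $\alpha = 0.1$ and get $q_{0.1} = 2.978$ for comparisons among eleven methods. The CD is 3.492. The Nemenyi test is shown in Figure 5. In the figure, the mean rank of each method is denoted by a circle. The horizontal bar across the circle indicates the "critical difference." Two methods are significantly different if their bars do not overlap in the horizontal direction; otherwise, the two methods are similar in rank. For the recognition results, the proposed method always ranks first among the competitors. There is no significant difference between FKMMC and TSA, although their bars overlap only by half. The proposed method presents a significant advantage over all other methods except TSA. In particular, the proposed method is an improvement over LBMMC, and the bars of the two methods do not overlap in the horizontal direction. This shows that our improvement is meaningful.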
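The statistics used in this subsection can be computed with a few lines of code. The sketch below follows the standard formulas for the Friedman statistic, its derived $F$ statistic, and the Nemenyi critical difference; the function names are hypothetical, and the only numbers taken from the text are $k = 11$, $N = 16$, and $q_{0.1} = 2.978$.

```python
import math

def friedman_statistics(avg_ranks, n_datasets):
    """Return (chi2_F, F_F, degrees of freedom of F_F) from average ranks."""
    k = len(avg_ranks)
    N = n_datasets
    chi2 = 12.0 * N / (k * (k + 1)) * (
        sum(R * R for R in avg_ranks) - k * (k + 1) ** 2 / 4.0
    )
    F = (N - 1) * chi2 / (N * (k - 1) - chi2)
    return chi2, F, (k - 1, (k - 1) * (N - 1))

def nemenyi_cd(q_alpha, k, N):
    """Critical difference of the Nemenyi test for k methods on N data sets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * N))

# Values quoted in the text: eleven methods, sixteen experiments, q_0.1 = 2.978
cd = nemenyi_cd(2.978, 11, 16)

# Sanity check with identical average ranks: the statistic vanishes
chi2_null, F_null, dof = friedman_statistics([6.0] * 11, 16)
```

With these inputs the critical difference evaluates to approximately 3.492, which agrees with the value reported in the text, and the degrees of freedom are (10, 150).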
From the above, we can see that the proposed method performs better than its competitors. In the proposed method, the kernel technique is used to enhance the separability of the sample set, and fuzzy set theory is used to reduce the sensitivity to substantial variations between face images caused by varying illumination, viewing conditions, and facial expression, since the fuzzy membership degree reflects the relation between a training sample and a class center. Using these two techniques, the proposed method FKMMC markedly improves on the original method LBMMC in both true recognition rate and training time. Although Table 6 shows that the proposed method costs more running time than LPP, 2DLDA, LLD, 2D-MMC, and B2D-MMC, its average rank has a significant advantage over those of these methods. Among the kernel approaches, the average training time and test time of the proposed method are lower than those of KPCA and KLDA because our method adopts a new way of calculating the fuzzy kernel scatter matrix.

Conclusion
In pattern recognition, feature extraction techniques are widely employed to reduce the dimensionality of data and to enhance the discriminatory information. In this paper, a fuzzy kernel maximum margin criterion method is proposed. The proposed method efficiently combines the advantages of the kernel method and the maximum margin criterion and redefines the fuzzy between-class scatter matrix. The new fuzzy scatter matrix fully reflects the relation between the fuzzy membership degree and the offset of a training sample from its subclass center. The new method effectively extracts the most discriminatory information while achieving dimensionality reduction, and it does not suffer from the small size sample problem. The final transformational operator is an $N \times d$ matrix. In image recognition, the number of training samples $N$ is far smaller than the sample dimension $n$. Therefore, the proposed method is faster than the non-kernel method LBMMC if the time cost of computing the kernel projection is not considered. The experimental results show that the proposed method is effective and robust.
In particular, the definition of the fuzzy between-class scatter matrix offers a concise and reliable computational method for other researchers who wish to embed a fuzzy factor (or other weights) into the scatter matrix.

Notations
KPCA: Kernel principal component analysis
MMC: Maximum margin criterion
LLD: Linear Laplacian discrimination
FIFDA: Fuzzy inverse Fisher discriminant analysis
FKNN: Fuzzy k-nearest neighbor
LBMMC: Laplacian bidirectional maximum margin criterion
SSS: Small size sample
KECA: Kernel entropy principal component analysis

Table 1: Average recognition rates and standard deviations on the GTFD face database for $l = 2, 3, 4, 5$ training samples per class.

Table 2: Average recognition rates and standard deviations on the extended Yale B face database for $l = 2, 3, 4, 5$ training samples per class.


Table 3: Average recognition rates and standard deviations on the AR face database for $l = 2, 3, 4, 5$ training samples per class.

Table 4: Average recognition rates and standard deviations on the FERET face database for $l = 2, 3, 4, 5$ training samples per class.