A Unified Factors Analysis Framework for Discriminative Feature Extraction and Object Recognition

1School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China 2School of International College, Huanghuai University, Zhumadian, Henan 463000, China 3School of Computer Science and Technology, Hubei University of Science and Technology, Xianning 437100, China 4National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China


Introduction
Feature extraction and selection are both critical and challenging components of pattern recognition [1,2]. Some basic principles can guide the design of a feature extractor; however, the design work is essentially an experimental science that depends on the field and the problem being addressed. The use of dimensionality reduction algorithms in supervised and unsupervised learning tasks has attracted a lot of attention in recent years. The relative simplicity and effectiveness of the linear algorithms Principal Component Analysis (PCA) [3] and Linear Discriminant Analysis (LDA) [3,4] have made them very attractive. Locality Preserving Projections (LPP) [5] and Discriminant Canonical Correlations (DCC) [6] are other linear algorithms that have been proposed for dimensionality reduction. Three other algorithms, ISOMAP [7], LLE [8], and Laplacian Eigenmap [9], can be used for nonlinear dimensionality reduction on a dataset that lies on or around a lower-dimensional manifold. Additionally, in order to extend linear dimensionality reduction algorithms to nonlinear ones, the kernel trick [10] has been applied to perform linear operations on higher- or infinite-dimensional features obtained through a kernel mapping function. Moreover, a number of algorithms [11][12][13][14] have recently been proposed to carry out dimensionality reduction on objects encoded as matrices or tensors of arbitrary order.
All the aforementioned algorithms have been designed according to specific intuitions. Solutions are obtained by optimizing intuitive and pragmatic objectives, with the aim of extracting low-dimensional, simplified, and robust features from high-dimensional, redundant, disturbed, and distorted original data. Content and style factors can be seen as the two components of an object [15]. In pattern recognition, the information used as the basis of recognition is defined as the content factor, and the information that disturbs recognition is defined as the style factor.
Two examples are given below. The first is face recognition. A frontal face with neutral expression, no occlusion, and normalized illumination represents the content factor of a face image, whereas variations of pose, illumination, and expression represent the style factor, as shown in Figure 1. Because content and style factors interfere with each other, they must be separated in order to achieve accurate classification. The second is speech recognition. When recognizing the text of speech, the style factor includes the timbre, tone, and emotion of the speaker. These elements should be regarded as background noise and minimized, as they provide no useful information about the speech text. For speaker recognition, however, the situation is reversed: the useful element is the timbre information, which is defined as the content factor, and the influence of the speech text, which is now the style factor, should be minimized. Thus the primary objective of linear, nonlinear, manifold, or tensor decomposition methods is to reduce the style factors of the object.
In this paper, we present a general framework called factors analysis, along with its linearization, kernelization, and tensorization, which offers a unified view for understanding and explaining many popular dimensionality reduction algorithms such as the ones mentioned above. The main objective of factor analysis is to find the essential low-dimensional features by simultaneously minimizing the style factors and maximizing the content factors of the object through projection. There are two vital steps in the factor analysis framework. The first is to construct a factor separating objective function, which includes the construction of a partition and a weight matrix. The second is to design the space mapping function. Using the factor analysis framework as a platform, we develop two novel dimensionality reduction algorithms, FA-LDA and FA-LPP. FA-LDA addresses a limitation of LDA, namely that its intraclass and interclass scatter matrices are not optimal, by setting proper weights. FA-LPP addresses a limitation of LPP by improving the design of the partition, so that the projected objects not only preserve local similarity but also carry abundant discriminative information.

Factors Analysis: A General Framework for Feature Extraction
2.1. Factors Analysis. Content and style factors can be seen as two independent elements which influence the object and determine the observation [15], as mentioned before. In pattern recognition, the information that contributes to recognition is defined as the content factor and the information that hinders it is the style factor. For a general classification problem, the sample set is represented as $X = [x_1, x_2, \ldots, x_N]$, $x_i \in \mathbb{R}^{m}$, where $N$ denotes the sample number and $m$ denotes the feature dimension. In practice, the feature dimension $m$ is usually very high and includes redundant and interfering information. In order to alleviate the curse of dimensionality, it is necessary to transform the original data $X$ from a high-dimensional space to a low-dimensional space, represented as $Y$. The essence of dimensionality reduction is to find a mapping function $F: x \to \hat{y}$ that transforms $x \in \mathbb{R}^{m}$ into an appropriate low-dimensional space $\hat{y} \in \mathbb{R}^{m'}$, where typically $m \gg m'$:
$$\hat{y} = F(x). \tag{1}$$
Depending on the case, the mapping function can be explicit or implicit and linear or nonlinear. A dimensionality reduction algorithm based on factor analysis is proposed in this paper, which differs from traditional methods. Its objective is to find a low-dimensional dataset $Y$ from the original dataset $X$ in which the style factor is suppressed as much as possible. The objective function is represented as follows:
$$\hat{y} = \arg\min_{\operatorname{tr}(S_c^{y}) = d} \operatorname{tr}(S_s^{y}), \quad S_c^{y} = \sum_{i=1}^{C-1} \sum_{j=i+1}^{C} p_i p_j W_c^{ij} (\mu_i^{y} - \mu_j^{y})(\mu_i^{y} - \mu_j^{y})^{T}, \quad S_s^{y} = \sum_{i=1}^{C} \frac{p_i}{n_i^{2}} \sum_{k=1}^{n_i} \sum_{l=1}^{n_i} W_s^{kl} (y_k^{(i)} - y_l^{(i)})(y_k^{(i)} - y_l^{(i)})^{T}, \tag{2}$$
where $S_c^{y}$, $S_s^{y}$ are the content and style variance matrices after mapping (factors separating), respectively; $C$ is the total number of partitions; $n_i$ is the sample number of the $i$th partition; $p_i$ is the prior probability of the $i$th partition (the $i$th class for supervised methods and the $i$th local manifold space for unsupervised methods, both generally referred to as the $i$th partition); $\mu_i^{y}$ is the mean vector of the $i$th partition after factors separating; $y_k^{(i)}$, $y_l^{(i)}$ are the $k$th and $l$th sample vectors of the $i$th partition after factors separating; $W_c$, $W_s$ are the weight matrices of the content variance and the style variance, respectively; and $d$ is the constraint constant, which differs from case to case. The factor analysis criterion makes the content variance of the objects between partitions as high as possible and the style variance of the objects within a partition as low as possible. In other words, to eliminate the style factor of the objects as far as possible, the style factor should be minimized and the content factor maximized after the mapping transformation.
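The content and style variance matrices described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the function name `factor_scatter`, the pairwise form of both sums, the $1/n_i^2$ normalization of the style term, and the use of empirical class frequencies as priors are all assumptions made for the example.

```python
import numpy as np

def factor_scatter(X, labels, W_c=None, W_s=None):
    """Content/style variance matrices for samples stored as columns of X.

    X      : (dim, N) data matrix
    labels : (N,) partition index of every sample
    W_c    : (C, C) weights for partition pairs (assumed default: all ones)
    W_s    : (N, N) weights for sample pairs    (assumed default: all ones)
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dim, N = X.shape
    members = [np.flatnonzero(labels == c) for c in np.unique(labels)]
    C = len(members)
    W_c = np.ones((C, C)) if W_c is None else W_c
    W_s = np.ones((N, N)) if W_s is None else W_s
    priors = np.array([len(idx) / N for idx in members])  # empirical priors
    means = np.stack([X[:, idx].mean(axis=1) for idx in members], axis=1)

    # Content variance: weighted differences between partition means.
    S_c = np.zeros((dim, dim))
    for i in range(C - 1):
        for j in range(i + 1, C):
            diff = (means[:, i] - means[:, j])[:, None]
            S_c += priors[i] * priors[j] * W_c[i, j] * (diff @ diff.T)

    # Style variance: weighted differences between samples of one partition.
    # sum_kl W_kl (x_k - x_l)(x_k - x_l)^T = X (D_row + D_col - W - W^T) X^T
    S_s = np.zeros((dim, dim))
    for p_i, idx in zip(priors, members):
        Xi = X[:, idx]
        Wi = W_s[np.ix_(idx, idx)]
        G = np.diag(Wi.sum(axis=1) + Wi.sum(axis=0)) - Wi - Wi.T
        S_s += p_i / len(idx) ** 2 * (Xi @ G @ Xi.T)
    return S_c, S_s
```

With all weights equal to 1, `S_c` coincides with the usual between-class scatter and `S_s` with twice the within-class scatter, which is a convenient sanity check for any alternative weighting.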

General Framework for Feature Extraction.
There are two vital steps in feature extraction based on the factor analysis framework. One is the design of the factor separating objective function, which mainly comprises the design of the partition and the weight matrix; the other is the design of the space mapping function. The partition is designed from the class information of the samples in supervised methods, or from their local neighborhoods in unsupervised and semisupervised methods. The weight matrix is designed according to classification performance, with the purpose of finding the optimal distribution scheme: one that favors weakly separated samples without sacrificing strongly separated ones. The space mapping function transforms the samples from the input space to the feature space. Thus, research on the factor analysis framework can proceed in two directions: the first is to propose different space mapping transformations for different applications, and the second is to find the design of the factor separating objective function that best matches the pattern recognition task.

Space Mapping Transformation.
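There are three categories of space mapping transformation algorithms: linear space, kernel space, and tensor space mapping transformation.
(a) Linear Space Mapping Transformation. The relationship between the datasets $X$ and $Y$ is assumed to be linear; namely, $y_i = U^{T} x_i$. Thus the objective function (see (2)) is converted to
$$U^{*} = \arg\min_{\operatorname{tr}(U^{T} S_c U) = d} \operatorname{tr}(U^{T} S_s U), \tag{3}$$
where $S_c$, $S_s$ are the content and style variance matrices in the original space, defined as follows:
$$S_c = \sum_{i=1}^{C-1} \sum_{j=i+1}^{C} p_i p_j W_c^{ij} (\mu_i^{0} - \mu_j^{0})(\mu_i^{0} - \mu_j^{0})^{T}, \quad S_s = \sum_{i=1}^{C} \frac{p_i}{n_i^{2}} \sum_{k=1}^{n_i} \sum_{l=1}^{n_i} W_s^{kl} (x_k^{(i)} - x_l^{(i)})(x_k^{(i)} - x_l^{(i)})^{T}. \tag{4}$$
Here $\mu_i^{0}$ is the mean vector of the $i$th partition in the original space, and $x_k^{(i)}$, $x_l^{(i)}$ are the $k$th and $l$th sample vectors of the $i$th partition in the original space.
(b) Kernel Space Mapping Transformation. The direct way to extend methods from linear projections to nonlinear cases is to use the kernel trick. The intuition of kernel methods is to map the data from the original input space to a higher-dimensional Hilbert space (kernel space) as $\phi: x \to \mathcal{F}$. The content and style variance matrices $S_c^{\phi}$, $S_s^{\phi}$ in the kernel space $\mathcal{F}$ are defined as follows:
$$S_c^{\phi} = \sum_{i=1}^{C-1} \sum_{j=i+1}^{C} p_i p_j W_c^{ij} (\mu_i^{\phi} - \mu_j^{\phi})(\mu_i^{\phi} - \mu_j^{\phi})^{T}, \quad S_s^{\phi} = \sum_{i=1}^{C} \frac{p_i}{n_i^{2}} \sum_{k=1}^{n_i} \sum_{l=1}^{n_i} W_s^{kl} \bigl(\phi(x_k^{(i)}) - \phi(x_l^{(i)})\bigr)\bigl(\phi(x_k^{(i)}) - \phi(x_l^{(i)})\bigr)^{T},$$
where $\mu_i^{\phi} = \frac{1}{n_i} \sum_{k=1}^{n_i} \phi(x_k^{(i)})$ is the mean vector of the $i$th partition samples in the kernel space $\mathcal{F}$ and $\phi(x_k^{(i)})$ is the kernel mapping of $x_k^{(i)}$.
(c) Tensor Space Mapping Transformation. Tensorization is another extension of linear methods; it is a multilinear extension. Some terminology [16] on tensor operations is reviewed first, before the tensor space mapping transformation is introduced.
A tensor is the extension of a vector and a matrix. A tensor $\mathcal{A} \in \mathbb{R}^{m_1 \times m_2 \times \cdots \times m_n}$ is an array whose components are functions of the coordinates and change linearly with the coordinates following certain rules; $n$ is the order of the tensor. When $n = 1$, $\mathcal{A}$ reduces to a vector, and when $n = 2$, $\mathcal{A}$ reduces to a matrix. Unfolding a tensor gives its matrix representation. The $k$-mode unfolding of the tensor $\mathcal{A}$ is $A_{(k)} \in \mathbb{R}^{m_k \times (m_{k+1} \cdots m_n m_1 m_2 \cdots m_{k-1})}$. The $k$-mode product between a tensor $\mathcal{A}$ and a matrix $U \in \mathbb{R}^{m_k' \times m_k}$ is defined as $\mathcal{B} = \mathcal{A} \times_k U$, or $B_{(k)} = U A_{(k)}$ in unfolded form.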
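The unfolding and $k$-mode product just described can be illustrated with a short NumPy sketch. The helper names `unfold` and `mode_product` are hypothetical, and the column ordering of the unfolding follows NumPy's axis order rather than the cyclic order written above; the relation $B_{(k)} = U A_{(k)}$ holds for any ordering as long as it is used consistently.

```python
import numpy as np

def unfold(A, k):
    """Mode-k unfolding: mode k indexes the rows, all other modes are flattened."""
    return np.moveaxis(A, k, 0).reshape(A.shape[k], -1)

def mode_product(A, U, k):
    """k-mode product B = A x_k U, with U of shape (new_size, A.shape[k])."""
    # tensordot contracts U's columns with mode k of A and puts the new mode first.
    B = np.tensordot(U, A, axes=(1, k))
    # Move the new mode back to position k.
    return np.moveaxis(B, 0, k)

# Small example: a third-order tensor projected along its second mode.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4, 5))
U = rng.normal(size=(2, 4))
B = mode_product(A, U, 1)   # shape (3, 2, 5)
```

In this example mode 1 shrinks from size 4 to size 2 while the other modes are unchanged, which is the per-mode operation a tensor space mapping applies.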

The Design of Factor Separating Objective Function.
The design of the factor separating objective function is mainly about separating the content and style factors of objects, and it includes the design of the partitions and of the weight matrix.
Two schemes are currently used for partition design. One is based on supervised algorithms, such as LDA, in which the samples are partitioned according to their class information; the other is based on unsupervised algorithms, such as LPP, in which the samples are partitioned according to their $k$-nearest neighborhoods.
Various weighting schemes have been proposed for different cases. According to the weighting algorithm used for the scatter matrix between classes (the content variance matrix $S_c$) in supervised face recognition methods based on weighted Linear Discriminant Analysis [17], they can be classified into four categories, which include the following.
(1) Euclidean Distance Algorithm [17]. The weight is a function of $ed$, the Euclidean distance between the mean vectors of two classes.
(2) Bayesian Error Rate Algorithm [18]. The weight is
$$w(\Delta_{ij}) = \frac{1}{2\Delta_{ij}^{2}} \operatorname{erf}\left(\frac{\Delta_{ij}}{2\sqrt{2}}\right),$$
where $\Delta_{ij}$ is the Mahalanobis distance between the mean vectors of two classes and $\operatorname{erf}(x) = (2/\sqrt{\pi}) \int_{0}^{x} e^{-t^{2}}\,dt$ is the error function.
(3) Confusing Information Algorithm [19]. The weight is derived from the confusion information between classes.
The weighting scheme for the style variance matrix $S_s$ has received little attention so far. Following the weighting scheme of the LPP method, a new weighting scheme for $S_s$ based on heat-kernel similarity is proposed in this paper:
$$W_s^{ij} = \begin{cases} \exp\left(-\|x_i - x_j\|^{2} / t\right), & x_i, x_j \text{ belong to the same partition}, \\ 0, & \text{otherwise}. \end{cases}$$
How to distribute the weights effectively and reasonably deserves further research. The optimal weighting scheme never abandons any object that may be classified correctly and never wastes any weight value. In other words, it helps all the objects that need help as much as possible, so that no object is sacrificed for another.
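Two of the weighting functions discussed above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the Bayesian-error weight is taken in the form $\operatorname{erf}(\Delta / 2\sqrt{2}) / (2\Delta^2)$, and the style weights use the heat kernel $\exp(-\|x_i - x_j\|^2 / t)$ restricted to pairs in the same partition.

```python
import math
import numpy as np

def bayes_error_weight(delta):
    """Weight for a class pair at Mahalanobis distance delta > 0 (assumed form)."""
    return math.erf(delta / (2.0 * math.sqrt(2.0))) / (2.0 * delta ** 2)

def heat_kernel_style_weights(X, labels, t=1.0):
    """Heat-kernel weights between samples that share a partition.

    X      : (dim, N) data matrix, samples as columns
    labels : (N,) partition index of every sample
    t      : heat-kernel width (tuning parameter)
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Squared Euclidean distance between every pair of columns.
    sq_dist = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    same_partition = labels[:, None] == labels[None, :]
    # Pairs from different partitions receive zero weight.
    return np.where(same_partition, np.exp(-sq_dist / t), 0.0)
```

The Bayesian-error weight decreases as the class distance grows, so class pairs that are already far apart contribute less to the content variance than confusable pairs.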

Unifying Various Algorithms of Feature Extraction.
The relationship between several classical subspace dimensionality reduction algorithms and the factor analysis framework is discussed in this section. In fact, most commonly used subspace algorithms can be represented and explained by the factor analysis framework. Once the current subspace algorithms are unified in the framework, it becomes clear that they differ only in how the factors are constructed and in the constraint used for factor separation, even though their motivations and objectives vary. Next we introduce the relationships between factor analysis and some typical subspace-based dimensionality reduction methods.

The Relationship with LDA.
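The objective of LDA is to find the optimal projection direction for classification, which maximizes the ratio of the interclass scatter to the intraclass scatter:
$$U^{*} = \arg\max_{U} \frac{|U^{T} S_b U|}{|U^{T} S_w U|}, \quad S_b = \sum_{i=1}^{C} p_i (\mu_i^{0} - \mu^{0})(\mu_i^{0} - \mu^{0})^{T}, \quad S_w = \sum_{i=1}^{C} \frac{p_i}{n_i} \sum_{k=1}^{n_i} (x_k^{(i)} - \mu_i^{0})(x_k^{(i)} - \mu_i^{0})^{T}, \tag{14}$$
where $C$ is the class number, $p_i$ is the prior probability of class $i$, $\mu_i^{0}$ is the mean vector of class $i$ before projection, $\mu^{0}$ is the overall mean vector before projection, and $n_i$ is the sample number of class $i$. Referring to (3) and (4), set the weight matrices of $S_c$ and $S_s$ to all-ones matrices; then
$$S_c = \sum_{i=1}^{C-1} \sum_{j=i+1}^{C} p_i p_j (\mu_i^{0} - \mu_j^{0})(\mu_i^{0} - \mu_j^{0})^{T}, \quad S_s = \sum_{i=1}^{C} \frac{p_i}{n_i^{2}} \sum_{k=1}^{n_i} \sum_{l=1}^{n_i} (x_k^{(i)} - x_l^{(i)})(x_k^{(i)} - x_l^{(i)})^{T}.$$
After further conversion,
$$S_c = S_b, \quad S_s = 2 S_w. \tag{17}$$
Substituting (17) into (14), and noting that a constant factor does not change the maximizer, gives
$$U^{*} = \arg\max_{U} \frac{|U^{T} S_c U|}{|U^{T} S_s U|}.$$
Thus, LDA is a particular case of the factor analysis framework in which every weight equals 1 and the partition is based on class information. The content factor is the accumulated variance of the class means and the style factor is the variance of the samples within the same class.

The Relationship with MFA [20] and DLA [21].
In order to overcome the assumption of LDA that the samples follow a Gaussian distribution, MFA redefines the intraclass and interclass scatter of LDA, and its objective function is as follows:
$$U^{*} = \arg\max_{U} \frac{\sum_{i} \sum_{(i,j) \in P_{k_2}(c_i) \text{ or } (i,j) \in P_{k_2}(c_j)} \|U^{T} x_i - U^{T} x_j\|^{2}}{\sum_{i} \sum_{i \in N_{k_1}^{+}(j) \text{ or } j \in N_{k_1}^{+}(i)} \|U^{T} x_i - U^{T} x_j\|^{2}}, \tag{20}$$
where $P_{k_2}(c)$ is the set of data pairs that are the $k_2$-nearest pairs between class $c$ and the other classes, and $N_{k_1}^{+}(i)$ is the index set of the $k_1$-nearest neighbors of the sample $x_i$ within the same class. It can be concluded that MFA is also a particular case of the factor analysis framework; it simply considers both the class information and the nearest neighborhood of the samples in the partition design. The numerator of (20) can be rewritten in terms of $\mu_i^{0}$, $\mu_j^{0}$, the mean vectors of the $k_2$-nearest pairs between the $i$th and $j$th classes. Therefore, the interclass separability can be seen as the content factor in the factor analysis framework. Although the class information and the nearest neighborhood are considered in the design of the interclass separability, the weight matrix is ignored.
Similarly, the denominator of (20) is a sum over pairs of samples from the same class, which matches the form of the style variance in (4). Therefore, the intraclass compactness of MFA can be seen as the style factor in the factor analysis framework. The intraclass compactness is constructed from the $k_1$-nearest-neighbor relationships of samples from the same class, again disregarding the design of the weight matrix.
For DLA, the design criterion of part optimization is that the distance to the nearest neighbor images is minimized within a class and maximized among classes. The objective function of DLA is equivalent to (20), and thus DLA is also a particular case of the factor analysis framework.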

The Relationship with LPP.
Referring to (3), $F(x) = \arg\min \operatorname{tr}(S_s^{y}) = \arg\min \operatorname{tr}(U^{T} S_s U)$. Assume that all the prior probabilities $p_i$ are identical and partition the samples according to their $k$-nearest neighbors; then, up to a constant factor, the style variance matrix can be written as
$$S_s = \sum_{i=1}^{N} \sum_{j=1}^{N} W_{ij} (x_i - x_j)(x_i - x_j)^{T} = 2 X (D - W) X^{T} = 2 X L X^{T}, \tag{24}$$
where $W \in \mathbb{R}^{N \times N}$ is the relation matrix weighted by heat kernels: $W_{ij} = \exp(-\|x_i - x_j\|^{2} / t)$ if $x_i$ is one of the $k$-nearest neighbors of $x_j$ or $x_j$ is one of the $k$-nearest neighbors of $x_i$, and $W_{ij} = 0$ otherwise; $t$ is a tuning parameter; $D$ is a diagonal matrix whose elements are the column (or row) sums of the symmetric matrix $W$, $D_{ii} = \sum_{j} W_{ij}$; and $L = D - W$ is the Laplacian matrix. Thus (3) can be represented as follows:
$$U^{*} = \arg\min_{U^{T} X D X^{T} U = I} \operatorname{tr}(U^{T} X L X^{T} U).$$
It can be seen that LPP is also a particular case of the factor analysis framework. The style factor is the weighted accumulation of the sample variance within the same neighborhood.
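The LPP quantities above can be checked numerically with a small NumPy sketch. The helper names are hypothetical and the graph construction (symmetric $k$-nearest-neighbor mask with heat-kernel weights) is one common choice assumed here for illustration.

```python
import numpy as np

def heat_kernel_graph(X, k=3, t=1.0):
    """Symmetric k-NN relation matrix W with heat-kernel weights.

    X : (dim, N) data matrix, samples as columns.
    """
    X = np.asarray(X, dtype=float)
    N = X.shape[1]
    sq_dist = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    # Exclude each sample from its own neighbor list.
    masked = sq_dist.copy()
    np.fill_diagonal(masked, np.inf)
    nearest = np.argsort(masked, axis=1)[:, :k]
    mask = np.zeros((N, N), dtype=bool)
    mask[np.repeat(np.arange(N), k), nearest.ravel()] = True
    mask |= mask.T  # keep the pair if either sample is a neighbor of the other
    return np.where(mask, np.exp(-sq_dist / t), 0.0)

def graph_laplacian(W):
    """Degree matrix D (row sums of W) and Laplacian L = D - W."""
    D = np.diag(W.sum(axis=1))
    return D, D - W

def pairwise_style_scatter(X, W):
    """Direct evaluation of sum_ij W_ij (x_i - x_j)(x_i - x_j)^T."""
    dim, N = X.shape
    S = np.zeros((dim, dim))
    for i in range(N):
        for j in range(N):
            diff = (X[:, i] - X[:, j])[:, None]
            S += W[i, j] * (diff @ diff.T)
    return S
```

For any symmetric `W`, `pairwise_style_scatter(X, W)` agrees with `2 * X @ L @ X.T`, which is why the style variance of LPP can be minimized through the Laplacian form.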

The Relationship with PCA.
PCA is a commonly used algorithm for dimensionality reduction in pattern recognition, whose objective is to find the mapping space that best represents the original data in the minimum mean square sense. The objective function for the dataset is
$$U^{*} = \arg\max_{U^{T} U = I} \operatorname{tr}(U^{T} \Sigma U), \tag{27}$$
where $\Sigma$ is the covariance matrix:
$$\Sigma = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^{T} = \frac{1}{N} \sum_{i=1}^{N} x_i x_i^{T} - \frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N} x_i x_j^{T}, \tag{28}$$
where $\mu$ is the mean value of all the samples. Reformulating $\Sigma$ with a different index gives
$$\Sigma = \frac{1}{N} \sum_{j=1}^{N} x_j x_j^{T} - \frac{1}{N^{2}} \sum_{j=1}^{N} \sum_{i=1}^{N} x_j x_i^{T}. \tag{29}$$
Changing the summation order in (29) gives
$$\Sigma = \frac{1}{N} \sum_{j=1}^{N} x_j x_j^{T} - \frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N} x_j x_i^{T}. \tag{30}$$
Adding (28) to (30):
$$2\Sigma = \frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N} \left(x_i x_i^{T} + x_j x_j^{T} - x_i x_j^{T} - x_j x_i^{T}\right) = \frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N} (x_i - x_j)(x_i - x_j)^{T}. \tag{31}$$
Set $W_s$ to be an $N \times N$ matrix in which each element is equal to $1/2$; then the style variance matrix $S_s$ can be represented as follows:
$$S_s = \frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N} W_s^{ij} (x_i - x_j)(x_i - x_j)^{T} = \Sigma.$$
Treating each sample as an independent partition, (3) can then be represented in the form of (27). It can be seen that PCA is also a particular case of the factor analysis framework. The style factor of PCA is the accumulation of all the sample variance.

Related Works and Discussion
A graph embedding framework was proposed in [20], which describes most current subspace methods and shows that the differences among linear subspace methods lie in their graph structures, while the differences between linear methods and kernel methods lie in the way of embedding. In 2009, a Patch Alignment (PA) framework for dimensionality reduction was proposed in [21]. In the same year, a universal learning framework for supervised subspaces [22] was proposed, which is based on the fundamental concept of discriminant analysis and evaluates the classification capacity of the features through the ratio between the intraclass compactness and the interclass separability after projection. That work proves that the LDA, LPP, and NPE algorithms are all particular cases of that framework, and that the essence of these algorithms, whether supervised or unsupervised, is to minimize the intraclass compactness while maximizing the interclass separability; they differ only in how the sample classes are formed.
We propose the framework based on factor analysis after studying subspace methods and the current unified frameworks in depth. The factor analysis framework is superior to other frameworks in the following aspects: (1) The classification meaning is clearer and the design is more flexible. The content and style factors in factor analysis vary with the application, so researchers can propose many design schemes for the two factors. The intraclass compactness and interclass separability in the universal learning framework of supervised subspace proposed in [22] correspond to the style and content factors of the factor analysis framework, respectively. The part optimization design in [21] is the process of factor separation. Meanwhile, the factor analysis criterion provides a flexible weight allocation scheme for further research.
(2) The factor analysis framework applies to the feature extraction and dimensionality reduction methods in common use, including linear, nonlinear, and tensor decomposition methods. Supervised and unsupervised feature extraction methods are also included.
(3) The factor analysis framework is designed for classification, so superior feature extraction methods can be designed on this platform. The FA-LDA algorithm presented in Section 3.1 and the FA-LPP algorithm presented in Section 3.2 are both improved feature extraction methods developed using the factor analysis framework.

FA-LDA and FA-LPP
3.1. FA-LDA. Research on LDA has mainly focused on solving the Fisher criterion and on the small sample size problem, with little attention to the Fisher criterion itself. In fact, the Fisher criterion is suboptimal for classification problems with $C$ ($C > 2$) classes. Loog was among the earliest researchers to point out this problem and proposed an improved weighting algorithm based on the Bayesian error rate function [18]. In the same year, Li et al. proposed an improved weighting algorithm based on the Euclidean distance [17]. Later, fractional-step LDA was proposed in [23] to address the suboptimality of the Fisher criterion. Fractional-step LDA is effective but time consuming, since it is an iterative algorithm. In this section, we improve the Fisher criterion based on factor analysis in order to address its suboptimality.
In fact, the content factor $S_c$ and style factor $S_s$ in the factor analysis criterion are equivalent to the interclass and intraclass scatter matrices $S_b$, $S_w$ when every weight equals 1. When a pair of classes is far apart, the term $(\mu_i - \mu_j)(\mu_i - \mu_j)^{T}$ is large and dominates $S_b$, so that the final projection matrix overemphasizes classes that are already well separated and makes little use of the pairs of classes $(c_i, c_j)$ that are close to each other. However, the pairs of classes near each other deserve more attention.
According to the above discussion, the weight should be reduced for class pairs far away from each other and increased for class pairs near each other. $S_c$ is obtained by weighting $S_b$:
$$S_c = \sum_{i=1}^{C-1} \sum_{j=i+1}^{C} p_i p_j\, w(i,j)\, (\mu_i - \mu_j)(\mu_i - \mu_j)^{T},$$
where the class-pair weight $w(i,j)$ is computed from the correlation coefficients $\rho(\mu_i(k), \mu_j(k))$ of the $k$th features of the mean vectors of the $i$th and $j$th classes, $k = 1, \ldots, m$, with $m$ the vector dimensionality. Similarly, the objective of the factor analysis criterion is to remove the style factor as much as possible. Thus, the contribution to the eigenvalue solution of a sample pair that is close together within a class should be reduced, since such a pair is likely to suffer little interference from the style factor; the contribution of a sample pair that is far apart within a class should be increased, since at least one sample of the pair certainly contains the style factor, and the optimization function of the criterion should emphasize this kind of pair. $S_s$ is obtained by weighting $S_w$:
$$S_s = \sum_{i=1}^{C} \frac{p_i}{n_i^{2}} \sum_{k=1}^{n_i} \sum_{l=1}^{n_i} w_s(k,l)\, (x_k^{(i)} - x_l^{(i)})(x_k^{(i)} - x_l^{(i)})^{T},$$
where the weighting function $w_s(k,l)$ depends on the distance between the two samples and on an empirical constant $t$: the larger the distance within a class, the heavier the weight, and vice versa.
Given the weighted $S_c$, $S_s$, FA-LDA is defined as follows:
$$U^{*} = \arg\max_{U} \frac{|U^{T} S_c U|}{|U^{T} S_s U|}.$$
3.2. FA-LPP. In the LPP algorithm, the style variance within the same local manifold is minimized after projection; namely, the local manifold is preserved in the low-dimensional space. We improve the LPP algorithm based on the factor analysis criterion so that, after projection, the style variance within the local manifold of the same class is minimized and the content variance between the local manifolds of different classes is maximized. Thus, the style factor of FA-LPP is
$$S_s = \sum_{i=1}^{N} \sum_{j=1}^{N} W_s^{ij} (x_i - x_j)(x_i - x_j)^{T},$$
where the weight matrix $W_s$ is
$$W_s^{ij} = \begin{cases} \exp\left(-\|x_i - x_j\|^{2} / t\right), & x_i \text{ and } x_j \text{ belong to the same class and } x_i \text{ is one of the } k\text{-nearest neighbors of } x_j, \\ 0, & \text{otherwise}. \end{cases}$$
The content factor of FA-LPP is built as in (4) from $\mu_i$, $\mu_j$, the mean values of the $k$-nearest neighbors of the $i$th and $j$th classes, respectively, with the class-pair weight $w(i,j)$ computed as in FA-LDA. Given these definitions of $S_c$ and $S_s$, FA-LPP is defined by substituting them into (3).
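The overall shape of a weighted Fisher-type procedure such as FA-LDA can be sketched as below. This is not the authors' algorithm: the two weight functions are hypothetical stand-ins chosen only to have the qualitative behavior described in the text (class-pair weights that shrink with the distance between class means, and within-class sample-pair weights that grow with the distance between samples), and the function name, the small ridge term `eps`, and the eigen-solver route are assumptions of this example.

```python
import numpy as np

def fa_lda_sketch(X, labels, n_components, t=1.0, eps=1e-6):
    """Projection matrix from weighted content/style scatter (illustrative only).

    X : (dim, N) data matrix, samples as columns; labels : (N,) class labels.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dim, N = X.shape
    members = [np.flatnonzero(labels == c) for c in np.unique(labels)]
    priors = [len(idx) / N for idx in members]
    means = [X[:, idx].mean(axis=1) for idx in members]

    # Content scatter: hypothetical inverse-squared-distance class-pair weights.
    S_c = np.zeros((dim, dim))
    for i in range(len(members) - 1):
        for j in range(i + 1, len(members)):
            diff = (means[i] - means[j])[:, None]
            w_ij = 1.0 / (np.sum(diff ** 2) + eps)
            S_c += priors[i] * priors[j] * w_ij * (diff @ diff.T)

    # Style scatter: hypothetical weights 1 - exp(-dist^2 / t) inside each class.
    S_s = np.zeros((dim, dim))
    for p_i, idx in zip(priors, members):
        Xi = X[:, idx]
        sq = ((Xi[:, :, None] - Xi[:, None, :]) ** 2).sum(axis=0)
        Wi = 1.0 - np.exp(-sq / t)
        G = 2.0 * (np.diag(Wi.sum(axis=1)) - Wi)  # Wi is symmetric
        S_s += p_i / len(idx) ** 2 * (Xi @ G @ Xi.T)

    # Generalized eigenproblem S_c u = lambda S_s u, with a small ridge on S_s.
    M = np.linalg.solve(S_s + eps * np.eye(dim), S_c)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)[:n_components]
    return vecs[:, order].real
```

The returned columns span the directions with the largest ratio of weighted content variance to weighted style variance; swapping in other weight functions only changes the two marked lines.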

Experiments
4.1. FA-LDA. To evaluate the proposed FA-LDA algorithm, we compare it with the LDA algorithm on artificial data and on a real-world face dataset. During the testing phases, the nearest neighbor (NN) rule was used for classification. Note that [24] proposed a probabilistic graphical model framework to mimic the process of generating a face image. However, that work focuses on addressing the obstacles of small sample sets, occlusion, and illumination variations in one-sample face identification, so we do not compare our algorithm with [24].
Experiments on Artificial Data. The artificial data includes 3 subsets with 3, 4, and 5 classes, respectively, and the prior probability of each class is identical. Each class contains 60 samples with 9 dimensions. In each subset there is one class that is easy to discriminate; thus, its within-class covariance matrix plays a dominant role compared with those of the other classes. Meanwhile, the other classes are assumed to obey the normal distribution and to have identical within-class covariance matrices. In the experiments, 10 samples are selected as the training set and the other 50 samples are equally divided into 10 groups as test sets. The experiments for each class are conducted 10 times and each experiment is repeated 10 times. The average classification rates and variances of the two algorithms are listed in Table 1. The classification performance of the LDA algorithm is not good on this artificial data, which indicates the limitation of LDA, and FA-LDA is superior to LDA in classification performance. Since the LDA algorithm simply uses the average of the within-class scatter matrices as a unified covariance matrix, it overemphasizes the scatter matrix of the class that is easy to discriminate. In the FA-LDA algorithm, however, the weights of this easily separated class are small and more attention is paid to the confusable classes.
Experiments on Real-World Data. The ORL database, which contains 400 images of 40 individuals, is used for the real-world experiment. The images were captured at different times and with different variations, including expression and facial details, as can be seen in Figure 2. In order to evaluate FA-LDA, we compare it with LDA and two current weighted algorithms: WLDA1 [17], which is based on the Euclidean distance, and WLDA2 [19], which is based on the confusion matrix. For training, we randomly selected different numbers (three, four, five, six, seven, eight, and nine) of images per individual and used the rest of the images for testing. Each such trial was independently performed 10 times, and the average recognition results were calculated. The results are shown in Figure 3. The classification performance of LDA is the worst, which illustrates the validity of the weighting scheme based on the factor analysis criterion. The FA-LDA algorithm achieves the best performance because it not only weights $S_c$ to reduce the influence of far away classes, but also weights $S_s$ to improve the anti-interference ability of the discriminant criterion.
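The nearest neighbor rule used in the testing phases can be sketched as follows. The function name and the array layout (samples as columns, a linear projection matrix `U`) are assumptions made for this illustration.

```python
import numpy as np

def nn_classify(U, X_train, y_train, X_test):
    """1-NN classification in the projected space y = U^T x.

    U        : (dim, r) projection matrix
    X_train  : (dim, n_train) training samples as columns
    y_train  : (n_train,) training labels
    X_test   : (dim, n_test) test samples as columns
    """
    Z_train = U.T @ np.asarray(X_train, dtype=float)
    Z_test = U.T @ np.asarray(X_test, dtype=float)
    # Squared distances between every test and every training sample.
    sq = ((Z_test[:, :, None] - Z_train[:, None, :]) ** 2).sum(axis=0)
    return np.asarray(y_train)[np.argmin(sq, axis=1)]

def recognition_rate(y_true, y_pred):
    """Fraction of test samples whose predicted label is correct."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

Averaging `recognition_rate` over repeated random train/test splits gives the kind of mean recognition rate reported in the experiments.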

FA-LPP.
To evaluate the proposed FA-LPP algorithm, we compare it with LPP and MFA/DLA on the ORL and Yale databases. The Yale database was built by the Center for Computational Vision and Control at Yale University. Each subject has 11 images varying in expression (blinking, glad, sad, or surprised), illumination condition (frontal, left, or right), and facial details (glasses or no glasses), as can be seen in Figure 4.
For the ORL and Yale databases, the image set is partitioned into different gallery and probe sets, where $G_m/P_n$ indicates that $m$ images per person are randomly selected for training and the remaining $n$ images are used for testing. For LPP, the important parameter is $k$ (the number of nearest neighbors). For the MFA and DLA algorithms, the important parameters are $k_1$ (the number of nearest neighbors of the same class) and $k_2$ (the number of nearest neighbors of different classes). For the proposed FA-LPP, the important parameter is $k$ (the number of nearest neighbors of the same class).
Each experiment is repeated 10 times and the average classification rates on the ORL and Yale databases are listed in Tables 2 and 3, which show the superiority of our algorithm. The LPP algorithm only uses the nearest neighborhood for partition, while the MFA, DLA, and FA-LPP algorithms consider not only the nearest neighborhood but also the class information. Thus, the classification accuracies of MFA, DLA, and FA-LPP are higher than that of LPP. Since the weighting scheme is also taken into account in FA-LPP, its classification accuracy is higher than those of MFA and DLA. Moreover, the FA-LPP algorithm has only one nearest-neighborhood parameter to design.
4.3. Comprehensive Experiments. In order to comprehensively analyze the performance of FA-LDA and FA-LPP, we conducted a facial age group classification experiment and compared them with LDA, WLDA1, WLDA2, LPP, and DLA. In this experiment, the images for age group classification are collected from multiple data sources: 1002 facial images from the FG-NET database, 500 images from Google, and 600 images from scanned photographs, for a total of 2102 sample facial images. A few of them are shown in Figure 5. For experimental purposes the 2102 images are split into a training set and a test set. The training set consists of 700 images from the FG-Net aging database, 300 images from Google, and 400 images collected from scanned images, for a total of 1400 images. The testing set consists of 302 images from the FG-Net database, 200 images from Google, and 200 images from scanned photographs, totaling 702 images.
For the four-class classification, faces in the training dataset are assigned to age groups according to their labeled age. As every face has a labeled age value, it provides the necessary data for building the factors analysis framework.
In this experiment, Gabor features were first extracted from the original facial images, and then the FA-LDA, FA-LPP, LDA, WLDA1, WLDA2, LPP, and DLA algorithms were used for feature extraction and dimensionality reduction. The comparative results in Figure 6 show that our methods (FA-LDA, FA-LPP) outperformed the others. The classification accuracies of DLA, FA-LDA, and FA-LPP are higher than those of the LDA, WLDA1, WLDA2, and LPP algorithms. The reason is that LDA and LPP perform linear dimensionality reduction without considering the weight design problem, while WLDA1 and WLDA2 do not consider the partition design. Furthermore, since LPP is based on manifold learning, which is suitable for nonlinear problems such as face age group classification, LPP and FA-LPP perform better than LDA and FA-LDA, respectively.
Deep learning is a new area of machine learning research that has recently attracted a great deal of attention [25,26]. Our factors analysis framework can be embedded in a deep learning based object recognition system as the feature dimension reduction part, so it can be integrated with deep learning methods.

Conclusions
In this paper, we aim to provide insights into the relationships among state-of-the-art dimensionality reduction algorithms as well as to facilitate the design of new algorithms. A general framework known as factor analysis, along with its linearization, kernelization, and tensorization, has been proposed to provide a unified perspective for understanding and comparing many popular dimensionality reduction algorithms. Moreover, the factor analysis framework can be used as a general platform for developing new dimensionality reduction algorithms. As shown in this paper, we have proposed two novel dimensionality reduction algorithms, FA-LDA and FA-LPP, by designing a factor separating objective function that characterizes the weight matrix and the partition, and by optimizing the corresponding criteria based on the factor analysis framework. These new algorithms are shown to effectively overcome the data distribution assumptions of the traditional LDA and LPP algorithms. Thus, FA-LDA and FA-LPP are more general algorithms for discriminant analysis.
A byproduct of this paper is a series of kernelization and tensorization versions of the factors analysis. One direction for our future work is to systematically compare all possible extensions of the algorithms mentioned in this paper.

Figure 1: Face factors analysis (the content factors, without illumination and pose variations, are what we want to extract, and the style factors, with illumination and pose variations, are what we want to remove).

Figure 2: The sample facial images of ORL database.

Figure 4: The sample facial images of Yale database.

Figure 5: Sample images of FG-Net aging database (the same person with face images of different ages).

Figure 6: Comparisons of the performance of different approaches on facial age database.

Note on (20): $P_{k_2}(c)$ is the set of data pairs that are the $k_2$-nearest pairs between two different classes, and $N_{k_1}^{+}(i)$ indicates the index set of the $k_1$-nearest neighbors of the sample $x_i$ within the same class.

Table 1: Classification accuracies of LDA and FA-LDA on the artificial database.

Table 2: Best recognition rates of four algorithms on the testing sets of ORL.

Table 3: Best recognition rates of four algorithms on the testing sets of Yale.

Notes to Tables 2 and 3: For LPP, the numbers in the parentheses are the selected subspace dimensions. For FA-LPP, the first numbers in the parentheses are the selected subspace dimensions and the second numbers are the parameters $k$. For MFA and DLA, the first numbers in the parentheses are the selected subspace dimensions and the second and the third numbers are the parameters $k_1$ and $k_2$, respectively.