A Method Based on Active Appearance Model and Gradient Orientation Pyramid of Face Verification as People Age

Face verification in the presence of age progression is an important problem that has not been widely addressed. In this paper, we propose to use the active appearance model (AAM) and the gradient orientation pyramid (GOP) feature representation for this problem. First, we apply the AAM to the dataset and generate the AAM images; we then compute the gradient orientation representation on a hierarchical (pyramid) model, which yields the GOP. When combined with a support vector machine (SVM), experimental results show that our approach performs very well on two public domain face aging datasets: FGNET and MORPH. Second, we compare the performance of the proposed method with a number of related face verification methods; the results show that the new approach is more robust and performs better.


Introduction
1.1. Background. Face verification is an important yet challenging problem in computer vision and has a very wide range of applications, such as surveillance, access control systems, image retrieval, and human-computer interaction. Despite decades of study on face image analysis, age-related facial image analysis has not been extensively studied until recently. Most of the research effort has focused on robustness to different imaging conditions, such as illumination change, pose variation, and expression. Published approaches to age-invariant face recognition are limited. Most of the available algorithms dealing with facial aging focus on age estimation [1][2][3][4][5][6][7] and aging simulation [8][9][10]. One of the successful approaches to age-invariant face recognition is to build a 2D or 3D generative model for face aging [11]; the aging model can be used to compensate for the aging process in face matching or age estimation, as illustrated in Figure 1. Only a few previous works have applied age progression to face verification tasks. Ramanathan and Chellappa [12] used a face growing model for face verification tasks for people under the age of eighteen. These methods assume that the ages of the faces are known, which limits their application, since ages are often not available. When comparing two photos, these methods either transform one photo to have the same age as the other or transform both to reduce the aging effects. While the model-based methods have been shown to be effective in age-invariant face recognition, they have some limitations, such as the difficulty of constructing face models, the need for additional information about the training faces, and the reliance on controlled conditions (e.g., frontal pose, normal illumination, and neutral expression). Unfortunately, such constraints are not easy to satisfy in practice. Biswas et al. [13] studied feature drifting on face images at different ages and applied it to face verification tasks. Other studies that used age transformation for verification include [14][15][16][17]. Ling et al. [18] used the gradient orientation pyramid for feature representation, combined with a support vector machine, to verify faces across age progression, and showed how different kinds of image information affect matching. Li et al. [19] proposed a discriminative model called MFDA to address face matching in the presence of age variation. In a recent work [20], Sungatullina et al. proposed a new multiview discriminative learning (MDL) method with three different types of local feature representations for age-invariant face recognition.

Contribution.
In this paper, we make the following contributions. First, we propose using the active appearance model (AAM) and the gradient orientation pyramid (GOP) for face verification. We show that, when combined with the support vector machine (SVM), the feature demonstrates excellent performance for face verification with age gaps. This is mainly motivated by the good performance of the gradient orientation pyramid reported in [18]. The gradient orientation is robust to aging processes under some flexible conditions, and the pyramid technique adds hierarchical information that further improves the performance. Given an AAM image pair obtained with the active appearance model [21], we use the gradient orientation pyramid to build the feature vector. Finally, similar to the procedure in [22], we combine the feature with an SVM classifier for face verification. Second, we assess seven different methods on the task, including two benchmark approaches ($L_2$ norm and gradient orientation) and five different representations within the same SVM-based framework (intensity difference, gradient with magnitude, gradient orientation, gradient orientation pyramid, and active appearance model with gradient orientation pyramid). Thorough experiments are carried out on two large public aging datasets: FGNET and MORPH. The rest of the paper is organized as follows. Section 2 describes the AAM. Section 3 introduces the gradient orientation pyramid. Section 4 reports the verification experiments on the two datasets mentioned above. Finally, Section 5 summarizes and discusses the paper.

Active Appearance Model (AAM)
The active appearance model (AAM) is a feature point extraction method that is widely used in the field of pattern recognition. Cootes first proposed the active shape model (ASM) [21], but the ASM largely ignores the texture (color and gray value) information of images. The AAM was then proposed: when building the face model, AAM-based facial feature localization considers not only the local features but also the global shape and texture information. Through statistical analysis of the shape and texture features of human faces, a combined face model is established, which is the final AAM.
First, we use 68 feature points to establish the shape model. We then normalize the shapes to eliminate the effect of other factors and apply PCA to the normalized shapes. The average shape of the $N$ training samples $x_1, \ldots, x_N$ is
$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i. \quad (1)$$
The covariance matrix of the training samples is
$$S = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^{T}. \quad (2)$$
From the covariance matrix we obtain the shape model, in which the model parameter $c$ of the AAM controls the shape. The model shape is represented as
$$x = \bar{x} + Q_s c, \quad (3)$$
where $\bar{x}$ is the average vector of the model shape and $Q_s$ is the matrix describing the modes of variation derived from the training set.
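The shape-model construction described above (average shape, covariance matrix, principal mode, and synthesis of a shape from a model parameter) can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: the shapes are already aligned and flattened into coordinate vectors, the covariance uses a 1/N normalization, and only the leading mode is extracted, using power iteration in place of a full eigendecomposition. All function names are hypothetical.

```python
import math

def mean_shape(shapes):
    """Component-wise average of the aligned shape vectors."""
    n = len(shapes)
    return [sum(col) / n for col in zip(*shapes)]

def covariance(shapes, mean):
    """Covariance of the shapes about the mean (1/N normalization, an assumption)."""
    n, d = len(shapes), len(mean)
    centered = [[v - m for v, m in zip(s, mean)] for s in shapes]
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
            for i in range(d)]

def leading_mode(cov, iters=200):
    """Power iteration: dominant eigenvector and eigenvalue of cov."""
    d = len(cov)
    # Non-uniform start vector, to reduce the chance of starting orthogonal
    # to the dominant eigenvector.
    v = [float(i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(d)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero covariance: no variation to model
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                 for i in range(d))
    return v, eigval

def synthesize(mean, mode, c):
    """Shape generated by moving c units along one mode from the mean."""
    return [m + c * q for m, q in zip(mean, mode)]

# Toy data: two landmarks (x1, y1, x2, y2); only x2 varies across samples.
toy_shapes = [[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0, 0.0]]
toy_mean = mean_shape(toy_shapes)
toy_mode, toy_eigval = leading_mode(covariance(toy_shapes, toy_mean))
```

With 68 landmarks each shape vector would have 136 entries, and in practice several modes would be kept rather than one.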
Based on the normalized shape model, we apply the Delaunay triangulation algorithm (shown in Figure 2) and the affine transformation to get the texture model.
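Consider two triangles, $T_1$ and $T_2$, with vertices $P_1(x_1, y_1)$, $P_2(x_2, y_2)$, $P_3(x_3, y_3)$ and $P'_1(x'_1, y'_1)$, $P'_2(x'_2, y'_2)$, $P'_3(x'_3, y'_3)$, respectively. A point $P(x, y)$ in $T_1$ can be written as
$$P = \alpha P_1 + \beta P_2 + \gamma P_3, \quad (4)$$
where $\alpha$, $\beta$, and $\gamma$ are adaptive parameters, $0 \le \alpha, \beta, \gamma \le 1$, and $\alpha + \beta + \gamma = 1$. If the point $P$ is in $T_1$, the parameters follow from solving the linear system
$$\begin{pmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix} = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}. \quad (5)$$
The corresponding point $P'$ in $T_2$ is then
$$P' = \alpha P'_1 + \beta P'_2 + \gamma P'_3. \quad (6)$$
By establishing this one-to-one mapping, we obtain the model texture representation
$$g = \hat{g} + Q_g c, \quad (7)$$
where $\hat{g}$ is the average vector of the model texture and $Q_g$ is the matrix describing the modes of variation derived from the training set.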
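The triangle-to-triangle mapping described above can be sketched in a few lines: compute the three weights of a point with respect to the source triangle, then reuse the same weights on the destination triangle. This is a minimal illustration that solves the weights by Cramer's rule; the function names and the tolerance `eps` are hypothetical and not taken from the paper.

```python
def barycentric(p, a, b, c):
    """Weights (alpha, beta, gamma) such that p = alpha*a + beta*b + gamma*c."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if det == 0:
        raise ValueError("degenerate triangle")
    beta = ((x - x1) * (y3 - y1) - (x3 - x1) * (y - y1)) / det
    gamma = ((x2 - x1) * (y - y1) - (x - x1) * (y2 - y1)) / det
    alpha = 1.0 - beta - gamma
    return alpha, beta, gamma

def warp_point(p, src_tri, dst_tri, eps=1e-12):
    """Map p from src_tri to dst_tri; returns None if p lies outside src_tri."""
    alpha, beta, gamma = barycentric(p, *src_tri)
    if min(alpha, beta, gamma) < -eps:
        return None  # a negative weight means the point is outside the triangle
    (u1, v1), (u2, v2), (u3, v3) = dst_tri
    return (alpha * u1 + beta * u2 + gamma * u3,
            alpha * v1 + beta * v2 + gamma * v3)

# Toy example: the destination triangle is the source scaled by two.
src = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
dst = ((0.0, 0.0), (2.0, 0.0), (0.0, 2.0))
mapped = warp_point((0.25, 0.25), src, dst)
```

Applying this mapping to every pixel of every Delaunay triangle gives a piecewise affine warp of the face texture onto the normalized shape.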
The AAM images we obtain are 120 × 126 pixels in size; we normalize each AAM image to 70 × 80 pixels and convert it to gray scale. Because the normalized images are independent of the shape information, only the texture information of the original images is used in the subsequent feature extraction process. The final AAM image we use is shown in Figure 3.
Using the AAM reduces the impact of age variation in face verification for the following reasons.
(1) During AAM fitting, the face pose is corrected, so the effect of pose is nearly eliminated.
(2) The differences among different people are reduced by normalizing the shape model.
(3) Because the shape model is normalized, the resulting texture models carry almost no shape information, so mainly texture information is used in the feature extraction process.
(4) To verify reason (3), in Section 4 we conduct experiments on the effect of shape and texture representations; the results show that the texture feature is more useful than the shape feature in face verification across age progression.

Gradient Orientation Pyramid (GOP)
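Motivated by the previous study of the robustness of the gradient orientation pyramid (GOP) [18], we use it as the feature descriptor for face verification across age progression in our experiments. Given an image $I(p)$, where $p = (x, y)$ indicates pixel locations, the pyramid of $I$ is defined as
$$\mathcal{P}(I) = \{I(p; \sigma)\}_{\sigma=0}^{s}, \quad I(p; 0) = I(p), \quad I(p; \sigma) = [I(p; \sigma - 1) * \Phi(p)] \downarrow_2, \quad \sigma = 1, \ldots, s, \quad (8)$$
where $\Phi(p)$ is the Gaussian kernel, $\downarrow_2$ denotes half-size downsampling, and $s$ is the number of pyramid layers. Note that in (8) $I$ is used for both the original image and the images at different scales for convenience. The gradient orientation at each scale $\sigma$ is then defined by the normalized gradient vector at each pixel as
$$g(I(p; \sigma)) = \begin{cases} \dfrac{\nabla I(p; \sigma)}{|\nabla I(p; \sigma)|}, & \text{if } |\nabla I(p; \sigma)| > \tau, \\ (0, 0)^{T}, & \text{otherwise}, \end{cases} \quad (9)$$
where $\tau$ is a threshold for dealing with "flat" pixels. The gradient orientation pyramid (GOP) of $I$ is naturally defined as $G(I) = \mathrm{stack}(\{g(I(p; \sigma))\}_{\sigma=0}^{s}) \in \mathbb{R}^{d \times 2}$, which maps $I$ to a $d \times 2$ representation, where $\mathrm{stack}(\cdot)$ stacks the gradient orientations of all pixels across all scales and $d$ is the total number of pixels. Figure 4 illustrates the computation of a GOP from an input image.
Given an AAM image pair $(I_1, I_2)$ and the corresponding GOPs ($G_1 = G(I_1)$, $G_2 = G(I_2)$), the feature vector $x = F(I_1, I_2)$ is computed as the cosines of the differences between gradient orientations at all pixels over all scales:
$$x = F(I_1, I_2) = (G_1 \odot G_2)\,[1, 1]^{T}, \quad (10)$$
where $\odot$ is the element-wise product. Next, a Gaussian kernel is applied to the extracted feature $x$ for use with the SVM. Specifically, our kernel is defined as
$$K(x_1, x_2) = \exp(-\gamma \, |x_1 - x_2|^{2}), \quad (11)$$
where $\gamma$ is a parameter determining the size of the RBF kernels.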
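The pipeline of this section (build a smoothed, downsampled pyramid; take unit gradient vectors with a threshold for flat pixels; stack them over all scales; take per-pixel cosines between two images; apply an RBF kernel) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a 3-tap binomial blur stands in for the Gaussian kernel, gradients use central differences with clamped borders, and the default values of `layers`, `tau`, and `gamma` are arbitrary. Images are plain lists of rows of floats.

```python
import math

def smooth(img):
    """Separable [1, 2, 1]/4 blur with clamped borders (stand-in for the Gaussian)."""
    h, w = len(img), len(img[0])
    def clamp(i, n):
        return min(max(i, 0), n - 1)
    rows = [[(r[clamp(x - 1, w)] + 2 * r[x] + r[clamp(x + 1, w)]) / 4
             for x in range(w)] for r in img]
    return [[(rows[clamp(y - 1, h)][x] + 2 * rows[y][x] + rows[clamp(y + 1, h)][x]) / 4
             for x in range(w)] for y in range(h)]

def pyramid(img, layers):
    """Level 0 is the input; each further level is blurred, then halved in size."""
    levels = [img]
    for _ in range(layers):
        blurred = smooth(levels[-1])
        levels.append([row[::2] for row in blurred[::2]])
    return levels

def gradient_orientation(img, tau):
    """Unit gradient vector per pixel; (0, 0) where the magnitude is <= tau."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        for x in range(w):
            gx = (img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2
            gy = (img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2
            mag = math.hypot(gx, gy)
            out.append((gx / mag, gy / mag) if mag > tau else (0.0, 0.0))
    return out

def gop(img, layers=2, tau=1e-3):
    """Stack the gradient orientations of all pixels over all pyramid levels."""
    feats = []
    for level in pyramid(img, layers):
        feats.extend(gradient_orientation(level, tau))
    return feats

def pair_feature(img1, img2, layers=2, tau=1e-3):
    """Per-pixel cosine between the gradient orientations of two images."""
    g1, g2 = gop(img1, layers, tau), gop(img2, layers, tau)
    return [a[0] * b[0] + a[1] * b[1] for a, b in zip(g1, g2)]

def rbf_kernel(x1, x2, gamma=1.0):
    """exp(-gamma * squared Euclidean distance) between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x1, x2)))

# Toy images: a horizontal ramp and a vertical ramp, both 8 x 8.
ramp_h = [[float(x) for x in range(8)] for _ in range(8)]
ramp_v = [[float(y)] * 8 for y in range(8)]
same = pair_feature(ramp_h, ramp_h)
cross = pair_feature(ramp_h, ramp_v)
```

Because the dot product of two unit vectors is the cosine of the angle between them, identical orientations give 1, perpendicular orientations give 0, and any pixel that is flat in either image gives 0.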

Face Verification Experiments
In this section, we report experimental results obtained on the FGNET and MORPH databases, comparing our algorithm with a number of related face verification methods. While there are several public domain face datasets, only a few are constructed specifically for the aging problem. Until recently, the lack of a large face aging database limited the research on age-invariant face verification. There are two desired attributes of a face aging database: (i) a large number of subjects and (ii) a large number of face images per subject captured at many different ages. In addition, it is desirable that these images do not have large variations in pose, expression, and illumination. However, these conditions are hard to satisfy in the real world. Hence, it is crucial to design an appropriate feature representation scheme that is tolerant to such variations.

Experimental Classifier and Evaluation.
We model face verification as a two-class classification problem. Given an input image pair $I_1$ and $I_2$, the task is to classify the pair as either intrapersonal (i.e., $I_1$ and $I_2$ are from the same person) or extrapersonal (i.e., $I_1$ and $I_2$ are from different individuals). We use a support vector machine (SVM). Specifically, given an image pair $(I_1, I_2)$, it is first mapped onto the feature space as
$$x = F(I_1, I_2), \quad (12)$$
where $x \in \mathbb{R}^{d}$ is the feature vector extracted from the image pair $(I_1, I_2)$ through the feature extraction function $F : \mathcal{I} \times \mathcal{I} \to \mathbb{R}^{d}$, $\mathcal{I}$ is the set of all images, and $\mathbb{R}^{d}$ forms the $d$-dimensional feature space.
The SVM is then used to divide the feature space into two classes, one for intrapersonal pairs and the other for extrapersonal pairs. Using the same terminology as in [20], we denote the separating boundary by
$$w(I_1, I_2) = \sum_{i=1}^{N_s} \alpha_i y_i K(s_i, x) + b = \Delta, \quad (13)$$
where $N_s$ is the number of support vectors and $s_i$ is the $i$th support vector, with coefficient $\alpha_i$ and label $y_i$, and $b$ is the bias. $\Delta$ is used to balance the correct rejection rate and the correct acceptance rate defined in (14), and $K(\cdot, \cdot)$ is the kernel function that gives the SVM its nonlinear abilities. In our experiments, we use the widely adopted LIBSVM library [23].
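To make the role of the threshold concrete, the following sketch evaluates a kernel decision value for one pair feature and compares it with a threshold `delta`. It is only an illustration: the support vectors, coefficients, and bias are made-up toy values rather than a trained model, each coefficient stands for the product of a support vector's weight and label, and accepting when the value is at least `delta` is an assumed convention. It does not use LIBSVM.

```python
import math

def rbf(x1, x2, gamma):
    """RBF kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x1, x2)))

def decision_value(x, support_vectors, coeffs, bias, gamma):
    """Sum over support vectors of coeff_i * K(s_i, x), plus the bias."""
    return sum(c * rbf(s, x, gamma)
               for s, c in zip(support_vectors, coeffs)) + bias

def verify(x, support_vectors, coeffs, bias, gamma, delta=0.0):
    """Accept the pair as intrapersonal when the decision value reaches delta."""
    return decision_value(x, support_vectors, coeffs, bias, gamma) >= delta

# Toy model: one positive and one negative support vector (hypothetical values).
toy_svs = [[1.0, 1.0], [0.0, 0.0]]
toy_coeffs = [1.0, -1.0]
toy_value = decision_value([1.0, 1.0], toy_svs, toy_coeffs, bias=0.0, gamma=1.0)
```

Raising `delta` makes the verifier stricter: fewer pairs are accepted, so the correct rejection rate goes up while the correct acceptance rate goes down.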
For verification tasks, we use two popular criteria, the correct rejection rate (CRR) and the correct acceptance rate (CAR):
$$\mathrm{CRR} = \frac{\#\,\text{correctly rejected extrapersonal pairs}}{\#\,\text{total extrapersonal pairs}}, \qquad \mathrm{CAR} = \frac{\#\,\text{correctly accepted intrapersonal pairs}}{\#\,\text{total intrapersonal pairs}}, \quad (14)$$
where "accept" indicates that the input image pair is from the same subject and "reject" indicates the opposite. The performance of the algorithms is evaluated using CRR-CAR curves, which are created by varying a classifier parameter. In our experiments, the CRR-CAR curve in each experiment is created by adjusting the parameter $\Delta$ in (13). In addition, the equal error rate (EER), defined as the error rate at the operating point where CAR and CRR are equal, is frequently used to measure verification performance. The lower the EER, the better the performance.
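The two rates and the equal error rate can be computed from a list of decision scores by sweeping the threshold. The sketch below is a minimal illustration with hypothetical function names; it assumes label 1 for intrapersonal pairs and 0 for extrapersonal pairs, accepts a pair when its score is at least the threshold, and approximates the EER by the threshold at which the two rates are closest.

```python
def crr_car(scores, labels, delta):
    """Return (CRR, CAR) at threshold delta; labels: 1 = intra, 0 = extra."""
    intra = [s for s, l in zip(scores, labels) if l == 1]
    extra = [s for s, l in zip(scores, labels) if l == 0]
    car = sum(s >= delta for s in intra) / len(intra)  # accepted intrapersonal
    crr = sum(s < delta for s in extra) / len(extra)   # rejected extrapersonal
    return crr, car

def equal_error_rate(scores, labels):
    """Sweep delta over the observed scores; return (EER estimate, delta)."""
    best = None
    for delta in sorted(set(scores)):
        crr, car = crr_car(scores, labels, delta)
        gap = abs(crr - car)
        err = 1.0 - (crr + car) / 2.0
        if best is None or gap < best[0]:
            best = (gap, err, delta)
    return best[1], best[2]

# Toy scores: three intrapersonal pairs followed by three extrapersonal pairs.
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_eer, toy_delta = equal_error_rate(toy_scores, toy_labels)
```

In this toy example one intrapersonal pair scores below one extrapersonal pair, so no threshold separates the classes perfectly and the estimated EER is one third.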

Experimental Compared Approaches.
We compare the following approaches.
(i) SVM + AAM + GOP: this is the approach proposed in this paper.
(ii) SVM + GOP: this is the method using GOP in [18].
(iii) SVM + GO: this is SVM + GOP without the hierarchical model.
(iv) SVM + G: this is similar to SVM + GO, except that the gradient (G), including its magnitude, is used instead of the gradient orientation.
(v) SVM + diff: as in [22], we use the differences of normalized images as input features combined with SVM.
(vi) GO: this is the method using gradient orientation proposed in [24].
(vii) $L_2$: this is a benchmark approach that uses the $L_2$ norm to compare two normalized images.
To avoid large differences between the original images and the AAM images, the images in the compared experiments are preprocessed using the same scheme as in [18]. This includes manual eye location labeling, alignment by the eyes, and cropping with an elliptic region whose size is almost equal to the black border of the AAM images. Figure 5 shows the preprocessing used in the compared experiments.

Experiments on the FGNET Dataset.
Only a few public domain face datasets are constructed specifically for the aging problem, and the lack of a large face aging database until recently limited the research on age-invariant face recognition. The FGNET Aging database [25] is an important and widely used dataset for face recognition in the presence of age progression. It contains 1002 facial images of 82 persons, an average of 12 images per subject, and all the images are annotated with landmark points and age information; the ages range from 0 to 69 years. Figure 6 shows some examples from the FGNET database.
The appearance of human faces changes very differently in children than in adults, as shown in [26]. In this experiment we therefore use only the subset of the FGNET database that contains images taken at age 18 or above, which is consistent with the studies in [12, 18]. This subset contains 272 facial images of 62 persons.
For verification tasks, we generate 665 intrapersonal image pairs by collecting all image pairs from the same subjects. Extrapersonal pairs are randomly selected from images of different subjects. Because the number of images is small, we use only three-fold cross validation, arranged so that samples from the same subject never appear in both training and testing pairs; each fold contains 220 intrapersonal pairs and 2000 extrapersonal pairs. The results are shown in Figure 7 and Table 1.
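The pair generation and subject-disjoint fold construction described above can be sketched as follows. This is an illustrative sketch only: the helper names, the random seed, and the toy image list are hypothetical, and the exact sampling procedure used in the experiments is not specified in the text. Splitting by subject rather than by image is what guarantees that no subject contributes to both training and testing pairs.

```python
import itertools
import random

def split_subjects(subject_ids, n_folds, seed=0):
    """Shuffle the distinct subjects and deal them into n_folds disjoint groups."""
    ids = sorted(set(subject_ids))
    random.Random(seed).shuffle(ids)
    return [ids[i::n_folds] for i in range(n_folds)]

def intrapersonal_pairs(images):
    """All pairs of images that share a subject; images = [(image_id, subject)]."""
    by_subject = {}
    for image_id, subject in images:
        by_subject.setdefault(subject, []).append(image_id)
    pairs = []
    for ids in by_subject.values():
        pairs.extend(itertools.combinations(ids, 2))
    return pairs

def extrapersonal_pairs(images, count, seed=0):
    """Randomly sample `count` distinct pairs of images from different subjects."""
    n = len(images)
    available = n * (n - 1) // 2 - len(intrapersonal_pairs(images))
    if count > available:
        raise ValueError("not enough extrapersonal pairs to sample from")
    rng = random.Random(seed)
    pairs = set()
    while len(pairs) < count:
        (a, sa), (b, sb) = rng.sample(images, 2)
        if sa != sb:
            pairs.add((min(a, b), max(a, b)))
    return sorted(pairs)

def images_of(images, subjects):
    """Images belonging to one fold, selected by subject membership."""
    keep = set(subjects)
    return [(i, s) for i, s in images if s in keep]

# Toy data: three subjects with 3, 2, and 1 images.
toy_images = [("a1", "A"), ("a2", "A"), ("a3", "A"),
              ("b1", "B"), ("b2", "B"), ("c1", "C")]
toy_folds = split_subjects([s for _, s in toy_images], n_folds=3)
toy_intra = intrapersonal_pairs(toy_images)
toy_extra = extrapersonal_pairs(toy_images, count=5)
```

For each fold, the test pairs would be built from `images_of(toy_images, fold)` and the training pairs from the images of the remaining folds.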
From the figure and the table, we can see that the proposed approach outperforms the other methods in the experiments, especially the $L_2$ norm and gradient orientation benchmarks, over which it improves by approximately 10%. Although the improvement over SVM + GO and SVM + GOP is small, the improvement over the other traditional methods is clear. Because the number of images in FGNET is limited, experiments on FGNET alone are not sufficient, and further face verification experiments are needed.

Experiments on the MORPH Dataset.
In this section, we report experimental results on a larger public domain face aging dataset, the MORPH database [27].
The MORPH database contains a total of 52099 facial images of 12938 subjects in the age range of 16-77 years. The dataset provides only age information and lacks landmark point information, so we manually labeled the 68 landmark points on all the facial images prior to the verification task. Figure 8 shows several examples of face images at different ages in the MORPH database. While the number of subjects in this database is large, the number of face images per subject is rather small (an average of about 4 facial images per subject). Note that there are also large pose, lighting, and expression variations along with the age variation. In this experiment we use all the image data for face verification.
We emphasize the importance of experiments on the MORPH database for the following reasons.
(i) MORPH is very challenging for our task; in particular, it contains much larger age gaps.
(ii) Compared to FGNET, the number of facial images in MORPH is much larger, and it has more than 10 times as many subjects as FGNET. We consider that experiments on MORPH will serve as a baseline for future studies on the topic.
(iii) The MORPH database contains facial images of people with three kinds of skin color, in contrast to FGNET, which contains only white people. In addition, the illumination and environment of the MORPH images are closer to those of real applications, which provides a better foundation for practical use.
Because of the large size of the MORPH database, we use three-fold, five-fold, and ten-fold cross validation in the following experiments, rather than three-fold cross validation only. Each fold contains 2000 intrapersonal pairs and 3000 extrapersonal pairs, randomly chosen from all the intrapersonal and extrapersonal pairs collected from all the subjects, such that samples from the same subject never appear in both training and testing pairs.
The experiment results are shown in Figures 9, 10, and 11 and Tables 2, 3, and 4.
As the results of the three-fold cross validation experiments on the MORPH database show, our method again outperforms all other approaches, and the improvement is larger than in the experiments on the FGNET database (the equal error rate difference increases from 0.3% to 1.3%). The equal error rates of all the methods also decrease markedly (even for the worst method, the $L_2$ norm, the equal error rate decreases from 40.6% to 26.6%). This is probably because the dataset is large enough for the SVM to learn more thoroughly, or because the smaller average number of facial images per subject and the smaller age differences among the images of each subject make the learning simpler. Given the large size of the MORPH database, we further extend the experiments with five-fold and ten-fold cross validation to improve the verification performance of our method.

Age Factor Experiments.
As mentioned above, the experiments in [26] showed that the appearance of human faces changes very differently in children than in adults, so we also conduct experiments on the FGNET images taken below age 18. For verification tasks, we generate about 2000 intrapersonal image pairs by collecting all image pairs from the same subjects. As in the experiments in Section 4.3, we use only three-fold cross validation, and each fold contains 690 intrapersonal pairs and 2000 extrapersonal pairs. The results are shown in Figure 12 and Table 5. We can see that the performance decreases considerably; in other words, face verification becomes more difficult, which again confirms the finding in [26].
Because this paper is about face verification across aging, how age differences affect the performance is of particular interest. We therefore compute statistics on the effect of the age gap in the experiments. The results are shown in Figures 13 and 14.
From the figures for the FGNET dataset, we find that the performance of almost all methods drops as the age gap increases, especially when the age gap is more than four years. SVM + diff may be more suitable for the face images in the FGNET database below 18 years old, which is consistent with the results shown in Figure 12 and Table 5.
Figure 15 shows the results on the MORPH database. Interestingly, on MORPH, faces separated by four years are easier to verify than those separated by either more or less than four years. Because the age gaps for each subject in MORPH are small, the task does not become steadily harder as the gap grows; it is easiest at a gap of four years, and the results on the MORPH database are irregular.
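Statistics of this kind can be gathered by grouping the test pairs by age gap and summarizing the verification outcome in each group. The sketch below is a minimal illustration under assumptions: the per-group measure is the plain fraction of correctly classified pairs, which may differ from the measure plotted in the figures, and the function name and input layout are hypothetical.

```python
from collections import defaultdict

def accuracy_by_age_gap(pairs):
    """pairs: iterable of (age1, age2, correct); returns {age gap: accuracy}."""
    totals = defaultdict(lambda: [0, 0])  # gap -> [correct count, pair count]
    for age1, age2, correct in pairs:
        gap = abs(age1 - age2)
        totals[gap][0] += int(correct)
        totals[gap][1] += 1
    return {gap: hit / n for gap, (hit, n) in sorted(totals.items())}

# Toy outcomes: two pairs with a 2-year gap, one pair with a 7-year gap.
toy_outcomes = [(20, 22, True), (30, 32, False), (18, 25, True)]
toy_stats = accuracy_by_age_gap(toy_outcomes)
```

Groups with very few pairs give noisy estimates, which is one reason per-gap curves can look irregular when most subjects have only small age gaps.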

Shape and Texture Representation Experiments.
In this experiment, we use only the shape feature and the texture feature to address age variation in face verification on MORPH. For the texture feature, we increase the weight from 0.1 to 0.9; correspondingly, the weight of the shape feature decreases from 0.9 to 0.1. We use three-fold, five-fold, and ten-fold cross validation in the experiments. The results are shown in Figures 16, 17, and 18 and Table 6.
As the figures and the table show, the verification performance improves as the weight of the texture feature increases. We can therefore conclude that the texture feature plays a more important and useful role than the shape feature in face verification across age progression.
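The text does not state how the two weighted features are combined, so the following is only a hedged sketch of one common option: a convex combination of a texture-based score and a shape-based score, swept over texture weights from 0.1 to 0.9. The function names and the score-level fusion itself are assumptions, not the authors' stated method.

```python
def fuse_scores(texture_score, shape_score, texture_weight):
    """Convex combination; the shape weight is 1 minus the texture weight."""
    if not 0.0 <= texture_weight <= 1.0:
        raise ValueError("texture_weight must lie in [0, 1]")
    return texture_weight * texture_score + (1.0 - texture_weight) * shape_score

def weight_sweep():
    """Texture weights 0.1, 0.2, ..., 0.9, as in the experiment description."""
    return [k / 10 for k in range(1, 10)]

# Fused score for one hypothetical pair at every texture weight in the sweep.
toy_fused = [fuse_scores(0.8, 0.2, w) for w in weight_sweep()]
```

When the texture score is the more reliable of the two, the fused score moves toward it as the texture weight grows, which matches the trend reported above.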

Conclusion and Discussion
In this paper we studied the problem of face verification with age variation by combining the active appearance model (AAM) and the gradient orientation pyramid (GOP) representation. First, we build the AAM on the datasets. After generating the AAM images, we use a robust face descriptor, the gradient orientation pyramid, for face verification across ages. We then use an SVM classifier for training and testing. Compared with the classic descriptors used previously, the proposed method demonstrated very promising results on two public domain databases: FGNET and MORPH. Both databases contain many face images with large age differences. In addition, we computed statistics on the effect of age in the experiments. Facial aging is a challenging problem that will require continued efforts to further improve recognition performance. There are several directions for future work. First, since the data always affect the performance of face verification, we plan to test on other large public datasets for a deeper understanding of the proposed approach. Second, we expect to develop more effective feature extraction methods for the face verification problem in future research.

Figure 1: Schematic of the aging simulation process from age a to age b.

Figure 2: Result of the Delaunay triangulation.

Figure 3: The AAM image and the normalized image.

Figure 4: Computation of a GOP from an input image $I$.

Figure 5: The image preprocessing in the compared experiments.

Figure 8: Typical images with age differences in MORPH.

Figure 11: CRR-CAR curves for ten-fold cross validation experiment on MORPH dataset.

Figure 13: Effect of aging on verification performance in the experiments on FGNET dataset (above 18).

Figure 15: Effect of aging on verification performance in the experiments on MORPH dataset.

Figure 17: Effect of shape and texture feature for five-fold cross validation in the experiments on MORPH dataset.

Table 1: Equal error rates for experiment on FGNET database (above 18).

Table 2: Equal error rates for three-fold validation experiment on MORPH database.

Table 3: Equal error rates for five-fold validation experiment on MORPH database.

Table 4: Equal error rates for ten-fold validation experiment on MORPH database.

Table 5: Equal error rates for experiment on FGNET database (below 18).

Table 6: Equal error rates for experiment on MORPH database.