An Improved Metric Learning Approach for Degraded Face Recognition

To solve the problem of matching elements that belong to different data collections, an improved coupled metric learning approach is proposed. First, we improve the supervised locality preserving projection algorithm and add its within-class and between-class information to coupled metric learning, so a novel coupled metric learning method is proposed. Furthermore, we extend this algorithm to a nonlinear space, and a kernel coupled metric learning method based on supervised locality preserving projection is proposed. In the kernel coupled metric learning approach, the elements of the two collections are mapped into a unified high dimensional feature space by a kernel function, and generalized metric learning is then performed in this space. Experiments on the Yale and CAS-PEAL-R1 face databases demonstrate that the proposed kernel coupled approach performs better in low-resolution and fuzzy face recognition and reduces computing time; it is an effective metric method.


Introduction
The metric is a function that gives the scalar distance between two patterns. The distance metric is an important basis for measuring the similarity between samples, and it is one of the core issues in pattern recognition. The aim of distance metric learning is to find a distance metric matrix; its essence is to obtain another representation with better class separability through a linear or nonlinear transformation.
In recent years, considerable research on distance metrics has been carried out [1][2][3][4][5][6][7]. These methods learn a distance metric by introducing sample similarity constraints or category information; the learned metric is then used to improve data clustering or classification. The research can be divided into two categories: linear and nonlinear distance metric learning. Linear distance metric learning is equivalent to learning a linear transformation in the sample space and includes a variety of common linear dimensionality reduction methods, such as principal component analysis [8], linear discriminant analysis [9], and independent component analysis [10]. Nonlinear distance metric learning is equivalent to learning a nonlinear transformation in the sample space; locally linear embedding [11], isometric mapping [12], and Laplacian eigenmaps [13] are the traditional nonlinear methods. Recently, some new nonlinear distance metric methods have been proposed. Baghshah and Shouraki [14] proposed a nonlinear metric learning method based on pairwise similarity and dissimilarity constraints and the geometrical structure of the data. Babagholami-Mohamadabadi et al. [15] proposed probabilistic nonlinear distance metric learning. The deep nonlinear metric learning method [16] based on neural networks is another new nonlinear approach. In addition, there are some more flexible distance metric learning algorithms based on kernel matrices [7,17,18].
These traditional distance metric learning methods are defined on a single-attribute set. If the elements belong to different sets with different attributes, these methods are incapable of measuring the distance. For example, two images with different resolutions can be considered to belong to different sets; obviously, the traditional distance metric methods will not be able to measure the distance between them directly.

Related Works
The traditional distance metric learning algorithm learns a distance function d(x_i, x_j) between data points, expressed as follows:

d(x_i, x_j) = ||x_i − x_j||_M = sqrt((x_i − x_j)^T M (x_i − x_j)). (1)

Distance metric learning aims to find a distance metric matrix M; it is required that M be a real symmetric and positive semidefinite matrix, namely, M = W W^T, where W is a transformation matrix. Obviously, distance metric learning is realized by learning the transformation matrix W, so the process of distance metric learning is equivalent to obtaining another representation with better separability through a linear or nonlinear transformation of the samples.
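As a concrete illustration of (1), the following minimal numpy sketch (the function name is ours, not from the paper) computes the learned distance as a plain Euclidean distance after the linear transform W^T x, using M = W W^T:

```python
import numpy as np

def metric_distance(xi, xj, W):
    # d(xi, xj) = sqrt((xi - xj)^T M (xi - xj)) with M = W W^T,
    # i.e. the Euclidean distance after the linear transform W^T x.
    diff = W.T @ (xi - xj)
    return float(np.sqrt(diff @ diff))

xi = np.array([1.0, 0.0])
xj = np.array([0.0, 1.0])
# With W = I the metric reduces to the plain Euclidean distance.
print(metric_distance(xi, xj, np.eye(2)))  # sqrt(2) ~ 1.4142
```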
If X ⊂ R^m and Y ⊂ R^n represent two different collections, the function d(x, y) is the distance metric between data x ∈ X and data y ∈ Y. If m ≠ n, the traditional method does not work for the distance metric. Even if m = n, because x ∈ X and y ∈ Y belong to collections with different attributes, the distance metric has no physical meaning.
The coupled distance metric is a distance function for data elements from different kinds of collections. The elements of collections X and Y are mapped from their original spaces to a common coupled space R^d by the mapping functions f_x and f_y; the distance metric is then computed in the coupled space. The measured distance can be represented mathematically as

d(x, y) = sqrt((f_x(x) − f_y(y))^T M (f_x(x) − f_y(y))),

where the matrix M is a real symmetric and positive semidefinite matrix. Letting M = W W^T, we can get

d(x, y) = ||W^T f_x(x) − W^T f_y(y)||.

The goal of coupled metric learning can be achieved by minimizing this distance function; the objective function is as follows:

min Σ_i Σ_j d²(x_i, y_j) S_ij,

where S is a correlation matrix of the elements in collections X and Y. According to different supervised information, we can obtain different matrices S, so as to realize different kinds of coupled metric learning.
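The coupled distance above can be sketched as follows (hypothetical names, numpy assumed): two samples of different dimensions become comparable once both are projected into the shared coupled space.

```python
import numpy as np

def coupled_distance(x, y, A, B):
    # Map x (dimension m) and y (dimension n) into a shared
    # d-dimensional coupled space via A (m x d) and B (n x d),
    # then measure the ordinary Euclidean distance there.
    diff = A.T @ x - B.T @ y
    return float(np.sqrt(diff @ diff))

# A 3-dimensional sample and a 2-dimensional sample become comparable
# once both are projected into the same 2-dimensional coupled space.
x = np.array([1.0, 0.0, 0.0])
y = np.array([1.0, 0.0])
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
B = np.eye(2)
print(coupled_distance(x, y, A, B))  # 0.0: both map to [1, 0]
```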

The Coupled Metric Learning Based on Supervised Locality Preserving Projection (SLPP-CML)
In order to better illustrate the coupled metric learning algorithm based on supervised locality preserving projection, we first provide a theorem about the matrix norm.

Theorem 1. Let A ∈ R^{m×n}. Then the Frobenius norm has the following properties: (1) ||A||_F² = Tr(A^T A) = Σ_{i=1}^{n} λ_i(A^T A), where λ_i(A^T A) is the ith eigenvalue of the matrix A^T A; (2) Tr(AB) = Tr(BA), where Tr(·) represents the trace operation.
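Both properties of Theorem 1 can be checked numerically; a small sketch (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 4))

# Property (1): ||A||_F^2 = Tr(A^T A) = sum of eigenvalues of A^T A.
fro2 = np.sum(A ** 2)
assert np.isclose(fro2, np.trace(A.T @ A))
assert np.isclose(fro2, np.sum(np.linalg.eigvalsh(A.T @ A)))

# Property (2): Tr(AB) = Tr(BA).
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
print("Theorem 1 verified numerically")
```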
The coupled metric learning based on supervised locality preserving projection includes the following steps.
Step 1 (building the neighborhood relations within each collection). We use the k nearest neighbor method. First, build the within-class adjacency graph in each collection: if data point x_j (y_j) is one of the k within-class nearest neighbors of data point x_i (y_i), connect these two data points. Then build the between-class adjacency graph in the same collection: if data point x_j (y_j) is one of the k between-class nearest neighbors of data point x_i (y_i), these two data points are connected.
Step 2 (building the connection relation between the two collections). If the data points x_i and y_j in the two different collections belong to the same class, then these two points are connected; otherwise they are not connected.
Step 3 (constructing the relation matrices within each collection). According to the neighborhood relations, the within-class and between-class relation matrices (similarity matrices) are constructed, respectively.
The within-class similarity matrix W, corresponding to the within-class adjacency graph, has within-class similarity values w_ij defined as follows:

w_ij = exp(−||x_i − x_j||² / t) if x_i and x_j are connected in the within-class graph, and w_ij = 0 otherwise.

The between-class similarity matrix B, corresponding to the between-class adjacency graph, has between-class similarity values b_ij defined analogously:

b_ij = exp(−||x_i − x_j||² / t) if x_i and x_j are connected in the between-class graph, and b_ij = 0 otherwise,

where the parameter t is the average distance between all sample points.
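Assuming the standard heat-kernel weighting used by locality preserving projections, the two graphs of Step 3 could be built as in the following sketch (function name and details are ours, not the authors' implementation):

```python
import numpy as np

def class_similarity(X, labels, k, t, same_class=True):
    # Connect each sample to its k nearest neighbours drawn from the
    # same class (same_class=True, within-class graph) or from other
    # classes (same_class=False, between-class graph), weighting each
    # edge with the heat kernel exp(-||xi - xj||^2 / t).
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    S = np.zeros((n, n))
    for i in range(n):
        mask = (labels == labels[i]) if same_class else (labels != labels[i])
        mask = mask.copy()
        mask[i] = False                      # never link a point to itself
        cand = np.flatnonzero(mask)
        if cand.size == 0:
            continue
        nn = cand[np.argsort(D[i, cand])[:k]]
        S[i, nn] = np.exp(-D[i, nn] ** 2 / t)
    return np.maximum(S, S.T)                # symmetrise the graph

X = np.array([[0.0], [0.2], [3.0], [3.1]])
labels = np.array([0, 0, 1, 1])
Ww = class_similarity(X, labels, k=1, t=1.0, same_class=True)
Wb = class_similarity(X, labels, k=1, t=1.0, same_class=False)
```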
Step 4 (constructing the relation matrix C between the two collections). The similarity value is as follows: c_ij = 1 if x_i and y_j are connected (i.e., belong to the same class), and c_ij = 0 otherwise.

Step 5 (calculating the final similarity matrix S between the two collections). As shown in Figure 1, the similarity relations between element x_1 ∈ X and the elements of collection Y include the following situations.
(a) The similarity between x_1 and y_1: these two data points in different collections belong to the same class and are connected to each other, so their similarity is s_11 = c_11 = 1.
(b) The similarity between x_1 and y_5: these two data points belong to different classes, but the relationship between y_5 and y_3 is a between-class neighborhood relation within the same collection, and the similarity b_35 is the maximum such similarity value, so the similarity between x_1 and y_5 is s_15 = b_35.

(c) The similarity between x_1 and y_6: these two data points belong to different classes, and y_6 does not have a between-class neighborhood relation with any class-1 element of collection Y. But there is a between-class neighborhood relation between y_5 and y_3 in the same collection, and a within-class neighborhood relation between y_5 and y_6. So the similarity between x_1 and y_6 is defined as the product of the between-class similarity b_35 and the within-class similarity w_56, which is the maximum similarity between y_6 and y_3; that is, s_16 = b_35 · w_56.

(d) The similarity between x_1 and y_9: these two data points belong to different classes, and there are no between-class neighborhood relations between the elements of class 1 and class 3 in collection Y; namely, s_19 = 0.
Step 6 (constructing the optimal objective function). Consider the following:

J(f_x, f_y) = Σ_i Σ_j ||f_x(x_i) − f_y(y_j)||² S_ij,

where the functions f_x and f_y are considered to be linear; that is, f_x(x) = A^T x and f_y(y) = B^T y. The optimal objective function can then be rewritten as follows:

J(A, B) = Σ_i Σ_j ||A^T x_i − B^T y_j||² S_ij. (11)

Therefore, our method aims to learn the two linear transformations A and B.
According to Theorem 1, (12) is an alternate matrix expression of (11):

J(A, B) = Tr(A^T X D_1(S) X^T A) + Tr(B^T Y D_2(S) Y^T B) − 2 Tr(A^T X S Y^T B), (12)

where Tr(M) represents computing the trace of the matrix M, and D_1(S) and D_2(S) are diagonal matrices whose diagonal elements are the row and column sums of the similarity matrix S, respectively.
Assuming that Γ = [X 0; 0 Y] and P = [A; B], (12) can be rewritten as follows:

J(P) = Tr(P^T Γ L Γ^T P), (13)

where L = [D_1(S) −S; −S^T D_2(S)]. To make the equation have a unique solution, the constraints P^T Γ D Γ^T P = I and P^T Γ D e = 0 are added, where D = [D_1(S) 0; 0 D_2(S)] and e = [1, 1, . . . , 1]^T is a vector with dimensions (N_x + N_y) × 1, N_x and N_y being the numbers of samples in collections X and Y. The solution minimizing (13) is obtained by the generalized eigendecomposition (Γ L Γ^T) p = λ (Γ D Γ^T) p, taking the eigenvectors p_2, p_3, . . . , p_{d+1} corresponding to the second to (d + 1)th smallest eigenvalues λ_2, λ_3, . . . , λ_{d+1}. Assuming that P = [p_2, p_3, . . . , p_{d+1}], its dimension is (m + n) × d, where m and n are the dimensions of the samples in collections X and Y, so the transformation matrix A corresponds to the 1st to mth rows of P and B corresponds to the (m + 1)th to (m + n)th rows of P.
Step 7. Substituting the matrices A and B into the distance function d(x, y) = ||A^T x − B^T y||, the distance metric between elements belonging to different collections can be realized.
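The procedure of Steps 6 and 7 can be sketched as below. This is a simplified illustration under our own assumptions (only the cross-collection edges of the joint graph are kept, and a small ridge term is added for numerical stability); it is not the authors' exact implementation:

```python
import numpy as np

def slpp_cml(X, Y, S, d):
    # X is m x Nx, Y is n x Ny, S is the Nx x Ny cross-collection
    # similarity matrix, d the coupled dimension.  Build
    # Gamma = diag(X, Y) and the joint graph Laplacian L, solve
    # (Gamma L Gamma^T) p = lambda (Gamma D Gamma^T) p, and split the
    # eigenvectors into the two transforms A and B.
    m, Nx = X.shape
    n, Ny = Y.shape
    W = np.zeros((Nx + Ny, Nx + Ny))
    W[:Nx, Nx:] = S                      # cross-collection edges only
    W[Nx:, :Nx] = S.T
    Dg = np.diag(W.sum(axis=1))
    L = Dg - W
    Gamma = np.zeros((m + n, Nx + Ny))
    Gamma[:m, :Nx] = X
    Gamma[m:, Nx:] = Y
    Lm = Gamma @ L @ Gamma.T
    Dm = Gamma @ Dg @ Gamma.T + 1e-8 * np.eye(m + n)  # ridge for stability
    vals, vecs = np.linalg.eig(np.linalg.solve(Dm, Lm))
    order = np.argsort(vals.real)
    P = vecs[:, order[1:d + 1]].real     # skip the trivial smallest eigenvector
    return P[:m], P[m:]                  # A (m x d), B (n x d)

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 8))          # 8 samples of dimension 5
Y = rng.standard_normal((3, 8))          # 8 samples of dimension 3
S = np.eye(8)                            # sample i of X matches sample i of Y
A, B = slpp_cml(X, Y, S, d=2)
```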

The Kernel Coupled Metric Learning Based on Supervised Locality Preserving Projection (SLPP-KCML)
In practical dimensionality reduction and measurement, a linear model often cannot represent the features well, and it is difficult to map two complex collections into the same space using linear transformations. So, combining the kernel method, we extend the SLPP-CML algorithm, and a nonlinear coupled metric learning method based on supervised locality preserving projection is proposed.
Assume that the mapping functions f_x and f_y are nonlinear. Using a nonlinear mapping φ : R^m → H, x → φ(x), y → φ(y), the sample data can be mapped into a high dimensional Hilbert space H. By the representer theorem, the transformations can be expanded over the mapped training samples as A = Φ_x α and B = Φ_y β, so the criterion can be defined by

J(α, β) = Σ_i Σ_j ||α^T K_x(:, i) − β^T K_y(:, j)||² S_ij,

where K_x and K_y are the kernel matrices of collections X and Y. An alternate matrix expression is as follows:

J(α, β) = Tr(α^T K_x D_1(S) K_x α) + Tr(β^T K_y D_2(S) K_y β) − 2 Tr(α^T K_x S K_y β),

where Tr represents computing the trace of a matrix, and D_1(S) and D_2(S) are diagonal matrices whose diagonal elements are the row and column sums of the similarity matrix S, respectively.
Assuming that Γ = [K_x 0; 0 K_y], similar to the SLPP-CML algorithm, solving for the optimal solution can be transformed into a generalized eigenvalue problem. The generalized characteristic equation is F v = λ G v, with F = Γ L Γ^T and G = Γ D Γ^T; v is the eigenvector corresponding to the eigenvalue λ. The eigenvectors corresponding to the smallest through the dth smallest eigenvalues construct the feature matrix V; the size of V is (N_x + N_y) × d, where N_x and N_y are the numbers of training samples of collections X and Y. Finally, we can get the coefficient matrix α corresponding to the data matrix X and the coefficient matrix β corresponding to the data matrix Y.
In addition, the samples mapped into the high dimensional space need centering. In linear coupled metric learning, centering can be realized by abandoning the eigenvector corresponding to the eigenvalue zero. However, the centering of nonlinear coupled metric learning in kernel space is realized by centering the kernel matrices K_x and K_y:

K_c = K − 1_N K − K 1_N + 1_N K 1_N,

where N is the dimension of the kernel matrix K and 1_N is an N × N matrix whose elements are all 1/N.
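The kernel-centering formula can be written directly as a small sketch (numpy assumed; 1_N here denotes the matrix with all entries 1/N):

```python
import numpy as np

def center_kernel(K):
    # Kc = K - 1N K - K 1N + 1N K 1N, where 1N is the N x N matrix
    # whose entries are all 1/N; this centres the implicitly mapped
    # samples phi(x) in feature space.
    N = K.shape[0]
    ones = np.full((N, N), 1.0 / N)
    return K - ones @ K - K @ ones + ones @ K @ ones

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 4))
K = X @ X.T                              # a valid linear-kernel Gram matrix
Kc = center_kernel(K)
print(np.allclose(Kc.sum(axis=0), 0.0))  # True: centred rows/columns sum to 0
```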

Introduction of the Face Databases.
The proposed coupled metric learning approach is used for face recognition. It is tested on the Yale face database [23] and the CAS-PEAL-R1 face database [24]. The Yale face database contains 165 pictures of 15 people with size 100 × 100 and 256 gray levels. These images were taken under different expression and illumination conditions. In the experiment, we used the first 6 images per person as training samples, 90 in total, and the remaining images as test samples.
The CAS-PEAL-R1 face database contains 30,863 face images divided into two parts: (1) the frontal face image subset and (2) the nonfrontal face image subset. In the experiment, we used the accessory data set of the frontal face image subset (CAS-PEAL-R1-FRONTAL-Accessory). The face images of each person in CAS-PEAL-R1-FRONTAL-Accessory contain 6 different accessories: 3 images with different glasses and 3 images with different hats. We selected 300 images of 50 people with size 360 × 480 and 256 gray levels for the experiment; the odd-numbered images were used as training samples and the even-numbered images as test samples. Some training images are shown in Figure 2 and some test images are shown in Figure 3.

The Low-Resolution Face Recognition.
Due to differences between cameras of different resolutions and the uncertainty of the distance between the camera and the face, the resolution of the collected face images is not uniform. Obviously, traditional measurement methods cannot be used to calculate the distance between two images with different resolutions. The general handling method is an interpolation operation, but interpolation easily introduces false information. As the false information increases, the degree of distortion increases, as shown in Figure 4. Aiming at the problem of declining recognition rates caused by image distortion, researchers have compensated for low-resolution images by adding an image restoration preprocessing step. But image restoration algorithms are relatively complex, and the quality of the restoration has a great impact on the final recognition results.
However, the proposed coupled metric learning method can directly realize feature extraction and measurement on images of different resolutions. This method not only saves computing time but also avoids the negative impact of image restoration on recognition performance. To better illustrate the experimental process, Figure 5 gives the flow of degraded face recognition. In the experiment, the training samples include clear and degraded face images. The size of each original training face image is adjusted to 64 × 64 pixels, and these adjusted faces are used as the clear face images. However, there are no original low-resolution face images in the public face databases, so we obtained the low-resolution training face images by blurring and downsampling the original training face images to a size of 16 × 16. The test samples are the low-resolution face images, generated by blurring and downsampling the original test face images introduced in Section 5.1.
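The paper does not specify the exact blur and sampling operators; a minimal block-averaging sketch (assumptions ours) of how a 64 × 64 clear face could be degraded to 16 × 16:

```python
import numpy as np

def degrade(img, factor=4):
    # Average over factor x factor blocks and subsample, e.g. turning
    # a 64 x 64 clear face into a 16 x 16 low-resolution one.
    h, w = img.shape
    h2, w2 = h // factor, w // factor
    blocks = img[:h2 * factor, :w2 * factor].reshape(h2, factor, w2, factor)
    return blocks.mean(axis=(1, 3))

clear = np.ones((64, 64))
low = degrade(clear, factor=4)
print(low.shape)  # (16, 16)
```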

Experiment 1: The Low-Resolution Face Recognition Based on SLPP-CML.

Through theoretical analysis, the SLPP-CML algorithm has two influence factors: (1) the number of neighbors k of the supervised locality preserving projection; (2) the reserved feature dimension d. Therefore, the recognition results under different parameters should be discussed and analyzed. Figure 6 shows the change of the recognition rate with the feature dimension when the number of neighbors takes different values; these recognition-rate curves were obtained on the two face databases. The curves share a general trend: as the feature dimension increases, the recognition rate first increases and then decreases, and the best recognition result is achieved only at the optimal feature dimension.
On the Yale face database, the recognition rate remains high when the feature dimension is between 10 and 20; the optimal recognition rate is 86.67%, achieved when the feature dimension is 10 and the number of neighbors is 5. On the CAS-PEAL-R1 face database, the recognition rate reaches its maximum value of 86.67% when the feature dimension is 40 and the number of neighbors is 2.
The experimental data show that the number of training samples per class is 6 in the Yale face database and the recognition effect is optimal when the number of neighbors is 5. In the CAS-PEAL-R1 face database, we obtain the optimal recognition rate when the number of training samples per class is 3 and the number of neighbors is 2. Obviously, the best number of neighbors is l − 1, where l is the number of training samples of each class.
In addition, in order to illustrate the effectiveness of the SLPP-CML method, comparative experiments were carried out. The experimental results are shown in Table 1.
The experimental data illustrate that the recognition results of feature extraction after restoration are not satisfactory. The coupled metric learning in [19] cannot overcome the influence of within-class multiple modes, so its identification effect is not good. The coupled metric learning in [21] is conducive to resolving within-class multiple modes, and its recognition performance is greatly improved, but it does not fully consider the between-class relationships of the training samples. The proposed SLPP-CML takes advantage of the supervision provided by category information, while both the within-class and between-class relationship information of the training samples is incorporated into the metric learning, so we obtain better recognition results.

Experiment 2: The Low-Resolution Face Recognition Based on SLPP-KCML.

The SLPP-KCML algorithm is a nonlinear coupled metric learning algorithm. Through the analysis, three factors affect this algorithm: (1) the number of neighbors k of the supervised locality preserving projection; (2) the reserved feature dimension d; (3) the kernel function. Based on the experimental results, the number of nearest neighbors is set as in SLPP-CML. For the kernel function, we choose the Gaussian function k(x, y) = exp(−||x − y||²/σ); the value of the adjustable factor σ affects the kernel's performance. So, in this paper, experiments were carried out for different adjustable factors and different feature dimensions; the experimental results are shown in Figure 7.
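The Gaussian kernel used here can be sketched as a Gram-matrix computation (function name ours):

```python
import numpy as np

def gauss_kernel(X, Y, sigma):
    # Gram matrix k(x, y) = exp(-||x - y||^2 / sigma) between the rows
    # of X and the rows of Y; sigma is the adjustable factor.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma)

X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = gauss_kernel(X, X, sigma=0.5)
print(K[0, 0], K[0, 1])  # 1.0 exp(-2) ~ 0.1353
```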
The curves indicate that, on the Yale face database, the optimal recognition rate is 89.33% when σ = 0.5 and the feature dimension is 20; compared with the SLPP-CML algorithm, the recognition rate increased by 2.66%. On the CAS-PEAL-R1 face database, when σ = 0.7 and the feature dimension d = 40, the recognition rate is 91.33%; compared with the SLPP-CML algorithm, the recognition rate increased by 4.66%. Obviously, the nonlinear coupled metric learning method can effectively extract the classification information of face images and obtain a high recognition rate.
Considering the training time, the SLPP-CML algorithm requires calculating the eigenvalues and eigenvectors of the image covariance matrix. The resolution of the clear face images is 64 × 64 pixels and that of the low-resolution face images is 16 × 16 pixels, so the dimension of the joint image covariance matrix is 4352 × 4352 (4096 + 256), and the average training time is about 553.905 seconds.
However, the matrix of SLPP-KCML is related to the number of classes and the number of training samples of each class, so the dimension of the covariance matrix for the Yale face database is 90 × 90, that for the CAS-PEAL-R1 face database is 150 × 150, and the average training time is about 6.687 seconds. Obviously, the training speed of the SLPP-KCML algorithm is faster than that of the SLPP-CML algorithm, and the recognition time of the two algorithms is about 0.0225 seconds. Based on the above analysis, the efficiency of the SLPP-KCML algorithm is better than that of SLPP-CML. The experimental data are the best recognition rates of each algorithm; the number of neighbors of the SLPP-CML and SLPP-KCML algorithms is l − 1, where l is the number of training samples of each class, and the optimal value of the adjustable factor of the Gaussian kernel in SLPP-KCML is 0.7. Table 3 gives the feature dimensions of the training samples in the Yale and CAS-PEAL-R1 face databases for the SLPP-CML and SLPP-KCML algorithms.

Conclusions
Aiming at the problem that traditional metric methods cannot calculate the distance between elements in different data sets, we proposed a coupled metric learning method based on supervised locality preserving projection. First, the elements of the different sets are mapped into a coupled space combined with the within-class and between-class information, and then the metric matrix learning is performed. Furthermore, we extended this algorithm to a nonlinear space, and a kernel coupled metric learning method based on supervised locality preserving projection is proposed. In the kernel coupled metric learning approach, the elements of the two collections are mapped into a unified high dimensional feature space by a kernel function, and the traditional metric learning is then performed in this space. In order to verify the effectiveness of the proposed algorithm, extensive experiments were carried out on two face databases. This algorithm can effectively extract the nonlinear features of faces, and its operation is simple. The low-resolution and fuzzy face recognition experiments show that the proposed method obtains a higher recognition rate with high computational efficiency.

Appendix
Proof of Theorem 1
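The body of the proof was lost in extraction; a standard argument for both properties can be reconstructed as follows:

```latex
% Property (1): with A = (a_{ij}) \in \mathbb{R}^{m \times n},
\|A\|_F^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}^2
          = \sum_{j=1}^{n} (A^{\top}A)_{jj}
          = \operatorname{Tr}(A^{\top}A)
          = \sum_{i=1}^{n} \lambda_i(A^{\top}A),
% the last equality holding because the trace of the symmetric
% matrix A^T A equals the sum of its eigenvalues.

% Property (2): for A \in \mathbb{R}^{m \times n} and
% B \in \mathbb{R}^{n \times m},
\operatorname{Tr}(AB) = \sum_{i=1}^{m}\sum_{k=1}^{n} a_{ik} b_{ki}
                      = \sum_{k=1}^{n}\sum_{i=1}^{m} b_{ki} a_{ik}
                      = \operatorname{Tr}(BA). \qquad \blacksquare
```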

Figure 1: The relationship between the elements of different collections.

Figure 2: Some original training face images.
(a) Some images in Yale face database (b) Some images in CAS-PEAL-R1-FRONTAL-Accessory

Figure 3: Some original test face images.

Figure 5: Degraded face recognition process.In training process, we obtained the matrix   for degraded face and the matrix   for clear face image.In test process, matrices   and   transformed the degraded test face and clear test face to the coupled metric space, calculating the distance.

Figure 6: The recognition rate under different dimensions and different numbers of the nearest neighbors.
The experimental results on the Yale and CAS-PEAL-R1 face databases

Figure 7: The recognition rate under different dimensions and different adjustable factors.

Experimental comparison of this method with other methods.

Experimental comparison of this method with other methods.

The feature dimensions in our proposed algorithm.