A Novel Multisupervised Coupled Metric Learning for Low-Resolution Face Matching

This paper presents a new multisupervised coupled metric learning (MS-CML) method for low-resolution face image matching. Although coupled metric learning has achieved good performance in degraded face recognition, most existing coupled metric learning methods adopt only the category label as supervision, which easily changes the distribution of samples in the coupled space, and these changes seriously degrade the accuracy of degraded image matching. To address this problem, we propose an MS-CML method to train linear and nonlinear metric models, respectively, which project face pairs of different resolutions into the same latent feature space, in which the distance of each positive pair is reduced and that of each negative pair is enlarged. We define a novel multisupervised objective function consisting of a main objective function and an auxiliary objective function. The supervised information of the main objective function is the category label, which plays the major supervisory role; the supervised information of the auxiliary objective function is the distribution relationship of the samples, which plays an auxiliary supervisory role. Under the joint supervision of category labels and distribution information, the learned model better handles the intraclass multimodal problem, and the features obtained in the coupled space are more easily matched correctly. Experimental results on three different face datasets validate the efficacy of the proposed method.


Introduction
Image matching is an important task in computer vision and multimedia analysis, and numerous methods have been proposed to solve this problem under controlled conditions [1][2][3][4]. However, these methods often fail to achieve good performance in real-world degraded image matching. To address the problem of low resolution, some researchers improve image quality by image reconstruction [5][6][7][8]. However, image reconstruction methods usually require large amounts of data to train the model, which is complicated and time-consuming. More importantly, invalid information is easily introduced into the reconstructed image, and this information sometimes even interferes with image matching.
Therefore, to avoid the interference of invalid information introduced by image reconstruction, some researchers have proposed image matching based on coupled metric learning.
In 2009, linear coupled metric learning (LCML) [9][10][11] was proposed, which achieved good performance in low-resolution image matching. To better handle nonlinear data, several nonlinear coupled metric learning methods [12][13][14] combining the kernel technique were proposed. In addition, Zhang et al. [15] proposed a distance metric based on coupled edge discriminant mapping, which implemented an effective metric for data with different attributes. Jiang et al. [16,17] proposed coupled discriminant multimanifold analysis for low-resolution face recognition, which effectively improved recognition performance. Since 2014, deep learning has become increasingly widespread and has been introduced to improve metric learning models [18,19]. Deep coupled metric methods map samples to a coupled space using deep networks, which better extract nonlinear features and improve image-matching performance. However, deep networks require huge datasets for training and cannot be applied to small datasets.
Although there have been many studies on coupled metric learning, existing metric methods still have shortcomings. When only category labels are used as supervision, it is difficult to overcome the intraclass multimodal problem: as shown in Figure 1, the distribution of samples is not well maintained in the coupled space, which results in a decrease of image-matching accuracy. Therefore, to better use the spatial distribution information of samples to supervise metric learning, this paper proposes a multisupervised coupled metric learning method fusing the category label and the distribution information of samples. Moreover, the proposed method is suitable for small sample datasets and avoids the deficiencies of deep coupled metric methods.
In the proposed metric learning method, we first construct the category correlation matrix and the distribution correlation matrix, respectively. Under the supervision of these two correlation matrices, the main objective function and the auxiliary objective function are defined. Then, the general objective function is generated by fusing these two objective functions, and a new correlation matrix fusing category label and distribution information is obtained. In the general objective function, the category label is used as the main supervised information and the distribution information is used as auxiliary supervised information. Finally, the objective function is transformed into a generalized eigenvalue problem to solve. Experiments on the Yale-B, ORL, and UMIST face datasets demonstrate that the multisupervised coupled metric extends existing distance metric methods and effectively improves image-matching performance.

To sum up, there are two main contributions: (1) We propose a multisupervised coupled metric learning method fusing category label and distribution relationship, which overcomes a defect of existing coupled metric methods: it maps data in different spaces to the coupled space while better maintaining the original distribution relationship between samples in the new space. (2) We construct linear and nonlinear multisupervised coupled metric learning objective functions, respectively. The nonlinear coupled metric learning can better map samples in different spaces to the coupled space for more reliable image matching.

The rest of this paper is organized as follows. The definition and objective functions of coupled metric learning are described in Section 2. The multisupervised coupled metric learning method is described in detail in Section 3. The performance evaluation experiments based on three different face datasets are presented in Section 4. Finally, Section 5 concludes our work.

Preliminary
Suppose that I_X and I_Y are two images with the same dimension. Conventionally, an image is converted into a vector before distance measurement. Assuming that the vector forms are x and y, the distance between I_X and I_Y is defined as follows:

d_A(I_X, I_Y) = ‖x − y‖_A = ((x − y)^T A (x − y))^{1/2},  (1)

where ‖·‖_A denotes the norm operation and A is the distance metric matrix, which is a positive semidefinite matrix. It can be decomposed as A = P P^T; then

d_A(I_X, I_Y) = ((x − y)^T P P^T (x − y))^{1/2} = ‖P^T x − P^T y‖_2.  (2)

Obviously, the distance metric is converted to a norm calculation after feature transformation. Therefore, the unified description of the distance metric is as follows:

d(I_X, I_Y) = ‖P^T x − P^T y‖_2.  (3)
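The equivalence between the metric-matrix form and the projected Euclidean norm can be checked numerically. The following is a minimal sketch with toy random data; all sizes and variable names are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, c = 8, 3                       # image-vector and projected dimensions (toy sizes)
P = rng.standard_normal((d, c))   # projection matrix; A = P P^T is PSD by construction
A = P @ P.T

x, y = rng.standard_normal(d), rng.standard_normal(d)

# Mahalanobis-style distance under the metric matrix A ...
diff = x - y
d_A = np.sqrt(diff @ A @ diff)

# ... equals the plain Euclidean norm after projecting with P^T
d_proj = np.linalg.norm(P.T @ diff)

assert np.isclose(d_A, d_proj)
```

This is exactly why metric learning under a PSD matrix A reduces to learning a linear projection P followed by ordinary Euclidean comparison.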

Definition of Coupled Metric Learning

The coupled metric is a distance function for samples from different datasets. Assume that the two images I_X and I_Y have different resolutions, with corresponding vectors x ∈ R^{d_x} and y ∈ R^{d_y}.
Figure 1: The coupled metric mapping supervised by category labels. There are thirteen samples in set X and in set Y, respectively; all samples belong to three categories, denoted as square, rhombus, and circle. In set X, sample x_1 is close to sample x_2 and far from sample x_4. However, in the coupled space, sample x_1 is close to sample x_4 and far from sample x_2. Although the samples of the same category are grouped together, the distribution relationship between the samples is not maintained, which sometimes affects the image-matching accuracy.

The coupled distance metric is defined as follows:

d(x, y) = ‖P^T f(x) − P^T g(y)‖_2,  (4)
where f(·) and g(·) are the transformation functions that map samples into the coupled space, and the matrix P performs the transformation and distance measurement in the coupled space.
After transformation, samples with the same category label should be as close as possible in the coupled space. Therefore, the optimal objective function of coupled metric learning is written as

J = Σ_{i,j} S_{ij} ‖P^T f(x_i) − P^T g(y_j)‖^2,  (5)

where S is a category correlation matrix between datasets X and Y, defined as follows: assume that sample x_i ∈ X has category label c_{x_i} and sample y_j ∈ Y has category label c_{y_j}. The element of matrix S is

S_{ij} = 1 if c_{x_i} = c_{y_j}, and S_{ij} = 0 otherwise.  (6)

Obviously, different supervised information yields a different matrix S, and hence a different coupled metric approach. Additionally, the mapping functions f(·) and g(·) may be linear or nonlinear functions, corresponding to linear or nonlinear coupled metrics.

Linear Coupled Metric Learning (LCML)

Assume that the datasets X ∈ R^{d_x} and Y ∈ R^{d_y} each contain N samples. S ∈ R^{N×N} is the correlation matrix between X and Y, with elements S_{ij}. The optimal objective function of coupled metric learning is as follows:

J_L = Σ_{i,j} S_{ij} ‖P^T f(x_i) − P^T g(y_j)‖^2,  (7)

where x_i ∈ X and y_j ∈ Y. Assume that f(·) and g(·) are linear parameterized functions, defined as f(x_i) = W_x^T x_i and g(y_j) = W_y^T y_j. Let P_x = W_x P and P_y = W_y P; then (7) is changed to

J_L = Σ_{i,j} S_{ij} ‖P_x^T x_i − P_y^T y_j‖^2.  (8)

After simplification, (8) can be rewritten as

J_L = tr(P_x^T X S_x X^T P_x + P_y^T Y S_y Y^T P_y − 2 P_x^T X S Y^T P_y),  (9)

where S_x and S_y are diagonal matrices: the diagonal elements of S_x are the cumulative sums of the corresponding rows of matrix S, and the diagonal elements of S_y are the cumulative sums of the corresponding columns of matrix S.
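The simplification from the pairwise sum to the trace form can be verified numerically. Below is a small sketch with toy random matrices; all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
dx, dy, dc, N = 6, 4, 2, 5
X = rng.standard_normal((dx, N))              # columns are samples x_i
Y = rng.standard_normal((dy, N))
S = (rng.random((N, N)) > 0.5).astype(float)  # toy binary correlation matrix

S_x = np.diag(S.sum(axis=1))                  # row sums on the diagonal
S_y = np.diag(S.sum(axis=0))                  # column sums on the diagonal

Px = rng.standard_normal((dx, dc))
Py = rng.standard_normal((dy, dc))

# Pairwise form of the objective
J_pair = sum(S[i, j] * np.sum((Px.T @ X[:, i] - Py.T @ Y[:, j]) ** 2)
             for i in range(N) for j in range(N))

# Trace form after simplification
J_trace = np.trace(Px.T @ X @ S_x @ X.T @ Px
                   + Py.T @ Y @ S_y @ Y.T @ Py
                   - 2 * Px.T @ X @ S @ Y.T @ Py)

assert np.isclose(J_pair, J_trace)
```

The identity follows by expanding each squared norm and collecting the row sums and column sums of S into the diagonal matrices S_x and S_y.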

Nonlinear Coupled Metric Learning (NCML)

Assume that f(·) and g(·) are nonlinear functions; that is, f(x_i) = ϕ_x(x_i) and g(y_j) = ϕ_y(y_j). The samples x_1, x_2, …, x_N and y_1, y_2, …, y_N are transformed into a high-dimensional feature space through the mapping ϕ: R^n → F, giving the feature representations ϕ_x(x_1), ϕ_x(x_2), …, ϕ_x(x_N) and ϕ_y(y_1), ϕ_y(y_2), …, ϕ_y(y_N). Writing Φ_x = [ϕ_x(x_1), …, ϕ_x(x_N)] and Φ_y = [ϕ_y(y_1), …, ϕ_y(y_N)] and expanding the projections as P_x = Φ_x α and P_y = Φ_y β, (7) can be rewritten as

J_N = Σ_{i,j} S_{ij} ‖α^T Φ_x^T ϕ_x(x_i) − β^T Φ_y^T ϕ_y(y_j)‖^2,  (10)

where α and β are the coefficient matrices. Then, (10) can be rewritten as follows:

J_N = tr(α^T Φ_x^T Φ_x H_x Φ_x^T Φ_x α + β^T Φ_y^T Φ_y H_y Φ_y^T Φ_y β − 2 α^T Φ_x^T Φ_x S Φ_y^T Φ_y β),  (11)

where H_x and H_y are diagonal matrices whose diagonal elements are the cumulative sums of the corresponding rows and the corresponding columns of matrix S, respectively. Constructing the inner product relationships by a kernel function, (11) is changed to

J_N = tr(α^T K_x H_x K_x α + β^T K_y H_y K_y β − 2 α^T K_x S K_y β),  (12)

where the matrices K_x and K_y, based on the Gaussian kernel, are the Gram matrices of the mapped samples, e.g.,

K_y = [⟨ϕ_y(y_i) · ϕ_y(y_j)⟩]_{i,j=1,…,N},  (13)

and K_x is defined analogously on the samples of X.
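The Gram matrix in (13) can be computed directly from the Gaussian kernel. A minimal sketch follows; the denominator 2σ² is one common parameterization and is our assumption, since the text only names an adjustable factor σ:

```python
import numpy as np

def gaussian_kernel_matrix(Z, sigma=0.5):
    """K[i, j] = exp(-||z_i - z_j||^2 / (2 sigma^2)) for columns z_i of Z."""
    sq = np.sum(Z ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2 * Z.T @ Z   # squared pairwise distances
    return np.exp(-np.maximum(d2, 0.0) / (2 * sigma ** 2))

rng = np.random.default_rng(2)
Y = rng.standard_normal((16, 5))        # 5 toy samples as columns
K_y = gaussian_kernel_matrix(Y)

assert K_y.shape == (5, 5)
assert np.allclose(np.diag(K_y), 1.0)   # k(z, z) = 1 for the Gaussian kernel
assert np.allclose(K_y, K_y.T)          # a Gram matrix is symmetric
```

In practice K_x and K_y are built once on the training sets and reused throughout the optimization.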

The Class Relation Matrix and Distribution Relation Matrix.
Through the analysis in Section 2, it is known that the matrix S reflects the correlativity between sets X and Y and plays a decisive role in metric learning. In this work, we fuse the category label and the distribution information of samples to construct a novel correlation matrix S, where the category label supervises the distance metric between the two datasets and the distribution information supervises the distance metric inside each dataset. The correlation matrices are constructed from the category label and the distribution information, respectively.

Category Correlation Matrix.
Suppose that sample x_i ∈ X has class label c_{x_i} and sample y_j ∈ Y has class label c_{y_j}. The category correlation matrix is C, with elements defined as follows:

C_{ij} = 1 if c_{x_i} = c_{y_j}, and C_{ij} = 0 otherwise.  (14)
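Building C from two label vectors is a one-line comparison. A minimal sketch with hypothetical toy labels; the nonzero value 1 follows the usual convention, as the extracted equation itself was lost:

```python
import numpy as np

labels_x = np.array([0, 0, 1, 2, 1])   # hypothetical class labels of set X
labels_y = np.array([0, 1, 1, 2, 0])   # hypothetical class labels of set Y

# C[i, j] = 1 when c_{x_i} = c_{y_j}, else 0
C = (labels_x[:, None] == labels_y[None, :]).astype(float)

assert C.shape == (5, 5)
assert C[0, 0] == 1.0 and C[0, 1] == 0.0   # labels 0 vs 0, then 0 vs 1
```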

Distribution Correlation Matrix.
Assume that in set X, sample x_i is among the k nearest neighbors of sample x_j; then samples x_j and x_i are connected to each other. The same operation is implemented in set Y. Let T_x and T_y be the distribution correlation matrices for set X and set Y, respectively. The elements of T_x and T_y can be calculated as follows:

(T_x)_{ij} = exp(−‖x_i − x_j‖^2/t) if x_i and x_j are connected, and 0 otherwise,  (15)

(T_y)_{ij} = exp(−‖y_i − y_j‖^2/t) if y_i and y_j are connected, and 0 otherwise,  (16)

where the parameter t is the average distance between all samples.
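A sketch of the distribution-correlation-matrix construction follows. The heat-kernel weight exp(−d²/t) with t the average sample distance follows the text, while the exact weight formula and the symmetric-connection rule are our reading of the lost equations (15) and (16):

```python
import numpy as np

def distribution_matrix(Z, k=3):
    """Heat-kernel weights on a symmetrized k-NN graph; columns of Z are samples."""
    N = Z.shape[1]
    sq = np.sum(Z ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2 * Z.T @ Z
    np.fill_diagonal(d2, np.inf)                  # exclude self-neighbors
    t = np.mean(np.sqrt(np.maximum(d2[np.isfinite(d2)], 0)))  # average distance
    T = np.zeros((N, N))
    for i in range(N):
        nn = np.argsort(d2[i])[:k]                # k nearest neighbors of sample i
        T[i, nn] = np.exp(-d2[i, nn] / t)
    return np.maximum(T, T.T)                     # connect samples "to each other"

rng = np.random.default_rng(3)
X = rng.standard_normal((10, 8))                  # 8 toy samples in 10 dimensions
T_x = distribution_matrix(X, k=3)

assert np.allclose(T_x, T_x.T)                    # symmetric by construction
assert np.all(np.diag(T_x) == 0)                  # no self-connections
```

The same function applied to the low-resolution set would produce T_y.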

The Linear Multisupervised Coupled Metric Learning (LMS-CML).
In the linear situation, in order to realize the joint supervision of category label and distribution information for coupled metric learning, we need to fuse the above correlation matrices. Therefore, two coupled metric objective functions need to be constructed: (1) a category-label-based coupled metric objective function and (2) a distribution-information-based coupled metric objective function. Then, the fusion of multisupervised information is realized through the fusion of these two objective functions.
To achieve the first objective function, combining (9) and (14), we construct the linear optimal objective function based on the category correlation matrix C:

J_L = tr(P_x^T X C_x X^T P_x + P_y^T Y C_y Y^T P_y − 2 P_x^T X C Y^T P_y),  (17)

where P_x and P_y are the coupled transformation matrices, and the diagonal matrices C_x and C_y are defined from C in the same way as S_x and S_y in (9). To achieve the second objective function, combining (9), (15), and (16), we construct the auxiliary optimal objective functions for set X and set Y in the linear case as follows:

J_LX = Σ_{i,j} (T_x)_{ij} ‖P_x^T x_i − P_x^T x_j‖^2 = 2 tr(P_x^T X (T_xx − T_x) X^T P_x),  (18)

J_LY = Σ_{i,j} (T_y)_{ij} ‖P_y^T y_i − P_y^T y_j‖^2 = 2 tr(P_y^T Y (T_yy − T_y) Y^T P_y),  (19)

where T_xx and T_yy are diagonal matrices whose diagonal elements are the cumulative sums of the corresponding rows of matrix T_x and the corresponding columns of matrix T_y, respectively. Then, we combine (17), (18), and (19) to obtain the linear general objective function:

J = J_L + γ J_LX + η J_LY = tr(P_x^T X [C_x + 2γ(T_xx − T_x)] X^T P_x + P_y^T Y [C_y + 2η(T_yy − T_y)] Y^T P_y − 2 P_x^T X C Y^T P_y),  (20)

where C_x + 2γ(T_xx − T_x) and C_y + 2η(T_yy − T_y) are the novel correlation matrices fusing category and distribution information. The coefficients γ and η lie within [0, 1] and control the strength with which the distribution information supervises the coupled metric learning.
Stacking the transformation matrices as A = [P_x; P_y], the linear objective function (20) is converted into

J = tr(A^T M A),  M = [X(C_x + 2γ(T_xx − T_x))X^T, −X C Y^T; −Y C^T X^T, Y(C_y + 2η(T_yy − T_y))Y^T].  (21)

The solution of the coupled transformation matrix A is equivalent to the calculation of the following generalized eigenvalue equation:

M a = λ B a,  (22)

where B is the constraint matrix of the minimization, and the columns of A are the eigenvectors corresponding to the D_c smallest eigenvalues. After obtaining matrix A, the transformation matrix P_x consists of the 1st to D_x-th rows of matrix A; its size is D_x × D_c, where D_c is the dimension of the coupled space. The transformation matrix P_y consists of the (D_x + 1)-th to (D_x + D_y)-th rows of matrix A, and its size is D_y × D_c.
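Solving the generalized eigenvalue problem and slicing A into P_x and P_y can be sketched as follows. M and B are stand-in symmetric matrices, since the exact assembly depends on the data matrices, and the Cholesky reduction to a standard eigenproblem is one standard solution route, not necessarily the paper's:

```python
import numpy as np

rng = np.random.default_rng(4)
Dx, Dy, Dc = 6, 4, 2
n = Dx + Dy

# Stand-in matrices for the generalized problem M a = lambda B a
R = rng.standard_normal((n, n))
M = R @ R.T                               # symmetric PSD
R2 = rng.standard_normal((n, n))
B = R2 @ R2.T + n * np.eye(n)             # symmetric positive definite constraint

# Reduce to a standard symmetric eigenproblem via Cholesky whitening
L = np.linalg.cholesky(B)
Li = np.linalg.inv(L)
vals, U = np.linalg.eigh(Li @ M @ Li.T)   # eigenvalues in ascending order
A = Li.T @ U[:, :Dc]                      # eigenvectors of the D_c smallest eigenvalues

P_x = A[:Dx, :]                           # rows 1 .. D_x        -> size D_x x D_c
P_y = A[Dx:Dx + Dy, :]                    # rows D_x+1 .. D_x+D_y -> size D_y x D_c

# Sanity check: the generalized eigenvector equation holds
a0, lam0 = A[:, 0], vals[0]
assert np.allclose(M @ a0, lam0 * (B @ a0))
```

Taking the smallest eigenvalues minimizes the trace objective under the constraint encoded by B.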

The Nonlinear Multisupervised Coupled Metric Learning (NMS-CML).
To implement the coupled metric of nonlinear data, we exploit the kernel technique to improve the linear general objective function. The nonlinear optimal objective function is constructed based on the category correlation matrix C and the distribution correlation matrices T_x and T_y.
Firstly, we combine (12) and (14) to get the nonlinear main objective function:

J_N = tr(α^T K_x H_x K_x α + β^T K_y H_y K_y β − 2 α^T K_x C K_y β),  (23)

where α and β are the coupled transformation matrices, and the diagonal matrices H_x and H_y are constructed from the category correlation matrix C in the same way as in (12). Then, combining (12), (15), and (16), the nonlinear auxiliary optimal objective functions are constructed as follows:

J_NX = 2 tr(α^T K_x (T_xx − T_x) K_x α),  (24)

J_NY = 2 tr(β^T K_y (T_yy − T_y) K_y β),  (25)

where T_xx and T_yy are diagonal matrices whose diagonal elements are the cumulative sums of the corresponding rows of matrix T_x and the corresponding columns of matrix T_y, respectively.

Advances in Multimedia
Then, we combine (23), (24), and (25) to obtain the nonlinear general objective function:

J = tr(α^T K_x [H_x + 2γ(T_xx − T_x)] K_x α + β^T K_y [H_y + 2η(T_yy − T_y)] K_y β − 2 α^T K_x C K_y β),  (26)

where H_x + 2γ(T_xx − T_x) and H_y + 2η(T_yy − T_y) are the correlation matrices fusing category and distribution information in the nonlinear situation. The coefficients γ and η control the supervision intensity of the distribution information. Let W = [α β]^T. The solution of the optimal objective function is transformed into the solution of the generalized eigenvalue problem:

M_K w = λ B_K w,  (27)

where λ is an eigenvalue, w is the corresponding eigenvector, M_K assembles the fused correlation matrices of (26), and B_K is the constraint matrix. The matrix W ∈ R^{(2N)×D_c} is constructed from the eigenvectors w corresponding to the D_c smallest eigenvalues. According to the definition W = [α β]^T, we get α ∈ R^{N×D_c} and β ∈ R^{N×D_c}.

Complete Steps of Algorithm Implementation

Assume that there are two training datasets X and Y. The elements in set X are high-dimensional data corresponding to clear images; the elements in set Y are low-dimensional data corresponding to low-resolution images. Additionally, some low-dimensional data serve as test samples. The implementation steps of the proposed method are as follows.

Training Process
Step 1. Based on the supervision of the category label, establish the connection relations between set X and set Y; then the category relation matrix C is calculated according to (14).
Step 2. Establish the local neighbor relations inside set X and set Y, respectively; then the corresponding distribution correlation matrices T_x and T_y are computed according to (15) and (16).
Step 3. Construct the main objective function, J_L in (17) or J_N in (23), where J_L is the linear objective function and J_N is the nonlinear objective function.
Step 4. Construct the linear auxiliary objective functions J_LX and J_LY in (18) and (19), or, in the nonlinear case, the auxiliary objective functions J_NX and J_NY in (24) and (25).
Step 5. According to (20) and (26), the general objective functions in linear and nonlinear cases are constructed, respectively.
Step 6. Solve the optimal objective functions. The linear optimal function is solved according to (22), and the nonlinear optimal function is solved according to (27).
Finally, we obtain the coupled metric matrix A = [P_x; P_y] or W = [α β]^T, and the features of the training samples in set X are calculated, namely, x_{if} = P_x^T x_i in the linear case or x_{if} = α^T κ_x(x_i) in the nonlinear case, where κ_x(x_i) denotes the kernel vector of x_i with respect to the training set.

Testing Process
Step 1. The test samples are processed. Let I_T denote one low-resolution test image in vector form. The coupled metric matrix P_y or β is used to extract the feature of the test image; namely, I_f = P_y^T I_T in the linear case or I_f = β^T κ_y(I_T) in the nonlinear case, where κ_y(I_T) is the kernel vector of I_T against the training set Y.
Step 2. After the coupled mapping, the features of the training samples and the test sample have the same dimension, so image matching is converted to a comparison of the features x_{if} and I_f, e.g., by nearest-neighbor search.
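The testing process then reduces to projecting the test vector and comparing it against the stored training features. A minimal linear-case sketch with toy data; all names and sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
Dc, N, dy = 2, 10, 16
X_feat = rng.standard_normal((Dc, N))    # coupled-space features x_if of training set X
labels = np.arange(N) % 3                # toy class labels
P_y = rng.standard_normal((dy, Dc))      # learned projection for low-resolution images

I_T = rng.standard_normal(dy)            # one vectorized low-resolution test image
I_f = P_y.T @ I_T                        # its feature in the coupled space

# Nearest-neighbor matching between I_f and the training features
dists = np.linalg.norm(X_feat - I_f[:, None], axis=0)
predicted = labels[np.argmin(dists)]
```

Because both sides now live in the same D_c-dimensional coupled space, a plain Euclidean nearest-neighbor rule suffices.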

Experiment and Analysis
In this section, we evaluate the linear and nonlinear multisupervised coupled metric methods on recognition tasks involving resolution-inconsistent matching. The face datasets and experimental settings are briefly described first.

Datasets and Experimental Settings.
To prove the validity of the proposed coupled metric method, the Yale-B [20], ORL [21], and UMIST [22] face datasets were used for the low-resolution image matching experiments. Table 1 briefly summarizes these three face datasets.
In the Yale-B face dataset, each person's images are divided into 5 subsets based on illumination changes. We randomly selected half of the images from the 5 subsets as training samples, a total of 2880 faces. The rest of the face images were used as the normal test samples.
In the ORL face dataset, we adopted 5 images of each volunteer as training samples, a total of 200 faces. The remaining 200 faces were used as normal test samples.
In the UMIST face dataset, we randomly selected 18 faces of each person as training samples, a total of 360 faces. The other faces were used as normal test samples.

Images in the above three face datasets are all high-quality samples; there are no low-resolution faces. To verify the effect of our proposed method in low-resolution image matching, we need to artificially produce low-quality images. In the experiments, we use blurring and undersampling to produce low-resolution face images. For example, the resolution of clear face images is normalized to 64 × 64, and the resolution of the corresponding low-quality faces is 16 × 16. The training dataset thus includes two sample sets: the high-quality images form set X and the low-quality images form set Y. Then, these two sets are used to learn the coupled transformation matrices. In the testing process, the test images are also low-resolution images, produced by blurring or undersampling the normal test face images.
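A simple way to produce such low-quality samples is block averaging, which both blurs and undersamples in one step. The sketch below uses a 4 × 4 block mean; the paper does not specify its exact blur kernel, so this is an assumption:

```python
import numpy as np

def degrade(img, factor=4):
    """Average each factor x factor block: blur-and-undersample in one step."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

hi = np.random.default_rng(6).random((64, 64))   # stand-in 64 x 64 face image
lo = degrade(hi, factor=4)                       # 16 x 16 low-quality counterpart

assert lo.shape == (16, 16)
assert np.isclose(lo.mean(), hi.mean())          # block averaging preserves the mean
```

With factor 2 or 8 the same function yields the 32 × 32 and 8 × 8 variants used in the later resolution experiments.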
In Sections 4.2, 4.3, and 4.5, the resolution of the degraded face image used in the training and testing process is set to 16 × 16. We repeated our experiments 10 times and took the average as the final matching results.

Linear Multisupervised Coupled Metric Experiments.
In the proposed LMS-CML, there are three influencing factors: (1) the number k of nearest neighbors in the calculation of the distribution correlation matrix, (2) the dimension of the preserved features in solving the generalized eigenvalue problem, and (3) the coefficients γ and η in the general objective function. Therefore, in the following experiments, we discuss and analyze the impact of these factors on image recognition performance.

Experiment 1.
Firstly, we analyze the joint influence of the number k of nearest neighbors and the dimension of the preserved features. Figure 3 illustrates the experiment results.
As the dimension of the preserved features increases, the recognition rate first rises rapidly and then slowly decreases. In the initial stage, increasing the feature dimension retains more effective classification features, so the recognition rate rises rapidly. When the feature dimension reaches its optimal value, the best recognition effect is achieved. Beyond that point, further increasing the feature dimension introduces unnecessary interference into the classification features, so the recognition rate slowly decreases.
In the Yale-B face dataset, when the dimension of the preserved features is 15 and the number of nearest neighbors is 5, the highest recognition accuracy reaches 89.06%. In the ORL face dataset, when the dimension of the preserved features is 35, the recognition rate is relatively high; when the number of neighbors is 7, the optimal recognition rate is 93%. In the UMIST face dataset, due to the great pose variation, more features must be preserved to obtain a better recognition effect; when the dimension of the preserved features is 20 and the number of nearest neighbors is 9, the optimal recognition rate reaches 92.16%. Based on the above experiments, we see that the number of nearest neighbors has some influence on the recognition accuracy, but it does not change the overall trend of the recognition rate curves.

Experiment 2.
In the linear general objective function, the coefficients γ and η control the strength of the supervision of the distribution information. So in experiment 2, we analyzed their impact on the recognition effect. Figure 4 shows the experiment results. Obviously, on different face datasets the recognition results reach their optimal values at different settings of γ and η, but the variation curves show a consistent overall pattern. The optimal value of γ lies within [0.6, 0.9], and the optimal value of η lies within [0.3, 0.5]. It can be seen that the impact of the distribution information of high-resolution faces is stronger than that of low-resolution faces.
Additionally, when the coefficients γ = 0 and η = 0, that is, when the distribution information is not used as supervision and only the category label is used, the recognition effect is not the worst, but it is not optimal either. This indicates that using the distribution information of samples as supervision effectively improves the performance of the coupled metric method.
As these two parameters increase further, the recognition performance of LMS-CML decreases slightly. There are two reasons for this variation: (1) when the distribution information of samples is weighted more heavily, the main supervisory role of the category label is weakened; (2) in the construction of the distribution correlation matrix, the neighbor relations of samples within the same class are easily interfered with by samples of other categories. Therefore, as γ and η increase, the proportion of the auxiliary functions in the general objective function increases, which enlarges the interference intensity and affects the recognition effect.

Nonlinear Multisupervised Coupled Metric Experiments.
Different from the linear coupled metric learning, there are four influencing factors in the nonlinear coupled metric learning method: in addition to the three factors described in Section 4.2, the adjustable factor σ of the Gaussian kernel function is the fourth. In experiment 1 of this section, we mainly analyze the joint influence of the adjustable factor σ and the dimension of the preserved features. The influence of the coefficients γ and η in the general objective function is discussed in experiment 2. In addition, the experiment results show that the influence of the number of nearest neighbors in the nonlinear case is the same as that in the linear case.

Experiment 1.
In the following experiment, we analyzed the impact of different adjustable factors σ and varied dimensions of the preserved features on the recognition rate. The results are shown in Figure 5, which gives the average recognition rates of the NMS-CML method on the different datasets. Compared with the experiment results in Section 4.2.1, the recognition effect is obviously improved in the nonlinear case. Furthermore, to achieve the optimal recognition rate, the dimension of the preserved features in the nonlinear case is higher than that in the linear case. When the adjustable factor σ is near its optimal value, the recognition rate improves dramatically, which indicates that the adjustable factor has a great impact on the recognition performance.
The curves indicate that when the dimension of the features is 25 and σ = 0.5, the optimal face recognition rate of 91.15% is obtained on the Yale-B face dataset. When the dimension of the features is 30 and σ = 0.3, the best recognition rate is 97% on the ORL face dataset. On the UMIST face dataset, when the dimension of the features is 25 and σ = 0.7, the best recognition rate is 95.59%. These results illustrate that NMS-CML can better extract low-resolution and pose-varied face features, but the improvement in the recognition of low-resolution images with severe illumination changes is limited.

Experiment 2.
We experimented with and discussed the effects of the parameters γ and η in the nonlinear general objective function. The conclusion is that the supervised effect of the distribution relationship between high-resolution images is stronger than that between low-resolution images.
The optimal values of γ and η are 0.65 and 0.3, respectively, on the Yale-B face dataset; 0.7 and 0.3 on the ORL face dataset; and 0.7 and 0.45 on the UMIST face dataset. In general, the category label has the strongest supervision effect in MS-CML, and the distribution information of the samples in the high-resolution image set provides stronger supervision than that of the low-resolution image set.

Matching Experiments of Images with Different Resolutions

In the above experiments, the resolution of the high-quality face images is set to 64 × 64 pixels, and the resolution of the low-resolution face images is set to 16 × 16 pixels. To better illustrate the effectiveness of the MS-CML method, in this section we performed coupled metric experiments with low-quality face images of different resolutions: 32 × 32, 16 × 16, and 8 × 8 pixels. Figure 6 shows some faces with different resolutions, and the experiment results are summarized in Table 2.
The experimental results show that when the resolution of the low-quality faces is set to different values, the recognition performance of MS-CML is affected to some extent. When the resolution of the low-quality face images is set to 8 × 8 pixels, the face images contain less effective classification information; thus, the recognition rate drops more. In addition, due to strong illumination interference in the Yale-B face dataset and obvious pose variation in the UMIST face dataset, the effective classification information is further reduced, and the recognition rate decreases more than on the ORL face dataset. Moreover, the nonlinear coupled metric has a stronger feature extraction capability, so LMS-CML has a much lower recognition rate than NMS-CML.

Comparison Experiments with Other Methods.
In order to evaluate the performance of the proposed coupled metric methods, we conducted comparison experiments with feature extraction methods [23][24][25][26] applied after image reconstruction [27,28] and with the metric learning methods in [9,11,[13][14][15]. The experiment results are shown in Table 3.
We can see that the recognition effect of feature extraction after image reconstruction is not ideal. The main reason is that some false information is introduced during image reconstruction, which contributes little to the recognition effect and can even interfere with it. In addition, the same person's face images exhibit large illumination and pose changes; this intraclass multimodal problem affects the metric learning and the final classification results. The methods in [9,14] perform distance metric learning based only on category label information, which cannot overcome the intraclass multimodal problem, so their recognition rates are lower. The distance metric methods in [11,13] are based on locality-preserving relations within the same class, which helps solve the intraclass multimodal problem, so their recognition effect is improved compared with CML and KCML. The CMDM [15] combines discriminant information with the data distribution of local neighbors, so its recognition effect is similar to that of the proposed LMS-CML method. However, CMDM does not deal with nonlinear data, so its final recognition effect is not as good as that of our proposed NMS-CML method.
Through the comparison experiments, we draw two conclusions: (1) the fusion of category label and distribution information can better supervise the learning of the metric matrix; (2) the nonlinear coupled transformation can extract the essential nonlinear features more effectively. Therefore, in this paper we propose the linear and nonlinear MS-CML methods, which make full use of the supervision of the category label and the distribution information. Finally, we obtain excellent recognition results in face recognition with low resolution and pose variations.

Conclusions
In this paper, a multisupervised coupled metric learning method was proposed to obtain the coupled mapping matrices, which can be used for low-resolution image matching. In the metric learning, we constructed the linear and nonlinear coupled metric learning objective functions, respectively. Compared with linear metric learning, the mapping matrix obtained by nonlinear metric learning can better improve the accuracy of low-resolution image matching. In addition, the experimental results show that the proposed MS-CML can overcome the intraclass multimodal problem of existing metric methods and effectively improve the matching accuracy of low-resolution images under small-sample conditions.

Data Availability
The experimental data (face dataset settings and recognition rates) used to support the findings of this study are included within the article. The download addresses of the three face datasets are given in the references.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.