Face recognition based on multiple imaging modalities has become an active research topic, and a great number of multispectral face recognition algorithms and systems have been designed in the last decade. How to extract features from different spectra remains an important issue for face recognition. To address this problem, we propose a robust tensor preserving projection (RTPP) algorithm, which represents a multispectral image as a third-order tensor. RTPP constructs sparse neighborhoods and then computes the weights of the tensor. RTPP iteratively obtains the transformation matrix of each mode by preserving the sparse neighborhoods. Owing to the sparse representation, RTPP can not only keep the underlying spatial structure of multispectral images but also enhance robustness. Experiments on both the Equinox and DHUFO face databases show that the proposed method outperforms related algorithms.
1. Introduction
Multibiometrics can be considered the fusion of different sensor modalities in a single recognition system. The reason for using two or more sensor modalities is to improve recognition accuracy. Face recognition based on multiple imaging modalities has become an active research topic [1–7]. Recent studies have shown that multispectral face recognition offers several advantages, such as invariance to illumination changes [8, 9]. A multispectral image also reveals anatomical information about a subject [1]. Socolinsky and Selinger [3, 10] developed different recognition algorithms on visible and thermal infrared face image databases and obtained good performance. Chen et al. [11] tested the effects of illumination, facial expression, and the passage of time between the training and testing images. Wang et al. [12] showed that color space combination is a viable approach for improving face recognition performance. Image-based fusion in the wavelet domain and feature-based fusion in the eigenspace domain were presented in [13]. Heo et al. [14] proposed fusing visual and thermal images for robust face recognition. Multisensory biometric fusion algorithms were investigated for personal identification [15]. Pan et al. [4, 5] analyzed facial tissue spectral measurements in the near-infrared range (0.7 μm–1.0 μm) for face recognition. Denes et al. [6] tested spectral asymmetry with three visible bands (0.6 μm, 0.7 μm, and 0.8 μm). Chang et al. [16] fused multispectral images in the visible spectrum (0.4 μm–0.72 μm) into a single image to enhance face recognition accuracy. Chou and Bajcsy [7] preprocessed multispectral images (visible: 0.4 μm–0.72 μm; near-infrared: 0.65 μm–1.1 μm) with principal component analysis (PCA) to perform face detection. Based on visible images, Wong and Zhao [17] adopted kernel PCA to remove eyeglasses from thermal face images.
The above algorithms are mainly developed to preserve the global structure of the multispectral data; they do not explicitly treat the manifold structure of the data. However, research on manifold learning over the past decade demonstrates that the local geometric structure is more important than the global structure, since high-dimensional data often lie on a low-dimensional manifold [18]. Due to the low-dimensional manifold structure of face images, manifold-learning-based linear dimension reduction algorithms [4–9, 12–14, 19–24] have become popular.
Since these linear feature extraction algorithms cannot deal with high-order tensor data, some of them were further extended to the multilinear case, and many tensor-based manifold learning algorithms were proposed using higher-order tensor decompositions [25–27]. Within the past ten years, there has been great interest in high-order tensor feature extraction, and tensor-based methods have become popular in computer vision and pattern recognition [28–31]. For example, Igarashi et al. proposed tensor subspace analysis (TSA) [32] for second-order learning, and Dai and Yeung proposed tensor NPE (TNPE) [19]. Recently, orthogonal tensor neighborhood preserving embedding (OTNPE) was proposed for facial expression recognition, and further variants have been proposed for gait recognition, action recognition, and so forth. For more details, please see the latest survey of multilinear subspace learning [1–3].
Recent research demonstrates that high-order tensor based manifold learning algorithms, such as tensor neighborhood preserving embedding (tensor NPE) [20], can obtain better performance than classical feature extraction algorithms on tensor data sets. Unfortunately, tensor data contain a large amount of redundant information, and thus not all features/variables are important for feature extraction and classification [21–23]. It has been shown that integrating sparse representation with manifold learning for feature extraction may yield better performance [24], and that sparse representation methods can outperform their nonsparse counterparts on real data. Moreover, these sparse methods give an intuitive or semantic interpretation of the transformed features [25].
To date, embedding high-order tensor data in a sparse manner has not been widely investigated, and how to extend manifold learning algorithms to integrate sparseness and manifold structure for multispectral face recognition remains unsolved. In this paper, motivated by tensor data embedding and sparse representation, we propose a novel method called robust tensor preserving projection (RTPP) for multispectral image feature extraction. A multispectral image is treated as a third-order tensor. The aim of RTPP is to obtain transformation matrices that preserve the sparse neighborhood information of the third-order tensors.
The rest of the paper is organized as follows. Section 2 gives the related tensor definitions. Section 3 reviews tensor neighborhood preserving embedding and tensor locality preserving projection. Section 4 presents the novel sparse tensor embedding method. Experiments evaluating the proposed tensor learning method are reported in Section 5, and conclusions are given in Section 6.
2. Tensor Fundamentals
A tensor is a multidimensional array. It is the higher-order generalization of a scalar (zeroth-order tensor), a vector (first-order tensor), and a matrix (second-order tensor). In this paper, lowercase letters (e.g., a, b, c) denote scalars, bold lowercase letters (e.g., a, b, c) denote vectors, uppercase letters (e.g., A, B, C) denote matrices, and bold uppercase letters (e.g., A, B, C) denote tensors. It is assumed that the training samples are represented as nth-order tensors {Ai∈Rm1×m2×⋯×mn, i=1,2,…,N}, where N denotes the total number of training samples.
Definition 1.
The inner product of two tensors A, B∈Rm1×m2×⋯×mn is defined as ⟨A,B⟩ = ∑i1∑i2⋯∑in Ai1,i2,…,in Bi1,i2,…,in. The Frobenius norm of a tensor A∈Rm1×m2×⋯×mn is then defined as ∥A∥ = √⟨A,A⟩, and the distance between two tensors A, B∈Rm1×m2×⋯×mn is defined as D(A,B) = ∥A−B∥.
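As a concrete illustration, Definition 1 can be sketched in a few lines of NumPy (an illustrative sketch; the paper itself uses MATLAB tooling):

```python
import numpy as np

# Sketch of Definition 1 for tensors stored as NumPy arrays.
def inner_product(A, B):
    # <A, B> = sum over all entries of elementwise products
    return float(np.sum(A * B))

def frobenius_norm(A):
    # ||A|| = sqrt(<A, A>)
    return np.sqrt(inner_product(A, A))

def tensor_distance(A, B):
    # D(A, B) = ||A - B||
    return frobenius_norm(A - B)
```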
Definition 2.
The k-mode flattening of the nth-order tensor Ai∈Rm1×m2×⋯×mn (i=1,2,…,N) into a matrix A(k)∈Rmk×∏i≠kmi, written A(k)⇐kA, is defined entrywise by A(k)ik,j = Ai1,i2,…,in, with j = 1 + ∑q=1,q≠kn (iq−1) ∏p=q+1,p≠kn mp.
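A mode-k unfolding can be sketched with NumPy as follows; note that the exact column ordering in Definition 2 depends on the index convention, and this sketch simply uses the ordering induced by `np.reshape`:

```python
import numpy as np

# Mode-k flattening of an nth-order tensor into an (m_k x prod of the
# other mode sizes) matrix.
def unfold(A, k):
    # Bring mode k to the front, then flatten the remaining modes.
    return np.moveaxis(A, k, 0).reshape(A.shape[k], -1)
```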
Definition 3.
The k-mode product of a tensor A∈Rm1×m2×⋯×mn by a matrix U∈Rmk′×mk, denoted by B = A×kU, is the (m1×m2×⋯×mk−1×mk′×mk+1×⋯×mn)-tensor whose entries are given by Bi1,…,ik−1,i,ik+1,…,in = ∑j=1mk Ai1,…,ik−1,j,ik+1,…,in Ui,j (i=1,2,…,mk′).
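The k-mode product can be computed as "unfold, multiply, fold back"; a minimal NumPy sketch (helper names are ours):

```python
import numpy as np

def unfold(A, k):
    return np.moveaxis(A, k, 0).reshape(A.shape[k], -1)

def fold(M, k, shape):
    # Inverse of unfold for the given full tensor shape.
    full = [shape[k]] + [s for i, s in enumerate(shape) if i != k]
    return np.moveaxis(M.reshape(full), 0, k)

def mode_product(A, U, k):
    # B = A x_k U, replacing mode k of size m_k with size m_k'
    shape = list(A.shape)
    shape[k] = U.shape[0]
    return fold(U @ unfold(A, k), k, shape)
```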
The aim of a tensor learning algorithm is to obtain a set of projection matrices {Ui∈Rdi×mi, di≤mi, i=1,2,…,n} and map each original tensor into a new tensor:
(1)Bi=Ai×1U1×2U2⋯×nUn.
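For the third-order tensors used later in the paper, the multilinear projection (1) collapses to a single contraction; a NumPy sketch (the function name is ours):

```python
import numpy as np

def multilinear_project(A, U1, U2, U3):
    # B = A x_1 U1 x_2 U2 x_3 U3 for a third-order tensor A,
    # written as one einsum contraction.
    return np.einsum('abc,ia,jb,kc->ijk', A, U1, U2, U3)
```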
3. Tensor Neighborhood Preserving Embedding
Let A1,A2,…,AN be the multispectral face images in high-order tensor form, with Ai∈Rm1×m2×⋯×mn (i=1,2,…,N) and N the number of individuals. Assume that A1,A2,…,AN come from an unknown manifold M embedded in the tensor space Rm1×m2×⋯×mn. The aim of tensor NPE is to find optimal transformation matrices U1,U2,…,Un such that the local topological structure of M is preserved and the intrinsic geometric property is effectively captured. The optimal transformation matrices Uj∈Rdj×mj (dj≤mj, j=1,2,…,n) project the high-dimensional Ai into the low-dimensional representation Bi, where Bi = Ai×1U1×2U2⋯×nUn.
We construct a neighborhood graph to represent the intrinsic geometric structure of M and apply the heat kernel to define the affinity matrix W=[wij]N×N as
(2) wij = exp(−∥Ai−Aj∥²/t) if Aj∈O(K,Ai), and wij = 0 otherwise,
where O(K,Ai) denotes the set of K nearest neighbors of Ai and t is a positive constant. The affinity matrix W is then normalized such that each row sums to one. In order to preserve the geometric structure explicitly, we define the following objective function based on the Frobenius norm of a tensor:
(3)argminJ(U1,U2,…,Un)=∑i∥Bi-∑jwijBj∥2=∑i∥Ai×1U1⋯×nUn-∑jwijAj×1U1⋯×nUn∥2.
To eliminate an arbitrary scaling factor in the projection matrices, we impose the following constraint: ∑i∥Bi∥2=∑i∥Ai×1U1⋯×nUn∥2=1. Then the optimization problem for tensor NPE can be expressed as
(4) argmin J(U1,U2,…,Un) = ∑i ∥Ai×1U1⋯×nUn − ∑j wij Aj×1U1⋯×nUn∥², s.t. ∑i ∥Ai×1U1⋯×nUn∥² = 1.
Note that this optimization problem is a high-order nonlinear programming problem with a high-order nonlinear constraint, making direct computation of the projection matrices infeasible. In general, this type of problem can be solved approximately by an iterative scheme originally proposed for low-rank approximation, and the optimization problem in (4) can be solved by such a scheme. Assume that U1,U2,…,Uk−1,Uk+1,…,Un are known, and let Bik = Ai×1U1⋯×k−1Uk−1×k+1Uk+1⋯×nUn. Since Bi(k)⇐kBik, and using the properties of tensors and the trace, we rewrite the optimization function and the constraint in (4) as follows:
(5) argmin Jk(Uk) = ∑i ∥Bik×kUk − ∑j wij Bjk×kUk∥² = ∑i ∥UkBi(k) − ∑j wij UkBj(k)∥² = tr{Uk (∑i,j (Bi(k)−wijBj(k))(Bi(k)−wijBj(k))T) UkT}, and ∑i ∥Bik×kUk∥² = ∑i ∥UkBi(k)∥² = tr{Uk (∑i Bi(k)Bi(k)T) UkT}.
Thus, the optimization problem in (4) can be reformulated as
(6) argmin Jk(Uk) = tr{Uk (∑i,j (Bi(k)−wijBj(k))(Bi(k)−wijBj(k))T) UkT}, s.t. tr{Uk (∑i Bi(k)Bi(k)T) UkT} = 1.
The unknown transformation matrix Uk can be obtained as the eigenvectors corresponding to the dk smallest eigenvalues of the generalized eigenvalue equation
(7)(∑i,j(Bi(k)-wijBj(k))(Bi(k)-wijBj(k))T)u=λ(∑iBi(k)Bi(k)T)u.
The other transformation matrices can be obtained in a similar manner.
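The K-nearest-neighbor heat-kernel affinities of Eq. (2), with the row normalization described above, can be sketched as follows (a minimal NumPy sketch; the function name and the brute-force distance computation are our illustrative choices):

```python
import numpy as np

# Heat-kernel affinities over K nearest neighbors in tensor Frobenius
# distance (Eq. (2)), followed by row normalization.
def heat_kernel_affinity(tensors, K=5, t=1.0):
    N = len(tensors)
    flat = np.stack([T.ravel() for T in tensors])
    # Squared Frobenius distances between all pairs of samples
    d2 = np.square(flat[:, None, :] - flat[None, :, :]).sum(-1)
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(d2[i])[1:K + 1]     # skip the sample itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / t)
    W /= W.sum(axis=1, keepdims=True)         # each row sums to one
    return W
```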
3.1. Tensor Locality Preserving Projection
Different from tensor NPE, the optimization problem for tensor LPP can be expressed as
(8) argmin J(U1,U2,…,Un) = ∑i,j ∥Bi−Bj∥² wij = ∑i,j ∥Ai×1U1⋯×nUn − Aj×1U1⋯×nUn∥² wij, s.t. ∑i ∥Ai×1U1⋯×nUn∥² dii = 1.
In general, the larger the value of dii=∑jwij is, the more important the tensor Bi is in the embedded tensor space for representing the original tensor Ai. It is easy to see that the objective function will give a high penalty if neighboring tensors Ai and Aj are mapped far apart. Thus if two tensors Ai and Aj are close to each other, then the corresponding tensors Bi and Bj in the embedded tensor space are also expected to be close to each other.
The optimization function of tensor LPP can be formulated as follows:
(9) argmin Jk(Uk) = ∑i,j ∥Bik×kUk − Bjk×kUk∥² wij = ∑i,j ∥UkBi(k) − UkBj(k)∥² wij = tr{Uk (∑i,j (Bi(k)−Bj(k))(Bi(k)−Bj(k))T wij) UkT}, s.t. tr{Uk (∑i Bi(k)Bi(k)T dii) UkT} = 1.
Moreover, the transformation matrix Uk can be computed as the eigenvectors corresponding to the dk smallest eigenvalues of the generalized eigenvalue equation
(10)(∑i,j(Bi(k)-Bj(k))(Bi(k)-Bj(k))Twij)u=λ(∑iBi(k)Bi(k)Tdii)u.
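Generalized eigenvalue problems of the form appearing in (7) and (10), H1 u = λ H2 u, can be solved with standard dense linear algebra; a pure-NumPy sketch under the assumption that H2 is symmetric positive definite (the whitening-by-Cholesky approach is our illustrative choice):

```python
import numpy as np

# Eigenvectors of the d_k smallest eigenvalues of H1 u = lambda H2 u,
# obtained by whitening with a Cholesky factor of H2 and then using the
# ordinary symmetric eigensolver (ascending eigenvalues in np.linalg.eigh).
def smallest_generalized_eigvecs(H1, H2, d_k):
    L = np.linalg.cholesky(H2)
    Li = np.linalg.inv(L)
    vals, vecs = np.linalg.eigh(Li @ H1 @ Li.T)
    return (Li.T @ vecs[:, :d_k]).T     # rows form U_k in R^{d_k x m_k}
```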
4. Robust Tensor Preserving Projection
Sparse representation algorithms have been widely studied in signal processing, computer vision, and pattern recognition. Wright et al. [25] used sparse representation for robust face reconstruction and recognition, Qiao et al. [26] proposed sparsity preserving projections, and Cheng et al. [27] used the ℓ1 graph for image clustering. As demonstrated in [26, 27], graphs constructed with the ℓ1 norm are more robust to noise and information redundancy. In the following, we fuse sparse representation with tensor feature extraction.
4.1. Sparse Tensor Representation
In this part, we present the sparse representation for the tensor data A1,A2,…,AN. Let Z=[zij]N×N be the optimal sparse representation coefficients. Sparse representation assumes that the training sample Ai(i=1,2,…,N) can be sparsely represented as a linear combination of the other data. Based on this assumption, the following sparse optimization problem was proposed in [25]:
(11) min ∥Zi,:∥0 s.t. ∥Ai − ∑j,j≠i zijAj∥² ≤ ε,
where the N×N matrix Z=[zij]N×N is the representation coefficient matrix satisfying diag(Z)=0 and Zi,: denotes the ith row vector of Z. The parameter ε is a small user-specified constant. However, the above optimization problem is NP-hard. One can apply a convex relaxation to the NP-hard problem and solve the following optimization problem instead:
(12) min ∥Zi,:∥1 s.t. ∥Ai − ∑j,j≠i zijAj∥² ≤ ε.
Due to the sparseness of Z, only a few zij are nonzero (i,j=1,2,…,N). This means that, for each tensor Ai (i=1,2,…,N), not all the other tensors are used in its representation. Let zijki (i=1,2,…,N; k=1,2,…,Ki) denote the nonzero coefficients. Then the representation error is as follows:
(13)d(Ai)=∥Ai-∑j,j≠izijAj∥2(i=1,2,…,N).
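The ℓ1 problem (12) is typically handed to an off-the-shelf solver. As a self-contained sketch, the following minimal ISTA iteration solves the unconstrained Lagrangian form min_z ½∥a − Dz∥² + λ∥z∥1, where D stacks the other vectorized training tensors as columns and λ plays the role of the tolerance ε; both the solver choice and λ are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def soft_threshold(x, thresh):
    # Proximal operator of the l1 norm
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def ista(D, a, lam=0.1, n_iter=500):
    # Iterative shrinkage-thresholding for 0.5*||a - Dz||^2 + lam*||z||_1
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / Lipschitz constant
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - a)
        z = soft_threshold(z - step * grad, step * lam)
    return z
```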
In this paper, we also use sparse representation classifier (SRC) for RTPP. SRC classifies the test sample to the class with the least within-class reconstruction error. For more details of SRC, please refer to [25].
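SRC's decision rule can be sketched as follows. The sparse code z of the test sample is assumed to be already computed by whatever ℓ1 solver is in use, and the function name is ours:

```python
import numpy as np

# SRC decision rule: assign the class whose coefficients in the sparse
# code give the smallest reconstruction residual.
def src_classify(D, labels, y, z):
    # D: dictionary of vectorized training samples (columns),
    # labels: class label per column, y: test sample, z: its sparse code.
    residuals = {}
    for c in set(labels):
        mask = np.array([l == c for l in labels])
        zc = np.where(mask, z, 0.0)          # keep only class-c coefficients
        residuals[c] = np.linalg.norm(y - D @ zc)
    return min(residuals, key=residuals.get)
```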
Define the affinity matrix S=[sij]N×N. Let sij=0 if zij=0. We compute the remaining entries sijki (i=1,2,…,N; k=1,2,…,Ki) as follows:
(14) min ∥Ai − ∑k=1Ki sijki Ajki∥² s.t. ∑k=1Ki sijki = 1, sijki ≥ 0 (k=1,2,…,Ki).
Here we obtain the weights in a similar way as in LLE, except for the nonnegativity constraints sijki ≥ 0 (k=1,2,…,Ki). The nonnegativity constraints lead to a parts-based representation because they allow only additive, not subtractive, combinations. Previous studies provide psychological and physiological evidence for parts-based representations in the human brain [11, 13, 17]. The sum-to-one constraint ∑k=1Ki sijki = 1 makes the weights invariant to translation.
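The constrained fit in (14) is a least-squares problem over the probability simplex; one simple way to solve it is projected gradient descent with the standard Euclidean simplex projection (a sketch under our own solver choice, not the paper's):

```python
import numpy as np

def project_to_simplex(v):
    # Euclidean projection onto {s : s >= 0, sum(s) = 1}
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def simplex_weights(D, a, n_iter=1000):
    # D: columns are the vectorized neighbour tensors, a: target sample.
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)
    s = np.full(D.shape[1], 1.0 / D.shape[1])
    for _ in range(n_iter):
        s = project_to_simplex(s - step * (D.T @ (D @ s - a)))
    return s
```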
Discriminant information can be naturally preserved in the weights, even if no class information is available. In face recognition, one particularly simple but reasonable assumption is that the samples from the same class lie on a linear subspace. In other words, the nonzero weights mostly correspond to the samples from the same class, which implies that the nonzero weights may help distinguish that class from the others. Therefore, the weights tend to include potential discriminant information.
In the design of the proposed RTPP, we use S=[sij]N×N instead of the similarities used in tensor NPE. One advantage of this choice is that the difficulty of selecting the local neighborhood size K in tensor NPE is avoided; moreover, the similarities give an intuitive or semantic interpretation of the represented tensor data. Another advantage is that sparse representation has potential discriminative ability, since most nonzero sparse representation coefficients fall on samples from the same class as the represented sample.
4.2. Algorithm of Robust Tensor Preserving Projection
For convenience, in this section we use the notation of Section 3 to derive RTPP. Assuming that U1,U2,…,Uk−1,Uk+1,…,Un are known, we now compute the projection matrix Uk. Using the similarities obtained in (14), we have the following optimization function:
(15) argmin Jk(Uk) = ∑i ∥UkBi(k) − ∑j sij UkBj(k)∥² = tr{Uk (∑i,j (Bi(k)−sijBj(k))(Bi(k)−sijBj(k))T) UkT}, s.t. tr{Uk (∑i Bi(k)Bi(k)T) UkT} = 1.
Let B(k)=[B1(k),B2(k),…,BN(k)], let ϵi be the N-dimensional unit vector whose ith element is 1 and whose other elements are 0, and let Si,: denote the ith row vector of S. With simple manipulation, we get
(16) ∑i ∥UkBi(k) − ∑j sij UkBj(k)∥² = tr{Uk (∑i (Bi(k)−B(k)Si,:T)(Bi(k)−B(k)Si,:T)T) UkT} = tr{Uk (∑i (B(k)ϵi−B(k)Si,:T)(B(k)ϵi−B(k)Si,:T)T) UkT} = tr{Uk B(k) (∑i (ϵi−Si,:T)(ϵi−Si,:T)T) B(k)T UkT} = tr{Uk B(k) (I−S)T(I−S) B(k)T UkT},
where I is N×N identity matrix. We can also obtain
(17)tr{Uk(∑iBi(k)Bi(k)T)UkT}=tr{UkB(k)B(k)TUkT}.
Then the optimization problem in (15) can be rewritten as
(18) argmin Jk(Uk) = tr{Uk B(k) (I−S)T(I−S) B(k)T UkT}, s.t. tr{Uk B(k)B(k)T UkT} = 1.
Then the transformation matrix Uk can be obtained by solving the eigenvectors corresponding to the dk smallest eigenvalues in the generalized eigenvalue equation
(19)(B(k)(I-S)T(I-S)B(k)T)u=λ(B(k)B(k)T)u.
The other transformation matrices can be obtained in a similar manner.
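The two matrices in Eq. (19) can be assembled directly from the stacked mode-k unfoldings; in the sketch below, a Kronecker product makes the blockwise action of (I − S) explicit (playing the role of the unit vectors ϵi in the derivation above; the function name is ours):

```python
import numpy as np

# Assemble H1 = B(k) (I-S)^T (I-S) B(k)^T and H2 = B(k) B(k)^T for one
# mode, where B(k) stacks the per-sample mode-k unfoldings side by side.
def rtpp_mode_matrices(Bk_list, S):
    N = len(Bk_list)
    p = Bk_list[0].shape[1]
    B = np.hstack(Bk_list)                     # m_k x (N p)
    M = np.kron(np.eye(N) - S, np.eye(p))      # blockwise (I - S)
    H1 = B @ M.T @ M @ B.T
    H2 = B @ B.T
    return H1, H2
```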
5. Experimental Results
In this section, experiments on the Equinox data set and the DHUFO data set are presented to evaluate RTPP (Algorithm 1) for recognition tasks. In the experiments, we compare the RTPP algorithm with the tensor NPE method. Besides tensor NPE, we also perform NPE directly on serially combined data. The serially combined datum zserial is a super vector formed by concatenating x1,x2,…,xs∈Rm as zserial = [x1T, x2T,…,xsT]T. For visible and thermal infrared data xvisible and xIR, the serially combined datum is zserial = [xvisibleT, xIRT]T.
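The serial combination used by the baseline is simply vector concatenation; a minimal sketch:

```python
import numpy as np

# Stack the visible and IR feature vectors into one super vector.
def serial_combine(x_visible, x_ir):
    return np.concatenate([x_visible, x_ir])
```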
Algorithm 1: The RTPP algorithm.
Input: A1, A2,…,AN (Ai∈Rm1×m2×⋯×mn) and d1×d2×⋯×dn;
(1) Construct the similarity matrix S by (12) and (14);
(2) Compute the embedding as follows:
Initialize U10 = Id1×m1, U20 = Id2×m2,…, Un0 = Idn×mn;
for t = 1, 2,…,Tmax do
for k = 1, 2,…,n do
Bik = Ai ×1 U1 ⋯ ×k−1 Uk−1 ×k+1 Uk+1 ⋯ ×n Un;
Bi(k) ⇐k Bik;
H1 = B(k)(I−S)T(I−S)B(k)T;
H2 = B(k)B(k)T;
Compute Ukt∈Rdk×mk by solving the eigenproblem H1Ukt = H2UktΛk;
end for
if ∥Ukt − Ukt−1∥ < ε for each k then
break;
end if
end for
Output: Ui = Uit∈Rdi×mi (i = 1, 2,…,n).
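Algorithm 1 can be sketched end to end in NumPy for third-order samples. The sparse similarity matrix S is assumed precomputed via (12) and (14); the small ridge added to H2 (so its Cholesky factor exists) and all function names are our illustrative choices:

```python
import numpy as np

def unfold(A, k):
    return np.moveaxis(A, k, 0).reshape(A.shape[k], -1)

def project_except(A, Us, k):
    # Apply every mode product except mode k.
    B = A
    for m, U in enumerate(Us):
        if m != k:
            B = np.moveaxis(np.tensordot(U, B, axes=(1, m)), 0, m)
    return B

def rtpp(samples, S, dims, t_max=10, eps=1e-6):
    N, n = len(samples), samples[0].ndim
    Us = [np.eye(dims[k], samples[0].shape[k]) for k in range(n)]
    for _ in range(t_max):
        max_change = 0.0
        for k in range(n):
            Bk = [unfold(project_except(A, Us, k), k) for A in samples]
            p = Bk[0].shape[1]
            B = np.hstack(Bk)
            M = np.kron(np.eye(N) - S, np.eye(p))    # blockwise (I - S)
            H1 = B @ M.T @ M @ B.T
            H2 = B @ B.T + 1e-8 * np.eye(B.shape[0])  # ridge for stability
            Li = np.linalg.inv(np.linalg.cholesky(H2))
            _, V = np.linalg.eigh(Li @ H1 @ Li.T)     # ascending order
            Uk = (Li.T @ V[:, :dims[k]]).T
            max_change = max(max_change, np.linalg.norm(Uk - Us[k]))
            Us[k] = Uk
        if max_change < eps:
            break
    return Us
```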
For the purpose of evaluating the performance of RTPP, we used the face verification rate as the criterion. The FERET Verification Testing Protocol [28] recommends using receiver operating characteristic (ROC) curves to depict the relation between the face verification rate (FVR) and the false accept rate (FAR). The ROC curves were plotted with the Statistical Learning Toolbox from the obtained score matrices. For tensor operations, we used the MATLAB tensor toolbox developed by Bader and Kolda [29]. The sparse representations were obtained with the coordinate descent solver of Friedman et al. [30]. In the following experiments, we set η = 0.1 for both data sets.
5.1. Experiments on Equinox Data Set
The National Institute of Standards and Technology and Equinox Corporation have developed a database (http://www.equinoxsensors.com/products/HID.html) of face images using registered broadband-visible/IR camera sensors for experimentation and performance evaluation [10]. Since the registration of the thermal images and the corresponding visible images is performed by the camera sensors, we did not need to carry out registration in our experiments.
We used the long-wave infrared (LWIR) (i.e., 8 μm–12 μm) images and the corresponding visible spectrum images from this database. The data were collected over a two-day period. Each pair of LWIR and visible light images was taken simultaneously and coregistered with 1/3 pixel accuracy. The LWIR images were radiometrically calibrated and stored as grayscale images with 12 bits per pixel. The visible images were also grayscale images, represented with 8 bits per pixel [10].
The database contains frontal faces under the following scenarios: (1) three different light directions: frontal and lateral (right and left); (2) three facial expressions: frown, surprise, and smile; (3) vocals pronunciation expressions: subjects were asked to pronounce several vocals from which three representative frames were chosen; and (4) presence of glasses: for subjects wearing glasses, all of the above scenarios were repeated with and without glasses.
In our experiments, 1320 images (660 thermal images and 660 corresponding registered visible images) were used. These images belonged to 33 individuals. For each individual, we had 20 thermal images and 20 corresponding visible images. The original 12-bit gray level thermal images were converted to 8 bits. All images (both thermal and visible) had the background removed, were aligned, and were then normalized to a resolution of 28 × 24. The goal of the preprocessing was to remove the background and scale the faces. Figure 1 shows sample images of one person in the Equinox data set.
Sample images of one individual from Equinox data set.
For any thermal image and its corresponding visible image, the tensor sample was of size 28×24×2. In the experiments, 10 tensor samples (10 thermal images and the 10 corresponding visible images) of each individual were randomly selected as the training set, and the remaining 10 tensor samples formed the test set. The experiments were independently repeated 20 times and the average results were reported.
For our proposed RTPP algorithm and the tensor NPE algorithm, the reduced dimensions d1×d2×d3 of the extracted features were 14×12×1 and 10×8×1, respectively. For the NPE algorithms performed on the IR feature, the visible feature, and the serial combined feature, the corresponding reduced dimensions were d=168 and d=80, respectively. For the tensor NPE algorithm, we performed experiments to obtain the best parameter K (the number of nearest neighbors) for the Equinox data set. Figures 2 and 3 show the ROC curves of the proposed RTPP algorithm and the tensor NPE algorithm using different values of K (K=5,10,15,20,25,30,35). From the ROC curves, we find that the best performance is obtained when K=5 (for both d=168 and d=80), and that the performance of the proposed RTPP algorithm is much better than that of the tensor NPE algorithm regardless of which K is selected.
ROC curves of RTPP and NPE (NPE was performed on serial combined data with different K’s) on Equinox data set with d1×d2×d3=10×8×1 and d=80.
ROC curves of RTPP and NPE (NPE was performed on serial combined data with different K’s) on Equinox data set with d1×d2×d3=14×12×1 and d=168.
The ROC curves of the different methods are shown in Figures 4 and 5. The NPE algorithm was also performed separately on the visible data and the thermal infrared data. In the NPE algorithm, we set K=5. The results indicate that the proposed RTPP algorithm performs better than the other algorithms.
ROC curves of RTPP and NPE on Equinox data set with d1×d2×d3=10×8×1 and d=80.
ROC curves of RTPP and NPE on Equinox data set with d1×d2×d3=14×12×1 and d=168.
5.2. Experiments on DHUFO Data Set
DHUFO is a database of face images acquired with registered visible/IR camera sensors for experimentation and performance evaluation. The data set was designed by the researchers. In our experiments, the long-wave infrared (LWIR) (i.e., 8 μm–12 μm) sensor was used. The registration of the thermal images and the corresponding visible images was performed by the camera sensors. Face image variations in the DHUFO database include illumination, facial expression, and glasses. In our experiments, 1020 images, involving variations in illumination and facial expression, were selected. We manually cropped the face portion of the images. These images belonged to 17 individuals. For each individual, there were 30 thermal images and 30 corresponding visible images. All images (both thermal and visible) had the background removed, were aligned, and were then normalized to a resolution of 28×24.
For any thermal image and its corresponding visible image, the tensor sample was of size 28×24×2. In the experiments, 15 tensor samples (15 thermal images and the 15 corresponding visible images) of each individual were randomly selected as the training set, and the remaining 15 tensor samples formed the test set. The experiments were independently repeated 20 times and the average results were reported.
For our proposed RTPP algorithm, the tensor NPE algorithm, and the tensor LPP algorithm, the reduced dimensions d1×d2×d3 of the extracted features were 14×12×1 and 10×8×1, respectively. For both NPE and LPP performed on the serial combined feature, the corresponding reduced dimensions were d=168 and d=80, respectively. The ROC curves of the different methods are shown in Figures 6 and 7. In both the NPE and LPP algorithms, we set K=5. The results indicate that the proposed RTPP algorithm performs better than the other algorithms.
ROC curves of RTPP, NPE, and LPP on DHUFO data set with d1×d2×d3=10×8×1 and d=80.
ROC curves of RTPP, NPE, and LPP on DHUFO data set with d1×d2×d3=14×12×1 and d=168.
5.3. Discussion
Based on the experimental results, the following observations can be made.
From the ROC curves of the different methods on the Equinox and DHUFO data sets, the proposed RTPP algorithm obtained the best performance. The experimental results indicate that employing the ℓ1 norm for sparse tensor learning is better than using local-neighborhood reconstruction.
RTPP does not introduce the local neighborhood parameter K, and thus it differs essentially from the neighborhood-based methods. In RTPP, the ℓ1 norm is used to obtain reconstruction coefficients with sparse properties; thus the robustness to data distortion and the potential discriminative ability proven in [25–27] are encoded in the representation coefficients, which are preserved in the low-dimensional subspace. These are the essential reasons why RTPP achieves good performance.
Since RTPP preserves the spatial structure of the original multispectral face images well, it outperforms the serial combined feature extraction algorithms.
6. Conclusion
We have proposed in this paper a novel tensor learning algorithm, called robust tensor preserving projection (RTPP), for multispectral face recognition. The RTPP algorithm incorporates a tensor manifold criterion to learn multiple subspaces in a high-order tensor space by preserving the sparse representation information of the multispectral images. RTPP can not only keep the underlying spatial structure of multispectral images but also enhance robustness. Experimental results demonstrate the excellent performance of RTPP.
Since RTPP is an unsupervised learning algorithm, one line of future work is to develop supervised tensor learning algorithms. We also plan to enforce sparsity on the projection matrices/vectors and to investigate sparse projection learning methods for tensor recognition.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grant no. 61375007) and the Shanghai Pujiang Program (Project no. 12PJ1402200).
References
1. S. G. Kong, J. Heo, F. Boughorbel, Y. Zheng, B. R. Abidi, A. Koschan, M. Yi, and M. A. Abidi, "Multiscale fusion of visible and thermal IR images for illumination-invariant face recognition," vol. 71, no. 2, pp. 215–233, 2007.
2. S. G. Kong, J. Heo, B. R. Abidi, J. Paik, and M. A. Abidi, "Recent advances in visual and infrared face recognition—a review," vol. 97, no. 1, pp. 103–135, 2005.
3. D. A. Socolinsky and A. Selinger, "A comparative analysis of face recognition performance with visible and thermal infrared imagery," in Proceedings of the 16th International Conference on Pattern Recognition, IEEE, 2002, pp. 217–222.
4. Z. Pan, G. Healey, M. Prasad, and B. Tromberg, "Face recognition in hyperspectral images," vol. 25, no. 12, pp. 1552–1560, 2003.
5. Z. Pan, G. Healey, M. Prasad, and B. Tromberg, "Illumination-invariant face recognition in hyperspectral images," in Proceedings of SPIE: Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery IX, vol. 5093, 2003, pp. 275–282.
6. L. Denes, P. Metes, and Y. Liu, "Hyperspectral face database," Tech. Rep. CMU-RI-TR-02-25, 2002.
7. Y. Chou and P. Bajcsy, "Toward face detection, pose estimation and human recognition from hyperspectral imagery," Tech. Rep. NCSA-ALG-04-0005, 2004, http://isda.ncsa.uiuc.edu/peter/.
8. S. Wang, Z. Liu, S. Lv, Y. Lv, G. Wu, P. Peng, F. Chen, and X. Wang, "A natural visible and infrared facial expression database for expression recognition and emotion inference," vol. 12, no. 7, pp. 682–691, 2010.
9. W. Di, L. Zhang, D. Zhang, and Q. Pan, "Studies on hyperspectral face recognition in visible spectrum with feature band selection," vol. 40, no. 6, pp. 1354–1361, 2010.
10. A. Selinger and D. A. Socolinsky, "Appearance-based facial recognition using visible and thermal imagery: a comparative study," Equinox Corporation, 2000.
11. X. Chen, P. Flynn, and K. Bowyer, "PCA-based face recognition in infrared imagery: baseline and comparative studies," in Proceedings of the IEEE International Workshop on Analysis and Modeling of Faces and Gestures, 2003.
12. S. Wang, J. Yang, M. Sun, X. Peng, M. Sun, and C. Zhou, "Sparse tensor discriminant color space for face verification," vol. 23, no. 6, pp. 876–888, 2012.
13. S. Singh, A. Gyaourova, G. Bebis, and I. Pavlidis, "Infrared and visible image fusion for face recognition," in Biometric Technology for Human Identification, Proceedings of the SPIE, vol. 5404, April 2004, pp. 585–596.
14. J. Heo, S. G. Kong, B. R. Abidi, and M. A. Abidi, "Fusion of visual and thermal signatures with eyeglass removal for robust face recognition," in Proceedings of the IEEE Workshop on Object Tracking and Classification Beyond the Visible Spectrum, 2004, pp. 94–99.
15. O. Arandjelovic, "Multi-sensory face biometric fusion (for personal identification)," in Proceedings of the IEEE Workshop on Object Tracking and Classification Beyond the Visible Spectrum, 2006, pp. 1–8.
16. H. Chang, H. Harishwaran, M. Yi, A. Koschan, B. Abidi, and M. Abidi, "An indoor and outdoor, multimodal, multispectral and multi-illuminant database for face recognition," in Proceedings of the Conference on Computer Vision and Pattern Recognition Workshop (CVPRW '06), June 2006, p. 54.
17. W. K. Wong and H. Zhao, "Eyeglasses removal of thermal image based on visible information," vol. 14, no. 2, pp. 163–176, 2013.
18. X. He, S. Yan, Y. Hu, P. Niyogi, and H. Zhang, "Face recognition using Laplacianfaces," vol. 27, no. 3, pp. 328–340, 2005.
19. G. Dai and D. Y. Yeung, "Tensor embedding methods," in Proceedings of the 21st AAAI Conference on Artificial Intelligence, vol. 1, 2006, pp. 330–335.
20. D. Zheng, X. Du, and L. Cui, "Tensor locality preserving projections for face recognition," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '10), October 2010, pp. 2347–2350.
21. H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "MPCA: multilinear principal component analysis of tensor objects," vol. 19, no. 1, pp. 18–39, 2008.
22. H. Lu, K. N. K. Plataniotis, and A. N. Venetsanopoulos, "Uncorrelated multilinear principal component analysis for unsupervised multilinear subspace learning," vol. 20, no. 11, pp. 1820–1836, 2009.
23. D. Xu, S. Yan, L. Zhang, S. Lin, H. Zhang, and T. S. Huang, "Reconstruction and recognition of tensor-based objects with concurrent subspaces analysis," vol. 18, no. 1, pp. 36–47, 2008.
24. T. Zhou, D. Tao, and X. Wu, "Manifold elastic net: a unified framework for sparse dimension reduction," vol. 22, no. 3, pp. 340–371, 2011.
25. J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," vol. 31, no. 2, pp. 210–227, 2009.
26. L. Qiao, S. Chen, and X. Tan, "Sparsity preserving projections with applications to face recognition," vol. 43, no. 1, pp. 331–341, 2010.
27. B. Cheng, J. Yang, S. Yan, Y. Fu, and T. S. Huang, "Learning with ℓ1-graph for image analysis," vol. 19, no. 4, pp. 858–866, 2010.
28. H. Moon and P. Phillips, "The FERET verification testing protocol for face recognition algorithms," in Proceedings of the 3rd IEEE International Conference on Automatic Face and Gesture Recognition, 1998, pp. 48–53.
29. B. Bader and T. G. Kolda, MATLAB Tensor Toolbox, Sandia National Laboratories, 2009, http://www.sandia.gov/~tgkolda/TensorToolbox/index-2.3.html.
30. J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," vol. 33, no. 1, pp. 1–22, 2010.
31. M. Rosen and X. Jiang, "Lippmann2000: a spectral image database under construction," in Proceedings of the International Symposium on Multispectral Imaging and Color Reproduction for Digital Archives, 1999, pp. 117–122.
32. T. Igarashi, K. Nishino, and S. Nayar, "The appearance of human skin," Tech. Rep. CUCS-024-05, Columbia University, 2005.