Robust Tensor Preserving Projection for Multispectral Face Recognition

Face recognition based on multiple imaging modalities has become a hot research topic, and a great number of multispectral face recognition algorithms and systems have been designed in the last decade. How to extract features from different spectra remains an important issue for face recognition. To address this problem, we propose a robust tensor preserving projection (RTPP) algorithm which represents a multispectral image as a third-order tensor. RTPP constructs sparse neighborhoods and then computes weights of the tensor. RTPP iteratively obtains one spectral space transformation matrix through preserving the sparse neighborhoods. Owing to the sparse representation, RTPP can not only keep the underlying spatial structure of multispectral images but also enhance robustness. Experiments on both the Equinox and DHUFO face databases show that the proposed method performs better than related algorithms.


Introduction
Multibiometrics can be considered the fusion of different sensor modalities in a single recognition system. The reason for using two or more sensor modalities is to improve the recognition accuracy. Face recognition based on multiple imaging modalities has become a hot research topic [1][2][3][4][5][6][7]. Recent studies have shown that multispectral face recognition offers several advantages, such as invariance to illumination changes [8,9]. Multispectral images also reveal anatomical information of a subject [1]. Socolinsky and Selinger [3,10] developed different recognition algorithms on visible and thermal infrared face image databases and obtained good performance. Chen et al. [11] tested the effect of illumination, facial expression, and passage of time between the training and testing images. Wang et al. [12] showed that color space combination represents a viable approach for improving face recognition performance. The image-based fusion designed in the wavelet domain and the feature-based fusion developed in the eigenspace domain were shown in [13]. Heo et al. [14] proposed to fuse visual and thermal images for robust face recognition. Multisensory biometric fusion algorithms were investigated for personal identification [15]. Pan et al. [4,5] analyzed the facial tissue spectral measurements in the near-infrared spectral range (0.7 μm-1.0 μm) for face recognition. Denes et al. [6] tested the spectral asymmetry with three visible bands (0.6 μm, 0.7 μm, and 0.8 μm). Chang et al. [16] fused the multispectral images in the visible spectrum (0.4 μm-0.72 μm) into a single image to enhance face recognition accuracies. Chou and Bajcsy [7] preprocessed the multispectral images (visible: 0.4 μm-0.72 μm and near-infrared: 0.65 μm-1.1 μm) by principal component analysis (PCA) to perform face detection. Based on visible images, Wong and Zhao [17] adopted kernel PCA to remove eyeglasses of thermal face images.
The above algorithms are mainly developed to preserve the global structure information of the multispectral data. They do not explicitly treat the manifold structure of the data. However, research results on manifold learning algorithms presented in the past decade demonstrate that the local geometric structure is more important than the global structure, since high-dimensional data often lie on a low-dimensional manifold [18]. Owing to the low-dimensional manifold structure of face images, manifold-learning-based linear dimension reduction algorithms [4-9, 12-14, 19-24] have become popular.
Since these linear feature extraction algorithms cannot deal with high-order tensor data, some of them were further extended to multilinear cases, and many tensor-based manifold learning algorithms were proposed by using higher-order tensor decomposition [25][26][27]. Within the past ten years, there has been great interest in high-order tensor feature extraction, and tensor-based methods have been popular in computer vision and pattern recognition [28][29][30][31]. For example, Igarashi et al. proposed tensor subspace analysis (TSA) [32] for second-order learning. Dai and Yeung proposed tensor NPE (TNPE) [19]. Recently, orthogonal tensor neighborhood preserving embedding (OTNPE) was proposed for facial expression recognition. Some variations were also proposed for gait recognition, action recognition, and so forth. For more details, please see the latest survey of multilinear subspace learning [1][2][3].
Recent research demonstrates that high-order tensor-based manifold learning algorithms, such as tensor neighborhood preserving embedding (tensor NPE) [20], can obtain better performance than classical feature extraction algorithms on tensor data sets. Unfortunately, tensor data contain large quantities of redundant information, and thus not all the features/variables are important to feature extraction and classification [21][22][23]. It was shown that integrating sparse representation and manifold learning for feature extraction may obtain better performance [24]. It has also been shown that sparse representation methods can obtain better performance than their corresponding nonsparse methods on real data, and that these sparse methods can give an intuitive or semantic interpretation of the transformed features [25].
Until now, sparse embedding of high-order tensor data has not been widely investigated, and how to extend manifold learning algorithms to integrate sparseness and manifold structure for multispectral face recognition remains unsolved. In this paper, motivated by tensor data embedding and sparse representation, we propose a novel method called robust tensor preserving projection (RTPP) for multispectral image feature extraction. The multispectral image is considered a third-order tensor. The aim of RTPP is to obtain transformation matrices through preserving the sparse information of the third-order tensors.
The rest of the paper is organized as follows. In Section 2, we give the definitions related to tensors. Section 3 introduces tensor locality preserving projection. In Section 4, a novel sparse tensor embedding method is presented. Experiments are carried out to evaluate the proposed tensor learning method in Section 5, and conclusions are given in Section 6.
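Definition 2. The $f$-mode flattening of the $n$th-order tensor $\mathcal{A} \in \mathbb{R}^{m_1 \times m_2 \times \cdots \times m_n}$ into a matrix $A^{(f)} \in \mathbb{R}^{m_f \times \prod_{i \neq f} m_i}$, denoted $A^{(f)} \Leftarrow_f \mathcal{A}$, is defined by

$$A^{(f)}_{i_f,\, j} = \mathcal{A}_{i_1, i_2, \ldots, i_n}, \qquad j = 1 + \sum_{l=1,\, l \neq f}^{n} (i_l - 1) \prod_{o=l+1,\, o \neq f}^{n} m_o. \tag{1}$$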
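As a concrete illustration of mode flattening, the following is a minimal NumPy sketch, not the authors' implementation. The function name `mode_flatten`, the 0-based `mode` argument, and the row-major ordering of the remaining indices along the columns are assumptions made for this example.

```python
import numpy as np


def mode_flatten(tensor, mode):
    """Unfold `tensor` along axis `mode` (0-based) into a 2-D matrix.

    Rows are indexed by the chosen mode; the remaining indices are
    enumerated in row-major order along the columns (assumed convention).
    """
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


# A toy 28 x 24 x 2 array standing in for one two-band face sample.
sample = np.arange(28 * 24 * 2, dtype=float).reshape(28, 24, 2)
for mode in range(sample.ndim):
    print(mode, mode_flatten(sample, mode).shape)
```

Flattening only rearranges entries, so the Frobenius norm of the matrix equals that of the tensor for every mode.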

Tensor Neighborhood Preserving Embedding
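Let $\mathcal{A}_1, \mathcal{A}_2, \ldots, \mathcal{A}_N$ be the multispectral face images in high-order tensor form, with $\mathcal{A}_i \in \mathbb{R}^{m_1 \times m_2 \times \cdots \times m_n}$ sampled from a manifold $\mathcal{M}$. Tensor NPE seeks transformation matrices $U_f \in \mathbb{R}^{m_f \times l_f}$ ($f = 1, 2, \ldots, n$) that map each sample to $\mathcal{B}_i = \mathcal{A}_i \times_1 U_1 \times_2 U_2 \cdots \times_n U_n$. We construct a neighborhood graph to represent the intrinsic geometric structure of $\mathcal{M}$ and apply the heat kernel to define the affinity matrix $S = [s_{ij}]_{N \times N}$ as

$$s_{ij} = \begin{cases} \exp\left(-\|\mathcal{A}_i - \mathcal{A}_j\|_F^2 / t\right), & \text{if } \mathcal{A}_j \in O(K, \mathcal{A}_i), \\ 0, & \text{otherwise}, \end{cases}$$

where $O(K, \mathcal{A}_i)$ denotes the set of $K$ nearest neighbors of $\mathcal{A}_i$ and $t$ is a positive constant. The affinity matrix $S$ is then normalized such that each row sums to one. In order to preserve the geometric structure explicitly, we define the following objective function based on the Frobenius norm of a tensor:

$$Q(U_1, \ldots, U_n) = \sum_{i=1}^{N} \Big\| \mathcal{B}_i - \sum_{j=1}^{N} s_{ij} \mathcal{B}_j \Big\|_F^2.$$

To eliminate an arbitrary scaling factor in the projection matrices, we impose the following constraint:

$$\sum_{i=1}^{N} \|\mathcal{B}_i\|_F^2 = 1.$$

Then the optimization problem for tensor NPE can be expressed as

$$\arg\min_{U_1, \ldots, U_n} \sum_{i=1}^{N} \Big\| \mathcal{B}_i - \sum_{j=1}^{N} s_{ij} \mathcal{B}_j \Big\|_F^2 \quad \text{s.t.} \quad \sum_{i=1}^{N} \|\mathcal{B}_i\|_F^2 = 1. \tag{4}$$

Note that this optimization problem is a high-order nonlinear programming problem with a high-order nonlinear constraint, making direct computation of the projection matrices infeasible. In general, this type of problem can be solved approximately by employing an iterative scheme which was proposed for low-rank approximation. The optimization problem in (4) can be solved by such an iterative scheme. Assuming that $U_1, U_2, \ldots, U_{f-1}, U_{f+1}, \ldots, U_n$ are known, let $\mathcal{B}_i^f = \mathcal{A}_i \times_1 U_1 \cdots \times_{f-1} U_{f-1} \times_{f+1} U_{f+1} \cdots \times_n U_n$ and $B_i^{(f)} \Leftarrow_f \mathcal{B}_i^f$. Based on the properties of tensor and trace, namely $\|\mathcal{B}\|_F^2 = \operatorname{tr}(B^{(f)} B^{(f)T})$ and the fact that the mode-$f$ flattening of $\mathcal{B} \times_f U_f$ is $U_f^T B^{(f)}$, we rewrite the optimization function and the constraint in (4) as follows:

$$Q(U_f) = \operatorname{tr}\Big\{ U_f^T \Big[ \sum_{i=1}^{N} \Big( B_i^{(f)} - \sum_{j=1}^{N} s_{ij} B_j^{(f)} \Big) \Big( B_i^{(f)} - \sum_{j=1}^{N} s_{ij} B_j^{(f)} \Big)^T \Big] U_f \Big\},$$

$$\operatorname{tr}\Big\{ U_f^T \Big[ \sum_{i=1}^{N} B_i^{(f)} B_i^{(f)T} \Big] U_f \Big\} = 1.$$

Thus, the optimization problem in (4) can be reformulated as

$$\arg\min_{U_f} \operatorname{tr}\Big\{ U_f^T \Big[ \sum_{i=1}^{N} \Big( B_i^{(f)} - \sum_{j=1}^{N} s_{ij} B_j^{(f)} \Big) \Big( B_i^{(f)} - \sum_{j=1}^{N} s_{ij} B_j^{(f)} \Big)^T \Big] U_f \Big\} \quad \text{s.t.} \quad \operatorname{tr}\Big\{ U_f^T \Big[ \sum_{i=1}^{N} B_i^{(f)} B_i^{(f)T} \Big] U_f \Big\} = 1.$$

The unknown transformation matrix $U_f$ can be obtained by solving for the eigenvectors corresponding to the $l_f$ smallest eigenvalues in the generalized eigenvalue equation

$$\Big[ \sum_{i=1}^{N} \Big( B_i^{(f)} - \sum_{j=1}^{N} s_{ij} B_j^{(f)} \Big) \Big( B_i^{(f)} - \sum_{j=1}^{N} s_{ij} B_j^{(f)} \Big)^T \Big] u = \lambda \Big[ \sum_{i=1}^{N} B_i^{(f)} B_i^{(f)T} \Big] u.$$

The other transformation matrices can be obtained in a similar manner.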
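The two computational steps of tensor NPE described above, building a row-normalized nearest-neighbor heat-kernel affinity matrix and updating one projection matrix through a generalized eigenvalue problem, can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: all function names, the small `ridge` term that keeps the constraint matrix positive definite, and the use of untransformed samples as the starting point of the iteration are choices made for this example.

```python
import numpy as np


def heat_kernel_affinity(samples, n_neighbors, t):
    """Row-normalized K-nearest-neighbor heat-kernel affinity matrix."""
    flat = samples.reshape(len(samples), -1)
    sq = np.sum(flat ** 2, axis=1)
    # Squared Frobenius distances between all pairs of tensor samples.
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * flat @ flat.T, 0.0)
    np.fill_diagonal(dist2, np.inf)  # a sample is not its own neighbor
    S = np.zeros_like(dist2)
    for i, row in enumerate(dist2):
        nn = np.argsort(row)[:n_neighbors]
        S[i, nn] = np.exp(-row[nn] / t)
    return S / S.sum(axis=1, keepdims=True)


def smallest_generalized_eigvecs(A, B, n_components):
    """Eigenvectors of A u = lambda B u for the smallest eigenvalues.

    A is symmetric and B is symmetric positive definite.
    """
    L = np.linalg.cholesky(B)
    C = np.linalg.solve(L, np.linalg.solve(L, A).T).T  # L^{-1} A L^{-T}
    C = (C + C.T) / 2.0
    _, vecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    return np.linalg.solve(L.T, vecs[:, :n_components])


def tnpe_mode_update(unfolded, S, n_components, ridge=1e-8):
    """One mode update; `unfolded` has shape (N, m_f, p)."""
    resid = unfolded - np.einsum("ij,jmp->imp", S, unfolded)
    A = np.einsum("imp,inp->mn", resid, resid)
    B = np.einsum("imp,inp->mn", unfolded, unfolded)
    B += ridge * np.trace(B) / len(B) * np.eye(len(B))  # assumed regularizer
    return smallest_generalized_eigvecs(A, B, n_components)


rng = np.random.default_rng(0)
samples = rng.standard_normal((12, 6, 5, 2))  # 12 toy third-order samples
S = heat_kernel_affinity(samples, n_neighbors=3, t=100.0)
U = tnpe_mode_update(samples.reshape(12, 6, -1), S, n_components=3)
print(S.shape, U.shape)
```

In a full iteration the `unfolded` array would be recomputed from the samples projected by the other current mode matrices before each update, and the modes would be cycled until the matrices stop changing.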

Tensor Locality Preserving Projection.
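Different from tensor NPE, the optimization problem for tensor LPP can be expressed as

$$\arg\min_{U_1, \ldots, U_n} \sum_{i,j} s_{ij} \|\mathcal{B}_i - \mathcal{B}_j\|_F^2 \quad \text{s.t.} \quad \sum_{i} d_{ii} \|\mathcal{B}_i\|_F^2 = 1,$$

where $\mathcal{B}_i$ is the embedded tensor of the sample $\mathcal{A}_i$ and $s_{ij}$ is the affinity between $\mathcal{A}_i$ and $\mathcal{A}_j$. In general, the larger the value of $d_{ii} = \sum_j s_{ij}$ is, the more important the tensor $\mathcal{B}_i$ is in the embedded tensor space for representing the original tensor $\mathcal{A}_i$. It is easy to see that the objective function will give a high penalty if neighboring tensors $\mathcal{A}_i$ and $\mathcal{A}_j$ are mapped far apart. Thus if two tensors $\mathcal{A}_i$ and $\mathcal{A}_j$ are close to each other, then the corresponding tensors $\mathcal{B}_i$ and $\mathcal{B}_j$ in the embedded tensor space are also expected to be close to each other. With $B_i^{(f)}$ denoting the mode-$f$ flattening of the sample projected by all transformation matrices except $U_f$, the optimization function of tensor LPP can be formulated as follows:

$$\arg\min_{U_f} \operatorname{tr}\Big\{ U_f^T \Big[ \sum_{i,j} s_{ij} \big( B_i^{(f)} - B_j^{(f)} \big) \big( B_i^{(f)} - B_j^{(f)} \big)^T \Big] U_f \Big\} \quad \text{s.t.} \quad \operatorname{tr}\Big\{ U_f^T \Big[ \sum_{i} d_{ii} B_i^{(f)} B_i^{(f)T} \Big] U_f \Big\} = 1.$$

Moreover, the transformation matrix $U_f$ can be computed by solving for the eigenvectors corresponding to the $l_f$ smallest eigenvalues in the generalized eigenvalue equation

$$\Big[ \sum_{i,j} s_{ij} \big( B_i^{(f)} - B_j^{(f)} \big) \big( B_i^{(f)} - B_j^{(f)} \big)^T \Big] u = \lambda \Big[ \sum_{i} d_{ii} B_i^{(f)} B_i^{(f)T} \Big] u.$$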
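The penalty behavior of the tensor LPP objective can be checked on a toy example. The sketch below is only illustrative: the function names `tlpp_objective` and `degrees`, the three one-element "tensors", and the hand-written affinity matrix are assumptions for this example rather than part of the method's published description.

```python
import numpy as np


def tlpp_objective(embedded, S):
    """Weighted sum over pairs of squared Frobenius distances.

    `embedded` has shape (N, ...) and S[i, j] weights the pair (i, j).
    """
    flat = embedded.reshape(len(embedded), -1)
    sq = np.sum(flat ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * flat @ flat.T
    return float(np.sum(S * dist2))


def degrees(S):
    """Row sums of the affinity matrix (the importance of each sample)."""
    return S.sum(axis=1)


# Samples 0 and 1 are neighbors; sample 2 has no neighbors.
S = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
near = np.array([[[0.0]], [[0.1]], [[5.0]]])  # neighbors mapped close together
far = np.array([[[0.0]], [[3.0]], [[5.0]]])   # neighbors mapped far apart
print(tlpp_objective(near, S), tlpp_objective(far, S), degrees(S))
```

The embedding that keeps the two neighbors close yields a much smaller objective value, and the isolated third sample contributes nothing regardless of where it is mapped.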

Robust Tensor Preserving Projection
Sparse representation algorithms have been widely studied in signal processing, computer vision, and pattern recognition. Wright et al. [25] used sparse representation for robust face reconstruction and recognition, Qiao et al. [26] proposed sparse preserving projections, and Cheng et al. [27] used the $\ell_1$ graph for image clustering. As demonstrated in [26,27], the graphs constructed by the $\ell_1$ norm have the advantages of greater robustness to noise and information redundancy.
In the following, we fuse the sparse representation with tensor feature extraction.
In this paper, we also use the sparse representation classifier (SRC) for RTPP. SRC assigns the test sample to the class with the least within-class reconstruction error. For more details of SRC, please refer to [25].
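The SRC decision rule, choosing the class whose training samples best reconstruct the test sample, can be sketched as follows. This is a simplified illustration and not the implementation of [25]: the function name `src_predict` is hypothetical, and the coefficient vector is assumed to have been computed beforehand by some sparse solver (here it is written by hand).

```python
import numpy as np


def src_predict(coeffs, train_feats, train_labels, test_feat):
    """Return the label with the smallest class-wise reconstruction error.

    train_feats: (d, N) matrix whose columns are training feature vectors.
    coeffs:      (N,) representation coefficients of test_feat over them.
    """
    best_label, best_err = None, np.inf
    for label in np.unique(train_labels):
        mask = train_labels == label
        # Reconstruct using only the coefficients that belong to this class.
        recon = train_feats[:, mask] @ coeffs[mask]
        err = np.linalg.norm(test_feat - recon)
        if err < best_err:
            best_label, best_err = label, err
    return best_label


train_feats = np.array([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 1.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 1.0]])
train_labels = np.array([0, 0, 1, 1])
test_feat = 0.7 * train_feats[:, 0] + 0.3 * train_feats[:, 1]
coeffs = np.array([0.7, 0.3, 0.0, 0.0])  # hand-written sparse code
print(src_predict(coeffs, train_feats, train_labels, test_feat))
```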
Here we obtain the weights in a similar way as in LLE, except for the constraints $s_{ij} \geq 0$ ($j = 1, 2, \ldots, N$). The nonnegative constraints lead to a parts-based representation because they allow only additive, not subtractive, combinations. Previous studies have shown that there is psychological and physiological evidence for parts-based representation in the human brain [11,13,17]. The sum-to-one constraint $\sum_{j=1}^{N} s_{ij} = 1$ is used to make the weights invariant to translation.
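To make the effect of the nonnegativity and sum-to-one constraints concrete, the sketch below computes reconstruction weights restricted to the probability simplex. It is a stand-in, not the solver used in the experiments: it uses plain projected gradient descent on a least-squares reconstruction error, and the function names, iteration count, and toy data are assumptions for this example.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    keep = u - (css - 1.0) / k > 0
    rho = k[keep][-1]
    theta = (css[keep][-1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)


def simplex_weights(x, D, n_iter=500):
    """Minimize ||x - D w||^2 subject to w >= 0 and sum(w) = 1."""
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant
    w = np.full(D.shape[1], 1.0 / D.shape[1])       # start from uniform weights
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ w - x)
        w = project_to_simplex(w - step * grad)
    return w


rng = np.random.default_rng(0)
D = rng.standard_normal((20, 8))       # columns: the other (vectorized) samples
x = 0.6 * D[:, 1] + 0.4 * D[:, 4]      # sample built from two of the columns
w = simplex_weights(x, D)
print(np.round(w, 3))
```

Because only additive combinations are allowed and the weights must sum to one, many entries of the solution are driven exactly to zero, which is the sparsity the text relies on.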
Discriminant information can be naturally preserved in the weights, even if no class information is available. In face recognition, one particularly simple but reasonable assumption is that the samples from the same class lie on a linear subspace. In other words, the nonzero weights mostly correspond to the samples from the same class, which implies that the nonzero weights may help distinguish that class from the others. Therefore, the weights tend to include potential discriminant information.
In the design of the proposed RTPP, we use the sparse weights $S = [s_{ij}]_{N \times N}$ instead of the similarities used in tensor NPE. One advantage of the proposed technique is that it avoids the difficulty, present in tensor NPE, of selecting the size $K$ of the local neighborhood. Moreover, the similarities can give an intuitive or semantic interpretation of the represented tensor data. Another advantage is that sparse representation has potential discriminative ability, since most nonzero sparse representation coefficients are located on the samples in the same class as the represented sample.

Algorithm of Robust Tensor Preserving Projection.
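Let $B^{(f)} = [B_1^{(f)}, B_2^{(f)}, \ldots, B_N^{(f)}]$, where each $B_i^{(f)}$ has $p$ columns, let $e_i$ be an $N$-dimensional unit vector with the $i$th element 1 and 0 otherwise, and let $S_{i,:}$ denote the $i$th row vector of $S$. With simple formulation, we can get

$$\sum_{i=1}^{N} \Big( B_i^{(f)} - \sum_{j=1}^{N} s_{ij} B_j^{(f)} \Big) \Big( B_i^{(f)} - \sum_{j=1}^{N} s_{ij} B_j^{(f)} \Big)^T = B^{(f)} \Big[ \Big( \sum_{i=1}^{N} (e_i - S_{i,:}^T)(e_i - S_{i,:}^T)^T \Big) \otimes I_p \Big] B^{(f)T} = B^{(f)} \big[ (I - S)^T (I - S) \otimes I_p \big] B^{(f)T},$$

where $I$ is the $N \times N$ identity matrix, $I_p$ is the $p \times p$ identity matrix, and $\otimes$ denotes the Kronecker product. We can also obtain

$$\sum_{i=1}^{N} B_i^{(f)} B_i^{(f)T} = B^{(f)} B^{(f)T}.$$

Then the optimization problem in (14) can be rewritten as

$$\arg\min_{U_f} \operatorname{tr}\Big\{ U_f^T B^{(f)} \big[ (I - S)^T (I - S) \otimes I_p \big] B^{(f)T} U_f \Big\} \quad \text{s.t.} \quad \operatorname{tr}\Big\{ U_f^T B^{(f)} B^{(f)T} U_f \Big\} = 1.$$

Then the transformation matrix $U_f$ can be obtained by solving for the eigenvectors corresponding to the $l_f$ smallest eigenvalues in the generalized eigenvalue equation

$$B^{(f)} \big[ (I - S)^T (I - S) \otimes I_p \big] B^{(f)T} u = \lambda\, B^{(f)} B^{(f)T} u.$$

The other transformation matrices can be obtained in a similar manner.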
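The compact matrix form of the reconstruction scatter can be verified numerically. The sketch below assumes one way of writing it: the per-sample unfoldings are concatenated side by side, and the sum over samples of reconstruction residual outer products equals the concatenated matrix times the Kronecker product of (I - S)^T (I - S) with a small identity, times its transpose. The function names and the random toy data are assumptions for this check.

```python
import numpy as np


def scatter_direct(unfolded, S):
    """Sum over samples of (B_i - sum_j s_ij B_j)(B_i - sum_j s_ij B_j)^T."""
    resid = unfolded - np.einsum("ij,jmp->imp", S, unfolded)
    return np.einsum("imp,inp->mn", resid, resid)


def scatter_compact(unfolded, S):
    """Same quantity via the concatenated matrix [B_1, ..., B_N]."""
    n_samples, _, p = unfolded.shape
    B_cat = np.concatenate(list(unfolded), axis=1)  # shape (m, N * p)
    I = np.eye(n_samples)
    M = (I - S).T @ (I - S)
    return B_cat @ np.kron(M, np.eye(p)) @ B_cat.T


rng = np.random.default_rng(0)
unfolded = rng.standard_normal((6, 4, 3))  # N = 6 unfoldings of size 4 x 3
S = rng.random((6, 6))
np.fill_diagonal(S, 0.0)
S /= S.sum(axis=1, keepdims=True)          # rows sum to one
print(np.allclose(scatter_direct(unfolded, S), scatter_compact(unfolded, S)))
```

The compact form is convenient on paper; in practice the direct form avoids building the Kronecker product, whose size grows with both the number of samples and the number of columns.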

Experimental Results
For the purpose of evaluating the performance of RTPP, we used the face verification rate as the criterion. The FERET Verification Testing Protocol [28] recommends using receiver operating characteristic (ROC) curves to depict the relation between the face verification rate (FVR) and the false accept rate (FAR). The ROC curves were plotted by using the Statistical Learning Toolbox according to the obtained score matrix. For tensor operations, we used the tensor toolbox developed by Bader and Kolda in MATLAB [29]. The sparse representations were obtained using the method of Friedman et al. [30]. In the following experiments, we set $\lambda = 0.1$ for both data sets.
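How an ROC curve follows from a score matrix can be sketched in a few lines of NumPy. This is a generic illustration, not the Statistical Learning Toolbox routine used in the experiments: the function name `roc_from_scores`, the convention that higher scores mean greater similarity, and the random toy scores are assumptions.

```python
import numpy as np


def roc_from_scores(scores, test_labels, gallery_labels):
    """Return (FAR, FVR) arrays swept over all score thresholds.

    scores[i, j] is the similarity between test sample i and gallery
    sample j (assumed convention: higher means more similar).
    """
    same = test_labels[:, None] == gallery_labels[None, :]
    genuine = scores[same]      # pairs from the same individual
    impostor = scores[~same]    # pairs from different individuals
    thresholds = np.unique(scores)[::-1]  # from strict to lenient
    far = np.array([(impostor >= th).mean() for th in thresholds])
    fvr = np.array([(genuine >= th).mean() for th in thresholds])
    return far, fvr


rng = np.random.default_rng(0)
scores = rng.random((6, 4))
test_labels = np.array([0, 0, 1, 1, 2, 2])
gallery_labels = np.array([0, 1, 2, 0])
far, fvr = roc_from_scores(scores, test_labels, gallery_labels)
print(far[-1], fvr[-1])
```

Plotting `fvr` against `far` gives the ROC curve; a better method reaches a high verification rate at a lower false accept rate.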

Experiments on Equinox Data Set.
The National Institute of Standards and Technology and Equinox Corporation have developed a database (http://www.equinoxsensors.com/products/HID.html) of face images using registered broadband-visible/IR camera sensors for experimentation and performance evaluations [10]. Since the registration of the thermal images and the corresponding visible images is performed by the camera sensors, we did not need to carry out this procedure in our experiments. We used the long-wave infrared (LWIR) (i.e., 8 μm-12 μm) and the corresponding visible spectrum images from this database. The data were collected during a two-day period. Each pair of LWIR and visible light images was taken simultaneously and coregistered with 1/3 pixel accuracy. The LWIR images were radiometrically calibrated and stored as grayscale images with 12 bits per pixel. The visible images were also grayscale images represented with 8 bits per pixel [10].
The database contains frontal faces under the following scenarios: (1) three different light directions: frontal and lateral (right and left); (2) three facial expressions: frown, surprise, and smile; (3) vowel pronunciation expressions: subjects were asked to pronounce several vowels, from which three representative frames were chosen; and (4) presence of glasses: for subjects wearing glasses, all of the above scenarios were repeated with and without glasses.
In our experiments, 1320 images (660 thermal images and 660 corresponding registered visible images) were used. These images belonged to 33 individuals. For each individual, we had 20 thermal images and 20 corresponding visible images. The original 12-bit gray level thermal images were converted into 8 bits. All images (including thermal images and visible images) were cropped to remove the background, aligned, and then normalized to a resolution of 28 × 24. The goal of the preprocessing was to remove the background and scale the faces.
Figure 1 shows sample images of one person in the Equinox data set.
For any thermal image and its corresponding visible image, the tensor sample was represented in the size of 28 × 24 × 2 pixels. In the experiments, 10 tensor samples (10 thermal images and 10 corresponding visible images) of each individual were randomly selected and used as the training set and the remaining 10 tensor samples as the test set. The experiments were independently performed 20 times and the average results were calculated.
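The data layout and the random per-individual split described above can be sketched as follows. This is only an illustration of the protocol: the function names, the choice of placing the thermal band first, and the random placeholder pixel values are assumptions, and no real face data are involved.

```python
import numpy as np


def make_tensor_samples(thermal, visible):
    """Stack co-registered (n, 28, 24) image sets into (n, 28, 24, 2)."""
    return np.stack([thermal, visible], axis=-1)


def random_split(labels, n_train_per_class, rng):
    """Randomly pick n_train_per_class samples per individual for training."""
    train_idx, test_idx = [], []
    for label in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == label))
        train_idx.extend(idx[:n_train_per_class])
        test_idx.extend(idx[n_train_per_class:])
    return np.array(train_idx), np.array(test_idx)


rng = np.random.default_rng(0)
labels = np.repeat(np.arange(33), 20)   # 33 individuals, 20 image pairs each
thermal = rng.random((660, 28, 24))     # placeholder pixel values
visible = rng.random((660, 28, 24))
tensors = make_tensor_samples(thermal, visible)
train_idx, test_idx = random_split(labels, n_train_per_class=10, rng=rng)
print(tensors.shape, len(train_idx), len(test_idx))
```

Repeating the split with fresh random draws and averaging the resulting verification rates mirrors the 20 independent runs reported in the text.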
For our proposed RTPP algorithm and the tensor NPE algorithm, two settings of the reduced dimensions $l_1 \times l_2 \times l_3$ of the extracted features were used: 14 × 12 × 1 and 10 × 8 × 1. For the NPE algorithms performed on the IR feature, the visible feature, and the serial combined feature, the corresponding reduced dimensions were $d = 168$ and $d = 80$, respectively. For the tensor NPE algorithm, we performed experiments to obtain the best parameter $K$ (the number of nearest neighbors) for the Equinox data set. Figures 2 and 3 show the ROC curves of the proposed RTPP algorithm and the tensor NPE algorithm using different values of $K$ ($K = 5, 10, 15, 20, 25, 30, 35$). From the ROC curves, we can see that the best performance is obtained when $K = 5$ (for both $d = 168$ and $d = 80$). The performance of the proposed RTPP algorithm is much better than that of the tensor NPE algorithm, no matter which $K$ is selected.
The ROC curves of the different methods are shown in Figures 4 and 5. The NPE algorithm was also separately performed on visible data and thermal infrared data. In the NPE algorithm, we set $K = 5$. The results indicate that the proposed RTPP algorithm performs better than the other algorithms.

Experiments on DHUFO Data Set.
DHUFO is a database of face images using registered visible/IR camera sensors for experimentation and performance evaluations. The data set was designed by the researchers. In our experiments, the long-wave infrared (LWIR) (i.e., 8 μm-12 μm) sensor was used. The registration of the thermal images and the corresponding visible images was performed by the camera sensors. Face image variations in the DHUFO database included illumination, facial expression, and glasses. In our experiments, 1020 images, which involved variations in illumination and facial expressions, were selected. We manually cropped the face portion of the images. These images belonged to 17 individuals. For each individual, there were 30 thermal images and 30 corresponding visible images. All images (including thermal images and visible images) were cropped to remove the background, aligned, and then normalized to a resolution of 28 × 24.
For any thermal image and its corresponding visible image, the tensor sample was represented in the size of 28 × 24 × 2 pixels. In the experiments, 15 tensor samples (15 thermal images and 15 corresponding visible images) of each individual were randomly selected and used as the training set and the remaining 15 tensor samples as the test set. The experiments were independently performed 20 times and the average results were calculated.
For our proposed RTPP algorithm, the tensor NPE algorithm, and the tensor LPP algorithm, two settings of the reduced dimensions $l_1 \times l_2 \times l_3$ of the extracted features were used: 14 × 12 × 1 and 10 × 8 × 1. For both NPE and LPP performed on the serial combined feature, the corresponding reduced dimensions were $d = 168$ and $d = 80$, respectively. The ROC curves of the different methods are shown in Figures 6 and 7. In both the NPE and LPP algorithms, we set $K = 5$. The results indicate that the proposed RTPP algorithm performs better than the other algorithms.
(1) Using the $\ell_1$ norm for sparse tensor learning is a better way than using local information reconstruction.
(2) RTPP does not introduce the local neighborhood parameter $K$, which is an essential difference from tensor NPE and tensor LPP.
In RTPP, the $\ell_1$ norm yields sparse reconstruction coefficients; thus the advantages of robustness to data distortion and the potential discriminative ability demonstrated in [25][26][27] are encoded in the representation coefficients, which are preserved in the low-dimensional subspace. These are the essential reasons for RTPP to achieve good performance.
(3) Since RTPP preserves the spatial structure of the original multispectral face images well, RTPP outperforms serial combined feature extraction algorithms.

Conclusion
We have proposed in this paper a novel tensor learning algorithm, called robust tensor preserving projection (RTPP), for multispectral face recognition. The RTPP algorithm incorporates a tensor manifold criterion to learn multiple subspaces in high-order tensor space by preserving the sparse representation information of the multispectral images. RTPP can not only keep the underlying spatial structure of multispectral images but also enhance robustness. Experimental results on the Equinox and DHUFO databases show that RTPP performs better than the related algorithms.
Since RTPP is an unsupervised learning algorithm, one direction of our future work is supervised tensor learning. We also plan to enforce sparsity on the projection matrix/vector and investigate sparse projection learning methods for tensor recognition.

Figure 1: Sample images of one individual from Equinox data set.