A Gabor-Block-Based Kernel Discriminative Common Vector Approach Using Cosine Kernels for Human Face Recognition

In this paper a nonlinear Gabor Wavelet Transform (GWT) discriminant feature extraction approach for enhanced face recognition is proposed. First, the low-energized blocks of the Gabor wavelet transformed images are extracted. Second, nonlinear discriminating features are extracted from the selected low-energized blocks by the generalized Kernel Discriminative Common Vector (KDCV) method, which is extended here to use a cosine kernel function. The KDCV with the cosine kernel is applied to the extracted low-energized feature vectors to obtain the real component of a complex quantity for face recognition. To derive positive kernel discriminative vectors, only the kernel discriminative eigenvectors associated with nonzero eigenvalues are used. The low-energized Gabor-block-based generalized KDCV method with the cosine kernel has been tested for classification with the L1 and L2 distance measures and the cosine similarity measure, on both frontal and pose-angled face recognition. Experimental results on the FRAV2D and FERET databases demonstrate the effectiveness of this new approach.


Introduction
Face authentication has gained considerable attention in recent years owing to the increasing need for access verification systems that use modalities such as voice, face images, fingerprints, and PIN codes. Such systems are used to verify a user's identity on the Internet, when using automated banking systems, or when entering a secured building. The Gabor wavelet transformation (GWT) models the receptive field profiles of cortical simple cells well and also provides multiscale and multidirectional filtering; these properties accord with the characteristics of human vision [1][2][3]. Discriminant analysis, in turn, is an effective image feature extraction and recognition technique, as it extracts discriminative features, reduces dimensionality, and requires less computing time [4,5]. In our previous work [6], we combined the GWT and Bayesian principal component analysis (PCA) in a GWT-Bayesian PCA face recognition method that outperforms some conventional linear discriminating methods. As an extension of linear discriminant techniques, kernel-based nonlinear discriminant analysis is now widely applied in pattern recognition. Baudat and Anouar [7] developed the commonly used generalized discriminant analysis (GDA) method for nonlinear discrimination, and Jing et al. [8] put forward a Kernel Discriminative Common Vectors (KDCVs) method. In this paper we develop a block-based GWT KDCV and propose a block-based, low-energized, nonlinear GWT discriminant feature extraction for enhanced face recognition. The high-energized blocks of a GWT image generally have larger nonlinear discriminability values.
The nonlinear discriminant features are then extracted from the selected low-energized blocks of the GWT image by a new generalized KDCV method, extended to include a cosine kernel model, in order to obtain the best recognition result. These features are finally classified using three different classifiers. The experimental results demonstrate the effectiveness of this new approach.

Computational Intelligence and Neuroscience
In this paper a novel method is proposed that selects low-energized blocks of the Gabor wavelet responses as feature points containing discriminative facial feature information, instead of using predefined graph nodes as in elastic graph matching (EGM) [9], which reduce the representative capability of Gabor wavelets. This corresponds to enhancing the edges of the eyes, mouth, and nose, which are considered the most important points of a face; hence the algorithm allows these facial features to retain overall face information along with local characteristics.
The remainder of this paper is organized as follows. Section 2 describes the derivation of low-energized blocks from the GWT images. Section 3 details the generalized KDCV method with the cosine kernel function for enhanced face recognition. Section 4 evaluates the proposed method on datasets from the FERET [10] and FRAV2D [11] face databases and compares it with some previous KDCV methods. Section 5 concludes the paper.

2D Gabor Wavelets
Gabor wavelets are used in image analysis because of their biological relevance and computational properties [12,13]. The Gabor transform is suitable for analyzing gradually changing data such as face, iris, and eyebrow images. The Gabor filter used here has the following general form:
$$\varphi_{\mu,\nu}(z) = \frac{\|k_{\mu,\nu}\|^2}{\sigma^2}\, e^{-\|k_{\mu,\nu}\|^2 \|z\|^2 / (2\sigma^2)} \left[ e^{i k_{\mu,\nu} \cdot z} - e^{-\sigma^2/2} \right], \qquad (1)$$
where μ and ν define the orientation and scale of the Gabor kernels, respectively, z = (x, y) is the variable in the spatial domain, ‖·‖ denotes the norm operator, σ determines the ratio of the Gaussian window width to the wavelength, and $k_{\mu,\nu}$ is the frequency vector, which determines the scale and orientation of the Gabor kernels: $k_{\mu,\nu} = k_\nu e^{i\phi_\mu}$, where $k_\nu = k_{max}/f^\nu$, $k_{max} = \pi/2$, $\phi_\mu = \pi\mu/8$ with μ = 0, ..., 7, and f is the spacing factor. Gabor wavelets at five scales, ν ∈ {0, ..., 4}, and eight orientations, μ ∈ {0, ..., 7}, are chosen here.
The term $e^{-\sigma^2/2}$ is subtracted in (1) to make the kernel DC-free, so that it becomes insensitive to the absolute illumination level. The convolution outputs are denoted $O_{\mu,\nu}(z)$, and their magnitudes are used as features. The kernels exhibit strong spatial locality and orientation selectivity, which makes them a suitable choice for image feature extraction when the goal is to derive local and discriminating features for (face) classification.
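A minimal NumPy sketch of the kernel bank in (1) follows. The text fixes $k_{max} = \pi/2$, $\phi_\mu = \pi\mu/8$, five scales, and eight orientations; the values σ = 2π and f = √2 (common choices in the Gabor face recognition literature), the 31 × 31 sampling grid, and the function name `gabor_kernel` are assumptions made for illustration.

```python
import numpy as np

def gabor_kernel(mu, nu, size=31, sigma=2 * np.pi, k_max=np.pi / 2, f=np.sqrt(2)):
    """Complex Gabor kernel of eq. (1) sampled on a size x size grid.

    Assumed values: sigma = 2*pi, f = sqrt(2), size = 31.
    """
    k = k_max / f ** nu                          # k_nu = k_max / f^nu
    phi = np.pi * mu / 8                         # orientation angle phi_mu
    kx, ky = k * np.cos(phi), k * np.sin(phi)    # k_{mu,nu} as a 2-D vector
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k2 = k ** 2                                  # ||k_{mu,nu}||^2
    envelope = (k2 / sigma ** 2) * np.exp(-k2 * (x ** 2 + y ** 2) / (2 * sigma ** 2))
    # subtracting exp(-sigma^2/2) removes the DC response of the continuous kernel
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * carrier

# 8 orientations (mu) x 5 scales (nu) = 40 kernels
bank = {(mu, nu): gabor_kernel(mu, nu) for mu in range(8) for nu in range(5)}
```

Because the kernel is truncated to a finite grid, the DC-free property of (1) holds only approximately, most visibly at the coarsest scales, where the Gaussian envelope is widest.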

Gabor-Based Feature Representation.
The Gabor wavelet representation of an image is the convolution of the image with a family of Gabor kernels as defined in (1). Let I(x, y) be the gray level distribution of an image; the convolution output of image I and a Gabor kernel $\varphi_{\mu,\nu}$ is defined as
$$O_{\mu,\nu}(z) = I(z) * \varphi_{\mu,\nu}(z), \qquad (2)$$
where z = (x, y) and * denotes the convolution operator.
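The convolution defined above, together with Steps 1 and 2 of Algorithm 1 below (taking the modulus of each output and summing over all kernels), can be sketched with an FFT-based linear convolution. The helper names `conv2_same` and `gabor_magnitude_sum` are hypothetical, and cropping the full convolution to the input size is an assumption about how image borders are handled.

```python
import numpy as np

def conv2_same(image, kernel):
    """Linear 2-D convolution image * kernel, cropped to the input size."""
    H, W = image.shape
    kh, kw = kernel.shape
    shape = (H + kh - 1, W + kw - 1)             # zero-padding avoids circular wrap-around
    full = np.fft.ifft2(np.fft.fft2(image, s=shape) * np.fft.fft2(kernel, s=shape))
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + H, left:left + W]

def gabor_magnitude_sum(image, kernels):
    """Steps 1-2: sum over all kernels of the modulus |I * phi_{mu,nu}|."""
    image = np.asarray(image, dtype=float)
    total = np.zeros(image.shape)
    for kernel in kernels:
        total += np.abs(conv2_same(image, kernel))   # modulus discards the phase
    return total
```

Passing the 40 kernels of the bank to `gabor_magnitude_sum` yields the single image $I_{GF}$ on which the remaining steps of Algorithm 1 operate.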

Low-Energized Block Based GWT Feature Extraction.
Note that we consider the magnitude of $O_{\mu,\nu}(z)$ but do not use the phase, which is consistent with the application of Gabor representations in [14,15]. As the outputs $\{O_{\mu,\nu}(z) : \mu \in \{0, \ldots, 7\}, \nu \in \{0, \ldots, 4\}\}$ consist of 40 different local scale and orientation features, the dimensionality of the Gabor transformed image space is very high. Therefore, the following technique is applied to extract the low-energized discriminability feature vector χ from the convolution outputs. The method for extracting low-energized block-based features from the GWT image is given in Algorithm 1.

Algorithm 1.
Step 1. Compute the convolution outputs of the original image with all the Gabor kernels. Since the convolution outputs are complex-valued, replace each pixel value of a convolution output by its modulus, and denote the resulting image by $G_j$, where j = 1, 2, ..., k and k is the total number of Gabor kernels.
Step 2. Obtain the final single Gabor transformed image $I_{GF} = \sum_{j=1}^{k} G_j$.
Step 3. Compute the overall mean g of the final Gabor transformed image as $g = \frac{1}{m \times n} \sum_{x,y} I_{GF}(x, y)$, where m × n is the size of the image.
Step 4. Divide $I_{GF}$ into non-overlapping windows of size ω × ω, giving a total number of windows $l = \lfloor m/\omega \rfloor \times \lfloor n/\omega \rfloor$.
Step 5. For each window $w_i$, if $\min(w_i) \le g$, extract from $w_i$ a block $B_i$ of size c × c centred on the pixel at which the minimum of $w_i$ occurs. The value of c must be an odd integer less than ω/2.
Step 6. For each window $w_i$ with $\min(w_i) \le g$ for which the c × c block of Step 5 does not lie entirely within $w_i$, create the block $B_i$ of size c × c by setting the unavailable pixel values to g.
Step 7. For each window $w_i$ with $\min(w_i) > g$, create a pseudo block $B_i$ of size c × c with all elements equal to g.
Step 8. Extract the feature vector $f_i$ from each block $B_i$ in a fixed order, so that $f_i$ contains all elements of $B_i$.
Step 9. Concatenate the feature vectors $f_i$, i = 1, 2, ..., l, to obtain the final low-energized feature vector χ.
The extracted feature vector χ encompasses the low-valued discriminable elements of the Gabor transformed image. Its size is (total number of blocks) × (size of a block) = l × c², which is much lower than the dimension of the original image (m × n) and of the GWT image (m × n × 40). This augmented Gabor feature vector thus encompasses most of the discriminable feature elements of the Gabor wavelet representation set $S = \{O_{\mu,\nu}(z) : \mu \in \{0, \ldots, 7\}, \nu \in \{0, \ldots, 4\}\}$. The window size ω × ω is an important parameter of the algorithm: it must be small enough to capture most of the important features and large enough to avoid redundancy. Because some windows have a minimum value that is not less than the overall mean, Step 7 is applied so that the procedure does not get stuck on a local minimum.
In the experiments, windows of size 7 × 7 and blocks of size 3 × 3 were used to extract the low-energized feature vector. The extracted facial features can thus be compared locally, instead of through a general structure, which allows a decision to be made from the parts of the face.
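Steps 3 to 9 of Algorithm 1 can be sketched as follows, starting from the summed magnitude image $I_{GF}$. This is a hypothetical implementation under stated assumptions: windows are non-overlapping and partial windows at the image border are dropped (the floor in Step 4), ties for the minimum are broken by the first occurrence, and blocks are flattened row by row in Step 8. The name `low_energized_features` is illustrative.

```python
import numpy as np

def low_energized_features(I_GF, omega=7, c=3):
    """Steps 3-9 of Algorithm 1 applied to the summed Gabor magnitude image I_GF."""
    if c % 2 == 0 or c >= omega / 2:
        raise ValueError("c must be an odd integer smaller than omega/2")
    m, n = I_GF.shape
    g = I_GF.mean()                                  # Step 3: overall mean
    r = c // 2
    features = []
    # Step 4: non-overlapping omega x omega windows (border remainder dropped, assumed)
    for top in range(0, m - omega + 1, omega):
        for left in range(0, n - omega + 1, omega):
            w = I_GF[top:top + omega, left:left + omega]
            if w.min() <= g:
                # Steps 5-6: c x c block centred on the first minimum of the window;
                # pixels falling outside the window are filled with g
                i, j = np.unravel_index(np.argmin(w), w.shape)
                padded = np.pad(w, r, mode="constant", constant_values=g)
                block = padded[i:i + c, j:j + c]
            else:
                block = np.full((c, c), g)           # Step 7: pseudo block of means
            features.append(block.ravel())           # Step 8: row-wise order (assumed)
    return np.concatenate(features)                  # Step 9: feature vector chi
```

With ω = 7 and c = 3 as in the experiments, a 92 × 112 image gives 13 × 16 = 208 windows and, under these assumptions, a feature vector χ of length 208 × 9 = 1872.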

Generalized Kernel Discriminative Common Vector (KDCV) Method
Sometimes the discriminative common vectors are not distinct in the original sample space. In such cases the original sample space can be mapped to a higher-dimensional space F in which the new discriminative common vectors are distinct from one another. This is possible because a mapping $\Phi : \mathbb{R}^N \to F$, $x \mapsto \varphi(x)$, can map two vectors that are linearly dependent in the original sample space onto two vectors that are linearly independent in F. Since the mapped space can have arbitrarily large, possibly infinite, dimensionality, the number of training samples is typically much smaller than the dimension of F, so the within-class scatter matrix in F has a nonempty null space; this is the setting that the DCV method requires, and it is therefore reasonable to use the DCV method in F.
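The claim that a mapping can turn linearly dependent vectors into linearly independent ones can be checked through the Gram matrix, since mapped vectors are linearly independent exactly when their Gram matrix is nonsingular. The sketch below uses an inhomogeneous degree-2 polynomial kernel purely as an illustrative choice; it is not the kernel used in this paper, and the helper names are hypothetical.

```python
import numpy as np

def gram(vectors, kernel):
    """Gram matrix [kernel(a, b)] of a list of vectors."""
    return np.array([[kernel(a, b) for b in vectors] for a in vectors])

def linear(a, b):
    return float(a @ b)                  # inner product in the input space

def poly2(a, b):
    return float((1.0 + a @ b) ** 2)     # inhomogeneous degree-2 polynomial kernel (illustrative)

X = [np.array([1.0, 1.0]), np.array([2.0, 2.0])]   # second vector = 2 x first: linearly dependent

det_linear = np.linalg.det(gram(X, linear))  # 2*8 - 4*4 = 0: still dependent
det_poly = np.linalg.det(gram(X, poly2))     # 9*81 - 25*25 = 104: independent in F
```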
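Let $\Phi = [\varphi(x_1^1), \ldots, \varphi(x_{N_1}^1), \ldots, \varphi(x_1^c), \ldots, \varphi(x_{N_c}^c)]$ represent the matrix whose columns are the transformed training samples in F. Here c is the number of training classes, the ith class contains $N_i$ samples, and $M = \sum_{i=1}^{c} N_i$ is the total number of samples. The within-class scatter matrix $S_W^\Phi$, the between-class scatter matrix $S_B^\Phi$, and the total scatter matrix $S_T^\Phi$ in F are given by
$$S_W^\Phi = \sum_{i=1}^{c}\sum_{j=1}^{N_i} \left(\varphi(x_j^i) - \mu_i^\Phi\right)\left(\varphi(x_j^i) - \mu_i^\Phi\right)^T,$$
$$S_B^\Phi = \sum_{i=1}^{c} N_i \left(\mu_i^\Phi - \mu^\Phi\right)\left(\mu_i^\Phi - \mu^\Phi\right)^T,$$
$$S_T^\Phi = \sum_{i=1}^{c}\sum_{j=1}^{N_i} \left(\varphi(x_j^i) - \mu^\Phi\right)\left(\varphi(x_j^i) - \mu^\Phi\right)^T = S_W^\Phi + S_B^\Phi,$$
where $\mu^\Phi$ is the mean of all samples and $\mu_i^\Phi$ is the mean of the samples of the ith class in F.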
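As a numerical sanity check of these definitions, the sketch below uses random finite-dimensional columns as stand-ins for the mapped samples φ(x) (an assumption, since F may be infinite-dimensional). It confirms that $S_T^\Phi = S_W^\Phi + S_B^\Phi$ and that the within-class scatter can be written as $\Phi(I - G)\Phi^T$ with the block-diagonal matrix G of 1/N_i blocks defined in the next paragraph; this holds because I − G is symmetric and idempotent and subtracts each class mean from the samples of that class. The class sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [3, 2, 4]                            # N_i for c = 3 classes (arbitrary example)
M = sum(sizes)
Phi = rng.normal(size=(5, M))                # columns stand in for the mapped samples phi(x)
labels = np.repeat(np.arange(len(sizes)), sizes)

# G = diag(G_1, ..., G_c), each G_i an N_i x N_i block with all entries 1/N_i
G = np.zeros((M, M))
start = 0
for N_i in sizes:
    G[start:start + N_i, start:start + N_i] = 1.0 / N_i
    start += N_i

mu = Phi.mean(axis=1, keepdims=True)         # mean of all samples
S_W = np.zeros((5, 5))
S_B = np.zeros((5, 5))
for i, N_i in enumerate(sizes):
    X_i = Phi[:, labels == i]
    mu_i = X_i.mean(axis=1, keepdims=True)   # class mean
    S_W += (X_i - mu_i) @ (X_i - mu_i).T
    S_B += N_i * (mu_i - mu) @ (mu_i - mu).T
S_T = (Phi - mu) @ (Phi - mu).T

S_W_matrix = Phi @ (np.eye(M) - G) @ Phi.T  # matrix form of the within-class scatter
```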
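The scatter matrices can also be written in matrix form in terms of Φ and the following matrices: $G = \mathrm{diag}(G_1, G_2, \ldots, G_c) \in \mathbb{R}^{M \times M}$ is a block-diagonal matrix in which each $G_i \in \mathbb{R}^{N_i \times N_i}$ has all its elements equal to $1/N_i$; $U = \mathrm{diag}(u_1, u_2, \ldots, u_c)$ is a block-diagonal matrix in which each $u_i \in \mathbb{R}^{N_i \times 1}$ is a vector with all its elements equal to $1/N_i$; $L = \mathrm{diag}(l_1, l_2, \ldots, l_c) \in \mathbb{R}^{M \times c}$ is a block-diagonal matrix in which each $l_i \in \mathbb{R}^{M \times 1}$ is a vector with entries $N_i/M$; and $J_M \in \mathbb{R}^{M \times M}$ is a matrix with entries $1/\sqrt{M}$. The aim of the DCV algorithm is to find the optimal projection transform W in the null space of $S_W^\Phi$ [16]:
$$W_{opt} = \arg\max_{|W^T S_W^\Phi W| = 0} \left|W^T S_B^\Phi W\right| = \arg\max_{|W^T S_W^\Phi W| = 0} \left|W^T S_T^\Phi W\right|.$$
The optimal projection vectors are computed as follows.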
Step 10. Project the training set samples onto the range of $S_T^\Phi$.
Step 11. Find vectors V that span the null space of $S_W^\Phi$.
Step 12. Remove the null space of the projected between-class scatter matrix.
Step 13. Obtain the final projection matrix W, where Λ is the diagonal matrix of nonzero eigenvalues, U is the associated matrix of normalized eigenvectors, and V is the basis for the null space of $S_W^\Phi$; there are at most (c − 1) projection vectors.
Let the common vector be $\Phi(x_{com}^i)$; each feature vector can then be written as $\Phi(x_m^i) = \Phi(x_{com}^i) + \Phi(x_{diff}^{i,m})$, where $\Phi(x_{com}^i)$ and $\Phi(x_{diff}^{i,m})$ represent the common and different parts of $\Phi(x_m^i)$, respectively. Gülmezoglu et al. [17] proved that all samples of the ith class have the same common vector part. Thus a set of common vectors $Q = \{\Phi(x_{com}^1), \ldots, \Phi(x_{com}^c)\}$ is obtained, one per class. The optimal projection transform is then computed as follows. Let $S_{com}^\Phi$ denote the total scatter matrix of Q. $W_{com}$ is composed of the eigenvectors corresponding to the positive eigenvalues of $S_{com}^\Phi$, so that it maximizes the scatter of the projected common vectors. $W_{diff}$ is calculated from the different vectors: it is composed of the eigenvectors corresponding to the positive eigenvalues of $S_T^{\Phi,diff}$, the total scatter matrix of the different vectors. The optimal projection transform W is obtained by combining $W_{com}$ and $W_{diff}$. Thus, for each sample Φ(x) in the kernel space, the generalized nonlinear KDCV method constructs W and extracts the kernel discriminative common and different vectors $Y_{com}$ and $Y_{diff}$, which together form the new feature vector of Φ(x). This yields a new sample set Y corresponding to X, which is used for image classification. All mathematical properties of the linear DCV carry over to the kernel DCV method, with the modifications applied to the mapped samples. After feature extraction, all training samples of each class typically give rise to a single distinct discriminative common vector.
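A compact sketch of the common-vector part of kernel DCV is given below. It is an assumption-laden illustration rather than the authors' implementation: it uses the cosine kernel with a linear base kernel, realizes Step 10 through kernel PCA on the centred kernel matrix, finds the null space of the within-class scatter numerically with a relative tolerance, and computes only $W_{com}$ (the different-vector part $W_{diff}$ is omitted). The names `fit_kdcv` and `transform` are hypothetical.

```python
import numpy as np

def cosine_gram(A, B):
    """Cosine kernel on a linear base kernel: a^T b / (||a|| ||b||); rows are samples."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def fit_kdcv(X, y, tol=1e-10):
    """Common-vector part of kernel DCV (W_com only) for training rows X with labels y."""
    M = len(X)
    K = cosine_gram(X, X)
    C = np.eye(M) - np.full((M, M), 1.0 / M)      # centring matrix
    lam, alpha = np.linalg.eigh(C @ K @ C)
    keep = lam > tol * lam.max()                  # range of S_T: nonzero eigenvalues only
    lam, alpha = lam[keep], alpha[:, keep]
    Z = np.sqrt(lam)[:, None] * alpha.T           # training samples in that basis (r x M)
    classes = np.unique(y)
    S_W = np.zeros((len(lam), len(lam)))
    for c in classes:
        D = Z[:, y == c] - Z[:, y == c].mean(axis=1, keepdims=True)
        S_W += D @ D.T
    ev, vecs = np.linalg.eigh(S_W)
    V = vecs[:, ev < tol * max(ev.max(), 1.0)]    # basis of the null space of S_W
    # one common vector per class: every sample of a class projects to the same point
    commons = np.stack([V.T @ Z[:, y == c].mean(axis=1) for c in classes], axis=1)
    Dc = commons - commons.mean(axis=1, keepdims=True)
    ec, vc = np.linalg.eigh(Dc @ Dc.T)            # S_com: scatter of the common vectors
    W_com = vc[:, ec > tol * ec.max()]            # eigenvectors with positive eigenvalues
    return {"X": X, "K": K, "C": C, "alpha": alpha, "lam": lam, "P": V @ W_com}

def transform(model, X_new):
    """Kernel discriminative common-vector features Y_com for the rows of X_new."""
    k = cosine_gram(model["X"], X_new)                                # M x n kernel values
    k_c = model["C"] @ (k - model["K"].mean(axis=1, keepdims=True))  # centre in F
    Z = (model["alpha"].T @ k_c) / np.sqrt(model["lam"])[:, None]
    return (model["P"].T @ Z).T
```

On the training data every sample of a class maps to the same point, the discriminative common vector of that class. The null space of $S_W^\Phi$ is nonempty only if the kernel matrix has enough rank; with a linear base kernel this means, for example, more input dimensions than training samples.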

KDCV Approach Using Cosine Kernel Function.
Let $\chi_1, \chi_2, \ldots, \chi_n \in \mathbb{R}^N$ be the data in the input space, and let Φ be a nonlinear mapping between the input space and the feature space, $\Phi : \mathbb{R}^N \to F$. Generally, three classes of kernel functions are used for nonlinear mapping: (a) polynomial kernels, (b) Radial Basis Function (RBF) kernels, and (c) sigmoid kernels [18].
RBF kernels, also known as isotropic stationary kernels, are defined by $\Phi(x, y) = K(\|x - y\|)$, where K is a function on $[0, \infty)$ and ‖·‖ is the norm operator. In most pattern classification applications a Gaussian function is preferred as the RBF; the Gaussian RBF kernel is given by $\Phi(x, y) = \exp\left(-\|x - y\|^2 / (2\sigma^2)\right)$. However, globally supported RBF kernels yield dense Gram matrices, which can be highly ill-conditioned for large datasets.
In this work, therefore, the cosine kernel is used as the kernel function Φ. It normalizes a base kernel k as
$$\Phi(x, y) = \frac{k(x, y)}{\sqrt{k(x, x)\,k(y, y)}}.$$
This result can be expressed in terms of the angle θ between the (mapped) inputs, $\theta = \cos^{-1}\left(\Phi(x, y)/\sqrt{\Phi(x, x)\,\Phi(y, y)}\right)$, which shows that the kernel depends on the angle between the inputs.
As a practical matter, we note that cosine kernels do not have any continuous tuning parameters (such as kernel width in RBF kernels), which can be laborious to set by cross validation.
Large margin classifiers are known to be sensitive to the way features are scaled [19], so it is essential to normalize either the data or the kernel itself; recognition accuracy can degrade severely if the data are not normalized [19]. Normalization can be performed at the level of the input features or at the level of the kernel. It is often beneficial to scale all features to a common range, for example, by standardizing the data. An alternative is to convert each feature vector into a unit vector: if the data are explicitly represented as vectors, each vector can be divided by its norm so that ‖x‖ = 1 after normalization. Here normalization is performed at the level of the kernel, that is, in feature space, leading to ‖φ(x)‖ = 1 (or, equivalently, k(x, x) = 1). This is accomplished by the cosine kernel, which normalizes a kernel k(x, x′) to
$$k_{\cos}(x, x') = \frac{k(x, x')}{\sqrt{k(x, x)\,k(x', x')}}.$$
Normalizing data to unit vectors reduces the dimensionality of the data by one, since the data are projected onto the unit sphere.
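The kernel-level normalization above can be sketched on a Gram matrix as follows. The linear base kernel, the sample values, and the helper name `normalize_kernel` are assumptions for illustration; any positive definite base kernel with a nonzero diagonal could be substituted.

```python
import numpy as np

def normalize_kernel(K):
    """Cosine-normalize a Gram matrix: K[i, j] / sqrt(K[i, i] * K[j, j])."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

X = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
K_lin = X @ X.T                                  # linear base kernel (assumed here)
K_cos = normalize_kernel(K_lin)                  # unit diagonal: ||phi(x)|| = 1
theta = np.arccos(np.clip(K_cos, -1.0, 1.0))     # angle between the mapped samples
```

With the linear base kernel, scaling any sample by a positive constant leaves its row of `K_cos` unchanged, which is the magnitude invariance that makes separate data normalization unnecessary.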
In order to derive positive kernel nonlinear discriminating features (9), we consider only those eigenvectors that are associated with positive eigenvalues.

Similarity Measures and Classification
Finally, in the proposed method the lower-dimensional, low-energized feature vector extracted from the GWT image is used as input data, instead of the whole image, to derive the kernel discriminative feature vector W using (8). Let $M_k$ be the mean of the training samples of class $w_k$, k = 1, 2, ..., l, where l is the number of classes. The classifier applies the nearest neighbor (to the mean) rule using the similarity (distance) measure δ:
$$\delta(Y_L, M_j) = \min_k \delta(Y_L, M_k) \;\Longrightarrow\; Y_L \in w_j. \qquad (11)$$
The low-energized KDCV vector $Y_L$ is thus assigned to the class of the closest mean $M_k$ under δ. The similarity measures used here are the L1 distance measure $\delta_{L1}$, the L2 distance measure $\delta_{L2}$, and the cosine similarity measure $\delta_{\cos}$, defined as
$$\delta_{L1}(x, y) = \sum_i |x_i - y_i|, \qquad (12)$$
$$\delta_{L2}(x, y) = (x - y)^T (x - y), \qquad (13)$$
$$\delta_{\cos}(x, y) = \frac{-x^T y}{\|x\|\,\|y\|}, \qquad (14)$$
where T is the transpose operator and ‖·‖ denotes the norm operator. Note that the cosine similarity measure includes a minus sign in (14) because the nearest neighbor (to the mean) rule (11) uses the minimum rather than the maximum of the measure.
Further studies have been made on the FERET dataset using the standard protocols, that is, the Fa, Fb, DupI, and DupII sets, to assess the performance of the proposed method. The selected images are transformed into gray-level images and scaled to m × n (here 92 × 112). Figure 3 shows all samples of one subject. Figure 4 shows all sample images of one class of the dataset used from the FERET database.
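The nearest-mean rule (11) with the L1, L2, and cosine measures can be sketched as follows. The function names and the example vectors are hypothetical, and the class means are assumed to be precomputed in the KDCV feature space.

```python
import numpy as np

def delta_l1(x, y):
    return np.abs(x - y).sum()

def delta_l2(x, y):
    return (x - y) @ (x - y)                     # (x - y)^T (x - y)

def delta_cos(x, y):
    # negated cosine so that, as for the distances, smaller means closer
    return -(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

def nearest_mean(Y_L, class_means, delta):
    """Rule (11): assign Y_L to the class whose mean M_k minimises delta."""
    scores = [delta(Y_L, M_k) for M_k in class_means]
    return int(np.argmin(scores))

means = [np.array([10.0, 0.0]), np.array([1.0, 1.0])]   # class means M_0, M_1
Y_L = np.array([7.0, 5.0])
labels = [nearest_mean(Y_L, means, d) for d in (delta_l1, delta_l2, delta_cos)]
# labels == [0, 0, 1]: L1 and L2 pick the nearer mean, the cosine measure the smaller angle
```

The example shows the design difference between the measures: the cosine measure ignores vector magnitude, so it can prefer a class whose mean is farther away but closer in direction.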

Further Evaluation on the FERET Face Dataset.
We further test the proposed method on the FERET face database, one of the most widely used databases for evaluating face recognition algorithms [20]. FERET contains a gallery set, Fa, and four testing sets: Fb, Fc, DupI, and DupII. The Fa set contains 1196 images, one image per individual. The Fb set has 1195 images. Figure 4 shows samples from the FERET face database. All images are aligned and cropped to 112 × 92 according to [24]. The extracted low-energized Gabor feature vector is used as input to a trained KDCV with the cosine kernel, and its output is compared with the gallery set using the L1, L2, and cosine similarity measures. The recognition rates of the different methods on the FERET probe sets are shown in Table 5 and are compared with recent state-of-the-art results on the FERET database. Our results on FERET are equivalent (within ±1%) to the most recent works on this dataset. Note that the methods described in [21,22] use Gabor wavelets to generate their feature vectors; as these Gabor representations have a much higher algorithmic complexity, the overall computing cost of those methods is very high. In contrast, our block-based low-energized Gabor feature is a very low-dimensional feature vector, which reduces the algorithmic complexity, and the use of the cosine kernel as the kernel function in the KDCV makes the proposed method fast and suitable for real applications. Furthermore, we compare the face recognition performance of our low-energized GWT-KDCV method using cosine kernels with other well-known methods: the generalized discriminant analysis (GDA) method [7], elastic graph matching (EGM) [9], the discrete cosine transform (DCT) combined with linear discriminant analysis (DCT-LDA) [4], DCT-GDA [23], GWT-LDA, the DCT-KDCV method [25], and the Gabor fusion KDCV method [8].
Classification results obtained with the proposed method are comparable to, and in some cases better than, those of the above-mentioned methods. The cosine similarity measure is also more suitable for classifying the nonlinear real KDCV features, as shown in Tables 1, 2, 3, 4, 5, and 6.
As the proposed method performs best with the cosine similarity classifier, its specificity rate on the FERET and FRAV2D datasets is evaluated using the cosine similarity measure, as shown in Figure 7.

Results.
Results of experiments with the low-energized block-based Gabor KDCV method, using three different similarity measures on both the FERET and FRAV2D databases, are shown in Figures 5 and 6. Considering only the low-dimensional, low-energized features of the GWT image greatly improves the computing speed of the nonlinear discriminant method. The recognition accuracy (in terms of sensitivity) versus dimensionality reduction (number of features), and the cumulative match curves for the three similarity measures, are shown in Figures 5 and 6. On both the FERET and FRAV2D databases the cosine similarity measure performs best, followed by the L2 and then the L1 measure. Figures 5 and 6 also indicate that the proposed method performs well at lower dimensions. These results suggest a certain robustness to age and illumination. Our results indicate that (i) the low-energized block-based Gabor features with the KDCV approach greatly enhance face recognition performance and reduce the dimensionality of the feature space compared with the Gabor features, as shown in Tables 1 and 3. For example, the similarity measure improves face recognition accuracy by almost 10% using only the few low-energized Gabor features, which have improved discriminative power compared with the original Gabor features, as shown in Tables 1 and 3.
(ii) The proposed method further enhances face recognition through the use of the cosine kernel together with the cosine similarity measure.
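The cumulative match curves mentioned above can be computed from a probe-by-gallery distance matrix as in the sketch below. The function name and the convention that smaller values mean better matches (as in rule (11), including the negated cosine measure) are assumptions. The first entry of the curve is the rank-1 recognition rate.

```python
import numpy as np

def cumulative_match_curve(D, probe_labels, gallery_labels):
    """Entry r-1 is the fraction of probes whose correct class appears among the r
    nearest gallery entries; D[p, g] is the distance from probe p to gallery entry g."""
    order = np.argsort(D, axis=1)                     # gallery indices, nearest first
    ranked = np.asarray(gallery_labels)[order]        # gallery labels in rank order
    hits = ranked == np.asarray(probe_labels)[:, None]
    first_hit = hits.argmax(axis=1)                   # 0-based rank of the first correct match
    found = hits.any(axis=1)                          # probes whose class is in the gallery
    n_gallery = D.shape[1]
    return np.array([np.mean(found & (first_hit < r)) for r in range(1, n_gallery + 1)])
```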
The experimental results indicate that the use of cosine kernels in the KDCV further increases the discriminative power of the feature vector extracted from the low-energized blocks of the GWT image; hence it is an effective feature extraction approach that extracts more effective discriminating features than the GDA. The low-energized feature vector extracted by Algorithm 1 of Section 2.2 enhances face recognition performance in the presence of occlusions. Experimentally, this method has been observed to be less time consuming than EGM and other well-known algorithms [4,21,22,24,25]. Our results show that the proposed method (a) greatly enhances recognition performance and (b) reduces the dimensionality of the feature space, and that (c) the cosine similarity classifier further increases face recognition accuracy, as it depends on the angle between two vectors and is not affected by their magnitudes.

Conclusion
This paper introduces a novel block-based GWT generalized KDCV method using the cosine kernel for frontal and pose-angled face recognition. Because the cosine kernel function is used, no separate data normalization is needed, and the parameter tuning required for (RBF) Gaussian kernels is avoided. The derived low-dimensional, low-energized features are also characterized by spatial frequency, locality, and orientation selectivity, properties of the Gabor kernels that help cope with variations due to illumination and facial expression changes. Such characteristics produce salient local features, such as the eyes, nose, and mouth, that are suitable for face recognition. The KDCV method extended with the cosine kernel is then applied to these feature vectors to obtain only the real nonlinear discriminating kernel feature vector, with improved discriminative power. This vector contains salient facial features that are compared locally instead of through a general structure, which allows a decision to be made from the different parts of a face and thus maximizes the benefit of the idea of "recognition by parts." The method therefore performs well in the presence of occlusions (e.g., sunglasses or a scarf): when the eyes are hidden by sunglasses or another obstacle, the algorithm compares faces in terms of the mouth, nose, and other features rather than the eyes.