Modified SIFT Descriptors for Face Recognition under Different Emotions

The main goal of this work is to develop a fully automatic face recognition algorithm. The Scale Invariant Feature Transform (SIFT) has sparingly been used in face recognition. In this paper, a Modified SIFT (MSIFT) approach is proposed to enhance the recognition performance of SIFT. The work is done in three steps. First, the image is smoothed using the DWT. Second, the computational complexity of the SIFT descriptor calculation is reduced by subtracting the average from each descriptor instead of normalizing it. Third, the algorithm is made automatic by using the Coefficient of Correlation (CoC) instead of the distance ratio (which requires user interaction). The main achievement of this method is a reduced database size, as only neutral images need to be stored instead of all the expressions of the same face. The experiments, performed on the Japanese Female Facial Expression (JAFFE) database, indicate that the proposed approach achieves better performance than SIFT based methods. In addition, it shows robustness against various facial expressions.


Introduction
Face recognition systems are mostly used for mass security, user authentication, and so forth. Faces can be easily recognized by humans, but automatic recognition of faces by machine is a difficult and complex task. Furthermore, a human being does not always convey the same facial expression. The expression of a human face changes with the person's mood. Thus, it becomes a more challenging task to compare faces under different emotions with only the neutral faces which are stored in the database. Many approaches have been proposed for face recognition. SIFT has properties to match different images and objects [1]. The SIFT algorithm extracts the interesting key points from an image to produce a feature description. These extracted features are invariant to orientation, scaling, illumination changes, and affine transforms; therefore, they are very well suited for face description [2].
This paper is organized as follows. Section 2 describes the related work. Section 3 describes the original SIFT. Section 4 gives a brief introduction to the Discrete Wavelet Transform (DWT). Section 5 presents the proposed work, which includes descriptor calculation with MSIFT and descriptor matching. Section 6 presents the methodology of the proposed work with the proposed algorithm. Section 7 illustrates the implementation and results, in which the proposed technique is compared with other efficient techniques. Section 8 briefly presents the main contributions of this paper. Finally, Section 9 gives the conclusion of the proposed approach.

Related Work
Face recognition is an important computer vision problem on which a lot of research has been conducted. The related work is therefore divided into three sections, that is, related work of face recognition under different emotions, related work of SIFT, and related work of Modified/Improved SIFT.

Related Work of Face Recognition under Different Emotions.
In 2001, Tian et al. [3] proposed an automatic face analysis system to analyze facial expression based on permanent features and transient features. They used Gabor wavelets and a multistage model to extract permanent features, and the Canny edge detection method for transient features. A neural network was used as a classifier. In 2004, Pantic and Rothkrantz [4] proposed a way to recognize facial expressions in front view and profile view using a rule based classifier on 25 subjects of face expression with 86% accuracy. Buciu and Pitas [5] proposed a technique called discriminant nonnegative matrix factorization (DNMF) and compared it with the local nonnegative matrix factorization (LNMF) algorithm and the nonnegative matrix factorization (NMF) method. The proposed algorithm yields 81.4% accuracy on 164 samples of the Cohn-Kanade database and only 55% to 68% accuracy for all three methods on 150 samples of the JAFFE database. In 2005, Shan et al. [6] represented micro patterns of the face with local binary patterns (LBP). They performed experiments on the Cohn-Kanade database. The experiments demonstrate that the features extracted by LBP are very efficient for recognition of facial expression, including for images with low resolution. In 2008, Kharat and Dudul [7] used the Discrete Cosine Transform (DCT), Fast Fourier Transform (FFT), and Singular Value Decomposition (SVD) to extract features for recognition of emotions (Sadness, Happiness, Fear, Surprise, and Neutrality). They used a Support Vector Machine (SVM) to recognize the emotions. In 2009, Vretos et al. [8] proposed an appearance based approach using PCA with an SVM based classifier. They achieved classification accuracy up to 90% on the Cohn-Kanade facial expression database. In 2012, Praseeda Lekshmi and Sasikumar [9] presented an analysis of face expressions using the Gabor wavelet to extract features of the face. They performed classification with a neural network. Londhe and Pawar [10] performed expression analysis based on entropy, kurtosis, and skewness. They used a two-layer feedforward neural network to classify the face features. They achieved a 92.2% recognition rate for the different face expressions (Happiness, Anger, Fear, Sadness, Surprise, and Disgust). In 2013, Kalita and Das [11] proposed an eigenvector based method with Euclidean distance to calculate the minimum distance between the test image and different facial expressions. They achieved a 95% recognition rate. In 2014, Donia et al. [12] presented a method for face expression recognition using the histogram of oriented gradients (HOG) to extract facial expression features and a Support Vector Machine (SVM) to classify different emotions of the face. They achieved a 95% recognition rate for static images and an 80% recognition rate for videos.

Related Work of SIFT.
Recently, many researchers have used SIFT for face recognition. The first work on object recognition with SIFT was performed by Lowe [17] in 1999. In this approach, the local feature vectors are obtained from an image. These feature vectors do not change with image scaling, rotation, and translation. In 2004, Lowe [18] again described the method to perform matching by extracting distinctive invariant features of an image. In 2006, Mohamed [19] proposed a SIFT approach for face recognition and compared it with other recognition techniques, that is, eigenfaces and fisherfaces. The results show the superiority of SIFT over these two methods, especially when using smaller training sets. In 2008, Yanbin et al. [20] used SIFT to extract facial features. They compared the extracted features with training sets to recognize the face. They performed experiments with the ORL face database and achieved recognition rates of 96.3% for SIFT, 92.5% for PCA, 91.6% for ICA, and 92.8% for FLD. In 2009, Majumdar and Ward [21] proposed a method to reduce the number of SIFT descriptors with discriminative ranking of SIFT features for the recognition of faces. In 2010, Chennamma et al. [22] proposed a new approach based on SIFT features for identification of faces from manipulated face images. They compared the proposed approach with eigenfaces and reported that the proposed technique provides better results. In 2010, Križaj et al. [23] proposed a novel face recognition technique named fixed key point SIFT (FSIFT). The performance of the proposed technique was evaluated on the EYB database, and they concluded that the proposed method performs better than other techniques such as LDA, PCA, and other recognition techniques which are based on SIFT. In 2011, Yang and Bhanu [24] studied the recognition of facial expression using a model with an emotion avatar image. Each frame was registered with an avatar face model by using the SIFT algorithm. LBP and LPQ methods were used for feature extraction. The results were evaluated on the training data of 155 videos from the GEMEP FERA database. The results showed that the algorithm removes the person specific information for emotion and handles unseen data very well. In 2013, Park and Yoo [25] proposed an algorithm based on Gabor filters and an LBP histogram sequence to represent face images, and a SIFT detector to extract local feature points. The proposed method performs very well in terms of processing time and memory. Barbu [2] proposed a robust automatic unsupervised recognition system using SIFT characteristics. He developed an automatic facial feature vector classification technique based on hierarchical agglomerative clustering. He also introduced a novel metric for the obtained feature vectors and achieved an approximately 90% face recognition rate. In 2014, Dagher et al. [26] proposed face recognition using SIFT. After obtaining the key points from the SIFT algorithm, the k-means algorithm is applied to them. The proposed algorithm provides a better recognition rate than LDP and other SIFT based algorithms. In 2015, Lenc and Kral [27] proposed a corpus creation algorithm in order to extract the faces from the database and to create a facial corpus. They evaluated the performance of the SIFT based Kepenekci face recognition approach against the original Kepenekci method. The experiments show that their approach significantly outperforms the original one, with a relative error rate reduction of about 26%.

Related Work of Modified/Improved SIFT.
Many researchers have also modified the original SIFT algorithm to increase its performance and to decrease its complexity. In 2004, Ke and Sukthankar [15] proposed a PCA based SIFT technique which increases the speed and accuracy of classification under controlled as well as real world conditions. In 2008, Tang et al. [13] also proposed a Modified SIFT algorithm to provide feature points that remain invariant for image matching under noise. They also used the Earth Mover's Distance (EMD) to measure the similarity between two descriptors. Alhwarin et al. [28] proposed an improvement on the SIFT algorithm by recognizing the objects through reliable feature matching criteria. This is done by splitting the features extracted from both the test and the model object image into several subgroups before they are matched. They reduced the processing time for matching of stereo images by 40%. In 2008, Gul-e-Saman and Gilani [29] improved the performance of SIFT by changing the descriptors and the localization of key points. Instead of the smoothed weighted histograms of SIFT, kernel principal component analysis (KPCA) is applied in order to normalize the image patch. They concluded that KPCA based descriptors are more compact, distinctive, and robust to distortions. In 2010, Alhwarin et al. [30] again proposed a method for fast matching of SIFT features. They observed that the proposed approach for feature matching increases the speed by 1250 times with respect to exhaustive search at the same accuracy rate. Bastanlar et al. [14] proposed an Improved SIFT matching to increase the number of accurate matches while removing the false matches. It includes preprocessing of the images before matching to increase the performance of SIFT. In 2012, Zhang et al. [31] proposed an Improved SIFT matching method with adaptive matching direction and scale restriction. The experimental results show that the processing time and the percentage of false matches are both reduced. In 2013, Saleem and Sablatnig [32] proposed modifications to SIFT descriptors in order to preserve edges in multispectral imaging. They concluded that image matching results can be improved by boosting the contribution of edges in the construction process of SIFT descriptors. In 2014, Abdul-Jabbar et al. [16] extended SIFT features by using an Adaptive Principal Component Analysis based on Wavelet Transform (APCAWT) on a compressed and denoised ORL database. The main idea behind using APCAWT was to reduce the size of the face image that is passed to SIFT, which leads to good matching results.

Scale Invariant Feature Transform (SIFT)
The SIFT algorithm locates the points in an image which are invariant to scale and shift. These points are represented by an orientation invariant feature vector [21]. An efficient algorithm can extract a large number of features from typical images. These features are highly distinctive; hence, a single feature is correctly matched with high probability against a large database of features [18]. SIFT features are commonly used for object recognition and have hardly been used for face recognition. SIFT features are invariant to scale, rotation, translation, and illumination changes [21]. The SIFT algorithm has four steps: extrema detection, removal of key points with low contrast, orientation assignment, and descriptor calculation [27].

Scale Space Extrema Detection.
The first stage of key point detection involves the detection of stable features, that is, locations of those key points which are invariant to scale change of the image. The points of interest are detected by using the difference of Gaussian (DoG). The scale space of an image is defined as a function, $L(x, y, \sigma)$, that is produced from the convolution of a variable-scale Gaussian, $G(x, y, \sigma)$, with an input image, $I(x, y)$ [18]:
$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y),$$
where $*$ is the convolution operator. Consider
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} e^{-(x^{2} + y^{2})/(2\sigma^{2})}.$$
The difference of Gaussian (DoG) is calculated as the difference between two filtered images:
$$D(x, y, \sigma) = \left(G(x, y, k\sigma) - G(x, y, \sigma)\right) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma),$$
where $k$ is a constant multiplicative factor between the two scales. The DoG is a very efficient function to compute, since it only requires subtraction of smoothed images, and it also provides the most stable image features as compared to the gradient, Hessian, or Harris corner function [18].
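The DoG construction can be sketched in a few lines of NumPy. This is only an illustration, not the implementation used in the paper (which was done in MATLAB): the function names, the 3-sigma kernel truncation, the edge padding, and the default values `sigma=1.6` and `k=sqrt(2)` are assumptions chosen for the example.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Sampled, normalized 1-D Gaussian, truncated at about 3 sigma (assumed cutoff)."""
    radius = max(1, int(np.ceil(3.0 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(image, sigma):
    """L(x, y, sigma): separable convolution of the image with a Gaussian."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(image.astype(float), radius, mode="edge")
    # Filter along rows, then along columns; "valid" trims the padding again.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def difference_of_gaussians(image, sigma=1.6, k=2 ** 0.5):
    """D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)."""
    return gaussian_blur(image, k * sigma) - gaussian_blur(image, sigma)
```

On a constant image both blurred copies are equal, so the DoG response is zero everywhere; the response is nonzero only where the intensity changes at roughly the chosen scale.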
Extrema Detection. In the DoG pyramid, each point is compared with its neighboring pixels on the same level as well as on the lower and higher levels, that is, 26 pixels (8 on the same level and 9 on each adjacent level). If the pixel is the maximum or minimum of all the neighboring pixels, it is considered to be a potential key point [27]. Then, the localization of the key point is improved by using a second-order Taylor series expansion. This gives the true extrema location as [18]
$$\hat{\mathbf{x}} = -\left(\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}},$$
where $D$ and its derivatives are evaluated at the sample point and $\mathbf{x} = (x, y, \sigma)^{T}$ is the offset from this point.
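The neighborhood comparison can be illustrated as follows. The sketch assumes the DoG images of one octave are stacked into a single array of shape (scales, rows, columns); the function name and the strict-inequality convention are hypothetical choices for this example.

```python
import numpy as np

def is_scale_space_extremum(dog_stack, s, y, x):
    """Return True if dog_stack[s, y, x] is strictly larger or strictly
    smaller than all 26 neighbours in the 3 x 3 x 3 cube around it."""
    cube = dog_stack[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    centre = cube[1, 1, 1]
    # Flattened index 13 is the centre of a 3 x 3 x 3 cube; drop it.
    neighbours = np.delete(cube.ravel(), 13)
    return bool(centre > neighbours.max() or centre < neighbours.min())
```

The caller is expected to pass only interior positions, so that the full 3 × 3 × 3 cube exists.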

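Low Contrast Key Point Removal.
At this stage, the best key points are chosen by rejecting the points that have low contrast or are poorly localized along an edge. The value of the DoG function at the extremum is given by [18, 33]
$$D(\hat{\mathbf{x}}) = D + \frac{1}{2} \frac{\partial D^{T}}{\partial \mathbf{x}} \hat{\mathbf{x}}.$$
If the value of $|D(\hat{\mathbf{x}})|$ is below a threshold value, then this point is excluded [33]. The poorly localized extrema are eliminated by using the fact that in these cases there is a large principal curvature across the edge but a small curvature in the perpendicular direction in the DoG function. The principal curvatures can be computed from a 2 × 2 Hessian matrix, $\mathbf{H}$, at the location and scale of the key point [33]:
$$\mathbf{H} = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}.$$
Then, compute the sum of the eigenvalues from the trace of $\mathbf{H}$ and their product from the determinant [18]:
$$\operatorname{Tr}(\mathbf{H}) = D_{xx} + D_{yy} = \alpha + \beta, \qquad \operatorname{Det}(\mathbf{H}) = D_{xx} D_{yy} - (D_{xy})^{2} = \alpha\beta,$$
where $\alpha$ and $\beta$ are the eigenvalues of $\mathbf{H}$ with the larger and the smaller magnitude, respectively. If the determinant is negative, the curvatures have different signs, so the point is discarded. Let $r$ be the ratio between the largest magnitude eigenvalue and the smaller one, so that $\alpha = r\beta$. Then,
$$\frac{\operatorname{Tr}(\mathbf{H})^{2}}{\operatorname{Det}(\mathbf{H})} = \frac{(\alpha + \beta)^{2}}{\alpha\beta} = \frac{(r\beta + \beta)^{2}}{r\beta^{2}} = \frac{(r + 1)^{2}}{r}.$$
The quantity $(r + 1)^{2}/r$ is at a minimum when the two eigenvalues are equal and it increases with $r$; hence, if
$$\frac{\operatorname{Tr}(\mathbf{H})^{2}}{\operatorname{Det}(\mathbf{H})} > \frac{(r + 1)^{2}}{r},$$
then the key point is removed. Here, $r$ is taken as 10 [18].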
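As a small worked illustration of the edge test, the check below takes the three second derivatives of the DoG at a key point and applies the trace and determinant criterion with the ratio 10 quoted above. The function name and argument names are hypothetical; how the derivatives are estimated (for example by finite differences) is left to the caller.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a key point only if Tr(H)^2 / Det(H) is below (r + 1)^2 / r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of different sign (or a degenerate Hessian): discard.
        return False
    return trace * trace / det < (r + 1.0) ** 2 / r
```

With r = 10 the threshold is 11² / 10 = 12.1. A blob-like point with equal curvatures (dxx = dyy = 1, dxy = 0) gives 4 and is kept, while an edge-like point (dxx = 100, dyy = 1, dxy = 0) gives 102.01 and is rejected.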

Orientation Assignment.
Here, the orientation of each key point is assigned on the basis of local image properties. An orientation histogram is formed from the gradient orientations of sample points within a region around the key point, as described in Figure 1. A 16 × 16 square is chosen in this implementation. The orientation histogram has 36 bins covering the 360-degree range of orientations [17]. The gradient magnitude, $m(x, y)$, and orientation, $\theta(x, y)$, are precomputed using pixel differences [18]:
$$m(x, y) = \sqrt{\left(L(x + 1, y) - L(x - 1, y)\right)^{2} + \left(L(x, y + 1) - L(x, y - 1)\right)^{2}},$$
$$\theta(x, y) = \tan^{-1}\left(\frac{L(x, y + 1) - L(x, y - 1)}{L(x + 1, y) - L(x - 1, y)}\right).$$
The above equations are used to compute the scale, orientation, and location of the SIFT features that have been found in images. These features respond strongly to corners and intensity gradients. Figure 2 describes the calculation of the descriptor. First of all, the image gradient magnitudes and orientations are sampled around the location of the key point. This is done by using the scale of the key point to select the level of Gaussian blur for the image. In order to achieve orientation invariance, the coordinates of the descriptor and the gradient orientations are rotated relative to the key point orientation.
The gradients are precomputed for all levels of the pyramid. These are described with small arrows at each sample location [18]. The arrows point from the dark to the bright side, and the length of the arrows indicates the magnitude of the contrast at the key points [18].
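The pixel-difference gradients and the 36-bin orientation histogram can be sketched as below. This is a simplified illustration under stated assumptions: `np.arctan2` is used in place of a plain inverse tangent so that the full 360-degree range is covered, no Gaussian weighting is applied to the votes, and the function names are hypothetical.

```python
import numpy as np

def gradient_magnitude_orientation(L):
    """Central pixel differences on a smoothed image L (rows = y, columns = x).
    Only interior pixels are returned, so the output is 2 smaller per axis."""
    L = L.astype(float)
    dx = L[1:-1, 2:] - L[1:-1, :-2]   # L(x + 1, y) - L(x - 1, y)
    dy = L[2:, 1:-1] - L[:-2, 1:-1]   # L(x, y + 1) - L(x, y - 1)
    magnitude = np.sqrt(dx ** 2 + dy ** 2)
    orientation = np.degrees(np.arctan2(dy, dx)) % 360.0
    return magnitude, orientation

def orientation_histogram(magnitude, orientation, bins=36):
    """Accumulate gradient magnitudes into `bins` equal orientation bins."""
    width = 360.0 / bins
    index = (orientation // width).astype(int) % bins
    return np.bincount(index.ravel(), weights=magnitude.ravel(),
                       minlength=bins)
```

The peak of the returned histogram would then give the dominant orientation of the key point.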

Descriptor Calculation.
In the final step, the descriptors are created. This involves the computation of a 16 by 16 neighborhood of the pixel. At each point of the neighborhood, the gradient magnitudes and orientations are computed.
Then, their values are weighted by a Gaussian. The orientation histograms are created for each subregion of size 4 by 4 (16 regions). Finally, a vector containing 128 values is created [27].
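A simplified sketch of how the 128 values arise from a 16 × 16 neighborhood is given below. It is not a full SIFT descriptor: each sample votes only for its nearest orientation bin (no trilinear interpolation), the patch is not rotated to the key point orientation, and the Gaussian width of half the patch size is an assumed value. The function name is hypothetical.

```python
import numpy as np

def raw_descriptor(magnitude, orientation, cells=4, bins=8):
    """Build a cells*cells*bins vector from a square patch of gradient
    magnitudes and orientations (degrees), e.g. 16 x 16 -> 4*4*8 = 128."""
    size = magnitude.shape[0]
    cell = size // cells
    # Gaussian weight centred on the patch (sigma = half the patch width).
    yy, xx = np.mgrid[0:size, 0:size] - (size - 1) / 2.0
    weight = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * (size / 2.0) ** 2))
    weighted = magnitude * weight
    index = ((orientation % 360.0) // (360.0 / bins)).astype(int) % bins
    descriptor = np.zeros((cells, cells, bins))
    for i in range(cells):
        for j in range(cells):
            block = (slice(i * cell, (i + 1) * cell),
                     slice(j * cell, (j + 1) * cell))
            descriptor[i, j] = np.bincount(index[block].ravel(),
                                           weights=weighted[block].ravel(),
                                           minlength=bins)
    return descriptor.ravel()
```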

Discrete Wavelet Transform (DWT)
The wavelet transform concentrates the energy of the image signal into a small number of wavelet coefficients. The time-frequency localization property of wavelets is very good [34]. The basic idea behind wavelets is to decompose the signal into four subbands (LL, LH, HL, and HH). LL represents the approximation subband, while LH, HL, and HH are the detail subbands. LL is the low-frequency subband. It corresponds to the low-frequency components of the original image in both vertical and horizontal directions, and it contributes to the global description of an image. The LL subband is the most stable subband, so it can be used for feature representation of an image [35]. Figure 3 shows the DWT decomposition of a neutral face image. In this work, the LL subband is used in combination with other algorithms for feature extraction. As the LL subband is half the size of the original image in each dimension, the computational complexity is reduced. Also, most of the energy is concentrated in the LL subband, which may lead to better recognition.
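To make the size reduction concrete, the LL subband of the Haar (db1) wavelet can be computed directly with NumPy. This is a minimal sketch with hypothetical function names; it assumes the orthonormal Haar filter, handles odd dimensions by dropping the last row or column, and computes only LL (the detail subbands are ignored).

```python
import numpy as np

def haar_ll(image):
    """One level of 2-D Haar (db1) analysis, returning only the LL subband."""
    a = image.astype(float)
    # Drop a trailing row/column if a dimension is odd (assumed policy).
    a = a[: (a.shape[0] // 2) * 2, : (a.shape[1] // 2) * 2]
    # Low-pass in both directions: sum of each 2 x 2 block divided by 2.
    return (a[0::2, 0::2] + a[0::2, 1::2] + a[1::2, 0::2] + a[1::2, 1::2]) / 2.0

def haar_ll_levels(image, levels=2):
    """Apply haar_ll repeatedly, keeping only the approximation each time."""
    out = image
    for _ in range(levels):
        out = haar_ll(out)
    return out
```

For an image of 200 rows by 180 columns, two levels of decomposition yield an LL subband of 50 by 45, that is, one sixteenth of the original number of pixels.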

Proposed Work
In the proposed work, the original SIFT algorithm has been modified to reduce the complexity and to achieve better results than SIFT. It is further combined with DWT and CoC to enhance the results in face recognition under different emotions. The computational complexity of MSIFT is also reduced, since it subtracts the average instead of performing normalization.

Descriptors Matching.
In the proposed approach, the matrix of correlation coefficients is computed for the feature vectors of descriptors of the test image and the database images, in which each row is an observation and each column is a variable. Taking this matrix as an input, the CoC is calculated between the two images to recognize the image. The main advantage of using the CoC is that it provides the result by calculating the correlation between two images without requiring a threshold value or a distance ratio, which are required in the Euclidean distance and dot product methods, respectively. Hence, the Euclidean distance and dot product methods involve user interaction, while CoC is automatic.

Coefficient of Correlation (CoC).
Correlation is a method for establishing the degree of probability that a linear relationship exists between two measured quantities. In 1895, Karl Pearson defined the Pearson product-moment correlation coefficient. Pearson's correlation coefficient is widely used in statistical analysis, pattern recognition, and image processing. For monochrome digital images, Pearson's CoC is defined as [36]
$$\mathrm{CoC} = \frac{\sum_{i} (x_{i} - x_{m})(y_{i} - y_{m})}{\sqrt{\sum_{i} (x_{i} - x_{m})^{2}} \sqrt{\sum_{i} (y_{i} - y_{m})^{2}}},$$
where $x_{i}$ and $y_{i}$ are the intensity values of the $i$th pixel in the 1st and 2nd image, respectively. Also, $x_{m}$ and $y_{m}$ are the mean intensity values of the 1st and 2nd image, respectively. The correlation coefficient has the value CoC = 1 if the two images are absolutely identical, CoC = 0 if they are completely uncorrelated, and CoC = −1 if they are completely anticorrelated [36]. The CoC is a pure number and it does not depend upon the units in which the variables are measured. The Pearson product-moment CoC is a dimensionless index, which is invariant to linear transformations of either variable. It condenses the comparison of two 2D images down to a single scalar [36].
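Pearson's coefficient for two equally sized arrays can be written directly from this definition. The sketch below is illustrative only; the function name is hypothetical, and returning 0 for a constant input (where the coefficient is undefined) is an assumption made for the example.

```python
import numpy as np

def coc(a, b):
    """Pearson coefficient of correlation between two equally sized arrays."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    da = a - a.mean()
    db = b - b.mean()
    denom = np.sqrt((da ** 2).sum() * (db ** 2).sum())
    if denom == 0.0:
        return 0.0  # assumption: a constant input is treated as uncorrelated
    return float((da * db).sum() / denom)
```

For example, `coc([1, 2, 3], [2, 4, 6])` is 1 up to rounding because the second array is a linear function of the first, and `coc([1, 2, 3], [3, 2, 1])` is −1.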

Methodology
The proposed method for face recognition is described in Figure 4, and the corresponding algorithm is as follows.
Algorithm 1. Consider the following steps.
Step 1. Load the test image.
Step 2. Preprocess the image using a Gaussian filter.
Step 3. Apply DWT (db1 with 2 levels of decomposition) to get the LL band.
Step 4. Calculate the descriptors of the image using MSIFT.
Step 5. Repeat Steps 1-4 for an image stored in the database.
Step 6. Calculate the CoC between the descriptors of the test image and the database image.
Step 7. Check whether the calculated value is more than the previously stored value; if yes, replace the previously stored value with the current calculated value.
Step 8. Go to Step 5 to repeat the same procedure for the next image stored in the database, so as to retrieve the image with the highest CoC.
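Steps 5 to 8 amount to a search for the database entry with the highest CoC. The sketch below shows only that loop. It assumes that Steps 1 to 4 have already reduced every image to a feature array of the same size, which is a simplification, since the text does not specify how descriptor sets with different numbers of key points are aligned; the function name and the dictionary layout are hypothetical.

```python
import numpy as np

def recognise(test_features, database):
    """Return (name, score) of the database entry with the highest CoC.

    database maps a subject name to a feature array with the same number
    of elements as test_features (assumed precomputed by Steps 1-4).
    """
    best_name, best_score = None, -np.inf
    for name, features in database.items():
        # Step 6: Pearson correlation between the two feature arrays.
        score = np.corrcoef(np.ravel(test_features), np.ravel(features))[0, 1]
        # Step 7: keep the largest value seen so far.
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

No threshold or distance ratio appears anywhere in the loop, which is the sense in which the matching is automatic.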

Implementation and Results
All the techniques have been implemented in MATLAB. The JAFFE database has been used for testing, as it is the most common database used for facial expression recognition systems. The JAFFE database contains a total of 213 face images of 7 facial expressions, posed by 10 Japanese female models. In this implementation, all images are resized to a uniform dimension of 180 × 200. There are 213 test images, which have been compared with the 10 images (neutral expression) present in the dataset, which are shown in Figure 5. The variations of image 1, which are stored in the database under different emotions, are shown in Figure 6. During experiments, it has been observed that face images with disgust and surprise expressions have a very low recognition rate, because they involve the movement of lips, chin, eyebrows, nose, and forehead; hence it becomes difficult to recognize face images of disgust and surprise expressions. It has been observed that APCAWT + SIFT [16] yields the maximum recognition rate, that is, 78.87%, as compared to the other existing techniques, which are shown in Table 1. The results show that the existing Modified/Improved SIFT techniques do not provide excellent results for the recognition of faces under different emotions. Also, Figure 7 demonstrates the comparative analysis of existing Modified/Improved SIFT techniques.
The performance of the modifications to the SIFT technique on 213 images of different expressions, compared with only 10 images (neutral expression), is shown in Table 2. The original SIFT [18] yields an 89.671% recognition rate. In the experiments, a distance ratio of 0.6 is used for SIFT recognition. Although the performance of SIFT and SIFT + CoC is the same, CoC makes the algorithm automatic. The recognition rate is further enhanced by first applying DWT (Haar transform with 2 levels of decomposition) to smooth the image before applying SIFT. Finally, it has been concluded that the proposed technique (DWT + MSIFT + CoC) yields excellent results when different expressions of a face are compared with only a neutral expression of the face. The CoC has also performed well for matching the descriptors on the same dataset. The DWT has also enhanced the performance of SIFT and MSIFT. Also, Figure 8 demonstrates the comparative analysis of the proposed technique with other techniques.

Contributions
(i) The paper proposes a new face recognition method called MSIFT. This algorithm significantly outperforms the other approaches, particularly on faces under different emotions.
(ii) The proposed algorithm handles approximately 21 face images with different emotions (happiness, sadness, anger, disgust, and surprise) of one person. Hence, there is no need to store images under different emotions of the same person. In the experiments, only 10 face images with neutral expression have been stored and compared with 213 face images which are under different emotions.
Hence, this helps to reduce the size of the database, which increases the performance.
(iii) Modified SIFT reduces the computational complexity of the original SIFT.
(iv) For matching the images, CoC has been combined with Modified SIFT to improve the results.
(v) The smoothing property of the DWT has been used to obtain a smoothed image.

Conclusion
A new technique has been proposed to recognize faces under different emotions from a database of neutral face images. The proposed technique consists of smoothing of images using DWT, application of MSIFT, and descriptor matching using CoC. The CoC has performed well for matching the descriptors on the same dataset and makes the algorithm automatic. The DWT has also enhanced the performance of SIFT and MSIFT. The novelty of the method is the reduced database size, as only neutral images need to be stored instead of all the expressions of the same face. The results demonstrate the superiority of MSIFT as compared to the original SIFT algorithm and other SIFT based techniques.

Figure 1: (a) The middle point is known as the candidate key point, and orientations to this point are computed using pixel differences. (b) The value of each bin holds the magnitude sums from all the points computed within that orientation.

Figure 4: Proposed model of face recognition technique.

Figure 5: Database images (only neutral face images of 10 subjects are stored in database which helps to reduce the stored database).

Figure 6: Variations of image 1 of stored database (test database includes total 213 variations of all 10 subjects of stored database, which are compared with only 10 stored database images).

Figure 8: Image recognition rate with JAFFE database.
Modified SIFT. After orientation assignment, the gradient magnitude and orientation at each point in the image are computed to create a key point descriptor. A weight is assigned to each sample point with the help of a Gaussian weighting function through the Gaussian window. The Gaussian window avoids sudden changes in the descriptor caused by small changes in the position of the window. It also gives less weight to the gradients that are far from the center of the descriptor. An orientation histogram is created for each 4 × 4 subregion. Trilinear interpolation is used to distribute the value of each gradient sample into histogram bins. The results are achieved with a 4 × 4 array of histograms with 8 orientation bins in each. Hence, feature vectors with a total of 4 × 4 × 8 = 128 elements are used. Changes in illumination and contrast affect the performance of a recognition algorithm. To reduce this effect, the descriptors in the original SIFT are normalized to unit length, whereas in MSIFT the average of the descriptors is subtracted from each descriptor.
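The difference between the two descriptor treatments can be sketched as follows. The text leaves open whether "the average" is taken over the 128 elements of one descriptor or over all descriptors of an image; the sketch assumes the former, and both function names are hypothetical.

```python
import numpy as np

def sift_normalise(descriptor):
    """Original SIFT: scale the descriptor to unit Euclidean length."""
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor

def msift_centre(descriptor):
    """MSIFT (assumed reading): subtract the descriptor's own mean."""
    return descriptor - descriptor.mean()
```

Under this reading, normalization needs a sum of squares, a square root, and a division per descriptor, while centering needs only a sum and a subtraction. For the descriptor (1, 2, 3, 6), centering gives (−2, −1, 0, 3).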

Table 1 shows the performance of each technique for face recognition when different expressions of a face image (213 images of 10 subjects) are compared with only the neutral face images (10 images) present in the stored database. It has been observed that although all techniques perform well on their respective databases, they are not well suited for recognizing the various expressions of face.

Table 1: Comparative analysis of existing Modified/Improved SIFT techniques.
... for recognition of face images [16], but it does not yield good results when implemented on the JAFFE database to compare different expressions of a face image with only face images of neutral expression (to reduce the database).

Table 2: Experimental results of recognition of images.
Image recognition rate of existing Modified/Improved SIFT techniques with JAFFE database.