Weighted Feature Gaussian Kernel SVM for Emotion Recognition

Emotion recognition from facial expressions using weighted features is a challenging research topic that has attracted considerable attention in the past few years. This paper presents a novel method that uses subregion recognition rates to weight the kernel function. First, we divide the facial expression image into uniform subregions and calculate the recognition rate and weight of each subregion. Then, we derive a weighted feature Gaussian kernel function and construct a classifier based on the Support Vector Machine (SVM). Finally, experiments on the extended Cohn-Kanade (CK+) dataset suggest that the approach based on the weighted feature Gaussian kernel function achieves a high recognition rate, with encouraging results compared with state-of-the-art methods.


Introduction
Emotion recognition has important applications in the real world, including but not limited to artificial intelligence and human-computer interaction. It remains a challenging and attractive topic, and many methods have been proposed for handling its problems. Speech [1,2], physiological [3][4][5], and visual signals have been explored for emotion recognition. Speech signals are discontinuous, since they can be captured only when people are talking. Acquiring physiological signals requires special physiological sensors. For these reasons, visual signals are the best choice for emotion recognition. Although the visual information provided is useful, there are challenges regarding how to utilize this information reliably and robustly. According to Albert Mehrabian's 7%-38%-55% rule, facial expression is an important means of detecting emotions [6].
Further studies have been carried out on emotion recognition in facial expression images during the last decade [7,8]. The task is, given a facial expression image, to estimate the correct emotional state, such as anger, happiness, sadness, or surprise. The general process has two steps: feature extraction and classification. For feature extraction, geometric, texture, motion, and statistical features are in common use. For classification, methods based on machine learning algorithms are frequently used. Because features differ in their characteristics, applying weighted features to machine learning algorithms has become an active research topic.
In recent years, emotion recognition with weighted features based on facial expression has become a new research topic and has received more and more attention [9,10]. The aim is to estimate the emotion type from a facial expression image captured while a subject is physically producing the expression. However, the emotion features captured from the facial expression image are strongly linked not to the whole face but to some specific regions of the face. For instance, features of the eyebrow, eye, nose, and mouth areas are closely related to facial expression [11]. Besides, each feature affects the recognition result differently. To make the best use of the features, a feature weighting technique can further enhance recognition performance. While there are several approaches to determining weights, how to select features and calculate the corresponding weights effectively remains an open issue.
In this paper, a new emotion recognition method based on weighted facial expression features is presented. It is motivated by the fact that emotion can be described by facial expression and that each facial expression feature has a different impact on the recognition results. Unlike previous works, which calculate the weight of each feature directly, this method considers the impact of features by calculating a subrecognition rate. Our method consists of two stages: a weight calculation stage and a recognition stage. In the weight calculation stage, we first divide the face into 4 areas according to the degree of facial behavior changes. Then, we use each area's features to calculate the corresponding recognition rate. Finally, we calculate the weight of each area's features according to the magnitude of its recognition rate. In the recognition stage, we first use these weights to calculate the weighted kernel function. Then, we obtain a new recognition model based on SVM with the weighted kernel function.
Compared with the preliminary work, the proposed method has three main contributions and differences.
(1) A more advanced feature weighting method is used. In the previous method, the weight of each feature was calculated individually without practical verification. To overcome this shortcoming, we group the features and calculate the corresponding subrecognition rate. Then we calculate the weight of each feature group based on its subrecognition rate.
(2) In the recognition stage, the previous method used the weights of the features directly. In this paper, we use the weights of the feature groups to weight the kernel function. Then we use the new weighted kernel function in the machine learning model.
(3) The proposed method has been evaluated on a database which contains 7 kinds of emotions. Moreover, the results obtained with and without the weighted kernel function have been carefully compared and analyzed.
The rest of the paper is organized as follows: Section 2 gives an overview of related works on feature extraction of facial expression, calculation of feature weights, and classification of emotion. Section 3 describes the theorems underlying the proposed method and their proofs. Section 4 verifies the proposed method by experiment and analyzes the experimental results. Section 5 concludes the paper.

Related Work
The recognition performance of emotion recognition methods is highly dependent on the feature extraction methods. Many novel approaches have been proposed for feature extraction based on facial expression. They can be broadly classified into two categories: appearance-based methods and geometric-based methods. The appearance-based methods extract intensity or other texture features from facial expression images. The common methods of feature extraction include Local Binary Patterns (LBP) [12,13], Histogram of Oriented Gradient (HOG) [14,15], Gabor Wavelet [16,17], and Scale-Invariant Feature Transform (SIFT) [18,19]. These features can be used to extract Action Unit (AU) features and recognize facial expression. The geometric-based methods describe the shapes of facial components based on facial key points detected in images, such as the eyebrows, eyes, nose, mouth, and contour line. The movement of these key points can be used to guide the facial expression recognition process. For instance, the Active Appearance Model (AAM) [20], the Active Shape Model (ASM) [21,22], and the Constrained Local Model (CLM) [23] are widely used to detect and track these key points of the face and record their displacement. However, the location accuracy of both ASM and AAM relies on their geometric face models, and the model training phases sometimes need manual work and are usually time-consuming.
The recognition results obtained by a classification algorithm are affected by all features, so introducing weights can distinguish the contributions of different features and improve classification performance. A variety of methods have been proposed to calculate the weight of every feature. Reference [24] extended the Euclidean metric in the criterion to the Minkowski metric to calculate the weight of each feature directly. Some methods divided the facial image into uniform subregions and calculated the weight of each subregion. Reference [25] introduced information entropy to distinguish the contributions of different partitions of the face. Reference [26] estimated the weight of each subregion by employing the local variance. Across these different ways of weighting features, feature selection and weight calculation might be recognized as a latent problem. One effective way to solve this problem is to perform feature weighting based on the obtained feedback. Some methods [27,28] divided the facial image into uniform subregions and fed the subregion results back for feature weighting. There is no restriction on each feature, which provides freedom in how the feature representations are structured.
Many machine learning methods have been proposed to classify facial expressions, such as SVM [29], Random Forest (RF) [30], Neural Network (NN) [31], and K-nearest neighbor (KNN) [32]. Reference [33] presented the performance of RF and SVM in facial recognition classification. Reference [34] used a boosting technique for the construction of NNEs, with the final prediction made by a Naive Bayes (NB) classifier. Reference [35] divided the region into different types and combined the characteristics of the Fuzzy Support Vector Machine (FSVM) with KNN, switching the classification method according to the type. These studies show that such methods are well suited to facial expression classification.

Linear Support Vector Machines.
SVM is a supervised learning model with an associated learning algorithm for classification problems; its ultimate aim is to find the optimal separating hyperplane. The mathematical model of SVM is shown below.
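Given a training set $\{(\vec{x}_i, y_i)\}_{i=1}^{l}$, where $\vec{x}_i \in \mathbb{R}^n$ is the input and $y_i \in \{-1, +1\}$ is the corresponding output, if there is a hyperplane which can divide all the points $\vec{x}_i$ into two groups correctly, we aim to find the "maximum-margin hyperplane", for which the distance between the hyperplane and the nearest point $\vec{x}_i$ from either group is maximized. By introducing the penalty parameter $C > 0$ and the slack variable $\vec{\xi} = (\xi_1, \xi_2, \ldots, \xi_l)$, the optimal hyperplane can be obtained by solving the following constrained optimization problem:

$$\min_{\vec{w}, b, \vec{\xi}} \ \frac{1}{2}\|\vec{w}\|^2 + C\sum_{i=1}^{l}\xi_i \quad \text{s.t.} \quad y_i(\vec{w}\cdot\vec{x}_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, l. \tag{1}$$

Based on the Lagrangian multiplier method, the problem is converted into the following dual problem:

$$\max_{\vec{\alpha}} \ \sum_{i=1}^{l}\alpha_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j (\vec{x}_i\cdot\vec{x}_j) \quad \text{s.t.} \quad \sum_{i=1}^{l}\alpha_i y_i = 0,\ \ 0 \le \alpha_i \le C,\ \ i = 1, \ldots, l, \tag{2}$$

where $\alpha_i \ge 0$ are the Lagrange multipliers of the samples $\vec{x}_i$. Only a few multipliers satisfy $\alpha_i > 0$ in the solution; dropping the terms with $\alpha_i = 0$, we get the classification decision function:

$$f(\vec{x}) = \operatorname{sgn}\left(\sum_{i=1}^{l}\alpha_i y_i (\vec{x}_i\cdot\vec{x}) + b\right). \tag{3}$$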
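As a minimal sketch of how the classification decision function is evaluated once the dual problem has been solved, the snippet below sums the contributions of the support vectors. The function name `linear_decision` and the toy multipliers are hypothetical illustrations, not values from the paper; in practice the multipliers and the bias come from a solver such as LIBSVM.

```python
def linear_decision(x, support_vectors, labels, alphas, b):
    """Return sgn(sum_i alpha_i * y_i * (x_i . x) + b) as +1 or -1."""
    score = b
    for sv, y, alpha in zip(support_vectors, labels, alphas):
        dot = sum(s * v for s, v in zip(sv, x))  # inner product x_i . x
        score += alpha * y * dot
    return 1 if score >= 0 else -1


# Toy, hand-picked values (hypothetical, not from the paper).
# They satisfy the dual constraint sum_i alpha_i * y_i = 0.
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
labels = [+1, -1]
alphas = [0.25, 0.25]
b = 0.0

print(linear_decision((2.0, 0.5), support_vectors, labels, alphas, b))
print(linear_decision((-3.0, 1.0), support_vectors, labels, alphas, b))
```

Only samples with a nonzero multiplier enter the sum, which is why the terms with a zero multiplier can be dropped.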

Nonlinear Support Vector Machines.
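For the linearly nonseparable problem, we first map the data to a high-dimensional space $H$, using a nonlinear mapping which we call $\Phi$. Then we use a linear model to achieve classification in the new space $H$. Through the defined "kernel function" $k(\vec{x}_i, \vec{x}_j) = \Phi(\vec{x}_i)\cdot\Phi(\vec{x}_j)$, (2) is converted as follows:

$$\max_{\vec{\alpha}} \ \sum_{i=1}^{l}\alpha_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j\, k(\vec{x}_i, \vec{x}_j) \quad \text{s.t.} \quad \sum_{i=1}^{l}\alpha_i y_i = 0,\ \ 0 \le \alpha_i \le C,\ \ i = 1, \ldots, l. \tag{4}$$

And the corresponding classification decision function is converted as follows:

$$f(\vec{x}) = \operatorname{sgn}\left(\sum_{i=1}^{l}\alpha_i y_i\, k(\vec{x}_i, \vec{x}) + b\right). \tag{5}$$

The kernel function is selected to take the place of the inner product of the basis functions. The ordinary kernel functions investigated for linearly nonseparable problems are as follows: (1) $d$th-degree polynomial kernel function $k(\vec{x}_i, \vec{x}_j) = (\vec{x}_i \cdot \vec{x}_j + 1)^d$, $d = 1, 2, \ldots$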
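The kernelized decision function can be sketched in a few lines. The polynomial kernel below follows the form given in the text. The Gaussian kernel is written in the `exp(-gamma * ||x - z||^2)` form used by LIBSVM, which is an assumption here because the paper's own Gaussian formula is not present in this text. The function names and toy values are hypothetical.

```python
import math


def polynomial_kernel(x, z, degree=2):
    """d-th degree polynomial kernel (x . z + 1)^d."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + 1.0) ** degree


def gaussian_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; the gamma parameterization is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def kernel_decision(x, support_vectors, labels, alphas, b, kernel):
    """Return sgn(sum_i alpha_i * y_i * k(x_i, x) + b) as +1 or -1."""
    score = b + sum(
        alpha * y * kernel(sv, x)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    )
    return 1 if score >= 0 else -1


# Toy, hand-picked values (hypothetical, not from the paper).
toy_svs = [(1.0, 1.0), (-1.0, -1.0)]
toy_labels = [+1, -1]
toy_alphas = [0.25, 0.25]

print(kernel_decision((2.0, 0.5), toy_svs, toy_labels, toy_alphas, 0.0, gaussian_kernel))
```

Passing the kernel as an argument means the same decision routine works with either kernel, mirroring how the inner product is replaced by the kernel in the dual problem.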

Weighted Feature SVM.
Weighted feature SVM is based on the weighted kernel function of SVM, which is given in Definition 1.

Definition 1.
Let $k$ be a kernel function defined on $X \times X$, $X \subseteq \mathbb{R}^n$. $P$ is a linear transformation square matrix of order $n$ on the given input space, where $n$ is the dimensionality of the input space. The weighted feature kernel function is defined as

$$k_P(\vec{x}_i, \vec{x}_j) = k(\vec{x}_i^{T} P, \vec{x}_j^{T} P), \tag{9}$$

where $P$ is referred to as a weighted feature matrix. Different choices of $P$ lead to different weighting situations; for instance, if $P$ is an identity matrix of order $n$, no weighting is applied.
In this paper, we only consider the case in which $P$ is a diagonal matrix of order $n$.
Definition 2. The ordinary weighted feature kernel functions can be obtained from (9) as follows: (1) Weighted feature polynomial kernel function $k_P(\vec{x}_i, \vec{x}_j) = (\vec{x}_i^{T} P \cdot \vec{x}_j^{T} P + 1)^d = (\vec{x}_i^{T} P P^{T} \vec{x}_j + 1)^d$, (2) Weighted feature (Gaussian) radial basis kernel function, obtained in the same way by applying the Gaussian kernel to $\vec{x}_i^{T} P$ and $\vec{x}_j^{T} P$.
The motivation for introducing a kernel function is to search for a nonlinear model in the new feature space obtained by a nonlinear mapping. The matrix $P$ appears to be unrelated to this motivation, since it acts as a linear mapping. However, it can be useful in practice, because it changes the geometric shape of the input space and the feature space, thereby changing the weight of different functions in the feature space. The weighted feature Gaussian kernel function is still a nonlinear model after the linear transformation. This conclusion can be proved by Theorem 3.
Proof. From the definition of the weighted feature kernel function and the classification decision function (5), the conclusion is straightforward.
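To illustrate the weighted feature Gaussian kernel for the diagonal case considered in this paper, the sketch below assumes the form $k_P(\vec{x}_i, \vec{x}_j) = \exp(-\gamma \|\vec{x}_i^{T} P - \vec{x}_j^{T} P\|^2)$. The `gamma` parameterization follows the LIBSVM convention and is an assumption, as are the function names. With a diagonal $P$, applying the matrix reduces to scaling each feature by its diagonal entry, so the weighted kernel is just the ordinary Gaussian kernel on rescaled inputs.

```python
import math


def gaussian_kernel(x, z, gamma=0.5):
    """Ordinary Gaussian kernel exp(-gamma * ||x - z||^2) (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def apply_diagonal(x, diag_p):
    """Compute x^T P for a diagonal P given by its diagonal entries."""
    return [p * a for a, p in zip(x, diag_p)]


def weighted_gaussian_kernel(x, z, diag_p, gamma=0.5):
    """Weighted feature Gaussian kernel k_P(x, z) = k(x^T P, z^T P)."""
    return gaussian_kernel(apply_diagonal(x, diag_p), apply_diagonal(z, diag_p), gamma)


x = [1.0, 2.0, 3.0]
z = [0.5, -1.0, 2.0]

# Identity weights reproduce the unweighted kernel.
print(weighted_gaussian_kernel(x, z, [1.0, 1.0, 1.0]) == gaussian_kernel(x, z))

# A zero weight removes a feature's influence on the kernel value.
print(weighted_gaussian_kernel(x, z, [1.0, 0.0, 0.0]))
```

Because the diagonal case is equivalent to feature scaling, one practical option is to rescale the feature vectors once and then train a standard Gaussian-kernel SVM on them.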
Theorem 3 indicates that the linear transformation changes the relative positions of points and therefore the geometric shape of the feature space. There may be a better linear separating hyperplane in the new feature space, which would improve the classification performance of SVM. Theorem 4 indicates that the weighted kernel function can reduce the effects of weakly correlated and uncorrelated features, so better classification results can be expected. The experimental results in the following section support this conclusion.

Weight Estimation of Features.
A feature weighting technique assigns a weight to each data feature according to a certain principle, and calculating the weight vector $\vec{w}$ is the key element. Changes in facial expression lead to slightly different instantaneous changes of individual facial muscles in facial appearance. According to the motion range of the facial muscles, the whole face can be divided into three kinds of regions: rigid region (nose), semirigid region (eyes, forehead, and cheek), and nonrigid region (mouth). According to these principles, we divide the face into several areas and find the recognition rate $p_i$ ($1 \le i \le m$) of each area: the higher the recognition rate, the greater the influence, and the lower the recognition rate, the smaller the influence. The recognition rate is taken as the basis for determining the weight, and the weight $w_i$ of each area is calculated from $p_i$ so that $w_1 + w_2 + w_3 + w_4 = 1$.
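One simple rule consistent with the stated property that the four weights sum to 1 is to make each weight proportional to its area's recognition rate. This proportional rule is an assumption for illustration, since the exact formula is not reproduced in this text, and the recognition rates below are hypothetical placeholders rather than results from the paper.

```python
def area_weights(recognition_rates):
    """Normalize per-area recognition rates so the weights sum to 1.

    Assumes weights proportional to recognition rates (illustrative rule).
    """
    total = sum(recognition_rates)
    if total <= 0:
        raise ValueError("recognition rates must have a positive sum")
    return [rate / total for rate in recognition_rates]


# Hypothetical recognition rates for 4 face areas (not from the paper).
rates = [0.6, 0.7, 0.5, 0.8]
weights = area_weights(rates)
print(weights)
```

Under this rule, an area with a higher recognition rate always receives a larger weight, which matches the principle that higher recognition rates indicate greater influence.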
The area with the highest weight is the most discriminative area of the face, and it is also the largest contributor to the classification results. Therefore, taking the weight as a correlation measurement index, the higher the weight, the stronger the correlation. The four constructing steps of weighted feature SVM are as follows:

Experiment
The experiments on the extended Cohn-Kanade (CK+) dataset show the effectiveness of the proposed method. In our experiments, we use Python programs based on the LIBSVM software package, and the data processing platform is a computer with Windows 7, an Intel Core i3-2120 CPU (3.30 GHz), and 4.00 GB RAM.

Extended Cohn-Kanade Dataset.
Lucey et al. [36] presented the CK+ dataset containing 593 sequences from 123 subjects. Each sequence contains images from onset (neutral frame) to peak expression (last frame). However, only 327 of the 593 sequences were found to meet the criteria for one of seven discrete emotions. The 327 peak frames have been selected and labeled, and together they compose the original facial expression image dataset. The number of images of each discrete emotion is shown in Table 1.

Facial Feature Extraction.
In this paper, we use the facial key points of each image as feature points for emotion recognition based on facial expression. Each feature point is expressed as a 2-dimensional coordinate $(x, y)$. The resolution of each image in the dataset is 640 × 490, 640 × 480, or 720 × 480. To unify the coordinate system, image preprocessing is used to change the resolution of each image to 640 × 480. Reference [11] proposed that the production of emotion brings about facial behavior changes and is strongly linked not to the whole face but to some specific areas, such as the eyebrows, eyes, mouth, nose, and tissue textures. Besides, a face has different rigidness in different areas. According to these principles, this paper divides the face into 4 areas, which are shown in Figure 1, each with a corresponding feature vector. In total, we select 59 key points from the eyebrows, eyes, nose, and mouth. Therefore, a 118-dimensional facial feature vector $\vec{x}$ can be obtained from each frame, where $\vec{x} = (\vec{x}_1, \vec{x}_2, \vec{x}_3, \vec{x}_4)$.
According to the analysis of the experimental results in the four feature areas (Table 2), the influence of the features of the three types of region is different. The nonrigid region has the biggest impact, the rigid region has the least, and the semirigid region has an intermediate impact.
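The construction of the 118-dimensional feature vector can be sketched as follows. The sketch assumes that key points were detected at the image's native resolution and are then mapped into the common 640 × 480 coordinate system by scaling the coordinates; the paper instead describes resizing the images themselves. The function names and sample points are hypothetical.

```python
TARGET_WIDTH, TARGET_HEIGHT = 640, 480


def rescale_points(points, width, height):
    """Map (x, y) key points from a width x height image to 640 x 480."""
    return [
        (x * TARGET_WIDTH / width, y * TARGET_HEIGHT / height)
        for x, y in points
    ]


def to_feature_vector(points):
    """Flatten a list of (x, y) key points into [x1, y1, x2, y2, ...]."""
    vector = []
    for x, y in points:
        vector.extend((x, y))
    return vector


# 59 placeholder key points (hypothetical coordinates) from a 720 x 480 image.
key_points = [(float(10 * i), float(5 * i)) for i in range(59)]
features = to_feature_vector(rescale_points(key_points, 720, 480))
print(len(features))
```

Keeping the points grouped by face area before flattening makes it straightforward to recover the four sub-vectors later.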

Experiment Contrast with Different Kernel Function.
Using the previous experimental results together with (10) and (16), and the kernel functions given by (7) and (13), we run the experiment twice on the training set and test set, once with each kernel function. The number of correctly recognized facial expressions under the two kernel functions is shown in Table 3.
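To connect the area weights to the diagonal weighted feature matrix, the sketch below repeats each area's weight once per feature dimension belonging to that area. Using the weights directly as diagonal entries is an assumption, and the per-area dimensions and weights are hypothetical placeholders chosen only so that they total 118 dimensions; the paper's actual split is not reproduced in this text.

```python
def diagonal_from_area_weights(area_dims, weights):
    """Build the diagonal of P by repeating each area's weight per dimension."""
    if len(area_dims) != len(weights):
        raise ValueError("one weight is required per area")
    diagonal = []
    for dim, weight in zip(area_dims, weights):
        diagonal.extend([weight] * dim)
    return diagonal


# Hypothetical split of the 118 dimensions over 4 areas (not from the paper).
area_dims = [20, 48, 18, 32]
# Hypothetical area weights that sum to 1 (not from the paper).
area_w = [0.2, 0.3, 0.1, 0.4]

diag_p = diagonal_from_area_weights(area_dims, area_w)
print(len(diag_p))
```

The resulting list can be used to scale each feature vector element-wise before evaluating a Gaussian kernel, which realizes the diagonal weighted kernel.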
Finally, we compare the results obtained with the two kernel functions, both of which use an image-based framework and are tested on the CK+ dataset. As shown in Table 3, the average precision of WF-SVM, which uses the weighted feature Gaussian kernel function, is 93%, which is higher than that of SVM with the standard Gaussian kernel function, whose average precision is 83%. The recognition rate is also better than that of the previous method for the seven emotions. These results confirm the effectiveness of our method. The improvement can be explained in terms of the robustness of the machine learning algorithm: the method reduces the influence of weakly correlated features through feature weighting, thus improving the robustness of the algorithm.

Conclusion and Future Work
In this paper, we propose an approach to emotion recognition based on facial expression. Because each feature affects the recognition result differently, we propose a feature weighting technique. Unlike previous works, which calculate the weight of each feature directly, we divide the facial expression images into uniform subregions and calculate the weight of each subregion's features based on its subrecognition rate. The experimental results suggest that the approach based on the weighted feature Gaussian kernel function achieves a high recognition rate in emotion recognition. However, this performance was obtained on a dataset with limited head motion. Emotion recognition based on facial expression remains full of challenges for future work.