Facial Expression Recognition Based on LDA Feature Space Optimization

With the development of artificial intelligence, facial expression recognition has become an important research topic because of its wide application potential. However, the quality of the facial features directly affects the accuracy of the model. Based on the public KDEF face dataset, this study conducts a comprehensive analysis of the effect of linear discriminant analysis (LDA) dimensionality reduction on facial expression recognition. First, features of the face images are extracted by a manual method and by a deep learning method, yielding 35-dimensional artificial features, 128-dimensional deep features, and hybrid features that combine the two. Second, LDA is used to reduce the dimensionality of the three feature sets. Then, machine learning models, such as Naive Bayes and decision tree, are used to compare the results of facial expression recognition before and after LDA feature dimensionality reduction. Finally, the effects of several classical feature reduction methods on facial expression recognition are evaluated. The results show that, after LDA feature dimensionality reduction, facial expression recognition based on each of the three feature sets improves to a certain extent, which indicates that LDA is effective in reducing feature redundancy.


Introduction
With the development of the information society, facial expression has become one of the main channels of human information exchange, and its accurate recognition helps to make that exchange more efficient and to save time and cost. As one of the most primitive ways of information interaction, facial expression has been preserved to this day and plays an irreplaceable role. People's facial expressions can reveal their attitudes to the received information, their emotions toward people or things, and even their inner thoughts. Therefore, the technology of facial expression recognition can be applied to various scenarios, including human-computer interaction, as well as the security and medical fields, such as public security control, fraud behavior analysis, and patient emotion analysis. The method of facial expression recognition has become a research hotspot in the field of computer vision.
Facial expression recognition can be roughly described as analyzing the data of face images by computer, extracting the useful feature information for recognition, and using this information to determine the category of the images. However, as the facial features are extracted and constructed by a variety of feature extraction methods, they are often high-dimensional. Using the high-dimensional features directly makes the subsequent recognition task much harder. The reason is that although the high-dimensional features contain the complete face information, they also contain a large number of redundant features. As the number of samples increases, the redundant information greatly increases the computational overhead while reducing the performance of the recognition model. When the amount of data cannot support the feature dimension, the "curse of dimensionality" occurs, and the performance of the recognition model drops greatly or the model even becomes ineffective. Therefore, reducing the dimension of high-dimensional features is a key step in face recognition technology and has great practical significance. Accurate facial expression recognition is inseparable from the extraction of multidimensional features, so redundant features are inevitable. Although the LDA feature dimensionality reduction method has been widely used in the field of face recognition, previous work has not analyzed its dimensionality reduction effect on different types of expression feature sets. Based on the public KDEF face dataset, this study constructs three types of feature sets: a traditional artificial feature set, a deep learning feature set, and a hybrid feature set. It then performs LDA feature dimensionality reduction on each of them and uses machine learning models to classify the facial expressions.
The experimental results show that dimensionality reduction of hybrid features achieves the best accuracy of facial expression classification, which shows the importance of multidimensional hybrid feature extraction for machine learning tasks.
Section 2 introduces the positioning of human facial key points, the definition of action units, and the construction of the coding system. Section 3 describes in detail the process of image correction, feature extraction, feature set construction, and feature dimensionality reduction. Section 4 compares and analyzes the classification effects of different feature sets under different classification methods and dimensionality reduction methods. The conclusions and future studies are presented in Section 5.

Related Work
The key to facial expression recognition is to construct concise and accurate facial feature descriptors. Methods based on complete face recognition generally use traditional feature descriptors, such as Gabor [1], local binary pattern (LBP) [2], and histogram of oriented gradient (HOG) [3], to extract and construct the global feature set of face images. However, in actual scenes, the face information in the collected images is often incomplete due to various factors, such as the information loss caused by the shooting angle and the face occlusion caused by glasses, masks, or beards. As such methods cannot extract enough face information for recognition, it is difficult to improve the recognition accuracy. In this regard, researchers proposed a method to extract high-dimensional face features from the key points of the face [4]. The core of this method is to accurately locate the key points of the face and correct them. The research on facial key points can be traced back to the facial action coding system (FACS) [5]. FACS was developed by Ekman and Friesen to study the correspondence between facial muscle movements and different expressions. It is a system for classifying human facial movements through facial appearance. FACS encodes the movement of the facial muscles according to the subtle changes of the face and deconstructs it into specific action units (AUs) that generate expressions, which are independent but interconnected. The codes of 46 main action units and their definitions are listed in Table 1. The features of these facial AUs and the main regions they control can be used not only to distinguish faces but also to reflect the related facial expressions [6]. Since then, more studies have begun to optimize the key feature descriptors of faces. Belhumeur et al. [7] used the global model to generate local position information as latent variables, derived a Bayesian objective function to optimize these local positions, and used the method of [8] to align the faces. Literature [9] introduced a method for constructing facial feature descriptors based on orthogonal difference-local binary pattern (OD-LBP). First, 3 gray level differences are calculated for each orthogonal location by subtracting the 2 nearest neighbors and the center pixel from their respective orthogonal values. Second, the respective differences of gray levels are counted. Finally, a vector is generated by concatenating the binary patterns produced by the two orthogonal groups. Kazemi and Sullivan [10] proposed a general gradient-boosted framework to learn a set of regression trees. This framework used the set of regression trees to estimate the location of face descriptors directly from a sparse subset of pixel intensities and then extracted the face features according to the location.
In the process of facial expression recognition, in order to extract more comprehensive face features, the feature dimension is often too high because of the high complexity of facial image data. Therefore, it is usually necessary to reduce the dimension of the extracted features first to reduce the feature redundancy. Principal component analysis (PCA) [11] and linear discriminant analysis (LDA) [12] are widely used because of their simplicity and convenience. PCA is a method based on the Karhunen-Loeve (K-L) transform of statistical theory [13]. Mondal and Bag [14] used PCA to reduce the dimension of the extracted features and combined it with the minimum distance classification system to realize face recognition. Zhuang and Guan [15] combined Gabor wavelet and PCA for feature extraction, so as to deal with the full illumination of face images. LDA obtains the optimal projection direction by finding the extreme value of the Fisher criterion function, so that after the sample is projected in this direction, the interclass dispersion is the largest and the intraclass dispersion is the smallest. Fisher's criterion function was first proposed by Fisher in 1936. It was originally used to solve the problem of binary classification and then extended to multiclassification. Bodini et al. [16] combined the effectiveness of deep convolution neural network (DCNN) features with the feature dimensionality reduction capability of LDA to solve the problem of face recognition for a single person and single sample. Benouareth [17] proposed a facial feature extraction method that combines likelihood sufficient dimension reduction (LSDR) and LDA to handle face recognition under pose and illumination changes.
Although the LDA feature dimensionality reduction method has been widely used in the field of face recognition, the impact of LDA feature dimensionality reduction on face recognition under different feature sets has not been evaluated. In this regard, using the public KDEF face dataset, this study conducts a comprehensive analysis of the effect of facial expression recognition based on LDA feature dimensionality reduction, covering traditional artificial features, deep learning features, and the hybrid features. First, multidimensional feature extraction is performed. For traditional artificial features, we extract face AU information according to the facial action coding system and construct a 35-dimensional AU feature set. For deep learning features, we use the facial feature extractor based on FaceNet pretraining to extract the deep facial features and construct a 128-dimensional feature set. Second, LDA dimension reduction is performed on each of the extracted feature sets. Then, machine learning models, such as Naive Bayes and decision trees, are used to analyze the effects of facial expression recognition before and after LDA feature dimensionality reduction. Finally, we compare the effects of PCA and LDA feature dimensionality reduction methods on facial expression recognition. The specific workflow is shown in Figure 1.

Face Multidimensional Feature Extraction.
In order to fully express the face features, we use different feature extraction methods to extract two kinds of face features from the KDEF dataset: one is the AU feature based on artificial methods and the other is the deep feature based on deep learning.
OpenFace describes the recognizable AUs with two kinds of score: (1) existence, which is 0 if the AU is absent from the face and 1 if it is present; (2) intensity, a score on the interval [0, 5], where 0 means absent and 5 means present at the maximum intensity. OpenFace is able to judge the existence of 18 recognizable AUs and to score the intensity of 17 of them, all except AU28. Therefore, each face image yields a 35-dimensional AU feature vector, as shown in Figure 2.
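As a minimal sketch of how such a 35-dimensional vector could be assembled, the snippet below concatenates 18 existence flags with 17 intensity scores. The AU identifiers and the function name `build_au_vector` are assumptions made for illustration; the text above only fixes the counts and the fact that AU28 has no intensity score.

```python
# AU identifiers are an assumption for illustration; the text only states that
# 18 AUs have an existence flag and 17 of them (all but AU28) have an intensity.
PRESENCE_AUS = ["AU01", "AU02", "AU04", "AU05", "AU06", "AU07", "AU09", "AU10", "AU12",
                "AU14", "AU15", "AU17", "AU20", "AU23", "AU25", "AU26", "AU28", "AU45"]
INTENSITY_AUS = [au for au in PRESENCE_AUS if au != "AU28"]


def build_au_vector(presence, intensity):
    """Concatenate 18 existence flags (0/1) and 17 intensity scores in [0, 5]."""
    vector = []
    for au in PRESENCE_AUS:
        flag = presence[au]
        if flag not in (0, 1):
            raise ValueError(f"existence flag for {au} must be 0 or 1")
        vector.append(float(flag))
    for au in INTENSITY_AUS:
        score = float(intensity[au])
        if not 0.0 <= score <= 5.0:
            raise ValueError(f"intensity for {au} must lie in [0, 5]")
        vector.append(score)
    return vector


# Toy input: a face on which only AU12 is active.
presence = {au: 0 for au in PRESENCE_AUS}
intensity = {au: 0.0 for au in INTENSITY_AUS}
presence["AU12"] = 1
intensity["AU12"] = 3.2
au_features = build_au_vector(presence, intensity)
```

Keeping the existence flags and the intensity scores in fixed order means that every image maps to the same 35 columns, which is what a later dimensionality reduction step needs.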

Face Deep Feature Extraction.
In order to enable the deep learning model to focus on the face and better characterize it, we use the method described in literature [10] to extract the key points from the dataset and correct the face according to the key points. It uses the gradient boosting decision tree (GBDT) regression method to accurately identify the key points of the face. One GBDT is composed of multiple trees. Through these trees, the initial face key points are gradually regressed to the real face key points. The specific implementation steps are as follows:
(1) There are generally two ways to select the initial face key points. The first one is to take the mean value of the real key points of all images as the initial face key points. The second one is to randomly select the real key points of other images as the initial key points of the current image. This study chooses the latter method to select the initial face key points.
(2) When a tree is constructed and multiple images are input into the current tree, each image will fall into a leaf node. Calculate the differences between the current face key points and the real face key points of each image, and take the mean value of the differences of all images in the same leaf node as the residual saved by the leaf node. Adding the residual of the reached leaf to the current key points, tree after tree, moves them step by step toward the real ones, so that the final current face key points can be used as the real face key points.
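The leaf update in step (2) can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of [10]: the leaf assignment of each image is taken as given, each image has a single key point stored as `[x, y]`, and the names `leaf_residuals` and `apply_tree` are hypothetical. The difference is taken as real minus current so that adding the residual moves the estimate toward the real key point.

```python
from collections import defaultdict


def leaf_residuals(leaf_ids, current, target):
    """Mean (target - current) offset over the images that fall into each leaf."""
    sums = {}
    counts = defaultdict(int)
    for leaf, cur, tgt in zip(leaf_ids, current, target):
        diff = [t - c for c, t in zip(cur, tgt)]
        if leaf in sums:
            sums[leaf] = [s + d for s, d in zip(sums[leaf], diff)]
        else:
            sums[leaf] = diff
        counts[leaf] += 1
    return {leaf: [s / counts[leaf] for s in total] for leaf, total in sums.items()}


def apply_tree(leaf_ids, current, residuals):
    """Shift each image's current key point by the residual stored in its leaf."""
    return [[c + r for c, r in zip(cur, residuals[leaf])]
            for leaf, cur in zip(leaf_ids, current)]


# Toy data: three images, the first two fall into leaf 0 and the third into leaf 1.
leaf_ids = [0, 0, 1]
current = [[10.0, 10.0], [12.0, 10.0], [30.0, 30.0]]
target = [[12.0, 11.0], [16.0, 13.0], [29.0, 32.0]]
residuals = leaf_residuals(leaf_ids, current, target)
updated = apply_tree(leaf_ids, current, residuals)
```

Because a leaf stores only the mean offset of its images, one tree corrects the shared part of the error; the remaining per-image error is what the next tree in the sequence is fitted on.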
According to the above algorithm, we extract 68 face key points for each face image and use these key points to perform the affine transformation correction on the face, so that the corners of the eyes and the nose are close to the horizontal position, as shown in Figure 3. After correcting the face image, we use the face feature extractor in OpenFace to extract deep face features. This feature extractor is based on the pretrained deep learning model FaceNet [19], which is shown in Figure 4.
OpenFace uses 500,000 images to train FaceNet, drawn from the two largest labeled face recognition datasets, CASIA-WebFace [20] and FaceScrub [21]. It can extract a 128-dimensional deep feature that expresses the face efficiently.
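The idea of the correction can be sketched with a pure rotation, which is a special case of an affine transformation. The source does not give the exact transform, so the choice of rotating about the midpoint between the two eye corners, the coordinates, and the function names below are all assumptions made for illustration.

```python
import math


def eye_alignment_angle(left_eye, right_eye):
    """Angle in radians of the line running from left_eye to right_eye."""
    return math.atan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0])


def rotate_points(points, angle, center):
    """Rotate points by -angle about center, which makes the eye line horizontal."""
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    cx, cy = center
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + cos_a * dx - sin_a * dy, cy + sin_a * dx + cos_a * dy))
    return rotated


# Hypothetical pixel coordinates of two eye corners and the nose tip.
left_eye, right_eye, nose = (100.0, 120.0), (160.0, 140.0), (130.0, 160.0)
angle = eye_alignment_angle(left_eye, right_eye)
center = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
aligned = rotate_points([left_eye, right_eye, nose], angle, center)
```

After the rotation both eye corners share the same vertical coordinate, while distances between key points are unchanged. A full affine correction could additionally scale and translate the face to a fixed template.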

Face Feature Dimensionality Reduction.
Because facial expressions are diverse, recognizing them is a multiclassification task, so the multiclass generalization of LDA is needed. For the multiclassification problem, we assume the dataset $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, in which any sample $x_i$ is an n-dimensional vector and $y_i \in \{n_1, n_2, \ldots, n_k\}$ is its category, with a total of k categories. For $j = 1, 2, \ldots, k$, $N_j$ is defined as the number of samples of Class j, $X_j$ as the set of samples of Class j, $\mu_j$ as the mean vector of the samples of Class j, and $\Sigma_j$ as the generalized covariance matrix of the samples of Class j.
Assuming that the matrix formed by the basis of the projected hyperplane is $W = (w_1, w_2, \ldots, w_d)$, the optimization objective is as follows:
$$ J(W) = \frac{W^{T} S_b W}{W^{T} S_w W} \tag{1} $$
where $S_w$ is the intraclass scatter matrix:
$$ S_w = \sum_{j=1}^{k} \Sigma_j = \sum_{j=1}^{k} \sum_{x \in X_j} (x - \mu_j)(x - \mu_j)^{T} \tag{2} $$
and $S_b$ is the interclass scatter matrix:
$$ S_b = \sum_{j=1}^{k} N_j (\mu_j - \mu)(\mu_j - \mu)^{T} \tag{3} $$
In equation (3), $\mu$ is the mean vector of all samples:
$$ \mu = \frac{1}{m} \sum_{i=1}^{m} x_i \tag{4} $$
Since $W^{T} S_b W$ and $W^{T} S_w W$ are matrices rather than scalars, the products of their diagonal elements are used in their place for optimization:
$$ \arg\max_{W} J(W) = \frac{\prod_{\mathrm{diag}} W^{T} S_b W}{\prod_{\mathrm{diag}} W^{T} S_w W} = \prod_{i=1}^{d} \frac{w_i^{T} S_b w_i}{w_i^{T} S_w w_i} \tag{5} $$
In equation (5), the right side of $J(W)$ is a product of generalized Rayleigh quotients, so $J(W)$ is maximized by maximizing the factors one by one. As the maximum value of the Rayleigh quotient $w^{T} S_b w / w^{T} S_w w$ is the largest eigenvalue of $S_w^{-1} S_b$, the maximum of $J(W)$ is the product of the d largest eigenvalues of $S_w^{-1} S_b$, and the matrix $W$ formed by the corresponding d eigenvectors gives the mapping used for feature dimensionality reduction.
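The procedure described above (build the intraclass and interclass scatter matrices, then keep the eigenvectors of $S_w^{-1} S_b$ with the largest eigenvalues) can be sketched in NumPy as follows. This is an illustrative sketch on synthetic data, not the code used in the experiments; the function name `lda_fit`, the toy class centers, and the use of a pseudo-inverse in place of a plain inverse (for numerical robustness) are assumptions.

```python
import numpy as np


def lda_fit(X, y, d):
    """Return an (n_features, d) matrix W built from the top-d eigenvectors of pinv(S_w) @ S_b."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu = X.mean(axis=0)                      # mean vector of all samples
    n = X.shape[1]
    S_w = np.zeros((n, n))                   # intraclass scatter
    S_b = np.zeros((n, n))                   # interclass scatter
    for label in np.unique(y):
        X_j = X[y == label]
        mu_j = X_j.mean(axis=0)
        centered = X_j - mu_j
        S_w += centered.T @ centered
        gap = (mu_j - mu).reshape(-1, 1)
        S_b += X_j.shape[0] * (gap @ gap.T)
    # Assumption: the pseudo-inverse stands in for the inverse in case S_w is ill-conditioned.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1][:d]
    return eigvecs[:, order].real


# Synthetic data: three classes in 4 dimensions that differ only in the first two.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [6, 0, 0, 0], [0, 6, 0, 0]], dtype=float)
X = np.vstack([c + rng.normal(size=(30, 4)) for c in centers])
y = np.repeat([0, 1, 2], 30)
W = lda_fit(X, y, d=2)
Z = X @ W                                    # projected (reduced) features
```

Note that $S_b$ is a sum of k rank-one matrices whose weighted mean offsets sum to zero, so it has at most k − 1 nonzero eigenvalues; directions beyond that carry no additional interclass scatter.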

Introduction of Machine Learning Classification Models.
In order to evaluate the influence of the feature set on face recognition after LDA dimensionality reduction, we rely on several machine learning methods for facial expression recognition, including the Naive Bayes classification model (NB), the decision tree classification model (DT), K-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM). Among them, the Naive Bayes model [22] outputs the probability that a sample belongs to a specific category by predicting the membership probability between the sample and each category. It is assumed that $S = (s_1, s_2, \ldots, s_n)$ is an item to be classified and each $s_i$ is a feature attribute of S. According to the existing category set $C = (c_1, c_2, \ldots, c_m)$, we can calculate the probability of every category given the features of S and select the category with the highest probability as the category label of S. In this way, we obtain the definition of the Naive Bayes classifier shown in the following equation:
$$ V(S) = \arg\max_{c_i \in C} \frac{p(c_i) \prod_{j=1}^{n} p(s_j \mid c_i)}{p(S)} \tag{6} $$
In equation (6), $V(S)$ is the category label of S, $p(S)$ is a constant for all categories, and $p(c_i)$ is the category prior probability. $p(s_1 \mid c_i), p(s_2 \mid c_i), \ldots, p(s_n \mid c_i)$ are the conditional probabilities of each feature attribute in S under the condition of category $c_i$. All of these can be obtained from the training sample sets.
The decision tree classification model [23] consists of three parts: root node, internal nodes, and leaf nodes. The root node is the starting point of decision-making, and a decision path can be formed from the root node to a leaf node. Each of the internal nodes contains a feature that can divide the data. Each leaf node corresponds to a category. We use the classic algorithm ID3 to build a decision tree model. We suppose that S is the set of training samples, |S| is the number of training samples, and A is any feature of the sample.
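The Naive Bayes decision rule described above (pick the category that maximizes the prior times the product of the per-feature conditional probabilities, with the constant p(S) dropped) can be sketched for discrete features as follows. The toy features and labels are hypothetical and only illustrate the rule; the features used in this study are continuous, for which a density model per feature would replace the frequency counts.

```python
from collections import Counter, defaultdict


def train_nb(samples, labels):
    """Estimate p(c_i) and the counts behind p(s_j | c_i) by relative frequency."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)    # (class, feature index) -> value counts
    for sample, label in zip(samples, labels):
        for j, value in enumerate(sample):
            feature_counts[(label, j)][value] += 1
    priors = {c: count / len(labels) for c, count in class_counts.items()}
    return priors, feature_counts, class_counts


def predict_nb(model, sample):
    """Return the class maximizing p(c_i) * prod_j p(s_j | c_i); p(S) is a shared constant."""
    priors, feature_counts, class_counts = model
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for j, value in enumerate(sample):
            score *= feature_counts[(c, j)][value] / class_counts[c]
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical discrete features: (mouth corner direction, brow position).
samples = [("up", "raised"), ("up", "lowered"), ("up", "raised"),
           ("down", "lowered"), ("down", "lowered"), ("down", "raised")]
labels = ["happy", "happy", "happy", "sad", "sad", "sad"]
model = train_nb(samples, labels)
```

With this toy model, `predict_nb(model, ("up", "raised"))` compares 0.5 · 1 · 2/3 for "happy" against 0.5 · 0 · 1/3 for "sad". A practical implementation would usually add smoothing so that an unseen feature value does not force a probability of zero.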

Computational Intelligence and Neuroscience
When the samples are divided into n different classes $C_1, C_2, \ldots, C_n$, and the numbers of samples in these classes are denoted $|C_1|, |C_2|, \ldots, |C_n|$, the probability that any sample in S belongs to class $C_i$ is as follows:
$$ p_i = \frac{|C_i|}{|S|} \tag{7} $$
Then, the total information entropy is as follows:
$$ \mathrm{Entropy}(S) = -\sum_{i=1}^{n} p_i \log_2 p_i \tag{8} $$
The information entropy of the samples divided in accordance with Feature A is as follows:
$$ \mathrm{Entropy}_A(S) = \sum_{v \in A} \frac{|S_v|}{|S|} \mathrm{Entropy}(S_v) \tag{9} $$
The information gain of Feature A on the set S is as follows:
$$ \mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \mathrm{Entropy}_A(S) \tag{10} $$
In equation (9), the sum runs over all possible values v of Feature A, $S_v$ is the subset of S whose samples take the value v for Feature A, and $|S_v|$ is the number of samples in $S_v$. According to information theory, the ID3 algorithm uses "information gain" to measure the reduction of uncertainty; that is, the greater the information gain, the smaller the remaining uncertainty and the better the division effect.
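The quantities that ID3 compares when choosing a split, namely the entropy of the label set and the information gain of a feature, can be computed with a few lines of Python. This is a minimal sketch with hypothetical toy labels and feature values; base-2 logarithms are assumed, so entropy is measured in bits.

```python
import math
from collections import Counter


def entropy(labels):
    """Information entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - conditional


# Toy example: feature_a separates the two classes perfectly, feature_b not at all.
labels = ["happy", "happy", "sad", "sad"]
feature_a = ["up", "up", "down", "down"]
feature_b = ["x", "y", "x", "y"]
gain_a = information_gain(feature_a, labels)
gain_b = information_gain(feature_b, labels)
```

Here `gain_a` equals the full entropy of the labels (1 bit) and `gain_b` is zero, so ID3 would split on `feature_a` first.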

Experimental Results and Analysis
The experiments in this study were run on an Intel(R) Core(TM) i5-10210U CPU with 16.0 GB of RAM, and the programming language is Python 3.6.

Datasets and Evaluation Metrics.
The experiments in this study are based on the KDEF public face dataset [24]. The KDEF dataset was originally used for psychological and medical experiments on perception, attention, emotion, memory, etc. It contains the facial information of 70 people, 35 males and 35 females, who are between the ages of 20 and 30. They have no beards, earrings, or glasses and removed their makeup as far as possible before shooting. In the process of collecting the dataset, all the subjects wore special gray T-shirts, and the light was soft indirect light, evenly distributed on both sides of the face. Seven different expressions are collected for each person, including surprise, sadness, happiness, etc., and each expression has color images from 5 angles.
The entire dataset has a total of 4900 images, each with a size of 562 * 762 pixels. Among them, there are 2916 photos in which the frontal or half-side face can be identified. Some samples are shown in Figure 5.
We randomly select 10% of the identifiable samples as the test set and use the rest as the training set. The recognition effect is measured by multiclassification evaluation metrics, including microprecision (micro-P), microrecall (micro-R), micro-F1, macroprecision (macro-P), macrorecall (macro-R), and macro-F1, which are shown in equations (11)-(16), where N denotes the number of categories, TP is true positive, FP is false positive, and FN is false negative.
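For reference, the six metrics can be computed from per-class TP, FP, and FN counts as in the sketch below. It follows the common definitions: micro metrics pool the counts over all classes, and macro metrics average the per-class values. Computing macro-F1 from macro-P and macro-R is an assumption here; averaging the per-class F1 scores is another common convention and may give slightly different values. The toy labels are hypothetical.

```python
def safe_div(numerator, denominator):
    """Division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def f1_score(precision, recall):
    return safe_div(2 * precision * recall, precision + recall)


def micro_macro_metrics(y_true, y_pred, classes):
    """Return micro and macro precision, recall, and F1 for single-label predictions."""
    counts = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        counts.append((tp, fp, fn))
    total_tp = sum(tp for tp, _, _ in counts)
    total_fp = sum(fp for _, fp, _ in counts)
    total_fn = sum(fn for _, _, fn in counts)
    micro_p = safe_div(total_tp, total_tp + total_fp)
    micro_r = safe_div(total_tp, total_tp + total_fn)
    macro_p = sum(safe_div(tp, tp + fp) for tp, fp, _ in counts) / len(classes)
    macro_r = sum(safe_div(tp, tp + fn) for tp, _, fn in counts) / len(classes)
    return {
        "micro-P": micro_p, "micro-R": micro_r, "micro-F1": f1_score(micro_p, micro_r),
        "macro-P": macro_p, "macro-R": macro_r,
        # Assumption: macro-F1 is derived from macro-P and macro-R.
        "macro-F1": f1_score(macro_p, macro_r),
    }


y_true = ["happy", "happy", "sad", "sad", "surprise", "surprise"]
y_pred = ["happy", "sad", "sad", "sad", "surprise", "happy"]
metrics = micro_macro_metrics(y_true, y_pred, ["happy", "sad", "surprise"])
```

In a single-label task every misclassified sample is one false positive for the predicted class and one false negative for the true class, so micro-P and micro-R coincide, while the macro values weight every expression equally regardless of how many samples it has.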

Analysis of LDA Dimensionality Reduction Effect for Facial Expression Recognition.
In order to test the effect of LDA feature dimensionality reduction on the accuracy of facial expression recognition, we extract the AU features and the 128-dimensional deep features from the images in the dataset as two different feature sets and regard the hybrid feature set as the third feature set. For the three feature sets, LDA is used to reduce the dimensions to 7, and machine learning models trained on the training set are used to test the classification effect. Therefore, the experiment can be divided into three parts: analysis of LDA dimensionality reduction effect based on AU features, analysis of LDA dimensionality reduction effect based on 128-dimensional face deep features, and analysis of LDA dimensionality reduction effect based on hybrid features.

Analysis of LDA Dimensionality Reduction Effect Based on AU Features.
The facial AU feature is obtained by dividing the face into regions and counting the influence of each expression on different regions. It contains rich expression information, but it may also have redundancy that affects the recognition results. Therefore, we adopt several machine learning methods for facial expression recognition on the basis of the AU feature and compare the recognition effects before and after LDA feature dimensionality reduction. The results are listed in Table 2.
Taking macro-F1 in Table 2 as an example, after LDA dimensionality reduction the macro-F1 of NB and DT increased, while that of KNN, RF, and SVM decreased. This shows that there is some redundancy in the original AU feature set.

Analysis of LDA Dimensionality Reduction Effect Based on 128-Dimensional Deep Features.
The feature extraction based on the deep learning model can produce a higher-level abstract expression. In contrast to the local AU features, the semantic information of these features is richer, but the detailed location information is relatively brief. The facial expression recognition effect of 128-dimensional deep features and the dimensionality reduction effect of LDA are listed in Table 3.
It can be seen from Table 3 that the macro-F1 of facial expression recognition based on the original 128-dimensional face deep features is low. After LDA dimensionality reduction, the macro-F1 of all the methods is significantly improved except for the RF method. The reason is that most of the original 128-dimensional deep features carry facial semantic information, which contributes little to facial expression recognition, a task that requires detailed information. Therefore, the recognition effect before dimensionality reduction is low. LDA projects the features, according to the labels, onto the directions with the largest distance between classes, which can also be regarded as mapping the 128-dimensional deep features to a feature space suited to facial expression recognition. Due to the high expressive ability of deep features, these features can better express facial expression information after dimensionality reduction. Therefore, compared with face AU features, the 128-dimensional deep features can achieve higher recognition accuracy after dimensionality reduction.

Analysis of LDA Dimensionality Reduction Effect Based on Hybrid Features.
We concatenated the local face AU features and the 128-dimensional deep features, which contain more global semantic information, to obtain the hybrid features. The facial expression recognition effect of hybrid features is listed in Table 4.
As listed in Table 4, compared with either single feature set above, the hybrid features achieve higher recognition accuracy both before and after LDA dimensionality reduction. After dimensionality reduction, the macro-F1 of the NB, DT, KNN, RF, and SVM methods increased by 19.22%, 18.04%, 15.10%, 3.91%, and 11.83%, respectively. The reason is that the hybrid features contain multidimensional face features, namely, local face AU features and global 128-dimensional deep semantic features. This allows the model to learn more fully and thus to achieve higher recognition accuracy.
Combining Tables 2-4, it can be seen that no matter which feature set is used, LDA dimensionality reduction effectively improves the recognition accuracy of two of the machine learning methods, NB and DT. The main reason is that LDA feature dimensionality reduction increases the correlation between features and categories; that is, the class spacing is maximized while the intraclass dispersion is minimized, so that the subsequent model can better learn the interclass differences, resulting in higher recognition evaluation metrics. The dimensionality reduction efficiencies of LDA for
It can be seen from Figures 6-10 that for different types of feature sets and different machine learning methods, the accuracy of face recognition after LDA feature dimensionality reduction is higher than that after the PCA method. Furthermore, the standard deviation of the correct samples identified with LDA is generally lower than that with the PCA method, which indicates that LDA feature dimensionality reduction gives the features greater ability to express categories. Combining the ACC without dimensionality reduction in Tables 2-4, it can be seen that the accuracy of face recognition after PCA dimensionality reduction is even lower than that before dimensionality reduction. The reason is that PCA is an unsupervised data dimensionality reduction method. As this method does not use category information, it cannot obtain an effective dimensionality reduction for complex face feature sets. LDA is a supervised data dimensionality reduction method. It is label oriented and selects the vector that makes the Fisher criterion function reach its extreme value as the best projection direction, so that the samples achieve the largest interclass dispersion and the smallest intraclass dispersion after being projected in this direction.
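The difference between an unsupervised and a supervised projection can be illustrated on synthetic two-class data in which the direction of largest variance carries no class information. The sketch below is not a reproduction of the experiments in this study; the data, the two-class form of the Fisher direction, and the midpoint-threshold classifier are assumptions chosen to keep the example small.

```python
import numpy as np

# Synthetic data: large variance along x (uninformative), small variance along y,
# and class means that differ only along y.
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n)
noise = rng.normal(size=(2 * n, 2)) * np.array([5.0, 0.5])
X = noise + np.where(y[:, None] == 0, [0.0, -1.5], [0.0, 1.5])

# PCA: first right singular vector of the centered data (direction of largest variance).
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
pca_dir = Vt[0]

# Two-class LDA: Fisher direction proportional to S_w^{-1} (mu_1 - mu_0).
mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
S_w = sum((X[y == k] - m).T @ (X[y == k] - m) for k, m in ((0, mu0), (1, mu1)))
lda_dir = np.linalg.solve(S_w, mu1 - mu0)
lda_dir /= np.linalg.norm(lda_dir)


def threshold_accuracy(direction):
    """Project onto one direction and assign each sample to the nearer projected class mean."""
    z = X @ direction
    m0, m1 = z[y == 0].mean(), z[y == 1].mean()
    pred = (np.abs(z - m1) < np.abs(z - m0)).astype(int)
    return (pred == y).mean()


pca_acc = threshold_accuracy(pca_dir)
lda_acc = threshold_accuracy(lda_dir)
```

On this data the PCA direction lies almost along x, where the classes overlap completely, while the LDA direction lies almost along y, where they are well separated, so the one-dimensional LDA projection is far easier to classify.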

Conclusion and Discussion
Accurate facial expression recognition is inseparable from the extraction of multidimensional features, but there is often a lot of redundancy in multidimensional features. Based on the public KDEF face dataset, this study evaluates the effect of the LDA feature dimensionality reduction method on facial expression recognition. First, multiple types of features are extracted from the face, including 35-dimensional face AU features based on artificial construction, 128-dimensional face deep features based on the FaceNet pretrained model, and 163-dimensional hybrid features. Second, LDA is used to reduce the dimensions of the three different feature sets to 7 dimensions. Then, classical machine learning methods are used to evaluate the effect of facial expression recognition before and after feature dimensionality reduction. Finally, we compare the LDA and PCA feature dimensionality reduction methods. The experimental results show that after LDA feature dimensionality reduction, the facial expression recognition effects of the three feature sets improve to a certain extent and are better than those obtained with the PCA method. The dimensionality reduction of hybrid features improves facial expression recognition the most, with the macro-F1 reaching 82.90%. In addition, we found that regardless of whether feature dimensionality reduction is carried out and which machine learning method is used, the hybrid features achieve a higher recognition accuracy than either single feature set. This shows that it is important to extract multidimensional hybrid features in facial expression recognition.
As this study reduces every feature set to the same number of dimensions, the number of dimensions may need to be adjusted dynamically for different feature sets. Hence, in the future, we will explore the effect of feature dimensionality reduction under the optimal number of dimensions. The research on the effect of feature dimensionality reduction will be further expanded in multiple directions, such as more machine learning methods, more feature dimensionality reduction methods, and more different types of datasets.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The author declares that there are no conflicts of interest.