In this paper, we present a facial expression recognition method based on a modified sparse representation recognition (MSRR) approach. In the first stage, Haar-like+LPP is used to extract features and reduce dimensionality. In the second stage, we adopt the LC-K-SVD (Label Consistent K-SVD) method to train the dictionary, instead of directly using the training samples as the dictionary, and incorporate block dictionary training into the training process. In the third stage, the stOMP (stagewise orthogonal matching pursuit) method is used to speed up the convergence of OMP (orthogonal matching pursuit). In addition, a dynamic regularization factor is added to the iteration process to suppress noise and improve accuracy. We evaluate the proposed method with respect to the number of training samples, the feature dimension, the feature extraction and dimension reduction methods, and noise, on a self-built database, Japan's JAFFE database, and CMU's CK database. Further, we compare this sparse method with the classic SVM and RVM and analyze recognition accuracy and time efficiency. Simulation results show that the coefficients obtained by the MSRR method carry class-discriminative information, which improves computing speed and achieves satisfying recognition results.

Facial expression is an important way of nonverbal communication [

In the field of image processing, Candes and Wakin indicated that recovering the original image can be cast as an optimization problem [

In this paper, we study an expression recognition method based on sparse representation. Firstly, we use Haar-like+LPP [

This paper uses a sparse representation algorithm for facial expression recognition, which is divided into two steps [

The problem of sparse solution is as follows:

In facial expression classification, the entire training set can be used as the dictionary. However, classification efficiency decreases as the amount of training data grows. It is therefore often necessary to preprocess the training data before classification, for example by obtaining a more compact dictionary through a dictionary learning method. Dictionary learning is an important part of sparse representation classification. The formula of dictionary learning is as follows [
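To make this concrete, the basic sparse representation classification scheme (before any dictionary learning) can be sketched in NumPy. This is a minimal illustration under our own function names (`omp`, `src_classify`), not the paper's implementation; the dictionary here is simply the matrix of training samples, one column per sample:

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: greedily select up to k atoms of D
    and least-squares refit the coefficients on the chosen support."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = y - D @ x
    return x

def src_classify(D, labels, y, k=5):
    """Assign y to the class whose atoms give the smallest reconstruction residual."""
    x = omp(D, y, k)
    best, best_r = None, np.inf
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)  # keep only class-c coefficients
        r = np.linalg.norm(y - D @ xc)
        if r < best_r:
            best, best_r = c, r
    return best
```

Replacing this raw training-set dictionary with a learned, more compact one is exactly what the dictionary learning stage is for.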

In formula (

Given a test face expression

The scheme of dictionary learning based sparse expression classification is presented in Figure

Frame of sparse expression representation by dictionary learning [

In this section, we present the modified sparse representation recognition method based on block dictionary learning (LC-K-SVD) and the fast classification (stOMP) method.

Label consistency based K-SVD dictionary learning algorithm LC-K-SVD [

For algorithm initialization, we need to initialize dictionary

Flow chart of initialization dictionary within class in LC-K-SVD [

Using the multivariate ridge regression model to solve

Then, use K-SVD learning algorithm:
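As a rough sketch of the K-SVD update step referred to above (in our own notation Y ≈ DX; the label-consistency terms of LC-K-SVD and the block structure are omitted for brevity), each atom and its coefficients are refreshed via a rank-1 SVD of the residual restricted to the signals that use that atom:

```python
import numpy as np

def ksvd_update(D, X, Y):
    """One K-SVD sweep: update each atom and its coefficients via a rank-1
    SVD of the residual restricted to the signals that use that atom."""
    for j in range(D.shape[1]):
        users = np.nonzero(X[j, :])[0]  # signals whose code uses atom j
        if users.size == 0:
            continue
        # residual without atom j's own contribution, on those signals only
        E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
        U, s, Vt = np.linalg.svd(E, full_matrices=False)
        D[:, j] = U[:, 0]              # new atom: leading left singular vector
        X[j, users] = s[0] * Vt[0, :]  # matching coefficients
    return D, X
```

Each atom update minimizes the residual with the other atoms fixed, so the reconstruction error is non-increasing across a sweep.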

For the test face

The optimization problem (

The optimum

In OMP, only a single element is selected in each iteration, so the algorithm must run as many iterations as there are nonzero elements to be estimated. In a greedy algorithm, this can only be avoided by selecting more than a single element per iteration. Here we adopt Stagewise OMP (StOMP) [
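A minimal NumPy sketch of this stagewise selection follows (our own simplified variant: the threshold t·‖r‖/√m is the formal-noise heuristic of StOMP, and the paper's dynamic regularization factor is not included):

```python
import numpy as np

def stomp(D, y, n_stages=10, t=2.0):
    """Stagewise OMP: each stage keeps every atom whose correlation with
    the residual exceeds a threshold tied to the residual noise level."""
    m = D.shape[0]
    residual = y.copy()
    support = set()
    x = np.zeros(D.shape[1])
    for _ in range(n_stages):
        c = D.T @ residual
        thr = t * np.linalg.norm(residual) / np.sqrt(m)  # formal noise level
        new = set(np.nonzero(np.abs(c) > thr)[0]) - support
        if not new:
            break  # no atom passes the threshold: stop
        support |= new
        idx = sorted(support)
        # least-squares refit on the enlarged support
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        x[:] = 0.0
        x[idx] = coef
        residual = y - D @ x
    return x
```

Because several atoms can enter the support per stage, the number of stages is typically far smaller than the sparsity level, which is the source of the speedup over plain OMP.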

The iteration process of stOMP based sparse expression classification is shown in Figure

The recognition rate of the sparse classifier increases as the iteration proceeds. A vector is used to store the recognition rates of all features, where “

Using the multivariate ridge regression model to obtain the coefficient matrix of the linear classifier
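The multivariate ridge regression step has a closed-form solution; a minimal sketch under our own symbols (X holds the sparse codes column-wise, H the 0/1 class-indicator matrix, lam the regularization weight):

```python
import numpy as np

def ridge_classifier(X, H, lam=1.0):
    """Closed-form multivariate ridge regression:
    W = H X^T (X X^T + lam I)^{-1}, mapping sparse codes to label scores."""
    n = X.shape[0]
    return H @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(n))
```

At test time the predicted class is simply the row of W @ x with the largest score.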

Diagram of coding distribution (dataset is Extended Yale B, encoding for the face on the left, coding for the training process) [

In formula (

In this paper, we use the block dictionary learning LC-K-SVD algorithm to build an overcomplete dictionary and then apply the stOMP algorithm, combined with an antinoise factor, to the classification process in order to accelerate the traditional OMP algorithm. For convenience, the proposed sparse expression recognition classifier is named MSRR (modified sparse representation recognition). The basic sparse representation approach, with no dictionary learning and using the OMP method, is referred to as SRC (sparse representation classification).

(

Procedure:

Initialize: dictionary

Sparse coding:

Dictionary updating stage:

Set

K-SVD Dictionary updating: Trained dictionary

End: The change of

Procedure:

Calculate the sparse coding

Objective:

Initialize: Set

For

Select more than a single element in each iteration:

Update distribution of sparse solution:

Check the termination condition: compare the difference between the previous and the current distribution of the sparse solution; if

Output sparse solution

Linear classifier:

The diagram of our modified facial expression recognition is shown in Figure

Diagram of facial expression recognition based on our modified sparse method.

In this section, we validate the classification performance of the proposed sparse representation algorithm at the experimental level. On typical and self-built databases (the infant and children expression database, the JAFFE database [

The self-built infant and children expression database originates from images collected on the internet and then preprocessed. The total number of collected images is 900, with 300 for each class: neutral, happy, and crying. Figure

Part of the images in self-built infant and children expression database.

In this part of the experiment, the images are limited in number. Therefore, in order to ensure the generality of the experimental results, we take 100, 200, 300, 400, 500, 600, and 700 images as training samples, respectively, and adopt the LOO (leave-one-out) cross-validation approach in each experiment. We use the Haar-like+LPP method for feature extraction and dimension reduction and select dimensions 30, 48, 72, 120, 168, 210, 288, 399, 483, 528, 624, and 725 for sparse representation classification and recognition. Figure

Chart of crying expression recognition with different number of training samples and feature dimension.
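The leave-one-out protocol used in these experiments can be sketched as follows; `classify` is a hypothetical callback (not from the paper) that trains on the given indices and reports whether the held-out sample is classified correctly:

```python
def leave_one_out(n_samples, classify):
    """Leave-one-out cross-validation: each sample serves once as the
    test item while all remaining samples form the training set."""
    correct = 0
    for i in range(n_samples):
        train = [j for j in range(n_samples) if j != i]
        correct += bool(classify(train, i))  # True if sample i is recognized
    return correct / n_samples
```

This makes every image a test target exactly once, which is why it is a common choice when the number of images is limited.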

We can see from the experiment that, as the number of training samples increases, the classification accuracy on the test samples also gradually increases. When the number of training samples reaches 600 or 700, the correct classification rate is at a high level, which means that 600 training samples are sufficient for this experiment. In every case, the recognition rate drops drastically once the feature space dimension approaches the number of training samples: at dimension 725 for 700 training samples, 624 for 600, 528 for 500, 483 for 400, 399 for 300, 218 for 200, and 120 for 100.

We analyze the mathematical model: each column in matrix

As can be seen from the test results, the ratio of the feature dimension after dimension reduction to the number of training samples affects the final recognition result. When the number of training samples is 600 and the feature dimension is 72, the recognition rate is 88.5%, which is relatively high. When the feature dimension grows to 120 and 168, the recognition rates are both 88.9%; this very slow growth indicates that the performance of the algorithm has almost reached its limit. When the feature dimension grows to 210 and 288, the recognition rate begins to drop dramatically. The results show that a feature dimension of about 70 is sufficient for sparse reconstruction, and the best recognition rate appears when the feature dimension is about 120.

Meanwhile, in addition to crying expression recognition, we also carry out neutral and happy expression tests. For each expression, we adopt the training samples and feature dimensions that perform best, as shown in Figure

Expression recognition result for three expressions when training sample is 700 (for each expression, training samples and feature dimensions that perform the best are selected).

We use the Haar-like+LPP and PCA methods to test the sensitivity of our facial expression recognition method to different feature extraction and dimension reduction methods. Taking crying expression recognition as an example, when the number of training samples is 600, the recognition results of the two feature extraction methods are shown in Figure

Crying expression recognition results with PCA and Haar-like+LPP methods when the number of training samples is 600.

To test the algorithm’s robustness to noise, Gaussian random noise is added to the whole test face, with the variance increasing from 0 to 0.5 in steps of 0.01. The JAFFE database is used, and some of the noisy images are shown in Figure

Images after noise is added (variance from left to right: 0, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5).
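The noise model used here can be sketched as follows (a minimal illustration; we assume pixel intensities scaled to [0, 1] and clip after adding noise):

```python
import numpy as np

def add_gaussian_noise(img, var, rng):
    """Add zero-mean Gaussian noise of the given variance to an image
    with intensities in [0, 1], clipping back to the valid range."""
    noisy = img + rng.normal(0.0, np.sqrt(var), img.shape)
    return np.clip(noisy, 0.0, 1.0)
```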

When the noise variance increases, the sparsity deteriorates, which leads to a significant decline in recognition accuracy. We add to the test face zero-mean random Gaussian noise with variances of 0.1 and 0.3, respectively. Six kinds of expression are selected, 180 images in total, 30 for each expression. The training samples are ordered as anger, disgust, surprise, neutral, sadness, and fear; thus columns 1–30 belong to the anger expression, columns 31–60 to the disgust expression, and so on. The sparse solutions obtained by SRC under Gaussian noise of variance 0.1 and variance 0.3 are shown in Figures

Sparse solution with different noise variances: (a) 0.1; (b) 0.3.

We find in Figure

Then, we repeat the above experiment using the modified sparse facial expression recognition method and compare our method with the SRC method under different noise variances, as shown in Figure

Recognition effect with different variance: (a) recognition rate; (b) SCI.

As can be seen from Figure

We compute the Sparse Concentration Index (SCI) of the sparse coefficients for each recognition as a measure of sparsity. The average SCI values obtained over 35 faces under different variances are shown in Figure
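Using the standard definition of SCI for a k-class problem, SCI(x) = (k · max_c ‖x_c‖₁ / ‖x‖₁ − 1) / (k − 1), which equals 1 when all coefficient energy sits in a single class and 0 when it is spread evenly, a sketch is:

```python
import numpy as np

def sci(x, labels):
    """Sparse Concentration Index of coefficient vector x over class labels:
    (k * max_c ||x_c||_1 / ||x||_1 - 1) / (k - 1), in [0, 1]."""
    classes = np.unique(labels)
    k = len(classes)
    total = np.abs(x).sum()
    frac = max(np.abs(x[labels == c]).sum() / total for c in classes)
    return (k * frac - 1) / (k - 1)
```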

Compared with traditional SRC expression recognition, the MSRR method in this paper includes block training, which reduces the discriminative dictionary error and thereby improves the facial expression recognition rate. A noise-suppression component is added to the process of solving the sparse solution, which enhances the robustness of sparse representation classification. Therefore, even when the noise variance is relatively high, a good recognition rate and good sparsity are retained.

We select dataset 1 and dataset 2 to carry out experiments and use the SVM, RVM, and MSRR algorithms for classification.

(1) We select dataset 1 for person-dependent face experiments. This set of experiments examines the performance of each algorithm free from external influences. We randomly select one image per person per expression from the CK database as test samples, and the rest are used for training. The total number of testing samples is 178 and that of training samples is 1050, with 150 samples per expression.

To analyze the sparsity of the solutions given by MSRR, we take a face with the fear expression as the test face. The recognition results of SRC and MSRR are shown in Figures

Residual values of SRC and MSRR: (a) SRC; (b) MSRR.

Sparse solution of SRC and MSRR: (a) SRC; (b) MSRR, where the

As can be seen from Figure

Result for each classification algorithm (%).

Method | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 | Class 6 | Class 7
---|---|---|---|---|---|---|---
SVM | 78.71 | 72.07 | 71.22 | 79.00 | 77.85 | 76.00 | 81.56
RVM | 81.20 | 76.98 | 76.88 | 82.53 | 83.35 | 77.44 | 82.64
MSRR | 88.88 | 86.85 | 86.14 | 87.45 | 88.15 | 83.02 | 89.99

The time cost of the proposed method is 49 min, whereas SVM costs a total of 1.35 h and RVM training plus classification costs a total of 2.10 h. The computation time after optimization by our method is thus reduced. The most time-consuming part of our method is the dictionary learning stage, while classification takes little time because more than a single element is selected in each iteration.

(2) We choose dataset 2; person-independent facial expression recognition is more difficult than the person-dependent case, because expression information is easily interfered with by identity-related facial features, leading to an unsatisfying recognition rate. The CK database is divided into 18 parts, 17 of which are used for training and the remaining one for testing, so that every sample serves once as a test target while never appearing among the training samples at the same time. The final results are obtained by averaging the 18 recognition results. Taking the anger expression as the test object, the final results obtained by SRC and MSRR are shown in Figures

Residual values of SRC and MSRR: (a) SRC; (b) MSRR.

Sparse solution of SRC and MSRR: (a) SRC; (b) MSRR, where the

In Figure

We use SVM and RVM for comparison, and the classification results are shown in Figure

Person-independent facial expression recognition accuracy comparison between MSRR, RVM, and SVM.

In this paper, we studied expression recognition by a sparse representation method. First, Haar-like+LPP is used to extract features and reduce dimensionality. A block dictionary training mode is added to LC-K-SVD instead of directly using the samples as the dictionary. In the classification stage, stOMP is used to speed up the convergence of traditional OMP, and a dynamic regularization factor is used to suppress noise and enhance accuracy. On typical and self-built databases, we select part of the samples for training and testing, to verify sensitivity to the number of training samples, the feature dimension, the feature extraction and dimension reduction method, and noise, and thus to verify the feasibility and effectiveness of the proposed sparse representation classification method. Further, our method is compared with SVM and RVM to analyze recognition performance and time complexity. Experimental results show that when the sample size is 600 and the extracted feature dimension is about 120, the method achieves the best reconstruction and a better recognition rate. In addition, the proposed method is not very sensitive to the feature extraction method (Haar-like+LPP or PCA): given a feasible feature space dimension, a satisfying sparse solution can be obtained. The proposed method suppresses noise to a certain extent thanks to the dynamic regularization factor, but it does not perform very well for person-independent facial expression recognition. These experiments illustrate the feasibility of our sparse representation method, which can be applied to facial expression analysis and has advantages in certain aspects.

The authors declare that there is no conflict of interests regarding the publication of this paper.

This work was supported in part by the National High-Tech R&D Program of China (863 Program) under Grant 2013AA100305, in part by the National Natural Science Foundation of China under Grant 61174090, and in part by the U.S. National Science Foundation’s Beacon Center for the Study of Evolution in Action, under cooperative Agreement DBI-0939454.