This paper introduces a method for feature extraction and emotion recognition based on empirical mode decomposition (EMD). Using EMD, EEG signals are decomposed automatically into intrinsic mode functions (IMFs). Multidimensional information from the IMFs is used as features: the first difference of the time series, the first difference of the phase, and the normalized energy. The performance of the proposed method is verified on a publicly available emotional database. The results show that the three features are effective for emotion recognition. The role of each IMF is examined, and we find that the high-frequency component IMF1 has a significant effect on the detection of different emotional states. The informative electrodes under the EMD strategy are also analyzed. In addition, the classification accuracy of the proposed method is compared with several classical techniques, including fractal dimension (FD), sample entropy, differential entropy, and the discrete wavelet transform (DWT). Experimental results on the DEAP dataset demonstrate that our method can improve emotion recognition performance.
Emotion plays an important role in our daily life and work. Real-time assessment and regulation of emotion can improve people’s lives. For example, in human-machine interaction, emotion recognition makes communication easier and more natural. As another example, in the treatment of patients, especially those with expression problems, knowing the real emotional state of patients helps doctors provide more appropriate medical care. In recent years, emotion recognition from EEG has gained considerable attention. It is also a very important factor in brain-computer interface (BCI) systems, where it can effectively improve the communication between humans and machines [
Various features and extraction methods have been proposed for emotion recognition from EEG signals, including time domain techniques, frequency domain techniques, joint time-frequency analysis techniques, and other strategies.
Statistics of EEG series, that is, first and second difference, mean value, and power are usually used in time domain [
Other features extracted from combinations of electrodes are also used, such as coherence and asymmetry of electrodes in different brain regions [
Other strategies, such as using deep networks to improve classification performance, have also been studied. Zheng and Lu used a deep neural network to investigate critical frequency bands and channels for emotion recognition [
EMD was proposed by Huang et al. in 1998 [
EMD is a good choice for EEG signals, and we use it for emotion recognition from EEG data. Which features are effective for emotion recognition in the EMD domain? Which IMF component is best for classification? Does the EMD-based strategy perform better than time domain and time-frequency methods? These questions have not yet been studied, and we investigate them in this work.
EMD has been widely used for seizure prediction and detection, but little research exists on EMD-based emotion recognition. Higher order statistics of IMFs [
In this paper, we present an emotion recognition method based on EMD. We use the first difference of the IMF time series, the first difference of the IMF’s phase, and the normalized energy of the IMF as features. The motivation for using these three features is that they depict the characteristics of an IMF in the time, frequency, and energy domains, providing multidimensional information. The first difference of the time series depicts the intensity of signal change in the time domain. The first difference of the phase measures the intensity of change in phase, and the normalized energy describes the weight of the current oscillation component. The three features constitute a feature vector, which is fed into an SVM classifier for emotional state detection.
The proposed method is studied on a publicly available emotional database DEAP [
To realize emotional state recognition, the EEG signals are decomposed into IMFs by EMD. Three features of IMFs, the fluctuation of the phase, the fluctuation of the time series, and the normalized energy, are formed as a feature vector, which is fed into SVM for classification. The whole process of the algorithm is shown in Figure
Block diagram of the proposed method.
DEAP is a publicly available dataset for emotion analysis, which recorded EEG and peripheral physiological signals of 32 participants as they watched 40 music videos. Each music video clip lasts 1 minute and serves as a visual emotional stimulus, graded from 1 to 9. Among the 40 music videos, 20 are high valence stimuli and 20 are low valence stimuli; the same holds for the arousal dimension. After watching each music video, participants performed a self-assessment of their levels of arousal, valence, liking, dominance, and familiarity, with ratings from 1 to 9. EEG was recorded with 32 electrodes placed according to the international 10-20 system. Each electrode recorded a 63 s EEG signal, including a 3 s baseline signal before the trial.
In this paper, we used the preprocessed EEG data, with a sampling rate of 128 Hz and a band range of 4–45 Hz. EOG artefacts were removed as method in [
Each music video lasts 1 minute, and every 5 s of EEG signal is extracted as a sample, giving 12 samples per video. So for each subject, who watched 40 music videos, we acquire 40 × 12 = 480 labeled samples.
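The sample count can be checked with a short sketch. It assumes non-overlapping 5 s windows over the 60 s that follow the 3 s baseline; the text does not say how the baseline is handled, so discarding it is an assumption. The function name `segment_trial` and the channels-by-time array layout are illustrative and are not part of DEAP.

```python
import numpy as np

FS = 128          # sampling rate of the preprocessed data (Hz)
BASELINE_S = 3    # pre-trial baseline (s); assumed here to be discarded
WINDOW_S = 5      # length of one labeled sample (s)


def segment_trial(trial, fs=FS, baseline_s=BASELINE_S, window_s=WINDOW_S):
    """Split one trial (channels x time) into non-overlapping windows.

    Returns an array of shape (n_windows, channels, window_s * fs).
    """
    trial = np.asarray(trial)
    stimulus = trial[:, baseline_s * fs:]      # drop the baseline segment
    win = window_s * fs
    n_windows = stimulus.shape[1] // win       # an incomplete tail is ignored
    stimulus = stimulus[:, :n_windows * win]
    # (channels, n_windows, win) -> (n_windows, channels, win)
    return stimulus.reshape(trial.shape[0], n_windows, win).swapaxes(0, 1)


# One synthetic trial: 32 channels, 63 s at 128 Hz.
trial = np.random.default_rng(0).standard_normal((32, 63 * FS))
windows = segment_trial(trial)
samples_per_subject = 40 * windows.shape[0]
```

Each window holds 5 × 128 = 640 points per channel, and 12 windows per trial over 40 trials give the 480 samples per subject.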
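EMD decomposes EEG signals into a set of IMFs by an automatic sifting process. Each IMF represents a different frequency component of the original signal and should satisfy two conditions: (1) over the whole data set, the number of extrema and the number of zero crossings must be equal or differ by at most one; (2) at every point, the mean value of the upper envelope and the lower envelope must be zero. For an input signal $x(t)$, the process of EMD is as follows:
(1) Set $r(t) = x(t)$ and $g(t) = r(t)$.
(2) Get the local maxima and minima of $g(t)$.
(3) Interpolate the local maxima and minima with cubic spline functions and get the upper envelope $e_{\max}(t)$ and the lower envelope $e_{\min}(t)$.
(4) Calculate the mean value of the upper and lower envelopes as $m(t) = \left(e_{\max}(t) + e_{\min}(t)\right)/2$.
(5) Subtract $m(t)$ from $g(t)$: $h(t) = g(t) - m(t)$.
(6) If $h(t)$ satisfies the two conditions of an IMF, then $\mathrm{IMF}_i(t) = h(t)$ and $r(t)$ is replaced by $r(t) - \mathrm{IMF}_i(t)$. If not, set $g(t) = h(t)$ and go to step (2).
(7) If $r(t)$ still has at least two extrema, set $g(t) = r(t)$ and go to step (2) to extract the next IMF; otherwise the decomposition ends.
By the iterative process described above, $x(t)$ can finally be expressed as $x(t) = \sum_{i=1}^{N} \mathrm{IMF}_i(t) + r_N(t)$, where $N$ is the number of IMFs and $r_N(t)$ is the final residue.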
EEG signals and the corresponding first five IMFs.
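The sifting procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the envelopes are pinned to the first and last sample as a simple boundary treatment, sifting stops when the envelope mean carries little energy relative to the candidate IMF (parameter `tol`) or after `max_sift` passes, and the names `local_extrema`, `envelope_mean`, and `emd` are hypothetical. It relies on `scipy.interpolate.CubicSpline` for the cubic spline envelopes.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def local_extrema(x):
    """Indices of strict local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima


def envelope_mean(x):
    """Mean of the cubic-spline upper and lower envelopes.

    Returns None when there are too few extrema to build the envelopes.
    """
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(x))
    # Assumption: pin both envelopes to the end points of the signal.
    up_idx = np.concatenate(([0], maxima, [len(x) - 1]))
    lo_idx = np.concatenate(([0], minima, [len(x) - 1]))
    upper = CubicSpline(up_idx, x[up_idx])(t)
    lower = CubicSpline(lo_idx, x[lo_idx])(t)
    return (upper + lower) / 2.0


def emd(x, max_imfs=5, max_sift=50, tol=1e-3):
    """Return (imfs, residue); imfs has shape (n_imfs, len(x))."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if envelope_mean(residue) is None:   # too few extrema: stop
            break
        g = residue.copy()
        for _ in range(max_sift):
            m = envelope_mean(g)
            if m is None:
                break
            g = g - m                        # remove the local mean
            # Assumed stopping rule: the removed mean is negligible.
            if np.sum(m ** 2) <= tol * np.sum(g ** 2):
                break
        imfs.append(g)
        residue = residue - g
    return np.array(imfs).reshape(-1, len(residue)), residue


# Two-tone test signal, 5 s at 128 Hz.
t = np.arange(640) / 128.0
x = np.sin(2 * np.pi * 20 * t) + 0.5 * np.sin(2 * np.pi * 3 * t)
imfs, residue = emd(x)
```

By construction the IMFs and the residue add back up to the input, which is a convenient sanity check for any EMD implementation.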
In this paper, three features of IMF are utilized for emotion recognition, the first difference of time series, the first difference of phase, and the normalized energy. The first difference of time series depicts the intensity of signal change in time domain. The first difference of phase reveals the change intensity of phase, representing the physical meaning of instantaneous frequency. Normalized energy describes the weight of current oscillation component. The motivation of using these three features is that they depict the characteristics of IMF in time, frequency, and energy domain, utilizing multidimensional information.
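A minimal sketch of the three features for one IMF, under assumed definitions: the first difference is taken as the mean absolute difference between consecutive samples, the phase comes from the analytic signal (Hilbert transform) of the IMF, and the normalized energy is the IMF energy divided by the energy of the original window. These definitions and the names `analytic_signal`, `first_difference`, and `imf_features` are illustrative assumptions; the paper's exact formulas may differ, for example in normalization.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal of a real 1-D array, built with the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC term
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist term
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)  # negative frequencies are zeroed


def first_difference(x):
    """Mean absolute difference between consecutive samples (assumed form)."""
    return np.mean(np.abs(np.diff(x)))


def imf_features(imf, signal):
    """Feature vector [d_time, d_phase, normalized_energy] for one IMF."""
    phase = np.unwrap(np.angle(analytic_signal(imf)))
    d_time = first_difference(imf)
    d_phase = first_difference(phase)
    energy = np.sum(np.square(imf)) / np.sum(np.square(signal))
    return np.array([d_time, d_phase, energy])


# Example: an 8 Hz oscillation that carries a quarter of the window energy.
fs = 128
t = np.arange(5 * fs) / fs
imf = np.sin(2 * np.pi * 8 * t)
features = imf_features(imf, 2 * imf)
```

For a pure oscillation the phase advances by 2π·f/fs per sample, so the phase feature behaves like a mean instantaneous frequency, which is the physical meaning the text attributes to it.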
The first difference of time series
Based on EMD, EEG is decomposed into multilevel IMFs, each IMF being band-limited and representing an oscillation component of original EEG signals. For an
The analytic signal can be further expressed as follows:
First difference of phase
For an
The extracted features are fed into SVM for classification. SVM is widely used for emotion recognition [
In the following subsections, we test our method on the DEAP emotional dataset. Training and classification were conducted for each subject independently, and we used leave-one-trial-out validation to evaluate the classification performance. Each subject watched 40 music video clips, and each clip lasted 1 minute. In our experiment, we used the participants’ self-assessments as labels. Every 5 s of EEG signal is extracted as a sample, so for each subject we acquire 480 labeled samples.
In leave-one-trial-out validation, for each subject, the 468 samples extracted from 39 trials were assigned to the training set, and the 12 samples extracted from the remaining trial were assigned to the test set, so no trial contributed samples to both the training set and the test set. Among the 40 trials of one subject, each trial is assigned to the test set once as the validation data. The 40 results from the 40 test trials are then averaged to produce a general estimate for each subject. The final mean accuracy is computed over all subjects.
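The split described above can be written as a small generator. This is a sketch that assumes the 480 samples of a subject are stored trial by trial; `trial_ids` and `leave_one_trial_out` are illustrative names.

```python
import numpy as np

N_TRIALS = 40
SAMPLES_PER_TRIAL = 12

# trial_ids[k] is the trial (0..39) that sample k was cut from.
trial_ids = np.repeat(np.arange(N_TRIALS), SAMPLES_PER_TRIAL)


def leave_one_trial_out(trial_ids):
    """Yield (train_idx, test_idx) with one whole trial held out per fold."""
    for trial in np.unique(trial_ids):
        test_mask = trial_ids == trial
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


folds = list(leave_one_trial_out(trial_ids))
train_idx, test_idx = folds[0]
```

Holding out whole trials, rather than random samples, keeps windows from the same video out of both sets at once. scikit-learn's `LeaveOneGroupOut` provides the same kind of split when the trial index is passed as the group.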
To evaluate the effectiveness of the three features for emotion recognition, we first use a single feature for classification at a time. All experiments in this subsection use the first five IMF components and all 32 electrodes for feature extraction. Training and classification were conducted separately for each subject, and the mean accuracy was computed over all subjects.
The mean classification accuracies of three features are given in Figure
Classification accuracies of three single features. For each subject, one single feature was extracted from the first five IMF components. “
In this subsection, we conducted two experiments to investigate the role of different IMF components in emotion recognition. In the first experiment, only one IMF component at a time was used for feature extraction, and we analyzed which IMF is effective for emotion recognition. In the second experiment, we further verified whether combining multiple IMFs would improve the accuracy.
Table
It shows that IMF1 yields the best performance, 70.41% for valence and 72.10% for arousal. As the level increases, the performance decreases sharply. The performance of IMF5 is only 55.74% for valence and 62.38% for arousal. We applied
Comparison of performance for different IMFs selected for feature extraction (32 channels) (standard deviation shown in parentheses).
Component | Valence accuracy (%) | Arousal accuracy (%)
---|---|---
IMF1 | 70.41 | 72.10
IMF2 | 63.47 (7.10) | 66.58 (9.36)
IMF3 | 61.45 (8.57) | 64.56 (10.52)
IMF4 | 59.55 (8.56) | 63.99 (10.96)
IMF5 | 55.74 (9.20) | 62.38 (12.23)
IMF1–2 | 69.02 (7.00) | 70.47 (8.29)
IMF1–3 | 68.47 (6.69) | 70.08 (8.10)
IMF1–4 | 67.99 (6.58) | 69.60 (8.08)
IMF1–5 | 67.59 (6.58) | 69.00 (8.37)
Performance of 8 channels selected for feature extraction (Fp1, Fp2, F7, F8, T7, T8, P7, and P8) (standard deviation shown in parentheses).
Predicted | Valence label: High | Valence label: Low | Arousal label: High | Arousal label: Low
---|---|---|---|---
High | 6664 | 2723 | 7493 | 2748
Low | 2024 | 3949 | 1555 | 3564
F1-score | 0.7374 | | 0.7769 |
Accuracy (%) | 69.10 (6.95) | | 71.99 (7.77) |
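The summary values in this table can be recomputed from the counts. The sketch below treats "High" as the positive class; under that reading, the values 0.7374 and 0.7769 match the F1-score, and the pooled accuracy matches the reported mean accuracy. The pooled and per-subject averages agree here only because every subject contributes the same 480 samples (32 × 480 = 15360). The function name `f1_and_accuracy` is illustrative.

```python
def f1_and_accuracy(tp, fp, fn, tn):
    """F1-score of the positive class and accuracy from a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return f1, accuracy


# Rows of the table are predictions, columns are labels.
valence_f1, valence_acc = f1_and_accuracy(tp=6664, fp=2723, fn=2024, tn=3949)
arousal_f1, arousal_acc = f1_and_accuracy(tp=7493, fp=2748, fn=1555, tn=3564)

print(round(valence_f1, 4), round(100 * valence_acc, 2))  # 0.7374 69.1
print(round(arousal_f1, 4), round(100 * arousal_acc, 2))  # 0.7769 71.99
```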
IMF1 represents the fastest changing component of EEG signals, with the highest frequency content. As the level increases, the oscillation becomes smoother and its frequency becomes lower. We therefore infer that the valence and arousal of emotion relate more closely to high frequencies. This also coincides with the finding in [
So combining the results of classification accuracy and
From the verification in Section
Fisher distance is an efficient criterion of separability between two classes, which is broadly used in pattern recognition. It computes the ratio of the between-class scatter to the within-class scatter of the two classes. A larger ratio means that the two classes are more separable. In our experiment, we used the Fisher distance to mark important electrodes under the condition that IMF1 is used for feature extraction. For each channel, the Fisher distance is calculated from the features extracted from one subject’s 480 labeled emotion samples.
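For a single scalar feature, one common form of this ratio is the squared difference of the class means divided by the sum of the class variances. The sketch below uses that form as an assumption, since the exact formula is not given here; `fisher_distance` is an illustrative name.

```python
import numpy as np


def fisher_distance(feature, labels):
    """Between-class over within-class scatter for one scalar feature.

    feature: values of one feature on one channel, one entry per sample.
    labels:  binary labels (truthy = high class, falsy = low class).
    """
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels).astype(bool)
    high, low = feature[labels], feature[~labels]
    between = (high.mean() - low.mean()) ** 2   # between-class scatter
    within = high.var() + low.var()             # within-class scatter
    return between / within


labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
separable = np.array([0.0, 1.0, 0.0, 1.0, 4.0, 5.0, 4.0, 5.0])
overlapping = np.array([0.0, 1.0, 4.0, 5.0, 0.0, 1.0, 4.0, 5.0])

d_separable = fisher_distance(separable, labels)      # class means 0.5 and 4.5
d_overlapping = fisher_distance(overlapping, labels)  # identical class means
```

Computing this value per channel and ranking the channels is one way to mark informative electrodes, in the spirit of the analysis described above.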
Figure
Fisher distance of different channels with subject 1. Features are extracted from component IMF1. For each channel, Fisher distance is calculated among features extracted from 480 labeled emotion samples of subject 1. (a) Fisher distance under feature
Based on the analysis of all the subjects, we selected the following 8 electrodes Fp1, Fp2, F7, F8, T7, T8, P7, and P8 for channel reduction verification. Table
So in practical use, we only need to extract features from IMF1 with 8 channels. Our offline experiment used every 5 s of EEG signal as a labeled emotion sample. This suggests that our method may provide a new solution for real-time emotion recognition in BCI systems.
In this subsection, we compared our proposed method with some classical methods, including fractal dimension (FD), sample entropy, differential entropy, and the time-frequency analysis method DWT. We used box counting to calculate the fractal dimension. The parameter for sample entropy
From Figure
The mean accuracy of different kinds of methods (Fp1, Fp2, F7, F8, T7, T8, P7, and P8) (standard deviation shown in parentheses; statistical analysis shown in column
Method | Valence accuracy (%) | Arousal accuracy (%)
---|---|---
Fractal dimension | 53.08 (19.14) | 59.61 (20.28)
Sample entropy | 57.44 (11.66) | 62.96 (13.82)
DWT + differential entropy (Beta) | 60.87 (11.74) | 64.66 (11.59)
DWT + differential entropy (Gamma) | 67.36 (6.61) | 68.55 (9.28)
Our method | 69.10 (6.95) | 71.99 (7.77)
Classification accuracies of different methods. “FD,” “SampEn,” and “DE” in the figure correspond to fractal dimension, sample entropy, and differential entropy, respectively. The mean accuracy was computed over all subjects. Error bars show the standard deviation of the mean accuracies across all subjects.
The EMD strategy outperforms the time domain methods, including fractal dimension and sample entropy. This is because, compared to methods in the time domain, EMD has the advantage of utilizing more oscillation information. Compared to the time-frequency method DWT, EMD decomposes EEG signals automatically, without the need to select a transform window first, and its classification accuracy is also higher. The experimental results therefore suggest that our EMD-based method is suitable for emotion recognition from EEG signals.
Emotion recognition from EEG signals has achieved significant progress in recent years. Previous methods usually work in the time domain, the frequency domain, or the time-frequency domain. In this paper, we propose a feature extraction method for emotion recognition in the EMD domain, which offers a new perspective. Using EMD, EEG signals can be decomposed automatically into different oscillation components named IMFs. The characteristics of the IMFs are used as features for emotion recognition, including the first difference of the time series, the first difference of the phase, and the normalized energy.
Compared to methods in the time domain, EMD has the advantage of utilizing more frequency information. The experimental results show that the proposed method outperforms time domain methods, such as fractal dimension in [
We investigate the role of each IMF in emotion classification. Features extracted from IMF1 yield the highest accuracy. IMF1 corresponds to the fastest changing component of EEG signals, so our study supports the deduction that emotion is more closely related to high-frequency components. This is consistent with findings in [
Finally, we selected 8 informative channels based on the EMD strategy, namely, Fp1, Fp2, F7, F8, T7, T8, P7, and P8. Our proposed method only needs to extract features from IMF1 with 8 channels, which saves time and reduces the computational burden. Also, in our experiment every 5 s of EEG signal is extracted as a sample, so the method may provide a new solution for real-time emotion recognition in BCI systems.
A limitation of this work is that the method has so far been tested only on the DEAP dataset; in the future we will evaluate it on more emotional datasets to verify it comprehensively. We will also explore further strategies, such as feature smoothing and deep networks, to improve the classification accuracy.
In this paper, an emotion recognition method based on EMD using three statistics is proposed. An extensive analysis has been carried out to investigate the effectiveness of the features for emotion classification. The results show that the three features are suitable for emotion recognition. The effect of each IMF component is then examined. The results reveal that, among the multilevel IMFs, the first component IMF1 plays the most important role in emotion recognition. The informative channels under the EMD strategy are also investigated, and 8 channels, namely, Fp1, Fp2, F7, F8, T7, T8, P7, and P8, are selected for feature extraction. Finally, the proposed method is compared with some classical methods, and our method yields the highest accuracy.
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This work was supported by the grant from the National Natural Science Foundation of China (Grant no. 61701089).