Recognition of Emotions Using Multichannel EEG Data and DBN-GC-Based Ensemble Deep Learning Framework

Fusing multichannel neurophysiological signals to recognize human emotional states has become increasingly attractive. Conventional methods ignore the complementarity between the time domain, frequency domain, and time-frequency characteristics of electroencephalogram (EEG) signals and cannot fully capture the correlation information between different channels. In this paper, an integrated deep learning framework based on improved deep belief networks with glia chains (DBN-GCs) is proposed. In the framework, the member DBN-GCs are employed to extract intermediate representations of raw EEG features from multiple domains separately and to mine interchannel correlation information through glia chains. Then, the higher-level features describing time domain, frequency domain, and time-frequency characteristics are fused by a discriminative restricted Boltzmann machine (RBM) to perform the emotion recognition task. Experiments conducted on the DEAP benchmark dataset achieve average accuracies of 75.92% and 76.83% for arousal and valence classification, respectively. The results show that the proposed framework outperforms most of the deep classifiers it is compared with, which demonstrates its potential.


Introduction
Emotion plays an important role in the daily life of human beings. In particular, people communicate more easily through emotional expression, and different emotional states can affect people's learning, memory, and decision-making. Therefore, recognition of different emotional states has wide application prospects in the fields of distance education, medicine, intelligent systems, and human-computer interaction. Emotion recognition has recently been highly valued by researchers and has become one of the most important issues [1].
Emotion recognition can be performed using external features such as facial expressions and voice intonation [2][3][4][5][6]. It can also be performed according to changes in physiological signals such as the electroencephalogram (EEG). Compared with physiological signals, facial and vocal expressions are easily affected by the external environment, and their parameters vary easily across situations. In contrast, emotion recognition results obtained from EEG signals are relatively objective because physiological signals are hard to camouflage. Therefore, studies of the associations between EEG activity and emotions have received much attention [7][8][9].
Emotion recognition is essentially a pattern recognition task, and one of the key steps is extracting emotion-related features from the multichannel EEG signals. Various EEG features in the time domain, frequency domain, and time-frequency domain have been proposed in the past. Time-domain features of EEG can identify characteristics of the time series that vary between different emotional states. Statistical parameters of EEG series, such as mean, standard deviation, and power, have commonly been employed [10][11][12]. Frantzidis et al. used the amplitude and latency of event-related potentials (ERPs) as features for emotion recognition [13]. In addition, Hjorth features [14,15], the nonstationary index [16], and higher order crossing features [17,18] have also been utilized. Power features from different frequency bands of EEG signals are the most popular in frequency domain techniques. The EEG power spectral density (PSD) in the alpha (8-13 Hz) band was reported to be significantly correlated with the level of valence [19]. The EEG PSDs in the delta (1-4 Hz) and theta (4-7 Hz) bands extracted from three central channels also contain salient information related to both arousal and valence levels [13]. Based on time-frequency analysis, the Hilbert-Huang spectrum (HHS) [20] and the discrete wavelet transform method [21,22] were proposed for emotion classification tasks. The above research shows that the time domain, frequency domain, and time-frequency characteristics of EEG signals can each provide salient information related to emotional states.
Usually, machine learning methods are used to establish emotion recognition models. Samara et al. fused statistical measurements, band power from the β, δ, and θ waves, and high-order crossing of the EEG signal by employing a support vector machine (SVM) as the classifier [23]. Jadhav et al. proposed a novel technique for EEG-based emotion recognition using gray-level co-occurrence matrix- (GLCM-) based features and a k-nearest neighbors (KNN) classifier [24]. Thammasan et al. applied three commonly used algorithms to classify emotional classes: a support vector machine based on the Pearson VII function (PUK) kernel, a multilayer perceptron (MLP) with one hidden layer, and C4.5 [25]. Recently, various deep learning (DL) approaches have been investigated for EEG-based emotion classification. Standard deep belief networks (DBNs) were employed by Wang and Shang to extract features from raw physiological data to recognize the levels of arousal, valence, and liking [26]. In reference [27], two types of deep learning approaches, the stacked denoising autoencoder and deep belief networks, were applied as feature extractors for the affective states classification problem using EEG signals. Li et al. designed a hybrid deep learning model that combines a convolutional neural network (CNN) and a recurrent neural network (RNN) to extract EEG features [28].
Compared with traditional machine learning methods, DL has achieved promising results. However, two challenges remain in emotion recognition based on multichannel EEG signals. Firstly, since the time domain, frequency domain, and time-frequency characteristics of EEG signals all contain salient information related to emotional states, the complementarity between the different types of features derived from these domains naturally needs to be considered. Thus, feature extraction and feature fusion of multichannel EEG signals in the time domain, frequency domain, and time-frequency domain need to be investigated to achieve better performance. Generally, a simple deep model such as a DBN or CNN can abstract the intermediate representations of multichannel EEG features and achieve feature fusion at the feature level [29]. Nevertheless, given the high dimensionality and limited training samples of the physiological data, too many nodes in each layer of the deep network will lead to model overfitting. Secondly, capturing the correlation information between different channels of the EEG signal and extracting deep correlation features, which has been ignored by researchers, needs to be taken into consideration when performing feature fusion using the deep model.
To address the two issues mentioned above, an integrated deep learning framework composed of DBN-GCs is proposed in this paper. As a special type of nerve cell in the human brain, glia cells can transmit signals to neurons and to other glia cells. Therefore, researchers have paid attention to the characteristics of glia cells and applied them to artificial neural networks [30,31]. In the framework, raw multidomain features are obtained from multichannel EEG signals. Then, the intermediate representations of the raw multidomain features are separately extracted by member DBN-GCs, in which glia chains mine interchannel correlation and help to optimize the learning process. Finally, a discriminative RBM is used to obtain the emotion predictions. In the experiment, the effectiveness of our method is validated on the multichannel EEG data in the DEAP dataset, which is widely used for emotion recognition. The rest of this paper is organized as follows. A detailed description of the proposed deep learning framework based on DBN-GC is presented in Section 2. The experimental results and discussions are reported in Section 3. Section 4 briefly concludes the work.

Database.
In this research, the DEAP dataset is used for emotion analysis. DEAP is an open dataset developed by a research team at Queen Mary University of London [32]. It mainly recorded the multimodal physiological signals produced by 32 volunteers under the stimulus of selected videos. The multimodal physiological signals include the EEG and peripheral physiological signals. Each volunteer watched 40 one-minute-long videos. While each video was presented, the EEG and peripheral physiological signals of the volunteer were recorded synchronously. It should be noted that the EEG was recorded from 32 sites (Fp1, AF3, F3, F7, FC5, FC1, C3, T7, CP5, CP1, P3, P7, PO3, O1, Oz, Pz, Fp2, AF4, Fz, F4, F8, FC6, FC2, Cz, C4, T8, CP6, CP2, P4, P8, PO4, and O2). Finally, subjective ratings of arousal, valence, liking, and dominance on a scale of 1-9 were provided. In this study, we focus only on the arousal and valence scales. Thus, a two-dimensional emotional model (illustrated in Figure 1) can be built, where the two dimensions are arousal and valence, respectively. We divide and label the trials into two classes for valence and arousal, respectively (pleasant: >5, unpleasant: ≤5; aroused: >5, relaxed: ≤5).
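The labeling rule above can be illustrated with a minimal sketch. The function name `binarize_ratings` and the example ratings are hypothetical and serve only to show the thresholding at 5; they are not taken from the dataset.

```python
def binarize_ratings(valence, arousal, threshold=5.0):
    """Map 1-9 self-ratings to binary labels: 1 if rating > threshold, else 0."""
    valence_label = 1 if valence > threshold else 0  # 1 = pleasant, 0 = unpleasant
    arousal_label = 1 if arousal > threshold else 0  # 1 = aroused, 0 = relaxed
    return valence_label, arousal_label


# Hypothetical (valence, arousal) ratings for three trials.
ratings = [(7.2, 3.0), (5.0, 5.1), (1.0, 9.0)]
labels = [binarize_ratings(v, a) for v, a in ratings]
```

Note that a rating of exactly 5 falls into the lower class (unpleasant or relaxed), as stated in the text.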

Data Preprocessing and Feature Extraction.
In this study, only EEG signals are employed for emotion recognition. The EEG signals, recorded at a sampling frequency of 512 Hz, are downsampled to 128 Hz. Then, the signals are filtered by a band-pass filter with cutoff frequencies of 4.0 and 45.0 Hz.
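A rough sketch of this preprocessing step is given below. The text does not specify the filter design, so the sketch assumes an idealized FFT-based brick-wall band-pass implemented with NumPy, and it applies the band limit before decimation so that the filter also serves as the anti-aliasing step. The function name `bandpass_and_downsample` and the synthetic test signal are illustrative assumptions.

```python
import numpy as np


def bandpass_and_downsample(x, fs_in=512, fs_out=128, low=4.0, high=45.0):
    """Band-limit a 1-D signal to [low, high] Hz, then keep every (fs_in // fs_out)-th sample."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs_in)
    # Brick-wall mask: an assumption, the paper does not state the filter type.
    spectrum[(freqs < low) | (freqs > high)] = 0.0
    filtered = np.fft.irfft(spectrum, n=len(x))
    step = fs_in // fs_out  # 512 Hz -> 128 Hz means keeping every 4th sample
    return filtered[::step]


# Synthetic one-second signal: 2 Hz drift + 10 Hz tone + 60 Hz interference.
t = np.arange(512) / 512.0
raw = np.sin(2 * np.pi * 2 * t) + np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 60 * t)
clean = bandpass_and_downsample(raw)
```

In this toy case only the 10 Hz component lies inside the 4.0-45.0 Hz pass band, so it is the only one that survives. A practical pipeline would more likely use a causal or zero-phase IIR/FIR filter rather than a brick-wall mask.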
In order to make full use of the salient information regarding emotional states in EEG signals, four types of raw features, which characterize the information in the time domain, frequency domain, and time-frequency domain, respectively, are extracted from the EEG signals [32][33][34][35][36]; the detailed description is shown in Table 1. The 14 EEG channel pairs used for computing power differences include Fp2-…, and O2-O1. Thus, the dimension of the feature vector for one instance is 664, and the label for each instance is 2-dimensional. For one volunteer, the size of the corresponding data matrix is 40 × 664 (videos/instances × features). For all 32 volunteers, 40 × 32 = 1280 instances are available. The data matrix of each volunteer was standardized to remove the difference in feature scales.
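As an illustration of the per-volunteer standardization, the sketch below assumes column-wise z-scoring (zero mean, unit variance per feature); the text only says the matrix was "standardized", so this choice, the helper name `standardize_subject`, and the random stand-in data are assumptions.

```python
import numpy as np


def standardize_subject(X):
    """Z-score each feature column of one volunteer's (instances x features) matrix."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0  # guard against constant features
    return (X - mean) / std


rng = np.random.default_rng(0)
# Random stand-in for one volunteer's 40 x 664 feature matrix (not real EEG features).
X_subject = rng.normal(loc=3.0, scale=10.0, size=(40, 664))
Z = standardize_subject(X_subject)
```

Standardizing each volunteer separately matches the participant-specific setting, since feature scales can differ considerably between volunteers.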

Improved DBN with Glia Chains.
For emotion recognition tasks, deep learning methods hypothesize that a hierarchy of intermediate representations of the raw EEG features is necessary to characterize the underlying salient information related to different emotional states. A deep belief network, which is composed of stacked restricted Boltzmann machines, has a strong ability to learn high-level representations, benefiting from a deep structure-based learning mechanism with multiple hidden layers.
As shown in Figure 2, the output of the first RBM is used as the input of the second RBM. Similarly, the third RBM is trained on the output of the second RBM. In this way, a deep hierarchical model can be constructed that learns from low-level features to obtain a high-level representation.
Because there are no interconnections among the neural units in the same layer of a DBN, it is hard to exploit the mutual information of different neural units in the same layer. This means that a DBN is poorly suited to mining interchannel and interfrequency correlation information from multichannel EEG signals in emotion recognition tasks. Considering this, an improved DBN with glia chains is introduced in this paper. The structure of the DBN-GC is shown in Figure 3. In addition to the two layers of units in each RBM, there is a group of glia cells, represented by stars and linked into a chain structure. Each glia cell is also connected to a unit in the hidden layer of the RBM, as shown in Figure 4. There is no weight between a glia cell and its corresponding hidden unit. The effect of all glia cells in the training process is applied directly to the hidden units, and the outputs of the hidden layer nodes are adjusted accordingly. Through the connections between glia cells, each glia cell can also transmit an activation signal to other glia cells and adjust their glia effect.

Computational Intelligence and Neuroscience
For example, if the output of a hidden unit $h_1$ is higher than the prespecified threshold, the corresponding glia cell $g_1$ will be activated, and a signal is then transmitted to the glia cell $g_2$. When the signal is passed to $g_2$, the glia cell $g_2$ will be activated regardless of whether the output of the hidden unit $h_2$ has reached the prespecified threshold. Then, glia cell $g_2$ will produce a second signal to spread. Meanwhile, the signal generated by $g_1$ will continue to spread. In order to simplify the calculation, all signals produced by glia cells are propagated along one specific direction of the glia chain. That is, the signals are transmitted from the first glia cell on the chain to the last.
For an RBM with a glia chain, the output rule of the hidden units is updated as follows:

$$h_j = \sigma\left(h_j^{*} + \alpha g_j\right), \qquad (1)$$

where $h_j^{*}$ is the output value of the hidden node $j$ before the output rule is updated, $g_j$ is the glia effect value of the corresponding glia cell, $\alpha$ is the weight coefficient of the glia effect value, and $\sigma(\cdot)$ is the sigmoid function. The weight coefficient $\alpha$ is set manually and controls the influence of the glia effect on the hidden units. $h_j^{*}$ can be calculated as

$$h_j^{*} = \sum_i W_{ij} v_i + c_j, \qquad (2)$$

where $W_{ij}$ is the connection weight between the visible unit $i$ and the hidden unit $j$, $v_i$ is the state value of the visible unit $i$, and $c_j$ is the bias value of the hidden unit $j$. Instead of random sampling, the activation probability is employed as the output of each hidden unit, which can reduce sampling noise and speed up learning. The glia effect value of glia cell $g_j$ is defined as

$$g_j(t) = \begin{cases} 1, & \left(h_j^{*} > \theta \,\cup\, g_{j-1}(t-1) = 1\right) \cap \left(t_j < T\right), \\ \beta\, g_j(t-1), & \text{otherwise}, \end{cases} \qquad (3)$$

where $\theta$ is the prespecified threshold, $T$ is an unresponsive time threshold after activation, and $\beta$ represents the attenuation factor. At every time step, the signal produced by an activated glia cell is passed to the next glia cell. The activation of a glia cell depends on whether the output of the corresponding hidden unit reaches the prespecified threshold $\theta$ or whether the previous glia cell conveys a signal to it. Meanwhile, the difference $t_j$ between its last activation time and the current time must be less than $T$. If the glia cell is activated, it will transmit a signal to the next glia cell; otherwise, it will not produce signals, and its glia effect will gradually decay.
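The output rule described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' implementation: the timing condition involving $t_j$ and $T$ is deliberately omitted, signals travel one chain position per call, and the names `glia_step` and `hidden_output` as well as the toy weights are assumptions. The default values of `alpha`, `theta`, and `beta` are merely illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def glia_step(h_star, g_prev, theta, beta):
    """One chain step (timing condition on t_j and T omitted for brevity).

    Cell j is activated if its hidden unit exceeds theta or if cell j-1 was
    active at the previous step; otherwise its effect decays by beta.
    """
    g_new = np.empty_like(g_prev)
    for j in range(len(g_prev)):
        unit_trigger = h_star[j] > theta
        chain_trigger = j > 0 and g_prev[j - 1] == 1.0
        g_new[j] = 1.0 if (unit_trigger or chain_trigger) else beta * g_prev[j]
    return g_new


def hidden_output(v, W, c, g_prev, alpha=0.8, theta=0.7, beta=0.5):
    """Hidden output with glia effect: h = sigmoid(h* + alpha * g)."""
    h_star = v @ W + c                           # output value without glia effect
    g = glia_step(h_star, g_prev, theta, beta)   # update the glia effect vector
    h = sigmoid(h_star + alpha * g)              # output value with glia effect
    return h, g


# Toy RBM with 2 visible and 3 hidden units (hypothetical weights).
W = np.array([[2.0, -2.0, -2.0], [0.0, 0.0, 0.0]])
c = np.zeros(3)
v = np.array([1.0, 0.0])
h1, g1 = hidden_output(v, W, c, np.zeros(3))
h2, g2 = hidden_output(v, W, c, g1)
```

In this toy run only the first hidden unit exceeds the threshold, so the first glia cell fires at the first step and its signal reaches the second cell at the next step, which raises the output of the second hidden unit even though that unit alone would not have triggered its glia cell.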
After integrating the glia cell mechanism, the learning algorithm of the RBM is improved; its pseudocode is listed in Algorithm 1.
The training process of a DBN-GC, which is similar to that of a DBN, consists of two steps: pretraining and fine-tuning. The glia cell mechanism acts only on the pretraining process. In the pretraining phase, a greedy layer-wise unsupervised method is adopted to train each RBM, and the hidden layer's output of the previous RBM is used as the visible layer's input of the next RBM. In the fine-tuning phase, back propagation is performed to fine-tune the parameters of the DBN-GC.
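The greedy layer-wise pretraining phase can be sketched as follows. To keep the example short, it trains plain RBMs with one step of contrastive divergence (CD-1) on full batches and leaves out both the glia effect and the fine-tuning phase; the learning rate, number of epochs, layer sizes, and the names `train_rbm` and `pretrain_stack` are assumptions, and the inputs are assumed to be scaled to [0, 1].

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm(data, n_hidden, epochs=5, lr=0.1, seed=0):
    """Train one RBM with full-batch CD-1 (glia effect omitted in this sketch)."""
    rng = np.random.default_rng(seed)
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b = np.zeros(n_visible)  # visible biases
    c = np.zeros(n_hidden)   # hidden biases
    for _ in range(epochs):
        h0 = sigmoid(data @ W + c)                        # positive phase
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h0_sample @ W.T + b)                 # reconstruction
        h1 = sigmoid(v1 @ W + c)                          # negative phase
        W += lr * (data.T @ h0 - v1.T @ h1) / n_samples
        b += lr * (data - v1).mean(axis=0)
        c += lr * (h0 - h1).mean(axis=0)
    return W, b, c


def pretrain_stack(data, layer_sizes):
    """Train RBMs one after another; each hidden output feeds the next RBM."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(x, n_hidden)
        layers.append((W, b, c))
        x = sigmoid(x @ W + c)  # activation probabilities become the next input
    return layers, x


rng = np.random.default_rng(1)
toy_data = rng.random((36, 20))  # random stand-in: 36 training instances, 20 features
layers, top = pretrain_stack(toy_data, layer_sizes=[10, 5])
```

After pretraining, the stacked weights would normally initialize a feed-forward network that is fine-tuned by back propagation with the emotion labels.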

DBN-GC-Based Ensemble Deep Learning Model.
Considering that the raw EEG features in Table 1 may share different hidden properties across different time domain and frequency domain modalities, we propose a DBN-GC-based ensemble deep learning model that applies a DBN-GC-based network to each homogeneous feature subset independently. The feature vectors derived from the different feature subsets are fed into the corresponding DBN-GCs, and the higher-level feature abstractions of each feature subset are obtained as the outputs of the last hidden layer of the corresponding DBN-GC. Then, a discriminative RBM is built upon the combined higher-level feature abstractions. The network architecture is illustrated in Figure 5. The ensemble deep learning model consists of three parts: the input layer, five parallel DBN-GCs, and a discriminative RBM. The overall raw EEG feature set in Table 1 is denoted $F_0$, which is split into five nonoverlapping physiological feature subsets: $F_1$, $F_2$, $F_3$, $F_4$, and $F_5$. The statistical measures from the time domain form subset $F_1$, and the multichannel EEG PSDs form subset $F_2$. Subset $F_3$ is built from the EEG power differences. In view of the heterogeneity of the multichannel HHS features in the time domain and the frequency domain, the HHS features are grouped into two subsets, $F_4$ and $F_5$. The input vector $x(F_0)$ of the ensemble deep learning framework is fed into the input layer. Then, the input vector $x(F_0)$ is split into five subvectors: $x(F_1)$, $x(F_2)$, $x(F_3)$, $x(F_4)$, and $x(F_5)$. The five subvectors are the inputs to the corresponding DBN-GCs. The five DBN-GC-based deep models are built to learn the hidden feature abstractions of each raw EEG feature subset, and the hidden feature abstractions are denoted $s_1(x(F_1))$, $s_2(x(F_2))$, $s_3(x(F_3))$, $s_4(x(F_4))$, and $s_5(x(F_5))$, where $s_i(x(F_i))$ is the output vector of the last hidden layer of the corresponding DBN-GC $i$.
Then, $s_1(x(F_1))$, $s_2(x(F_2))$, $s_3(x(F_3))$, $s_4(x(F_4))$, and $s_5(x(F_5))$ are merged into a single vector, which is fed into the discriminative RBM to recognize emotion states.
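The split-and-merge data flow of the ensemble can be sketched as below. The sizes of the five subsets are not given in the text, so `in_sizes` and `out_sizes` are hypothetical values chosen only so that they sum to 664 inputs and 350 fused features; each member DBN-GC is replaced by a random single-layer stand-in, and the discriminative RBM is not implemented.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def split_subsets(x, sizes):
    """Split the input vector x(F0) into consecutive sub-vectors x(F1)..x(F5)."""
    assert sum(sizes) == x.shape[0]
    return np.split(x, np.cumsum(sizes)[:-1])


def fuse(x, sizes, extractors):
    """Apply one extractor per subset and concatenate the resulting abstractions."""
    parts = split_subsets(x, sizes)
    abstractions = [extract(part) for extract, part in zip(extractors, parts)]
    return np.concatenate(abstractions)


# Hypothetical subset sizes (sum to 664) and abstraction sizes (sum to 350).
in_sizes = [224, 128, 56, 128, 128]
out_sizes = [100, 80, 30, 70, 70]

rng = np.random.default_rng(0)
weights = [0.01 * rng.standard_normal((i, o)) for i, o in zip(in_sizes, out_sizes)]
# Stand-ins for the last hidden layer of each trained member DBN-GC.
extractors = [lambda part, W=W: sigmoid(part @ W) for W in weights]

x_f0 = rng.standard_normal(664)  # random stand-in for one instance's feature vector
fused = fuse(x_f0, in_sizes, extractors)
```

The fused vector is what the discriminative RBM would receive as input; keeping each member network small limits the number of parameters trained on any one feature subset.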
When building the DBN-GC-based ensemble deep learning model, the five DBN-GCs are trained first. An additional two-neuron output layer that corresponds to the binary emotions is added when training each DBN-GC. Then, the discriminative RBM is built upon the combined higher-level feature abstractions derived from the five DBN-GCs. To determine the DBN-GCs' hyperparameters, different combinations of hyperparameters are tested, and the combination with the minimal recognition error is adopted.
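Selecting the hyperparameter combination with the minimal recognition error amounts to an exhaustive search over a grid, which can be sketched as follows. The grid values and the function `toy_error` are purely illustrative stand-ins: in practice `evaluate` would train a DBN-GC with the given settings and return its validation error.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Return the parameter combination with the lowest error, and that error."""
    names = list(grid)
    best_params, best_error = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


def toy_error(p):
    """Toy stand-in for 'train a DBN-GC and return its recognition error'."""
    return (p["alpha"] - 0.8) ** 2 + (p["beta"] - 0.5) ** 2 + (p["theta"] - 0.7) ** 2


# Hypothetical candidate values for glia weight, attenuation factor, and threshold.
grid = {"alpha": [0.2, 0.5, 0.8], "beta": [0.3, 0.5], "theta": [0.5, 0.7, 0.9]}
best, err = grid_search(toy_error, grid)
```

Because every combination requires a full training run, the grid is usually kept coarse when the number of hyperparameters grows.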

Results and Discussion
In view of the limited sample size of the dataset, cross-validation is adopted in the experiments. The ensemble deep learning model is trained and tested via 10-fold cross-validation in a participant-specific manner. For each of the 32 volunteers, the corresponding 40 instances are divided into 10 subsets. Nine subsets (36 instances) are assigned to the training set, and the remaining one (4 instances) is assigned to the test set. This process is repeated 10 times until all subsets have been tested.
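The participant-specific split can be made concrete with a short sketch. It assumes contiguous, unshuffled folds, which the text does not specify; the function name `ten_fold_indices` is likewise an assumption.

```python
def ten_fold_indices(n_instances=40, n_folds=10):
    """Yield (train_idx, test_idx) pairs for contiguous, equally sized folds."""
    fold_size = n_instances // n_folds
    indices = list(range(n_instances))
    for k in range(n_folds):
        test_idx = indices[k * fold_size:(k + 1) * fold_size]
        train_idx = indices[:k * fold_size] + indices[(k + 1) * fold_size:]
        yield train_idx, test_idx


# Folds for one volunteer: 36 training and 4 test instances in each of 10 rounds.
folds = list(ten_fold_indices())
```

The same procedure would be run once per volunteer, and the per-volunteer results averaged over the 10 folds.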

Comparison between DBN and DBN-GC.
In order to study the learning performance of the DBN-GC, we first use three feature subsets ($F_6$, $F_7$, and $F_8$) to train DBNs and DBN-GCs, respectively. The three subsets are given as follows: $F_6 = F_1$, $F_7 = F_2 \cup F_3$, and $F_8 = F_4 \cup F_5$. The three feature subsets represent time domain characteristics, frequency domain characteristics, and time-frequency characteristics, respectively. A DBN and a DBN-GC are trained on the same feature subset, and they have the same hyperparameters, as shown in Table 2. In addition, the parameters of the DBN and the DBN-GC that share the same feature subset, such as the learning rate, are all set to the same values. The six models perform the same emotion recognition task, and accuracy and F1-score are adopted as the metrics of recognition performance. The detailed recognition performance comparisons on the arousal and valence dimensions are illustrated in Figure 6.
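For reference, the two metrics can be computed as in the sketch below. It assumes that the F1-score is computed for the positive class only (label 1); the text does not state whether a per-class or averaged F1 is used, and the example labels are hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1-score of the positive class) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom > 0 else 0.0
    return accuracy, f1


# Hypothetical labels for one 4-instance test fold.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
```

Reporting the F1-score alongside accuracy is useful here because the two classes obtained by thresholding the ratings at 5 need not be balanced.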
Each column represents the statistical results of 32 participants. Figures 6(a) and 6(c) show the classification accuracy and F1-score of the arousal dimension. Figures 6(b) and 6(d) show the classification accuracy and F1-score of the valence dimension. As we can see from Figure 6, no matter which feature subset is used, the DBN-GC model greatly outperforms the corresponding baseline DBN model, with a higher median accuracy and F1-score and a relatively low standard deviation. Among the three DBN-GC models, DBN7, which is built on the feature subset … characteristics of EEG signals can provide salient information regarding emotion states. The results validate that the glia chain can improve the learning performance of the deep structure. Through the glia chains, the hidden layer units in the DBN-GC can transfer information to each other, and the DBN-GC model can obtain the correlation information between units in the same hidden layer. Thus, the improved DBN model can learn more discriminative features. For the EEG-based emotion recognition task, the DBN-GC can mine interchannel correlation and utilize interchannel information, which is often ignored by other emotion recognition studies.

Algorithm 1: Pseudocode for training the RBM with glia chain.
…
… $v^{(1)}$ according to $v^{(1)} \sim p(v^{(1)} \mid h^{(0)})$
for $j = 1 : m$ (for all hidden units, calculate the output value without glia effect)
    $h_j^{(1)*} = \sum_i W_{ij} v_i^{(1)} + c_j$
end for
update the glia effect vector $g$
for $j = 1 : m$ (for all hidden units, calculate the output value with glia effect)
…

Results of the DBN-GC-Based Ensemble Deep Learning Model.
Then, the proposed DBN-GC-based ensemble deep learning model is employed to perform the emotion recognition task. Each parallel DBN-GC in the ensemble deep learning model has 3 hidden layers. The numbers of hidden neurons of each parallel DBN-GC are listed in Table 3.
Through the five parallel DBN-GCs, the samples' feature dimensionality is reduced from 664 to 350. Table 4 … [37]. In addition, Li et al. trained a two-layer DBN to extract high-level features for each channel and then employed an SVM with an RBF kernel as the classifier [38]. Wang and Shang presented a DBN-based system that extracted features from raw physiological data, and 3 classifiers were built to predict emotion states [26]. Because the above studies did not report the F1-score as a metric of recognition performance, the average recognition performance of the proposed model is also compared with that of reference [32]. In this reference, Koelstra et al. analyzed central nervous system (CNS), peripheral nervous system (PNS), and multimedia content analysis features for emotion recognition. Because the DBN-GC-based method proposed in this paper is based on the EEG signal, Table 4 lists only the recognition results of the CNS feature-based single modality in reference [32]. All references in Table 4 use the DEAP dataset and divide the trials into two classes for valence and arousal, respectively (ratings divided as more than 5 and less than 5). As can be seen from Table 4, the recognition accuracy of the DBN-GC-based ensemble deep learning model is higher than that of most of the above deep classifiers. Meanwhile, the F1-scores achieved by the proposed model are clearly superior to the 0.5830 and 0.5630 reported in reference [32]. The proposed method provides a mean recognition accuracy (MRA) of 0.7683 on valence, which is lower than the highest MRA reported on valence (0.8141). … Table 3, which are trained on a single feature subset.
This indicates that the time domain, frequency domain, and time-frequency characteristics of EEG signals are complementary in emotion recognition and that the proposed method can integrate the different types of characteristics effectively.

As can be seen from Figure 7, when the glia effect weight is between 0.05 and 1, both the MRA on arousal and the MRA on valence fluctuate continuously. The highest MRA on arousal (76.12%) is obtained when the glia effect weight is set to 0.75. For the MRA on valence, the higher values appear when the weight is close to 0.15 or 0.80. Taking these two indicators into account simultaneously, it is appropriate to set the weight coefficient to 0.80. Figure 8 shows the results of arousal classification and valence classification with different values of the attenuation factor. When the attenuation factor is between 0.05 and 0.35, both the MRA on arousal and the MRA on valence fluctuate greatly. As the attenuation factor increases to 0.4, both MRAs increase rapidly. Once the attenuation factor exceeds 0.5, the two MRAs decrease slowly. Thus, it is appropriate to set the attenuation factor to 0.40 or 0.50. Figure 9 shows the results of arousal classification and valence classification with different values of the glia threshold. Although the highest MRA on arousal occurs with the glia threshold set to 0.35, the MRA is more stable when the glia threshold is between 0.65 and 1. The MRA on valence rises slowly once the glia threshold exceeds 0.25. It is therefore appropriate to set the glia threshold within the range of 0.70 to 0.80.

Conclusions
In this paper, we presented an ensemble deep learning model that integrates parallel DBN-GCs and a discriminative RBM for emotion recognition. The interchannel correlation information in multichannel EEG signals, which is often neglected, contains salient information regarding emotion states, and the chain structure of glia cells in the DBN-GC is able to mine this interchannel correlation information. In addition, the time domain, frequency domain, and time-frequency characteristics of EEG signals are complementary for emotion recognition, and the ensemble deep learning framework benefits from the comprehensive fusion of multidomain feature abstractions. The reliability of the DBN-GC and of the fusion method based on the ensemble deep learning framework is validated by the experiments on the DEAP database.

Data Availability
The DEAP dataset used in our manuscript is a dataset for emotion analysis using electroencephalogram (EEG), physiological, and video signals. The DEAP dataset is available at http://www.eecs.qmul.ac.uk/mmv/datasets/deap/. Anyone interested in using this dataset will have to print, sign, and scan an EULA (end-user license agreement) and return it via e-mail. Then, a username and password to download the data will be provided. The dataset was first presented in reference [32], "DEAP: a database for emotion analysis using physiological signals."

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.