EEG Feature Extraction and Data Augmentation in Emotion Recognition

Emotion recognition is a challenging problem in Brain-Computer Interaction (BCI). The electroencephalogram (EEG) gives unique information about the brain activity evoked by emotional stimuli, which is one of the most substantial advantages of brain signals over facial expression, tone of voice, or speech in emotion recognition tasks. However, the lack of EEG data and the high dimensionality of EEG recordings make it difficult to build effective classifiers with high accuracy. In this study, data augmentation and feature extraction techniques are proposed to address the lack of data and the high dimensionality of data, respectively. The proposed method is based on a deep generative model, the Conditional Wasserstein GAN (CWGAN), which is applied to the extracted features to generate additional EEG features. The DEAP dataset is used to evaluate the effectiveness of the proposed method. Finally, a standard support vector machine and a deep neural network with different tuning settings were implemented to build effective models. Experimental results show that using the additional augmented data enhances the performance of EEG-based emotion recognition models. Furthermore, the mean classification accuracy after data augmentation increases by 6.5% for valence and 3.0% for arousal.


Introduction
These days, emotion recognition based on EEG signals using machine learning and deep learning has become a widely debated topic in different fields of study. Compared with visual or speech signals, the EEG data generated in response to an emotional stimulus are unique and cannot be hidden by individuals, even when they try not to show their emotions. Additionally, neuroscientists are trying to find patterns of brain activity for different emotional states and to determine whether these patterns are common among different people. Experimental results have shown that there are neural signatures for three emotions: positive, neutral, and negative [1]. Feature engineering as a means of pattern recognition is another contested issue that should be considered carefully when training a model. How to extract meaningful brain activity from apparently meaningless and complex brain electrical signals is therefore a big challenge for BCI [2]. Many methods have been proposed to improve performance in several respects, including preprocessing, feature extraction, feature selection, and classification [3,4].
Many EEG-based emotion recognition methods have been studied in recent years. The main focus of emotion recognition is on feature extraction and classification. Classifiers use features as input to identify the emotional states. There are various methods for feature extraction, such as traditional feature engineering based on signal processing techniques and statistics, or automatic feature engineering, in which features are extracted directly by neural networks. Many studies have used both traditional and automatic feature engineering to propose effective EEG-based emotion classification. Extracted features are given as input to standard machine learning models such as SVM [5][6][7][8][9][10] and KNN [10][11][12]. Lately, deep learning networks have shown significant power in feature extraction and classification tasks, and many researchers have applied different neural networks to EEG data [13][14][15][16] to enhance accuracy. The lack of EEG training datasets, compared with visual and audio datasets, is still one of the primary challenges in EEG-based emotion recognition with deep learning models. There are only a few public datasets for EEG-based emotion recognition: SEED, DEAP, DREAMER, MAHNOB-HCI, and MPED [14]. In addition, these datasets are much smaller than image datasets like ImageNet. A machine learning model would be more accurate if it could access more training data. Generating artificial EEG data is a common way to address the lack of data.
This method is called augmentation. Lately, a variety of techniques have been used to generate more data. For example, applying a geometric modification to the original data is commonly used for image data augmentation. In EEG data augmentation, Gaussian noise is usually added to the data to create new data [13], but recently a new method has been proposed to generate realistic-like EEG data by using deep generative neural networks [17]. A CWGAN network is proposed in [17] for the first time to generate a vector of EEG features. In this work, a technique is then used to check the quality of generated data, and only high-quality data are added to the train set. Finally, SVM and DNN are trained on the original and augmented training data for binary classification. A 2-dimensional Arousal-Valence model is used to identify emotions from complex and nonstationary EEG data. The experimental results have shown that data augmentation improved the accuracy of classifiers. The rest of the paper is organized as follows: Section 2 provides an overview of related work on generative and data augmentation methods for EEG-based emotion recognition. In Section 3, the implementation of the proposed method is discussed in detail. Section 4 describes the DEAP dataset and presents the details of our experimental settings. The experimental results and a comparison of the proposed method with other methods are presented in Section 5. Finally, in Section 6, we present the conclusions of our work.

Related Work
Due to the high costs and challenges of EEG data collection, most public EEG datasets are small and the number of recordings from different participants is limited. This has a great impact on the accuracy of machine learning models for prediction and classification tasks and poses a major challenge in EEG data classification. Developing a method that generates realistic artificial EEG data is a debated approach to solving the lack of data problem in EEG-based emotion recognition tasks. In this paper, the performance of emotion recognition models built with standard machine learning models and deep neural networks is compared before and after data augmentation to check whether data augmentation was effective. The experimental results indicate that the data augmentation method effectively improved the performance of models in some cases.
Data augmentation for EEG-based emotion recognition by adding Gaussian noise to the train set is used in [13,18]. Data augmentation with deep generative models is proposed to generate artificial EEG data for the first time in [17], and the results have shown an improvement in accuracy. A combination of three datasets, DEAP, DREAMER, and a dataset that the authors collected themselves, is used to address the lack of data in emotion recognition tasks in [19]. In the last few years, much research has been conducted by deploying machine learning techniques to analyze EEG data for emotion recognition. An SVM classifier is proposed for the prediction of three emotional states, with EEG time-frequency features used as input data for the classifier [20]. In [21], the authors used KNN as a classifier and the amplitude of the signal as input features to predict eight emotional states.
An LSTM network is developed to recognize emotions from EEG data, with raw EEG signals of the DEAP dataset given to the network as input. Feature extraction is done automatically by the LSTM network and a dense layer is used for classification. The average accuracy of the implemented network for arousal, valence, and liking is 85.65%, 85.45%, and 87.99%, respectively. The proposed method reached a high average accuracy in comparison with the traditional techniques [15]. A multilayer group classification model based on a stacked autoencoder (MESAE) has been proposed to identify emotions. On the DEAP dataset [22], the average accuracy of the model for binary prediction of excitement (arousal) and valence was 77% and 76%, with F-scores of 69% and 72%, respectively. Two convolutional neural networks with new architectures are proposed for biometric identification based on EEG signals in [23]. An ensemble deep neural network is proposed to explore the correlation between channels and the contextual information of recorded data from EEG frames. The hybrid method is a combination of CNN and RNN networks [24]. A deep neural network has been proposed to detect emotions from EEG signals using the DEAP dataset. Two types of neural network architecture have been studied in this research: CNN and DNN. Both models are highly effective in categorizing user emotions when trained on preprocessed data [1]. The GELM model has been used to identify stable patterns over time and evaluate the stability of the emotion recognition model. Feature selection and classification of emotion patterns are evaluated on the SEED and DEAP datasets [25]. A CWGAN network is proposed as a data augmentation technique to generate EEG data in the emotion recognition task. The mean classification accuracy based on the 2D arousal-valence model with SVM and DNN for the DEAP dataset is 48.9% and 47.5%, respectively [17].
A combination of three datasets, DEAP, DREAMER, and a proprietary dataset that the authors collected themselves, is proposed in [19] to address the lack of data in emotion recognition tasks. The combined dataset covers 60 participants, the largest number among the compared datasets. The accuracy of this method for valence and arousal is 70.26% and 72.42%, respectively. As the work above shows, research on EEG-based emotion recognition has continued steadily. Although many deep learning methods have been developed to identify emotions from EEG signals, proposing a suitable method is still in its infancy. Due to the limitations of EEG data collection, the shortage of labeled EEG samples that can be used with deep learning techniques remains a significant challenge for EEG-based emotion recognition, and proposing a solution is still an open issue.

DEAP Dataset
The dataset includes brain electrical waves and physiological signals recorded during the user's response to an external stimulus. DEAP is a collection of brain, peripheral physiological, and facial signals recorded while participants watched music videos [26]. In this dataset, 40 music videos have been selected to evoke people's emotions as much as possible. The number of participants in the experiments is 32. Data have been recorded from 40 channels, which include 32 EEG channels and 8 physiological channels.
Each music video trial lasts 63 seconds, which includes a 3-second preparation period and one minute of watching. After watching each music video, the participants rate it with a score from 0 to 9 in terms of Valence, Arousal, Dominance, and Liking. The score that each person gives is taken as the reference label for that person. Participants' evaluation of each video is based on the two-dimensional arousal-valence model, which is shown in Figure 1 [27].
Arousal: indicates the intensity of people's feelings. The higher the value, the stronger the feeling, and the lower the value, the weaker the feeling. The scale ranges from calm to excited [28].
Valence: indicates the degree of pleasure in people's feelings. The higher it is, the more positive and happier the person feels, and the lower it is, the more negative and sadder the person feels. The scale ranges from unpleasant to pleasant [28]. The dataset is described in detail in Table 1. In each of the 32 participant files, each 63-second recording contains 8064 samples at a sampling rate of 128 Hz. Each of the 32 files contains physiological and EEG signals recorded from 40 different channels for 40 trials [27].
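The sample counts quoted above follow directly from the recording parameters. The short sketch below reproduces that arithmetic; the constant names and the (trials, channels, samples) layout of a participant file are illustrative assumptions, not part of the dataset description given here.

```python
# Recording parameters as stated in the text.
SAMPLING_RATE_HZ = 128
PREPARATION_SECONDS = 3
VIDEO_SECONDS = 60
N_PARTICIPANTS = 32
N_TRIALS = 40      # music videos per participant
N_CHANNELS = 40    # 32 EEG + 8 physiological

# Samples in one 63-second trial.
samples_per_trial = SAMPLING_RATE_HZ * (PREPARATION_SECONDS + VIDEO_SECONDS)

# Assumed layout of one participant file: (trials, channels, samples).
participant_shape = (N_TRIALS, N_CHANNELS, samples_per_trial)

# Total number of trials across all participants.
total_trials = N_PARTICIPANTS * N_TRIALS

print(samples_per_trial, participant_shape, total_trials)
```

The total of 32 × 40 trials is the number of rows that later appears in the features matrix.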

Implementation Details
Traditional feature engineering is one of the oldest solutions for analyzing EEG signals. Depending on the type of problem, features that describe a particular pattern of the signal are identified and extracted. Identifying features that describe a pattern in brain signals is itself a complex branch of data analysis. Based on previous research and its results [22], appropriate features for identifying emotions have been selected and extracted. Feature extraction reduces the dimensions of the recorded EEG data. After feature extraction, a data augmentation method is proposed to generate more data from real EEG data in order to extend the dataset and overcome the lack of data, which leads to overfitting and incorrect predictions by classification models. The general process of recognizing emotions based on traditional feature engineering is shown in Figure 2. Finally, SVM and DNN are used as classifiers to validate the effect of augmented data on the extracted features.

Data Preparation.
Emotions themselves are a complex issue and relate to many things that are still unknown. Although emotion recognition from EEG signals is an interesting problem, it is very hard to figure out exactly what is going on in a person's mind by analyzing brain activity. Brain electrical activity might show different patterns in different people in response to the same emotional stimuli. The raw EEG dataset is shown in Figure 3. As Figure 3 shows, the recorded data is large and confusing. The first step before solving a problem is a clear definition of the problem. The key point is to clarify what exactly is going to be solved. The first question that arises is whether we are going to analyze the emotions of one person across different experiments or whether emotions are to be identified across different people. It is important to consider that the emotions of different people in response to the same stimuli may create different emotional patterns in the brain, and it is difficult to find a common pattern between them. In this paper, the identification of emotions across different people has been studied, and the recorded data from all participants in 40 experiments amount to 1280 samples. The preprocessed and rearranged dataset, before any exploration, is shown in Figure 4. The dimension of the data is still high, so exploring and analyzing it takes a long time. In addition, memory usage for data of this dimension is too high. Thus, after rearranging the data, the next step is to reduce data dimensionality through feature engineering.

Feature Extraction.
In general, feature extraction from EEG signals is one of the most important issues in signal processing. Features extracted from a signal describe its behavior, and each feature gives specific information about the data. Therefore, extracting features that accurately describe signal behavior increases the learning power of machine learning models. If the features extracted from the signal can be easily divided into different classes and the boundary between them is clear, the machine learning model can learn better. The main purpose of feature extraction is to extract the most important information hidden in massive data. Additionally, the feature extraction process significantly reduces the resources required for analyzing and processing high-dimensional data by reducing the data processing volume. Time complexity and resource usage are contested issues in data analysis and deep neural network-based research. Recently, different techniques have been proposed for feature extraction from EEG signals. What is given to a model as input is therefore important. In this paper, many features are extracted from EEG signals as input for machine learning models. The extracted features [22] are shown in Table 2.
All features have been extracted with the help of Python libraries, and the extracted features are collected in a 2D array that is ready to be given to machine learning models. Extracted features describe how brain signals change due to different emotional states. The meaning of each extracted feature is explained in Table 3.
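As a minimal sketch of how per-channel features can be collected into one row of such a 2D array, the example below computes two simple time-domain descriptors, variance and zero-crossing rate. The function names, the choice of these two descriptors, and the toy sine-wave trial are illustrative assumptions; the feature set actually used is the one listed in Table 2.

```python
import math

def variance(signal):
    """Population variance of one channel."""
    mean = sum(signal) / len(signal)
    return sum((x - mean) ** 2 for x in signal) / len(signal)

def zero_crossing_rate(signal):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(signal) - 1)

def extract_features(trial):
    """Flatten the per-channel features of one trial into a single row.

    `trial` is a list of channels, each a list of samples.
    """
    row = []
    for channel in trial:
        row.append(variance(channel))
        row.append(zero_crossing_rate(channel))
    return row

# Toy trial: two channels holding a 4 Hz and a 10 Hz sine, 1 s at 128 Hz.
fs = 128
trial = [
    [math.sin(2 * math.pi * f * n / fs) for n in range(fs)]
    for f in (4, 10)
]
features = extract_features(trial)  # one row of the 2D feature array
```

Stacking one such row per trial gives an array with one row per sample and one column per extracted feature.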
EEG data recorded over 6 s from a single channel and EEG data in different frequency bands (Theta, Slow-Alpha, Alpha, Beta, and Gamma) are shown in Figures 5-9.
(1) Features matrix. The EEG features matrix is shown in Figure 23. The rows represent the 32 participants in 40 experiments (1280 = 32 × 40). The columns represent the 344 features extracted from the EEG signals.
(2) Labels matrix. The participants' scores, ranging from 0 to 9, are recorded as continuous values. The value 5 has been chosen as the threshold for separating the upper and lower classes. Hence, scores above 5 are labeled 1 (high), and scores less than or equal to 5 are labeled 0 (low). Therefore, according to Table 4, the labels are divided into two separate classes, 0 (Low) and 1 (High).
(3) Splitting of Train and Test Data. To train the proposed model and test whether it works properly, the entire available dataset must be divided into two parts: the train set and the test set. The train and test sets contain 1152 and 128 samples, respectively.
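The labeling and splitting steps above can be sketched as follows. The helper names, the random shuffle, and the fixed seed are assumptions made for illustration; the text states only the threshold and the resulting set sizes, not how the split is drawn.

```python
import random

def binarize(scores, threshold=5.0):
    """Map continuous ratings to 1 (high) if above the threshold, else 0 (low)."""
    return [1 if score > threshold else 0 for score in scores]

def split_indices(n_samples, n_test, seed=0):
    """Shuffle sample indices and hold out `n_test` of them for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]

# A rating of exactly 5 falls in the low class.
labels = binarize([4.9, 5.0, 5.1, 8.2])

# 1280 samples split into 1152 for training and 128 for testing.
train_idx, test_idx = split_indices(n_samples=1280, n_test=128)
```

Note that ratings of 4.9 and 5.1 end up in different classes even though they are nearly identical, which is the labeling issue raised again in the Future Work section.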

Data Augmentation.
Data augmentation is the process of generating new samples by transforming training data to improve the accuracy and robustness of classifiers [29]. An unsuitable augmentation method does not improve the learning ability of the model; it worsens the results and reduces the predictive power of the model. An appropriate data augmentation method must be chosen based on the properties of the data. Two data augmentation methods have commonly been used in image processing: geometric transformation and noise addition. Geometric transformations, such as shift, scale, and rotation/reflection, are not a good choice for augmenting EEG data because EEG is a nonstationary signal that changes over time. The features extracted in the time domain or frequency domain are still time series, so rotating or shifting them would destroy the features, and such transformations are not suitable for this kind of data. Compared with geometric transformation, adding noise is a better choice but not the best method for augmenting EEG data. A variety of noise types can be added to data, such as Gaussian, Poisson, salt, and pepper noise, but since EEG data is nonstationary, not just any of these can be used, because the noise might change the features of the EEG data locally. According to previous research, the noise most frequently used for EEG data augmentation is Gaussian noise, which is added to each feature of the EEG time series to create new data from the original data [18]. In our work, we considered GANs as a very recent EEG data augmentation method for generating new data.
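For reference, the Gaussian-noise baseline described above can be sketched in a few lines. This is the baseline from earlier work, not the GAN-based method adopted in this paper; the function name and the values of `noise_std`, `copies`, and `seed` are hypothetical choices for illustration.

```python
import random

def augment_with_gaussian_noise(features, labels, noise_std=0.1, copies=1, seed=0):
    """Append noisy copies of each feature row, keeping the original label.

    `noise_std` is an assumed value; in practice it would be tuned to the
    scale of the features.
    """
    rng = random.Random(seed)
    new_features, new_labels = [], []
    for _ in range(copies):
        for row, label in zip(features, labels):
            # Add independent zero-mean Gaussian noise to every feature.
            new_features.append([x + rng.gauss(0.0, noise_std) for x in row])
            new_labels.append(label)
    return features + new_features, labels + new_labels

X = [[0.0, 1.0], [2.0, 3.0]]
y = [0, 1]
X_aug, y_aug = augment_with_gaussian_noise(X, y)
```

With `copies=1` the training set doubles in size, and each new row stays close to the row it was derived from.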

GANs.
Due to the cost of data collection, most EEG datasets are small. Lack of data makes it difficult to predict emotional states with deep learning models, which require sufficient training data. In this study, data augmentation has been used to address the lack of data in the emotion recognition task. Experimental results have shown that more data can effectively improve the performance of emotion recognition based on deep learning models. Recent work on generative models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) has shown that they can generate new data that resembles real data. Evidence has also shown that artificial data generated by a generative model can be used to increase the amount of data, improve classifier accuracy, and prevent overfitting by increasing generalizability [17]. Figure 24 shows how a GAN works. Generally, a GAN consists of two main components, a generator and a discriminator, that try to defeat each other. The input of the generator network is random noise, and the discriminator gets two inputs: generated fake data and real data.
The discriminator compares the generated data with real data to recognize whether it is fake or real. The purpose of the generator and the discriminator is to fool each other. The generator tries to produce high-quality data that resembles real data to fool the discriminator. The discriminator tries to detect fake data. This process continues until the generator produces data that the discriminator cannot recognize as fake and therefore treats as real data. GANs are not able to produce labeled data.

Table 3 (recovered fragment): description of EEG frequency bands.
Alpha (7.5 < alpha < 13 Hz): Alpha frequency band waves are generated by the simultaneous electrical activity of large groups of neurons. They are usually observed when the eyes are closed but the person is still awake, mostly in scalp recordings over the occipital lobe during periods of relaxation. Opening the eyes, drowsiness, and sleep reduce them. They mostly indicate a state of consciousness.
Beta (12 < beta < 25 Hz): Beta frequency band waves are a fast, irregular activity. Beta waves are associated with natural consciousness or thinking, anxiety, and concentration. Beta usually occurs on both sides of the brain with a symmetrical distribution but is mainly seen in the frontal lobe. It may be absent or reduced in areas where the cortex is damaged.
Gamma: Gamma waves are thought to be a sign of the active exchange of information between the cerebral cortex and other areas. Gamma waves are usually generated in the brain when people are conscious and when the eyes move rapidly. Gamma and beta waves may overlap within the range of natural frequencies, and the exact boundary between these two frequency bands is not clear and remains a matter of expert judgment.

CWGAN Implementation.
Figure 22: Zero crossing rate of EEG recorded data for each channel in 60 s.
Figure 24: GAN network diagram.
In [17], the CWGAN network was proposed as a new data augmentation technique to produce EEG data without any judgment about its quality. In this work, the proposed CWGAN not only produces EEG features, but the quality of the produced data is also considered. CWGAN is therefore used to generate features that have been previously extracted. In addition, a supplementary condition is used during generation so that labeled data is produced. Then, the quality of the produced data is evaluated, and high-quality data is added to the train set. The proposed CWGAN consists of two networks: a generator and a discriminator. These two networks work together to generate realistic-like EEG features. They constantly try to defeat each other. The generator gets Gaussian noise and a label as input, and the discriminator gets two pairs of labeled generated and real data. The generator tries to generate fake data with the same distribution as the real data to deceive the discriminator, and the discriminator tries to distinguish whether the given data is real or fake. The proposed CWGAN works well if the generator can deceive the discriminator. The architecture of CWGAN is shown in Figure 23. The main difference between GAN and CWGAN is that CWGAN produces labeled data.
As shown in Figure 25, the generator is designed as a simple deep neural network that gets noise and labels as input and produces fake data from the given noise. Initially, the quality of generated data is inadequate. After a few epochs of generator training, the generator learns to produce high-quality data to deceive the discriminator. The learning phase is then complete.
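One common way to feed both noise and a label to a conditional generator is to concatenate the noise vector with a one-hot encoding of the label. The sketch below shows only that input construction; the concatenation scheme, the function name, and the `noise_dim` value are assumptions for illustration, since the text does not state how the label is injected or how large the noise vector is.

```python
import random

def generator_input(label, noise_dim=100, n_classes=2, rng=random):
    """Build one conditional generator input: Gaussian noise plus a one-hot label.

    `noise_dim` is a hypothetical size; `n_classes=2` matches the
    low/high binary labels used in this paper.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    one_hot = [1.0 if i == label else 0.0 for i in range(n_classes)]
    return noise + one_hot

# Request a sample for the "high" class (label 1) with a short noise vector.
vector = generator_input(label=1, noise_dim=8)
```

Because the label is part of the input, every generated feature vector comes with a known class, which is what allows the generated data to be added to a labeled training set.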

Discriminator.
The discriminator is designed as a simple deep neural network that gets two pairs as input: labeled fake data produced by the generator, and labeled real data. The discriminator must distinguish whether the two pairs of given data have the same distribution or not. If it finds that the distributions of the given data are the same, the generator has succeeded in deceiving the discriminator by producing high-quality data, and the training phase is complete. The discriminator network is shown in Figure 26.
After the training data has been prepared, the train set is given as input to the proposed CWGAN to generate more artificial data. The prepared training set is first preprocessed and normalized and is then given to the network. Then, once the hyperparameters of CWGAN have been set, it is ready to generate artificial data. The quality of the generated data is determined by comparing the distribution diagrams of real and generated data and by the loss function. The number of training steps is set to 500, and the data generated after 500 steps have high quality. The hyperparameters of the generator and discriminator, i.e., the epoch number, batch size, and learning rate, are 10, 32, and 0.0002, respectively.
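The loss function mentioned above can be illustrated with the standard Wasserstein GAN formulation, in which the discriminator (often called the critic) outputs an unbounded score rather than a probability. The sketch below computes both losses from lists of scores; it is a generic illustration under that assumption, and it omits the Lipschitz constraint (usually enforced by weight clipping or a gradient penalty), because the text does not say which variant is used.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def critic_loss(real_scores, fake_scores):
    """Loss minimised by the critic: it wants high scores on real data
    and low scores on generated data."""
    return mean(fake_scores) - mean(real_scores)

def generator_loss(fake_scores):
    """Loss minimised by the generator: it wants the critic to score
    generated data highly."""
    return -mean(fake_scores)

# Toy critic scores for one batch of real and one batch of generated rows.
real_scores = [1.0, 3.0]
fake_scores = [0.0, 1.0]
d_loss = critic_loss(real_scores, fake_scores)
g_loss = generator_loss(fake_scores)
```

In a conditional setup the scores would come from a critic that sees each feature vector together with its label; only the way the scores are combined is shown here.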

Evaluation Quality of Generated Data.
Evaluating generated high-dimensional EEG data is challenging for researchers. One of the main challenges of using the CWGAN network to generate EEG data is that the quality of the generated data cannot be easily assessed. Image data can be evaluated by visual observation and comparison, but another solution must be sought to evaluate the similarity of generated EEG data to real EEG data. One of the most common methods is to compare the distribution of generated data with that of the original data. Another technique is to observe the changes in the loss function diagram for the generator and the discriminator during the training phase. Figure 27 shows how the loss functions of the discriminator and generator change during training. It reflects the progress of the CWGAN training phase and the quality of the produced data. Initially, the generator begins to generate random data from the noise given to it as input. As shown in Figure 27, the loss of the generator is high and the loss of the discriminator is low, which means the generator is not able to generate high-quality data and deceive the discriminator. The low loss value for the discriminator means it can tell that the given data is fake. The optimum point for high-quality data generation is a low generator loss and a high discriminator loss. When the diagram converges to this point and the loss values become stable, the training phase is complete, and the generated data appears to have high quality. The distributions of generated and real data must also be compared. If they are similar enough, CWGAN was able to generate high-quality data. The distribution of the data is shown in Figure 28, where Z1 and Z2 are the features extracted by PCA with the largest eigenvalues.
Due to the high dimensionality of EEG data, comparing its distribution plots directly is very difficult. Hence, PCA is applied to the generated and real features to reduce the dimension of the data for better visualization and comparison. As shown in Figure 28, after 500 training steps, the scatter distribution of the data generated by CWGAN in two-dimensional space resembles that of the real data. Training can be stopped when the scatter plot of the generated data closely matches that of the original data and changes little in subsequent steps. For image data, by contrast, quality can be easily determined by observing and comparing the generated data with the original data. The output of this network is ultimately a CSV file that stores the set of generated features and a second CSV file that contains the generated labels.
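A two-dimensional PCA projection for this kind of visual comparison can be sketched with NumPy as below. Fitting the principal components on the real features only and projecting both sets onto them is an assumption made here so that both scatter plots share the same axes; the random arrays are stand-ins for the real and generated feature matrices, and the function name is illustrative.

```python
import numpy as np

def pca_2d(real, generated):
    """Project both sets onto the two leading principal components of the real data."""
    mean = real.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(real - mean, full_matrices=False)
    components = vt[:2].T
    return (real - mean) @ components, (generated - mean) @ components

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 344))        # stand-in for real feature rows
generated = rng.normal(size=(150, 344))   # stand-in for CWGAN output
z_real, z_generated = pca_2d(real, generated)
```

The two columns of each result play the role of Z1 and Z2; plotting `z_real` and `z_generated` on the same axes gives the kind of scatter comparison described above.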

Adding Generated Data to Train Set.
In the next step, the generated data is appended to the training set. Different amounts of generated data have been added to the training set, but only some of them improved the classification results.

Classification.
For data classification, a support vector machine and a deep neural network have been trained on augmented datasets of various sizes, and the results have shown that classification accuracy improved in some cases.
Contrary to our expectation that more data improves classification accuracy, in some cases adding data did not improve accuracy and even reduced it.
To implement a stable and efficient deep neural network, different numbers of layers and neurons have been tested to reach a high-quality design. The following architecture yielded the best results. The first layer, which is the input layer, contains 512 neurons, and the hidden layers have 256 and 128 neurons, respectively. After the last hidden layer, a dropout layer is placed to prevent overfitting. The last layer consists of a single neuron with the sigmoid activation function for binary prediction, and the middle layers use the ReLU activation function. The network architecture is simple and easy to implement. Low memory consumption and short execution time were design considerations in this research. The support vector machine, one of the most powerful machine learning algorithms, with easy implementation, high training speed, high predictive power, and high stability, is also used for classification. Different kernels have been tested, and the linear kernel was found to be the best in this case.
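To give a rough sense of the size of this network, the sketch below counts the trainable parameters of the fully connected layers. It assumes that the 344 extracted features feed directly into the 512-neuron layer and that every layer has a bias term; neither detail is stated explicitly in the text, and the helper name is illustrative.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases of a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# 344 input features (assumed), then layers of 512, 256, 128, and 1 neurons.
# The dropout layer adds no trainable parameters.
layer_sizes = [344, 512, 256, 128, 1]
total_parameters = dense_parameter_count(layer_sizes)
print(total_parameters)
```

Under these assumptions the model stays in the range of a few hundred thousand parameters, which is consistent with the stated goal of low memory consumption.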

Result
For an appropriate training phase, different amounts of augmented data and different network designs have been tested. Experimental results are listed in Tables 5 and 6.
As shown in Tables 5 and 6, data augmentation is more effective for neural network models than for standard machine learning models. Data augmentation improved the prediction accuracy of both the SVM and DNN classifiers. When the data was doubled, SVM accuracy improved by up to 3.9%, but DNN accuracy did not improve at all. The reason is that deep neural network models require more data than traditional machine learning models. When too much data was added to the original dataset, the accuracy of SVM did not improve and in fact got worse. In contrast, adding this much data significantly improved DNN prediction accuracy, by up to 6.7%, which is a notable gain. In conclusion, data augmentation, especially for EEG data, is complicated, and many issues need to be considered. In this experiment, a large amount of data has been generated and added to the original dataset, but not all of it yielded the expected result. This means that more data does not necessarily improve accuracy. The more important concern in this task is the reliability of the classifier's accuracy. A comparison of the proposed method with previous work is shown in Tables 7 and 8.

Conclusion
In this study, two challenges were prioritized in identifying emotions from EEG signals. The first is the high dimensionality of EEG signals, and the second is the lack of EEG data. To solve these problems, feature extraction and data augmentation with generative adversarial networks were proposed, respectively. The implemented method achieved better accuracy with DNN than with the SVM classifier, which means the lack of data matters more for neural network models than for traditional machine learning models. The distribution of extracted and generated features has shown that the features are heavily cluttered and there is no clear border between the features of different classes. This leads to low classification accuracy, and it is more evident in SVM than in DNN.

Future Work
In this paper, the most important tasks were EEG data generation and feature extraction. The experimental results have shown that extracted features play a key role in the classifier's prediction and learning phase. If extracted features can clearly describe the patterns of the signal in different classes, the classifier's predictive ability increases and the model works more accurately. Therefore, in future work, feature extraction techniques are the priority of our research. The next problem that leads to wrong predictions is the relabeling of data. The binary encoding of the target is one of the reasons for the model's false predictions and low accuracy. For instance, in target encoding, 5.1 is considered as 1, and 4.9 is considered as 0.
These two scores are very close to each other and seem to have the same pattern, but they are assigned to two different prediction classes, so a model can easily become confused, and the false prediction rate increases.

Data Availability
To gain access to the dataset and download the files, please visit the links below in order to obtain a username and a password: https://www.eecs.qmul.ac.uk/mmv/datasets/deap/download.html and https://anaxagoras.eecs.qmul.ac.uk/request.php?dataset=DEAP.

Conflicts of Interest
The authors declare that they have no conflicts of interest.