Extracting a Novel Emotional EEG Topographic Map Based on a Stacked Autoencoder Network

Emotion recognition based on brain signals has become an increasingly attractive way to evaluate humans' internal emotional states. Conventional emotion recognition studies focus on developing machine learning models and classifiers. However, most of these methods do not provide information on the involvement of different areas of the brain in emotions. Brain mapping is considered one of the most distinctive methods of showing the involvement of different areas of the brain in performing an activity. Most mapping techniques rely on projecting and visualizing only one of the electroencephalogram (EEG) subband features onto brain regions. The present study aims to develop a new EEG-based brain mapping, which combines several features to provide more complete and useful information on a single map instead of several common maps. In this study, the optimal combination of EEG features for each channel was extracted using a stacked autoencoder (SAE) network and visualized as a topographic map. The research hypothesis is that autoencoders can extract optimal features for quantitative EEG (QEEG) brain mapping. The DEAP EEG database was employed to extract topographic maps. The accuracy of image classifiers using a convolutional neural network (CNN) was used as a criterion for evaluating how well the maps obtained by the stacked autoencoder topographic map (SAETM) method distinguish different emotions. The average classification accuracy was 0.8173 and 0.8037 in the valence and arousal dimensions, respectively. The extracted maps were also ranked by a team of experts against common maps. The results of the quantitative and qualitative evaluations showed that the map obtained by SAETM carries more information than conventional maps.


Introduction
Emotion is one of the essential cognitive aspects of human beings. According to cognitive studies, the evaluation of human emotion in contact with individuals and social environments plays an important role in daily human behavior [1]. The emotion of a normal individual can be recognized by processing body reactions including facial expressions, voice, body gestures, and electrophysiological reactions. Electrophysiological signals are preferable, especially in the case of abnormal individuals, in whom other body reactions rarely represent internal emotional states. Therefore, the study of emotions would have a great impact on the treatment of diseases such as depression, autism, epilepsy, and similar conditions [2]. In addition, emotion recognition is an interesting topic in many research areas. The brain-computer interface (BCI) system introduces methods such as recording physiological signals from the human brain based on the central nervous system [3]. Physiological signals record the electrical activity of neurons in different parts of the cerebral cortex. The electroencephalogram (EEG), which has long been used to detect brain abnormalities, is a noninvasive method for recording brain signals [4] and contains rich information about internal emotional states. The EEG signal can be processed by state-of-the-art machine learning methods and classification approaches.
Machine learning is one of the leading methods for developing BCIs. Machine learning has many subsets, such as recurrent networks, deep learning networks, and Boltzmann networks, which have their own strengths and weaknesses depending on the application [1,5,6]. Deep learning is a specialized example of this field, which has received much attention in recent decades. The development of machine learning algorithms is an interesting topic in cognitive science. Deep learning networks are a trending machine learning subject capable of detecting underlying states hidden in EEG signals. Deep learning, especially in the case of large datasets such as EEG, shows acceptable and citable results in both supervised and unsupervised EEG classification [6].
The autoencoder (AE) is a special type of artificial neural network and one of the deep learning algorithms that automatically learns a compressed representation of raw input data [7]. Autoencoders (AEs) can extract low-level features in the input layer and high-level features in deep layers, which is well achieved with the structure of stacked autoencoders (SAEs) [8]. AEs extract complex nonlinear patterns from EEG data, which makes the process of diagnosing and treating diseases more accurate. Zhao and He [9] developed deep learning networks to analyse early-stage Alzheimer's disease from the EEG signal and reported 92% accuracy, improving the diagnosis of this disease. Jose et al. [8] employed SAEs to study epilepsy and detect epileptic seizures from EEG signals, extracting features such as relative energy, spectral features, and some nonlinear features from each channel. These data were fed as input to an autoencoder network, which resulted in 91.5% accuracy in the diagnosis of seizures with an adaptive scheme. Furthermore, the study of AE networks for emotion recognition from EEG data has received much attention in recent decades. Yin et al. [6] conducted studies on emotion recognition through deep networks based on a multiple-fusion-layer based ensemble classifier of stacked autoencoders (SAE). Using the AE network could increase the average classification accuracy by up to 5.26% compared to other emotion recognition networks [6]. On the other hand, the combination of neural networks is one of the most recently published approaches for emotion classification. Liu et al. [10] combined a convolutional neural network (CNN), an SAE deep neural network, and a deep neural network (DNN) to classify emotional states and reported acceptable results compared to a single neural network method.
The EEG signal has acceptable temporal resolution but does not, by itself, provide useful information in terms of spatial resolution [11,12]. Nevertheless, the spatial distribution of EEG activity contains rich information about emotional states. One of the common methods for visualizing the EEG signal is quantitative EEG (QEEG) analysis, well known as topographic brain mapping, which provides a cost-effective and practical method for the spatial evaluation of neural activities. This method represents structural and effective communication in nerve cells, nerve complexes, and brain structures [13]. Brain topography by the QEEG technique is obtained by extracting features from the EEG signal. Today, with the advancement of topographic maps, the analysis of EEG provides a comprehensive exploration of temporal and spatial characteristics simultaneously [12,13].
In the conventional topographic brain mapping technique, only one feature is considered to draw a map. For instance, the classical Fourier transform is calculated to quantify the power spectrum in each frequency subband of the EEG signal [14], and entropy is another feature derived from the EEG signal for brain mapping. Keshmiri et al. [15] examined entropy to differentiate between the brain's negative, neutral, and positive responses to emotional stimuli. Moreover, power spectral density (PSD) is another feature that provides a separate topographic brain map [16]. As a consequence, investigating all the features underlying the EEG signal would create a large number of topographic brain maps.
This study aimed to evaluate the hypothesis that compressing temporal, frequency, linear, and nonlinear EEG features can provide original and useful information about brain function in the form of topographic brain maps. Thus, we present a novel method to reduce the number of topographic brain maps to only one map by preserving spatial information and extracting the optimal combination of all features present in EEG signals. The resulting topographic brain map is a specific combination of the extracted features while preserving the spatial characteristics of the EEG signals [11]. Therefore, a method is required to extract the optimal combination of EEG features. Hence, an AE-based optimal feature selection network is proposed to extract the optimal topographic brain map (stacked autoencoder topographic map, SAETM), which provides more complete information about brain function. In addition, evaluating one map instead of several maps speeds up the diagnostic process. To test the study hypothesis, SAETM and conventional topographic maps were compared in a quantitative and qualitative manner. There are many common criteria for measuring the similarity of two images, including absolute error, mean square error, peak signal-to-noise ratio, histogram similarity, Euclidean distance, or the correlation coefficient between two independent images [17], as well as classifier-based methods. Accordingly, Topic and Russo [3] revealed that CNN networks have the highest performance in calculating similarity between maps of different classes. In addition, similar studies on the DEAP database using topographic brain maps with deep learning networks have enhanced the process of emotion recognition based on the capsule neural network (CapsNet) [18]. Finally, the SAETM and conventional topographic brain maps were compared by a team of specialists based on a scale questionnaire for further evaluation.

Materials and Methods
The study consists of four main parts: EEG signal preprocessing, the stacked autoencoder network, emotion classification and algorithm parameter tuning, and extraction of a new topographic brain map. The first part includes EEG signal preprocessing and extraction of features conventionally used in emotion recognition. In the second part, the extracted features are abstracted by the autoencoders. The best structure of features is obtained by the emotion classifier in part three. In the last part, the final features are used to draw the topographic brain map. The architecture of the SAETM is illustrated in Figure 1, including primary feature extraction (part 1), SAE networks for abstracted feature extraction (part 2), multilayer perceptron (MLP) networks to extract final features based on emotion classification (part 3), and topographic brain mapping (part 4). As shown in Figure 1, the EEG signal features are extracted for each channel and fed to an SAE network; thus, there are 32 SAE networks. At the output of each SAE, an MLP network is used to obtain a final feature, so one feature is obtained for each channel. Moreover, an MLP classifier is applied to the outputs of the previous MLP layers. The output of this classifier is used for emotion classification in the arousal and valence dimensions, and the parameters of the SAETM algorithm are adjusted by this classifier. A colour is assigned proportionally to each weight of the first MLP layer to draw the topographic brain map.

Database.
In this study, the DEAP physiological dataset was used for emotion analysis, with simultaneous recording of EEG signals and eight peripheral physiological signals, including galvanic skin response, respiratory rate, skin temperature, pulse rate, blood pressure, neck and smile muscle activity, and the EOG signal. The EEG signal was recorded at 32 locations based on the international 10-20 system. The study was conducted on 32 healthy participants aged 19-37 (mean age 26.9), half of whom were women. The experiment was designed in a controlled environment to stimulate emotions. Forty music videos covering different emotional states were played while the signals were recorded. There was a 3-second interval between music videos to reset the participant's emotional state. A baseline signal was recorded for 5 seconds, and after that the videos were displayed to the participants in random order. The videos that were used as emotional stimuli were categorized with emotional labels using the Self-Assessment Manikin questionnaire. Each participant gave each video a score of one to nine after watching the full video. Scores 1 to 3 corresponded to the negative state of the valence dimension and the inactive state of the arousal dimension, 4 to 6 were related to the neutral state of the valence dimension and the normal state of the arousal dimension, and 7 to 9 were relevant to the positive state of the valence dimension and the active state of the arousal dimension. These scores were divided into happy, pleased, relaxed, excited, neutral, calm, distressed, miserable, and depressed classes, which were related to four dimensions of emotion: valence (positive/negative), arousal (passive/active), liking (like/dislike), and dominance [19].

Preprocessing
[Table 1 summarizes the extracted features, including the standard deviation, zero-crossing rate, and correlation dimension; for any set of N points in an m-dimensional space, the correlation sum C(ϵ) is computed from g, the total number of pairs of points whose mutual distance is less than ϵ, and the correlation dimension CD is defined by the scaling C(ϵ) ∼ ϵ^CD (Equation (3)).]

The EEG signals were converted to a sampling frequency of 128 Hz using the downsampling method. Then, all the EEG trials were filtered to 0.05-47 Hz. Recorded EEG is affected by several noises and artefacts. The independent component analysis (ICA) algorithm extracts statistically independent components from a mixture of sources. In this study, ICA was used to remove unwanted signals, including EMG and EOG activity. On average, 1-3 artifact-related independent components (ICs) were removed per participant.
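As a rough illustration of this chain, the sketch below downsamples synthetic multichannel data to 128 Hz and applies the 0.05-47 Hz band-pass with SciPy. The filter order, the assumed 512 Hz input rate, and the synthetic signal are our own choices, and the ICA artifact-removal step (done in practice with a dedicated toolbox such as MNE) is omitted.

```python
# Hedged sketch of the preprocessing chain: downsample to 128 Hz, then
# band-pass filter 0.05-47 Hz. All names and the input data are illustrative.
import numpy as np
from scipy.signal import butter, resample_poly, sosfiltfilt

def preprocess(eeg, fs_in=512, fs_out=128, band=(0.05, 47.0)):
    """Downsample each channel, then apply a zero-phase band-pass filter."""
    # polyphase resampling, e.g. 512 Hz -> 128 Hz
    eeg_ds = resample_poly(eeg, up=fs_out, down=fs_in, axis=-1)
    # 4th-order Butterworth band-pass in numerically stable SOS form
    sos = butter(4, band, btype="bandpass", fs=fs_out, output="sos")
    return sosfiltfilt(sos, eeg_ds, axis=-1)

rng = np.random.default_rng(0)
raw = rng.standard_normal((32, 512 * 60))   # 32 channels, 60 s at 512 Hz
clean = preprocess(raw)
print(clean.shape)                          # 60 s at 128 Hz per channel
```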

Primary Feature Extraction.
Feature selection is considered one of the most important parts, since the selected features must describe the signal. EEG signal features are divided into three main classes: time, frequency, and time-frequency features [11]. In this study, power and statistical features were selected as linear features, and entropy, fractal dimension, and correlation dimension as nonlinear features, all of which have been considered in previous emotion recognition studies. The calculation of power is a common feature for all EEG subbands [20,21]. The power spectral density of five subbands, theta (4-8 Hz), low alpha (8-10 Hz), upper alpha (10-12 Hz), beta (12-30 Hz), and gamma (above 30 Hz), is calculated by Welch's method [22]. The mean, standard deviation, and zero-crossing rate are examined as statistical features [6], and signal complexity is measured by entropy [1]. The fractal dimension is used for measuring the complexity and irregularity of the signal [23]. The correlation dimension reflects the relationship between the signal and itself, extracting repetitive and periodic patterns of the signal [24]. These features were extracted from the filtered signal. The extracted features were normalized to the baseline signal in the range of zero to one. Table 1 lists the features extracted in this study based on previous studies [23]. All data were labelled according to the arousal-valence domain. The data labels were used for supervised training of the SAETM algorithm. The trials were 1-minute intervals in which music videos with different emotional states were shown. The DEAP dataset assigned each trial a number from one to nine. This study focused on the high arousal-high valence, low arousal-high valence, high arousal-low valence, and low arousal-low valence classes.
The reason for this choice is that the differences between the positive and negative levels of the valence scale and between the high and low levels of the arousal scale are very significant. These two scales offer two complementary and different aspects for examining positive and negative emotions [22]. A 2-second window with 50% overlap was used to extract the features. A total of 8 music videos were played in the high arousal-high valence class, so 60 * 8 * 10 features (60 windows * 8 music videos * 10 features) were extracted for the first area. The low arousal-low valence class included 12 music videos, and the extracted features were 60 windows * 12 music videos * 10 features. For the low arousal-high valence and high arousal-low valence classes, ten music videos each were played, and 60 * 10 * 10 features (60 windows * 10 music videos * 10 features) were extracted for each area [19] (Figure 2).
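The windowed feature extraction above can be sketched for a subset of the features in Table 1 (subband Welch power, mean, standard deviation, and zero-crossing rate); the entropy, fractal dimension, and correlation dimension features are omitted for brevity, and all function names and the random signal are illustrative.

```python
# Illustrative extraction of windowed features: 2-s windows, 50% overlap,
# Welch band power per subband plus simple statistics. Band edges follow
# the paper; everything else is our own sketch.
import numpy as np
from scipy.signal import welch

FS = 128
BANDS = {"theta": (4, 8), "low_alpha": (8, 10), "high_alpha": (10, 12),
         "beta": (12, 30), "gamma": (30, 47)}

def band_powers(x, fs=FS):
    f, pxx = welch(x, fs=fs, nperseg=fs)          # ~1 Hz resolution
    return [pxx[(f >= lo) & (f < hi)].sum() for lo, hi in BANDS.values()]

def window_features(x, fs=FS, win_s=2, overlap=0.5):
    size = int(win_s * fs)
    step = int(size * (1 - overlap))
    feats = []
    for start in range(0, len(x) - size + 1, step):
        w = x[start:start + size]
        zcr = np.mean(np.abs(np.diff(np.sign(w))) > 0)   # zero-crossing rate
        feats.append(band_powers(w, fs) + [w.mean(), w.std(), zcr])
    return np.asarray(feats)

sig = np.random.default_rng(1).standard_normal(60 * FS)  # one 60-s trial
F = window_features(sig)
print(F.shape)   # windows x (5 band powers + mean + std + zcr)
```

With a strict 2-s window and 50% overlap, a 60-s trial yields 59 full windows; the paper's count of 60 presumably includes a final partial or padded window.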

Stacked Autoencoder Topographic Map (SAETM)
The autoencoder is a deep learning network used to obtain a better description of the features [24]. Autoencoders have a symmetrical structure in which the inputs and outputs are similar [7]. Each autoencoder consists of three layers (an input layer, one hidden layer, and one output layer); the hidden layer acts in two parts, an encoder and a decoder. The stacked autoencoder comprises several autoencoders with a SoftMax layer. The input of the first layer of the SAE network is the set of features extracted from the EEG signal (Table 1). These features are transformed by the weights and biases learned during training of the first AE network. The output of the encoder at this stage is the input of the next AE network. This process continues until the final abstracted features are obtained, and finally the output of the last AE encoder is used to classify emotions [25].
In the first step of SAE training, the network uses unlabelled data to extract abstracted EEG features in an unsupervised procedure. Then, the encoder part is completed with a classifier and trained with a supervised procedure to fine-tune the SAE parameters. This helps to initialize the weights one layer at a time by minimizing the reconstruction loss.
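A minimal numpy sketch of this greedy layer-wise pretraining follows, assuming tied encoder/decoder weights and plain gradient descent on the squared reconstruction error (the paper specifies neither choice); the layer sizes 10 → 7 → 4 mirror the F3 channel example reported later, and all names are ours.

```python
# Greedy layer-wise pretraining sketch: each autoencoder is trained on the
# codes produced by the previous one, then the encoders are stacked.
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_ae(X, m, steps=200, lr=0.1):
    """Train one tied-weight autoencoder with hidden size m on data X."""
    l, n = X.shape
    W = rng.standard_normal((m, n)) * 0.1
    b, c = np.zeros(m), np.zeros(n)
    for _ in range(steps):
        H = sigmoid(X @ W.T + b)            # encode
        Xr = sigmoid(H @ W + c)             # decode (tied weights)
        E = (Xr - X) * Xr * (1 - Xr)        # error at decoder pre-activation
        Gh = (E @ W.T) * H * (1 - H)        # backprop into encoder
        W -= lr * (H.T @ E + Gh.T @ X) / l  # both uses of the tied W
        c -= lr * E.mean(axis=0)
        b -= lr * Gh.mean(axis=0)
    return W, b

X = rng.standard_normal((200, 10))          # 10 primary features per sample
codes, encoders = X, []
for m in (7, 4):                            # e.g. the F3 channel: 10 -> 7 -> 4
    W, b = train_ae(codes, m)
    encoders.append((W, b))
    codes = sigmoid(codes @ W.T + b)        # feed codes to the next AE
print(codes.shape)
```

In the paper, these pretrained encoders would then be fine-tuned end to end with the supervised classifier described next.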
Assume the vector of features extracted from the input and the vector of the hidden layer are x ∈ R^n and h ∈ R^m, respectively, where n is the dimension of the extracted input features and m is the dimension of the abstracted features (R denotes the real numbers). The encoder computes

h = σ(Wx + b),  (1)

where W ∈ R^(m×n) is a weight matrix, b ∈ R^m is a bias vector, and σ is an activation function (the sigmoid function). The decoder, located in the output layer, reconstructs the input as

x′ = σ(W^T h + c),  (2)

where x′ ∈ R^n has the same dimension as the input vector and c ∈ R^n is the decoder bias. The output reconstructs the input vector by updating the hidden layer weights.
The autoencoder parameters W, W^T, b, and c are obtained by the backpropagation algorithm using the squared-error cost function

J(W, b, c) = (1/l) Σ_{i=1}^{l} ‖x^(i) − x′^(i)‖²,  (4)

where l is the number of training samples.
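Equations (1), (2), and (4) amount to the following forward computation; the dimensions (n = 10 input features, m = 7 hidden units) mirror the F3 channel example discussed later, and the random data and weights are purely illustrative.

```python
# One autoencoder forward pass and its reconstruction cost (equations (1),
# (2), and (4)); untrained random weights, for shape/flow illustration only.
import numpy as np

rng = np.random.default_rng(2)
n, m, l = 10, 7, 100                 # input dim, hidden dim, sample count
X = rng.standard_normal((l, n))

W = rng.standard_normal((m, n)) * 0.1
b, c = np.zeros(m), np.zeros(n)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

H = sigmoid(X @ W.T + b)             # equation (1): h = sigma(W x + b)
X_rec = sigmoid(H @ W + c)           # equation (2): x' = sigma(W^T h + c)
J = np.mean(np.sum((X - X_rec) ** 2, axis=1))   # equation (4)
print(H.shape, X_rec.shape)
```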
The hidden representation h is then used as the input of the next autoencoder, and this operation is repeated to produce a stacked autoencoder. The best abstracted features are produced in the hidden layer of each autoencoder, and h^(l), the representation of the l-th (deepest) layer,

h^(l) = σ(W^(l) h^(l−1) + c^(l)),  (5)

is the best representation of the abstracted features.
This stage, called pretraining, sets the SAE parameters. An MLP network with one output neuron is added to the encoder side of each SAE to extract a single abstracted feature for plotting the brain map in the topographic map stage. U_i is the output function, in which µ is the matrix of weights and z is the bias vector of the MLP layer, and i is the index of the SAE.
The feature sets are defined as F^(n), meaning that the features of each channel are grouped into ten parts: the subband powers are F_1, F_2, F_3, and F_4 (four subbands are selected). The linear EEG features, namely the mean, standard deviation, and zero-crossing rate, are F_5, F_6, and F_7, respectively. Finally, F_8, F_9, and F_10 are built from the nonlinear features: fractal dimension, approximate entropy, and correlation dimension. Therefore, the feature vectors are defined as x(F_j) ∈ F_j, j ∈ {1, 2, …, 10}. We construct one SAE per channel to describe the hidden feature abstractions of each channel based on equation (7), where S¹_sae(x), …, S³²_sae(x) denote the higher feature abstractions of each channel's features.
The structure of the SAETM is completed by placing two neurons in the last layer (Equation (8)), where y = 0 or 1 indicates the low and high levels of the emotion dimensions.
where y is the output function, in which β is the matrix of weights and α is the bias vector of the last layer. The fine-tuning stage is an important stage of SAE networks. Fine-tuning is used to train on large labelled datasets and can improve classifier performance [6,25]. This stage fine-tunes the parameters of the last layer of the SAE by the backpropagation algorithm in a supervised fashion. The parameters obtained in part 4 are used for the topographic brain map. The number of layers and the number of neurons in each SAE layer are important in SAETM training. Therefore, the minimum number of hidden layers and the minimum number of neurons in each layer are essential for optimal classification. In this study, the Pearson and Spearman correlation coefficients were used to find the most optimal structure [6,10]. These two coefficients measure the similarity between the input and output data. Therefore, the structural loss function (SLF) is defined based on equation (9).
where ω₁ = 0.5, and ρ₁(D_x, D_z) and ρ₂(D_x, D_z) are the Pearson correlation coefficient and the Spearman rank correlation coefficient, respectively [6], D_x being the input matrix and D_z the output matrix.
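Under the assumption that equation (9) combines the two coefficients as a weighted sum of (1 − ρ) terms with ω₁ = 0.5 (the paper gives the weight but not the exact arrangement), the SLF could be computed as follows; the function name is ours.

```python
# Hedged sketch of the structural loss function (SLF): low when input and
# output are strongly correlated. The exact formula is our reading of eq. (9).
import numpy as np
from scipy.stats import pearsonr, spearmanr

def slf(d_x, d_z, w1=0.5):
    rho1 = pearsonr(d_x.ravel(), d_z.ravel())[0]   # Pearson coefficient
    rho2 = spearmanr(d_x.ravel(), d_z.ravel())[0]  # Spearman rank coefficient
    return w1 * (1 - rho1) + (1 - w1) * (1 - rho2)

x = np.linspace(0, 1, 50)
print(slf(x, x))        # identical data -> perfect correlation, SLF ~ 0
print(slf(x, -x))       # anti-correlated data scores far worse
```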

Classifier Evaluation.
Depending on the literature [26], the choice of classifier can affect the results. For this purpose, reference classifiers were used in this study, and the desired classifier was selected based on the results. To check the performance of the network, we consider the criteria described below. Equation (10) is used to evaluate the precision of the classifier for the emotion classes, in which TP is the number of true positives and FP the number of false positives [27].
The recall is calculated by equation (11), where FN is the number of false negatives.
The overall classifier accuracy is obtained from equation (12), in which TN is the number of true negatives.
The F1 score combines the precision and recall criteria and is obtained according to equation (13).
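Equations (10)-(13) written out explicitly, with an illustrative confusion-matrix count (the specific TP/TN/FP/FN values below are made up for the example):

```python
# The four evaluation measures, computed directly from confusion counts.
def precision(tp, fp):            # equation (10)
    return tp / (tp + fp)

def recall(tp, fn):               # equation (11)
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):     # equation (12)
    return (tp + tn) / (tp + tn + fp + fn)

def f1(tp, fp, fn):               # equation (13): harmonic mean of P and R
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

tp, tn, fp, fn = 80, 70, 20, 30   # illustrative counts only
print(precision(tp, fp), recall(tp, fn), accuracy(tp, tn, fp, fn))
print(f1(tp, fp, fn))
```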

Evaluating the Topographic Brain Maps.
Extracting a topography or brain map is one of the practical methods of QEEG. The parameters of the brain topography are calculated for different subbands of the EEG signal at each electrode according to the international 10-20 system. The features extracted in the previous section are used as colour mapping parameters. The bilinear interpolation method is used to estimate the values between the electrodes [13,27]. In this study, the brain topographic map was extracted with the MNE library in Python.
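A hedged sketch of the interpolation step: one scalar value per electrode is spread over a 2-D head grid. We use SciPy's linear (barycentric) interpolation as a stand-in for the bilinear scheme, and the electrode positions are made up; in practice the MNE topomap routines (e.g. `mne.viz.plot_topomap`) handle both the sensor layout and the rendering.

```python
# Interpolating one feature value per electrode onto a 2-D image grid;
# random positions and values stand in for real 10-20 electrode data.
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(4)
pos = rng.uniform(-1, 1, size=(32, 2))   # stand-in 2-D electrode positions
vals = rng.uniform(0, 1, size=32)        # one SAETM feature per channel

gx, gy = np.mgrid[-1:1:64j, -1:1:64j]    # 64 x 64 image grid
topo = griddata(pos, vals, (gx, gy), method="linear")
print(topo.shape)                        # points outside the hull are NaN
```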

The CNN Used in Image Classification.
The convolutional neural network (CNN) is a feed-forward neural network whose input is image-like; CNNs were originally designed for evaluating images [3]. In this study, we use CNN accuracy as a criterion to measure the similarity between two groups of topographic maps. The building blocks of the CNN architecture are the convolution layer, the pooling layer, and the fully connected layers. The convolutional layer is the central part of a CNN. In this layer, multiple filters (or kernels) slide across the input with the convolution operation. This operation extracts features while preserving spatial information from the data, and the pooling layer decreases the spatial dimension of the features; the pooling layer also filters out noise from the image. An image is convolved with a filter to learn one feature from the whole image. The fully connected layers connect the inputs of the previous (pooling) layer to the output neurons [3,28]. Suppose an M × M image is convolved with a k × k kernel. Equation (14) gives the size of the output image without padding, and equation (15) is the convolution operation. Padding is used to preserve the size of the input image; the size of the output image with padding is given in equation (16).
where O is the output size, P is the padding, s is the stride, b is the bias, δ is the sigmoidal activation function, w is a 3 × 3 matrix of shared weights, and h_{x,y} is the input activation at position (x, y) [29]. The CNN model used in this study is presented in Figure 3. Max pooling was applied as the pooling method: the maximum activation is pooled over each 2 × 2 input region. The parameters of the model were set as follows: number of epochs: 10; optimizer: RMSprop; learning rate: 0.001; parameter β: 0.9; activation: sigmoid; stride: 1 for the convolution layers and 2 for the pooling layers.
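The size bookkeeping of equations (14) and (16) reduces to one integer formula, shown below with the stride/pooling settings reported above (the 32 × 32 input size is an assumption for the example):

```python
# Output size of a convolution/pooling layer: O = (M - k + 2P) // s + 1.
# With P = 0 this is equation (14); with padding it is equation (16).
def conv_out(M, k, s=1, P=0):
    return (M - k + 2 * P) // s + 1

print(conv_out(32, 3))        # no padding: the feature map shrinks
print(conv_out(32, 3, P=1))   # "same" padding preserves the size
print(conv_out(30, 2, s=2))   # the 2 x 2, stride-2 max pooling halves it
```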

Results
In this section, the results obtained from the SAETM for extracting topographic brain maps are presented. The data were divided into training and test groups to evaluate the algorithm. All data were normalized for each participant to a mean of zero and a standard deviation of one to eliminate differences in feature scale. The k-fold cross-validation method was used to evaluate the samples more reliably: k = 10 was chosen, so that each time 0.1 of the data is selected for testing and the model is trained on the remaining 0.9. This operation is repeated ten times so that all the data are seen by the network.
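The evaluation protocol (per-participant z-scoring followed by 10-fold cross-validation) can be sketched with scikit-learn; the random feature matrix stands in for the real data, and the split sizes simply illustrate the 90%/10% partition:

```python
# Per-participant normalization plus 10-fold cross-validation splits.
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(5)
X = rng.standard_normal((400, 10))             # stand-in feature matrix
X = (X - X.mean(axis=0)) / X.std(axis=0)       # zero mean, unit variance

kf = KFold(n_splits=10, shuffle=True, random_state=0)
sizes = [(len(tr), len(te)) for tr, te in kf.split(X)]
print(sizes[0])                                # 90% train / 10% test per fold
```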

Architecture of the SAETM.
The appropriate selection of the SAETM parameters, that is, the number of hidden layers and the number of neurons in each layer, improves network performance. Figure 4 illustrates the SLF of equation (9) for the ten features selected in Table 1. The SLF was used to optimize the number of neurons in each layer, calculated while incrementing the number of neurons in each layer. Figure 4(a) represents the trend of feature abstraction in the F3 channel as an example of channels in the left hemisphere. As shown, the input of the first hidden layer is the ten features extracted from the EEG signal. The SLF reaches its lowest value in the first layer with seven neurons; therefore, seven abstracted features were obtained in the first layer. Adding another neuron to this layer increases the SLF, so the minimum of the SLF, at seven neurons in the first hidden layer, is what matters. The seven features extracted from the first hidden layer are the inputs of the second hidden layer, where the minimum SLF is observed with four neurons. Thus, ten neurons are reduced to seven neurons and finally to four. Figure 4(b) presents these calculations for the F4 channel in the right hemisphere: ten features were reduced to six in the first hidden layer, three in the second layer, and finally to one feature. Figure 4(c) shows the same calculation for the Cz channel, where ten features were reduced to seven features in the first layer and three in the second layer. Table 2 shows the number of neurons in each hidden layer for each of the 32 channels. The maximum and minimum numbers of neurons in the last layer are four and one, respectively.

Accuracy Measures for the Comparison of Classifiers.
The abstracted features are obtained in the last layer after fine-tuning the SAE parameters. According to the hypothesis of this study, the output of each SAE is used as an optimal feature to extract the brain topographic map. The performance of the SAETM algorithm was compared with several emotion classifiers. Figure 5 compares the accuracy of these emotion classifiers with the accuracy of the SAETM. KNN (k-nearest neighbour classifier), BN (naive Bayesian classifier), and SVM (support vector machine) were selected because these classifiers are widely used in EEG-based emotion recognition [23]. In Figure 5, the SAETM is built on the MLP (multilayer perceptron) network [26]. Figures 5(a) and 5(b) show the accuracy of the classifiers in the valence and arousal dimensions, respectively. The accuracies of the SAETM and SVM networks are close to each other: the average accuracies of the SAETM and SVM are 83.3% and 82.7% in the valence dimension and 82.8% and 74.8% in the arousal dimension, respectively. The KNN and BN networks show average accuracies of 74.3% and 79.2% in the valence dimension and 73.4% and 77.2% in the arousal dimension, respectively. The SAETM method had the highest accuracy and the KNN network the lowest. There is a significant difference between the SAETM and SVM classifiers on the one hand and the other classifiers on the other (p < 0.01). The loss of the proposed SAETM structure, used to check the generalization of the network, is presented in Figure 6. As shown, the SAETM generalizes appropriately to the validation data; the maximum number of epochs was set to 200. Figure 7 compares network performance with Box-Whisker plots in the two dimensions, valence (a) and arousal (b); each column corresponds to one classifier. The highest accuracy is related to the SVM and MLP classifiers. The MLP network was used to simplify the structure of the SAETM.
The classification accuracy and the computational time needed to train an emotion recognition network are significant factors when building a new network structure. The computational times taken by the SAETM, SVM, KNN, and BN networks for training are illustrated in Figure 8. The BN has the highest computational time, while the KNN has the lowest. The SAETM requires less computing time than the BN and is close to the SVM.

Comparison of Different Feature Extraction Methods.
In this study, the SAE network was selected as the feature extraction method. To evaluate this choice, the SAE network was compared with the PCA, nonlinear PCA, and KLDA feature extraction methods. Figure 9 compares the classifier results for the 32 participants based on these three methods in the valence and arousal dimensions: Figure 9(a) shows the Box-Whisker diagram of the comparison in the valence dimension and Figure 9(b) in the arousal dimension for PCA, nonlinear PCA, and KLDA. The linear PCA method, with an average accuracy of 75.3% in the valence dimension, and the KLDA method, with 73.2% in the arousal dimension, gave the lowest accuracies, while the SAETM achieved 83.3% and 82.8% in the valence and arousal dimensions, respectively. Based on these results, the SAE network performs better than the other methods (p < 0.01). The computational time for training the network with the different feature extraction methods is shown in Figure 10. The highest value is related to the KLDA method, and the SAETM had the lowest computational time.
Some linear and nonlinear features of the EEG signal were used in the designed SAETM algorithm based on Table 1. Three modes were examined to evaluate the selected features: in the first, the network is trained only with the linear features; the second uses only the nonlinear features; and in the third, the combination of linear and nonlinear features is evaluated. With only linear features as input to the SAE networks, the accuracy in the valence and arousal dimensions is 65.7% and 64.2%, respectively. With only nonlinear features, the accuracy is 53.6% and 54.9%, respectively. With both linear and nonlinear features as inputs to the SAE networks (SAETM), the accuracy, as in Figure 5, is 83.3% in the valence and 82.8% in the arousal dimension. In addition, the F1 score of the SAETM, obtained from the precision and recall of equations (10) and (11), is 81.8% and 80.3% in the valence and arousal dimensions, respectively, while that of the SVM network is 78.4% in the valence and 72.7% in the arousal dimension. Therefore, using linear and nonlinear features together gives better results than either of the other two modes.

Comparisons for Combinations of Classifiers and Feature Extraction Methods.
The accuracy and computational-time comparisons for combinations of common classifiers and feature extraction methods are shown in Tables 3 and 4, respectively. The combination of the SVM classifier with the NPCA feature extraction method in the valence dimension (78.04%), and of the SVM classifier with the KLDA method in the arousal dimension (78.23%), performs better than the other combinations reported (Table 3). On the other hand, the training times in the valence and arousal spaces show that the combination of the KNN classifier with the PCA feature extraction method, at 452 seconds in the valence and 470 seconds in the arousal dimension, requires the least computational time (Table 4).

Emotional Topographic Brain Mapping.
In this study, a brain topographic map was extracted by selecting the MLP network and assigning a colour corresponding to the weight of each node in this network (Figure 1). Figures 11(a) and 11(b) show the maps produced by the SAETM method and by the common method for the ten features of Table 1 while participants watched emotional video clips. Images obtained from the sub-band power, mean, standard deviation, zero-crossing rate, fractal dimension, entropy, and correlation dimension features are shown separately for the four emotion classes. The right column in Figures 11(a) and 11(b) contains the images from the SAETM algorithm. On the four scales of high arousal-high valence, low arousal-high valence, high arousal-low valence, and low arousal-low valence, the SAETM separated the borders of active brain areas more clearly than the common methods. Dark red indicates the highest brain activity and dark blue the lowest (Figure 11). In high arousal-high valence (1), in both Figures 11(a) and 11(b), the active regions in the frontal section appear only in the theta power and standard deviation images, while the images for the mean and zero-crossing rate features show activity in the occipital region. Brain activity was high for relative entropy in the center of the head toward the frontal lobe. In the three images for theta power, relative entropy, and fractal dimension, activity appears in the lower right hemisphere. Moreover, the relative entropy and correlation dimension images show the lowest brain activity in the left hemisphere. In the SAETM image, frontal activity can be observed in the left hemisphere along with inactivity in the right hemisphere; the active and inactive parts are separated at the center of the head into right and left hemispheres, with most of the activity in the left hemisphere toward the frontal region.
In low arousal-high valence (2), the active part of the brain is observed in the center of the head toward the left hemisphere in the theta, alpha, gamma, and standard deviation images. In the beta power image, the active part reaches its maximum in the center of the head toward the frontal region. The occipital section is neutral or inactive in all images at this scale except the correlation dimension image, in which the left hemisphere shows the highest brain activity. In the SAETM image, the frontal area is inactive at this scale, while the bar from the center of the head toward the back marks the active area; the image is thus divided at the middle of the head into an inactive frontal part and an active central bar. In high arousal-low valence (3), the active parts appear in the theta, alpha, and beta power and mean images, in the frontal region toward the right hemisphere. The zero-crossing rate image and, to some extent, the fractal dimension image show the highest brain activity in the right hemisphere. The central area toward the back of the head shows brain activity at its lowest in all images except entropy. In the SAETM image, the active and inactive parts are divided from the center of the head into right and left hemispheres, with the frontal region fully active in both hemispheres. Finally, in low arousal-low valence (4), the frontal-to-central part of the head shows low brain activity in all images except alpha power and fractal dimension, while the beta power, mean, and fractal dimension images show activity in the occipital region. In the SAETM image, the active and inactive parts are divided from the center of the head into front and back, showing brain activity in the occipital region.
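The rendering step behind all of these maps is the same: one scalar per electrode is interpolated over a 2-D head outline. The sketch below illustrates this with SciPy's `griddata`; the random electrode positions and values are illustrative, not the DEAP 10-20 montage coordinates used in the paper.

```python
import numpy as np
from scipy.interpolate import griddata

# Sketch of topographic-map rendering: interpolate one scalar value per
# electrode (e.g. an SAE-compressed feature) over a 2-D head grid.
# Electrode positions and values here are illustrative stand-ins.
rng = np.random.default_rng(1)
n_channels = 32
theta = rng.uniform(0, 2 * np.pi, n_channels)
r = rng.uniform(0, 1, n_channels)
xy = np.c_[r * np.cos(theta), r * np.sin(theta)]  # electrode positions
values = rng.normal(size=n_channels)              # per-channel feature

grid_x, grid_y = np.mgrid[-1:1:64j, -1:1:64j]
topo = griddata(xy, values, (grid_x, grid_y), method="cubic")
# Mask grid points outside the unit-circle "head" outline.
topo[grid_x ** 2 + grid_y ** 2 > 1] = np.nan

print(topo.shape)  # (64, 64)
```

Plotting `topo` with a diverging colormap yields the familiar red-to-blue scalp maps of Figure 11.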

Comparison of the Resulting Topographic Maps.
Several numerical criteria can be used to compare the resulting topographic maps, including the use of classifier networks, with classification accuracy serving as a measure of how distinguishable the network inputs (the input images) are. Table 5 shows the results of applying networks known to perform well in image classification. As shown, the maps from the SAETM algorithm yield the highest classification accuracy (0.8305 ± 0.02), and the average accuracy of the different classifiers on the images obtained from this network is also the highest (0.7613 ± 0.04). For the SAETM, the BN classifier has the lowest accuracy, 0.6906 ± 0.12; this is still higher than the average classifier accuracy on the alpha power images (0.5863), which is the best result after the SAETM. Therefore, the images obtained by the SAETM are more distinguishable than any of the common images. Chao et al. [18] reported accuracies of 0.6673 in the valence dimension and 0.6828 in the arousal dimension by creating an image through mapping the electrodes onto a two-dimensional matrix. Topic and Russo [3] evaluated images obtained from a CNN on the DEAP data and extracted features from the resulting images, with accuracies of 0.7630 in the valence dimension and 0.7654 in the arousal dimension. The SAETM achieved accuracies of 0.8173 in the valence dimension and 0.8037 in the arousal dimension with CNN classification. In addition, the F1 score for the SAETM was 0.8031 in the valence dimension and 0.7984 in the arousal dimension. Table 6 shows the accuracy of the CNN classification over the course of the ten music videos: for the SAETM, accuracy was 0.4874 after the first video, 0.7923 after five minutes, and 0.8305 after ten minutes.
According to Table 6, the CNN classified the image from the SAETM after the fifth music video with an accuracy close to that after the tenth. Therefore, the SAETM produces a brain topographic map in a shorter time; for the ten other features, the best CNN accuracy was reached only in the ninth or tenth minute.

Quality Evaluation of the Resulting Maps.
To evaluate the quality of the resulting maps, 20 experts in the field of topographic brain maps were asked to score, from zero to ten on a scale questionnaire, the EEG maps extracted by the SAETM and the maps obtained by the common methods. The questionnaire was designed around the degree of differentiation and the meaningfulness of the images. The results of an ANOVA test show that the topographic maps obtained from the SAETM are preferred over the common methods (p < 0.001) (Figure 12). The resulting maps differentiate well between the active areas in different parts of the brain during music-video viewing and during rest. Moreover, these maps show that the extracted topographic maps carry spatial, temporal, and frequency information that can lead to a better understanding of anatomical brain function. Therefore, topographic images containing rich spatial and functional information about the brain will help uncover further implications about humans. All software implementations were run on a Windows 10 64-bit workstation with an Intel Celeron 2.4 GHz CPU and 4 GB of RAM.
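The ANOVA comparison of expert scores can be sketched as follows; the simulated ratings below are illustrative stand-ins for the actual questionnaire data.

```python
import numpy as np
from scipy.stats import f_oneway

# One-way ANOVA on expert ratings (0-10) of SAETM maps vs. common maps,
# mirroring the qualitative evaluation; scores here are simulated.
rng = np.random.default_rng(2)
saetm_scores = rng.normal(8.0, 1.0, 20).clip(0, 10)    # 20 experts
common_scores = rng.normal(5.5, 1.0, 20).clip(0, 10)

stat, p = f_oneway(saetm_scores, common_scores)
print(p < 0.001)  # True: a large group difference yields a very small p-value
```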

Discussion
Electroencephalography has high temporal resolution but low spatial resolution for source localization, and the sensitivity of its spatial resolution decreases with the depth of the neural sources. Therefore, the ability to detect deep brain generators that are vital to the production of emotions is still a matter of debate. Nevertheless, numerous EEG studies on emotion support the idea that the contribution of deep sources such as the hippocampus, the amygdala, or the basal ganglia can be reasonably determined, despite relatively low signal strength, using a variety of source analysis methods [30].
Because the EEG signal during emotion generation reflects the triggering of deeper brain sources, topographic brain mapping is a feasible method for studying emotion with more detail about the activity of brain areas. In our study, the features obtained for topographic mapping are a nonlinear combination of the features used in conventional brain mapping; therefore, the only property shared by the obtained map and the common maps is the degree of participation of each brain area in emotional activity. To compare the obtained map with the common maps, we investigated the degree of participation of brain areas in different emotions. Several studies show that stimuli of differing valence affect interhemispheric asymmetry within the prefrontal cortex [31], which led to the development of the "hemispheric valence hypothesis" [32]: high valence emotions are largely processed in the left frontal cortex and low valence emotions largely within the right prefrontal cortex [33].

Figure 11: (a) Images obtained from ten features extracted from the EEG signal (power for four sub-bands and mean) and images obtained from the SAETM during viewing of ten music videos. (b) Images obtained from ten features extracted from the EEG signal (standard deviation, zero-crossing rate, fractal dimension, entropy, and correlation dimension) and images obtained from the SAETM during the same viewing period.

As can be seen in Figures 11(a) and 11(b), the SAETM map is clearly interhemispherically asymmetric and shows that arousal is associated with activity in the right posterior cortex and valence with activity in the left frontal lobe, which is supported by Rogenmoser et al. [34]. The relative differences in interhemispheric asymmetry between the high and low valence conditions were investigated, and Kolmogorov-Smirnov (KS) tests show significant differences (p < 0.01).
We also investigated the dynamics of interhemispheric asymmetry by applying the Shannon entropy to the extracted maps (10 minutes) for different valence levels across trials; a significant difference was found (p < 0.01). The results show that interhemispheric asymmetry reflects activity in subcortical brain regions; in particular, changes in prefrontal asymmetry are known to be related to the amygdala and cerebellum. The SAETM map shows that frontal asymmetry is well reflected in the high valence-high arousal condition, as supported by Hamann [30]. As depicted in the last row of Figure 11, in the low arousal-low valence condition the asymmetry is frontal-occipital and most likely relates to visual processing rather than emotional activity. This is also observed in high valence-low arousal, although some frontal asymmetry is present as well. Therefore, we conclude that low arousal stimuli do not cause a great deal of frontal asymmetry. In addition, for high-arousal stimuli the SAETM map is asymmetric between the left and right hemispheres.
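The two statistics used above can be sketched on a simulated map: a two-sample KS test comparing the left- and right-hemisphere halves, and the Shannon entropy of a map's value distribution. The simulated map and the 1.5-unit left-hemisphere shift are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

# Interhemispheric asymmetry check on a simulated 64x64 topographic map:
# the left half is shifted upward to mimic high-valence frontal-left activity.
rng = np.random.default_rng(3)
high_valence_map = rng.normal(0.0, 1.0, (64, 64))
high_valence_map[:, :32] += 1.5     # boost the left hemisphere

left = high_valence_map[:, :32].ravel()
right = high_valence_map[:, 32:].ravel()
stat, p = ks_2samp(left, right)
print(p < 0.01)  # True: the shifted left half differs significantly

def shannon_entropy(img, bins=32):
    # Entropy of the map's value histogram, used to track map dynamics.
    hist, _ = np.histogram(img, bins=bins)
    pk = hist / hist.sum()
    pk = pk[pk > 0]
    return float(-(pk * np.log2(pk)).sum())

print(shannon_entropy(high_valence_map) > 0)  # True
```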
According to the results, the following items were evaluated to test the hypotheses of this study.

(ii) Because of the nonlinear nature of the EEG signal, the use of linear and nonlinear features together was expected to provide better signal representations to the classifier. The network's accuracy was evaluated in three modes: linear features only, nonlinear features only, and the combination of linear and nonlinear features; the combination increased the accuracy of the network.

(iii) The optimal number of neurons in each hidden layer of each SAE network was calculated based on the SLF. For example, in the F3 channel the ten extracted features are compressed into seven and finally into four features.

(iv) The accuracy of the SAETM in classifying the four emotion classes is a parameter for evaluating the choice of feature extraction method. SAE networks can select features correctly owing to their deep structure, with accuracies of 83.3% in the valence dimension and 82.8% in the arousal dimension.

(v) The topographic maps extracted by the SAETM were compared quantitatively and qualitatively with common maps. The accuracy of map classifiers, as a criterion for quantifying image differentiation, indicated that the CNN has the highest accuracy on maps from the SAETM (0.8305 ± 0.02). Qualitative evaluation by the experts showed that the SAETM maps are significantly different from the common maps.

(vi) Features extracted by the SAETM produced maps in less time than a single feature. The CNN classified maps with more than 79% accuracy five minutes into the signal, showing that faster image production increases the speed of user recognition.
Finally, the limitations of the current work and directions for further work include the following.

(i) The SAETM emotion classifier presented in this study was designed within the classifier paradigm. In future studies, we propose forming the SAE network structure automatically, as well as basing the network structure on criteria and quantitative methods for generating topographic maps with the highest distinction.

(ii) The performance of the SAETM degrades when data are limited, most likely because deep models require large sample sizes. On the other hand, since stacked autoencoders can extract deep features from data, we suggest using the raw EEG signal instead of the features used in this study as the SAE input, in order to retain the spatial characteristics of the EEG signal as much as possible.

(iii) Since topographic maps provide rich information for diagnosing mental disorders, other directions worth exploring in future work include applying the method to more datasets, especially for mental disorders, and functional network analysis based on the decoded hidden features. Moreover, the authors suggest simultaneous fMRI and EEG recording to investigate the relationship between the obtained maps and the deeper sources of the brain.

Conclusions
In this study, we proposed and implemented a stacked autoencoder network that creates novel emotional topographic EEG brain maps. This deep learning approach aims to extract EEG maps with higher differentiation than common maps. The method combines EEG features commonly used in emotion studies to extract richer features within a supervised emotion classification framework, with classifier accuracy serving as the criterion for the optimal feature combination; the obtained map is therefore optimal in terms of differentiating between emotional states. The performance of the algorithm was confirmed by quantitative and qualitative evaluation of classifier accuracy and of the emotional EEG maps extracted from the DEAP database. The results show that the proposed method can create topographic brain maps with more differentiation than conventional EEG maps, and it allows a better understanding of the involvement of different brain areas in emotional activities using state-of-the-art deep learning models.

Consent
The data were recorded with the written consent of the participants.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.