Artificial Neural Network Classification of Motor-Related EEG: An Increase in Classification Accuracy by Reducing Signal Complexity



Introduction
The development of brain-computer interfaces (BCIs) is a challenging and important task in neuroscience and neurotechnology. BCIs are in high demand in many fields of science and technology, including medicine, high technology, and industry [1][2][3][4]. The most striking examples of possible BCI applications are rehabilitation of patients with cognitive and motor disabilities, assessment of consciousness, communication, mind-controlled exoskeletons, manipulators, robots and other complex technical devices [4][5][6], and human education using BCI with biological feedback.
Usually, a BCI is based on the analysis of noninvasive electroencephalography (EEG) signals recorded by electrodes placed on the scalp. EEG is a widespread, inexpensive method for brain research which gives a deep insight into brain functionality related to various human activities. However, the treatment of multichannel EEG signals is a very sophisticated task because they are nonstationary, high-dimensional, and extremely noisy [7,8]. All these factors make it difficult to recognize and classify specific motor-related or percept-related patterns in a single-trial mode [9,10] and require extensive statistical measures.
From a practical point of view, the development of real-time compact BCIs and consumer headsets requires reducing the number of EEG channels to an optimal set that still contains the necessary information about the underlying brain processes [11,12]. Such a reduction is aimed at minimizing the size of the ANN structure and decreasing the computation cost and the memory needed for the recorded data. Furthermore, some researchers emphasize that irrelevant EEG channels may add extra noise and redundant information that can reduce signal processing accuracy [12].
Among existing approaches for EEG data analysis (e.g., time-frequency analysis [13] and methods of nonlinear dynamics [14]), the most promising and effective tools for classification of single EEG trials are based on artificial neural networks (ANNs) [12,15,16]. The successful application of ANNs requires careful selection of their parameters, which can vary significantly depending on the particular task and the subject [17]. Therefore, the optimization of EEG input data (dimensionality reduction, filtering, etc.) and channel selection is one of the key problems for the development of efficient ANN-based BCIs. Traditional methods of dimensionality reduction include principal component analysis (PCA) and linear discriminant analysis (LDA), where the original features are mathematically projected onto a lower dimensional space. However, such methods are nongeneric and require the input data to be optimized for every subject, due to strong intersubject variability [8] and the lack of association between the optimization and the physiological processes in the brain. These problems are particularly relevant for untrained subjects [8] and create difficulties for the development of a universal BCI.
Indeed, many BCI studies involve specially trained subjects, since the classification of brain activity patterns during motor imagery of untrained subjects is significantly more difficult and hence poorly studied [18,19]. Although training can make EEG features more pronounced and thus makes the recognition process easier for ANN-based algorithms [20], it cannot be effectively used for patients with motor and mental disabilities [16,21]. Therefore, the creation of a universal BCI able to work with untrained subjects would be useful for motor rehabilitation of such patients. Recent research reveals the possibility of classifying motor imagery EEG patterns of untrained volunteers, but only for healthy subjects, who can control their limbs. However, this is a very serious problem for paralyzed patients with motor system pathologies due to their inability to imagine the movement [22]. Besides, the comparison of the motor imagery response in brain activity between trained and untrained subjects reveals significant differences. In particular, BCI-naïve subjects exhibit activation in the dorsolateral prefrontal cortex and in the right and left insula that is not detected in BCI-trained subjects [23].
Currently, one of the most important tasks in neuroscience and neurotechnology is the development of effective and universal methods for optimizing input data, in particular, by reducing signal complexity, for further processing with ANNs.
A promising approach to solving the above problems is the optimization of the input dataset based on knowledge of the processes occurring in the brain during a particular action, such as motor imagery. The simplest and most intuitive method for feature space reduction is to decrease the number of EEG channels based on the time-frequency analysis. In general, the analysis of the time-frequency structure of multichannel EEG allows the detection of brain areas where a significant increase or decrease in the energy of particular brain rhythms reflects motor activity or motor imagery (event-related synchronization/desynchronization) [19].
Thus, in this paper, we focus on the development of an efficient classification algorithm. It should be noted that in the case of supervised learning algorithms, the classification performance strongly depends on the dataset used for training. The training dataset must be balanced and representative to provide a good generalization ability of the ANN. Here, we propose an approach for optimization of the input dataset based on low-pass filtration of the input EEG data with different cutoff values and on the selection of particular EEG channels, with the aim of detecting the most effective spatial EEG configuration to obtain maximum classification accuracy. At the same time, it is known that different types of human activity cause responses in different cortical areas. Therefore, the second aim is to study the influence of the number of analyzed EEG channels (or electrodes) on the quality of leg motor imagery recognition and to optimize the electrode selection.
It should be noted that the development of methods for recognizing EEG patterns associated with imaginary leg movements is of crucial importance for the creation of BCIs which would help in the therapy of patients with various motor disorders after trauma or stroke by using prostheses, exoskeletons, or anthropomorphic robots.
The paper structure is as follows. In Materials and Methods, we describe the design of our experiment, provide information about participants and equipment, and give insight into methods of preprocessing and channel selection and classifiers used for numerical analysis. In Results, we first propose the optimal structure of the ANN and the optimal strategy of training set selection in order to obtain maximal classification accuracy. Then, we run the classifier for different combinations of EEG channels. Next, we apply low-pass filtration to the input dataset with different cutoff values. Finally, we discuss and generalize the obtained results in Conclusions.
At the second experimental stage, the subjects performed tasks according to text commands that appeared on the screen. We used a "BenQ" monitor with a 1920 × 1080 resolution and a 60 Hz screen refresh rate. At the same time, we placed on the screen an image of a person in the reclining position raising his leg as follows. The leg was in a free state, slightly bent at the knee, the foot freely extended the line of the leg, and no special movement was performed to pull the toes up or forward. The leg was raised at the hip joint up to an angle of 40-45 degrees.

Materials and Methods
Each subject participated in one experiment lasting about 30 minutes, during which he/she had to perform two types of tasks:
(i) Real movement of the left/right leg (raising the leg at the hip)
(ii) Imaginary movement of the left/right leg
The real movements in the first task were performed to make it clearer to the subjects how exactly they should imagine the movement when performing the second task. Each task was preceded by a whistle signal and followed by a pause of random duration (5-10 seconds). Thus, the second stage included two types of real and two types of imaginary movements, in particular, real movements of the left and right legs and imaginary movements of the same limbs. For the motor imagery tasks, pause durations were increased (from 8 to 18 seconds). In addition, for motor imagery tasks, a photo of the "exemplary" movement performance was not demonstrated. After the tasks were completed, the EEG of the passive wakefulness state was recorded for 5 minutes.
The multichannel EEG was recorded at a 250 Hz sampling rate from P = 31 electrodes with two reference electrodes placed at the standard ear positions of the extended 10-10 international system (see Figure 1(b)) [24]. To register the EEG data, we used a cap with Ag/AgCl electrodes placed on the "TIEN-20" paste. Immediately before placing the electrodes, the scalp was rubbed with the abrasive gel "NuPrep" to increase skin conductivity. Training and testing of the ANN were performed for every subject using two datasets containing 6000 points each (24 seconds of recorded EEG) for imaginary movements of the left and right legs. Each dataset consisted of the combination of eight 3 s EEG trials corresponding to a particular movement for every subject. Half of the datasets, chosen at random, were used to train the ANN, and the remaining half to test it.
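The dataset arithmetic above (eight 3 s trials at 250 Hz giving 6000 points per movement type) and the random half split can be sketched as follows. This is only an illustration on synthetic data: the function names are hypothetical, and splitting at the level of trials is an assumption, since the text does not state the exact unit of the split.

```python
import numpy as np

FS = 250            # sampling rate, Hz (from the text)
TRIAL_SEC = 3       # trial length, s (from the text)
N_TRIALS = 8        # trials per movement type (from the text)
N_CHANNELS = 31     # EEG channels (from the text)


def build_dataset(trials):
    """Concatenate trials of shape (n_trials, n_samples, n_channels) along time."""
    trials = np.asarray(trials)
    return trials.reshape(-1, trials.shape[-1])


def random_half_split(n_items, rng):
    """Return two disjoint index arrays, each covering half of the items."""
    order = rng.permutation(n_items)
    half = n_items // 2
    return np.sort(order[:half]), np.sort(order[half:])


rng = np.random.default_rng(0)
# Synthetic stand-in for recorded EEG: 8 trials x 750 samples x 31 channels.
left = rng.standard_normal((N_TRIALS, FS * TRIAL_SEC, N_CHANNELS))
left_dataset = build_dataset(left)                      # 8 * 750 = 6000 rows
train_idx, test_idx = random_half_split(N_TRIALS, rng)  # assumed trial-level split
```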
The classification was carried out with the help of an ANN trained with the backpropagation algorithm. For each subject, the ANN training process was carried out anew.

Complexity
The initial ANN parameters were chosen taking into account the following considerations. The number of ANN inputs was equal to the number of EEG channels. The number of neurons in the output layer was one, because the output can only be 0 or 1. Initially, the minimum number of neurons in the hidden layer was chosen to be 5. Further training of such a network was conducted by monitoring the control error and verifying the classification result to reduce the error. If the control error decreased as compared with the previous step, the number of neurons in the hidden layer was increased, and the above procedure was repeated. This was done until both the training and the control errors saturated at low enough values, which barely decreased when more neurons were added to the hidden layer. Table 1 contains detailed information about each channel combination according to the channels' position on the human head (see also Figure 1(b)).
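The growth procedure for the hidden layer can be sketched as a simple loop. This is a minimal sketch, not the authors' MATLAB implementation: `evaluate` is a hypothetical callback that trains a fresh network with the given hidden-layer size and returns its training and control errors, and the step size, upper bound, and tolerance are assumed values. A toy error curve stands in for real training.

```python
def grow_hidden_layer(evaluate, start=5, step=1, max_hidden=300, tol=1e-3):
    """Increase hidden-layer size while the control (validation) error keeps dropping.

    evaluate(n_hidden) -> (train_error, control_error) is a user-supplied,
    hypothetical callback that trains a fresh network and returns its errors.
    """
    best_n = start
    _, best_err = evaluate(start)
    n = start + step
    while n <= max_hidden:
        _, err = evaluate(n)
        if best_err - err < tol:      # improvement has saturated: stop growing
            break
        best_n, best_err = n, err
        n += step
    return best_n, best_err


# Toy error curve standing in for real training: decays, then flattens near 0.1.
def toy_evaluate(n_hidden):
    err = 0.1 + 0.5 * 2.0 ** (-(n_hidden - 5))
    return err, err


n_hidden, control_error = grow_hidden_layer(toy_evaluate)
```

The loop starts at 5 hidden neurons, as in the text, and stops at the first size whose control error improves by less than `tol`.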

ANN-Based Classifiers.
ANNs are widely used for processing neurobiological signals extracted by various methods including MEG and EEG. The most common technology to detect various kinds of brain activity, both normal and pathological, is based on EEG recordings [25,26], although recently Wu et al. [27] introduced a new approach for MEG data classification using a support vector machine (SVM) with a radial basis kernel function, which was shown to be an effective method for right and left temporal lobe epilepsy recognition. In the present paper, we analyze different types of ANNs in order to reveal the most convenient configurations. Here, we implement machine learning algorithms for the analysis of multichannel EEG signals on the basis of the MATLAB package containing ANN methods.
The conducted analysis revealed that the fastest and most accurate recognition of motor imagery EEG patterns can be achieved with three ANN configurations: a radial basis function (RBF) network, a multilayer perceptron (MLP), and a support vector machine with a radial basis function kernel (SVM-RBF). For comparison, we also used a linear network (LN), which demonstrates how an ANN without hidden layers operates on complex nonlinear data. The LN is the simplest model, consisting of one input layer and one output layer with a linear activation function. Although this model is capable of solving simple classification tasks, the recognition of nonlinear data requires additional hidden layers with nonlinear activation functions, as provided by a multilayer perceptron model.

Time-Frequency Wavelet-Based Analysis.
The time-frequency analysis is based on the continuous wavelet transform of the EEG signal with the Morlet mother wavelet [28,29]. In this transform, the parameters $a$ and $\tau$ characterize the scale and translation of the wavelet function $\psi$, $x(t)$ is the analyzed EEG, $\omega_0 = 2\pi$ is the central frequency of the Morlet wavelet, and $i = \sqrt{-1}$.
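For reference, a standard form of the continuous wavelet transform with a Morlet mother wavelet, written with the symbols defined above, is given below. The $1/\sqrt{a}$ normalization and the $\pi^{-1/4}$ prefactor are assumptions taken from the common textbook definition; the exact normalization used in [28,29] may differ.

```latex
W(a,\tau) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\,
            \psi^{*}\!\left(\frac{t-\tau}{a}\right) dt,
\qquad
\psi(\eta) = \pi^{-1/4}\, e^{i\omega_0 \eta}\, e^{-\eta^{2}/2}
```

Here $\psi^{*}$ denotes complex conjugation. With $\omega_0 = 2\pi$, the scale $a$ corresponds to the frequency $f = 1/a$.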
The wavelet energy spectrum $E(t, f) = |W(t, f)|^2$ is calculated in the frequency band $f \in [1, 30]$ Hz ($f = 1/a$). For each EEG channel, the wavelet energy spectra $E^{R}(f)$ and $E^{L}(f)$ associated, respectively, with right leg and left leg motor imagery are calculated by averaging $E(t, f)$ over the indicated frequency band and over each experimental session, (RE), (IM), or (BCG). In the frequency ranges of the δ-band (1-5 Hz), μ/α-band (8-13 Hz), and β-band (15-30 Hz), the energy values $E^{R,L}_{\delta}$, $E^{R,L}_{\mu}$, and $E^{R,L}_{\beta}$ are calculated for each EEG channel by averaging the spectrum $E^{R,L}(f)$ over the corresponding frequency band. Finally, for each band, the differences between the energy values $E^{R}_{\delta} - E^{L}_{\delta}$, $E^{R}_{\mu} - E^{L}_{\mu}$, and $E^{R}_{\beta} - E^{L}_{\beta}$ associated with right leg and left leg motor imagery are calculated.
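The band-energy computation described above can be illustrated with a short NumPy sketch. It is not the authors' code: the function names are hypothetical, the wavelet normalization is the assumed textbook one, the wavelet is truncated at four envelope widths, and a synthetic 10 Hz sine stands in for an EEG channel.

```python
import numpy as np

OMEGA0 = 2 * np.pi   # Morlet central frequency, as stated in the text


def morlet_energy(x, fs, freqs):
    """Return E(t, f) = |W(t, f)|^2 for a 1-D signal x sampled at fs Hz."""
    x = np.asarray(x, dtype=float)
    n = x.size
    energy = np.empty((len(freqs), n))
    for k, f in enumerate(freqs):
        a = 1.0 / f                                  # scale, using f = 1/a
        half = int(np.ceil(4 * a * fs))              # truncate at +/- 4 envelope widths
        eta = np.arange(-half, half + 1) / fs / a    # dimensionless time (t - tau) / a
        psi = np.pi ** -0.25 * np.exp(1j * OMEGA0 * eta - eta ** 2 / 2)
        # Correlation with the wavelet, written as a convolution with the
        # time-reversed conjugate; 1/sqrt(a) normalization is an assumption.
        full = np.convolve(x, np.conj(psi)[::-1], mode="full")
        w = full[half:half + n] / (np.sqrt(a) * fs)
        energy[k] = np.abs(w) ** 2
    return energy


def band_energy(energy, freqs, lo, hi):
    """Average E(t, f) over time and over the band lo <= f <= hi."""
    mask = (freqs >= lo) & (freqs <= hi)
    return energy[mask].mean()


fs = 250
t = np.arange(0, 8, 1 / fs)
x = np.sin(2 * np.pi * 10 * t)            # synthetic 10 Hz test signal, not EEG
freqs = np.arange(1.0, 31.0)              # 1-30 Hz in 1 Hz steps
E = morlet_energy(x, fs, freqs)
bands = {"delta": (1, 5), "mu/alpha": (8, 13), "beta": (15, 30)}
E_band = {name: band_energy(E, freqs, lo, hi) for name, (lo, hi) in bands.items()}
```

Applying `band_energy` to right-leg and left-leg trials separately and subtracting the results would give the per-band differences used in the text.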

Results
In our study, EEG signals were obtained from 12 subjects via the set of 31 recording electrodes. At the first stage, the ANN input was presented in vector form of dimension N = 31, $(x_1, \ldots, x_N)$ (see Figure 1(c)). The EEG trials were classified into two groups (left leg imagery and right leg imagery) with the help of ANNs with different configurations: SVM, MLP, RBF, and LN (see Materials and Methods for a detailed description of the ANN structures).
In Figure 1(a), the classification accuracy of each network is shown for all 31 EEG channels. The data were averaged over all subjects and are shown as mean ± SD. One can see that the network of linear neurons did not perform well, with accuracy below 65% for most subjects. At the same time, SVM, RBF, and MLP demonstrated averaged classification accuracies of 76.5%, 77.9%, and 72.4%, respectively. Comparing these ANN architectures, one finds RBF to be the best-performing architecture, whose classification accuracy significantly exceeded the accuracy rates of both SVM and MLP (n = 12, *P < 0.05 via paired-sample t-test).
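A paired-sample comparison of two classifiers across the same 12 subjects can be sketched as follows. The accuracy values below are synthetic placeholders, not the study's data, and the function name is hypothetical; the critical value 2.201 is the two-sided 5% threshold of Student's t distribution with 11 degrees of freedom.

```python
import numpy as np


def paired_t_statistic(a, b):
    """Return the paired-sample t statistic and its degrees of freedom."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = d.size
    return d.mean() / (d.std(ddof=1) / np.sqrt(n)), n - 1


# Synthetic per-subject accuracies for two classifiers (placeholders only).
acc_b = np.array([0.70, 0.74, 0.68, 0.80, 0.77, 0.72, 0.69, 0.81, 0.75, 0.73, 0.78, 0.71])
gain = np.array([2, 1, 3, 2, 1, 2, 3, 1, 2, 2, 1, 3]) / 100.0
acc_a = acc_b + gain

t_stat, dof = paired_t_statistic(acc_a, acc_b)
T_CRIT_05 = 2.201                      # two-sided, alpha = 0.05, 11 degrees of freedom
significant = abs(t_stat) > T_CRIT_05
```

The paired form is appropriate here because both classifiers are evaluated on the same subjects, so the per-subject differences carry the comparison.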
The demonstrated accuracy score of 77.9% was achieved for a nonoptimized input, that is, for the whole set of EEG channels containing oscillations in a wide frequency range. However, previous studies show that if one takes into account all possible features of a multichannel EEG for the classification task, the resulting feature space is extremely high-dimensional, which significantly increases the input complexity and decreases the accuracy rate. Following this observation, we propose to reduce the input feature space based on spatial and frequency representations of the motor-related EEG.
In order to reduce the number of EEG channels, we analyze the RBF-based accuracy rate obtained for different predefined sets of channels (see Materials and Methods for a detailed description of the predefined channel combinations). Having compared the results of such classification, we optimize the channel combination to obtain satisfactory classification accuracy using a small number of electrodes.
In Figure 2(a), the values of classification accuracy are shown for the 9 most representative configurations (see Table 1 for the description of all considered configurations). Figure 2(b) shows the number of channels belonging to each configuration. In Figure 2(c), the marked brain areas show the regions where the recording electrodes are located. One can see that the most accurate result is obtained using combination $S_1$, which corresponds to the full placement (31 electrodes) (see Figure 2(a)). At the same time, despite the best recognition performance, we cannot consider this combination optimal due to its large number of channels (see Figure 2(b)).
The recognition in the right and left hemispheres ($S_2$ and $S_3$, resp.) does not show significant results. The reason for the poor performance of RBF in these areas may be that motor imagery causes a response in remote brain areas; thus, the best recognition score can be obtained using a combination of electrodes whose location can capture this interaction. With this aim, we consider $S_5$, corresponding to the combination of frontal and temporal lobes (F + Fp + T). One can see that among the other channel combinations (except for $S_1$), $S_5$ provides the best recognition quality.
One can see that the frontal lobe covers the largest brain area, and its combination with the temporal lobes still contains too many electrodes. Considering these areas separately, we can note $S_9$ as the most appropriate choice due to a smaller number of channels (8 electrodes versus 12 in $S_5$) and about the same level of classification score. It should be noted that the frontal lobe is strongly associated with motor activity (e.g., walking), decision making, and many other important cognitive and emotional aspects [30,31]. This result is in agreement with previous research [19,32], where the time-frequency analysis revealed highly pronounced arm motor imagery events in event-related desynchronization of the delta band in the frontal cortex.
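Selecting a regional channel combination such as F + Fp + T amounts to keeping the columns of the EEG array whose electrode labels belong to the chosen regions. The sketch below is illustrative only: the label list is a hypothetical subset of 10-10 names rather than the montage of Table 1, and the function name is an assumption.

```python
import numpy as np

# Hypothetical subset of 10-10 labels; the actual montage is given in Table 1.
CHANNELS = ["Fp1", "Fp2", "F3", "Fz", "F4", "T3", "T4",
            "C3", "Cz", "C4", "P3", "Pz", "P4", "O1", "O2"]


def select_channels(eeg, names, regions):
    """Keep columns of eeg (n_samples, n_channels) whose region code is in regions.

    The region code is the label without its trailing digits or 'z',
    e.g. 'Fp1' -> 'Fp', 'Fz' -> 'F', 'T3' -> 'T'.
    """
    keep = [i for i, name in enumerate(names)
            if name.rstrip("0123456789z") in regions]
    return eeg[:, keep], [names[i] for i in keep]


eeg = np.zeros((750, len(CHANNELS)))          # placeholder for one 3 s trial
frontotemporal, kept = select_channels(eeg, CHANNELS, {"F", "Fp", "T"})
```

Matching on the region code rather than on a plain prefix keeps "F" from also matching labels such as "Fp1" when only frontal electrodes are wanted.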
Finally, similarly to [19], we carried out the time-frequency analysis of motor-related EEGs in order to find brain areas where the neural dynamics exhibit the most pronounced differences between right and left leg imagery. The time-frequency analysis was performed with the help of wavelet decomposition in three frequency bands: delta (1-5 Hz), alpha (8-12 Hz), and beta (15-30 Hz) (see Materials and Methods for a detailed description). In every band, the wavelet energy was calculated for each movement type by averaging over all corresponding EEG trials. In Figure 3(a), the differences between the energy values corresponding to right and left leg imagery are shown in color. One can see that in the beta frequency band (15-30 Hz), the difference is homogeneously distributed over the cortex, and it is difficult to find the region where such differences are most pronounced. In the 8-12 Hz range, the maximal difference is achieved for the frontotemporal area (in our notation, channel combination $S_5$), while the minimal difference is obtained in central and parietal areas (combinations $S_7$ and $S_{13}$) that coincide with the premotor cortex location [33]. Finally, in the low-frequency band (1-5 Hz), one can observe the most pronounced differences in the frontal lobe ($S_9$), central lobe ($S_{11}$), and occipital lobe. In the cases of alpha and delta rhythms, the most pronounced difference is found in the right hemisphere.
Such features of the time-frequency EEG structure affect the ANN performance. In Figure 3(b), the histograms show the classification accuracy (mean ± SD) achieved via the RBF network for different types of input EEG: nonfiltered EEG and EEG filtered with cutoffs at $f_{c1}$ = 4 Hz and $f_{c2}$ = 15 Hz. One can see that in the case of 31 EEG channels ($S_1$), the exclusion of spectral components above 15 Hz leads to an increase of classification accuracy (from 76% to 82%). By contrast, for smaller numbers of channels, the increase of classification accuracy for the 15 Hz filtration becomes smaller (from 73% to 77% for frontal EEG ($S_9$) and from 70% to 73% for parietal and central EEG ($S_7$)). For the case of parietal EEG, where the wavelet energy averaged over 8-12 Hz does not reflect changes between left and right leg movements, the $f_{c2}$ filtration does not lead to an increase of classification accuracy.
Considering the classification accuracy obtained for EEG filtered with the cutoff at $f_{c1}$ = 4 Hz (i.e., spectral components above 4 Hz are excluded), one can see a further increase of classification accuracy for all channel combinations.
The obtained results demonstrate a correlation between the performance of ANN-based classification and features of EEG signals in both the spatial and frequency domains. Extracting such features by analyzing EEG in a group of participants and using them to preprocess the input data allows a significant increase (from 72% to 90% for frontal EEG) (n = 12, *P < 0.01 via paired-sample t-test) in the classification accuracy of single EEG trials in all subjects in the group.
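Removing spectral components above a cutoff can be sketched with an FFT-based low-pass filter. The filter design used in the study is not specified, so the brick-wall filter below is an assumption chosen for brevity, and the function name is hypothetical; a synthetic three-tone signal replaces real EEG.

```python
import numpy as np


def lowpass_fft(eeg, fs, cutoff_hz):
    """Zero all spectral components above cutoff_hz; eeg is (n_samples, n_channels)."""
    eeg = np.asarray(eeg, dtype=float)
    spectrum = np.fft.rfft(eeg, axis=0)
    freqs = np.fft.rfftfreq(eeg.shape[0], d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=eeg.shape[0], axis=0)


fs = 250
t = np.arange(0, 3, 1 / fs)                       # one 3 s trial, 750 samples
s2, s10, s20 = (np.sin(2 * np.pi * f * t) for f in (2, 10, 20))
trial = (s2 + s10 + s20)[:, None]                 # single synthetic channel

filtered_15 = lowpass_fft(trial, fs, 15.0)        # cutoff f_c2 = 15 Hz
filtered_4 = lowpass_fft(trial, fs, 4.0)          # cutoff f_c1 = 4 Hz
```

With the 15 Hz cutoff the 2 Hz and 10 Hz tones survive; with the 4 Hz cutoff only the 2 Hz tone remains. A brick-wall filter can ring on real, nonperiodic data, so a smoother design may be preferable in practice.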

Discussion
The classification of EEG trials associated with motor imagery using artificial neural networks is widely explored by researchers in different fields of science [34]. In terms of artificial intelligence, it is essential to reveal how artificial neural networks establish complex dependencies in nonlinear and nonstationary signals in order to reach significant progress in the development of ANN-based systems. Along with the classification of motor-related EEG, it is also especially important to classify other types of brain activity, such as epilepsy patterns [35], sleep stages [36], and mental disorders [37].
In the classification of motor-related EEG, ANNs demonstrate high (more than 90%) classification accuracy. At the same time, an effective use of such classifiers requires fine adjustment of the network parameters to account for individual features. In order to minimize the individual variability, in the present work, we have optimized the ANN based on the EEG data of 12 untrained volunteers.
At the first step, we have compared different ANN structures and achieved an accuracy rate of 78 ± 10% for the radial basis function (RBF) network and 76 ± 12% for the support vector machine (SVM) in the case of the 31-channel input. SVM is considered the most promising tool for classification of single EEG trials [38]. In particular, Ma et al. [39] described SVM as a technique which makes it possible to solve the problems associated with small sample sizes and high dimensions and which could achieve classification accuracy above 83.5%. In this context, the best performance has been obtained for SVM with a nonlinear kernel based on the radial basis function (RBF-SVM). It was shown that while the use of linear SVM with spatial and temporal principal component analysis (PCA) demonstrated 73% accuracy [40], RBF-SVM reached up to 93% accuracy in combination with independent component analysis (ICA) [41] and 81% in combination with a genetic algorithm (GA) [42,43]. The radial basis function (RBF) neural network architecture was applied by Barios et al. [44] to classify patients with chronic renal failure and demonstrated 86.6% accuracy without optimization. Later, Pei et al. [45] demonstrated 87.14% accuracy with an improved RBF network in the classification of left and right hand motor imagery tasks. More recently, Hamedi et al. [46] compared the performance of RBF with SVM and multilayer perceptron (MLP) in the classification of motor-related EEG. As a result, the RBF network demonstrated classification accuracy much higher than MLP, while the SVM-RBF accuracy was 3% greater.
The analysis of scientific literature allows us to conclude that the initial accuracy rate of the classification of motor-related EEG reaches 80% when optimization is not applied, and effective optimization algorithms can increase accuracy up to 95%. Such a decrease of the accuracy in the case of nonoptimized EEG is caused by an extremely high-dimensional feature space of the input data and is reported not only for motor-related tasks. In particular, Hagmann et al. [47] revealed that 200 hours of single-channel EEG recording contains 12% noise leading to erroneous classification. Furthermore, 80% of the EEG features turned out to be irrelevant in the case of Alzheimer's disease diagnosis [48].
It should be noted that optimization algorithms are mostly based on the mathematical projection of original features onto a lower dimensional space. The basic methods for feature selection are based on filter, wrapper, and hybrid approaches [49]. Genetic algorithms based on ANNs are effectively used for feature optimization of biological signals, such as electroencephalography (EEG) and electrocorticography (ECoG). For instance, a genetic algorithm was used by Li et al. [43] to optimize the combination of input channels for an MLP-based neural network and to evaluate the relevance of each EEG channel to the current task. It was revealed that channel selection provides a better understanding of the results obtained by the classifier. Another method of EEG data selection for classification was proposed by Tomida et al. [11]. The method was based on the estimation of true covariance matrices of each motor imagery task. In another study, Sreeja et al. [50] revealed that the selection of 30 electrodes placed on the premotor cortex, supplementary motor cortex, and primary motor cortex in combination with preprocessing provides up to 95% accuracy of motor imagery EEG classification.
Although optimization techniques can significantly increase classification accuracy, they strongly depend on initial data features, which along with motor-related features include other patterns related to individual subject characteristics and therefore require preliminary calibration of the classifier for each subject. For this reason, in our study, we propose an optimization method based on motor-related EEG features obtained from the spatiotemporal and time-frequency EEG analysis in a group of subjects. Such an approach allowed us to reach up to 90 ± 5% classification performance with only 8 electrodes by using an optimal set of EEG spectral components. Recently, Yang et al. [12] reported 80% accuracy using 10 channels and 86% for 6 channels, obtained with an MLP-based neural network using a genetic algorithm. In addition, Tam et al. [51] achieved the highest average accuracy rate of 90% for 8 channels using a spatial filtering method. It is worth mentioning that Arvaneh et al. [52] proposed a novel sparse common spatial pattern (SCSP) algorithm for optimization and obtained SVM-based classification accuracy of 81.63% using 13 channels.
Thus, the level of classification accuracy obtained with our approach is higher than that achieved with the help of other optimization algorithms. At the same time, since our method is based on the time-frequency and spatiotemporal EEG features, it is valid for all subjects, and therefore, its accuracy is much less affected by individual variability.

Conclusions
We have applied artificial neural networks for recognition and classification of single EEG trials associated with right and left leg motor imagery in untrained volunteers. By focusing on optimization of classification accuracy, we have reduced the complexity of the input data. In the context of optimization, we have made the optimal selection of both a set of EEG channels and a frequency band with the help of a preliminary analysis of spatiotemporal and time-frequency EEG features, which allowed us to reach up to 90 ± 5% classification accuracy using 8 electrodes only. We have compared our results with the results recently obtained using other optimization algorithms (e.g., genetic algorithm, common spatial pattern optimization, and filtering method) and shown that our approach (i) yields higher accuracy than other methods and (ii) is valid for all subjects, and therefore, the accuracy is not affected by individual feature variability.
The developed method is universal because its accuracy is almost independent of the subject's personality. We believe that our approach can be used to increase the efficiency of brain-computer interfaces (BCIs) designed for untrained subjects or a group of subjects.
Usually the impedance values varied within 2-5 kΩ. The EEGs were recorded with the electroencephalograph "BE Plus LTM" (EB Neuro SPA), which possessed the registration certificate number FSZ 2011/10629 of 20.09.2011 from the Russian Federation Federal Service of Health Care and Social Development Control. This equipment complies with the following certificates: UNI EN ISO 9001/ISO 9001:2008, EN 46001 ISO 13485:2012, QSR 21 CFR Part 820 Federal Law.

Figure 1: (a) Classification accuracies for support vector machine (SVM), multilayer perceptron (MLP), radial basis function (RBF), and linear network (LN) averaged over all subjects; (b) position of electrodes according to the extended 10-10 international system on the human head; and (c) general model of the ANN, where each input neuron $x_i$ ($i = 1, 2, \ldots, n$) receives data from one of n = 31 electrodes; $N_{li}$ and $N_{ki}$ are neurons of hidden layers, and $x_{n+1}$ is the output neuron. The horizontal bars with an asterisk show that the RBF classification accuracy significantly exceeded the accuracy rates of both SVM and MLP according to the statistical analysis using the paired t-test.

Preprocessing and EEG Channel Selection.
The recorded data were low-pass filtered with cutoffs at $f_{c1}$ = 4 Hz and $f_{c2}$ = 15 Hz.
ANN configurations:
(i) Radial basis function (RBF) network with 251 neurons in the hidden layer, 31 input and 1 output linear neurons
(ii) Multilayer perceptron (MLP) with one hidden layer consisting of 15 neurons with hyperbolic tangent as the activation function, 31 input linear neurons, and one output neuron with logistic activation function
(iii) Support vector machine (SVM-RBF) with a nonlinear kernel based on the radial basis function with values 0.01 < γ < 0.1 and 2000 support vectors in total (1000 for each class)
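The RBF configuration listed above (31 inputs, one hidden layer of Gaussian units, one linear output) can be illustrated with a small NumPy sketch. This is not the MATLAB network used in the study: the class name, the choice of centers as random training samples, the kernel width `gamma`, and the least-squares fit of the output weights are all assumptions, and the toy example uses 20 hidden units and synthetic data instead of 251 units and EEG.

```python
import numpy as np


class RBFNetwork:
    """Gaussian hidden layer with a linear output fitted by least squares."""

    def __init__(self, n_hidden, gamma, seed=0):
        self.n_hidden = n_hidden
        self.gamma = gamma            # kernel width parameter (assumed value)
        self.seed = seed

    def _design(self, X):
        # Squared distances between every sample and every center.
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        hidden = np.exp(-self.gamma * d2)
        return np.column_stack([hidden, np.ones(len(X))])   # add bias column

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        idx = rng.choice(len(X), size=self.n_hidden, replace=False)
        self.centers = X[idx]         # centers drawn from the training samples
        self.weights, *_ = np.linalg.lstsq(self._design(X), y, rcond=None)
        return self

    def predict(self, X):
        # Threshold the linear output at 0.5 to obtain class 0 or 1.
        return (self._design(X) @ self.weights > 0.5).astype(int)


rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=400)
# Synthetic 31-channel samples with class means at -1 and +1 (not real EEG).
samples = rng.standard_normal((400, 31)) + (2 * labels[:, None] - 1)
model = RBFNetwork(n_hidden=20, gamma=0.01).fit(samples[:200], labels[:200])
accuracy = (model.predict(samples[200:]) == labels[200:]).mean()
```

Fitting only the output weights in closed form keeps the sketch short; a trained network would typically also adapt the centers and widths.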

Figure 2: (a) Radial basis function (RBF) classification performance for different brain areas, averaged over all subjects; (b) number of channels in each combination $S_i$ ($i = 1, \ldots, 9$). For detailed information about combinations, see Table 1; and (c) brain areas used for motor imagery classification.

Figure 3: (a) Difference between the values of wavelet energy associated with right leg and left leg motor imagery. The data are averaged over all subjects (n = 12) and shown as mean ± SD; (b) RBF network classification performance for different brain areas and different filtrations applied to input EEG (n/f: without filtration; $f_{c2}$: spectral components above 15 Hz are removed; $f_{c1}$: spectral components above 4 Hz are removed). The data are averaged over all subjects.