The purpose of this study is to classify EEG data of imagined speech on a single-trial basis. We recorded EEG data while five subjects imagined different vowels: /a/, /e/, /i/, /o/, and /u/. We divided each single-trial dataset into thirty segments and extracted features (mean, variance, standard deviation, and skewness) from each segment. To reduce the dimension of the feature vector, we applied a feature selection algorithm based on a sparse regression model. These features were classified using a support vector machine with a radial basis function kernel, an extreme learning machine, and two variants of the extreme learning machine with different kernels. Because each single trial consisted of thirty segments, our algorithm determined the label of each trial by selecting the most frequent output among the outputs of the thirty segments. As a result, we observed that the extreme learning machine and its variants achieved better classification rates than the support vector machine with a radial basis function kernel and linear discriminant analysis. Thus, our results suggest that EEG responses to imagined speech can be successfully classified on a single-trial basis using an extreme learning machine with a radial basis function or linear kernel. This classification of imagined speech may contribute to the development of silent speech BCI systems.
People communicate with each other by exchanging verbal and visual expressions. However, paralyzed patients with neurological diseases such as amyotrophic lateral sclerosis and cerebral ischemia have difficulty with daily communication because they cannot control their bodies voluntarily. In this context, the brain-computer interface (BCI) has been studied as a communication tool for these patients. BCI is a computer-aided control technology based on brain activity data such as EEG, which is well suited to BCI systems because of its noninvasive nature and convenience of recording [
The classification of EEG signals recorded during the motor imagery paradigm has been widely studied as a BCI controller [
For any BCI system, the use of optimized classification algorithms that categorize a set of data into different classes is essential. These algorithms are usually divided into five groups: linear classifiers, neural networks, nonlinear Bayesian classifiers, nearest neighbor classifiers, and combinations of classifiers [
The extreme learning machine (ELM) is a type of feedforward neural network for classification, proposed by Huang et al. [
In this study, we measured EEG activity during speech imagination and attempted to classify the signals using the ELM algorithm and its kernel variants. In addition, we compared the results to a support vector machine with a radial basis function kernel (SVM-R) and linear discriminant analysis (LDA). To the best of our knowledge, applications of ELM as a classifier for EEG data of imagined speech have rarely been studied. In the present study, we examine the validity of using ELM and its variants for the classification of imagined speech and the potential of our method for BCI systems based on silent speech.
Five healthy human participants (5 males; mean age:
Participants were seated in a comfortable armchair and wore earphones (ER-4P, Etymotic Research, Inc., IL 60007, USA) providing auditory stimuli. Five types of Korean syllables—/a/, /e/, /i/, /o/, and /u/—as well as a mute (zero volume) sound were utilized in the experiment. Figure
Schematic sequence of the experimental paradigm. Vowels /a/, /e/, /i/, /o/, /u/, and mute were randomly presented 1 s after the beginning of each trial. After the third beep sound, the subject imagined the same vowel heard at the beginning of the trial. The EEG data acquired during the speech imagination period were used for signal processing and classification in this study.
The experimental procedure was designed with e-Prime 2.0 software (Psychology Software Tools, Inc., Sharpsburg, PA, USA). A HydroCel Geodesic Sensor Net with 64 channels and Net Amps 300 amplifiers (Electrical Geodesics, Inc., Eugene, OR, USA) were used to record the EEG signals, using a 1000 Hz sampling rate (Net Station version 4.5.6).
First, we resampled the acquired EEG data to 250 Hz to speed up the preprocessing procedure. The EEG data were bandpass filtered from 1 to 100 Hz. Subsequently, an IIR notch filter (Butterworth; order: 4; bandwidth: 59–61 Hz) was applied to remove power line noise.
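A minimal sketch of this preprocessing chain with SciPy is shown below; the zero-phase filtering, function choices, and array layout are our assumptions, as the paper specifies only the filter types and bands:

```python
from scipy import signal

FS_RAW, FS_NEW = 1000, 250

def preprocess(eeg_raw):
    """eeg_raw: array [n_channels, n_samples] at 1000 Hz -> filtered data at 250 Hz."""
    # Downsample 1000 Hz -> 250 Hz with an anti-aliasing polyphase filter.
    eeg = signal.resample_poly(eeg_raw, up=1, down=FS_RAW // FS_NEW, axis=-1)
    # Bandpass 1-100 Hz (Butterworth, applied zero-phase here).
    b, a = signal.butter(4, [1, 100], btype="bandpass", fs=FS_NEW)
    eeg = signal.filtfilt(b, a, eeg, axis=-1)
    # Band-stop notch at 59-61 Hz (Butterworth, order 4, as specified in the text).
    b, a = signal.butter(4, [59, 61], btype="bandstop", fs=FS_NEW)
    return signal.filtfilt(b, a, eeg, axis=-1)
```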
In general, EEG classification suffers from poor generalization performance and overfitting because the number of samples is much smaller than the dimension of the features. Therefore, to obtain enough samples for training and testing the classifier, we divided each 3 s imagination trial into 30 time segments of 0.2 s length with 0.1 s overlap. In this way, we obtained a total of 9000 segments (6 conditions × 50 trials × 30 segments).
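As an illustration, here is a rough sketch of the segmentation and per-segment feature extraction; the window bookkeeping and array shapes are our assumptions (the paper reports thirty segments of 0.2 s with 0.1 s overlap per 3 s trial):

```python
import numpy as np
from scipy.stats import skew

def trial_features(trial, fs=250, win_s=0.2, hop_s=0.1):
    """trial: array [n_channels, n_samples] -> features [n_segments, 4 * n_channels]."""
    win, hop = int(win_s * fs), int(hop_s * fs)
    feats = []
    for start in range(0, trial.shape[1] - win + 1, hop):
        seg = trial[:, start:start + win]
        # Mean, variance, standard deviation, and skewness per channel, concatenated.
        feats.append(np.concatenate([seg.mean(1), seg.var(1),
                                     seg.std(1), skew(seg, axis=1)]))
    return np.vstack(feats)
```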
Overall signal processing procedure for classification. First, each trial was divided into thirty blocks of 0.2 s length with 0.1 s overlap. Mean, variance, standard deviation, and skewness were extracted from all blocks and channels. Next, sparse-regression-model-based feature selection was employed to reduce the dimension of the features. All features were used as input to the trained classifier. Because each trial includes thirty blocks, thirty classifier outputs were acquired; the label of each trial was therefore determined by selecting the most frequent of the thirty classifier outputs.
Tibshirani developed a sparse regression model known as the Lasso estimate [
The column vectors in
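A hedged sketch of Lasso-based feature selection with scikit-learn is given below; the paper's exact formulation and solver are not reproduced, and `alpha` (the regularization strength) is a placeholder:

```python
import numpy as np
from sklearn.linear_model import Lasso

def select_features(X_train, y_train, alpha=0.01):
    """Fit a Lasso against the (numeric) class labels; keep features with nonzero weights."""
    lasso = Lasso(alpha=alpha).fit(X_train, y_train.astype(float))
    return np.flatnonzero(lasso.coef_)  # indices of the selected features

# Usage: X_sel = X_train[:, select_features(X_train, y_train)]
```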
Conventional feedforward neural networks require the weights and biases of all layers to be adjusted by gradient-based learning algorithms. However, tuning the parameters of all layers is very slow because it is repeated many times, and the solutions easily fall into local optima. For this reason, Huang et al. proposed ELM, which randomly assigns the input weights and analytically calculates only the output weights. As a result, ELM learns much faster than conventional algorithms and shows outstanding generalization performance [
In this paper, the activation function
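A minimal NumPy sketch of basic ELM training follows, assuming a sigmoid activation (the paper's exact activation and kernel variants may differ): random input weights and biases are drawn once, and only the output weights are solved analytically via the Moore-Penrose pseudoinverse.

```python
import numpy as np

class ELM:
    """Single-hidden-layer ELM: random input weights, analytic output weights."""

    def __init__(self, n_hidden=500, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        T = np.eye(int(y.max()) + 1)[y]          # one-hot target matrix
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))  # random input weights
        self.b = self.rng.standard_normal(self.n_hidden)                # random biases
        H = 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))                # sigmoid hidden outputs
        self.beta = np.linalg.pinv(H) @ T        # output weights via pseudoinverse
        return self

    def predict(self, X):
        H = 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))
        return (H @ self.beta).argmax(axis=1)    # class with the largest output
```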
We computed the time-frequency representation (TFR) of the imagined speech EEG data for every subject to identify speech-related brain activity. The TFR of each trial was calculated using a Morlet wavelet and averaged over all trials. Among the five subjects, we plotted the TFRs of subjects 2 and 5, which showed notable patterns in the gamma frequency band. As shown in Figure
Time-frequency representation (TFR) of EEG signals averaged over all trials for subjects 2 and 5. The EEG signals were obtained from eight electrodes in the left temporal areas during each of the six experimental conditions (vowels /a/, /e/, /i/, /o/, /u/, and mute). The EEG data were bandpass filtered from 1 to 100 Hz, and a Morlet mother wavelet transform was used to calculate the TFR. The TFRs are plotted for the first 2 s after the final beep sound and for the frequency range of 10–70 Hz.
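A hedged sketch of such a Morlet TFR computation using MNE-Python is given below; the toolbox choice and wavelet parameters such as `n_cycles` are our assumptions, since the paper states only that a Morlet wavelet was used and the result was averaged over trials:

```python
import numpy as np
from mne.time_frequency import tfr_array_morlet

def average_tfr(epochs, fs=250.0):
    """epochs: array [n_trials, n_channels, n_samples] -> power [n_channels, n_freqs, n_times]."""
    freqs = np.arange(10.0, 71.0, 2.0)  # 10-70 Hz, the range plotted in the figure
    power = tfr_array_morlet(epochs, sfreq=fs, freqs=freqs,
                             n_cycles=freqs / 2.0, output="power")
    return power.mean(axis=0)           # average the per-trial power over all trials
```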
Topographical distribution of gamma activities during vowel imagination for subject 5. Increased activity was observed in both temporal areas when the subject imagined vowels. The time interval for the analysis is 0–3 s.
Figure
Averaged classification accuracies over all pairwise classifications using a support vector machine with a radial basis function kernel (SVM-R), an extreme learning machine (ELM), an extreme learning machine with a linear kernel (ELM-L), and an extreme learning machine with a radial basis function kernel (ELM-R) for all five subjects.
Table
Classification accuracies in % for SVM-R, ELM, ELM-L, ELM-R, and LDA for subject 2. The highest classification accuracy among the five classifiers is marked in bold for each pairwise combination. Classification accuracies are expressed as mean and associated standard deviation. SVM-R, ELM, ELM-L, ELM-R, and LDA denote the support vector machine with a radial basis function kernel, extreme learning machine, extreme learning machine with a linear kernel, extreme learning machine with a radial basis function kernel, and linear discriminant analysis, respectively.
Classifier | /a/ versus /e/ | /a/ versus /i/ | /a/ versus /o/ | /a/ versus /u/ | /e/ versus /i/ | /e/ versus /o/ | /e/ versus /u/ | /i/ versus /o/ | /i/ versus /u/ | /o/ versus /u/ | /a/ versus /mute/ | /e/ versus /mute/ | /i/ versus /mute/ | /o/ versus /mute/ | /u/ versus /mute/ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SVM-R | | | | | | | | | | | | | | | |
ELM | | | | | | | | | | | | | | | |
ELM-L | | | | | | | | | | | | | | | |
ELM-R | | | | | | | | | | | | | | | |
LDA | | | | | | | | | | | | | | | |
Table
Classification accuracies in % for ELM-R on the pairwise combinations with the top five classification performances for each subject. Classification accuracies are expressed as mean and associated standard deviation.
Subject | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5
---|---|---|---|---|---
S1 | /a/ versus /i/ | /a/ versus mute | /a/ versus /u/ | /a/ versus /e/ | /i/ versus /o/
S2 | /a/ versus /i/ | /i/ versus mute | /i/ versus /o/ | /a/ versus mute | /o/ versus mute
S3 | /e/ versus mute | /i/ versus mute | /u/ versus mute | /o/ versus mute | /a/ versus /i/
S4 | /i/ versus mute | /u/ versus mute | /a/ versus mute | /e/ versus mute | /o/ versus mute
S5 | /e/ versus mute | /i/ versus mute | /o/ versus mute | /a/ versus mute | /u/ versus mute
Table
Confusion matrix for all pairwise combinations and subjects using ELM, ELM-L, ELM-R, SVM-R, and LDA.
Classifier | Test positive / condition positive | Test positive / condition negative | Test negative / condition positive | Test negative / condition negative | Sensitivity | Specificity
---|---|---|---|---|---|---
ELM | 2516 | 1234 | 1509 | 2241 | 0.6251 | 0.6449
ELM-L | 2649 | 1101 | 1261 | 2489 | 0.6775 | 0.6933
ELM-R | 2635 | 1115 | 1297 | 2453 | 0.6701 | 0.6875
SVM-R | 3675 | 75 | 3525 | 225 | 0.5104 | 0.7500
LDA | 2556 | 1194 | 1398 | 2352 | 0.6464 | 0.6633
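For reference, sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP), where TP and TN are the test-positive/condition-positive and test-negative/condition-negative counts in the table; for ELM, for example:

$$\mathrm{Sensitivity} = \frac{2516}{2516 + 1509} \approx 0.6251, \qquad \mathrm{Specificity} = \frac{2241}{2241 + 1234} \approx 0.6449.$$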
Overall, ELM, ELM-L, and ELM-R showed better performance than the SVM-R and LDA algorithms in this study. In several previous studies, ELM achieved similar or better classification accuracy rates with much less training time compared to other algorithms using EEG data [
In this study, each trial was divided into thirty time segments of 0.2 s length with 0.1 s overlap. Each time segment was treated as a sample for training the classifier, and the final label of the test trial was determined by selecting the most frequent output (see Figure
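The trial-level decision therefore reduces to a majority vote over the thirty per-segment predictions, e.g.:

```python
import numpy as np

def trial_label(segment_preds):
    """segment_preds: int array of the thirty per-segment predictions for one trial."""
    return np.bincount(segment_preds).argmax()  # most frequent class wins

# Example: trial_label(np.array([0, 1, 0, 0, 1])) returns 0.
```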
To reduce the dimension of the feature vector, we employed a feature selection algorithm based on the sparse regression model. In the sparse-regression-model-based feature selection algorithm, the regularization parameter,
Effects of varying the regularization parameter on the classification accuracies obtained by ELM-R with sparse-regression-model-based feature selection for subject 2. The parameter value giving the highest accuracy is highlighted with a red circle.
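A sketch of the parameter sweep behind this figure, reusing the `select_features` and `ELM` sketches above; the grid of candidate values and the train/validation split are assumptions, not the paper's reported procedure:

```python
import numpy as np

def sweep_lambda(X_tr, y_tr, X_va, y_va, lambdas=np.logspace(-4, 0, 20)):
    """Return the (lambda, accuracy) pair with the best validation accuracy."""
    best = (None, -np.inf)
    for lam in lambdas:
        sel = select_features(X_tr, y_tr, alpha=lam)  # Lasso selection (earlier sketch)
        if sel.size == 0:
            continue  # lambda too strong: every coefficient shrunk to zero
        clf = ELM(n_hidden=500).fit(X_tr[:, sel], y_tr)
        acc = (clf.predict(X_va[:, sel]) == y_va).mean()
        if acc > best[1]:
            best = (lam, acc)
    return best
```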
Furthermore, our optimized results were achieved in the gamma frequency band (30–70 Hz). We also tested other frequency ranges, such as beta (13–30 Hz), alpha (8–13 Hz), and theta (4–8 Hz); however, the classification rates in those bands did not substantially exceed chance level for any subject or pairwise combination of syllables. In addition, the results of our TFR and topographical analyses (Figures
Currently, communication systems with various BCI technologies have been developed for disabled people [
In the present study, we applied classification algorithms to EEG data of imagined speech. In particular, we compared ELM and its variants to the SVM-R and LDA algorithms and observed that ELM and its variants performed better than the other algorithms on our data. These results may contribute to the development of silent speech BCI systems.
The authors declare that there is no conflict of interest regarding the publication of this paper.
Beomjun Min and Jongin Kim equally contributed to this work.
This research was supported by the GIST Research Institute (GRI) in 2016 and the Pioneer Research Center Program through the National Research Foundation of Korea funded by the Ministry of Science, ICT & Future Planning (Grant no. 2012-0009462).