Motor Imagery EEG Decoding Based on New Spatial-Frequency Feature and Hybrid Feature Selection Method

Feature extraction and selection are important parts of motor imagery electroencephalogram (EEG) decoding and have always been the focus and difficulty of brain-computer interface (BCI) system research. To improve the accuracy of EEG decoding and reduce model training time, new feature extraction and selection methods are proposed in this paper. First, a new spatial-frequency feature extraction method is proposed. The original EEG signal is preprocessed, and the common spatial pattern (CSP) is then used for spatial filtering and dimensionality reduction. Finally, the filter bank method is used to decompose the spatially filtered signals into multiple frequency subbands, and the logarithmic band power feature of each frequency subband is extracted. Second, to select the subject-specific spatial-frequency features, a hybrid feature selection method based on the Fisher score and support vector machine (SVM) is proposed. The Fisher score of each feature is calculated, a series of threshold parameters is set to generate different feature subsets, and SVM with cross-validation is used to select the optimal feature subset. The effectiveness of the proposed method is validated using two sets of publicly available BCI competition data and a set of self-collected data. The total average accuracy achieved by the proposed method on the three data sets is 82.39%, which is 2.99 percentage points higher than that of the CSP method. The experimental results show that the proposed method achieves better classification results than the existing methods while also offering advantages in feature extraction and feature selection time.


Introduction
Motor imagery electroencephalogram (EEG) signals are widely used in brain-computer interface (BCI) systems, but they are highly random, have a low signal-to-noise ratio, and are easily disturbed by physiological and nonphysiological noise, which makes them difficult to decode [1]. In EEG decoding, feature extraction and selection are the core components [2]. On the one hand, extracting discriminative and stable features can effectively improve the performance of EEG decoding [3]. On the other hand, the extracted features usually contain noise and redundant information, so feature selection is required to eliminate invalid information [4]. In addition, feature selection reduces the feature dimension and the complexity of the classification model and helps avoid the curse of dimensionality and overfitting. Therefore, feature extraction and selection have always been the focus and difficulty of BCI system research.
Common spatial pattern (CSP) is one of the more effective methods for feature extraction of motor imagery EEG [5]. The traditional CSP method extracts the logarithmic variance as features after spatial filtering [6], but some studies have shown that this feature extraction method is not necessarily optimal. For example, literature [7] proposed the logarithmic band power (LBP) feature based on the CSP transform, which is called CSP-LBP in this paper. The experimental results show that CSP-LBP is superior to the traditional CSP method. In addition, the traditional CSP method lacks frequency domain information. Therefore, a lot of work has been done to select the optimal frequency band for CSP. For example, in literature [8], the original EEG signal was filtered into multiple subbands by the filter bank method, and CSP was then used to extract features. Finally, mutual information was used to select the features of the optimal frequency band. Zhang et al. [9] proposed a sparse filter band common spatial pattern (SFBCSP) method. SFBCSP band-pass filtered the original EEG signals into multiple frequency subbands covering 4-40 Hz, with a bandwidth of 4 Hz and an overlap of 2 Hz between adjacent subbands. CSP was used to extract features on each subband, and the least absolute shrinkage and selection operator (LASSO) was used for frequency band feature selection. Finally, the support vector machine (SVM) was used to classify the selected features. Subsequently, Zhang et al. [10] proposed a subband optimization method that implements sparse Bayesian learning of frequency bands (SBLFB) for motor imagery classification. The subband filtering method was the same as that in literature [9], but sparse Bayesian learning was used to select sparse frequency band features. The above spatial-frequency feature extraction methods filter the original EEG signals into multiple subbands, which requires a large amount of computation and a long time.
Existing feature selection methods mainly include filter, wrapper, and embedded methods [11]. The filter feature selection method uses evaluation criteria such as information measurement and distance measurement to select features. The wrapper feature selection method generates feature subsets in a specific way and then uses the results of a classifier as the evaluation criterion for feature selection. The embedded feature selection method automatically removes some features during classifier training, so feature selection and classification are carried out simultaneously. These three types of methods each have advantages and disadvantages, and combining them can make their advantages complementary. Therefore, hybrid feature selection methods have been studied widely in recent years. Moradi et al. [12] proposed a novel hybrid feature selection algorithm based on particle swarm optimization (PSO) and a local search strategy, in which the local search strategy was embedded in the PSO to select a less correlated and salient feature subset. Jain et al. [13] proposed a hybrid model for gene selection and cancer classification, in which the optimal gene subset was selected by a correlation-based feature selection method combined with improved binary PSO. Lu et al. [14] proposed a hybrid feature selection algorithm for gene expression data classification. The algorithm combined mutual information maximization and the adaptive genetic algorithm to reduce the dimension of gene expression data and remove redundancies for classification. Ghareb et al. [15] combined six filter feature selection methods and an improved genetic algorithm to form a new hybrid feature selection method. Literature [16] is a review article that comprehensively introduces hybrid feature selection methods for cancer classification. Although hybrid feature selection methods have been widely used, as far as we know, few of them have been applied to EEG decoding. In addition, the existing hybrid feature selection methods are mostly based on intelligent optimization algorithms such as PSO and genetic algorithms, so the feature selection time is relatively long.
In order to further improve the performance of motor imagery EEG decoding, a new spatial-frequency feature extraction method and a hybrid feature selection method are proposed in this paper. First, considering the effectiveness of the CSP-LBP method [7] and the frequency defects of CSP, a new spatial-frequency feature extraction method based on CSP transform, filter bank (FB), and logarithmic band power (LBP) is proposed; we call it CSP-FBLBP. The original EEG signals are preprocessed and then spatially filtered by CSP transform. After that, the spatially filtered signals are decomposed into multiple subbands using a filter bank, and the logarithmic band power of each subband is extracted as the feature. Second, a new hybrid feature selection method based on Fisher score (F-score) and SVM is proposed to select subject-specific spatial-frequency features; we call it F-score-h. The Fisher score of each feature is calculated, then a series of threshold parameters are set to generate different feature subsets, and finally, SVM and 10-fold cross-validation are used to select the optimal feature subset. After feature extraction and selection are completed, SVM is used for classification. Two public data sets and a self-collected data set are used to verify the proposed method.
The main contributions of this paper are in two aspects. First, a new spatial-frequency feature extraction method is proposed. CSP is used for dimensionality reduction, and then the spatial projection signal is band-pass filtered using the filter bank method. Finally, logarithmic band power is used for feature extraction. CSP dimensionality reduction effectively reduces the number of signal channels, thereby reducing the computation and time required for band-pass filtering, which greatly shortens feature extraction. In addition, the experimental results show that logarithmic band power is more effective than the logarithmic variance of the traditional CSP method.
Second, a new hybrid feature selection method is proposed. The Fisher score is used for feature sorting, and the threshold method, SVM, and cross-validation are combined for optimal feature subset selection. The proposed method takes full advantage of the simple calculation of the filter method and the supervised selection of the wrapper method, which not only reduces the feature selection time but also improves the classification performance of EEG decoding.

EEG Data
Description. Data set 1: data set IIa of BCI competition IV (2008) [17]: this data set contains 22 electrode channels, and the sampling rate is 250 Hz. Nine healthy subjects (A01, A02, A03, A04, A05, A06, A07, A08, and A09) performed left-hand, right-hand, foot, and tongue motor imagery tasks. Since we only consider binary classification tasks, $C_4^2 = 6$ binary classification tasks are obtained by pairing the four types of tasks. Since there are nine subjects, $9 \times 6 = 54$ data subsets are obtained. For each subject, the training set and the test set each contain 144 samples.
Data set 2: data set IIb of BCI competition IV (2008) [18]: this data set contains 3 electrode channels, and the sampling rate is 250 Hz. Nine healthy subjects (B01, B02, B03, B04, B05, B06, B07, B08, and B09) performed left-hand and right-hand motor imagery tasks. This data set has five sessions, and we only analyzed the data of the third session [10]. For each subject, the training set and the test set each contain 80 samples.
Data set 3: data set self-collected in our laboratory: a NuAmps amplifier and electrode cap from Neuroscan are used to collect scalp EEG signals, and the sampling rate is 250 Hz. The data set contains 36 electrode channels in total, including 30 channels of EEG data, 4 channels of electrooculogram data, and two reference channels; only the 30 channels of EEG data are analyzed in this paper. Six healthy subjects (S01, S02, S03, S04, S05, and S06) performed left-hand and right-hand motor imagery tasks.
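As a quick check of the bookkeeping for data set 1, the following minimal sketch enumerates the binary tasks and the resulting data subsets. The single-letter task codes follow the abbreviations used with Table 1 (L, R, F, T); the variable names are illustrative only.

```python
from itertools import combinations

TASKS = ["L", "R", "F", "T"]                   # left hand, right hand, foot, tongue
SUBJECTS = [f"A0{i}" for i in range(1, 10)]    # A01 ... A09

# All unordered pairs of the four tasks give the binary problems.
task_pairs = list(combinations(TASKS, 2))

# One data subset per (subject, task pair).
subsets = [(subject, pair) for subject in SUBJECTS for pair in task_pairs]

print(len(task_pairs), len(subsets))
```

Each element of `subsets` identifies one binary classification problem, for example `("A01", ("L", "R"))`.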

The Proposed Method.
The data processing flow of the proposed method is shown in Figure 1, which mainly includes preprocessing, feature extraction, feature selection, and feature classification. In the preprocessing stage, all data sets are band-pass filtered to 8-30 Hz using a sixth-order Butterworth filter. The 0.5-2.5 s time window is selected for single-trial data extraction. In the following, we introduce the core work of the proposed method in detail.
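The single-trial window can be turned into sample indices as in the sketch below. It assumes that the 0.5-2.5 s window is measured from the start of each stored trial (the reference point is not stated in the text) and that a trial is held as a list of per-channel sample lists; the function names are hypothetical. The Butterworth filtering step is not shown.

```python
FS_HZ = 250             # sampling rate stated for all three data sets
WINDOW_S = (0.5, 2.5)   # time window used for single-trial extraction

def window_to_samples(fs_hz, start_s, stop_s):
    """Convert a time window in seconds to a half-open sample range [start, stop)."""
    return int(round(start_s * fs_hz)), int(round(stop_s * fs_hz))

def extract_trial(trial, fs_hz=FS_HZ, window_s=WINDOW_S):
    """Slice every channel of one trial to the analysis window."""
    start, stop = window_to_samples(fs_hz, *window_s)
    return [channel[start:stop] for channel in trial]

start, stop = window_to_samples(FS_HZ, *WINDOW_S)
print(start, stop, stop - start)
```

Under these assumptions each channel contributes 500 sample points per trial.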

The New Spatial-Frequency Feature Extraction Method.
The new spatial-frequency feature extraction method includes CSP dimensionality reduction, filter bank band-pass filtering, and logarithmic band power feature extraction.
(1) CSP Dimensionality Reduction. The solution of the CSP objective function is equivalent to a generalized eigenvalue problem [19]. After the eigenvector matrix is obtained, the eigenvectors corresponding to the m largest and the m smallest eigenvalues are selected to form the final spatial filter. Assuming that the spatial filter is W and the single-trial data is D, the spatial projection signal Z can be calculated as follows:
$$Z = W^{T} D, \tag{1}$$
where $W \in \mathbb{R}^{C \times 2m}$, $D \in \mathbb{R}^{C \times K}$, C represents the total number of electrode channels, m represents the number of spatial filter pairs, and K represents the number of sampling points for each electrode channel. After the single-trial data D is transformed by CSP, the EEG signal has only 2m channels. The value of m is usually set to 3 or 1, so the number of EEG signal channels is significantly reduced after CSP transformation. For example, data set 1 has 22 electrode channels; if m is set to 3, the EEG signal has only 6 channels after CSP dimensionality reduction. The specific form of the signal Z is as follows:
$$Z = \begin{bmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_{2m} \end{bmatrix} \in \mathbb{R}^{2m \times K}, \tag{2}$$
where $Z_p$ denotes the p-th channel of Z.
(2) Filter Bank Band-Pass Filtering. The signal Z is band-pass filtered using a filter bank with frequency subbands of 4-8 Hz, 6-10 Hz, . . ., 26-30 Hz. Specifically, band-pass filtering is performed on each channel of the signal Z, as shown in Figure 2.
(3) The Logarithmic Band Power Feature Extraction. The logarithmic variance is extracted as the feature in the traditional CSP method [6]. However, the experimental results in literature [7] show that the logarithmic band power is more effective. Therefore, after band-pass filtering, the logarithmic band power is extracted as the feature in this paper, specifically as follows [7]:
$$\mathrm{LBP}_p = \log\left(\frac{1}{K}\sum_{i=1}^{K} \left|Z_p(i)\right|^{2}\right), \tag{3}$$
where $Z_p(i)$ represents the i-th sample point of the p-th channel of the signal Z and K is the number of sample points.
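Steps (1) and (3) can be illustrated with a small, dependency-free sketch. It applies a given spatial filter matrix to one trial and then computes the logarithm of the mean squared amplitude per projected channel. Several things here are assumptions: the weights in `W` are placeholders rather than filters obtained from the CSP eigenvalue problem, the natural logarithm is used, the function names are hypothetical, and the band-pass filtering of step (2) is omitted, so the feature is computed directly on the projected signal.

```python
import math

def spatial_projection(W, D):
    """Z = W^T D for W of shape (C, 2m) and D of shape (C, K), as nested lists."""
    if len(W) != len(D):
        raise ValueError("W and D must both have C rows (one per electrode channel)")
    filters = list(zip(*W))   # 2m filters, each holding C weights
    samples = list(zip(*D))   # K time points, each holding C channel values
    return [[sum(w * d for w, d in zip(f, s)) for s in samples] for f in filters]

def log_band_power(channel):
    """Natural log of the mean squared amplitude of one channel."""
    if not channel:
        raise ValueError("channel must contain at least one sample")
    return math.log(sum(v * v for v in channel) / len(channel))

# Toy data: C = 4 channels, m = 1 (2m = 2 filters), K = 4 samples.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, -1.0]]  # placeholder, NOT trained CSP filters
D = [[1.0, -1.0, 1.0, -1.0],
     [3.0, -3.0, 3.0, -3.0],
     [1.0, -1.0, 1.0, -1.0],
     [0.0, 0.0, 0.0, 0.0]]

Z = spatial_projection(W, D)               # shape (2m, K) = (2, 4)
features = [log_band_power(z) for z in Z]  # one feature per projected channel
print(len(Z), len(Z[0]), features)
```

In the full method, `log_band_power` would be applied to every subband-filtered copy of every row of `Z`, giving one feature per (channel, subband) pair.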
In the newly proposed spatial-frequency feature extraction method, CSP spatial filtering is performed first, and then band-pass filtering and feature extraction are performed on the spatially filtered signal. This processing order has two advantages. On the one hand, after the signal is spatially filtered by CSP, the signal quality is improved, and the extracted features are more stable and more discriminative. On the other hand, after CSP dimensionality reduction, the number of signal channels is greatly reduced, which reduces the computation required for band-pass filtering. Therefore, the feature extraction time is greatly reduced, and the band-pass filtering cost depends on the number of CSP channels (2m) rather than on the actual number of electrode channels.
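The saving from filtering after CSP can be made concrete by counting single-channel band-pass filtering operations per trial. The sketch below is only an illustration under stated assumptions: it uses the 4-40 Hz bank described for SFBCSP (bandwidth 4 Hz, step 2 Hz) on both sides so that only the channel count differs, and the helper names are hypothetical. The same helper can generate other banks by changing its arguments.

```python
def make_subbands(f_low, f_high, bandwidth, step):
    """Overlapping (low, high) subbands that fit inside [f_low, f_high]."""
    bands = []
    low = f_low
    while low + bandwidth <= f_high:
        bands.append((low, low + bandwidth))
        low += step
    return bands

def filtering_operations(n_channels, n_subbands):
    """Number of single-channel band-pass filterings needed for one trial."""
    return n_channels * n_subbands

bank = make_subbands(4, 40, bandwidth=4, step=2)   # 4-8, 6-10, ..., 36-40 Hz
raw_ops = filtering_operations(22, len(bank))      # filter all 22 raw channels of data set 1
csp_ops = filtering_operations(2 * 3, len(bank))   # filter the 2m = 6 CSP channels instead
print(len(bank), raw_ops, csp_ops)
```

With the same bank, filtering the six CSP channels needs 102 operations per trial instead of 374 for the 22 raw channels.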

Hybrid Feature Selection Method.
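The Fisher score measures the ability of a feature to distinguish between two categories [20]. The Fisher score is obtained by calculating the ratio of the between-class variance to the within-class variance of each feature, as follows:
$$F(i) = \frac{\left(\bar{x}_i^{+} - \bar{x}_i\right)^{2} + \left(\bar{x}_i^{-} - \bar{x}_i\right)^{2}}{\dfrac{1}{n_{+} - 1}\sum_{k=1}^{n_{+}}\left(x_{k,i}^{+} - \bar{x}_i^{+}\right)^{2} + \dfrac{1}{n_{-} - 1}\sum_{k=1}^{n_{-}}\left(x_{k,i}^{-} - \bar{x}_i^{-}\right)^{2}}, \quad i = 1, \ldots, p, \tag{4}$$
where $F(i)$ represents the Fisher score of the i-th feature. $n_{+}$ and $n_{-}$, respectively, represent the number of positive samples and negative samples, and $n = n_{+} + n_{-}$ represents the total number of samples. $\bar{x}_i$, $\bar{x}_i^{+}$, and $\bar{x}_i^{-}$ are the means of the i-th feature over all samples, the positive samples, and the negative samples, respectively. $x_{k,i}^{+}$ is the value of the i-th feature of the k-th positive class sample, $x_{k,i}^{-}$ is the value of the i-th feature of the k-th negative class sample, and p is the feature dimension. The larger the F value, the stronger the discrimination of the corresponding feature [21]. The traditional F-score method sorts the features according to the Fisher score and then selects the top K features for subsequent classification.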
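A minimal sketch of the score computation is given below. It assumes the commonly used two-class F-score form, in which the squared deviations of the two class means from the overall mean are divided by the sum of the two within-class sample variances; labels are taken to be +1 and -1, and the function name is hypothetical.

```python
def fisher_scores(X, y):
    """Fisher score of each feature for a two-class problem.

    X: list of samples, each a list of p feature values.
    y: list of labels, +1 (positive class) or -1 (negative class).
    """
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == -1]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two samples per class")
    scores = []
    for i in range(len(X[0])):
        col_pos = [x[i] for x in pos]
        col_neg = [x[i] for x in neg]
        mean_all = sum(x[i] for x in X) / len(X)
        mean_pos = sum(col_pos) / len(col_pos)
        mean_neg = sum(col_neg) / len(col_neg)
        between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
        within = (sum((v - mean_pos) ** 2 for v in col_pos) / (len(col_pos) - 1)
                  + sum((v - mean_neg) ** 2 for v in col_neg) / (len(col_neg) - 1))
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separated or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores

# Feature 0 separates the classes; feature 1 does not.
X = [[0, 5], [2, 6], [10, 5], [12, 6]]
y = [-1, -1, 1, 1]
print(fisher_scores(X, y))
```

In this toy example the first feature receives a high score and the second a score of zero, so only the first would survive any positive threshold.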
However, it is difficult to determine exactly how many features should be selected to achieve the best classification result. Therefore, a hybrid feature selection method based on F-score and the SVM classifier is proposed in this paper; we call it F-score-h. Unlike the filter feature selection method based on F-score, F-score-h uses feature weights (i.e., the Fisher score of each feature) to select the optimal feature subset, as shown in Figure 3. Specifically, after the features are sorted by the Fisher score, we set a series of thresholds to generate different feature subsets; the features whose score is greater than the threshold are selected. The set of candidate values for the threshold is $Th \in \{0, 0.05, 0.1, \ldots, 0.8\}$. For each threshold, the average validation accuracy of the corresponding feature subset is calculated by combining SVM and 10-fold cross-validation (CV). The threshold with the highest average accuracy is selected, and the optimal feature subset is the one generated by this threshold. The newly proposed hybrid feature selection method combines the low computational cost of the filter method with the supervised selection of the wrapper method, so it balances time efficiency and classification performance.
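The threshold search described above can be sketched as follows. The `evaluate` argument is a hypothetical hook standing in for the SVM with 10-fold cross-validation used in the paper; here it is any callable that maps a list of feature indices to a validation accuracy. Keeping the first threshold in case of ties and skipping thresholds that leave no feature are assumptions, since the text does not specify either case.

```python
def select_by_threshold(scores, evaluate, thresholds=None):
    """Return (accuracy, threshold, feature indices) of the best threshold.

    scores:   Fisher score of each feature.
    evaluate: callable(list of feature indices) -> validation accuracy.
    """
    if thresholds is None:
        thresholds = [round(0.05 * k, 2) for k in range(17)]   # 0, 0.05, ..., 0.8
    best = None
    for th in thresholds:
        subset = [i for i, s in enumerate(scores) if s > th]
        if not subset:
            continue   # no feature exceeds this threshold
        accuracy = evaluate(subset)
        if best is None or accuracy > best[0]:
            best = (accuracy, th, subset)
    if best is None:
        raise ValueError("no feature exceeds any candidate threshold")
    return best

# Toy evaluator (assumption): pretends that features 0 and 2 together work best.
def toy_evaluate(indices):
    return 0.9 if indices == [0, 2] else 0.6

toy_scores = [0.72, 0.02, 0.33, 0.12]
best_accuracy, best_threshold, best_subset = select_by_threshold(toy_scores, toy_evaluate)
print(best_accuracy, best_threshold, best_subset)
```

Because only 17 candidate subsets are evaluated regardless of the feature dimension, the wrapper stage stays cheap compared with population-based searches.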

SVM Classification.
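SVM is used as the classifier. The SVM classification model used in this paper is as follows [22]:
$$\min_{w, b, \xi} \ \frac{1}{2} w^{T} w + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i\left(w^{T}\phi(x_i) + b\right) \ge 1 - \xi_i, \ \ \xi_i \ge 0, \ \ i = 1, \ldots, n, \tag{5}$$
where $x_i \in \mathbb{R}^{p}$ represents the i-th feature sample (feature vector), $y_i$ represents the i-th label, and $\xi_i$ represents the i-th slack variable. $x_i$ is mapped into a higher-dimensional space by $\phi(x_i)$, and $C > 0$ is the regularization parameter. Solving (5) through its dual problem gives the following decision function:
$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} y_i \alpha_i K(x_i, x) + b\right), \tag{6}$$
where $\alpha_i$ is the Lagrange multiplier, $K(x_i, x) = \phi(x_i)^{T}\phi(x)$ is the kernel function, and $\operatorname{sgn}(\cdot)$ is the sign function. SVM is implemented with the linear kernel using the LIBSVM toolbox [22]. The model parameters of SVM adopt the default settings of the toolbox [22].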
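To make the decision function concrete, the sketch below evaluates it for a linear kernel from a given set of support vectors, labels, multipliers, and bias. This is not the LIBSVM implementation; the values are hand-picked for a toy two-point problem, the function names are hypothetical, and mapping a decision value of exactly zero to +1 is an assumption.

```python
def linear_kernel(u, v):
    """K(u, v) = u^T v."""
    return sum(a * b for a, b in zip(u, v))

def svm_decision(x, support_vectors, labels, alphas, bias, kernel=linear_kernel):
    """Sign of sum_i y_i * alpha_i * K(x_i, x) + b, returned as +1 or -1."""
    value = sum(y * a * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas)) + bias
    return 1 if value >= 0 else -1

# Toy model: two support vectors on opposite sides of the origin.
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
labels = [1, -1]
alphas = [0.25, 0.25]
bias = 0.0

print(svm_decision([2.0, 0.5], support_vectors, labels, alphas, bias))
print(svm_decision([-3.0, 1.0], support_vectors, labels, alphas, bias))
```

For the linear kernel the sum collapses to a single weight vector, here (0.5, 0.5), so each prediction costs one dot product.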

Comparison Methods and Parameter Settings.
In order to verify the effectiveness of the proposed feature extraction method, it is compared with five other CSP-based methods: the traditional CSP method [6,19], CSP-FB [11], SFBCSP [9], SBLFB [10], and CSP-LBP [7]. Unless otherwise stated, the number of spatial filter pairs for CSP and its improved variants is set as follows: $m = 3$ for data set 1 and data set 3, and $m = 1$ for data set 2. SVM is used for classification. The comparison algorithms and their parameter settings are as follows:
CSP: CSP feature extraction follows literature [6,19].
CSP-FB: the parameter setting of the CSP-FB algorithm follows literature [11]. F-score-h is used to select features.
SFBCSP: the parameter setting of the SFBCSP algorithm follows literature [9]. Seventeen subbands (4-8 Hz, 6-10 Hz, . . ., 36-40 Hz) with a bandwidth of 4 Hz and an overlap of 2 Hz are used for band-pass filtering. A sixth-order Butterworth filter is used. LASSO is used to select sparse band features.
SBLFB: the parameter setting of the SBLFB algorithm follows literature [10]. The subband setting is the same as for SFBCSP. Sparse Bayesian learning is used to select sparse band features.
CSP-FBLBP: CSP-FBLBP is used for feature extraction, and F-score-h is used for feature selection.
Tables 1-3, respectively, show the classification accuracy and the total average classification accuracy of all the subjects in the three data sets. The highest accuracy is marked in bold. In Table 1, the left-hand, right-hand, foot, and tongue motor imagery tasks in data set 1 are represented by the letters L, R, F, and T, respectively. L versus R denotes the left-hand versus right-hand binary classification task, and the other task pairs are named in the same way. Due to space constraints, only the average classification accuracy of each binary classification task is given. It can be seen from Table 1 that CSP-FBLBP achieves the highest average classification accuracy on data set 1, and the accuracy is 3.24% higher than that of CSP. CSP-LBP and CSP-FB are better than CSP, but SFBCSP and SBLFB are lower than CSP. Similarly, CSP-FBLBP also achieves the highest average classification accuracy on data sets 2 and 3; see Table 2 and Table 3 for details.

Experimental Results.
In order to compare the classification results of the various methods more intuitively, the average classification accuracy achieved by the different feature extraction methods is shown in Figure 4. It can be seen from Figure 4 that the classification result of CSP-FBLBP is clearly better than that of the other methods. The total average classification accuracy of CSP, CSP-FB, SFBCSP, SBLFB, CSP-LBP, and CSP-FBLBP over all data is 79.40%, 80.53%, 75.88%, 75.63%, 80.01%, and 82.39%, respectively.
Furthermore, the distribution of classification accuracy achieved by the various feature extraction methods is shown in Figure 5. The red line represents the median classification accuracy. It can be seen that the median of CSP-FBLBP is higher than that of the other methods. The maximum value of CSP-FBLBP is 100%, and the minimum value of CSP-FBLBP is also higher than that of the other methods. In addition, the accuracy distribution of CSP-FBLBP is relatively compact and close to the top. Therefore, CSP-FBLBP is superior to the other methods.
In order to fully reflect the advantages of CSP-FBLBP, we further studied its time efficiency; the running time of the various feature extraction methods is shown in Table 4. The training sets of three subjects (A01, B01, and S01) are selected to calculate the feature extraction time. The feature extraction time of CSP-FB, SFBCSP, SBLFB, and CSP-FBLBP includes two parts, namely, CSP spatial filtering time and band-pass filtering time. For these four methods, both times are listed in brackets: the first is the spatial filtering time, and the second is the band-pass filtering time. It can be seen from Table 4 that the feature extraction time of SFBCSP and SBLFB is the longest, mainly because their band-pass filtering is relatively time-consuming. It is worth pointing out that the feature extraction process of SFBCSP and SBLFB is the same, so their feature extraction time is the same. Although the feature extraction time of CSP-FBLBP is longer than that of CSP-LBP and CSP, it does not prevent the use of CSP-FBLBP in a real-time BCI system. In addition, CSP-FBLBP has a clear time advantage over CSP-FB, SFBCSP, and SBLFB.

Comparison Methods and Parameter Settings.
In order to verify the effectiveness of the proposed feature selection method, the proposed method was compared with four other feature selection methods, namely, LASSO [23], genetic algorithm (GA) [24], binary particle swarm optimization (BPSO) algorithm [25], and binary differential evolution (BDE) algorithm [26]. CSP-FBLBP is used for feature extraction, and SVM is used for classification.
LASSO: the candidate parameter set for LASSO is $\lambda \in 0.1 \times \{1, 2, \ldots, 30\}$. The optimal parameter is selected by 10-fold cross-validation. LASSO regression is implemented with the SLEP toolbox [27]. After the LASSO model is determined, the features with a weight coefficient greater than 0 are selected as the optimal feature subset.
GA: the parameter setting of GA follows literature [24]. Binary encoding is selected as the feature encoding method. The fitness function is the classification accuracy of the k-nearest neighbor classifier, where $k = 5$. The population size is 10, the number of iterations is used as the termination condition of the algorithm, and the maximum number of iterations is 100. The crossover probability is 0.8, and the mutation probability is 0.01.
BPSO: the implementation of BPSO follows literature [25], and the parameter setting is consistent with literature [25]. The fitness function is the classification accuracy of the k-nearest neighbor classifier, where $k = 5$. The population size is 10, the number of iterations is used as the termination condition of the algorithm, and the maximum number of iterations is 100. The acceleration coefficients of the BPSO are set as $c_1 = 2$, $c_2 = 2$. The maximum and minimum velocities are 6 and $-6$, respectively. The maximum and minimum inertia weights are 0.9 and 0.4, respectively.
BDE: the parameter setting of BDE follows literature [26]. The population size is 10, the number of iterations is used as the termination condition of the algorithm, and the maximum number of iterations is set to 100. The crossover probability is 0.9.
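To show how the BPSO settings listed above enter the search, the following sketch performs one update of a single binary particle in the standard BPSO form (velocity update, clamping, sigmoid transfer). It is not the implementation of literature [25]: the linearly decreasing inertia schedule, the function names, and the toy particle are assumptions, and the k-nearest neighbor fitness evaluation is omitted.

```python
import math
import random

C1, C2 = 2.0, 2.0           # acceleration coefficients
V_MAX, V_MIN = 6.0, -6.0    # velocity bounds
W_MAX, W_MIN = 0.9, 0.4     # inertia weight bounds

def inertia(iteration, max_iterations):
    """Inertia weight decreasing linearly from W_MAX to W_MIN (assumed schedule)."""
    return W_MAX - (W_MAX - W_MIN) * iteration / max_iterations

def bpso_step(position, velocity, personal_best, global_best, w, rng):
    """One velocity and position update for a single binary particle."""
    new_position, new_velocity = [], []
    for x, v, pb, gb in zip(position, velocity, personal_best, global_best):
        v = w * v + C1 * rng.random() * (pb - x) + C2 * rng.random() * (gb - x)
        v = max(V_MIN, min(V_MAX, v))                 # clamp to the velocity bounds
        prob_one = 1.0 / (1.0 + math.exp(-v))         # sigmoid transfer function
        new_velocity.append(v)
        new_position.append(1 if rng.random() < prob_one else 0)
    return new_position, new_velocity

rng = random.Random(0)                 # fixed seed so the run is repeatable
position = [1, 0, 1, 0, 0]             # 1 = feature selected, 0 = not selected
velocity = [0.0] * 5
personal_best = [1, 1, 0, 0, 1]
global_best = [1, 1, 1, 0, 0]
w = inertia(iteration=0, max_iterations=100)
position, velocity = bpso_step(position, velocity, personal_best, global_best, w, rng)
print(position, velocity)
```

With a population of 10 and 100 iterations, a search of this kind calls the fitness function on the order of a thousand times, which is consistent with the longer selection times reported for the population-based methods.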

Experimental Results.
The average classification accuracy achieved by the different feature selection methods is shown in Figure 6. The total average classification accuracy of LASSO, BPSO, GA, BDE, and F-score-h over all data is 77.57%, 80.62%, 80.56%, 80.43%, and 82.39%, respectively. F-score-h is clearly better than the other feature selection methods. BPSO, GA, and BDE perform about equally well, and LASSO performs relatively poorly. The distribution of classification accuracy achieved by the various feature selection methods is shown in Figure 7. It can be seen that the median of F-score-h is higher than that of the other methods. The maximum value of F-score-h is 100%, and the minimum value of F-score-h is also higher than that of the other methods. In addition, the overall classification accuracy distribution of F-score-h is relatively compact and close to the top. These results demonstrate the advantage of F-score-h. The running time of the various feature selection methods is shown in Table 5.
The training sets of three subjects (A01, B01, and S01) are selected to calculate the feature selection time. The feature selection time of F-score-h is the shortest and is much lower than that of the other methods. The feature selection time of the BPSO, GA, and BDE methods is relatively long. In summary, F-score-h has clear advantages in both classification performance and feature selection time.

Results Compared with Other Existing Methods.
In order to reflect the advantages of the proposed method more fully, its classification results are compared with those of recently published papers. The classification results on data set 1 (L versus R binary classification task) are shown in Table 6, and the classification results on data set 2 are shown in Table 7. On data set 1, the proposed method is superior to the other existing methods. On data set 2, the proposed method is second only to the NCFS method [35] and is better than most existing methods. These experimental results indicate that the proposed method compares favorably with existing methods in classification performance.

Discussion
Comparing the classification results of CSP-FB with CSP and of CSP-FBLBP with CSP-LBP shows that selecting a subject-specific frequency band can improve the classification performance of CSP. CSP-FB and CSP-FBLBP obtain better classification results mainly because these two methods use the filter bank method to make up for the frequency defects of CSP. Figure 8 shows the features selected by F-score-h. From the feature index, it is possible to calculate which channel and which frequency band a selected feature belongs to. It can be seen from Figure 8 that only a few features with high scores are retained. The channel (spatial information) and frequency band (frequency information) selected for different subjects are different; that is, the optimal spatial-frequency features are subject-specific. CSP-FBLBP jointly considers subject-specific spatial-frequency features, so a better classification result is achieved. In addition, CSP-FBLBP and CSP-FB use the same method to compensate for the frequency defects of CSP, but the classification result of CSP-FBLBP is better than that of CSP-FB.
This result shows that the type of feature extracted after CSP spatial filtering is also critical. It can be seen from the experimental results that the logarithmic band power is better than the traditional logarithmic variance feature. Further improvement of the feature extraction method based on the CSP transformation is therefore worth studying.
Compared with the existing spatial-frequency feature extraction methods (CSP-FB, SFBCSP, and SBLFB), CSP-FBLBP has a clear time advantage, and its feature extraction time is significantly lower. The feature extraction time of SFBCSP and SBLFB is the longest, mainly because of the band-pass filtering time. SFBCSP and SBLFB decompose the original EEG signals into 17 frequency subbands. The number of subbands and channels is relatively large, so the amount of computation is large and the feature extraction time is long. After CSP dimensionality reduction, the number of signal channels of CSP-FBLBP is greatly reduced, so its feature extraction time is significantly reduced. In addition, comparing the feature extraction time of CSP-LBP with CSP and of CSP-FBLBP with CSP-FB shows that the calculation time and complexity of the logarithmic band power feature are lower than those of the logarithmic variance feature.
From the comparative analysis of the above experimental results, it can be concluded that the F-score-h feature selection method achieves better classification results, and its feature selection time also has significant advantages. F-score-h is a hybrid of filter and wrapper feature selection methods. On the one hand, the filter feature selection method requires little computation, so its calculation time is short; on the other hand, the wrapper feature selection method uses the classification performance of the classifier as the evaluation standard, so its classification performance is generally better. F-score-h combines the advantages of both filter and wrapper methods, so it achieves better classification performance.
In this paper, for the LASSO method, we select the features whose weight is greater than 0 as the optimal feature subset. Generally, the larger the feature weight, the more important the corresponding feature. However, the feature subset that is optimal for the LASSO model is not necessarily optimal for the SVM classifier [36]. Combining LASSO with SVM for feature selection may give better classification results [37]. The classification results of BPSO, GA, and BDE are lower than those of F-score-h, for several reasons. First of all, the genetic algorithm may fall into a local optimum, BPSO may converge prematurely [25], and BDE may not converge effectively. In addition, the selection of initialization parameters for BPSO, GA, and BDE also has a great influence on feature selection. How to choose more suitable model parameters is a critical issue.
To further examine the advantages of F-score-h, we compared F-score-h with F-score. F-score is a filter method that uses Fisher scores to rank the features and then selects the top K features for classification. The classification accuracy when different numbers of features are selected is shown in Figure 9.
The feature dimension extracted in data set 2 is only 20, while the feature dimension of data sets 1 and 3 is 60; in order to compare the variation of the average classification accuracy of the three data sets with the number of features on the same axis, the number of features is mapped to a percentage. For example, 25 on the abscissa of Figure 9 indicates that the number of selected features is 25% of the total number of features. It can be seen from Figure 9 that the number of features corresponding to the optimal classification accuracy differs between data sets. The filter method selects the same number of features for all data sets, and its classification result is not good. F-score-h selects features through feature weights, so the selected feature subset is more discriminative and contains less redundant information. In addition, F-score-h can select subject-specific features. Therefore, F-score-h is better than F-score. The average classification accuracy of each data set when F-score takes different numbers of features is shown in Table 8. It can be observed from Table 8 that the accuracy of F-score-h is the highest in each data set.
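The percentage axis can be mapped back to a feature count as in the sketch below, which also shows the top-K selection of the plain F-score filter. Rounding with Python's `round` and keeping at least one feature are assumptions, and the function name is hypothetical.

```python
def top_k_by_percent(scores, percent):
    """Indices of the top `percent`% of features, ranked by Fisher score."""
    k = max(1, round(len(scores) * percent / 100))   # keep at least one feature
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# 25% of 60 features (data sets 1 and 3) and of 20 features (data set 2).
print(len(top_k_by_percent(list(range(60)), 25)),
      len(top_k_by_percent(list(range(20)), 25)))
```

The same percentage therefore selects 15 features in data sets 1 and 3 but only 5 in data set 2, regardless of how the scores are distributed for a given subject.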

Conclusion
New feature extraction and selection methods are proposed in this paper. In the new feature extraction method, the logarithmic band power is used to replace the logarithmic variance, and the filter bank method is used to compensate for the frequency defects of CSP. In the new feature selection method, the Fisher score is used to sort the features, and then a series of threshold parameters are set; SVM combined with cross-validation is used to select the optimal threshold parameter so as to obtain the optimal feature subset.
The experimental results show that the proposed feature extraction and feature selection methods achieve better classification performance than existing methods and that both feature extraction time and feature selection time have clear advantages, so the method can be applied to real-time BCI systems.
Although the proposed method has achieved good classification results, the impact of the time window on CSP is not considered in the feature extraction process, and the proposed method still has considerable room for improvement. In future work, we will jointly consider efficient time-spatial-frequency feature extraction and selection methods.

Data Availability
In this study, we used three data sets for experiments. Data sets 1 and 2 are public BCI competition data sets, which have been deposited on the BCI competition website. Data set 3 is self-collected by our laboratory and is not publicly available but can be obtained from the corresponding author upon reasonable request.

Conflicts of Interest
All authors declare that they have no conflicts of interest.