Single-Trial EEG Classification via Common Spatial Patterns with Mixed Lp-and Lq-Norms

,


Introduction
e new communication channel between brain and machine is widely studied in the brain-computer interface (BCI) community based on electroencephalogram (EEG) [1]. It has potentially large application in the domains of biomedical engineering, rehabilitation, education, and entertainment [2]. e central issue in BCI is to accurately decode brain activities. It involves discriminative feature extraction of the brain signals [3].
In fact, the EEG signals are of multichannel time series. ey have high temporal resolution. A large number of machine learning algorithms are used to perform the extraction of discriminative EEG features, where the samples are the EEG signals collected at different time points [4,5]. For the brain activity of motor imagery (MI), there exists a phenomenon called event-related synchronization/ desynchronization (ERS/ERD) [6]. at is, the brain energy in the corresponding region of the cerebral cortex increases in the μ-band (8)(9)(10)(11)(12)(13) and the β-band (13)(14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30), while the rhythmic energy in the contralateral region significantly decreases [7,8]. Based on the ERS/ERD principle, the technique of common spatial patterns (CSP) carries out spatial filtering so as to reflect the difference between two EEG classes [9][10][11]. By maximizing or minimizing the ratio of spatially filtered variance of the two EEG classes, a lot of literature address the application or development of CSP. For example, local temporal CSP (LTCSP) [12], comprehensive CSP (cCSP) [13], and spatial-temporal filter [14]. Some methods have been developed based on sparse learning for EEG analysis. For example, the optimization of spatial patterns using multiview learning within CSP [15], prediction of antidepressant response based on EEG signature [16], sparse Bayesian extreme learning machine for EEG classification [17], and sparse group representation model [18]. Also, some methods address the problem of channel selection, say, based on correction [19] and bispectrum [20]. Recently, a novel method based on CSP (called CSP-LBP), which uses logarithmic band power (LBP) instead of variance to form the objective function, is developed to extract features [21]. CSP-LBP re-distributes the features so that they can be easily classified. e frequency domain CSP (FDCSP) employs the samples in the frequency domain as input signals of CSP [22]. e transformed data reduce the signal complexity, and some statistical features such as mean, median, maximum, and skewness are extracted. e classification results by the artificial neural network verify the effectiveness of FDCSP.
In the case of clean data, the classical CSP-based methods exhibit satisfactory performance in discriminative feature extraction. However, the EEG data is usually polluted by electromyogram (EMG), electrooculogram (EOG), and spikes. It is, therefore, important to resort to a robust avenue. We have ever developed L1-norm-based CSP (CSP-L1) [23] for the purpose of robust modelling, in which the sample dispersion is formulated with the L1norm.
e L1-norm is then used as internal feature selection [24]. In contrast with the L2-norm, the L1-norm has the robust property from the perspective of statistical theory [25]. e L2-norm involves the square operation, which would enlarge the negative effect of outliers associated with big deviations. In order to balance the big deviations, the resulting projection is likely to be distorted. e L1-norm, however, uses the absolute value operation that could reduce the bad impact of outliers since the dominance of the big deviations is weakened. e L1-norm is then generalized to the Lp-norm scenario for CSP (CSP-Lp) [26] as well as in machine learning [27]. More importantly, the two classes of EEG signals are collected under two different mental conditions, and different mental activities may reflect different intrinsic properties of EEG signals. Specifically, for the motor imageries of left and right hands, the brain demonstrates contralateral effect. e EEG signals recorded under the two mental conditions are related with different cerebral hemispheres and may have different expressions of neurophysiological information. It may be suggested unnecessarily to use the same norm to measure the band powers (i.e., dispersions of samples) of the two classes of EEG signals. It is expected to extract discriminative features by considering individual class of EEG signals specifically. If using two different norms to measure the dispersions of the two classes of samples, there may still be room to promote the capacity of spatial filtering.
In this paper, we consider further generalizing the filtering performance of CSP-L1 by modelling individual class of EEG signals. Specifically, we define the dispersions of the two class of samples in the objective function by using the Lp-and Lq-norms (0 < p and q < 2). We term the proposed approach as CSP-Lp/q. It is noted that, in the field of machine learning, the mixed norms have shown great advantage in classification recently [28]. e CSP-Lp/q method aims to find the optimal projection vectors by optimizing the ratio of the Lp-norm dispersion of one class to the Lq-norm dispersion of the other class. e features are then extracted using the mixed Lp/q-norm dispersions. With the features, the classification is implemented via Fisher linear discriminant analysis (LDA) [29].
at is, the feature vectors produced on training trials are regarded as input samples, by which the objective function of the Fisher criterion is evaluated. According to the principle of maximizing the ratio of between-class to within-class scatter, LDA seeks optimal projection vectors. e testing trials are then discriminated into the class that is nearest with the testing trials in terms of LDA-projected features. We point it that the proposed CSP-Lp/q approach has the following three advantages. (1) CSP-Lp/q incorporates robust modelling by defining the dispersions of the two classes of samples with different robustness-associated norms. (2) CSP-Lp/q is a generalization of CSP, CSP-L1, and CSP-Lp. Mathematically, if taking p � q � 2 and p � q � 1, then the conventional CSP and CSP-L1 methods occur, respectively. Likewise, CSP-Lp is a special case of CSP-Lp/q by setting p � q. (3) An iterative procedure is given to optimize the objective function of CSP-Lp/q.
We organize the rest of this paper as follows. We briefly review the classical CSP and CSP-L1 methods in Section 2. In Section 3, the CSP-Lp/q method, as well as its iterative procedure, is presented. e multiple spatial filters' extension and feature extraction are also outlined in this section. Section 4 illustrates the experimental results on three real EEG datasets. en, the paper is concluded in Section 5.

Conventional CSP and Its L1-Norm-
Based Version e basic idea of the classical CSP is to contrast the difference of two EEG classes by using the spatial filtering technique. Assume that X l ∈ R s×n (l � 1, . . . , T x ) and Y r ∈ R s×n (r � 1, . . . , T y ) are the EEG trials of the two classes, where s represents the number of sensors adopted in the experiment, n is the number of samples recorded within each epoch, and T x and T y are the numbers of trials of the two classes, respectively. We assume that EEG signals are already band-pass filtered, centered, and scaled.
Denote the whole trials of the two classes by X � (X 1 , X 2 , . . . , X T x ) and Y � (Y 1 , Y 2 , . . . , Y T y ), and relabel the columns (i.e., normalized samples) of X and Y as X � (x 1 , x 2 , . . . , x N x ) ∈ R s×N x and Y � (y 1 , y 2 , . . . , y N y ) ∈ R s×N y , respectively. Note that N x � n × T x and N y � n × T y . en, the covariance matrices of the two classes are simply formulated as Mathematically, the problem of seeking the spatial filter ω with CSP boils down to the optimization of the objective function given by where ‖·‖ 2 denotes the L2-norm. It turns out to be an eigenvalue decomposition problem: V x ω � λV y ω. e few leading eigenvectors are associated with the largest or smallest eigenvalues, onto which the projected variances could produce discriminative features.
While CSP defines its objective function based on the L2norm, the CSP-L1 approach, by contrast, utilizes the L1norm to define the objective function given by where ‖·‖ 1 denotes the L1-norm which is the sum of the absolute values of entries. Due to the indifferentiality of the absolute function at the zero point, the optimization of CSP-L1 is usually implemented by the iterative algorithm.

CSP with Mixed Lp-and Lq-Norms Optimization
In the presence of outliers, CSP-L1 exhibits advantage due to the robust modelling property of the L1-norm. Further, it is a good practice to employ the Lp-norm for robust modelling since the L1-norm may not be the optimal norm selection for robust modelling and its generalized Lp-norm is more flexible for accommodating varying EEG signals [26]. Due to the fact that EEG signals are collected under different mental conditions and may embody different intrinsic expression of neurophysiological information, it is beneficial to model the individual class of EEG signals specifically. Based on the basic neurophysiological principle, we propose using the Lp-norm to measure one class and the Lq-norm to measure the other class in defining the objective function as follows: where ‖·‖ p and ‖·‖ q denote the Lp-norm and the Lq-norm, respectively. e Lp-norm of a vector u � (u 1 , u 2 , . . . , u d ) T is defined as ‖u‖ p � ( d m�1 |u m | p ) 1/p . We refer to (3) as mixed Lp-and Lq-norms-based CSP and denote it by CSP-Lp/q. Clearly, in the case of p � q � 1, CSP-Lp/q turns out to be CSP-L1. So, CSP-L1 is a special case of CSP-Lp/q. Since the optimization (both maximization and minimization) of CSP-Lp/q is not straightforward due to the mixed norms, in the following, we design an iterative algorithm under the framework of bound optimization to calculate the filter bases.

Algorithmic Implementation.
e maximization of the objective function of CSP-Lp/q is presented as follows.
(1) Filter initialization: let t be the iterative index and set t � 0, and take any s-dimensional vector with unit length as the initial spatial filter ω(t).
(2) Polarity definition: define two polarity functions a i (t) and b j (t) as where sgn(·) is the sign function. (3) Gradient computation: compute the gradient vector as Here, in case of ω T (t)x i � 0 or ω T (t)y j � 0, we then add a small turbulence to ω T (t). (4) Filter update: update ω(t) by where δ is a learning rate parameter which takes a small positive value. Since only the direction of the basis vector is of interest, ω(t + 1) is normalized to unit length. (5) Convergence inspection: if ω(t + 1) does not substantially augment the value of the objective function, i.e., (J(ω(t + 1)) − J(ω(t)))/J(ω(t)) is less than a threshold ε, then break the run and set ω * � ω(t + 1). Otherwise, set t ⟵ t + 1, and return to Step 2. (6) Filter determination: output the spatial filter ω * .
It is seen that the algorithmic complexity is O(s(N x + N y )) per iteration. In reality, the total number of iterations and the specific EEG signals would determine the final computational complexity.

Algorithmic Justification.
e algorithmic procedure provided above is justified by the following theorem.

Theorem 1. With the iterative procedure for CSP-Lp/q, the objective function J(ω(t)) is monotonically increased by each step of iteration t.
We establish the theorem from the perspective of bound optimization [30,31]. e bound optimization is a very powerful and general optimization framework. By using the bound optimization, we need to coin a surrogate function Q(ω | ω(t)), which satisfies the requirement that the value of the function J(ω) − Q(ω | ω(t)) is smallest at the point ω(t).
en, for the surrogate function Q(ω | ω(t)), the next basis ω(t + 1) is updated such that erefore, the bound optimization is an iterative procedure. Its convergence could be theoretically confirmed. Specifically, the monotonicity of the objective function J(ω) is presented below [30]:

Mathematical Problems in Engineering
To achieve a surrogate function for maximizing J(ω), the following two lemmas are needed.
Lemma 1 (see [32]). Letting p ≥ 1, for two vectors u ∈ R d and v ∈ R d , one has the inequality and the equality holds when taking u � v, where the operations |·| p−1 and sgn( · ) are applied to the elements of a vector and°performs the element-wise product between two vectors. Lemma 2 (see [33]). Letting 0 < p < 2, for vector u ∈ R d and vector v ∈ R d with nonzero elements, one has the variational inequality If u � v, the equality in the above expression is estab- We coin a surrogate function as where can be served as a valid surrogate function due to the fact that J(ω) − Q(ω | ω(t)) ≥ 0 and the equality holds when taking ω � ω(t). e fact is demonstrated as follows.
We compare the numerator and the denominator of J(ω) with the ones of Q(ω | ω(t)), respectively. For simplicity, we drop out the two constants T x and T y in J(ω) since they do not affect the computation of the basis vector. Firstly, we calculate the numerator. By letting u � X T ω and v � X T ω(t) and substituting them into the inequality of Lemma 1, we have that Clearly, when taking ω � ω(t), i.e., u � v, the equality in (13) appears. Secondly, we go on to consider the denominator. By letting u � Y T ω and v � Y T ω(t) and substituting them into the inequality of Lemma 2, it follows that Again, the equality in (14) arises when taking ω � ω(t), i.e., u � v. Combining (13) and (14), we have that and (12) is a valid surrogate function.
As guaranteed by the principle of the bound optimization, to maximizeJ(ω), it suffices to iteratively augment the surrogate function Q(ω | ω(t)) at each iteration t. Differentiating Q(ω | ω(t)) with respect to ω, we obtain the gradient value at the point ω(t) given by which exactly is g(t) given in (6). at is, d(t) is the vector of the gradient-ascending direction of Q(ω | ω(t)) at the point ω(t). Consequently, by calculating the basis ω(t + 1) according to (7), we have Q( , which implies that update step (7) satisfies condition (8). Note that, in the above derivation, we in fact require that p ≥ 1 and ω T (t)y j ≠ 0(j � 1, 2, . . . , N y ) (refer to the applicable conditions of Lemmas 1 and 2). In the case 0 < p < 1, we would additionally require ω T (t)x i ≠ 0(i � 1, 2, . . . , N x ). en, we directly differentiate the objective function J(ω) with respect to ω and obtain the gradient value at the point ω(t) given by where we use the fact that the derivative of the sign function at the nonzero point is zero. e theorem is thus established. erefore, the theorem suggests that the iterative algorithm would converge to a locally maximum point. e globally maximum value of the objective function is possibly reached if different initializations are tried. Particularly, the initialization could be set as the basis vector of the conventional CSP method. e optimal basis vector corresponding to the maximal value of the objective function is thus recorded.

Multiple Spatial Filters' Extension.
Suppose ω 1 , ω 2 , . . . , ω k are the first k spatial filters obtained. e (k + 1)th spatial filter ω k+1 is selected such that where I s is an s-dimensional identity matrix, W k is a matrix formulated as W k � (ω 1 , ω 2 , . . . , ω k ), and ϖ k is an s-dimensional coefficient vector. at is, ω k+1 lies in the orthogonally complementary space which is spanned by ω 1 , ω 2 , . . . , ω k . Substituting (18) into the objective function J(ω), we have that ϖ k can be likely calculated by the bound optimization technique with the deflated samples: en, the basis vector ω k+1 is specified according to (18) using ϖ k obtained. Consequently, the bases matrix W k is renewed by padding W k as W k+1 � (W k , ω k+1 ), where ω k+1 is normalized such that its length is one. Due to the fact that the deflation procedure could be performed based on the previous results. e iterative algorithm for obtaining K spatial filters maximizing J(ω) is formally summarized in Algorithm 1.

Feature Extraction.
We apply the multiple spatial filters to the EEG samples so as to extract features. at is, the set of orthonormal spatial filters ω 1 , ω 2 , . . . , ω K maximizing the objective function J(ω) and another set of orthonormal spatial filters ω 1 ′ , ω 2 ′ , . . . , ω K ′ maximizing the objective function 1/J(ω) are used as projection bases. e contrast of difference between the two classes is formulated with respect to the mixed Lp-and Lq-norms. erefore, for one single EEG trial Z, the feature vector is formed as where f is a 2K-dimensional feature vector.

Experiments
We examine the performance of a family of CSP-based methods, i.e., CSP-Lp/q with varying p and q which include CSP, CSP-L1, CSP-Lp, CSP-LBP, and FDCSP. e experiments are performed on three datasets of BCI competitions: datasets III (a) and IV (a) of BCI competition III and dataset II (a) of BCI competition IV.

EEG Datasets.
ree datasets of BCI competitions, which are publicly available, are employed to evaluate the performance of the proposed CSP-Lp/q approach in EEG classification. In the datasets, 17 subjects were cued to perform motor imageries of hands and tongue activities, and EEG signals were simultaneously recorded. In this paper, we are interested in binary classification (i.e., left versus right hand or right hand versus foot motor imageries).

Dataset III (a) of BCI Competition III.
e EEG signals collected from three persons (s1, s2, and s3) were included in this dataset [34]. ere were 60 electrodes employed in total, and the sampled frequency was of 250 Hz. e numbers of total EEG trials for s1, s2 and s3 were 180, 120, and 120, respectively. e total trials for each subject were equally divided into training and testing trials. at is, the numbers of training (also the testing) samples were 90, 60, and 60 for s1, s2, and s3, respectively. e purpose was to classify the testing trials into left and right hands' motor imageries.

Dataset IV (a) of BCI Competition III.
is dataset [34] was composed of five persons, i.e., aa, al, av, aw, and ay, with 118 electrodes. ere were 280 EEG trials for each subject, and the numbers of the training trials for the five subjects were, respectively, 168, 224, 84, 56, and 28. e rest trials were used as the testing trials. We were interested in classifying right hand and foot motor imageries.

Dataset II (a) of BCI Competition IV.
e EEG signals in this dataset were generated from nine persons (A01E, . . ., A09E), who were cued to perform the activity of four different motor imageries, i.e., left hand, right hand, foot, and tongue [35]. Twenty two electrodes were placed on the scalp to measure EEG signals. For each subject, there were 288 trials in each session with 72 trials per condition. In our experiment, one session of trials was used as training data, while the unlabelled trials in the other session as testing data. Consequently, the number of the training (also the testing) was 144 for each subject since we only perform the classification of the left and the right hand motor imageries.

Preprocessing and Settings of Experiments. A fifth-order
Butterworth filter with the frequency band between 8 Hz and 35 Hz (i.e., cover μ and β bands) was applied to the EEG signals. For the second dataset, EEG time series were windowed in the range of 0.5 s to 3.75 s relative to the visual stimulus, while for the other two datasets, the interval was set as 0.5 s to 2.5 s. e initial projection bases of CSP-Lp/q were set as the corresponding solutions of the conventional CSP method. In the iteration procedure of CSP-Lp/q, the learning rate parameter δ varied from 1e − 6 to 9e − 4. e convergence threshold ε was set as 1e − 5. As commonly suggested in [9], we adopted three pairs of projection bases (i.e., six components) to filter the EEG samples in carrying out CSP and CSP-Lp/q. As confirmed by the experiments, the classification accuracy would not increase if the number of components became large. By using LDA, the six-dimensional feature vectors (which were produced by CSP-Lp/q on the training trials) were used as samples to train LDA. Consequently, by maximizing the objective function of the Fisher criterion, we obtained one optimal basis vector, onto which the feature vectors were projected. e resulting onedimensional scalars were then fed into the nearest-neighbor Input: Training data x i (i � 1, 2, . . . , N x ) and y j (j � 1, 2, . . . , N y ), norm quantities pp and qq, and number of spatial filters KK. Output: Spatial filters ω 1 , ω 2 , . . . , ω K .
(1) Let k � 0, and set B as the s-dimensional identity matrix.
classifier. at is, the labels of the testing trials were endowed as the labels of the nearest training trials. e parameters p and q were determined by using the strategy of two-fold cross-validation.
at was, for each combination of p and q, the filters were obtained based on the training set (subset of the total training set), and then, the classification accuracy was evaluated on the validation set (the rest of the total training set). is procedure was repeated two times, and in each time, the validation set was different. We used the average of the classification rates across the two folds as classification accuracy. We selected the combination of p and q that resulted in the maximum average classification accuracy. en, we re-computed the filters using all the two folds with the optimal combination determined. With the filters obtained, the generalization capacity was estimated on the testing trials, in which the testing process was not concerned with the training of the filters and the parameters.

Outliers Simulation.
To highlight the classification performance of the proposed technique in the situation of outliers with varying p and q, artificial outliers were introduced into the training samples. Specifically, we design the randomly generated outlier from an s-dimensional Gaussian distribution N(μ + 3σ, Σ), where μ and σ were the quantities of channel-wise mean and standard deviation of all the training samples, respectively, and Σ was the Mathematical Problems in Engineering 7 covariance matrix of the training samples. ese artificial outlier points were imposed to the original EEG training samples at randomly selected positions. For each incorporation, the experiments were repeated ten times. e mean classification accuracy was figured.

Experimental Results.
We investigated the classification accuracy of the proposed CSP-Lp/q approach with varying p and q in the range 0 < p and q ≤ 2 on the three datasets. Particularly, CSP-Lp/q becomes the conventional CSP method when taking p � q � 2 and turns out to be CSP-L1 in the case of p � q � 1. For visual perception, the classification accuracies of CSP-Lp/q with different values of p and q on all the 17 subjects of the three datasets are shown in Figure 1, where the step length of p and q was set as 0.1. To clearly see the difference between classification accuracies with different p and q values, the quantities on subject s1 are shown in Table 1 as an example. It was seen that most of the classification accuracies exceeded 70% on the three datasets, which demonstrated the effectiveness of the CSP-based spatial filtering techniques. From Figure 1, it was observed that, in most cases, the highest classification accuracy of CSP-Lp/q was obtained when p and q took different values. It thus demonstrated the necessity of the utilization of different norms in modelling different classes of EEG samples. It was also noted that, in most cases, the optimal values of p and q were less than 1. It may imply that, for EEG signals, small norms were preferred so as to accommodate outliers accompanied. For visual perception, examples of spatial filters obtained by different CSP methods are delineated in Figure 2, where FDCSP was not shown since the frequency quantities were used as input data. e classification accuracies of CSP-Lp/q, as well as CSP, CSP-LBP, FDCSP, and CSP-L1, on the 17 subjects are shown in Table 2, where CSP-Lp/q was compared with the benchmark method CSP and with three state-of-the-art methods: CSP-LBP [21], FDCSP [22], and CSP-L1 [23]. For different subjects, the classification results obtained were quite different, which further illustrated individual

82.22
Mathematical Problems in Engineering 9 difference in the BCI domain. It was straightforward that the average classification accuracy of CSP-Lp/q was higher than that of CSP (i.e., p � q � 2) and CSP-L1 (i.e., p � q � 1) since CSP and CSP-L1 were the two special cases of CSP-Lp/ q. Also, CSP-Lp/q was superior than CSP-LBP and FDCSP. Moreover, the classification accuracies of 7 out of 17 subjects have reached more than 90% when using CSP-Lp/q. e experimental comparison of training times on subject s1 costed by the five methods (i.e., CSP-Lp/q, CSP, CSP-LBP, FDCSP, and CSP-L1) were 91s, 13s, 42s, 78s, and 43s, respectively. Here, the experiments were run on the platform of Lenovo inkPad with 2.90 GHz CPU and 8.00 GB RAM. It was not surprising that the iterative algorithms consumed more training times. Especially, the CSP-Lp/q method needed additional time to train the parameters p and q.
Statistical analysis was implemented to investigate the significance of performance differences between different methods. We adopted the nonparametric Friedman test [36] to meet the purpose since the classification accuracies were reported on the same subjects and then were correlated. During the test, the classification accuracies produced by each method were regarded as a group, and blocks were formed with different subjects. With the rank statistic of the test, it was indicated that the performance differences between different methods were significant (p < 0.05). Further, the Friedman rank sum test [37] was conducted to perform the post hoc multiple comparison. It was found that the proposed CSP-Lp/q method exceeded the other approaches (p < 0.05). e classification accuracies of the five methods (i.e., CSP, CSP-LBP, FDCSP, CSP-L1, and CSP-Lp/q) in the case of increasing occurrence frequencies of outliers on the three datasets are presented in Figures 3-5. Due to individual difference, the classification accuracies varied on different subjects when outliers were introduced. From these figures, it was seen that the classification accuracies of most methods declined with the increased frequency of outliers. Clearly, on almost all subjects, the conventional CSP method and CSP-LBP had the sharpest decrease. at is, the two methods were influenced by the outliers badly. Due to the introduction of frequency, FDCSP was better than CSP and CSP-LBP. However, CSP, CSP-LBP, and FDCSP are essentially based on the L2-norm, which is sensitive to outliers. By contrast, the classification accuracy of CSP-L1 had relatively mild decrease, which demonstrated the effect of the L1-norm in suppressing the outliers. CSP-L1 is based on the L1-norm, which, however, is a special case of CSP-Lp/q (i.e., p � q � 1). Generally, among the five methods, CSP-Lp/q had relative superior classification accuracies on the three datasets. In these situations, the decreasing range was small, and the classification accuracies were still satisfactory on most subjects. Note that the optimal combination of p and q may vary in different occurrence frequencies of outliers. Again, the advantage of selecting appropriate norms for robust modelling was illustrated.

Conclusion
In this paper, we proposed a new approach, called CSP-Lp/q, by using the mixed Lp-and Lq-norms for modelling two classes of EEG signals individually. e objective function of the CSP-Lp/q approach was defined by reformulating the sample dispersion of one class with the Lp-norm, while the other class with the Lq-norm. It took the class difference into account fully when defining sample dispersion. anks to the mix-norms adopted, the CSP-Lp/q approach was a robust generalization of CSP, CSP-L1, and CSP-Lp. e classification performance of the proposed CSP-Lp/q technique was demonstrated by the experiments on three datasets of BCI competitions. One limitation of the proposed method was that the optimal combination of p and q in CSP-Lp/q may depend on the data on hand. is was the issue of model selection. A practical avenue was to use the strategy of cross validation to determine them. It was an interesting but challenging issue to set it analytically. Another limitation was that, compared with the traditional CSP method, the implementation of the proposed method was in an iterative way. e determination of the step length δ in the gradient-ascent procedure, however, was not a trivial issue since inappropriate setting would make the iteration trapped into local optimization. We are currently studying the element-wise update with the close-form solution. Further, the proposed method could be extended into the online (incremental) version, which is useful for the online BCI system. Finally, we pointed out that the proposed technique of adopting Lp-and Lq-norms was a general framework for neuroscience decoding. It could be applied to other relative spatial patterns, say, discriminative spatial patterns [38].

Data Availability
All data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
e authors declare that they have no conflicts of interest.