Multilinear Discriminative Spatial Patterns for Movement-Related Cortical Potential Based on EEG Classification with Tensor Representation

The discriminative spatial patterns (DSP) algorithm is a classical and effective feature extraction technique for decoding of voluntary finger premovements from electroencephalography (EEG). As a purely data-driven subspace learning algorithm, DSP essentially is a spatial-domain filter and does not make full use of the information in frequency domain. The paper presents multilinear discriminative spatial patterns (MDSP) to derive multiple interrelated lower dimensional discriminative subspaces of low frequency movement-related cortical potential (MRCP). Experimental results on two finger movement tasks' EEG datasets demonstrate the effectiveness of the proposed MDSP method.


Introduction
e core task of brain-computer interface (BCI) is to extract specific useful components from complex electroencephalography (EEG) signal. Currently, the commonly extracted EEG components include event-related desynchronization/ synchronization (ERD/ERS) [1], steady-state visual evoked potentials (SSVEPs) [2], event-related potentials (ERPs) [3], and movement-related cortical potentials (MRCPs) [4]. Among them, MRCP is a slow negative shift that precedes naturally voluntary movement or motor imaging. is negative shift contains two components, i.e., bereitschaftspotential (BP) and negativity slope (NS), respectively, generating from 1-2 s and 0.4 s before movement [5]. Furthermore, MRCP is low frequency signal [6] and always submerged with the background noise. It, therefore, is difficult to reconnoiter the signal. e detection of MRCP has been widely studied by BCI researchers. Particularly, discriminative spatial patterns (DSP) [7] are proposed to enhance the detection of MRCP patterns. Geometrically, DSP aims to find discriminative directions onto which projected scatters between any EEGs are maximized, and meanwhile, projected scatters within any EEGs are minimized [8]. However, most of the current DSP-based methods analyze limited spatial information while ignoring the inherent structure information of original EEG signal.
In real world, the collected physiological signal usually has some specialized structures and these structures are usually in the form of tensors with second or higher order. For example, an original EEG signal is a second-order tensor, i.e., matrix, whose rows and columns represent channels and time series, respectively. When time-frequency analysis is performed on the signal, the third dimension representing frequency information will be shown. us, the data driven method which considers the underlying data structure is required when analyzing real physiological signal. is paper is motivated by the tensor algebra and tensor subspace analysis [9][10][11][12]. In this work, the EEG data is firstly encoded as a tensor with second or higher order by continuous wavelet transform (CWT). en, tensor-based discriminant analysis theory is explored to optimize subspaces algorithm. To uncover the underlying structures in these problems for EEG analysis, this paper proposes the multilinear discriminative spatial patterns (MDSP) as a subspace learning method that includes classification.
In summary, our main contributions are as follows. Firstly, we extend the DSP algorithm restricted to 2D data to tensor objects of any order. Secondly, an iterative optimization method that alternately solves the problem of optimal projection is customized for MDSP. Finally, against the special form of features extracted by MDSP, we propose a new tensor classification method based on nearest neighbors. e advantages of our MDSP algorithm are as follows: (1) MDSP is a general multidimensional dimensionality reduction method. Compared with traditional timefrequency analysis methods, it can avoid the undersample problem (the dimensionality of feature is much larger than the number of training samples) [13] by working on each order of the training tensors separately. (2) MDSP derives multiple interrelated lower dimensional discriminative subspaces. ese subspaces are not independent of each other, which is in line with the inherent structural characteristics of the actual signal.

Related Work
ere have been theoretical analyses that provide a multilinear supplement to linear algebra. And, the work of directly using linear algebra is very common. Some such models from [14,15] propose that linear dimensionality reduction approaches have been extended smoothly to multilinear subspace reduction algorithms, following the principle of multilinear algebra.
More neoterically, the joint solution of tensor and hotspot technologies to solve specific problems has attracted widespread attention. For instance, Makantasis et al. [16,17] presented a new model that trains the neural network by tensor decomposition and significantly reduces the number of samples required in hyperspectral image classification. Meanwhile, common spatial patterns which are restricted to 2D data have also been extended to the hyperspectral image of arbitrary order [18]. Regarding the problem of multiview clustering, Wu et al. [19] firstly constructed a third-order tensor by stacking similarity matrices from each view and then captured the low-rank structure information by t-SVD in the tensor space. Furthermore, they also attempted to construct new multiview features from view-specific affinity matrix by low-rank tensor learning [20].
Motivated by the significant performance of tensor in the field of computer vision, an increasing number of brain scientists have attempted to apply tensor to BCI community [21,22]. Zheng et al. [23] proposed a tensor-based multitask learning method to assess human cognitive activities from multiclass EEG signal. To overcome the limitations of noisy and missing data of EEG, a tensor completion model is designed [24], which utilizes tensor-based algorithms to clean and complete data. Van Eyndhoven et al. [25] trained multilinear subspace learning on multichannel EEG and tested on a single channel, thus creating a real breakthrough for practical application.

Methods
In this section, we present our tensor-based MDSP method. Firstly, we briefly review the original DSP method. en, we introduce some concepts and definitions related with this work to help understanding. Finally, the proposed MDSP and classification extension based on tensor are presented with detailed mathematical formulas.

Discriminative Spatial Patterns.
Assume that X i ∈ R c×t is the EEG signal of trial i with c as the number of channels and t is the number of samples in time, y i ∈ 1, 2, . . . , p is the class label of X i , n j is the number of trials of class j, and n � p j�1 n j meaning the number of all trials.
us, the within-class scatter matrix S w and the between-class scatter matrix S b are defined as follows: where M j � (1/n j ) n j i:y i �j X i means the average of class j and M � (1/n) n i�1 X i means the average of all classes. DSP attempts to seek a set of vectors U(U ∈ R c×d , d ≤ c) to maximize the Fisher criterion given by where tr(.) denotes the operation of matrix trace. According to the Lagrange multiplier method, U � (u 1 , u 2 , . . . , u d ) can be obtained by computing the equation us, the filtered feature F i ∈ R d×t corresponding to X i are given by

Multilinear Algebra.
In this paper, we assume the bold uppercase symbols represent tensor objects, for example, Here, we first introduce two definitions related to this article.
Definition 1 (mode-k unfolding or unfolding-matricization). e mode-k unfolding is the process of rearranging the elements of a tensor (along the mode-k) to obtain a 2 Computational Intelligence and Neuroscience matrix. For an n-order tensor X ∈ R m 1 ×m 2 ×···×m k ×···×m h , the mode-k unfolding of this is e operation of mode-k unfolding is denoted as X k ⇐ k X.
To simplify the notation of multimode product in this paper, we denote that

Multilinear Discriminative Spatial Patterns.
As an improvement of DSP, MDSP utilizes multiple interrelated subspaces which can collaborate to discriminate different classes. Suppose that X i ∈ R m 1 ×m 2 ×···×m h is the EEG signal of trial i; in this paper, h � 2 or 3. e aim of MRCP is to find a set of optimal projection matrices U 1 , U 2 , . . . , U k , . . . , U h (U k ∈ R m k ×m k ′ ) that leads to the most accurate classification in the projected tensor subspace, where Similar to DSP, MDSP maximizes the between-class scatter and, at the same time, minimizes the within-class scatter measured in each optimal projection matrices. at is, where M j � (1/n j ) n j i:y i �j X i is the average tensor of the samples belonging to class j and M � (1/n) n i�1 X i is the total average tensor of all the samples.
Equation (5) is equivalent to a high-order nonlinear optimization problem with high-order nonlinear constraints. erefore, there are dependencies between different projection matrices to be sought, and it is difficult to find the optimal solution. Consequently, we have to use an iterative optimization method to alternately find the discriminant subspace projection matrices MDSP need. In each iteration, to seek the optimal projection matrix U k , we assume that other projection matrices U 1 , U 2 , . . . , U k−1 , U k+1 , . . . , U h are known. So, the optimization problem with high-order nonlinear constraints with equation (5) is changed to In equation (6), U k is the only unknown variable. us, equation (6) aims to pursue the best projection matrices U k , which maximize the interclass scatter and, at the same time, minimize the intraclass scatter just as the general DSP in equation (2). e entire iterative procedure of MDSP is listed in Algorithm 1.

Classification.
After learning the best projection matrices, a new high-dimensional sample X can be rewritten as F through equation (4). en, the class label of the new sample is predicted by the average tensor of the closest category, that is,

Experiments
In this section, we evaluate the proposed approach for a binary classification problem. e objective of the experiments is to detect two-class finger tapping activities (i.e., left/ right finger tapping) from the EEG signal. In our experiments, the tensor-based scheme for finger tapping classification is shown in Figure 1.

EEG Datasets.
We handle the detection on two datasets with the problem of two-class classification (i.e., left/right finger tapping). We use EEG signal as input data. A brief description of the two datasets is as follows.
Dataset IV < self-paced 1 s > of BCI competition II (also called BCI competition 2003) [26] provided 416 trials of 500 ms length, which were ended 130 ms before pressing a key, and only 316 trials were provided with labels and were regarded as the training set. erefore, the remaining 100 trials without labels were regarded as the testing set. e EEG signals were recorded at 1000 Hz and in a version down-sampled at 100 Hz (recommended) with a band-pass filter between 0.05 and 200 Hz. EEG data for voluntary finger tapping movement [27] is from 14 healthy individuals. e EEG signals were recorded for three conditions of right finger tap, left finger tap, and resting state (in this paper, we only focus on right finger tap and left finger tap) with 19 channels and sampling frequency of 1024 Hz. e dataset consists of 40 trials of 6 s in length (−3 s to 3 s) for each condition and each participant. e dataset was shared after preprocessing and artefacts removal using independent component analysis. Figure 2 shows the average waveforms of EEG signals across trials of one second of a subject.

Preprocessing. For both datasets, a low-pass fifth-order
Butterworth filter with cut-off frequency of 7 Hz is applied to obtain MRCPs samples. en, we use the total segment (−630 ms to −130 ms) and part segment (−330 ms to −130 ms) before the motor movement for features extraction in dataset 1. For dataset 2, the data were subsampled to 100 Hz from the raw signals before filtering by a zero-phase low-pass filter. Each EEG trial was segmented from −2 s to Computational Intelligence and Neuroscience Input: a training set Χ i , y i of EEG signals, Χ i ∈ R m 1 ×m 2 ×···×m h with the class label y i ∈ 1, 2, . . . , p . Output: the optimal projection matrices U k ∈ R m k ×m k ′ | h k � 1 (1) initialize U 1 (t) � I m 1 , U 2 (t) � I m 2 , . . . , U h (t) � I m h , (I means identity matrix).
(2) for t � 1, 2, ..., t max do break (14) end if (15) end for (16)   In this paper, we utilize CWT to reveal the change of EEG signal in time and frequency domains. CWT measures the similarity between the input signal and the basic signal (also called mother wavelet). In the wavelet transform, we select the complex Gaussian wavelet as mother wavelet and preset the length of scale sequence as 10. us, we obtain EEG tensor representation which is three-order shape representing the number of channels, time points, and steps of frequency.

Parameters' Selection.
In this experiment, MDSP needs to choose various parameters, namely, each number of filters in interrelated discriminative subspace, and the start point of the time window. For dataset 1, we seek the optimal parameter combination on the training set and evaluate on the testing set like in the competition. For dataset 2, we choose the optimal parameters by a 5 × 5-fold cross-validation method on the EEG data of each subject. Figure 3 reveals how different parameters affect the whole process. From Figures 3(a)-3(o), it can be inferred that 2-4 filters in each dimension achieve better results. Redundant filters are insignificant for the experiment and even add redundant information, thereby reducing accuracy. is is consistent with the experience of selecting the number of filters in other EEG-based subspace algorithms, such as common spatial pattern (CSP) and DSP. Furthermore, Figure 3(p) presents the best time window (among 0.1-0.8 s before finger tapping), which is also coincident with the time when the NS potential occurs.
Note that, the threshold ε is used to check convergence of the iterative optimization procedures. e mode-k convergence error at iteration t is us, the total convergence error is err(t) � h k�1 err k (t). en, the algorithm is treated as convergence when err (t) ≤ ε, where ε � 0.01. e maximum number of iteration t max is chosen to be 50 in our experiments. While the number of iteration t ≥ t max , we think that the procedure on the combination of projection subspace does not converge, namely, filters in different subspaces cannot work well in coordination.
us, we ignore the combination of parameters. Table 1 gives the results of the three methods with the best parameters on different time segments for the 100 testing trials. Here, MDSP is referred to as MDSP (2D) and MDSP (3D) for problems with tensor of second and third order, respectively. From Table 1, it is observed that (1) MDSP, whether working on 2D or 3D signals, significantly outperforms DSP in each experimental trial. (2) For MDSP, the total segment 0.5 s yields better result than part segment 0.2 s, which implies that MDSP is insensitive to the choice of time periods.

Results and Discussion.
Note that the recognition rates of most previous research studies require better manual selections of time segments. In our experiments, MDSP has better robustness to the time segments and automatically selects the best time information over a longer time segments, thereby reducing the influence of manual selections. Besides, our experiments focus on only low frequencies MRCP-based EEG classification (rather than joint classification of MRCP and ERD-based) because previous work has proven that MRCP can be well combined with ERD in BCI [7,28]. Figure 4 shows the effect of the number of iterations for MDSP on 2D and 3D signals while assuming that all tensor modes have the same reduced dimensionality for simplicity (m k ′ | h k�1 � 4). Although the convergence of MDSP is not guaranteed from the mathematical formula, Figure 4(a) shows that the total error does not increase sharply after a certain number of iterations on the excellent channel.    Computational Intelligence and Neuroscience Additionally, Figure 4(b) displays that MDSP provides a stable classification accuracy as it approaches convergence. Table 2 lists the results of five DSP-based MRCP on dataset IV, BCI competition II. Among them, RST is named regularized spatial-temporal filter, combining DSP and regularization ideas. AST is named adaptive spatial-temporal filter, computing a spatial filter automatically by Gaussian kernel and linear ridge regression (LRR). PSTF is named pipeline of spatial-temporal filter, combined by a series of systematic approaches. Our result is equal to the best result. Furthermore, MDSP considers the tensor form and can naturally combine regularization terms to improve accuracy, just like RST. Figure 5 is the boxplot of the classification obtained with the three methods on the 14 subjects in dataset 2. e boxplot demonstrates that MDSP (3D) performs better than MDSP (2D) and DSP, and MDSP (2D) is more stable than DSP though they perform similarly on accuracy. From these, we conclude that the collaboration of multiple subspaces can greatly enhance the classification capability.
Statistical analysis was performed with one-way analysis of variance (ANOVA) and a multiple comparisons procedure was performed as a post hoc analysis [31]. e statistical results demonstrated a statistically significant difference in the accuracy for the three methods (p < 0.05). Moreover, the post hoc analysis showed that MDSP (3D) had a significantly greater accuracy than MDSP (2D) and DSP (highest p < 0.05).
It implies that the subspaces derived from frequency-domain have additional discrimination capability compared with time domain and spatial domain.

Computational Cost.
For original EEG signal X ∈ R c×t , the time complexity of the MDSP is O(c 3 + t 3 ) for each loop. Specifically, Table 3 compares the time complexity and space complexity of MDSP and DSP. Here, r is the number of iterations that makes the MDSP optimization procedure converge and f is the dimensionality of frequency after CWT.
More generally, for any an nth-order tensor X ∈ R m 1 ×m 2 ×···×m k ×···×m h , the time complexity of the MDSP is O(r( n i�i m 3 i )) and the space complexity O( n i�i m 2 i ). Although the MDSP training procedure requires many loops to converge, it is acceptable for ordinary computer  Methods Accuracy DSP 72 RST [29] 79 AST [29] 79 PSTF [30] 75 MDSP 79

Conclusions
In this paper, we proposed an effective feature extraction method, called MDSP, to seek a series of optimal interrelated projections for discrimination in multiple lower dimensional tensor subspaces. Compared with DSP, MDSP discovers more useful discriminant information by constructing multiple interrelated subspaces and thus has better discrimination ability.

Data Availability
All data included in this study are available from the corresponding author upon request.

Conflicts of Interest
e authors declare that they have no conflicts of interest.
Computational Intelligence and Neuroscience