
Motor-imagery-based brain-computer interfaces (BCIs) commonly use the common spatial pattern (CSP) filter as a preprocessing step before feature extraction and classification. CSP is a supervised algorithm and therefore needs subject-specific training data for calibration, which is very time consuming to collect. To reduce the amount of calibration data needed for a new subject, one can apply multitask (from now on called multisubject) machine learning techniques to the preprocessing phase. Here, the goal of multisubject learning is to learn a spatial filter for a new subject based on its own data and that of other subjects. This paper outlines the details of the multitask CSP algorithm and shows results on two data sets. For certain subjects a clear improvement can be seen, especially when the number of training trials is relatively low.

The development of BCI systems is an active research domain that aims to help people suffering from severe disabilities restore communication with their environment through an alternative interface. Such BCI systems can be divided into several categories based on the signal features they use. Some of these features, like the P300 [

Electroencephalography (EEG) has a low spatial resolution; a commonly used method to improve it is the common spatial pattern (CSP) algorithm introduced by Koles [

One way to further improve a subject-specific CSP filter is to use the data recorded from other subjects in addition to the subject's own data. To this end, we will use ideas from multisubject learning, an active topic in machine learning [

Section

The goal of the basic CSP method is to learn a set of spatial filters for one subject that maximize the signal variance for trials of one class while at the same time minimizing the signal variance for trials of the other classes. For the two-class case, this can be formulated as follows:
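In practice, this two-class optimization reduces to a generalized eigenvalue problem on the two class-covariance matrices. The following is a minimal sketch of that computation (our own illustration, not the paper's code); the trial format, per-trial power normalization, and the choice to take filters from both ends of the spectrum are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_filters=2):
    """Two-class CSP via a generalized eigenvalue problem.

    trials_a, trials_b: lists of (channels x samples) arrays, one list
    per class. Returns (channels x n_filters) filters: the first half
    maximizes relative variance for class B, the last half for class A.
    """
    def avg_cov(trials):
        covs = []
        for x in trials:
            c = x @ x.T
            covs.append(c / np.trace(c))  # per-trial power normalization
        return np.mean(covs, axis=0)

    ca, cb = avg_cov(trials_a), avg_cov(trials_b)
    # solve ca w = lambda (ca + cb) w; eigenvalues are variance ratios
    evals, evecs = eigh(ca, ca + cb)
    evecs = evecs[:, np.argsort(evals)]  # ascending variance ratio
    half = n_filters // 2
    idx = list(range(half)) + list(range(-(n_filters - half), 0))
    return evecs[:, idx]
```

Applying the returned filters to a trial (`W.T @ x`) yields "unmixed" sources whose variances discriminate between the two classes.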

We now want to use data of other subjects to improve the filters for specific subjects. To accomplish this, we first need a spatial filter

The parameters

The above equation can be rewritten in a simpler form, namely, a sum of convex-to-convex ratios
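Although the referenced equations are not reproduced in this excerpt, the construction can be sketched as follows; the notation here ($\mathbf{w}_0$, $\mathbf{v}_s$, $C_s^{(1)}$, $C_s^{(2)}$) is our own reading, not necessarily the paper's exact symbols. Decomposing subject $s$'s filter into a shared and a subject-specific part, $\mathbf{w}_s = \mathbf{w}_0 + \mathbf{v}_s$, the multisubject objective takes a form like

```latex
\max_{\mathbf{w}_0,\,\mathbf{v}_1,\dots,\mathbf{v}_S}\;
\sum_{s=1}^{S}
\frac{(\mathbf{w}_0+\mathbf{v}_s)^{\top}\, C_s^{(1)}\, (\mathbf{w}_0+\mathbf{v}_s)}
     {(\mathbf{w}_0+\mathbf{v}_s)^{\top} \bigl(C_s^{(1)}+C_s^{(2)}\bigr) (\mathbf{w}_0+\mathbf{v}_s)}
```

where $C_s^{(1)}$ and $C_s^{(2)}$ denote subject $s$'s class covariance matrices. Each summand is a ratio of two convex quadratic forms, which is exactly the sum of convex-to-convex ratios mentioned in the text.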

To find the maximum of (

From here on, this method is denoted by the abbreviation “mtCSP.”

Before giving the details of the cluster-based multisubject CSP algorithm, we present an optimization algorithm for clustering CSP filters. This algorithm is inspired by [

So, let us start with a simplified version of the optimization framework proposed in [

For spatial filters, however, we have to find a more appropriate metric. As explained in [
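Since a spatial filter is only defined up to scale and sign, a plain Euclidean distance between filter vectors is not meaningful. As an illustration only (the metric actually used is the one from the cited work), one natural scale- and sign-invariant choice is the angle-based distance $1 - |\cos\theta|$, and a k-means-style clustering of filters under it can be sketched as:

```python
import numpy as np

def filter_distance(w1, w2):
    """Scale- and sign-invariant distance between two spatial filters:
    1 - |cos(angle)|; 0 iff collinear, 1 iff orthogonal."""
    c = abs(w1 @ w2) / (np.linalg.norm(w1) * np.linalg.norm(w2))
    return 1.0 - min(c, 1.0)

def cluster_filters(filters, k=2, n_iter=50):
    """k-means-style clustering of spatial filters (illustrative sketch).

    Filters are normalized to unit length; each prototype is updated as
    the dominant eigenvector of its members' outer-product sum, i.e., the
    best one-dimensional subspace fit, which respects sign invariance.
    """
    w = np.array([f / np.linalg.norm(f) for f in filters])
    # greedy farthest-point initialization under the angle metric
    protos = [w[0]]
    while len(protos) < k:
        d = np.min([1.0 - np.abs(w @ p) for p in protos], axis=0)
        protos.append(w[int(np.argmax(d))])
    protos = np.array(protos)
    labels = np.full(len(w), -1)
    for _ in range(n_iter):
        new_labels = np.abs(w @ protos.T).argmax(axis=1)  # nearest prototype
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = w[labels == j]
            if len(members):
                protos[j] = np.linalg.eigh(members.T @ members)[1][:, -1]
    return labels, protos
```

The eigenvector-based prototype update is the analogue of the k-means centroid step on the space of filter directions.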

In Section

First, we introduce multiple shared filters

Finally, we also want to find a good initialization for the variables. To accomplish this, we use the clustering algorithm described in Section

For the simulated data we generate two clusters of 20 similar tasks. The training set of each task contains data for two conditions, with 15 samples per condition. The source variables are generated from a two-dimensional Gaussian distribution with zero mean and covariance matrix dependent on the condition, but the same for both clusters and all tasks,
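The simulated-data setup described above might be generated as follows. This is a sketch under stated assumptions: the actual covariance values are elided in the text, so the values below are illustrative, and letting the clusters differ through a mixing matrix is our own choice of construction.

```python
import numpy as np

def make_cluster_tasks(n_tasks=20, n_samples=15, mix=None, seed=0):
    """Generate two-condition training data for one cluster of tasks.

    Sources come from zero-mean 2-D Gaussians whose covariance depends
    only on the condition (shared across clusters and tasks, as in the
    text); the specific values and the per-cluster mixing matrix are
    illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    cov1 = np.diag([4.0, 1.0])  # condition 1 source covariance (assumed)
    cov2 = np.diag([1.0, 4.0])  # condition 2 source covariance (assumed)
    if mix is None:
        mix = np.eye(2)
    tasks = []
    for _ in range(n_tasks):
        x1 = mix @ rng.multivariate_normal([0, 0], cov1, n_samples).T
        x2 = mix @ rng.multivariate_normal([0, 0], cov2, n_samples).T
        tasks.append((x1, x2))
    return tasks

# two clusters that differ only in how the sources are mixed
theta = np.pi / 6
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
cluster_a = make_cluster_tasks(mix=np.eye(2), seed=1)
cluster_b = make_cluster_tasks(mix=rot, seed=2)
```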

Subplot (a) shows the training set used to compute both the bCSP and clmtCSP filters. The data points themselves are not plotted; instead we draw only the standard-deviation contours of the data's estimated covariance matrix, together with the corresponding principal vectors (representing the ellipse's principal axes). Blue and black contours correspond to the first class or condition, while green and red contours represent the other class. The goal of the computed filters is to align the principal vectors to the axes. The results for bCSP and clmtCSP are shown in subplots (b) and (c), respectively. Here, the contours denote the standard deviations according to the estimated covariance matrix of the "unmixed" sources. For the clmtCSP method, contours drawn in blue and green indicate that the task has been assigned to the first cluster by the algorithm; red and black indicate the second cluster. The true cluster number is given in the title of each subplot.

Figure panels: (a) training set, (b) bCSP, (c) clmtCSP.

(a) compares the variance ratios of the bCSP solution with the clmtCSP solution on the first cluster, while (b) makes the comparison for tasks of the second cluster. The number above or below each pair of bars is the


For the experimental data sets we use data of the third BCI competition (BCIC3 data set (on

The set of the BCI competition contains data recorded from 118 electrodes while the subjects performed two tasks: right-hand motor imagery and foot motor imagery. Five subjects are included in the set, with 280 trials recorded per subject. We take a fixed test set of the last 180 trials, while the first 100 are retained to construct the training sets. To limit the number of parameters that need to be computed by the optimization algorithm, the number of channels is reduced to 22. The selected channels are Fp1, Fpz, Fp2, F7, F3, Fz, F4, F8, T7, C3, Cz, C4, T8, P7, P3, Pz, P4, P8, POz, O1, Oz, and O2.
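Reducing the 118-channel montage to the 22 listed channels amounts to indexing rows of the EEG array by electrode name. A minimal sketch (the `select_channels` helper and the recording-order name list are our own illustration):

```python
import numpy as np

# the 22 channels retained in the text
SELECTED = ["Fp1", "Fpz", "Fp2", "F7", "F3", "Fz", "F4", "F8", "T7", "C3",
            "Cz", "C4", "T8", "P7", "P3", "Pz", "P4", "P8", "POz", "O1",
            "Oz", "O2"]

def select_channels(eeg, channel_names, keep=SELECTED):
    """Reduce an EEG array (channels x samples) to the chosen subset.

    channel_names lists the montage labels in recording order; rows of
    `eeg` are returned reordered to match `keep`.
    """
    idx = [channel_names.index(name) for name in keep]
    return eeg[idx, :]
```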

In the MPI set, each subject performed 30 left-hand motor imagery trials and 30 right-hand motor imagery trials. This was repeated once for the test set, resulting in a total of 120 trials per subject. The same subset of electrodes is used as before, except for two channels that were not recorded for some of the subjects.

As there are only five subjects in the BCIC3 data set, we assume that all subjects are similar. Consequently, we simply apply the first proposed algorithm, that is, mtCSP. The MPI data set, however, contains too many subjects to assume that they are all similar. Hence, we apply the cluster-based "clmtCSP" method with a predefined number of clusters, namely, three. Four clusters seem too many for only 14 subjects, as this could leave some clusters with very few subjects. On the other hand, we did not choose two clusters for reasons of complexity, as fewer clusters increase the number of subjects per cluster and thus the dimensionality of problem (

All signals are band-pass filtered between 8 and 30 Hz. The trade-off parameters
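The 8-30 Hz band-pass covers the mu and beta rhythms relevant for motor imagery. A sketch of this preprocessing step; the Butterworth design, filter order, and zero-phase filtering are implementation choices of ours, not specified in the text:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_8_30(eeg, fs, order=4):
    """Zero-phase band-pass filter between 8 and 30 Hz.

    eeg: (channels x samples) array, fs: sampling rate in Hz.
    filtfilt applies the filter forward and backward, avoiding phase
    distortion at the cost of doubling the effective order.
    """
    nyq = fs / 2.0
    b, a = butter(order, [8.0 / nyq, 30.0 / nyq], btype="band")
    return filtfilt(b, a, eeg, axis=-1)
```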

Figure

Cross-validation accuracies per parameter combination of

BCIC3 data set


The results for each subject separately are given in Tables

Classification accuracies per subject for the BCI competition data set.

| Subject | bCSP (5 trials) | mtCSP (5 trials) | bCSP (10 trials) | mtCSP (10 trials) | bCSP (20 trials) | mtCSP (20 trials) | bCSP (30 trials) | mtCSP (30 trials) |
|---|---|---|---|---|---|---|---|---|
| | 0.49 | 0.73 | 0.54 | 0.64 | 0.66 | 0.71 | 0.61 | 0.69 |
| | 0.80 | 0.73 | 0.95 | 0.93 | 0.95 | 0.94 | 0.94 | 0.94 |
| | 0.56 | 0.58 | 0.59 | 0.63 | 0.44 | 0.62 | 0.56 | 0.64 |
| | 0.69 | 0.57 | 0.69 | 0.56 | 0.66 | 0.58 | 0.55 | 0.54 |
| | 0.92 | 0.86 | 0.84 | 0.93 | 0.85 | 0.85 | 0.88 | 0.87 |
| Mean | 0.69 | 0.69 | 0.72 | 0.74 | 0.71 | 0.74 | 0.75 | 0.79 |

Classification accuracies per subject for the MPI set.

| Subject | bCSP (5 trials) | clmtCSP (5 trials) | bCSP (10 trials) | clmtCSP (10 trials) | bCSP (20 trials) | clmtCSP (20 trials) |
|---|---|---|---|---|---|---|
| 1 | 0.80 | 0.68 | 0.78 | 0.73 | 0.85 | 0.85 |
| 2 | 0.85 | 0.83 | 0.83 | 0.77 | 0.87 | 0.85 |
| 3 | 0.45 | 0.43 | 0.53 | 0.57 | 0.58 | 0.60 |
| 4 | 0.58 | 0.53 | 0.72 | 0.75 | 0.77 | 0.77 |
| 5 | 0.53 | 0.47 | 0.52 | 0.48 | 0.62 | 0.60 |
| 6 | 0.58 | 0.67 | 0.60 | 0.60 | 0.70 | 0.70 |
| 7 | 0.83 | 0.92 | 0.90 | 0.92 | 0.95 | 0.95 |
| 8 | 0.38 | 0.52 | 0.48 | 0.48 | 0.53 | 0.53 |
| 9 | 0.57 | 0.70 | 0.58 | 0.62 | 0.63 | 0.63 |
| 10 | 0.68 | 0.53 | 0.60 | 0.62 | 0.63 | 0.60 |
| 11 | 0.50 | 0.53 | 0.42 | 0.52 | 0.40 | 0.43 |
| 12 | 0.52 | 0.68 | 0.65 | 0.70 | 0.63 | 0.63 |
| 13 | 0.62 | 0.60 | 0.63 | 0.58 | 0.57 | 0.60 |
| 14 | 0.53 | 0.53 | 0.50 | 0.47 | 0.55 | 0.57 |
| Mean | 0.68 | 0.70 | 0.70 | 0.70 | 0.71 | 0.71 |

The first thing we notice for the BCIC3 set is that for 5 trials (from here on, we state the number of training trials per class; e.g., when we mention 5 training trials, we mean 5 trials per class, thus 10 in total) the impact of the multisubject version is relatively low, although this is the regime where we suspected the impact would be the largest. Nevertheless, for some subjects, like subject

The difference between the two methods becomes apparent in the case of ten training trials, where the mtCSP method achieves accuracies better than or equal to those of the bCSP method on all subjects, except again subject

Table

We presented a multisubject extension of the basic CSP algorithm to reduce the number of training trials and to improve performance by learning spatial filters across subjects. It involves a nonconvex optimization problem, so a global solution is not guaranteed when employing standard optimization techniques. However, the optimization of such sums of convex-to-convex ratios is an active topic in optimization theory, and we can expect implementations to become available that guarantee global convergence and scale to high-dimensional problems. The authors in [

The main downside of the proposed methods is that we have to perform cross-validation to select good parameter values. Firstly, this takes time to compute, rendering the methods impractical, as one could record more data within that time frame to compute good filters. Secondly, enough data needs to be available to determine the parameter values through cross-validation, which of course conflicts with the aim of the proposed algorithms, namely to reduce the number of training trials. To find indicators of the methods' potential with a low number of training trials, we performed cross-validation by averaging scores over several folds and subjects, which leads to more stable and reliable estimates of the parameter values. We then chose the same parameter values for all subjects. The need for cross-validation could, however, be avoided by employing a Bayesian framework; learning a model across several subjects in this framework through shared priors will be a topic of future research.
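The shared-parameter selection described above can be sketched as a small grid search; the `cv_score` callback (wrapping the actual train/evaluate step) and the grid are assumptions for illustration:

```python
import numpy as np

def select_shared_params(cv_score, subjects, param_grid, n_folds=5):
    """Choose one parameter setting shared by all subjects.

    Averages cross-validation accuracy over folds *and* subjects, as
    described in the text, and returns the best setting. cv_score is an
    assumed callback: cv_score(subject, params, fold) -> accuracy.
    """
    best_params, best_acc = None, -np.inf
    for params in param_grid:
        accs = [cv_score(s, params, f)
                for s in subjects for f in range(n_folds)]
        mean_acc = float(np.mean(accs))
        if mean_acc > best_acc:
            best_params, best_acc = params, mean_acc
    return best_params, best_acc
```

Averaging over subjects trades per-subject optimality for stability, which matches the paper's motivation: with few trials per subject, a per-subject estimate of the trade-off parameters would be unreliable.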

An open question is how the proposed methods compare to other CSP variants that learn from other subjects [

Due to the way we perform cross-validation, it is impossible to show the methods' full potential. Nevertheless, some of the results indicate that (cluster-based) multisubject learning for CSP leads to a noticeable improvement for some subjects. That other subjects suffer from these methods could be avoided if the trade-off parameters could be chosen reliably for each new subject separately with little training data.

Finally, we want to add that this manner of including the clustering in the optimization problem may be employed for cluster-based multisubject classifiers too. Note that Fisher's discriminant analysis [