A Combination of Pre- and Postprocessing Techniques to Enhance Self-Paced BCIs

Mental task onset detection from the continuous electroencephalogram (EEG) in real time is a critical issue in self-paced brain-computer interface (BCI) design. The paper shows that self-paced BCI performance can be significantly improved by combining a range of simple techniques including (1) constant-Q filters, whose bandwidth varies with the center frequency, instead of constant bandwidth filters for frequency decomposition of the EEG signal in the 6 to 36 Hz band; (2) subject-specific optimization of the postprocessing parameters, namely dwell time and threshold; and (3) debiasing before postprocessing, which readjusts the classification output based on the current and previous brain states to reduce the number of false detections. This debiasing block is shown to be optimal when activated only in special cases which are predetermined during the training phase. Analysis of the data recorded from seven subjects executing foot movement shows a statistically significant 10% (P < 0.05) average improvement in true positive rate (TPR) and a 1% reduction in false positive rate (FPR) compared with previous work on the same data.


Introduction
A brain-computer interface (BCI) is an alternative communication and enabling technology which offers the potential to provide a nonmuscular path for physically impaired individuals to convey messages and commands to the external world [1]. Various applications of BCI for able-bodied individuals have also been investigated [2-4]. Depending on the mode of operation, BCI systems are categorized into two main classes: synchronous (system paced or cue paced) and asynchronous (self-paced). In a synchronous BCI, the earliest type of BCI, the analysis, classification, and output activation of the system are carried out during predefined time intervals controlled by the machine/computer using cues; that is, the onset of the mental activity is known in advance. This mode of interaction is not natural for most typical applications since the system dictates when the user can control the application, and it must be switched off when the user does not wish to use it so that control cues are stopped. In contrast, asynchronous or self-paced systems are more natural for real-life applications since they allow the user to control the system when desired. These systems have two operational states: intentional control (IC) and no control (NC) [5]. During the IC state the user controls and activates the BCI by intentionally varying their brain signals. NC is the period in which the user is free to perform any action other than activating the BCI, such as watching TV, reading, relaxing, or eating. Continuous classification of the EEG signal is required to reveal the onset of IC or mental activity. The ultimate goal of a self-paced system is to detect the IC states and activate the system in these periods while staying completely inactive during NC periods. Therefore, the percentage of true output activations during IC states, the true positive rate (TPR), and the percentage of false activations during NC states, the false positive rate (FPR), determine the performance of a self-paced BCI system.
One of the most common mental strategies for BCIs is motor imagery because its features are well-defined physiologically. The movement-related potential (MRP) is a response in the EEG signal to a particular limb movement; it lies in the frequency band below 4 Hz and starts about 1 to 1.5 s before the movement onset [6]. In addition, due to the movement or imagination of the movement, the EEG signal energy in specific frequency bands and in specific regions of the brain fluctuates, producing an event-related desynchronization (ERD) before and during movement and an event-related synchronization (ERS) in the beta frequency band after termination of the movement [7]. As ERD/S features are observable in both real and imaginary movement [8,9], and more accurate labeling of the EEG signal is possible in asynchronous experiments with real movement, real movement data are used for testing new machine learning algorithms in most self-paced BCIs [5,10-19].
Detection of only one movement from the ongoing EEG signal has been considered in self-paced BCI configurations by different BCI groups [5,10-19]. BCIs capable of detecting only one brain pattern from the continuous EEG signal are referred to as brain switches and are suitable for controlling different applications.
The first self-paced BCI system, referred to as Low-Frequency-Asynchronous Switch Design (LF-ASD), was proposed in [5] by Mason and Birch and was designed to detect the MRPs in the EEG signal recorded during right index finger movement. A wavelet-like function was applied to extract the features, and a 1-nearest neighbor (1-NN) classifier was used to distinguish the IC and NC classes/states. Several changes such as adding the energy normalization transform in the feature extraction block [10], adding a moving average and debounce window in a postprocessing block for decreasing the FPR [11], subject customization of the feature generator's parameter [12], and incorporating the knowledge of the past paths of features into the system [13] have been applied for improving the performance of the LF-ASD.
In the latest design of this switch, Fatourechi et al. [14] proposed an improved version of the LF-ASD that detects the IC states by extracting features from three neurological phenomena: movement-related potentials, changes in the power of Mu rhythms, and changes in the power of Beta rhythms. A stationary wavelet transform followed by matched filtering was applied as the feature extraction method. A separate set of SVM classifiers was used to discriminate each neurological phenomenon from the idle state. Although the offline results reported in that paper show a significant improvement in LF-ASD performance, the EEG signals were not continuously classified. Another drawback of this design is that the NC periods were recorded in a special situation where the subject was asked to count the number of times that a white ball bounced off the screen [14].
In [15] movement onset detection from 64 EEG channels recorded during right-hand movement was investigated. The power spectral density of each EEG channel was estimated by the Thomson multitaper method for narrow-band spectral analysis, and the best features were selected using the Davies-Bouldin index. A naïve Bayes classifier was applied to classify each sample to detect the movement onset. In another work of this group [16], the first fully unsupervised system for self-paced BCIs was suggested. An unsupervised classification method based on a Gaussian Mixture Model (GMM) was applied.
In [17] Qian et al. developed a novel paradigm for a motor-imagery-based brain-controlled switch that was interactive in the sense that the users performed repeated attempts until the switch was turned on. The beta band event-related frequency power from a single EEG Laplacian channel, recorded during the motor imagery of finger movement, was monitored online. When the relative ERD power level exceeded a predetermined threshold the switch was turned on.
Another brain switch designed in [18] proved the suitability of one single Laplacian derivation for detecting foot movement in ongoing EEG. Twenty-nine band pass filters with 2 Hz bandwidth from 6 to 36 Hz with 1 Hz overlap were applied for extracting the band power values of the EEG signal. Two distinct SVM classifiers were used to detect ERD and ERS patterns separately. In the postprocessing block a fixed dwell time and fixed refractory period for all 7 subjects were used to reduce the false detections of the brain switch. Using receiver operating characteristic (ROC) for balancing TP and FP, each SVM classification performance and a combination of the SVM outputs with a product rule were reported. The results demonstrate that the ERS patterns are more successful in detecting the onset of the foot movement in ongoing EEG signal. The result of [19] also proves that ERS phenomena are suitable for realizing a brain switch due to some features such as its subject-specific stability, specificity, and somatotopic organization. According to the above characteristic of ERS, in this paper we only consider ERS as a neurological phenomenon for discriminating foot movement onset from the idle state.
In this paper we improve the onset detection performance of the brain switch designed in [18]. For frequency decomposition of the EEG signal we apply constant-Q filters instead of constant bandwidth filters in self-paced BCI systems. Constant-Q frequency decomposition has previously been shown to produce better classification accuracies in discriminating right and left motor imagery in synchronous motor-imagery-based BCI systems [20,21]. We show that these filters significantly improve the performance of the brain switch and reveal the ERS features in the ongoing EEG signal much better than constant bandwidth filters.
Another innovation proposed in this paper is selecting the optimum postprocessing parameters, such as dwell time and threshold, for each subject and each combination of train/test runs. Most of the research in self-paced system design has a special postprocessing block for decreasing the FPR. Event-by-event analysis of self-paced BCI systems was proposed in [22], where "threshold", "dwell time", and "refractory period" were also introduced as its adjustable parameters. In most self-paced systems [18,19], "dwell time" and "refractory period" are fixed for all the subjects, and the best results achieved in the test phase by changing the threshold are reported. For online application, however, all the parameters should be available from the training phase. Therefore a fixed threshold should be selected from the training data of each subject. In this paper "dwell time" and "threshold" are selected in the training phase, and the refractory period is fixed for all the subjects. We observed that the selected threshold and dwell time are sometimes very low for the test phase classification output, and in this situation the false positive rate increases. Therefore we apply a debiasing block before postprocessing which decreases the FPR by readjusting the classification output based on the current and previous brain states. This block is activated only in special cases which are determined from the training phase, because in some cases adding debiasing decreases the TPR.
The remainder of the paper is organized as follows. Section 2 outlines data acquisition and the methodology of feature extraction, classification, performance evaluation, and postprocessing parameter selection. Results of using constant-Q filters and optimum postprocessing parameters in the brain switch performance are illustrated and discussed in Section 3; finally conclusions are presented in Section 4.

Data Description.
Our analysis is performed on the data provided by the laboratory of Brain Computer Interface (BCI-Lab), Graz University of Technology [18]. Data was acquired from 7 subjects during the execution of a cue-based foot movement. Each subject performed 3 runs with 30 trials on the same day. At the beginning of the trial (t = 0) a "+" was presented; then at t = 2 s the presentation of an arrow pointing downwards cued the subject to perform a brisk movement of both feet for about 1-second duration. The cross and cue disappear at t = 3.25 s and at t = 6 s, respectively. The trial ends at t = 7.5 s. In between trials, a wait period of maximum 1 second occurs (Figure 1(a)). The recording was made using a g.tec amplifier and Ag/AgCl electrodes. The sampling frequency was 250 Hz. Sixteen monopolar EEG channels covering sensorimotor areas were measured. From these data, one small Laplacian derivation [23] over electrode position Cz was computed using orthogonal neighbor electrodes (Figure 1(b)). The surface Laplacian is approximated as follows:

$$V_{Cz}^{\mathrm{LAP}} = V_{Cz} - \frac{1}{4} \sum_{j \in S_j} V_j,$$

where $V_{Cz}$ is the scalp potential EEG of the Cz channel, $S_j$ is a set of four orthogonal neighbor electrodes, and $V_j$ is the scalp potential of neighbor electrode $j$.
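The small Laplacian derivation can be illustrated with a minimal Python sketch. It assumes the usual form of a small Laplacian, the centre channel minus the mean of its four orthogonal neighbours; the function name `small_laplacian` and the toy sample values are hypothetical and are not taken from the dataset.

```python
def small_laplacian(center, neighbors):
    """Centre channel minus the mean of its four orthogonal neighbours.

    center    -- list of samples from the centre electrode (Cz here)
    neighbors -- list of four equally long sample lists, one per neighbour
    """
    if len(neighbors) != 4:
        raise ValueError("a small Laplacian uses exactly four neighbours")
    return [
        c - sum(channel[i] for channel in neighbors) / 4.0
        for i, c in enumerate(center)
    ]


# Toy data (hypothetical values in microvolts), three samples per channel.
cz = [12.0, 12.0, 9.0]
around = [
    [8.0, 10.0, 8.0],
    [9.0, 11.0, 8.0],
    [11.0, 13.0, 8.0],
    [12.0, 14.0, 8.0],
]
laplacian = small_laplacian(cz, around)
```

Activity that is common to the centre electrode and its neighbours cancels out, which is why a single derivation of this kind can emphasize local activity under Cz.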

Feature Extraction.
Finding a suitable representation of the data which makes the classification or detection of brain patterns easier is the goal of feature extraction. We select an appropriate feature for extracting ERS as a stable and more detectable movement-related pattern from spontaneous EEG signals. The energy increase in specific frequency bands as a result of correlated deactivation of neural networks in specific cortical areas of the brain is referred to as event-related synchronization (ERS). Band power, which reveals the energy or power fluctuations of the signal in specific frequency bands, is employed in this paper. Since band power features have low computational requirements, they have been used widely in fast online BCI signal processing in self-paced applications [24-26].

Frequency Decomposition.
Frequency decomposition of a signal is done using constant bandwidth or constant-Q (Q is the quality factor) filters. In constant-Q frequency decomposition, the ratio of center frequency to bandwidth for all the filters is the same and equal to Q. In other words, for low frequencies the frequency resolution is better while for high frequencies the time resolution is better. After selecting center frequencies, for different values of Q, the bandwidth of each filter is calculated as

$$Bw = \frac{f_c}{Q},$$

where Bw is the filter's bandwidth, $f_c$ is the center frequency, and Q is the quality factor of the filter. Different values of Q result in various frequency decompositions of the signal. If Q is selected to be small, the bandwidth of the filters is large. The wideband signal components might be more contaminated with substantial noise in the EEG signal. For large Q, the bandwidth of the filters is small. In this situation the percentage of overlap of neighboring frequency bands decreases, so the filter bank cannot provide adequate redundancy of signals. In this paper, for two different Q ratios (Q = 2 and Q = 3), we constructed two sets of fifth-order Butterworth bandpass filters with center frequencies at 6, 6.9, 7.8, 9, 10.2, 11.7, 13.4, 15.3, 17.5, 20.0, 22.8, 26.1, 29.8, and 33.5 Hz, as suggested in [21], which cover the total range from 6 to 36 Hz. The frequency responses of these filters with Q = 2 are illustrated in Figure 2. Filter banks with constant Q may increase the redundancy of information in the feature set. The reasons behind the wide frequency band selection from 6 to 36 Hz are as follows: firstly, the significant frequency characteristics of motor-related patterns are in the beta (13-30 Hz) and mu (8-12 Hz) rhythm components [27], and secondly, the optimal frequency bands of ERS vary among subjects.
The ERD/S time-frequency maps of all subjects in Figure 3 show the differences between the subjects' optimum ERS frequency bands. For each subject, the ERD/S map is plotted using constant-bandwidth filters and constant-Q filters. Figure 3 was plotted using the ERDS map toolbox of BioSig [28].
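To make the constant-Q relation concrete, the following Python sketch computes the bandwidth and band edges for the 14 center frequencies listed above. Placing each band symmetrically around its center frequency is an assumption made for illustration, since the text only fixes the ratio of center frequency to bandwidth; the function name `constant_q_bands` is hypothetical.

```python
# Center frequencies in Hz, as listed in the text.
CENTER_FREQS_HZ = [6, 6.9, 7.8, 9, 10.2, 11.7, 13.4,
                   15.3, 17.5, 20.0, 22.8, 26.1, 29.8, 33.5]


def constant_q_bands(center_freqs, q):
    """Return one (low, high) band-edge pair in Hz per center frequency.

    The bandwidth is fc / q. Centring the band arithmetically on fc is an
    assumption of this sketch, not something stated in the paper.
    """
    bands = []
    for fc in center_freqs:
        bandwidth = fc / q
        bands.append((fc - bandwidth / 2.0, fc + bandwidth / 2.0))
    return bands


bands_q2 = constant_q_bands(CENTER_FREQS_HZ, 2)  # wider bands
bands_q3 = constant_q_bands(CENTER_FREQS_HZ, 3)  # narrower bands
```

Each pair of edges could then be handed to a bandpass design routine; with Q = 2 the first band is 3 Hz wide and the last is 16.75 Hz wide, which shows how the bandwidth grows with center frequency.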

Band Power.
During offline analysis, 28 logarithmic band power features were extracted from time segments of 1 s length to give a comprehensive spectral description of the EEG signals from 6 Hz to 36 Hz. Each segment has 250 samples with an overlap of 125 samples between adjacent segments. The logarithmic band power features were computed with two sets of constant-Q filters (Section 2.2.1). Each time segment of 1-second length was digitally band pass filtered, squared and averaged over all samples within the time segment and transformed with logarithm.
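The band power computation described above (square, average over a 1 s segment, take the logarithm, with 125 samples of overlap) can be sketched in Python as follows. The sketch assumes the input has already been band-pass filtered and uses the natural logarithm, since the text does not state the base; the test signal is synthetic.

```python
import math


def log_band_power(filtered, win_len=250, step=125):
    """Log of the mean squared amplitude of each window.

    filtered -- samples of a signal that is assumed to be band-pass filtered
    win_len  -- window length in samples (250 samples = 1 s at 250 Hz)
    step     -- hop between windows (125 samples gives 125 samples overlap)
    """
    features = []
    for start in range(0, len(filtered) - win_len + 1, step):
        segment = filtered[start:start + win_len]
        power = sum(x * x for x in segment) / win_len
        features.append(math.log(power))  # natural log: an assumption
    return features


# Synthetic 10 Hz sine, 3 s at 250 Hz, amplitude doubled in the last second.
fs = 250
signal = [
    (2.0 if n >= 2 * fs else 1.0) * math.sin(2 * math.pi * 10 * n / fs)
    for n in range(3 * fs)
]
features = log_band_power(signal)
```

The amplitude increase in the last second shows up as a higher log band power in the final window, which is the kind of change an ERS produces in the relevant band.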

EEG Data Labeling.
The continuous EEG data is categorized into two classes: baseline and movement. According to the results of [18,19] the ERS occurring after the end of the motor task is the dominant feature for realizing an asynchronous brain switch. Therefore all the samples were labeled for the classification of ERS against all other brain activities. According to the ERD/S maps of the subjects (Figure 3), the ERS happens mostly in t = 4-5 seconds in each trial. Therefore the samples in t = 4-5 s of each trial are labeled as the movement class (class 1), and the rest of the samples are labeled as baseline (class 0). The data labeling is the same as in [18].
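The labeling rule can be written as a short Python sketch. The trial start indices are hypothetical, and treating the 4-5 s window as half-open (4 s included, 5 s excluded) is an assumption of the sketch.

```python
FS = 250  # sampling frequency in Hz, from the data description


def label_samples(n_samples, trial_starts, ers_window=(4.0, 5.0), fs=FS):
    """Label samples inside the ERS window of each trial as 1, all others 0.

    trial_starts -- sample index at which each trial begins (t = 0)
    ers_window   -- (start, end) in seconds relative to trial start;
                    the end is excluded, which is an assumption
    """
    labels = [0] * n_samples
    for start in trial_starts:
        first = start + int(ers_window[0] * fs)
        last = min(start + int(ers_window[1] * fs), n_samples)
        for i in range(first, last):
            labels[i] = 1
    return labels


# Two hypothetical trials starting at 0 s and 8 s in a 16 s recording.
labels = label_samples(n_samples=16 * FS, trial_starts=[0, 8 * FS])
```

Only 1 s out of each trial is labeled as movement, so the two classes are strongly unbalanced in favour of baseline.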

2.4. Classification.
Support vector machines (SVMs) are supervised learning methods that classify the data by constructing an N-dimensional hyperplane for a given feature set. The SVM has several advantages: it generalizes well as a result of selecting the hyperplane which maximizes the margins, it is less prone to overtraining, and it is insensitive to the curse of dimensionality.
The Gaussian-kernel-based SVM classifier has been used successfully in self-paced BCI research [14,18,19,29]. The SVM performance depends on the regularization parameter C and the Gaussian kernel bandwidth σ. These parameters should be properly selected in the training phase. The goal is to identify C and σ using training data so that the classifier can accurately classify testing data. We use the libsvm software [30] for implementing the SVM since this software provides the posterior class probabilities in the output. In order to combine both conditions into a single measure we calculate the Youden index [31] TF for each (C, σ) pair as follows:

$$TF = TPR - FPR.$$

The best C and σ, that is, the pair which maximizes TF, and the whole training set (two runs) are used to train a final SVM [18].
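Selecting the parameter pair by the Youden index can be sketched as follows. The sketch assumes that sample-by-sample validation predictions are already available for every candidate pair; the candidate values of C and σ and the toy predictions are hypothetical, and no SVM is trained here.

```python
def youden_index(labels, predictions):
    """TF = TPR - FPR, computed sample by sample for binary 0/1 labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    return tp / positives - fp / negatives


def best_pair(results):
    """Return the (C, sigma) key whose predictions give the largest TF.

    results -- dict mapping (C, sigma) to (labels, predictions)
    """
    return max(results, key=lambda pair: youden_index(*results[pair]))


# Hypothetical validation output for two candidate parameter pairs.
truth = [1, 1, 0, 0, 0, 0]
results = {
    (1.0, 0.5): (truth, [1, 0, 0, 0, 0, 1]),   # TPR 0.5, FPR 0.25
    (10.0, 0.5): (truth, [1, 1, 1, 0, 0, 0]),  # TPR 1.0, FPR 0.25
}
selected = best_pair(results)
```

The function expects at least one sample of each class; a validation fold without movement samples would need separate handling.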

2.4.2. Testing.
The remaining run is used for testing the trained SVM. In order to simulate an online asynchronous system, we continuously compute logarithmic band power features applying a 1-second moving window at the rate of the sampling interval. The SVM classifier calculates the posterior class probability for patterns of the test run (Figure 4).

Performance Evaluation.
Performance measurement of the online self-paced paradigm is carried out in an event-by-event manner, while in the training phase TPR and FPR were measured on the basis of sample-by-sample analysis. Before event-by-event analysis, the event class posterior probability of the classifier was postprocessed using threshold, dwell time, and refractory period [22]. Dwell time is the amount of time that the output signal of the classifier must exceed the threshold to be considered as a control event. When one control event is detected, the output signal is ignored during a refractory period. If the control event is detected in the intentional control (IC) period of a trial, it is regarded as a true positive, but any detection outside the IC period is a false positive. For evaluation the time interval from t = 3 to 5.5 seconds of each trial is considered as the IC period. This interval is the same for all the subjects. The refractory period is also fixed for all the subjects and is equal to 750 samples or 3 seconds. The refractory period limits the number of detections of the brain switch; longer values reduce the number of false activations at the expense of lowering the bit rate. The event-by-event TPR and FPR are computed as in [18] from TP and FP, the numbers of true positive and false positive detections, respectively, and NTP, the number of IC periods. Since only one detection is allowed in the brain switch design, for this dataset NTP = 30 (30 trials in the test run). Figure 5 shows the onset detection of the brain switch during 5 trials of the test run. Determining the threshold and dwell time for each subject and each combination of runs plays a very important role in the final performance of the system in terms of detecting the movement onset.
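The threshold, dwell time, and refractory period logic can be sketched in Python as below. Several details are assumptions of the sketch rather than statements from the paper: an event is time-stamped at the sample where the dwell time is completed, IC periods are given as half-open sample ranges, and only the TPR is normalized (by the number of IC periods), while the FP count is returned raw because the FPR normalization of [18] is not reproduced here.

```python
def detect_events(prob, threshold, dwell, refractory):
    """Return the sample indices at which the brain switch fires.

    An event fires once the output has exceeded the threshold for `dwell`
    consecutive samples; the output is then ignored for `refractory` samples.
    """
    events = []
    run = 0            # consecutive samples above threshold so far
    blocked_until = 0  # first sample index after the refractory period
    for i, p in enumerate(prob):
        if i < blocked_until:
            run = 0
            continue
        run = run + 1 if p > threshold else 0
        if run >= dwell:
            events.append(i)
            blocked_until = i + 1 + refractory
            run = 0
    return events


def score_events(events, ic_periods):
    """Count detections inside IC periods (TP) and outside them (FP)."""
    tp = sum(1 for e in events if any(a <= e < b for a, b in ic_periods))
    fp = len(events) - tp
    return tp, fp, tp / len(ic_periods)


# Toy classifier output: three bursts above 0.5 (hypothetical values).
prob = [0.1] * 5 + [0.9] * 4 + [0.1] * 2 + [0.9] * 4 + [0.1] * 5 + [0.9] * 2 + [0.1] * 3
long_refractory = detect_events(prob, threshold=0.5, dwell=3, refractory=6)
short_refractory = detect_events(prob, threshold=0.5, dwell=3, refractory=2)
tp, fp, tpr = score_events(short_refractory, ic_periods=[(5, 10)])
```

In the toy signal the third burst is shorter than the dwell time and never fires, and the second burst fires only when the refractory period is short, which illustrates how the two parameters suppress false detections.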
The best threshold value for each dwell time corresponds to the point of the ROC curve (TPR versus FPR) closest to the line y = 1 − x, where the FPR is taken to be horizontal (x-axis) and the TPR is vertical (y-axis). Analysis of a set of ROC curves leads to optimal value selection.
The following equation is used for debiasing the classifier output at time instant t [32]:

$$C_t = y_t - \frac{1}{\tau} \sum_{i=1}^{\tau} y_{t-i},$$

where $y_t$ is the classifier output at time instant t, $C_t$ is the zero mean output, and τ is the number of previous classifier outputs used for averaging. In the case of a 20-second window size, τ = 250 × 20 = 5000 samples [32]. Figure 6(a) shows an example of decreasing the number of FPs using the debiasing block. However, in some cases (Figure 6(b)) debiasing might lead to a decrease in true positive rate.
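A minimal Python sketch of the debiasing step is given below. It assumes that the mean is taken over the τ outputs preceding the current one, and that during the first τ samples the mean is taken over whatever history exists (with nothing subtracted at the very first sample); this start-up behaviour is an assumption, as the text does not describe it.

```python
def debias(outputs, tau):
    """Subtract from each output the mean of up to `tau` preceding outputs."""
    debiased = []
    running_sum = 0.0  # sum of the outputs currently inside the window
    for t, y in enumerate(outputs):
        n_prev = min(t, tau)
        mean_prev = running_sum / n_prev if n_prev else 0.0
        debiased.append(y - mean_prev)
        running_sum += y
        if t >= tau:
            # Drop the oldest sample so the window keeps at most tau values.
            running_sum -= outputs[t - tau]
    return debiased


# Toy output with a constant offset of 0.5 followed by a jump to 1.0.
zero_mean = debias([0.5, 0.5, 0.5, 1.0], tau=2)
```

For the 20-second window mentioned in the text the call would use `tau=5000`. A constant offset in the classifier output is removed, while a sudden increase relative to the recent past is preserved.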

Automatic Activation of Debiasing Block.
Since the overall mean of the classifier output signal may decrease as a result of debiasing, the best threshold selected in the training phase may be too high for the debiased output; a threshold which is too high results in low FP but also low TP, which is not desirable. Therefore, in the training phase we analyze whether or not debiasing is required for the final evaluation.
In the training phase for the best threshold and dwell time selection, only half of the trials of the second run are used for determining the optimum postprocessing parameter. The other half of the trials is used for checking the necessity to use the debiasing block in final test session. Using the optimum threshold value and dwell time, TF = TPR − FPR (event-by-event analysis) is calculated for two situations: with debiasing and without debiasing. The higher TF value determines whether to apply the debiasing in test phase.
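The activation rule amounts to comparing two TF values on the held-out training trials. The sketch below uses hypothetical TPR and FPR values, and resolving a tie in favour of not debiasing is an assumption.

```python
def youden(tpr, fpr):
    """TF = TPR - FPR from event-by-event rates given as fractions."""
    return tpr - fpr


def should_debias(rates_without, rates_with):
    """Activate debiasing only if it gives the strictly higher TF.

    Each argument is a (TPR, FPR) pair measured on the held-out half of the
    training trials with the selected threshold and dwell time.
    """
    return youden(*rates_with) > youden(*rates_without)


# Hypothetical subject A: debiasing removes many false detections.
activate_a = should_debias(rates_without=(0.80, 0.20), rates_with=(0.73, 0.07))
# Hypothetical subject B: debiasing costs too many true detections.
activate_b = should_debias(rates_without=(0.93, 0.07), rates_with=(0.60, 0.00))
```

Because the decision is taken on training data only, the test phase needs no further adjustment.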

Results and Discussion
The results of onset detection are summarized in Tables 1-4. In all the tables, TPR and FPR values for each subject are reported in the form of mean ± standard deviation. As explained, 3 different combinations of train/test runs are possible for each subject. The average of the TPR and FPR values over all 3 combinations is presented for each subject in the tables. In order to show the effect of using constant-Q filters on the performance of the brain switch, Table 1 gives the results of applying a set of fifth-order Butterworth filters with constant bandwidth (2 Hz bandwidth and 1 Hz overlap between 6 and 36 Hz) in one column and with constant Q (Q = 2 and Q = 3 and 14 center frequencies from 6 to 36 Hz) in another column. The intentional control period is 3-5.5 seconds for evaluation, the threshold is 0.5, and the refractory period is equal to 750 samples. For dwell times equal to 30, 40, and 60 samples, the results are reported in three subcolumns for each type of filter to show how the performance changes with dwell time.
According to the results of Table 1, the average TPR achieved by applying constant-Q filters is significantly better than that of the constant bandwidth approach. A two-sided nonparametric statistical test, the Wilcoxon signed rank test [33,34], showed that the improvement is statistically significant (P < 0.05).
These results confirm that constant-Q filters are more capable of extracting ERS features from the ongoing EEG signal compared with constant bandwidth filters. The results of constant-Q filtering confirm our prediction based on the ERD/S maps (Figure 3). According to these maps, for all subjects a denser ERS is present using constant-Q filters in contrast to a sparse ERS using constant bandwidth filters. One of the useful features of the constant-Q filter is its increasing time resolution towards higher frequencies and increasing frequency resolution towards lower frequencies, which helps to define the movement onsets in the EEG signal more precisely. This characteristic decreases the nonstationary effects of the EEG signal and results in performance improvement, especially for the subjects who suffer more from nonstationarity. Moreover, filter banks with constant Q may increase the redundancy of information in the feature set; therefore, when subject-specific frequency bands are not applied, the classifier performance can be improved. The results in the remaining tables are therefore reported with constant-Q filters. For different dwell times we report the results to show the effect of choosing the optimum dwell time on the final performance of each subject. In Table 1 the threshold value is also fixed for all subjects. Adjusting the threshold can also improve the results of the classification significantly. The choice of dwell time and threshold value can impact the result considerably; therefore, for designing a brain switch suitable for online applications, it is recommended that these values are properly determined in the training phase.
Table 2 shows the results of the brain switch where threshold and dwell time are selected automatically in the training phase as suggested. This automatic selection is done because proper values for these parameters cannot be chosen at random. The mean value of the TPR is satisfactory, but the FPR values of three subjects, S2, S5, and S7, are high. The high FPR for these subjects shows that the best dwell time and threshold selected using the training data are not necessarily the best ones for the test data, because the EEG signal is nonstationary. In order to decrease false positive detections the debiasing block can be added to the output of the classifier. As explained in Section 2.6.2, this block is not always activated. The posterior probability of the classifier is debiased only if the effectiveness of debiasing has been confirmed in the training phase. The results of automatically applying the debiasing block to the output of the classifier are illustrated in Table 3. Although the mean value of TPR has decreased for some subjects, the problem of a high number of false detections has been solved using this block.

Table 4: Comparison of the results of this paper and results presented in [18] (TPR and FPR in %, mean ± SD).

Subject ID | Constant-Q filter + dwell time selection | Results of paper [18]
           | TPR        FPR                            | TPR        FPR
S1         | 98 ± 4     3 ± 3                          | 97 ± 5     3 ± 2
S2         | 61 ± 2     8 ± 1                          | 61 ± 10    7 ± 2
S3         | 100 ± 0    3 ± 2                          | 94 ± 5     4 ± 4
S4         | 96 ± 2     6 ± 2                          | 83 ± 12    4 ± 3
S5         | 65 ± 9     7 ± 2                          | 54 ± 14    7 ± 2
S6         | 91 ± 10    6 ± 3                          | 79 ± 12    8 ± 1
S7         | 83 ± 9     5 ± 2                          | 52 ± 20    6 ± 2
Average    | 85 ± 6     5 ± 2                          | 74 ± 21    6 ± 3
Therefore the automatic selection of the dwell time and threshold value along with automatic activation of the debiasing block results in acceptable performance, since the FPR is less than 10% for all the subjects. Although some of the columns of Table 1 (dwell = 60) appear to give the same performance, the specific dwell time and threshold which lead to acceptable performance cannot be selected at random, and in some cases random selection of these parameters (dwell = 30 and 40) might result in a high FPR (FPR > 10%).
In order to support our claim of improving the performance of the brain switch designed in [18], we compare the results of our method with the reported results of [18] in Table 4. The results of [18] were reported after ROC curve analysis in the test phase. The maximum TPR associated with an FPR < 0.1 had been selected for each combination of runs for each subject. The left side of Table 4 shows our results calculated after ROC analysis in the test phase, while the best dwell time has been selected in the training phase. The right side of Table 4 shows the reported results of ERS classification [18]. Comparing the mean TPR and FPR results of the right and left columns, the performance improvement of the brain switch provided by the proposed adjustments is clear. A two-sided nonparametric statistical hypothesis test, the Wilcoxon signed rank test [33,34], between the accuracies obtained by the proposed method and those reported in [18] shows a significant improvement in the TPR (P < 0.03), while the FPR decrease is not statistically significant (P = 1).
The differences between the results reported in Table 4 (left column) and Table 3 originate from the threshold value. In Table 3 the results are calculated for only one threshold value, selected in the training phase, while the results in Table 4 are selected from the results calculated for different threshold values in the test phase with the same criterion as [18] (maximum TPR associated with FPR < 0.1).
In the following we briefly list the differences between the brain switch designed in this paper and that of [18] which lead to the performance improvement: (1) Applying constant-Q filters instead of filters with equal bandwidth provides more separable features between the event class and the nonevent class. (2) Increasing the refractory period from 2 seconds (500 samples) to 3 seconds (750 samples) has a positive effect on the final performance. Since the sum of the refractory period and dwell time is less than the time between two consecutive events, increasing the refractory period does not cause any problem. (3) Automatic threshold value and dwell time selection using training data prepares the system for online application. For the cases with a low selected threshold value and a short optimum dwell time, debiasing plays an important role in decreasing the number of false detections. Its automatic activation using the training results prevents the loss of high true positive rates for some subjects.
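The test-phase criterion used for Table 4 and in [18], the maximum TPR among operating points with FPR < 0.1, can be sketched as follows. The list of operating points is hypothetical, with rates given as fractions.

```python
def best_operating_point(points, max_fpr=0.1):
    """Return the (threshold, TPR, FPR) point with the highest TPR among
    those whose FPR is strictly below max_fpr, or None if there is none."""
    admissible = [p for p in points if p[2] < max_fpr]
    if not admissible:
        return None
    return max(admissible, key=lambda p: p[1])


# Hypothetical ROC points for one subject: (threshold, TPR, FPR).
roc_points = [
    (0.3, 0.95, 0.15),  # highest TPR, but too many false positives
    (0.5, 0.90, 0.08),
    (0.7, 0.80, 0.03),
]
chosen = best_operating_point(roc_points)
```

This selection looks at test-phase results for every threshold, which is why Table 4 is a comparison under the criterion of [18] rather than an online-ready result like Table 3.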

Conclusion
The results of this paper illustrate that using constant-Q filters without any optimization of frequency bands for each subject resulted in more separable features of event and nonevent class samples. The automatic selection of dwell time and threshold values in the training session makes the brain switch suitable for online applications. Adding a debiasing block to the classification output signal only in special cases which are predetermined during the training phase also resulted in a more accurate brain switch. Overall, the study shows that combining a range of simple techniques including constant-Q filters, optimum threshold values, optimum dwell time, and automatic activation of a debiasing block improves detection of foot movement in ongoing EEG. Although previous studies have investigated these processes individually, this study provides new evidence to suggest that combining these various techniques can improve self-paced BCI performance. With the proposed combination of methods, no adjustment or collaboration (e.g., threshold setting) is required during the test phase, whereas many other studies require a number of parameters to be adjusted to account for nonstationary changes.