Sequential Probability Ratio Testing with Power Projective Base Method Improves Decision-Making for BCI

Obtaining a fast and reliable decision is an important issue in brain-computer interfaces (BCI), particularly in practical real-time applications such as wheelchair or neuroprosthetic control. In this study, the EEG signals were first analyzed with a power projective base method. We then applied a decision-making model, sequential probability ratio testing (SPRT), for single-trial classification of motor imagery movement events. The unique strength of the proposed classification method lies in its accumulative process, which increases the discriminative power as more evidence is observed over time. The properties of the method were illustrated on recordings of thirteen subjects from three datasets. Results showed that the proposed power projective method outperformed two benchmark methods for every subject. Moreover, with sequential classifiers, the accuracies across subjects were significantly higher than those with nonsequential ones. The average maximum accuracy of the SPRT method was 84.1%, compared with 82.3% for the sequential Bayesian (SB) method. The proposed SPRT method provides an explicit relationship between stopping time, thresholds, and error, which is important for balancing the time-accuracy trade-off. These results suggest that SPRT would be useful for speeding up decision-making while trading off errors in BCI.


Introduction
Noninvasive brain-computer interfaces (BCI) based on the electroencephalogram (EEG) offer locked-in or paralyzed patients a new means of communication [1,2] and of controlling a prosthesis [3,4] without reliance on the usual neuromuscular pathways. The critical challenge of BCI technology is to classify brain signals and mental tasks accurately and quickly. However, the EEG recorded from the scalp has low amplitude and a low signal-to-noise ratio (SNR), and the EEG differences between mental tasks are not pronounced. Therefore, various pattern recognition algorithms have been used in BCI systems to extract and classify EEG features.
Event-related desynchronization/synchronization (ERD/ERS) patterns of motor imagery are effective features for EEG-based BCI systems. Experiments show that the ERD/ERS phenomenon varies among individuals. Therefore, a pattern recognition algorithm should be used to facilitate decoding "motor intent," both to find subject-specific EEG features that maximize the separation between the patterns generated by executing the mental tasks and to train classifiers that minimize the classification error rates of these specific patterns. Currently, feature extraction for discrimination of left- and right-hand motor imagery EEG is usually based on EEG band power (BP). For example, the autoregressive (AR) model [5], the discrete Fourier transform (DFT) [6], and wavelet transforms (WT) [7] have been used to extract EEG features for classification. The wavelet method is one of the most effective algorithms. However, the success of a wavelet application depends greatly on the proper selection of subject-specific parameters. In fact, the wavelet transform can be considered as projecting the EEG onto a wavelet basis, with the band power given by the modulus of the projective coefficients. Inspired by the wavelet method, we introduce a new feature extraction method based on power projective bases to classify EEGs without the constraint of a fixed wavelet form.
Moreover, the ability to make rapid decisions based on transient stimuli is a unique aspect of our brains' capacity to process information. Broadly speaking, signal detection theory (SDT) and sequential analysis (SA) are two branches of mathematical models that provide a theoretical framework for understanding how decisions are made [8]. SDT converts a single observation into a categorical choice. Different decision rules lead to different testing approaches to this problem [9]. For example, Bayesian decision theory is derived by minimizing the posterior expected loss, while the Neyman-Pearson (NP) criterion seeks the best test at a given error probability level ($\alpha$). As in most statistical classification methods, for example, linear discriminant analysis (LDA) and support vector machines (SVM), the classification error is the only criterion of the SDT decision strategies. The number of observation samples required by these criteria could be very large, which is especially impractical for BCI applications. To control brain-actuated devices, such as robots and neuroprostheses, both fast decision-making and a stable control signal with a minimal error rate are important [10,11]. Therefore, attention has recently turned to variable-length sequential sampling models.
A systematic theory of optimal stopping emerged with Wald's work on the optimality of the sequential probability ratio test (SPRT) [12]. The SPRT achieves a desired error rate with the smallest number of samples, on average. Therefore, in this paper, we combine a new feature extraction method based on the power projective base with the SPRT to classify EEGs, obtaining a continuous dynamic estimate of brain state that balances accuracy and decision speed.

Data Description.
The EEG data used in this work were obtained from thirteen subjects from BCI Competitions II, III, and IV. The task was based on left- and right-hand motor imagination.

Dataset III from BCI Competition II.
This dataset contains EEG data from one subject (S1) [13]. The data were recorded from three channels (C3, Cz, and C4) and sampled at 128 Hz. The data consist of 140 labelled and 140 unlabelled trials with an equal number of left- and right-hand trials. Each trial has a duration of 9 s, where a visual cue (arrow) pointing to the left or the right is presented after a 3 s preparation period, followed by a 6 s motor imagery (MI) task.

Dataset IIIb from BCI Competition III.
The second dataset contains EEG data recorded over the channels C3 and C4 from three subjects (S2, S3, S4) with some corrections [14]. The data were sampled at 125 Hz. Training and testing sets were available for each subject. Using the original dataset labels, subject O3 has only 320 trials in each set, whereas subjects S4 and X11 each have 540 labelled and 540 unlabelled trials. Each trial has a duration of 7 s, which consists of a 3 s preparation period and a 1 s visual cue presentation, followed by another 3 s for the imagination task.

Dataset IIb from BCI Competition IV.
This dataset contains EEG data from nine subjects (S5-S13) [15]. The data were recorded from three bipolar channels (C3, Cz, and C4) and 3 EOG channels. The sampling frequency was 250 Hz. Training and testing sets were available for each subject. Each subject participated in two screening sessions without feedback and three online feedback sessions with smiley feedback. The trials without feedback had a duration of 7 s, and a visual cue was presented for 1.25 s followed by another 4 s for the imagination task. The trials with feedback had a duration of 7.5 s, and a visual cue was presented for 4.5 s until the end of motor imagination.

Feature Extraction Method Based on Power Projective Base.
Motor imagery can be regarded as mental rehearsal of a motor act without any obvious motor output. Recent studies show that during motor imagination, the mu (8-13 Hz) and beta (18-30 Hz) rhythms reveal event-related desynchronization and synchronization (ERD/ERS) over the sensorimotor cortex, just as when one performs motor tasks. Because nonstationary effects are often observed in brain signals, we propose a power projective base method to extract classification features from the C3 and C4 channels. This method improves the classification accuracy by maximizing the difference in average projective power between the two-class signals. Specifically, the projective bases can be obtained for each subject by generalized eigenvalue decomposition.
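Let $X_c \in \mathbb{R}^{N \times M}$ be the training dataset from one channel, where $c \in \{L, R\}$ denotes the left- or right-hand motor imagery task, $N$ denotes the number of sampling points, and $M$ is the number of trials. Moreover, let $\mathbf{u} \in \mathbb{R}^{N}$ be the projective basis with $\|\mathbf{u}\| = 1$. The projective power of signal $\mathbf{x}_{c,i}$, $i = 1, 2, \ldots, M$, on the projective basis $\mathbf{u}$ is
$$P_{c,i}(\mathbf{u}) = (\mathbf{u}^T \mathbf{x}_{c,i})^2 = \mathbf{u}^T \mathbf{x}_{c,i} \mathbf{x}_{c,i}^T \mathbf{u}.$$
So the mean projective power $\bar{P}_c(\mathbf{u})$ can be calculated as
$$\bar{P}_c(\mathbf{u}) = \frac{1}{M} \sum_{i=1}^{M} P_{c,i}(\mathbf{u}) = \mathbf{u}^T R_c \mathbf{u},$$
where $R_c = X_c X_c^T / M$ is the autocorrelation matrix, which is usually positive definite.
The objective function is formulated as the ratio of the two-class average projective powers,
$$J(\mathbf{u}) = \frac{\mathbf{u}^T R_L \mathbf{u}}{\mathbf{u}^T R_R \mathbf{u}}. \tag{3}$$
By maximizing or minimizing $J(\mathbf{u})$ to obtain $J_{\max}$ or $J_{\min}$, the corresponding eigenvector $\mathbf{u}_{\max}$ or $\mathbf{u}_{\min}$ is the optimal projective base to be solved. The optimization of (3) can be solved by generalized eigenvalue decomposition. First of all, we can get the following decomposition:
$$T^T R_L T = \Lambda, \qquad T^T R_R T = I,$$
where $T = [\mathbf{t}_1, \ldots, \mathbf{t}_N]$ is the generalized eigenvector matrix and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$ contains the generalized eigenvalues in descending order. Therefore, the ratio of mean projective powers $J(\mathbf{u})$ becomes
$$J(\mathbf{u}) = \frac{\mathbf{k}^T \Lambda \mathbf{k}}{\mathbf{k}^T \mathbf{k}} = \frac{\sum_{j=1}^{N} \lambda_j k_j^2}{\sum_{j=1}^{N} k_j^2},$$
where $\mathbf{k} = T^{-1} \mathbf{u}$. It follows that $J(\mathbf{u})$ has the maximum value $J_{\max} = \lambda_1$ and the minimum value $J_{\min} = \lambda_N$. The corresponding vectors $\mathbf{k}$ are
$$\mathbf{k}_{\max} = [1, 0, \ldots, 0]^T, \qquad \mathbf{k}_{\min} = [0, \ldots, 0, 1]^T.$$
Then, we have
$$\mathbf{u}_{\max} = T \mathbf{k}_{\max} = \mathbf{t}_1, \qquad \mathbf{u}_{\min} = T \mathbf{k}_{\min} = \mathbf{t}_N,$$
which means that $\mathbf{u}_{\max}$ and $\mathbf{u}_{\min}$ are the first column $\mathbf{t}_1$ and the last column $\mathbf{t}_N$ of $T$, respectively. Whether $\mathbf{u}_{\max}$ or $\mathbf{u}_{\min}$ is chosen as the projective base depends on which is larger, $\lambda_1$ or $1/\lambda_N$. The projective powers of the signals from channels C3 and C4 onto their own projective bases are then stacked into a 2-dimensional feature vector $\mathbf{z} = [P_{C3}, P_{C4}]^T$.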
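The generalized eigenvalue solution described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `power_projective_base` and `projective_power` are hypothetical, NumPy is assumed to be available, and the second-class autocorrelation matrix is assumed to be positive definite (which requires at least as many trials as sampling points). The generalized problem is solved here by Cholesky whitening, which is one standard route among several.

```python
import numpy as np


def power_projective_base(X_left, X_right):
    """Sketch of the projective-base solution for one channel.

    X_left, X_right: arrays of shape (N, M), one column per training trial
    (N sampling points, M trials).
    Returns the unit-norm basis u and the generalized eigenvalue it attains.
    """
    # Autocorrelation matrices R_c = X_c X_c^T / M.
    R_left = X_left @ X_left.T / X_left.shape[1]
    R_right = X_right @ X_right.T / X_right.shape[1]

    # Solve R_left t = lambda * R_right t by whitening with the Cholesky
    # factor of R_right (assumes R_right is positive definite).
    C = np.linalg.cholesky(R_right)              # R_right = C C^T
    C_inv = np.linalg.inv(C)
    S = C_inv @ R_left @ C_inv.T
    eigvals, V = np.linalg.eigh((S + S.T) / 2)   # eigenvalues in ascending order
    T = C_inv.T @ V                              # satisfies T^T R_right T = I

    lam_min, lam_max = eigvals[0], eigvals[-1]
    # Keep whichever extreme separates the two classes more strongly:
    # compare the largest eigenvalue with the reciprocal of the smallest.
    if lam_max >= 1.0 / lam_min:
        u, lam = T[:, -1], lam_max
    else:
        u, lam = T[:, 0], lam_min
    # The power ratio is scale invariant, so u can be normalized to unit length.
    return u / np.linalg.norm(u), float(lam)


def projective_power(u, x):
    """Projective power (u^T x)^2 of one signal segment x on basis u."""
    return float(u @ x) ** 2
```

Applying `projective_power` to the C3 and C4 segments with their own bases gives the two entries of the feature vector. Note that NumPy returns eigenvalues in ascending order, so the last column here plays the role of the first column in the descending-order convention of the text.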

SPRT Classification Method.
Sequential analysis is a statistical decision model that assumes decisions are formed by continuously sampling information until a response criterion is satisfied. Once a boundary has been reached, the decision process is concluded and a response is elicited. The number of observations needed for a decision is not fixed in advance of the experiment but is determined by the observations obtained during the test. Because the data must be fed to the SPRT algorithm sequentially, we divide each trial into overlapping segments, each with the same length as the projective base used in feature extraction.
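The segmentation step can be illustrated with a short sketch. The function name `overlapping_segments` and the `step` parameter are hypothetical; the text states only that the segments overlap and share the length of the projective base, so the amount of overlap used here is an assumption.

```python
def overlapping_segments(trial, window, step):
    """Split one trial (a sequence of samples) into overlapping windows.

    window: segment length in samples (the length of the projective base).
    step:   shift between consecutive segments in samples (assumed value;
            any step smaller than window produces overlap).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = len(trial) - window
    return [trial[s:s + window] for s in range(0, last_start + 1, step)]
```

For example, a 10-sample trial with `window=4` and `step=2` yields four segments, each sharing two samples with its neighbor.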
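Taking into account the nonstationarity of the EEG, we assume that the probability distribution of the $i$th segment feature $\mathbf{z}_i$ for class $c$ is $p_i(\mathbf{z}_i \mid c)$, $c \in \{L, R\}$. The probability ratio for the $i$th segment is then
$$r_i = \frac{p_i(\mathbf{z}_i \mid L)}{p_i(\mathbf{z}_i \mid R)},$$
and the evidence accumulation becomes
$$\Gamma_n = \prod_{i=1}^{n} r_i, \tag{10}$$
where $n$ is the number of accumulated segments. Assuming the segments are independent for computational convenience [12], we have
$$\Gamma_n = \frac{p(\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_n \mid L)}{p(\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_n \mid R)}, \tag{11}$$
where $p(\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_n \mid c)$, $c \in \{L, R\}$, is the joint probability distribution of the $n$-dimensional vector $(\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_n)$. Because the segments overlap, this independence assumption holds only approximately in practice.
The decision rule with two thresholds $A$ and $B$ ($A > B$) is
$$\text{decide } L \text{ if } \Gamma_n \ge A; \qquad \text{decide } R \text{ if } \Gamma_n \le B; \qquad \text{continue sampling if } B < \Gamma_n < A.$$
With two thresholds, we have the option to increase $A$ or decrease $B$, which raises the probability of a correct decision and lowers the probability of a wrong one (by waiting to accumulate more evidence) at the cost of delaying the decision. The error probabilities are defined as
$$P_{R|L} = P(\text{decide } R \mid L), \qquad P_{L|R} = P(\text{decide } L \mid R).$$
If $\Gamma_n \ge A$ is satisfied, we define the corresponding region of the vector $(\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_n)$ to be $Z_L$. With (11), we have
$$p(\mathbf{z}_1, \ldots, \mathbf{z}_n \mid L) \ge A \, p(\mathbf{z}_1, \ldots, \mathbf{z}_n \mid R). \tag{14}$$
Equation (14) is then integrated over $Z_L$, yielding
$$\int_{Z_L} p(\mathbf{z}_1, \ldots, \mathbf{z}_n \mid L) \, d\mathbf{z} \ge A \int_{Z_L} p(\mathbf{z}_1, \ldots, \mathbf{z}_n \mid R) \, d\mathbf{z}.$$
That is,
$$1 - P_{R|L} \ge A \, P_{L|R}.$$
Analogous reasoning for $\Gamma_n \le B$ yields
$$P_{R|L} \le B \, (1 - P_{L|R}).$$
Thus, the two detection thresholds $A$ and $B$ are related to the error probabilities by
$$A \le \frac{1 - P_{R|L}}{P_{L|R}}, \qquad B \ge \frac{P_{R|L}}{1 - P_{L|R}}.$$
The two kinds of error probabilities can be lowered by either increasing $A$ or decreasing $B$. However, due to the limited number of segments, the indecision ratio will increase as the error probability $P_{R|L}$ or $P_{L|R}$ is decreased. Therefore, we may not obtain the optimal result by simply increasing $A$ or decreasing $B$. Suitable values of $A$ and $B$ can be obtained with the following optimization criteria.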
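To make the relationship between thresholds and error probabilities concrete, the sketch below evaluates the two bounds for given target error rates. The function name `wald_thresholds` and its argument names are hypothetical. Treating the bounds as equalities is Wald's usual approximation and is an assumption here; as noted above, the thresholds in this work are chosen by a different criterion.

```python
def wald_thresholds(p_r_given_l, p_l_given_r):
    """Upper and lower probability-ratio thresholds from target error rates.

    p_r_given_l: probability of deciding "right" when "left" is true.
    p_l_given_r: probability of deciding "left" when "right" is true.
    Returns (A, B) with the threshold bounds taken as equalities
    (Wald's approximation; an assumption of this sketch).
    """
    if not (0.0 < p_r_given_l < 1.0 and 0.0 < p_l_given_r < 1.0):
        raise ValueError("error probabilities must lie strictly between 0 and 1")
    upper = (1.0 - p_r_given_l) / p_l_given_r   # decide "left" at or above this
    lower = p_r_given_l / (1.0 - p_l_given_r)   # decide "right" at or below this
    return upper, lower
```

With both error rates set to 0.05, the thresholds are approximately 19 and 1/19; tightening the error rates pushes the thresholds further apart, so more segments are needed before either is crossed.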
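Under the assumption that the features follow a Gaussian distribution, we can take the logarithm of both sides of (10) to obtain the log probability ratio (log PR), which leads to a sequential probability ratio test (SPRT) of the form
$$S_n = \ln \Gamma_n = \sum_{i=1}^{n} l_i, \qquad l_i = \frac{1}{2} \left[ d_{R,i}^2(\mathbf{z}_i) - d_{L,i}^2(\mathbf{z}_i) \right] + \frac{1}{2} \ln \frac{|\Sigma_{R,i}|}{|\Sigma_{L,i}|},$$
where $d_{c,i}(\mathbf{z}_i) = \sqrt{(\mathbf{z}_i - \boldsymbol{\mu}_{c,i})^T \Sigma_{c,i}^{-1} (\mathbf{z}_i - \boldsymbol{\mu}_{c,i})}$ is the Mahalanobis distance of $\mathbf{z}_i$ from the class-$c$ Gaussian with mean $\boldsymbol{\mu}_{c,i}$ and covariance $\Sigma_{c,i}$, $\Gamma_n$ is the cumulative probability ratio, and $l_i$ is the log PR of the $i$th segment.
We can derive the average log PR for each class:
$$\bar{S}_n^{c} = E[S_n \mid c], \qquad c \in \{L, R\}.$$
For any given pair of log-domain thresholds $a$ and $b$ ($a > b$), the number of accumulated segments needed to make a correct decision, that is, the stopping time, for class $c$ is $N(a, b \mid c)$, which satisfies
$$N(a, b \mid L) = \inf \{ n : \bar{S}_n^{L} \ge a \}, \qquad N(a, b \mid R) = \inf \{ n : \bar{S}_n^{R} \le b \},$$
where $\inf(\cdot)$ is the minimum element of the set. Generally, $N(a, b \mid c = L)$ and $N(a, b \mid c = R)$ may be different. Since the stopping time is a key quantity in sequential analysis, we constrain the two thresholds by requiring the stopping times of the two classes to be equal, that is, $N(a, b \mid c = L) = N(a, b \mid c = R) \triangleq N_s$. Then the thresholds are given by
$$a = \bar{S}_{N_s}^{L}, \qquad b = \bar{S}_{N_s}^{R}.$$
For any given stopping time $N_s$, there is a corresponding threshold pair $a$ and $b$. The decision rule with the two thresholds $a$ and $b$ is
$$\text{decide } L \text{ if } S_n \ge a; \qquad \text{decide } R \text{ if } S_n \le b; \qquad \text{continue sampling if } b < S_n < a.$$
From this decision policy, we can see that, rather than assigning one of the two classes $L$ and $R$, the decision function may remain undecided and continue testing with the next observation. The "undecided" response keeps the number of errors (false positives or false negatives) low, which is useful for avoiding the excessive mistakes that hasty decisions would cause, for example, a BCI-controlled wheelchair running into an obstacle [16]. In addition, if the test is still undecided on reaching the stopping time, we specify that when $n = N_s$ the decision is forced to one of the two classes. With the above decision rules, the resulting performance measures, such as accuracy, mutual information (MI) [17], the steepness of MI [18], and average decision time, rely only on the stopping time $N_s$ and the data to be analyzed. Depending on the specific needs, we can set the accuracy, MI, the steepness of MI, or the average decision time as the optimization target to determine the optimal stopping time $N_{\mathrm{opt}}$. At the same time, the two thresholds are determined.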
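The decision policy described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: all function names are hypothetical, the Gaussian log PR is written for diagonal covariances to keep the example dependency-free, the thresholds are read from class-average cumulative log PR curves at the chosen stopping time, and the forced decision at the stopping time uses the sign of the accumulated log PR, which is an assumption of this sketch.

```python
import math


def gaussian_log_pr(z, mu_l, var_l, mu_r, var_r):
    """Log PR of one segment feature z under Gaussian class models.

    Diagonal covariances (per-dimension variances) are an assumption made
    here for simplicity.
    """
    d_l2 = sum((x - m) ** 2 / v for x, m, v in zip(z, mu_l, var_l))
    d_r2 = sum((x - m) ** 2 / v for x, m, v in zip(z, mu_r, var_r))
    log_det = sum(math.log(vr) - math.log(vl) for vl, vr in zip(var_l, var_r))
    return 0.5 * (d_r2 - d_l2) + 0.5 * log_det


def thresholds_at(mean_curve_left, mean_curve_right, n_stop):
    """Read (a, b) from class-average cumulative log PR curves.

    mean_curve_*[n - 1] holds the average cumulative log PR after n segments;
    n_stop is the desired stopping time in segments (1-based).
    """
    return mean_curve_left[n_stop - 1], mean_curve_right[n_stop - 1]


def sprt_decide(log_prs, a, b, n_stop):
    """Accumulate per-segment log PRs until a threshold is crossed.

    Returns (label, number_of_segments_used). If neither threshold is
    crossed within n_stop segments, a decision is forced by the sign of the
    accumulated log PR (assumed tie-break rule, not taken from the source).
    """
    total = 0.0
    n = 0
    for n, l in enumerate(log_prs[:n_stop], start=1):
        total += l
        if total >= a:
            return "L", n
        if total <= b:
            return "R", n
    return ("L" if total >= 0.0 else "R"), n
```

A larger `n_stop` gives thresholds that lie further apart on the average curves, so individual trials accumulate more segments before a decision is made.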

Feature Extraction.
To evaluate the performance of our method, we tested it using Dataset III from BCI Competition II and Dataset IIIb from BCI Competition III. They were obtained from four subjects, denoted as S1-S4. The task performed was based on left- and right-hand motor imagination. The dimension of the projective base, that is, the length of the sliding window, was set to 1 s. The time-domain waveforms of the optimal projective bases of the two channels for subject S1 are shown in Figures 1(a) and 1(b). The corresponding frequency spectra are shown in Figures 1(c) and 1(d). The average projective power time courses during right-hand (dashed line) and left-hand (solid line) imagined movement for C3 and C4 are displayed in Figures 1(e) and 1(f). From this figure, we can see that the projective bases are similar to modulated sine signals and that their spectra have band-pass characteristics similar to those of a wavelet basis. For this subject, the projective power is dominated by the mu rhythm. During the first 3.5 s (up to 0.5 s after cue presentation), the projective power curves under the two conditions are close; after 3.5 s, a distinct difference in the projective power can be observed, which provides a good classification feature. The projective bases for subjects S2 and S3 are similar to those of subject S1. The power projective base of subject S4 is shown in Figure 2. In contrast with subject S1, the reactivity patterns of the projective bases of this subject are quite different. The waveforms of the projective bases oscillate faster, so the frequency of the projective power is higher than that of subject S1. As seen in Figures 2(c) and 2(d), the projective bases display a peak in the beta rhythm. Moreover, the average projective power time courses demonstrate that the ERD/ERS patterns of subject S4 are quite different. This means that the projective power method is subject-adaptive and avoids setting parameters in advance.

Classification Results.
Two kinds of experiments were performed to evaluate the performance of the proposed machine learning method. The first evaluates the projective power feature extraction method and the second evaluates the SPRT classification performance. In the first, for the purpose of benchmarking, we compared the classification accuracy (ACC) and the mutual information (MI) with those of two benchmark feature extraction algorithms, DFT and WT, based on the sequential Bayesian classifier [17,18]. These methods were also applied to the data of subjects S1-S4. The ACC and MI of the three methods under consideration are listed in Table 1, where Avg. denotes the indexes averaged over all four subjects. The results of WT are derived from Lemm's method, which won the BCI competitions in 2003 and 2005 for motor imagery datasets. From Table 1, we can see that the proposed power projective method outperforms the two benchmark methods for every subject. Compared with the wavelet method, the average ACC of our method increased from 84.5% to 87.6% and MI increased to 0.468 bits. Furthermore, a paired t-test was used to compare the classification accuracies of the three methods. The paired t-test result confirms that with the projective base method the ACC and MI across subjects are significantly higher than those with WT (t = 15.1, p < 0.01) and with DFT (t = 8.432, p < 0.01).
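For readers who want to reproduce this kind of comparison, the paired t statistic can be computed from per-subject scores of two methods as sketched below. The function name `paired_t` is hypothetical, and the numbers in the usage note are toy values chosen for easy checking, not values from Table 1.

```python
import math


def paired_t(scores_a, scores_b):
    """Paired t statistic for two equally long lists of per-subject scores.

    Returns t = mean(d) / (sd(d) / sqrt(n)) for the differences d = a - b,
    with n - 1 degrees of freedom.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 entries")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)   # sample variance
    return mean / math.sqrt(var / n)
```

With the toy inputs `[2.0, 4.0, 6.0]` and `[1.0, 2.0, 3.0]`, the differences are 1, 2, and 3, giving a statistic of about 3.46 with 2 degrees of freedom.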
Then we compared the ACC and MI of several state-of-the-art nonsequential classifiers used in the BCI community with those of their sequential counterparts, using the projective base feature. The nonsequential methods were LDA, SVM, and Bayesian, and the corresponding sequential methods are denoted as SLDA, SSVM, and SB. Tenfold cross-validation was carried out for the classification tests of EEG data in this study; that is, the dataset for each subject was divided into ten subsets, and the following procedure was repeated ten times. Each time, one of the ten subsets was used as the test set and the other nine were used as the training set. The average recognition rate was evaluated across all ten folds. These methods were applied to the three BCI competition datasets described in Section 2.1 from thirteen subjects S1-S13. Table 2 shows the classification accuracy of all the methods on the competition test data, where "Avg." denotes the results averaged over all the subjects. Applying the features obtained with the power projective method to the nonsequential methods, classification accuracies of 76.0%, 71.6%, and 75% were achieved by LDA, SVM, and Bayesian, respectively. The accuracies achieved by all of the corresponding sequential classifiers were greater than those of their nonsequential counterparts. The paired t-test result confirms that with sequential classifiers the accuracies across subjects are significantly higher than those with nonsequential ones (t = 6.15, p < 0.01). Overall, the classification accuracy is higher when the sequential scheme is adopted.
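The tenfold procedure described above can be sketched with a small index generator. The function name `k_fold_indices` is hypothetical, and assigning trials to folds in an interleaved fashion is an assumption; the text does not say whether the subsets were drawn randomly or stratified by class.

```python
def k_fold_indices(n_trials, k=10):
    """Yield (train_indices, test_indices) for each of k folds.

    Trials are assigned to folds in an interleaved way (trial i goes to
    fold i mod k); this assignment scheme is an assumption of the sketch.
    """
    if k < 2 or n_trials < k:
        raise ValueError("need at least 2 folds and at least k trials")
    folds = [list(range(i, n_trials, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

For the 140 labelled trials of subject S1, each fold holds 14 test trials and 126 training trials, and the reported recognition rate would be the mean over the ten folds.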
Additionally, it is important to consider the stopping time for speedy BCI applications without sacrificing accuracy. Therefore, we further analyzed the classification accuracy, the MI, and the stopping time of the SPRT and compared the time-accuracy trade-off between SPRT and SB. The classification results of the SPRT are provided in Table 3. Applying the features obtained with the power projective method to the SPRT, an average classification accuracy of 84.1% was achieved. The proposed SPRT classifier with the projective base features outperforms the SB classifier with the same feature extraction method in terms of classification accuracy for all subjects except subject S2. In terms of average accuracy, SPRT also outperforms the SB classifier. Analyzing the overall results, it can be concluded that SPRT with the projective base feature extraction method outperforms the state-of-the-art methods on the standard datasets.
The resulting time courses of the accumulative classification information, the so-called accumulative evidence, of our SPRT method and the SB method are shown in Figure 3. Figure 3(a) shows that the SB classification method gains information from around 4 s (3 s preparation period plus 1 s window length). The cumulative Bayesian posterior probabilities reach their extremum at around 5.5 s, indicating peak decision confidence at this time. However, the accumulative information declines toward the end of the trial. The result shows that effective control takes place during the middle of a trial, and more evidence could not increase the classification performance any further.
Figure 3: The average accumulative process of classification information for subject S1: (a) SB method; (b) SPRT method.
Compared with the SB method, the cumulative process of SPRT is monotonic (shown in Figure 3), which makes it possible to improve the accuracy as more evidence becomes available. During the SPRT classification process, once the accumulative evidence exceeds one of the thresholds, an immediate decision is given. Moreover, the thresholds can be adjusted to change the decision time. That is to say, with the two thresholds set further apart, a larger number of observations may be required to improve the accuracy, and vice versa. This can be seen from Figure 3(b): given the expected stopping time $t_1$, the thresholds are $a_1$, $b_1$. When the expected stopping time is set to $t_2$, the thresholds change to $a_2$, $b_2$. A higher accuracy is achieved with more decision time. This depicts the inherent trade-off between decision time (costs) and accuracy (benefits) of the SPRT method. The resulting time courses of the classification accuracy, the MI, and the SMI (calculated as MI(t)/(t − 3 s) for t > 3.5 s) for subject S1 are presented in Figure 4. The SMI quantifies the response time. During the first 4 s, the classification performs at a rate no better than chance. Afterwards a steep ascent in the classification accuracy can be observed, accompanied by a rising MI. The increase of MI indicates an increase in the ability to separate left- and right-hand motor imagery. The maximum classification accuracy and the maximum MI are achieved at about 6.4 s and 7.1 s, respectively. The nonmonotonic relationship between accuracy and time is due to the limited data available, which leads to the final decision. Taking time into consideration, the maximum steepness of MI is obtained at around 4.6 s.
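The steepness of MI defined above is a simple ratio and can be computed from an MI time course as sketched below. The function name `steepness_of_mi` and its default arguments are illustrative; the 3 s cue time and 3.5 s lower bound follow the definition given for subject S1, and the MI values in the usage note are toy numbers, not results from Figure 4.

```python
def steepness_of_mi(times, mi, cue_time=3.0, start=3.5):
    """Return (t, MI(t) / (t - cue_time)) for every sample with t > start.

    times: time points in seconds; mi: mutual information in bits at
    those time points. Samples at or before `start` are skipped.
    """
    return [(t, m / (t - cue_time)) for t, m in zip(times, mi) if t > start]
```

For instance, `steepness_of_mi([3.0, 4.0, 5.0], [0.0, 0.2, 0.5])` keeps the samples at 4 s and 5 s and gives steepness values of 0.2 and 0.25 bits/s; taking the maximum over the returned pairs gives the time of maximum steepness.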
The general sequential framework of the present approach is customizable to suit different task objectives, such as improving accuracy, MI, or steepness of MI. However, the optimization of one particular objective will come at the expense of the others.

Conclusion
In this paper, we present an SPRT method in conjunction with a power projective base method to recognize mental states. The power projective method was first developed to determine features by maximizing the difference in average projection energy between the two types of signals. Using the accumulative evidence curve, the proposed SPRT sets the two constrained thresholds based on a desired expected stopping time. The SPRT method adds the benefit of a customizable trade-off between accuracy and decision speed. Specifically, the thresholds in this method were determined without predefined error probabilities. Improved performance was demonstrated on standardized datasets.
Although this study suggests that SPRT would be useful for balancing speed and accuracy in different BCI applications, further investigation is required. Future work will attempt to validate the proposed method with a larger dataset and implement it in our BCI-actuated robotic system. Moreover, we will investigate multiway SPRT theory and its application to multiclass BCI systems.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.