Correlation Assisted Strong Uncorrelating Transform Complex Common Spatial Patterns for Spatially Distant Channel Data

The Strong Uncorrelating Transform Complex Common Spatial Patterns (SUTCCSP) algorithm, designed for multichannel data analysis, fails to preserve the correlation information among channels during the simultaneous diagonalization of the covariance and pseudocovariance matrices. This paper focuses on the importance of preserving the correlation information among multichannel data and proposes the correlation assisted SUTCCSP (CASUT) algorithm to address this issue. The performance of the proposed algorithm was demonstrated by classifying a motor imagery electroencephalogram (EEG) dataset. The features were first extracted using CSP algorithms, including the proposed method, and a random forest classifier was then used for the classification. Experiments using CASUT yielded an average classification accuracy of 78.10%, which significantly outperformed those of the original CSP, Complex Common Spatial Patterns (CCSP), and SUTCCSP, with p-values less than 0.01 under the Wilcoxon signed rank test.


Introduction
Noninvasive measurements of physiological signals, including the electroencephalogram (EEG), electrocardiogram (ECG), and electromyogram (EMG), have become widely used throughout the biomedical industry [1-5]. Among the various feature engineering methods, researchers have shown that the common spatial patterns (CSP) algorithm is a strong feature extraction algorithm for multichannel EEG data, yielding high performance for classification problems [6,7]. CSP is a mathematical method that decomposes multivariate signals into spatial subcomponents whose variance difference between two classes is maximized [8]. CSP designs spatial filters for the multichannel EEG signals based on the spatial distribution of neural activities in the cortex areas [6,7] and adopts a supervised learning approach, whereas other spatial filter algorithms such as principal component analysis (PCA) and independent component analysis (ICA) are designed in an unsupervised manner [9,10].
Furthermore, a complex version of CSP, termed CCSP, uses the covariance matrix that maintains the power sum information of the real and imaginary parts of the complex-valued data [11]. Another complex-valued CSP algorithm, termed analytic signal-based CSP (ACSP), was proposed by Falzon et al. to discriminate different mental tasks [12,13]. However, given that the Hilbert transformed analytic signals could only produce circular signals (rotation invariant probability distribution) and that physiological signals are improper (mismatch of power between different channel data), the augmented complex CSP was introduced to fully exploit the second-order statistics of noncircular complex vectors [11,14].
Strong Uncorrelating Transform CCSP (SUTCCSP), which is an advanced version of the augmented complex CSP, was applied to the two-class classification problem of motor imagery EEG and produced a minimum of 4% improvement over the conventional CSP, ACSP, and augmented CSP [11]. This is due to the power difference information preserved in the pseudocovariance matrix, together with the power sum maintained in the covariance matrix. However, during the simultaneous diagonalization of the covariance and pseudocovariance matrices, the correlation term vanishes owing to the application of the strong uncorrelating transform [11,15,16]. No effort to preserve this correlation has been made so far for CSP algorithms; the correlation assisted version of SUTCCSP is therefore newly proposed in this paper.
The basic terminology and procedure of SUTCCSP and the proposed method are explained in Section 2, followed by extensive simulation results on the benchmark motor imagery dataset of 105 subjects in Section 3. Finally, concluding remarks are given in Section 4, together with additional discussion of how the performance changes for distinct channel pairs that are less correlated than those used in Section 3.

Proposed Method
Here we explain SUT based on the terminologies used in [9,14] and show how the correlation information is utilized with CSP algorithms [11,16].
Let $z$ be a complex-valued random vector such that

$$z = z_r + j z_i, \tag{1}$$

where $j$ is $\sqrt{-1}$, $z_r$ is the real part, and $z_i$ is the imaginary part of the complex random vector. $X_k$ is a zero-mean complex-valued matrix consisting of values with the form of (1), where $k$ denotes the two different classes, $k \in \{1, 2\}$. $X_k$ has the dimension of the number of channels by the number of samples. Then the covariance ($C_k$) and pseudocovariance ($P_k$) matrices are defined as follows:

$$C_k = E[X_k X_k^H], \qquad P_k = E[X_k X_k^T],$$

where $E[\cdot]$ is the statistical expected value operator and $(\cdot)^H$ is the conjugate transpose. Then, we can define the composite covariance ($C_c$) and pseudocovariance ($P_c$) matrices as follows:

$$C_c = C_1 + C_2, \qquad P_c = P_1 + P_2.$$

Here $C_c$ can then be decomposed into $\Theta_c$ and $\Lambda_c$ as follows:

$$C_c = \Theta_c \Lambda_c \Theta_c^H,$$

where $\Theta_c$ has eigenvectors in each column for the corresponding diagonal eigenvalues of $\Lambda_c$. Note that $\Theta_c$ and $\Lambda_c$ consist of real elements and the nondiagonal elements of $\Lambda_c$ are zero. This allows $C_c$ to be whitened by the whitening matrix $\Phi = \Lambda_c^{-1/2} \Theta_c^H$ in the original CCSP algorithm, resulting in $\Phi C_c \Phi^H = I$, where $I$ denotes the identity matrix [11].
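The covariance, pseudocovariance, and whitening steps described above can be illustrated numerically. The following is a minimal NumPy sketch rather than the authors' implementation: the expectations are replaced by sample averages, the data are synthetic, and the function names `covariance_and_pseudocovariance`, `whitening_matrix`, and `make_class` are hypothetical.

```python
import numpy as np

def covariance_and_pseudocovariance(X):
    """Sample estimates of C = E[X X^H] and P = E[X X^T] for zero-mean X.

    X has shape (channels, samples) and a complex dtype.
    """
    n_samples = X.shape[1]
    C = X @ X.conj().T / n_samples   # Hermitian covariance
    P = X @ X.T / n_samples          # complex-symmetric pseudocovariance
    return C, P

def whitening_matrix(C_c):
    """Phi = Lambda^{-1/2} Theta^H from the eigendecomposition of C_c."""
    eigvals, eigvecs = np.linalg.eigh(C_c)   # C_c = Theta Lambda Theta^H
    return np.diag(eigvals ** -0.5) @ eigvecs.conj().T

# Synthetic stand-in data (an assumption): 2 classes, 4 channels, 2000 samples.
rng = np.random.default_rng(0)

def make_class(scale):
    real = rng.standard_normal((4, 2000)) * scale
    imag = rng.standard_normal((4, 2000))
    X = real + 1j * imag
    return X - X.mean(axis=1, keepdims=True)  # enforce zero mean

X1, X2 = make_class(2.0), make_class(0.5)
C1, P1 = covariance_and_pseudocovariance(X1)
C2, P2 = covariance_and_pseudocovariance(X2)
C_c, P_c = C1 + C2, P1 + P2                   # composite matrices

Phi = whitening_matrix(C_c)
whitened = Phi @ C_c @ Phi.conj().T           # should be close to the identity
```

Here `np.linalg.eigh` is used because the covariance matrix is Hermitian, so its eigenvalues are real and the whitened composite covariance is the identity up to rounding error.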
Using the whitening matrix $\Phi = \Lambda_c^{-1/2} \Theta_c^H$ from the original CCSP algorithm [11], the pseudocovariance matrix can also be decomposed using Takagi's factorization as shown in the following equation [17]:

$$\Phi P_c \Phi^T = \Delta \Lambda \Delta^T,$$

where $\Delta$ and $\Lambda$ are yielded by symmetric matrices. This leads to a derivation of the strong uncorrelating transform matrix $S$ as follows:

$$S = \Delta^H \Phi.$$

Using the matrix $S$, it is now possible to diagonalize the covariance and pseudocovariance matrices simultaneously. Firstly, the composite covariance matrix can be diagonalized as follows:

$$S C_c S^H = S C_1 S^H + S C_2 S^H = M_1 + M_2 = I, \qquad Y^{-1} M_k Y = \Lambda_k,$$

where $Y$ and $\Lambda_k$ are the estimations of eigenvectors and eigenvalues of $M_k$, respectively. Next, the composite pseudocovariance can also be diagonalized in the same manner:

$$\hat{S} P_c \hat{S}^T = \hat{S} P_1 \hat{S}^T + \hat{S} P_2 \hat{S}^T = \hat{M}_1 + \hat{M}_2, \qquad \hat{Y}^{-1} \hat{M}_k \hat{Y} = \hat{\Lambda}_k,$$

where $\hat{S}$ is the strong uncorrelating transform matrix for the pseudocovariance and $\hat{Y}$ and $\hat{\Lambda}_k$ are the estimations of the eigenvectors and eigenvalues of $\hat{M}_k$, respectively. Therefore, the two spatial filters $W$ and $\hat{W}$ can be designed as follows:

$$W = Y^{-1} S, \qquad \hat{W} = \hat{Y}^{-1} \hat{S}.$$

Finally, the spatially filtered matrix, $Z$, is calculated by applying the spatial filters $W$ and $\hat{W}$ to the data matrix. Let $N$ be the number of data channels and $z_p$ the $p$th row vector in $Z$, where $p$ corresponds to each row of the new matrix $Z$. The final subfeatures by SUTCCSP are then calculated from the variances of these row vectors for each class $k$, where $k$ varies between 1 and 2 and var(⋅) is the variance of (⋅). Here, selecting one pair of filters is equivalent to choosing the first and last rows in each real and imaginary part of the covariance and pseudocovariance matrices, separately. The number of filter pairs was chosen to maximize the performance for each subject. Such consideration of selecting the appropriate number of filter pairs could be important in real-time applications.

Next, Pearson's correlation coefficient for $z_r$ and $z_i$ is calculated as follows [17]:

$$\rho_{r,i} = \frac{E[(z_r - \mu_{z_r})(z_i - \mu_{z_i})]}{\mathrm{std}(z_r)\,\mathrm{std}(z_i)},$$

where std(⋅) is the standard deviation of (⋅) and $\mu_z$ is the mean of $z$. The maximum number of correlation coefficients between the real and imaginary parts of (1) is equal to the number of channel pairs due to the multichannel attribute of the data.
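The simultaneous diagonalization performed by the strong uncorrelating transform can be checked numerically. The sketch below is illustrative only: it builds Takagi's factorization from the singular value decomposition, which assumes that the singular values are distinct, and it uses synthetic noncircular data. The function names `takagi` and `strong_uncorrelating_transform` are hypothetical.

```python
import numpy as np

def takagi(A):
    """Takagi factorization A = Q diag(s) Q^T of a complex symmetric A.

    Built from the SVD; assumes the singular values are distinct.
    """
    U, s, Vh = np.linalg.svd(A)
    # For symmetric A, conj(V) = U D with D a diagonal matrix of phases.
    phases = np.diag(U.conj().T @ Vh.T)
    Q = U * np.sqrt(phases)          # scale column k by sqrt(phase_k)
    return Q, s

def strong_uncorrelating_transform(C, P):
    """Return S = Delta^H Phi and the diagonal entries of S P S^T."""
    eigvals, eigvecs = np.linalg.eigh(C)
    Phi = np.diag(eigvals ** -0.5) @ eigvecs.conj().T   # whitening matrix
    Delta, lam = takagi(Phi @ P @ Phi.T)
    return Delta.conj().T @ Phi, lam

# Synthetic noncircular data (an assumption): imaginary part coupled to real part.
rng = np.random.default_rng(1)
n_channels, n_samples = 4, 5000
real = rng.standard_normal((n_channels, n_samples))
coupling = np.array([[0.1], [0.4], [0.7], [1.0]])
imag = coupling * real + 0.5 * rng.standard_normal((n_channels, n_samples))
X = real + 1j * imag
X = X - X.mean(axis=1, keepdims=True)

C = X @ X.conj().T / n_samples
P = X @ X.T / n_samples
S, lam = strong_uncorrelating_transform(C, P)

cov_diag = S @ C @ S.conj().T    # close to the identity matrix
pcov_diag = S @ P @ S.T          # close to diag(lam)
```

After the transform, the covariance becomes the identity and the pseudocovariance becomes a real diagonal matrix, so the off-diagonal terms vanish. This is the step at which, according to the text, the correlation information is lost.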
Because the number of channel pairs is large, the dimension of the correlation coefficients should be reduced to avoid the curse of dimensionality. PCA is applied for this purpose in this paper, owing to its simple implementation and fast speed [18,19].
Let Γ be the matrix containing $\rho_{r,i}$ for the $N(N-1)/2$ channel pairs, where $N$ is the number of channels. By applying PCA to the correlation coefficient matrices, the principal component coefficients, known as loadings, are estimated [20]. Here we define Ψ as an $N(N-1)/2$-by-$L$ matrix of loadings, where $L$ is the reduced number of dimensions. An additional subfeature containing the correlation information of two data channels is then calculated from Γ and Ψ. The final feature matrices for each of the two classes consist of the two SUTCCSP subfeatures and the correlation subfeature. In this paper, the covariance matrix information from the original CSP is added to the feature matrices of CCSP, SUTCCSP, and CASUT, which provides a fair basis for comparing CSP with these three algorithms. Accordingly, the feature matrices of CASUT were designed to contain the information of variance, power sum, and power difference, as well as the correlation information lost due to the strong uncorrelating transform.
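The construction of the correlation subfeature can be sketched as follows. This is a hedged illustration on synthetic data, not the authors' code: it assumes one row of Γ per trial, computes PCA through the SVD of the column-centered matrix, and projects the centered coefficients onto the first $L$ loadings. The function names `pair_correlations` and `pca_loadings` and all sizes are hypothetical.

```python
import numpy as np

def pair_correlations(trial):
    """Pearson coefficients for all N(N-1)/2 distinct channel pairs.

    trial has shape (channels, samples).
    """
    rho = np.corrcoef(trial)                          # (N, N) correlation matrix
    rows, cols = np.triu_indices(trial.shape[0], k=1)
    return rho[rows, cols]

def pca_loadings(Gamma, n_components):
    """Loadings Psi (pairs x L) from the SVD of the column-centered Gamma."""
    centered = Gamma - Gamma.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:n_components].T

# Synthetic stand-in data (an assumption): 20 trials, 6 channels, 400 samples.
rng = np.random.default_rng(0)
n_trials, n_channels, n_samples, L = 20, 6, 400, 3
trials = rng.standard_normal((n_trials, n_channels, n_samples))

Gamma = np.stack([pair_correlations(t) for t in trials])   # (trials, pairs)
Psi = pca_loadings(Gamma, L)                               # (pairs, L)
corr_features = (Gamma - Gamma.mean(axis=0)) @ Psi         # (trials, L)
```

With 6 channels there are 15 distinct pairs, so Γ is 20-by-15 and the reduced correlation feature has one row of length $L$ per trial.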

Experiments
3.1. Data Acquisition. As Park et al. used the Physiobank Motor Mental Imagery (MMI) database to test the performance of SUTCCSP, this study uses the same dataset in order to compare the proposed CASUT with the former CSP algorithms, including SUTCCSP [11,21-23]. Out of the 109 subjects who conducted the left- and right-hand motor imagery tasks, three subjects (S088, S092, and S100) had damaged recordings, and one subject (S104) had an insufficient amount of data [15,24]. For these reasons, 105 subjects were used to examine the classification accuracy of CASUT. Each subject's data consist of 45 trials of the left- and right-hand tasks, which were recorded using 64 electrodes placed according to the 10-10 EEG system and sampled at 160 Hz [25].
In order to verify the performance of CASUT in preserving the correlation information, the channel pairs that yield high correlation coefficients were selected (values greater than 0.9 and less than or equal to 1). All trials for the left-hand motor imagery task of the 105 subjects were combined into one single trial set, and the correlation coefficients of all 2016 possible distinct pairs among the 64 channels were calculated. Then the average of the correlation coefficient values over all trials of the left-hand task was calculated, in order to determine which channel pairs have a high correlation coefficient. The same calculation was conducted on the trials of the right-hand motor imagery task. The channel pairs whose correlations were in the range of the following equation were denoted as

$$R_n = \{(a, b) \mid 0.1n < \rho_{a,b} \le 0.1(n + 1)\}, \tag{15}$$

where $(a, b)$ is a pair of two distinct channels, $\rho_{a,b}$ is the correlation coefficient between $a$ and $b$, and $n$ is a number in the range of $0 \le n \le 9$.
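The grouping of channel pairs into ten correlation ranges can be sketched as below. This is an illustration under stated assumptions: range $n$ is taken to collect the pairs whose averaged correlation coefficient lies in the interval from 0.1n (exclusive) to 0.1(n+1) (inclusive), which matches the highest range of values over 0.9 and up to 1. The function name `pairs_in_range` and the small matrix of coefficients are made up for the example.

```python
import numpy as np

def pairs_in_range(rho_matrix, n):
    """Distinct channel pairs (a, b), a < b, with n/10 < rho <= (n + 1)/10."""
    rows, cols = np.triu_indices(rho_matrix.shape[0], k=1)
    values = rho_matrix[rows, cols]
    mask = (values > n / 10) & (values <= (n + 1) / 10)
    return list(zip(rows[mask].tolist(), cols[mask].tolist()))

# Made-up averaged coefficients for 4 channels (illustrative values only).
rho = np.array([
    [1.00, 0.95, 0.50, 0.90],
    [0.95, 1.00, 0.20, -0.30],
    [0.50, 0.20, 1.00, 1.00],
    [0.90, -0.30, 1.00, 1.00],
])

highest_range = pairs_in_range(rho, 9)   # pairs with 0.9 < rho <= 1
next_range = pairs_in_range(rho, 8)      # pairs with 0.8 < rho <= 0.9

# Number of distinct pairs among 64 channels: 64 * 63 / 2.
n_pairs_64 = len(np.triu_indices(64, k=1)[0])
```

A coefficient of exactly 0.9 falls into range 8 rather than range 9 because the lower bound is exclusive, and negative coefficients belong to none of the ten ranges.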
The EEG recordings were preprocessed using a fifth-order Butterworth IIR bandpass filter that extracts the frequency components in the 8-25 Hz band [6,26,27]. This preprocessing was identical to that used by Park et al. [11].
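A bandpass step of this kind could be implemented with SciPy as sketched below. The zero-phase filtering via `sosfiltfilt` and the second-order-section form are assumptions made for numerical robustness; the text only specifies a fifth-order Butterworth IIR bandpass filter over 8-25 Hz and a 160 Hz sampling rate. The helper name `bandpass_8_25` and the test tones are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 160.0  # sampling rate in Hz, as stated for the dataset

def bandpass_8_25(eeg, fs=FS, order=5):
    """Zero-phase Butterworth bandpass (8-25 Hz) along the last axis of eeg."""
    sos = butter(order, [8.0, 25.0], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, eeg, axis=-1)

def rms(x):
    return float(np.sqrt(np.mean(x ** 2)))

# Two test tones (an assumption): 15 Hz inside the band, 50 Hz outside it.
t = np.arange(0, 10, 1 / FS)
in_band = np.sin(2 * np.pi * 15 * t)
out_band = np.sin(2 * np.pi * 50 * t)

# Compare amplitudes away from the edges to ignore filter transients.
core = slice(200, -200)
gain_in = rms(bandpass_8_25(in_band)[core]) / rms(in_band[core])
gain_out = rms(bandpass_8_25(out_band)[core]) / rms(out_band[core])
```

Note that `butter` with `btype="bandpass"` and `order=5` yields a filter whose overall order is 10, and that forward-backward filtering squares the magnitude response; whether the original study counted the order this way is not stated in the text.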

Analysis of 105 Subjects.
The average classification accuracies over all 105 subjects were calculated in order to compare the proposed algorithm with CSP, CCSP, and SUTCCSP. Table 1 shows the average classification rates with the standard deviations for each algorithm. Note that the classification rate of CASUT is higher than those of CSP, CCSP, and SUTCCSP.
The normality was tested to determine whether to use the parametric or nonparametric version of a statistical test. The results of the Kolmogorov-Smirnov (KS) test in Table 2 show that the classification accuracies of the CSP algorithms could not always satisfy the normality assumption [28]. Therefore, the nonparametric Friedman test was used instead of the parametric ANOVA to compare three or more matched groups regardless of their normality [29,30]. The p-value for the Friedman test, which was less than $10^{-15}$, indicates that it is safe to perform the post hoc test. Instead of the parametric paired Student's t-test, the Wilcoxon signed rank test, which can be used regardless of the normality, was conducted as the post hoc test [28]. Although the difference in average classification accuracy between CASUT and SUTCCSP looked small, the Wilcoxon signed rank test performed on the accuracies of the two algorithms yielded significant p-values (<0.05), as shown in Table 3. The p-values $p_1$, $p_2$, and $p_3$ indicate the results of the Wilcoxon signed rank test conducted on the classification accuracies of CASUT compared with those of the original CSP, CCSP, and SUTCCSP, respectively.
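The testing procedure described above (a Friedman test across the matched groups, followed by Wilcoxon signed rank tests as the post hoc comparison) can be sketched with SciPy. The accuracies below are synthetic numbers generated only to make the example runnable; they are not the results of this study, and the dictionary layout is a hypothetical choice.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Synthetic per-subject accuracies (an assumption, NOT the paper's data).
rng = np.random.default_rng(0)
n_subjects = 105
base = rng.normal(70.0, 8.0, n_subjects)
acc = {
    "CSP": base + rng.normal(0.0, 1.0, n_subjects),
    "CCSP": base + rng.normal(1.0, 1.0, n_subjects),
    "SUTCCSP": base + rng.normal(2.0, 1.0, n_subjects),
    "CASUT": base + rng.normal(4.0, 1.0, n_subjects),
}

# Nonparametric test over the four matched groups.
friedman_stat, friedman_p = friedmanchisquare(*acc.values())

# Post hoc: paired Wilcoxon signed rank tests of CASUT against each baseline.
posthoc_p = {
    name: wilcoxon(acc["CASUT"], acc[name]).pvalue
    for name in ("CSP", "CCSP", "SUTCCSP")
}
```

Both tests operate on paired measurements, so the arrays must list the subjects in the same order for every algorithm.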

Analysis of Significant Subjects.
For a thorough validation of the classification performances of the CSP algorithms, an additional analysis conducted by Park et al. was adopted, in which the significant subjects are selected prior to any further analysis [11]. This is crucial because the recorded EEG of poorly performing subjects may contain little brain network information, based on the study of Ahn and Jun [31]. For these reasons, a subject was categorized as significant when the performance of the subject exceeded the minimum classification accuracy of 64%, defined using the confidence limit of 95% [32]. Figure 1 shows the number of significant subjects for each CSP algorithm. It can be observed that the number of significant subjects using CASUT was the highest out of all four CSP algorithms. The results throughout this section were based on the histograms of Figure 1, from which the data of the significant subjects were chosen for further analysis.
Table 4 lists the average classification accuracies over the significant subjects and their standard deviations for the CSP algorithms. It can also be noted that the average classification rate of CASUT was higher than those of CSP, CCSP, and SUTCCSP. The KS test was also performed for the significant subjects. However, the results in Table 5 indicated that the classification accuracies of the CSP algorithms do not follow the normal distribution. Accordingly, the Friedman test, which can be used regardless of the normality, was conducted. The Friedman test yielded a p-value less than $10^{-12}$, and thus the post hoc test was conducted, with the results shown in Table 6. Note that the low p-values (<0.01) from the Wilcoxon signed rank test demonstrate the enhanced performance of CASUT.
Additional plots of the error bar and whisker diagram of the classification accuracies of CSP, CCSP, SUTCCSP, and CASUT are illustrated in Figures 2 and 3, respectively. The blue crosses in Figure 2 are identical to the average classification rates shown in Table 4. The red lines in Figure 3 indicate the median classification rates, and it can be observed that the median of CASUT is higher than those of the other three CSP algorithms. The superiority of CASUT over the other CSP algorithms was also confirmed by the Wilcoxon signed rank test results in Table 6.
In Figure 4, scatterplots compare the classification rates of CASUT with those of CSP, CCSP, and SUTCCSP. The red dots above the dotted green lines indicate that the classification rates were higher with CASUT than with the compared CSP algorithm. The black dots mean that CASUT and the compared CSP algorithm have the same classification rates, and the blue dots mean that the performances of CASUT are lower than those of the compared CSP algorithm. This demonstrates that the majority of classification accuracies by CASUT were higher than those of the other CSP algorithms. Additionally, when two or more subjects yielded the same classification accuracies by two of the different algorithms, the dots for these subjects overlap in the figures. Therefore, the number of selected subjects in Figure 1 and the number of dots in Figure 4 may differ.

Figure 2: Error bar of the classification accuracies of CSP, CCSP, SUTCCSP, and CASUT (axis label: Classification Rate (%)). Note that CASUT produces higher classification rates compared with those of the other CSP algorithms, which is confirmed by the Wilcoxon signed rank test results of Table 6.
Lastly, the number of subjects classified as significant using CASUT but as insignificant using the other CSP algorithms was counted and is shown in Figure 5. The bar chart indicates the number of subjects that were classified as significant by CASUT, but not by CSP, CCSP, and SUTCCSP, respectively.
On the other hand, there was only one subject whose data was classified as insignificant by CASUT, while the other CSP algorithms classified it as significant. These results also demonstrate the superiority of CASUT over the other conventional CSP algorithms.

Analysis of Correlation Assisted CSP.
The various versions of the CSP algorithms were additionally investigated to further interpret the effects of correlation information on the features of motor imagery tasks. To this end, correlation assisted CSP (CACSP) is defined as the CSP algorithm augmented with the correlation information, and correlation assisted CCSP (CACCSP) is defined as CCSP augmented with the correlation information. Benchmark tests including CSP, CACSP, CCSP, CACCSP, SUTCCSP, and CASUT could provide a clearer interpretation of the effects of correlation information on the features of the motor imagery tasks. Table 7 lists the average classification rates calculated using CSP, CACSP, CCSP, CACCSP, SUTCCSP, and CASUT under the same conditions as in Table 4. The Friedman test was conducted, and a p-value less than $10^{-15}$ was confirmed. In Table 8 ... gradually increasing classification rates could be expected as shown in Table 4.
Similarly, the performances of CSP and CCSP increase as the correlation information is added to their original features. Additionally, the highest classification accuracy in these benchmark tests was yielded using CASUT, indicating that CASUT outperforms all former CSP algorithms introduced so far.

Discussion and Conclusion
The correlation range chosen to evaluate the performance of CASUT was the highest range, $R_9$ (correlation coefficients greater than 0.9 and up to 1), based on (15). As shown in Figure 6, the number of channel pairs in each correlation range ($R_0$ to $R_9$) varies from zero to 301. In order to examine the effects of the correlation information on the CSP algorithms, the average classification accuracies over the 105 subjects across the different correlation ranges were calculated based on the same analysis as in Section 3. The results demonstrate that the performance of CASUT gradually decreases as the correlation information is degraded, as shown in Figure 7. Additionally, Figure 8 illustrates the p-values estimated using the Wilcoxon signed rank test on CASUT compared with SUTCCSP, indicating less significance for small correlation coefficients. This indicates that CASUT is most effective as a feature extraction approach when sufficient correlation information exists among the multichannel data.
This study has addressed a limitation of SUTCCSP, namely, the loss of the correlation information during the simultaneous diagonalization of the covariance and pseudocovariance matrices. To that end, the correlation assisted version of SUTCCSP, denoted by CASUT, has been proposed for the first time to preserve the correlation information among multichannel data. The proposed algorithm was tested on the two-class motor imagery classification problem, and the classification accuracies obtained using the channel pairs with high correlation were significantly higher with CASUT than with CSP, CCSP, and SUTCCSP, with p-values less than 0.01. Additional experiments on the various ranges of correlation indicate that the correlation information is crucial to the classification of the two-class motor imagery tasks and that CASUT yields the highest classification accuracies compared with the other CSP algorithms.