Sinc-Windowing and Multiple Correlation Coefficients Improve SSVEP Recognition Based on Canonical Correlation Analysis

Canonical Correlation Analysis (CCA) is an increasingly used approach in the field of Steady-State Visually Evoked Potential (SSVEP) recognition. The efficacy of the method has been widely proven, and several variations have been proposed. However, most CCA variations tend to complicate the method, usually requiring additional user training or increasing computational load. Simple procedures and low computational costs may nevertheless be relevant, especially in view of low-cost and high-portability devices. In addition, it would be desirable for the proposed variations to be as general and modular as possible, to facilitate the translation of results to different algorithms and setups. In this work, we evaluated the impact of two simple, modular variations of the classical CCA method. The variations involved (i) the number of canonical correlations used for classification and (ii) the inclusion of a prefiltering step by means of sinc-windowing. We tested ten volunteers in a 4-class SSVEP setup. Both variations significantly improved classification accuracy, whether used separately or in conjunction, and led to accuracy increments of up to 7-8% on average, with peaks of 25-30%. Additionally, the variations had no (variation (i)) or minimal (variation (ii)) impact on the number of algorithm steps required for each classification. Given the modular nature of the proposed variations and their positive impact on classification accuracy, they might be easily included in the design of CCA-based algorithms, even ones different from ours.


Introduction
A Brain-Computer Interface (BCI) is a system enabling direct communication between the brain and the outside world, as it directly translates the recorded neural activity into a control signal for an external device (e.g., a computer, a machine, or a speller) [1]. Among noninvasive systems, electroencephalography (EEG)-based BCIs are the most widespread [2], and they can rely on four possible electrophysiological sources: slow cortical potentials (SCPs), event-related desynchronization/synchronization (ERD/ERS), event-related potentials (such as the P300), or Steady-State Visually Evoked Potentials (SSVEPs) [3]. Among these, SSVEP-based BCIs are appealing for their high accuracy and information transfer rate (ITR), thanks to the high signal-to-noise ratio of SSVEPs even without user training [4]. For this reason, SSVEP-based BCIs have attracted increasing attention over the years [5,6].
SSVEPs are periodic brain signals elicited over the occipital cortex by visual stimulations with frequencies higher than 6 Hz [7]. When different flickering objects (LEDs, symbols, or squares) are presented simultaneously, an analysis of the SSVEP spectral content makes it possible to identify which stimulus the user is focusing on.
Traditional methods perform SSVEP recognition based on power spectral density analysis (PSDA) [7]. In PSDA-based approaches, spectral powers are estimated from the EEG spectrum at the target stimulation frequencies and used as features for classification [8][9][10]. However, PSDA-based methods can suffer from noise sensitivity if few channels are acquired, besides requiring relatively long signal portions (e.g., >3 s) to estimate the spectrum with sufficient frequency resolution [11][12][13]. A promising and increasingly used approach, which has recently attracted the interest of researchers [14][15][16][17], is the one based on Canonical Correlation Analysis (CCA) [7].
CCA is a multivariate statistical method able to reveal the underlying correlation between two sets of data [18]. For SSVEP recognition, CCA is performed several times between the considered EEG segment and a set of sine-cosine reference signals modeling the pure SSVEP responses to each stimulation frequency [7]. The frequency whose reference signals show the highest correlation with the analyzed EEG portion is finally recognized as the observed one.
Computational Intelligence and Neuroscience
The efficacy of the CCA approach has been widely proven, and its superiority to PSDA in terms of speed, accuracy, and computational load has been shown [19,20]. For this reason, several CCA variations have been proposed over the years [11-13, 15, 21-26].
Some CCA variations, such as [11-13, 15, 21, 23], modified the SSVEP reference signals by including subject-specific features from each user's EEG. The work in [24] enriched the algorithm by incorporating intersubject information from the signals of multiple subjects. In [25], an effort was made towards compensating the natural decrease in signal-to-noise ratio of SSVEPs at higher stimulation frequencies by correcting classification gains based on the shape of the individual background EEG. Finally, in [22,26], CCA was repeated multiple times for each stimulation frequency, each time processing the signal with a different IIR bandpass filter, to combine different aspects of the same EEG response.
Although each introduced variation produced significant increments of classification accuracy, all of them tended to increase the complexity of the algorithm. They either required additional user training, to incorporate information from individual EEG data [11-13, 15, 21, 23], or increased computational load by multiplying the number of CCAs needed to assess each stimulation frequency [22,26]. However, we believe that simple procedures and low computational costs are also relevant, especially to favor the spread of low-cost and high-portability devices. In addition, it would be desirable for variations to be as general or scalable as possible, to facilitate the translation of results to different setups.
Given these premises, this work presents two simple and modular variations based on the classical CCA method. The variations regard (i) the number of correlations considered for classification and (ii) the preprocessing of the signals. We show that both modifications can significantly improve classification accuracy while leaving the whole procedure training-free and with no (variation (i)) or minimal (variation (ii)) impact on the number of steps required for each SSVEP identification.

The Standard CCA Method for SSVEP Recognition.
Canonical Correlation Analysis (CCA) is a multivariate statistical method [18] used to reveal the underlying correlation between two sets of data. Given two sets of random variables $X \in \mathbb{R}^{n_1 \times m}$ and $Y \in \mathbb{R}^{n_2 \times m}$, CCA finds the two corresponding sets $U = AX \in \mathbb{R}^{n_1 \times m}$ and $V = BY \in \mathbb{R}^{n_2 \times m}$ (linear combinations of the original ones through $A \in \mathbb{R}^{n_1 \times n_1}$ and $B \in \mathbb{R}^{n_2 \times n_2}$), called canonical variables, so that the correlation between each pair of rows $(u_i, v_i)$ is maximized:
$$\rho_i = \max \operatorname{corr}(u_i, v_i) = \max \frac{\operatorname{cov}(u_i, v_i)}{\sqrt{\operatorname{var}(u_i)\,\operatorname{var}(v_i)}}, \tag{1}$$
while leaving $(u_i, u_j)$, $(v_i, v_j)$, and $(u_i, v_j)$ uncorrelated if $i \neq j$. Each CCA leads to a number of solutions equal to the minimum between the numbers of rows in $A$ ($n_1$) and $B$ ($n_2$). The solutions $\rho_i$, sorted in descending order, are called canonical correlations and are a measure of the similarity between the two sets of original data.
The use of CCA in the field of SSVEP recognition was first proposed by Lin et al. in [7]. Given $N_f$ stimulation frequencies to be distinguished, CCA is performed $N_f$ times, once for each stimulation frequency $f_i$, between the multichannel EEG signal in $X \in \mathbb{R}^{N_{ch} \times N_s}$ ($N_{ch}$ acquired channels, $N_s$ time samples) and a set of sine-cosine reference signals in $Y_i \in \mathbb{R}^{2N_{harm} \times N_s}$ modeling the pure SSVEP responses. Each set $Y_i$ is composed as follows:
$$Y_i = \begin{pmatrix} \cos(2\pi f_i t) \\ \sin(2\pi f_i t) \\ \cos(2\pi\, 2 f_i t) \\ \sin(2\pi\, 2 f_i t) \\ \vdots \\ \cos(2\pi N_{harm} f_i t) \\ \sin(2\pi N_{harm} f_i t) \end{pmatrix}, \quad t = \frac{1}{F_s}, \frac{2}{F_s}, \ldots, \frac{N_s}{F_s}, \tag{2}$$
where $f_i$ is the stimulation frequency, $F_s$ is the sampling rate, and $N_{harm}$ is the number of harmonics included in the analysis.
Every CCA generates a vector of canonical correlations $(\rho_1, \rho_2, \ldots, \rho_{\min(N_{ch}, 2N_{harm})})$, of which only the first and largest one, $\rho_1$, is used as a feature for classification. The analyzed EEG segment in $X$ is assigned to the stimulation frequency leading to the maximum correlation $\rho_1$:
$$f_{target} = \arg\max_{f_i} \rho_1(f_i), \quad i = 1, 2, \ldots, N_f. \tag{3}$$
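The standard CCA classification described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the 256 Hz sampling rate, and the synthetic 10 Hz test signal are assumptions made for the example, and the canonical correlations are obtained through a QR decomposition followed by an SVD, which is one standard way of computing them (it assumes more time samples than channels or reference rows).

```python
import numpy as np

def reference_signals(freq, n_harm, n_samples, fs):
    """Sine-cosine references for one frequency, shape (2 * n_harm, n_samples)."""
    t = np.arange(1, n_samples + 1) / fs
    rows = []
    for k in range(1, n_harm + 1):
        rows.append(np.cos(2 * np.pi * k * freq * t))
        rows.append(np.sin(2 * np.pi * k * freq * t))
    return np.vstack(rows)

def canonical_correlations(X, Y):
    """Canonical correlations between the rows of X and Y, in descending order."""
    # Center each variable (row) over time and put time samples along the rows.
    Xc = (X - X.mean(axis=1, keepdims=True)).T
    Yc = (Y - Y.mean(axis=1, keepdims=True)).T
    # Orthonormal bases of the two column spaces.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

def classify_standard_cca(X, freqs, n_harm, fs):
    """Pick the stimulation frequency with the largest first canonical correlation."""
    rho1 = [
        canonical_correlations(X, reference_signals(f, n_harm, X.shape[1], fs))[0]
        for f in freqs
    ]
    return freqs[int(np.argmax(rho1))]

# Synthetic example (assumed values): 8 channels, 2 s at 256 Hz, 10 Hz response
# with a different phase on each channel plus white noise.
rng = np.random.default_rng(0)
fs, n_samples = 256, 512
t = np.arange(1, n_samples + 1) / fs
phases = rng.uniform(0, 2 * np.pi, size=(8, 1))
eeg = np.sin(2 * np.pi * 10.0 * t + phases) + 0.5 * rng.standard_normal((8, n_samples))
print(classify_standard_cca(eeg, [8.0, 9.0, 10.0, 11.0], n_harm=3, fs=fs))
```

With 8 channels and 3 harmonics, `canonical_correlations` returns min(8, 6) = 6 values per stimulation frequency, of which the standard method uses only the first.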

Variation 1: Number of Considered Canonical Correlations.
Although the efficacy of the CCA method for SSVEP recognition has been widely proven [14,16] and many variations have been proposed [11-13, 15, 21-27], the majority of approaches consider only the first canonical correlation as a feature for classification. Nevertheless, as already noted by Lin et al. [7], since real EEG signals may be contaminated by noise and show phase transitions, the information might be spread over more than one correlation coefficient. As a first variation of the algorithm, we evaluated the impact of taking a combination of more than one correlation coefficient as a feature for classification, following preliminary results in [28]. Since the canonical variables in $U$ and $V$ are estimated so that each couple $(u_i, u_j)$ and $(v_i, v_j)$ is uncorrelated for $i \neq j$, and the sine-cosine waves in the reference signals $Y$ are mutually orthogonal, the information contained in each set of canonical variables will always be in quadrature with respect to the others. For this reason, we propose combining the $N_{corr}$ considered correlations using the Euclidean norm:
$$\rho_{comb} = \sqrt{\sum_{k=1}^{N_{corr}} \rho_k^2}. \tag{4}$$
The resulting combination is used as a feature for classification in place of the largest canonical correlation $\rho_1$ alone. The number $N_{corr}$ can range from 1 to the minimum between $N_{ch}$ and $2N_{harm}$, where $N_{ch}$ is the number of acquired channels and $N_{harm}$ is the number of considered harmonics. In this work, we employed $N_{ch} = 8$ EEG channels (see Section 2.4 for details) and $N_{harm} = 3$ harmonics, so we explored the impact of taking all the possible numbers of considered correlations between 1 and $2N_{harm}$.
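As a small illustration of this first variation, the sketch below combines the largest canonical correlations with the Euclidean norm and uses the result as the classification feature. The function names and the correlation values are made up for the example; they are not taken from the experimental data.

```python
import numpy as np

def combined_feature(rho, n_corr):
    """Euclidean norm of the n_corr largest canonical correlations."""
    rho = np.sort(np.asarray(rho, dtype=float))[::-1]  # descending order
    if not 1 <= n_corr <= rho.size:
        raise ValueError("n_corr must be between 1 and len(rho)")
    return float(np.sqrt(np.sum(rho[:n_corr] ** 2)))

def classify(rho_per_freq, freqs, n_corr):
    """Assign the segment to the frequency with the largest combined feature."""
    features = [combined_feature(rho, n_corr) for rho in rho_per_freq]
    return freqs[int(np.argmax(features))]

# Hypothetical canonical correlations for two candidate frequencies.
freqs = [8.0, 9.0]
rho_per_freq = [
    [0.50, 0.10, 0.05],  # 8 Hz: slightly larger first correlation
    [0.48, 0.40, 0.30],  # 9 Hz: information spread over several correlations
]
print(classify(rho_per_freq, freqs, n_corr=1))  # decision from rho_1 only
print(classify(rho_per_freq, freqs, n_corr=3))  # decision from the combined feature
```

In this toy case the decision changes when more correlations are considered, because the second candidate carries its information in the second and third correlations. Setting `n_corr=1` reduces the feature to the standard CCA criterion.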

Variation 2: Preprocessing with Sinc-Windowing.
Another possible variation with respect to the literature consists in adding a preprocessing step to the EEG segments before performing CCA. If we exclude the works in [22,26], which employ IIR filter banks, CCA is typically applied without any prefiltering of the EEG signals. Nevertheless, we believe that a narrow-band prefiltering step around the employed stimulation frequencies and their $N_{harm}$ harmonics might be useful to increase the signal-to-noise ratio, and is thus expected to enhance classification accuracy. As a second variation, we evaluated the influence of this type of prefiltering using a sinc-windowing implementation. The technique of sinc-windowing consists in the convolution of the analyzed signal with an adequately modulated sinc function. As is known, the inverse Fourier transform of an ideal rectangular band-pass filter centered in $f_0$ and with bandwidth $B$ is
$$\mathcal{F}^{-1}\left\{\operatorname{rect}\left(\frac{f - f_0}{B}\right) + \operatorname{rect}\left(\frac{f + f_0}{B}\right)\right\} = 2B \operatorname{sinc}(Bt) \cos(2\pi f_0 t), \tag{5}$$
where $f$ is the frequency, $\mathcal{F}^{-1}$ is the inverse Fourier transform, and $\operatorname{sinc}(x) = \sin(\pi x)/(\pi x)$. Thus, the filtering around the $N_f$ stimulation frequencies and $N_{harm}$ harmonics can be accomplished by means of a convolution with the following function:
$$w(t) = \sum_{i=1}^{N_f} \sum_{k=1}^{N_{harm}} 2B \operatorname{sinc}(Bt) \cos(2\pi k f_i t), \tag{6}$$
where $B$ is the bandwidth (in this work, $B = 1$ Hz), $N_{harm}$ is the number of harmonics, and $f_i$ are the stimulation frequencies.
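The prefiltering step can be illustrated with the following sketch, which builds a sum of cosine-modulated sinc functions (one per stimulation frequency and harmonic) and convolves it with each EEG channel. It is only one possible realization under stated assumptions: the kernel duration, the 256 Hz sampling rate, the amplitude normalization, and the function names are choices made for the example and are not specified in the text.

```python
import numpy as np

def sinc_kernel(freqs, n_harm, bandwidth, fs, duration):
    """Sum of cosine-modulated sincs centered on every frequency and harmonic."""
    half = int(round(duration * fs / 2))
    t = np.arange(-half, half + 1) / fs  # symmetric time axis, odd length
    # np.sinc(x) is sin(pi x) / (pi x); the envelope sets the passband width.
    envelope = 2 * bandwidth * np.sinc(bandwidth * t)
    carriers = sum(
        np.cos(2 * np.pi * k * f * t)
        for f in freqs
        for k in range(1, n_harm + 1)
    )
    # The 1/fs factor makes the discrete sum approximate the convolution integral.
    return envelope * carriers / fs

def sinc_window(eeg, kernel):
    """Convolve every channel (row) with the kernel, keeping the signal length.

    Assumes the kernel is not longer than the signal.
    """
    return np.vstack([np.convolve(ch, kernel, mode="same") for ch in eeg])

# Assumed example values: 4 s at 256 Hz, 1 Hz bandwidth, 2 s kernel.
fs = 256
t = np.arange(4 * fs) / fs
kernel = sinc_kernel([8.0, 9.0, 10.0, 11.0], n_harm=3, bandwidth=1.0, fs=fs, duration=2.0)
signals = np.vstack([
    np.sin(2 * np.pi * 10.0 * t),  # inside a passband
    np.sin(2 * np.pi * 13.5 * t),  # between the fundamental and second-harmonic bands
])
filtered = sinc_window(signals, kernel)
```

All frequency components are filtered in a single convolution, regardless of how many stimulation frequencies or harmonics are included. Because the kernel has finite length, the samples near the segment edges are filtered with an incomplete kernel, which is why short segments are more affected by truncation.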
SSVEP stimulation was provided through four blue LEDs, arranged around a PC monitor. Each LED flickered at a different stimulation frequency ( 1 = 8 Hz, 2 = 9 Hz, 3 = 10 Hz, and 4 = 11 Hz). The four stimulation frequencies were selected before the beginning of the study and were the same for all subjects. All stimulations were provided with a 50 percent duty-cycle. The behavior of the LEDs was controlled by a LabVIEW-Arduino interface.

Experimental Paradigm.
Ten healthy volunteers (aged 22 to 26, 4 males and 6 females) participated in the study. All of them had normal or corrected-to-normal vision. During the experiment, the participants sat on a comfortable chair, with their arms relaxed and their head still, approximately 60 cm from the PC monitor.
The experiment was organized into runs, and the runs were organized into trials. Each participant underwent a total of 4 runs, each comprising 16 trials. Each trial consisted of three consecutive phases: a 1 s preamble, a 12 s stimulation, and a 2 s break period. During the preamble, a yellow square appeared on the screen indicating the target LED; then all LEDs started flickering simultaneously during stimulation, and the trial ended with a break period, in which the LEDs shut off and the square disappeared. The order of the target LEDs was randomized and counterbalanced in each run, so that each LED was gazed at for the same amount of time. To summarize, each experiment included a total of 4 runs × 16 trials × 12 seconds = 768 seconds of stimulation, that is, 192 seconds for each class.

Performance Evaluation.
For each subject, we evaluated the average classification accuracy at the end of each run. To highlight the impact of the two proposed variations (composition of the feature vector and sinc-windowing), all accuracies were recomputed using all the possible combinations of methods, that is, a number of considered correlations from one to $N_{corr} = 6$, with or without sinc-windowing. To evaluate the influence of the length of the EEG signal used for SSVEP recognition, all accuracies were recomputed for signal portions ranging from 0.5 s to 5 s, although the detailed results of statistical tests are reported only for a 1.5 s window length.
Another commonly used measure of BCI performance, encompassing the concepts of speed, accuracy, and number of choices, is the information transfer rate (ITR), expressed in bit/min. For completeness, ITR was also provided, computed according to [29]:
$$\mathrm{ITR\ (bit/min)} = \frac{60}{T}\left[\log_2 N + P \log_2 P + (1 - P) \log_2\left(\frac{1 - P}{N - 1}\right)\right], \tag{7}$$
where $N = 4$ is the number of choices, $P$ is the classification accuracy (expressed between 0 and 1), and $T$ is the epoch duration (in seconds).
For the sake of comparison with other CCA-based literature methods that might be related to ours, we finally recomputed classification accuracies with the method of Chen et al. in [26], employing IIR filter banks, while we omit the comparison with [22], as it was not reasonably adaptable to our setup.
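The ITR computation can be written as a short helper. The sketch below uses the standard formula based on the number of choices, the accuracy, and the epoch duration; the function name and the handling of the limiting cases (accuracy equal to 0 or 1, where a logarithm term vanishes) are choices made for the example.

```python
import math

def itr_bits_per_min(accuracy, n_choices, epoch_s):
    """Information transfer rate in bit/min for a given accuracy in [0, 1]."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    bits = math.log2(n_choices)
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_choices - 1))
    # 60 / epoch_s is the number of classifications per minute.
    return 60.0 / epoch_s * bits

# 4 choices and a 1.5 s epoch, as in the setup described in the text.
print(itr_bits_per_min(1.0, 4, 1.5))   # perfect accuracy
print(itr_bits_per_min(0.25, 4, 1.5))  # accuracy at the nominal 1/N level
```

With perfect accuracy each classification conveys log2(4) = 2 bits, and with accuracy equal to 1/N the bracketed term is zero. A longer epoch lowers the number of classifications per minute, so it raises ITR only if the accuracy gain compensates for it.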

Statistical Analyses.
First, we compared each accuracy to chance level. The value of chance level was obtained by running the simulations as described in [30] for a 4-class BCI and taking the upper bound of the confidence interval at $\alpha = 1\%$ significance, as an analytical expression of chance level was not available for the multiclass case. As concerns the statistical comparison between methods, we had to account for the fact that multiple data came from the same subject; that is, the samples could not be assumed to be completely independent. For this reason, instead of using paired-samples t-tests to compare each method against the others, we ran all evaluations as post hoc tests of a repeated-measures ANOVA. The ANOVA design included both the factors "method" (the within-subject factor) and "subject," thus taking into consideration all dependencies among data. Post hoc tests were carried out using Bonferroni correction. The use of parametric statistical tests was justified by the normality of data distributions, as confirmed by the application of a preliminary Kolmogorov-Smirnov test.
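To give an idea of how a simulated chance level can be obtained, the sketch below draws the accuracies of a purely random classifier and takes an upper quantile of their distribution. It is not a reproduction of the procedure in [30]: the number of classifications per run (128 here), the one-sided quantile, the number of simulations, and the function name are all assumptions made for illustration.

```python
import numpy as np

def simulated_chance_upper_bound(n_trials, n_classes, alpha=0.01, n_sim=20000, seed=0):
    """Upper (1 - alpha) quantile of the accuracy of a random classifier."""
    rng = np.random.default_rng(seed)
    # Each simulated run: number of correct guesses out of n_trials,
    # with success probability 1 / n_classes.
    correct = rng.binomial(n_trials, 1.0 / n_classes, size=n_sim)
    return float(np.quantile(correct / n_trials, 1.0 - alpha))

# Hypothetical number of classifications per run (not stated in the text).
bound = simulated_chance_upper_bound(n_trials=128, n_classes=4)
print(f"simulated chance-level upper bound: {100 * bound:.2f}%")
```

The bound lies above the nominal 25% for a 4-class problem and approaches it as the number of classifications grows, which is why an accuracy should be compared against such a bound and not against 1/N.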

Results
The classification accuracies of each subject, run, and method are detailed in Table 1 and summarized in Figure 1. The last two rows of Table 1 indicate the average and peak increment of each method with respect to standard CCA (first column). All the obtained accuracies were significantly higher than chance, as the upper bound of the confidence interval for chance level (with a significance of $\alpha = 1\%$) in this particular setup was 30.27%. In Table 2, the results of the post hoc comparisons (Bonferroni-corrected) between each pair of methods are reported. In Figure 2, the accuracy curves of all the considered methods, evaluated with different window lengths, are shown. In order to avoid redundancies, the detailed ITRs for each subject, run, and method are omitted, as they can be easily computed from the accuracy results in Table 1 according to (7). Nevertheless, Table 3 reports the average and peak ITR of each combination of methods, together with the average and peak increment in ITR with respect to classical CCA, in the same manner as reported in the last rows of Table 1.
Both proposed variations were able to significantly improve classification accuracy. As regards variation 1, the results in Tables 1 and 2 and Figure 1 clearly show how the consideration of more than one canonical correlation significantly increases classification accuracy in both the sinc-windowing and no-sinc-windowing conditions. Nevertheless, while accuracy significantly increases (p < 0.001, both with and without sinc-windowing) when switching from one to two canonical correlations or from two to three canonical correlations (p < 0.001, in the no-sinc-windowing condition), the increment generally becomes insignificant when taking four, five, or six canonical correlations, with respect, for example, to three. As concerns variation 2, that is, the inclusion of a prefiltering step around the stimulation frequencies and $N_{harm}$ harmonics by means of sinc-windowing, the results show how this kind of preprocessing always outperformed (with statistical significances ranging from p < 0.001 to p < 0.01) the corresponding version without processing. Accordingly, when variation 1 and variation 2 were combined, classification accuracy was a fortiori significantly (p < 0.01 or p < 0.001) increased with respect to the standard CCA method. To give an example, the accuracies obtained using four canonical correlations and sinc-windowing increased by 8.20% on average with respect to the standard CCA method, with a peak increment of 31.25% (in S08, run 2).
When varying the length of the EEG portions used to recognize the SSVEPs, the behavior of the proposed variations on classification accuracy tended to be confirmed, with the only exception of the 0.5 s window length (Figure 2). While the consideration of more than one canonical correlation always outperformed the use of the largest one only, the positive impact of sinc-windowing emerged only for window lengths greater than 0.5-1 s.
When finally recomputing accuracies with the filter bank CCA method proposed in [26], we confirmed that the latter performed significantly (p < 0.001) better than standard CCA. However, the increase in accuracy produced by [26] was not statistically different from that of some of our proposed variations. Notably, accuracy results obtained with the combinations of four, five, or six canonical correlations and sinc-windowing processing were not statistically different from the results of filter bank CCA [26].

Discussion
Our results show how the simple consideration of more than one canonical correlation can significantly improve the achievable accuracy without any increment of computational load. As already suggested by Lin et al. [7], real EEG signals are affected by noise and can show phase transitions; therefore the information might be spread over more than one correlation coefficient. From a theoretical point of view, if the EEG signals (in the $X$ matrix) were almost unaffected by noise and shared the same phase across electrodes (i.e., the rows in $X$), then the consideration of only the first canonical correlation would be sufficient to capture the majority of information. Indeed, as the sine-cosine waves in the rows of each $Y$ are an orthogonal basis, CCA would be able to find the particular linear transformation of $Y$ that explains the behavior of the SSVEP response in $X$ by maximizing the correlation between a linear combination of $X$ (the EEG signals) and $Y$, without leaving information behind. However, as $X$ is a multichannel set of data, if we suppose that the SSVEP response might show a different phase across electrodes (i.e., $X$ rows), then at least a second set of canonical variables would be needed to explain the data, and the second set $(u_2, v_2)$ would contain complementary information with respect to $(u_1, v_1)$. If we further suppose that, at the same EEG location, the different harmonics of the same SSVEP response might show different delays between each other, then at least another set of canonical variables $(u_3, v_3)$ would be needed to capture the information of the SSVEP response not included in the first two sets.
We suggest that all the above-introduced suppositions are likely to be true in real EEG signals. Supposing that the SSVEP response is generated in a limited area of the occipital cortex, it will undergo different delays to reach the different electrode locations, due to a delay in spatial transmission. We suggest that the second supposition is also reasonable in real EEG. Given the origin of SSVEP in the occipital cortex, the signal has to pass through multiple tissue layers (fluids, bone, and skin) before reaching each EEG location. This is likely to produce phase distortion between different frequency components, besides the well-known spatial blurring effect.
The above-described interpretation fits the experimental results well; indeed the accuracy significantly increased when switching from one to three canonical correlations. We consequently suggest that the consideration of more than one canonical correlation makes it possible to capture more complete information on the investigated frequency $f_i$, and this finally translates into an increased accuracy, revealed in almost every subject and run. From the third set of canonical variables on, we hypothesize that the amount of information described by each correlation depends on each user's individual characteristics, for example, the amount of delay across different harmonics and electrodes, as well as the differential amplitude of the SSVEP response between different harmonics of the same stimulation frequency. According to this hypothesis, from the fourth canonical correlation on, there would no longer be a group effect, and this would explain why the accuracy increments in the experimental data are no longer significant.
Besides recommending the consideration of more than one canonical correlation, our results also highlight the positive impact of prefiltering before performing CCA. The presence of a filtering stage around the stimulation frequencies and related $N_{harm}$ harmonics may have made it possible to enhance the SSVEP response with respect to the background EEG, and this finally translated into a significantly increased accuracy in every considered comparison between corresponding versions of the method, with or without prefiltering. The idea of exploiting band-pass filters to enhance different SSVEP components had already been introduced in the works of Chen et al. [26] and Islam et al. [22], suggesting the use of IIR filter banks. However, both algorithm implementations in [22,26] were proposed to perform multiple prefilterings of the same EEG portion, thus multiplying the number of CCAs needed to assess each stimulation frequency. Despite being able to produce a significant increase in classification accuracy, this implies a multiplication of the total number of steps required in each SSVEP recognition, with a related considerable increment of computational load. Besides being a novelty with respect to the literature, the implementation of the prefiltering by means of sinc-windowing has the advantage of being able to filter multiple frequency components in one single step, by simply modulating the composition of the convolved function. This implies that only one step is added to each SSVEP recognition, independently of the number of stimulation frequencies or of the $N_{harm}$ considered harmonics, thus overall remaining computationally light.
Table 3: Average and peak ITR (bits/min) of each combination of methods, together with average and peak increment with respect to classical CCA, with a window length of 1.
A potential limitation of the sinc-windowing technique might be related to the length of the considered signal portions, due to the Gibbs truncation effect [31]. Indeed, as shown in Figure 2, while for segment lengths longer than 1 s sinc-windowing increased the achievable accuracy, it turned out to have a negative impact when considering a short signal portion of 0.5 s. Figure 2(b) complements the information of Figure 2(a), showing that an increase in window length may cause a decrease in ITR (as deducible from (7)) when the accuracy increase is not enough to offset the decrease in the number of classifications per unit of time. As a result, the maximum ITR can be achieved, for each considered comparison, with window lengths of 1.25-1.5 s, while the positive impact of sinc-windowing is most evident up to a 2.5-3 s window length. As a final comment on the sinc-windowing technique, it might be noted that its efficacy was generally confirmed despite the closeness of the chosen stimulation frequencies (8, 9, 10, and 11 Hz).
As regards the obtained accuracies in absolute terms, our results are in line with literature regarding multiclass SSVEP recognition with the standard CCA technique [7,14,20,26,32], although a subject-specific calibration of the stimulation frequencies and/or their duty cycles [33] could have further increased the performances. In addition, we verified that the combination of our proposed variations could produce the same accuracy increments as other CCA-related methods in literature and particularly the same improvements as filter bank CCA of Chen et al. [26].
As a final comment, we believe that, beyond comparing our methods to the literature, the main aim and contribution of this work was to provide a systematic study of the effect of two simple, modular, and computationally light variations of the standard CCA algorithm. These proposed variations might be intended as modular "algorithm bricks" and might be flexibly translated to the design of CCA-based algorithms, even ones different from ours, in order to increase the overall accuracy.

Conclusion
In this work, we evaluated the impact of two simple and modular variations of the CCA algorithm in a 4-class SSVEP recognition setup. The two variations involved (i) the number of considered canonical correlations and (ii) the inclusion of a narrow-band prefiltering step around the employed stimulation frequencies and related harmonics by means of the sinc-windowing technique. Our results indicate that even the simple consideration of more than one canonical correlation can significantly improve accuracy, without any increment of computational load. Notably, there were significant increases in accuracy when switching from one to three canonical correlations, while the increments were not significant from the fourth canonical correlation on. An additional narrow-band prefiltering made it possible to gain up to 7-8% of accuracy on average, with peaks of 25-30%, with respect to classical CCA. A further advantage of the sinc-windowing implementation is that it permits the enhancement of multiple frequency components in one single step, by simply modulating the composition of the sinc function. Given the modular nature of the proposed variations and the significant increments in accuracy, regardless of whether the variations were used separately or, even more, in combination, together with the minimal computational costs, we believe that they could easily represent valid integrations to be included in future CCA-based designs.