Spatial and Time Domain Feature of ERP Speller System Extracted via Convolutional Neural Network

The features of the event-related potential (ERP) are not completely understood, and the illiteracy problem remains unsolved. The P300 peak has been used as the ERP feature in most brain–computer interface applications, but subjects who do not show such a peak are common. Recent developments in convolutional neural networks provide a way to analyze the spatial and temporal features of ERP. Here, we train a convolutional neural network with 2 convolutional layers whose feature maps represent the spatial and temporal features of the event-related potential. We found that the ERP of nonilliterate subjects shows high correlation between the occipital and parietal lobes, whereas illiterate subjects only show correlation between neural activities from the frontal and central lobes. The nonilliterates showed peaks at P300, P500, and P700, whereas illiterates mostly showed peaks around P700. P700 was strong in both groups. The P700 peak may therefore be the key feature of ERP, as it appears in both illiterate and nonilliterate subjects.


Introduction
A brain-computer interface (BCI) is a system that provides a means of communication by utilizing biophysiological signals [1]. A BCI system enables users to communicate with the external world through measurements of biological signals and mostly does not require voluntary muscle movement. Such systems have been used as a means of communication for patients with severe locked-in syndrome (LIS) who lack motor ability, such as patients with amyotrophic lateral sclerosis (ALS) or Guillain-Barre syndrome [2][3][4][5][6][7]. Among the many biophysiological signals, electroencephalography (EEG) has been the most widely used in the BCI field because it is easy and inexpensive to measure [8,9].
Among the different applications of BCI, the event-related potential (ERP) based speller system has been one of the most widely used paradigms. The system was pioneered by Farwell and Donchin [10] in 1988, who used the oddball paradigm to induce a visual evoked potential (VEP), especially the P300 response. However, illiteracy problems are still associated with the ERP speller system [11,12]. There have been reports of ERP features other than P300 [13,14], which may be key features for identifying illiterates.
One of the most prominent classification methods for ERP systems is the support vector machine (SVM) [15][16][17][18]. The SVM is mathematically simple and, with sufficient knowledge of the feature matrix, the experimenter can tailor the kernel to the target problem. Unfortunately, the SVM kernel is prone to overfitting [19]. As EEG is measured from multiple electrodes [20][21][22][23], the feature matrix can be high-dimensional with possible duplicates, which increases the possibility of overfitting: as most ERP system paradigms depend on the P300 peak, the information (peak magnitude and latency) from each electrode should be similar. Moreover, it is hard to extract both the temporal and spatial information of EEG with a single kernel. Although multiple kernel learning (MKL) has been suggested [24], it is hard to gain intuition about the given problem through that method.
Recent developments in deep learning provide an alternative approach. The convolutional neural network (CNN) can extract features from a given feature vector by convolution. When an optimal filter is applied, the convolution magnifies the feature of interest and reduces the others [25]. CNNs have been used in pattern recognition, especially in image recognition and speech recognition, as they provide topological information within the extracted features [26][27][28][29][30]. Therefore, data with sequence or topological information can be recognized more efficiently, as a CNN enables extraction of both temporal and spatial information from the raw data. As the ERP shows a sequence of rises and falls in response to visual stimuli, pattern recognition techniques such as CNN can be applied. Moreover, the convolution kernel of a CNN can be used as a tool for interpreting the spatial correlation among EEG electrodes.
In this paper, we explore the performance of CNN on ERP data to identify the key features that distinguish illiterates of the ERP speller system. The convolution kernels of the trained model are explored to analyze the spatial correlation between cortices and the patterns within the ERP of each electrode. The subjects were grouped as either strong (nonilliterate) or weak (illiterate) depending on the clarity of their ERP signals. The results of the two groups were compared to analyze differences in features.
The icons shown in Figure 2 were used as visual stimuli for the speller system of this paper. A rapid serial visual presentation (RSVP) panel design was adopted for the speller system to avoid the gaze effect. During the experiment, screen-size icons appeared at the center of the monitor in a random sequence [31]. The oddball paradigm was implemented by presenting the target icon together with distractors in a random sequence [10]. Each icon appeared 20 times per trial. The interstimulus interval (ISI) between icon appearances was set to 300 ms.
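The presentation scheme can be sketched as a short schedule generator. This is a minimal illustration and not the authors' stimulus software: the count of 6 icons is inferred from later statements in the paper (12 trials per session with each icon serving as target twice), the plain shuffle stands in for a randomization procedure that the text does not specify, and the names `make_trial_sequence`, `N_ICONS`, `REPEATS_PER_ICON`, and `ISI_MS` are hypothetical.

```python
import random

N_ICONS = 6             # assumption: inferred from the session design, not stated here
REPEATS_PER_ICON = 20   # each icon appeared 20 times per trial
ISI_MS = 300            # interstimulus interval in milliseconds


def make_trial_sequence(target, seed=None):
    """Return a list of (onset_ms, icon, is_target) tuples for one trial."""
    rng = random.Random(seed)
    icons = [icon for icon in range(N_ICONS) for _ in range(REPEATS_PER_ICON)]
    rng.shuffle(icons)  # assumption: unconstrained shuffle of all presentations
    return [(i * ISI_MS, icon, icon == target) for i, icon in enumerate(icons)]


schedule = make_trial_sequence(target=2, seed=0)
n_presentations = len(schedule)
n_targets = sum(1 for _, _, is_target in schedule if is_target)
trial_duration_ms = schedule[-1][0] + ISI_MS
print(n_presentations, n_targets, trial_duration_ms)
```

Under these assumptions a trial contains 120 presentations, 20 of which show the target, so the target is the rare event required by the oddball paradigm.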

Data Acquisition. For this paper, 33 subjects (13 female, 20 male) participated in the experiment. The subjects' ages ranged from 24 to 30 (mean = 27.25, std = ±1.92). During the experiment, subjects were asked to sit upright on a chair and instructed to keep still. No straps or ties were attached. Subjects were asked to self-report any discomfort that might disturb their concentration.
Each trial was initiated with an acoustic cue announcing the target of the given trial in the subjects' mother tongue (Korean). Ten seconds after the acoustic cue, the icons appeared on the monitor in random sequence according to the RSVP design. The subjects were instructed to mentally count the occurrences of the target during each trial (Figure 2(b)). Each session consisted of 12 trials. Each icon was selected as the target twice during the session, in random sequence.
All subjects were naive; a 10-20-minute preexperiment session was given to familiarize the subjects with the procedure. The subjects were asked to self-report whether they felt confident about the procedure. After the preexperiment session ended, the EEG measurements were made. During the experiment, one training session and one online session were conducted as a pair. To minimize the subject's stress level and fatigue, a 10-minute break was given between the training and online sessions. Each subject completed a minimum of 2 pairs of training and online sessions. No subject participated in more than 4 pairs of sessions.
EEG was collected with a B-Alert X10 headset from Advanced Brain Monitoring (ABM) at a sampling rate of 256 Hz. The recorded EEG electrodes followed the international 10/20 system [32], as shown in Figure 2(a). All experiments were held in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of Sangmyung University.

Convolutional Neural Network. The architecture of the CNN used in this paper is shown in Figure 2(c). The CNN consisted of 2 convolutional layers, 2 max-pooling layers, and 2 fully connected layers. The rectified linear unit (ReLU) function was applied as the activation function of each convolutional layer, as its performance has been demonstrated in previous work [33]. A softmax function was applied to the output of the last layer to normalize the final output to lie between 0 and 1. The output of the CNN was a vector of 2 elements, where the elements represented the scores of target and nontarget, respectively.
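The softmax normalization of the two-element output can be illustrated as follows. This is a generic sketch of the standard softmax function, not code from the study; the example scores are made up, and the function name `softmax` is only illustrative.

```python
import math


def softmax(scores):
    """Map raw scores to values in (0, 1) that sum to 1."""
    peak = max(scores)
    # Subtracting the maximum avoids overflow and does not change the result.
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for (target, nontarget) from the last layer.
target_score, nontarget_score = softmax([2.0, 0.5])
print(target_score, nontarget_score)
```

Because the two outputs sum to 1, the icon is labeled as target whenever the first element exceeds 0.5.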
The CNN was designed to perform both spatial and temporal filtering. The feature maps of each layer were used to assess the correlation between adjacent electrodes and the temporal features of the target ERP. In the 1st convolutional layer (L1), a filter of size 6 × 20 was applied to extract the correlation of EEG recorded at adjacent electrodes. The number of filter rows was set to 6 because 3 electrodes were placed on each lobe (except for the occipital lobe, where two electrodes were placed), so the filter spans all 6 electrodes of two adjacent lobes. To analyze the temporal features of the L1 feature maps among different lobes, a filter of size 1 × 12, whose window corresponds to approximately 100 ms in time scale, was applied in the 2nd convolutional layer (L2).
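The filter widths can be converted to durations using the 256 Hz sampling rate. The sketch below shows one reading that is consistent with the quoted figure of approximately 100 ms: it assumes that the 2 × 2 max-pooling layer placed after the 1st convolutional layer halves the time resolution, so that each sample seen by the 2nd convolutional layer spans two original samples. This interpretation and the helper name `window_ms` are assumptions, not statements from the paper.

```python
FS_HZ = 256          # sampling rate reported for the headset
POOL_TIME_FACTOR = 2  # assumption: 2 x 2 pooling halves the time axis before L2


def window_ms(n_samples, fs_hz, downsample=1):
    """Duration in milliseconds covered by n_samples taken every downsample/fs_hz seconds."""
    return 1000.0 * n_samples * downsample / fs_hz


l1_window_ms = window_ms(20, FS_HZ)                    # 6 x 20 filter on raw samples
l2_window_ms = window_ms(12, FS_HZ, POOL_TIME_FACTOR)  # 1 x 12 filter after pooling
print(l1_window_ms, l2_window_ms)
```

Under this assumption the 20-sample filter covers about 78 ms and the 12-sample filter about 94 ms, which is close to the stated 100 ms.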
To reduce the feature map size for ease of calculation and to prevent overfitting, max-pooling layers (M1 and M2) were inserted after each convolutional layer [27,34]. The max-pooling layers downsample the feature map by applying a sliding window without overlap. As the name implies, the maximum value within the window is extracted. As max-pooling introduces a downsampling effect, the feature map was generalized, which prevented overfitting of the model. The sliding window sizes of M1 and M2 were 2 × 2 and 1 × 10, respectively.
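Non-overlapping max-pooling can be written in a few lines. The following is a plain-Python sketch of the operation itself rather than the TensorFlow layer used in the study; it assumes that trailing rows or columns that do not fill a complete window are discarded, which the text does not specify, and `max_pool` is an illustrative name.

```python
def max_pool(feature_map, pool_rows, pool_cols):
    """Downsample a 2-D list by taking the maximum of each non-overlapping window."""
    n_rows = len(feature_map) // pool_rows     # incomplete windows are dropped
    n_cols = len(feature_map[0]) // pool_cols
    pooled = []
    for r in range(n_rows):
        pooled_row = []
        for c in range(n_cols):
            window = [
                feature_map[r * pool_rows + i][c * pool_cols + j]
                for i in range(pool_rows)
                for j in range(pool_cols)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


example_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 1],
]
pooled_2x2 = max_pool(example_map, 2, 2)
print(pooled_2x2)  # [[4, 5], [3, 7]]
```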
To further reduce the possibility of overfitting while training the model, the drop-out technique was applied to the first fully connected layer (F1). The drop-out technique sets randomly selected rows of the given feature map to zero. By intentionally discarding part of the data in the feature map, generalization is achieved, which prevents the model from overfitting to the training data [35,36].
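The row-wise zeroing described above can be sketched as follows. The drop rate of 0.5 is an assumption, since the paper does not report the value used, and `drop_out_rows` is a hypothetical helper. Library implementations of drop-out commonly also rescale the surviving activations during training; that step is omitted here to keep the sketch close to the description in the text.

```python
import random


def drop_out_rows(feature_map, rate, rng):
    """Return a copy of feature_map in which each row is zeroed with probability rate."""
    dropped = []
    for row in feature_map:
        if rng.random() < rate:
            dropped.append([0.0] * len(row))  # discard this row's information
        else:
            dropped.append(list(row))         # keep the row unchanged
    return dropped


rng = random.Random(0)
feature_map = [[1.0, 2.0, 3.0] for _ in range(1000)]
dropped = drop_out_rows(feature_map, rate=0.5, rng=rng)  # rate is an assumed value
n_zeroed = sum(1 for row in dropped if all(v == 0.0 for v in row))
print(n_zeroed)
```

Drop-out is applied only during training; at prediction time the full feature map is used.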
The size of input matrix fed into the CNN was 14 × 300 where each row corresponded to EEG collected from each electrode in Figure 2.
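The sizes quoted for the input, filters, and pooling windows can be chained to see how the feature map shrinks through the network. The sketch below assumes stride 1 and no padding for the convolutions and discards incomplete pooling windows; neither choice is stated in the paper, so the resulting shapes are illustrative only. The number of feature maps per layer is also not given and is therefore left out.

```python
def conv_valid(shape, kernel):
    """Output shape of a stride-1 convolution without padding (assumed settings)."""
    return (shape[0] - kernel[0] + 1, shape[1] - kernel[1] + 1)


def pool(shape, window):
    """Output shape of non-overlapping pooling that drops incomplete windows."""
    return (shape[0] // window[0], shape[1] // window[1])


layers = [
    ("L1", conv_valid, (6, 20)),   # spatial filter across adjacent lobes
    ("M1", pool, (2, 2)),
    ("L2", conv_valid, (1, 12)),   # temporal filter
    ("M2", pool, (1, 10)),
]

shape = (14, 300)  # electrodes x time samples
shapes = {}
for name, op, size in layers:
    shape = op(shape, size)
    shapes[name] = shape
print(shapes)
```

With padding that preserves the input size, as some frameworks offer, the intermediate shapes would differ from these values.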
The CNN architecture was implemented in Python with TensorFlow [37,38]. The Adam optimizer, which adapts the learning rate and thereby permits a larger step size, was used to train the CNN. The model was trained for 10,000 iterations on each subject's data.
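For readers unfamiliar with Adam, a single-parameter version of its update rule is sketched below on a toy quadratic objective. It follows the published algorithm with its customary default decay rates; the learning rate of 0.1, the toy objective, and the helper `adam_step` are assumptions made for illustration and do not reflect the hyperparameters of the study, which are not reported.

```python
import math


def adam_step(theta, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter and return the new value."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad        # running mean of gradients
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = state["m"] / (1 - beta1 ** state["t"])              # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)


# Toy problem: minimize (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta = 0.0
state = {"t": 0, "m": 0.0, "v": 0.0}
first_step = adam_step(theta, 2.0 * (theta - 3.0), dict(state), lr=0.1)
for _ in range(500):
    theta = adam_step(theta, 2.0 * (theta - 3.0), state, lr=0.1)
print(first_step, theta)
```

The first update moves the parameter by roughly the learning rate regardless of the gradient magnitude, which is the normalization that lets Adam take comparatively large early steps.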

Tie Breaking. Ideally, if the model were perfect, only one icon would be identified as the target in a given trial. However, the system identified multiple icons as targets in several trials. At the other extreme, the system failed to identify any target icon in some trials. For each case, the tie breaking rule was applied as follows.
(i) Multiple icons case: when the CNN identified multiple icons as the target of a given trial, the tie breaking rule was applied to select the target among these candidates. Since the first element of the output vector represents the icon's affiliation with the target ERP property, the icon with the greatest value of this element was selected as the target of the trial.
(ii) No target case: when the system failed to associate the ERP of any icon with the target ERP property, that is, no icon was identified as the target, the same rule as in the multiple icons case was applied. In this case, the first elements of the output vectors of all icons were compared, and the icon with the greatest first element was selected as the target of the trial.
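Both cases of the tie breaking rule reduce to choosing the icon with the largest first output element among the relevant candidates. The sketch below assumes one two-element output vector per icon and treats an icon as a candidate when its target score exceeds its nontarget score; how the outputs of repeated presentations were combined is not described in the paper, and the function names are hypothetical.

```python
def candidate_icons(outputs):
    """Indices of icons whose target score exceeds their nontarget score."""
    return [i for i, (target, nontarget) in enumerate(outputs) if target > nontarget]


def select_target(outputs):
    """Pick one target icon per trial using the tie breaking rule."""
    candidates = candidate_icons(outputs)
    if not candidates:
        # No target case: compare the first element over all icons.
        candidates = list(range(len(outputs)))
    # Multiple icons case (and single candidate): greatest first element wins.
    return max(candidates, key=lambda i: outputs[i][0])


# Hypothetical (target, nontarget) scores for six icons.
multiple = [(0.2, 0.8), (0.7, 0.3), (0.9, 0.1), (0.1, 0.9), (0.4, 0.6), (0.3, 0.7)]
none_found = [(0.1, 0.9), (0.3, 0.7), (0.05, 0.95), (0.2, 0.8), (0.45, 0.55), (0.4, 0.6)]
chosen_multiple = select_target(multiple)
chosen_none = select_target(none_found)
print(chosen_multiple, chosen_none)  # 2 4
```

If two icons share exactly the same score, `max` returns the first of them; the paper does not say how such exact ties were handled.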

Analysis. Both qualitative and quantitative analyses were performed to characterize the filters of each convolutional layer. The subjects were divided into two groups according to the relative strength of their ERP. The analysis comprised the following.
(i) ERP detection: if the target icon was detected as positive in a given trial, the ERP was considered detected. The subjects were assigned accordingly to either the H (high) or L (low) ERP detection group. The threshold between the H and L groups was 50%.
(ii) Feature map: the feature maps from L1 and L2 were drawn as color maps. As higher weights in a feature map denote higher discriminant power, the color map gives qualitative insight into how the electrodes are correlated and at which time the main peak is formed.
(iii) Statistical analysis: for quantitative analysis of performance, accuracy, sensitivity, precision, F1 measure, and ROC were calculated for each subject, and an ANOVA test was conducted to compare the group means. Accuracy is defined as the ratio of the number of correctly identified trials to the total number of trials. Sensitivity, precision, and the F1 measure follow their classical definitions.
(iv) Receiver operating characteristic: the receiver operating characteristic (ROC), which plots the sensitivity against the false positive rate (1 − specificity), is a widely used statistical measure of the diagnostic ability of a binary classifier. As the CNN of this paper is a binary classifier, the ROC is provided to compare the performance of the CNN between the H and L groups.
(v) Peak signal to noise ratio: the peak signal to noise ratio (PSNR) is used as a measure of reconstruction quality for compression codecs [39]. As the performance of a filter depends on how many core features are extracted from the raw ERP, the PSNR of the L1 feature map was calculated as a means of measuring performance. A greater PSNR indicates the presence of significantly high weights inside the feature map, whereas a lower PSNR indicates that only low weights are present in the given feature map and that the discriminant power of the filter is low.
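The classical measures named in item (iii) can be computed from counts of true positives (tp), false positives (fp), and false negatives (fn). The sketch below uses the standard textbook definitions together with the trial-level accuracy and the 50% group threshold described above. The example counts are invented, the function names are hypothetical, and assigning a detection rate of exactly 50% to the H group is an assumption.

```python
def trial_accuracy(n_correct_trials, n_trials):
    """Ratio of correctly identified trials to the total number of trials."""
    return n_correct_trials / n_trials


def detection_metrics(tp, fp, fn):
    """Return (sensitivity, precision, F1) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return sensitivity, precision, f1


def assign_group(detection_rate, threshold=0.5):
    """Assign a subject to the H or L group from the ERP detection rate."""
    return "H" if detection_rate >= threshold else "L"  # boundary handling is assumed


sensitivity, precision, f1 = detection_metrics(tp=8, fp=2, fn=4)
print(round(sensitivity, 3), round(precision, 3), round(f1, 3))
```

A classifier that labels almost every ERP as nontarget yields few false positives, so its precision can stay high even when its sensitivity is low.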

ERP
Although both the H and L groups show a drop in both training and validation error as the training iterations continue, the validation error of the L subject is higher than that of the H subject. The ROCs of the H and L subjects shown in Figure 3(e) indicate that the CNN performs better for the H group subject than for the L group subject.

Spatial and Temporal Features. The feature maps of each convolutional layer did not contain negative weights associated with negative peaks, such as N1 [40], because the activation function was set to ReLU [33].
The target ERP and the L1 feature maps of sample L and H subjects are shown in Figures 4 and 6, respectively. The target ERP shown in both figures is averaged over all trials. To analyze the correlation between frontal and occipital lobe electrodes, the first 3 electrodes (the first 3 rows of the averaged target ERP matrix) were copied and appended to the end of the ERP matrix. As shown in Figure 4(a), the target ERP of the L group subject shows ... On the other hand, the correlation of ERP among adjacent electrodes for the H group subject shown in Figure 6 is restricted to specific time ranges. Most of the high weights of the feature maps shown in Figures 6(b), 6(d), 6(e), and 6(f) are significantly positive around the P500 and P700 range for frontal and central lobe electrodes. The correlation between the central and parietal lobes is shown in Figure 6(c) around the P500 range. Some features around the P500 region showed high correlation among all electrodes. Unlike that of the L group subjects, the L1 feature map of the H group subject showed high correlation among all electrodes, with each case showing specific temporal characteristics.
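Appending the first 3 rows to the end of the matrix lets a 6-row filter cover occipital and frontal electrodes in the same window. The sketch below assumes 11 recorded rows ordered frontal, central, parietal, occipital (3 + 3 + 3 + 2), which would account for the 14 rows of the network input; this ordering and count are inferred from the text rather than stated in it, and `wrap_first_rows` is a hypothetical helper.

```python
# Assumed row order: 3 frontal, 3 central, 3 parietal, 2 occipital electrodes.
LOBES = ["F"] * 3 + ["C"] * 3 + ["P"] * 3 + ["O"] * 2


def wrap_first_rows(erp, n_wrap=3):
    """Return a copy of erp with its first n_wrap rows appended at the end."""
    return [list(row) for row in erp] + [list(row) for row in erp[:n_wrap]]


# Placeholder data: row i is filled with the value i so rows stay identifiable.
erp = [[float(i)] * 4 for i in range(len(LOBES))]
wrapped = wrap_first_rows(erp)
labels = LOBES + LOBES[:3]
last_window = labels[len(labels) - 6:]  # rows seen by the last 6-row filter position
print(len(wrapped), last_window)
```

Under these assumptions the last filter position spans one parietal, two occipital, and three frontal rows.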
The temporal features shown in the feature maps in Figure 5 indicate that temporal features associated with the P700 peak are present for L group subjects, as expected. In Figures 5(a), 5(b), and 5(c), high positive weights were found around the P700 range (rows 4 and 6). However, most of the feature maps either did not show significant weights or were flat, as in Figure 5(i).
The temporal features of the H group subjects showed more variety. Some feature maps showed high positive weights around the P300 and P500 range, as shown in Figures 7(a), 7(b), 7(c), and 7(d), whereas the others indicated significant positive weights around the P700 range, as in Figures 7(a)-7(i). However, the weights associated with the P700 range are spread more broadly than those associated with P300 and P500.

Statistical Analysis. Comparison of classical statistical measurements and other measurements is shown in ... (... 0.0135, 0.8.88e − 05, and 0.0072, resp.). A significant mean difference in F1 measure did not exist between the H and L groups. The accuracy of the H and L groups was 0.889 and 0.687, respectively. The sensitivity of the H group was higher than that of the L group, but the precision of the H group was significantly lower than that of the L group. The area under the ROC of the H group was significantly higher than that of the L group (p value = 0.0137).
The PSNR for L1 of the H group was significantly lower than that of the L group. As all measured PSNR values were negative, the absolute value of the PSNR of the H group was greater than that of the L group. On the other hand, no mean difference in peak time (PeT.) between the H and L groups was found (p value = 0.965).
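A negative PSNR is possible under the usual decibel definition, 10 log10(peak² / MSE), whenever the mean squared error exceeds the squared peak value. The sketch below implements that standard definition on made-up numbers; how the peak and the noise term were defined for a feature map in this study is not stated, so the function `psnr_db` and its inputs are illustrative assumptions.

```python
import math


def psnr_db(reference, test, peak):
    """Standard PSNR in decibels between two equally long sequences."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


# Noise power (MSE = 4) larger than the squared peak (1): PSNR is negative.
negative_case = psnr_db([0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0], peak=1.0)
# Small error relative to the peak: PSNR is positive.
positive_case = psnr_db([0.0, 1.0, 0.0, 1.0], [0.0, 0.9, 0.0, 1.0], peak=1.0)
print(round(negative_case, 2), round(positive_case, 2))
```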

Discussion
In this study, a CNN was used to investigate the spatial and temporal characteristics of ERP that explain the performance difference between illiterates and nonilliterates (L and H groups). Classical statistical measurements as well as a filter comparison measurement were collected to compare the correlation of ERP recorded at different EEG electrodes and to identify the characteristic temporal features associated with each group.
The statistical measurements show that the mean performance of the CNN differed significantly between H and L group data. The accuracy for H group data was higher than that for L group data. Interestingly, although the sensitivity of the H group was higher than that of the L group, the precision of the H group was significantly lower than that of the L group. This reflects the fact that the ERP of the L group was not identified as target in most cases, and the CNN identified the ERP from all 6 icons as nontarget in more than half of the trials.
The learning curves and errors in Figure 3 demonstrate how the statistical measurements relate to the performance of the CNN. Although the false negative rate remains mostly near 0, as the false positive rate remains close to 0, the learning curve remains stable around 0.2 for the L group subject. This again reflects the characteristics of the L group ERP, which was mostly identified as nontarget. The ERP that was identified as target ERP came mostly from nontarget icons, indicating a lack of distinctive features associated with the target ERP. In contrast, both the false negative and false positive rates drop as the training iterations continue for the H group subject's data, so that learning improves with the iterations. As the ERP of the L group does not have sufficiently distinctive features, the model becomes slightly overtrained compared with the model of the H group subject, as shown in the validation error plots in Figures 3(c) and 3(d). The comparison of ROC supports this analysis, as the area under the ROC of the H group was significantly higher than that of the L group (p value = 0.0137).
As shown in Figure 4, the ERP collected from the L group was flat in most of the channels. Most of the positive weights in the target ERP were observed in frontal and central lobe electrodes (1st and 5th rows of Figure 4(a)), which was contrary to expectation, as previous research indicated that positive peaks associated with the target event are mostly observed in the parietal or occipital lobe [41,42]. The ERP collected from adjacent electrodes did not show significant correlation between occipital and parietal lobe data in L group subjects. On the other hand, the ERP of the H group was more pronounced, showing stronger activity in the P300 area, as shown in Figure 6(a). The feature map also indicated stronger correlation of ERP data collected from the occipital and parietal lobes with the other lobes. The spatial correlation shown in the feature map of the H group also indicated that the correlation was restricted to specific time ranges corresponding to P300, P500, or P700.
The feature maps of the 2nd convolutional layer demonstrated the difference in temporal features between H and L group subjects. In most L group subjects, the feature maps did not show strong positive weights and were flat. Any indication of positive weights was mostly restricted to the P700 region. On the other hand, the positive weights of the H group were distributed around P300, P500, and P700, and the positive weights found near the P300 and P500 range were sharper than those found around the P700 range. Previous research has indicated the possible existence of features other than P300 [41,43,44]. The results of this paper also support the idea that P300 may not be the only key feature of the ERP speller system. Rather, P700, which was identified in both L and H group subjects, may represent a more universal ERP feature. However, the ERP from the central lobe area observed in the L group indicates a possible effect of stimulus probability [32] (Figure 1(a)).
The PSNR indicated that the lack of activity in the occipital/parietal lobes and the broad peak found at P700 affect the performance of the spatial filter in L1 as well. As the PSNR relates the maximum power of a signal to the power of corrupting noise [45], the result indicates that the filter was not able to extract a distinctive signal of the target ERP from background noise for the L group subjects' data. This may be because the peaks near P700 were broad and fluctuating. On the other hand, the P300 and P500 peaks found in H group subjects were sharper, which allowed the filter to extract relevant features more precisely without being affected by background noise. Interestingly, the major peak of L2 did not differ significantly between H and L group subjects (p value = 0.965). As the major peak was found by averaging the feature maps from L2, the differences among individual feature maps may have been overshadowed. Further statistical analysis to assess the temporal features within each feature map is needed to validate the results of this study.

Conclusions
This study investigated the difference in spatial and temporal features of ERP between a high performance group (H group) and a low performance group (L group). The results indicated that the major difference arises from the spatial correlation of ERP among lobes rather than from temporal features. Although the difference in temporal features was not confirmed quantitatively in this study, the qualitative analysis indicated a lack of P300 in the low performance group. Interestingly, both the low and high performance groups showed activity near P700, which may be the key activity of the ERP speller system instead of the traditional P300 peak. Further analysis of individual feature maps will be needed to investigate the key temporal features of the ERP speller system.

Conflicts of Interest
The authors declare that they have no conflicts of interest.