Anomaly Detection in EEG Signals: A Case Study on Similarity Measure

Motivation. Anomaly EEG detection is a long-standing problem in the analysis of EEG signals. Its basic premise is the assessment of the similarity between two nonstationary EEG recordings. A well-established scheme is based on sequence matching and typically includes three steps: feature extraction, similarity measure, and decision-making. Current approaches mainly focus on EEG feature extraction and decision-making, and few of them address the similarity measure itself. Designing a similarity metric that is compatible with the considered problem and data is nevertheless an important issue in the design of such detection systems, and existing metrics cannot be applied directly to anomaly EEG detection without considering domain specificity. Methodology. The main objective of this work is to investigate the impact of different similarity metrics on anomaly EEG detection. Several metrics that are potentially applicable to EEG analysis were collected from other areas through a careful review of related work. The power spectrum is extracted as the feature of EEG signals, and null hypothesis testing is employed to make the final decision. Two indicators are used to evaluate the detection performance: one reflects the level of measured similarity between two compared EEG signals, and the other quantifies the detection accuracy. Results. Experiments were conducted on two data sets. The results demonstrate that the choice of similarity metric affects anomaly EEG detection. The Hellinger distance (HD) and Bhattacharyya distance (BD) metrics show excellent performance: an accuracy of 0.9167 for our data set and an accuracy of 0.9667 for the Bern-Barcelona EEG data set. Both the HD and BD metrics are constructed from the Bhattacharyya coefficient, which suggests that the Bhattacharyya coefficient is well suited to highly noisy EEG signals.
In future work, we will exploit an integrated metric that combines HD and BD for the similarity measure of EEG signals.


Introduction
In recent years, we have witnessed significant improvements in the use of electroencephalogram (EEG) measurement for data acquisition in a wide range of clinical applications. This has also led to the development of data mining methods that discover potential patterns in the data, aiming at the characterization of dynamic EEG behaviours. Representative examples include early detection of epileptic seizures [1][2][3], sleep process monitoring [4][5][6][7], and many other health assessment and surgery problems related to neurological disorders [8][9][10].
Time series are an important class of EEG data. One of the associated mining tasks is to detect potential anomaly events or patterns at an early stage of a long-term EEG monitoring process, which is required by change detection [11][12][13], seizure prediction [14,15], etc. Hence, we use the notion of "anomaly EEG detection" in the following sections. The basic premise of anomaly EEG detection is the assessment of the similarity between two nonstationary EEG recordings. A well-established scheme is based on sequence matching; Figure 1 illustrates its computation process. The continuously monitored EEG signal is first divided into nonoverlapping (or overlapping) segments; then, the ongoing segment under inspection is compared with segments that are usual under normal states. These normal EEG segments can be collected in a prior collection phase or taken directly from the past of the signal itself. The resulting similarity scores allow for change detection by testing a null hypothesis, $H_0: \theta = \theta_0$ against $H_A: \theta \neq \theta_0$, on the parameters $\theta$ of an assumed distribution. The Gaussian distribution is the most typical assumption, and other quantifiers, e.g., a direct threshold, are also applicable. To summarize, three techniques are crucial to the success of anomaly detection:
(i) Feature Extraction. To extract explanatory parameters from the raw EEG data in order to reduce data redundancy
(ii) Similarity Measure. To employ a specific metric to measure/quantify the similarity between two data recordings, i.e., individual EEG segments
(iii) Decision-Making. To make a decision by testing a null hypothesis based on the resulting similarity scores
Along this line of research, many efforts have been made to enhance feature extraction, as seen in [16][17][18], and some of them also involve decision-making [4,19,20].
Nonetheless, designing a similarity metric that is compatible with the considered data is also an important aspect of such an anomaly detection system [21]. Although the design of similarity metrics has been an important problem in statistics and data mining [22][23][24], the metric used for EEG signal processing still needs to be clarified because of domain specificity. To the best of our knowledge, few existing studies on EEG signal processing take this issue into account in the design of anomaly EEG detection systems. The main objective of this work is to investigate the impact of different similarity metrics on anomaly EEG detection based on a sequential matching scheme, which couples a similarity measure with null hypothesis testing. Thus, we collect a variety of popular and state-of-the-art metrics from other areas that are potentially applicable to our problem and modify or extend them where necessary for anomaly EEG detection. The impact of the different metrics on the detection results is evaluated on two data sets. The experimental results reveal that the investigated metrics affect detection differently. In particular, HD and BD outperform the other competitors, including PCCD, SKLD, KD, and the typically used ED.
This study therefore provides a preliminary basis for EEG signal processing. The rest of this paper is organized as follows. Section 2 formulates the considered problem. Section 3 introduces several typical metrics that are potentially applicable to EEG signal analysis. Section 4 describes the testing data and the experimental implementation. Section 5 shows the results with some discussion. Section 6 concludes this paper and outlines future work.

Problem Formulation
In this section, we assume that the collected EEG recordings have already been represented by the employed features (the feature extraction is described in Section 4.2.1). We then review the method of anomaly EEG detection [25]. Anomaly detection is concerned with recognising new inputs that differ in some way from those that are usual under normal states [26]. Accordingly, for a given query EEG recording $x$, it is common practice to compare it with a set of normal templates $y_j$, $j = 1, \ldots, M$, where $y_j$ is an EEG recording template and $M$ is the total number of templates. The size of the templates is a trade-off between sensitivity to EEG status changes and robustness to noise: a larger template is more robust to noise but less sensitive to change, because changes often occur instantaneously, and vice versa. In this paper, the template size was set empirically to 20 seconds according to our clinical experience. The (anti-)similarity can then be quantified as the maximum similarity between the query recording and the templates using a similarity metric $s$. We denote it as $S(x) \leftarrow \max_j s(P, Q_j)$, where $P$ and $Q_j$ are the features extracted from $x$ and $y_j$. The recording $x$ is inspected as an anomaly event if the resulting similarity score $S(x)$ falls below a predefined threshold $\lambda$, i.e., $S(x) < \lambda$; otherwise, it is inspected as normal. It is worth mentioning that the detection can be made scalable and flexible by using different values of $\lambda$. However, since the focus of this paper is on the investigation of the similarity metric, we do not discuss this issue further; the interested reader can refer to [27,28]. The similarity metric $s$ is essential to an accurate and reliable detection result, and its construction normally relies on a specific distance metric. A greater distance indicates a lower level of similarity.
More importantly, for two given EEG recordings $P$ and $Q_j$, the employed distance metric needs to satisfy several fundamental properties:
(i) Nonnegativity, i.e., $s(P, Q_j) \geq 0$
(ii) Identity, i.e., $s(P, Q_j) = 0$ if and only if $P = Q_j$
(iii) Symmetry, i.e., $s(P, Q_j) = s(Q_j, P)$
(iv) Triangle inequality, i.e., $s(P, Q_j) \leq s(P, R) + s(R, Q_j)$, where $R$ is a third EEG recording that differs from both $P$ and $Q_j$
Note that a distance used for similarity quantification need not satisfy all of these properties, in particular the triangle inequality; distances that violate some of them are called non-metric distances [29].
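As a minimal sketch of how these properties can be probed numerically, the following Python snippet checks nonnegativity, symmetry, and the triangle inequality on a few sample vectors. The helper `check_metric_properties` and the sample vectors are illustrative assumptions, not part of the original study, and passing the check on a finite sample does not prove that a property holds in general.

```python
import itertools
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kl_divergence(p, q):
    """Kullback-Leibler divergence between two strictly positive distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))


def check_metric_properties(dist, samples, tol=1e-12):
    """Report which distance properties hold on the samples (hypothetical helper)."""
    pairs = list(itertools.permutations(samples, 2))
    triples = list(itertools.permutations(samples, 3))
    return {
        "nonnegativity": all(dist(p, q) >= -tol for p, q in pairs),
        "symmetry": all(abs(dist(p, q) - dist(q, p)) <= tol for p, q in pairs),
        "triangle": all(
            dist(p, q) <= dist(p, r) + dist(r, q) + tol for p, q, r in triples
        ),
    }


# Three toy distributions (each sums to 1); purely illustrative values.
samples = [(0.7, 0.2, 0.1), (0.1, 0.1, 0.8), (0.3, 0.4, 0.3)]
print(check_metric_properties(euclidean, samples))
print(check_metric_properties(kl_divergence, samples))
```

On these samples the Euclidean distance passes all three checks, whereas the Kullback-Leibler divergence fails the symmetry check, which is the usual example of a non-metric distance.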
Based on the above definition, the similarity is confined to $S \in [0, 1]$, taking the value 1 if the two compared EEG recordings are identical and 0 if they are entirely dissimilar. In the following, we identify some typical metrics with the potential to solve our problem through a careful review of the relevant literature. During the identification, the following two issues were considered:
(i) The metric should satisfy the three properties of scalability, sensitivity, and coverage, according to [30]
(ii) Among the various metrics, we only consider those that calculate the similarity between two sequences of equal length

Common Metrics
This section introduces a variety of metrics from other areas that are potentially applicable to our problem and modifies or extends them where necessary for the considered anomaly EEG detection problem.
Let us assume that we have two sequences, $P = \{p(k)\}$ and $Q = \{q(k)\}$, $k = 1, \ldots, K$, where $p(k)$ and $q(k)$ are the observed values of $P$ and $Q$ at time $k$, respectively. A variety of typical metrics that are potentially applicable to EEG analysis are introduced to measure the similarity between $P$ and $Q$.

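Euclidean Distance (ED).
ED is the most common metric and refers to the ordinary straight-line distance between two points in space [31]. The ED between $P$ and $Q$ can be calculated by

$$d^{(ED)}(P, Q) = \sqrt{\sum_{k=1}^{K} \bigl(p(k) - q(k)\bigr)^2}. \tag{1}$$

Taking into account the characteristics of the similarity metric described in Section 2, we use the reciprocal of $d^{(ED)}$ to represent the similarity as

$$s^{(ED)}(P, Q) = \frac{1}{d^{(ED)}(P, Q)}. \tag{2}$$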
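The ED and its reciprocal similarity can be sketched in a few lines of Python. The function names and the small constant `eps`, which guards against division by zero when the two sequences are identical, are assumptions made for illustration and are not taken from the original study.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two equal-length sequences."""
    if len(p) != len(q):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def ed_similarity(p, q, eps=1e-12):
    """Reciprocal of the ED; eps (an assumption) avoids division by zero."""
    return 1.0 / (euclidean_distance(p, q) + eps)


p = [1.0, 2.0, 3.0]
q = [1.0, 2.0, 7.0]
print(euclidean_distance(p, q))  # 4.0
print(ed_similarity(p, q))
```

Because the reciprocal is unbounded as the distance approaches zero, this similarity does not lie in [0, 1] without a further normalization step.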

Pearson Correlation Coefficient Distance (PCCD).
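PCCD is based on the correlation coefficient proposed by Pearson, a statistic that reflects the degree of linear correlation between two series, with values between −1 and 1. A larger value of this metric implies a stronger correlation between the two compared series [32]. The PCCD between $P$ and $Q$ can be calculated by

$$d^{(PCCD)}(P, Q) = \frac{\sum_{k=1}^{K} \bigl(p(k) - \bar{p}\bigr)\bigl(q(k) - \bar{q}\bigr)}{\sqrt{\sum_{k=1}^{K} \bigl(p(k) - \bar{p}\bigr)^2}\, \sqrt{\sum_{k=1}^{K} \bigl(q(k) - \bar{q}\bigr)^2}}, \tag{3}$$

where $\bar{p}$ and $\bar{q}$ are the means of $P$ and $Q$. The similarity defined by PCCD is then derived from this coefficient.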
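For illustration, the Pearson correlation coefficient underlying PCCD can be computed as in the sketch below. The function name is hypothetical, and the mapping from the coefficient to a similarity in [0, 1] is deliberately left out because it is not recoverable from the text.

```python
import math


def pearson_coefficient(p, q):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(p)
    mean_p = sum(p) / n
    mean_q = sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    if var_p == 0.0 or var_q == 0.0:
        raise ValueError("coefficient is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_q)


p = [1.0, 2.0, 3.0, 4.0]
print(pearson_coefficient(p, [2.0, 4.0, 6.0, 8.0]))  # perfectly correlated
print(pearson_coefficient(p, [8.0, 6.0, 4.0, 2.0]))  # perfectly anti-correlated
```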

Symmetric Kullback-Leibler Divergence (SKLD).
SKLD can be used to measure the difference between two probability distributions and is widely used in information retrieval and data science [33,34]. The Kullback-Leibler divergence (KLD) between $P$ and $Q$ can be calculated by

$$d^{(KLD)}(P, Q) = \sum_{k=1}^{K} p(k) \ln \frac{p(k)}{q(k)},$$

but it is not a distance metric because of its asymmetry. To solve this problem, the symmetric Kullback-Leibler divergence, which combines the divergences in both directions, is widely used among statistical distance metrics [35]. The similarity is then derived from the SKLD.

Computational Intelligence and Neuroscience
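The asymmetry of the Kullback-Leibler divergence and its symmetrized form can be sketched as follows. This is an illustration under stated assumptions: the inputs are normalized to sum to 1, a small `eps` guards the logarithm, and the symmetrization is taken as the plain sum of the two directed divergences (some authors use half of this sum); none of these choices is confirmed by the original text.

```python
import math


def normalize(values):
    """Scale nonnegative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def kl_divergence(p, q, eps=1e-12):
    """Directed Kullback-Leibler divergence; eps (an assumption) guards log(0)."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))


def symmetric_kl(p, q):
    """Sum of both directed divergences (one common symmetrization)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


p = normalize([4.0, 3.0, 2.0, 1.0])
q = normalize([1.0, 1.0, 1.0, 7.0])
print(kl_divergence(p, q), kl_divergence(q, p))  # the two directions differ
print(symmetric_kl(p, q), symmetric_kl(q, p))    # identical by construction
```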

Hellinger Distance (HD).
HD was first proposed by Hellinger in [36]. It is used in probability and statistics to measure the similarity between two probability distributions and belongs to the family of f-divergences [36]. The HD between $P$ and $Q$ is obtained as a transformation of the Bhattacharyya coefficient of $P$ and $Q$, and the similarity based on HD is then derived from this distance.
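A minimal sketch of the Hellinger distance for discrete distributions is given below. It uses the standard normalized form, the square root of one minus the Bhattacharyya coefficient, which is an assumption here: relation (19) in the Result Summary and Discussion states the transformation as 1 − BC(P, Q), which corresponds to the square of this quantity. The function names are illustrative.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Overlap of two discrete distributions that each sum to 1."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def hellinger_distance(p, q):
    """Standard Hellinger distance, sqrt(1 - BC), bounded in [0, 1]."""
    bc = bhattacharyya_coefficient(p, q)
    # max() protects against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 1.0 - bc))


same = [0.5, 0.5]
print(hellinger_distance(same, same))              # identical distributions
print(hellinger_distance([1.0, 0.0], [0.0, 1.0]))  # disjoint distributions
```

Since the distance is already bounded in [0, 1], a similarity such as one minus the distance needs no further normalization, although whether the study uses that particular mapping is not stated in the text.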

Kolmogorov Distance (KD).
KD was introduced by Kolmogorov [37]. This statistical distance plays an important role in probability theory and hypothesis testing [38], and it is widely used to measure the difference between two probability distributions [39]. The similarity based on KD is then derived from the KD between $P$ and $Q$.

Bhattacharyya Distance (BD).
In statistics, BD, which was proposed by Bhattacharyya in [40] and is closely related to the Hellinger distance, measures the similarity of two discrete or continuous probability distributions. It is defined through the Bhattacharyya coefficient, which measures the overlap between two statistical samples or populations [23]. The Bhattacharyya coefficient can be used to determine the relative closeness of the two samples being considered and thus the separability of classes in classification. The BD between $P$ and $Q$ is defined as

$$d^{(BD)}(P, Q) = -\ln\bigl(BC(P, Q)\bigr),$$

where $BC(P, Q) = \sum_{k=1}^{K} \sqrt{p(k)\, q(k)}$ is the Bhattacharyya coefficient.
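The Bhattacharyya coefficient and the resulting distance can be sketched as below. The sketch assumes that both inputs are discrete distributions summing to 1 (for example, power spectra divided by their total power, which is our assumption rather than a stated step of the study), and the floor `eps` is an illustrative guard against taking the logarithm of zero for non-overlapping inputs.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Overlap of two discrete distributions that each sum to 1."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def bhattacharyya_distance(p, q, eps=1e-12):
    """Negative log of the coefficient; eps (an assumption) keeps it finite."""
    return -math.log(max(bhattacharyya_coefficient(p, q), eps))


p = [0.5, 0.5]
q = [0.9, 0.1]
print(bhattacharyya_coefficient(p, q))
print(bhattacharyya_distance(p, q))
print(bhattacharyya_distance([1.0, 0.0], [0.0, 1.0]))  # large but finite
```

Unlike the Hellinger distance, this distance is unbounded above, which is why a normalization step is needed before it can be used as a similarity in [0, 1].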
Among the above distance metrics, the similarity produced by some of them does not satisfy the condition $s \in [0, 1]$, as summarized in Table 1. To cope with this problem, the similarity is normalized for these metrics; the normalization is given in Section 4.2.

Materials and Methods
This section introduces the testing data and the implementation of our experiments.

Testing Data.
The testing data in this section come from two data sets:
(i) The first data set was established with our own system setup. The data collection process is depicted in Figure 2. Electrodes are placed in accordance with the International 10-20 Electrode Placement Method to collect EEG signals. The original multichannel EEG signals are obtained using the data collector at a sampling rate of 512 Hz. Channel C4 was chosen for our testing. Three neurological experts were invited to check the original data and label the ground truth according to their domain experience, i.e., which parts are normal and which are abnormal. Here, the normal status means that the EEG signal is in a stable state, and the abnormal status covers unstable states of the EEG signal that might be caused by seizures or other abnormal physical activities. The data are divided into samples using a nonoverlapping window of 10,000 points. Examples of tested data samples are shown in Figure 3(a).
(ii) The second data set is taken from the public Bern-Barcelona EEG data set. Its authors randomly selected 3,750 pairs of simultaneously recorded signals from the pool of all signals measured at focal and nonfocal EEG channels, respectively, and divided the recordings into time windows of 20 seconds. The original data were recorded at a sampling rate of 1,024 Hz and then downsampled to 512 Hz prior to further analysis, so that each piece of EEG data contains 10,240 samples [41]. Examples of data in this data set are shown in Figure 3.
Additionally, for each data set, we first select the 30 most stable normal data segments to form a template set, where stability and normality are judged by domain experts; the remaining segments serve as the test data. Moreover, the test data are equally divided into two groups: one for optimizing the threshold and one for final testing.
Both groups contain 30 data segments, of which 15 are normal and the other 15 are abnormal. The detection performance was evaluated with cross-validation of these two groups. We repeat the whole evaluation process twenty times to obtain and analyse the final results.
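The windowing described above amounts to simple index arithmetic, sketched here in Python. The function name is hypothetical, the zero-valued signal is a placeholder rather than real EEG data, and dropping trailing samples that do not fill a whole window is an assumption.

```python
def split_into_windows(signal, window_length):
    """Cut a signal into nonoverlapping windows, dropping any incomplete tail."""
    n_windows = len(signal) // window_length
    return [
        signal[i * window_length:(i + 1) * window_length]
        for i in range(n_windows)
    ]


sampling_rate_hz = 512   # rate after downsampling, as stated in the text
window_seconds = 20      # window duration of the Bern-Barcelona recordings
window_length = sampling_rate_hz * window_seconds
print(window_length)     # 10240 samples per window

placeholder_signal = [0.0] * (3 * window_length + 100)
windows = split_into_windows(placeholder_signal, window_length)
print(len(windows))      # 3 complete windows
```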

Experimental Implementation.
Consistent with the mechanism of anomaly EEG detection introduced previously in this paper, we perform three steps, i.e., feature extraction, similarity measure, and decision-making, to carry out our experiment. Let us first denote the $i$th piece of template data as $y_i(n)$, $n = 1, 2, \ldots, N$, and the $i$th piece of testing data as $x_i(n)$, $n = 1, 2, \ldots, N$. The main methodologies used in the experiments are introduced in the following.

Feature Extraction.
We extract the power spectrum [21] from the raw EEG data as the feature. Let the observed value of a piece of the EEG signal at the $n$th point be denoted as $x(n)$, $n = 1, 2, \ldots, N$. The EEG signal is observed in discrete form, so the transform is discrete in both the time and frequency domains [42]. We first recall the discrete Fourier transform (DFT), which is formulated as

$$X(k) = \sum_{n=1}^{N} x(n)\, e^{-j 2\pi k (n-1)/N},$$

where $X(k)$ is the output of the transform and $k$ indicates the frequency index.
After subband filtering (the filtered EEG data are denoted as $x'(n)$), the power spectrum $P(k)$ can be estimated using the Welch method, a typical power spectrum estimation method: the filtered signal is split into segments of length $M$, each segment is multiplied by a window function $d_2(n)$, and the squared magnitudes of the segment DFTs are normalized by $MU$ and averaged over the segments, where $U = (1/M)\sum_{n=0}^{M-1} d_2^2(n)$. The resulting power spectrum $P(k)$ allows for the quantitative inspection of EEG data. An example is shown in Figure 4. It can be seen that the anomalous EEG signals have disordered amplitude variations and are heavily contaminated by noise. As a result, it is very difficult to judge whether an EEG signal is abnormal through time-domain analysis. In contrast, the difference between normal and abnormal EEG signals is clearer in the frequency domain, which allows for quantitative inspection, i.e., a similarity measure, of EEG data.
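The Welch estimate can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the window is taken to be a Hann window, the segment length and overlap are free parameters, the subband filtering step is omitted, and the direct DFT sum is used for clarity instead of a fast transform. All function and parameter names are hypothetical.

```python
import cmath
import math


def hann_window(m):
    """Hann window of length m (one common choice of window function)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (m - 1)) for n in range(m)]


def welch_power_spectrum(x, segment_length, overlap=0):
    """Average the windowed, normalized periodograms of successive segments."""
    if len(x) < segment_length:
        raise ValueError("signal is shorter than one segment")
    window = hann_window(segment_length)
    # U = (1/M) * sum of squared window samples, as in the text.
    u = sum(w * w for w in window) / segment_length
    step = segment_length - overlap
    n_bins = segment_length // 2 + 1  # keep non-negative frequencies only
    spectrum = [0.0] * n_bins
    n_segments = 0
    for start in range(0, len(x) - segment_length + 1, step):
        segment = x[start:start + segment_length]
        for k in range(n_bins):
            coeff = sum(
                segment[n] * window[n]
                * cmath.exp(-2j * math.pi * k * n / segment_length)
                for n in range(segment_length)
            )
            spectrum[k] += abs(coeff) ** 2 / (segment_length * u)
        n_segments += 1
    return [value / n_segments for value in spectrum]


# Toy input: a sinusoid that completes 4 cycles per 32-sample segment.
signal = [math.sin(2.0 * math.pi * 4 * n / 32) for n in range(128)]
spectrum = welch_power_spectrum(signal, segment_length=32, overlap=16)
peak_bin = spectrum.index(max(spectrum))
print(peak_bin)
```

For real recordings of about 10,000 samples, a library routine based on the fast Fourier transform would be far more efficient than this direct sum; the sketch only makes the roles of $M$, $U$, and the window explicit.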
Based on the above calculation of the power spectrum, the testing data $x_i(n)$ and the compared template $y_j(n)$ can be represented by their corresponding power spectra $P_i$ and $Q_j$, respectively.

Similarity Measure.
$s(P_i, Q_j)$ is the similarity between $P_i$ and $Q_j$, which is calculated with the metrics described in Section 3. The similarity $S(x_i)$ of $x_i$ to a normal status is taken as the minimum $s$ among all templates, i.e., $S(x_i) \leftarrow \min_j s(P_i, Q_j)$.
Furthermore, in order to satisfy the requirement described in Section 2, $S(x_i)$ should be normalized to $[0, 1]$. For simplicity, we still use $S(x_i)$ to denote the normalized similarity in the following.

Decision-Making.
In order to inspect whether $x_i(n)$ is normal or not, a threshold $\lambda$ should be predefined. The decision is subsequently made by testing the following hypothesis: if the similarity $S(x_i)$ of the testing data $x_i(n)$ is greater than the threshold $\lambda$, the data are inspected as normal; otherwise, they are considered abnormal. We first carry out a prior estimation to determine the optimal value of $\lambda$ with a number of EEG testing data and then use it to inspect all other testing EEG signals in the experiment. The results shown in the following section are obtained with the optimal value of $\lambda$.
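The similarity aggregation, normalization, and threshold decision can be tied together as in the following sketch. Several parts are assumptions: the normalization is taken to be min-max scaling over the scores being compared, `toy_similarity` is a placeholder standing in for any metric of Section 3, and the aggregation over templates is exposed as a parameter because Section 2 describes a maximum while the Similarity Measure subsection describes a minimum.

```python
def toy_similarity(p, q):
    """Placeholder similarity in (0, 1]; stands in for a metric of Section 3."""
    return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(p, q)))


def aggregate_similarity(query, templates, metric, aggregate=min):
    """Combine the per-template similarities of one query into a single score."""
    return aggregate(metric(query, template) for template in templates)


def min_max_normalize(scores):
    """Rescale scores to [0, 1]; min-max scaling is an assumed normalization."""
    low, high = min(scores), max(scores)
    if high == low:
        return [0.0 for _ in scores]  # degenerate case: all scores equal
    return [(s - low) / (high - low) for s in scores]


def classify(score, threshold):
    """Normal if the similarity exceeds the threshold, abnormal otherwise."""
    return "normal" if score > threshold else "abnormal"


templates = [[1.0, 1.0, 1.0], [1.0, 2.0, 1.0]]
queries = [[1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
raw = [aggregate_similarity(x, templates, toy_similarity) for x in queries]
normalized = min_max_normalize(raw)
labels = [classify(s, threshold=0.5) for s in normalized]
print(normalized, labels)
```

Passing `aggregate=max` switches to the nearest-template reading of Section 2 without changing the rest of the pipeline.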

Results and Discussion
Experiment I: Investigation on Data Set I.
As described in Section 4.1, the evaluation was repeated 20 times to obtain the final result. In the following, detailed results for one of the evaluations are provided. Figure 5 provides the detection results for all investigated metrics using the data of our database. On the left of each subfigure, we show the computed similarities of each training data item, including normal and abnormal training data. The similarities are gathered and arranged in ascending order (normal testing data) or descending order (abnormal testing data). As such, two curves corresponding to the normal and abnormal testing data are obtained, and they intersect at point O.
The abscissa of point O (AOPO) provides an overall evaluation for normal and abnormal testing data. A smaller AOPO means a greater difference between the normal and abnormal recordings, indicating that the similarity indicator is better; a larger AOPO means that the two classes of recordings are not well separated, so the similarity indicator is not good enough. From these results, it can be clearly seen that HD and BD achieve the best results, KD and SKLD achieve moderate results, and ED and PCCD have the worst results. The other indicator, accuracy, is also used to quantify the detection performance and is defined as

$$\text{accuracy} = \frac{TP}{TP + FN},$$

where TP (true positive) is the number of data that are inspected correctly and FN (false negative) is the number of data that are inspected incorrectly. The right of each subfigure in Figure 5 shows the results of all metrics in terms of accuracy. The hypothesis testing described in Section 4.2.3 is used to classify group 1 of the testing data using all investigated metrics with different values of the threshold $\lambda$. The higher the accuracy, the better the metric. It can be seen that, for each metric, the accuracy first increases and then decreases as $\lambda$ increases. The value of $\lambda$ corresponding to the highest accuracy is used to calculate the accuracy on the group 2 data. Two examples are given in Figures 6 and 7, which show the similarity scores of all investigated metrics (using their optimal $\lambda$) for a normal testing recording and an abnormal testing recording. PCCD and KD output wrong results for the abnormal testing data, while the others output the right results. Overall, HD achieves the best performance among the investigated metrics.
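The threshold selection on group 1 and the evaluation on group 2 can be sketched as follows. The scores, labels, and candidate grid are made-up illustrative values, not data from the study, and the function names are hypothetical; accuracy is computed as the fraction of recordings inspected correctly, in line with the definition above.

```python
def accuracy(scores, is_normal, threshold):
    """Fraction of recordings whose thresholded decision matches the label."""
    correct = sum(
        1 for s, label in zip(scores, is_normal) if (s > threshold) == label
    )
    return correct / len(scores)


def best_threshold(scores, is_normal, candidates):
    """Candidate threshold with the highest accuracy (first one on ties)."""
    return max(candidates, key=lambda t: accuracy(scores, is_normal, t))


# Group 1: used to tune the threshold (illustrative values).
group1_scores = [0.92, 0.81, 0.74, 0.31, 0.22, 0.63]
group1_normal = [True, True, True, False, False, False]
candidates = [i / 10 for i in range(11)]
lam = best_threshold(group1_scores, group1_normal, candidates)

# Group 2: evaluated with the tuned threshold (illustrative values).
group2_scores = [0.85, 0.65, 0.40, 0.75]
group2_normal = [True, True, False, False]
print(lam, accuracy(group2_scores, group2_normal, lam))
```

Sweeping the candidates on group 1 also reproduces the qualitative behaviour described above: the accuracy rises as the threshold grows, peaks, and then falls once normal recordings start to be rejected.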
We summarize the results of the investigated metrics in terms of both AOPO and accuracy in Table 2. It can be seen that (1) HD and BD are the best metrics in terms of AOPO and (2) HD works best in terms of accuracy.
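One possible reading of the AOPO indicator is sketched below: the normal similarities are sorted in ascending order, the abnormal ones in descending order, and the crossing of the two curves is located by linear interpolation on a 1-based sample axis. The interpolation, the handling of curves that never cross, and the function name are all assumptions, since the text defines AOPO only graphically.

```python
def aopo(normal_scores, abnormal_scores):
    """Abscissa where the ascending normal curve meets the descending abnormal one."""
    normal = sorted(normal_scores)
    abnormal = sorted(abnormal_scores, reverse=True)
    diffs = [a - b for a, b in zip(normal, abnormal)]
    if diffs[0] >= 0:
        return 1.0  # curves are already separated at the first sample
    for i in range(1, len(diffs)):
        if diffs[i] >= 0:
            # Linear interpolation between samples i and i + 1 (1-based axis).
            fraction = -diffs[i - 1] / (diffs[i] - diffs[i - 1])
            return i + fraction
    return float(len(diffs))  # no crossing: the classes overlap completely


overlapping = aopo([0.2, 0.6, 0.8, 0.9], [0.7, 0.5, 0.3, 0.1])
separated = aopo([0.8, 0.9], [0.1, 0.2])
print(overlapping, separated)
```

Under this reading, well-separated classes give the smallest possible value, which matches the statement that a smaller AOPO indicates a better similarity indicator.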
The above experimental process was repeated 20 times. To analyse all the experimental results, we calculated the global mean of the AOPO and accuracy values obtained from all experiments and show the results in Table 3. HD achieves the best performance in terms of AOPO, i.e., 3.65, and also outperforms the others in terms of accuracy. Based on these results, the investigated metrics can be ranked as HD > BD > KD > SKLD > ED = PCCD.

Experiment II: Investigation on Bern-Barcelona Data Set.
The result of one repeated evaluation on the Bern-Barcelona data set is also shown. Figure 8 gives the detection results for all investigated metrics using the training data of the public Bern-Barcelona EEG database. On the left of each subfigure, we show the computed similarities of each testing data. The similarities are again arranged in ascending order (normal testing data) or descending order (abnormal testing data), from which the AOPOs in this experiment are obtained. From these results, it can be clearly seen that HD, KD, and BD achieve the best results, ED and PCCD achieve moderate results, and SKLD has the worst results. The right of each subfigure in Figure 8 shows the results of all metrics in terms of accuracy. Again, for each metric, the accuracy first increases and then decreases as $\lambda$ increases. The value of $\lambda$ corresponding to the highest accuracy, denoted $\lambda_0$, is also used to calculate the accuracy on group 2. Two examples are given in Figures 9 and 10, which show the similarity scores of all investigated metrics (using their $\lambda_0$) for a normal testing recording and an abnormal testing recording. All the metrics output the right result for the normal testing data, but for the abnormal testing data, only ED and HD output the correct result. In terms of accuracy, BD and HD are also better than the others. The results of the investigated metrics are summarized in Table 4. In this experiment, HD, KD, and BD achieve the best results in terms of AOPO, and BD works best in terms of accuracy. The above experimental procedure was also repeated 20 times. The averages of the AOPO and accuracy values obtained from all experiments are shown in Table 5.
Therefore, for the Bern-Barcelona EEG database, BD achieves the best performance in terms of AOPO, i.e., 1.55, and also outperforms the others in terms of accuracy. Based on these results, the investigated metrics can be ranked as BD > HD > KD > PCCD > ED > SKLD.

Experiment III: Investigation on Effect of Feature Extraction.
In order to investigate the effect of feature extraction on detection performance, five representative features used in EEG signal analysis, namely the mean, root mean square (RMS), empirical mode decomposition (EMD), discrete wavelet transform (DWT), and artifact subspace reconstruction (ASR), are investigated in this section. Their operations are provided in Table 6. The similarity measure and decision-making processes described in Section 4.2 are also applied to classify the testing data. The AOPO and accuracy results for our database are shown in Tables 7 and 8, respectively. The results for the Bern-Barcelona EEG database are shown in Tables 9 and 10, respectively.
From the results shown in Table 7, we can see that, for our database, HD and BD perform better than the others in terms of AOPO across the different features. Table 8 shows the results in terms of accuracy. HD and BD perform better than the others when using DFT, mean, RMS, and ASR; in comparison, PCCD also shows promising results when using the EMD and DWT features. Tables 9 and 10 show the detection results for the Bern-Barcelona EEG database. HD and BD clearly perform better than the alternatives in terms of both AOPO and accuracy.
To summarize these results, ED, although the most commonly used indicator, performs the worst in terms of AOPO and accuracy for both testing data sets. PCCD, SKLD, and KD achieve moderate results. Among all investigated metrics, HD and BD are more suitable for EEG signal analysis.

Result Summary and Discussion.
Combining the results from the two tested data sets, it is clear that HD and BD achieve better performance than the other compared metrics. Recall that both BD and HD are obtained by transformations of the Bhattacharyya coefficient $BC(P, Q)$, i.e.,

$$d^{(HD)} = 1 - BC(P, Q), \qquad d^{(BD)} = -\ln\bigl(BC(P, Q)\bigr). \tag{19}$$

In this regard, HD and BD can be regarded as approximately equivalent measurements of two statistical samples. The difference between them lies in their sensitivity to noise, as discussed in [49]. However, it is very difficult to determine which of them is more appropriate for analysing highly noisy EEG signals. As a potential way of exploiting the advantages of both, one can combine them using machine learning-based optimization methods, such as input selection and input weighting [50][51][52], to form an integrated metric for the considered EEG recordings. This is also the direction of our future work.
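The near equivalence of the two metrics can be made concrete with a short derivation. Writing $d^{(HD)}$ and $d^{(BD)}$ for the two quantities in (19) and taking that relation at face value (an assumption, since other normalizations of the Hellinger distance exist), eliminating the Bhattacharyya coefficient gives:

```latex
\begin{align*}
d^{(HD)} = 1 - BC(P, Q) \;&\Longrightarrow\; BC(P, Q) = 1 - d^{(HD)}, \\
d^{(BD)} = -\ln\bigl(BC(P, Q)\bigr)
  &= -\ln\bigl(1 - d^{(HD)}\bigr)
   = d^{(HD)} + \tfrac{1}{2}\bigl(d^{(HD)}\bigr)^{2}
     + \tfrac{1}{3}\bigl(d^{(HD)}\bigr)^{3} + \cdots
\end{align*}
```

The map from $d^{(HD)}$ to $d^{(BD)}$ is strictly increasing, so both metrics order any set of recording pairs identically. They nearly coincide when the spectra overlap strongly ($BC \to 1$) and diverge when the overlap is small: $d^{(HD)}$ saturates at 1 while $d^{(BD)}$ grows without bound, which is one way to see why the two can react differently to noise.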

EMD
Empirical mode decomposition (EMD) is a signal decomposition method based on the time-scale characteristics of the data itself; its detailed process is given in [46].

DWT
Discrete wavelet transform (DWT) decomposes the signal using discretely sampled wavelets; its detailed process is given in [47].

ASR
Artifact subspace reconstruction (ASR) is a relatively new technique that reconstructs the signal with respect to a reference signal fragment; its detailed process is given in [48].

Conclusions
Anomaly EEG detection is a long-standing problem in the analysis of EEG signals. Its basic premise is the assessment of the similarity between two nonstationary EEG recordings, for which a well-established scheme is based on sequence matching. Typically, this scheme includes three steps: feature extraction, similarity measure, and decision-making. Current approaches mainly focus on EEG feature extraction and decision-making, and few of them address the similarity measure itself. Designing a similarity metric that is compatible with the considered problem and data is nevertheless an important issue in the design of such detection systems, and existing metrics cannot be applied directly to anomaly EEG detection without considering domain specificity. The main objective of this work is to investigate the impact of different similarity metrics on anomaly EEG detection. Several metrics that are potentially applicable to EEG analysis were collected from other areas through a careful review of related work, including Euclidean distance (ED), Hellinger distance (HD), Bhattacharyya distance (BD), Kolmogorov distance (KD), Pearson correlation coefficient distance (PCCD), and symmetric Kullback-Leibler divergence (SKLD). Experiments were conducted on two data sets to investigate them. Based on the results shown in Section 5, the following findings are obtained:
(1) The experimental results demonstrate that the choice of similarity metric affects anomaly EEG detection. In particular, the commonly used ED did not achieve satisfactory results when compared with the other metrics. One main reason is that this metric does not consider the possibly different weight of each element in the two compared EEG samples.
(2) Among all investigated metrics, the HD and BD metrics, which are constructed from the Bhattacharyya coefficient, show excellent performance on the two inspected data sets: an AOPO value of 3.5 and an accuracy of 0.9167 for our data set and an AOPO value of 1.5 and an accuracy of 0.9667 for the Bern-Barcelona EEG data set. These findings suggest that the Bhattacharyya coefficient is well suited to highly noisy EEG signals.
This study provides a preliminary basis for analysing EEG data.
In order to take advantage of the Bhattacharyya coefficient, we will exploit an integrated metric combining HD and BD for the similarity measure of EEG signals in future work.

The power spectrum of y
$R$: An EEG recording that differs from both $P$ and $Q_j$
$s(P_i, Q_j)$: The similarity between $P_i$ and $Q_j$
$S(x_i)$: The similarity between $x_i$ and the template set $y_j$
$x(n)$: The $n$th point of a given EEG recording
$x'(n)$: The $n$th point of the resulting EEG data after filtering
$x_i$: The $i$th EEG recording of the testing data set
$X(k)$: The $k$th point of the frequency spectrum of $x(n)$
$y_j$: The $j$th EEG recording of the template set
$\lambda$: The threshold used for hypothesis testing.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.