Comparative Analysis of Classifiers for Developing an Adaptive Computer-Assisted EEG Analysis System for Diagnosing Epilepsy

Computer-assisted analysis of the electroencephalogram (EEG) has tremendous potential to assist clinicians in the diagnosis of epilepsy. Such systems are trained to classify the EEG based on the ground truth provided by neurologists. They should therefore offer a mechanism through which a neurologist can flag the system's incorrect markings so that the system improves its classification by learning from them. We have developed a simple mechanism that lets neurologists improve the classification rate whenever they encounter a false classification. The system takes the discrete wavelet transform (DWT) of the signal epochs, reduces the resulting features using principal component analysis, and feeds them into a classifier. After discussing our approach, we report the classification performance of three types of classifiers: support vector machine (SVM), quadratic discriminant analysis, and artificial neural network. We found SVM to be the best performing classifier. Our work demonstrates the importance and viability of a self-improving, user-adapting computer-assisted EEG analysis system for diagnosing epilepsy which processes each channel separately from the others, and it compares the performance of different machine learning techniques within the suggested system.


Introduction
Epilepsy is a chronic neurological disease whose hallmark is recurring seizures. It has been reported that one out of a hundred people suffers from this disorder [1]. Electroencephalography is the most widely used technique for the diagnosis of epilepsy. The EEG signal represents voltage fluctuations caused by the flow of the neurons' ionic currents. Billions of neurons maintain the brain's electric charge. Membrane transport proteins pump ions across the neuronal membranes, which keeps the neurons electrically charged. Due to volume conduction, a wave of ions reaches the electrodes on the scalp and pushes and pulls the electrons in the electrode metal. The voltage difference due to this push and pull of electrons is measured by a voltmeter, whose readings are displayed as the EEG potential. A single neuron generates too small a charge to be measured by an EEG; what the EEG measures is the summed synchronous activity of thousands of neurons that have a similar spatial orientation. Unique patterns are generated in the EEG during an epileptic seizure. These patterns help clinicians during the diagnosis and treatment of this neurological disorder, which is why EEG is widely used to detect the epileptic seizure and locate the epileptic zone. Localization of the abnormal epileptic brain activity is very significant for the diagnosis of epileptic disorders.
Usually the duration of a typical EEG varies from a few minutes to a few hours, but a prolonged EEG can last as long as 72 hours. This generates an immense amount of data to be inspected by the clinician, which can prove to be a daunting task.
Advances in signal processing and machine learning techniques are making it possible to automatically analyse EEG data to detect epochs with epileptic patterns. A system based on these techniques can aid a neurologist by highlighting the epileptic patterns in the EEG up to a significant level. Of course, the task of diagnosis should be left to the neurologist. However, the neurologist's task becomes more efficient because the system reduces the data to be analysed and lessens fatigue. Along with classification, such analysis software can also provide simultaneous visualization of multiple channels, which helps the clinician differentiate between generalized epilepsy and focal epilepsy.
Computer-assisted EEG classification involves several stages including feature extraction, feature reduction, and feature classification. Wavelet transform has become the most popular feature extraction technique for EEG analysis due to its capability to capture transient features, as well as information about time-frequency dynamics of the signal [4]. Other previously used feature extraction approaches for epilepsy diagnosis include empirical mode decomposition (EMD), multilevel Fourier transform (FT), and orthogonal matching pursuit [5][6][7][8][9]. Feature extraction is followed by feature reduction to reduce computational complexity and avoid the curse of dimensionality. Most commonly, the reduced feature vector consists of statistical summary measures (such as mean, energy, standard deviation, kurtosis, and entropy) of different sets of original (unreduced) features, although other methods such as principal component analysis, discriminant analysis, and independent component analysis have also been used for feature reduction [4,7,10,11]. Feature extraction/reduction is followed by classification using a machine learning algorithm, such as artificial neural networks (ANN), support vector machines (SVM), hidden Markov models, and quadratic discriminant analysis [8][11][12][13][14].
A very important and novel phase of our system is the user adaptation mechanism, or retraining mechanism. Introducing this phase has several advantages. During this phase, the system tries to adapt its classification to the user's wishes. It has been reported that even expert neurologists sometimes disagree over a certain observation in EEG data. There is also a threat of overfitting by the classifier. In order to keep the classifier improving its performance as it encounters more and more examples, we have introduced this user adaptive mechanism in our system. We consider the existing systems as dead because they cannot improve their classification rate after initial training; they do not have any mechanism for learning or improving from the neurologist's corrective markings [15][16][17]. The agreement between different EEG readers is low to moderate; our adaptation mechanism helps the user address this issue, as our system tries to adapt its detection according to the user's corrective markings. The new corrective markings generate new examples with improved labels and thus populate the training set with newly labelled examples. So, after retraining, the machine learning algorithms in the system adapt to the user's set of choices.
In the next section we explain our proposed method, followed by the results. In the results section, we explain how SVM performs better than QDA and ANN in our proposed method. We also show that exclusive processing of each channel results in a significant improvement in the classification rate. Here "epileptic pattern" and "epileptic spikes" are used interchangeably.

Proposed Method
Computer-aided EEG analysis systems use the neurologist's marking and labelling of the EEG data as a benchmark to train themselves during the initial training phase. But after the initial training phase, these systems have no simple mechanism through which neurologists can improve the system's classification after encountering a false classification. So we have proposed a method by which the system's classification can be improved by the user in a relatively simple way. This analysis system only tries to detect the epileptic spikes as described by Noachtar and Rémi [3]. Later it adapts its detection of epileptic spikes exclusively for every user (Figure 4). In the proposed system, each channel is processed separately for each epileptic pattern. This exclusive processing of each channel not only helps the user in diagnosing localized epilepsy but also eases the classifier's job. We have assumed that different epileptic patterns are independent of each other and that handling them separately helps avoid error propagation from the detection of one epileptic pattern type to another. Our system's operation has two major phases: (A) the initial training phase and (B) the adaptation phase. Each of these phases has three parts: (1) feature extraction, (2) feature reduction, and (3) classification. Next we briefly explain all of these steps.

Initial Training
To decide which parts of the signal are epileptic and which are not, we first divided the whole signal into small chunks known as epochs. The DWT was then applied to these epochs to enhance the visibility of epileptic activity, which is distinguished by certain spectral characteristics. The resulting features were then processed to make them more suitable for the classification technique.
(a) Epoch Size. The first important part of feature extraction is epoch selection. An epoch is a small chunk of the signal which is processed at a time. The size of the epoch is very important: the larger the epoch, the less accurate the classification; the smaller the epoch, the higher the processing time. After testing different epoch sizes, we found a nonoverlapping 1 sec window to yield the best accuracy. This also reestablished the work of Seng et al. [18] (Figure 1).
(b) DWT. As discussed in the Introduction, spectral analysis is very informative when examining the EEG of patients suspected of epilepsy. Wavelet decomposition, which is a multiresolution analysis technique, has profound advantages. Unlike an ordinary frequency transform, a multiresolution analysis technique allows us to analyse a signal at multiple frequency resolutions while maintaining time resolution. Wavelet decomposition allows us to increase the frequency resolution in the spectral band of interest while maintaining the time resolution; in short, we can decimate these values simultaneously in the time and frequency domains.
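The epoching step can be illustrated with a minimal sketch. It assumes one channel is held as a plain Python list of samples; the function name `split_into_epochs` and the choice to drop a trailing partial epoch are illustrative assumptions, not details given in the text.

```python
def split_into_epochs(signal, fs, epoch_seconds=1.0):
    """Split one channel into non-overlapping epochs.

    A trailing partial epoch is dropped (an assumption; the paper does not
    say how leftover samples are handled).
    """
    epoch_len = int(round(fs * epoch_seconds))
    if epoch_len <= 0:
        raise ValueError("epoch length must be at least one sample")
    n_epochs = len(signal) // epoch_len
    return [signal[i * epoch_len:(i + 1) * epoch_len] for i in range(n_epochs)]


# Example: 2.5 s of a dummy channel sampled at 256 Hz gives two full epochs.
channel = [0.0] * 640
epochs = split_into_epochs(channel, fs=256)
```

With a 1 sec window, each epoch holds exactly `fs` samples, so every later stage sees fixed-length inputs.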
During wavelet transform, the original epoch is split into different subbands: the lower frequency information is called approximate coefficients and the higher frequency information is called detailed coefficients. The frequency subdivision in these subbands helps us in analysing different frequency ranges of an EEG epoch while maintaining its time resolution [4,8,13]. The choice of coefficients level is very important as the epileptic activity only resides in the range of 0-30 Hz. Coefficients levels of the DWT are determined with respect to sampling frequency. So, the detailed levels of interest are adjusted on the run according to the sampling frequency such that we may get at least one exact value of the closest separate (0.4-4 Hz), (4-8 Hz), (8-12 Hz), and (12-30 Hz) components of the signal. We discarded all the detailed coefficient levels which were beyond the 0-30 Hz range.
Then DWT was applied on each epoch with Daubechies-4 (db4) as mother wavelet. The detailed coefficient levels of the DWT were determined with respect to sampling frequency.
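How the detail levels follow from the sampling frequency can be sketched as below. For a dyadic DWT, the detail coefficients at level j nominally cover fs/2^(j+1) to fs/2^j Hz; real wavelet filters such as db4 overlap at the band edges, so these limits are approximate. The selection rule (keep levels whose band starts below 30 Hz, up to level 7) and the function names are assumptions made for illustration; for 256 Hz data the rule yields levels 3 to 7, the levels reported for the 256 Hz CHB-MIT data.

```python
def detail_band(fs, level):
    """Nominal (low, high) band in Hz of the DWT detail coefficients at `level`."""
    return fs / 2 ** (level + 1), fs / 2 ** level


def select_detail_levels(fs, f_max=30.0, max_level=7):
    """Detail levels whose nominal band starts below f_max (assumed rule)."""
    return [lvl for lvl in range(1, max_level + 1)
            if detail_band(fs, lvl)[0] < f_max]


levels = select_detail_levels(256)
bands = {lvl: detail_band(256, lvl) for lvl in levels}
```

For a different sampling frequency the same rule shifts the selected levels, which is one way to read "adjusted on the run".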
(c) Statistical Features. After selecting the detailed coefficients which represent the frequency band of interest, we calculated the statistical features, namely the mean, standard deviation, and power of these selected wavelet coefficients. These statistical features are inspired by the work of Subasi and Gursoy [13].
(d) Standardization. These statistical features were then standardized. During the training stage, z-score standardization was applied to these features [19]. This standardization is just like the usual z-score normalization, but as we do not know the exact mean and standard deviation of the data to be classified during the classification/test stage, we used the mean and standard deviation of the training examples for standardizing (normalizing) the features during the classification stage. We normalized the features by subtracting the mean of the training examples and dividing by their standard deviation.
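A minimal sketch of the per-band statistics follows. It assumes that "power" means the mean of the squared coefficients and that the standard deviation is the population form; the text does not specify either, and the function names are illustrative.

```python
from statistics import fmean, pstdev


def band_features(coeffs):
    """Mean, standard deviation, and power of one level's detail coefficients."""
    mean = fmean(coeffs)
    std = pstdev(coeffs)                       # population form (assumption)
    power = fmean([c * c for c in coeffs])     # mean squared value (assumption)
    return [mean, std, power]


def epoch_features(selected_levels):
    """Concatenate the three statistics over all selected detail levels."""
    features = []
    for coeffs in selected_levels:
        features.extend(band_features(coeffs))
    return features


example = epoch_features([[1.0, -1.0, 1.0, -1.0], [2.0, 2.0, 2.0, 2.0]])
```

Each selected level contributes three numbers, so the feature vector length is three times the number of selected levels.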
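The standardization described above amounts to fitting the mean and standard deviation on training data and reusing them at test time. The sketch below shows this under stated assumptions: features are rows of floats, the population standard deviation is used, and constant columns are guarded with a divisor of 1.0. None of these details come from the text.

```python
from statistics import fmean, pstdev


def fit_standardizer(train_rows):
    """Per-feature mean and standard deviation estimated on training data only."""
    columns = list(zip(*train_rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return means, stds


def standardize(rows, means, stds):
    """Apply z = (x - mean_train) / std_train to every row."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


train = [[1.0, 10.0], [3.0, 30.0]]
means, stds = fit_standardizer(train)
test_z = standardize([[2.0, 40.0]], means, stds)
```

The test epoch is never used to estimate the statistics, which is what lets new data be standardized one epoch at a time.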

Feature Reduction.
In order to avoid overinterpretation caused by redundant data and misinterpretation caused by noisy data, we applied a feature reduction method. Inclusion of this part increases the processing time, thus exacerbating the latency.
Dimensionality reduction using principal component analysis (PCA) is based on a very important trait of the data, namely its variance. PCA constructs a linear mapping that maximizes the variance of the projected data, which lets us discard the directions marked by lesser variance. This reshaping and omission not only removes redundant data but also lessens the noise.
During the training stage, PCA was applied to these features in order to remove redundant and/or noisy data. We kept the components which accounted for approximately 95% of the total variance, which reduced the 21 features to 9. We then fed these reduced features to the classifier's trainer. Here, based on our observation, we again assumed that the EEG data is stationary over a small length. So, during the testing stage, we took the PCA coefficient matrix from the training stage, multiplied it with the standardized statistical features of the blind test data, and fed the top 9 features to the classifier.
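The two PCA operations described here, choosing how many components to keep and reusing the training-stage projection at test time, can be sketched as follows. The sketch assumes the eigenvalues of the training covariance matrix and the principal axes have already been computed by some PCA routine; the function names and the toy numbers are illustrative only.

```python
def components_for_variance(eigenvalues, target=0.95):
    """Smallest number of leading components whose variance share reaches `target`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k
    return len(ordered)


def project(row, components):
    """Project one standardized feature row onto stored principal axes.

    `components` holds one axis (a list of weights) per retained component,
    taken from the training stage.
    """
    return [sum(x * w for x, w in zip(row, axis)) for axis in components]


k = components_for_variance([6.0, 3.0, 0.6, 0.4])   # shares 0.60, 0.90, 0.96
reduced = project([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])
```

Because the axes are frozen after training, a test epoch needs only one matrix-vector product, with no refitting.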

Classification.
Classification is a machine learning technique in which the category of a new observation is identified. This identification is based on a training set which contains observations with known category labels. The measured properties of these observations are termed features. We tried three types of classification methods: (1) SVM, (2) QDA, and (3) ANN (Figure 3).
The reduced features were fed to these classifiers. Here, reduced features means the statistical features of the selected wavelet coefficients after reduction using PCA, as described in the previous section. All three processing parts were exclusive to each channel and each epileptic pattern, so, like the previous parts, the classifiers were also trained and tested exclusively for each channel.
Our system requires individual labelling of channels. There is a separate classifier for each channel and for each epileptic pattern type. So, the total number of classifiers equals the number of channels multiplied by ten, where ten is the number of epileptic patterns described by Noachtar and Rémi [3].
(f) Quadratic Discriminant Analysis. Quadratic discriminant analysis (QDA) is a method widely used in statistics, pattern recognition, and signal processing to find a quadratic combination of features which characterizes an example as belonging to one of two or more categories. QDA's combination of discriminating quadratic factors is used for both classification and dimensionality reduction.
(g) Artificial Neural Network. An artificial neural network (ANN) is a computational model inspired by the central nervous system of animals. That is why an ANN is represented by a system of interconnected neurons which compute values from their inputs. In ANN training, the weights associated with the neurons are iteratively adjusted according to the inputs and the difference between the actual and expected outputs. The iteration stops when either the network starts generating the expected results within a tolerable error range or the iteration limit is reached.
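To make the quadratic decision rule concrete, here is a univariate sketch of QDA: each class gets its own Gaussian mean and variance, and a sample is assigned to the class with the highest score -0.5 log(var) - (x - mean)^2 / (2 var) + log(prior). This is a one-feature teaching example with empirical priors, not the multivariate pseudoquadratic MATLAB classifier used in the experiments, and all names are illustrative.

```python
import math
from statistics import fmean, pvariance


def fit_qda_1d(values, labels):
    """Per-class mean, variance, and prior (each class needs non-zero variance)."""
    params = {}
    for cls in set(labels):
        xs = [v for v, y in zip(values, labels) if y == cls]
        params[cls] = (fmean(xs), pvariance(xs), len(xs) / len(values))
    return params


def qda_score(x, mean, var, prior):
    """Log of the class-conditional Gaussian density times the prior, up to a constant."""
    return -0.5 * math.log(var) - (x - mean) ** 2 / (2.0 * var) + math.log(prior)


def predict_qda_1d(x, params):
    """Return the class with the highest quadratic discriminant score."""
    return max(params, key=lambda cls: qda_score(x, *params[cls]))


params = fit_qda_1d([0.0, 1.0, 2.0, 8.0, 10.0, 12.0], [0, 0, 0, 1, 1, 1])
```

Because each class keeps its own variance, the boundary between classes is quadratic in x rather than linear.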

Adaption Phase (Retraining/User Adaptation Mechanism).
In order to keep the classifier improving its performance as it encounters more and more examples, we have introduced a user adaptive mechanism in our system. Our system allows the user to interactively select epochs of his choice by simply clicking on the correction button. While using our system, when a user thinks that a certain epoch is falsely labelled/categorised, the system allows him to interactively mark that label as a mistake. These details are saved in a log in the background and are used to retrain the classifier so that it improves its classification rate and adapts to the user over time. When the user selects the retraining option, the classifiers retrain themselves on the previous and the newly logged training examples. As every user has to log in with a personal ID, every corrective marking is saved only in that user's folder and only that user's classifier is updated. Hence, the system's classifier adapts to that user without affecting anyone else's classification.
The concept behind the inclusion of retraining is that if there is more than one example with the same attributes but different labels, the classifier will be trained towards the label with the largest population. The user's corrective markings increase the number of examples of his choice, thus making the classifier adapt to the user's choice in a straightforward way. Every user has exclusive classifiers trained for him, and his markings do not affect other users' classifiers. As we know, users sometimes do not agree on the presence of an epileptic pattern or on its type. The exclusive processing for each user lets the same software stay trained for every user, and it also lets different users compare their markings with each other.
We do not currently have any standard for deciding which neurologist is correct among a disagreeing group of neurologist users. So we kept the corrective markings of each user within his own account so that they do not interfere with users who may not agree with his choices. The developed system thus supports each neurologist according to his own choices, and with every retraining after the initial training it adapts further to that user. This system does not dictate to the neurologist but rather learns from him and adapts to him in order to save his time.
We want the classifier to think like the user and to supplement him by highlighting the epochs of his choice, so after a few rounds of retraining the gold standard will be the user himself. Including already tested examples with new labels in the training set for retraining biases the classifier's choices in favour of the user.
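The per-user retraining idea can be sketched with a toy model. `MajorityClassifier` is a deliberately simple stand-in for the real classifier (it predicts the most frequent label seen for a feature tuple), chosen only to make the "label with the largest population wins" behaviour visible; `AdaptiveSystem` and its method names are hypothetical and do not describe the actual software.

```python
from collections import Counter, defaultdict


class MajorityClassifier:
    """Toy stand-in: predicts the most frequent label seen for a feature tuple."""

    def fit(self, examples):
        votes = defaultdict(Counter)
        for features, label in examples:
            votes[features][label] += 1
        self.table = {f: c.most_common(1)[0][0] for f, c in votes.items()}
        return self

    def predict(self, features, default=0):
        return self.table.get(features, default)


class AdaptiveSystem:
    def __init__(self, initial_examples):
        self.initial = list(initial_examples)   # shared initial training set
        self.logs = defaultdict(list)           # user id -> corrective markings
        self.models = {}                        # user id -> personal classifier

    def mark_correction(self, user, features, corrected_label):
        self.logs[user].append((features, corrected_label))

    def retrain(self, user):
        # Retrain on the previous examples plus this user's logged corrections.
        self.models[user] = MajorityClassifier().fit(self.initial + self.logs[user])
        return self.models[user]


system = AdaptiveSystem([((1, 0), 0)])
system.mark_correction("user_a", (1, 0), 1)
system.mark_correction("user_a", (1, 0), 1)
model_a = system.retrain("user_a")
model_b = system.retrain("user_b")
```

After two corrections, the relabelled examples outnumber the original one for `user_a`, while `user_b`, who logged nothing, keeps the initial behaviour.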

Experimentation
In this section, we will discuss the results in detail. At first, we will describe the datasets which we used to train, test, and validate our method. Then we will discuss their versatility (Figure 7).

Dataset.
Two labelled datasets of surface EEG data from patients suspected of epilepsy were available to us. These datasets differ considerably from each other in terms of ethnicity, age, gender, and equipment. Both concern generalised absence seizure, which is characterized by the 3 Hz spike and wave epileptic pattern in almost every channel. That is why we have classification results available only for one type of epilepsy, namely absence seizure.
Figure 2.
In order to obtain information that discriminates between different types of epileptic patterns, so that they are identified correctly without being mistaken for each other, further decomposition of the detailed coefficients into the Beta, Alpha, Theta, and Delta bands is hugely helpful. So we further decomposed them until the 7th level. Hence, we used the DWT's detailed coefficients of levels 3, 4, 5, 6, and 7 for the 256 Hz sampled CHB-MIT dataset (Table 2). After the selection of the wavelet coefficients, we calculated the statistical features from them. The statistical features were the mean, power, and standard deviation of all of the selected coefficients.

Standardization.
During the training stage, we first used simple z-score normalization to standardize the features [19] before applying feature reduction. The real issue arose when we tried to normalize them during the testing stage. One way of doing this is to keep all of the training examples and apply the z-score to them together with the new test data. Instead of this time-consuming process, we assumed, based on our observation, that the mean and standard deviation do not deviate much. In this study, the EEG time series are assumed to be stationary over short segments.

Classifier.
Classification is used in machine learning to refer to the problem of identifying the discrete category to which a new observation belongs. Observations with known labels are used to train a classification algorithm, or classifier, using features associated with the observations. For the CHBMIT database, we had to train 220 classifiers in the initial training stage: 22 channels multiplied by 10 types of epileptic pattern. The 23rd channel was the same as the 15th channel. For the PIMH dataset, 330 classifiers were trained, as 33 EEG channels were utilized. We tried three different classifiers and found SVM to be the most accurate.
We used a blind validation mechanism on ten different feature data distributions to estimate the classification performance. These 10 separate blind data distributions were taken from a huge EEG dataset. Each of these 10 data distributions was randomly divided into two groups. We trained our classifier on one half of the distribution and tested it on the other half, and repeated this for all ten distributions. We then calculated the average classification rate over all ten distributions. For the CHBMIT dataset, 3229 out of 3297600 epochs were randomly taken ten times. Each time, half of them were used to train and half of them were used to test the initial classification. The average of the sensitivity, specificity, and accuracy over these ten distributions is considered the initial training phase performance.
The same approach was applied to the PIMH dataset, where 3229 out of 24097 epochs were randomly taken six times instead of ten.
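The classifier counts quoted above follow from one classifier per (channel, pattern) pair. The short sketch below enumerates those pairs; indexing channels and patterns by integers is an assumption made for illustration, since the text does not list the pattern names.

```python
N_PATTERNS = 10  # number of epileptic pattern types, as stated in the text


def build_classifier_keys(channels, n_patterns=N_PATTERNS):
    """One (channel, pattern) key per classifier that has to be trained."""
    return [(ch, p) for ch in channels for p in range(n_patterns)]


chb_keys = build_classifier_keys(range(22))    # CHBMIT: 22 usable channels
pimh_keys = build_classifier_keys(range(33))   # PIMH: 33 channels
```

Such keys could index a dictionary of trained models so that each channel and pattern is looked up independently.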
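The evaluation procedure (random half split, per-run rates, average over runs) can be sketched as follows. Labels are assumed to be 1 for epileptic and 0 for non-epileptic epochs, each test half is assumed to contain both classes, and the function names and toy predictions are illustrative.

```python
import random
from statistics import fmean


def split_half(examples, seed):
    """Randomly split examples into a training half and a testing half."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def confusion_rates(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = epileptic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(y_true)


def average_rates(per_run):
    """Average each rate over all runs."""
    return tuple(fmean(col) for col in zip(*per_run))


runs = [
    confusion_rates([1, 1, 0, 0], [1, 0, 0, 0]),   # sens 0.5, spec 1.0, acc 0.75
    confusion_rates([1, 1, 0, 0], [1, 1, 1, 0]),   # sens 1.0, spec 0.5, acc 0.75
]
mean_sens, mean_spec, mean_acc = average_rates(runs)
train_half, test_half = split_half(range(10), seed=0)
```

Reporting sensitivity and specificity alongside accuracy matters here because epileptic epochs are much rarer than normal ones.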
Due to unavailability of the non-3 Hz spike and wave epileptic EEG data, currently we have only classification rates for generalized absence seizure.

Exclusive Processing.
In this study, we observed that even in the case of absence seizure, epileptic patterns do not appear in exactly the same way in each channel. Handling each channel exclusively was therefore another very important decision. We tested the classification in both ways, that is, one classifier for all of the channels at once versus one separate classifier for each channel (Figure 8).
[...] a lesser training time as compared to SVM, but considering the sensitivity and classification improvement through corrective marking, we think that SVM is the better choice than LDA and ANN. In upcoming sections, we have shown the results for all three types of classifier.
[...] dataset file, and he marked the same amount of epochs for each channel. These corrective markings were saved in his log as training examples. These corrective markings, as new examples, were used along with the 32290 epochs of the initial training stage to retrain the classifier. The number 32290 comes from the 3229 epochs randomly selected from the whole CHBMIT dataset on ten separate occasions during the initial training phase. The performance of the classifier after retraining was then judged again on another random 3000 epochs (Figure 14).
In Table 4, we have shown the average initial classification and retrained classification results of our system for each channel. Table 4 shows that our technique is robust and it works also on a different dataset. The average accuracy of the system rose from approximately 89% to 90%.

Discriminant Analysis.
We used the discriminant analysis package available in the MATLAB Statistics Toolbox. We found pseudoquadratic to be the best performing discriminant type, with uniform prior probability.
(a) CHBMIT. For CHBMIT dataset, initial training of the classifier resulted in 94% average accuracy, 96% average specificity, and 90% average sensitivity for 3 Hz spike and wave which is a characteristic of absence seizure. After initial training our specificity is better than that of Shoeb [10] and Nasehi and Pourghassem [21] (Figure 13).
(b) PIMH. For PIMH dataset, initial training of the classifier resulted in 90% average accuracy, 95% average specificity, and 73% average sensitivity for 3 Hz spike and wave which is a characteristic of absence seizure. In Table 6, we have shown the average initial classification and retrained classification results of our system for each channel. Table 6 shows that our technique is robust and it works also on a different dataset. The average accuracy of the system rose from approximately 89% to 90%.

Artificial Neural Network.
We used feedforward backpropagation package available in MATLAB Neural Network Toolbox and found Levenberg-Marquardt to be the best method, with 0.05 learning rate.
(a) CHBMIT. For the CHBMIT dataset, initial training of the classifier resulted in 92.88% average accuracy, 98.66% average specificity, and 75.75% average sensitivity for 3 Hz spike and wave, which is a characteristic of absence seizure (Figure 9). In Table 7, we show the average initial classification and retrained classification results of our system for each channel. They show that after correction of a few epochs there is a visible improvement in the system's classification. The average accuracy of the system rose from 92.88% to 93.96%.
(b) PIMH. For PIMH dataset, initial training of the classifier resulted in 84% average accuracy, 94.8% average specificity, and 56.5% average sensitivity for 3 Hz spike and wave which is a characteristic of absence seizure ( Figure 10).
In Table 8, we have shown the average initial classification and retrained classification results of our system for each channel. Table 8 shows that our technique is robust and it works also on a different dataset. The average accuracy of the system rose from approximately 84% to 85.43%.

Discussion and Future Work
Computer-assisted analysis of EEG has tremendous potential for assisting clinicians in diagnosis. A very important and novel phase of our system is the user adaptation mechanism, or retraining mechanism. The introduction of this phase is important in many respects. In this phase, the system tries to adapt its classification to the user's wishes. Moreover, this technique personalizes the classifier's output. It has been reported that even expert neurologists sometimes disagree over a certain observation in EEG data. This system will be useful for disagreeing users, and it will also help them compare their results with each other. There is also a threat of overfitting by the classifier. In order to keep the classifier improving its performance as it encounters more and more examples, we have introduced this user adaptive mechanism in our system. We consider the existing systems as dead because they cannot improve their classification rate after initial training (during software development). The self-improving mechanism after deployment makes our tool alive. This system can be made part of the whole epilepsy diagnosis process. It highlights the epileptic spikes within the whole EEG, thus reducing the user's fatigue and time consumption. We obtained high classification accuracy on datasets obtained from two different sites, which indicates the reproducibility of our results and the robustness of our approach.
In the future, we plan to turn this into a web-based application in which neurologists can log in and consult each other's reviews of a particular subject. This will expose our system to a wide variety of examples and let it learn from all of them. Integration of video and its automatic analysis (video EEG) can help a neurologist diagnose epilepsy better and can also help him distinguish between psychogenic and epileptic seizures. We will also investigate how much of an issue overfitting is in the reported performances, which according to some claims now even touch 100%. There is a need for a method or criterion that limits how far these algorithms can improve their detection on a limited number of available examples.
This system was built with the aim of facilitating the neurologist by supplementing him in the analysis of the EEG. We do not want to impose a classification of the EEG data on a user.
In the future, we will also include a slider in the system which will allow the user to adjust the sensitivity and specificity before retraining. This assisting system is more like a detection tool which keeps learning as it encounters better examples. More and better examples will certainly improve its performance. The agreement between different neurologists over EEG readings is low to moderate. If agreement can be found on how at least a few of the epileptic patterns correspond to epileptic disease, then we can take this tool further and use it for diagnosis instead of just assistance. One of the biggest limitations of this study is the unavailability of non-3 Hz spike and wave data. Even though we have included the features of all the epileptic frequency ranges, each processed exclusively, testing on such data will certainly prove worthwhile for the progress of these assisting tools toward a diagnostic tool.