Epileptic Seizures Prediction Using Machine Learning Methods

Epileptic seizures occur due to disorder in brain functionality which can affect patient's health. Prediction of epileptic seizures before the beginning of the onset is quite useful for preventing the seizure by medication. Machine learning techniques and computational methods are used for predicting epileptic seizures from Electroencephalograms (EEG) signals. However, preprocessing of EEG signals for noise removal and features extraction are two major issues that have an adverse effect on both anticipation time and true positive prediction rate. Therefore, we propose a model that provides reliable methods of both preprocessing and feature extraction. Our model predicts epileptic seizures' sufficient time before the onset of seizure starts and provides a better true positive rate. We have applied empirical mode decomposition (EMD) for preprocessing and have extracted time and frequency domain features for training a prediction model. The proposed model detects the start of the preictal state, which is the state that starts few minutes before the onset of the seizure, with a higher true positive rate compared to traditional methods, 92.23%, and maximum anticipation time of 33 minutes and average prediction time of 23.6 minutes on scalp EEG CHB-MIT dataset of 22 subjects.


Introduction
The disease in which patients suffer seizures caused by a brain functionality disorder is called epilepsy [1]. While more than fifty million people around the world are diagnosed with epilepsy [2], in the United States, about three million patients have been affected by epilepsy. Epilepsy is the third most common brain disorder [3]. Meanwhile, there are several possible causes of epilepsy, one of which is a molecular mutation, which results in irregular neuronal behavior or migration of neurons. Although the main cause of epilepsy remains unknown, early diagnosis can be useful for treating epilepsy. Epilepsy patients can be treated with drugs or surgical procedures [4]. However, these methods are not fully effective. Unfortunately, seizures that cannot be completely treated medically limit the active life of the patient. In these cases, patients cannot independently work and do some activity. This leads to social isolation of individuals and economic difficulties.
Early prediction of epileptic seizures ensures enough time before it actually occurs; it is very useful because the attack can be avoided by the drug. Epileptic seizures have four different states: the preictal state, which is a state that appears before the seizure begins, the ictal state that begins with the onset of the seizure and ends with an attack, the postictal state that starts after ictal state, and interictal state that starts after the postictal state of 1st seizure and ends before the start of preictal state of consecutive seizure. Figure 1 shows different input states of three different channels. In addition, seizures can be predicted by detecting the beginning of the preictal state.
Detecting the appearance of preictal state [5] predicts the seizure. Therefore, the purpose of our investigation is to detect the appearance of preictal state for epileptic seizures. Machine learning models are used to predict epileptic seizures. These machine learning models include EEG signal acquisition, signal preprocessing, features extraction from the signals, and finally classification [6] between different seizure states. The objective of the prediction model with machine learning was to detect preictal state's sufficient time before seizure onset starts [7]. However, enough time for the predictive preictal state of the gland and maximum sensitivity are important, and they remain as a performance issue in the prediction of epileptic seizures.
Preprocessing and feature extraction from EEG signal have great affect in maximizing prediction time and true   positive rate (TPR). Preprocessing is performed for removing noise from the signals and to increase the signal-tonoise ratio (SNR). Many researchers [8] have discussed preprocessing steps that include converting multiple channels of EEG signals into a single surrogate channel, and then filters have been applied to increase the signal-tonoise ratio (SNR). EEG signals that have been acquired by using multiple electrodes can be converted into surrogate channel by using averaging filter, common spatial filtering (CSP), large Laplacian filter, and optimized spatial pattern (OSP) filtering [8]. Many researchers have also discussed extraction of linear and nonlinear features for prediction of epileptic seizures. Features that distinguish between preictal and interictal states include autoregressive coefficients, relative power from EEG signals of different frequency bands, complexity, Hjorth parameters [9], zero crossing, Lyapunov exponent [10], and spectral and statistical moments [11].
Finally, as we have observed in the literature [12][13][14][15][16] that there is no machine learning model that provides an absolutely reliable method for both preprocessing and features extraction, we propose an effective and reliable machine learning model for prediction of epileptic seizures. Our model focuses on preprocessing and features extraction from EEG signals. We have converted multiple channels EEG signal into the surrogate signal, and then empirical mode decomposition (EMD) [17] has been applied for increasing signal-to-noise ratio (SNR). We have extracted multiple features including entropy, approximate entropy, Hjorth parameters, spectral moments, and statistical moments. It has been observed that both statistical and spectral features give increased sensitivity between interictal and preictal states. Support Vector Machine (SVM) [18] has been used as a classifier for classification between preictal state and interictal state.
The rest of the paper is organized as follows: Section 2 discusses related work. Section 3 explains the proposed methodology. Section 4 reports and compares the experimental results. Section 5 concludes the paper and mentions future work.

Related Work
In the previous section, we have discussed four states of seizures. Among these four states, ictal state and preictal state are very useful for predicting epileptic seizures. The ictal state can be used to classify seizure and nonseizure EEG signals. The preictal state is another useful state of epileptic seizures. It begins several minutes before the onset of a seizure, and it ends with the start of ictal state. Many researchers [12][13][14][15][16] tried to detect the beginning of the preictal state by using EEG signals. However, only a few have reliably detected the preictal state of epilepsy. Preprocessing of the EEG signal to increase signal-to-noise ratios and features extraction play an important role in reliable prediction of epileptic seizures.
The combination of multiple unique features into a feature vector can be used to predict the preictal state of epileptic seizures. Rasekhi et al. in [12] have proposed an algorithm for seizure prediction with the help of univariate linear features. In [12], the authors used only six EEG channels in their proposed model and extracted 22 univariate linear properties. Thus, a 132-dimensional feature space is created. It was assumed that preictal time starts 10 to 40 minutes before the ictal state with a difference of 10 minutes. Prediction of epileptic seizures is considered by classifying a binary class that classifies test data into either preictal state or ictal state. On average, the prediction sensitivity after applying this algorithm is 73.90%.
In [12], the authors used Support Vector Machine as a classifier to classify the preictal and ictal states of EEG signals. The authors have used extracted univariate linear features using the window size of seconds in their algorithm. In the second step, preprocessing is done, and finally a decision was made on EEG signal, following certain regularization. Three EEG channels have been extracted by placing electrodes on the patient's scalp focusing on seizure, while the three electrodes are located outside the confiscated surface. The data of the acquired EEG signals are converted into segments of a nonoverlapping window having the size of 5 seconds. After converting this data to the 5-second segments, the Butterworth filter [19] was used to reduce the noise effect.
The authors in [12] extracted the first four statistical moments as features. All these four features measure similarity, variance, and symmetry of consecutive samples of EEG signals. In order to deal with outliers, the authors have standardized all the features, so there will be no outliers. Although the noise has been reduced from signals, still there was some noise in the EEG signals, as the brain is a nonstationary source for recording the EEG signals.
Smoothing is performed on the EEG signals to eliminate the noise.
It has been observed in literature that univariate linear features have better sensitivity performance for epilepsy data of EEG signals. Teixeira et al. in [13] have proposed a model for prediction of epileptic seizures by choosing only six channels of EEG signals and have extracted 22 linear univariate features for each channel. The overall feature space expands to 132 dimensions. In [13], the authors have used only six electrodes for EEG data acquisition. The main reason behind this minimum electrode selection is to set free the patient from wearing a large number of electrodes, as patients are often unwilling to wear so many electrodes on their scalp due to discomfort. Therefore, in order to give comfort to the mind, only six channels have been acquired and used for prediction purpose. The authors have selected these electrodes by using three different approaches. One method is by randomly selecting six EEG electrodes, while the second method is to choose six channels from electrodes which have been placed on the scalp area from where seizures originate. The authors in [13] have used notch filter [20] for smoothing as preprocessing step for noise removal. They have also tested their model for prediction by varying multiple combinations of electrodes and also with four different preictal state durations. They have used three classifiers for classification and have approximately predicted every seizure.
After selecting suitable features, training data is fed into Support Vector Machine for training the classifier, and then test data is passed for determining classification accuracy and sensitivity. The authors have observed sensitivity of 75.8% of detecting the seizure, meaning that, out of 87, they have successfully detected 66 seizures. The authors have also proposed that performance can be improved by further reducing features set.
In [14], Bandarabadi et al. have proposed an algorithm to predict epilepsy seizures that can extend the life of epilepsyaffected patients. They have extracted spectral power features, and after suitable selection of features, features are passed into Support Vector Machines for classification. They have observed sensitivity of 75.8%; it means that their classifier has predicted 66 seizures out of total 87. They have concluded that, by applying these methods, after reducing proposed features subset can improve seizure prediction performance.
In [15], the authors have used wavelet method for prediction of seizures. They have extracted features including wavelet energy and wavelet entropy. Two or three channels have been selected for testing purposes on a dataset of six patients. Sensitivity has been reported as 88% with average anticipation time of 22 minutes.
Zandi et al. [16] have also proposed a model for predicting seizures using scalp EEG signals on the basis of zero crossings. The authors in [16] have computed the histogram of all intervals in a moving average window and have selected values from particular bins for observations. Once the whole process is completed, last 5 seconds of observations are compared with different reference sets of points, containing preictal and interictal states. They have measured a similarity index on the basis of variational Bayesian Gaussian mixture [21] model of EEG data.
A combined similarity index is calculated after measuring the similarity index on the basis of a particular threshold. An alarm is generated upon indication of the start of the preictal state of the seizure which predicts the seizure. The authors have applied their model for epileptic seizure prediction on a dataset of 20 subjects including 86 seizures. They have observed sensitivity of the model of 88.3%. Average anticipation time in their case, that is, the time of detecting preictal state before onset of the seizure, has been observed to be 22.5 minutes. Table 1 shows a comparison of various machine learning models on the basis of the number of subjects, number of seizures, type of EEG data, sensitivity, and false positive rates.
It has been observed in researches of Table 1 that there is no reliable method for selection of channels from EEG signals, and no reliable preprocessing method has been devised. Therefore, keeping in mind the fact that preprocessing of the EEG signals can improve prediction sensitivity and average anticipation time, we propose an effective machine learning method for epilepsy prediction.

Proposed Method
The preictal state is very useful for seizure prediction, as it starts few minutes before the seizure. This made it possible for us to be able to predict epileptic seizure, if we successfully detect the start of preictal state. The aim of this research is to predict epileptic seizure by detecting the start of preictal state's sufficient time before the ictal state or onset of seizure starts. Early prediction [22] helps patients, as medication can be done by the doctors to prevent the seizure. Due to this medication, the patient can now perform his or her routine activities without any interference from seizures. After critical considerations of these states, we have proposed a model to detect the start of the preictal state. However, EEG data acquisition by placing electrodes on the scalp of the patient is out of the scope of our research. Consequently, we have used a publically free online available dataset of CHB-MIT [23]. The dataset has been acquired by placing 23 electrodes on the scalp of 22 subjects. We have performed preprocessing of the data in two stages; in the first stage, 23 channels EEG signals are converted into a surrogate channel, which is a single signal to improve the SNR. In the second stage of preprocessing, empirical mode decomposition (EMD) has been applied to the surrogate channel for further increasing SNR.
It is pertinent to know that the surrogate channel of EEG signal can be obtained by applying either averaging filter, large Laplacian filter, or common spatial pattern (CSP) filtering [  reference electrode, among the 23 electrodes, large Laplacian filter is not possible, as it requires information about reference electrode for assigning maximum weight to it. We have applied averaging filter and CSP to obtain surrogate channel and have concluded after comparing results that CSP gives increased SNR; CSP also increases within a class variance of signals.
We have performed the empirical mode decomposition (EMD) after converting into the surrogate channel to increase the signal-to-noise ratio. Empirical mode decomposition decomposes the surrogate channel EEG signal into its oscillatory functions, known as Intrinsic Mode Functions (IMFs). Noise affects high-frequency components; therefore, we have taken only last four IMFs having low frequencies and contained more information about seizures. These four IMFs are combined, and the features have been extracted for classification.
Features have been extracted in both time and frequency domains. Statistical features have been extracted in a time domain, whereas spectral features have been extracted in a frequency domain. We have compared three classifiers in terms of sensitivity, and Support Vector Machine has been selected as it gives greater sensitivity; therefore, classification is done by using Support Vector Machines in order to distinguish testing data between preictal and interictal states. Figure 4 shows a flowchart that illustrates our proposed method. Figure 2 shows our proposed flowchart. Meanwhile, data acquisition has been performed in the first step by placing electrodes on the scalp of subjects, and data is stored in edf format [24]. edf data is converted into MATLAB [21] files having the extension * .mat with the help of MATLAB function called "edfread." Figure 3 shows plots of different channels of EEG signal for a single session. Most of the sessions are of one-hour recordings, as data has been acquired using 23 electrodes and the sampling rate of 256 Hz. Noise has been added in 23 channels of EEG signals during the recording of these signals; therefore, in order to increase the signal-to-noise ratio (SNR) of the signal, we have to convert it into a surrogate channel by applying common spatial pattern (CSP) filtering. In the next step, empirical mode decomposition (EMD) is applied on the surrogate channel EEG signal for further increasing SNR. Statistical features in time domain and spectral features in frequency domain have been extracted.

Surrogate
Channel. CHB-MIT dataset has been acquired by placing 23 electrodes on the scalp of subjects; therefore 23 channels have been obtained. These channels contain noise that affects prediction directly; therefore, we have to convert these channels into a surrogate channel with increased SNR. The multiple channels signal can be converted into a single surrogate channel by applying common averaging filter, large Laplacian spatial filter, and common spatial pattern (CSP) filter.

Averaging Filter.
We have applied a simple averaging filter on the multiple channels EEG single to convert it into the surrogate channel. This filter computes the average of all channels to form a single-channel EEG signal. Although the surrogate channel obtained after applying this filter has more signal-to-noise ratio than multiple channels signal, there is still some noise in it. Figure 4 shows plot of surrogate channel EEG signal obtained after applying averaging filter.

Large Laplacian
Filter. Following our observation, averaging filter has not given better results. Large Laplacian filter also gives increased signal-to-noise ratio and can be used for surrogate channel [25]. In our case, CHB-MIT dataset has no information about reference electrode, and, without this information, we cannot apply large Laplacian filter.

Common Spatial Pattern Filter.
Common spatial pattern (CSP) filter [26] can be used to convert multiple channels signal into the surrogate channel, as it gives increased signalto-noise ratio. It also performs better in case of EEG signals as it increases SNR and the variance interval between two class variables. If 1 and 2 are the signals from two different epilepsy states, that is, preictal state and interictal state, then you get 1 and 2 when you divide the squares of signals by the trace in (1) and (2), respectively.
Weight is computed in (5) and 1 and 2 are computed in (6) and (7). Eigenvalue decomposition is performed in (8) to get eigenvector and eigenvalues . These values are sorted in descending order in order to get a filter in the following equation: Equation (9) shows coefficients of common spatial pattern (CSP) filter. We have applied CSP on all sessions to get surrogate channel EEG signal in order to get increased SNR and high variance between classes of preictal and interictal states. Figure 5 shows a plot of EEG signal after applying CSP for converting it into a single surrogate channel. The plot shows that this surrogate channel indicates the start of preictal state.

Empirical Mode Decomposition (EMD).
In empirical mode decomposition (EMD) [27], a time domain signal is broken into a number of oscillatory functions known as Intrinsic Mode Functions (IMFs). This process of decomposition of signal into multiple IMFs while remaining in time domain is comparable with wavelet decomposition and Fourier transform. EMD is a very useful process for analyzing signals that are nonstationary and not linear. When empirical mode decomposition is applied on surrogate channel EEG signal, it filters out noise from the original signal. Highfrequency components contain noise, whereas low frequencies contain original signal and information about seizures.
There are different frequency components that are obtained as a result of applying empirical mode decomposition and Intrinsic Mode Functions (IMFs). Multiple IMFs are obtained depending upon the nature of EEG signal. Meanwhile, there is an important fact about this decomposition, where the length of each Intrinsic Mode Function is equal to the length of the surrogate channel signal. We have applied EMD on surrogate channels of each session for all subjects in MATLAB. We have analyzed different IMFs and have observed that the last four IMFs obtained after applying EMD contain maximum information about states of seizures. Therefore, we have combined the last four IMFs, which are used for feature extraction. Every Intrinsic Mode Function must follow the following conditions: (1) The total peak values and the count of zero crossing should ideally be the same or have a maximum difference of 1. (2) At any given point of signal, the average envelope value is defined by local maxima, and envelope defined by local minima is zero. Assume that ( ) is the given signal; all maxima and minima can be obtained by Algorithm 1.

Selecting a Nonoverlapping
Window. CHB-MIT dataset has been sampled at 256 Hz; therefore we have selected a nonoverlapping window of 1 second consisting of 256 samples. Selection of window has been done for downsampling the data in order to increase processing speed.

Feature Extraction.
Several features have been extracted in this work, but spectral and statistical moments perform better in terms of both anticipation time and sensitivity. Therefore, we have extracted the first four statistical moments and the three spectral moments [28]. states of epileptic seizures. These statistical moments give information of the distribution of samples including variation between samples, symmetry, and peaks. Analysis of statistical moments of IMFs of seizure and nonseizure sessions obtained after applying EMD shows that this information can be easily classified. If represents combined IMFs of the signal and denotes length of the IMF; then statistical moments can be computed by the following equations: where is the total number of samples of length of IMF; is the mean, is the standard deviation, and is skewness of the corresponding IMF. Figure 6 shows a plot of mean of EEG signal, whereas Figures 7, 8, and 9 show plots of standard deviation, skewness, and kurtosis for EEG signal, respectively.

Spectral Features.
Empirical mode decomposition can perform spectral analysis [29] for EEG signals. Amplitude of  EEG signals increases in frequency domain during seizures, which is useful for prediction. Therefore, we have also extracted the features in frequency domain.
A conceptual understanding of empirical mode decomposition is that it decomposes a single signal into a number of Intrinsic Mode Functions (IMFs), which are obtained by passing from filters having narrow pass bands. Spectral moments have been extracted from Intrinsic Mode Functions (IMFs). These moments provide useful information of different states of seizures. Most of the time, when applying EMD, this spectral analysis is performed by using the calculation of instantaneous frequencies (IF). However, it is also reported that calculation of instantaneous frequencies has a physical meaning only for mono components of signal.
It is no longer news that when EMD is applied on EEG signals we never get monotonic component. As an alternative, we have computed power spectral density (PSD) [30] for extracting spectral moments. The discrimination power of the PSD features can be visually analyzed by their respective plots for three IMFs from the normal and pathological EEG signals. The PSD can be calculated as follows: (1) Spectral Centroid. As the spectral variation of the IMF differs for both normal and seizure subjects in EEG signals, the variational coefficient can be used as a feature for classification of EEG signals. Equation (13) gives the formula for computation of variational coefficient; ( ) is the amplitude of th frequency bin in the spectrum. In (12), computes spectral centroid of the signal. Figure 10 shows plot of spectral centroid for EEG signal.
(2) Variational Coefficient. As the spectral variation of the IMF differs for both normal and seizure subjects in EEG  signals, variational coefficient can be used as feature for classification of EEG signals. Equation (13) gives the formula for computation of variational coefficient; is the spectral centroid. Both and power spectral density ( ) are used for computing variational coefficient 2 . Figure 11 shows plot of variational coefficient of EEG signal.
(3) Spectral Skew. Spectral skew is the third spectral moment, which measures the symmetry of the distribution of data. The plot of spectral skew of EEG signal shows that there is a significant change that occurs when in different states of seizure. Therefore, spectral skew can be a useful feature for prediction of epileptic seizure of EEG signal. Spectral skewness can be computed by using spectral centroid , variational coefficient , and power spectral density ( ) in (14). Figure 12 shows plot of spectral skewness of EEG signal.
3.5. Combined Feature Set. First, four statistical moments and three spectral moments have been extracted from IMFs and combined into a single feature vector, which is used for classification.

Classification of Preictal and Interictal
States. As we have already discussed that for predicting epileptic seizures it is important that we detect the start of preictal state, which shows that seizure is going to occur after few minutes, we need to classify preictal and interictal states of seizure. We have performed classification on features that we have extracted from the window of 1 second. The preictal state is detected where three samples are classified as preictal state samples.

Classifier Selection.
We have extracted some temporal features from ictal and nonictal state and performed a comparison for suitable selection of a classifier.
Three classifiers have been applied: -nearest neighbor classifier, naïve Bayes, and Support Vector Machines. After performing the comparison of three different classifiers, Support Vector Machine has been selected for prediction of epileptic seizures because it performs better in terms of sensitivity. Figure 13 shows comparison of sensitivity of three different classifiers.

Results
For the selection of a suitable classifier, we have performed a classification of the ictal state with rest of states of seizure. Support Vector Machine is chosen as a classifier due to its superior performance in terms of sensitivity. After selection of classifier, we have applied our model for prediction of epileptic seizures on CHB-MIT dataset that contains EEG signals recordings of several hours for 24 patients of 3-19 Zandi et al. [16] Proposed model Sensitivity (%) Avg. prediction time (min.) years of age having 84 seizures, and the data is sampled at the sampling rate of 256 Hz. We have selected only those sessions from datasets which have seizure onset, at least after 20 minutes from start of the session. We have computed average anticipation time for 84 sessions only from a complete CHB-MIT dataset. We have computed average prediction time of epileptic seizure as 23.48 minutes, whereas maximum prediction time has been observed as 33.46 minutes before the onset of the seizure. Average sensitivity has been observed as 92.23% and specificity as 93.38%. Figure 14 and Table 2 show a comparison of results generated by our proposed model and those by other models. It is quite evident from the comparison that our model performs better in terms of both sensitivities and anticipation time as compared to Zandi et al. [16] and Teixeira et al. [13].

Conclusion and Future Work
In this research, we have used the CHB-MIT dataset that was recorded by placing electrodes on scalp of subjects to predict epileptic seizure. We have tested our proposed model on the dataset, and it has been shown in results that our model performs better in terms of both sensitivity and average prediction time as compared to other models for prediction of epileptic seizures. After applying the proposed model on the dataset, on average we have predicted epileptic seizures 23.6 minutes before the start of the onset of a seizure. Therefore, with the help of our proposed model, epilepsyaffected patients will get more time for proper medication required for preventing the seizure before it actually occurs. We have also observed maximum prediction time of 33.46 minutes on the dataset.
In the future, preprocessing of the EEG signal can be further improved to get an increased sensitivity of seizure prediction. Other preprocessing methods can be tried, including those hybrid preprocessing methods and those that come with adaptive window sizes [31]. Moreover, we can also develop an online system for prediction of epileptic seizures.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.