Automatic Detection of Epilepsy and Seizure Using Multiclass Sparse Extreme Learning Machine Classification

An automatic detection system for distinguishing normal, ictal, and interictal electroencephalogram (EEG) signals is of great help in clinical practice. This paper presents a three-class classification system based on the discrete wavelet transform (DWT) and the nonlinear sparse extreme learning machine (SELM) for epilepsy and epileptic seizure detection. A three-level lifting DWT using the Daubechies order 4 wavelet is introduced to decompose EEG signals into delta, theta, alpha, and beta subbands. Considering classification accuracy and computational complexity, the maximum and standard deviation values of each subband are computed to create an eight-dimensional feature vector. After comparing five multiclass SELM strategies, the one-against-one strategy, which has the highest accuracy, is chosen for the three-class classification system. The performance of the designed system is tested with a publicly available epilepsy dataset. The results show that the system achieves sufficiently high classification accuracy by combining the SELM and DWT and reduces training and testing time by decreasing computational complexity and feature dimension. With excellent classification performance and low computational complexity, this three-class classification system can be utilized for practical epileptic EEG detection, and it offers great potential for portable automatic epilepsy and seizure detection systems in future hardware implementations.


Introduction
Epilepsy is one of the most common chronic neurological disorders and is characterized by recurrent seizures. About one percent of the world's population suffers from epilepsy [1], which costs billions of dollars annually in direct medical care. Epileptic seizures impair the quality of life of patients and their families and can even lead to the death of patients. Detecting and treating epilepsy efficiently is therefore essential. The electroencephalogram (EEG), which shows the temporal and spatial information of the brain's electrical voltages, is successfully used to diagnose epilepsy patients [2]. Currently, seizure detection relies on "interviewing" patients and on inspection of EEG recordings by highly trained professionals in hospitals [3,4]. However, this approach is extremely inaccurate and inconvenient, and epilepsy patients may show normal states when their seizures do not occur. Differentiating between healthy and interictal (seizure-free) EEG signals can be used to diagnose epilepsy in a clinical setting and, additionally, the detection of seizures is important for instant treatment [5]. Automatic classification of healthy, ictal (seizure), and interictal EEG signals is therefore of great clinical significance.
Machine learning is generally used for the automatic detection of seizure EEG signals, and many machine learning methods have been applied to EEG classification [6][7][8][9][10][11][12][13][14][15]. Artificial neural networks (ANNs) have been widely applied to classify EEG signals over the last two decades [8]. However, the conventional learning algorithms for ANNs, such as the backpropagation (BP) algorithm, are prone to falling into local minima [9]. Adjusting the connection weights and biases in an ANN is very time-consuming, and the learning speed is too slow to meet the requirements of practical applications, which has been a major bottleneck for development [9]. Another popular machine learning method, the support vector machine (SVM), has been successfully used to classify epileptic EEG signals [10][11][12]. However, since the training of an SVM involves a quadratic programming (QP) problem, the computational complexity of SVM training algorithms is usually intensive, being at least quadratic with respect to the number of training examples. It is therefore difficult to deal with large problems using SVM [10]. The extreme learning machine (ELM) is an emerging machine learning method proposed by Huang et al. [13,14] for generalized single hidden layer feedforward neural networks (SLFNs), in which the hidden node parameters are randomly generated and the output weights are analytically computed. ELM has been successfully applied to detect seizure EEG in previous works [7,9]. However, the original ELM would consume large storage space if implemented in hardware, because it must calculate a matrix inverse [15,16]. The sparse ELM (SELM), a state-of-the-art algorithm, was proposed in [16]. Similar to the conventional SVM, the training of SELM is essentially a QP problem. The only difference between them is that the SELM does not have the sum constraint [16].
In the SELM, as fewer constraints need to be satisfied, and only one Lagrange variable needs to be updated in each iteration, the training process would be easier. Consequently, compared with SVM and ANN, the SELM needs less storage space and takes shorter training and testing time. With all the advantages, the SELM provides a more efficient way for hardware implementation and satisfies the demand for portable seizure detection application.
The SELM was originally proposed for binary classification. Different strategies based on SVM have been proposed for multiclass classification problems [17][18][19][20], such as one-against-all (OAA), one-against-one (OAO), binary tree (BT), error-correcting output codes (ECOC), and directed acyclic graph SVM (DAG-SVM). Inspired by the multiclass SVM, the binary SELM classifier can also be extended to multiclass classification by constructing and combining several binary classifiers. To achieve good performance in the three-class classification of epileptic EEG signals, nonlinear SELM classifiers with different kernel functions are compared and their kernel parameters are optimized in this work. We find that the OAO strategy with Gaussian SELM is the best multiclass classifier for epilepsy and seizure detection.
The feature extraction of EEG signals plays an important role in the performance of multiclass classification [21]. Feature extraction methods can be categorized into four types: time domain, frequency domain, time-frequency domain, and nonlinear analysis [21,22]. For nonstationary EEG signals, the discrete wavelet transform (DWT) has proved to be an efficient tool owing to its ability to resolve signals in both the time and frequency domains [2]. DWT filters are conventionally designed on a convolution architecture [23], which requires many complex operations and a large memory [24]. To overcome these drawbacks, the lifting-based DWT (LDWT) is adopted and implemented. There are many types of wavelets, such as the Haar, Mexican Hat, Gaussian, Morlet, and Daubechies wavelets, of which the Daubechies 4 (db4) wavelet is found to be the most appropriate for epileptic EEG analysis because its waveform is similar to the spike wave of EEG signals [25].
Selection of the SELM input is important since even the best classifier will perform poorly if the input is not selected well [22,26]. Although some previous feature selection methods can increase the detection performance [27,28], they suffer from the high dimensionality of features, and the complexity makes hardware implementation difficult and expensive. In this work, three-level LDWT is used to decompose the EEG signals into delta, theta, alpha, and beta subbands; the feature values of each subband are computed to create multidimensional feature vectors. In order to obtain maximum accuracy with a low-dimensional feature vector under certain conditions, a great number of combinations of different features are investigated. Finally, the maximum and standard deviation values of each subband are calculated to create eight-dimensional feature vectors as the input to the multiclass SELM classification.
To the best of our knowledge, this is the first work to design a three-class classification system based on LDWT and the multiclass SELM for detecting epilepsy and seizure. This paper makes two contributions. First, it develops feature extraction and multiclass classification with low computational complexity that can detect epilepsy and seizure with sufficiently high classification accuracy. Second, it provides a good solution for portable automatic epilepsy and seizure detection systems.
The rest of this paper is organized as follows. Section 2 describes the methods of the multiclass classification system, including the SELM algorithm, multiclass classification strategy, and feature extraction based on LDWT. Section 3 describes the experimental results and discussions of the proposed epilepsy and seizure detection system. Section 4 concludes the paper.

Methods
This section presents the multiclass classification system for epilepsy and seizure detection. The system consists of two phases: training and testing. Figure 1 shows the workflow of the proposed EEG classification system. EEG signals are decomposed into one set of approximation coefficients and three sets of detail coefficients using the three-level LDWT, and then eight features are extracted by computing the maximum and standard deviation values of the wavelet coefficients (discussed in detail in what follows). The eight-dimensional feature vectors are input to the multiclass SELM. Labelled EEG signals are used for training the system, and, after training, unlabelled EEG signals can be automatically classified as normal, interictal, or ictal by the multiclass SELM system.
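In this section, we first review the SELM algorithm and present the five strategies of the multiclass SELM and then introduce the LDWT-based feature extraction.

The training of the SELM is a QP problem, which can be written as follows [14]:

$$\min_{\alpha}\; L_d = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j t_i t_j\,\Omega(x_i, x_j) - \sum_{i=1}^{N}\alpha_i, \qquad \text{subject to } 0 \le \alpha_i \le C,\; i = 1, 2, \ldots, N,$$

where $N$ is the number of training samples, $\alpha_i$ is the Lagrange multiplier, $t_i \in \{\pm 1\}$ is the associated class label, and $C$ is a predefined regularization constant. $\Omega(x_i, x_j)$ is the kernel function that is used for nonlinear classification, and the kernel functions could be, but are not limited to, the following [16].

Gaussian kernel: $\Omega(u, v) = \exp\left(-\|u - v\|^2 / (2\sigma^2)\right)$.

Laplacian kernel: $\Omega(u, v) = \exp\left(-\|u - v\| / \sigma\right)$.

Polynomial kernel: $\Omega(u, v) = (1 + u \cdot v)^m$.

The training algorithm of the SELM is summarized as follows.

Since only one Lagrange multiplier needs to be updated in each iteration [16], choosing the updated $\alpha_c$ is vital. The index $c$ of the updated $\alpha_c$ in each iteration is determined according to

$$c = \arg\min_{i} J_i,$$

where $J_i = d_i \cdot g_i$, $g_i = \partial L_d / \partial \alpha_i$ denotes the gradient of $L_d$, and $d_i$ indicates the way in which $\alpha_i$ should be updated, expressed as follows:

$$d_i = \begin{cases} 1, & \alpha_i = 0, \\ -\operatorname{sign}(g_i), & 0 < \alpha_i < C, \\ -1, & \alpha_i = C. \end{cases}$$

The corresponding Lagrange variable is updated as follows:

$$\alpha_c^{*} = \alpha_c - \frac{g_c}{\Omega(x_c, x_c)}.$$

The unconstrained point $\alpha_c^{*}$ must be checked to ensure that it is in the feasible range $[0, C]$, and the clipped function can be written as follows:

$$\alpha_c^{\mathrm{new}} = \min\left(\max\left(\alpha_c^{*}, 0\right), C\right).$$

After updating $\alpha_c$, $g_i$ ($i = 1, 2, \ldots, N$) is updated as follows:

$$g_i^{\mathrm{new}} = g_i + \left(\alpha_c^{\mathrm{new}} - \alpha_c\right) t_i t_c\,\Omega(x_i, x_c).$$

Based on the updated values of $g_i$ and $\alpha_i$, the $J_i$'s ($i = 1, 2, \ldots, N$) are updated according to the definition.

When the training stage is finished and the SELM parameters are determined, we can classify a new object $x$ with

$$f(x) = \operatorname{sign}\left(\sum_{s=1}^{N_s} \alpha_s t_s\,\Omega(x, x_s)\right),$$

where $N_s$ is the number of nonzero Lagrange multipliers. The pseudocode of the SELM training algorithm is summarized in Algorithm 1.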
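The training loop described above can be illustrated with a minimal pure-Python sketch. It assumes the box-constrained dual with a Gaussian kernel, a greedy choice of the most violating multiplier, and a Newton step followed by clipping; the function names `train_selm` and `predict_selm`, the toy data, and the treatment of a zero score as the positive class are illustrative assumptions, not the authors' implementation.

```python
import math

def gaussian_kernel(u, v, two_sigma_sq):
    """Gaussian kernel exp(-||u - v||^2 / (2*sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / two_sigma_sq)

def train_selm(X, t, C=5.0, two_sigma_sq=2.0, tol=1e-3, max_iter=10000):
    """Coordinate-wise minimisation of the dual under 0 <= alpha_i <= C."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], two_sigma_sq) for j in range(n)]
         for i in range(n)]
    alpha = [0.0] * n
    g = [-1.0] * n  # gradient of the dual at alpha = 0
    for _ in range(max_iter):
        J = []
        for i in range(n):
            if alpha[i] <= 0.0:
                d = 1.0                           # can only increase
            elif alpha[i] >= C:
                d = -1.0                          # can only decrease
            else:
                d = -1.0 if g[i] > 0 else 1.0     # move against the gradient
            J.append(d * g[i])
        c = min(range(n), key=J.__getitem__)      # most violating index
        if J[c] > -tol:                           # optimal within tolerance
            break
        new_alpha = min(max(alpha[c] - g[c] / K[c][c], 0.0), C)  # step + clip
        delta = new_alpha - alpha[c]
        alpha[c] = new_alpha
        for i in range(n):                        # incremental gradient update
            g[i] += delta * t[i] * t[c] * K[i][c]
    return alpha

def predict_selm(x, X, t, alpha, two_sigma_sq=2.0):
    """Sign of the kernel expansion over samples with nonzero multipliers."""
    score = sum(a * ti * gaussian_kernel(x, xi, two_sigma_sq)
                for a, ti, xi in zip(alpha, t, X) if a > 0.0)
    return 1 if score >= 0 else -1

# Toy two-cluster data (illustrative only, not EEG features).
X_toy = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
t_toy = [-1, -1, -1, 1, 1, 1]
alpha_toy = train_selm(X_toy, t_toy)
```

Only the samples with nonzero multipliers enter `predict_selm`, which is the source of the storage saving the text attributes to the SELM.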

Multiclass SELM Strategy.
Even though the SELM is designed for binary classification, it can be extended to multiclass classification by constructing and combining several binary classifiers. For the multiclass SELM, we discuss five typical strategies, namely, OAA, OAO, BT, ECOC, and DAG [17][18][19][20][36]. For the three-class problem (normal, interictal, and ictal EEG signals), we briefly introduce these approaches.
(1) OAA (see [17]). Three binary SELM classifiers are trained, each separating one class from the other two.
(2) OAO (see [17]). Here three binary SELM classifiers are trained, and each classifier is trained using samples from a pair of classes. After training, three decision functions $\operatorname{sign}(\sum_{s=1}^{N_s} \alpha_s^{12} t_s^{12}\,\Omega(x, x_s))$, $\operatorname{sign}(\sum_{s=1}^{N_s} \alpha_s^{23} t_s^{23}\,\Omega(x, x_s))$, and $\operatorname{sign}(\sum_{s=1}^{N_s} \alpha_s^{13} t_s^{13}\,\Omega(x, x_s))$ are used to determine the class of an unknown sample $x$ by the majority vote strategy. In the vote strategy, if $\operatorname{sign}(\sum_{s=1}^{N_s} \alpha_s^{pq} t_s^{pq}\,\Omega(x, x_s))$ says that $x$ is in the pth class, then the vote for the pth class is increased by one; otherwise the vote for the qth class is increased by one. We then predict that $x$ is in the class with the largest vote. However, if each class has the same number of votes, we say $x$ is in the class indicated by the decision function with the largest absolute value. For example, if $|\sum_{s=1}^{N_s} \alpha_s^{12} t_s^{12}\,\Omega(x, x_s)|$ is the largest of the three functions, the final class is determined by the decision function $\operatorname{sign}(\sum_{s=1}^{N_s} \alpha_s^{12} t_s^{12}\,\Omega(x, x_s))$.
(3) DAG (see [17]). Its training phase is the same as the OAO strategy, and three binary SELM classifiers are trained. DAG depends on a rooted binary directed acyclic graph to make a decision. When an unknown sample reaches the leaf node, the final decision will be made.
(4) ECOC (see [18]). Its training phase is the same as the OAA strategy. One SELM classifies class 1 from classes 2 and 3, a second SELM classifies 2 from 1 and 3, and a third SELM classifies 3 from 1 and 2. Samples from classes 1, 2, and 3 have target codes (1, −1, −1), (−1, 1, −1), and (−1, −1, 1), respectively. Given an unknown sample, the three SELM classifiers should be used to determine the actual output code. The sample is assigned to the class with the closest target code in the Hamming distance sense.
(5) BT (see [19,20]). For three classes (1, 2, and 3), we need two classifiers. For example, the first SELM classifies 3 from 1 and 2, and the second SELM classifies 1 from 2. When an unknown sample is fed into the BT, class 3 is fully separated by the first classifier, and class 1 and class 2 can be classified by the second classifier.

LDWT-Based Feature Extraction.
Figure 2 shows the three-level wavelet decomposition structure. Because the main frequencies of epileptic EEG signals are below 32 Hz [2], the signals are preprocessed by a band-pass filter between 0 Hz and 32 Hz. The three-level LDWT decomposes each EEG signal into four subbands, generating the approximation coefficients A3 with the frequency range of 0-4 Hz corresponding to the delta wave and the detail coefficients D1 with the frequency range of 16-32 Hz corresponding to the beta wave, D2 with the frequency range of 8-16 Hz corresponding to the alpha wave, and D3 with the frequency range of 4-8 Hz corresponding to the theta wave.
The z-domain transfer functions of the low-pass and high-pass db4 filters are given in [37]. Using the lifting scheme [37], the polyphase matrix of the db4 wavelet can be factored into lifting steps.
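To make the idea of lifting steps concrete, the following sketch applies the widely published lifting factorization of the four-tap Daubechies filter (the filter Daubechies and Sweldens call D4) over three levels. Whether this four-tap filter coincides with the db4 filter used in this work is an assumption, as are the periodic boundary handling and the function names; the sketch only shows how a wavelet stage reduces to a few in-place predict, update, and scaling steps.

```python
import math

SQRT3 = math.sqrt(3.0)
SQRT2 = math.sqrt(2.0)

def d4_lift_forward(x):
    """One lifting stage of the four-tap Daubechies wavelet (periodic)."""
    n = len(x) // 2
    even, odd = x[0::2], x[1::2]
    s1 = [even[l] + SQRT3 * odd[l] for l in range(n)]
    # s1[l - 1] wraps around to the last element when l == 0.
    d1 = [odd[l] - (SQRT3 / 4) * s1[l] - ((SQRT3 - 2) / 4) * s1[l - 1]
          for l in range(n)]
    s2 = [s1[l] - d1[(l + 1) % n] for l in range(n)]
    approx = [(SQRT3 - 1) / SQRT2 * v for v in s2]
    detail = [(SQRT3 + 1) / SQRT2 * v for v in d1]
    return approx, detail

def d4_lift_inverse(approx, detail):
    """Undo d4_lift_forward by reversing each lifting step."""
    n = len(approx)
    s2 = [v * SQRT2 / (SQRT3 - 1) for v in approx]
    d1 = [v * SQRT2 / (SQRT3 + 1) for v in detail]
    s1 = [s2[l] + d1[(l + 1) % n] for l in range(n)]
    odd = [d1[l] + (SQRT3 / 4) * s1[l] + ((SQRT3 - 2) / 4) * s1[l - 1]
           for l in range(n)]
    even = [s1[l] - SQRT3 * odd[l] for l in range(n)]
    x = [0.0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x

def ldwt3(x):
    """Three-level decomposition; len(x) must be a multiple of 8."""
    a1, d1 = d4_lift_forward(x)
    a2, d2 = d4_lift_forward(a1)
    a3, d3 = d4_lift_forward(a2)
    return a3, d3, d2, d1

signal = [math.sin(0.3 * k) + 0.1 * k for k in range(16)]
A3, D3, D2, D1 = ldwt3(signal)
```

Each stage needs only a handful of multiply-add operations per sample pair and no convolution buffer, which is the property that makes lifting attractive for hardware.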
After decomposing the EEG signals into four sets of coefficients, feature values of the wavelet coefficients are computed to create multidimensional feature vectors. In this work, the maximum, minimum, mean, variance, approximate entropy, sample entropy, autocorrelation, and standard deviation values are extracted as candidate features for the three-class classification. However, using all of the features may not improve the classification accuracy and would cause high complexity in a hardware design. One objective of this paper is to reduce the computational complexity while maintaining classification accuracy. In order to obtain maximum accuracy with a low-dimensional feature vector under certain conditions, classification accuracy is calculated with 4 different feature dimensions, that is, 4, 8, 12, and 16. For each feature dimension, a great number of combinations of different features are investigated. Eventually, eight-dimensional feature vectors formed from the maximum and standard deviation values of the wavelet coefficients are fed into the three-class SELM classification.
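A short sketch of the selected feature vector follows. The ordering of the eight features and the use of the population standard deviation are assumptions, since the text specifies only which statistics are used; `subband_features` is a hypothetical helper name.

```python
import statistics

def subband_features(subbands):
    """Return [max, std] for each subband, concatenated into one vector."""
    features = []
    for coeffs in subbands:
        features.append(max(coeffs))
        features.append(statistics.pstdev(coeffs))  # population std (assumed)
    return features

# Four toy subbands standing in for A3, D3, D2, and D1 (illustrative values).
toy_subbands = [[1.0, 2.0, 3.0], [2.0, 2.0], [-1.0, 1.0], [0.0, 4.0]]
toy_features = subband_features(toy_subbands)
```

With four subbands the result always has eight entries, matching the input dimension of the multiclass SELM.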

Experimental Results and Discussions
In this section, the EEG datasets are summarized, and the performance of three-class classification is evaluated. The experiment and simulation are conducted with MATLAB R2010a on a 3.30 GHz Intel(R) Core(TM) i5-4590 processor with 4 GB memory.

EEG Data.
The EEG dataset from the University of Bonn, Germany, is used to test the performance [38]. The dataset contains 5 subsets (A-E), which are recorded intracranially on humans for a presurgical evaluation of focal epilepsies, each with 100 single-channel EEG segments. A summary of the 5 subsets is given in Table 1.
Since subsets A, D, and E are used in most epilepsy and seizure detection methods [29][30][31], these subsets are also selected to evaluate the proposed three-class classification system. Subset A contains surface EEG recordings from five healthy volunteers with their eyes open, subset D includes intracranial EEG recordings of five patients during seizure-free intervals from within the epileptogenic zone of the brain, and subset E is recorded during the seizures of five epileptic volunteers. The sampling frequency of the EEG dataset is 173.61 Hz, and each segment has 4096 points.
In data preprocessing, every segment is divided into 512-point sliding time epochs with a 256-point overlap between adjacent epochs; each epoch is 2.94 s long, with an overlap of 1.47 s between adjacent epochs [10]. Overall, 1600 epochs are constructed from each subset, for a total of 4800 epochs over the three subsets A, D, and E. Fourfold cross-validation is used to evaluate the performance of the proposed system: the 4800 epochs are partitioned into 4 mutually exclusive parts of approximately equal size, each called a fold. In each trial, one fold is used for testing and the remaining three folds are combined for training. The average performance across all trials is then calculated.
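The fourfold protocol can be sketched as follows. The random shuffling, the fixed seed, and the `evaluate` callback are assumptions for illustration; the text does not state how the epochs are assigned to folds.

```python
import random

def k_fold_indices(n_samples, k=4, seed=0):
    """Split shuffled sample indices into k mutually exclusive folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, evaluate, k=4, seed=0):
    """Average evaluate(train_idx, test_idx) over the k train/test trials."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k

folds = k_fold_indices(4800, k=4)  # 4800 epochs, as in the text
```

In practice `evaluate` would train the classifier on `train_idx` and return its accuracy on `test_idx`; here it is left as a placeholder so the splitting logic stays self-contained.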

Performance Evaluation.
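The performance of the proposed multiclass SELM system can be evaluated by sensitivity, specificity, and total classification accuracy, which are defined as follows [10,39]:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \times 100\%,$$

$$\text{Specificity} = \frac{TN}{TN + FP} \times 100\%,$$

$$\text{Total classification accuracy} = \frac{\text{number of correctly classified epochs}}{\text{total number of epochs}} \times 100\%,$$

where $TP$, $FN$, $TN$, and $FP$ are the numbers of true positive, false negative, true negative, and false positive epochs for the class under consideration.

Table 2 presents the confusion matrix of the three-class SELM. $S_{AD}$ denotes the number of epochs from set D that the proposed system classifies as epochs from set A, and the other entries can be interpreted similarly. Table 3 shows the detailed definition of the three-class classification measures.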
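As a worked illustration, the measures can be computed from a three-class confusion matrix indexed as `S[predicted][actual]`, following the $S_{AD}$ convention above. The one-versus-rest definitions used here are the conventional ones and are assumed to agree with Table 3; the counts are invented for the example and are not results from this work.

```python
CLASSES = ("A", "D", "E")

def three_class_metrics(S):
    """Per-class sensitivity/specificity and overall accuracy (fractions)."""
    total = sum(S[p][a] for p in CLASSES for a in CLASSES)
    correct = sum(S[c][c] for c in CLASSES)
    result = {"accuracy": correct / total}
    for c in CLASSES:
        tp = S[c][c]
        fn = sum(S[p][c] for p in CLASSES) - tp   # actual c, predicted other
        fp = sum(S[c][a] for a in CLASSES) - tp   # predicted c, actual other
        tn = total - tp - fn - fp
        result[c] = {"sensitivity": tp / (tp + fn),
                     "specificity": tn / (tn + fp)}
    return result

# Illustrative counts only: S_toy[predicted][actual].
S_toy = {
    "A": {"A": 9, "D": 1, "E": 0},
    "D": {"A": 1, "D": 8, "E": 1},
    "E": {"A": 0, "D": 1, "E": 9},
}
metrics = three_class_metrics(S_toy)
```

For these toy counts, class D has a sensitivity of 0.8 and a specificity of 0.9, and the overall accuracy is 26/30.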

Comparative Study and Results.
In order to achieve good performance, the tolerance and the parameters of the kernel function need to be chosen appropriately. First, we set the tolerance to 0.001, which ensures sufficiently high accuracy [16]. In this work, the Gaussian kernel and the polynomial kernel are selected since they achieve better generalization performance for most applications [16]. Before comparing the five multiclass classifications, the combination of the cost parameter $C$ and the kernel parameter $\sigma^2$ or $m$ should be chosen a priori. Taking OAO as an example, the following method is used to find the appropriate parameters $C$ and $\sigma^2$ of the Gaussian SELM. The cost parameter $C$ and kernel parameter $\sigma^2$ have different influence on the classification performance of the Gaussian SELM. $2\sigma^2$ is tuned with 12 different values, that is, 1, 5, 10, 60, 100, 200, 300, 400, 500, 600, 700, and 800, and $C$ is tuned with 8 different values, that is, 0.1, 0.5, 1, 2, 5, 10, 20, and 30. Using the three subsets, the accuracy of the Gaussian SELM with different values of $C$ and $2\sigma^2$ is shown in Figure 3. Our experiments demonstrate that the classification performance of the Gaussian SELM is not very sensitive to the parameter $C$ within a certain range. We select $C = 5$ for the OAO strategy in this work. However, $2\sigma^2$ has a rather great effect on epilepsy and seizure detection. In order to further examine this effect and determine the appropriate value of $2\sigma^2$, the sensitivities on the three subsets for different values of $2\sigma^2$ are shown in Figure 4 with $C = 5$. As can be seen from Figure 4, the sensitivities on the three subsets all tend to be constant when $2\sigma^2 = 500$. They decrease, however, if $2\sigma^2$ is too large, which is not displayed in Figure 4. So $2\sigma^2$ is set to 500 in the OAO strategy. For the polynomial SELM, parameter $m$ is tried with 7 different values, that is, 1, 2, 3, 4, 5, 10, and 20, and $C$ is also tuned with the 8 values mentioned before. Using the same method, the appropriate values of parameters $C$ and $m$ of the polynomial SELM can be obtained.
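The search over the two parameter grids can be sketched as a plain exhaustive loop. The `cv_accuracy` callback is hypothetical and stands for cross-validated accuracy of a Gaussian SELM at a given parameter pair; the selection described in the text is made by inspecting Figures 3 and 4, so the simple arg-max used here is a simplifying assumption, and `toy_score` is a stand-in so the example runs.

```python
from itertools import product

# Grids quoted in the text.
C_GRID = [0.1, 0.5, 1, 2, 5, 10, 20, 30]
TWO_SIGMA_SQ_GRID = [1, 5, 10, 60, 100, 200, 300, 400, 500, 600, 700, 800]

def grid_search(cv_accuracy):
    """Return (best_score, C, two_sigma_sq) over all 8 x 12 combinations."""
    best = None
    for C, two_sigma_sq in product(C_GRID, TWO_SIGMA_SQ_GRID):
        score = cv_accuracy(C, two_sigma_sq)
        if best is None or score > best[0]:
            best = (score, C, two_sigma_sq)
    return best

def toy_score(C, two_sigma_sq):
    """Stand-in scorer (not real accuracy); peaks at C = 5, 2*sigma^2 = 500."""
    return -abs(C - 5) - abs(two_sigma_sq - 500) / 100

best_score, best_C, best_two_sigma_sq = grid_search(toy_score)
```

The same loop applies to the polynomial kernel by replacing the second grid with the seven values of $m$.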
We only need to determine the appropriate parameter values for the OAO and OAA strategies since the other strategies differ only in how the binary classifiers are combined. The parameter values of $C$, $2\sigma^2$, and $m$ used in the five strategies are shown in Table 4.
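The point that strategies sharing the same binary classifiers differ only in the combining step can be illustrated with OAO and DAG, which both use the three pairwise classifiers. In this sketch a positive decision value for pair (p, q) means class p; the tie-break follows the OAO rule described earlier, while the DAG evaluation order (root node 1 versus 3) and the handling of a zero decision value are assumptions.

```python
PAIRS = [(1, 2), (2, 3), (1, 3)]

def oao_vote(f):
    """Majority vote; a three-way tie is decided by the largest |f|."""
    votes = {1: 0, 2: 0, 3: 0}
    for p, q in PAIRS:
        votes[p if f[(p, q)] >= 0 else q] += 1
    top = max(votes.values())
    winners = [c for c, v in votes.items() if v == top]
    if len(winners) == 1:
        return winners[0]
    p, q = max(PAIRS, key=lambda pair: abs(f[pair]))
    return p if f[(p, q)] >= 0 else q

def dag_decide(f, order=(1, 2, 3)):
    """Eliminate one class per node, comparing the first and last candidates."""
    remaining = list(order)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        pair = (min(a, b), max(a, b))
        winner = pair[0] if f[pair] >= 0 else pair[1]
        remaining.remove(b if winner == a else a)
    return remaining[0]

# Illustrative decision values of the three pairwise classifiers.
clear_case = {(1, 2): 1.0, (2, 3): -0.4, (1, 3): 0.7}
tie_case = {(1, 2): 0.5, (2, 3): 0.2, (1, 3): -0.9}
```

For `clear_case` both rules return class 1. For `tie_case` every class receives one vote, so OAO falls back to the classifier with the largest absolute output and returns class 3, whereas this DAG ordering returns class 2, which shows how the combining step alone can change the outcome.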
In order to find the most efficient three-class classification strategy, the classification sensitivities of the five strategies are compared. It should be noted that three structures can be used for the three-class problem in the DAG and BT strategies. Figures 5 and 6 show the DAG and BT structures, respectively. In this work, the three-class SELM yields classification accuracies of 96.9%, 97.2%, and 97.1% using the structures in Figures 5(a), 5(b), and 5(c), respectively, while those values are 96.7%, 96.1%, and 96.6% using the structures in Figures 6(a), 6(b), and 6(c), respectively. Therefore, the structures in Figures 5(b) and 6(a) are selected for comparison with the other strategies. Tables 5 and 6 show the sensitivities of all five strategies using the Gaussian SELM and the polynomial SELM, respectively. It can be found that the OAO strategy with Gaussian SELM achieves the highest sensitivity. Therefore, the OAO strategy with Gaussian SELM is chosen to study the performance of the multiclass classification in what follows.

Once the OAO strategy with Gaussian SELM has been selected, specificity and total classification accuracy are also calculated to further evaluate the three-class classification. The sensitivity, specificity, and total classification accuracy are given in Table 7. In order to compare the performance of the Gaussian SELM and the Gaussian SVM, LIBSVM is used for training and testing the Gaussian SVM. Table 8 shows the comparison of the Gaussian SELM and the Gaussian SVM in classification accuracy, training time, and testing time with the same features and sample data. As can be seen from Table 8, the training and testing time spent by the SELM is much shorter than that spent by the SVM.

Comparison and Discussion.
In order to further explore the significance of the proposed three-class classification, this section provides the comparisons of our approach with other reported methods and discusses the results of the comparisons. Table 9 summarizes the performance comparison of this work with previous works including binary classification and multiclass classification for epileptic EEG detection. As can be seen from Table 9, a combination of DWT and ELM has been used for binary classification between ictal and interictal EEG signals [22]. However, the feature extraction and classifier in [22] have high complexity if implemented in hardware. This work is the first work to implement the multiclass SELM based on LDWT for epilepsy and seizure detection. As observed from Table 9, this work has the highest accuracy for three-class classification except the one in [30], but it requires easier training process and less storage space than the latter. In addition, its feature extraction has the lowest computational complexity in all the systems in Table 9. Therefore the proposed three-class classification can be successfully used in hardware implementation for portable automatic epilepsy and seizure detection system. In order to compare this work with [32,33], the five subsets (A-E) are classified into three categories. The EEG signals from sets A and B are labelled as the healthy class, the signals from sets C and D are grouped as the interictal class, and the signals from set E are labelled as the seizure class. The performance of the proposed method is also verified using the three categories of EEG signals. The sensitivities of the signals from subsets (A, B), (C, D), and E are 98.1%, 96.3%, and 98.4%, respectively. Table 10 summarizes the accuracy comparison between the proposed method and [32,33] using subsets (A, B), (C, D), and E. As we know, ANN used in [32,33] requires a more extensive training process and complicated design procedure. 
As shown in Table 10, the proposed system achieves higher accuracy than [32] and similar accuracy to [33] but uses fewer features than [33].
Although the Bonn datasets have been used by many studies to test their EEG analysis algorithms, they have some limitations, one of which is that the Bonn data from epilepsy patients are obtained using intracranial electrodes [27]. Considering that intracranial recordings are not always available in the clinic [27,32], the open CHB-MIT scalp EEG database [40] is also used to verify the effectiveness of classification algorithms in some studies. The CHB-MIT scalp EEG database is collected from epilepsy patients and therefore only includes the seizure class and the seizure-free class. However, there is no widely used scalp EEG database which includes the above three classes. Considering that the proposed three-class classifier is composed of binary classifiers, the CHB-MIT database is also chosen to verify the effectiveness of the SELM for scalp EEG signals, in which Channels FP1-F7 covering the frontal region of the brain are selected. Table 11 summarizes the comparison between the binary SELM classifier and the existing literature using the CHB-MIT scalp EEG database. As can be seen from Table 11, the sensitivity and specificity of the binary SELM classifier are 81.1% and 98.3%, respectively. The classifiers in [34,35] require more extensive training and more complicated design than the SELM if implemented in hardware.

Figure 6: Three BT structures generated for the three-class problem.
From the experiments and discussions, with the advantages of high enough classification performance, low complexity, and easy training process, the proposed three-class classification system exhibits excellent practical value especially in the future hardware implementation for portable automatic epilepsy and seizure detection system.

Conclusion
An automatic EEG detection system is of great significance for epilepsy diagnosis. A three-class classification system based on LDWT and the SELM is designed to detect epilepsy and seizure for the first time. A lifting-based db4 wavelet transform is introduced to speed up the computation of feature extraction. After optimizing the parameters of the Gaussian kernel and the polynomial kernel, the performances of the five multiclass SELM strategies are compared, and the majority voting-based OAO strategy with Gaussian SELM is chosen for the three-class classification because it has the highest accuracy. The three-class classification system is tested using a publicly available epilepsy dataset including normal, seizure activity, and seizure-free EEG signals. Simulation results show that the designed system achieves sufficiently high classification accuracy by combining LDWT and the SELM. In addition, this system reduces training and testing time by decreasing computational complexity and feature dimension. It is a valuable system for future hardware implementation of automatic multiclass EEG classification.

Conflicts of Interest
The authors declare that there are no conflicts of interest related to this paper.