Comparison of Baseline Cepstral Vector and Composite Vectors in the Automatic Seizure Detection Using Probabilistic Neural Networks

Epileptic seizures are abnormal, sudden discharges in the brain whose signatures appear in electroencephalogram (EEG) recordings as frequency changes and increased amplitudes. In this work, these changes are captured through the traditional cepstrum and cepstrum-derived dynamic features. We compared the performance of the traditional baseline cepstral vector with that of two composite vectors, the first including velocity cepstral coefficients and the second including velocity and acceleration cepstral coefficients, using a probabilistic neural network for general epileptic seizure detection. The comparison was carried out on seven different classification problems that encompass all the possible discriminations in the medical field related to epilepsy. We found that the overall performance of both composite vectors deteriorates compared with that of the baseline cepstral vector.


Introduction
Epilepsy, a chronic neurological disorder in which patients suffer from recurring seizures, affects 1-3% of the world population [1]. It is characterized by recurrent unprovoked epileptic seizures, which are episodic, rapidly evolving, and transient events. For most patients, seizures occur suddenly and unexpectedly, without any prior external precipitant. This unpredictability makes daily life difficult: seizures cause temporary impairments of perception, speech, memory, motor control, and/or consciousness, and may increase the risk of injury or death. Epilepsy can be controlled, but not cured, with antiepileptic medication. Long-term electroencephalogram (EEG) monitoring, lasting as long as several days, is required clinically to diagnose epilepsy and to monitor and localize the epileptogenic zone [2]. The epileptic brain can be considered to function in one of two states: the interictal state, with occasional transient waveforms such as isolated spikes, sharp waves, or spike-wave complexes; and the ictal (seizure) state, with continuous discharge of polymorphic waveforms of varying amplitude and frequency, spike and sharp wave complexes, rhythmic hypersynchrony, or electrocerebral inactivity observed over a duration longer than the average duration of these abnormalities during interictal intervals [3]. The EEG during a seizure differs significantly from the interictal EEG and from that of a normal subject. Traditional review relies on well-trained neurophysiologists who visually inspect the entire lengthy EEG record, which is tedious, time consuming, and prone to error. Therefore, many automated seizure detection systems based on different approaches have been developed in recent years [4]. Such systems significantly reduce the time needed to review long-term EEG recordings offline and enable neurologists to diagnose and treat more patients in a given time. For this to work in practice, the selected feature set must allow not only accurate seizure detection but also a very short processing time.
However, the wide variety of EEG patterns that characterize seizures, such as spikes and waves, low-amplitude desynchronization, polyspike activity, and rhythmic waves over a wide range of frequencies and amplitudes, increases the complexity of automated seizure detection. Epileptic seizure analysis can be divided into three categories: (1) epileptic seizure detection, (2) epileptic seizure prediction, and (3) epileptic seizure origin localization [5,6]. Seizure detection methods usually aim to detect patterns in EEG recordings that are manifestations of an epileptic seizure. The procedure for automated seizure detection can be subdivided into two stages, namely, (i) feature extraction and (ii) classification [7]. We adopt this two-stage approach for epilepsy detection.
The selection of an optimal set of relevant features plays an important role in developing a good classification system, particularly within the pattern recognition paradigm. A general rule of thumb is to use features that capture those aspects of the time series that are relevant for discriminating between the classes. A strong classifier alone is not sufficient to achieve high accuracy: the performance of most classifiers deteriorates when some of the selected features are redundant. The selected features must therefore be screened for redundancy and irrelevance. The number of extracted features must also be small; otherwise, it adds computational overhead and lengthens processing time. In many cases, therefore, the pattern classification problem reduces to classification with the smallest possible number of extracted features. Different methods have been used to extract diverse features for epileptic seizure detection, including features that capture the frequency, energy, and structural content of the signal [8][9][10][11]. However, few studies have explored in sufficient depth whether features long established in other domains of signal processing, such as cepstral coefficients, are useful for seizure detection. Preliminary research has applied the cepstrum to neonatal EEG signals for seizure detection [12,13] and to extracellular neural spike detection [14].
Temko et al. used 4 sets of diverse features (55 baseline features, 15 log filter-bank energies, 15 cepstral coefficients obtained from log filter-bank energies, and 15 frequency-filtered band energies) on 17 neonatal seizure patients and obtained performance rates of 96.3%, 91.9%, 93.1%, and 93.2%, respectively [12]. The EEG was divided into 8-second epochs with 50% overlap between epochs. In another study by the same authors on the same 17 neonatal seizure patients, again with 8-second epochs and 50% overlap, 6 spectral envelope feature sets were used: linear filter-bank energies (linFBE), relative filter-bank energies (relFBE), log filter-bank energies (logFBE), cepstral coefficients derived from log filter-bank energies (CC), frequency-filtered band energies (FF), and relative spectral difference (RSD) [13]. Using an SVM classifier with a Gaussian kernel, they found that the 3 spectral slope feature sets (CC, FF, and RSD, with performance rates of 93.1%, 93.2%, and 93.1%, respectively) outperformed the 3 spectral power feature sets (linFBE, relFBE, and logFBE, with performance rates of 89.3%, 86.8%, and 91.9%, respectively). Johnson et al. used real cepstral features to discriminate between nonseizure and seizure EEG states in 22 pediatric patients (17 females aged 1.5-22 years and 5 males aged 3-22 years) using a machine learning approach [15]. They used a 10-second sliding window with a 3-second shift to extract the first 12 cepstral coefficients from each EEG window. A spade aggregator operator was used to improve the performance of the system. They found that a minimum classification algorithm combined with a standard Gaussian mixture model performed well, with an overall recognition rate of 91.7%.
Instead of using only static features, one can also derive dynamic features from the information already available. In speech analysis and recognition, it is common to include the temporal derivatives of the static features (velocity and acceleration features) in the feature vector, in both clean and noisy conditions, to achieve higher performance. The velocity and acceleration coefficients correspond to the first and second derivatives, respectively, of the time trajectory of the cepstral coefficients. In all but a few reported cases, these dynamic features have enhanced performance. In this work, we investigate and compare the performance of the baseline cepstral vector (9 cepstral coefficients) with that of two composite vectors (the first comprising 9 cepstral coefficients and 9 velocity (delta) cepstral coefficients, and the second comprising 9 cepstral coefficients, 9 velocity (delta) cepstral coefficients, and 9 acceleration (delta-delta) cepstral coefficients) for discriminating the general EEG database provided by Andrzejak et al. [16] into normal, seizure-free, and seizure classes using probabilistic neural networks (PNN), while accounting for the challenge of unbalanced data sets. To the best of our knowledge, this is the first study in which the cepstrum and cepstrum-derived dynamic features are applied to and investigated for unbalanced general EEG data classification. In addition, no other work addresses all seven classification problems discussed later, which encompass all the possible discriminations in the medical field related to epilepsy. We also compare the performance of our approach with that of other researchers who used the same database by Andrzejak et al. [16]. There is, however, no well-established method for selecting an optimal network for classification.
There are two variants of the approach adopted in automated seizure detection. The first is based on a set of heuristic rules and thresholds; the second is based on a classifier that employs pattern recognition techniques. In the former, the results depend on a single operating point, so there is little control over accuracy. The latter, on the other hand, allows the classifier to adapt to the desired performance and meet the requirements. We therefore adopt the latter approach. PNN was chosen because (1) earlier literature shows that PNN is well suited to medical applications, as it uses Bayesian strategies, an approach familiar to medical decision makers [17]; and (2) PNN offers high speed, high accuracy, and the ability to update the network structure in real time [18].

EEG Records.
The EEG data used in this work are from the University of Bonn EEG database, which is publicly available [16]. This database was chosen because many seizure detection methods have employed it, which makes it easy to compare end results. The database consists of five sets (designated Z, O, N, F, and S), each containing 100 single-channel EEG segments of 23.6-second duration. These segments were selected from continuous multichannel EEG recordings after removal of artifacts, such as muscle activity or eye movements, and were checked to fulfill stationarity requirements. Sets Z and O contain segments taken from surface EEG recordings of five healthy volunteers using the standard 10-20 electrode placement scheme. The subjects were awake and relaxed, with their eyes open for set Z and eyes closed for set O. The segments in sets N, F, and S were acquired from five epileptic patients undergoing presurgical diagnosis. The type of epilepsy identified was temporal lobe epilepsy, with the epileptogenic focus in the hippocampal formation. These recordings were taken from intracranial electrodes, as they offer the most precise access to the emergence of seizures. Sets N and F contain only activity measured during seizure-free intervals (interictal epileptiform activity): segments in set N were recorded from the hippocampal formation of the opposite hemisphere of the brain, and those in set F from within the epileptogenic zone. Set S, on the other hand, contains only seizure activity (ictal intervals), with all segments recorded from sites exhibiting ictal activity. The patients had attained complete seizure control after resection of one of the hippocampal formations, which was confirmed to be the epileptogenic zone. All EEG signals were recorded with the same 128-channel amplifier system using an average common reference. The data were digitized at 173.61 samples per second with 12-bit resolution. The bandpass filter setting was 0.53-40 Hz (12 dB/octave). Each single-channel EEG segment has 4096 samples.
In this work, we address seven different classification problems, most of them proposed by Guo et al. [19] and Tzallas et al. [20], which together encompass all the possible discriminations in the medical field related to epilepsy, and we compare the performance of our approach with those of other researchers.
(1) In the first classification problem, two classes are examined: normal and seizure. The normal class includes only set Z, while the seizure class includes set S. This problem includes 200 EEG segments.
(2) In the second classification problem, two classes, nonseizure and seizure, are examined, but not all sets are used: the nonseizure class includes sets Z, N, and F, while the seizure class includes set S. This problem includes 400 EEG segments.
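The two problems listed above can be encoded as a small lookup table that maps each class to the Bonn sets it draws from. The sketch below is illustrative only: the names `PROBLEMS`, `label_segments`, and `class_counts` are hypothetical, and only problems (1) and (2) are included because they are the ones spelled out here. It reproduces the segment totals stated for each problem and makes the 3:1 class imbalance of problem (2) explicit.

```python
# Hypothetical encoding of classification problems (1) and (2).
# Each Bonn set (Z, O, N, F, S) contributes 100 single-channel segments.
SEGMENTS_PER_SET = 100

PROBLEMS = {
    1: {"normal": ["Z"], "seizure": ["S"]},
    2: {"nonseizure": ["Z", "N", "F"], "seizure": ["S"]},
}


def label_segments(problem_id):
    """Return (set_name, segment_index, class_label) triples for one problem."""
    labelled = []
    for label, sets in PROBLEMS[problem_id].items():
        for set_name in sets:
            for idx in range(SEGMENTS_PER_SET):
                labelled.append((set_name, idx, label))
    return labelled


def class_counts(problem_id):
    """Number of segments per class, which exposes any class imbalance."""
    counts = {}
    for _, _, label in label_segments(problem_id):
        counts[label] = counts.get(label, 0) + 1
    return counts
```

For problem (2), `class_counts(2)` gives 300 nonseizure and 100 seizure segments, which is the kind of unbalanced data set the study sets out to handle.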
The first three classification problems were proposed by Guo et al. [19]; all the others except the sixth were proposed by Tzallas et al. [20], while the sixth is proposed by us. These classification problems were chosen to be close to clinical applications.
EEG signals are highly irregular, and in some epileptic conditions the frequency content of the signal can change drastically with time, depending on the severity of the condition. In particular, during a seizure, the frequency components of the EEG signal become extremely erratic and unpredictable. To reduce edge effects, a Hanning window was applied before frequency/cepstral analysis of these signals. Empirically, we found that an analysis window length W ≥ 900 samples (5.18 seconds), a PNN spread constant ≤ 0.1, and a number of cepstral coefficients N ≥ 9 lead to optimum results. The Results and Discussion section describes how these constraints were arrived at.

Cepstrum Derived from Log Magnitude Spectrum.
Cepstrum analysis is a nonlinear signal processing technique with a variety of applications in areas such as speech and image processing. Among speech recognition approaches, the cepstrum-based family has been prominent owing to its performance and simplicity. The cepstrum represents a time-evolving signal by an ordered set of coefficients describing its spectral envelope, that is, a smooth curve passing close to the peaks of the original spectrum. Although compact, the cepstrum has been found to capture most of the relevant information in the original time series, so two relatively long time series can be compared using only a few cepstral coefficients. This suggests that if two cepstral series are close, the corresponding signals evolve similarly in time.
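The real cepstrum is defined as the inverse Fourier transform of the log magnitude spectrum:
$$c_r[k] = \frac{1}{L}\sum_{l=0}^{L-1} \log\left|X[l]\right| \, e^{\,j 2\pi k l / L},$$
where $X[l]$ is the $L$-point discrete Fourier transform of the (windowed) EEG segment and $c_r[k]$ is the $k$th-order real cepstral coefficient. If the inverse Fourier transform is replaced by the discrete cosine transform (DCT), the result becomes
$$C[k] = \sum_{l=0}^{L-1} \log\left|X[l]\right| \cos\left(\frac{\pi k (2l+1)}{2L}\right),$$
where $C[k]$ is the $k$th-order pseudo-cepstral coefficient.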
The advantages of the DCT are that (1) it has better energy compaction properties than the DFT and hence decreases memory requirements; (2) it drastically reduces the computational complexity without degrading the information content of the cepstrum, and hence decreases execution time; and (3) it produces highly uncorrelated features. The resulting sequence of coefficients C[k], called the pseudo-cepstrum, is an approximation to the cepstrum and is, in effect, an orthogonal and compact representation of the log magnitude spectrum. The difference between the cepstral coefficients of different time series can serve as a similarity measure between them. Because the cepstral coefficients decay rapidly to zero, only the first few coefficients are needed to capture most of the relevant information in the time series, which reduces dimensionality. In addition, the number of coefficients to be retained does not depend on the length of the time series. Moreover, the higher-order coefficients represent the excitation process, which is less useful. The coefficient C[0] resembles the log energy (or DC component) of the signal and represents the segment energy. It is usually not treated as a cepstral coefficient, and in this study we drop C[0].
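A minimal NumPy sketch of the pseudo-cepstrum follows, assuming one analysis window is given as a 1-D array. It applies a Hanning window (as the paper states), takes the log magnitude of the full L-point FFT, projects it onto the DCT-II cosine basis in the form of the C[k] definition in this section, and drops C[0]. The small constant `eps`, which guards against log(0), is our own addition, and the function names `pseudo_cepstrum` and `real_cepstrum` are hypothetical. Because every DCT basis row with k ≥ 1 sums to zero, scaling the input by a constant gain only shifts C[0], so the retained coefficients are insensitive to overall amplitude.

```python
import numpy as np


def pseudo_cepstrum(frame, n_coeffs=9, eps=1e-12):
    """Pseudo-cepstral coefficients C[1..n_coeffs] of one analysis window.

    The frame is Hanning-windowed, its full L-point log magnitude spectrum is
    projected onto the DCT-II cosine basis, and C[0] is dropped.
    """
    frame = np.asarray(frame, dtype=float)
    L = frame.size
    spectrum = np.fft.fft(frame * np.hanning(L))
    log_mag = np.log(np.abs(spectrum) + eps)       # eps avoids log(0)
    bins = np.arange(L)                            # spectral index l
    orders = np.arange(n_coeffs + 1)[:, None]      # cepstral index k = 0..n_coeffs
    basis = np.cos(np.pi * orders * (2 * bins + 1) / (2 * L))
    C = basis @ log_mag                            # C[k] = sum_l log|X[l]| cos(...)
    return C[1:]                                   # drop C[0] (segment energy term)


def real_cepstrum(frame, eps=1e-12):
    """Real cepstrum: inverse FFT of the log magnitude spectrum of the windowed frame."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.fft(frame * np.hanning(frame.size))
    return np.fft.ifft(np.log(np.abs(spectrum) + eps)).real
```

For a 900-sample window, `pseudo_cepstrum(frame)` returns a length-9 array, which corresponds to the baseline cepstral vector for that window. A production version would more likely call a library DCT routine, whose scaling convention must then be checked against the C[k] definition.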

Dynamic Features: Velocity and Acceleration Coefficients.
The cepstral features described above characterize only the spectral envelope and contain no temporal information. To incorporate changes over multiple segments, dynamic features (time derivatives) are added to complement the basic cepstral features. The first and second derivatives of the time trajectory of the cepstral coefficients are usually called the velocity (delta) and acceleration (delta-delta) cepstral coefficients, respectively. These coefficients capture the temporal evolution of the basic cepstral features. The velocity coefficients are computed by linear regression of the cepstral coefficients:
$$\Delta C_t[m] = \frac{\sum_{\theta=1}^{\Theta} \theta \left(C_{t+\theta}[m] - C_{t-\theta}[m]\right)}{2\sum_{\theta=1}^{\Theta} \theta^{2}},$$
where $2\Theta + 1$ is the size of the regression window, $t$ indexes successive analysis windows, and $C_t[m]$ is the $m$th cepstral coefficient of window $t$. The acceleration coefficients are computed by applying the same linear regression to the velocity coefficients. In speech recognition, appending the velocity and acceleration coefficients to the original feature vector has been shown to usually enhance performance. In this study, three feature vectors are derived from each EEG segment: (1) a baseline vector consisting of only 9 static cepstral coefficients; (2) a composite vector consisting of 9 cepstral coefficients and 9 velocity cepstral coefficients; and (3) a composite vector consisting of 9 cepstral coefficients, 9 velocity cepstral coefficients, and 9 acceleration cepstral coefficients.
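The regression formula for the velocity coefficients, and its reuse for acceleration, can be sketched in NumPy as follows. The paper does not state the regression half-width Θ or how the first and last windows are handled, so `theta=2` and edge replication (repeating the first and last rows, as some speech toolkits do) are assumptions, and the function names are hypothetical. The input is a (T, 9) array whose row t holds the 9 static cepstral coefficients of analysis window t; the three returned arrays correspond to the baseline vector and the two composite vectors (9, 18, and 27 coefficients per window).

```python
import numpy as np


def regression_deltas(feats, theta=2):
    """Regression-based time derivative of a (T, D) feature trajectory.

    Row t holds the features of analysis window t. The half-width `theta`
    and edge replication are assumptions, not values taken from the paper.
    """
    feats = np.asarray(feats, dtype=float)
    T = feats.shape[0]
    # Replicate the first/last rows so every window has theta neighbours.
    padded = np.pad(feats, ((theta, theta), (0, 0)), mode="edge")
    denom = 2.0 * sum(th * th for th in range(1, theta + 1))
    out = np.zeros_like(feats)
    for th in range(1, theta + 1):
        ahead = padded[theta + th: theta + th + T]    # C_{t+theta}
        behind = padded[theta - th: theta - th + T]   # C_{t-theta}
        out += th * (ahead - behind)
    return out / denom


def feature_vectors(ceps, theta=2):
    """Baseline, velocity composite, and velocity+acceleration composite vectors."""
    ceps = np.asarray(ceps, dtype=float)
    vel = regression_deltas(ceps, theta)    # velocity (delta)
    acc = regression_deltas(vel, theta)     # acceleration (delta-delta)
    return ceps, np.hstack([ceps, vel]), np.hstack([ceps, vel, acc])
```

As a sanity check, a trajectory that grows linearly by 1 per window yields velocity 1 away from the edges, and a quadratic t² trajectory yields acceleration 2 in the interior.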

Probabilistic Neural Network (PNN).
Recent research on neural network classifiers has established that neural networks are a promising alternative to conventional classification methods. Their main advantage is that they are self-adaptive: they adjust to the data without any explicit specification of the underlying model. The PNN provides a general solution to pattern classification problems by employing a Bayesian decision strategy. An artificial intelligence-based classifier is a mapping $f:\mathbb{R}^{d}\rightarrow\{1,\dots,c\}$ from a $d$-dimensional feature space to a discrete set of $c$ classes. An artificial neural network (ANN) implements such a mapping using a group of artificial neurons loosely modeled on the brain. An ANN can be trained on example inputs and outputs to produce the desired classification, so no explicit classification algorithm has to be specified. The PNN is a distance-based ANN that uses a bell-shaped activation function. This makes the decision boundaries nonlinear, so that they can approach the Bayes-optimal boundaries [17,18].
The PNN that we use has three layers: an input layer, a radial basis layer, and a competitive layer. When an input is applied, the network computes the distance from the input vector to each training vector and produces a vector whose elements indicate how close the input vector is to each training vector. The radial basis layer then sums these contributions for each class to produce a net output vector of class probabilities. Finally, the competitive layer picks the maximum of this probability vector and outputs 1 for that class and 0 for the other classes. More details on PNN are available in [17,18]. Distance-based classifiers require normalized data, so the feature vectors are normalized before being applied to the PNN.
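The layer-by-layer description above can be illustrated with a small NumPy classifier. This is a sketch, not the MATLAB implementation used in the study: the class name `SimplePNN` is hypothetical, z-score normalization is assumed because the paper does not say which normalization was used, and the Gaussian width follows our understanding of MATLAB's convention that a training vector's activation falls to 0.5 when the input is one spread constant away. Class scores are summed in the log domain (log-sum-exp) so that very small spread constants such as 0.05 do not underflow to zero; since the logarithm is monotonic, this does not change which class wins.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True)), axis=axis)


class SimplePNN:
    """Minimal probabilistic neural network (Gaussian kernel per training vector)."""

    def __init__(self, spread=0.05):
        self.spread = spread

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Assumed normalization: z-score with training statistics.
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        self.std_[self.std_ == 0] = 1.0
        self.X_ = (X - self.mean_) / self.std_
        self.y_ = np.asarray(y)
        self.classes_ = np.unique(self.y_)
        return self

    def predict(self, X):
        Z = (np.asarray(X, dtype=float) - self.mean_) / self.std_
        # Radial basis stage: distance from each input to every training vector.
        dist = np.linalg.norm(Z[:, None, :] - self.X_[None, :, :], axis=2)
        b = np.sqrt(-np.log(0.5)) / self.spread      # activation 0.5 at dist == spread
        log_act = -(b * dist) ** 2                   # log of exp(-(b*dist)^2)
        # Summation stage: add the activations belonging to each class.
        scores = np.stack(
            [_logsumexp(log_act[:, self.y_ == c], axis=1) for c in self.classes_], axis=1
        )
        # Competitive stage: pick the class with the largest summed activation.
        return self.classes_[np.argmax(scores, axis=1)]
```

With a very small spread constant the PNN behaves much like a nearest-neighbour classifier, while a larger spread smooths the class densities; this is why the spread constant has to be tuned empirically.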
We implemented the PNN in MATLAB. The program output is a confusion matrix, which shows the percentages of correct and incorrect classifications. In particular, it gives the fraction of inputs misclassified; the percentages of false negatives (FN), false positives (FP), true positives (TP), and true negatives (TN) for each class; and the overall accuracy, from which the diagnostic results for each classification problem can be computed. The size of the confusion matrix depends on the classification problem. If needed, the diagnostic results for each class can be expressed in terms of sensitivity, specificity, positive predictivity, negative predictivity, and accuracy, computed from the confusion matrix entries as follows:
$$\text{Sensitivity} = \frac{TP}{TP+FN}, \qquad \text{Specificity} = \frac{TN}{TN+FP},$$
$$\text{Positive predictivity} = \frac{TP}{TP+FP}, \qquad \text{Negative predictivity} = \frac{TN}{TN+FN},$$
$$\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}.$$
In this study, we use only the overall accuracy, which is the end result of the confusion matrix.
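The diagnostic measures can be computed from a confusion matrix in a one-vs-rest fashion, as in the sketch below. It assumes rows index the true class and columns the predicted class (other tools may use the transposed layout, in which case the matrix must be transposed first); the function names are hypothetical, and the example matrix in the checks is made up purely to exercise the code. A class that is never predicted makes positive predictivity undefined, and this sketch would then return NaN rather than raise an error.

```python
import numpy as np


def one_vs_rest_metrics(cm, positive):
    """Sensitivity, specificity, predictivities, and accuracy for one class.

    cm[i, j] counts segments of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = cm[positive, positive]
    fn = cm[positive, :].sum() - tp     # true class, predicted elsewhere
    fp = cm[:, positive].sum() - tp     # other classes predicted as this class
    tn = cm.sum() - tp - fn - fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),          # positive predictivity
        "npv": tn / (tn + fn),          # negative predictivity
        "accuracy": (tp + tn) / cm.sum(),
    }


def overall_accuracy(cm):
    """Fraction of all segments on the diagonal (correctly classified)."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()
```

For a two-class problem, the one-vs-rest accuracy of either class equals the overall accuracy; for problems with more classes, the overall accuracy is the single figure used in this study.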
According to the MATLAB documentation, the default spread constant for a PNN is 0.1. One can start with this value and then find the optimum value empirically, based on the best overall accuracy, as explained in the Results and Discussion section.

Results and Discussion
To arrive at a near-optimal window length, we first study the impact of the window length W on the overall accuracy in the seven classification problems described above. Initially assuming that a spread constant of 0.1 (the PNN default) and N = 12 cepstral coefficients are sufficient to capture the spectral differences between the EEG data sets, we compute the overall accuracy for several sliding window lengths W (with 50% overlap between consecutive windows) applied to the EEG from the normal, nonseizure, and seizure groups in each classification problem. Table 1 shows the overall accuracy for the different window lengths W in each classification problem. As W is increased from 700 to 1100 samples, the overall accuracy increases and then saturates at 100% for W ≥ 900 samples. Hence, a 900-sample sliding window with 50% overlap between consecutive windows is applied to every EEG segment in each data set. This window length also satisfies the stationarity requirements of the EEG signal.
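The framing step can be sketched as follows, assuming a single 4096-sample Bonn segment stored as a 1-D NumPy array. With W = 900 and 50% overlap, the hop is 450 samples, so a segment yields 1 + floor((4096 - 900)/450) = 8 full windows, each lasting 900/173.61 ≈ 5.18 s. Trailing samples that do not fill a complete window are discarded; this is our assumption, since the paper does not say how the tail is handled. The Hanning weighting is left to the cepstral step, and `frame_signal` is a hypothetical name.

```python
import numpy as np

FS = 173.61  # Bonn database sampling rate, samples per second


def frame_signal(x, win_len=900, overlap=0.5):
    """Split a 1-D signal into overlapping frames of win_len samples (no tail padding)."""
    x = np.asarray(x, dtype=float)
    if x.size < win_len:
        raise ValueError("signal shorter than one analysis window")
    hop = int(round(win_len * (1.0 - overlap)))       # 450 samples for 50% overlap
    n_frames = 1 + (x.size - win_len) // hop
    # Row i holds samples [i*hop, i*hop + win_len).
    idx = hop * np.arange(n_frames)[:, None] + np.arange(win_len)[None, :]
    return x[idx]
```

Applied to a 4096-sample segment, `frame_signal(segment)` returns an (8, 900) array, one row per analysis window.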
The spread constant of the PNN must be determined empirically, based on overall accuracy, for each classification problem (CP). Figure 1 plots the overall accuracy of the PNN as the spread constant is varied from 0.01 to 1.0 in the seven classification problems. As Figure 1 shows, the highest accuracy is achieved in all seven classification problems when the spread constant is between 0.01 and 0.1. In this study, we therefore set the spread constant to 0.05 (shown by the vertical dashed line). Next, with the sliding window length fixed at 900 samples and 50% overlap between consecutive windows, we compute the overall accuracy as the number of cepstral coefficients N is decreased from 12 to 7. Table 2 shows the overall accuracy for decreasing numbers of cepstral coefficients in each classification problem. As N is decreased from 12 to 7, the overall accuracy remains at 100% for N ≥ 9 and then drops for N < 9. Hence, in this work, we set the number of cepstral coefficients to N = 9.
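The selection rule used for the spread constant and for N (keep the settings that reach the best overall accuracy, then pick one from that plateau) can be written as a tiny helper. The accuracy values in the example are invented placeholders, not results from Figure 1 or Table 2, and `best_plateau` is a hypothetical name. Choosing the smallest N on the plateau mirrors the choice N = 9 described above, whereas picking 0.05 from the spread-constant plateau was a judgment call rather than a strict argmax.

```python
def best_plateau(accuracy_by_value, tol=1e-9):
    """Return (best accuracy, sorted parameter values within tol of the best)."""
    best = max(accuracy_by_value.values())
    winners = sorted(v for v, acc in accuracy_by_value.items() if acc >= best - tol)
    return best, winners


# Hypothetical accuracies (percent) for a sweep over N; NOT the paper's data.
sweep_n = {12: 100.0, 11: 100.0, 10: 100.0, 9: 100.0, 8: 98.5, 7: 96.0}
best, plateau = best_plateau(sweep_n)
chosen_n = min(plateau)  # smallest feature vector that still reaches the best accuracy
```

Preferring the smallest N on the plateau keeps the feature vector short, in line with the earlier argument that fewer features reduce computational overhead.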
We now compare the performance of the traditional baseline cepstral vector with that of the two composite vectors, the first including the velocity vector and the second including both the velocity and acceleration vectors, using a probabilistic neural network for general epileptic seizure detection. The comparison is carried out on each of the seven classification problems described above, which have been widely used in the literature related to epilepsy. Typical EEG segments, one from each data set (in the order Z, O, N, F, and S), are shown in Figure 2. Figures 3, 4, and 5 show the first 9 static cepstral, velocity, and acceleration coefficients, respectively, for the same EEG segments shown in Figure 2, in the same order. Descriptive results of the PNN analysis using the baseline cepstral vector for the different classification problems, with W = 900, N = 9, and a spread constant of 0.05, are given in Table 2.
Table 3: Descriptive results of PNN analysis using composite cepstral vector (including velocity coefficients) for discriminating different classification problems for W = 900, N = 9, and spread constant 0.05.
The baseline cepstral feature vector shows the best performance (all diagnostic parameters equal to 100%) in all cases. The results of the PNN analysis using the composite cepstral vectors for the different classification problems, with W = 900, N = 9, and a spread constant of 0.05, are shown in Tables 3 and 4. The first composite vector, which adds the velocity vector to the static cepstral vector, shows a reduction in overall accuracy in discriminating the EEG segments across the classification problems (Table 3). The second composite vector, which adds both the velocity and acceleration vectors to the static cepstral vector, shows a greater decline in overall accuracy (Table 4). It is interesting to note that the baseline cepstral vector alone gave the best performance. The composite vectors, instead of at least maintaining this performance, degraded it. This implies that the velocity and acceleration features hurt the performance, probably because of the nonlinearities introduced in the EEG significantly affecting …
Various researchers have proposed different methods for epileptic seizure detection using the database of Andrzejak et al. [16]. Table 5 compares our method with other methods that have used the same database [20][21][22][23][24][25][26][27][28][29][30][31], listing the method, the data set used, and the classification accuracy for the seven classification problems. All the methods in the table, including ours, use modern classifiers that are first trained and then used for classification. In the first classification problem, the results obtained by Tzallas et al. [20], Subasi and Gursoy [21], Wang et al. [24], Iscan et al. [25], Orhan et al. [26], and our method are the best (100%). In the second problem, our method gives the best result (100.0%). In the third classification problem, the results of Orhan et al. [26] and of our method are the best (100%). In the fourth, fifth, and seventh classification problems, only our method achieves an average accuracy of 100.0%. In the sixth classification problem, the new problem introduced in this paper, our method also achieves 100% accuracy. Taken together, these results show that our approach matches or exceeds all the previous epilepsy detection methods listed in Table 5. The comparison also suggests that an automated system based on this approach could provide feedback to experts for quick and accurate EEG classification.

ISRN Biomedical Engineering
The database used here had already been preprocessed by removing artifacts through visual inspection. This is a limitation of our method, as it is for many other studies that have used the same database. Nevertheless, the results of this study provide sufficient evidence to warrant assessment in actual clinical situations, which could more robustly confirm that this approach captures diagnostically significant information. If confirmed, the method could be suitable for implementation not only in epilepsy detection systems but also in applications such as seizure warning systems, closed-loop seizure control systems, or delivering abortive responses/monitoring patients using implantable therapeutic devices [31].

Conclusion
A comparison of EEG epileptic seizure detection based on the baseline cepstral vector and on composite cepstral vectors comprising velocity and acceleration features has been presented. The chief finding of this study is that, in EEG discrimination, the composite cepstral vectors do not perform on par with the baseline vector. In the literature, velocity and acceleration features have been found to enhance performance in applications such as speech analysis and recognition. This study shows, however, that in EEG discrimination the velocity and acceleration features degrade performance. Further, the results show that the baseline cepstral vector suffices to discriminate general EEG signals in a variety of classification problems close to clinical applications. An automated system based on this method could provide feedback to clinical neurophysiologists for quick and accurate EEG discrimination. Such discrimination is important in applications such as seizure warning systems, closed-loop seizure control systems, or delivering abortive responses/monitoring patients using implantable therapeutic devices.

Figure 1: Determination of spread constant of PNN for the seven different classification problems (CP) based on overall accuracy of PNN.

Figure 2: Typical EEG segments from each of the five sets (Z, O, N, F, and S) from top to bottom.

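Figure 3: The first 9 static cepstral coefficients for the same EEG segments shown in Figure 2, in the order Z, O, N, F, and S.

Figure 4: The first 9 velocity coefficients (deltas) for the same EEG segments shown in Figure 2, in the order Z, O, N, F, and S.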

Figure 5: The first 9 acceleration coefficients (delta-deltas) for the same EEG segments shown in Figure 2, in the order Z, O, N, F, and S.

Table 1: Effect of window length W on overall accuracy in different classification problems for N = 12 and spread constant 0.1.

Table 2: Effect of baseline cepstral vector length N on overall accuracy in different classification problems for W = 900 and spread constant 0.05.

Table 4: Descriptive results of PNN analysis using composite cepstral vector (including velocity and acceleration coefficients) for discriminating different classification problems for W = 900, N = 9, and spread constant 0.05.

Table 5: A comparison of classification accuracy achieved by our method and the best-performing methods of other researchers for the seven classification problems (CPs).