Prediction of Epileptic Seizure by Analysing Time Series EEG Signal Using k-NN Classifier

The electroencephalographic (EEG) signal carries information about brain activity and is used for the detection of epilepsy, since epileptic seizures are caused by a disturbance in the electrophysiological activity of the brain. The prediction of epileptic seizures usually requires detailed analysis of the EEG by an experienced reader. In this paper, we introduce a statistical analysis of the EEG signal that is capable of recognizing epileptic seizures with a high degree of accuracy and helps to provide automatic detection of epileptic seizures for epileptic subjects of different ages. To this end, we extract several features, namely, approximate entropy (ApEn), standard deviation (SD), standard error (SE), modified mean absolute value (MMAV), roll-off (R), and zero crossing (ZC), from the epileptic signal. The k-nearest neighbours (k-NN) algorithm is used for the classification of epilepsy, and regression analysis is then used to predict the epilepsy level at different ages of the patients. Using the statistical parameters and regression analysis, a prototype mathematical model is proposed which helps to find the epileptic randomness with respect to the age of different subjects. The accuracy of this prototype equation depends on proper analysis of the dynamic information in the epileptic EEG.


Introduction
Epilepsy is a chronic neurological disorder characterized by recurrent, unprovoked seizures: electrophysiological disturbances in the human brain which may range from brief lapses of attention or muscle jerks to severe and prolonged seizures. Epileptic seizures are the visible manifestations produced when the brain briefly becomes dysfunctional because of abnormal paroxysmal discharge of the nerve cells in the cerebral cortex [1][2][3]. Put differently, epilepsy is a group of neurological disorders characterized by epileptic seizures [4,5]. Epileptic seizures are episodes that can vary from brief and nearly undetectable to long periods of vigorous shaking [6]. In epilepsy, seizures tend to recur and have no immediate underlying cause, while seizures that occur due to a specific cause are not deemed to represent epilepsy [4,7]. The characteristics of seizures vary and depend on where in the brain the disturbance first starts and how far it spreads. Temporary symptoms occur, such as loss of awareness or consciousness and disturbances of movement, sensation (including vision, hearing, and taste), mood, or other cognitive functions. Figure 1(a) represents normal neuronal ion-channel function; here the resting membrane potential is −70 mV, which is maintained by the sodium and potassium channels and is a primary requirement for the action potential.
The sodium and potassium channels are associated with a depolarizing phase, caused by sodium-channel opening, and a repolarizing phase, caused by potassium-channel opening and sodium-channel inactivation. The remaining potassium channels contribute to a longer-term repolarization that prevents repetitive excitation of the neuron. In Figure 1(b), mutations in SCN1B, which encodes a voltage-gated sodium-channel subunit, are associated with generalized epilepsy with febrile seizures plus [4]. These mutations apparently result in an increased sodium current, which would lead to a greater depolarization during the action potential and an increased tendency to fire repetitive bursts. Similarly, in Figure 1(c), mutations in KCNQ2 and KCNQ3, both of which encode potassium channels, are associated with benign familial neonatal convulsions. People with seizures tend to have more physical problems than controls (such as fractures and bruising from injuries related to seizures), as well as higher rates of psychological problems such as anxiety and depression. Similarly, the risk of premature death in people with epilepsy is up to 3 times higher than in the general population, with the highest rates found in low- and middle-income countries and in rural rather than urban areas. A large proportion of the causes of death related to epilepsy in low- and middle-income countries are potentially preventable, such as falls, drowning, burns, and prolonged seizures [8][9][10]. There are more than 30 different forms of epilepsy and more than 40 different types of seizures [2]. According to a report of the World Health Organization (WHO) [11], around 50 million people worldwide have epilepsy. Around 90% of them are from developing countries, and one-fourth of them do not have access to medication.
Epilepsy cannot be cured, but it can usually be controlled with medication. For the initial treatment of epilepsy, antiepileptic drugs (AEDs) are used [12]. Epilepsy is not transmissible. Idiopathic epilepsy is the most common type of epilepsy; it affects about 6 out of 10 people with the disorder and has no detectable cause. Epilepsy with a known cause is called secondary epilepsy or symptomatic epilepsy. The major causes of secondary epilepsy [11] are as follows: (i) brain damage from injuries; (ii) inherited abnormalities with associated brain defects; (iii) a severe head injury; (iv) a stroke, which limits the amount of oxygen reaching the brain; (v) an infection of the brain such as meningitis or encephalitis; (vi) a brain tumour, which creates more randomness.
There are several methods to diagnose epilepsy, such as electroencephalography (EEG), magnetic resonance imaging (MRI), functional magnetic resonance imaging (fMRI), single-photon emission computed tomography (SPECT), positron emission tomography (PET), and magnetoencephalography (MEG). Because EEG is fast, has high time resolution, and is noninvasive, it remains one of the most useful and effective tools in the treatment of epilepsy. Methods for predicting epileptic seizures from EEG signals can be separated into three classes: time-domain, frequency-domain, and nonlinear methods [13]. More recently, seizures have also been detected from recordings in order to quantify the clinical picture, and video-based seizure recognition has been proposed. In some papers, information-based measures are also proposed for the detection of epileptic seizures [14]. Entropy is a measure of the rate of information that may be used in signal processing for the detection of noise, where a higher value corresponds to increased unpredictability and a lower value corresponds to higher predictability [15]. In our proposed research, we use six features for the classification; among these, entropy is the highest-ranked feature and is used in the regression model for predicting the level of epilepsy.

Mathematical Background of Classifier and Statistical Features
The mathematical background of the classifier (k-NN) and of the statistical features (approximate entropy (ApEn), standard deviation (SD), standard error (SE), modified mean absolute value (MMAV), roll-off (R), and zero crossing (ZC)) is described below.
2.1. k-Nearest Neighbours (k-NN). The k-nearest neighbours (k-NN) algorithm is a nonparametric learning algorithm mainly used for signal pattern classification and pattern recognition, as shown in Figure 2(a). Its goal is to assign to an unseen point the dominant class among its k nearest neighbours within the training set [16,17]. Among classification methods such as support vector machine (SVM), artificial neural network (ANN), linear discriminant analysis (LDA), naive Bayes (NB), and RBF neural network (RBFNN), k-NN is considered the best classifier for statistical pattern recognition or neighbour cluster selection, as shown in Figure 2(b), owing to its consistently high performance without a priori assumptions. The nearest neighbour rule assigns a point the class of its single closest training point; the k-NN classifier extends this idea by taking the k nearest points and assigning the label of the majority [18]. The positive integer k indicates how many neighbours guide the classification. The special case k = 1 is called the nearest neighbour algorithm. In classification analysis, k-NN is a supervised learning algorithm [19,20]. The learning algorithm of k-NN for the classification of a sample X is described below step by step.
(1) Suppose that the training set contains $c$ categories, denoted by $C_1, C_2, C_3, \ldots, C_c$, and $N$ training samples in total, each represented by an m-dimensional feature vector.
(2) The sample X to be classified should have the same dimension for proper classification; its components are denoted by $X_1, X_2, X_3, \ldots, X_m$.
(3) The similarity between X and every training sample is calculated. Taking the jth training sample $d_j$ as an example, the similarity is written $SIM(X, d_j)$.
(4) The k samples with the largest of the N similarities $SIM(X, d_j)$, $j = 1, 2, 3, \ldots, N$, are selected as the k nearest neighbours of X. The probability function of X for category $C_i$ then has the following mathematical form:
$$P(X, C_i) = \sum_{d_j \in kNN(X)} SIM(X, d_j)\, y(d_j, C_i),$$
where $y(d_j, C_i)$ is the category attribute function, which satisfies
$$y(d_j, C_i) = \begin{cases} 1, & d_j \in C_i \\ 0, & \text{otherwise.} \end{cases}$$
(5) Finally, sample X is assigned to the category with the largest value of $P(X, C_i)$.
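The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the cosine similarity is assumed for SIM(X, d_j), since any similarity measure can be substituted, and the toy feature vectors and the labels "normal" and "epileptic" are hypothetical.

```python
import numpy as np

def cosine_similarity(x, d):
    """Assumed similarity SIM(X, d_j): cosine of the angle between two vectors."""
    return float(np.dot(x, d) / (np.linalg.norm(x) * np.linalg.norm(d)))

def knn_classify(x, train_X, train_y, k=3):
    """Assign x to the category with the largest similarity-weighted vote."""
    # Step (3): similarity between x and every training sample.
    sims = np.array([cosine_similarity(x, d) for d in train_X])
    # Step (4): keep the k most similar samples and accumulate P(X, C_i).
    nearest = np.argsort(sims)[::-1][:k]
    scores = {}
    for j in nearest:
        scores[train_y[j]] = scores.get(train_y[j], 0.0) + sims[j]
    # Step (5): pick the category with the largest score.
    return max(scores, key=scores.get), scores

# Hypothetical two-feature training set (values invented for illustration).
train_X = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
train_y = ["normal", "normal", "epileptic", "epileptic"]

label, scores = knn_classify(np.array([0.15, 0.95]), train_X, train_y, k=3)
print(label, scores)
```

With k = 3 the query point collects two highly similar "epileptic" neighbours and one weakly similar "normal" neighbour, so the weighted vote favours "epileptic".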
In the k-NN classifier, the distance between two data points is measured by a distance metric; the metrics considered here are the Euclidean, cityblock, cosine, and correlation distances. In what follows, $x_s$ and $y_t$ are row vectors of $n$ features each.
The Euclidean distance is the straight-line distance between two points in Euclidean space, that is, the Euclidean norm of their difference. The Euclidean distance $d_{st}$ is given by
$$d_{st} = \sqrt{(x_s - y_t)(x_s - y_t)'}.$$
The cityblock distance, also known as the Manhattan length [21], is the sum of the absolute differences of the Cartesian coordinates of two points. The cityblock distance $d_{st}$ is given by
$$d_{st} = \sum_{j=1}^{n} \left| x_{sj} - y_{tj} \right|.$$
The cosine distance is the complement of the cosine similarity in positive space, that is,
$$d_{st} = 1 - \frac{x_s y_t'}{\sqrt{(x_s x_s')(y_t y_t')}}.$$
The correlation distance is one minus the sample correlation between two vectors of equal dimension. The correlation distance $d_{st}$ is given by
$$d_{st} = 1 - \frac{(x_s - \bar{x}_s)(y_t - \bar{y}_t)'}{\sqrt{(x_s - \bar{x}_s)(x_s - \bar{x}_s)'}\,\sqrt{(y_t - \bar{y}_t)(y_t - \bar{y}_t)'}},$$
where $\bar{x}_s = \frac{1}{n}\sum_j x_{sj}$ and $\bar{y}_t = \frac{1}{n}\sum_j y_{tj}$.
The statistical features used for classification with the k-NN classifier in this research are described below.
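Before moving on to the features, the four distance measures named above can be checked numerically. The sketch below assumes the usual definitions (root of summed squared differences, sum of absolute differences, one minus cosine similarity, and one minus sample correlation); the function names and the two test vectors are chosen only for illustration.

```python
import numpy as np

def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return float(np.sqrt(np.sum((x - y) ** 2)))

def cityblock(x, y):
    """Sum of absolute coordinate differences (Manhattan length)."""
    return float(np.sum(np.abs(x - y)))

def cosine_distance(x, y):
    """One minus the cosine of the angle between x and y."""
    return float(1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def correlation_distance(x, y):
    """One minus the sample correlation of x and y (equal length required)."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(1.0 - np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc)))

# Two illustrative vectors; y_t is an exact multiple of x_s.
x_s = np.array([1.0, 2.0, 3.0])
y_t = np.array([2.0, 4.0, 6.0])

for name, fn in [("euclidean", euclidean), ("cityblock", cityblock),
                 ("cosine", cosine_distance), ("correlation", correlation_distance)]:
    print(name, fn(x_s, y_t))
```

Because y_t is a scaled copy of x_s, the cosine and correlation distances are essentially zero while the Euclidean and cityblock distances are not, which shows why the choice of metric can change which neighbours count as nearest.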

Approximate Entropy (ApEn).
ApEn is a statistical feature that indicates how predictable the current amplitude values of a physiological signal, for example, EEG, are from its earlier amplitude values. The value of ApEn drops sharply during an epileptic seizure, and this property is used to detect epileptic seizures. A high value of approximate entropy signifies more irregularity; conversely, a low value signifies that the time series is more deterministic, which reflects the intracortical information flow in the brain when applied to EEG signals [22,23]. The value of ApEn can be calculated by using
$$ApEn(m, r, N) = \Phi^{m}(r) - \Phi^{m+1}(r),$$
where $m$ is the pattern length, $r$ is the tolerance, $N$ is the number of samples, and $\Phi^{m}(r)$ is the average logarithm of the fraction of length-$m$ patterns that lie within $r$ of each pattern. The mathematical procedure for calculating ApEn is described in a flow chart [23,24] in Figure 3.
The dispersion of a random variable, statistical population, data set, or probability distribution is measured by the standard deviation (SD). The standard deviation can be defined for any distribution with finite first two moments and can be measured mathematically by using
$$SD = \sqrt{\frac{1}{N-1} \sum_{n=1}^{N} (x_n - \mu)^2},$$
where $N$ is the number of samples in the data set, $x_n$ is the actual value of the nth term in the data set, and $\mu$ is the average value of the data set. The standard error (SE) is defined as the standard deviation of the sample mean, that is, an estimate of how far the sample mean is likely to lie from the population mean. It is calculated using
$$SE = \frac{SD}{\sqrt{N}}.$$
2.5. Roll-Off (R). Roll-off is the steepness of a transmission function with frequency and is used in particular for signal feature extraction. The roll-off can be defined as the frequency below which 85% of the magnitude distribution of the data is concentrated [24]. It is also a measure of spectral shape, which can be written mathematically as
$$\sum_{n=1}^{R} M[n] = 0.85 \sum_{n} M[n],$$
where $M[n]$ is the spectral magnitude at frequency bin $n$, the right-hand sum runs over all bins, and $R$ is the roll-off bin.
2.6. Zero Crossing (ZC). Zero crossing is the number of times that the amplitude value of the data crosses the zero y-axis [24]. It can be expressed mathematically as
$$ZC = \sum_{n=1}^{N-1} \left[ \operatorname{sgn}(x_n \times x_{n+1}) \cap \left| x_n - x_{n+1} \right| \ge \text{threshold} \right], \qquad \operatorname{sgn}(x) = \begin{cases} 1, & x \ge \text{threshold} \\ 0, & \text{otherwise} \end{cases} \qquad (13)$$
2.7. Regression Analysis.
In mathematics, regression analysis is the procedure for finding the mathematical relationship between a dependent variable and one or more independent variables. Under limited conditions, regression analysis can be used to infer causal relationships between the independent and dependent variables. However, in many applications, especially with small effects or questions of causality based on observational data, regression methods can give misleading results. The function fitted by a polynomial regression model using the method of linear least squares is
$$Y = b_0 + b_1 X + b_2 X^2 + \cdots + b_k X^k,$$
where $Y$ represents the predicted outcome value of the polynomial model, $X$ is the independent variable, $b_1$ to $b_k$ are the regression coefficients of the kth-order polynomial, and $b_0$ is the $Y$ intercept.
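As an illustration of the polynomial model just described, the following sketch fits the coefficients b_0 to b_k by linear least squares with NumPy. The helper names `fit_polynomial` and `predict` and the synthetic data are assumptions made for this example; they are not taken from the MATLAB workflow used in this work.

```python
import numpy as np

def fit_polynomial(x, y, order):
    """Least-squares fit of Y = b_0 + b_1 X + ... + b_k X^k; returns [b_0, ..., b_k]."""
    # Design matrix with columns 1, x, x^2, ..., x^k.
    V = np.vander(x, N=order + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(V, y, rcond=None)
    return coeffs

def predict(coeffs, x):
    """Evaluate the fitted polynomial at one or more x values."""
    V = np.vander(np.atleast_1d(np.asarray(x, dtype=float)),
                  N=len(coeffs), increasing=True)
    return V @ coeffs

# Synthetic, exactly quadratic data (illustration only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x - 0.5 * x ** 2

b = fit_polynomial(x, y, order=2)
residuals = y - predict(b, x)
print(b, residuals)
```

Because the data are exactly quadratic, the recovered coefficients match the generating ones and the residuals vanish up to rounding; with real measurements the residuals quantify how well the chosen order fits.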

Proposed Research Architecture
The overall proposed methodology is presented as a flow diagram in Figure 4. The epileptic EEG signal is loaded into the MATLAB workspace to find out the feature …
[Table residue: classification rates of 60% in every recoverable cell, including the rows for the "Cosine" and "Euclidean" distances.]
… is measured to find out the best-fitted equation for the interpretation, that is, the most suitable and optimized regression equation for the prediction.
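The feature-extraction stage of this methodology can be sketched in Python for five of the six features (ApEn, SD, SE, ZC, and roll-off). This is a sketch under stated assumptions rather than the MATLAB code used in this work: ApEn follows the standard template-matching definition with an assumed pattern length m = 2 and tolerance r = 0.2 × SD, SD uses the N − 1 divisor, ZC counts sign changes whose amplitude step reaches a threshold, and the sampling rate `fs` is a hypothetical parameter.

```python
import numpy as np

def approximate_entropy(x, m=2, r=None):
    """Standard ApEn: Phi^m(r) - Phi^(m+1)(r), self-matches included."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)  # assumed default tolerance

    def phi(length):
        # All overlapping templates of the given length.
        templates = np.array([x[i:i + length] for i in range(n - length + 1)])
        # Chebyshev distance between every pair of templates.
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        fraction = np.mean(dist <= r, axis=1)
        return np.mean(np.log(fraction))

    return float(phi(m) - phi(m + 1))

def sd_and_se(x):
    """Sample standard deviation (N - 1 divisor) and standard error of the mean."""
    x = np.asarray(x, dtype=float)
    sd = float(np.std(x, ddof=1))
    return sd, sd / np.sqrt(len(x))

def zero_crossings(x, threshold=0.0):
    """Count sign changes whose amplitude step is at least `threshold`."""
    x = np.asarray(x, dtype=float)
    sign_change = x[:-1] * x[1:] < 0
    big_enough = np.abs(x[:-1] - x[1:]) >= threshold
    return int(np.sum(sign_change & big_enough))

def spectral_rolloff(x, fs, fraction=0.85):
    """Frequency below which `fraction` of the spectral magnitude lies."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    cumulative = np.cumsum(mag)
    idx = np.searchsorted(cumulative, fraction * cumulative[-1])
    return float(freqs[idx])

# Synthetic signals for illustration: a regular sine and white noise.
t = np.arange(300)
sine = np.sin(2 * np.pi * t / 50)
noise = np.random.default_rng(0).normal(size=300)
print(approximate_entropy(sine), approximate_entropy(noise))
```

The regular sine yields a much lower ApEn than the white noise, in line with the interpretation of ApEn as a measure of irregularity.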

Classification and Clustering Using k-NN
The epileptic EEG data is processed to obtain the feature vectors, and a template, as given in Table 1, is then formed for training the k-NN network. In Table 1, each column holds one normalized feature and each row corresponds to a subject used for training the network. In Figure 5(a), the nearest neighbours are determined by the trained k-NN network: the arrows indicate the nearest neighbours, the blue squares indicate the training feature set, and the red diamonds are the query points whose nearest neighbours are sought.
On the other hand, in Figure 5(b), a desired standard feature point is set as a reference, the k-NN network is trained, and its clustering circle is determined around the point of interest. The k = 10 nearest neighbours inside the circle are determined to find a close approximation of the epileptic EEG signal using the feature vectors (ApEn, MMAV, SD, SE, roll-off, and ZC); for this, one feature vector from the normal EEG data (free from epilepsy) of the patient is required. From Tables 2 and 3 as well as Figure 6, it is concluded that a lower classification rate is found with the "cityblock" distance when k = 1 and the classifier rule is smallest neighbour (SN).
The 3rd-order fitting of the approximate entropy (ApEn) is shown in Figure 7(a). The corresponding regression equation is given in (15). If we put the age of an epileptic person into this equation, we can interpret the degree of randomness of the EEG signal. The difference between the actual value and the predicted value of the dependent variable is called the residual, which is a measure of the accuracy of prediction. The residual of the 3rd-order fitting is shown in Figure 7(b), and its regression equation is (16). From this equation, we may find the error of prediction at any age of the epileptic person. In a similar manner, the 4th-order fitting of ApEn is shown in Figure 8(a). The corresponding regression equation is given in (17). If we put the age of an epileptic person into this equation, we can again interpret the degree of randomness of the EEG signal. The residual of the 4th-order fitting is shown in Figure 8(b), and its regression equation is (18). From this equation, we may find the error of prediction at any age of the epileptic person.
In Table 4, the error of prediction is shown; the accuracy of prediction (interpretation) is highest for the 3rd-order fitting. The 1st-order fitting is a linear fitting of the form y = mx + c, which has a higher error probability and a larger residual than the other orders of fitting of ApEn.
From the table, it is noticed that increasing the order of fitting may reduce the error probability, but beyond the 3rd-order fitting, the error probability as well as the computational complexity increases. Hence, the optimum prediction equation for epileptic seizure is the 3rd-order one, which has less computational complexity and less error probability than the 4th-order fitting. From the table, it is also notable that the prediction error is larger at the smaller ages of the epileptic people, because the epileptic EEG becomes more severe with increasing age.
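One way to make this kind of order comparison concrete is to score each polynomial order by its error on held-out points. The sketch below uses leave-one-out validation, which is a technique substituted here for illustration and is not necessarily the procedure behind Table 4; the ages and ApEn values are synthetic, and the function name `loo_rmse` is hypothetical.

```python
import numpy as np

def loo_rmse(x, y, order):
    """Leave-one-out RMS prediction error of a polynomial fit of the given order."""
    errors = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i          # hold out sample i
        coeffs = np.polyfit(x[keep], y[keep], deg=order)
        errors.append(y[i] - np.polyval(coeffs, x[i]))
    return float(np.sqrt(np.mean(np.square(errors))))

# Synthetic data only: ages in years and a noise-free cubic "ApEn" trend.
age = np.array([5, 10, 15, 20, 25, 30, 35, 40, 45, 50], dtype=float)
apen = 0.9 - 0.02 * age + 4e-4 * age ** 2 - 3e-6 * age ** 3

loo_errors = {order: loo_rmse(age, apen, order) for order in (1, 2, 3, 4)}
for order, err in loo_errors.items():
    print(f"order {order}: leave-one-out RMSE = {err:.3e}")
```

On this synthetic cubic trend the held-out error falls sharply up to the 3rd order and gains nothing meaningful from a 4th-order term, which mirrors the trade-off between accuracy and complexity discussed above. On noisy measurements a higher order can score worse on held-out points even though its in-sample residual never increases.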

Conclusions
The electrophysiological activity of the brain, recorded as the EEG signal, can be analyzed for the prediction and diagnosis of epilepsy in living subjects. The epileptic EEG signal is highly random, and an EEG containing epilepsy is not suitable for ideal brain-computer interface (BCI) paradigms. Hence, the prediction of epilepsy is a vital issue in modern biomedical research. For the prediction of epilepsy, a statistical approach was explained in this manuscript. In our research, the epileptic EEG signals of epileptic subjects of different ages were analyzed, and one of the vital features, approximate entropy (ApEn), which is an indicator of the randomness of any time-domain signal, was measured. The regression equation of ApEn with respect to the age of the epileptic person may help BCI researchers or neural researchers to predict the randomness, namely, the level of epilepsy, corresponding to different ages. This may help clinicians to provide treatment to the epileptic person after finding the level of randomness.