Recognition of Imbalanced Epileptic EEG Signals by a Graph-Based Extreme Learning Machine

Epileptic EEG signal recognition is an important method for epilepsy detection. In essence, epileptic EEG signal recognition is a typical imbalanced classification task. However, traditional machine learning methods used for imbalanced epileptic EEG signal recognition face many challenges: (1) traditional machine learning methods often ignore the imbalance of epileptic EEG signals, which leads to misclassification of positive samples and may cause serious consequences, and (2) the existing imbalanced classification methods ignore the interrelationship between samples, resulting in poor classification performance. To overcome these challenges, a graph-based extreme learning machine method (G-ELM) is proposed for imbalanced epileptic EEG signal recognition. The proposed method uses graph theory to construct a relationship graph of samples according to the data distribution. Then, a model combining the relationship graph and ELM is constructed; it inherits the rapid learning and good generalization capabilities of ELM and improves the classification performance. Experiments on a real imbalanced epileptic EEG dataset demonstrated the effectiveness and applicability of the proposed method.


Introduction
Epilepsy is a common neurological disease that can cause recurrent seizures. During seizures, injury or life-threatening events may occur owing to the distraction or involuntary spasms of the patient [1,2]. In the clinical diagnosis of various seizures, electroencephalogram (EEG) signal detection plays a crucial role [3]. This is because the epileptic brain releases characteristic waves during seizures. In recent years, an increasing number of machine learning-based methods have been applied for epileptic EEG signal recognition [4][5][6][7][8]. Figure 1 illustrates a machine learning method-based system for epileptic EEG signal recognition. The figure shows that an epileptic EEG signal recognition system involves the following three main steps: (1) a feature extraction method is applied to the original epileptic EEG signals for training and testing, (2) the EEG signals after feature extraction for training are used to train the machine learning-based model to build an epileptic EEG signal recognition system, and (3) the EEG signals after feature extraction for testing are then inputted into the epileptic EEG signal recognition system for detection.
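As an illustration of the three steps, the following is a minimal end-to-end sketch; the time-domain features and the nearest-centroid classifier are simplified stand-ins, not the paper's method:

```python
import numpy as np

def extract_features(signals):
    """Step 1: map each raw EEG segment to a small feature vector
    (mean, variance, peak amplitude as simple time-domain stand-ins)."""
    return np.stack([signals.mean(axis=1),
                     signals.var(axis=1),
                     np.abs(signals).max(axis=1)], axis=1)

class NearestCentroid:
    """Stand-in classifier for steps 2-3 (train, then detect)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        # Assign each sample to the class with the nearest centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

rng = np.random.default_rng(0)
# Synthetic "EEG" segments: negatives are low amplitude, positives high amplitude.
train_sig = np.concatenate([rng.normal(0, 1, (40, 256)),
                            rng.normal(0, 3, (10, 256))])
train_lab = np.array([-1] * 40 + [1] * 10)
test_sig = np.concatenate([rng.normal(0, 1, (5, 256)), rng.normal(0, 3, (5, 256))])

model = NearestCentroid().fit(extract_features(train_sig), train_lab)
pred = model.predict(extract_features(test_sig))
```

The same structure (feature extraction, training, detection) holds regardless of which feature extractor and classifier are plugged in.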
Previously, many machine learning methods have been proposed for epileptic EEG signal recognition, such as the naive Bayes method (NB) [9], K-nearest neighbor (KNN) [10], support vector machine (SVM) [11], fuzzy systems [12,13], and the extreme learning machine (ELM) [14,15], and they have shown good effectiveness. In essence, epileptic EEG signal recognition is a typical imbalanced classification task [16,17]. Compared with negative samples (people without epilepsy), positive samples (patients with epilepsy) have extremely low representation and cannot be well classified by traditional classifiers. Although the misclassification of positive samples has little effect on the model accuracy, it may cause serious medical malpractice. Therefore, traditional machine learning methods face several critical challenges for recognition of imbalanced epileptic EEG signals: (1) traditional machine learning methods often ignore the imbalance of epileptic EEG signals and misclassify positive samples, which may cause serious medical malpractice, and (2) existing imbalanced classification methods ignore the interrelationship between samples, resulting in poor classification performance. Thus, building a classifier that considers both the imbalance of epileptic EEG signals and additional knowledge of the samples becomes imperative for the classification of imbalanced epileptic EEG datasets.
To overcome these challenges, a novel imbalanced epileptic EEG signal recognition method based on a graph and ELM is proposed in this study. ELM has become a classical machine learning method owing to its solid theoretical foundations, fast training speed, and good predictive performance [18,19]. Although ELM can universally approximate any continuous function, it is not effective for classifying imbalanced datasets. Therefore, it is necessary to adopt strategies that make ELM correctly classify positive samples to obtain a reasonable classification result on an imbalanced dataset. Previously, numerous imbalanced ELM-based methods have been proposed. For example, Zong et al. [20] proposed the weighted extreme learning machine (WELM), which pioneered the application of ELM in imbalanced classification. Similarly, Zhang and Ji [21] proposed a fuzzy ELM (FELM), which regulated the distributions of penalty factors by inserting a fuzzy matrix. Yu et al. [22] proposed a special cost-sensitive ELM (ODOC-ELM) for imbalanced classification problems. Li et al. [23] proposed an ensemble WELM algorithm based on the AdaBoost framework to learn the weights of different samples adaptively. Yang et al. [24] proposed a novel ELM-based imbalanced classification method by estimating the probability density distributions of the two imbalanced classes. Shukla and Yadav [25] combined CC-ELM with WELM to propose a regularized weighted CC-ELM.
Xiao et al. [26] proposed an imbalanced ELM-based algorithm for two-class classification tasks by solving each class's classification error. Du et al. [27] proposed an online sequential extreme learning machine with under- and oversampling (OSELM-UO) for online imbalanced big data classification. In addition, other ELM-based imbalanced methods, such as ensemble weighted ELM [28], class-specific cost regulation ELM [29], label-weighted extreme learning machine [30], and class-specific ELM [31], have also been proposed. However, to the best of our knowledge, no study has used imbalanced ELM methods for epileptic EEG signal recognition; therefore, it is necessary to propose such a method for epileptic EEG signal recognition.
In this study, inspired by WELM, we propose a novel graph-based ELM (G-ELM) for imbalanced epileptic EEG signal recognition. First, we use the graph theory to construct a relationship graph of samples according to their data distribution. Then, we combine the relationship graph with ELM to propose G-ELM. The experimental results on a real imbalanced epileptic EEG dataset show that the proposed method can address imbalanced classification of epileptic EEG signals effectively. The main contributions of this study are as follows.
(1) The proposed G-ELM sets the compensation for the loss of positive samples to be greater than that of negative samples based on graph theory and then combines the resulting relationship graph with ELM to classify imbalanced data effectively. It is a novel imbalanced ELM-based method that attains good classification performance and inherits the rapid learning and good generalization capabilities of ELM. (2) The proposed imbalanced classification method considers both the imbalance and the interrelationship of epileptic EEG samples to obtain better performance for imbalanced epileptic EEG signal recognition. It not only realizes effective classification of imbalanced epileptic EEG signals from a new perspective but also expands the application of ELM-based algorithms. (3) We use six imbalanced classification evaluation indices, i.e., accuracy, precision, recall, F-measure, G_means, and AUC, to compare the performance of the proposed G-ELM with that of existing imbalanced ELM-based methods. Extensive experiments on a real imbalanced epileptic EEG dataset indicate that the proposed method can address imbalanced epileptic EEG signal recognition effectively and outperforms the existing imbalanced ELM-based methods.

The rest of this paper is organized as follows. Section 2 introduces the background underlying the proposed epileptic EEG recognition method. In Section 3, the details of the proposed G-ELM are presented. The performance of the proposed method is evaluated against several comparative methods in Section 4. The conclusions of this paper are provided in Section 5.

Figure 1: Illustration of the machine learning method-based system for epileptic EEG signal recognition.

Background
In this section, we briefly describe the background related to the proposed epileptic EEG signal recognition method. It includes the epileptic EEG dataset, the feature extraction methods, and the classical ELM, which are used for epileptic EEG signal detection.

Feature Extraction.
Many studies [33][34][35] have shown that the original EEG signals cannot be directly used for training machine learning-based models and that feature extraction is a necessary step. This is because the original EEG signals are usually high dimensional, stochastic, nonstationary, and nonlinear, and the background noise in the original signals is very complex. The commonly used feature extraction methods can be divided into three main categories: time domain analysis, frequency domain analysis, and time-frequency analysis. Time domain analysis-based methods extract features by analyzing characteristics of the original EEG signals, such as the mean, variance, amplitude, and kurtosis [36]. Frequency domain analysis-based methods usually analyze the EEG signals in the frequency domain to extract features, for example, via fast Fourier transforms [37] and short-time Fourier transforms [38]. As for time-frequency analysis methods, the information of the time and frequency domains is considered simultaneously to extract features from the original epileptic EEG signals; typical time-frequency analysis-based methods are wavelet transform methods [39,40]. In this paper, we use wavelet packet decomposition [40] for feature extraction from the original epileptic EEG signals to simultaneously utilize the information of the time and frequency domains.

ELM.
ELM, which was first proposed by Huang et al. [19], is a single-hidden-layer feedforward neural network [41]. It directly optimizes the output weights of the hidden layer once the number of hidden nodes is set, without tuning the weights and biases of the input layer, which can be generated randomly. Compared with other traditional supervised learning methods, it has good generalization ability and a high learning speed. Figure 3 shows the network structure of an ELM.

ELM considers both empirical and structural risks, and its objective function is as follows:

\min_{\beta}\ \frac{1}{2}\lVert\beta\rVert^{2} + \frac{C}{2}\sum_{i=1}^{n}\lVert\varepsilon_{i}\rVert^{2}, \quad \text{s.t.}\ h(x_{i})\beta = y_{i} - \varepsilon_{i},\ i = 1, 2, \cdots, n, \quad (1)

where the hidden layer feature matrix is

H = \begin{bmatrix} h_{1}(x_{1}) & \cdots & h_{m}(x_{1}) \\ \vdots & \ddots & \vdots \\ h_{1}(x_{n}) & \cdots & h_{m}(x_{n}) \end{bmatrix}, \quad h_{i}(x_{j}) = g\left(A^{(i)}x_{j} + b_{i}\right), \quad (2)

A^{(i)} represents the ith row of the input weight matrix A, x_{j} (j = 1, 2, \cdots, n) denotes the training samples, n is the number of training samples, d is the sample dimension, and m is the number of hidden nodes; \varepsilon = (\varepsilon_{1}, \varepsilon_{2}, \cdots, \varepsilon_{n})^{T} is the error matrix between the network outputs and the target outputs. C is a penalty parameter, which adjusts the trade-off between the accuracy and the generalization ability of the ELM. The optimization problem in (1) can be solved based on the Karush-Kuhn-Tucker conditions, and the output weight of ELM can be calculated by

\beta^{*} = \begin{cases} H^{T}\left(\dfrac{I_{n}}{C} + HH^{T}\right)^{-1}Y, & n < m, \\ \left(\dfrac{I_{m}}{C} + H^{T}H\right)^{-1}H^{T}Y, & n \ge m. \end{cases} \quad (3)
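As a self-contained illustration of wavelet packet energy features, the following sketch uses a Haar basis for simplicity (the paper's wavelet choice follows [40], so the basis here is an assumption):

```python
import numpy as np

def haar_step(x):
    """One Haar analysis step: split a signal into orthonormal
    approximation and detail halves."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def wavelet_packet_energies(signal, depth=3):
    """Wavelet packet tree: unlike the plain DWT, BOTH the approximation and
    the detail branch are decomposed further at every level; the features are
    the log-energies of the 2**depth leaf bands (time-frequency information)."""
    nodes = [np.asarray(signal, dtype=float)]
    for _ in range(depth):
        nodes = [half for node in nodes for half in haar_step(node)]
    return np.array([np.log(np.sum(n ** 2) + 1e-12) for n in nodes])

# A toy EEG-like segment: a 10 Hz rhythm plus noise.
t = np.linspace(0, 1, 256, endpoint=False)
eeg_like = np.sin(2 * np.pi * 10 * t) + 0.1 * np.random.default_rng(1).normal(size=256)
features = wavelet_packet_energies(eeg_like, depth=3)  # 8 leaf-band features
```

Because the Haar steps are orthonormal, the leaf-band energies sum to the energy of the original segment, so no information about signal power is lost by the decomposition.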

Graph-Based Extreme Learning Machine
In this section, a graph-based ELM (G-ELM) is proposed. We first introduce the relationship graph of an imbalanced dataset and then develop the proposed imbalanced classification method G-ELM by combining the relationship graph with an ELM.
3.1. Relationship Graph of an Imbalanced Dataset. In the context of the imbalanced classification problem, the relationships between the training samples can be regarded as an undirected graph. An undirected graph can be expressed as G = (V, E), where V is the vertex set of graph G and E is the edge set of graph G. Figure 4 shows an undirected graph of an imbalanced synthetic dataset with 7 samples, where the 2 positive samples are represented by blue circles and the 5 negative samples are represented by red stars. All samples are numbered for subsequent reference. Note that edges connect only samples in different classes, each with weight 1; samples in the same class are not connected.
The elements of the adjacency matrix W can be defined as follows:

W_{ij} = \begin{cases} 1, & y_{i} \neq y_{j}, \\ 0, & y_{i} = y_{j}. \end{cases} \quad (4)

Here, y_{i} \in Y is the label of x_{i}. According to the above definition of the adjacency matrix W, the distance between samples in the same class can be considered 0, while the distance between samples in different classes can be considered 1.

Wireless Communications and Mobile Computing
Then, the relationship graph matrix can be expressed as

L = D - W, \quad (5)

where D = \operatorname{diag}(W \cdot 1_{n \times 1}) is the degree matrix, 1_{n \times 1} stands for an n \times 1 vector whose elements are all 1, and n is the number of training samples. As for the imbalanced dataset X, we need to increase the loss of misclassification of positive samples, because the misclassification of positive samples (patients with epilepsy) could cause serious consequences. This can be realized by regulating the degree matrix D. The shortcomings of cost-sensitive learning algorithms can be compensated by incorporating the relationships between samples. Therefore, the relationship graph not only ensures the accuracy of positive sample classification but also makes up for the lack of mutual relationships and prior knowledge between samples.
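For the seven-sample dataset of Figure 4, W, D, and the relationship graph matrix can be computed directly from the labels; a minimal NumPy sketch (assuming the two positive samples are numbered first and taking the relationship graph matrix as L = D - W):

```python
import numpy as np

y = np.array([1, 1, -1, -1, -1, -1, -1])  # 2 positive, 5 negative samples

# Adjacency matrix: weight 1 between samples of different classes, 0 within a class.
W = (y[:, None] != y[None, :]).astype(float)
D = np.diag(W.sum(axis=1))  # degree matrix D = diag(W . 1)
L = D - W                   # relationship graph matrix

# Each positive sample is connected to all 5 negatives (degree 5),
# each negative sample to both positives (degree 2).
print(np.diag(D))
```

The degrees show the weighting effect directly: the minority (positive) samples receive the larger diagonal entries, so their misclassification is penalized more heavily.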
According to the above description, and numbering the two positive samples in Figure 4 first, the relationship graph matrix L of the synthetic dataset can be expressed as

L = D - W = \begin{bmatrix} 5 & 0 & -1 & -1 & -1 & -1 & -1 \\ 0 & 5 & -1 & -1 & -1 & -1 & -1 \\ -1 & -1 & 2 & 0 & 0 & 0 & 0 \\ -1 & -1 & 0 & 2 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 & 2 & 0 & 0 \\ -1 & -1 & 0 & 0 & 0 & 2 & 0 \\ -1 & -1 & 0 & 0 & 0 & 0 & 2 \end{bmatrix}. \quad (6)

3.2. Objective Function of G-ELM. According to the above relationship graph and ELM, the objective function of the G-ELM can be expressed as follows:

\min_{\beta}\ \frac{1}{2}\lVert\beta\rVert^{2} + \frac{C}{2}\varepsilon^{T}L\varepsilon, \quad \text{s.t.}\ h(x_{i})\beta = y_{i} - \varepsilon_{i},\ i = 1, 2, \cdots, n, \quad (7)

L = D - W. \quad (8)

Here, X = [x_{1}, x_{2}, \cdots, x_{n}] \in \mathbb{R}^{d \times n}, n is the number of samples in X, d is the sample dimension, and Y = [y_{1}, y_{2}, \cdots, y_{n}]^{T} represents the true class labels of the samples. H and h(x_{i}) are the same as defined in ELM. \beta = [\beta_{1}, \beta_{2}, \cdots, \beta_{m}]^{T} represents the output weight vector. \varepsilon = [\varepsilon_{1}, \varepsilon_{2}, \cdots, \varepsilon_{n}]^{T} represents the loss between the network outputs and the target outputs.
Equation (8) is the relationship graph matrix of the samples. By comparing (7) with (1), we can see that G-ELM is an improved version of ELM and still has the characteris-tics of high learning speed and strong generalization ability from ELM.

Solution of G-ELM.
In this subsection, we optimize the objective function of G-ELM. According to [20], the objective function of G-ELM is a convex optimization problem. The specific optimization process is as follows. The Lagrangian function corresponding to (7) is

J = \frac{1}{2}\lVert\beta\rVert^{2} + \frac{C}{2}\varepsilon^{T}L\varepsilon - \alpha^{T}(H\beta - Y + \varepsilon), \quad (9)

where \alpha = [\alpha_{1}, \alpha_{2}, \cdots, \alpha_{n}]^{T} is the vector of Lagrange multipliers. Setting the derivatives of J with respect to \beta, \varepsilon, and \alpha equal to zero gives

\frac{\partial J}{\partial \beta} = 0 \Rightarrow \beta = H^{T}\alpha, \quad (10a)
\frac{\partial J}{\partial \varepsilon} = 0 \Rightarrow \alpha = CL\varepsilon, \quad (10b)
\frac{\partial J}{\partial \alpha} = 0 \Rightarrow H\beta - Y + \varepsilon = 0. \quad (10c)

Substituting (10a) and (10b) into (10c), we obtain

\alpha = \left(\frac{I_{n}}{C} + LHH^{T}\right)^{-1}LY. \quad (11)

Combining (10a) and (11), we obtain

\beta^{*} = \begin{cases} H^{T}\left(\dfrac{I_{n}}{C} + LHH^{T}\right)^{-1}LY, & n < m, \\ \left(\dfrac{I_{m}}{C} + H^{T}LH\right)^{-1}H^{T}LY, & n \ge m. \end{cases} \quad (12)

With the obtained solution \beta^{*}, the predicted class label of a testing sample can be obtained as

y_{\text{test}} = \operatorname{sign}\left(h(x_{\text{test}})\beta^{*}\right), \quad (13)

where x_{\text{test}} is a testing sample and h(x_{\text{test}}) is its hidden-layer output vector.
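The closed-form solution can be implemented in a few lines of NumPy. The following sketch follows Eq. (12); the tanh activation, the toy data, and taking the relationship graph matrix as L = D - W are assumptions for illustration (with L = I it reduces to a standard regularized ELM):

```python
import numpy as np

def train_gelm(X, y, L, m=100, C=1.0, seed=0):
    """Solve Eq. (12): random hidden layer, then a closed-form output weight."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = rng.normal(size=(m, d))      # random input weights
    b = rng.normal(size=m)           # random input biases
    H = np.tanh(X @ A.T + b)         # hidden-layer feature matrix (n x m)
    if n < m:
        # beta* = H^T (I_n/C + L H H^T)^{-1} L y
        beta = H.T @ np.linalg.solve(np.eye(n) / C + L @ H @ H.T, L @ y)
    else:
        # beta* = (I_m/C + H^T L H)^{-1} H^T L y
        beta = np.linalg.solve(np.eye(m) / C + H.T @ L @ H, H.T @ (L @ y))
    return A, b, beta

def predict_gelm(X, A, b, beta):
    """Eq. (13): y_test = sign(h(x_test) beta*)."""
    return np.sign(np.tanh(X @ A.T + b) @ beta)

# Imbalanced toy data: 40 negatives, 10 positives.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1, (40, 4)), rng.normal(1, 1, (10, 4))])
y = np.array([-1.0] * 40 + [1.0] * 10)
W = (y[:, None] != y[None, :]).astype(float)
L = np.diag(W.sum(axis=1)) - W       # relationship graph matrix

A, b, beta = train_gelm(X, y, L, m=50, C=10.0)
pred = predict_gelm(X, A, b, beta)
```

Only a single linear system is solved at training time, which is why the method keeps ELM's fast, non-iterative learning.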

Learning Algorithm of G-ELM.
According to the above derivation, the implementation of G-ELM is summarized in Algorithm 1.

Data Preparation.
Although the real Bonn dataset has been used in many studies, the way it is used in this study differs from previous works. To evaluate the performance of the proposed G-ELM, nine imbalanced datasets were generated from the original five groups of EEG signals to simulate the imbalanced classification scenario. The details of the nine datasets are summarized in Table 2. In each dataset, the EEG signals of patients with epilepsy (E) were regarded as the positive class, while the other groups were regarded as the negative class, so as to identify whether a patient with epilepsy is experiencing seizure activity. A brief description of the five groups (A, B, C, D, and E) can be found in Table 1. The last column of Table 2 is IR, which shows the degree of imbalance of each dataset. IR can be defined as follows:

IR = \frac{n_{-}}{n_{+}},

where n_{+} and n_{-} represent the number of samples of the positive class and the negative class, respectively.
In our experiment, we randomly partitioned each dataset: 80% of the samples were used for training and the remaining 20% were used for testing.
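The IR computation and the random 80/20 partition are straightforward; a small sketch with hypothetical sample counts (the actual group sizes are given in Tables 1 and 2):

```python
import numpy as np

def imbalance_ratio(y):
    """IR = n_- / n_+ : number of negative over number of positive samples."""
    return np.sum(y == -1) / np.sum(y == 1)

def random_split(X, y, train_frac=0.8, seed=0):
    """Random 80/20 partition of a dataset, as used in the experiments."""
    idx = np.random.default_rng(seed).permutation(len(y))
    cut = int(train_frac * len(y))
    tr, te = idx[:cut], idx[cut:]
    return X[tr], y[tr], X[te], y[te]

# Hypothetical dataset: 100 positive (E) vs. 400 negative samples, 8 features.
y = np.array([1] * 100 + [-1] * 400)
X = np.random.default_rng(1).normal(size=(500, 8))
print(imbalance_ratio(y))  # IR = 4.0
Xtr, ytr, Xte, yte = random_split(X, y)
```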

Evaluation Indices.
In our experiments, we used six imbalanced classification evaluation indices to evaluate all the adopted methods: accuracy, precision, recall, F-measure, G_means, and AUC, which can be, respectively, defined as

\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN},
\text{precision} = \frac{TP}{TP + FP},
\text{recall} = \frac{TP}{TP + FN},
\text{F-measure} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}},
G\_means = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}},
AUC = \frac{1}{n_{+}n_{-}}\sum_{i \in N_{+}}\sum_{j \in N_{-}} I\left(P(x_{i}) > P(x_{j})\right).

Here, TP is the number of true positive samples, FN is the number of false negative samples, TN is the number of true negative samples, and FP is the number of false positive samples. N_{+} is the set of all the indexes of the positive samples and N_{-} is the set of those of the negative samples; n_{+} = |N_{+}| and n_{-} = |N_{-}|. P(x) is the prediction value of x, and I(\cdot) is the indicator function.

Algorithm 1: G-ELM.
Input: The training samples X = [x_{1}, x_{2}, \cdots, x_{n}] \in \mathbb{R}^{d \times n} and their corresponding labels Y = [y_{1}, y_{2}, \cdots, y_{n}]^{T}, where x_{i} \in \mathbb{R}^{d} (i = 1, 2, \cdots, n); the number of hidden nodes m; the input weights A \in \mathbb{R}^{m \times d} and input biases b \in \mathbb{R}^{m}; the penalty parameter C.
Output: The predicted class label of the testing sample x_{test}.
Step 1: Construct the mapping matrix of hidden layer H according to Eq. (2).
Step 2: Compute the relationship graph matrix corresponding to the training samples X according to Eq. (4) and Eq. (5).
Step 3: If n < m Then compute the output weight vector β * using the first formula in Eq. (12). Else compute β * using the second formula in Eq. (12).
Step 4: Return the predicted class label of the testing sample y_{test} = \operatorname{sign}(h(x_{test})\beta^{*}).
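The six evaluation indices above can be computed directly from the counts TP, FN, TN, and FP and, for AUC, from prediction scores; a self-contained sketch with made-up labels:

```python
import numpy as np

def imbalance_metrics(y_true, y_pred, scores):
    """Accuracy, precision, recall, F-measure, G_means, and AUC for binary
    labels in {-1, +1} (positive class = +1)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == -1))
    tn = np.sum((y_true == -1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    acc = (tp + tn) / (tp + tn + fp + fn)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    gmeans = np.sqrt(rec * tn / (tn + fp))
    # AUC as the Wilcoxon statistic: fraction of (positive, negative)
    # score pairs that are ranked correctly.
    pos, neg = scores[y_true == 1], scores[y_true == -1]
    auc = np.mean(pos[:, None] > neg[None, :])
    return acc, prec, rec, f1, gmeans, auc

# Made-up predictions for illustration.
y_true = np.array([1, 1, 1, -1, -1, -1, -1, -1])
y_pred = np.array([1, 1, -1, -1, -1, -1, 1, -1])
scores = np.array([0.9, 0.8, 0.3, 0.2, 0.1, 0.4, 0.6, 0.0])
print(imbalance_metrics(y_true, y_pred, scores))
```

Note that accuracy alone can look high on imbalanced data even when every positive is missed, which is exactly why the other five indices are reported.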

Adopted Methods and Parameter Settings.
In the experiments, five ELM-based methods, i.e., ELM [19], W1-ELM [20], W2-ELM [20], R1-ELM [25], and R2-ELM [25], were adopted for comparison with G-ELM. Referring to the guidelines in [2,20,46], a grid search strategy based on G_means was used to determine appropriate parameters for all the methods. We set the parameter C in the range 2^{\{-28:2:28\}}, i.e., \{2^{-28}, 2^{-26}, \cdots, 2^{28}\}, and the parameter m in the range \{50, 100, 300, 500, 1000\} for all the adopted methods. All the adopted methods were run ten times on each generated imbalanced dataset, and the average results for the six imbalanced classification evaluation indices are reported.

To evaluate the classification performance of the proposed G-ELM, the five ELM-based methods were used for performance comparison. All experiments were repeated ten times for fairness. The mean and standard deviation of the corresponding indices of all methods on each dataset are reported in Tables 3-8, with the best results shown in bold. The improvement of G-ELM relative to ELM on all datasets under the six imbalanced classification evaluation indices is shown in Figure 5.
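The grid search protocol can be sketched as follows; `evaluate` stands in for a hypothetical train-and-validate routine that returns G_means for a given (C, m), and the toy scoring function is not part of the paper:

```python
import itertools
import math

def grid_search(evaluate):
    """Pick (C, m) maximizing G_means over the grids used in the experiments.
    `evaluate(C, m)` is a hypothetical callback: train a model with the given
    parameters and return its validation G_means."""
    C_grid = [2.0 ** e for e in range(-28, 29, 2)]  # 2^{-28}, 2^{-26}, ..., 2^{28}
    m_grid = [50, 100, 300, 500, 1000]
    return max(itertools.product(C_grid, m_grid), key=lambda p: evaluate(*p))

# Toy stand-in score peaking at C = 1, m = 300 (for illustration only).
score = lambda C, m: -abs(math.log2(C)) - abs(m - 300) / 1000
print(grid_search(score))  # (1.0, 300)
```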
According to the experimental results in Tables 3-8, the following observations can be made: (1) For the adopted six imbalanced classification evaluation indices, the proposed G-ELM performs best. (4) F-measure and G_means, reported in Tables 6 and 7, are two important indices for measuring the performance of imbalanced classification methods; they combine recall and precision to evaluate the effect of the methods. From the results, we can see that the proposed G-ELM has the best performance and performs excellently in imbalanced epileptic EEG signal recognition. (5) AUC is an important index to evaluate imbalanced classifiers. From Table 8, we can see that the performance of G-ELM on all datasets is the best; G-ELM thus shows excellent imbalanced classification performance and good effectiveness for imbalanced epileptic EEG signal recognition.

4.5. Statistical Analysis. Statistical analysis was performed to further analyze the performances of all the adopted methods in our experiments. For conciseness, we only present the statistical analysis of the G_means results. First, the Friedman test [47] was used to calculate the average ranking of each method. The rankings of all the adopted methods are shown in Figure 6, from which we can see that the performance of G-ELM is the best. Then, the post hoc hypothesis test [48] was used to evaluate the statistical significance of the performance differences between G-ELM and the other adopted methods. The post hoc hypothesis test results (\alpha_{Fri} = 0.05) are presented in Table 9.
From Table 9, we can see that the null hypothesis is rejected when p_{Fri} \le 0.025, since in these cases p_{Fri} does not exceed the corresponding Holm value. Therefore, the performance differences between G-ELM and the other adopted methods are significant, which means that G-ELM is effective for imbalanced epileptic EEG signal recognition.
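The average-rank computation underlying the Friedman test can be sketched as follows; the G_means values here are made up for illustration (not the paper's results), and ties are not mid-ranked for simplicity:

```python
import numpy as np

def friedman_average_ranks(results):
    """results: (n_datasets x n_methods) score matrix, higher is better.
    Rank the methods within each dataset (best = rank 1), then average the
    ranks over datasets, as the Friedman test does."""
    # Double argsort of descending scores yields per-row positions; +1 -> ranks.
    ranks = (-results).argsort(axis=1).argsort(axis=1) + 1
    return ranks.mean(axis=0)

# Made-up G_means for 4 datasets x 3 methods (illustration only).
gmeans = np.array([[0.90, 0.85, 0.80],
                   [0.88, 0.84, 0.86],
                   [0.92, 0.89, 0.83],
                   [0.91, 0.87, 0.86]])
print(friedman_average_ranks(gmeans))  # method 1 gets the best (lowest) average rank
```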

Conclusions
In this study, we aimed to address the challenge that traditional machine learning methods ignore the imbalance of epileptic EEG datasets while existing imbalanced classification methods ignore the relationships between samples. A graph-based ELM was proposed for imbalanced epileptic EEG signal recognition. First, graph theory was used to construct the relationship graph of the samples according to the data distribution. Second, a model combining the relationship graph and ELM was proposed; this model inherits the rapid learning and good generalization capabilities of ELM while achieving satisfactory classification performance. Experiments on a real imbalanced epileptic EEG dataset demonstrated the effectiveness and applicability of the proposed method. However, there is still room for improvement in the scope and search method of the optimal parameters in our experiments. In the future, we will further study and explore better methods for determining the optimal parameters.

Conflicts of Interest
None of the authors have any conflicts of interest.