Fast Enhanced Exemplar-Based Clustering for Incomplete EEG Signals

The diagnosis and treatment of epilepsy is a significant direction for both machine learning and brain science. This paper proposes a fast enhanced exemplar-based clustering (FEEC) method for incomplete EEG signals. The algorithm first compresses the potential exemplar list and reduces the pairwise similarity matrix: by processing the mostly complete data in a first stage, FEEC then adds the few incomplete data to the exemplar list. A new compressed similarity matrix is constructed whose scale is greatly reduced. Finally, FEEC optimizes the new target function by the enhanced α-expansion move method. Because the method is built on pairwise relationships, FEEC also improves the generalization of the algorithm. In contrast to other exemplar-based models, the performance of the proposed clustering algorithm is comprehensively verified by experiments on two datasets.


Introduction
Epilepsy is a common disease of the nervous system, characterized by sudden brain dysfunction. Although there are many other neuroimaging modalities for the recognition of brain activity, EEG has a high temporal resolution, down to the millisecond level, and its acquisition equipment is inexpensive, portable, and noninvasive. Nowadays, most diagnoses of epilepsy are based on clinical experience and the analysis of electroencephalogram (EEG) signals. Compared with manual diagnosis, machine learning methods are less time-consuming and more consistent [1][2][3][4][5][6]. Specifically, many machine learning methods such as support vector learning [7,8], the Takagi-Sugeno-Kang (TSK) fuzzy system [9,10], and Naïve Bayes [11] have been applied.
Since brain activity forms a nonlinear, unstable complex network system, the EEG signals we obtain are complicated. That is to say, some EEG signals are complete while others may be missing some features, i.e., incomplete. Therefore, recognition of epilepsy based on machine learning models is more promising than clinical diagnosis that depends on experience alone. Moreover, EEG signals are high-dimensional and stochastic, which limits the performance of most existing clustering models, such as k-means [11] and fuzzy c-means (FCM) [12]. Both k-means and FCM need the number of clusters to be preset in advance. More specifically, the performance of k-means relies on the initialization of the data, while FCM requires high interpretability.
Thus, in this paper we focus on the exemplar-based clustering model [13] proposed by Frey.
The exemplar-based clustering model has the advantages of automatically determining the number of clusters, high efficiency, and no reliance on the initialization of the data.
In summary, we consider the scenario of EEG signals consisting of mostly complete data and a few incomplete data in this paper, as shown in Figure 1. Based on previous work on the recognition of epileptic signals, we propose a novel fast enhanced exemplar-based clustering (FEEC) model for incomplete EEG signals. Different from existing exemplar-based clustering models, FEEC compresses the exemplar list and reduces the pairwise similarity matrix, and then optimizes the target model by the enhanced α-expansion move framework. The contributions of this paper can be highlighted as follows: (1) We extend the existing exemplar-based clustering algorithm into a fast version by compressing the potential exemplar list. FEEC compresses the number of potential exemplars by processing the mostly complete data in a first stage and then adds the few incomplete data to the exemplar list, so the complexity of FEEC is reduced as well.
(2) Like most existing exemplar-based clustering models, FEEC is built on the pairwise similarity matrix of the data. Thus, after compression, FEEC constructs a new reduced similarity matrix, and the generalization of the algorithm is improved. (3) Moreover, this paper considers the fact that graph cuts [14] based optimization performs better than loopy belief propagation (LBP) [15] based structures, so the proposed FEEC algorithm optimizes the target model by the enhanced α-expansion move framework [16,17]. (4) Experimental results on both synthetic and real-world datasets indicate the promising efficiency of the proposed FEEC algorithm. The rest of this paper is organized as follows. In Section 2, we introduce some static exemplar-based clustering models. Section 3 discusses the proposed FEEC algorithm step by step. In Section 4, we analyze the experimental results and compare FEEC with other existing methods. Section 5 concludes the paper.

Background
Since EEG signal feature extraction methods and exemplar-based clustering models are two important supporting theories for the FEEC model in this study, we briefly introduce several feature extraction methods and exemplar-based clustering models in this section.

Feature Extraction Methods.
Original EEG signals are high-dimensional, stochastic, and nonlinear, and extracting features directly from raw EEG signals is computationally very expensive. Many feature extraction methods have been proposed to handle this problem; in sum, they fall into three categories, i.e., time-domain features, frequency-domain features, and time-frequency features.
More specifically, in time-domain analysis, statistical features of the raw EEG signals are analyzed [18]. In frequency-domain analysis, power spectrum analysis and the short-time Fourier transform (STFT) [19,20] are commonly used. In time-frequency analysis, time- and frequency-domain information is extracted simultaneously from nonstationary EEG signals; wavelets and their improved versions [21,22] are widely used in EEG signal processing. We utilize KPCA to extract features in this paper.

Exemplar-Based Clustering Models.
Exemplar-based clustering models select cluster centers, namely exemplars, from the actual data. We focus on exemplar-based clustering models in this paper and briefly introduce affinity propagation (AP) [13] and the enhanced α-expansion move (EEM) [17] in this section; several extended versions for different scenarios are shown in Table 1. The target function defined by an exemplar-based clustering model is equivalent to the minimization of a Markov random field (MRF) energy function. Two optimization strategies have been utilized, evolving into the AP and EEM frameworks accordingly: loopy belief propagation (LBP) [23] is used in AP, while the graph cuts technique [15] is used in EEM.

Affinity Propagation.
AP is based on message passing among data points, and its target function is defined as

E(E) = −Σ_{i=1}^{N} S(x_i, E(x_i)) + Σ_{j=1}^{N} δ_j(E), (1)

where δ_j(E) is a validity penalty that equals ∞ if some point chooses x_j as its exemplar while x_j is not its own exemplar, and 0 otherwise (2). Here X = {x_1, x_2, ..., x_N} ∈ R^{N×D} is the input dataset, N is the total number of D-dimensional data points, and E is the output of the framework, whose element E(x_p) is the exemplar of each x_p. According to AP, each point receives availability messages A(i, k) and sends responsibility messages R(i, k) simultaneously, defined as

R(i, k) = S(i, k) − max_{k′≠k} {A(i, k′) + S(i, k′)}, (3)
A(i, k) = min{0, R(k, k) + Σ_{i′∉{i,k}} max(0, R(i′, k))} for i ≠ k, and A(k, k) = Σ_{i′≠k} max(0, R(i′, k)), (4)

where S is the similarity matrix of the data points, defined as S(i, j) = −‖x_i − x_j‖². Meanwhile, S(k, k) = p, where p is called the preference in this framework; its value should be independent of the data and can be set to a constant. AP does not require presetting the number of clusters and its performance is stable. Considering these advantages, many extended versions of AP have been proposed [24,25]. Specifically, AP defines a fading factor to adjust the iteration speed; adAP [24] determines this fading factor adaptively. Moreover, several extended AP methods deal with large data and link constraints: IAPKM, IAPNA, and IAPC [26,27] employ an incremental strategy, while semisupervised AP (SSAP) [28] concentrates on instance-level constraints. A two-stage fast version of AP (FAP) [29] has also been proposed to improve efficiency. However, although AP has been successful in various applications, its performance is unsatisfactory when applied directly to incomplete EEG signals.
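The message updates above can be sketched in a few lines. The toy implementation below is illustrative only (the function name and parameter values are our own choices, not the paper's): it uses the similarity S(i, j) = −‖x_i − x_j‖², a median preference, and damped updates on 1-D points.

```python
# Minimal sketch of affinity propagation (AP) message passing on toy 1-D data.
# Assumptions: S(i, j) = -(x_i - x_j)^2, preference p = median off-diagonal
# similarity, damped synchronous updates. Illustrative, not the paper's code.

def affinity_propagation(points, damping=0.9, iterations=200):
    n = len(points)
    # Pairwise similarities: negative squared Euclidean distance.
    s = [[-(points[i] - points[j]) ** 2 for j in range(n)] for i in range(n)]
    # Preference S(k, k) = p: median of off-diagonal similarities.
    off_diag = sorted(s[i][j] for i in range(n) for j in range(n) if i != j)
    p = off_diag[len(off_diag) // 2]
    for i in range(n):
        s[i][i] = p
    r = [[0.0] * n for _ in range(n)]  # responsibilities R(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities  A(i, k)
    for _ in range(iterations):
        for i in range(n):
            for k in range(n):
                best = max(a[i][kk] + s[i][kk] for kk in range(n) if kk != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best)
        for i in range(n):
            for k in range(n):
                if i == k:
                    new = sum(max(0.0, r[ii][k]) for ii in range(n) if ii != k)
                else:
                    new = min(0.0, r[k][k] + sum(
                        max(0.0, r[ii][k]) for ii in range(n)
                        if ii not in (i, k)))
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    # Each point's exemplar maximizes A(i, k) + R(i, k).
    return [max(range(n), key=lambda k: a[i][k] + r[i][k]) for i in range(n)]
```

On two well-separated groups of points, the median preference typically yields one exemplar per group without the cluster number ever being specified.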

Enhanced α-Expansion Move.
In 2014, Zheng and Chen [17] utilized the enhanced α-expansion move framework to optimize the objective function of exemplar-based clustering models and accordingly proposed the EEM clustering model. In line with the mathematical symbols above, the target function of EEM is defined as

E(E) = −Σ_{i=1}^{N} S(x_i, E(x_i)) + Σ_{i,j} η_{i,j}(E), (5)

where η_{i,j}(E) = ∞ if E(x_i) = x_j while E(x_j) ≠ x_j, and 0 otherwise. In [17], the α-expansion move algorithm is proved to be effective in optimizing the target function in equation (5).

Computational and Mathematical Methods in Medicine
Furthermore, according to graph theory, in the fast α-expansion move algorithm the expansion range is limited to a single exemplar. To break this limit, the EEM model enlarges the range to the whole exemplar set E when optimizing, and defines a second exemplar S(i) for each point x_i as

S(i) = arg max_{s ∈ E∖{l}} S(x_i, x_s),

where X_l is the set of points whose exemplar is l and s ∈ E∖{l} ranges over the other exemplars in E. The EEM clustering model is a state-of-the-art exemplar-based clustering method and has been proved efficient and effective in numerous scenarios [16,17,30]. IEEM [30] deals with link constraints by embedding a bound term in the target function. For dynamic data streams, Bi and Wang [16] proposed an incremental EEM version, DSC, which processes data chunk by chunk. However, for incomplete EEG signals, these methods do not recognize epilepsy well.
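The second-exemplar step can be sketched as follows. This is a hypothetical helper (names are ours), assuming S(i) is simply the most similar exemplar other than the point's current one:

```python
# Sketch of the EEM "second exemplar" step: for each point, find the most
# similar exemplar in E other than its current exemplar l. Illustrative only.

def second_exemplars(similarity, assignment, exemplars):
    """similarity[i][j]: pairwise similarity matrix; assignment[i]: index of
    the current exemplar of point i; exemplars: list of exemplar indices."""
    second = []
    for i, l in enumerate(assignment):
        candidates = [e for e in exemplars if e != l]
        second.append(max(candidates, key=lambda e: similarity[i][e]))
    return second
```

Precomputing S(i) for every point is what lets the enhanced move relabel a whole cluster at once instead of one point at a time.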

Fast Enhanced Exemplar-Based Clustering Model
In this section, the proposed FEEC model will be stated and theoretically analyzed in detail. We first compress the exemplar list and reduce the pairwise similarity matrix, and then the target model is optimized by the enhanced α-expansion move framework.

Framework.
As mentioned in the introduction, we focus on incomplete EEG signals consisting of mostly complete data and a few incomplete data. To improve the efficiency of the EEM clustering model on such signals, the proposed FEEC framework includes two stages, namely a compression stage and an optimization stage. As shown in Figure 2, the compression stage compresses the potential exemplar list, and the optimization stage determines the optimal exemplars from that list. Accordingly, the target function can be defined as

min_E −Σ_{i=1}^{N} S(x_i, E(x_i)) + Σ_{j=1}^{N} δ_j(E), (9)

where X = [X_c, X_l] is the input dataset consisting of the complete data X_c = {x_{c,1}, x_{c,2}, ..., x_{c,N_c}} and the few incomplete data X_l = {x_{l,1}, x_{l,2}, ..., x_{l,N_l}}. The total number of data points is N = N_c + N_l, where N_c and N_l are the numbers of complete and incomplete data, respectively. Remember that we only consider the scenario N_c ≫ N_l in this study. The second term in equation (9) guarantees the validity of the exemplar list; its definition is similar to that of δ in equation (2). Finally, E denotes the exemplar set in question.
In the compression stage, the number of potential exemplars is reduced by an exemplar-based selection algorithm, namely the EEM method in this study. To be specific, we apply the EEM model to the complete data to obtain their potential exemplars. FEEC also pulls the few incomplete data into this potential exemplar list and then constructs the compressed similarity matrix. Therefore, after compression, only the pairwise similarities between data points and potential exemplars are preserved. Considering that the FEEC method is built on the pairwise similarity matrix, the subsequent clustering procedure is applied to this compressed similarity matrix. Furthermore, the scale of the similarity matrix is reduced from N^2 to Nc, where N and c are the numbers of data points and potential exemplars, respectively.
In the optimization stage, only the similarity relationship between data points and potential exemplars is considered. The new target function after compression is similar to that of other exemplar-based clustering models, such as equations (1) and (5), so we consider both graph cuts and LBP. Since graph cuts based optimization outperforms the LBP structure [31], the proposed FEEC utilizes the α-expansion move method to optimize the new target function. Moreover, following EEM, FEEC also expands the expansion move space from a single exemplar to include the second optimal exemplar.

Compression Stage.
In the compression stage, the target function for the complete data can be defined as

min_{E_c} −Σ_{i=1}^{N_c} S(x_{c,i}, E(x_{c,i})) + Σ_{j=1}^{N_c} δ_j(E_c), (10)

where X_c = {x_{c,1}, x_{c,2}, ..., x_{c,N_c}} ∈ R^{N_c×D} is the complete D-dimensional data and N_c is the number of these data. E_c is the potential exemplar list for the complete data, and its element E(x_{c,i}) is the potential exemplar of each x_{c,i}. The optimization framework of other exemplar-based clustering models, such as EEM, can be utilized to solve (10). In this paper, we select the graph cuts algorithm instead of a message-passing algorithm to compress the potential exemplar list. Thus, the potential exemplar list E_c for the complete data can be determined, and the number of potential exemplars it contains is denoted c_c. The potential exemplar list after the compression stage is E = [E_c, E_l], where E_l is the exemplar set for the few incomplete data, which is in fact the incomplete data itself; that is to say, E_l = X_l. In this stage, we reduce the number of potential exemplars from N_c to c_c. Following the analysis in [13,17,30], the time complexity of this stage is O(N_c^2). Compared with the time complexity O(N^2) of applying an exemplar-based clustering model directly, and considering that N_c < N, the time complexity of this compression algorithm is acceptable. Therefore, on the basis of the new exemplar list after compression, we can construct the new similarity matrix S_new ∈ R^{N×c}, whose elements are defined by the distance, namely S_new(i, j) = −‖x_i − e_j‖², where e_j is the jth potential exemplar. The scale of the similarity matrix is thus reduced from N^2 to Nc, where c = c_c + N_l is the number of potential exemplars.
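The construction of the reduced N × c matrix can be sketched directly from the definitions above. This is an illustrative helper (names are ours), assuming the potential exemplars E = [E_c, E_l] have already been determined:

```python
# Sketch of the compression stage's reduced similarity matrix: keep only the
# similarities between all N points and the c = c_c + N_l potential exemplars,
# with S_new(i, j) = -||x_i - e_j||^2, shrinking the matrix from N x N to N x c.

def compressed_similarity(data, potential_exemplars):
    """data: list of N feature vectors; potential_exemplars: list of c
    exemplar vectors. Returns the N x c similarity matrix S_new."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [[-sq_dist(x, e) for e in potential_exemplars] for x in data]
```

The subsequent optimization stage only ever reads this N × c matrix, which is the source of the claimed storage and runtime savings.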

Optimization Stage.
After compression, we define the new target function as

min_E −Σ_{i=1}^{N} S_new(x_i, E(x_i)) + Σ_i η_i(E), (12)

where S_new is the new similarity matrix constructed after compression.
In this section, we construct an optimization framework for equation (12). The second term of equation (12) guarantees the validity of the exemplar list; in order to utilize the graph cuts based method, this term should be pairwise [17]. So, η_i(E) is modified into η_{i,j}(E). Furthermore, similar to equation (5), we define η_{i,j}(E) = ∞ if E(x_i) = x_j while E(x_j) ≠ x_j, and 0 otherwise. It has been proved that with this definition of η_{i,j}(E), equation (12) can be optimized by the enhanced α-expansion method [30]. To improve the efficiency of the framework, this method enlarges the expansion move to the second optimal exemplar. Before optimization, we explain several symbols. First, we define X_e as the set of data points whose exemplar is x_e, and x_α as the current potential exemplar. Then, the enhanced α-expansion move method considers the second optimal exemplar, defined as

S(x_i) = arg max_{s ∈ E∖{α}} S_new(x_i, x_s),

where E∖{α} is the potential exemplar list except for α.
Apparently, this optimization method must consider two cases, namely whether x_α is in the exemplar list or not, as shown in Figures 3 and 4. Specifically, Figure 3 illustrates the case where x_α is an exemplar, while Figure 4 shows the case where x_α is not. Remember that S(x_i) only comes into play when x_α is a potential exemplar. We use the notion of "energy reduction" because this method was first used to optimize Markov random field (MRF) energy functions.
In the situation shown in Figure 3, either every x_i ∈ X_α changes its exemplar to S(x_i), or nothing changes. Therefore, the energy reduction R1 is the larger of zero and R1_α, where R1_α denotes the energy reduction obtained when every x_i ∈ X_α changes its exemplar to S(x_i). On the other hand, as shown in Figure 4, a new exemplar x_α has to be considered. Whether to accept the new exemplar is decided by the energy reduction R2, discussed next. First, assume the new exemplar x_α is accepted; the following procedure is then similar to that of Figure 3. Specifically, each remaining data point changes its exemplar to either x_α or S(x_i). For the data in cluster e ∈ E, theoretical analysis shows that only when the exemplar x_e itself changes its exemplar does every x_i ∈ X_e change its exemplar to S(x_i); the corresponding energy reduction is denoted R2^e. Otherwise, some data in cluster e may change their exemplar to x_α; we denote these data by X^{/e}_{e,α} and the corresponding energy reduction by R2_e. The overall energy reduction R2 then combines these terms over all clusters.
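The bookkeeping above reduces to one principle: evaluate the target energy of a candidate relabeling and accept the move only when the reduction is positive. The sketch below is a simplified stand-in for that acceptance test, not the paper's exact R1/R2 formulas; all names and the preference term are our own assumptions.

```python
# Simplified sketch of move acceptance in an exemplar-based energy model:
# energy = -(total similarity to assigned exemplars) + cost per exemplar.
# A move (new assignment + new exemplar set) is accepted only if it lowers
# the energy, i.e. the "energy reduction" is positive. Illustrative only.

def energy(similarity, assignment, exemplars, preference):
    """Smaller is better: negative similarity to assigned exemplars plus a
    preference cost for each exemplar in use."""
    data_term = -sum(similarity[i][e] for i, e in enumerate(assignment))
    return data_term + preference * len(exemplars)

def accept_move(similarity, old_assign, old_ex, new_assign, new_ex, pref):
    old_e = energy(similarity, old_assign, old_ex, pref)
    new_e = energy(similarity, new_assign, new_ex, pref)
    return new_e < old_e  # accept only a strictly positive energy reduction
```

For example, promoting a far-away point to be its own exemplar is accepted when the similarity gained outweighs the extra exemplar cost.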

(Figure 2: the compression stage turns the exemplar list into the potential exemplar list; the optimization stage selects the final exemplars.)
In sum, the new target function equation (12) is optimized, and the optimal exemplar list for the EEG signals is generated.

Time Complexity and Description.
The similarity relationship is measured by the Euclidean distance between data points, denoted d(x_i, x_j) in this study. The proposed FEEC algorithm consists of two stages, namely the compression stage and the optimization stage. After compression, the scale of the similarity matrix is reduced from N^2 to Nc, so the optimization stage has a time complexity of O(c^2). Therefore, the overall complexity of FEEC is considerably lower than applying an exemplar-based model to the full similarity matrix directly.
Based on the theoretical analysis above, the proposed FEEC for incomplete data can be summarized as Algorithm 1.

Experimental Study
To comprehensively evaluate the proposed FEEC algorithm, we conducted several experiments on both synthetic and real datasets. We compare our new model with the basic exemplar-based clustering models, namely AP and EEM; to present these experimental results, we adopt three performance indices in this section. In our experiments, all the algorithms were implemented in Matlab 2010a on a PC with 64-bit Microsoft Windows 10, an Intel(R) Core(TM) i7-4712MQ, and 8 GB memory.

Data Preparation.
We choose Aggregation [32], as shown in Figure 5, and the Bonn EEG signal dataset in this section. The Bonn dataset [9,10] is from the University of Bonn, Germany (http://epileptologie-bonn.de/cms/upload/workgroup/lehnertz/eegdata.html). The EEG dataset contains five groups (A to E), and each group contains 100 single-channel EEG segments of 23.6 s duration. The sampling rate of all the recordings was 173.6 Hz. Figure 6 shows five healthy and epileptic EEG signals, and Table 2 lists detailed descriptions of these signals. Table 3 shows a brief description of these datasets. To construct the incomplete data scenario, we randomly choose 80% of the data as complete data and the remaining 20% as incomplete data. We utilize KPCA to extract features from the EEG signals in this section.
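The 80%/20% scenario construction can be sketched as below. This is an illustrative stand-in under our own assumptions (function name, seed, and masking one feature with None to represent "incompleteness" are not from the paper):

```python
# Sketch of the incomplete-data scenario: randomly mark 20% of samples as
# incomplete by masking one feature value (None), keep the rest complete.

import random

def split_complete_incomplete(samples, incomplete_ratio=0.2, seed=0):
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_inc = int(len(samples) * incomplete_ratio)
    incomplete = set(idx[:n_inc])
    # Complete portion X_c keeps all features intact.
    x_c = [samples[i] for i in range(len(samples)) if i not in incomplete]
    # Incomplete portion X_l: drop one randomly chosen feature per sample.
    x_l = []
    for i in incomplete:
        row = list(samples[i])
        row[rng.randrange(len(row))] = None
        x_l.append(row)
    return x_c, x_l
```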

Performance Indices.
Here, we give the definitions of the three adopted performance indices: ENERGY, NMI, and accuracy. Following the descriptions in [12,16,30,33,34], we refer to the result output by the involved models as clusters and to the true labels as classes.

ENERGY.
Since all the mentioned clustering algorithms optimize energy functions of the same type, we can compare them in terms of their energy values, defined as

ENERGY = Σ_k Σ_i d(x_k, x_{k,i}),

where x_k denotes the kth exemplar, x_{k,i} is the ith data point in the kth cluster, and d(x_k, x_{k,i}) is the Euclidean distance between x_k and x_{k,i}, which can be seen as a measurement of energy.
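The ENERGY index follows directly from its definition; a minimal sketch (function name and input layout are our own):

```python
# ENERGY index: sum of Euclidean distances between each exemplar and the
# members of its cluster. Smaller is better.

from math import dist  # Python 3.8+

def energy_index(clusters):
    """clusters: list of (exemplar, members) pairs, points as tuples."""
    return sum(dist(exemplar, x)
               for exemplar, members in clusters
               for x in members)
```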

NMI.
NMI has been widely used to evaluate clustering quality as well, and its value can be calculated as

NMI = Σ_{i,j} N_{i,j} log(N · N_{i,j} / (N_i N_j)) / sqrt((Σ_i N_i log(N_i / N)) (Σ_j N_j log(N_j / N))),

where N_{i,j} is the number of data points shared by the ith cluster and the jth class, N_i is the number of data points in the ith cluster, N_j is the number of data points in the jth class, and N is the total number of data points.
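A compact sketch of NMI computed from these counts (written here in the equivalent per-point form; the helper name is ours):

```python
# NMI from cluster/class label lists: mutual information of the two
# labelings normalized by the geometric mean of their entropies.

from math import log, sqrt
from collections import Counter

def nmi(cluster_labels, class_labels):
    n = len(cluster_labels)
    joint = Counter(zip(cluster_labels, class_labels))  # N_ij counts
    ni = Counter(cluster_labels)                        # N_i cluster sizes
    nj = Counter(class_labels)                          # N_j class sizes
    mutual = sum(nij / n * log(n * nij / (ni[i] * nj[j]))
                 for (i, j), nij in joint.items())
    hi = -sum(c / n * log(c / n) for c in ni.values())  # cluster entropy
    hj = -sum(c / n * log(c / n) for c in nj.values())  # class entropy
    return mutual / sqrt(hi * hj) if hi > 0 and hj > 0 else 0.0
```

A perfect clustering scores 1 even when the cluster indices are permuted relative to the class labels, which is exactly why NMI suits unsupervised evaluation.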

Accuracy.
Accuracy (Acc) is a more direct measure of the effectiveness of clustering algorithms, defined as

Acc = (1/N) Σ_{i=1}^{N} δ(c_i, map(ĉ_i)),

where c_i is the true label of the ith data point and ĉ_i is its obtained cluster label; δ(i, j) = 1 if i = j and δ(i, j) = 0 otherwise. The function map(·) maps each obtained cluster to a real class, and the optimal mapping can be found with the Hungarian algorithm. The values of NMI and Acc range from 0 to 1, and the closer they are to 1, the more effective the clustering algorithm is. It is worth mentioning that values are reported as percentages in the following tables for better precision. As to the performance index ENERGY, the smaller the value, the better the clustering algorithm.
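For the small cluster counts in these experiments, the optimal map(·) can be found by brute force instead of the Hungarian algorithm; the sketch below (our own stand-in, not the paper's implementation) enumerates one-to-one cluster-to-class mappings:

```python
# Clustering accuracy: map each cluster to a class so that the number of
# matches is maximized, then count agreements. Brute-force search over
# one-to-one mappings; adequate only for small numbers of clusters.

from itertools import permutations

def clustering_accuracy(cluster_labels, class_labels):
    clusters = sorted(set(cluster_labels))
    classes = sorted(set(class_labels))
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(1 for cl, tr in zip(cluster_labels, class_labels)
                   if mapping[cl] == tr)
        best = max(best, hits)
    return best / len(class_labels)
```

With k clusters and k classes this search is O(k!), which is why the Hungarian algorithm (O(k^3)) is used in practice when k grows.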

Experimental Results and Discussion.
The parameters involved in FAP, AP, and EEM follow [13,17,29]. The preference s(i, i) is set to the median of the similarities between data points. We ran each algorithm 10 times with the same parameters; the average results are shown in Table 4. Moreover, detailed comparisons in terms of the three indices NMI, accuracy, and ENERGY are shown in Figures 7-12 and Table 4, respectively.
By analyzing Figures 7-11 and Table 4 in detail, we can draw the following conclusions.
(Algorithm 1: FEEC. Input: incomplete data X = [X_c, X_l] ∈ R^{N×D}. Output: valid exemplar set E.)

Conclusions
The diagnosis and treatment of epilepsy has always been a significant direction for both machine learning and brain science.
This paper proposes a fast enhanced exemplar-based clustering method for incomplete EEG signals. The FEEC method includes two stages, namely compression and optimization. The performance of the proposed clustering algorithm is comprehensively verified by experiments on two datasets.
Although most epilepsy recognition methods are currently based on EEG signals, researchers are also studying other neuroimaging modalities, such as cortical electroencephalography (ECoG), functional near-infrared optical imaging (fNIR), functional magnetic resonance imaging (fMRI), positron emission tomography (PET), and magnetoencephalography (MEG). Considering that brain activity is a nonlinear, networked, and unstable complex system, we will focus on multimodal clustering models for these neuroimaging signals in the future.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.