A Novel Transfer Enhanced α-Expansion Move Learning Model for EEG Signals

In this paper, we focus on recognizing epileptic seizures from scant EEG signals and propose a novel transfer enhanced α-expansion move (TrEEM) learning model. This framework implants transfer learning into the exemplar-based clustering model to improve the utilization rate of EEG signals. Starting from Bayesian probability theory and leveraging the Kullback-Leibler distance, we measure the similarity relationship between source and target data. Furthermore, we embed this relationship into the calculation of the similarity matrix involved in the exemplar-based clustering model. We then derive a new objective function and study the resulting TrEEM scheme in detail. We optimize the proposed TrEEM model by borrowing the mechanism utilized in EEM. Experiments on synthetic and real-world EEG datasets show that, in contrast to other machine learning models, the performance of the proposed TrEEM is very promising.


Introduction
Epilepsy is a chronic disease caused by the sudden abnormal discharge of brain neurons, resulting in transient brain dysfunction. Usually patients themselves have no clear memory of the seizure process. For this reason, doctors can only diagnose the patient's condition according to accounts from family members or other people present during past seizures. However, the accuracy of this manual diagnosis method is low. The pathogenesis of epilepsy is mainly manifested by abnormal neural discharge and abnormal brain waves. Although medical imaging, such as computed tomography (CT), magnetic resonance imaging (MRI), functional magnetic resonance imaging (fMRI), single-photon emission computed tomography (SPECT), and positron emission tomography (PET), has made great progress over the years, the major diagnostic method for epilepsy is still based on the electroencephalogram (EEG). More specifically, PET and fMRI cannot be used as routine tools because of their technical requirements and costs. In addition to its high cost, MRI cannot judge nonstructural lesions either. Invasive electrocorticography (ECoG) requires craniotomy and implantation of electrodes, which carries high risk, whereas noninvasive EEG and MEG can provide functional and structural detection. Taking all this into account, EEG has attracted wide attention in theoretical research and clinical practice because of its low cost, convenient signal acquisition, and noninvasiveness. Diagnosing epilepsy through EEG signals has been a hot topic in related fields; compared with manual diagnosis, machine learning methods are less time-consuming and more accurate [1][2][3][4][5][6][7][8]. Numerous machine learning models have been used to recognize epileptic EEG signals, such as support vector learning [9,10], fuzzy systems [1,3], naïve Bayes [11], and exemplar-based clustering models [2,12,13].
The traditional machine learning pipeline is usually divided into the three following steps, as shown in Figure 1: (1) EEG signal preprocessing improves the signal-to-noise ratio and provides a high-quality input signal for spike detection. (2) According to the characteristics of spikes, handcrafted features reduce the signal dimension and highlight the difference between spikes and background signals. (3) Based on the obtained features, spike signals are detected by the machine learning mechanism involved.
In summary, one of the significant issues in processing EEG signals with machine learning techniques is insufficient training data. We briefly introduce some mechanisms for epileptic diagnosis through EEG signals here. Jiang [1] integrated transductive transfer learning, semisupervised learning, and the Takagi-Sugeno-Kang (TSK) fuzzy system to take full advantage of scant training data. Zhu [5] proposed dic-mv-fcm, which automatically evaluates the importance and weight of each view and then performs weighted multiview fuzzy clustering based on the FCM framework to achieve an accurate fuzzy partition. Bi [2] proposed a model called FEEM for incomplete EEG signals, which first compresses the potential exemplar list and thus reduces the scale of the pairwise similarity matrix. However, much work remains to make better use of training data, and we focus on this issue in this paper as well. Specifically, this paper aims at recognizing epileptic seizures from scant EEG signals.
Transfer learning is believed to be an effective strategy to solve problems caused by insufficient training data [1,5,13,14]. Assume that there are two datasets from similar sources: one has plenty of features and details and is easy to learn, while the other lacks details and is hard to learn. Transfer learning offers the idea of leveraging the description of the former to study the latter. The sufficient, well-described data is called source data, while the insufficient, rough data is called target data. Accordingly, transfer learning utilizes source data to improve the learning result on target data. Under this framework, effectively measuring the relationship between source and target data is an important step and has a great influence on the efficiency of the resulting model. Thus, starting from Bayesian probability theory, this paper first extends the concept of the similarity matrix in the exemplar-based clustering model; this strategy also broadens the application range of the algorithm to transfer learning scenarios. By leveraging the Kullback-Leibler distance, we propose a new transfer enhanced α-expansion move learning model called TrEEM. The detailed contributions of this paper are as follows: (i) According to transfer learning theory [1,5,13,14], considering the similarity between source and target data, the proposed TrEEM model should keep the target data close enough to the source data. Theoretically supported by information theory and based on the Bayesian probability framework, TrEEM utilizes the KL distance to measure the similarity between source and target data and minimizes this KL distance in the optimization process. (ii) In the scenario of recognizing epileptic seizures, we aim at diagnosing actual patients. As TrEEM is built on graph theory and a pairwise similarity matrix and is an exemplar-based clustering model, it selects exemplars from actual data.
This advantage fits the requirements of the relevant scenario here. (iii) TrEEM embeds the KL distance between target and source data into the calculation of the similarity matrix. Thus, the optimization mechanism utilized in EEM can be directly used to solve the new target function of TrEEM. In detail, we leverage the α-expansion move optimization algorithm, which performs better than the LBP algorithm [15,16]. The paper is organized as follows. Related works are discussed in Section 2. We illustrate the target function and optimization mechanism of the proposed TrEEM in Section 3. The simulation experimental results and analysis are shown in Section 4. We conclude in Section 5.

Related Works
Many researchers are committed to using machine learning technology to classify EEG signals, including SVM, fuzzy systems, naïve Bayes, and exemplar-based clustering models. In this section, we illustrate two popular learning frameworks, namely, Enhanced α-Expansion Move (EEM) and the TSK fuzzy system. EEM is a widely used exemplar-based learning model, and the TSK fuzzy system is a typical fuzzy-rule-based clustering model.

Enhanced α-Expansion Move. Consider a dataset X = {x_1, x_2, . . . , x_N} ∈ R^{N×D}, where N is the total number of D-dimensional data points. E is the output, whose element E(x_i) (abbreviated E(i)) refers to the exemplar for each x_i. The target function of a typical exemplar-based clustering model is defined as follows [12,15]:

min_E J(E) = −Σ_{i=1}^{N} S(i, E(i)) + Σ_{p,q} θ_{p,q}(E(p), E(q)),  (1)

where S is the similarity matrix of the dataset, with elements defined as S(i, j) = −‖x_i − x_j‖², and θ_{p,q}(E(p), E(q)) is the consistency penalty

θ_{p,q}(E(p), E(q)) = ∞ if E(p) = q but E(q) ≠ q, and 0 otherwise.  (2)

In [15], the authors regard the above target function as the energy function of a Markov random field (MRF) and verify Theorem 1, which states that the pairwise term satisfies the regularity condition, equation (3), required by graph-cuts. Once equation (3) is verified, the graph-theory-based framework can be used to optimize the target function of the exemplar-based clustering model in equation (1).
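To make the objective concrete, here is a minimal Python sketch (illustrative names, not the authors' code) of the similarity matrix S(i, j) = −‖x_i − x_j‖² and the energy of equation (1), with the pairwise term θ enforcing that a chosen exemplar must be its own exemplar:

```python
import numpy as np

def similarity_matrix(X, self_sim):
    """Pairwise similarity S(i, j) = -||x_i - x_j||^2, with S(i, i) = self_sim."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = -d2
    np.fill_diagonal(S, self_sim)
    return S

def clustering_energy(S, E):
    """Energy of an exemplar assignment E (E[i] = index of x_i's exemplar).

    Returns +inf when the consistency constraint is violated, i.e. some
    point chooses q as its exemplar while q itself is not an exemplar;
    this is the role of the pairwise term theta_{p,q}.
    """
    for p, q in enumerate(E):
        if E[q] != q:          # theta_{p,q} = infinity
            return float("inf")
    return -sum(S[i, E[i]] for i in range(len(E)))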
The Enhanced α-Expansion Move (EEM) framework optimizes the above target function by an improved algorithm [15]. In more detail, theoretically supported by Theorem 1 and the graph-cuts algorithm [16], EEM expands the active region of a candidate exemplar from a single data point to the whole dataset. EEM defines the second optimal candidate exemplar S(i) for x_i as follows, selected from the whole dataset as mentioned above:

S(i) = arg max_{s ∈ E∖{l}} S(i, s),
where X_l = {x_i | E(i) = l} is the set of data whose exemplar is l, and s ∈ E∖{l} represents the other exemplars in E except for l. In this way, the optimization mechanism behaves more rapidly and effectively. The EEM algorithm is one of the most popular exemplar-based clustering models, and it performs effectively and stably in the numerous simulation experiments involved [2,12,13,14,15]. Researchers have applied this model to data streams, constrained supervised learning, and EEG signal processing.

TSK Fuzzy System. The TSK fuzzy system is a rule-based system and is widely used as a typical fuzzy model for both classification and clustering. Generally, the kth of K fuzzy rules can be described as

R^k: IF x_1 is A_1^k ∧ x_2 is A_2^k ∧ · · · ∧ x_D is A_D^k, THEN f^k(x) = p_0^k + p_1^k x_1 + · · · + p_D^k x_D,
where A_i^k is the fuzzy set subscribed by the input x_i for the kth fuzzy rule and ∧ is a fuzzy conjunction operator. Each rule maps the input x = (x_1, x_2, . . . , x_D) to a singleton f^k(x). Thus, the output of the TSK fuzzy system is defined as

y(x) = Σ_{k=1}^{K} μ^k(x) f^k(x) / Σ_{k=1}^{K} μ^k(x),  (6)

where the firing strength of the kth rule is μ^k(x) = Π_{i=1}^{D} μ_{A_i^k}(x_i), and μ_{A_i^k}(x_i) is the membership grade obtained using a Gaussian membership function; the other involved parameters can be estimated using clustering techniques and other partition methods [1,3,4,5].
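The rule firing and defuzzification described above can be sketched as follows; this is a generic first-order TSK evaluator under the Gaussian-membership assumption, with all parameter names illustrative:

```python
import numpy as np

def tsk_output(x, centers, sigmas, p0, p):
    """Output of a first-order TSK fuzzy system with K Gaussian rules.

    centers, sigmas: (K, D) antecedent parameters (one Gaussian set A_i^k
    per input dimension); p0: (K,), p: (K, D) consequent parameters so that
    f^k(x) = p0[k] + p[k] . x.  All names here are illustrative.
    """
    x = np.asarray(x, dtype=float)
    # firing strength mu^k(x): fuzzy conjunction as a product of memberships
    mu = np.exp(-((x - centers) ** 2) / (2.0 * sigmas ** 2)).prod(axis=1)
    f = p0 + p @ x                           # rule consequents f^k(x)
    return float((mu * f).sum() / mu.sum())  # normalized weighted sum
```

With a single rule, the normalization cancels the firing strength and the output is just that rule's consequent; with several rules, the output interpolates between consequents according to membership.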
Accordingly, based on the relevant theory of the TSK fuzzy system, the target model in equation (6) converts to a parameter learning process of the corresponding linear regression model. In line with recent achievements, the TSK fuzzy model has strong interpretability and robustness. For this reason, it is widely used in numerous intelligent medical diagnosis systems, including recognizing epileptic seizures from EEG signals.
In this section, we briefly introduced two popular machine learning frameworks used in the recognition of EEG signals, namely, EEM and the TSK fuzzy system. Detailed descriptions are shown in Table 1. Considering the scenario of distinguishing epileptic patients from healthy subjects based on their EEG signals, we focus on the EEM clustering model in the rest of this paper.

Transfer Enhanced α-Expansion Move Learning Model
In this section, we first analyze the theoretical basis of TrEEM from the Bayesian probabilistic framework. Second, we derive the novel TrEEM algorithm in detail. Then, following the optimization algorithm utilized in EEM, we optimize the target function as well. The overall structure of this novel model is shown in Figure 2. As Figure 2 shows, on the basis of the source-data-based exemplar set and starting from the Bayesian probability framework, TrEEM first embeds the distance between source and target data, measured by the Kullback-Leibler distance, into the calculation of the similarity matrix.
Then we derive the novel target function for TrEEM. Finally, TrEEM directly calls the optimization algorithm in EEM to solve this model and obtain the target-data-based exemplar set.
Besides, we list the frequently used notations in Table 2.

Theoretical Preliminary of the TrEEM Scheme.
As mentioned before, transfer learning considers two datasets from similar sources, namely, source data and target data, and the relationship between them is a significant factor in this model (see Table 2). In the following, we denote the sufficient, well-described source data by X̄ = {x̄_1, x̄_2, . . . , x̄_{N_s}}.
After study, we obtain the source-data-based exemplar set denoted L_s in Table 2. The insufficient target data is denoted X = {x_1, x_2, . . . , x_N}. Moreover, the probabilistic framework contributes to measuring this relationship as well. Therefore, supported by the Gaussian probability hypothesis and the exemplar-based clustering mechanism, we build the pairwise probabilistic relationship of the target data by leveraging the corresponding similarity as follows:

p(x_i, x_{E(i)}) = (1/(√(2π)σ)) exp( s(x_i, x_{E(i)}) / (2σ²) ),  (8)

where s(x_i, x_{E(i)}) is the similarity between x_i and its current exemplar x_{E(i)}, and the parameter σ is the standard deviation from the Gaussian probability hypothesis.
As to the exemplar set, we should exclude the situation in which an exemplar appoints another member of the current exemplar set as its own exemplar. Consequently, the Bayesian probability of an exemplar set is defined as follows:

p(E) ∝ exp( −Σ_{m,n} θ_{m,n}(E(m), E(n)) ),  (9)

where θ_{m,n}(E(m), E(n)) is the same as the definition in equation (2). Accordingly, under the Bayesian probabilistic framework and the discussion of the EEM algorithm in Section 2, the objective function in equation (1) is equivalent to maximizing the posterior

max_E Π_{i=1}^{N} p(x_i, x_{E(i)}) · p(E).  (10)

In conclusion, equation (10) gives another form of the target function of EEM by introducing the Bayesian probabilistic framework and the Gaussian probability hypothesis. Starting from this target function, we design TrEEM for the recognition of epileptic EEG signals in the next subsections.

Design of TrEEM Scheme.
According to information theory, the Kullback-Leibler distance (KL distance) is a natural distance between two probability distributions and has been widely applied to numerous problems [17][18][19]. The definition of the KL distance is given below.

Definition 1. Consider two probability distributions P and Q over the input data X = {x_1, x_2, . . . , x_N}; the KL distance from P to Q is

D_KL(P‖Q) = Σ_{i=1}^{N} P(x_i) log( P(x_i) / Q(x_i) ).  (11)

Table 2: Frequently used notations.
Notation — Description
S — Pairwise similarity matrix
L_s — Source-data-based exemplar set

It is worth mentioning that the KL distance is an asymmetric measurement; namely, D_KL(P‖Q) ≠ D_KL(Q‖P) in general, according to Definition 1.
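A small Python sketch of Definition 1, illustrating both the computation and the asymmetry noted above (names illustrative):

```python
import numpy as np

def kl_distance(P, Q):
    """D_KL(P || Q) = sum_i P(x_i) * log(P(x_i) / Q(x_i))."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    mask = P > 0                      # the term 0 * log(0 / q) is taken as 0
    return float((P[mask] * np.log(P[mask] / Q[mask])).sum())

P = [0.5, 0.5]
Q = [0.9, 0.1]
# Swapping the arguments gives a different value: the KL distance is asymmetric.
```

The distance is zero only when the two distributions coincide and grows as Q places little mass where P places much.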
Furthermore, given L_s as a possible exemplar set, L_s(i) is the best exemplar for x_i among the current exemplar set L_s. As discussed above, we also define L_s(i) under the Bayesian probabilistic framework as follows:

L_s(i) = arg max_{l ∈ L_s} p(x_i, x̄_l),  (12)

where p(x_i, x̄_l) is obtained from equation (8). In transfer learning, two datasets are actually involved, that is, source data and target data. In equation (12), note that x_i is from the target data, while x̄_l is from the possible exemplar set, namely, from the source data. Thus, as shown in Table 2, to make the distinction clear, the symbol x̄_i represents the source data, while x_i stands for the target data in the rest of this paper.
Although the target data is not exactly the same as the source data, according to the theoretical analyses of transfer learning, the source-data-based learning model and results should contribute to the learning of the new target data as well [3,4,20,21,22]. Otherwise, it becomes negative transfer learning, which is not under discussion in this paper. Accordingly, we believe the target-data-based exemplar set to be evaluated should be similar to the source-data-based exemplar set. In this section, we measure the difference between the target-data-based and source-data-based exemplar sets by the aforementioned KL distance in Definition 1. To be specific, in the process of designing the TrEEM learning model, we minimize the difference between the target exemplar set E and the source exemplar set L_s by controlling the KL distance between them. The structure of TrEEM is shown in Figure 2 in detail. In view of this goal, on the basis of the probabilistic target function of EEM in equation (10), we build the novel target function for the proposed TrEEM model as follows:

max_E Π_{i=1}^{N} p(x_i, x_{E(i)}) · p(E) · exp( −λ D_KL(E‖L_s) ),  (13)

where E is the target-data-based exemplar set to be obtained and L_s represents the source-data-based exemplar set, as shown in Table 2, and λ is the regularization parameter. In terms of the maximum a posteriori (MAP) principle and combining Definition 1 and equation (12), equation (13) becomes

min_E −Σ_{i=1}^{N} ln p(x_i, x_{E(i)}) − ln p(E) + λ Σ_{i=1}^{N} p(x_i, x_{E(i)}) ln( p(x_i, x_{E(i)}) / p(x_i, x̄_{L_s(i)}) ).  (14)

Observing equation (14), we find that the second and third terms are of the same magnitude; hence, the value of λ need not be large, and the specific determination strategy will be discussed in Section 4.
Introducing the definitions of p(x_i, x_{E(i)}), p(x_i, x̄_{L_s(i)}), and p(E) in equations (8) and (9) and discarding the constant terms, equation (14) can be simplified into

min_E Σ_{i=1}^{N} [ ‖x_i − x_{E(i)}‖² + λ‖x_{E(i)} − x̄_{L_s(i)}‖² ] + Σ_{m,n} θ_{m,n}(E(m), E(n)).  (15)

Comparing equations (15) and (10), we conclude that they are similar in structure. According to Theorem 1 in Section 2, TrEEM's target function can also be solved by the graph-cuts mechanism. Consequently, we discuss the optimization mechanism step by step in the next subsection.

Optimization Mechanism of the TrEEM Scheme.
As mentioned before, the novel target function in equation (15) is similar to that of the EEM algorithm under the Bayesian probabilistic framework, so the optimization mechanism utilized in the EEM algorithm should be helpful in solving it. However, we first need to deal with the difference between these two models.
In detail, we redefine the similarity relationship of the target data by embedding the source-data-based exemplar set L_s.
Specifically, we single out the suitable exemplar from L_s for each target sample x_i by equation (12) and build the new pairwise transfer similarity matrix S_t = (s_t(x_i, x_j)) according to the new measurement

s_t(x_i, x_j) = −d(x_i, x_j) − λ d(x_j, x̄_{L_s(x_i)}),  (16)

where d(x_i, x_j) = ‖x_i − x_j‖² is the squared Euclidean distance between samples x_i and x_j, λ is the regularization parameter, and x̄_{L_s(x_i)} refers to the exemplar singled out from the source data. By introducing this new definition of the similarity relationship, the target function of TrEEM in equation (15) becomes equal to equation (10) in structure. Meanwhile, the constraint condition in Theorem 1 holds for the TrEEM model as well. Therefore, the optimization mechanism of the EEM algorithm is also suitable for the proposed TrEEM model; the novel model is described in detail in Algorithm 1.
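Since the exact algebraic form of equation (16) is described mostly in prose here, the following sketch assumes one plausible reading: the plain similarity −d(x_i, x_j) is discounted by a λ-weighted penalty on the distance between the candidate exemplar x_j and the source exemplar chosen for x_i. All names are illustrative:

```python
import numpy as np

def transfer_similarity(X_t, X_s_exemplars, ls_idx, lam):
    """Pairwise transfer similarity s_t(x_i, x_j) under the assumed reading
    of equation (16): s_t(x_i, x_j) = -||x_i - x_j||^2
                                      - lam * ||x_j - src_i||^2,
    where src_i is the source exemplar selected for x_i (ls_idx[i] indexes
    into X_s_exemplars).  Names and the penalty form are illustrative.
    """
    N = len(X_t)
    S_t = np.empty((N, N))
    for i in range(N):
        src = X_s_exemplars[ls_idx[i]]              # x̄_{L_s(x_i)}
        for j in range(N):
            d_ij = np.sum((X_t[i] - X_t[j]) ** 2)   # target-target distance
            d_src = np.sum((X_t[j] - src) ** 2)     # candidate-source distance
            S_t[i, j] = -d_ij - lam * d_src
    return S_t
```

Setting lam = 0 recovers the ordinary EEM similarity, matching the degeneration to EEM discussed in Section 4.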
EEM utilizes the α-expansion move to optimize its learning model. As discussed above, the mechanism is also suitable for the proposed TrEEM model. We analyze this Enhanced α-Expansion Move optimization mechanism step by step here. First, as the target functions in equations (15) and (10) can both be defined as energy functions of an MRF, we consider this optimization process as an energy reduction process of the MRF. In general, we start from the change in energy to decide whether to accept a new exemplar for a sample. Second, the improved optimization mechanism is designed to broaden the effective field when changing a sample's exemplar. That is to say, if a sample's current exemplar is abandoned, it searches all the remaining exemplars for a new one. This new alternative exemplar is defined as follows:

A(i) = arg max_{a ∈ E∖{l}} s_t(x_i, x_a),  (17)

where l is the original exemplar of x_i, E is the current exemplar set, and a ∈ E∖{l} ranges over the candidate alternative exemplars. By introducing this alternative exemplar A(i) for x_i, we enhance the optimization efficiency. Note that the TrEEM model redefines the similarity matrix as in equation (16), so the following discussion is based on the similarity matrix S_t = (s_t(x_i, x_j)). Apparently, the optimization mechanism must consider two cases, namely, whether x_l is among the current exemplar set or not. We analyze these two cases step by step in the next subsections.
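The alternative-exemplar selection of equation (17) amounts to an argmax over the current exemplars, excluding the abandoned one; a minimal sketch (illustrative, with S_t as a precomputed row-indexable matrix):

```python
def alternative_exemplar(S_t, i, exemplars, l):
    """A(i): best replacement exemplar for x_i among the current exemplar
    set E, excluding its original exemplar l (a sketch of equation (17))."""
    candidates = [a for a in exemplars if a != l]
    # pick the candidate with the highest transfer similarity to x_i
    return max(candidates, key=lambda a: S_t[i][a])
```

For instance, if x_0's similarities to exemplars 2 and 3 are −3 and −2 and its abandoned exemplar is 1, the function returns 3.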

Case I. x_l is a current exemplar.
Obviously, in the process of optimization, the current exemplar x_l may be abandoned. As previously analyzed, whether to keep x_l in the ultimate exemplar set is decided by the reduction in the energy function calculated by the target function in equation (15).
Specifically, if x_l is kept as an exemplar, the energy of the model remains unchanged, and the reduction value is 0. Otherwise, if x_l is abandoned, all samples whose exemplar is l must redetermine their exemplars; these samples are defined as X_l = {x_i | E(i) = l}. Theoretically supported by the related analysis in [2,12,14,15], the new exemplar for x_i ∈ X_l would be A(i), as shown in equation (17). Thus, the energy reduction R_in^l is computed by the following equation:

R_in^l = Σ_{x_i ∈ X_l} [ s_t(x_i, x_{A(i)}) − s_t(x_i, x_l) ].  (18)

Then, we take the greater of 0 and R_in^l as the ultimate energy reduction for this case, as defined in the following equation:

R_in = max(0, R_in^l).  (19)

Namely, if R_in^l is the ultimate energy reduction, the samples x_i ∈ X_l change their exemplars to A(i); otherwise, the current exemplar set is convincing and remains unchanged.

Algorithm 1: The TrEEM learning model.
Input: Target dataset X = {x_1, x_2, . . . , x_N} ∈ R^{N×D}, source data X̄ = {x̄_1, x̄_2, . . . , x̄_{N_s}} ∈ R^{N_s×D}, source-data-based exemplar set L_s, self-similarity d(x_i, x_i), regularization factor λ, σ.
Output: Valid target-data-based exemplar set E(N).
(1) for each x_i ∈ X do
(2)   single out the nearest exemplar L_s(i) for x_i from the source-data-based exemplar set L_s based on equation (12);
(3)   compute the probabilistic similarity p(x_i, x̄_{L_s(i)}) between x_i and L_s(i);
(4) end
(5) for each x_i ∈ X do
(6)   calculate the transfer similarity matrix S_t = (s_t(x_i, x_j)) with the new probabilistic similarity p(x_i, x̄_{L_s(i)}) according to equation (16);
(7) end
(8) randomly generate the expansion order o; let t = 1;
(9) for each candidate exemplar l in order o do
(10)   compute R_out, R_out^e, R_out^l by equations (20), (21), and (22);
(11)   if R_out^l > R_out^e then
(12)     for ∀x_i ∈ X_{e,l}, set E(x_i) = l
(13)   else
(14)     for ∀x_i ∈ X_e, set E(x_i) = A(i)
(15)   end
(16)   if R_out > 0 then
(17)     accept the new exemplar l
(18)   end
(19) end

Case II. x_l is not a current exemplar.
In this case, we denote the current exemplar of x_l by x_e. When optimizing this situation, we first pretend to consider x_l as a new alternative exemplar; namely,

E′ = E ∪ {l}, E′(x_l) = l.

Then, similar to the analysis in Case I, whether to accept x_l as an ultimate exemplar is decided by the reduction in the energy function. In detail, if x_l is accepted as a feasible exemplar, some samples change their exemplar from x_e to x_l. These samples are defined as X_{e,l} = {x_i | E(i) = e, s_t(x_i, x_l) > s_t(x_i, x_e)}. Thus, the corresponding energy reduction is defined as follows:

R_out^l = Σ_{x_i ∈ X_{e,l}} [ s_t(x_i, x_l) − s_t(x_i, x_e) ].  (20)

On the other hand, the current exemplar set E′ may not be convincing, so all samples, including x_l, would be certain to redetermine their exemplars. As discussed before, the new exemplars for these samples are defined by equation (17), and the resulting energy reduction is as follows:

R_out^e = Σ_{x_i ∈ X_e} [ s_t(x_i, x_{A(i)}) − s_t(x_i, x_e) ].  (21)

Remember that equations (20) and (21) are based on the assumption that E′ = E ∪ {l}, E′(x_l) = l. Considering this, the energy reduction caused by x_l when it is not a current exemplar should be

R_out = max( R_out^l, R_out^e ).  (22)

To sum up, the optimization mechanism is shown in Algorithm 1 in detail.

Model Complexity.
The similarity matrix is calculated according to the Euclidean distance, s(x_i, x_j) = −‖x_i − x_j‖², so the scale of the similarity matrix is N²; note that the amount of target data is not large. In the optimization process, we directly utilize the α-expansion move, which has O(N²) time complexity. For the proposed TrEEM, the source-data-based exemplar set is actually one of the inputs and is outside the scope of the time complexity analysis of TrEEM here. Although we adopt EEM to obtain the source-data-based exemplar set L_s, many other clustering models could be used instead. TrEEM needs to select L_s(i) from the source-data-based exemplar set in the first step, and this procedure has a time complexity of O(N). In summary, the overall time complexity of TrEEM is O(N²), which is very acceptable compared with other state-of-the-art transfer learning frameworks.

Experimental Results
To comprehensively evaluate the TrEEM model, we have conducted several experiments on both synthetic and real-world datasets. For comparison, we also test other machine learning mechanisms, namely, EEM [15], multiclass SVM [23], the TSK fuzzy system [24], and TSC [25]. In this section, we carefully analyze these experimental results.

Preparation.
Before feeding data into the TrEEM model, we need to preprocess the original nonstationary EEG signals [1][2][3]. Usually, the features of EEG signals include time-domain, frequency-domain, and time-frequency features. In short, time-domain analysis examines statistical component features of the original EEG signals [26]. In frequency-domain analysis, power spectrum analysis and the Short-Time Fourier Transform (STFT) [27,28] are commonly used. In time-frequency analysis, time-domain and frequency-domain information is simultaneously extracted from high-dimensional and nonlinear EEG signals.
Various methods have been commonly used to extract EEG signals' features, including wavelet [29,30], KPCA (Kernel Principal Component Analysis) [1,2], and LDA (Linear Discriminant Analysis). In line with the experiments setting in [1][2][3], we use two feature extraction methods in this section, that is, KPCA and wavelet.
Besides, we use both synthetic and real-world datasets in this section. First, we randomly generate 300 two-dimensional data points in 3 classes, shown in Figure 3. Then, we choose the Bonn EEG dataset [1,2] as real-world data.
The Bonn dataset is from the University of Bonn, Germany (http://epileptologie-bonn.de/cms/upload/workgroup/lehnertz/eegdata.html), and has five classes. Each class (A to E) contains 100 single-channel EEG segments of 23.6 s duration. The sampling rate of all the datasets was 173.6 Hz. Each sample has 6 attributes. Table 3 lists a brief description of this dataset.
In addition, we examine the experimental results with two performance indices, namely, Rand Index (RI) [2,31] and Purity. Assume that N is the total number of data points; we give their definitions below. RI is defined as

RI = (f_11 + f_00) / (N(N − 1)/2),

where f_11 is the number of data pairs that are in the same cluster and share the same class, while f_00 is the number of pairs that are in different clusters and have different classes. Purity is defined as

Purity = (1/N) Σ_k max_j |e_k ∩ c_j|,

where E = {e_1, . . . , e_K} is the set of clusters obtained by the learning model and C = {c_1, . . . , c_J} is the set of real classes. All experiments are implemented in Matlab 2010a on a PC with 64-bit Microsoft Windows 10, an Intel(R) Core(TM) i7-4712MQ, and 8 GB memory.
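The two indices can be sketched in a few lines of Python; this follows the standard pairwise definition of RI and the majority-vote definition of Purity (an illustration, not the authors' Matlab code):

```python
from itertools import combinations
from collections import Counter

def rand_index(labels_pred, labels_true):
    """RI = (f11 + f00) / C(N, 2): f11 counts pairs placed together in both
    partitions, f00 counts pairs separated in both."""
    agree = 0
    pairs = list(combinations(range(len(labels_true)), 2))
    for i, j in pairs:
        same_pred = labels_pred[i] == labels_pred[j]
        same_true = labels_true[i] == labels_true[j]
        if same_pred == same_true:   # pair treated consistently
            agree += 1
    return agree / len(pairs)

def purity(labels_pred, labels_true):
    """Each cluster votes for its majority class; Purity is the fraction of
    samples covered by those majorities."""
    total = 0
    for c in set(labels_pred):
        members = [labels_true[i] for i, p in enumerate(labels_pred) if p == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(labels_true)
```

Both indices equal 1.0 for a perfect clustering and decrease as cluster assignments disagree with the true classes.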

Results
Analysis. As mentioned before, the following machine learning methods are involved in this section: EEM, multiclass SVM, TSK-FS, TSC, and the proposed TrEEM algorithm.
There is no need to preset the cluster number in advance for EEM and TrEEM. In fact, this is a huge advantage of all exemplar-based clustering frameworks, whereas the cluster number is an important parameter for TSK-FS. Multiclass SVM and TSC are two typical classification methods. Both EEM and TrEEM need the self-similarity parameter d(x_i, x_i). For multiclass SVM, in line with [23,32], we choose the Gaussian kernel function. In TSK-FS, the number of clusters is usually set equal to the number of fuzzy rules. TSC also needs a preset number of clusters. We follow the parameter setting strategies in the relevant papers here. Besides, 5-fold cross validation is used to search for the optimal parameters; Table 4 lists brief introductions of the involved methods and the parameter searching ranges.
To construct the transfer learning scenario, for both the synthetic and the real-world EEG signal datasets, we randomly choose 80% of the data as source data and the remaining 20% as target data. For statistical analysis, each algorithm is executed 10 times, and we record the average performance and the corresponding standard deviation of RI and Purity. Furthermore, to examine different feature extraction methods for EEG signals, we use both KPCA and wavelet here. The detailed comparison in terms of RI and Purity between the proposed TrEEM model and the benchmark approaches is shown in Table 5.
Observing Table 5, in this experimental setting, especially considering that the Bonn EEG signal dataset has 6 attributes and 5 classes, the performance of the TrEEM model is very promising. TrEEM is capable of extracting useful information from both synthetic and real-world EEG signal datasets. Moreover, compared with the other benchmark machine learning models, the proposed TrEEM approach performs better in terms of RI and Purity in this scenario.
In the experiment procedure, we also find that the self-similarity parameter d(x_i, x_i) has an important influence on the experimental results, especially on the obtained number of clusters. This finding is consistent with other exemplar-based clustering models [2,12,13,14,15], and the parameter selection method is also in line with these models. See Table 4 for the searching range of the scaling factor η: a big η leads to small cluster numbers, while a small η brings in big cluster numbers. To fit the real data labels, in our experiments we set η = 1. The regularization factor λ has a big effect as well. As analyzed before, λ determines how strongly the source data affects the clustering result, and its value should not be too large. Obviously, if λ is too large, the clustering result based on the target data will be very close to that based on the source data, which is not what we want. When λ = 0, TrEEM does not take the source data into account and degrades to the typical EEM framework. In particular, Figures 4-6 show the dependence of the model results on the value of λ. When λ > 0, in terms of RI and Purity, the source data improves the performance of TrEEM. The Purity index is more sensitive to λ, while RI changes slowly. Table 6 shows the average running time over 10 runs for each approach. Although the time consumption of the proposed TrEEM model is a bit higher than those of EEM and TSK-FS, it is still of the same magnitude. Considering the improvements in RI and Purity, we think the time cost is acceptable. The results also fit the discussion in Section 3.4. Therefore, from the experimental results in Tables 5 and 6 and the above analysis, we can conclude the following: (1) For both synthetic and real-world EEG signal datasets, TrEEM performs well. Thus, we believe that TrEEM can effectively absorb knowledge from scant target data when similar source data exists. (2) For time consumption, TrEEM takes source data into account, which inevitably increases the time complexity. Remember that the scale of the target data will not be large, so the time consumption is very acceptable, especially when combined with the performance in Table 5. (3) Although TrEEM requires the most parameters among the methods in Table 4, λ and η have big effects. Observing Figures 4-6, the performance in terms of RI and Purity depends more on the value of λ. Note that we can narrow the optimization range according to the discussion in Section 3. Thus, we believe the parameter setting is easy.

Conclusion
In conclusion, the contribution of this paper is a novel TrEEM framework for learning from few EEG signals when recognizing epileptic seizures. Starting from information theory, the proposed TrEEM method implants the similarity relationship between source and target data into the exemplar-based clustering model to improve the utilization rate of EEG signals, while this structure keeps all the merits of the original optimization scheme. Therefore, without increasing the complexity of the model, TrEEM utilizes transfer learning to learn from scant EEG signals. Although our experimental results have shown the promising performance of TrEEM, several other perspectives should be considered as well. For instance, when each class contains unbalanced data, will TrEEM still work? And if multiple source datasets are available, how can we make them collaborate instead of introducing negative effects?
These are problems that we will discuss in future work.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.