Multifeature Metric Learning Based on Enhanced Equidistance Embedding for Electroencephalogram Recognition of Epilepsy

Mobile edge computing (MEC) enables pattern recognition and intelligent processing of real-time data. The electroencephalogram (EEG) is a very important tool in the study of epilepsy, providing rich information that cannot be obtained by other physiological methods. In the automatic classification of EEG signals by intelligent algorithms, feature extraction and classifier construction are both very important steps. Different feature extraction methods, such as time domain, frequency domain, and nonlinear dynamic feature methods, capture independent and diverse information. Using multiple forms of features at the same time can improve the accuracy of epilepsy recognition. In this paper, we apply metric learning to epileptic EEG signal recognition. Inspired by the equidistance constrained metric learning algorithm, we propose multifeature metric learning based on enhanced equidistance embedding (MMLE3) for EEG recognition of epilepsy. The MMLE3 algorithm makes use of various forms of EEG features, and the feature weight vector is adjusted adaptively rather than manually. The MMLE3 algorithm maximizes the distance between samples under cannot-link constraints, so that samples of different classes become equidistant; meanwhile, it minimizes the distance between samples under must-link constraints, so that samples of the same class are compressed to one point. Under the premise that the classification tasks of the various features are consistent, MMLE3 can fully extract the associated and complementary information hidden between the features. The experimental results on the CHB-MIT dataset verify that the MMLE3 algorithm has good generalization performance.


Introduction
Mobile edge computing (MEC) brings cloud computing capabilities and the Internet service environment to the edge of the network, which makes it possible to serve users nearby and effectively compensates for the deficiencies of cloud computing [1]. The combination of MEC and artificial intelligence technology has been a research hotspot in recent years, and MEC has rich application scenarios in the field of intelligent medicine. Electroencephalogram (EEG) analysis is widely used in neuroscience, especially in the diagnosis of epilepsy and the detection of seizures [2,3]. In clinical practice, the diagnosis of epilepsy is mainly based on the patient's history of seizures, with further examination and diagnosis based on the EEG signals. EEG-based epilepsy detection mainly relies on the personal experience of the doctor. With the gradual development of intelligent medical treatment, the automatic recognition and detection of epileptic EEG signals has become an important auxiliary means of detection. How to extract effective features from EEG and design appropriate classification algorithms is the key task in epilepsy detection.
At present, the most commonly used feature extraction methods include time domain analysis, frequency domain analysis, time-frequency analysis, nonlinear dynamics, and model-based methods [4]. The time domain approach regards the EEG as a time series, calculates statistics of the sequence, and extracts the corresponding epileptic EEG features. For example, Kaya et al. [5] used histogram features based on local binary patterns (LBP) together with Bayesian networks to classify epileptic seizures. Another widely used strategy is to extract frequency domain features from a given EEG signal. The Fourier transform is one of the most commonly used algorithms for extracting frequency domain features from time series data. Frassineti et al. [6] proposed a method with a preprocessing step in which the signal is filtered by a fixed wavelet transform to reduce possible artifacts; a fine Gaussian support vector machine is then used to detect epilepsy. Chandel et al. [7] proposed a combination of features based on ternary wavelet decomposition to predict the onset and termination of epileptic seizures. This method extracted the standard deviation, variance, and high-order moments to represent the characteristics of different EEG activities and used linear discriminant analysis and K-nearest neighbor (KNN) classifiers to distinguish seizure from interictal EEG. Among nonlinear dynamic features, methods based on complexity analysis are widely used in epilepsy detection, and the most common are entropy-based feature extraction methods. For example, Xiang et al. [8] developed a feature extraction algorithm using fuzzy entropy. This method first calculated the fuzzy entropy of EEG signals from different epileptic states, then performed feature selection, and finally used a support vector machine for prediction. Hussein et al. [9] identified EEG seizures by modifying the fuzzy entropy with minimum variance.
Firstly, appropriate filtering and independent quantity analysis were carried out to remove noise and artifacts, and then, the proportional operation was carried out to obtain the

Input: Multi-feature matrix X_l, its cannot-link and must-link sets;
Output: The metric matrix Q and Δ.
Initialization: Δ = [1/L, 1/L, …, 1/L], Q = I,
Repeat
    t = t + 1
    Step 1: fix Q(t), compute Δ(t) using Eq. (13);
    Step 2: fix Δ(t), compute Q(t) using Eqs. (9)-(11);
    Step 3: compute the value of the objective function J(t);
Until J(t) converges or t ≥ t_max
Step 4: obtain the optimal Q and Δ.

Figure 4: The frequency domain feature extraction process in this study.

Wireless Communications and Mobile Computing
according to the frequency separation standard, and then extracted features from the decomposed components. Usman et al. [12] converted the EEG data into a proxy channel and then used the empirical mode decomposition to improve the prediction results.
Supervised learning is widely used in epilepsy detection. Well-known supervised learning algorithms, such as KNN, decision trees, and metric learning, have been successfully applied to this task. Metric learning aims to learn a more suitable distance measure in the feature space, in order to represent the similarity between samples more accurately. Metric learning is widely used in face recognition, object detection, image recognition, and so on. Weinberger and Saul [13] developed a large margin nearest neighbor analysis algorithm based on a support vector machine. The obtained Mahalanobis distance had the advantages of maximum margin and internal consistency. Liu

Many classification algorithms rely heavily on the distance measure applied to the input data. In EEG classification, the key problem is to find a good distance measure, so that a test EEG is assigned to the class of its nearest EEG samples. Many studies have shown that an appropriate distance measure can significantly improve classification accuracy. In metric learning, EEG recognition depends on the similarity measurement between the input EEG data samples, which is realized by the distance between their input feature vectors [17,18]. Therefore, it is crucial to find a good distance metric in the sample feature space.
Because EEG signals are rhythmic and are collected over multiple channels, EEG data samples carry rich feature information. Different forms of features contribute differently to EEG recognition: some play a decisive role, some play an auxiliary role, and some play a small role or none at all. Conventional metric learning measures the similarity of EEG data samples while treating all features equally. Such a strategy cannot accurately measure the similarity between EEG data samples, which affects EEG signal recognition. In addition, various types of EEG signal features can be obtained from different feature extraction algorithms. Based on the principle of consistency and complementarity, each type of feature contains specific information, and using multiple forms of features at the same time will improve the accuracy of epilepsy recognition.
In this paper, we apply metric learning to the recognition of epileptic EEG signals. We make full use of various forms of EEG features and assign their weights automatically. We seek an appropriate distance measure for EEG data samples, so as to measure the similarity between them more accurately and ultimately improve the accuracy of epilepsy recognition. To achieve this goal, we propose multifeature metric learning based on enhanced equidistance embedding (MMLE3) for EEG recognition of epilepsy. We borrow the technique of the EquiDML algorithm [19] to maximize the distance between samples under cannot-link constraints, so that samples belonging to different classes become equidistant. At the same time, the distance between samples under must-link constraints is minimized, so that samples belonging to the same class are compressed to one point. In the process of metric matrix learning, feature weight vectors are introduced, and the various features are adaptively weighted to effectively adjust the weight relationship between them. Under the premise that the classification tasks of the various features are consistent, the MMLE3 algorithm can effectively mine the hidden and complementary information between the features and highlight the role of the optimal feature, which gives it a stronger discriminative ability. We conduct experiments on the CHB-MIT dataset, and the experimental results validate the effectiveness of MMLE3.

Related Work
Metric learning uses given pairs of samples to learn a similarity measure between feature vectors, generally in the form of a distance metric. Taking the commonly used Mahalanobis distance as an example, the distance metric between two samples $z_i$ and $z_j$ can be written as

$$d_Q(z_i, z_j) = \sqrt{(z_i - z_j)^T Q (z_i - z_j)}, \tag{1}$$

where $Q$ is a positive semidefinite matrix. $Q$ can be decomposed as $Q = HH^T$, where the matrix $H \in \mathbb{R}^{d \times m}$ ($m \le d$) is the metric matrix (or projection matrix). Therefore, Equation (1) can be expressed as

$$d_Q(z_i, z_j) = \left\| H^T (z_i - z_j) \right\|_2. \tag{2}$$

The essence of metric learning is therefore to learn a mapping space. In classification tasks, the commonly used strategy is to output a positive value close to zero for pairs of samples of the same class and larger values for pairs of samples of different classes.
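The equivalence between the Mahalanobis form with $Q$ and the projected Euclidean form with $H$ can be checked numerically. The following is a minimal sketch, assuming the unsquared form of the distance; the function names and the random test data are illustrative and do not come from the paper.

```python
import numpy as np

def mahalanobis(z_i, z_j, Q):
    """Distance sqrt((z_i - z_j)^T Q (z_i - z_j)) for a PSD matrix Q."""
    diff = z_i - z_j
    # Clamp at zero to guard against tiny negative values from round-off.
    return float(np.sqrt(max(diff @ Q @ diff, 0.0)))

def projected_distance(z_i, z_j, H):
    """Euclidean distance after projecting with H (d x m): ||H^T (z_i - z_j)||."""
    return float(np.linalg.norm(H.T @ (z_i - z_j)))

# Illustrative data: d = 5 input dimensions projected to m = 2.
rng = np.random.default_rng(0)
d, m = 5, 2
H = rng.standard_normal((d, m))
Q = H @ H.T                      # Q = H H^T is positive semidefinite by construction
z_i, z_j = rng.standard_normal(d), rng.standard_normal(d)

d_full = mahalanobis(z_i, z_j, Q)
d_proj = projected_distance(z_i, z_j, H)
```

Both values agree up to floating-point error, which is why learning $Q$ can be read as learning a projection into an $m$-dimensional space.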
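Given a labeled dataset $Z = [z_1, \cdots, z_n]$ with dimensionality $d$ and $n$ samples, the label matrix $Y$ is composed of the class labels of all samples in $Z$. The must-link set $M$ and the cannot-link set $C$ are defined as

$$M = \{(z_i, z_j) \mid y_i = y_j\}, \qquad C = \{(z_i, z_j) \mid y_i \neq y_j\},$$

where $y_i$ denotes the class label of $z_i$. According to the classification principle of minimum intraclass distance and maximum interclass distance, a supervised metric learning framework can be represented as learning $Q$ subject to

$$d_Q(z_i, z_j) \le \delta_1 \;\; \forall (z_i, z_j) \in M, \qquad d_Q(z_i, z_j) \ge \delta_2 \;\; \forall (z_i, z_j) \in C,$$

where $\delta_1$ and $\delta_2$ are the thresholds for the must-link and cannot-link sets, respectively, and $\delta_1 < \delta_2$.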
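As an illustration of how the must-link and cannot-link sets follow from the class labels, the sketch below enumerates all unordered sample pairs and assigns each to one of the two sets. The function name `build_pair_sets` and the toy labels are assumptions made for this example; in practice only a subset of pairs may be used.

```python
from itertools import combinations

def build_pair_sets(labels):
    """Split all unordered index pairs into must-link and cannot-link sets."""
    must_link, cannot_link = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            must_link.append((i, j))      # same class: must-link
        else:
            cannot_link.append((i, j))    # different classes: cannot-link
    return must_link, cannot_link

# Toy labels: two samples of class 0 and three samples of class 1.
labels = [0, 0, 1, 1, 1]
M, C = build_pair_sets(labels)
```

With five samples there are ten pairs in total, so the two sets together always cover every pair exactly once.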
In the EquiDML algorithm [19], the sample pairs in set M are gathered directly to a single point. The distances of the sample pairs in set C are constrained to a constant μ, where μ is a positive value. The equidistance constraint indicates that the distance between classes must be greater than the distance within classes. In the metric space, the pairs in set C correspond to samples of different classes, and the distance of every such pair takes the same constant value.
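The geometry that the equidistance constraint aims for can be illustrated with an idealized target configuration: every class collapses to one point, and all class points are pairwise μ apart. The sketch below builds such a configuration from scaled one-hot vectors. It only illustrates the target geometry under that assumption; it is not how EquiDML computes the metric, and the names are hypothetical.

```python
import numpy as np

def equidistant_targets(n_classes, mu):
    """One point per class; any two distinct class points are exactly mu apart."""
    # Scaled one-hot vectors: ||a*e_i - a*e_j|| = a*sqrt(2) = mu when a = mu/sqrt(2).
    return (mu / np.sqrt(2.0)) * np.eye(n_classes)

mu = 2.0
targets = equidistant_targets(3, mu)

# Map each sample to the point of its class and compute all pairwise distances.
labels = np.array([0, 0, 1, 2, 2])
embedded = targets[labels]
dists = np.linalg.norm(embedded[:, None, :] - embedded[None, :, :], axis=-1)
```

In `dists`, must-link pairs have distance zero and cannot-link pairs have distance μ, which is the situation described in the text.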

Multifeature Metric Learning Based on Enhanced Equidistance Embedding
Based on the EquiDML algorithm [19], the proposed MMLE3 algorithm exploits the correlation and difference between multiple forms of features and becomes more discriminative by learning the complementary information of different types of features. The MMLE3 algorithm can then be represented by Equation (7), where $M_l$ and $C_l$ are the must-link and cannot-link sample sets under the $l$-th feature representation, respectively; $|M_l|$ and $|C_l|$ are the sizes of $M_l$ and $C_l$, respectively; $\lambda$ is the trade-off parameter; $\Delta$ is the feature weight vector; and $\Delta_l$ is the weight of the $l$-th form of features. It is worth emphasizing that (1) in order to reduce the convergence time, MMLE3 uses the shifted squared loss $(z + 1)^2 - 1$ for the set $M_l$, and (2) $\Delta_l$ is not a parameter that needs to be adjusted manually; it can be obtained as a closed-form solution.

3.2. The Optimization Procedure. According to the Lagrange multiplier method, the Lagrangian function of Equation (7) can be represented as Equation (8), where $\Lambda$ is the Lagrange multiplier. There are two variables to optimize in MMLE3, $Q$ and $\Delta$. We use an alternating updating strategy to obtain their optimal values. When $\Delta$ is fixed in Equation (8), the problem reduces to an optimization over $Q$ alone. Using the simplest projected gradient method, $Q$ can be updated iteratively, where $Q_h^* = [Q_h - \varepsilon_h \nabla L(Q_h)]_+$, $[\cdot]_+$ denotes the projection onto $\Omega = \{Q \mid Q \succeq 0\}$, and $\eta_h \in (0, 1]$ and $\varepsilon_h$ are the step and regularization parameters, respectively.
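The projection $[\cdot]_+$ onto the set of positive semidefinite matrices can be sketched with the standard eigenvalue-clipping construction, which gives the nearest positive semidefinite matrix in the Frobenius norm. This is a minimal sketch of that textbook projection and is not necessarily the approximation method of [19]; the function names are assumptions, and the gradient passed to `projected_gradient_step` stands in for $\nabla L(Q_h)$, which depends on the objective.

```python
import numpy as np

def project_psd(G):
    """Project a square matrix onto the PSD cone by clipping negative eigenvalues."""
    S = (G + G.T) / 2.0                       # symmetrize before the eigendecomposition
    eigvals, eigvecs = np.linalg.eigh(S)
    clipped = np.clip(eigvals, 0.0, None)     # negative eigenvalues become zero
    return (eigvecs * clipped) @ eigvecs.T    # V diag(clipped) V^T

def projected_gradient_step(Q, grad, eps):
    """Compute [Q - eps * grad]_+ for a user-supplied gradient of the objective."""
    return project_psd(Q - eps * grad)

# Example: a symmetric matrix with eigenvalues 3 and -1.
G = np.array([[1.0, 2.0], [2.0, 1.0]])
Q_star = project_psd(G)
```

In this example the negative eigenvalue is removed and only the component along the eigenvector of eigenvalue 3 survives, so every entry of `Q_star` is 1.5.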
At the $h$-th iteration, denote $Q_h^* = [\Gamma_h]_+ = \arg\min_{Q \in \Omega} \|Q - \Gamma_h\|_F^2$, that is, the projection of $\Gamma_h$ onto $\Omega$. Using the positive semidefinite matrix approximation method [19], we can obtain $Q_h^*$. When $Q$ is fixed in Equation (8), the optimization of $\Delta$ is equivalent to solving $\partial L / \partial \Delta_l = 0$, which gives the solution of $\Delta_l$ in Equation (13). Unlike a manual parameter adjustment strategy, the weight parameter in the MMLE3 algorithm is adaptive, and it converges to the extremum from any initial value.
Based on the above analysis, the MMLE3 algorithm alternates between updating $\Delta$ with $Q$ fixed and updating $Q$ with $\Delta$ fixed, until the objective function converges or the maximum number of iterations is reached.
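The control flow of the alternating procedure can be sketched as a generic loop. Because the update equations themselves are not reproduced here, the sketch takes them as user-supplied callables: `update_delta`, `update_Q`, and `objective` are hypothetical placeholders for Eq. (13), Eqs. (9)-(11), and the objective function, and the tolerance `tol` is an assumed convergence criterion.

```python
import numpy as np

def alternating_optimization(update_delta, update_Q, objective,
                             n_features, dim, t_max=100, tol=1e-6):
    """Alternate the two updates until the objective stops changing or t reaches t_max."""
    delta = np.full(n_features, 1.0 / n_features)   # initialization: equal weights 1/L
    Q = np.eye(dim)                                 # initialization: identity metric
    prev = np.inf
    for t in range(1, t_max + 1):
        delta = update_delta(Q)          # Step 1: Q fixed, update the weights
        Q = update_Q(delta, Q)           # Step 2: weights fixed, update Q
        J = objective(Q, delta)          # Step 3: evaluate the objective
        if abs(prev - J) < tol:
            break
        prev = J
    return Q, delta, t

# Toy stand-ins (NOT the MMLE3 updates): pull Q toward A and set delta to b.
A = 2.0 * np.eye(2)
b = np.array([0.3, 0.7])
Q_opt, delta_opt, n_iter = alternating_optimization(
    update_delta=lambda Q: b.copy(),
    update_Q=lambda delta, Q: Q + 0.5 * (A - Q),
    objective=lambda Q, delta: float(np.sum((Q - A) ** 2) + np.sum((delta - b) ** 2)),
    n_features=2, dim=2,
)
```

The toy callables only exercise the loop; plugging in the actual closed-form weight update and projected gradient step would give the procedure described in the text.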

Dataset and Feature Extraction
The dataset used, CHB-MIT, is from Boston Children's Hospital [20]. The signals were recorded using the international standard 10-20 system at a sampling frequency of 256 Hz. Example EEG signals and the international standard 10-20 system are shown in Figures 1 and 2, respectively. The dataset includes the scalp EEG data of 23 patients with epilepsy. Among the 23 patients, 5 are males, aged between 3 and 22 years, and 17 are females, aged 1.5-19 years. Record No. 21 was collected from patient No. 1 again, one and a half years later. The gender of patient No. 24 is unknown. In our experiment, 21 out of the 24 patients are selected, excluding Nos. 6, 12, and 16, since some channel data of these patients cannot be read. We use three forms of EEG features in the experiment. The first form is the time domain features of the EEG signals. We extract the correlation coefficient matrix of the original EEG signal and its eigenvalues and fuse them. The detailed time domain feature extraction process is shown in Figure 3. "Date" is the original EEG signal. "Sta" is the standardized matrix of the EEG signals. "CorrM" is the correlation coefficient matrix of "Sta," and "Eigen" contains the eigenvalues of the "CorrM" matrix. "Corr" is the expansion of "CorrM." "Feat1" is the time domain feature used in the experiment, obtained by fusing "Corr" and "Eigen." The second form is the frequency domain features of the EEG signals. The detailed frequency domain feature extraction process is shown in Figure 4. The amplitude spectrum and phase spectrum in the frequency domain are two important features related to the time domain information.
After the amplitude and phase features (called "AS" and "PS" in Figure 4) are extracted, the correlation coefficients (called "CorrM1" and "CorrM2" in Figure 4) and the eigenvalues of the spectra (called "Eigen1" and "Eigen2" in Figure 4) are further extracted. Feat2 is the frequency domain feature used in the experiment, obtained by fusing "Eigen1," "Eigen2," "Corr1," and "Corr2." The third form is the nonlinear features of the EEG signals. In the experiment, the Shannon entropy, spectral entropy, and differential entropy of each of the delta (1-4 Hz), theta (4-7 Hz), alpha (7-13 Hz), and beta (13-30 Hz) bands of the EEG signals are calculated. The nonlinear feature Feat3 is then obtained from the three entropies.
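The time domain and frequency domain pipelines described above can be sketched as follows. This is a minimal sketch under several assumptions that are not stated in the text: the "expansion" of the correlation matrix is taken to be its upper triangle, the spectra are computed with a real FFT over the whole segment, and the fused vectors are simple concatenations whose order is arbitrary. The function names and the toy segment size are illustrative only.

```python
import numpy as np

def corr_features(X):
    """X: (channels, samples). Returns [upper-triangle correlations, eigenvalues]."""
    # "Sta": standardize each channel to zero mean and unit variance.
    sta = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    # "CorrM": channel-by-channel correlation coefficient matrix.
    corr_m = (sta @ sta.T) / X.shape[1]
    # "Eigen": eigenvalues of the symmetric correlation matrix (ascending order).
    eigen = np.linalg.eigvalsh(corr_m)
    # "Corr": assumed expansion of CorrM, here its strict upper triangle.
    iu = np.triu_indices(corr_m.shape[0], k=1)
    return np.concatenate([corr_m[iu], eigen])

def time_domain_features(segment):
    """Feat1 sketch: correlation structure of the raw multichannel segment."""
    return corr_features(segment)

def frequency_domain_features(segment):
    """Feat2 sketch: correlation structure of the amplitude and phase spectra."""
    spectrum = np.fft.rfft(segment, axis=1)
    amp = np.abs(spectrum)       # "AS": amplitude spectrum
    phase = np.angle(spectrum)   # "PS": phase spectrum
    return np.concatenate([corr_features(amp), corr_features(phase)])

# Toy segment: 4 channels, 2 seconds at 256 Hz (random data, for shape checks only).
rng = np.random.default_rng(0)
segment = rng.standard_normal((4, 512))
feat1 = time_domain_features(segment)
feat2 = frequency_domain_features(segment)
```

For a segment with c channels, each correlation block contributes c(c-1)/2 coefficients and c eigenvalues, so `feat1` has 10 entries and `feat2` has 20 entries in this toy case.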
We compare MMLE3 with seven algorithms: LMNN [13], ITML [21], RDML-CCPVL [22], EquiDML [19], CMML [23], MV-TSK-FS [4], and MvCVM [24]. The slack variable in ITML is selected from {0.01, 0.1, 1, 10}. In CMML, the trade-off parameter, learning rate, and parameter p are set to 1, 10^-6, and 5, respectively. The number of fuzzy rules in MV-TSK-FS is selected from {5, 10, …, 30}, and the three regularization parameters are selected from {10^-2, 10^-1, …, 10^2}. In MV-TSK-FS and MvCVM, the penalty parameter for each view is selected from {1, 10, 10^2, 10^3}, and the Gaussian kernel parameter is selected from {10^-2, 10^-1, …, 10^2}. In MMLE3, the parameter λ is selected from {0.1, 0.2, …, 0.9}, and the parameter μ is set to 2. KNN is used as the classifier in MMLE3. We use grid search and 5-fold cross-validation to select the best parameters. All algorithms are run on an i7-8700K CPU at 3.2 GHz with 32 GB RAM, using Matlab 2016. The evaluation indices are specificity, sensitivity, and classification accuracy. Each experiment is executed 10 times.

MMLE3 Performance Verification
The classification accuracy of MMLE3 with different parameters is shown in Figure 5. The first parameter is the balance parameter λ. The parameter λ lies in [0, 1] and balances, in the objective function, the term that minimizes the distance between samples of the same class against the term that maximizes the distance between samples of different classes. The accuracy of MMLE3 with different λ is shown in Figure 5(a). When the balance parameter is 1, the MMLE3 algorithm only optimizes the must-link constraint and ignores the cannot-link constraint, so its classification accuracy is low. When the balance parameter is close to 0, the objective function of MMLE3 ignores the optimization of the must-link constraint, so the classification accuracy of MMLE3 is also unsatisfactory. From Figure 5(a), when the balance parameter is between 0.4 and 0.6, the two optimization terms are balanced, so that the EEG data samples in the metric space have the highest discriminative ability, and the classification accuracy of MMLE3 is the highest.
Second, we evaluate the dimension m in matrix Q. The dimension of each form of features is 200. The MMLE3 algorithm obtains the Mahalanobis matrix by metric learning on the multiple forms of EEG features and projects the features into the projection space. The dimension m plays an important role in MMLE3. The classification accuracy of MMLE3 with different m is shown in Figure 5(b). When m is very small, MMLE3 ignores most of the discriminative feature information, which leads to low classification accuracy. As m increases, more discriminative feature information is retained, and the classification accuracy improves. Once m reaches a certain value, all the discriminative feature information has been captured, and the remaining minor or ineffective feature information contributes little to EEG signal recognition. Therefore, the classification accuracy of MMLE3 remains stable.
Third, we evaluate the KNN classifier parameter in MMLE3. The parameter k is selected from {3, 6, …, 30}. The accuracy of MMLE3 with different k is shown in Figure 5(c). The value of k has little effect on classification accuracy: regardless of the value of k, the fluctuation of the classification accuracy is very small.
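The role of the projection dimension m and the neighbor count k can be illustrated with a small KNN classifier that operates in the projected space. This is a minimal sketch for a single feature form, assuming a learned projection matrix `H` of shape (d, m) is already available; how the weighted feature forms are combined before classification is not specified here, and the identity matrix in the toy data merely stands in for a learned projection.

```python
import numpy as np

def knn_predict(H, X_train, y_train, X_test, k=3):
    """Classify test samples by majority vote among k nearest projected neighbors."""
    P_train = X_train @ H        # rows are samples; each row becomes H^T x
    P_test = X_test @ H
    # Squared Euclidean distances in the projected (metric) space.
    d2 = ((P_test[:, None, :] - P_train[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]     # labels must be non-negative integers
    return np.array([np.bincount(v).argmax() for v in votes])

# Toy data: two well-separated classes in 2 dimensions.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.2, 0.1], [5.5, 5.2]])
H = np.eye(2)                    # placeholder for a learned projection (d x m)
pred = knn_predict(H, X_train, y_train, X_test, k=3)
```

When same-class samples are compressed and different classes are pushed apart, as in the toy data, the vote is nearly unanimous for a wide range of k, which is consistent with the small sensitivity to k reported above.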

Algorithm Performance Comparison
The proposed MMLE3 algorithm is compared with several algorithms on the CHB-MIT dataset. During the experiment, every algorithm is run 10 times, and the specificity, sensitivity, and accuracy of all algorithms are recorded in Tables 1-3. The bold values in the tables are the best classification results in the experiments. CMML, MV-TSK-FS, MvCVM, and MMLE3 can make use of various forms of EEG features. In the experiment, three forms of EEG features are used: time domain, frequency domain, and nonlinear features. In the time domain, the pattern of the eigenvalues of the correlation coefficient matrix of the EEG signal changes before and after a seizure, which shows that the time domain correlation coefficient matrix and its eigenvalues can predict the onset and termination of epileptic seizures to a certain extent. The amplitude and phase in the frequency domain are effective features, which directly reflect the difference between the seizure period and the interval between seizures. Entropy describes the uncertainty of an information source and plays an important role for nonstationary EEG signals. MMLE3 obtains the best classification performance and has the highest generalization ability; it is 2.43%, 2.52%, and 2.44% higher than the baseline algorithm EquiDML in specificity, sensitivity, and accuracy, respectively. It can be seen that the comprehensive use of multifeature information improves the accuracy of epilepsy recognition.
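The three evaluation indices can be computed from the predicted and true labels as sketched below. The sketch assumes a binary task in which the seizure class is the positive class (label 1); that labeling convention, the function name, and the toy labels are assumptions for illustration and are not results from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)          # fraction of positive samples detected
    specificity = tn / (tn + fp)          # fraction of negative samples rejected
    accuracy = (tp + tn) / len(pairs)     # fraction of all samples classified correctly
    return sensitivity, specificity, accuracy

# Toy labels (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec, acc = binary_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity alongside accuracy matters for seizure data because the two classes are usually imbalanced, so accuracy alone can hide a poor detection rate.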
In addition, the MMLE3 algorithm uses must-link and cannot-link constraints to project the samples into a low-dimensional space, in which the distance between samples under cannot-link constraints is maximized, so that samples of different classes become equidistant; meanwhile, the distance between samples under must-link constraints is minimized, so that samples of the same class are compressed to one point. We introduce the feature weight vector to adaptively weight the various features and effectively adjust the weight relationship between them in the process of metric matrix learning. Under the premise that the classification tasks of the various features are consistent, the MMLE3 algorithm can effectively mine the associated and complementary information hidden among the features and highlight the role of the optimal features, which gives it a stronger discriminative ability. The results in Tables 1-3 therefore indicate that the various forms of EEG features can be treated differently in the MMLE3 algorithm and that the similarity between EEG data samples can be measured more accurately. The MMLE3 algorithm shows superiority and effectiveness for EEG recognition of epilepsy.

Conclusion
In clinical research, EEG is a basic tool for diagnosing and studying brain diseases, especially in the field of epilepsy diagnosis. This study explores how to improve the classification accuracy of epileptic EEG based on various feature extraction methods and a metric learning algorithm. We propose the MMLE3 algorithm for EEG recognition of epilepsy. In the process of metric matrix learning, MMLE3 uses various forms of EEG features and effectively adjusts the weight relationship between them. Experiments show that the classification performance obtained by using multiple features together is better than that obtained with a single feature, and that multifeature metric learning has better stability and generalization ability. In the future, we will embed the proposed algorithm into a deep network to learn new latent representations, and we will apply the proposed algorithm to clinical diagnosis in the next stage. In addition, with the development of computer-aided technology, visualized operating systems are one of the development trends of future medical care. We will also try to build the MMLE3 algorithm into a visual operating system to facilitate its application in clinical diagnosis.

Data Availability
Copies of the used data can be obtained free of charge from https://physionet.org/content/chbmit/1.0.0/.

Conflicts of Interest
The authors declare that there is no conflict of interests regarding the publication of this paper.