Automatic Sleep Staging using Multi-dimensional Feature Extraction and Multi-kernel Fuzzy Support Vector Machine

This paper used clinical polysomnographic (PSG) data, mainly all-night electroencephalogram (EEG), electrooculogram (EOG) and electromyogram (EMG) signals of the subjects, and adopted the American Academy of Sleep Medicine (AASM) clinical staging manual as the standard to realize automatic sleep staging. Eighteen features of the EEG, EOG and EMG signals were extracted in the time and frequency domains to construct the feature vectors, following the existing literature as well as clinical experience. Through self-learning on sleep samples, the weights of the linear combination and the parameters of the multiple kernels of the fuzzy support vector machine (FSVM) were learned, and the multi-kernel FSVM (MK-FSVM) was constructed. The overall agreement between the experts' scores and the results presented was 82.53%. Compared with previous results, the accuracy of N1 was improved to some extent while the accuracies of the other stages were comparable, which reflected the sleep structure well. The staging algorithm proposed in this paper is transparent and worth further investigation.


INTRODUCTION
Sleep is paramount for human life. Good sleep quality supports mental and physical health [1, 2]. Nowadays, all-night polysomnography (PSG) is used clinically to monitor the dynamic sleep process. PSG is a systematic method that records the biophysiological signals occurring during sleep. The physiological signals monitored include electroencephalogram (EEG), electrooculogram (EOG), [...] exist some defects. Firstly, the SVM treats all training points equally, which imposes a certain limitation. For example, both noisy points and outliers have negative effects on the accuracy of classification. Therefore, we utilized the fuzzy support vector machine (FSVM) that Lin et al. [37-39] established as a basic method to rule out the non-support vectors. In this way, the influence of unimportant and noisy sleep staging samples on SVM learning can be reduced or even ignored, thus improving the accuracy of classification. In addition, in solving non-linear classification and regression problems, the selection of the kernel function is very important. Traditional SVM and FSVM are based on a single kernel. However, for sleep staging, it is difficult to find a single suitable kernel function that classifies the samples accurately because of individual differences. Previous studies usually selected kernel functions by experience, which is not well suited to automatic sleep staging.
This paper introduces multiple kernels to FSVM and proposes a multi-kernel FSVM (MK-FSVM) algorithm. The fuzzy kernel weights and parameters in the algorithm decision tree were mainly determined by unsupervised sample self-learning. An accurate algorithm was also established by applying multiple kernels to handle the fuzzy characteristics of sleep data.
Among reported studies [8, 21-36], classification accuracies were not balanced across Wake, N1, N2, SWS and REM. Furthermore, Stage N1 presented the lowest accuracy. Such results are not conducive to the analysis of sleep structure. In this paper, we use eighteen-dimensional AASM clinical characteristic features and an MK-FSVM classifier to realize automatic sleep staging. The weights and parameters of the kernel functions were obtained by sample self-learning. The classification algorithm can reflect the complexities of the samples, so that it can reflect the sleep structure accurately. MK-FSVM is a transparent classification algorithm. Each parameter in MK-FSVM can be obtained by analyzing clinical data. Moreover, a specific kernel function can be constructed flexibly for different requirements.

Clinical Data Acquisition
In this study, all-night PSG sleep recordings were obtained from twenty healthy subjects (16 males and 4 females) ranging from 31 to 65 years old (mean = 42.2 ± 8.1 years). These measurements were approved by the ethics committee of the 6th Affiliated Hospital of Sun Yat-Sen University. The subjects were interviewed about their sleep quality and medical history, and all subjects reported no history of neurological or psychological disorders. The all-night PSGs were recorded in the Sleep-Disordered Breathing Center of the 6th Affiliated Hospital of Sun Yat-Sen University. There was no outside interference during data collection, and no medications were used to induce sleep. Figure 1 shows the details of data collection and experimental equipment.
The standard of visual sleep staging is based on the R&K rules proposed by Rechtschaffen and Kales in 1968 [3]. According to the R&K rules, each epoch (i.e., 30 s) of data is assigned to one of the sleep stages, including wakefulness (Wake), non-rapid eye movement (stages 1-4, from light to deep sleep) and rapid eye movement (REM).
Instead of the R&K rules, the scoring rules developed by the AASM have become the clinical standard in recent years [4, 5]. Figure 2 presents typical PSG recordings we collected, which show the various sleep stages.
The recordings include six EEG channels (F3-M2, F4-M1, C3-M2, C4-M1, O3-M2, and O2-M1), two EOG channels (positioned 1 cm lateral to the left and right outer canthi), and a chin EMG channel (Alice 5 PSG, Philips, Inc.). Eighteen PSG sleep recordings were visually scored by two independent sleep specialists using the AASM rules with a 30-s interval (epoch). If their results differed on some stages, especially on the stages with body movements, they discussed them based on the AASM rules [4, 5] to reach the final results. In the experiment, the EEG F3-M2, left EOG and EMG channels were chosen for data processing. Figure 3 is the flowchart of sleep recognition.

Data Processing and Feature Extraction
Considering the comprehensive review of the literature [6-20], our features differed in some details from the 1968 R&K rules, in accordance with the AASM clinical interpretation rules. We put forward eighteen characteristic features to construct the vectors, as listed in Table 1. The feature vector was then normalized by eqn. (1) [40]:

$$ y = \frac{(y_{\max} - y_{\min})\,(x - x_{\min})}{x_{\max} - x_{\min}} + y_{\min} \tag{1} $$

where $x$ and $y$ represent each element of the feature vector $X$ and the normalized vector $Y$, respectively; $x_{\max}$ and $x_{\min}$ denote the maximum and minimum values of $X$; and $y_{\max}$ and $y_{\min}$ denote the maximum and minimum values of the normalized interval.
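
The normalization step can be illustrated with a minimal sketch. It assumes that eqn. (1) is the usual min-max scaling onto the interval [y_min, y_max]; the function name `min_max_normalize`, the handling of constant features and the sample values are illustrative assumptions, not taken from the paper.

```python
def min_max_normalize(values, y_min=0.0, y_max=1.0):
    """Map each element of `values` linearly onto [y_min, y_max]."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant feature is mapped to the middle of the interval.
        return [(y_min + y_max) / 2.0 for _ in values]
    scale = (y_max - y_min) / (x_max - x_min)
    return [y_min + (x - x_min) * scale for x in values]


# Hypothetical feature values for four epochs, scaled onto [-1, 1].
feature = [2.0, 4.0, 6.0, 10.0]
normalized = min_max_normalize(feature, y_min=-1.0, y_max=1.0)
print(normalized)
```

Scaling every feature onto the same interval keeps features with large numeric ranges (such as total power) from dominating the kernel evaluations.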
The sampling rate was 500 samples per second. To reduce the computational complexity, the signals were downsampled to 250 samples per second. Next, the EEG, EOG and EMG data were filtered by a digital band-pass filter with a passband of 0.5-30 Hz. The continuous signals were divided into 30-s epochs.
Before the extraction of the spectral features, the signals were segmented into non-overlapping intervals of 2 seconds for a 500-point fast Fourier transform (FFT) calculation. The spectra of the 15 2-s segments were averaged to represent the spectrum of a 30-s epoch [12-18]. After the FFT calculation, the data were processed using the following methods.
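
The segment-and-average step can be sketched as follows. This is only an illustration under stated assumptions: a naive DFT from the standard library stands in for the 500-point FFT, the power scaling (squared magnitude divided by the segment length) is an arbitrary choice, and a toy sampling rate of 16 Hz is used instead of 250 Hz so that the example runs quickly. The function names are hypothetical.

```python
import cmath
import math


def power_spectrum(segment, fs):
    """One-sided power spectrum of a segment, computed with a naive DFT."""
    n = len(segment)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        freqs.append(k * fs / n)          # frequency of bin k in Hz
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def epoch_spectrum(epoch, fs, segment_seconds=2):
    """Average the spectra of the non-overlapping segments of one epoch."""
    seg_len = int(fs * segment_seconds)
    n_segments = len(epoch) // seg_len
    if n_segments == 0:
        raise ValueError("epoch is shorter than one segment")
    avg = [0.0] * (seg_len // 2 + 1)
    for s in range(n_segments):
        freqs, power = power_spectrum(epoch[s * seg_len:(s + 1) * seg_len], fs)
        avg = [a + p / n_segments for a, p in zip(avg, power)]
    return freqs, avg


# Toy 30-s epoch: a 4 Hz sine sampled at 16 Hz gives 15 segments of 32 samples.
fs = 16
epoch = [math.sin(2 * math.pi * 4 * t / fs) for t in range(30 * fs)]
freqs, avg = epoch_spectrum(epoch, fs)
peak_hz = freqs[max(range(len(avg)), key=avg.__getitem__)]
print(peak_hz)
```

With 2-s segments the frequency resolution is 0.5 Hz regardless of the sampling rate, which matches the lower edge of the 0.5-30 Hz analysis band.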

Power spectrum
The power spectrum (PS) was summed over the band 0.5-30 Hz for EEG, EOG and EMG, and this sum was considered the total power. It was obtained using the following equation:

$$ PS_{total} = \sum_{f=0.5}^{30} PS(f) \tag{2} $$

In this formula, $PS(f)$ is the power at frequency $f$.

Power ratio
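The ratio of the power of each band to the total power of 0.5-30 Hz was calculated as the power ratio (PR), and it was also used as a feature. PR was given by the following equation:

$$ PR = \frac{\sum_{f=f_L}^{f_H} PS(f)}{\sum_{f=0.5}^{30} PS(f)} \tag{3} $$

where $f_L$ and $f_H$ are the lower and upper limits of the band.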
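
A minimal sketch of the total power and the power ratio is given below. It assumes the spectrum is available as parallel lists of frequencies and powers with 0.5 Hz resolution; the helper names `band_power` and `power_ratio` and the flat toy spectrum are illustrative assumptions.

```python
def band_power(freqs, power, low, high):
    """Sum the spectral power over low <= f <= high (Hz)."""
    return sum(p for f, p in zip(freqs, power) if low <= f <= high)


def power_ratio(freqs, power, band, total_band=(0.5, 30.0)):
    """Ratio of the power in `band` to the total power in `total_band`."""
    total = band_power(freqs, power, *total_band)
    if total <= 0:
        return 0.0
    return band_power(freqs, power, *band) / total


# Toy flat spectrum from 0 to 30 Hz in 0.5 Hz steps (61 bins, unit power each).
freqs = [0.5 * k for k in range(61)]
power = [1.0] * len(freqs)

total = band_power(freqs, power, 0.5, 30.0)         # 60 bins fall in 0.5-30 Hz
alpha_pr = power_ratio(freqs, power, (8.0, 13.0))   # 11 bins fall in 8-13 Hz
print(total, alpha_pr)
```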

Spectral frequency (SF)
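The mean frequency of the spectral power (SF) was calculated by the following equation:

$$ SF = \frac{\sum_{f=f_L}^{f_H} f \cdot PS(f)}{\sum_{f=f_L}^{f_H} PS(f)} \tag{4} $$

where $f_L$ and $f_H$ are the lower and upper limits of the band considered.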
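
As an illustration, the mean frequency can be computed as a power-weighted average of the frequency bins. The sketch assumes that SF is this spectral centroid and that the band defaults to 0.5-30 Hz; both are assumptions, as are the function name and the toy spectrum.

```python
def spectral_mean_frequency(freqs, power, low=0.5, high=30.0):
    """Power-weighted mean frequency within [low, high] Hz."""
    num = sum(f * p for f, p in zip(freqs, power) if low <= f <= high)
    den = sum(p for f, p in zip(freqs, power) if low <= f <= high)
    return num / den if den > 0 else 0.0


# Toy spectrum: most of the power sits at 4 Hz, so SF is pulled upward.
freqs = [1.0, 2.0, 3.0, 4.0]
power = [0.0, 1.0, 1.0, 2.0]
sf = spectral_mean_frequency(freqs, power)
print(sf)
```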

Duration ratio (DR)
Every 30-s signal was segmented into 15 non-overlapping intervals of 2 seconds. The alpha ratio, spindle ratio and SWS ratio are the ratios between the number of segments showing the corresponding wave and the total number of 15 segments.

Alpha ratio
Both the alpha band and the beta band account for higher energy in the waveform in the Wake stage. Therefore, the interpretation of the alpha ratio can be expanded. We filtered the alpha band (8-13 Hz) and the beta band (22-30 Hz), and added the filtered signals. Then the ratio of the combined signal amplitude to the original signal amplitude was computed. If the ratio was equal to or greater than 0.5, the segment was counted as an alpha segment.

Spindle ratio
Two values were calculated: one was the ratio of the power of the σ band (11-16 Hz) to the original signal power, and the other was the ratio of the filtered signal amplitude to the original signal amplitude. If both ratios were equal to or greater than 0.5, the segment was judged to be a spindle segment.

SWS ratio
The δ band (0.5-2 Hz) was first filtered. If the ratio of the filtered signal amplitude to the original signal amplitude was equal to or greater than 0.2, the segment was counted as an SWS segment.
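
The duration ratios share one counting rule, which can be sketched as below. The sketch assumes the band-limited and original amplitudes of each 2-s segment have already been measured (the filtering itself is not shown), and it covers only the single-threshold case; the spindle ratio would additionally require the power ratio to reach its threshold. The function name and the amplitude values are hypothetical.

```python
def duration_ratio(band_amplitudes, raw_amplitudes, threshold):
    """Fraction of segments whose band-to-raw amplitude ratio reaches the threshold."""
    if not raw_amplitudes or len(band_amplitudes) != len(raw_amplitudes):
        raise ValueError("expected two non-empty lists of equal length")
    hits = sum(1 for b, r in zip(band_amplitudes, raw_amplitudes)
               if r > 0 and b / r >= threshold)
    return hits / len(raw_amplitudes)


# Hypothetical amplitudes (in microvolts) for the 15 segments of one epoch.
raw = [100.0] * 15
band = [60.0] * 6 + [30.0] * 9

alpha_ratio = duration_ratio(band, raw, threshold=0.5)  # threshold used for alpha
sws_ratio = duration_ratio(band, raw, threshold=0.2)    # threshold used for SWS
print(alpha_ratio, sws_ratio)
```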

EMG energy and times of EEG amplitude peaks > 75 µV
The EMG changes across the sleep stages Wake, N1, N2, SWS and REM. We extracted features in both the time domain and the frequency domain. The PS sum of EEG (0.5-30 Hz) as well as the mean absolute value of the time-domain amplitude were then calculated.
We divided each 30-s epoch into 15 data segments with a 2-s window size. If the absolute amplitude of the positive and negative peak value exceeded 75 µV within a window, the counter was incremented by one.
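
The peak counter can be sketched as follows, under the assumption that a 2-s window is counted once when its largest absolute amplitude exceeds 75 µV; other readings of the rule (for example, requiring both a positive and a negative excursion) are possible. The function name, the toy sampling rate of 4 Hz and the sample values are illustrative only.

```python
def count_high_amplitude_windows(epoch_uv, fs, window_seconds=2, threshold_uv=75.0):
    """Count non-overlapping windows whose peak absolute amplitude exceeds the threshold."""
    win = int(fs * window_seconds)
    count = 0
    for start in range(0, len(epoch_uv) - win + 1, win):
        window = epoch_uv[start:start + win]
        if max(abs(v) for v in window) > threshold_uv:
            count += 1
    return count


# Toy 30-s epoch at 4 Hz (15 windows of 8 samples) with three large samples.
fs = 4
epoch = [10.0] * (30 * fs)
epoch[3] = 80.0     # falls in window 0
epoch[50] = -90.0   # falls in window 6
epoch[51] = 76.0    # also window 6, so it is not counted twice
n_windows = count_high_amplitude_windows(epoch, fs)
print(n_windows)
```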

MK-FSVM Sleep Staging
The MK-FSVM sleep staging strategy is proposed in this paper. The method generates a fuzzy degree-of-membership (DOM) matrix for the fuzzification of the sleep signal features and trains the weights and parameters of the multiple-kernel combination. In addition, the modeling of the MK-FSVM for sleep scoring is introduced. Table 2 lists the details of the experiments.

Combination of MK-FSVM and Sleep Data
Let $X$ be a non-empty set:

$$ F = \{ (x_i, u_F(x_i)) \mid x_i \in X,\ i = 1, 2, \ldots, l \} \tag{5} $$

$F$ is a fuzzy set; $l$ denotes the number of feature vectors; $u_F(x_i)$ is the DOM with which the $i$-th sample $x_i$ belongs to the fuzzy set $F$, and the value of $u_F(x_i)$ lies in [0, 1] [41].
In FSVM, the first step of preprocessing the sleep data is to choose an appropriate DOM function $u_F(x_i)$ for the sample $x_i$ of every 30-s epoch of sleep data; the DOM function $u_F(x_i)$ is obtained by removing the class labels and applying unsupervised self-learning. Then the new fuzzy training set $\{x_i, y_i, u_F(x_i)\}$, $i = 1, 2, \ldots, l$, is obtained. In each training sample, $x_i \in R^d$ and $y_i \in \{1, 2, 3, 4, 5\}$, in which 1, 2, 3, 4, 5 represent the Wake, N1, N2, SWS and REM stages of AASM sleep staging, respectively. Here $u_F(x_i)$ is the fuzzy DOM with which the training sample belongs to the output $y_i \in \{1, 2, 3, 4, 5\}$, and $\varepsilon_i$ measures the degree of misclassification. Therefore, $u_F(x_i) \cdot \varepsilon_i$ ($i = 1, 2, \ldots, l$) is used to measure the importance of the different samples for misclassification, and the objective function of the optimal separating hyperplane is obtained:

$$ \min_{\omega, b, \varepsilon}\ \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{l} u_F(x_i)\,\varepsilon_i \quad \text{s.t.}\quad y_i\left(\omega^T \phi(x_i) + b\right) \ge 1 - \varepsilon_i,\ \ \varepsilon_i \ge 0,\ \ i = 1, 2, \ldots, l \tag{6} $$

where the penalty factor $C$ is a constant; $\omega$ is the weight coefficient of the linear classification function and $b$ is its bias; $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_l)^T$; $\phi(x_i)$ maps $x_i$ from $R^d$ into a high-dimensional feature space; and, for each binary subproblem, the labels are encoded as $y_i \in \{-1, +1\}$. The corresponding discrimination function of the optimal separating hyperplane is as follows:

$$ f(x) = \mathrm{sgn}\left( \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b \right) \tag{7} $$

where $\alpha_i$ are the Lagrange multipliers and $K(x_i, x)$ is a kernel function that converts the inner product operation in the high-dimensional feature space into a simple function evaluation in the low-dimensional space. The fuzzy factor $u_F(x_i)$ is the key to the performance of FSVM. When the value of $u_F(x_i)$ is small, it reduces the impact of $\varepsilon_i$ in the objective function, so that the corresponding sample $x_i$ can be regarded as unimportant. Thus, the impact of outliers or noisy samples on the training of the SVM can be reduced by decreasing the value of $u_F(x_i)$.
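
The effect of the membership on the slack penalty can be seen in a small numerical sketch. It assumes the standard fuzzy SVM primal objective in its linear, binary form (labels in {-1, +1}, identity feature map); the function name `fsvm_objective`, the hyperplane and the sample values are hypothetical and are not the paper's trained model.

```python
def fsvm_objective(w, b, samples, labels, memberships, C=1.0):
    """Value of 0.5*||w||^2 + C * sum(u_i * slack_i) for a linear binary classifier."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    penalty = 0.0
    for x, y, u in zip(samples, labels, memberships):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        slack = max(0.0, 1.0 - margin)   # hinge slack of this sample
        penalty += u * slack             # the membership scales the slack
    return regularizer + C * penalty


w, b = [1.0, 0.0], 0.0
samples = [[2.0, 0.0], [-2.0, 0.0], [-1.0, 0.0]]   # the last point is an outlier
labels = [1, -1, 1]

crisp = fsvm_objective(w, b, samples, labels, memberships=[1.0, 1.0, 1.0])
fuzzy = fsvm_objective(w, b, samples, labels, memberships=[1.0, 1.0, 0.1])
print(crisp, fuzzy)
```

With full membership the outlier contributes its whole slack to the objective, whereas a membership of 0.1 reduces that contribution tenfold, so the optimizer has less incentive to move the hyperplane toward it.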

Fuzzy Membership Function in the Algorithm of Sleep Staging
The fuzzy membership functions are determined by an unsupervised self-learning method [42], in which the membership matrix is the key to our algorithm. The Fuzzy Clustering Method (FCM) is given the number of clusters $C = 5$ and a data set $X$ of $N$ $l$-dimensional vectors denoted $x_i$. The FCM algorithm outputs the DOM $u_{ic}$, the probability that the data $x_i$ fall into the scope of the $c$-th cluster. Therefore, we have to minimize the following objective function:

$$ \min_{U, V}\ J_m(U, V) = \sum_{i=1}^{N} \sum_{c=1}^{C} u_{ic}^{m}\, d^2(x_i, v_c) \quad \text{s.t.}\quad \sum_{c=1}^{C} u_{ic} = 1,\ \ \forall i \tag{8} $$

where $m$ is the fuzzification degree, which should be larger than 1; $d(\cdot,\cdot)$ is the Euclidean distance; $v_c$ is the center of the $c$-th cluster; $U = [u_{ic}]_{i=1..N,\ c=1..C}$ is an $N \times C$ membership matrix whose elements are the DOM; and $V = [v_1, v_2, \ldots, v_C]$ is an $l \times C$ matrix whose columns correspond to the cluster centers. In the FCM algorithm, this constrained optimization problem is solved by using Lagrange multipliers $\lambda_i$ [42]:

$$ L(U, V, \lambda) = \sum_{i=1}^{N} \sum_{c=1}^{C} u_{ic}^{m}\, d^2(x_i, v_c) + \sum_{i=1}^{N} \lambda_i \left( 1 - \sum_{c=1}^{C} u_{ic} \right) \tag{9} $$

The problem is solved by iteratively updating the DOM with fixed centers and updating the centers with fixed DOM. The closed-form update formulas are derived by taking the partial derivatives with respect to both and setting them to zero [42]:

$$ u_{ic} = \frac{1}{\sum_{k=1}^{C} \left( \dfrac{d(x_i, v_c)}{d(x_i, v_k)} \right)^{\frac{2}{m-1}}} \tag{10} $$

$$ v_c = \frac{\sum_{i=1}^{N} u_{ic}^{m}\, x_i}{\sum_{i=1}^{N} u_{ic}^{m}} \tag{11} $$

It should be noted that although no Lagrange multiplier is added for the non-negativity constraints in eqn. (8), the update formulas implicitly satisfy constraints such as $u_{ic} \ge 0$, $\forall i, c$. Moreover, when $m$ is close to 1, the FCM algorithm degenerates to the k-means algorithm. By this unsupervised learning, $u_{ic}$ is obtained.
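
The alternating updates can be sketched in a few lines. The sketch assumes the standard fuzzy c-means update rules (inverse-distance memberships with exponent 2/(m-1) and membership-weighted centers); the function names, the treatment of points that coincide with a center, the fixed iteration count and the two-cluster toy data are illustrative assumptions, whereas the paper uses five clusters.

```python
import math


def update_memberships(data, centers, m=2.0):
    """Membership of every point in every cluster for fixed centers."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in data:
        dists = [math.dist(x, v) for v in centers]
        zero_hits = [d == 0.0 for d in dists]
        if any(zero_hits):
            # Assumption: a point lying on a center belongs fully to that center.
            share = 1.0 / sum(zero_hits)
            memberships.append([share if hit else 0.0 for hit in zero_hits])
        else:
            memberships.append(
                [1.0 / sum((dc / dk) ** exponent for dk in dists) for dc in dists]
            )
    return memberships


def update_centers(data, memberships, m=2.0):
    """Membership-weighted mean of the data for every cluster."""
    centers = []
    for c in range(len(memberships[0])):
        weights = [row[c] ** m for row in memberships]
        total = sum(weights)
        centers.append([sum(w * x[d] for w, x in zip(weights, data)) / total
                        for d in range(len(data[0]))])
    return centers


def fuzzy_c_means(data, centers, m=2.0, iterations=20):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iterations):
        centers = update_centers(data, update_memberships(data, centers, m), m)
    return update_memberships(data, centers, m), centers


# Toy data with two well-separated groups.
data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
memberships, centers = fuzzy_c_means(data, centers=[[0.0, 0.0], [10.0, 10.0]])
print(memberships[0], centers[0])
```

Each row of the membership matrix sums to one, and points far from a center receive a small membership for it, which is the property that lets outlying epochs be down-weighted in the FSVM.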

Application of MK-FSVM Algorithm
Most kernel functions have only one free parameter to control the generalization performance. For instance, in a radial basis kernel function, only the width parameter is available for control, while no other parameters are involved at all.
In sleep staging, the decision tree and the algorithm were adjusted as follows by combining multiple kernel functions:

$$ f(x) = \mathrm{sgn}\left( \sum_{i=1}^{l} \alpha_i y_i \sum_{j=1}^{4} u_j K_j(x_i, x) + b \right) \tag{12} $$

where $u_j$ is the weight of the $j$-th kernel function; $K_j(x_i, x)$ denotes the $j$-th kernel function; and $\alpha_i$ and $b$ are the Lagrange multipliers and the bias of the FSVM. In this paper, $j$ is 1, 2, 3, 4, corresponding to the common kernel functions 'linear', 'poly', 'rbf' and 'erbf', respectively [41].
Theorem 1: If $K$ is a kernel function, $\hat{K}$ is called a multi-kernel fuzzy kernel function, with $\hat{K}(x_i, x) = q(K_j(x_i, x), u)$, in which

$$ q(K_j(x_i, x), u) = \sum_{j=1}^{4} u_j K_j(x_i, x), \quad u_j \ge 0 \tag{13} $$

Lemma 1: A non-negative linear combination of Mercer kernels is still a Mercer kernel.
Theorem 2: If $K$ is a Mercer kernel function, then the combined fuzzy multiple-kernel function $\hat{K}(x_i, x) = q(K_j(x_i, x), u)$ is still a Mercer kernel.
It is known from Theorem 2 that a polynomial combination of Mercer kernels is still a Mercer kernel. Therefore, based on Theorem 2, we can use the existing 'linear', 'poly', 'rbf' and 'erbf' kernel functions to construct a fuzzy multi-kernel function. This kind of kernel can possess both translational and rotational invariance, and can be applied to the sleep data set for training and learning [41]. Thus, the MK-FSVM sleep staging procedure is as follows:
• Step 1: normalize the data in the characteristics matrix.
• Step 2: establish the fuzzy data set of sleep data using eqn. (5).
• Step 5: apply the sleep data on MK-FSVM for training and testing by utilizing the decision tree in eqn. (12).
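
A weighted combination of the four kernels can be sketched as below. The parameterizations are assumptions: 'poly' is taken as (x·z + 1)^degree, 'rbf' as exp(-||x - z||^2 / (2σ^2)) and 'erbf' as exp(-||x - z|| / (2σ^2)), and the weights are hypothetical placeholders rather than the learned values reported in Table 3.

```python
import math


def k_linear(x, z):
    return sum(a * b for a, b in zip(x, z))


def k_poly(x, z, degree=2):
    return (k_linear(x, z) + 1.0) ** degree


def k_rbf(x, z, sigma=1.0):
    return math.exp(-math.dist(x, z) ** 2 / (2.0 * sigma ** 2))


def k_erbf(x, z, sigma=1.0):
    return math.exp(-math.dist(x, z) / (2.0 * sigma ** 2))


def combined_kernel(x, z, weights, kernels=(k_linear, k_poly, k_rbf, k_erbf)):
    """Non-negative weighted sum of the base kernels evaluated at (x, z)."""
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    return sum(w * k(x, z) for w, k in zip(weights, kernels))


weights = [0.1, 0.2, 0.3, 0.4]   # hypothetical weights, one per kernel
x, z = [1.0, 0.0], [0.0, 1.0]
k_xz = combined_kernel(x, z, weights)
k_zx = combined_kernel(z, x, weights)
k_xx = combined_kernel(x, x, weights)
print(k_xz, k_xx)
```

Keeping the weights non-negative is what lets the combination inherit the Mercer property from its components, so it can be passed to any SVM solver that accepts a precomputed kernel.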

EXPERIMENTAL RESULTS
After the EEG, EOG and EMG signals of the eighteen subjects were recorded, the data were employed in the sleep staging experiment. The experimental results are described and discussed in Section 4.
We used the data of ten subjects for training and of ten subjects for testing. The weight and parameter values of the four kernel functions in the MK-FSVM algorithm, that is, 'linear', 'poly', 'rbf' and 'erbf' [41], were then obtained as listed in Table 3. The experimental results and the kappa coefficients [43] are presented in Table 4.

In this paper, EEG, EOG and EMG signals were utilized to generate the eighteen features. A total of 17192 epochs of sleep data were recorded and then used for training and testing, with ten subjects each. Table 4 shows the subject-by-subject agreement percentages and Cohen's kappa coefficients of the manual scoring versus the automatic scoring. The overall agreement of each subject ranged from 75.1% to 90.36%, and the average sensitivity was 82.53% (SD = 5.43). These results demonstrate that the proposed sleep scoring method can achieve a stable performance. The average kappa value was κ = 0.7 (SD = 0.05), and the individual kappa ranged from 0.63 to 0.78 for the PSG signals of nine subjects. A 2-fold cross validation was then performed during our experiment. The data set was divided into two subsets (each containing ten subjects), of which one was used as the training set and the other as the testing set. This evaluation process was repeated five times with random shuffling of the training-testing datasets. Finally, the average overall agreement was 81.12% (SD = 6.72%). The results revealed a substantial agreement between our method and the scoring of the sleep specialists.
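
The two agreement measures reported here can be computed as in the following sketch, which uses the standard definition of Cohen's kappa (observed agreement corrected by the chance agreement implied by the two raters' label frequencies). The function name and the ten example epochs are hypothetical and are not data from the study.

```python
from collections import Counter


def agreement_and_kappa(expert, automatic):
    """Overall agreement and Cohen's kappa between two stage sequences."""
    n = len(expert)
    if n == 0 or n != len(automatic):
        raise ValueError("expected two non-empty sequences of equal length")
    observed = sum(e == a for e, a in zip(expert, automatic)) / n
    expert_counts, auto_counts = Counter(expert), Counter(automatic)
    # Chance agreement: product of the marginal frequencies, summed over stages.
    expected = sum(expert_counts[s] * auto_counts[s] for s in expert_counts) / (n * n)
    kappa = (observed - expected) / (1.0 - expected) if expected < 1.0 else 1.0
    return observed, kappa


# Hypothetical scoring of ten epochs by an expert and by an automatic method.
expert = ["W", "W", "N1", "N2", "N2", "N2", "SWS", "REM", "REM", "N2"]
automatic = ["W", "W", "N2", "N2", "N2", "N2", "SWS", "REM", "N2", "N2"]
observed, kappa = agreement_and_kappa(expert, automatic)
print(observed, kappa)
```

Kappa is lower than the raw agreement because N2 dominates both sequences, so a sizable share of the matches would be expected by chance alone.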

DISCUSSION
The recognition rate in this paper is compared with previous research on sleep staging [8, 21-36], and the result is presented in Figure 4. Some of these studies classified six sleep stages (i.e., Wake, N1, N2, N3, N4 and REM). To compare the results, the recognition rates of N3 and N4 in these studies were combined as the SWS stage [24], as stated above. The results in the second row (b) of the table below the figure are from Park et al. [8], who took advantage of both rule-based methods and numerical classification methods. The third row (c) shows the results obtained from [23], in which the back-propagation neural network (BPNN) was utilized in automatic sleep staging with good results. The fourth row (d) shows a multilayer neural network that employed EEG, EMG and EOG [24]. The results in the fifth row (e), in which the HMM method was used, were reported by L.G. Doroshenkov [30].
The last row (f) shows the result of a simple SVM using our data set. The average accuracies of our method are as follows: Wake, 82.01%; N1, 43.28%; N2, 89.86%; SWS, 90.76%; and REM, 87.61%.
Feature extraction in this paper is in accordance with the clinical AASM criteria interpretation adopted in previous studies [6-20]. According to the clinical standard, the frequency bands commonly used in the R&K rules were adjusted and refined to agree with the interpretation of clinical experts. Based on clinical interpretation analysis, some features were proposed, such as the times of positive and negative peaks > 75 µV, the judgment of 2-6 Hz brain waves in REM, etc. [4, 5]. All of these provide a solid foundation for the accurate training of the MK-FSVM classifier.
In this paper, the FSVM algorithm and multiple kernel functions are introduced into automatic sleep staging, and the weights and parameters of the linear combination of the multiple kernels are obtained through training on the data set. The existing classification methods [8, 23, 24, 30] were deterministic in the interpretation of a given stage. Through clinical observation, however, we found that within a given stage the PSG signal contains many types of wave characteristics that can cause misinterpretation, especially in the N1 stage. The MK-FSVM in this paper can resolve this blurring problem to a certain extent by clarifying the ambiguity of the sleep data. Compared with previous methods, MK-FSVM not only achieved an acceptable accuracy for the N1 stage but also maintained the accuracy of the other stages. Furthermore, the results calculated by MK-FSVM are accurate enough to reflect the real sleep structure.
Correct interpretation of N1 directly affects the determination of the sleep cycle ratio. N1 is more easily misinterpreted than any other stage, and the number of N1 epochs is significantly lower than that of the other stages. Although some previous studies have demonstrated that the accuracy in some stages can be high, the maximum sensitivity of N1 was only approximately 24% [8, 23, 30], except when using the ANN strategy [24]. That study [24] used an ANN method and demonstrated a relatively high accuracy of 72.6 ± 1.67% for N1, but the test sample contained only 265 epochs. The accuracy of the N1 stage obtained by MK-FSVM reached 43.28%.
In summary, our staging algorithm constructed the multi-kernel FSVM by combining the weights and parameters of the multiple kernels after self-learning on sleep samples. It can also be upgraded by recombining the kernel functions. More importantly, our staging algorithm is transparent since it does not include any unknown functions. As a limitation of the present study, our experimental data set contains only 20 PSGs. More data sets are needed to further verify whether the MK-FSVM designed in this work can accurately reflect the sleep structure in different groups. Moreover, the data collection time for different groups of subjects will be extended. We also plan to explore other suitable features to increase the staging accuracy.

CONCLUSION
In this work, eighteen characteristic features were extracted to meet the AASM staging requirements. MK-FSVM, a transparent algorithm, was applied to handle the fuzzy characteristics of the sleep data. The average recognition rates of our method were the following: Wake, 82.01%; N1, 43.28%; N2, 89.86%; SWS, 90.76%; and REM, 87.61%. The average sensitivity was 82.53% (SD = 5.43). The average kappa value was κ = 0.7 (SD = 0.05), which shows a high agreement with the expert scoring. This result can reflect the sleep structure as defined in previous research.

Figure 4. Comparison of the present recognition results for every sleep stage with previous studies.

Table 2. Summary of the specs of the MK-FSVM in the experiment (columns: Items; Specs of the MK-FSVM). The table lists the details of the experiments.