Atrial Fibrillation Detection with Low Signal-to-Noise Ratio Data Using Artificial Features and Abstract Features

Detecting atrial fibrillation (AF) in short single-lead electrocardiogram (ECG) recordings with a low signal-to-noise ratio (SNR) is a key task for wearable heart monitoring systems. This study proposed an AF detection method based on feature fusion to identify AF rhythm (A) from three other categories of ECG recordings, that is, normal rhythm (N), other rhythm (O), and noisy (∼) ECG recordings. The four categories, that is, N, A, O, and ∼, were identified from the database provided by the PhysioNet/CinC Challenge 2017. The proposed method first unified the unbalanced ECG recordings, which last from 9 to 60 seconds, into 30 s segments by copying, cutting, and symmetry. Then, 24 artificial features, including waveform features, interval features, frequency-domain features, and a nonlinear feature, were extracted relying on prior knowledge. Meanwhile, a 13-layer one-dimensional convolutional neural network (1-D CNN) was constructed to yield 38 abstract features. Finally, the 24 artificial features and 38 abstract features were fused to yield the feature matrix, and a random forest was employed to classify the ECG recordings. In this study, the mean accuracy (Acc) of the four categories reached 0.857, and the F1 of N, A, and O reached 0.837. The results showed that the proposed method had relatively satisfactory performance for identifying AF from short single-lead ECG recordings with low SNR.


Introduction
Atrial fibrillation (AF) is a disordered and rapid atrial electrical activity characterized by supraventricular tachyarrhythmia. Its incidence increases with age, and millions of people are affected by AF every year [1]. In practice, real-time monitoring of cardiovascular disease is essential for early warning of AF. At present, wearable electrocardiogram (ECG) monitoring is the mainstream real-time monitoring system [2], which can free patients from discomfort and from restrictions of time and place during long-term health monitoring. However, the ECG recordings collected by wearable devices or mobile phones are easily contaminated by the complex external environment, so their signal-to-noise ratio (SNR) is low. In fact, many recordings with low SNR cannot be used for diagnosis because of their poor quality. Thus, the ECG recordings with low SNR should also be identified to avoid wasting clinical resources.
Traditional machine learning algorithms based on statistics have been extensively used for data analysis [3][4][5][6]. Most current studies on automatic AF analysis do not focus on recognizing noisy ECG recordings with low SNR. Krasteva et al. [3] used a limited feature set combined with an optimized artificial neural network to conduct four-class classification on the CinC 2017 database. Goodfellow et al. [4] extracted three types of features, that is, template features, RRI features, and full waveform features, using step-by-step machine learning and classified the CinC 2017 database into four categories. In general, previous studies can be divided into machine learning methods that extract artificial features based on prior knowledge and deep learning methods based on neural networks. Bin et al. [5] extracted 30 features, including AF features, morphological features, and RR interval features, from ECG recordings and trained a decision tree model using the AdaBoost.M2 algorithm to realize AF detection. Datta et al. [6] extracted several ...
With the adoption of wearable devices and mobile phones, the ECG recordings collected using these devices are easily contaminated by noise, so that many recordings cannot be used for clinical purposes because of their poor quality. The noisy ECG recordings should therefore be recognized before diagnosis; that is, it is necessary to distinguish the acceptable ECG recordings from the noisy ECG recordings among a large number of ECG recordings with low SNR. In previous studies, entropy helped identify the inherent nonlinear property and randomness within the ECG recordings [18]. Zhang et al. [19] calculated a multiscale entropy of the ECG recordings for signal quality assessment and further studied the sensitivity of multiscale entropy to noise in the ECG recordings. Pham et al. [7] extracted a large number of entropy features to train classifiers. Fu et al. [20] extracted different entropy features, that is, approximate entropy, sample entropy, and fuzzy entropy, to feed into machine learning models, that is, support vector machine (SVM), least-squares SVM (LS-SVM), and long short-term memory (LSTM), for assessing the quality of the ECG recordings. Zhang et al. [21] proposed a permutation ratio entropy (PRE) based on permutation entropy to identify random components and inherent irregularities within time series. These studies exhibited satisfying performance of entropy methods for identifying random components and inherent irregularities within the recordings. Thus, this study used an entropy feature, namely, PRE, to distinguish the noisy ECG recordings from other ECG recordings.
So, a novel method was proposed in this study, which used feature fusion of artificial features and abstract features to extract comprehensive information within the ECG recordings, and the entropy feature was also employed to improve the classification performance of the method for noisy ECG recordings. In this study, Section 2 introduces materials and methods, including data preparation, feature extraction, and network architecture. Section 3 shows the results of this research. Section 4 discusses the effectiveness of the proposed method. Section 5 summarizes this work.

Database.
The publicly available database provided by the PhysioNet/CinC Challenge 2017 (CinC 2017) was used in this study, and it contains four categories of ECG recordings, that is, normal rhythm (N), AF rhythm (A), other rhythm (O), and noisy (∼) ECG recordings. This database consists of 8528 single-lead ECG recordings ranging in length from 9 s to over 60 s, and the recordings were sampled at 300 Hz [22]. All recordings were identified by clinical experts and technicians. Among them, 5076 ECG recordings were marked as N, 758 ECG recordings were marked as A, 2415 ECG recordings were marked as O, and 279 ECG recordings were marked as ∼. These ECG waveforms are shown in Figure 1.
This study used a data-balancing method to address the imbalance in the length of the ECG recordings, and the method effectively retained the critical information of the ECG recordings [23]. A QRS complex location algorithm was used to locate the complex position, and the recording length was made consistent by copying, cutting, and symmetry. In this study, all recordings were segmented or filled to 30 s. Among them, the ECG recordings longer than 30 s were randomly segmented. For the recordings shorter than 30 s, the QRS complexes were first located using the Pan-Tompkins algorithm, then the initial downward deflection in each QRS complex was determined as the starting point of the complex, and finally the recording from the starting point of the first QRS complex to the starting point of the last QRS complex was intercepted and copied until the recording length was 30 s. After unifying the length of all segments, nearly 80% of the segments were used as the training set and the remaining 20% as the test set, on which the performance of the proposed classification method was evaluated. Table 1 shows the details of the CinC 2017 database used in this study.
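The cutting and copying steps described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `unify_length` and its arguments are hypothetical, the QRS onsets are assumed to be supplied by a separate detector (for example, a Pan-Tompkins implementation), and the "symmetry" step is not sketched because the text does not describe it.

```python
import numpy as np

FS = 300        # sampling rate of the CinC 2017 recordings (Hz)
TARGET_S = 30   # target segment length (s)


def unify_length(ecg, qrs_starts, fs=FS, target_s=TARGET_S, rng=None):
    """Return a segment of exactly fs * target_s samples.

    ecg        : 1-D array holding one recording
    qrs_starts : sample indices of QRS onsets (assumed to come from a
                 separate detector; they are not computed here)
    """
    rng = np.random.default_rng() if rng is None else rng
    ecg = np.asarray(ecg, dtype=float)
    target = fs * target_s

    if len(ecg) >= target:
        # Long recording: cut a randomly placed window of the target length.
        start = rng.integers(0, len(ecg) - target + 1)
        return ecg[start:start + target]

    # Short recording: keep the span from the first to the last QRS onset,
    # so that repeated copies join at the same point of the cardiac cycle.
    if len(qrs_starts) >= 2:
        unit = ecg[qrs_starts[0]:qrs_starts[-1]]
    else:
        unit = ecg  # fallback (assumption): too few beats were detected
    repeats = int(np.ceil(target / len(unit)))
    return np.tile(unit, repeats)[:target]
```

Cutting the repeated unit at QRS onsets means each copy starts where the previous one ended in the cardiac cycle, which is presumably why the onsets rather than arbitrary sample positions are used.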

Outline of the Proposed Method.
In this study, the ECG recordings were first unified to a length of 30 s. Then, 62 features were calculated, including 24 artificial features, that is, 8 waveform features, 11 interval features, 4 frequency-domain features, and 1 nonlinear feature, and 38 abstract features extracted by a 13-layer 1-D CNN. The abstract and artificial features constituted a feature vector for yielding the fused feature matrix. Finally, a random forest [24] containing 300 decision trees was employed to classify the AF segments. Figure 2 shows the flowchart of the proposed method.

Artificial Features.
In the field of machine learning, the use of artificial features is essential. Based on a large number of previous studies, and so as not to discard prior knowledge, this study used four types of features, that is, waveform features, interval features, frequency-domain features, and a nonlinear feature, and 24 specific features were calculated [4][5][6][7][8]. Table 2 shows the artificial features used in this study.

Waveform Features.
In most cases, the number and amplitude of R waves within the four categories of ECG segments are significantly different, so the features based on the number and amplitude of R waves were calculated first. The Pan-Tompkins algorithm [25] was used to locate the R waves of all ECG segments. Then, the number of R waves and the amplitude of each R wave were obtained from the R-wave locations. Finally, the number of R waves was taken as one of the features, and the basic amplitude features, that is, the maximum, minimum, mean, and median of the R-wave amplitude in each segment, were calculated from the amplitudes of all R waves. Suppose that there are $N$ R waves in the time series and that $r$ represents the amplitude of an R wave. The amplitudes of all R waves are then defined as $[r_1, r_2, r_3, \ldots, r_N]$, so the maximum value of the amplitude is $[r_1, r_2, r_3, \ldots, r_N]_{\max}$, the minimum value is $[r_1, r_2, r_3, \ldots, r_N]_{\min}$, the mean value is $[r_1, r_2, r_3, \ldots, r_N]_{\mathrm{mean}}$, and the median value is $[r_1, r_2, r_3, \ldots, r_N]_{\mathrm{median}}$.
In the analysis of time series, many time series exhibit an irregular distribution, whereas the mean of the series shows a certain regularity, so an indicator is needed to measure the relationship between each point in the series and the mean. The standard deviation was therefore used in this study to distinguish the pseudo law of distribution, and a waveform feature based on the standard deviation was also calculated. Suppose the time series with $N$ points is defined as $[X_1, X_2, X_3, \ldots, X_N]$, and their mean value is $\bar{X}$. The standard deviation ($S$) is calculated as follows:
$$S = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2} \tag{1}$$
Figure 1: Examples of four categories of ECG recordings.

Journal of Healthcare Engineering
where $i$ is a positive integer running from 1 to $N$. According to the definition of $S$, the amplitude standard deviation is also calculated as one of the waveform features. Based on the standard deviation, the skewness (SK) and kurtosis (KU) of the segments were calculated. SK characterizes the degree of asymmetry of the probability density distribution curve relative to the mean, and KU characterizes the peak height of the probability density distribution curve at the mean. SK is calculated as follows:
$$\mathrm{SK} = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{X_i - \bar{X}}{S}\right)^3 \tag{2}$$
KU is calculated as follows:
$$\mathrm{KU} = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{X_i - \bar{X}}{S}\right)^4 \tag{3}$$
To sum up, 8 waveform features were extracted from the ECG segments.
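The eight waveform features can be sketched with NumPy as below. Several choices here are assumptions, not confirmed by the text: the function name `waveform_features` is hypothetical, the R-peak indices are taken as given, the standard deviation uses the population (1/N) form and is computed over the R-wave amplitudes, and the skewness and kurtosis are computed over all samples of the segment without subtracting 3 from the kurtosis.

```python
import numpy as np


def waveform_features(segment, r_peaks):
    """Eight waveform features of one ECG segment.

    segment : 1-D array of ECG samples
    r_peaks : sample indices of the R waves (assumed to be already detected)
    """
    segment = np.asarray(segment, dtype=float)
    r_amp = segment[np.asarray(r_peaks, dtype=int)]  # amplitude of each R wave

    # Standardize the segment; assumes it is not constant (std > 0).
    z = (segment - segment.mean()) / segment.std()

    return {
        "r_count": len(r_amp),
        "r_max": float(r_amp.max()),
        "r_min": float(r_amp.min()),
        "r_mean": float(r_amp.mean()),
        "r_median": float(np.median(r_amp)),
        "r_std": float(r_amp.std()),         # population form, 1/N
        "skewness": float(np.mean(z ** 3)),  # third standardized moment
        "kurtosis": float(np.mean(z ** 4)),  # fourth standardized moment
    }
```

A segment with a perfectly symmetric amplitude distribution gives a skewness of zero, which is a quick sanity check for the sign convention.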

Interval Features.
RR interval refers to the duration between two adjacent R waves in the ECG, and it can reflect the duration of one heart contraction. The features of the RR interval can reflect whether a person's heart rate is normal, and the heart rate can be calculated from the RR interval [26]. The heart rate of patients with AF or other heart abnormalities may be irregular, and the RR interval may be too large, too small, or unstable. Therefore, the relevant features of the RR interval, that is, the maximum, minimum, mean, median, and standard deviation of the RR interval, were calculated, and the heart rate was also obtained from the RR interval as a feature.
Heart rate (HR) is calculated as follows:
$$\mathrm{HR} = \frac{60}{RR} \tag{4}$$
where $RR$ is the RR interval in seconds and HR is expressed in beats per minute.
PR interval refers to the time interval from the starting point of the P wave to the starting point of the QRS complex on the ECG. Some studies have used the PR interval for ECG classification and proved its effectiveness [3,27,28]. To get the PR interval, the P wave of the ECG recording must be located. The P wave is easy to detect in regular ECG recordings, but it is difficult to detect in a noisy environment because its change is not obvious. Therefore, we used the P-wave detection method based on wavelet transform proposed by Li et al. [29]. The PR interval was then calculated. A PR interval that is too long, too short, or variable represents different conditions of patients. Because such different situations may help to separate the other classes in this database, and in order to capture them to the greatest extent, the relevant features of the PR interval, that is, the maximum, minimum, mean, median, and standard deviation of the PR interval, were extracted in this study. The relevant features of the PR interval are calculated in the same way as those of the RR interval.
Finally, 6 features of RR interval and 5 features of PR interval were extracted from the ECG segments.
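As a rough illustration of the six RR-interval features, the sketch below derives them from R-peak sample indices. The function name `rr_interval_features` is hypothetical, the R peaks are assumed to be already detected, and the heart rate is assumed to be taken from the mean RR interval of the segment; the text only states that it is obtained from the RR interval.

```python
import numpy as np


def rr_interval_features(r_peaks, fs=300):
    """Six RR-interval features from R-peak sample indices.

    r_peaks : increasing sample indices of the R waves (at least two)
    fs      : sampling rate in Hz (300 Hz for CinC 2017)
    """
    rr = np.diff(np.asarray(r_peaks, dtype=float)) / fs  # RR intervals in seconds
    return {
        "rr_max": float(rr.max()),
        "rr_min": float(rr.min()),
        "rr_mean": float(rr.mean()),
        "rr_median": float(np.median(rr)),
        "rr_std": float(rr.std()),
        # Beats per minute from the mean RR interval (assumption).
        "heart_rate": float(60.0 / rr.mean()),
    }
```

The five PR-interval statistics could be obtained the same way by replacing `rr` with an array of PR intervals, once P-wave onsets are available from a P-wave detector.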

Frequency-Domain Features.
In most machine learning methods, frequency-domain features are used to reflect the frequency and energy information within the ECG recordings. In medical diagnosis or other application scenarios, they can be used as a part of the feature vector together with time-domain features and other features to enrich the types of features and improve the diagnostic accuracy [30]. In this study, the Fourier transform, a simple spectrum analysis method, was selected to obtain the spectrum of the ECG segments, and four frequency-domain features, that is, frequency center of gravity, mean-square frequency, root mean square frequency, and frequency variance, were computed.
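The four spectral descriptors named above can be sketched with a discrete Fourier transform as follows. This is an illustration under stated assumptions: the function name `frequency_features` is hypothetical, the spectrum is taken to be the power spectrum (the squared FFT magnitude), and the features use the conventional spectral-moment definitions with sums over frequency bins in place of integrals.

```python
import numpy as np


def frequency_features(segment, fs=300):
    """Return (FC, MSF, RMSF, FV) of one ECG segment."""
    segment = np.asarray(segment, dtype=float)
    spectrum = np.abs(np.fft.rfft(segment)) ** 2       # power spectrum (assumption)
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)  # bin frequencies in Hz
    total = spectrum.sum()

    fc = np.sum(freqs * spectrum) / total              # frequency center of gravity
    msf = np.sum(freqs ** 2 * spectrum) / total        # mean-square frequency
    rmsf = np.sqrt(msf)                                # root mean square frequency
    fv = np.sum((freqs - fc) ** 2 * spectrum) / total  # frequency variance
    return float(fc), float(msf), float(rmsf), float(fv)
```

With these definitions the identity MSF = FV + FC² holds, so the four values carry three independent pieces of information about the spectrum.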
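Let $S(f)$ denote the spectrum of the segment, where $f$ represents the frequency. The frequency center of gravity (FC) is calculated as follows:
$$\mathrm{FC} = \frac{\int_{0}^{+\infty} f\,S(f)\,df}{\int_{0}^{+\infty} S(f)\,df} \tag{5}$$
The mean-square frequency (MSF) is calculated as follows:
$$\mathrm{MSF} = \frac{\int_{0}^{+\infty} f^{2}\,S(f)\,df}{\int_{0}^{+\infty} S(f)\,df} \tag{6}$$
The root mean square frequency (RMSF) is calculated as follows:
$$\mathrm{RMSF} = \sqrt{\mathrm{MSF}} \tag{7}$$
The frequency variance (FV) is calculated as follows:
$$\mathrm{FV} = \frac{\int_{0}^{+\infty} \left(f - \mathrm{FC}\right)^{2} S(f)\,df}{\int_{0}^{+\infty} S(f)\,df} \tag{8}$$
Finally, 4 frequency-domain features were extracted from the ECG segments.
... [21]. This PRE can reflect the amplitude difference between two adjacent data points of a time series.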
Because it is sensitive to abrupt changes and other variations in a recording, the classical permutation entropy is often used to measure the complexity of physiological recording sequences. However, permutation entropy does not measure the original time series directly, so some details are lost. Furthermore, permutation entropy is based on the ranking of data points, which means that it ignores the differences between adjacent data points. Compared with the classical permutation entropy, the PRE reflects the relationship between adjacent data points by constructing a relationship matrix of adjacent elements and thus better reflects the degree of disorder of the time series. First, PRE constructs a new relationship matrix B to represent the relationship between adjacent elements and then counts the new patterns c. Let B(i) be the ith row vector of matrix B, and c(i) be the count of the ith pattern. For B(i), when another vector B(j) of matrix B has the same pattern as B(i), c(i) increases by 1, and the two vectors are highly correlated; when every vector of matrix B represents a new pattern, the total number of patterns c reaches its maximum, n − m − 1. In this way, the total number of patterns c contained in matrix B is obtained. P_i denotes the probability of pattern c(i), where k is the total number of patterns c, 1 ≤ k ≤ n − m − 1, and the PRE is then computed from these probabilities as defined in [21].
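For reference, the classical permutation entropy that PRE builds on can be sketched as follows. This is the standard ordinal-pattern version, not the PRE of [21], whose relationship matrix and probability definition are not reproduced here; the function name and the default `order` and `delay` values are illustrative choices.

```python
import math
from collections import Counter

import numpy as np


def permutation_entropy(series, order=3, delay=1, normalize=True):
    """Classical permutation entropy of a 1-D series (not PRE)."""
    series = np.asarray(series, dtype=float)
    n_vectors = len(series) - (order - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen order and delay")

    # Each embedded vector is reduced to its ordinal pattern, i.e. the
    # permutation that sorts it; amplitude differences are discarded.
    patterns = Counter(
        tuple(np.argsort(series[i:i + order * delay:delay], kind="stable"))
        for i in range(n_vectors)
    )
    probs = np.array(list(patterns.values()), dtype=float) / n_vectors
    entropy = -np.sum(probs * np.log(probs))
    if normalize:
        entropy /= math.log(math.factorial(order))  # scale to [0, 1]
    return float(entropy)
```

The `argsort` step makes the limitation discussed above concrete: two vectors with very different amplitude gaps map to the same pattern as long as their ordering is the same.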

1-D CNN and Abstract Features.
A deeper network helps to extract deeper features within ECG segments; however, the most severe problem of a deeper network is its large number of parameters, which demands a large amount of memory and computing resources for training and inference [31]. So, a 1-D CNN was used directly to extract abstract features in this study. The proposed feature extraction network was constructed from six pairs of a convolutional layer and a max-pooling layer, followed by a fully connected layer. A larger convolution kernel was used in the first convolutional layer, and the convolution kernel size rose stepwise as the number of layers increased. Table 3 shows the architecture of the 13-layer 1-D CNN and its detailed parameters. When an ECG segment was fed into the network, the segment passed through the 6 pairs of convolutional and pooling layers. To obtain the abstract features, the final fully connected layer changed the dimension of the output to a 1 × 38 vector, that is, 38 abstract features.
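The data flow through such a network (six convolution and max-pooling pairs followed by one fully connected layer that outputs 38 values) can be sketched in plain NumPy. Everything numeric here is hypothetical: the channel counts, kernel sizes, pooling size, and random untrained weights are placeholders for the parameters of Table 3, so the sketch only illustrates shapes, not the trained feature extractor.

```python
import numpy as np

# Hypothetical layer sizes; the real values are those listed in Table 3.
CHANNELS = [1, 4, 8, 8, 16, 16, 32]
KERNELS = [15, 3, 5, 7, 9, 11]
N_ABSTRACT = 38


def conv1d_relu(x, w, b):
    """Valid 1-D convolution followed by ReLU.

    x: (C_in, L), w: (C_out, C_in, K), b: (C_out,)
    """
    k = w.shape[2]
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=1)  # (C_in, L-K+1, K)
    y = np.einsum("clk,ock->ol", windows, w) + b[:, None]
    return np.maximum(y, 0.0)


def max_pool1d(x, size=2):
    """Non-overlapping max pooling along the time axis."""
    usable = (x.shape[1] // size) * size
    return x[:, :usable].reshape(x.shape[0], -1, size).max(axis=2)


def init_params(input_len=9000, seed=0):
    """Random (untrained) weights sized for a 30 s segment at 300 Hz."""
    rng = np.random.default_rng(seed)
    convs, length = [], input_len
    for c_in, c_out, k in zip(CHANNELS[:-1], CHANNELS[1:], KERNELS):
        w = rng.normal(0.0, np.sqrt(2.0 / (c_in * k)), size=(c_out, c_in, k))
        convs.append((w, np.zeros(c_out)))
        length = (length - k + 1) // 2  # length after convolution and pooling
    flat = CHANNELS[-1] * length
    dense_w = rng.normal(0.0, np.sqrt(1.0 / flat), size=(N_ABSTRACT, flat))
    return convs, (dense_w, np.zeros(N_ABSTRACT))


def abstract_features(segment, params):
    """Map one segment to a vector of 38 abstract features."""
    convs, (dense_w, dense_b) = params
    x = np.asarray(segment, dtype=float)[None, :]  # add the channel axis
    for w, b in convs:
        x = max_pool1d(conv1d_relu(x, w, b), size=2)
    return dense_w @ x.ravel() + dense_b
```

In practice the weights would be learned by training the network on the labeled segments with a deep learning framework, and the output of the fully connected layer would then be read out as the abstract features.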

Fusion of Artificial and Abstract Features.
Artificial features and abstract features were fused, and a feature vector of length 62 was constructed. The vector was denoted as $[R_1, R_2, R_3, \ldots, R_{24}, S_1, S_2, S_3, \ldots, S_{38}]^{T}$, where $R_i$ represents the $i$th artificial feature, $i = 1, 2, \ldots, 24$, and $S_j$ represents the $j$th abstract feature, $j = 1, 2, \ldots, 38$. The feature matrix is formed by collecting the feature vectors of all $N$ input segments, where $N$ represents the number of input segments.
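The fusion step amounts to concatenating the two feature sets segment by segment. A minimal sketch follows; the function name `fuse_features` is hypothetical, and the orientation of the matrix (one row per segment, giving N × 62) is an assumption chosen because it matches what most classifiers expect.

```python
import numpy as np


def fuse_features(artificial, abstract):
    """Concatenate artificial and abstract features per segment.

    artificial : array of shape (N, 24)
    abstract   : array of shape (N, 38)
    returns    : fused feature matrix of shape (N, 62)
    """
    artificial = np.asarray(artificial, dtype=float)
    abstract = np.asarray(abstract, dtype=float)
    if artificial.shape[0] != abstract.shape[0]:
        raise ValueError("both feature sets must describe the same segments")
    return np.hstack([artificial, abstract])
```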

Random Forest.
In CinC 2017, Zabihi et al. [32] and Kropf et al. [33] used random forest to classify the extracted features because random forest is interpretable [34]. So, random forest was employed in this study. A random forest is an ensemble of several decision trees. Each decision tree is a small classifier, and the random forest synthesizes the classification votes of all trees to determine the final output category.
In this study, the classification by random forest included training and testing, and the bootstrap method was used to train the random forest. In the training process, 80% of the feature vectors were used as the training set, and a group of decision trees was trained according to the labels of the ECG recordings. The remaining 20% was used for testing. The maximum number of decision trees was set to 300, and each node randomly selected features during tree generation. Assuming that the total number of features was $n$, the number of features in the random subset considered by a decision tree node at each split was set to the default, that is, the square root of the total number of features, $\sqrt{n}$. The minimum number of samples required to split an internal node was set to 2, the maximum depth of each decision tree was set to 40, and the growth of a tree ended when the maximum depth was reached. The above parameters were set to prevent overfitting. Finally, the classification category was determined by averaging the classification voting results of all decision trees.
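The stated hyperparameters map directly onto scikit-learn's `RandomForestClassifier`, as in the sketch below. The text does not say which library was used, so scikit-learn, the function name `train_random_forest`, the stratified split, and the fixed random seed are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def train_random_forest(features, labels, seed=0):
    """Train on 80% of the fused features and score on the remaining 20%."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels,
        test_size=0.2,
        stratify=labels,      # assumption: keep class proportions in both sets
        random_state=seed,
    )
    forest = RandomForestClassifier(
        n_estimators=300,     # 300 decision trees
        max_features="sqrt",  # sqrt(number of features) candidates per split
        min_samples_split=2,  # minimum samples to split an internal node
        max_depth=40,         # maximum depth of each tree
        bootstrap=True,       # bootstrap sampling of the training set
        random_state=seed,
    )
    forest.fit(x_train, y_train)
    return forest, forest.score(x_test, y_test)
```

Note that scikit-learn combines the trees by averaging their predicted class probabilities rather than counting hard votes, which is close to, but not necessarily identical with, the voting scheme described above.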

Evaluation Indicators.
In this study, accuracy (Acc), precision, recall, and F1 were used to evaluate the performance of the proposed method.
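These indicators can all be derived from a four-class confusion matrix. The sketch below is one way to do so under stated assumptions: rows hold the reference classes and columns the predicted classes in the order N, A, O, ∼; Acc is taken as the overall fraction of correctly classified recordings; the per-class F1 follows the CinC 2017 counting rule; and the total F1 averages the N, A, and O scores. The function name `evaluate` is hypothetical.

```python
import numpy as np

LABELS = ["N", "A", "O", "~"]


def evaluate(confusion):
    """Metrics from a 4 x 4 confusion matrix.

    confusion[i, j] = number of recordings whose reference class is
    LABELS[i] and whose predicted class is LABELS[j].
    """
    confusion = np.asarray(confusion, dtype=float)
    correct = np.diag(confusion)          # Nn, Aa, Oo, Pp
    ref_totals = confusion.sum(axis=1)    # N, A, O, P
    pred_totals = confusion.sum(axis=0)   # n, a, o, p

    f1 = 2.0 * correct / (ref_totals + pred_totals)
    return {
        "acc": float(correct.sum() / confusion.sum()),  # overall accuracy (assumption)
        "recall": dict(zip(LABELS, correct / ref_totals)),
        "precision": dict(zip(LABELS, correct / pred_totals)),
        "f1": dict(zip(LABELS, f1)),
        "f1_total": float(f1[:3].mean()),  # noisy class excluded, as in CinC 2017
    }
```

The per-class score 2·Nn/(N + n) is algebraically the same as the harmonic mean of that class's precision and recall, so the two ways of writing F1 agree.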
The Acc is calculated as follows:
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$
where true positive (TP) represents the number of ECG recordings in a given category that are correctly classified as the given category, false positive (FP) represents the number of ECG recordings of other categories that are misclassified as the given category, true negative (TN) represents the number of ECG recordings of other categories that are not classified as the given category but are classified as their correct category, and false negative (FN) represents the number of ECG recordings of other categories that are not classified as the given category and are not classified as their correct category. The precision is calculated as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
The recall is calculated as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
As in CinC 2017, $F_{1n}$, $F_{1a}$, $F_{1o}$, and $F_{1p}$ are defined as the F1 scores of the N, A, O, and ∼ categories, respectively, and they are calculated as follows [22]:
$$F_{1n} = \frac{2 \times Nn}{N + n}, \quad F_{1a} = \frac{2 \times Aa}{A + a}, \quad F_{1o} = \frac{2 \times Oo}{O + o}, \quad F_{1p} = \frac{2 \times Pp}{P + p}$$
where Nn, Aa, Oo, and Pp represent the numbers of recordings whose classification predicted by the proposed method is consistent with the reference classification for the N, A, O, and ∼ categories, respectively. N represents the number of recordings whose reference classification is N and n represents the number of recordings whose predicted classification is N, A represents the number of recordings whose reference classification is A and a represents the number of recordings whose predicted classification is A, O represents the number of recordings whose reference classification is O and o represents the number of recordings whose predicted classification is O, and P represents the number of recordings whose reference classification is ∼ and p represents the number of recordings whose predicted classification is ∼. Table 4 shows the counting rules of the above variables. The total F1 is defined according to the rules of CinC 2017 as the macro average of the three scores of the N, A, and O categories:
$$F_1 = \frac{F_{1n} + F_{1a} + F_{1o}}{3}$$

Results.
In this study, 80% of the ECG segments were used as the training set, and the remaining 20% were used as the test set for evaluating the proposed method. For the training set, we used 10-fold cross-validation, which randomly selected 90% of the data for training and 10% for validation. The results are shown in Table 5. The N category achieved the highest recall, precision, and F1 among the four categories, at 0.896, 0.910, and 0.913, respectively. In addition, the averages of the indicators over the four categories, that is, recall, precision, and F1, are all higher than 0.800, at 0.816, 0.813, and 0.809, respectively. Table 6 shows the confusion matrix of the proposed method for the test set and the corresponding recall, precision, F1n, F1a, F1o, F1p, Acc, and F1. The N category yields the highest recall of 0.893, precision of 0.901, and F1n of 0.901 among all categories. The ∼ category yields the lowest recall of 0.761, precision of 0.711, and F1p of 0.735 among all categories. In addition, the F1 and the Acc reached 0.837 and 0.857, respectively.
Table 7 collects the results of some previous studies and compares them with the results of the proposed method. The proposed method achieved the highest Acc (0.857) and F1p (0.735) among all studies and the highest F1 (0.837) among all studies except Wang et al. [35], who reported an F1 of 0.841. However, Wang et al. ignored the ∼ category and used only three categories of the CinC 2017 ECG recordings, that is, N, A, and O, for classification. Zihlmann et al. [39] combined LSTM and CNN to extract abstract features, and their total F1 score reached 0.820; however, their classification results for the ∼ category in the training process were low, with an F1p of only 0.645.

Evaluating Effectiveness of PRE for Noisy Recording Recognition.
Two feature schemes, that is, all features and all features except the PRE, were compared to evaluate the effectiveness of the PRE for recognizing noisy ECG segments. Table 8 shows the comparison results for the two feature schemes using the proposed method. The Acc of 0.857, F1 of 0.837, and F1p of 0.735 for all features are higher than those for all features except the PRE. The results indicate that the PRE helps to classify the noisy ECG segments because the F1p of 0.735 for all features is higher than the F1p of 0.679 for all features without the PRE. Meanwhile, a radar chart was also drawn to show the differences between the results of the two schemes more clearly. Figure 3 shows the radar map of the results for the two feature schemes. The F1p for all features is clearly higher than that for all features except the PRE.
PRE is an improvement on permutation entropy for identifying the nonlinear chaotic character within time series instead of randomness. In PRE, a new relationship matrix B is constructed. This matrix is based on the relationship between adjacent elements and can closely reflect the gap between two points, especially in complex signals. The generation of the new pattern c avoids repeated counting of vectors and is conducive to the complexity analysis of the whole signal. The ablation experiment showed that the PRE not only played a role in noise classification but also improved the overall classification indicators.

Comparison of Effectiveness of Artificial, Abstract, and Fusion Features.
In this study, the Accs of three feature schemes, that is, artificial features, abstract features, and fusion features, were also calculated to evaluate the effectiveness of the schemes. Table 9 shows the corresponding Accs of artificial features, abstract features, and fusion features. An Acc of 0.820 was obtained for the scheme using only artificial features. The Acc for the scheme using only abstract features generated by the 13-layer 1-D CNN was 0.734, the lowest of the three schemes.
Actually, deep learning can extract effective abstract features with the support of a large amount of data. However, the existing ECG databases are small, so deep learning algorithms cannot make full use of their power for

Conclusions
In this study, an AF detection method that combined artificial features with abstract features was proposed, and it yielded higher results than the previous studies, that is, an Acc of 0.857, an F1 of 0.837, and an F1p of 0.735, on the database provided by the CinC 2017. In addition, the nonlinear feature, that is, PRE, helps to identify the noisy ECG recordings from
Figure 3: Radar map of results for the two feature schemes.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding this work.