Robust and Accurate Anomaly Detection in ECG Artifacts Using Time Series Motif Discovery

Electrocardiogram (ECG) anomaly detection is an important technique for detecting dissimilar heartbeats, which helps identify abnormal ECGs before the diagnosis process. Currently available ECG anomaly detection methods, ranging from academic research to commercial ECG machines, still suffer from a high false alarm rate because they cannot differentiate ECG artifacts from real ECG signals, especially when the artifacts are similar to ECG signals in shape and/or frequency. This problem demands high vigilance from physicians and creates a risk of misinterpretation by nonspecialists. Therefore, this work proposes a novel anomaly detection technique that is highly robust and accurate in the presence of ECG artifacts and can effectively reduce the false alarm rate. Expert knowledge from cardiologists and a motif discovery technique are utilized in our design. In addition, every step of the algorithm conforms to the interpretation of cardiologists. Our method can be applied to both single-lead ECGs and multilead ECGs. Our experimental results on real ECG datasets are interpreted and evaluated by cardiologists. In most cases, our proposed algorithm achieves 100% accuracy on detection (AoD), sensitivity, specificity, and positive predictive value with a 0% false alarm rate. The results demonstrate that our proposed method is highly accurate and robust to artifacts compared with competitive anomaly detection methods.


Introduction
An electrocardiogram (ECG or EKG) signal is a time series that represents electrical impulses from the myocardium. An ECG signal is recorded from several electrodes attached to the skin. Most physicians prefer the ECG as a noninvasive tool to detect and diagnose cardiac diseases. The two important characteristics of an ECG signal are its multiple signal recordings from various positions of the myocardium and its periodic waveform [1] synchronized with the cardiac cycle. A normal ECG consists of five morphology segments, that is, the PQRST waveforms, which correspond to electrical conduction through the whole cardiac cycle. One cycle is composed of depolarization and repolarization from an atrium to a ventricle. ECGs from different leads may be morphologically different depending on the vector of the heart. The ECG morphology in each lead reflects the electrical activity in each segment of the heart. Therefore, the multilead ECG can be used to interpret the electrical activity of the whole heart and is very useful for abnormal myocardium detection (for more details, see [2][3][4]).
ECG anomaly detection therefore has increasingly become a popular task among researchers and practitioners [5][6][7][8][9][10][11][12][13]. It has been used to detect any time periods of unusual ECG beats. The accuracy of the anomaly detection method directly reflects the result of the cardiac disease detection and diagnosis. However, existing algorithms that have claimed to achieve high accuracies still suffer from false alarm results [13,14]. False alarm results typically occur because the algorithm detects some ECG artifacts as anomaly beats; in fact, some ECG artifacts are just normal beats. ECG artifacts result not only from the electrical activity of the heart alone but also from noise interference as illustrated in lead V1 in Figure 1.
Although artifacts are common in typical recordings, they pose serious problems in medical treatment, as noted in several medical research works, due to the impossibility of eliminating artifacts [14][15][16]. A number of works in the medical domain have raised this concern so that physicians are aware of the artifact problem.
Figure 1: ECG artifacts occurred in lead V1. Therefore, an ECG machine interprets these ECGs as abnormal ECGs whereas a cardiologist diagnoses them as normal ECGs with artifacts (see the handwritten note).
To study the problem of ECG artifacts and false alarm results, we gave questionnaires to physicians who have worked closely with ECG machines. We discovered that ECG machines often misinterpret ECG artifacts as anomalies, triggering excessive alarms to both patients and physicians on bedside monitors. Inexperienced physicians then need to consult with cardiologists to manually reanalyze the results by considering all 12-lead ECG signals. In addition, clinical correlation is demanded and re-recording of a patient's ECG may be needed. Therefore, false alarm results not only waste valuable time for cardiologists but also lead to misdiagnosis by inexperienced physicians.
These techniques are highly effective in removing artifact frequencies or components that are very different from those of ECG morphology (see Section 3.2). However, they still cannot handle ECG artifacts whose appearance is similar to ECG morphology. Moreover, they could distort the ECG waveform and easily lead to misinterpretation [33,34] because they fail to distinguish between ECG artifacts and real ECG morphology.
In this paper, we propose a robust and accurate anomaly detection algorithm (RAAD) for ECG to reduce the false alarm rate. The challenges of this work include how to distinguish ECG artifacts from real ECG morphology in the case (1) where ECG artifacts could have numerous and uncertain shapes and (2) where the shapes of the waveform in each individual lead in each patient are all different.
To deal with the variation of ECG, our algorithm utilizes time series motif discovery to first determine frequent patterns in the ECG. In addition, we use expert knowledge to design every step of the algorithm to guarantee the conformity with the interpretation of cardiologists.
To evaluate the algorithm, brute force discord discovery [21], HOT SAX algorithm [35], and BitClusterDiscord [36] are compared with our work on several datasets from Physionet [37]. Our experiments are twofold: (1) to compare the detected anomaly subsequences with real anomaly beat diagnosed by cardiologists and (2) to evaluate algorithms using five statistical measurements which are accuracy on detection (AoD), sensitivity, specificity, positive predictive value, and false alarm rate.

Definitions.
To clarify the terminology used in this paper, the following definitions are provided.
Definition 1. An ECG in one single lead is a time series $T$, where a time series $T$ of length $n$ is an ordered sequence of real numbers $t_1, t_2, \ldots, t_n$.
Other terminologies in motif discovery process follow those in [38].

Related Works.
Several research works have utilized time series mining techniques in anomaly detection. Anomaly detection algorithms generally define an anomaly or a discord as the most unusual subsequence in a long time series. Many works utilize discretization methods to avoid noise interference and use distance measures with pruning methods to measure the dissimilarity among subsequences. For instance, the HOT SAX algorithm [21] bases itself on the symbolic aggregate approximation (SAX) representation. The algorithm has shown great potential for extension and application in many other works [39,40].
Computational and Mathematical Methods in Medicine
Most recently, Sanchez and Bustos [41] proposed a new algorithm for efficient discord discovery in time series data based on the HOT SAX algorithm, with the aim of reducing its time complexity. However, although an ECG dataset was used in the experiment, neither the anomaly results nor the parameter settings were shown. Another work [20] aimed to improve the effectiveness of anomaly detection in time series data but focused only on clean signals and paid little attention to noise within the signal. Some anomaly detection algorithms specifically for ECG data [5,42] have been proposed based on machine learning techniques, so a training process is needed. However, ECG artifacts are known to be unstructured; in other words, they have numerous and uncertain shapes, and the shapes of the waveform in each individual lead in each patient can be totally different. Therefore, it is difficult and impractical to collect a large amount of training data with ECG artifacts. Another recently proposed work, the BitClusterDiscord algorithm [36], bases itself on bit representation clustering, aiming to improve the effectiveness of the algorithm without requiring a training model. However, these algorithms do require the length of an anomaly beat as an input from users, which could be impractical for real ECG anomaly analysis.
To deal with the problem of predefined input length, many works have been proposed. For example, in [38,43], minimum description length or MDL is used to automatically discover intrinsic features and is utilized to detect anomalous ECGs. An adaptive window-based discord discovery (AWDD) [19] has been proposed to detect ECG anomaly. It has been developed from brute force discord discovery (BFDD) [35]. This work uses R point in every 40 seconds to extract a variable-length ECG.
Apart from the abovementioned works, there are numerous research works relating to anomaly detection in both single-lead ECG and multilead ECG. However, these works are not practical due to the following reasons.
(2) Many works [21,35,36,48] require a fixed result length as an input parameter from users. In practice, it is very difficult to know what the proper length is. Although some works [19,20,38,43,47,49] have presented algorithms with variable-length results, the length of the results may not be consistent with the length of the actual cardiac cycle. Therefore, the result may not properly cover the cardiac cycle or morphology, which is crucial for diagnosis.
The abovementioned problems are very challenging. To the best of our knowledge, our work is the first anomaly detection technique that works for ECG artifacts with the main focus on reducing false alarm rates, whilst retaining high accuracy.

12-Lead Electrocardiogram.
Electrocardiogram [2,3,50] is a graphical signal that represents the electrical activity generated by the cardiac muscle. We usually plot an ECG signal as an amplitude or voltage in millivolts (mV) versus time in seconds. A 12-lead ECG provides twelve different perspectives of the electrical cardiac activities. In particular, leads I, AVL, V5, and V6 represent the view of the lateral wall of the heart; leads II, III, and AVF represent the inferior wall of the heart; leads V1 and V2 represent the septal wall of the heart; leads V3 and V4 represent the anterior wall of the heart; and lead AVR is used to indicate the correctness of the electrode placement. These leads can also be categorized into three different types, which are bipolar, unipolar, and precordial. Each lead is represented in various shapes of ECG morphologies. This helps cardiologists or physicians find out where the abnormalities are.

ECG Morphology or ECG Waveform.
ECG morphology [4,51] is simply the waveform or perspective of the electrical activity of the cardiac muscle, that is, depolarization and repolarization, in a cardiac cycle. The heart produces electrical impulses which spread through the cardiac muscle to make the heart contract. A normal ECG morphology consists of PQRST waves, and each PQRST waveform represents a single heartbeat or a cardiac cycle as shown in Figure 4.
Cardiologists use the following morphologies for diagnosis.
(1) P-wave refers to the electrical activation of atrial depolarization that causes the conduction of electrical impulse through the atria.
... to the end of the T-wave. QT-interval presents ventricular depolarization and repolarization.
(6) TP-segment starts from the end of the T-wave of the previous ECG beat to the onset of the P-wave of the following ECG beat. TP-segment represents the time when the heart muscle cells are electrically silent. So, it is always illustrated by an isoelectric interval which represents a zero line, a baseline, or an electric line.

ECG Artifacts.
An ECG artifact [14,52,53,54,55] is described as waveform interference in an ECG recording resulting from noise contamination, that is, anything that is not caused by the electrical activity generated by the heart. Artifacts can generally be classified into two groups: nonphysiologic and physiologic artifacts. A nonphysiologic artifact is caused by equipment problems or interference from neighboring electrical devices, whereas a physiologic artifact is caused by muscle activities or skin interferences. ECG artifacts are found in various and uncertain forms; some even have complete components of ECG morphology. Four common types of ECG artifacts are shown in Table 1.

Dynamic Time Warping.
Dynamic time warping (DTW) [57] is considered one of the most accurate similarity measures for time series data. With a dynamic programming technique to find an optimal warping path, DTW can handle nonlinear alignments or local time shifting and handle different-length subsequences [58,59]. Consequently, in the time series domain, DTW is generally more suitable than the classic Euclidean distance. Given two time series sequences, a sequence $Q$ of length $n$ and a sequence $C$ of length $m$ are as follows:
$$Q = q_1, q_2, \ldots, q_i, \ldots, q_n, \qquad C = c_1, c_2, \ldots, c_j, \ldots, c_m.$$
An $n$-by-$m$ matrix is constructed to store the cumulative distance between any two data points, $q_i$ and $c_j$. The warping path can be found by dynamic programming to calculate a cumulative distance $\gamma(i, j)$ from a sum of the distance in the current cell and the minimum of the cumulative distances of the three adjacent elements as follows:
$$\gamma(i, j) = d(q_i, c_j) + \min\{\gamma(i-1, j-1),\ \gamma(i-1, j),\ \gamma(i, j-1)\},$$
where
$$d(q_i, c_j) = (q_i - c_j)^2.$$
Finally, the optimal path that minimizes the warping distance is achieved and the DTW distance value is calculated as follows:
$$\mathrm{DTW}(Q, C) = \min \sqrt{\sum_{k=1}^{K} w_k},$$
where $w_k$ is the element $(i, j)$ of the matrix and also belongs to the $k$th element of a warping path $W$.
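The dynamic programming recurrence described above can be illustrated with a minimal sketch. It assumes a squared point-wise difference as the local distance and a square root of the final cumulative value, which is a common formulation but is not confirmed by the text; the function name `dtw_distance` is hypothetical, and no warping-window constraint is applied.

```python
import math


def dtw_distance(q, c):
    """Unconstrained DTW distance between two numeric sequences.

    Assumption: local distance is the squared difference and the final
    value is the square root of the cumulative cost.
    """
    n, m = len(q), len(c)
    # gamma[i][j] holds the cumulative cost of aligning q[:i] with c[:j].
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            # Current cell cost plus the cheapest of the three adjacent cells.
            gamma[i][j] = cost + min(
                gamma[i - 1][j - 1],  # diagonal step
                gamma[i - 1][j],      # step in q only
                gamma[i][j - 1],      # step in c only
            )
    return math.sqrt(gamma[n][m])
```

Because the alignment is nonlinear, two sequences of different lengths that differ only by a repeated sample, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, have a DTW distance of zero.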

Contribution of Our Work.
Our work addresses the following 4 major problems in ECG anomaly analysis.
(1) Problem of Identifying the Length of Anomaly ECG Beat. Each person has different cardiac cycles; in other words, each person has variable beat lengths according to individual respiratory drives. It is hard to know what the length of an anomaly ECG beat will be. Unfortunately, many current algorithms are fixed-length algorithms and they require the users to specify the length themselves. As illustrated in Figure 5, a slight difference in the specified length has an effect on the results of the algorithm. With a length of 142, a typical fixed-length brute force algorithm would produce a false alarm result, whereas with a length of 143, no false alarm is detected. Therefore, the problem of identifying the beat length must be addressed.
(2) Problem of Single-Lead Consideration. Most research on ECG mainly focuses on single-lead signals and ignores associations among other leads. However, in practice, an ECG is typically recorded in many leads in order to capture activities from different perspectives. For example, for myocardial infarction (MI), commonly called a heart attack, cardiologists have to consider and interpret a multilead ECG to see where the lack of blood occurs in the myocardium. In particular, for inferior MI, at least leads II, III, and AVF must all be considered. Therefore, we aim to design an algorithm that supports multilead ECGs and considers their association.
(3) Problem of False Alarm Results. Currently, many research works have emphasized noise reduction, but none of them have proposed a method to handle the problem of ECG artifacts that mimic the ECG morphology. Additionally, existing methods may distort the ECG morphology after suppression.
Existing commercial ECG machines and applications cannot handle the false alarm problem perfectly. Bedside monitor may give excessive alarms to physicians in the case when ECG artifacts occur, interfering with perfectly normal ECG signals. This clearly wastes the physician's valuable time detecting the causes. Moreover, inexperienced physicians may misinterpret the results and give improper treatment to patients.
Consequently, our work focuses on the reduction of these false alarm results.
(4) Problem of Re-Recording ECG. In the case where ECG artifacts are detected, re-recording of the entire ECG from patients seems to be an easy fix. However, it could waste the physician's time in treating other patients, and it may not even find any abnormality in the re-recorded signals because some heart diseases produce only a few anomaly beats, for example, short run ventricular tachycardia (short run VT).
Figure 5: Different anomaly detection results by typical fixed-length algorithms. Only a small change in the input length could produce false alarm results, detecting an extra beat as anomaly. The boxes frame real anomaly beats and the bold lines denote the results by fixed-length anomaly detection algorithm.

Methodology
In actual clinical practice, to differentiate a real ECG beat from an ECG artifact, cardiologists need to compare the morphology of that beat to other ECG beats in cleaner leads based on the following facts: (1) each time alignment comes from the same electrical activity across every lead and (2) TP segment is an isoelectric interval on the ECG, which should always be at the baseline; that is, any changes in TP segment must be considered as ECG artifacts. It is noted that U-wave is not considered in this work because it is just a condition that does not properly reflect anomalous ECGs, such that the physician needs further clinical investigation and diagnosis such as potassium level in the blood.
It is generally difficult to identify a TP segment in ECG artifacts. Therefore, we propose a method to first locate TP segments in the least contaminated ECG and to use it as a reference to identify TP segments in other leads.
We propose a robust and accurate anomaly detection algorithm in ECG artifacts (RAAD) by applying clinical knowledge from cardiologists and techniques from time series mining. The algorithm consists of preprocessing step, cleanest lead discovery, morphology segmentation, and robust anomaly detection.

Preprocessing.
Generally, the frequency of baseline wandering is less than 0.5 Hz, and AC interference is in the range of 50-60 Hz [60][61][62][63]. To suppress the baseline wandering and AC interference, we apply second-order Butterworth [63], zero-phase digital filtering [62] in the preprocessing step through functions available in MATLAB. We have conducted extensive preliminary experiments to determine the proper cut-off frequency for band-pass filter from a wide range of frequency from 0.5 to 50 Hz. As a result, a low-pass filter at 20 Hz is used to reduce the AC Interference, and then a high-pass filter at 2 Hz is used to reduce the baseline wandering because the frequency spectrum at these ranges has been shown not to distort the ECG morphology for PQRST detection and ECG interpretation.

Cleanest Lead Discovery.
Ideally, a clean lead is a lead that is not contaminated by noise. However, in reality, where noise is inevitable, the cleanest ECG lead is considered to be the lead that is least contaminated. The shapes of ECG beats within the same lead are generally similar to each other. So, if the shape of an ECG beat is different, we can suspect it to be an abnormality or an artifact.
Therefore, we assume that the lead that has the highest number of similar ECG beats is the cleanest lead. This corresponds to the definition of motif in time series mining task, which defines a motif as a group of frequently occurring patterns.
In our work, proper length motif discovery algorithm [38] is utilized to identify the cleanest lead that produces the maximum frequency of motif. In addition, an anomaly candidate in the cleanest lead is obtained from the remaining beats that are excluded from the motifs.
This algorithm is based on a minimum description length [64] which is a parameter-free algorithm and uses the bitsave as a heuristic to obtain the set of motif. The higher the score of bitsave is, the more similarity there is of patterns in a lead. Consequently, it is used to indicate the cleanest lead. We further extend the algorithm in [38], as follows.
(1) The starting length is instead determined by a sampling rate and expert knowledge. The sampling rate is the number of ECG samples or data points per second. In view of expert knowledge [3,65,66], the length of systole is 1/3 of the cardiac cycle and the length of diastole is 2/3 of the cardiac cycle at rest. In addition, the normal heart rate range is 60 to 100 beats per minute. Therefore, the starting length is calculated from the sampling rate and these values.
(2) To find a motif candidate, instead of using the $k$th compression motif as a motif candidate as in [38], our algorithm uses only the 1st compression motif, which is the most similar pair of subsequences.
(3) We use only the first motif as the result in each ECG lead. The first motif consists of a motif candidate, that is, a pair of the most similar subsequences, and its neighboring subsequences which are similar to the motif candidate.
The algorithm is summarized in Algorithm 1. More details are available in [38].
Algorithm 1
Input: single-lead ECG $T = t_1, \ldots, t_n$; sampling rate
Output: motif: the 1st motif in single-lead ECG
(1) For length = startingLength(sampling rate) to T.length/2
(2)   subsequences := extractSubsequence(T, length)
      ...
(9)   bs := BitsaveOfAdd(Group, ...)
(10)  if bs > Group.bitsave then
(11)    Group.bitsave += bs
(12)    Group := AddToGroup(Group, ...)
(13)  else break
(14)  end
(15)  ...
Line 3: Z-normalization: to address the problems of amplitude scaling and offset translation in subsequences, all data points in a subsequence $S$ are normalized as follows:
$$\hat{s}_i = \frac{s_i - \mu}{\sigma},$$
where $\hat{s}_i$ is the normalized value of the $i$th data point in subsequence $S$, and $\mu$ and $\sigma$ are the mean and the standard deviation of $S$. Therefore, the result for $S$ is a sequence of real numbers $\hat{s}_1, \hat{s}_2, \ldots$
Lines 4-6: The most similar pair of subsequences, that is, the pair with the lowest Euclidean distance, is identified as the motif candidate, and then the bitsave of the pair is computed by the createGroup function. The center is calculated as the average of the two subsequences.
Lines 7-14: Find a neighboring subsequence, and then compute the new bitsave as bs. If bs is higher than the group's bitsave, the neighboring subsequence is added to the group. This step is repeated until bs cannot improve any further. At this point, we obtain a group of similar subsequences at the current length in a single lead.
Line 15: The group which is updated to be a motif must have higher bitsave.
Next, the algorithm repeats lines 1-16 for every possible length. Finally, the result of the motif discovery in a lead is the group that has maximum bitsave.
The abovementioned algorithm runs on a single lead. So, in finding the cleanest lead, the algorithm must also run on every single lead. The lead that has the maximum number of bitsave becomes the cleanest lead.
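The Z-normalization and motif candidate steps described above can be sketched as follows. This is only an illustration of those two steps under stated assumptions: it uses the population standard deviation, it skips pairs of overlapping windows to avoid trivial matches, and it does not compute the MDL-based bitsave at all. The function names `z_normalize`, `euclidean`, and `closest_pair` are hypothetical and do not come from [38].

```python
import math


def z_normalize(sub):
    """Z-normalize a subsequence (assumption: population standard deviation)."""
    mean = sum(sub) / len(sub)
    std = math.sqrt(sum((x - mean) ** 2 for x in sub) / len(sub))
    if std == 0:
        # A flat subsequence has no shape; map it to all zeros.
        return [0.0] * len(sub)
    return [(x - mean) / std for x in sub]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_pair(series, length):
    """Return (distance, start_i, start_j) of the most similar window pair.

    Assumption: windows that overlap each other are not compared.
    """
    starts = range(len(series) - length + 1)
    subs = [z_normalize(series[s:s + length]) for s in starts]
    best = (math.inf, None, None)
    for i in range(len(subs)):
        for j in range(i + length, len(subs)):
            d = euclidean(subs[i], subs[j])
            if d < best[0]:
                best = (d, i, j)
    return best
```

Because of the normalization, two windows with the same shape but different amplitudes, such as `[0, 1, 0]` and `[0, 2, 0]`, are treated as a near-perfect match.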

Morphology Segmentation.
The morphology segmentation aims to specify TP segments in ECG artifacts by referring to the position of PQRST in the cleanest lead. To identify the position of PQRST, we first locate an R-peak which is a striking peak, then a P-wave which is a waveform before the R-peak, and then a T-wave which is a waveform after the R-peak.
Our work uses difference operation method (DOM) [67] to find QRS complex. DOM is selected because it is not complicated and has been applied in several works [60,68,69]. We slightly modify DOM to locate the PQRST waveform, incorporating the expert knowledge [3,70] about normal ECG waveform to justify thresholds as shown in Table 2. The algorithm is presented as follows.
Step 1. Extract a QRS complex using DOM. In this step, we acquire the positions of Q, R, and S.
Step 2. Extract a P-wave by first finding the P point as the maximum voltage before the Q point within 0.20 second (0.20 second is an upper bound of a normal PR interval). Afterwards, we retrieve a starting point and an end point of the P-wave by finding the minimum voltage position before and after the P point within 0.06 second (0.06 second is half of the upper bound of a normal P-wave). In this step, we acquire the onset, the peak, and the end of the P-wave.
Step 3. Extract a T-wave by first finding a T point as the maximum voltage after an S point within 0.38 second (0.38 second is the difference between an upper bound of a normal QT interval and a lower bound of a normal QRS complex). Afterwards, we retrieve a starting point and an end point of the T-wave by finding the minimum voltage position before and after the T point within 0.06 second. In this step, we acquire the onset, the peak, and the end position of the T-wave.
After we apply steps 1-3 to the ECG in the cleanest lead, a TP segment is identified as a period from the end of the T-wave to the onset of the P-wave of the next cycle. Finally, we can use all the TP segments in the cleanest lead as references to the TP segments in other leads.
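Step 2 can be illustrated with a short sketch. It assumes that the Q point has already been located (for example by DOM, which is not implemented here), that the signal is a plain list of voltages, and that the time windows are converted to sample counts by rounding. The function name `locate_p_wave` and its parameter names are hypothetical.

```python
def locate_p_wave(signal, fs, q_index, pr_upper_s=0.20, half_p_s=0.06):
    """Return (onset, peak, end) sample indices of the P-wave before a Q point.

    signal   : list of voltages for one lead
    fs       : sampling rate in samples per second
    q_index  : index of the Q point (assumed to be known already)
    """
    search = int(round(pr_upper_s * fs))  # window before Q, in samples
    half = int(round(half_p_s * fs))      # window around the P peak, in samples

    start = max(0, q_index - search)
    if start >= q_index:
        raise ValueError("no samples before the Q point")

    # P point: maximum voltage within 0.20 s before the Q point.
    p_peak = max(range(start, q_index), key=lambda i: signal[i])

    # Onset and end: minimum voltage within 0.06 s before and after the P point.
    left = max(0, p_peak - half)
    right = min(len(signal) - 1, p_peak + half)
    onset = min(range(left, p_peak + 1), key=lambda i: signal[i])
    end = min(range(p_peak, right + 1), key=lambda i: signal[i])
    return onset, p_peak, end
```

The T-wave in step 3 could be located in the same way by searching after the S point with a 0.38 second window instead.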

Robust Anomaly Detection.
We use motif discovery algorithm to identify a period of a motif and a period of anomaly candidates. A period of motif can group normal beats in the cleanest lead and also can give some hints of any artifacts, anomaly, or normal beats in other leads. In this step, dissimilar beats from normal beats are considered as anomalies. A period of anomaly candidates is the rest of the ECG subsequence that is not detected as a motif. In this step, the period of anomaly candidates is then shifted accordingly to align with a cardiac cycle. Finally, these two periods are produced as the result of the algorithm. The challenges of this stage are listed below.
(1) Subsequence extraction technique: each subsequence is extracted from the whole ECG, starting from the onset position of a P-wave to the end of a T-wave. Therefore, each subsequence represents each ECG beat according to actual cardiac cycle.
(2) Partial similarity calculation: to measure the dissimilarity between two variable-length subsequences, dynamic time warping (DTW) is useful because of its nonlinear alignment handling abilities. However, using DTW can cause a problem of excessive alignment such as a P-wave aligning with a QRS complex. Therefore, we propose a new method that limits the alignment within each portion of ECG morphology. The algorithm calculates DTW distance only between the two-beat pair at each portion of morphology, that is, P-wave, PR interval, and QT interval.
The algorithm is presented in Algorithm 2. Inputs of the algorithm are an ECG in each lead, the positions identified by the morphology segmentation step and the anomaly candidates obtained from motif discovery algorithm.
Lines 1-5: The P-wave, PR interval, and QT interval are extracted from the period of the motif.
Lines 6-18: nearestneighbordis of beat is the shortest distance between the beat and other beats.
Lines 10-12: The distance between the two beats is computed as the sum of the DTW distances of the P-wave, PR interval, and QT interval.
Line 19: The newanomaly is the beat whose nearestneighbordis distance is larger than the threshold. The threshold is set to the sum of the mean and the standard deviation of the distances. Since most beats are similar to each other, the distances do not vary much. Therefore, dissimilar beats will be considered anomaly beats.
Lines 20-21: The starting position and the end position of anomaly candidates are shifted to the closest and , respectively. The anomalybeats is the result of our work. It is from a collection of the newanomaly and nonmotif beats from the motif discovery step.
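The partial similarity calculation and the thresholding rule can be sketched as below. This is a simplified illustration, not Algorithm 2 itself: it assumes each beat is already split into its P-wave, PR interval, and QT interval, it uses an unconstrained DTW with squared local cost, and it omits the shifting and merging of anomaly candidates. The names `SEGMENTS`, `dtw`, `beat_distance`, and `flag_anomalous_beats` are hypothetical.

```python
import math

# Hypothetical keys for the three morphology portions of one beat.
SEGMENTS = ("p_wave", "pr_interval", "qt_interval")


def dtw(q, c):
    """Unconstrained DTW distance (assumption: squared local cost)."""
    n, m = len(q), len(c)
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = cost + min(
                gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1]
            )
    return math.sqrt(gamma[n][m])


def beat_distance(a, b):
    """Sum of DTW distances computed separately on each morphology portion."""
    return sum(dtw(a[s], b[s]) for s in SEGMENTS)


def flag_anomalous_beats(beats):
    """Return (indices of flagged beats, threshold).

    A beat is flagged when its nearest-neighbor distance exceeds the mean
    plus one standard deviation of all nearest-neighbor distances.
    """
    if len(beats) < 2:
        raise ValueError("at least two beats are required")
    nearest = []
    for i, beat in enumerate(beats):
        nearest.append(
            min(beat_distance(beat, other)
                for j, other in enumerate(beats) if j != i)
        )
    mean = sum(nearest) / len(nearest)
    std = math.sqrt(sum((d - mean) ** 2 for d in nearest) / len(nearest))
    threshold = mean + std
    flagged = [i for i, d in enumerate(nearest) if d > threshold]
    return flagged, threshold
```

Restricting DTW to each portion prevents, for instance, a P-wave from being warped onto a QRS complex, which is the excessive alignment problem noted in the text.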

Real ECGs.
We conducted experiments on real ECG datasets taken from PhysioNet [71] as shown in Table 3. A variety of datasets are used to illustrate various cases of comparison, that is, anomaly detection with normal ECG, single-lead ECG, multilead ECG, and various ECG artifacts.
We also performed empirical studies and analysis to find out whether the number of data points has any effect on the results of the existing algorithms. As shown in Figure 6, we found that the more data points used in the calculation, the more false alarm results are generally produced, because when similar anomaly beats occur more than once, the existing algorithms assign them a low dissimilarity. On the other hand, our work considers the number of anomaly beat occurrences, so the number of data points has no effect on the result. To be fair to other methods, we used less than 3,600 data points/lead, which is the typical length that can fit nicely on printouts of a standard 12-lead ECG and computer screen display.
Algorithm 2: Robust anomaly detection.
      ...
      If ... != ...
        If distance < nearestneighbordis[...]
(14)      nearestneighbordis[...] = distance
(15)    End
(16)  End
(17)  ...
(20) anomalybeats = Merge answer(Shift(cananomaly), newanomaly)
(21) Return anomalybeats

Competitive Algorithms.
One of the most recent anomaly detection algorithms for time series sequences, BitClusterDiscord [36], and two of the most popular algorithms, brute force discord discovery (BFDD) [21] and HOT SAX [35], have been chosen for comparison with our proposed work. BFDD produces exact anomaly subsequences whereas HOT SAX and BitClusterDiscord produce an approximate anomaly subsequence while avoiding noise interference through the discretization method and bit serialization, respectively. However, these algorithms require the length of the anomaly beat as a predefined input. To give the best advantage to these three rival methods, we provide them with the precise lengths of actual anomaly beats specified by cardiologists as shown in Table 3.

Evaluation Metrics.
The algorithm is evaluated based on 5 measurements: AoD (accuracy-on-detection) [72], sensitivity, specificity, positive predictive value, and false alarm rate. Before stating how to evaluate the detected anomaly subsequence, the overlap criteria must be explained.
(1) Overlap between the Detected Anomaly Subsequence and the Real Anomaly Beat. Evaluation based on overlaps is extremely subjective. How much of the overlap with the ground truth anomalous beat would be enough to be considered a correct detection? To give an extensive evaluation, we define various criteria for the overlaps as follows.
(i) Overlap Based on Thresholds. The detection result is considered correct when its overlapping ratio with the ground truth is greater than a specified threshold. In our experiments, the thresholds are set at 0%, 30%, 40%, and 80% due to the following supportive reasons.
(a) 0% is most accommodating; even with only one overlapped data point, the result is considered a correct detection. (b) 30% and 40% are a little more restricted. These numbers are approximate percentages of data points typically covering an anomalous beat's morphology. (c) 80% is much tighter and has been used in some research work [73].
We define an overlapping ratio as the ratio between the number of detected data points that overlap the ground truth anomalous beat and the length of that ground truth anomalous beat. It can be calculated as follows:
$$\text{Overlapping ratio}(D, G) = \frac{|D \cap G|}{|G|} \times 100,$$
where $D$ is the set of the resulting data points from the algorithm and $G$ is the set of the ground truth data points according to the cardiologist's analysis.
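The threshold-based overlap criterion can be illustrated with a small sketch. It assumes that both the detected subsequence and the ground truth beat are given as inclusive `(start, end)` index pairs; the function names `overlapping_ratio` and `is_correct_detection` are hypothetical.

```python
def overlapping_ratio(detected, truth):
    """Percentage of ground truth data points covered by the detection.

    Both arguments are inclusive (start, end) sample index pairs.
    """
    d_start, d_end = detected
    g_start, g_end = truth
    detected_points = set(range(d_start, d_end + 1))
    truth_points = set(range(g_start, g_end + 1))
    if not truth_points:
        raise ValueError("ground truth beat must contain at least one point")
    return 100.0 * len(detected_points & truth_points) / len(truth_points)


def is_correct_detection(detected, truth, threshold_percent):
    """A detection is correct when its ratio is greater than the threshold."""
    return overlapping_ratio(detected, truth) > threshold_percent
```

For example, a detection covering samples 10 to 19 against a ground truth beat at samples 15 to 24 shares 5 of the 10 ground truth points, so it counts as correct at the 0%, 30%, and 40% thresholds but not at 80%.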

(ii) Overlap Based on Clinical Diagnosis by Cardiologists.
In the clinical diagnosis process, a cardiologist needs to analyze the anomaly detection results produced by the algorithm or ECG machines by examining the morphology of those signals to make a final diagnosis. Ideally, the results that only highlight the crucial morphology would be very beneficial. Therefore, the overlap between a detected subsequence and a real anomaly beat should correspond to two conditions as follows.
(a) The overlap must cover the area of morphology that is between the starting point and the end point of the morphology of anomaly beat, as shown in Figure 7.
(b) The overlap must not cover any adjacent beats. That is, the algorithm can produce results that are longer than those in (a), but they must be contained within the anomaly beat, which lies between the end point of the T-wave of the previous cardiac cycle and the starting point of the P-wave of the following cardiac cycle, as shown in Figure 7.
(2) Accuracy on Detection (AoD). AoD [72] indicates how well an algorithm can recognize anomalous ECG beats. It refers to the average percentage of detected subsequences that cover real anomaly beats; the higher the AoD, the better the accuracy of the detection. AoD is calculated from the number of real anomaly beats, each real anomaly beat, and the detected subsequence that overlaps with that real anomaly beat.
(3) False Alarm Rate. False alarm rate, or false positive rate, refers to the percentage of normal beats that are incorrectly identified as anomalous by an algorithm. It can be calculated as follows:
$$\text{False alarm rate} = \frac{FP}{FP + TN} \times 100,$$
where a true negative (TN) denotes the number of normal ECG beats that are correctly identified as normal and a false positive (FP) denotes the number of normal ECG beats that are incorrectly identified as anomalous, as shown in Table 4.
(4) Sensitivity or Recall. Sensitivity refers to the percentage of real anomalous beats that are correctly detected by an algorithm. It can be calculated as follows:
$$\text{Sensitivity} = \frac{TP}{TP + FN} \times 100,$$
where a true positive (TP) denotes the number of real anomalous beats that are correctly identified as anomalous and a false negative (FN) denotes the number of real anomalous beats that are incorrectly identified as normal, that is, beats whose abnormalities are missed in the detection.
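As a small worked example, the two measures can be computed from beat counts as follows. The sketch assumes the standard confusion-matrix definitions, FP / (FP + TN) for the false alarm rate and TP / (TP + FN) for sensitivity; the function names and the counts are hypothetical.

```python
def false_alarm_rate(false_positives, true_negatives):
    """Percentage of normal beats that are flagged as anomalous."""
    normal_beats = false_positives + true_negatives
    if normal_beats == 0:
        raise ValueError("no normal beats: false alarm rate is undefined")
    return 100 * false_positives / normal_beats


def sensitivity(true_positives, false_negatives):
    """Percentage of real anomalous beats that are detected."""
    anomalous_beats = true_positives + false_negatives
    if anomalous_beats == 0:
        raise ValueError("no anomalous beats: sensitivity is undefined")
    return 100 * true_positives / anomalous_beats


# Hypothetical record: 100 normal beats, 2 of them flagged;
# 10 anomalous beats, 9 of them detected.
print(false_alarm_rate(false_positives=2, true_negatives=98))
print(sensitivity(true_positives=9, false_negatives=1))
```

The explicit error for an empty denominator mirrors the situation in the normal-ECG experiment, where measures that depend on anomalous beats cannot be calculated because no anomaly beats are present.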

Anomaly Detection Results with Normal ECG.
On the INCARTDB01 dataset which only contains normal ECG signals, RAAD works correctly as it produces no anomaly beat. On the other hand, other algorithms did produce anomaly beat results as shown in Figures 8, 9, and 10. Due to space limitations, we only show two ECG leads; the full detail is provided on our support website [74]. AoD, sensitivity, and positive predictive value are not calculated since no anomaly beats are present. However, the experiment results evidently show that RAAD outperforms other competitive algorithms in terms of both specificity and false alarm rate.

Anomaly Detection Results with Single-Lead ECG.
Many existing algorithms can only detect anomalies in a single lead. This experiment therefore compares the effectiveness of existing works and our proposed RAAD algorithm on a single-lead ECG (MITDB dataset). Even though RAAD is designed for the multilead setting, Figure 11 demonstrates that RAAD correctly detects a premature ventricular contraction (PVC) in accordance with the cardiologists' diagnosis and outperforms the other competitive algorithms: the result from RAAD covers the entire morphology of the PVC, as shown in the dotted-line box in Figure 11(a), and does not cover any portion of adjacent beats, as shown in the solid-line box in Figure 11(a). More importantly, no false alarm results are produced. On the other hand, BFDD and BitClusterDiscord detect an anomaly subsequence that does not completely cover the morphology of the anomalous beat and that covers some portion of the following beat, as shown in the dotted-line and solid-line boxes of Figures 11(b) and 11(d) and the zoom-in picture in Figure 12. For the HOT SAX algorithm, the first detection turns out to be a false alarm (shown as (1) in Figure 11(c)), and the second detection does not cover the entire beat; that is, some portion of the previous beat is covered, but some part of the TP segment at the end of the beat is missing (shown as (2) in Figure 11(c)). Therefore, the result of our proposed RAAD can be instantly utilized for clinical diagnosis because it corresponds to the cardiologists' diagnosis. Table 5 compares the results of AoD, sensitivity, positive predictive value, specificity, and false alarm rate under several overlap criteria. We use Car. as an abbreviation of the cardiologists' diagnosis criterion. In particular, for rival methods, only the results for the 40%, 80%, and cardiologists' diagnosis overlap criteria are shown because the results for the 0% and 30% overlaps are identical to those for 40%.
Likewise, for RAAD, only the results for 80% and cardiologists' diagnosis overlap criteria are shown because the results of 0%, 30%, and 40% overlaps are identical to those of 80%. The complete results are provided in our support website [74].
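The reason results can coincide across several thresholds is that a detection with a high overlapping ratio is counted as correct under every looser threshold as well. A minimal sketch, using the four thresholds from our experiments and a hypothetical helper name `correct_under_thresholds`:

```python
THRESHOLDS = (0, 30, 40, 80)  # overlap thresholds (%) used in the experiments


def correct_under_thresholds(ratio, thresholds=THRESHOLDS):
    """Map each threshold to whether a detection counts as correct.

    A detection is correct when its overlapping ratio is strictly
    greater than the threshold, so at 0% a single overlapped data
    point is enough.
    """
    return {t: ratio > t for t in thresholds}


# Hypothetical overlapping ratios for two detections.
print(correct_under_thresholds(85.0))  # correct under every threshold
print(correct_under_thresholds(35.0))  # correct only under 0% and 30%
```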
As shown in Tables 5 and 6, RAAD's results are quite promising: its AoD is nearly 100%; sensitivity, positive predictive value, and specificity are 100%; and the false alarm rate is 0%. Overall, the results demonstrate that RAAD can correctly detect anomaly beats in a single-lead ECG in accordance with all criteria and measurements. No false alarm results were produced in this case.
Figure 12: Zoom-in of Figure 11(b). Some part of the anomaly's morphology is missing from the detection, and some part of the following beat is covered.

Anomaly Detection Results with Multilead ECG Containing ECG Artifacts and Variable Length of Anomaly Beat.
The ITDB, INCARTDB02, and INCARTDB03 datasets are used in these experiments, as they contain multiple-lead signals (2-12 leads) with anomaly beats of various lengths. Tables 5 and 6 show the results of AoD, sensitivity, positive predictive value, specificity, and false alarm rate under several overlap criteria. Since the datasets contain multiple signals, to give the best advantage to the rival methods, we ran each rival algorithm on each lead independently, given the exact lengths of the anomalous beats (cf. Table 3), and then reported the best result among all the ECG leads, along with the mean and standard deviation (SD). In contrast, since our proposed RAAD can handle multilead data, it produces one single result.
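The per-lead reporting for the rival methods can be illustrated as follows. This is a sketch under assumptions: the lead names and scores are made up, the helper name `summarize_leads` is hypothetical, and the population standard deviation is used because the text does not state which SD estimator was applied.

```python
from statistics import mean, pstdev


def summarize_leads(per_lead_scores, higher_is_better=True):
    """Best value, mean, and SD of one measure across ECG leads.

    per_lead_scores maps a lead name to the measure obtained on that
    lead. For measures such as false alarm rate, pass
    higher_is_better=False so that the smallest value is reported as best.
    """
    values = list(per_lead_scores.values())
    best = max(values) if higher_is_better else min(values)
    # pstdev is the population SD (an assumption of this sketch)
    return {"best": best, "mean": mean(values), "sd": pstdev(values)}


# Hypothetical AoD values (%) of a rival method on three leads.
print(summarize_leads({"I": 80.0, "II": 90.0, "V5": 100.0}))

# When every lead yields the same value, the SD is 0.
print(summarize_leads({"I": 100.0, "II": 100.0}))
```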
Due to space limitations, we only show the result of the INCARTDB03 dataset for RAAD in Figure 13 and for BFDD in Figure 14. The results of HOT SAX and BitClusterDiscord are identical to those of BFDD, so they are not presented.
In Figure 13, RAAD does not produce any false alarm results in any lead and gives the same anomaly detection subsequences as the cardiologists' diagnosis even when the lengths of the anomaly beats vary. The result of RAAD covers the entire morphology of PVC (trigeminy), as shown in the dotted-line boxes, and does not cover any portion of adjacent beats, as shown in the solid-line boxes. On the other hand, in Figure 14, BFDD produces many false alarm results. Although the algorithm can detect anomaly subsequences, these results do not cover the entire anomalous beats. We would like to reemphasize that the results of our proposed RAAD can be instantly utilized for clinical diagnosis because they correspond to the actual diagnosis by cardiologists.
Table 5: The best value, mean, and standard deviation (SD) of AoD, sensitivity, positive predictive value, specificity, and false alarm results obtained by BFDD, HOT SAX, and RAAD with various overlap criteria. Bold figures denote the winning algorithms.
The results confirm that our proposed RAAD is able to accurately detect anomaly beats in multilead ECG in accordance with the cardiologists' diagnosis even when the ECG is contaminated with artifacts. Additionally, RAAD can efficiently identify anomalous beats with variable lengths. In the ITDB, INCARTDB02, and INCARTDB03 datasets, AoDs are nearly 100%; sensitivity, positive predictive value, and specificity are 100%; and false alarm rates are 0%. Standard deviations (SD) are always 0 because the results are identical in all leads.

Anomaly Detection Results with Multilead ECG Containing ECG Artifacts That Mimic ECG Morphology.
The INCARTDB04 dataset contains ECG artifacts that mimic the shape of the ECG morphology, as shown in the shaded areas in Figure 15. The experiment was conducted on a 12-lead ECG. Due to space limitations, only the results of lead I and lead II are shown. The results of the other leads are identical to those of lead II. In Figure 16, RAAD does not produce any false alarms in any lead and obtains the same anomaly detection results as the cardiologists' diagnosis even though the ECG is contaminated by noise. The results of RAAD still cover the entire morphology of PVC, as shown in the dotted-line box, and do not cover any portion of adjacent beats, with the detected subsequence entirely contained within the solid-line box.
On the other hand, BFDD, HOT SAX, and BitClusterDiscord did produce various false alarm results, as shown in Figures 17, 18, and 19. Although these algorithms can still detect anomalous subsequences, the subsequences do not properly cover the entire anomalous beats.
The results of our proposed RAAD can therefore be instantly utilized for clinical diagnosis because they correspond to the actual cardiologists' diagnosis. Tables 5 and 6 show that RAAD is able to accurately detect the anomaly beat in multilead ECG in accordance with the cardiologists' diagnosis even when the ECG contains artifacts that mimic ECG morphology. In the INCARTDB04 dataset, AoDs are nearly 100%; sensitivity, positive predictive value, and specificity are 100%; and false alarm rates are 0%. Standard deviations (SD) are always 0 because the results are identical in all leads.

Anomaly Detection Results with Multilead ECG Containing Extremely Noisy ECG Artifacts That Mimic ECG Morphology.
The INCARTDB05 dataset contains very noisy ECG artifacts, as shown in Figure 20. Figures 21 and 22 compare the results of RAAD and BFDD. Due to space limitations and for clarity and readability of the figures, only the results of leads I, II, AVR, and V5 are shown. The results of the other leads are very similar. The results of HOT SAX and BitClusterDiscord are also similar to those of BFDD, so they are not presented here; the complete results and details are provided on our support website [74]. The results show that all algorithms are able to detect the real anomaly beat. However, RAAD produces far fewer false alarm results than BFDD, HOT SAX, and BitClusterDiscord. Nonetheless, the results of all four algorithms still do not cover the morphology that is essential for diagnosis.
According to Tables 5 and 6, on the INCARTDB05 dataset the overall results from RAAD cover a larger portion of the real anomaly beat than those from the competitive algorithms, as indicated by the mean AoD results.
This failure suggests that RAAD may not be appropriate for very noisy ECG artifacts because the PQRST morphology is difficult to detect. Likewise, it is difficult for other algorithms, or even for inexperienced physicians, to interpret and detect anomaly beats accurately.

Discussion
Across all experiments, the overall results indicate that RAAD outperforms the other existing algorithms and can be used for both single-lead and multilead ECGs. Additionally, variable lengths of anomaly beats have no effect on RAAD, and no predefined length of anomaly beat is required. This is because our algorithm relies on the ECG morphology analysis that cardiologists use for diagnosis in clinical practice, together with the proper-length motif discovery technique.
When compared with the competitive algorithms, RAAD performs better because its detected subsequences cover real anomaly beats accurately and correspond to the cardiologists' diagnosis. Consequently, the result of RAAD can be promptly utilized by cardiologists.
In addition, to overcome the problem of ECG artifacts that mimic ECG morphology, RAAD considers the association among leads: the algorithm uses the cleanest lead as a reference lead to help identify ECG artifacts that mimic ECG morphology. On the other hand, the competitive algorithms produce numerous false alarm results because they consider each lead independently and do not utilize expert knowledge.
With RAAD's overall sensitivity of 100%, our algorithm discovers all anomaly beats and does not generate any false negatives. Moreover, with false alarm rates that are either 0% or much lower than those of the other algorithms, our algorithm significantly reduces false alarm results. However, our algorithm still has a limitation when applied to extremely noisy ECG artifacts, as it is very difficult to accurately detect the anomaly beat. The same difficulty affects other algorithms as well as the cardiologists themselves. The last dataset, in Section 5.2.5, is taken directly from the Physionet repository, where no specific cause of such a very noisy ECG signal was given. Nonetheless, cardiologists would not use such a noisy ECG for diagnosis and would assume that such a signal may have been recorded from a patient who has tremor, agitation, or myoclonus. To address this problem, the patient would be immobilized before the ECG is re-recorded, or another investigation might be considered instead. Validation for such a case is very difficult since all of the evaluation and validation for ECG anomaly detection problems is based solely on the expert's opinion or diagnosis; we would have no ground truth for the problem if the cardiologist is unable to annotate the signals. As future work to alleviate this difficulty, ECG artifact reduction/removal algorithms should be improved, making the results more trustworthy to cardiologists.

Conclusions
This research proposes a novel algorithm for robust and accurate anomaly detection in ECG artifacts. Motif discovery is used to find normal beats and identify the cleanest lead. The cleanest lead is utilized to detect the positions of PQRST on that lead and on the other leads. ECG morphology is used to compare the similarity of each beat, instead of a calculation over whole subsequences as in other algorithms.
The experimental results reveal that our proposed RAAD yields better anomaly detection results than the brute force algorithm (BFDD), the HOT SAX algorithm, and the BitClusterDiscord algorithm. The results of RAAD cover the morphology that is essential for actual diagnosis and can significantly reduce false alarm rates. In addition, RAAD can be used for both single-lead and multilead ECG, requires no predefined length of anomaly beat, and can be applied to ECG with artifacts even when they mimic ECG morphology. Finally, the result of RAAD can promptly be utilized by cardiologists. In the future, we will improve the algorithm to support real-time detection.