A Novel Deep Learning Approach for Anomaly Detection of Time Series Data

Anomalies in time series, also called “discords,” are abnormal subsequences. The occurrence of anomalies in a time series may indicate that a fault or disease will occur soon. Therefore, developing novel computational approaches for anomaly detection (discord search) in time series is of great significance for state monitoring and early warning in real-time systems. Previous studies show that many algorithms have been successfully developed and used for anomaly classification, e.g., in health monitoring, traffic detection, and intrusion detection. However, anomaly detection in time series has not been well studied. In this paper, we propose a long short-term memory (LSTM)-based anomaly detection method (LSTMAD) for discord search in univariate time series data. LSTMAD learns structural features from normal (nonanomalous) training data and then performs anomaly detection via a statistical strategy based on the prediction error for observed data. In our experimental evaluation on public ECG datasets and real-world datasets, LSTMAD detects anomalies more accurately than the existing approaches it was compared with.


Introduction
Time series analysis is an active research topic that mainly covers two aspects: (1) identifying the nature of the phenomenon represented by the observed time series [1] and (2) predicting future values of the time series variable based on historic data [2,3]. It is widely used in many real-world areas, e.g., signal processing, pattern recognition [4], mathematical finance [5], weather forecasting [6], and control engineering [7]. In particular, anomaly detection in time series is an important direction, which promotes the development of outlier recognition techniques for real-time big data [8].
knowledge [33][34][35]. Thus, it provides a new way to improve anomaly classification of time series [36]. In contrast to the various popular computational tools for anomaly classification, DL-based discord search in time series has not been well studied. As a type of DL model, long short-term memory (LSTM) provides great power in time series forecasting [37,38], which raises the question of whether LSTM can be used for discord search. In this study, we propose an LSTM-based anomaly detection approach (LSTMAD) for identifying abnormal subsequences in univariate time series. LSTMAD learns the temporal structure of the normal signal from historic values so that it can identify the discords in the testing series. In the simulation experiments, we applied our LSTMAD model to various time series datasets and found that it offers high accuracy. Moreover, LSTMAD also outperformed three other typical discord search algorithms. In summary, LSTMAD provides a new pipeline to accurately capture abnormal sequences in real-time systems.
The rest of the paper is structured as follows: in Sections 2 and 3, the related work and the proposed computational approach LSTMAD are presented. In Section 4, the datasets for validation and the experiment design are described, together with the steps and parameter settings of the method. In Section 5, the simulation results are shown and discussed, while in Section 6 conclusions are drawn and suggestions for future work are presented.

Related Works
According to previous works reported in the literature, the computational approaches for anomaly detection can be grouped into three categories: statistical approaches, data mining-based techniques, and machine learning. We summarize these methods as follows.

Statistical Approaches.
Yamanishi et al. proposed a Gaussian mixture model that scores each data point and identifies the points with high scores as outliers [9]. Zhang and coworkers proposed a mathematical criterion to distinguish between normal and abnormal data using statistical algorithms [10]. Kosek et al. developed a regression model-based method for anomaly detection [11]. Goldstein et al. proposed the histogram-based outlier detection (HBOS) algorithm, which assumes independence of the features, making it much faster than multivariate anomaly detection approaches. They point out that a histogram-based method is needed when outlier detection results must be available immediately [12]. The limitation of these approaches is that anomaly detection depends on the assumption that the data are generated from a particular statistical distribution [13].

Data Mining-Based Techniques.
Anomaly detection can be made more effective by using data mining techniques, including clustering and classification. Researchers have mostly used K-means clustering to group similar data points [15,16], so that the data points located outside of these clusters are considered anomalies. These approaches operate in an unsupervised mode; however, they may not offer accurate insights at the required level of detail in smaller datasets. Classification-based anomaly detection has also been widely studied for real-world applications, e.g., traffic, intrusion, or network detection [17][18][19][20]. The goal of classification is to learn from labeled classes of training data in order to identify the classes of new or unknown instances [39]. However, good performance requires that the training set have well-defined labels.

Machine Learning.
In recent years, machine learning techniques have been widely used for anomaly detection, including fuzzy logic [22][23][24], Bayesian approaches [25,26], genetic algorithms [23,27], and neural networks [28,29]. Nakano et al. proposed a fuzzy logic-based method for network anomaly detection [22]. Hamamoto and coworkers developed a hybrid approach for network anomaly detection using a genetic algorithm and fuzzy logic [23]. Mascaro et al. explored the use of Bayesian networks for analyzing vessel behavior and detecting anomalies [26]. By combining dynamic and static networks, they showed that their approach improved detection accuracy on vessel tracks. With the rapid progress of artificial intelligence, various neural network models, e.g., the recurrent neural network (RNN) [29] and the back propagation neural network (BPNN) [28], have been developed to monitor anomalies in complicated systems. These approaches work well in some specific application areas; however, generalization is still a big challenge.
Compared with traditional machine learning methods, deep learning (DL) has stronger learning ability and can achieve higher accuracy [40]. The most frequently used deep learning methods are the generative adversarial network (GAN) [41], autoencoder [42], convolutional neural network (CNN) [43], and long short-term memory (LSTM) [44]. Previous studies show that almost all of the above models have been applied to anomaly classification [45][46][47]; however, work focusing on DL-based abnormal subsequence detection in time series is rarely reported.
Despite this, there have been many attempts to perform anomaly detection in time series using various statistical or SVM-based methods, including MFAD [31] and LRRDS [32]. However, few attempts have been made to accurately predict the abnormal subsequence in a time series using LSTM. Therefore, a proper deep learning method is required to perform anomaly detection using LSTM.

Method
The flowchart of the proposed LSTMAD approach is shown in Figure 1(a). The whole framework consists of four modules: noise reduction, normalization, the LSTM model, and anomaly detection. The details of each module are described in the following sections.

Noise Reduction.
Noise may be introduced during data collection and can affect the accuracy of the computational results. It is therefore necessary to reduce the noise in the raw sequence before constructing the anomaly detection model. In this study, we removed noise from the time series using the S-G filter, which was proposed by Savitzky and Golay in 1964 [48]. It can be applied to a set of digital data points to smooth the data and increase its precision without greatly distorting its original properties. The S-G algorithm is capable not only of removing noise from raw data, but also of preserving the shape and width of the original signal [49,50].
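As an illustration of S-G smoothing, the following minimal sketch applies the classical 5-point quadratic Savitzky-Golay kernel in pure Python. The window length, the polynomial order, the function name `savgol5`, and the choice to copy the edge samples unchanged are all assumptions made for this example; the filter settings actually used in LSTMAD are not stated here.

```python
from typing import List, Sequence

# Classical 5-point quadratic/cubic Savitzky-Golay smoothing coefficients.
# ASSUMPTION: window length 5 and polynomial order 2 are illustrative choices.
SG5_COEFFS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG5_NORM = 35.0


def savgol5(signal: Sequence[float]) -> List[float]:
    """Smooth `signal` with a 5-point Savitzky-Golay filter.

    The two samples at each end are copied unchanged (a simplification).
    """
    n = len(signal)
    smoothed = [float(v) for v in signal]
    for i in range(2, n - 2):
        window = signal[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG5_COEFFS, window)) / SG5_NORM
    return smoothed


# A quadratic trend with one noisy spike at index 4.
noisy = [0.0, 1.0, 4.0, 9.0, 30.0, 25.0, 36.0, 49.0, 64.0]
print(savgol5(noisy))
```

Because the kernel fits a local quadratic, a purely quadratic signal passes through unchanged, which is the sense in which the filter preserves signal shape. In practice, a library routine such as `scipy.signal.savgol_filter` would normally be used instead of a hand-written kernel.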

Normalization.
Given a univariate time series $A = [a_1, a_2, \ldots, a_i, \ldots, a_N]$ with length $N$ ($N > 1$), the normalization was implemented as follows:

$$x_i = \frac{a_i - \bar{A}}{S},$$

where $\bar{A}$ and $S$ are the mean value and standard deviation of $A$, and $X = [x_1, x_2, \ldots, x_N]$ is the normalized sequence. After normalization, the series $X$ has zero mean and unit standard deviation.
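The normalization step can be sketched as a plain z-score transform. The function name `zscore` and the use of the population standard deviation (rather than the sample standard deviation) are assumptions; which estimator the method uses is not stated.

```python
import statistics
from typing import List, Sequence


def zscore(series: Sequence[float]) -> List[float]:
    """Return the z-score normalized copy of `series`."""
    mean = statistics.fmean(series)
    # ASSUMPTION: population standard deviation (divide by N).
    std = statistics.pstdev(series)
    if std == 0:
        raise ValueError("a constant series cannot be normalized")
    return [(a - mean) / std for a in series]


raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
normalized = zscore(raw)
print(normalized)
```

After this transform the series has mean 0 and standard deviation 1, so prediction errors from different datasets are on a comparable scale.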

LSTM Model.
The Long Short-Term Memory (LSTM) model was first developed by Hochreiter and Schmidhuber in 1997 [51]. Whereas a plain RNN mainly handles short-term sequential data, LSTM can represent long-term dependencies in time series data [52]. A common LSTM unit is composed of a memory cell, an input gate, an output gate, and a forget gate (Figure 2). The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of data into and out of the cell. The state transition in the memory cell is implemented via formulas (2)-(6). The input vector at time point $t$, $x_t$, and the hidden state vector at $t-1$, $h_{t-1}$, are fed into the LSTM unit, which finally produces the hidden state $h_t$. Equation (2) decides what information is going to be discarded from the cell state via the forget gate ($f_t$). The input gate ($i_t$) decides which values are to be updated, and (3) and (4) are used to update the old cell state ($c_{t-1}$) into the new cell state $c_t$. Equation (5) indicates that the output gate ($o_t$) decides what parts of the cell state are going to be produced as output. Finally, the cell state passes through a tanh layer and is multiplied by $o_t$, giving the hidden value $h_t$ as the output of the LSTM unit (in (6)).
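For reference, a standard formulation of the LSTM state transition is sketched below. The weight matrices $W_\ast$, the biases $b_\ast$, the candidate state $\tilde{c}_t$, and the concatenation $[h_{t-1}, x_t]$ are notation assumed here, and the correspondence between these lines and formulas (2)-(6) is only approximate.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(new cell state)} \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

Here $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication.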
Scientific Programming
[Figure 1(b): LSTM network diagram. Labels: Hidden layer (three), Output y.]

According to Figure 1(b), the LSTM model in our LSTMAD framework includes five layers. The input layer has $L-1$ nodes, indicating that a subseries with $L-1$ elements is used as input to a fully connected hidden layer. There are three hidden layers to process the information from the input layer. Each blue node shown in the hidden layer is an LSTM unit. The output layer has only one node ($Y$), which can be obtained from

$$Y = \sum_i W_i^3 h_i^3,$$

where $W_i^3$ is the weight between the $i$-th hidden node in layer 3 (the last hidden layer) and the output node $Y$, and $h_i^3$ is the $i$-th hidden node connected to node $Y$. For a certain subseries $STS_j = [x_j, x_{j+1}, \ldots, x_{j+L-1}]$, where $1 \le j \le N-L+1$, the first $L-1$ elements are sent to the input layer of the LSTM model simultaneously, and the last element is considered the expected result to be optimized.
This can also be represented as $x_{j+L-1} = F(x_j, x_{j+1}, \ldots, x_{j+L-2})$, where $F$ is a trained LSTM model. Before training the LSTM model, the original time series was segmented into multiple subseries via a sliding window with length $L$ (Figure 3). All these segmented subsequences were randomly ordered. To obtain enough training samples, each subseries was replicated into $k$ copies, where $1 \le k \le 10$. Finally, each row in the augmented matrix was input into the LSTM network for model training.
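The segmentation and augmentation steps above can be sketched as follows. The function name `make_training_samples`, its parameters, and the fixed random seed are hypothetical; only the sliding window of length L, the random ordering, and the k-fold replication come from the description.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (L - 1 input values, expected next value)


def make_training_samples(series: Sequence[float], window_len: int,
                          copies: int = 1, seed: int = 0) -> List[Sample]:
    """Segment `series` with a sliding window and build (input, target) pairs."""
    if not 2 <= window_len <= len(series):
        raise ValueError("window_len must be between 2 and len(series)")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # One window per start position j, giving N - L + 1 windows in total.
    windows = [list(series[j:j + window_len])
               for j in range(len(series) - window_len + 1)]
    # Randomly order the subsequences (seeded here for reproducibility).
    random.Random(seed).shuffle(windows)
    samples: List[Sample] = []
    for window in windows:
        for _ in range(copies):  # replicate each subseries `copies` times
            samples.append((window[:-1], window[-1]))
    return samples


for inputs, target in make_training_samples([1, 2, 3, 4, 5], window_len=3, copies=2):
    print(inputs, "->", target)
```

Each pair holds the first L − 1 values of a window as model input and the last value as the regression target, which matches the supervised setup described for the LSTM model.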
According to the above description, our LSTM model can be considered a supervised regression machine for predicting upcoming values based on historic data. Based on this rationale, the LSTM module was first trained with samples converted from a series without anomalies; hence, the model prediction reflects the tendency of the normal signal.
3.4. Discord Search. As described above, a normal subsequence $X_{Trn} = [x_1, \ldots, x_m]$ ($1 < m < N$) was first extracted for LSTM training. In the meantime, a testing subsequence $X_{Tst} = [x_{m+1}, \ldots, x_N]$, including discords (anomalies), is selected. Our rationale is that a trained LSTM model has "memorized" the characteristics of a dynamic system in its normal state; hence, it can predict the future state of the system as long as the system still works normally. Given a testing sequence that contains abnormal signals, the discord values can be identified by comparing the values predicted by the LSTM with the observed values. The calculation for discord search includes the following steps.

Segmentation of the Testing Sequence $X_{Tst}$.
Similar to the training sequence, the testing series also needs to be converted into multiple segments via a sliding window (Figure 3). Here, we set the length of the sliding window to $L$. In our experiments, we set $X_{Tst} = X$ to simultaneously present the fitting error and the prediction error.

Prediction of LSTM Model.
For each segmented piece of the sequence $STST_j = [x_{j+1}, \ldots, x_{j+L}]$, the element vector $[STST_j(1), STST_j(2), \ldots, STST_j(L-1)]$ is used as input to the trained LSTM model. We thus obtain the model outcome $Prd_j = F(STST_j)$, which is the theoretical value of the observation $STST_j(L)$. For $J$ testing samples (subsequences), we obtain the prediction error vector

$$PEV = [AE_1, AE_2, \ldots, AE_J],$$

where $AE_j = |Prd_j - STST_j(L)|$.
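A minimal sketch of the prediction error computation is given below. The trained LSTM is replaced by a hypothetical stand-in, `last_value_predictor`, which simply repeats the last input value; any callable that maps the first L − 1 values of a window to a predicted next value could be substituted. The function names are assumptions.

```python
from typing import Callable, List, Sequence


def prediction_error_vector(test_series: Sequence[float], window_len: int,
                            predict: Callable[[Sequence[float]], float]) -> List[float]:
    """Return the absolute prediction error for every sliding window."""
    pev: List[float] = []
    for j in range(len(test_series) - window_len + 1):
        window = test_series[j:j + window_len]
        predicted = predict(window[:-1])   # model output for the first L - 1 values
        observed = window[-1]              # the L-th value of the window
        pev.append(abs(predicted - observed))
    return pev


def last_value_predictor(history: Sequence[float]) -> float:
    """HYPOTHETICAL stand-in for the trained LSTM model F."""
    return history[-1]


series = [1.0, 1.0, 1.0, 5.0, 1.0]
print(prediction_error_vector(series, 3, last_value_predictor))
```

With this naive predictor, the spike at the fourth position produces large errors in the two windows that end on or just after it, which is the behavior the discord search step relies on.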

Discord Search.
The vector PEV reflects the difference between prediction and observation. If the value $AE_j$ is significantly higher ($AE_j \ge 0.33M$), the corresponding point at time $j + L$ can be considered the peak of a discord. We then use a Gaussian model to fit each candidate point with a significantly higher value $AE_j$, and the abnormal sequence is finally selected from the region $[\mu - 3\sigma, \mu + 3\sigma]$ ($\mu$ and $\sigma$ are the mean value and standard deviation, respectively).
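One possible reading of this step is sketched below; it is not a confirmed implementation. It assumes that M is the maximum of PEV, and that the Gaussian fit can be approximated by an error-weighted mean and standard deviation of the candidate positions. The function name `discord_region` is hypothetical, and the returned indices are positions in PEV (the corresponding time index would be offset by the window length).

```python
import math
from typing import Optional, Sequence, Tuple


def discord_region(pev: Sequence[float], ratio: float = 0.33) -> Optional[Tuple[int, int]]:
    """Return an inclusive (start, end) index range for the suspected discord."""
    if not pev:
        return None
    # ASSUMPTION: M is the largest prediction error in PEV.
    threshold = ratio * max(pev)
    candidates = [(j, ae) for j, ae in enumerate(pev) if ae >= threshold and ae > 0]
    if not candidates:
        return None
    # ASSUMPTION: approximate the Gaussian fit by error-weighted moments.
    total = sum(ae for _, ae in candidates)
    mu = sum(j * ae for j, ae in candidates) / total
    variance = sum(ae * (j - mu) ** 2 for j, ae in candidates) / total
    sigma = math.sqrt(variance)
    start = max(0, math.floor(mu - 3 * sigma))
    end = min(len(pev) - 1, math.ceil(mu + 3 * sigma))
    return start, end


errors = [0.0] * 10 + [1.0, 2.0, 1.0] + [0.0] * 10
print(discord_region(errors))
```

The three-sigma interval widens the detected region beyond the thresholded points themselves, so the reported discord covers the neighborhood of the error peak rather than a single time step.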

Data Collection and Preprocessing.
To examine the performance, we applied the LSTMAD approach to six datasets, including four well-known public datasets and two industrial time series from the real world. The details of these datasets are as follows.
(1) Chf01 [30] and Chf13 [53], ECG (electrocardiogram) data, were collected from the BIDMC Congestive Heart Failure Database [53,54]. The lengths of the two datasets are 3751 and 3750, respectively. Each of them includes two series. In our experiments, we selected the 1st series from Chf01 and the 2nd series from Chf13 to test our algorithm.
(2) Ltstdb_20221 [30], an ECG dataset, was selected from the Long Term ST Database. Its length is also 3750. We used the 1st series in our experiment.
(3) Xmitdb_x108 [30,55], an ECG dataset with length 5400, was selected from the MIT-BIH Arrhythmia Database. The first sequence was used in our simulation.
(4) SLD1 and SLD2, two sequences related to "soil pressure" in a shield tunneling machine [56], were collected from a real-world shield tunnel construction project. The real-time construction state was recorded every 10 seconds by local sensors. In total, over 400 features were observed during the whole construction process. In our experiments, we focused on the time series related to "soil pressure" because abnormal pressure is a typical fault in tunneling construction. The lengths of SLD1 and SLD2 are 18,087 and 210,907, respectively.
Before the implementation of anomaly detection, the performance of S-G filters on both categories (original signal and processed signal) of data sample was evaluated in terms of PSNR (peak signal-to-noise ratio), SNR (signal-to-noise ratio), MSE (mean square error), and PRD (root mean square difference) values [57].
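The four quality measures can be sketched with commonly used definitions, as below. These formulas are assumptions: the exact definitions follow [57] and may differ, for example in how the peak value for PSNR is chosen or in whether PRD is expressed as a percentage. The function names are hypothetical.

```python
import math
from typing import Sequence


def _squared_error(original: Sequence[float], processed: Sequence[float]) -> float:
    return sum((o - p) ** 2 for o, p in zip(original, processed))


def mse(original: Sequence[float], processed: Sequence[float]) -> float:
    """Mean square error between the two signals."""
    return _squared_error(original, processed) / len(original)


def snr_db(original: Sequence[float], processed: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB: signal energy over error energy."""
    err = _squared_error(original, processed)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(sum(o * o for o in original) / err)


def psnr_db(original: Sequence[float], processed: Sequence[float]) -> float:
    """Peak SNR in dB. ASSUMPTION: the peak is max |original|."""
    error = mse(original, processed)
    if error == 0:
        return math.inf
    peak = max(abs(o) for o in original)
    return 10.0 * math.log10(peak * peak / error)


def prd_percent(original: Sequence[float], processed: Sequence[float]) -> float:
    """ASSUMPTION: PRD as percentage root-mean-square difference."""
    return 100.0 * math.sqrt(_squared_error(original, processed)
                             / sum(o * o for o in original))


raw = [1.0, 2.0, 3.0, 4.0]
filtered = [1.0, 2.0, 3.0, 2.0]
print(mse(raw, filtered), snr_db(raw, filtered),
      psnr_db(raw, filtered), prd_percent(raw, filtered))
```

Higher SNR and PSNR and lower MSE and PRD indicate that the filtered signal stays close to the original.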

Experiment Design.
First, we applied the proposed LSTMAD approach to the above six datasets to evaluate its performance. Second, we compared LSTMAD with three well-known algorithms: HOT SAX [30], Robust Random Cut Forest (RRCF) [58], and Telemanom [59]. To evaluate the accuracy of anomaly detection, two statistical metrics, MCC and $F_0$, are defined as follows:
As reported in previous studies, MCC produces a more informative and truthful score in evaluating binary classifications, particularly for imbalanced data [60,61]. In (8),
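MCC is taken here to be the Matthews correlation coefficient, whose standard definition in terms of confusion-matrix counts is $(TP \cdot TN - FP \cdot FN) / \sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}$. The sketch below computes it from the four counts; the function name `mcc` and the convention of returning 0 when the denominator vanishes are assumptions.

```python
import math


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # ASSUMPTION: report 0.0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


print(mcc(tp=5, fp=0, fn=0, tn=5))   # perfect agreement
print(mcc(tp=1, fp=1, fn=1, tn=1))   # no better than chance
```

Because all four cells of the confusion matrix enter the formula, a detector that labels every point as normal does not score well even when anomalies are rare, which is why MCC suits imbalanced data.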

Evaluation of Noise Reduction.
First, we evaluated the quality of each time series processed by the S-G filter. The performance of the S-G filter was compared in terms of PSNR, SNR, MSE, and PRD values. Table 1 shows that the S-G filter works well on the six datasets, suggesting that anomaly detection on the processed datasets is reliable.

Validation on Univariate Time Series.
According to the description in Section 4.2, the proposed LSTMAD approach was tested on the six time series datasets described above. The simulation results are presented as follows.
The reference (observed) and predicted anomalies are highlighted in red. Figure 4 shows the simulation results of LSTMAD on chfdb_chf01. Figure 4(a) presents a reference anomaly, which is located in the range [2182, 2392]. Figure 4(b) shows an abnormal subsequence from 2252 to 2392 identified by LSTMAD. Compared with Figure 4(a), the predicted result is very close to the reference.
Similarly, Figure 5 shows the simulation results of LSTMAD on the dataset chfdb_chf13. In Figure 5(a), the normal signal is a periodic sequence that is repeated many times, and there is a discord located in the range [2758, 2967]. The outcomes of LSTMAD revealed that the predicted anomaly occurred in the range from 2758 to 2874 (Figure 5(b)). This indicates that the prediction of LSTMAD fits the observation well.
In contrast to the above two sequences (Figures 4 and 5), the discord in the series ltstdb_20221 is not easily identified because the abnormal subsequence is very similar to the normal signal. In Figure 6(a), the discord is located in the range [583, 783]. After calculating with LSTMAD, we predicted the subsequence located at [583, 857] as a discord (Figure 6(b)).
In addition, we examined the performance of LSTMAD on the last ECG dataset, xmitdb_x108 (Figure 7). The reference and predicted anomalies are located at [3995, 4207] and [3899, 4207], respectively. Taken together, these results show that the proposed algorithm works well on four well-known ECG datasets.
Furthermore, we applied the LSTMAD framework to two real datasets, which were generated from a shield tunnel construction project. For SLD1, the log file recorded a fault ("soil pressure continues to decrease") that occurred in the region from 11,940 to 12,160. The reference discord can also be clearly identified in Figure 8(a). The prediction of LSTMAD shows that our method is capable of capturing the abnormal subsequence (Figure 8(b)). However, the predicted discord is located in the region [11,255, 12,219], which deviates somewhat from the reference.
Finally, we tested the performance of LSTMAD on the time series SLD2. There appear to be two peaks in the reference sequence (Figure 9(a)); however, only one fault was reported in the log file. The reference anomaly, from 173,982 to 174,002, is shown in red in Figure 9(a). Our algorithm successfully identified the anomaly in the range [173,982, 174,002] (Figure 9(b)). In summary, the developed LSTMAD approach works well not only on public time series but also on real-world sequences.
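To compare a predicted interval with a reference interval numerically, the two ranges can be converted into point-wise confusion counts, as sketched below. Point-wise scoring with inclusive interval bounds is an assumption made for illustration, and the function name `interval_confusion` is hypothetical; the interval values are those reported above for chfdb_chf01 (series length 3751).

```python
from typing import Tuple

Interval = Tuple[int, int]  # inclusive (start, end)


def interval_confusion(reference: Interval, predicted: Interval,
                       series_len: int) -> Tuple[int, int, int, int]:
    """Return point-wise (TP, FP, FN, TN) for one reference and one predicted interval."""
    ref_lo, ref_hi = reference
    pred_lo, pred_hi = predicted
    ref_points = ref_hi - ref_lo + 1
    pred_points = pred_hi - pred_lo + 1
    # Number of points shared by both intervals (0 if they do not overlap).
    overlap = max(0, min(ref_hi, pred_hi) - max(ref_lo, pred_lo) + 1)
    tp = overlap
    fp = pred_points - overlap
    fn = ref_points - overlap
    tn = series_len - tp - fp - fn
    return tp, fp, fn, tn


# chfdb_chf01: reference [2182, 2392], predicted [2252, 2392].
print(interval_confusion((2182, 2392), (2252, 2392), 3751))
```

For this dataset, every predicted point falls inside the reference discord, while the first 70 reference points are missed.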

Comparison with Other Algorithms.
To further assess the effectiveness of the proposed algorithm, we tested all the above datasets with three classic anomaly detection methods: HOT SAX [30], Robust Random Cut Forest (RRCF) [58], and Telemanom [59]. Table 2 shows that LSTMAD outperformed the three other methods for anomaly detection in univariate time series. The MCC values on the six datasets show that LSTMAD can capture the abnormal subsequences in most of the time series, whereas the performance of RRCF and Telemanom is clearly lower than that of the others. Moreover, the $F_0$ measurements on the six datasets indicate that the predicted anomalies obtained from LSTMAD match the references very well. In summary, on these datasets the accuracy of our approach is superior to that of the compared methods.

Discussion and Conclusion
In this study, we proposed a novel LSTM-based approach (LSTMAD) for anomaly detection in time series data. LSTMAD was developed by combining an LSTM network with a statistical strategy. It does not depend on prior knowledge; our method is capable of learning the context of sequence data from the normal signal and then identifying the abnormal regions based on the prediction error for observed data. To verify the performance, we applied LSTMAD to several time series datasets, including well-known public data and real-world data. The simulation results revealed that LSTMAD can identify the discords in a whole sequence with high accuracy. Moreover, LSTMAD outperformed the other gold-standard approaches on all the testing datasets.
In previous studies, LSTM was widely used for time series classification or forecasting [2,3,17,44]. However, it was rarely reported for discord search in time series. To our knowledge, we are the first to build a predictive model from nonanomalous training data and then perform anomaly detection based on the prediction error for observed data. Moreover, the experimental evaluations indicate that both the performance and the generalization of LSTMAD are strong. Our method is suitable for real-time anomaly prediction, especially when the underlying physical process is not fully understood and characterized. It does not rely on prior knowledge and is not sensitive to the length of the sliding window; it should thus be a scalable algorithm for future applications.
The proposed LSTMAD approach has limitations. First, the current version is mainly developed for univariate time series, so it cannot directly address multivariate sequences. Second, bias exists in the selection of public data because anomalies in periodic sequences are often more easily detected [30]. Third, there is not yet enough evidence to mathematically prove that the structure of the current LSTM network is optimal. To refine the LSTMAD approach, four aspects need to be considered in future work: (1) a new component will be included in the current framework to transform multivariate sequences into univariate ones; (2) the rationality of the LSTM network needs further justification; (3) we will design a reasonable strategy for parameter search to improve the performance of our model; (4) various gold-standard time sequences need to be tested.

Data Availability
All the data used in this study are available at GitHub: https://github.com/Lostinparadise1981/LSTMAD.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this manuscript.