An Anomaly Detection Algorithm Selection Service for IoT Stream Data Based on Tsfresh Tool and Genetic Algorithm

methods help to choose suitable services and their respective configuration based on the patterns of stream data. The features used to describe and reflect the intrinsic characteristics of time-series data are the main success factor in our framework. Consequently, experiments are conducted to evaluate the effectiveness of the features chosen by the genetic algorithm. Experiments on both artificial and real datasets demonstrate that the accuracy of our proposed method outperforms various advanced approaches and that it can choose an appropriate service in different scenarios efficiently.


Introduction
With the growth of the Internet of Things (IoT), sensor or stream data is bound to be collected at tremendous speed. In such real-time scenarios, there can be various anomalous data streams, for example, data that diverge from the usual behavior of the stream or data that jump abruptly [1], which are dissimilar to familiar patterns.
It is critical for further decision making to capture these anomalous data accurately and in a timely manner. Banerjee et al. [2] introduced the trend of everything as a service (XaaS). Following Banerjee et al. [2], many researchers have tried to encapsulate various data or common functions into services. For example, streaming as a service has been studied by many researchers [3][4][5]; it provides sharing and simple processing capabilities for stream data. The idea of choosing suitable services or methods can be found in [6][7][8]. It is proposed to provide common functions for various data sources, which enables users to conveniently reuse these functions and form more complex functions through service composition.
In real-world software systems, numerous anomaly detection algorithms (ADAs) are industrialised and are offered as a service to be utilised in diverse domains [9,10]. In our preceding work [11], a proactive data services abstraction was applied to appropriately encapsulate present ADAs into a service.
Even so, for a given scenario it is still a challenge to capture anomalous data effectively under varying circumstances. Following the No-Free-Lunch (NFL) optimisation theorem [12], it is infeasible to find a single algorithm that dominates all others in every case of the same optimisation problem [1]. In their state-of-the-art survey, Braei and Wagner [13] state that univariate datasets mostly suffer from contextual anomalies, so statistical methods will not perform well; deep learning models may increase the area under the curve (AUC), and neural network models might outperform the statistical methods. In addition, novel stream data can appear frequently and continuously, so manual service selection can miss part of the anomalous data. Consequently, a single running anomaly detection service (ADS) may not adapt to different types of stream data. Therefore, for faster and more accurate anomaly detection, it is necessary to choose an appropriate service for different stream data dynamically at run-time.
Each type of anomaly detection algorithm gives better results only for a particular set of stream data [14]. Therefore, to automatically choose appropriate services for diverse IoT scenarios, the underlying stream data must be characterised correctly and quickly, so that a proper service can be chosen and configured based on the pattern of a particular stream. Despite the gigantic volume of stream data, this study observes that many IoT streams are alike owing to their shape similarities and implicit relations.
Based on the above observation, this paper proposes an Anomaly Detection via Service Selection (ADSS) framework for handling anomalies in various stream data effectively. To recognise the pattern of a stream, the ADSS framework captures the intrinsic similarity and dissimilarity between streams using time-series statistical features. Moreover, a fast classifier based on the XGBoost algorithm is trained on the recorded features of stream data in order to select an appropriate ADS dynamically at run-time. With this classifier, our ADSS method can identify the pattern of a newly appearing stream and then choose and configure a suitable service.
It is well known that no algorithm can defeat all others on all datasets. Consequently, the aim of this study is not to build a model or develop a new algorithm that beats all other algorithms on all datasets. Instead, a method is designed to capture the variation of the stream data at run-time and to configure different algorithms to handle it. Experimental results show that this achieves better performance in the long run.
This study focuses on selecting algorithms for dynamically changing IoT stream data. The original idea is to construct features that represent different stream data and to build a supervised model that recommends a suitable algorithm for a given stream. Collections of historical data are gathered from a real monitoring system. On the basis of these data, an XGBoost model is trained on the features and their labels. Here, the label is the algorithm best suited to a certain kind of stream data. This manuscript is the extended version of our recently published conference paper [15].
In the revised version of the manuscript, the feature construction process is improved by applying the Tsfresh tool and an intelligent optimisation algorithm [16]. The former is used to extract multiple features of a time series. These comprise 100+ kinds of features from different angles, which can represent the intrinsic characteristics of stream data comprehensively. Moreover, an intelligent optimisation algorithm, the genetic algorithm, is applied to choose a subset of features, which reduces the computing complexity of the algorithm recommendation procedure. The specific contributions of the manuscript are summarised below:
(i) We develop a method that enables IoT-based systems to automatically choose appropriate services for anomaly detection using the existing data features.
(ii) We develop a service update framework in which service quality, the corresponding algorithms, and the data streams are recorded. These historical data assist in training different decision models, which paves the way for accurately recommending an ADS. In this approach, newly designed algorithms can easily be added to the service pool.
(iii) We carry out various experiments on data streams from the NAB [17] and Yahoo [18] datasets. The experimental results demonstrate that our method can select the best service dynamically according to changes in the stream data pattern.
(iv) We devise an improved feature construction method by applying the Tsfresh tool and an intelligent optimisation algorithm [16].
The remainder of this article is organised as follows. Section 2 describes the related work to build a proper problem statement. The proposed ADSS framework is presented in Section 3, while Section 4 reports the experimental outcomes. Section 5 summarises the paper.

Anomaly Detection Algorithms.
In this study, unsupervised methods are mainly considered for detecting anomalies because of their good generalization ability. Supervised methods are not considered for the following reasons. Firstly, in real-time IoT deployments, different types of time-series data are collected that are hard to label for anomalies. Secondly, to deploy an ADS rapidly, there is little or no time to train a complex anomaly detection model. Thirdly, because time series in real-time IoT systems change dynamically, even some good models perform badly and cannot handle this dynamism. A summary of the unsupervised class of ADAs is given in Table 1.
Although there are numerous deep learning-based methods for anomaly detection, for example, AutoEncoder [21] and LSTM [22], they cannot be used directly on continuous stream data, because these methods need fine parameter tuning and a lot of training data. Given frequent changes of data pattern or expected behavior in frequently launched streams, selecting appropriate service algorithms is becoming challenging.

Anomaly Detection Algorithm in the System.
To present a unified and easier way to adapt to changes and accurately detect anomalies in diverse circumstances, a lot of ADAs are delivered as a service.
In [14], the first ADS framework was developed to address the aforementioned problems through semisupervised learning and clustering. This study was the first work to apply semisupervised learning to key performance indicator (KPI) anomaly detection [14]. Still, the assumption of strong similarity among KPI stream data does not hold for conventional IoT stream data.
In [10], an anomalous behavior recognition system composed of two phases was developed. In the first phase, the normal behavior of the system is learned from past data; in the second phase, real-time data are processed and abnormal behavior in the system is detected dynamically. In their system, complex event processing (CEP) patterns and anomaly detection are combined as a REST service that a user can access through the interface.
In [27], the authors divided stream data into four different time-series groups, i.e., periodic, stationary, nonperiodic, and nonstationary. Furthermore, they used diverse techniques to detect anomalous data.
In [28], the authors state that, in the age of big data, detecting anomalies is a very challenging but important task. They presented the Interactive Data Exploration As-a-Service approach for the identification of significant data.
A dynamic IoT stream data ADA must recognise the various data pattern changes in diverse stream data. Though researchers were previously aware of the problem of run-time outlier detection, the proposed solutions did not address it and ignored subsequent changes in the stream data. For the fast growing volume of IoT data and its dynamic nature, current approaches are not effective.
We attempt to develop a framework that, in the first phase, collects features to characterise the time-series data and, in the second phase, applies deep learning models to recognise the pattern of data, which helps reconfigure the ADS dynamically at run-time.

Description of Our Proposed ADS Framework.
The framework developed in this paper comprises three parts: (i) the service selection procedure, (ii) the encapsulation of ADAs, and (iii) the service applying procedure. Many publicly available unsupervised ADAs are incorporated. As mentioned before, the available ADAs can be encapsulated into services based on the PDservice abstraction. A RESTful API is used for the selection of ADS. A service receiver can define individual views to build IoT applications and can obtain the anomalous data via the Uniform Resource Identifier (URI) of the service. The overall working of the ADSS framework is illustrated in Figure 1. As shown in Figure 1, the collected tuples are portions of historical data, gathered by field experts who record the stream data over a long time along with the appropriate ADAs.
Stream data and its appropriate algorithm are kept in the database as a tuple <stream data, algorithm>, which serves as metadata for the subsequent service identification and selection procedure. These historical data can be updated by collecting running examples from anomaly detection systems or by experts in this field. Usually, these records capture the performance of various ADAs on the stream data and can be used to derive a scheme for selecting an ADS for specific stream data.
Table 1: Summary of stream data anomaly detection algorithms.

Typical algorithms | Category | Characteristics and limitations
Prediction confidence interval (PCI) for time-series outlier detection, simple exponential smoothing (SES) [19], and ARIMA model [20] | Statistical approaches | (1) An assumption about outlier data and normal data needs to be made first; (2) domain-specific knowledge is needed for threshold selection
Autoencoder [21], LSTM [22] | Artificial neural computing | Clustering methods cannot deal with continuous changes in data; therefore, careful parameter tuning is needed
Density-based spatial clustering of applications with noise (DBSCAN) [23], subsequence time-series clustering (STSC) [13], isolation forest [24], local outlier factor (LOF) [25], one-class support vector machine (OC-SVM) [26] | Machine learning approaches | Work on stream data; therefore, the normal reference model might be outdated at the moment it is actually used

ADSS is the basic unit of our framework. Each stream can be represented by a feature vector obtained by applying a stream feature extraction technique to the stream data. The training data of the service selection model are constructed by pairing each feature vector with its best possible service. In the service applying part of our framework, a new stream is transformed into a feature vector using the same feature extraction technique. Finally, the service selection model is applied to the feature vector to select an appropriate service for the stream, and the service is then called in real time to identify anomalous data.
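The service applying step can be sketched as a small dispatch routine: extract features from the incoming stream, let the selection model name a service, and call that service. This is only an illustration under stated assumptions, not the implementation used in this paper. The names `SERVICE_POOL`, `extract_features`, `select_service`, and `detect`, the two toy detectors, and all thresholds are hypothetical, and a hand-written rule stands in for the trained selection model.

```python
from statistics import fmean, pstdev
from typing import Callable, Dict, List, Tuple

Detector = Callable[[List[float]], List[int]]


def zscore_detector(values: List[float], threshold: float = 2.0) -> List[int]:
    """Toy statistical service: flag indices whose z-score exceeds the threshold."""
    mu, sigma = fmean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def jump_detector(values: List[float], threshold: float = 5.0) -> List[int]:
    """Toy service: flag indices where the value jumps abruptly from its predecessor."""
    return [i for i in range(1, len(values))
            if abs(values[i] - values[i - 1]) > threshold]


# Hypothetical service pool: each ADA is registered under a service name.
SERVICE_POOL: Dict[str, Detector] = {"zscore": zscore_detector, "jump": jump_detector}


def extract_features(values: List[float]) -> Dict[str, float]:
    """Very small feature vector (assumption: the real framework uses many more features)."""
    changes = [abs(b - a) for a, b in zip(values, values[1:])]
    return {"mean_abs_change": fmean(changes) if changes else 0.0}


def select_service(features: Dict[str, float]) -> str:
    """Placeholder rule standing in for the trained service selection model."""
    return "jump" if features["mean_abs_change"] < 1.0 else "zscore"


def detect(values: List[float]) -> Tuple[str, List[int]]:
    """Select a service for the stream and call it; returns (service name, anomaly indices)."""
    name = select_service(extract_features(values))
    return name, SERVICE_POOL[name](values)


level_shift = [1.0] * 10 + [10.0] * 10
noisy = [0.0, 3.0, -3.0, 2.0, -2.0, 3.0, -3.0, 2.0, -2.0, 30.0]
```

Keeping the pool as a name-to-callable mapping means a newly designed algorithm only needs to be registered under a new name; the selection model then decides when it is used.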

Model for Service Selection.
Abundant stream data are gathered and their features are extracted through the process discussed in the previous sections in order to select appropriate services for stream data anomaly detection. The best ADS is chosen by analysing the historical data, which record each stream data fragment and its corresponding best service. In general, a few common services are tested repeatedly on these stream data fragments to identify the best service for each. Given the stream data fragments and their best ADS, the service selection problem is transformed into a pattern recognition problem. Taking into account its computing efficiency, XGBoost [29] is used in this paper as the base classifier to choose a service for real-time stream data anomaly detection. This procedure is illustrated in Figure 2. It should be noted that any classifier can be used in our framework; XGBoost was chosen because it is easy to explain and computationally efficient. The time-series features are the main part of our framework, as presented in Figure 2. Some well-known stream data features are taken from publicly available feature sets and from earlier anomaly detection schemes. In addition, a feature selection method is employed to find features that capture the essential characteristics of stream data. The objective of the selected features is to select the appropriate service accurately and quickly in real time for newly evolving stream data.
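One way to build the paired training data is to record a quality score (for example, AUC) for every candidate service on each stream fragment and to keep the best-scoring service as the label. The sketch below assumes a hypothetical record layout and uses made-up scores that are not results from this paper; the classifier itself (XGBoost here) is left out so that the example needs only the standard library.

```python
from typing import Dict, List, Tuple

FeatureVector = List[float]


def best_service(scores: Dict[str, float]) -> str:
    """Return the highest-scoring service; ties go to the alphabetically first name."""
    return min(scores, key=lambda name: (-scores[name], name))


def build_training_pairs(
    history: List[Tuple[FeatureVector, Dict[str, float]]]
) -> List[Tuple[FeatureVector, str]]:
    """Turn (features, per-service scores) records into (features, label) pairs."""
    return [(features, best_service(scores)) for features, scores in history]


# Illustrative records only: feature values and scores are invented.
history = [
    ([0.1, 2.3], {"LOF": 0.91, "OC-SVM": 0.88, "PCI": 0.70}),
    ([1.7, 0.4], {"LOF": 0.62, "OC-SVM": 0.85, "PCI": 0.85}),
]
pairs = build_training_pairs(history)
```

The resulting pairs are what a multi-class classifier would be fitted on, with the service name as the class label.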

Stream Data Patterns Representations.
Stream data may exhibit dissimilar patterns, as demonstrated in Figure 3. According to Bu et al. [14], supervised techniques such as SVM or deep learning-based techniques are not feasible for the huge number of novel IoT stream data applications and the dynamic nature of the stream data. This might be due to two reasons: a difficult parameter tuning process and the need for a large amount of training data.
Researchers such as Bu et al. [14] state that, for some kinds of stream data, simple ADAs may perform well compared to more complex algorithms such as deep learning. The pattern of stream data can also be recognised in time, which paves the way for algorithm selection, known as service selection in modern microservice architectures. The main contribution of our work is the extraction of features to characterise stream data and the selection of a suitable algorithm service based on these features.
In order to select useful features that can distinguish different stream data patterns, a feature selection method was applied. We surveyed all the features that could be considered for the representation of time-series data. In the field of time-series classification, there are multiple types of features from different angles, such as statistics, mathematics, shape, and distribution of data.
Christ et al. [16] automatically extract 100 features from time series and provide a tool called Tsfresh. These features describe basic characteristics of the time series, for example, the maximal or average value and the number of peaks, as well as more complex features, for example, the time reversal symmetry statistic. At the same time, hypothesis tests are used to reduce the features to those that best explain the trend, which is called decorrelation. These feature sets are then used to construct machine learning or statistical models on time-series data, for example, for classification or regression tasks.
In addition, these collected features are the reflection of the inherent nature of data patterns, for example, the distribution, the fluctuation, and shape of data. Some typical features are demonstrated in Figure 4.
Figure 1: Anomaly detection via service selection framework for service selection.

As shown in Figure 4, these simple and complex features are designed to characterise the time-series data from different angles, and each has its own geometric interpretation or statistical meaning. These characteristics are quantified by computing the features. In other words, stream data can be distinguished from each other by comparing these features. More details about other features in Tsfresh are given in Table 2.
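To make the idea of distinguishing streams by their features concrete, the sketch below computes three simple features with the standard library only. These are plain re-implementations of textbook definitions written for illustration; they do not call Tsfresh, and Tsfresh's own calculators may differ in normalisation. The function names and the two example streams are assumptions.

```python
from statistics import fmean, pvariance
from typing import Dict, List


def absolute_sum_of_changes(x: List[float]) -> float:
    """Sum of absolute first-order differences."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def autocorrelation(x: List[float], lag: int) -> float:
    """R(l) = 1 / ((n - l) * sigma^2) * sum_{t=1}^{n-l} (X_t - mu)(X_{t+l} - mu)."""
    n = len(x)
    mu = fmean(x)
    var = pvariance(x, mu)  # population variance sigma^2
    if var == 0 or lag >= n:
        return float("nan")
    cov = sum((x[t] - mu) * (x[t + lag] - mu) for t in range(n - lag))
    return cov / ((n - lag) * var)


def feature_vector(x: List[float]) -> Dict[str, float]:
    """Small feature vector describing level, fluctuation, and serial dependence."""
    return {
        "mean": fmean(x),
        "absolute_sum_of_changes": absolute_sum_of_changes(x),
        "autocorrelation_lag_1": autocorrelation(x, 1),
    }


oscillating = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
trending = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
f_osc = feature_vector(oscillating)
f_trend = feature_vector(trending)
```

Both example streams have the same absolute sum of changes, yet the lag-1 autocorrelation is negative for the oscillating stream and positive for the trending one, which is the kind of difference a selection model can exploit.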
As presented in Table 2, some computing techniques are taken from the Extendible Generic Anomaly Detection System (EGADS) [30], some metrics from Tsfresh [16], and the rest from other well-known statistical measures such as standard deviation and mean. Local fluctuation, metrics of symmetrical values, and fluctuation ratio are recommended in our study to characterise stream data from diverse perspectives. The flowchart of selecting features from the original features is illustrated in Figure 5. As shown in Figure 5, the genetic algorithm (GA) [31] is applied to find a feature subset that is sufficient to characterise the different traits of various stream data.
In the GA, the fitness computation consists of two steps: decoding an individual into a feature subset and computing a test score based on that subset. The test score is used as the fitness of the individual. The other steps of the GA, namely selection, crossover, and mutation, follow the standard procedure. This process belongs to the wrapper feature selection approaches, which build many models with different subsets of input features and pick the features that perform best according to the performance metric. Although these approaches are independent of the variable types, they can be computationally expensive.
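The wrapper procedure described above can be sketched as a small genetic algorithm over bit strings, where bit i says whether feature i is kept. This is a minimal sketch under stated assumptions: `toy_fitness` is a hypothetical stand-in for the classifier test score used in the paper, the set of "informative" features is invented, and the population size, mutation rate, tournament size, and elitism are illustrative choices rather than the settings used by the authors.

```python
import random
from typing import Callable, List, Tuple

INFORMATIVE = {0, 3, 5}  # hypothetical: features that actually help the classifier


def decode(individual: List[int]) -> List[int]:
    """Step 1 of the fitness computation: bit string -> list of selected feature indices."""
    return [i for i, bit in enumerate(individual) if bit]


def toy_fitness(subset: List[int]) -> float:
    """Stand-in for the test score: reward informative features, penalise subset size."""
    return len(INFORMATIVE & set(subset)) - 0.1 * len(subset)


def ga_select(n_features: int, fitness: Callable[[List[int]], float],
              pop_size: int = 20, generations: int = 30,
              p_mut: float = 0.05, seed: int = 0) -> Tuple[List[int], List[float]]:
    """Return the best feature subset found and the best fitness per generation."""
    rng = random.Random(seed)
    score = lambda ind: fitness(decode(ind))
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=score)
        history.append(score(elite))
        next_pop = [elite[:]]  # elitism: the best individual survives unchanged
        while len(next_pop) < pop_size:
            # Selection: two tournaments of size 3.
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            # Crossover: single cut point.
            cut = rng.randrange(1, n_features)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with probability p_mut.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    history.append(score(best))
    return decode(best), history


selected, history = ga_select(8, toy_fitness)
```

In a real wrapper setup, `fitness` would train and evaluate a classifier on the decoded subset, which is where the computational cost mentioned above comes from.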
Though these features are designed for general classification and clustering problems rather than for algorithm selection, the machine learning literature indicates that transfer learning may perform well on similar problems across problem fields. Considering the similarity of the two problems, it can be concluded that the selected features are useful in the algorithm selection task.
Finally, these features help choose a suitable algorithm for a given stream of data by training a classification model. As mentioned before, if these features can be computed in real time, the decision of choosing the optimum algorithm service will be quicker and will thus be accepted by many application users.

Table 2: The name, design principle, and computing method of some features in Tsfresh.

Name | Design principle | Computing method [16,30]
Mean | The baseline of the time series |
approximate_entropy | Approximate entropy is used to measure the periodicity, unpredictability, and volatility of a time series | Refer to [16]
Autoregressive coefficient | Measure the cyclical nature of data | $\frac{1}{n-1}\sum_{l=1,\ldots,n}\frac{1}{(n-l)\sigma^{2}}\sum_{t=1}^{n-l}(X_t-\mu)(X_{t+l}-\mu)$
Kurtosis | The feature number indicating the peak value of the probability density distribution curve at the average value |
absolute_sum_of_changes | Absolute sum of first-order differences |
 | Calculation of a linear least squares regression for the values of the time series versus the sequence from 0 to the length of the time series minus 1 | Refer to [16]
fft_aggregated | Returns the variance, mean, kurtosis, skewness, and absolute Fourier transform spectrum | Refer to [16]
Cyclicity | |

Security and Communication Networks

The datasets listed in Table 3 have been selected. As presented in Table 3, these datasets were selected for the assessment of the proposed framework because their data share similar characteristics. The synthetic and real data encompass all three commonly known anomaly forms: random, collective, and point anomalies [14].

4.2. Preprocessing of Data.
Standardisation helps numerous machine learning approaches to converge quickly. A dataset is said to be standardised if its standard deviation σ is 1 and its mean µ is 0. Mathematically, let D be the dataset, σ the standard deviation of D, and µ its mean. Then, the standardised dataset D' is given by the following equation:

$$D' = \frac{D - \mu}{\sigma}$$
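As a small worked example of this preprocessing step, the sketch below standardises a list of values so that the result has mean 0 and standard deviation 1. It assumes the population standard deviation is meant, which the text does not state; the function name `standardise` and the sample values are illustrative.

```python
from statistics import fmean, pstdev
from typing import List


def standardise(values: List[float]) -> List[float]:
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardise a constant series")
    return [(v - mu) / sigma for v in values]


raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, standard deviation 2
z = standardise(raw)
```

For a stream, µ and σ would typically be estimated on the training portion and reused on new data, so that training and run-time features stay on the same scale.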

Metrics Evaluation.
The performance of our developed framework is evaluated by plotting the receiver operating characteristic (ROC) curve. As a first step, the False Positive Rate (FPR) and True Positive Rate (TPR) are defined as

$$\mathrm{FPR} = \frac{FP}{N}, \qquad \mathrm{TPR} = \frac{TP}{P},$$

where FP denotes the total number of wrong positive predictions, TP denotes the total number of correct positive predictions, P is the total number of positive-labeled values, and N is the total number of negative-labeled values. A list of thresholds δ ∈ R is used, each of which leads to a pair of FPR and TPR. The resulting two-dimensional coordinates are plotted as a curve that starts at (0, 0) and ends at (1, 1). The area under this curve is the AUC. A higher AUC represents a higher probability that the measured algorithm assigns a randomly chosen anomalous point in the time series a higher anomaly score than a randomly chosen normal point, which makes the AUC suitable for comparing various anomaly detection approaches. Thus, in this study, AUC is chosen as the evaluation metric.
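The ROC construction described above can be sketched directly from these definitions. The code below is a minimal illustration, assuming that a higher score means "more anomalous", that each distinct score is used as a threshold δ, and that the area is computed with the trapezoidal rule; the function names and the example scores are hypothetical.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs from (0, 0) to (1, 1); labels: 1 = anomaly, 0 = normal."""
    P = sum(labels)
    N = len(labels) - P
    if P == 0 or N == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    # Sweep the threshold delta from the highest score to the lowest.
    for delta in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= delta and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= delta and y == 0)
        points.append((fp / N, tp / P))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
area = auc(points)
```

In this example, 8 of the 9 (anomalous, normal) pairs are ranked correctly, so the area equals 8/9, which matches the probabilistic reading of the AUC.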

Comparison of Various Methods.
Five algorithms, namely Long Short-Term Memory Networks (LSTM) [22], Local Outlier Factor (LOF) [25], Prediction Confidence Interval (PCI) [20], One-Class Support Vector Machines (OC-SVM) [26], and Autoencoder [21], are set as baseline algorithms. These algorithms represent the machine learning, deep learning, and statistical techniques that have been developed for anomaly detection in stream data. Some of the hyperparameters used in our study are borrowed from the work of Bu et al. [14]. Table 4 lists the hyperparameters of these algorithms.

Experimental Procedure and Outcome Analysis.
First, every dataset is divided into a training set (60%) and a testing set (40%) using stratified sampling. Each time series of the training dataset is paired with its appropriate algorithm to form a paired dataset. Secondly, the XGBoost model is trained to recognise the patterns of a stream using the paired dataset as input. Thirdly, the trained XGBoost model is used to recognise the pattern of each time series in the test set and to find a suitable algorithm as a service. Finally, the performance of the ADS with the recommended algorithm is evaluated on each time series. The AUC values on the anomaly detection datasets are presented in Table 5. The outcomes presented in Table 5 show that the most suitable algorithm may be different for each dataset. The results in Table 5 indicate that LSTM performs best for the NYCT dataset, LOF performs best for dataset 1, while OC-SVM performs best for datasets 3 and 4. As given in Table 5, our framework shows better performance on four of the five datasets. Even in the case of dataset 2, the performance is nearly equal to that of OC-SVM, which is the best algorithm. For this reason, our ADSS framework for algorithm service selection can quickly and flexibly choose the most appropriate algorithm service for different types of data flow processing.
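The first step of this procedure, a stratified 60/40 split, can be sketched as follows. The text does not say which variable defines the strata; the sketch assumes the strata are the best-algorithm labels, so that every algorithm is represented in both parts. The function name, the seed, and the example labels are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[str], train_fraction: float = 0.6,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split item indices into train/test so each label keeps roughly the same share."""
    rng = random.Random(seed)
    by_label: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    train, test = [], []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)                      # random order within the stratum
        cut = round(len(idxs) * train_fraction)
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return sorted(train), sorted(test)


# Illustrative labels: 10 series best handled by LOF, 5 by OC-SVM.
labels = ["LOF"] * 10 + ["OC-SVM"] * 5
train_idx, test_idx = stratified_split(labels)
```

The classifier would then be fitted on the feature vectors at `train_idx` and asked to recommend an algorithm for each series at `test_idx`.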

NAB Dataset and Its Outcome Analysis.
In paper [32], researchers compared multiple anomaly detectors such as Skyline, Relative Entropy, and HTM-based algorithms. From its publicly available experiment reports, we found that the Numenta algorithm achieves the best average performance over all the datasets. However, for a particular dataset such as Twitter_volume_UP, EarthgeckoSkyline can defeat the other detectors. Inspired by ensemble learning and the algorithm selection strategy, we use a supervised learning method to choose a suitable detector for a particular dataset, and we therefore report an experiment on the NAB dataset. In the NAB results, the evaluation metrics are the Standard score, the Reward Low FP rate score, and the Reward Low FN rate score; for more information, one can refer to [17]. The process of the experiment is the same as that explained in Section 5; the performance of the experiments on the NAB dataset is shown in Table 6. As demonstrated in Table 6, our framework achieves better performance when all these detectors are considered as candidate ADAs. It can be concluded that our framework can recognise the features of streaming data, help choose a good detector for it, and achieve better performance on average.

Outcome Analysis.
In our framework, the algorithm recommended as best for the current stream data is selected and configured to check for anomalous data. Base algorithms can be added as needed, so the pool of available algorithms keeps growing. In the long run, once enough algorithms have been added to the service pool, the final anomaly detection performance will improve. This framework takes full advantage of the metalearning idea, which recognises the stream data pattern and configures its best algorithm. Our framework does not create a new algorithm; instead, it chooses the finest algorithm for a given time series, thus possibly improving the total performance of the entire IoT system. In general, our framework improves the quality of service in the context of encapsulating algorithms as web services.

Conclusion
In practice, it is infeasible to build a universal method that detects all types of anomalies in IoT stream data; we therefore attempt to discriminate the data pattern and adjust the ADS accordingly. Various ADSs can be chosen and then configured according to the stream data pattern. We extract features of a stream and select an appropriate algorithm for its anomaly detection.
Experiments on five datasets (listed in Table 3) demonstrate the performance of our method; the results are presented in Table 5. The experimental outcomes in Table 5 show that our method is able to select an accurate service proficiently and can recognise the data pattern efficiently. Moreover, the result on the NAB dataset further illustrates the good performance achieved by our method.
On further analysis of the experimental results, we found that our method resembles an ensemble learning process that merges different kinds of models in order to achieve better results. Unlike traditional ensemble methods, we try to capture the intrinsic characteristics of streaming data from the viewpoint of feature engineering, so the Tsfresh tool and the GA play an important role in finding the important features. Although our method can select the service and recognise the data pattern efficiently, it still needs sufficient historical data to improve the accuracy of the service selection process; this can be addressed by collecting further real-world data and experimenting with more artificial datasets in the future.

Data Availability
All the data used to report the findings in this paper are provided in the form of tables in the paper.

Conflicts of Interest
The authors declare that there are no conflicts of interest.