Shapelet Discovery by Lazy Time Series Classification

As a representation of discriminative features, the time series shapelet has recently received considerable research interest. However, most shapelet-based classification models evaluate the differential ability of the shapelet on the whole training dataset, neglecting characteristic information contained in each instance to be classified and the classwise feature frequency information. Hence, the computational complexity of feature extraction is high, and the interpretability is inadequate. To this end, the efficiency of shapelet discovery is improved through a lazy strategy fusing global and local similarities. In the prediction process, the strategy learns a specific evaluation dataset for each instance, and then the captured characteristics are directly used to progressively reduce the uncertainty of the predicted class label. Moreover, a shapelet coverage score is defined to calculate the discriminability of each time stamp for different classes. The experimental results show that the proposed method is competitive with the benchmark methods and provides insight into the discriminative features of each time series and each type in the data.


Introduction
In recent years, massive time series data have been generated in many fields, including weather forecasting [1], malware detection [2], voltage stability assessment [3], human identification [4], and biomedicine [5]. Hence, the study of time series has been widely applicable, among which classification is an important research field. The classification issue of time series is the same as the traditional classification problem. We hope to find a function that can map any time series to a target class label. Although a large number of time series classification algorithms have been proposed, extensive experiments show that the 1NN classifier combining different distance metrics is still a competitive model in many problem areas [6][7][8][9][10][11][12]. In addition to the common Euclidean distance, alternatives have been proposed to measure the similarity between time series, including dynamic time warping (DTW) [13], weighted DTW (WDTW) [14], edit distance with real penalty (ERP) [15], time warp edit (TWE) [16], and move-split-merge (MSM) [17]. The improved distance function can help promote the performance of the nearest-neighbor model, but the 1NN classifier presents obvious drawbacks. This classifier cannot indicate the common characteristics of similar instances and the dissimilarity between different classes. In other words, its interpretability is insufficient. In reality, except for the accuracy, the features of distinct instances are our concern. These features provide a deeper understanding of data and improve the interpretability of the classification model. Unfortunately, time series usually have no definite features. Hence, various feature prototypes are proposed to mine potential patterns of time series [9,18,19]. Among them, the most classic is the local discriminative feature "shapelet" [18].
A shapelet (as shown in Figure 1(a)) is a special discriminatory subsequence of time series, which is originally applied to construct the shapelet-based decision tree (SDT) [18] through recursively searching for the best shapelet in the training set. Since the shapelet can be used to establish an interpretable classification model, it has been widely studied [9]. Shapelet-based classification models can be divided into two categories. One type of method utilizes the top-k shapelets to create a transformed dataset, on which the traditional classification algorithms [20][21][22][23][24] could be applied. The other uses the shapelets to build the classification model directly [18][25][26][27][28].
One major problem that all shapelet-based approaches generally face is the massive size of the candidate shapelet set. To solve this problem, researchers have put forward several methods. These methods can be roughly divided into four categories: (1) Training instances are selected to generate the candidate shapelets. For example, Ji et al. [29] put forward a subclass splitting method to sample the training instances for candidate shapelet generation. (2) Heuristic shapelet search method. Grabocka et al. [27] presented a heuristic gradient descent shapelet search algorithm, which created a smaller candidate shapelet set. Rakthanmanon et al. [26] proposed a fast shapelet discovery algorithm based on Symbolic Aggregate approXimation (SAX). Similarly, Fang et al. [28] introduced a novel method to search shapelets based on piecewise aggregate approximation (PAA). (3) A random selection mechanism is used to select shapelets. Renard et al. [30] first proposed a random-shapelet algorithm to build the decision trees. Karlsson et al. [31] constructed the shapelet-based random forest, in which each decision tree is built based on instances and shapelets selected randomly. Further, to omit the shapelet threshold search, Shi et al. [32] have put forward the random pairwise shapelet forest. (4) Reformulating the shapelet search problem into a numerical optimization problem. For example, Hou et al. [33] treated the shapelet search task as a numerical optimization problem, and then the shapelets were learned by numerical analysis methods. Likewise, Wang et al. [34] designed a semisupervised shapelet learning model, which transforms the feature search problem into a joint optimization problem. Ma et al. [35] proposed an end-to-end model to learn the most discriminative shapelets by the gradient descent method. Zhao et al. [36] recently proposed a regularized shapelet learning framework to improve the shapelet learning efficiency.
Although the above methods improve the classification efficiency of shapelet-based models to some extent, the vast majority of shapelet-based global classification models still have the following disadvantages: (1) The shapelets captured by most shapelet-based models cannot adequately reflect the information of feature distribution and frequency of each class in the dataset. For example, due to the existence of intraclass variation, a few instances in different classes may have low-frequency discriminative features. (2) The whole training set is generally applied for the discriminatory evaluation of candidate shapelets.
Owing to the influence of the redundant instances and the intraclass variability, the extracted shapelets are merely the best on average for instances in the training dataset and cannot accurately reflect the local characteristics of the instance to be classified. In other words, the established shapelet-based model is not suitable or efficient for each test instance. Targeted evaluation strategies are not given enough attention.
To address these problems, we first proposed a lazy shapelet-based model to capture the local features of each instance in the literature [37] (the literature [37] is a poster in the conference ICONIP 2018. In the conference manuscript, we simply proposed a lazy model to classify the instance based on its own local features. However, in this paper, the extended version further studies the fusion of global and local similarity, the local feature distribution and frequency information discovery, etc. In addition, this paper includes details about how to discover the local features for each instance to be classified in the shapelet-based model, how to use the local characteristics of each instance to determine the classwise discriminatory information, more experiments on parameter setting, statistical analysis, model comparison, and case studies). However, the proposed model still cannot get insight into the feature distribution and frequency information of the time series data. For that, we significantly extend the research on the data-driven, shapelet-based model (lazy shapelet classification route, LSCR) to study the feature distribution and frequency information discovery. Here, the advantages of our model are interpreted in conjunction with Figure 1. From Figure 1(b), it can be found that the heterogeneous instances are different, and that there are also differences between the homogeneous instances. Since LSCR performs targeted analysis for each instance, compared with the SDT shapelet (as shown in Figure 1(a)), our model may capture characteristics that SDT cannot discover. For example, the shapelet $S^{46}_0$ does not appear in the model built by SDT. Further, to evaluate the classwise discriminative feature frequency, the shapelet coverage score is defined. From Figure 1(c), we find that the scores can not only indicate the local discriminant intervals for different classes but also reflect their frequency information. For instance, the low-frequency discriminative interval [0, 7] reflects the location of the local features detected by our model on a few instances of Class2, which are usually caused by intraclass variation and ignored by the global shapelet-based model.
Figure 1: (a) The shapelet found by SDT for the ItalyPowerDemand dataset, and its corresponding training instances. The black bold part indicates the discovered time series shapelet. (b) Three time series from the ItalyPowerDemand training dataset and their corresponding shapelets captured by LSCR. Train 9 and Train 44 (in blue and red, respectively) are instances with different classes, while Train 9 and Train 46 (in blue) are instances of the same type Class2. Based on the discovered shapelets, our model can make correct predictions for the three instances. (c) The classwise shapelet coverage score obtained on the training set for each sampling point.
The main contributions of this paper are summarized as follows:
(1) In contrast with the classical kNN or 1NN model based on global similarity, our model is a fusion of global and local similarities. For the consideration of global similarity, the instance selection strategy is used in evaluating the discrimination of shapelets. The smaller evaluation dataset can eliminate the interference of intraclass variation and improve the classification performance. In addition, local similarity is applied instead of global similarity for prediction, which makes the proposed model more interpretable.
(2) To reduce the massive number of redundant candidate shapelets generated by the brute-force algorithm, a novel strategy is proposed for extracting candidate shapelets from the instance to be classified. This strategy can guarantee that the extracted shapelets accurately reflect the local characteristics of each test instance.
(3) The shapelet coverage score of each sampling point is calculated to analyze the local characteristics of different classes in the dataset. Since the proposed model can efficiently analyze the local features of each instance, more accurate local characteristic information can be obtained. In particular, the classwise discriminative feature frequency and distribution information can be presented, which can help us to understand the data more comprehensively.
The remainder of the paper is organized as follows. Section 2 introduces related concepts and basic theories. Section 3 describes the proposed model and algorithm design in detail. Section 4 presents the experimental analysis. Section 5 offers the conclusion of this paper.

Definitions and Notation
In this section, some definitions and formulas related to our model will be presented.

Definition 1 (time series). A time series $T$ is an ordered sequence that contains $m$ actual observation values $t_1, t_2, \ldots, t_m$, i.e., $T = (t_1, t_2, \ldots, t_m)$, $t_i \in \mathbb{R}$. The symbol $D = \{T_1, T_2, \ldots, T_n\}$ represents the dataset containing $n$ time series.
Definition 2 (time series subsequence and shapelet). Given a time series $T = (t_1, t_2, \ldots, t_m)$, a subsequence $S$ of $T$ contains $l$ consecutive values from $T$; that is, $S = (t_i, t_{i+1}, \ldots, t_{i+l-1})$, where $1 \le i \le m - l + 1$. A shapelet is a tuple $(S, \delta)$ that consists of a subsequence $S$ and a distance threshold $\delta$.
Definition 3 (candidate shapelet set). The candidate shapelet set is composed of subsequences of time series. The symbol $D_{node}$ is a dataset corresponding to an arbitrary tree node in the model SDT or LSCR, and $W_l(D_{node})$ is a set of candidate shapelets with length $l$ built on the dataset $D_{node}$. $W_l(D_{node})$ in SDT could be represented as follows:
$$W_l(D_{node}) = \bigcup_{T_i \in D_{node}} W_l(T_i), \tag{1}$$
where $W_l(T_i)$ denotes the set of subsequences with length $l$ from $T_i$. However, in our work, since the best shapelets are searched from the subsequence space of the instance to be classified, the set of shapelet candidates of length $l$ for each node in LSCR is
$$W_l(D_{node}) = W_l(T), \tag{2}$$
where $T$ is the test instance. Then, the whole candidate shapelet sets $W(D_{node})$ for each node in SDT and LSCR can be obtained through the following equation:
$$W(D_{node}) = \bigcup_{l=\mathit{min}}^{\mathit{max}} W_l(D_{node}), \tag{3}$$
where min and max are the minimum and maximum candidate lengths, respectively. Therefore, compared with SDT, LSCR reduces the scale of candidate shapelets in a single node by an order of magnitude.
Definition 4 (similarity of equal-length time series). Let $dist(T_i, T_j)$ be a similarity function of time series, which takes $T_i$ and $T_j$ with equal length as the input. The function will return a nonnegative value, which represents the similarity degree.
Generally, the smaller the distance is between two time series, the more similar the two time series are. In reality, we often need to judge the similarity between unequal-length time series. For example, in our work, we need to determine whether a time series contains a specific local feature through the distance between a subsequence and a whole time series.
Definition 5 (similarity of unequal-length time series). Let $distul(T, S)$ be a similarity function, in which time series $T$ and $S$ have different lengths. The function returns a nonnegative optimal matching distance between two sequences as the degree of similarity. The distance between time series $T$ and sequence $S$ is
$$distul(T, S) = \min_{S' \in W_{|S|}} dist(S, S'), \tag{4}$$
where the symbol $W_{|S|}$ represents the subsequence set with length $|S|$ of time series $T$ ($|S|$ and $|T|$ denote the length of sequences $S$ and $T$, respectively, and $|S| \le |T|$). In our model, the distance between subsequences with equal length is calculated by the Euclidean distance, while the distance between complete time series is measured by the specified distance function.
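The optimal matching distance of Definition 5 can be illustrated with a short Python sketch. It assumes that time series are plain Python lists and that equal-length subsequences are compared by the Euclidean distance, as stated above; the function names `euclidean` and `distul` are chosen here for illustration, and no early abandonment is applied.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distul(series, sub):
    """Smallest Euclidean distance between `sub` and any window of
    `series` that has the same length as `sub`."""
    l = len(sub)
    if l > len(series):
        raise ValueError("the subsequence must not be longer than the series")
    return min(euclidean(series[i:i + l], sub)
               for i in range(len(series) - l + 1))


T = [0.0, 1.0, 3.0, 1.0, 0.0]
print(distul(T, [1.0, 3.0, 1.0]))  # 0.0, the pattern occurs exactly in T
print(distul(T, [0.0, 2.0]))       # 1.0, the closest window is [0.0, 1.0]
```

Because the minimum is taken over all window positions, the distance is invariant to where the local feature occurs in the series.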
In reality, different subsequences may have disparate discriminability. In our work, information gain is used to measure the discrimination of shapelets. To reduce distance computation and improve the shapelet discriminant property, the concept of the evaluation dataset is put forward.
Definition 6 (shapelet evaluation dataset). The shapelet evaluation dataset is a specific subset of the training dataset, which is designed to evaluate the discriminability of local features of each test case.
Definition 7 (entropy of dataset). The entropy of a given dataset $D$ is calculated by the following formula:
$$E(D) = -\sum_{c_i \in C} p(D_{c_i}) \log p(D_{c_i}), \tag{5}$$
where $c_i$ is an element in the class value set $C$ of $D$, $D_{c_i}$ is the subset of instances with class $c_i$ in $D$, and the proportion $p(D_{c_i})$ is calculated by
$$p(D_{c_i}) = \frac{|D_{c_i}|}{|D|}. \tag{6}$$
Definition 8 (shapelet information gain). Given a shapelet $S$ and a dataset $D$ containing instances with different classes, the information gain of $S$ is calculated as follows:
$$Gain(S, s) = E(D) - \frac{|D_1|}{|D|} E(D_1) - \frac{|D_2|}{|D|} E(D_2), \tag{7}$$
where $s$ is a split distance that can be applied to divide the dataset into two subsets:
$$D_1 = \{T \in D : distul(T, S) \le s\}, \quad D_2 = \{T \in D : distul(T, S) > s\}.$$
Finally, the maximum value of information gain is normally treated as the shapelet discrimination, and the corresponding distance $s$ is taken as threshold $\delta$. The split distance $s$ usually takes the middle distance between two distance points. The detailed calculation process can be found in the literature [18]. In our model, the shapelet information gain calculated by equation (7) on a specific evaluation dataset reflects the reduction in uncertainty in the predicted class label of the test instance. Here, the mathematical description of our model is introduced.
Given a specific test instance $T$, its uncertainty of predicted class in our model is [equation (8)], where $D_e(T)$ is the initial evaluation dataset of $T$.
In the prediction process, the proposed model LSCR tries to progressively reduce the uncertainty by its own characteristics. The model can be formulated as [equation (9)], where $S_i$ denotes the $i$th element in the learned shapelet set $S$ of $T$. Additionally, $D(T)^i_e$ is the corresponding evaluation dataset of $S_i$, which is determined by [equation (10)]. Finally, the main class property of the dataset $D(T)^{|S|-1}_e$ would be taken as the predicted class value of $T$. Generally, there will be only one type of instance left in the final evaluation dataset.
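The entropy and information gain of Definitions 7 and 8 can be made concrete with a minimal Python sketch. It assumes a base-2 logarithm (the base is not fixed by the text), takes the distances between a candidate shapelet and every instance as already computed, and tries the midpoints between consecutive distinct distances as split points; the names `entropy`, `information_gain`, and `best_split` are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy of a list of class labels (base-2 logarithm assumed)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(distances, labels, split):
    """Gain obtained by separating instances at distance `split`."""
    near = [y for d, y in zip(distances, labels) if d <= split]
    far = [y for d, y in zip(distances, labels) if d > split]
    n = len(labels)
    return (entropy(labels)
            - len(near) / n * entropy(near)
            - len(far) / n * entropy(far))


def best_split(distances, labels):
    """Return (gain, threshold) of the best midpoint split, or (0.0, None)."""
    points = sorted(set(distances))
    best_gain, best_s = 0.0, None
    for lo, hi in zip(points, points[1:]):
        s = (lo + hi) / 2
        gain = information_gain(distances, labels, s)
        if gain > best_gain:
            best_gain, best_s = gain, s
    return best_gain, best_s


# Toy distances of four instances to one candidate shapelet.
dists = [0.1, 0.2, 0.9, 1.1]
labels = ["a", "a", "b", "b"]
gain, delta = best_split(dists, labels)
print(gain, delta)  # a perfect split between 0.2 and 0.9 yields gain 1.0
```

In this toy case the two classes are fully separated, so the gain equals the entropy of the whole set, that is, all uncertainty about the class label is removed.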

Overview.
To provide a brief introduction to the model LSCR, a schematic diagram is first presented in Figure 2. As determined from the figure, the evaluation dataset $D_e$ for the test instance $T$ is first generated; second, the candidate shapelets from $T$ are evaluated on $D_e$, and then the best shapelet is employed to exclude the instances that do not contain the local feature represented by $S^T_0$; third, the optimal shapelet on the dataset $D_e'$ is continually searched until termination. Generally, there will be only one class of instances left in the final dataset, and the class will be taken as the predicted value. The predicted result may be different from the class value of the nearest neighbor or the majority class in the initial dataset. This is the greatest difference between our model and the nearest-neighbor models. Here, the model will be described in detail.

Building Shapelet Evaluation Dataset.
From the perspective of information theory, the purpose of extracting the best shapelet is to minimize the uncertainty of the class label of the test instance. The uncertainty is reflected in the class distribution of each subdataset generated based on whether the instance contains the specific feature. Hence, the more unbalanced the distribution of the subset of instances selected based on the shapelet, the more discriminative the feature. For example, for a binary classification problem, it is ideal to use the shapelet to divide the dataset containing instances with different class values into two subsets, each of which contains only one type of instance.
For a large-scale dataset, the running time of the shapelet search is prohibitive, so we attempt to sample the training instances for shapelet evaluation. In this paper, instances are selected based on the neighbor distance and the class value of the closest neighbor. Moreover, if the nearest-neighbor instances in the neighborhoods corresponding to the initial node of the classification path belong to the same class, then the route degrades into a single node. In particular, when the size of neighborhoods is set to 1, the model degenerates into 1NN, and the discriminant feature cannot be extracted effectively. In view of these problems, we propose to build a small targeted subset that contains instances with different classes for the instance to be classified. The subset ensures that the distinguishing nature of the local features can be evaluated. In addition, the data sampling strategy can eliminate the impact of intraclass variation on the local feature discriminant evaluation.
As shown in Algorithm 1, according to the class value of the nearest-neighbor instance, we select k identical and different instances for the test instance to construct a shapelet evaluation dataset (lines 2-6).
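The instance selection described above can be sketched in Python as follows. This is only one possible reading of the description: it assumes that the k nearest instances sharing the class of the nearest neighbor and the k nearest instances of any other class are kept, and it uses the Euclidean distance as a stand-in for the configurable function dist. The name `build_evaluation_dataset` and the toy data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_evaluation_dataset(train, test, k, dist=euclidean):
    """Select up to k nearest instances of the nearest neighbor's class
    and up to k nearest instances of the remaining classes.

    `train` is a list of (series, label) pairs.
    """
    ranked = sorted(train, key=lambda inst: dist(inst[0], test))
    nn_label = ranked[0][1]  # class of the closest training instance
    same = [inst for inst in ranked if inst[1] == nn_label][:k]
    other = [inst for inst in ranked if inst[1] != nn_label][:k]
    return same + other


train = [([0, 0], "A"), ([0, 1], "A"), ([5, 5], "A"),
         ([2, 2], "B"), ([3, 3], "B"), ([9, 9], "B")]
subset = build_evaluation_dataset(train, [0, 0.2], k=2)
print([label for _, label in subset])  # ['A', 'A', 'B', 'B']
```

Keeping instances from both groups guarantees that the evaluation dataset contains more than one class, so the information gain of a candidate shapelet is not trivially zero.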

Finding the Optimal Shapelet.
To reduce the computational complexity of extracting the best shapelet and make the extracted shapelets better reflect the characteristics of the test instance, a data-driven shapelet search algorithm is further proposed to find the best shapelet. We only search the best shapelets from the subsequence space of the instance to be classified so that the extracted shapelet accurately reflects the local features of each test instance. The process of generating the candidate shapelets collection for the test instance is given in Algorithm 2. In the algorithm, each subsequence of $T$ with starting point $i$ and length $j$ constitutes the candidate shapelet set (line 4). The candidate set corresponding to each node in the model SDT contains $O(nm^2)$ candidate shapelets, where $n$ is the number of time series and $m$ is the length of each time series. In our model, we only consider the subsequences of test instance $T$, so there are only $O(m^2)$ candidate shapelets to be evaluated for each node. Our work reduces the size of the candidate shapelet collection of each node by one order of magnitude. The purpose of extracting shapelets from time series is to classify the time series using the discriminator. The discriminability of shapelets provides us with a way to explain the classification results. Algorithm 3 introduces the method for finding the best shapelet for the instance to be classified on the evaluation dataset. First, the candidate shapelet set is generated for $T$ (line 3). Then, each candidate is evaluated by equation (7) to search for the best shapelet (lines 4-8). The time complexity of our model to find the optimal feature is $O(km^4)$, while that of the brute-force search algorithm in SDT is $O(n^2m^4)$. Considering that $k$ is smaller than $n$, the calculation in the process of searching for the best shapelet is significantly less.
In addition, two ways to improve the efficiency of shapelet search are used [18,25]: the early abandonment mechanism and the shapelet entropy pruning strategy.
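The difference in candidate set size can be checked with a small Python sketch. It enumerates every subsequence of a series within a length range, in the spirit of the candidate generation described above; the function name `generate_candidates` and the toy series are assumptions made for illustration.

```python
def generate_candidates(series, min_len, max_len):
    """Return (start, subsequence) pairs for every window of `series`
    whose length lies between min_len and max_len (inclusive)."""
    m = len(series)
    return [(start, series[start:start + length])
            for length in range(min_len, min(max_len, m) + 1)
            for start in range(m - length + 1)]


test_instance = [1, 2, 3, 4, 5]
train = [[5, 4, 3, 2, 1], [1, 3, 5, 3, 1], [2, 2, 2, 2, 2]]

# Lazy strategy: candidates come from the test instance only.
lazy = generate_candidates(test_instance, 2, 4)
# Eager strategy: candidates come from every training instance.
eager = [c for s in train for c in generate_candidates(s, 2, 4)]

print(len(lazy))   # 9  (4 + 3 + 2 windows of lengths 2, 3, 4)
print(len(eager))  # 27 (three training series, 9 windows each)
```

The eager count grows linearly with the number of training series, whereas the lazy count depends only on the length of the test instance.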

Lazy Shapelet Classification Algorithm.
A classification route based on shapelets for each instance to be classified is built through Algorithm 4.
Algorithm 4 mainly consists of five steps. First, the targeted shapelet evaluation dataset is established for $T$ (line 1). Second, the candidate shapelet set is generated for $T$ (line 2). Third, the model searches for the best shapelet on the evaluation dataset (line 3) and judges whether the termination condition is satisfied (line 4). It does not meet the termination condition at the initial node; that is, it will not degrade to the single node route. Generally, a best shapelet $S$ is found. Fourth, the extracted shapelet $S$ is applied to update the evaluation dataset for the child node (lines 7-8). Only the training instances whose distances are not greater than the split threshold are selected to form the subdataset. Fifth, the model repeats steps 2-8 until the termination condition is satisfied (line 10). Last, the shapelet-based classification route for $T$ is returned (line 11).
Figure 2: A shapelet classification route diagram with 3 nodes (initial node $D_e$, 1st node $D_e'$, and terminal node). The symbol $S^T_i$ denotes the $i$th shapelet on the classification route of the test instance $T$, and $\delta_i$ is its corresponding split threshold. The circle, rectangle, and triangle icons represent neighbors of $T$ from Class1, Class2, and Class3, respectively. They are distributed around the center (the black spot) of the circle according to their distances from $T$.
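The steps above can be sketched end to end in Python. This is a simplified illustration rather than the authors' implementation: it assumes the Euclidean distance for the global similarity, a base-2 entropy, midpoint split thresholds, and an evaluation dataset made of the k nearest instances of the nearest neighbor's class plus the k nearest instances of other classes; early abandonment and entropy pruning are omitted, and all function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distul(series, sub):
    """Best-match distance between `sub` and the windows of `series`."""
    l = len(sub)
    return min(euclidean(series[i:i + l], sub)
               for i in range(len(series) - l + 1))


def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_shapelet(test, data, min_len, max_len):
    """Return (gain, start, subsequence, threshold) of the best candidate
    taken from `test`; the subsequence is None if nothing separates `data`."""
    labels = [y for _, y in data]
    base = entropy(labels)
    best = (0.0, None, None, None)
    for length in range(min_len, min(max_len, len(test)) + 1):
        for start in range(len(test) - length + 1):
            cand = test[start:start + length]
            dists = [distul(x, cand) for x, _ in data]
            points = sorted(set(dists))
            for lo, hi in zip(points, points[1:]):
                s = (lo + hi) / 2
                near = [y for d, y in zip(dists, labels) if d <= s]
                far = [y for d, y in zip(dists, labels) if d > s]
                gain = base - (len(near) * entropy(near)
                               + len(far) * entropy(far)) / len(labels)
                if gain > best[0]:
                    best = (gain, start, cand, s)
    return best


def classify_lazily(train, test, k, min_len, max_len):
    """Build a classification route for `test`; return (label, route)."""
    ranked = sorted(train, key=lambda inst: euclidean(inst[0], test))
    nn = ranked[0][1]
    data = ([i for i in ranked if i[1] == nn][:k]
            + [i for i in ranked if i[1] != nn][:k])
    route = []
    while len({y for _, y in data}) > 1:
        _, start, cand, delta = best_shapelet(test, data, min_len, max_len)
        if cand is None:  # no discriminatory shapelet: stop early
            break
        route.append((start, cand, delta))
        # Keep only the instances that contain the local feature.
        data = [(x, y) for x, y in data if distul(x, cand) <= delta]
    label = Counter(y for _, y in data).most_common(1)[0][0]
    return label, route


# Toy data: class A contains an upward peak, class B a downward dip.
train = [([0, 3, 0, 0, 0, 0], "A"), ([0, 0, 3.2, 0, 0, 0], "A"),
         ([0, 0, 0, 2.9, 0, 0], "A"), ([0, -3, 0, 0, 0, 0], "B"),
         ([0, 0, -3.1, 0, 0, 0], "B"), ([0, 0, 0, 0, -2.8, 0], "B")]
label, route = classify_lazily(train, [0, 0, 0, 3.1, 0, 0],
                               k=2, min_len=2, max_len=3)
print(label, route)
```

On this toy data a single shapelet covering the peak of the test instance suffices, so the route has one node and the remaining evaluation instances all belong to class A. Each split keeps a strictly smaller, nonempty subset, so the loop always terminates.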

Computing Shapelet Coverage Score
Definition 9 (shapelet coverage). Shapelet coverage refers to the corresponding time interval of a given shapelet S. If a time stamp t falls within the shapelet coverage of S, then we state that t is covered by S.
In our work, the discriminatory score is calculated for each time stamp based on the coverage intervals of all obtained shapelets. First, an indicator function is presented to determine whether a time stamp $t$ is covered by a given shapelet $S$:
$$I(t, S) = \begin{cases} 1, & sp \le t \le sp + |S| - 1, \\ 0, & \text{otherwise}, \end{cases} \tag{11}$$
where $sp$ denotes the starting position of $S$ in the time series. Then, based on shapelets captured on the decision path of all correctly predicted time series, the importance of time stamps for different classes can be evaluated through the following formula: [equation (12)], where $D^c_{correct}$ represents the set of correctly predicted instances with class value $c$ and $S_T$ indicates the set of shapelets captured on the classification path of $T$.
Input: training dataset: D; test instance: T; the number of homogeneous and heterogeneous instances: k; the distance function used to calculate the similarity between complete time series: dist.
Output: the shapelet evaluation dataset: shapeletEvaluationData.
ALGORITHM 1
Input: test instance: T; the minimum and maximum length of the shapelet: min and max.
Output: the candidate shapelets set: CandidatesSet.
ALGORITHM 2
Input: test instance: T; shapelet evaluation dataset for T: $D_e(T)$; the minimum and maximum length of shapelet: min and max.
Output: the best shapelet of the test case: bestShapelet.
[...]
(6) if gain > bestGain then
(7) bestGain ⟵ gain
(8) bestShapelet ⟵ S
(9) end if
(10) end for
(11) return bestShapelet
ALGORITHM 3: FindingBestShapelet (T, $D_e(T)$, min, max).
In essence, the coverage score $score(t, c)$ reflects the discriminability of the time stamp $t$, which can be used to detect the distribution of distinguishing feature intervals and the feature frequency information for each category. Generally, the interval composed of several consecutive time stamps with similar scores corresponds to the local differentiating features. Therefore, in our work, the interval satisfying the condition would be regarded as the local feature location. In addition, the scores demonstrate their occurrence frequency.
Here, the algorithm of computing the shapelet coverage scores for different classes in the dataset will be introduced.
In Algorithm 5, to compute the shapelet coverage score for each class, the dataset $D^i_{correct}$ of the correctly predicted instances with the specific class is first obtained (lines 2-3). Then, all shapelets captured by LSCR for every instance in $D^i_{correct}$ are collected (lines 4-6). Finally, the scores reflecting the discriminability of each time stamp for every class are calculated based on equation (12) (line 7).
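The counting procedure can be sketched in Python as follows. The sketch assumes that the score of a time stamp for a class is the plain (unnormalized) number of shapelets covering it over all correctly predicted instances of that class; whether the original score is normalized is not recoverable from the text, so this choice is an assumption. The names `covers` and `coverage_scores` and the toy routes are hypothetical.

```python
def covers(t, start, length):
    """Indicator: 1 if time stamp t lies inside the shapelet interval."""
    return 1 if start <= t < start + length else 0


def coverage_scores(routes, series_length):
    """Count, per class and time stamp, how many shapelets cover it.

    `routes` lists one (class_label, shapelets) pair per correctly
    predicted instance, where shapelets is a list of (start, length).
    """
    scores = {}
    for label, shapelets in routes:
        row = scores.setdefault(label, [0] * series_length)
        for start, length in shapelets:
            for t in range(series_length):
                row[t] += covers(t, start, length)
    return scores


# Toy routes: two correctly predicted Class1 instances, one Class2.
routes = [("Class1", [(2, 3)]),
          ("Class1", [(3, 2), (0, 2)]),
          ("Class2", [(5, 3)])]
scores = coverage_scores(routes, series_length=8)
print(scores["Class1"])  # [1, 1, 1, 2, 2, 0, 0, 0]
print(scores["Class2"])  # [0, 0, 0, 0, 0, 1, 1, 1]
```

In the toy output, the time stamps 3 and 4 are covered most often for Class1, while the interval starting at 0 has a lower count, which mirrors the distinction between frequent and low-frequency discriminative intervals.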

Experiments
Experimental analyses are conducted on 20 datasets from the UCR time series repository [38], most of which are frequently used for evaluation of shapelet-based models [9,21,22,24,27]. The experimental data are divided into training and test parts. The former part is used to build the model, while the latter part is applied to calculate the classification accuracy. The information of datasets is listed in Table 1, including train (size of the training set), test (size of the test set), max_k (the minimum number of instances of a class in the training set; that is, the maximum value that parameter k can take), length (the length of time series), and classes (the number of classes).
Input: training dataset: D; the test instance: T; the distance function used to calculate the similarity between complete time series: dist.
Output: the classification route for T: CRForT.
(1) Build the initial evaluation dataset $D_e(T)$ for T at the root node by Algorithm 1
(2) Generate the candidate shapelet set $W(D_{node})$ for T by Algorithm 2
(3) Evaluate each candidate shapelet in $W(D_{node})$ based on $D_e(T)$ using Algorithm 3 and search for the best shapelet S
(4) if no discriminatory shapelet S can be found in $W(D_{node})$ then
(5) return CRForT and the majority class c in $D_e(T)$ is taken as the predictive value
(6) else
(7) Update the evaluation dataset $D_e(T)$ to exclude instances that do not contain the feature S
(8) Build the child node
(9) end if
(10) repeat steps 2 to 9 until the end
(11) return CRForT
ALGORITHM 4

Parameter k Analysis.
To study the effect of the size of the evaluation dataset on the discriminative evaluation of shapelets, the accuracy trends of LSCR_DTW within the specified range over 10 datasets are first analyzed as a representative. Then, the average accuracy curves of 5 fusion models (LSCR_DTW, LSCR_ERP, LSCR_ED, LSCR_TWE, and LSCR_MSM) are presented for parameter setting. Figure 3 shows the sensitivity of prediction results of different datasets to parameter k. From Figure 3(a), it can be seen that most accuracy rates on 5 binary-class datasets reach the maximum values when parameter k is 5, and then all of the curves show a significant downward trend. From Figure 3(b), except for the MiddlePhalanxOutlineAgeGroup, the accuracy rates on 4 multiclass datasets exhibit a growth trend as k increases in the previous stage. When k is greater than 6, the accuracy on each dataset tends to be stable or decrease.
The above experimental results suggest that the accuracy of LSCR generally varies regularly with k. Therefore, we propose to set k based on the trend of average accuracy.
In Figure 4(a), it can be seen that, on binary-class datasets, the average accuracy variations of 5 fusion models show two significantly different trends. One shows a trend of increasing and gradually becoming stable, while the other presents a significant decline in accuracy after passing the inflection point. In Figure 4(b), on multiclass datasets, the average accuracies of all fusion models first increase and then decrease slowly after reaching the peak. Finally, in our work, the value k corresponding to the highest average accuracy is set as the final parameter of LSCR_dist on binary-class and multiclass datasets, respectively. See Table 2 for specific settings of the 5 fusion models.
In addition, the effect of instance selection on model performance is interpreted in Figure 5. In the scatter diagram, each point stands for a dataset. The more points there are in the figure that fall below the diagonal line, the better the performance of LSCR_DTW. Since the targeted evaluation dataset is very helpful to improve the feature quality, the proposed shapelet evaluation strategy can significantly improve the performance of our model. As shown in Figure 5, LSCR_DTW outperforms LSCR_DTW (whole) on almost all datasets. The accuracy rates of the above two models are listed in Table 3.

Fusion Strategy Analysis.
In this section, the effectiveness of the fusion strategy of global and local similarities is analyzed. Table 3 presents the accuracies of 5 1NN models combined with different distance functions (DTW, ERP, ED, TWE, and MSM) and their corresponding five fusion models. The accuracies of the 1NN models are taken from the website [38]. Figure 6 shows a critical difference diagram studied in literature [39,40] for the 10 classification models on 20 datasets. This diagram is used for the overall test of significance of average ranks and can group models without significant differences into cliques. From the figure, we find that there are no significant differences in the performance of the 10 models on the 20 datasets, but the rankings of all fusion models are better than those of the corresponding 1NN models. This result suggests that the fusion model can effectively improve the classification performance of the 1NN model to some extent. Since LSCR_DTW ranks first, it will be used for further analysis in the following.
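The average ranks on which a critical difference diagram is based can be computed as in the following Python sketch. It covers only the ranking step (rank 1 for the most accurate model on each dataset, tied models sharing the mean of their ranks) and not the significance test; the model names and accuracy values are toy placeholders, not results from this paper.

```python
def average_ranks(accuracies):
    """Average rank of each model over all datasets.

    `accuracies` maps a model name to a list with one accuracy per
    dataset. Rank 1 is best; tied models share the mean of their ranks.
    """
    models = list(accuracies)
    n_datasets = len(next(iter(accuracies.values())))
    totals = {m: 0.0 for m in models}
    for d in range(n_datasets):
        ordered = sorted(models, key=lambda m: -accuracies[m][d])
        i = 0
        while i < len(ordered):
            j = i
            # Extend j over the block of models tied with ordered[i].
            while (j + 1 < len(ordered)
                   and accuracies[ordered[j + 1]][d] == accuracies[ordered[i]][d]):
                j += 1
            mean_rank = (i + 1 + j + 1) / 2
            for m in ordered[i:j + 1]:
                totals[m] += mean_rank
            i = j + 1
    return {m: totals[m] / n_datasets for m in models}


# Toy accuracies of three hypothetical models on three datasets.
toy = {"model_a": [0.90, 0.80, 0.70],
       "model_b": [0.85, 0.80, 0.75],
       "model_c": [0.60, 0.70, 0.65]}
print(average_ranks(toy))
# {'model_a': 1.5, 'model_b': 1.5, 'model_c': 3.0}
```

A lower average rank indicates a model that is more often among the most accurate ones across datasets, independently of the absolute accuracy values.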

Performance Analysis.
In this section, the proposed model is compared with shapelet-based classifiers, 1NN classifiers combined with many commonly used distance functions, and deep learning models.
Table 1: Information of the datasets.
Dataset            Train  Test  max_k  Length  Classes
ArrowHead             36   175     12     251        3
Beef                  30    30      6     470        5
BeetleFly             20    20     10     512        2
CBF                   30   900      8     128        3
ECGFiveDays           23   861      9     136        2
FaceFour              24    88      3     350        4
FacesUCR             200  2050      4     131       14
GunPoint              50   150     24     150        2
ItalyPowerDemand      67  1029     33      24        2
Lightning7            70    73      8
[...]
[...] listed in Table 4, and the accuracies of FS are obtained from the website [38]. In addition, the last line in Table 4 provides the average accuracy of each model over 20 datasets. As observed from Figure 7, it is clear that although there is no significant difference between LSCR_DTW and the other 4 shapelet-based models, the average rank of LSCR_DTW is the best. Figure 8 presents the scatter plot of accuracy comparison between LSCR_DTW and the 4 classical shapelet-based classifiers. Figure 8(a) shows that LSCR_DTW is better than SDT (14 of 20) over the 20 datasets. From Table 4, we find that compared with that of SDT built on the entire training set, the accuracies of LSCR_DTW on datasets MoteStrain, FacesUCR, SonyAIBORobotSurface2, etc., are significantly improved. In particular, on the dataset FacesUCR, the accuracy of LSCR_DTW is 20% greater than that of SDT. In Figures 8(b)-8(d), it can be concluded that LSCR_DTW is also better than STC4.5 (13 of 16), SSC4.5 (13 of 20), and FS (14 of [...]
To compare the time complexity of LSCR_DTW with other shapelet-based models, the changes in running times of 6 shapelet-based models with increasing number of instances and instance length are shown in Figure 9, respectively. LSCR_DTW (single) represents the training time of a single model LSCR_DTW for a specific test instance, while LSCR_DTW represents the model running on the entire test set.
In the experiment, a binary-class dataset with uniform distribution is designed to run the analysis models, and the size of the training set is always the same as that of the test dataset. For the first experiment, the sizes of training and test datasets

Figures 9(a) and 9(b), we can determine that the training time of LSCR DTW (single) is not sensitive to the size of the training set and is polynomial with respect to the length of the time series. In particular, it is faster than the current fastest shapelet-based model FS. Further, as observed from Figures 9(c) and 9(d), with increasing training set size and instance length, the time consumption gap between SDT, STC4.5, and LSCR DTW widens, while the gap between LSCR DTW and SSC4.5 is not obvious.
In conclusion, LSCR DTW is an accurate and rapid shapelet-based classification model, which is the basis of learning feature distribution and frequency information.
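The pairwise counts quoted for Figure 8 (for example, 14 of 20) correspond to a per-dataset tally of which model is more accurate. A minimal sketch follows; the helper name `count_wins` is hypothetical, and whether the paper counts ties as wins is not stated, so ties are reported separately here.

```python
def count_wins(acc_a, acc_b):
    """Compare two models dataset by dataset.

    acc_a, acc_b: accuracy lists in the same dataset order.
    Returns (wins, ties, losses) from the point of view of model A.
    """
    if len(acc_a) != len(acc_b):
        raise ValueError("both models need one accuracy per dataset")
    wins = sum(1 for a, b in zip(acc_a, acc_b) if a > b)
    ties = sum(1 for a, b in zip(acc_a, acc_b) if a == b)
    losses = len(acc_a) - wins - ties
    return wins, ties, losses
```

Each point in a scatter plot such as Figure 8 is one (accuracy of A, accuracy of B) pair, so the wins are the points lying on model A's side of the diagonal.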

Comparison with Various 1NN Classifiers.
In this section, in addition to the five distance functions listed above, we further compare our model with several DTW variants, including WDTW, complexity-invariant DTW (CID) [42], and derivative distance DTW (DD DTW) [43]. The accuracies for the above 1NN models are provided by the website [38] (see Table 4). Figure 10 demonstrates that our model exhibits no significant difference from DTW and its variants, but the average rank of LSCR DTW is the best. In Figure 11 results suggest that continuously narrowing the search space of class attributes based on local similarity is an effective way to make predictions.
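For readers unfamiliar with the 1NN baselines compared here, the following is a minimal sketch of a 1NN classifier with an unconstrained DTW distance. It is an assumption-laden illustration: the function names are hypothetical, no warping window or weighting is applied, and the benchmark results from [38] may rely on tuned windows that this sketch does not reproduce.

```python
import math


def dtw_distance(a, b):
    """Unconstrained DTW distance with squared pointwise cost."""
    n, m = len(a), len(b)
    # prev[j] holds the best cumulative cost of aligning a[:i-1] with b[:j].
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[m])


def predict_1nn(train_series, train_labels, query, dist=dtw_distance):
    """Return the label of the training series closest to the query."""
    nearest = min(range(len(train_series)),
                  key=lambda i: dist(train_series[i], query))
    return train_labels[nearest]
```

Passing a different `dist` callable, such as a Euclidean distance, turns the same `predict_1nn` into another of the 1NN baselines.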

Comparison with Deep Learning Models.
Various deep learning models have recently been widely studied in the field of time series classification. Karim et al. [44] attempted to improve the univariate time series classification performance of fully convolutional neural networks (FCNs) by using long short-term memory recurrent neural network (LSTM-RNN) submodules and an attention mechanism, and proposed the excellent models LSTM-FCN and ALSTM-FCN. Furthermore, the authors applied the above two models to the multivariate time series classification problem [45] and studied the reasons why the two models have superior performance [46]. Fawaz et al. [47] have reviewed some deep learning models for time series classification. Here, our model is compared with LSTM-FCN and 9 deep learning models (ResNet, FCN, Encoder, MLP, Time-CNN, TWIESN, MCDCNN, MCNN, and t-LeNet) analyzed in the literature [47]. All the experimental results were obtained from the corresponding literature (see Table 5). Figure 12 shows that, except for ResNet and FCN, LSTM-FCN is significantly better than the other models, and that our model is significantly better than MCNN and t-LeNet. The average rank of our model is better than that of 7 of the 10 deep learning models. In Figure 13, it can be seen that LSCR DTW is better than Encoder (

Generally, to improve performance, deep learning model tuning requires an enormous computational cost. To pursue the optimal accuracy rate, Fawaz et al. [48] even proposed a neural network ensemble of 60 deep learning models, but it is still not better than the traditional ensemble model HIVE-COTE [49]. It is unfair to compare our model with deep learning models based on accuracy alone. In addition to improving accuracy, we believe that model interpretability and data comprehensibility require more attention. However, the existing feature extraction methods for time series usually cannot simultaneously obtain the feature distribution and frequency information.
For the deep learning model, it is difficult to train a targeted model for the specific instance on a small dataset. Accordingly, in our work, a highly interpretable classification model based on the lazy learning strategy is built for each target instance, which can be applied to gain insight into the local feature distribution and frequency information. The following is a detailed introduction.

Interpretability.
To demonstrate the stronger interpretability of our model, this section separately analyses LSCR DTW on a binary-class dataset, MoteStrain, and the CBF multiclass dataset.

MoteStrain Dataset.
Sensing data in MoteStrain were originally collected to detect potential variables online in the sensor network [50]. The classification task on this dataset is to distinguish whether the sensor is used for humidity measurement or temperature measurement. The classification performance of our model is significantly better than that of the models used for comparison and is close to the current best classification result provided by Bagnall et al. [38] on this dataset. To further investigate the proposed model, the shapelet decision tree built by SDT is shown in Figure 14, where the symbol S 4 (0,0) represents the shapelet extracted from the fourth training instance Train 4 in the root node of the shapelet decision tree. As seen from Figure 14, there is only one shapelet in the decision tree, where d denotes the distance between the test instance and the shapelet and δ represents the split threshold of the shapelet. Based on the shapelet decision tree, when the distance between the test instance and the shapelet corresponding to the root node is not greater than the split threshold, the class prediction value of the test instance is Class1. Otherwise, it is Class2. Figure 15 shows six instances and their shapelets extracted through LSCR DTW. Our model can correctly predict the class properties of these six instances, while SDT fails. Meanwhile, it is obvious that there are not only significant differences between instances of different types but also intraclass variations among similar instances. For example, the differences among the three instances with the same class label, Test 11, Test 129, and Test 189, are noticeable, while there are no obvious common features. However, in the shapelet decision tree built by SDT, only one shapelet is found, which is not sufficient to distinguish the two classes.
In contrast to the illustration of the shapelet given in Figure 14, it is not difficult to find that the optimal shapelet obtained in the shapelet decision tree is not the most discriminative feature for the test instances. This result verifies that the shapelets extracted from the entire dataset are the most discriminative for the training instances only in the average sense; this is also the reason for the poor performance of SDT on the MoteStrain dataset.
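The single-node decision rule described for Figure 14 can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the distance d is taken to be the minimum Euclidean distance between the shapelet and every equal-length subsequence of the instance, and no normalization is applied, which may differ from the SDT implementation used in the experiments.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any
    subsequence of the series that has the same length."""
    length = len(shapelet)
    if length == 0 or length > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - length + 1):
        squared = sum((series[start + k] - shapelet[k]) ** 2
                      for k in range(length))
        best = min(best, squared)
    return math.sqrt(best)


def predict_single_node(series, shapelet, delta):
    """Root-node rule: Class1 if d <= delta, otherwise Class2."""
    d = shapelet_distance(series, shapelet)
    return "Class1" if d <= delta else "Class2"
```

Because the whole tree reduces to one threshold test on one shapelet, every test instance whose discriminative pattern differs from that shapelet falls on the same side of δ, which is consistent with the failure cases discussed above.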
Since the characteristics of each test instance are considered in our model, we can achieve better classification results in this situation. In addition, based on the shapelet obtained by LSCR DTW, the prediction process of each instance can be explained. For example, as shown in Figure 15, the 11th test instance is assigned to Class1 because the local feature S 11 0 lies in its initial stage, while the local feature S 41 0 in the middle part of the 41st test instance determines its predicted class label.

Since there are only 20 instances in the training set, the scores of shapelet coverage (as shown in Figure 16(a)) may not accurately reflect the local characteristics. However, we propose to directly evaluate the local characteristics of test instances, which can help us utilize large amounts of test data. In Figure 16(b), it can be seen that local features with different coverage frequencies of various classes have been detected from the test dataset, which cannot be captured by other shapelet-based models. For example, the intervals [9,16] (low frequency), [44,54] (medium frequency), and [59,76] (high frequency) (as shown in Figure 16(b)) are three significant discriminative intervals of Class2, while the interval [21,50] (high frequency) is the most discriminative part of Class1. Based on the proposed model, using more instances with accurate labels results in more accurate local feature information. The statistical information helps us acquire more comprehensive local characteristic information of time series data, such as feature distribution and frequency.
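The idea of a per-class coverage frequency over time stamps can be illustrated with a simple count. This sketch is not the shapelet coverage score defined in the paper: it only assumes that each extracted shapelet is summarized by an inclusive interval (start, end) of time stamps and reports, for each class, the fraction of that class's shapelets covering each time stamp. The function name and the toy intervals are hypothetical.

```python
def coverage_frequency(intervals_by_class, series_length):
    """Fraction of a class's shapelet intervals covering each time stamp.

    intervals_by_class: dict mapping a class label to a list of
    inclusive (start, end) index pairs, one pair per shapelet.
    Returns a dict mapping each label to a list of length series_length.
    """
    result = {}
    for label, intervals in intervals_by_class.items():
        counts = [0] * series_length
        for start, end in intervals:
            # Clip the interval to the valid index range before counting.
            first = max(start, 0)
            last = min(end, series_length - 1)
            for t in range(first, last + 1):
                counts[t] += 1
        total = len(intervals)
        result[label] = [c / total if total else 0.0 for c in counts]
    return result


# Toy intervals, not taken from Figure 16.
toy = coverage_frequency({"Class1": [(1, 3), (2, 4)], "Class2": [(0, 0)]}, 6)
```

Time stamps with a high fraction play the role of the high-frequency discriminative intervals described above, while low but nonzero fractions correspond to low-frequency features that a single global shapelet would miss.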

CBF Dataset.
This section studies the multiclass dataset CBF, which contains three types: Cylinder, Bell, and Funnel. Figure 17 shows the coverage scores of the CBF test dataset. It can be determined that the interval [31,75] covers the most discriminative intervals for all three classes. In addition, unlike Cylinder and Funnel, the local interval [0,15] with relatively low coverage frequency is discriminative for Bell.
Next, four specific test instances are presented. As shown in Figure 18, for the multiclass dataset, our model can not only detect high-frequency shapelets (as shown in Figures 18(a)-18(c)) but also effectively capture low-frequency shapelets (as shown in Figure 18(d)). The proposed model LSCR dist helps make a targeted analysis of each category of data and each test instance.

Conclusions
To address the problems of global shapelet-based models built on the whole training set, a data-driven model fusing global and local similarities is proposed. In the model, the shapelet discriminability is evaluated through a specific subdataset. A smaller evaluation dataset reduces the computational time and improves the quality of shapelets. Moreover, targeted learning for each instance helps us understand the prediction process clearly. For example, the shapelets extracted by our LSCR model can be directly used to explain which characteristics determine the class value of the test instance. Furthermore, the proposed shapelet coverage score is applied to analyze the local feature information of each class, which provides comprehensive insight into data characteristics. In the future, the application of the model in specific fields will be further studied, including ECG detection and image contour feature discovery.

Data Availability
The time series data used to support the findings of this study have been deposited in the UEA and UCR Time Series Classification Repository (http://www.timeseriesclassification.com).

Conflicts of Interest
The authors declare that they have no conflicts of interest.