Application of Discrete Wavelet Transform in Shapelet-Based Classification

Recently, several shapelet-based methods have been proposed for time series classification; they classify a series by identifying its most discriminating subsequences. However, for time series datasets in some application domains, pattern recognition on the original time series does not always yield ideal results. To address this issue, we propose an ensemble algorithm that combines time-frequency analysis with shape similarity recognition of time series. The discrete wavelet transform is used to decompose each time series into different components, and shapelet features are identified for each component. According to the correlation between each component and the original time series, an ensemble classifier is built by weighted majority voting, and the Monte Carlo method is used to search for the optimal weight vector. Comparative experiments and a sensitivity analysis are conducted on 25 datasets from the UCR Time Series Classification Archive, an important open dataset resource in time series mining. The results show that the proposed method performs better in terms of accuracy and stability than the compared classifiers.


Introduction
A time series is a data sequence that represents recorded values of a phenomenon over time. Time series data constitute a large portion of the data stored in real-world databases [1]. Time series data exist widely in many fields, such as commerce, agriculture, meteorology, bioscience, and ecology. Data such as meteorological data in weather forecasting, floating currency exchange rates in foreign trade, radio waves, images captured by medical devices, and continuous signals in engineering applications can all be regarded as time series [2]. Time series data are more complex to analyse than cross-sectional data because of the way in which measurements change over time [3]. Time series classification (TSC) is one of the important tasks in time series data analysis. TSC builds a classification model from labelled time series and then uses the model to predict the labels of unlabelled time series. Unlike traditional classification methods, TSC must consider not only the numerical relationships between different attributes but also the order relationship between data points.
In the past ten years, hundreds of methods have been proposed to solve the TSC problem. One of the traditional methods is the 1-nearest neighbor (1NN) classifier, which can be used with different distance functions. Faloutsos et al. [4] used Euclidean distance for time series matching. The Euclidean distance can only deal with time series of equal length; it compares time series point to point along the time axis and cannot match similar shapes that are out of phase. To solve these problems, Berndt et al. [5] applied dynamic time warping (DTW), a technique from the speech recognition field, to pattern detection in time series. DTW is a much more robust distance measure for time series. It not only eliminates the "point-to-point" matching defect of Euclidean distance but also achieves "one-to-many" matching of time series data points by stretching or compressing the series. The traditional DTW assigns the same weight to each observation value and ignores the phase difference between the observation value and the test value. On this basis, Jeong et al. [6] proposed weighted DTW for time series classification. This kind of 1NN classification algorithm has high classification accuracy and is easy to implement, but it requires a long computing time and has poor interpretability. Many other researchers have been concerned with the measurement of dissimilarity. Several dissimilarity metrics, such as normalized eigenvector correlation (NEC) [7], signal directional differences (SDDs) [8], and square eigenvector correlation (SEC) [9], have been proposed recently; they measure the dissimilarity between the features extracted from the distinct path between specific features. These metrics have been verified to be effective in improving the accuracy of the feature matching technique.
Recently, many researchers have used shape similarity to solve TSC problems. The most popular method is shapelet-based classification. A shapelet is a time series subsequence that can be regarded as maximally representative of a class in some sense [10]. Classification algorithms based on shapelets were first proposed by Ye et al. [10,11]; these algorithms use information gain to assess split points of the data and build a decision tree by recursively searching for the most discriminating shapelets. This strategy builds the classifier at the same time as the shapelets are discovered. The other strategy, in contrast, first maps the time series to another space and then builds a classifier. Lines et al. [12] proposed a time series classification method based on the shapelet transformation (ST). This method creates new classification data before constructing the classifier, so that it keeps the explanatory power of shapelets while improving classification accuracy.
Ensemble learning strategies have also been applied to time series classification, such as the time series forest (TSF) proposed by Deng et al. [13], the elastic ensemble (EE) proposed by Lines et al. [14], the Collection of Transformation Ensembles (COTE) proposed by Bagnall et al. [15], and the Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE), which is based on COTE and was proposed by Lines et al. [16]. These methods combine multiple subclassifiers built on distance measures, shapelet identification, spectrum analysis, and other time series feature representation and transformation strategies. Compared with a single classifier, an ensemble classifier has higher accuracy but also higher time complexity. In terms of classification accuracy, Bagnall et al. conducted a comparative experiment on the currently popular time series classification algorithms [17,18] and found that the most accurate methods are, in order, HIVE-COTE, COTE, and ST. Moreover, ST is an important part of both the COTE and HIVE-COTE algorithms. In other words, ST is one of the effective methods for time series classification.
Generally, new features extracted from time series may help to improve the performance of classification models. Techniques for feature extraction include singular value decomposition (SVD), discrete Fourier transform (DFT), discrete wavelet transform (DWT), and so on [19]. The DWT, as formulated in the late 1980s, has inspired extensive research into how to use this transform to study time series. The DWT is a powerful tool for time-scale multiresolution representation of time series by means of wavelets. In contrast to other techniques, the DWT is localized in time; hence, the wavelet variance can be readily adapted for exploring processes that are locally stationary with time-varying properties [20] and for detecting inhomogeneities in time series [21]. Because it can separate the original time series into its components, the DWT helps researchers capture trends and patterns in data. At the same time, it is a data transformation technique that concurrently localizes both time and frequency information from the original data in its multiscale representation [22].
In this study, combining the advantages of the DWT and the shapelet approach, we propose a new ensemble method that embeds the DWT into the shapelet-discovery algorithm to obtain transformed data and then uses an ensemble classifier to train and test on the transformed data. By using the DWT, the original time series data are divided into one low-frequency information component and several high-frequency information components. Each decomposed information component is still in the time domain. A shapelet set is then selected from each component. These shapelet sets reflect the corresponding classification characteristics and are used to convert the original time series into feature vector representations. These feature vectors contain more features of the original time series. A base classifier is trained on each transformed dataset. Finally, a weighted majority voting technique is used to integrate the prediction results of the base classifiers, and the Monte Carlo method is used to search for the locally optimal weight vector. We conduct a comparative experiment with other popular time series classifiers and perform a qualitative analysis. The experiment is conducted on 25 datasets from UCR [23]. The results show that the proposed method performs well in terms of accuracy and stability. The paper is structured as follows: Section 2 provides related definitions on time series classification and shapelets; in Section 3, we propose a new method and describe its overall framework and details; in Section 4, we describe our experimental design and results and perform a qualitative analysis of the proposed method; finally, we draw conclusions based on our analysis results in Section 5.

Related Definitions
Univariate time series dataset: a univariate time series is a sequence of data that are typically recorded in temporal order at fixed intervals. The number of real-valued data points is the length of the time series.
A dataset $T = \{T_1, T_2, T_3, \ldots, T_n\}$ has n time series. Each time series $T_i$ has m real-valued ordered data $\langle t_{i,1}, t_{i,2}, t_{i,3}, \ldots, t_{i,m} \rangle$ and a class label $c_i$.
Sets of candidate shapelets: every subsequence of a series in dataset T is defined as a candidate. So the set of candidate shapelets is the union of the subsequences of each series in T.
A subsequence of $T_i$ is a contiguous sequence on $T_i$. The length of a subsequence can be 1, 2, 3, . . ., m. A subsequence of $T_i$ can be described as $S_{i,p,l} = \langle t_{i,p}, t_{i,p+1}, t_{i,p+2}, \ldots, t_{i,p+l-1} \rangle$, where p is the starting position and l is the length. So the set of all subsequences of length l in the time series $T_i$ is defined as $S_{T_i,l} = \{ S_{i,p,l},\ 1 \le p \le m - l + 1 \}$.
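The candidate set defined above can be enumerated directly. The following is a minimal Python sketch, not the authors' implementation; the function names `subsequences` and `candidate_shapelets` are illustrative assumptions, and positions are 0-based rather than the 1-based p used in the text.

```python
def subsequences(series, length):
    """All contiguous subsequences of `series` with the given length.

    There are m - length + 1 of them, one per starting position.
    """
    m = len(series)
    return [series[p:p + length] for p in range(m - length + 1)]


def candidate_shapelets(dataset, min_len, max_len):
    """Union of the subsequences of every series for lengths min_len..max_len."""
    candidates = []
    for series in dataset:
        # a subsequence cannot be longer than the series it comes from
        for length in range(min_len, min(max_len, len(series)) + 1):
            candidates.extend(subsequences(series, length))
    return candidates


print(subsequences([1.0, 2.0, 3.0, 4.0], 3))  # [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
```

For a single series of length m and all lengths from 1 to m, this yields m(m + 1)/2 candidates, which is why the search over candidates dominates the running time of shapelet methods.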
Similarity measures: classification of time series depends on similarity measures between data. Common time series similarity measures include Euclidean distance, dynamic time warping, Fourier coefficients, and autoregressive models. In this study, Euclidean distance [24] is used to compare two time series of the same length. For example, consider two m-length time series, $S = \langle s_1, \ldots, s_m \rangle$ and $R = \langle r_1, \ldots, r_m \rangle$, and let the Euclidean distance given by equation (1) be the measure of similarity:
$$\mathrm{dist}(S, R) = \sqrt{\sum_{j=1}^{m} (s_j - r_j)^2}. \tag{1}$$
Before calculating the distance, each time series is z-normalized [25] according to equation (2), where $\bar{X}$ and $\sigma_X$ are the mean and standard deviation of the m real-valued ordered readings $\langle t_{i,1}, t_{i,2}, t_{i,3}, \ldots, t_{i,m} \rangle$ of time series $T_i$, respectively:
$$t'_{i,j} = \frac{t_{i,j} - \bar{X}}{\sigma_X}, \quad j = 1, 2, \ldots, m. \tag{2}$$
The similarity between each candidate shapelet and each series is measured, and this sequence of distances with associated class membership is used to assess shapelet quality. The candidate shapelet is short, and the time series is relatively long. When calculating the distance between two series of different lengths, the short series slides along the long series until the minimum distance between them is found. The distance between a time series $T_i$ and a candidate shapelet S of length l is defined by equation (3): the distances between S and all subsequences of length l in $T_i$ are calculated, and the minimum is taken as the distance between S and $T_i$:
$$\mathrm{dist}(S, T_i) = \min_{1 \le p \le m - l + 1} \mathrm{dist}(S, S_{i,p,l}). \tag{3}$$
Information gain and shapelet: in probability theory and information theory, the information gain (IG) is an asymmetric measure of the difference between two probability distributions. The IG is usually used to determine the quality of a shapelet [10,11,26]. Calculating the distances between a candidate shapelet S and all time series in T gives a set $D_S$ of n distance values. $D_S$ is sorted, and the IG at each possible split point sp is then assessed for S. Here, a valid split point is defined as the mean of any two consecutive distances in $D_S$. For each possible split point sp, as shown in Figure 1, the elements of $D_S$ smaller than sp are grouped into $A_S$, and the elements of $D_S$ larger than sp are grouped into $B_S$. The IG at sp is calculated according to the following equation:
$$IG(sp) = H(D_S) - \left( \frac{|A_S|}{|D_S|} H(A_S) + \frac{|B_S|}{|D_S|} H(B_S) \right), \tag{4}$$
where $|D_S|$ is the cardinality of the set $D_S$ and $H(D_S)$ is the entropy of $D_S$. $H(D_S)$ is defined as follows:
$$H(D_S) = -\sum_{v \in V} p_v \log p_v, \tag{5}$$
where V is the set of class labels and $p_v$ is the probability of each label. The IG of shapelet S, $IG_S$, is calculated as
$$IG_S = \max_{sp} IG(sp). \tag{6}$$
In general, the shapelets with maximum information gain are extracted by comparing all the candidate shapelets.
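To make the distance and quality measures concrete, the sketch below computes the z-normalized sliding distance between a shapelet and a series, and the best information gain over all valid split points. It is a hedged illustration rather than the authors' code: the function names are assumptions, each sliding window is z-normalized separately (a common convention in the shapelet literature that the text does not spell out), constant windows are mapped to zeros, and the entropy uses base-2 logarithms.

```python
import math
from collections import Counter


def z_normalize(x):
    """Rescale x to zero mean and unit standard deviation."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if std == 0.0:
        return [0.0] * len(x)  # assumption: a constant window maps to zeros
    return [(v - mean) / std for v in x]


def euclidean(s, r):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, r)))


def subsequence_distance(shapelet, series):
    """Minimum distance between the shapelet and any window of the series."""
    l = len(shapelet)
    s = z_normalize(shapelet)
    return min(euclidean(s, z_normalize(series[p:p + l]))
               for p in range(len(series) - l + 1))


def entropy(labels):
    """Shannon entropy (base 2, an assumption) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_information_gain(distances, labels):
    """Maximum IG over split points between consecutive sorted distances."""
    pairs = sorted(zip(distances, labels))
    sorted_labels = [lab for _, lab in pairs]
    h_all, n, best = entropy(sorted_labels), len(pairs), 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal distances cannot be separated by a split point
        left, right = sorted_labels[:i], sorted_labels[i:]
        gain = h_all - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        best = max(best, gain)
    return best
```

For instance, distances `[0.1, 0.2, 0.9, 1.0]` with labels `['a', 'a', 'b', 'b']` are perfectly separated by the split point 0.55, so the information gain equals the full entropy of the label set.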

Method Structure.
The proposed method consists of three major parts: decomposition, feature extraction, and classification. The whole process is outlined in Figure 2.
The three major parts of the proposed method are described briefly as listed below: to predict class label. Based on the predictive results of the base classifiers, weighted majority voting is used to build an ensemble classifier according to the correlation between the components and the original data. The weights are optimized by the Monte Carlo method, and then the final classification result is obtained.

Discrete Wavelet Transform.
The DWT is a technique of mathematical origin and is very appropriate for time-scale multiresolution analysis of time series [22]. The DWT provides an effective way to separate nonstationary signals into signals at various scales. This kind of signal processing is called signal decomposition. Various aspects of nonstationary signals, such as trends, discontinuities, and repeated patterns, are clearly revealed in the signal decompositions. Some time series data, such as audio signals and patients' ECG heart rates, have multiscale signal components that are more meaningful in parts than in sum. For those reasons, the DWT is a suitable technique to combine with classification approaches in order to assign an unknown signal to a predefined type of signal [22]. This section explains how the DWT assists in the classification process. An effective way to implement the DWT is to use filters; this approach was proposed by Mallat in 1988 and is well known as the Mallat algorithm. This algorithm uses filter banks to implement the DWT, which decomposes the signal into several different frequency components. Figure 3 illustrates an example of the two-level wavelet decomposition and reconstruction processes of the decimated DWT.
Generally, a filter bank approach is adopted because of its efficiency. As shown in Figure 3, S(n) is a real signal, h(n) is the high-pass filter, which filters out the low-frequency part of the signal, and g(n) is the low-pass filter, which filters out the high-frequency part. The half-band filters are followed by downsampling of the signal by a factor of 2 at each level of decomposition. At the first level of decomposition, the input signal is passed through the wavelet filters and then decimated by a factor of two. Then, the output of the low-pass filter is used as the new input signal, and the same filtering and decimation process is repeated. This is carried out until the desired level of wavelet decomposition or the maximum allowed level is reached. The combination of filtering and decimation enables the same filters to be used throughout the entire wavelet decomposition procedure [27]. The outputs of the decomposition process are the approximation coefficients ($cA_i$) and the detail coefficients ($cD_i$), where i denotes the level of the filter. In practical applications, the decomposition level is generally selected according to the characteristics of the signal or a suitable criterion.
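The filtering and decimation loop described above can be illustrated with the Haar wavelet, whose low-pass and high-pass filters each have only two taps. This is a minimal pure-Python sketch under stated assumptions: the function names are hypothetical, the signal length must be divisible by 2 to the power of the number of levels (no boundary extension is implemented), and the sign convention of the detail coefficients may differ from that of a given wavelet library.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One decomposition level: Haar low-pass and high-pass filtering,
    each followed by downsampling by a factor of 2."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Return (cA_levels, [cD_levels, ..., cD_1])."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels in this sketch")
    approx, details = list(signal), []
    for _ in range(levels):
        # the low-pass output becomes the input of the next level
        approx, detail = haar_step(approx)
        details.insert(0, detail)  # keep the coarsest level first
    return approx, details


cA2, (cD2, cD1) = haar_decompose([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0], levels=2)
```

Because the same two-tap filters are reused at every level and the input halves each time, the total cost stays linear in the signal length.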
In the reconstruction process, the approximation and detail coefficients at each level are upsampled by two, passed through the high- and low-pass synthesis filters, and added. In this way, the original signal can be reconstructed from the approximation coefficients of the last level and the detail coefficients of every level.
Similarly, the approximation component (A) and the detail components (D) of the signal can each be reconstructed from the approximation coefficients or the detail coefficients alone by omitting the other sets of coefficients. This is best done by replacing the omitted coefficients with zeros of the same shape. In this way, each reconstructed component has the same length as the original signal. The approximation component captures rough features that can be used to estimate the original data, while the detail components capture detail features that describe frequent movements of the data. For example, consider a dataset containing n time series and class labels, where each time series has m data points. After choosing the mother wavelet, if the maximum level allowed is R, we obtain one approximation component matrix $A_{n,m+1}$ and R detail component matrices $D_{n,m+1}$. The DWT decomposes a single signal into multiscale signals using wavelet functions. The filter coefficients are determined by the mother wavelet, so the characteristics of the transformation are also affected by the choice of the mother wavelet. Commonly used mother wavelets include Haar, Daubechies, biorthogonal, Coiflets, and symlets. The influence of different mother wavelets on classification performance is tested in the following experiments.
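The zeroing procedure for reconstructing individual components can be sketched as follows, again with the Haar wavelet in pure Python. The function names `analyze`, `synthesize`, and `components` are illustrative assumptions, and the sketch requires a signal length divisible by 2 to the power of the number of levels. Each returned component has the length of the input, and the components add up to the original signal.

```python
import math

SQRT2 = math.sqrt(2.0)


def analyze(x):
    """One Haar analysis step: (approximation, detail) coefficients at half length."""
    pairs = [(x[i], x[i + 1]) for i in range(0, len(x), 2)]
    return [(a + b) / SQRT2 for a, b in pairs], [(a - b) / SQRT2 for a, b in pairs]


def synthesize(approx, detail):
    """Inverse Haar step: upsample by 2 and combine the two branches."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def components(signal, levels):
    """Return [A, D_levels, ..., D_1], each reconstructed to len(signal)."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels in this sketch")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = analyze(approx)
        details.insert(0, d)  # coarsest detail first
    coeff_sets = [approx] + details
    result = []
    for keep in range(len(coeff_sets)):
        # keep one coefficient set and replace the others by zeros of the same shape
        kept = [c if i == keep else [0.0] * len(c) for i, c in enumerate(coeff_sets)]
        rec = kept[0]
        for d in kept[1:]:
            rec = synthesize(rec, d)
        result.append(rec)
    return result
```

With the Haar wavelet and two levels, the approximation component of `[4, 6, 10, 12, 8, 6, 5, 5]` is the block average `[8, 8, 8, 8, 6, 6, 6, 6]`, which shows how A keeps the rough trend while the detail components carry the remaining fluctuations.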

Feature Extraction.
We extract features from each component through the shapelet transformation proposed by Lines et al. [12]. The main contribution of the shapelet transformation is to separate shapelet discovery from classifier construction, so that the transformed data can be used with different classifiers. The corresponding algorithm includes two major steps:
Step 1: the algorithm performs a single scan of the data to extract the best k shapelets.
Step 2: by calculating the distance between the k shapelets and every time series, an instance with k attributes is obtained for each series; then, a new transformed dataset is created.
Algorithm 1 describes the process of extracting the k best shapelets from the dataset. The min and max parameters limit the length of the candidate shapelets. Each time a candidate shapelet is obtained, the distance between the candidate shapelet and every time series is calculated. The results are sorted to find the split point that gives the maximum information gain. After all the candidate shapelets have been assessed, they are sorted according to information gain and self-similar shapelets are removed. Finally, the top k shapelets in the set of non-self-similar shapelets are retained.
Once the best k shapelets have been found, the transform is performed with Algorithm 2. For each instance of data $T_i$, the subsequence distance is computed between $T_i$ and $SK_j$, where $j = 1, 2, \ldots, k$. The k calculated distances form a new instance of transformed data, where each attribute corresponds to the distance between a shapelet and the original time series. The subsequence distance calculation has been described in equation (3). With the shapelet transformation, the selection process of shapelets is optimized, and different classification strategies can be flexibly applied. On this basis, several other shapelet approaches have been proposed, such as logical shapelets [26], fast shapelets [29], binary shapelets [30], and learnt shapelets [31]. The extracted low-frequency and high-frequency information components in the time domain are used as separate new time series to generate candidate matrices. Then, the corresponding shapelets are extracted from each candidate matrix. The distances between each component series and the shapelet set extracted from that component are calculated to form a new set of feature vectors. In this step, we obtain R+1 transformed matrices $T'_{k,m+1}$.
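The transform step itself is short once the k shapelets are known. The sketch below is an illustration under stated assumptions, not the authors' Algorithm 2: the function names `sdist` and `shapelet_transform` are hypothetical, normalization is omitted for brevity, and the class label is appended as the last column of each transformed instance.

```python
import math


def sdist(shapelet, series):
    """Minimum Euclidean distance between the shapelet and any equal-length window."""
    l = len(shapelet)
    return min(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(shapelet, series[p:p + l])))
        for p in range(len(series) - l + 1)
    )


def shapelet_transform(shapelets, dataset, labels):
    """Map each series to k distances (one per shapelet) plus its class label."""
    return [[sdist(s, series) for s in shapelets] + [label]
            for series, label in zip(dataset, labels)]


rows = shapelet_transform(
    shapelets=[[1.0, 2.0], [5.0, 5.0]],
    dataset=[[0.0, 1.0, 2.0, 3.0], [5.0, 5.0, 5.0, 5.0]],
    labels=[0, 1],
)
```

In the proposed method this transform would be applied once per DWT component, each with its own shapelet set, giving R+1 transformed datasets with the same rows and labels but different attributes.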

Ensemble Classification.
Finally, we build a combined classifier. We train a base classifier on each of the R + 1 transformed matrices, use weighted majority voting to integrate the prediction results of the base classifiers, and then use the Monte Carlo method to optimize the weight vector. This process is described by Algorithm 3.
To evaluate the strength and direction of the relationship between each component and the original time series, the Pearson correlation coefficient is calculated. The obtained correlation coefficient matrix is normalized to satisfy equation (7). The mean value for each type of component is taken as the initial value of the weight $\omega_j$, where j can be 0, 1, 2, 3, . . ., R. The weights meet the following condition:
$$\sum_{j=0}^{R} \omega_j = 1. \tag{7}$$
A component that is highly correlated with the original data has its classifier assigned a larger weight, so as to improve the performance of the ensemble classifier.
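One way to obtain such initial weights is sketched below. The text does not specify the normalization in detail, so this sketch makes explicit assumptions: the absolute value of each Pearson coefficient is used, the R+1 coefficients of each series are rescaled to sum to 1, and the weight of a component is the mean of its rescaled coefficients over all series. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def initial_weights(originals, components):
    """components[j][i] is component j of series i; returns one weight per component."""
    n, n_comp = len(originals), len(components)
    weights = [0.0] * n_comp
    for i, series in enumerate(originals):
        corr = [abs(pearson(series, components[j][i])) for j in range(n_comp)]
        total = sum(corr)
        for j in range(n_comp):
            # assumption: rescale per series so the coefficients sum to 1,
            # then average over the n series
            share = corr[j] / total if total > 0 else 1.0 / n_comp
            weights[j] += share / n
    return weights
```

Because every series contributes coefficients that sum to 1, the averaged weights also sum to 1, and a component that tracks the original series closely receives the larger share.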
Input: a list of time series T, the min and max shapelet lengths to search for, and k, the maximum number of shapelets to find
Output: the best k shapelets
kShapelets ⟵ ∅
for all T_i in T do
    shapelets ⟵ ∅
    for l ⟵ min to max do
        for p ⟵ 1 to m − l + 1 do
            S_{T_i,l} ⟵ generateCandidate(T_i, l)
            for all candidate shapelet S in S_{T_i,l} do
                D_S ⟵ subdist(S, T)
                quality ⟵ assessCandidate(S, D_S)
                shapelets.add(S, quality)
removeSelfSimilar(shapelets)
sortByQuality(non-self-similar shapelets)
return kShapelets
ALGORITHM 1: ShapeletSelection(T, min, max, k).
Input: SK, a set of the best k shapelets generated from the training data, and T, a dataset containing time series and class labels
Output: a new transformed dataset
We consider a multiclass classification task with class labels $i \in \{1, 2, \ldots, c\}$ and predict the class label y from the class probabilities p predicted by each base classifier $L_j$, where j can be 0, 1, 2, . . ., R. The label y is calculated as follows:
$$y = \arg\max_{i} \sum_{j=0}^{R} \omega_j p_{ij}, \tag{8}$$
where $\omega_j$ is the weight of the jth base classifier $L_j$ and $p_{ij}$ is the probability of class i predicted by the jth classifier $L_j$. The key part of building the ensemble classifier is the selection of weights. In the proposed method, the Monte Carlo method is used to find the optimal weight parameters, as described in Algorithm 4. It includes the following major steps:
Step 1: the Pearson correlation coefficient between each component and the original time series is calculated and normalized. The mean value for each type of component is taken as the initial value of the weight $\omega_j$.
Step 2: the initial weight $\omega_j$ is multiplied by the class probabilities predicted by the base classifier of the corresponding component, and the class with the maximum weighted probability is taken as the final class; the accuracy of the ensemble classifier is then obtained.
Step 3: the extreme value of each component's Pearson correlation coefficient can be calculated in Step 1, and it is recorded as $d_j$, where j can be 0, 1, 2, . . ., R. New weight combinations are generated by the Monte Carlo method. In each Monte Carlo event, we generate R+1 uniformly distributed random numbers, one in each range $[\omega_j - d_j, \omega_j + d_j]$. After N simulations, N groups of weight combinations are produced.
Step 4: each of the N groups of weight combinations is substituted into Step 2 to calculate its accuracy. The maximum accuracy is the result of this step.
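The weighted voting used in Step 2 can be written in a few lines. This is a minimal sketch with hypothetical function names; class indices are 0-based here, and ties are resolved in favour of the lowest class index, which the text does not specify.

```python
def weighted_vote(weights, probabilities):
    """probabilities[j][i] is the probability of class i from base classifier j.

    Returns the index of the class with the largest weighted probability sum.
    """
    n_classes = len(probabilities[0])
    scores = [sum(w * p[i] for w, p in zip(weights, probabilities))
              for i in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__)


def ensemble_accuracy(weights, all_probabilities, true_labels):
    """all_probabilities[s][j][i] for sample s; fraction of correct predictions."""
    hits = sum(weighted_vote(weights, probs) == y
               for probs, y in zip(all_probabilities, true_labels))
    return hits / len(true_labels)


probs = [[0.4, 0.6], [0.9, 0.1]]          # two base classifiers, two classes
print(weighted_vote([0.6, 0.4], probs))   # 0
print(weighted_vote([0.9, 0.1], probs))   # 1
```

The two calls show why the weights matter: the same base predictions lead to different ensemble decisions depending on how much the first classifier is trusted.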
Each iteration contains N Monte Carlo simulations. If the accuracy does not improve compared with the previous iteration, we update $d_j$ to $2d_j$ to broaden the range of the generated random numbers and increase the number of Monte Carlo simulations from N to 2N.
Monte Carlo simulation is a computerized mathematical technique that generates random sample data from a given distribution for numerical experiments. We use it to generate a large set of random weight vectors, with the range of each weight constrained by $d_j$ so that the predictions of strongly correlated components are given higher weights. Each weight vector is evaluated with the above method to obtain its accuracy, and the optimal weight vector and accuracy are obtained after several Monte Carlo iterations.
In Figure 5, the blue dotted line indicates the termination position of the iterations. The iteration terminates when the accuracy obtained no longer increases. Obviously, this method cannot obtain the global optimum, but the weight obtained is closest to the initial
Input: R+1 transformed matrices T′, the original time series dataset T, base classifier L, simulation times N
Output: the optimal weights and the maximum accuracy
get the initial weight ω = <ω_1, ω_2, . . . , ω_{R+1}> and step length d_j
for all T′_i in T′ do
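The Monte Carlo search over weight vectors can be sketched as follows. This is one possible reading of the procedure, not the authors' Algorithm 4, and several details are assumptions labeled in the code: the `patience` stopping rule (stop after a fixed number of consecutive iterations without improvement), the clamping of negative weights to zero, the fixed random seed, and all function names.

```python
import random


def accuracy(weights, all_probabilities, true_labels):
    """Fraction of samples whose weighted-vote prediction matches the label."""
    hits = 0
    for probs, y in zip(all_probabilities, true_labels):
        n_classes = len(probs[0])
        scores = [sum(w * p[i] for w, p in zip(weights, probs)) for i in range(n_classes)]
        hits += max(range(n_classes), key=scores.__getitem__) == y
    return hits / len(true_labels)


def monte_carlo_weights(w0, d, all_probabilities, true_labels,
                        n_sim=1000, patience=2, seed=0):
    """Random search around the initial weights w0 within [w0 - d, w0 + d]."""
    rng = random.Random(seed)
    best_w, best_acc = list(w0), accuracy(w0, all_probabilities, true_labels)
    d, stalled = list(d), 0
    while stalled < patience:  # assumption: stop after `patience` stalled iterations
        improved = False
        for _ in range(n_sim):
            # assumption: negative draws are clamped to zero
            cand = [max(0.0, rng.uniform(w - dj, w + dj)) for w, dj in zip(w0, d)]
            acc = accuracy(cand, all_probabilities, true_labels)
            if acc > best_acc:
                best_w, best_acc, improved = cand, acc, True
        if improved:
            stalled = 0
        else:
            stalled += 1
            d = [2 * dj for dj in d]  # broaden the sampling range
            n_sim *= 2                # and double the number of simulations
    return best_w, best_acc
```

Since the accuracy can take only finitely many values, the number of improving iterations is bounded and the loop always terminates. The result is a local optimum near the correlation-based initial weights, which is consistent with the remark that the global optimum is not guaranteed.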

Experimental Dataset.
In this paper, we use 25 datasets from the UCR repository [23]. These datasets have been commonly adopted by TSC researchers. The basic information of the datasets is shown in Table 1.
The class labels of the multiclass datasets are represented by Arabic numerals. For example, for a dataset with 4 classes, the class labels are 1, 2, 3, and 4. As shown in Table 1, the datasets are diverse and come from several fields, including sensor data, image contour information, human ECG, and action data. The series lengths also differ: the shortest is 24, and the longest is 512. Therefore, the performance of the algorithm can be comprehensively tested. To facilitate performance comparison, the default partition into training and test sets is adopted in this paper; k is set to m/2, min is set to 3, and max is set to m, where m is the length of the time series. The initial value of N is 1000 in our experiments.

Experiment Design.
Our first objective is to choose the base classifier that performs best on the transformed data. For this purpose, we test the performance of five traditional classifiers on the transformed data constructed by the ST method. These classifiers are Naïve Bayes [32], the C4.5 decision tree [33], support vector machines [34] with polynomial kernels, random forest [35,36] (with 100 trees), and Bayesian networks [37]. These algorithms are commonly used in machine learning. The characteristics of the transformation are affected by the choice of the mother wavelet and the number of detail levels, so both should be taken into consideration in the experiment. We try different mother wavelets and numbers of levels to test the influence of these two parameters on the results.
Finally, we conduct a comparative experiment between our method (DSE) and six other time series classifiers: the 1-nearest neighbor classifier using Euclidean distance (1NN-ED) on raw data, the 1-nearest neighbor classifier using dynamic time warping (1NN-DTW) on raw data, the 1-nearest neighbor classifier using dynamic time warping with the window size set through cross validation (1NN-DTWCV) on raw data, a random forest classifier based on the binary shapelet transform of the raw data (BinaryST) [30], time series forest (TSF) [13], and elastic ensemble (EE) [14].

Evaluating Indicator.
For a classification problem, classification accuracy is the most important criterion for evaluating algorithm performance. In addition to accuracy, the Friedman test and the Nemenyi test are widely used in machine learning to evaluate the performance of algorithms over multiple datasets. After obtaining the accuracy of the K algorithms on the N datasets, the Friedman test ranks the algorithms for each dataset separately. The algorithm with the highest classification accuracy is given rank 1, the second highest is given rank 2, and so forth. Algorithms with the same accuracy are given the average of the ranks they span. In this way, we obtain an N × K rank matrix, where $r_{ij}$ is the rank of the jth algorithm on the ith dataset, and the average ranks $R_j$ are calculated as follows:
$$R_j = \frac{1}{N} \sum_{i=1}^{N} r_{ij}. \tag{9}$$
Under the null hypothesis, all algorithms are equivalent, so their $R_j$ should be equal. The Friedman statistic is defined by
$$\chi_F^2 = \frac{12N}{K(K+1)} \left[ \sum_{j=1}^{K} R_j^2 - \frac{K(K+1)^2}{4} \right], \tag{10}$$
which follows a $\chi^2$ distribution with K − 1 degrees of freedom. Demšar [38] notes that Friedman's statistic is too conservative and proposes a better statistic:
$$F_F = \frac{(N-1)\chi_F^2}{N(K-1) - \chi_F^2}, \tag{11}$$
which follows the F-distribution with K − 1 and (K − 1)(N − 1) degrees of freedom. If the null hypothesis is rejected, indicating significant differences between the algorithms, the Nemenyi test can be used to compare all the algorithms with each other. At a significance level of α, the critical difference (CD) value is defined by the following equation:
$$CD = q_\alpha \sqrt{\frac{K(K+1)}{6N}}, \tag{12}$$
where $q_\alpha$ is the critical value of the Nemenyi test. The algorithms are divided into groups by the CD value so that there is no significant difference in performance among the algorithms within a group. In this way, performance differences between algorithms can be represented by the critical difference diagram.
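The evaluation procedure can be reproduced with a short script. The sketch below uses hypothetical function names and implements average ranks with ties, the Friedman statistic, the F-statistic, and the critical difference. The value q = 2.949 is the Nemenyi critical value for seven classifiers at α = 0.05 from the standard table; that the paper used this table is an assumption, although it reproduces the critical difference of 1.8019 reported in the experiments for K = 7 and N = 25.

```python
import math


def rank_row(accuracies):
    """Rank one dataset's accuracies: 1 = highest, ties share the average rank."""
    order = sorted(range(len(accuracies)), key=lambda j: -accuracies[j])
    ranks, pos = [0.0] * len(accuracies), 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and accuracies[order[end + 1]] == accuracies[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the positions pos+1 .. end+1
        for j in order[pos:end + 1]:
            ranks[j] = avg
        pos = end + 1
    return ranks


def friedman(acc_matrix):
    """acc_matrix[i][j]: accuracy of algorithm j on dataset i.

    Returns (average ranks, Friedman chi-square statistic, F statistic).
    """
    n, k = len(acc_matrix), len(acc_matrix[0])
    ranks = [rank_row(row) for row in acc_matrix]
    avg = [sum(r[j] for r in ranks) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4)
    denom = n * (k - 1) - chi2
    # the denominator is zero when all datasets rank the algorithms identically
    f_stat = (n - 1) * chi2 / denom if denom > 0 else math.inf
    return avg, chi2, f_stat


def critical_difference(q_alpha, k, n):
    """Nemenyi critical difference for k algorithms compared on n datasets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


print(round(critical_difference(2.949, k=7, n=25), 4))  # 1.8019
```

Two algorithms whose average ranks differ by less than the critical difference fall into the same group of the critical difference diagram.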

Experiment Results.
The experimental platform used in this paper is Python 3.7, running on a Pentium Dual Core CPU (2.5 GHz) with 8 GB of memory. Table 2 lists the accuracy of the five classifiers on the transformed data. Random forest performs well, with an average rank of 2.2200 and the best performance in 13 out of 25 problems. The results show that random forest provides reliable predictive performance on different datasets.

Base Classifier Selection.
Random forest [35] is an ensemble learning method that trains multiple decision trees on sample data and aggregates their outputs by majority voting. To classify a new instance, each decision tree provides a classification for the input data; random forest collects the classifications and chooses the most voted prediction as the result. The input of each tree is data sampled from the original dataset. In addition, at each node a subset of features is randomly selected from the available features to grow the tree. Each tree is grown without pruning. Essentially, random forest enables many weak or weakly correlated classifiers to form a strong classifier [36]. It does not need to assume a data distribution, and it can handle thousands of input variables without variable deletion. It is relatively fast, simple, robust to outliers and noise, and easily parallelized; it is resistant to overfitting and performs well in many classification problems.
In the following experiments, we chose random forest as the base classifier.
As shown in Figure 6, for the ECG datasets (ECG200, ECGFiveDays, and TwoLeadECG) and the sensor datasets (DodgerLoopWeekend, SonyAIBORobotSurface1, and ItalyPowerDemand), the choice of parameters has little effect on the results. Generally, the best prediction accuracy is achieved after one level of decomposition. Increasing the number of levels increases the amount of calculation and may also cause a significant decrease in accuracy. For the image datasets (BeetleFly, Herring, and BirdChicken), the choice of parameters has a significant influence on the results. For example, the highest accuracy is 0.9500 with the Haar wavelet at level 2 on the BeetleFly dataset, 0.6562 with the coif4 wavelet at level 2 on the Herring dataset, and 1.0000 with the Haar wavelet at level 3 on the BirdChicken dataset.
Note. The results highlighted in bold denote that the method achieves the highest accuracy for this dataset.
Table 3 lists the classification accuracies of the seven classifiers on the 25 datasets. The last two lines of Table 3 give the average rank of each classifier over the datasets and the number of datasets on which it performs best, respectively. According to the results shown in Table 3, EE is the best classifier, with an average rank of 2.38 and the best performance in 12 out of 25 problems. The performance of the DSE proposed in this paper is slightly lower than that of EE: it wins on 8 out of 25 datasets and has a close average rank of 2.64. EE integrates a variety of distance measures, whereas DSE uses only Euclidean distance, which may explain the small difference in performance between them. However, DSE is still significantly more accurate than all the other alternatives, including BinaryST.

Comparison Result.
This underlines the utility of decomposing the original time series data. The DWT is effective in improving the accuracy of the shapelet transformation method.
At a significance level of 0.05 with (6, 144) degrees of freedom, $F_F = 2.2781 > F_{0.05}(6, 144) = 2.162$. Therefore, at the significance level of 0.05, the null hypothesis is rejected, and the seven classifiers are significantly different. The critical difference diagram is shown in Figure 7. The critical difference for $\alpha = 0.05$ is 1.8019. Figure 7 depicts the superiority of the proposed method: EE and DSE have significantly higher accuracy than BinaryST, TSF, 1NN-DTW, and 1NN-ED on these datasets. The difference between the performance of DSE and that of EE is not significant.
Based on the above analysis, the performance of the DSE method proposed in this paper is very close to that of the EE method, and DSE has higher accuracy and better stability than the other five compared classifiers.

Conclusions
In this study, an ensemble method that combines time-frequency analysis with shape similarity recognition of time series is proposed to solve TSC problems. The proposed method embeds the DWT into the shapelet-discovery algorithm to produce transformed data and then trains and tests base classifiers on the transformed data; finally, the method applies weighted majority voting to the results of the base classifiers according to the correlation between the components and the original data. The experimental results indicate that the proposed method outperforms other methods in terms of accuracy. We also study the influence of parameter selection on the results, which gives suggestions on the selection of the mother wavelet and the number of levels for different types of time series data. According to the results of our comparative experiments, the proposed method is not only robust and efficient but can also be generalized for use in different application domains. However, the proposed method is still time-consuming. How to improve its efficiency will be considered in future work.

Data Availability
The dataset used to support this study is the open dataset "UCR Time Series Classification Archive," which is available at https://www.cs.ucr.edu/~eamonn/time_series_data_2018/.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.