A Weighted Error Distance Metrics (WEDM) for Performance Evaluation on Multiple Change-Point (MCP) Detection in Synthetic Time Series

Change-point detection (CPD) aims to find abrupt changes in time-series data. Various computational algorithms have been developed for CPD applications, and many performance metrics have been introduced to compare them. Each of the existing evaluation methods measures a different aspect of a CPD model. Based on the existing weighted error distance (WED) method for single change-point (CP) detection, a novel WED metrics (WEDM) is proposed to evaluate the overall performance of a CPD model across not only repetitive tests on single CP detection but also successive tests on multiple change-point (MCP) detection in synthetic time series under the random slide window (RSW) and fixed slide window (FSW) frameworks. The proposed WEDM method introduces a normalized error distance that allows comparison of the distance between the estimated change-point (eCP) position and the target change point (tCP) in the synthetic time series. In successive MCPs detection, the WEDM method first divides the original time-series sample into a series of data segments according to the assigned set of tCPs and then calculates a normalized error distance (NED) value for each segment. Next, it presents the frequency and WED distribution of the resultant eCPs from all data segments over the normalized positive-error distance (NPED) and normalized negative-error distance (NNED) intervals in the same coordinates. Last, the mean WED (MWED) and the MWTD (1 − MWED) are obtained and used as the main performance evaluation indexes. On synthetic datasets generated in the Matlab platform, repetitive tests on single CP detection were executed with different CPD models, including ternary search tree (TST), binary search tree (BST), Kolmogorov–Smirnov (KS) tests, t-tests (T), and singular spectrum analysis (SSA) algorithms. Meanwhile, successive tests on MCPs detection were implemented under the FSW and RSW frameworks. These CPD models were evaluated in terms of the WED metrics, together with supplementary indexes for evaluating the convergence of the CPD models, including the hit rate, miss rate, error rate, and computing time. The experimental results showed the value of the WEDM method.


Introduction
Change-point (CP) detection refers to techniques for detecting abrupt changes in the properties of time-series data. It has been widely studied in many real-world problems, such as atmospheric and financial analyses [1], fault detection in engineering systems [2,3], detection of changes in the variance of oceanographic time series [4], genetic time-series analyses [5], and online detection of steady-state operation [6]. For example, using this method to detect abnormal patterns in ECG and EEG signals may also be beneficial [4,7–15]. This application would allow the appropriate staff to be alerted to abrupt changes in a patient's medical condition and to provide timely treatment [16,17].
In addition, CPD models can be tightly combined with some nonlinear modeling approaches and their applications, such as classification of human hand movements [18], degradation signals for prognostic improvement [19], real-life hand prosthetic control [20], and single-channel surface electromyography (sEMG)-based control [21]. CPD models utilize algorithms that cover the fields of data mining, statistics, and computer science, including parametric and nonparametric methods [8,22–27]. Each CPD algorithm can be assessed in terms of detection accuracy, computational cost, or whether it supports real-time detection.
Many performance metrics have been introduced to evaluate CPD algorithms based on the type of decisions they make [28]. Aminikhanghahi and Cook [29] reviewed the performance evaluation methods commonly used for CPD models. The evaluation can be based on a yes/no decision as to whether the resultant change point was detected within a certain distance from the actual change point. In this case, the CPD model can be treated as a binary classification model and evaluated with the usual measures, such as accuracy, sensitivity, specificity, or the ROC curve [30,31]. For real applications, for example, clinical decision-making, the cutoffs applied to the model outcomes can be adjusted to achieve different sensitivity and specificity [32]. However, when the difference in time between the resultant eCP and the actual tCP is the measure of CPD performance, the evaluation of these algorithms is not as straightforward as for binary classification. There is no single label against which the performance of the algorithm can be measured. A few useful metrics consider the distance between the eCP and the tCP to measure CPD method performance.
These metrics include mean absolute error (MAE), mean squared error (MSE), mean signed difference (MSD), root mean squared error (RMSE), and normalized root mean squared error (NRMSE). Of these, only NRMSE normalizes the unit size of the predicted value and thus facilitates a more direct comparison of error between different datasets; the other methods measure only the absolute distances between the eCP and the tCP. However, even NRMSE does not distinguish between the situations in which the eCP falls before and after the actual tCP. It also fails to consider the relative position of the tCP within the total length of the time-series sample.
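The limitation described above can be illustrated with a minimal Python sketch. The function name `distance_metrics` and the `norm` parameter are hypothetical, and dividing the RMSE by a caller-chosen scale (here the series length) is only one of several NRMSE conventions, so this is an illustration under stated assumptions rather than the evaluation code used in this study.

```python
import math

def distance_metrics(ecps, tcps, norm):
    """Classical error metrics between estimated and target change points."""
    errors = [e - t for e, t in zip(ecps, tcps)]
    n = len(errors)
    mae = sum(abs(d) for d in errors) / n
    mse = sum(d * d for d in errors) / n
    msd = sum(errors) / n      # signed mean difference
    rmse = math.sqrt(mse)
    nrmse = rmse / norm        # assumption: normalized by a caller-chosen scale
    return {"MAE": mae, "MSE": mse, "MSD": msd, "RMSE": rmse, "NRMSE": nrmse}

# One eCP detected 10 points before the tCP, one detected 10 points after it.
early = distance_metrics([90], [100], norm=1000)
late = distance_metrics([110], [100], norm=1000)
```

In this sketch, `early` and `late` yield identical MAE, MSE, RMSE, and NRMSE values, so these measures alone cannot tell whether the estimate fell before or after the target.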
In our previous study [33], a preliminary WED method was proposed for evaluating a CPD model for single change-point detection. In this existing method, the concept of weighted error distance (WED) is introduced to measure a normalized error distance between each pair of the resultant eCPs and the actual tCPs, and the performance of different CPD models is then ranked by the averaged WED [33]. In this study, a novel WEDM method is proposed to compare the overall performance of CPD models for MCPs detection on multiple data segments in a time series with different data features. Based on the previous WED measure, a normalized error distance is introduced in the WEDM method, which allows comparison of the distance between the estimated change-point (eCP) position and the target change point (tCP). During successive MCPs detection, the proposed WEDM method first divides the original sample into a series of data segments according to the assigned tCPs and then calculates a normalized error distance (NED) value for each segment. Then, the WEDM presents the frequency and WED distribution of the resultant eCPs from all data segments over the normalized positive-error distance (NPED) and normalized negative-error distance (NNED) intervals in the same coordinates. Last, the mean WED (MWED) and the MWTD (1 − MWED) are calculated and used as the main performance indexes. On synthetic datasets generated in the Matlab platform, both repetitive tests on single CP detection and successive tests on MCPs detection were executed with different CPD models, including the ternary search tree (TST) [8,34], binary search tree (BST) [15,24], Kolmogorov–Smirnov (KS) tests [22,25], t-tests (T) [23,35], and singular spectrum analysis (SSA) algorithms [36] recorded in our previous studies [22,37]. Meanwhile, these CPD models were evaluated under the random slide window (RSW) [8,38,39] and fixed slide window (FSW) frameworks [40–44] in terms of the WEDM and supplementary indexes, including the hit rate, miss rate, error rate, and computing time. The experimental results showed the value of the WEDM method.

Methods
In this part, the proposed WEDM is described in the following steps. First, the diagnosed sample is divided into a series of data segments according to the assigned target MCPs. Second, a normalized error distance (NED) is calculated by comparing the distance between the resultant eCP position and the actual tCP within each data segment. Third, the frequency and WED distribution of the resultant eCPs detected from all segments are presented across the normalized positive-error distance (NPED) and normalized negative-error distance (NNED) intervals in the same coordinates. Last, the metrics of mean WED (MWED) and mean WTD (MWTD) are given to evaluate a CPD model for MCPs detection on a series of data fluctuations in one time series.

Data Segmentation.
Suppose a time-series signal $X = \{X_1, \ldots, X_i, \ldots, X_N\}$ can be observed as a trajectory of a multiple data distribution process, in which the segment $X_i$ is defined by the following equation:

$$X_t = f_i(t) + \varepsilon_t, \quad (1)$$

where $t \in \{t_{i-1}+1, \ldots, t_i\}$, $0 < i \le M$, and $f_i \in \{f_1, \ldots, f_M\}$ is a deterministic, piecewise function of a one-dimensional signal with change points (satisfying $f_i \ne f_{i+1}$ for $i = 1, \ldots, M-1$ to ensure that abrupt changes occur). $M \in \{1, 2, \ldots, n\}$ is the number of data segment regimes, so $M-1$ is the number of abrupt changes, with $0 = t_0 < t_1 < \cdots < t_i < \cdots < t_M = n$. The number $M-1$ and the locations $\eta_1, \ldots, \eta_{M-1}$ of the change points in the process are supposed to be unknown. The sequence $(\varepsilon_i)_{i \in \mathbb{N}}$ is assumed to be random white noise such that $E(\varepsilon_i)$ is exactly or approximately zero. In the simplest case, $(\varepsilon_i)_{i \in \mathbb{N}}$ is modeled as i.i.d., but it can also follow more complex time-series distributions.
Consider the observed time-series signal $X = \{X_1, \ldots, X_i, \ldots, X_N\}$ with $M-1$ change points mentioned above. A partial time series $X' = \{X_s, \ldots, X_j, \ldots, X_e\}$ of size $N'$ is selected from $X$, where $1 \le s < j < e \le N$ and $1 < N' \le N$. Suppose a set of target MCPs, $tMCP_{set} = \{tCP_1, \ldots, tCP_n\}$, is contained within $X'$, with $1 \le n \le M-1$. In the proposed WEDM method, the diagnosed data sample $X'$ is first divided into a series of data segments according to the different target CP positions in $tMCP_{set}$. The process of data segmentation is described below (Figure 1):
(1) For each $tCP_i$ to be diagnosed in $tMCP_{set}$, the data segment $Seg_i$ can be denoted as follows:

$$Seg_i = \{mCP_{i-1}, \ldots, tCP_i, \ldots, mCP_i\}, \quad (2)$$

where $1 < i < n$ and $1 < n < N'$, and the two endpoints $mCP_{i-1}$ and $mCP_i$ of $Seg_i$ are given by formula (3).
(2) In particular, the first segment $Seg_1$, which starts at $X_s$, and the last segment $Seg_n$, which ends at $X_e$, are presented according to $tCP_1$ and $tCP_n$ in formula (4), where $X_s$ and $X_e$ are the two endpoints of $X'$.
(3) Then, the time series $X' = \{X_s, \ldots, X_j, \ldots, X_e\}$ can be divided into a set of data segments $SEG_{set}^{X'} = \{Seg_1, \ldots, Seg_n\}$. That is, $X' = \{Seg_1, \ldots, Seg_n\}$, and the following equation holds:

$$N_{X'} = \sum_{i=1}^{n} N_{Seg_i}, \quad (5)$$

where $N_{X'}$ is the total length of $X'$, and $N_{Seg_i}$ refers to the size of $Seg_i$.
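The segmentation steps can be sketched in Python as follows. This is a minimal illustration, not the procedure of this study: the function name `segment_bounds` is hypothetical, and placing each interior boundary midway between two neighbouring tCPs is an assumption made only for this example. Segments are returned as half-open index ranges so that their sizes add up to the length of the diagnosed sample, as required by equation (5).

```python
def segment_bounds(tcps, start, end):
    """Split the index range [start, end] into one segment per target change point.

    Assumption: interior boundaries sit midway between neighbouring tCPs.
    Returns half-open (lo, hi) index pairs, so the segments do not overlap.
    """
    tcps = sorted(tcps)
    mids = [(a + b) // 2 for a, b in zip(tcps, tcps[1:])]
    los = [start] + mids          # first segment starts at the sample start
    his = mids + [end + 1]        # last segment ends at the sample end
    return list(zip(los, his))

bounds = segment_bounds([100, 300, 700], start=0, end=999)
sizes = [hi - lo for lo, hi in bounds]
```

Each resulting segment contains exactly one target change point, and `sum(sizes)` equals the length of the sample.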

NED Evaluation on Single CP Detection.
In the scheme of error distance (ED) measurement on single CP detection (Figure 2), each segment $Seg_i = \{X_a, \ldots, X_c, \ldots, X_b\}$ in the time series $X' = \{Seg_1, \ldots, Seg_n\}$ is divided into the former (left) part $\{X_a, \ldots, X_{c-1}\}$ and the latter (right) part $\{X_{c+1}, \ldots, X_b\}$ by the actual $tCP_i$ located at the data point $X_c$, with $1 \le i \le n$. From a statistical point of view, we refer to the former (left) part as the positive area and the latter (right) part as the negative area. When a CPD model is applied to detect the actual $tCP_i$ in the data segment $Seg_i$, the resultant $eCP_i$ might be estimated from either the positive area or the negative area. A few concepts are introduced here to measure CPD model performance: true-positive distance (tPD), positive-error distance (pED), true-negative distance (tND), and negative-error distance (nED). If the resultant $eCP_i$ is detected on the left side of $tCP_i$ (positive area), then $pED_i$ and $tPD_i$ can be calculated; that is, the distances from $eCP_i$ to $tCP_i$ and to the start point, respectively. In this case, $nED_i$ and $tND_i$ are not applicable. Conversely, when $eCP_j$ is estimated from the right side of $tCP_i$ (negative area), $nED_i$ equals the distance from $eCP_j$ to $tCP_i$, and $tND_i$ is the distance from $eCP_j$ to the end of the data segment $Seg_i$. In this case, $pED_i$ and $tPD_i$ do not exist (Figure 2). These definitions can be represented in formulas (6)-(9) as follows:

$$tPD_i = |X_d - X_a|, \quad pED_i = |X_c - X_d|, \quad tND_i = |X_b - X_e|, \quad nED_i = |X_e - X_c|, \quad (6)\text{-}(9)$$

in which $X_a$ and $X_b$ represent the start point and endpoint of the time-series segment $Seg_i$, respectively, $X_c$ is the position of the actual $tCP_i$ in $Seg_i$, and $X_d$ and $X_e$ refer to the positions of the resultant eCP on the left or right side of $tCP_i$, respectively.
For the current data segment $Seg_i$ in the scheme of NED evaluation on single CP detection (Figure 3), the distance between the start point and $tCP_i$ and the distance from $tCP_i$ to the end of the segment are both normalized to 1, so that the normalized tCP position of every segment maps to the same point. In formulas (10)-(13), $tPDR_i$, $pEDR_i$, $tNDR_i$, and $nEDR_i$ can be interpreted as the normalized true-positive distance ($NtPD_i$), normalized positive-error distance ($NpED_i$), normalized true-negative distance ($NtND_i$), and normalized negative-error distance ($NnED_i$), respectively.
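The normalization described above can be sketched in Python as follows. The function name `normalized_error_distance` and its arguments are hypothetical. The sketch assumes that each error distance is divided by the full length of its side of the segment (start point to tCP, or tCP to endpoint), which follows from both sides being normalized to 1, and that estimates before the tCP are mapped to negative values so that the result lies in [−1, 1].

```python
def normalized_error_distance(a, c, b, ecp):
    """Signed normalized error distance of one eCP in segment [a, b] with the tCP at c.

    Returns a value in [-1, 0) when the eCP lies before the tCP,
    a value in (0, 1] when it lies after, and 0.0 when it coincides with the tCP.
    """
    if not (a < c < b and a <= ecp <= b):
        raise ValueError("expected a < c < b and a <= ecp <= b")
    if ecp < c:
        p_ed = c - ecp        # positive-error distance: eCP to tCP
        t_pd = ecp - a        # true-positive distance: eCP to start point
        return -p_ed / (p_ed + t_pd)
    n_ed = ecp - c            # negative-error distance: eCP to tCP
    t_nd = b - ecp            # true-negative distance: eCP to endpoint
    return n_ed / (n_ed + t_nd)

# Segment from 0 to 300 with the target change point at 100.
before = normalized_error_distance(0, 100, 300, 50)
after = normalized_error_distance(0, 100, 300, 200)
```

Because both sides are scaled separately, an estimate halfway between the start point and the tCP and an estimate halfway between the tCP and the endpoint receive the same magnitude even though their raw distances differ.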
Thereafter, a normalized error distance $NED_i$ in formula (14) is presented as a piecewise function of $NpED_i$ and $NnED_i$, according to whether the resultant $eCP_i$ is located in the positive or the negative area:

$$NED_i = \begin{cases} -NpED_i, & \text{if } eCP_i \text{ is located in the positive area}, \\ NnED_i, & \text{if } eCP_i \text{ is located in the negative area}. \end{cases} \quad (14)$$

The NED values of all resultant eCPs thus fall within the range [−1, 1] on the x-axis (Figure 4). Then, the frequency of $NED_i$ among all resultant eCPs can be defined as follows:

$$Freq(NED_i) = \frac{Num(NED_i)}{Nt}, \quad (15)$$

in which $Num(NED_i)$ is the number of resultant eCPs whose NED values equal $NED_i$, and $Nt$ is the total number of resultant eCPs, $1 \le i \le Nt$. Then, the weighted error distance $WED_i$ is introduced according to $NED_i$ and $Freq(NED_i)$ of the resultant eCPs (Figure 5). For each $eCP_i$ in the scattered distribution of resultant eCPs, its corresponding $WED_i$ is equal to $WpED_i$ or $WnED_i$, depending on whether $NED_i$ is located in the positive-NpED or the negative-NnED area of the x-axis range from −1 to 1. The definitions of $WpED_i$, $WnED_i$, and $WED_i$ are given in formula (16). Thereafter, a mean weighted error distance (MWED) is defined in formula (17), where $l$ and $r$ refer to the numbers of eCPs located before and after the actual tCPs (positive-NpED area and negative-NnED area), respectively. In most CPD models, when the search algorithm reaches the start or end of the time series without finding a change point, the resultant eCP can be set to either the start or the end. Therefore, the sum of $l$ and $r$ equals $N$ (the total number of actual tCPs to be diagnosed in the time series $X'$), and formula (17) can be simplified to formula (18).

Furthermore, following MWED, 1 − MWED can be referred to as the mean weighted true distance (MWTD) and used as a measure of the overall performance of a CPD model for MCPs detection on time series with a series of data fluctuations.

Figure 1: The scheme of WEDM evaluation on the target MCPs detection in the diagnosed $X'$.

Figure 2: The scheme of error distance (ED) measurement on single CP detection in the data segment $Seg_i$. In the positive area, $X_a$ represents the start point of $Seg_i$, and $X_d$ is the position of the resultant $eCP_i$ within the positive area before the actual $tCP_i$. In the negative area, $X_b$ represents the endpoint of $Seg_i$, and $X_e$ stands for the $eCP_j$ located within the negative area after $tCP_i$.

Figure 3: The scheme of NED evaluation on single CP detection in the data segment $Seg_i$, in which "−1" and "1" represent the start point and endpoint of $Seg_i$, respectively, and "0" refers to the position of the actual $tCP_i$ on the x-axis.
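The aggregation from NED values to MWED and MWTD can be sketched in Python as follows. The helper names `ned_frequencies` and `mwed` are hypothetical. Because the exact weighting is not reproduced in this sketch, it assumes the simplest reading, in which each $WED_i$ is the absolute NED of one eCP and MWED is their arithmetic mean, so the numbers are illustrative only.

```python
from collections import Counter

def ned_frequencies(neds):
    """Relative frequency of each distinct NED value: Num(NED) / Nt."""
    total = len(neds)
    return {value: count / total for value, count in Counter(neds).items()}

def mwed(weds):
    """Mean of the per-eCP weighted error distances."""
    return sum(weds) / len(weds)

neds = [-0.5, 0.0, 0.0, 0.25]     # signed NED values of four resultant eCPs
freq = ned_frequencies(neds)
weds = [abs(v) for v in neds]     # assumption: WED_i taken as |NED_i|
score = mwed(weds)
mwtd = 1 - score
```

Under this assumption, a model whose eCPs cluster around NED = 0 obtains an MWED close to 0 and an MWTD close to 1.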

Results and Discussion
To evaluate the different CPD models accurately, other related indexes were introduced in addition to the WEDM. In the synthetic experiments, time-series datasets were generated and assembled by using the Gaussian distribution function in the Matlab platform, and repetitive tests on single CP detection were then executed with the TST, BST, KS, T, and SSA models. Meanwhile, the performance of the CPD models was evaluated by successive tests on MCPs detection implemented under the RSW and FSW frameworks, respectively.

Related Evaluation Indexes.
In the synthetic tests, some other indexes are used for evaluating the convergence of the different CPD models, including the hit, miss, and error rates and the computing time. Given a data segment $Seg_i$ in the time series $X'$ mentioned above, the related definitions are introduced in terms of the error distance between the resultant eCPs and the actual $tCP_i$ as follows (Figure 6):
(1) Error distance: Given an actual $tCP_i$ assigned in the current data segment $Seg_i$, the error distance $ED_{tCP_i}$ between each pair of the estimated $eCP_j$ and $tCP_i$ is defined by $ED_{tCP_i} = |eCP_j - tCP_i|$.
(2) Hit area: For the actual $tCP_i$, the hit area $HA_{tCP_i}$ is formulated by $HA_{tCP_i} = [tCP_i - hd_i, \; tCP_i + hd_i]$, where $hd_i$ is the threshold value of the error distance between $tCP_i$ and $eCP_j$.
(3) Hit: Given the error distance $ED_{tCP_i}$ mentioned above, if $0 \le ED_{tCP_i} \le hd_i$ holds, then $tCP_i$ is hit by $eCP_j$ and recorded by $Hit(tCP_i) = 1$. Therefore, the value of $WED_i$ defined in formula (18) equals 0.
(4) Error: On the other hand, if $ED_{tCP_i} > hd_i$ holds, then $eCP_j$ is treated as an error result labeled by $Error(eCP_j) = 1$. In this circumstance, the value of $WED_i$ is within the range (0, 1).
(5) Miss: In addition, if no change point is detected in $Seg_i$, then the target $tCP_i$ is missed and identified by $Miss(tCP_i) = 1$. Accordingly, the value of $WED_i$ is set to 1 because of the missing $tCP_i$.
Thereafter, the hit rate, miss rate, and error rate are formulated from the corresponding counts, in which $\sum Error(eCP_i)$ stands for the number of resultant MCPs for which $ED_{tCP_i} > hd_i$ holds, and $N_{eCPs}$ is the total number of resultant MCPs, which is usually larger than $N_{tCPs}$, the number of actual tCPs within the time series $X'$. Generally, it holds that hit rate + miss rate + error rate = 1 for all the resultant eCPs.
(6) Computing time: In addition, for a certain CPD model $k$, the computing time is mainly spent on detecting tCPs from the multiple data segments in $X'$, and it can be denoted as follows:

$$ST_k = \sum_{i=1}^{N_s} ST_i,$$

where $ST_i$ refers to the computing time cost in $Seg_i$, and $N_s$ is the total number of data segments. Then, the normalized time is defined as follows:

$$NST_k = \frac{ST_k}{\sum_{j=1}^{n} ST_j},$$

in which $ST_k$ stands for the computing time of model $k$, and $n$ is the total number of models to be compared. $NST_k$ represents the time ratio of model $k$ to all methods and thus reflects its searching efficiency relative to the others. Generally, both the TST and BST models in our previous studies have a time complexity of nearly $O(\log N)$ [8,10,13]; therefore, they should be faster and more efficient than some traditional algorithms with a time complexity of about $O(N^2)$, such as the KS, CUSUM, t-test, or SSA methods.
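The indexes above can be sketched in Python as follows. All function names are hypothetical, and the sketch makes one simplifying assumption that should be stated clearly: every diagnosed tCP yields exactly one label (hit, miss, or error), so the three rates share a common denominator and sum to 1. The model names and timings in the example are made-up values for illustration.

```python
def classify(tcp, ecp, hd):
    """Label one detection: 'miss' if nothing was found, otherwise 'hit' or 'error'."""
    if ecp is None:
        return "miss"
    return "hit" if abs(ecp - tcp) <= hd else "error"

def rates(labels):
    """Hit, miss, and error rates over a common denominator (an assumption)."""
    n = len(labels)
    return {kind: labels.count(kind) / n for kind in ("hit", "miss", "error")}

def normalized_times(times):
    """Share of the total computing time taken by each model."""
    total = sum(times.values())
    return {name: t / total for name, t in times.items()}

labels = [
    classify(tcp=100, ecp=103, hd=5),    # inside the hit area
    classify(tcp=300, ecp=340, hd=5),    # outside the hit area
    classify(tcp=700, ecp=None, hd=5),   # no change point detected
    classify(tcp=900, ecp=900, hd=5),    # exact hit
]
r = rates(labels)
nst = normalized_times({"model_a": 1.0, "model_b": 3.0})
```

A smaller normalized time indicates that a model consumed a smaller share of the total search time across the compared models.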

Repetitive Tests on Single CP Detection.
In the first experiment, repetitive tests on single CP detection were executed on a synthetic dataset, $Dataset1 = \{X_1, \ldots, X_i, \ldots, X_K\}$, which was generated by the Gaussian function in the Matlab R2016 platform. Each time series $X_i = \{x_1, \ldots, x_N\}$ consists of the positive area $X_{iL} = \{x_1, \ldots, x_m\}$ and the negative area $X_{iR} = \{x_{m+1}, \ldots, x_N\}$ before and after the assigned target $tCP_i = x_m$. The former $X_{iL}$ and the latter $X_{iR}$ were generated by the normal distribution $N(\mu = 0, \sigma = 1)$ of size $m$ ($m$ time points in the positive area) and $N(\mu = V, \sigma = 1)$ of size $N - m$ ($N - m$ time points in the negative area), respectively, where $V$ is a constant mean value and $N$ is the total length of $X_i$.
Here, we first present the results from Dataset1, which was composed of 20 data groups with different length $N$, variance $V$, and tCP, where each group contains 100 time-series samples. Therefore, Dataset1 included 2000 time series in total, and this experiment, named Exp1, was performed with the TST, BST, KS, T, and SSA models, respectively. In our simulations, the time-series samples in each group were generated by selecting random values of the sample length $N$ from $2^{10}$ to $2^{15}$, the variance $V$ from 1.0 to 3.7, and the position of the actual tCP from 1 to $N$.
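The data generation described above can be sketched as follows. The experiments in this study were run in Matlab; this Python version is only an illustrative equivalent, and the function names `make_series` and `make_group` are hypothetical. It assumes that $N$ is drawn as a power of two between $2^{10}$ and $2^{15}$ and that the parameters are drawn once per group, which is one possible reading of the setup.

```python
import random

def make_series(n, m, v, seed=None):
    """Gaussian series with a mean shift: N(0, 1) for the first m points, N(v, 1) after."""
    rng = random.Random(seed)
    left = [rng.gauss(0.0, 1.0) for _ in range(m)]
    right = [rng.gauss(v, 1.0) for _ in range(n - m)]
    return left + right

def make_group(seed, samples=100):
    """One data group sharing randomly drawn N, V, and tCP position."""
    rng = random.Random(seed)
    n = 2 ** rng.randint(10, 15)     # assumption: N is a power of two
    v = rng.uniform(1.0, 3.7)
    m = rng.randint(1, n)            # position of the target change point
    series = [make_series(n, m, v, seed=rng.getrandbits(32)) for _ in range(samples)]
    return series, (n, m, v)

demo = make_series(n=1024, m=400, v=3.0, seed=1)
```

Calling `make_group` with 20 different seeds would give 20 groups of 100 samples each, matching the total of 2000 time series in Dataset1.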
In the 20 groups of Exp1, the repetitive tests are executed with the TST, BST, KS, T, and SSA models, respectively (Figure 7). For the total of 2000 time-series samples in Dataset1, the frequency and WED distribution of the resultant eCPs are illustrated from the positive-NpED range of [−1, 0] to the negative-NnED range of [0, 1] on the x-axis. From these results, we can see that the closer the resultant eCP is to the central axis x = 0, the smaller the WED value, which tends to 0, and vice versa. Among the five models, TST and KS obtain eCPs that are mostly located near the central axis x = 0 and thus have narrower WED distributions and smaller WED values than the other models, except that TST has a few eCPs that fall into the positive-NpED field. For the BST, T, and SSA models, the eCPs are scattered over a wide range from the NpED to the NnED areas; therefore, their WED distributions are wider and their WED values bigger, especially for T and SSA.
Meanwhile, these simulation results also illustrate that both TST and KS have better convergence than the others; in particular, TST has the highest hit level and takes the shortest convergence time among the five models. Of the remaining models, BST appears much better than the others, and T has the worst convergence because of the lowest hit level, the biggest error, and the longest convergence time among the five models. Furthermore, the mean analyses (Table 1) indicate that TST takes the shortest computing time and has the highest hit rate, the smallest MWED, and the biggest MWTD among the five models. For the T and SSA models, many eCPs are scattered across the whole field from NPED to NNED; in particular, T has the biggest error rate and MWED and needs the longest time among the five models.
In addition, the efficiencies of the five models are evaluated using random parameter values in a total of 20 tests. The dynamic tracks of hit rate, miss rate, error rate, and MWED are illustrated versus the test number from 1 to 20 (Figure 8). The mean analyses of hit rate, miss rate, error rate, and MWED are also presented in histograms, in which "1," "2," "3," "4," and "5" on the x-axis refer to the TST, BST, KS, T, and SSA models, respectively. Over the whole process of simulation tests, the TST model has a relatively higher hit rate with some fluctuations and keeps more stable and lower levels of miss rate, error rate, and MWED than the others. Although KS has a smaller hit rate than TST and BST, it keeps lower tracks of miss and error rates than BST, T, and SSA. Although BST has a bigger hit rate and lower values of error rate and MWED than T and SSA, it seems unstable because of the drastic oscillations in the tracks of its hit and miss rates. T and SSA both have smaller hit rates and show dramatic fluctuations in the tracks of error rate and MWED, despite a lower miss rate than BST.
Furthermore, taking one representative test as an example, the simulations of single CP detection are repetitively executed on 100 time-series samples with the randomly selected parameter values $N = 2^{14}$, tCP = 12267, and V = 1.9. For the TST, BST, KS, T, and SSA models, the resultant eCPs are illustrated by their locations, distributions, frequency, and WED, plotted against the test number, time-series position, NPED, and NNED on the x-axis, respectively (Figure 9). For both the TST and KS models, it is easy to see that most of the eCPs are located within a small range near the actual tCP = 12267, and similar results can be found in the distribution, frequency, and WED analyses of the resultant eCPs. In contrast, for the BST, T, and SSA models, many of the eCPs are randomly scattered across the fields from NPED to NNED, and only a small part of the eCPs gather near the actual tCP.
Then, the mean analyses for this representative test are summarized in terms of MWTD, hit rate, miss rate, error rate, MWED, and time (Table 2). The results show that the TST model has much smaller values of MWED, miss and error rates, and computing time, as well as the biggest values of hit rate and MWTD among the models. Despite a longer time and a smaller hit rate than TST, KS kept levels of MWTD, hit, miss, and error rates similar to those of TST. As for BST, T, and SSA, although the three models had similar performance, BST had the biggest miss rate, and T had the smallest MWTD and hit rate and the biggest values of time, error rate, and MWED.

Successive MCPs Detection under the RSW Framework.
In the second experiment, successive tests on MCPs detection were implemented by using other synthetic datasets … $2^{15}$, with mean $U_j$ from 1.0 to $0.1 \times N_{MCPs}$ and variance $V_j$ from 1 to $2.0 \times N_{MCPs}$, respectively. Here, we present the results of successive tests on MCPs detection under the RSW framework. First, the frequency and WED distribution of the resultant MCPs (Figure 10) are displayed over the whole range from the NPED field to the NNED field on the x-axis. Generally, for a certain CPD model, the closer the resultant MCPs are to the central axis x = 0, the smaller their MWED values. Conversely, the bigger the MWTD, the better the efficiency, and vice versa. Among the five models, the results (Figure 10) and the mean analyses (Table 3) show that most of the resultant MCPs detected by TST are located near the central axis x = 0, and TST has the biggest hit rate and the smallest miss and error rates; therefore, it has the highest MWTD of all. For the BST model, although many of the resultant MCPs are scattered away from the central axis x = 0, it has a smaller error rate and MWED, as well as a bigger hit rate and MWTD, than the remaining models. For KS, T, and SSA, the common feature is that most of the resultant MCPs are spread through the whole field ranging from −1 to 1 on the x-axis. KS has a bigger MWTD than the other two, T has the smallest MWTD, and SSA has the biggest values of error rate and computing time among the five models.

Figure 8: For the TST, BST, KS, T, and SSA models, the dynamic tracks of (a) hit rate, (b) miss rate, (c) error rate, and (d) MWED versus the simulation tests ranging from 1 to 20. In addition, the mean analyses of (e) hit rate, (f) miss rate, (g) error rate, and (h) MWED, in which "1," "2," "3," "4," and "5" on the x-axis refer to the TST, BST, KS, T, and SSA models, respectively.
Meanwhile, these simulations illustrate that TST has the best convergence because it has the highest hit level and the lowest error and takes the shortest convergence time among the five models. Of the others, the BST model has much better convergence owing to its higher hit level, lower error, and shorter time. SSA seems to be the worst of the five models because of the lowest hit level, the biggest error, and the longest convergence time.
Second, the performance of the five CPD models is demonstrated by a series of 10 tests in total, in which the sample size $N$, the number of MCPs $N_{MCPs}$, the mean μ, and the variance δ are randomly taken from $2^{12}$ to $2^{15}$, 15 to 30, 1 to $0.1 \times N_{MCPs}$, and 1 to $2 \times N_{MCPs}$, respectively. The results of the dynamic tracks and mean analyses (Figure 11) indicate that the TST model still performs better, with a higher and more stable hit rate as well as lower levels of error rate and MWED than the other four models. Although BST looks more efficient than KS, T, and SSA, its dynamic tracks in all four items present stronger fluctuations, especially for the miss rate. This probably means that BST has unstable performance during the process of MCPs detection. The remaining models all have similar tracks of lower hit rates and bigger error rates. KS presents instability because of the fluctuating tracks of its miss rate and MWED, and so does the T model because of its fluctuating miss rate over the 10 random tests. Also, the performance of the models can be intuitively evaluated and distinguished from each other in terms of the mean analyses in the histograms (Figure 11(e)-11(h)).
Last, one representative test is selected from Exp2 above, and the simulations of MCPs detection are demonstrated by using a time series with nMCPs = 25 (Figure 12). For the diagnosed data sample (Figure 12(f)), the distributions of the resultant MCPs are illustrated for the TST, BST, KS, T, and SSA models, respectively (Figure 12(a)-12(e)). The frequency and WED distribution of the resultant MCPs (Figure 13) and the mean analyses (Table 4) reveal that TST is the best of the five models because most of its resultant MCPs hit the target MCP positions, and few of them are in the miss or error states. The BST model takes second place because of a smaller hit rate and a bigger error rate than TST. Of the remaining models, KS, T, and SSA perform progressively worse because more of their resultant MCPs are in the error state. As a result, the hit rate gets lower, and the MWED gets bigger as well.

Successive MCPs Detection under the FSW Framework.
In Exp3 under the FSW framework, a total of 30 data segments was arranged within each sample $X_i$, and each data segment $Seg_j = \{x_{sj_1}, \ldots, x_{sj_{N_{sj}}}\}$ was randomly generated by the Gaussian distribution $N(U_j, V_j)$ of length $N_{sj}$ from $2^{12}$ to $2^{15}$, with mean $U_j$ from 1.0 to $0.1 \times 30$ and variance $V_j$ from 1 to $2.0 \times 30$, and with the size of the fixed slide window $N_{fsw}$ ranging from $2^6$ to $2^{15}$.
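The role of the window size $N_{fsw}$ can be illustrated with a minimal Python sketch. The function name `fixed_windows` is hypothetical, and the sketch assumes that the fixed slide window partitions the sample into consecutive, non-overlapping blocks in which a single-CP detector is then applied; the framework used in this study may differ in its details.

```python
def fixed_windows(n, n_fsw):
    """Yield half-open (lo, hi) bounds of consecutive windows of size n_fsw over n points."""
    for lo in range(0, n, n_fsw):
        yield lo, min(lo + n_fsw, n)   # the last window may be shorter

windows = list(fixed_windows(n=1000, n_fsw=2**8))
```

With such a partition, a window that is much smaller than a data segment may contain no target change point, whereas a window that is much larger may contain several, which is one way to read the sensitivity to $N_{fsw}$.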
In our simulations, we execute a total of 10 successive tests on MCPs detection under the FSW framework. First, the frequency and WED distribution of the resultant MCPs (Figure 14) are displayed from the NPED field to the NNED field on the x-axis. Generally, for a certain CPD model, the closer the resultant MCPs are to the central axis x = 0, the smaller their WED values. The results (Figure 14 and Table 5) indicate that, for the TST model, most of the resultant MCPs are located near the central axis x = 0, and it has the biggest hit rate and the smallest values of error rate, MWED, and computing time; therefore, it has the highest MWTD among the five CPD models. For the BST, KS, T, and SSA models, the common feature is that most of the resultant MCPs are randomly scattered through the whole field ranging from −1 to 1 on the x-axis. KS has a smaller miss rate and MWED and a bigger MWTD than the others. Although BST has a bigger hit rate and a shorter time, it has a bigger MWED and a smaller MWTD than TST and KS. T and SSA have much bigger values of MWED and error rate and a smaller MWTD; in particular, SSA has the smallest MWTD and the biggest values of error rate and time among the five models.
Meanwhile, these simulations illustrate that TST has the best convergence in terms of the highest hit level, the lowest error, and the shortest time among the five models. Of the other four models, the BST model is much better than the rest because it has a relatively higher hit level, a lower error rate, and a much shorter time. SSA has the worst convergence of the five models because of the lowest hit level, the biggest error rate, and the longest convergence time. Second, the performance of the five CPD models is demonstrated by a series of successive MCPs detection tests in Exp3. Generally, the dynamic tracks and histogram analyses (Figure 15) show that all five CPD models present some instability in response to the size of the fixed slide window $N_{fsw}$, which ranges from $2^6$ to $2^{15}$, especially the TST, BST, and KS models. Although the TST model has the biggest miss rate with drastic fluctuations, it still keeps a better efficiency owing to the highest hit rate and the lowest levels of error rate and MWED among the five models. Of the remaining ones, BST seems better than KS, T, and SSA because of its higher hit rate and slightly decreasing error rate. Although KS shows a decreasing hit rate and an increasing error rate with big fluctuations, it seems better than T and SSA on account of its lower levels of miss rate and MWED. Both T and SSA are inefficient and insensitive in response to the increasing $N_{fsw}$, especially the SSA model, which has the lowest hit rate and the highest levels of error rate and MWED.
Figure 15: The simulations of MCPs detection on the total of 10 tests in Exp3 under the FSW framework, with random parameters of sample size N_sj from 2^12 to 2^15, the fixed number of tCPs N_MCPs = 30, mean U_j from 1 to 0.1 × 30, and variance V_j from 1 to 2 × 30, respectively. For the different CPD models of TST, BST, KS, T, and SSA, the performance analyses are denoted in (a) hit rate, (b) miss rate, (c) error rate, and (d) MWED, respectively. Furthermore, the mean analyses are illustrated in (e) hit rate, (f) miss rate, (g) error rate, and (h) MWED, in which "1," "2," "3," "4," and "5" on the x-axis refer to TST, BST, KS, T, and SSA, respectively.
Last, taking the TST model as an example, five representative simulations are selected from the 10 tests in the FSW framework of Exp3 (Figure 16(a)-16(e)), and the performance evaluation is listed for N_fsw = 2^6, 2^8, 2^12, 2^14, and 2^15, respectively (Table 6). Given one data sample with N_MCPs = 30 (Figure 16(f)), the results of MCPs detection show that the TST model performs best at N_fsw = 2^12, with the largest hit rate and MWTD and the smallest miss rate, error rate, and MWED of the five tests. However, the efficiency of TST degrades when N_fsw is too large or too small. Therefore, the size of the fixed slide window is a key factor for the FSW framework during MCPs detection.
In all, the results of the two experiments above suggest that the proposed WED method can visually present the distribution of the resultant eCPs in the error state and their normalized distance from the target position of zero on the x-axis. The simulation results suggest that the mean analyses of MWED can count the mean error ratio over all tests and thereby measure the efficiency of a given model in successive MCPs detection. The performances of different CPD models can thus be evaluated, and the better ones can be discerned from the others.
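The mean analyses described above can be illustrated with a minimal sketch. It assumes that each resultant eCP has already been mapped to a normalized error distance in [−1, 1], with the target at 0, and, as a simplifying assumption, it uses the absolute normalized distance as the WED of that eCP; the actual weighting follows the WED definition given earlier in the paper. The function name `summarize_ned`, the bin count, and the input values are hypothetical.

```python
from statistics import mean


def summarize_ned(ned_values, n_bins=4):
    """Summarize normalized error distances (each in [-1, 1], target at 0).

    Assumption: the WED of one eCP is taken to be |NED|. This stands in
    for the paper's own weighting and is for illustration only.
    """
    ned = [float(v) for v in ned_values]
    if not ned:
        raise ValueError("at least one NED value is required")
    if any(abs(v) > 1.0 for v in ned):
        raise ValueError("NED values must lie in [-1, 1]")

    # Mean weighted error distance and its complement (MWTD = 1 - MWED).
    mwed = mean(abs(v) for v in ned)
    mwtd = 1.0 - mwed

    # Frequency of eCPs over equal-width bins spanning the negative and
    # positive fields on the same axis.
    width = 2.0 / n_bins
    counts = [0] * n_bins
    for v in ned:
        idx = min(int((v + 1.0) / width), n_bins - 1)  # v == 1 goes in the last bin
        counts[idx] += 1

    return {"mwed": mwed, "mwtd": mwtd, "counts": counts}


# Hypothetical NED values for four detected change points.
summary = summarize_ned([-0.5, 0.0, 0.25, 0.25])
print(summary)
```

Under these assumptions, a model whose eCPs cluster near 0 yields a small MWED and an MWTD close to 1, which matches the qualitative reading of the distributions above.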

Conclusions and Discussion
In this study, a novel WEDM method is proposed for evaluating the overall performance of a CPD model across not only repetitive tests on single CP detection, but also successive tests on multiple change-point (MCP) detection on synthetic time series under the RSW and FSW frameworks. In this WEDM method, a concept of normalized error distance was introduced that allows comparisons of the distance between the estimated change-point (eCP) position and the target change-point (tCP) in the synthetic time series. In particular, both the positive- and negative-error distances between the resultant eCPs and the actual tCPs are weighted or normalized to create the WED metrics.
As opposed to previous methods, our WEDM allows comparison when CPD is applied across multiple time-series samples with different lengths and variances, and especially across multiple data segments within a single time series that differ in pattern, such as data distribution, segment size, and the number and positions of tCPs. In successive MCPs detection, our WEDM method first divides the original sample into a series of data segments according to the assigned target change points and then calculates a normalized error distance (NED) value for each segment. Next, WEDM presents the frequency and WED distribution of the resultant eCPs from all data segments in the normalized positive-error distance (NPED) and the normalized negative-error distance (NNED) intervals in the same coordinates. Last, the mean WED (MWED) and MWTD (1-MWED) are obtained and used as important performance indexes. In our simulations, a series of MCPs detection tests were executed on synthetic time-series datasets in the Matlab platform, and the proposed method was applied to evaluate CPD with the TST, BST, KS, T, and SSA models under repetitive single CP detection in Exp1, successive MCPs detection under the RSW framework in Exp2, and successive MCPs detection under the FSW framework in Exp3, respectively.
The results of the study showed that the method can compare the outputs of CPD models over a series of synthetic tests on multiple time-series samples. The WED metrics offer a new way of evaluating CPD performance. They allow better visualization of the distribution of the resultant eCPs when the CPD models work on multiple time series with different data features, as well as on multiple data segments of a time-series sample with different data patterns. Meanwhile, the convergence of different CPD models was analyzed in terms of the dynamic tracks and mean analyses of the WED values, as well as other measurements, including the hit, error, and miss rates and the computational cost. Our WEDM method not only offers a visual, overall measure but also gives users better guidance on which CPD model to use for a given application.
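The segmentation and normalization steps summarized above can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure of the paper: the segment boundaries are assumed to lie at the start of the series, at the midpoints between adjacent tCPs, and at the end of the series, and each error distance is assumed to be divided by the distance from the tCP to the segment boundary on the same side, so that every value falls in [−1, 1]. The function name `normalized_error_distances` and the example positions are hypothetical.

```python
from bisect import bisect_right


def normalized_error_distances(n_samples, tcps, ecps):
    """Map each eCP to a normalized error distance relative to its segment's tCP.

    Assumptions (for illustration only):
      * one segment per tCP, bounded by the series start, the midpoints
        between adjacent tCPs, and the series end;
      * a negative error is scaled by the distance from the tCP to the left
        boundary, a positive error by the distance to the right boundary.
    """
    targets = sorted(tcps)
    if not targets:
        raise ValueError("at least one target change point is required")

    mids = [(a + b) / 2.0 for a, b in zip(targets, targets[1:])]
    bounds = [0.0] + mids + [float(n_samples - 1)]

    neds = []
    for ecp in ecps:
        if not 0 <= ecp <= n_samples - 1:
            raise ValueError("eCP lies outside the series")
        k = bisect_right(mids, ecp)  # index of the segment containing ecp
        target = targets[k]
        err = ecp - target
        span = (target - bounds[k]) if err < 0 else (bounds[k + 1] - target)
        neds.append(err / span if span > 0 else 0.0)
    return neds


# Hypothetical series of 101 samples with tCPs at 20 and 60.
neds = normalized_error_distances(101, [20, 60], [20, 30, 50, 80])
print(neds)  # [0.0, 0.5, -0.5, 0.5]
```

Because each distance is scaled by the extent of its own segment, eCPs from segments of different sizes become comparable on a single axis, which is the property the normalized error distance is meant to provide.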

Data Availability
Some synthetic time-series datasets were generated in the Matlab simulation platform; no real datasets were used for the experimental validation in this study.

Conflicts of Interest
All authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.