A Fast Framework for Abrupt Change Detection Based on Binary Search Trees and Kolmogorov Statistic

Change-Point (CP) detection has attracted considerable attention in the fields of data mining and statistics; it is very meaningful to discuss how to quickly and efficiently detect abrupt change from large-scale bioelectric signals. Currently, most of the existing methods, like Kolmogorov-Smirnov (KS) statistic and so forth, are time-consuming, especially for large-scale datasets. In this paper, we propose a fast framework for abrupt change detection based on binary search trees (BSTs) and a modified KS statistic, named BSTKS (binary search trees and Kolmogorov statistic). In this method, first, two binary search trees, termed as BSTcA and BSTcD, are constructed by multilevel Haar Wavelet Transform (HWT); second, three search criteria are introduced in terms of the statistic and variance fluctuations in the diagnosed time series; last, an optimal search path is detected from the root to leaf nodes of two BSTs. The studies on both the synthetic time series samples and the real electroencephalograph (EEG) recordings indicate that the proposed BSTKS can detect abrupt change more quickly and efficiently than KS, t-statistic (t), and Singular-Spectrum Analyses (SSA) methods, with the shortest computation time, the highest hit rate, the smallest error, and the highest accuracy out of four methods. This study suggests that the proposed BSTKS is very helpful for useful information inspection on all kinds of bioelectric time series signals.


Introduction
Abrupt change detection is to identify abrupt changes in the statistical properties of a signal series, which occur at unknown instants [1][2][3]. These changes are interesting because they are indicative of qualitative transitions in the data generation mechanism (DGM) underlying the signals. Currently, CP detection has attracted considerable attention in the fields of data mining and statistics, and it has been widely studied in many real-world problems, such as atmospheric and financial analyses [1], fault detection in engineering system [4,5], climate change detection [6], genetic time series analyses [7], signal segmentation [8,9], and intrusion detection in computer network [4].
In community of statistics, some nonparametric approaches for CP detection have been widely explored. For example, KS statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution or between the empirical distribution function of two samples [10,11]. Also, KS statistic and its modified versions are broadly investigated on many application fields, for example, testing hypotheses regarding activation in blood oxygenation level-dependent functional MRI data [12], modeling the cumulative distribution function of rub-induced AE signals, quantifying the goodness of fit to offer a suitable signal feature for diagnosis [13], as well as abrupt change detecting on EEG signals [14], and gene expression time series [15]. Meanwhile, as for the model-related statistic approaches, some modified cumulative sum (CUSUM) methods provide the asymptotic distributions of test statistics and the consistency of procedures and behave better in finite samples and have a higher stability with respect to the time of change than ordinary CUSUM procedures [16]. The CUSUM method and its revised versions have been widely applied to detect the structural breaks in the parameters of stochastic models, as well as the abrupt changes in the regression parameters of multiple time series regression models, such as multiple 2 Computational Intelligence and Neuroscience CP detection in biological sequences [17], abrupt change detection in the regression parameters of a set of capital asset pricing data related to the Fama-French extension of the CAPM [16], and abrupt change detection in a shaperestricted regression model [18].
On the other hand, SSA is a powerful technique for time series analyses. SSA is nonparametric and requires no prior knowledge on the properties of time series signal [19]. The main idea of SSA is applied in the principal component analyses on the trajectory matrix with subsequent reconstruction of the original time series. SSA has been proved to be very successful and has already become a standard tool in the analyses of climatic [10], meteorological, and geophysical time series [11,19]. Currently, SSA has been successfully applied in the real time series recordings, for example, abrupt change analyses on EMG-onset detection [12] and CP detection in time series [13]. Although SSA is a model-free method, it is not scalable to large-scale datasets, because it is time-consuming and sometimes invalid for time series analyses with less significant data fluctuation.
In addition, Wavelet Transform (WT) is another important tool for time series analyses [14,15,[20][21][22][23]. WT has been widely applied in anomaly detection, time series prediction, image processing, and noise reduction [15,[23][24][25]. WT can represent general function at different scales and positions in a versatile and sophisticated manner, so that the data distribution features can be easily extracted from different time or space scales [25,26]. As a simple WT, Haar Wavelet (HW) owns some attractive features including fast implementation and ability to analyze the local features. HW is very useful to find abrupt changes of discontinuity and high frequency in time series, so it is a potential candidate in modern electrical and computer engineering applications, such as signal and image compression, eye detection [27], abnormality detection on time series [28,29], and abrupt change detection on autoregressive conditional heteroscedastic processes [30].
However, all of these methods above are time-consuming and sometime invalid for abrupt change detection near the left or the right boundary, especially for insignificant data fluctuation in large-scale time series. To resolve these problems, we propose a fast framework for CP detection based on binary search trees and a modified KS statistic, termed BSTKS for short. In this novel method, first, two BSTs are derived from a diagnosed time series. Second, three search criteria are introduced in terms of the statistic and variance fluctuations between two adjacent time series segments, and then an optimal search path is detected from the root to leaf nodes of two BSTs. Last, the proposed BSTKS and other KS, , and SSA methods are tested on both the synthetic time series and real EEG recordings and evaluated in terms of computation time, hit rate, error, accuracy, and area under curve (AUC) of Receiver Operating Characteristic (ROC) curve analyses.
In general, for a certain bioelectric signal, an abrupt change means an important transition of biological functions or health states before and after a strong attack or an acute perturbation from internal or external environment. Therefore, it is very necessary to not only discern abrupt change from all kinds of physiological and psychological time series signals, but also inspect the significant fluctuation between adjacent time series segments with different scales. The following sections focused on not only presenting the framework of the proposed BSTKS method through theoretical foundation, simulation, and evaluation, but also discussing how it can more quickly and efficiently detect abrupt change on both synthetic and real bioelectric EEG signals than other existing KS, , and SSA methods. The rest of this paper is organized as follows. Section 2 gives the preliminary of abrupt change by introducing the statistic and variance fluctuations between two adjacent time series segments. Section 3 implements the integrated framework of the BSTKS method in terms of three search criteria in detail. Section 4 provides some representative experiments by using the synthetic time series and real EEG recordings and then analyzes the performance of BSTKS by comparing with other KS, , and SSA methods. Section 5 gives summary and conclusion from previous sections.

Statistic Fluctuation.
KS statistic is sensitive to differences in both location and shape of the cumulative distribution functions (c.d.f) of two samples. The null distribution of KS statistic is calculated under the null hypothesis that the two samples are drawn from the same distribution or one sample is drawn from the reference distribution. To detect an abrupt change from a diagnosed time series , we define the statistic fluctuation between two adjacent segments within by means of KS statistic as follows [1,4,19].
where = { } =1,..., is a set of the discrete and centred i.i.d random variables and is a noisy mean signal with unknown distribution. The statistic fluctuation between two adjacent segments = { , . . . , } and = { +1 , . . . , } is defined as Computational Intelligence and Neuroscience 3 where ( ) and ( ) count the proportion of the sample points below level .

Hypothesis 1.
In order to discern an abrupt change on in terms of statistic fluctuation defined above, we introduce KS test for two adjacent segments and in as ( 1 ) if ( ) > , abrupt change occurs in , in which ∈ is a threshold of the statistic fluctuation within belonging to an identical distribution. Then, we test ( 0 ) against ( 1 ) from observations. If an abrupt change occurs in , there exists a value satisfying ( ) > , ∈ [ 1 , ], and ∈ . In this hypothesis, we assume that the number, the location, and the size of the function in (1) are unknown, and the upper bound of the statistic fluctuation is supposed to be known.

Variance Fluctuation.
Provided the statistic fluctuation defined in (2) is insignificant enough, it is difficult to detect abrupt change near the left or the right boundary within , especially when sample size gets smaller. Therefore, we need to introduce another variable to calculate the variance fluctuation between two adjacent parts within a time series sample.
in which = − , = − − 1, and + ≤ . Here, ∈ is a variance threshold of time series which obeys an identical distribution. If there exists a value satisfying ( ) > , ∈ [ 1 , ], then an abrupt change occurs at in .

Two BSTs' Construction.
In the first part of the proposed BSTKS method, two BSTs, that is, BSTcA and BSTcD, are constructed from a time series sample , by using multilevel HWT. Generally, as shown in Figure 1, a discrete time series signal = { 1 , 2 , . . . , } can be decomposed into the th-level trend cA and -level fluctuations, that is, cD 1 , cD 2 , . . . , cD , = 1, 2, . . . , log 2 . The -level HWT is the mapping defined as [13] → (cA | cD | cD −1 | ⋅ ⋅ ⋅ | cD 2 | cD 1 ) ,  Figure 1: The diagram of a discrete time series decomposition by -level HWT, which is composed of -level cA and cD, that is, the average and difference coefficient vectors. and then, the mapping can be represented by the approximation and detail coefficient matrices, termed McA and McD as follows: where 0 ≤ ≤ = log 2 and 1 ≤ ≤ /2 . Supposing the size of a diagnosed is divisible times by 2, the th element cA , in cA and the th element cD , in cD can be denoted as where 1 ≤ ≤ log 2 and 2 ( − 1) + 1 ≤ ≤ * 2 ; = 2 ( − 1) + 1, = 2 ( − 1) + 2 ( −1) , and = 2 * . During two BSTs' construction, as shown in Figure 2, the non-leaf nodes in BSTcA and BSTcD are assembled by thelevel coefficient vectors of McA and McD, respectively; and then the leaf nodes are derived directly from the original time series . Therefore, the features of abrupt change in can be reflected and distributed into the different non-leaf nodes of BSTcA and BSTcD, in accordance with the level coefficient vectors in McA and McD.

CP Detection Based on Three Search Criteria.
To find an optimal path towards the potential CP within a given time series quickly and efficiently, some search criteria need to be introduced, and then the data exceptions can be detected from the root to leaf nodes of two BSTs. As for the statistic 4 Computational Intelligence and Neuroscience fluctuation within BSTcA, first, a new variable , is defined according to a current non-leaf node cA , in BSTcA, where 1 ≤ ≤ log 2 , 1 ≤ ≤ /2 ; = 2 ( − 1) + 1, = 2 * , and ≤ ≤ . Then, the statistic fluctuation between two adjacent segments = { , . . . , } and = { +1 , . . . , } can be defined by a modified KS statistic as in which , is a new element defined in (8); and stand for the sizes of and , respectively; 1 ≤ ≤ log 2 , 1 ≤ ≤ /2 ; = 2 ( − 1) + 1, = 2 , and = 2 ( − 1) + 2 ( −1) .
( , ) measures the e.c.d.f difference between and , and the larger ( , ) means the more significant statistic fluctuation between and . Therefore, a potential abrupt change might occur at in with more probability.
Proof. For a selected non-leaf node cA , in BSTcA, as shown in Figure 3, the original time series is divided equally into two adjacent segments and , which are covered by two non-leaf child nodes cA −1,2 −1 and cA −1,2 , respectively. According to the definitions of , ; and , ; in (10), the satisfied , ; > , ; indicates that the statistic fluctuation within is more significant than that one within ; that is, a potential abrupt change might be contained in with more probability than in , and vice versa. Furthermore, if , ; > ( ) holds true, then ( 1 ) of Hypothesis 1 is satisfied; that is, abrupt change occurs in , and vice versa, where ( ) is the critical value predefined in an identical distribution and is the significance level. Therefore, one of the two child nodes cA −1,2 −1 and cA −1,2 is selected and involved into the current search path; meanwhile, the remaining one is discarded. Once the statistic fluctuation is significant enough, an optimal search path can be detected by Criterion 1 from the top to the last non-leaf level in BSTcA. However, the search procedure is probably forced to cease because the statistic fluctuation is so insignificant that Criterion 1 is invalid for detecting it, especially for the left or the right boundary when sample is with smaller size . Therefore, it is necessary to introduce another search criterion based on the variance fluctuations within BSTcD.
Definition 4. For a current non-leaf node cD , in BSTcD, with its left and right-child nodes cD −1,2 −1 and cD −1,2 , respectively, the variance fluctuations , ; and , ; are defined in terms of (4) as , ; = ( , ; − 1, 2 − 1) = ( + ) , ; = ( , ; − 1, 2 ) = ( + ) Figure 3: The scheme of Criterion 1 based on the statistic fluctuations within BSTcA. In terms of this criterion, the left or right-child node, that is, cA ,2 −1 or cA ,2 , might be selected to be involved in the current search path; meanwhile the remaining one is discarded. Thereafter, an optimal path towards the potential abrupt change in is expected to be obtained from BSTcA, after log 2 binary search steps.
Proof. Similarly, as illustrated in Figure 4, the satisfied ( , ; > , ; ) in Criterion 2 means that the variance fluctuations within are stronger than that one within , in terms of the definitions of , ; and , ; in (11). That is, a potential abrupt change might exist in with more probability than in , and vice versa. Meanwhile, if , ; > ( ) holds true, then ( 1 ) in Hypothesis 2 is satisfied; that is, abrupt change occurs in , and vice versa, where ( ) is the critical value predefined in an identical 6 Computational Intelligence and Neuroscience Based on Criterions 1 and 2 above, a search path can be obtained from the top root to the last non-leaf levels of BSTcA. In order to estimate an abrupt change from the original elements of , another criterion needs to be introduced to discern which one can be selected from two adjacent leaf nodes in BSTcA.  Consider that the largest statistic fluctuation between ( ) and ( ) is achieved either before or after one of the jumps, that is, Then, another two variables − and − are defined as Therefore, the maximal statistic fluctuations and can be selected from − and , as well as − and . Then, the third search criterion is introduced in terms of and as follows.  (b) if (max( , ) = )∧( > ( )) holds true, then the right leaf node 2 in is taken as the estimated CP, and the left one is neglected; (c) otherwise, no abrupt change is detected from .
Proof. Obviously, if max( , ) > ( ) is satisfied in Criterion 3, then the statistic fluctuation overtakes the critical value ( ) which is given in an identical data distribution, and is the significance level. Therefore, one of the two leaf nodes 2 −1 and 2 is taken as the estimated CP within .
Supposing a non-leaf node cA , is selected in BSTcA, the statistic and variance fluctuations are accordingly calculated Computational Intelligence and Neuroscience 7 between two adjacent segments and . Meanwhile, the search procedure is implemented from the root to non-leaf nodes in the last second level of BSTcA, in terms of Criterions 1 and 2. Then, the estimated CP can be obtained from the leaf nodes in BSTcA, by using Criterion 3. Thereafter, an optimal path towards a potential CP within is detected from BSTcA, after about log 2 binary search steps.

Methods Compared with BSTKS.
There are many methods proposed for abrupt change detection in time series, and the following are some typical methods, to evaluate the proposed BSTKS framework.
KS Statistic (see [31]). In this method, a diagnosed time series is divided into two adjacent segments = { 1 , 2 , . . . , } and = { +1 , +2 , . . . , }, and then KS statistic is applied to calculate the statistic distance between and as where ( ) and ( ) stand for the e.c.d.f of , and , respectively; = + , is the total length of , and refers to a current test position within .
t-Statistic (see [32]). also known as Welch's -test is used only when the two population variances are assumed different (the two sample sizes may or may not be equal) and hence must be estimated separately. Suppose a diagnosed is divided into = { 1 , 2 , . . . , } and = { +1 , +2 , . . . , }. Then, -statistic is calculated as where and are the sample means of and , respectively; is an unbiased estimator of the standard deviation, = + , and and are the sizes of two segments and , respectively.
SSA (see [12,13,33]). In SSA method, a windowed portion is chosen within a time series = { 1 , 2 , . . . , }, where is large enough and a window width and the lag parameter are set such that = /2, = − + 1. For each = 0, 1, . . . , − − , this method takes an interval of the time series [ + 1, + ] and then defines the × trajectory matrix and describes the structure of the windowed portion as an -dimensional subspace. If the structure changes further, it will not be well described by the computed subspace. Then, the distance between this subspace and the new trajectory vectors will increase; therefore, this increase will signal that an abrupt change occurs in .

Results and Discussion
In this section, the proposed BSTKS is evaluated on the synthetic time series and real EEG recordings with different size . By comparing with existing KS, , and SSA methods, the efficiency, sensitivity, and performance are analyzed in terms of the computation time, error and accuracy, hit rate, and AUC of ROC analyses. Furthermore, the novelty of our algorithm and necessity for real application are discussed in the following paragraphs.

CP Detection on Synthetic Time Series.
In our simulations, some typical time series samples were derived from the normally distributed datasets (mean, = 0, and standard deviation, sd = 1). Each diagnosed sample of size is composed of a normal segment of size and an adjacent segment of size − , in which the abnormal part is simulated by adding a constant variation V into the random numbers of size − . The proposed BSTKS and other three methods, namely, KS, , and SSA, were tested, respectively, on 200 samples which were derived from each time series group with = 2 ∧ (4+ ), = 1, 2, . . . , 7, and V = (1+log 2 ( −4)), where = log 2 ( ) and = 1.0. For each sample in , a series of test positions were arranged by = * (2 ∧ ( − 4)), ≥ 5, and = 1, 2, . . . , 15.
First, simulations were carried out according to different value of sample size and test position . The average analyses on four methods were listed in Table 1, and the results of simulations on datasets 1 -7 were illustrated in Figure 5. In general, our BSTKS is the most promising with the shortest computation time, the highest hit rate, the smallest error, and the highest accuracy out of all four methods. Particularly, as sample size increases from 1 to 7 , all four methods take longer time for bigger , and BSTKS is always the fastest one. Meanwhile, BSTKS owns the highest level of hit rate against the low tracks of other three methods; and BSTKS is much more efficient with the smallest error and the highest accuracy, though all four methods tend to be better with increasing. However, BSTKS has smaller AUC of ROC analyses, that is, bigger search space, than SSA and KS.
Second, simulations were carried out based on the datasets 1 , 4 , and 7 . The proposed BSTKS and other three methods were tested according to the different value of variance V = (1 + log 2( − 4)), = 5, 8, 11, and = 0.5, 1.0, 2.0, 3.0, respectively. The average results of four methods on 1 , 4 , and 7 were summarized in Table 2, and the typical simulations were selected on 4 and represented in Figure 6. Generally, when V gets larger, all four methods get better hit rate, accuracy, and AUC of ROC analysis, except for longer computation time for bigger size . Compared with other three methods, the proposed BSTKS is more encouraging because of the shortest computation time, especially when gets bigger, as well as the highest hit rate and accuracy,  The average analyses on BSTKS and other three methods. In the histograms, "1," "2," "3," and "4" stand for BSTKS, KS, , and SSA, respectively. especially when gets smaller. Moreover, the simulations on 4 with different variance V ( Figure 6) explicitly illustrate that BSTKS has the best performance when V gets larger, in terms of the shortest time and the biggest increase of the hit rate out of four methods. For the accuracy and AUC, both BSTKS and KS keep higher sensitivity than and SSA, as V increases from 0.5 to 3.0. Moreover, the simulations on 1 and 7 were omitted, because similar results can be obtained like 4 above.
Third, simulations were implemented based on different CP test positions within 1 and 4 . The proposed BSTKS and other three methods were analyzed according to the different value of test position CPK and variance V. The results of simulations on 1 and 4 were illustrated in Figure 7, and the results near the left and right boundaries in 1 and 4 were summarized in Table 3. In general, all four methods tend to be better, when increases under a fixed V, or when V increases under a fixed . Meanwhile, for test position CPK near the left and right boundaries, the proposed BSTKS produces better performance than other three methods, because of the highest hit rate, the smallest error in all four methods, and higher accuracy and AUC than and SSA. Moreover, the simulations on 1 and 4 near the left and right boundaries were illustrated in Figure 8 in detail. In terms of the distribution of estimated CP (e-CP), PDF of e-CP, and AUC of ROC analysis, these simulations indicate that BSTKS is more sensitive for both left and right boundaries than other three methods, especially when sample size and variance V get smaller. Therefore, all simulation results above suggest that our proposed BSTKS is an encouraging and efficient method for abrupt change detection from the synthetic time series datasets, because of the shortest computation time, the highest hit rate, and accuracy out of four methods, especially for less significant statistic fluctuation when gets smaller, as well as for less significant variance fluctuation when gets bigger, and V gets smaller.   [34,35]. In this CHBMIT EEG database, some subjects were monitored up to several days after withdrawal of antiseizure medication in order to characterize their seizures and assess their candidacy for surgical intervention. Based on these EEG recordings in the CHBMIT EEG database, as well as some existing experiments in [36][37][38][39], the proposed BSTKS and other three methods were tested according to different value of test position CPK and sample size . First, a diagnosed EEG sample = [ , ] was assembled from two significantly different segments, in which = { 1 , . . . , } and = { +1 , . . . , } were derived from chb01 04 edfm and chb01 05 edfm, respectively. Then, BSTKS and other three methods were tested on the assembled EEG recordings 1 -8 , respectively, according to the different value of assigned test position CPK and sample size . The results of abrupt change detection on these assembled EEG samples were illustrated in Figure 8 and summarized in Table 4. Generally speaking, all four methods can roughly estimate the assigned test position from each assembled EEG recording and then divide it into two adjacent segments and . It is worth stressing that the proposed BSTKS can discern the different EEG segments accurately with the smallest error and the highest accuracy out of four methods. Also, BSTKS is the most efficient and encouraging with the shortest time in all four methods.
Moreover, for CPK near the left and right boundaries in 1 -8 , BSTKS has much better sensitivity than other KS, , and SSA methods because of the smallest error and the highest accuracy, especially for less statistic fluctuation when gets smaller, as well as less significant variance fluctuation when gets bigger. Supposing the assembled EEG sample indicates that a sharp transition of one's mental situation occurs before and after a sudden attack or acute stimulation, it is meaningful to estimate the location of the abrupt change and the maximal difference of data distribution exists between two adjacent EEG segments. These experiments above suggest that the proposed BSTKS can successfully detect the change position where a sudden change occurs under a potential mental shock, more quickly and efficiently than KS, , and SSA methods.
Second, the original EEG samples 1 -6 were selected directly from different recordings in the chb01 05 edfm; then the proposed BSTKS and other three methods were tested according to different sample size . Because the distance of e.c.d.f (V.e.c.d.f) can partly reflect the data fluctuation between two adjacent EEG segments, we use this V.e.c.d.f variable to distinguish different performance of BSTKS and other three methods. The results of abrupt change analyses on these original EEG recordings were shown in Figure 9 and summarized in Table 5. For all methods above, they can estimate an abrupt change from each of these original EEG samples 1 -6 and then divide it into two adjacent EEG segments. Compared with other three methods, the proposed BSTKS is encouraging for the shortest time out of four methods. Moreover, BSTKS has bigger V.e.c.d.f than and SSA, which means that it can more reasonably distinguish two adjacent EEG segments with different state of mental health. Although KS has the biggest V.e.c.d.f in all four methods, it takes much more search time than BSTKS, especially when  In the -axis of (b-e), the methods "1," "2," "3," and "4" stand for BSTKS, KS, , and SSA, respectively.  is detected from the root to leaf nodes of two BSTs in terms of three search criteria. The novelty of the proposed method is addressed by comparing with other KS, , and SSA methods, and simulations on the synthetic time series indicate that the proposed BSTKS is more efficient due to the shortest time, the highest hit rate, and the smallest error and highest accuracy out of four methods. Meanwhile, BSTKS has better sensitivity than KS near the left and right boundaries, because of shorter search time, higher hit rate, and bigger AUC, especially when sample size gets smaller with less significant statistic fluctuation. In addition, the necessity of the proposed method in the real domain is analyzed on real EEG recordings, and the results indicate that the proposed method can successfully discern an abrupt change and then obviously distinguish two adjacent EEG segments from the real EEG recordings. Through inspecting the significant fluctuation between adjacent segments signals, it is encouraging further for useful information inspection on all kinds of physiological and psychological time series signals. In a word, our BSTKS is a novel, efficient, and promising method for abrupt change analysis, and it is very helpful for useful information inspection on all kinds of real time series with different scales.