Change-point (CP) detection has attracted considerable attention in data mining and statistics, and detecting abrupt changes quickly and efficiently in large-scale bioelectric signals is of particular practical interest. Most existing methods, such as the Kolmogorov-Smirnov (KS) statistic, are time-consuming, especially for large-scale datasets. In this paper, we propose a fast framework for abrupt change detection based on binary search trees (BSTs) and a modified KS statistic, named BSTKS (binary search trees and Kolmogorov statistic). In this method, first, two binary search trees, termed BSTcA and BSTcD, are constructed by a multilevel Haar Wavelet Transform (HWT); second, three search criteria are introduced in terms of the statistic and variance fluctuations in the diagnosed time series; last, an optimal search path is detected from the root to the leaf nodes of the two BSTs. Studies on both synthetic time series samples and real electroencephalograph (EEG) recordings indicate that the proposed BSTKS can detect abrupt change more quickly and efficiently than KS,
Abrupt change detection aims to identify abrupt changes in the statistical properties of a signal series, which occur at unknown instants [
In the statistics community, several nonparametric approaches for CP detection have been widely explored. For example, the KS statistic quantifies the distance between the empirical distribution function of a sample and the cumulative distribution function of a reference distribution, or between the empirical distribution functions of two samples [
On the other hand, SSA is a powerful technique for time series analysis. SSA is nonparametric and requires no prior knowledge of the properties of the time series signal [
In addition, Wavelet Transform (WT) is another important tool for time series analysis [
However, all of the methods above are time-consuming and sometimes fail to detect abrupt changes near the left or right boundary, especially when the data fluctuation is insignificant in large-scale time series. To resolve these problems, we propose a fast framework for CP detection based on binary search trees and a modified KS statistic, termed BSTKS for short. In this method, first, two BSTs are derived from a diagnosed time series. Second, three search criteria are introduced in terms of the statistic and variance fluctuations between two adjacent time series segments, and then an optimal search path is detected from the root to the leaf nodes of the two BSTs. Last, the proposed BSTKS and other KS,
In general, for a given bioelectric signal, an abrupt change marks an important transition of biological functions or health states before and after a strong attack or an acute perturbation from the internal or external environment. It is therefore necessary not only to discern abrupt changes in all kinds of physiological and psychological time series signals, but also to inspect the significant fluctuation between adjacent time series segments at different scales. The following sections present the framework of the proposed BSTKS method through its theoretical foundation, simulation, and evaluation, and discuss how it can detect abrupt change on both synthetic and real bioelectric EEG signals more quickly and efficiently than other existing KS,
The KS statistic is sensitive to differences in both the location and the shape of the cumulative distribution functions (c.d.f.) of two samples. The null distribution of the KS statistic is calculated under the null hypothesis that the two samples are drawn from the same distribution, or that one sample is drawn from the reference distribution. To detect an abrupt change from a diagnosed time series
Supposing a time series sample,
In order to discern an abrupt change on
if
if
in which
Provided the statistic fluctuation defined in (
Supposing two adjacent segments
Here,
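As a point of reference for the statistic and variance fluctuations discussed above, the following is a minimal sketch of the classical two-sample KS statistic applied to two adjacent segments of a series. It is not the modified KS statistic of BSTKS. The function names (`ks_statistic`, `fluctuations`, `ks_change_point`) are hypothetical, and measuring the variance fluctuation as the absolute difference of the two segment variances is an assumption made only for illustration.

```python
from bisect import bisect_right
from statistics import pvariance


def ks_statistic(a, b):
    """Classical two-sample KS statistic: max |F_a(v) - F_b(v)| over observed values."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for v in sa + sb:
        fa = bisect_right(sa, v) / len(sa)  # empirical c.d.f. of a at v
        fb = bisect_right(sb, v) / len(sb)  # empirical c.d.f. of b at v
        d = max(d, abs(fa - fb))
    return d


def fluctuations(series, split):
    """Statistic and variance fluctuation between series[:split] and series[split:].

    Assumption: the variance fluctuation is taken as |var(left) - var(right)|.
    """
    left, right = series[:split], series[split:]
    return ks_statistic(left, right), abs(pvariance(left) - pvariance(right))


def ks_change_point(series, min_size=2):
    """Exhaustive scan: the split position that maximizes the KS statistic."""
    candidates = range(min_size, len(series) - min_size + 1)
    return max(candidates, key=lambda k: ks_statistic(series[:k], series[k:]))
```

The exhaustive scan in `ks_change_point` evaluates the statistic at every candidate split, which illustrates why a plain KS-based search becomes costly on large-scale series.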
In the first part of the proposed BSTKS method, two BSTs, that is, BSTcA and BSTcD, are constructed from a time series sample
The diagram of a discrete time series
Supposing the size of a diagnosed
During two BSTs’ construction, as shown in Figure
The diagrams of two binary trees, BSTcA and BSTcD, which are constructed by McA and McD, as well as the original time series
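The construction described above can be sketched as follows. This is a minimal illustration that assumes the series length is a power of two and that each tree level stores the Haar approximation (cA) or detail (cD) coefficients of one decomposition level, with the coarsest level at the root. The function names and the list-of-levels layout are hypothetical and are not taken from the paper's McA and McD matrices.

```python
def haar_step(x):
    """One level of the orthonormal Haar transform on an even-length list."""
    s = 2 ** 0.5
    ca = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]  # approximation
    cd = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]  # detail
    return ca, cd


def build_haar_trees(series):
    """Return (ca_levels, cd_levels), each ordered from root to leaves.

    Level k holds 2**k coefficients, so node j at level k has children
    2*j and 2*j + 1 at level k + 1 (an implicit binary tree).
    """
    n = len(series)
    if n < 2 or n & (n - 1):
        raise ValueError("series length must be a power of two")
    ca_levels, cd_levels = [], []
    current = list(series)
    while len(current) > 1:
        current, detail = haar_step(current)
        ca_levels.append(current)
        cd_levels.append(detail)
    ca_levels.reverse()  # coarsest level first, so index 0 is the root
    cd_levels.reverse()
    return ca_levels, cd_levels
```

For the series `[1, 1, 1, 1, 5, 5, 5, 5]`, only the root detail coefficient is nonzero, which reflects a single jump located at the midpoint of the series.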
To find an optimal path towards the potential CP within a given time series
For a current nonleaf node
Given two statistic fluctuation variables
if
if
For a selected nonleaf node
The scheme of Criterion
For a current nonleaf node
Suppose Criterion
Given two variance fluctuation variables
if
if
Similarly, as illustrated in Figure
The scheme of Criterion
Based on Criterions
Supposing the current node
Consider that the largest statistic fluctuation between
Given
if
if
otherwise, no abrupt change is detected from
Obviously, if
Supposing a nonleaf node
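To make the idea of a root-to-leaf search path concrete, the sketch below descends a binary partition of the raw series and, at each node, moves into the half whose own two sub-halves differ most under the classical KS statistic. This is a simplified stand-in, not the three BSTKS criteria: it ignores the wavelet coefficients and the variance fluctuation, and the names `split_score` and `search_change_point` are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Classical two-sample KS statistic between samples a and b."""
    sa, sb = sorted(a), sorted(b)
    return max(
        abs(bisect_right(sa, v) / len(sa) - bisect_right(sb, v) / len(sb))
        for v in sa + sb
    )


def split_score(series, lo, hi):
    """KS statistic between the two halves of series[lo:hi]; 0 if too short."""
    mid = (lo + hi) // 2
    if mid == lo or mid == hi:
        return 0.0
    return ks_statistic(series[lo:mid], series[mid:hi])


def search_change_point(series, min_size=2):
    """Greedy descent; returns (position, statistic) of the best split on the path."""
    lo, hi = 0, len(series)
    best_pos, best_score = None, 0.0
    while hi - lo >= 2 * min_size:
        mid = (lo + hi) // 2
        score = ks_statistic(series[lo:mid], series[mid:hi])
        if score > best_score:
            best_pos, best_score = mid, score
        # Move toward the child whose internal halves differ more.
        if split_score(series, lo, mid) >= split_score(series, mid, hi):
            hi = mid
        else:
            lo = mid
    return best_pos, best_score
```

Only one node per level is visited, so the number of KS evaluations grows with the tree depth rather than with the series length. The descent is greedy, however, and can miss a change that a full scan would find.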
Many methods have been proposed for abrupt change detection in time series; the following typical methods are used to evaluate the proposed BSTKS framework.
In this section, the proposed BSTKS is evaluated on the synthetic time series and real EEG recordings with different size
In our simulations, some typical time series samples were derived from the normally distributed datasets (mean,
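A synthetic sample of this kind can be generated as in the following sketch. The parameter values (means, standard deviation, change position) and the function name `synthetic_series` are placeholders chosen for illustration, not the settings used in the simulations.

```python
import random


def synthetic_series(n, change_point, mean_before=0.0, mean_after=1.0,
                     sigma=1.0, seed=None):
    """Normally distributed series whose mean shifts at index change_point.

    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    return [
        rng.gauss(mean_before if i < change_point else mean_after, sigma)
        for i in range(n)
    ]
```

For example, `synthetic_series(1024, 400, seed=0)` yields a series of length 1024 with an abrupt mean change at index 400.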
First, simulations were carried out for different values of the sample size
The averaged results on four methods with datasets
Method | Time   | Hit rate | Error   | Accuracy | AUC
KS     | .3537  | .0841    | 38.7321 | .8804    | .8984
       | 1.0068 | .0168    | 56.3036 | .8878    | .7960
SSA    | 1.5218 | .0583    | 41.9464 | .8762    | .9941
The simulations on
Performance analysis on different
Averaged analysis
Second, simulations were carried out based on the datasets
The summary of simulations according to different variances in
Items    | Methods
         | KS  SSA          | KS  SSA          | KS  SSA

Time     |                  |                  | 6.96  17.2  24.6
Hit rate | .005  .010  .038 | .093  .005  .025 | .056  .006  .034
Accuracy | .515  .792  .748 | .995  .905  .899 | .999  .944  .884
AUC      | .694  .644  .997 | .951  .978  .971 | 1.00  .999  1.00

Time     | .034  .223  .113 | .356  1.97  2.98 | 7.09  18.1  26.5
Hit rate | .013  .007  .035 | .001  .099  .142 | .045  .000  .035
Accuracy | .552  .839  .756 | .998  .939  .974 | .999  .940  .986
AUC      | .695  .851  .997 | .992  .997  .991 | 1.00  .998  1.00

Time     | .035  .229  .115 | .345  1.91  2.88 | 7.60  19.7  29.1
Hit rate | .061  .007  .049 | .090  .000  .049 | .028  .000  .053
Accuracy | .737  .958  .765 | .997  .971  .983 | .999  .984  .997
AUC      | .754  .908  .997 | .999  .996  1.00 | 1.00  .999  1.00

Time     | .037  .245  .125 | .382  2.08  3.28 | 8.08  20.5  31.4
Hit rate | .086  .002  .037 | .084  0.00  .046 | .035  0.00  .045
Accuracy | .818  .952  .773 | .996  .986  .983 | .999  .991  .998
AUC      | .938  .655  .997 | .996  .728  .100 | .999  .467  1.00
Figure: the simulations on 200 samples (panels: Time, Hit rate, Error, Accuracy, AUC).
Third, simulations were implemented based on different CP test positions within
The summary of simulations on
Items    | Methods
         | KS  SSA          | KS  SSA          | KS  SSA          | KS  SSA

Hit rate |                  |                  | .160  0.0  0.0   | .165  0.0  .065
Error    | 20  8  3         | 7  8  12         | 1  134  49       | 21  102  33
Accuracy | .375  .750  .906 | .781  .750  .625 | .996  .476  .808 | .918  .601  .871
AUC      | .963  .641  .978 | .987  .599  .978 | .946  .780  .797 | .884  .564  .797

Hit rate | .160  .040  .295 | .190  .005  0.0  | .195  0.0  0.0   | .175  .025  .100
Error    | 10  4  2         | 3  8  12         | 0  112  10       | 1  90  1
Accuracy | .687  .875  .937 | .906  .750  .625 | 1.0  .562  .960  | .996  .648  .996
AUC      | .883  .641  .978 | .927  .599  .978 | .986  .780  .829 | .988  .657  .911

Hit rate | .240  0.0  .220  | .175  0.0  0.0   | .160  0.0  0.0   | .150  .005  .065
Error    | 6  1  2          | 1  4  12         | 0  107  5        | 1  36  4
Accuracy | .812  .968  .937 | .968  .875  .625 | 1.0  .582  .980  | .996  .859  .984
AUC      | .979  .770  .978 | .875  .808  .978 | .985  .780  .999 | .990  .752  .995

Hit rate | 210  0.0  .265   | .215  0.0  0.0   | .195  0.0  0.0   | .145  0.0  .060
Error    | 6  1  2          | 1  1  13         | 0  119  5        | 2  11  4
Accuracy | .812  .968  .937 | .968  .968  .593 | 1.0  .535  .980  | .992  .957  .984
AUC      | .960  .770  .978 | .822  .808  .978 | .940  .780  .999 | .990  .752  .998
The simulations on
Results of simulation on
Results of simulation on
Results of simulation with
Results of simulation on
The results of CP detection on the assembled EEG samples
Therefore, all of the simulation results above suggest that the proposed BSTKS is an efficient method for abrupt change detection on the synthetic time series datasets: of the four methods, it has the shortest computation time and the highest hit rate and accuracy, especially for less significant statistic fluctuation when
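The error and accuracy figures in the tables can be read with the following sketch. It assumes that the error is the absolute distance between the estimated and the true change position and that the accuracy is one minus that error divided by the sample size M. These definitions are an assumption: they match most of the tabulated values (for example, an estimated position of 29 against a true position of 25 with M = 2^{7} gives an error of 4 and an accuracy of about .97), but a few tabulated entries differ in the last digit. The function names are hypothetical.

```python
def detection_error(estimated_cp, true_cp):
    """Absolute distance between estimated and true change positions (assumed definition)."""
    return abs(estimated_cp - true_cp)


def detection_accuracy(estimated_cp, true_cp, m):
    """Assumed definition: 1 - error / M, where M is the sample size."""
    return 1.0 - detection_error(estimated_cp, true_cp) / m
```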
To verify the proposed method further, we take some representative samples from the CHB-MIT Scalp EEG Database (CHB-MIT), which is available on the PhysioBank platform. The database was collected at the Children’s Hospital Boston and consists of EEG recordings from pediatric subjects with intractable seizures [
First, a diagnosed EEG sample
The summary of abrupt change detection on
M    | 2^{7} | 2^{8} | 2^{9} | 2^{10} | 2^{7} | 2^{8} | 2^{9} | 2^{10} | Mean
     | 25    | 50    | 100   | 200    | 100   | 200   | 400   | 900    |

eCP
KS   | 29    | 36    | 95    | 206    | 34    | 92    | 301   | 795    | NA
     | 24    | 255   | 31    | 1023   | 31    | 199   | 33    | 1023   | NA
SSA  | 32    | 55    | 398   | 1007   | 106   | 208   | 500   | 907    | NA

Err
KS   | 4     | 14    | 5     | 6      | 66    | 108   | 99    | 105    | 50.9
     | 1     | 205   | 69    | 823    | 69    | 1     | 367   | 123    | 207.3
SSA  | 7     | 5     | 298   | 807    | 6     | 8     | 100   | 7      | 154.8

Acc
KS   | .97   | .94   | .99   | .99    | .48   | .57   | .81   | .89    | .83
     | .99   | .20   | .86   | .20    | .46   | .99   | .28   | .88    | .61
SSA  | .94   | .97   | .42   | .21    | .94   | .97   | .80   | .99    | .78

Time
KS   | .019  | .021  | .038  | .049   | .020  | .029  | .039  | .052   |
     | .03   | .063  | .088  | .170   | .031  | .050  | .081  | .174   |
SSA  | .037  | .071  | .126  | .239   | .035  | .065  | .118  | .245   |

Moreover, for
Second, the original EEG samples
The summary of CP detection from the original EEG samples
M    | 2^{9} | 2^{10} | 2^{11} | 2^{12} | 2^{13} | 2^{14} | Mean

eCP
KS   | 348   | 317    | 1342   | 2252   | 4673   | 5947   |
     | 511   | 314    | 17     | 4095   | 10     | 16383  |
SSA  | 426   | 854    | 90     | 2634   | 408    | 11271  |

V.e.c.d.f
KS   | .4603 | .3829  | .4407  | .3050  | .3325  | .2234  |
     | 0     | .1257  | .5384  | 0      | 0      | 0      |
SSA  | .1368 | .0850  | .1260  | .0745  | .0212  | .0012  |

Time
KS   | .016  | .041   | .112   | .466   | 1.461  | 5.638  |
     | .072  | .137   | .281   | .913   | 1.726  | 4.709  |
SSA  | .107  | .209   | .415   | 1.103  | 1.769  | 3.548  |

The analyses of abrupt change on the original EEG samples, by BSTKS, KS,
For these original EEG recordings from subjects with intractable seizures, it is important to predict when and where a significant change occurs in the EEG signals. Such an abrupt change probably indicates that a patient undergoes a transition from a previous mental status, which is helpful for diagnosing patients with intractable seizures. The experiments on the original EEG samples above indicate that the proposed BSTKS can not only accurately detect the change position but also estimate the maximal difference in data distribution between two adjacent EEG segments, more quickly and efficiently than existing KS,
In this paper, a novel BSTKS method is proposed based on binary search trees and a modified KS statistic. In this method, two BSTs are constructed from a diagnosed time series by multilevel HWT, and an optimal search path is then detected from the root to the leaf nodes of the two BSTs in terms of three search criteria. The novelty of the proposed method is demonstrated by comparison with other KS,
The authors declare that they have no competing interests.
The authors would like to thank Professor Mohan Karunanithi in the Australia eHealth Research Centre, CSIRO Computation Informatics, for his assistance, support, and advice for this paper. This paper is supported by National Natural Science Foundation of China (no. 13K10414 and no. 61104154) and Specialized Research Fund for Natural Science Foundation of Shanghai (nos. 16ZR1401300 and 16ZR1401200).