A Fast Screen and Shape Recognition Algorithm for Multiple Change-Point Detection

Dan Zhuang (1) (http://orcid.org/0000-0002-8394-5709) and Youbo Liu (2) (http://orcid.org/0000-0002-5465-5243)

(1) School of Statistics, Southwestern University of Finance and Economics, China (swufe.edu.cn)
(2) School of Electrical Engineering and Information, Sichuan University, China (scu.edu.cn)

Research Article. Mathematical Problems in Engineering (MPE), Hindawi, volume 2018, Article ID 8371085, ISSN 1563-5147, 1024-123X, doi:10.1155/2018/8371085.

Academic Editor: Erik Cuevas. Received 2 August 2018; accepted 13 September 2018; published 11 October 2018.

Copyright © 2018 Dan Zhuang and Youbo Liu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

A Fast Screen and Shape Recognition (FSSR) algorithm is proposed with complexity down to O(n) for multiple change-point detection problems. The FSSR algorithm consists of two steps. First, by dividing the data into several subsegments, FSSR quickly locks onto a few small subsegments that are likely to contain change-points. Second, through a point-by-point search in each selected subsegment, FSSR determines the precise location of each change-point. The simulation study shows that FSSR has clear advantages in speed and stability. In particular, the sparser the change-points are, the better FSSR performs. Finally, we apply FSSR to two real applications to demonstrate its feasibility and robustness: the identification of DNA copy number variations, and the reduction of operation scenarios for a renewable-integrated electrical distribution network.

1. Introduction

Change-point detection has been studied in various fields, including environmental science, climatology, agricultural economics, bioinformatics, and public economics. In this paper, the basic and canonical normal model with multiple mean change-points is considered:

$$X_i=\mu_i+\varepsilon_i,\quad \varepsilon_i\sim N(0,\sigma^2),\quad i=1,\dots,n, \tag{1}$$

where $\mu_1=\mu_2=\dots=\mu_{\tau_1}\neq\mu_{\tau_1+1}=\dots=\mu_{\tau_2}\neq\mu_{\tau_2+1}=\dots=\mu_{\tau_N}\neq\mu_{\tau_N+1}=\dots=\mu_n$ is assumed, so that $\{\mu_1,\mu_2,\dots,\mu_n\}^T$ is a piecewise-constant vector. Here $\tau=(\tau_1,\dots,\tau_N)$ is the location vector of the change-points and $N$ is the number of change-points. It is also assumed that $N\ll n$ and that any two change-points are “not too close to each other”.

A class of classical methods estimates the number and locations of change-points by a fitting criterion, such as AIC and BIC [6, 10, 11]. However, the computational complexity of these methods is very high. Braun et al. and Bai and Perron employed a dynamic programming algorithm to reduce the computational cost to the order of $O(n^2)$. Based on a minimum description length information criterion, Lu et al. proposed an information-theoretic approach from a nontraditional view, using a genetic algorithm to optimize the objective function. These methods are still unfavourable for large $n$. Nevertheless, several algorithms are available to detect multiple change-points in big data. Antoch and Jaruskova focus on an effective calculation of critical values for large samples, by minimizing costs over segmentations and using dynamic modelling principles; some other methods segment the data to speed up the algorithms (such as [16, 17]) or use regularization techniques.

To reduce the computational complexity, some stepwise approaches have been proposed. Since it was proposed, LASSO has become a very popular statistical approach. After the reparametrization $\theta_i=\mu_{i+1}-\mu_i$, Huang et al. used a LASSO-type model and the LARS algorithm to find the solution with time complexity $O(n\log n)$. Moreover, several LASSO-type methods have been proposed to improve adaptability and robustness.

The binary segmentation (BS) algorithm is another classical stepwise technique for multiple change-point detection, which works by combining the segmentation with a CUSUM statistic. Owing to its low computational complexity of $O(n\log n)$ and its ease of implementation, BS has been widely studied and used. However, the stopping rule is difficult to compute in practice because it is influenced by the previously detected change-points. For example, Chen and Gupta studied the covariance change-point problem by embedding SIC into the BS procedure. On the theoretical side, BS is only consistent when the minimum spacing between any two adjacent change-points is of order almost $n^{3/4}$. Circular Binary Segmentation (CBS) and Wild Binary Segmentation (WBS) were proposed to overcome the defect that BS cannot detect a small changed segment buried in the middle of large segments [8, 26, 27]. Several authors have proposed multidimensional approaches based on CUSUM statistics. By extending the CUSUM to kernel functions, Cabrieto et al. detected correlation changes in multivariate systems.

Recently, using a sliding fixed-window approach, Niu and Zhang proposed a very efficient and effective screening method known as the Screening and Ranking algorithm (SaRa), whose computational complexity can be as low as $O(n)$. Chu presented two online, sliding-window segmentation algorithms for the single change-point detection problem. As far as we know, SaRa is currently the fastest algorithm for multiple change-point detection, because it uses the local CUSUM statistic and a forward scan. However, the bandwidth $h$ is a crucial parameter for accurately identifying the change-points. To select a good bandwidth $h$, Niu and Zhang suggested trying several bandwidths. The performance of SaRa deteriorates if the bandwidth is too large or too small. Xiao et al. proposed a modified SaRa (mSaRa) by applying quantile normalization and a mixture of model-based clustering. Yau and Zhao also used a scan method to construct confidence intervals for multiple change-points in time series.

Although the computational complexity of SaRa is down to $O(n)$, there is still room for reducing the computational cost under the assumption that $N\ll n$. In the existing algorithms, deciding whether a point is a change-point requires comparing all CUSUM statistics near that point: only when the statistic at this point is the local maximum and exceeds the threshold can the point be confirmed as a change-point. Computing and comparing CUSUM statistics is the main computational cost of those algorithms. However, if we can quickly lock onto the areas that contain change-points through some simple method, a large part of this computation and comparison can be avoided.

In this paper, we make two contributions. First, we show that, when $N\ll n$, the computational complexity of the FSSR algorithm can be made far less than $O(n)$ (see Section 3.3). Second, in order to enhance robustness, we embed a single-peak recognition mechanism into our algorithm. Furthermore, we find that the proposed method performs better when the change-points are sparser.

The paper is organized as follows. Our motivation is described in Section 2. In Section 3, the FSSR algorithm is introduced. The performances of FSSR, SaRa, and mSaRa are compared by a simulation study in Section 4. In Section 5, the proposed methodology is applied to the identification of DNA copy number variations and to a practical engineering task involving an electric power system, which validates the effectiveness of the FSSR algorithm.

2. Motivation

Our motivation is illustrated by Figure 1. Dividing the data into several small subsegments, we find that, in most small subsegments, the data are normal white noise with no change-point. The shape of the data sequence is different only in the few subsegments that cover change-points. It is important to find these subsegments quickly. In addition, it is easy to pick out small subsegments that do not contain a change-point. After excluding these, the remaining subsegments are likely to contain real change-points.

Figure 1: A time series with four change-points.

Because two adjacent subsegments that do not contain a change-point have a common mean, the difference between the CUSUM statistics of these two adjacent subsegments should be small. On the other hand, if a small subsegment covers a change-point, the difference between the CUSUM statistics of this subsegment and the adjacent subsegment will be significant. We can therefore identify subsegments with change-points through a suitable threshold. Let $K$ be the number of subsegments. To lock onto the subsegments containing change-points, we only need to calculate CUSUM statistics $K$ times. Once we have found the small subsegments that contain change-points, we only need to search for change-points inside them.

3. Fast Screen and Shape Recognition Algorithm

In this section, we give a brief description of the FSSR.

3.1. FSSR Algorithm

First, for a given positive integer $K$, we split the data series $X=(X_1,X_2,\dots,X_n)$ into $K+1$ subsegments $Y_1,Y_2,\dots,Y_{K+1}$ of almost equal length, where $Y_i=(X_{n_i},\dots,X_{n_{i+1}-1})$ and $1=n_1<n_2<\dots<n_{K+1}<n_{K+2}=n+1$. Setting $U_i=\{Y_i,Y_{i+1}\}$ $(i=1,2,\dots,K)$, we get a set of subsegments $U=\{U_1,U_2,\dots,U_K\}$.

Second, for each pair of two adjacent subsegments, the local CUSUM statistic is defined as

$$D_i=M_{U_{i+1}}-M_{U_i},\quad i=1,2,\dots,K-1, \tag{2}$$

where $M_{U_i}$ is the mean of subsegment $U_i$. We select an index set $S=\{s_j\}$ $(j=1,\dots,T)$ by the thresholding rule $D_{s_j}>\lambda$, where $S\subseteq\{1,2,\dots,K-1\}$ and $\lambda$ is usually a quantile of $D_i$ under the assumption that there is no change-point. In this paper, we let $\lambda=\sqrt{K+1}\,U_{1-\alpha}\hat\sigma/\sqrt{8n}$, where $U_{1-\alpha}$ is the upper $1-\alpha$ quantile of the standard normal distribution, $\hat\sigma=\sum_{i=1}^{K}\hat\sigma_i/K$, and $\hat\sigma_i$ is the sample standard deviation of subsegment $Y_i$. A selected $s_j$ implies that a change-point is likely to be covered by the subsegment $Y_{s_j+1}$.
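The screening step can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `split_bounds` and `screen` are hypothetical, indices are 0-based (the text is 1-based), the default `alpha=0.05` is an arbitrary choice, the threshold is read as $\sqrt{(K+1)/(8n)}\,U_{1-\alpha}\hat\sigma$, and the comparison is made two-sided (on the absolute value of $D_i$) so that downward jumps are flagged as well. The last two points are assumptions of this sketch.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def split_bounds(n, K):
    """0-based boundaries b[0] < ... < b[K+1] of K+1 nearly equal subsegments."""
    return [round(i * n / (K + 1)) for i in range(K + 2)]


def screen(x, K, alpha=0.05):
    """Return 0-based indices of the subsegments Y flagged by the screening step."""
    n = len(x)
    b = split_bounds(n, K)
    Y = [x[b[i]:b[i + 1]] for i in range(K + 1)]
    # M_U[i] is the mean of the pooled pair U_i = {Y_i, Y_{i+1}}.
    M_U = [mean(Y[i] + Y[i + 1]) for i in range(K)]
    # sigma_hat averages the sample standard deviations of the first K subsegments.
    sigma_hat = mean(stdev(y) for y in Y[:K])
    # Threshold: assumed form sqrt((K+1)/(8n)) * U_{1-alpha} * sigma_hat.
    lam = math.sqrt((K + 1) / (8 * n)) * NormalDist().inv_cdf(1 - alpha) * sigma_hat
    flagged = []
    for i in range(K - 1):
        D = M_U[i + 1] - M_U[i]
        if abs(D) > lam:  # assumption: two-sided rule
            flagged.append(i + 1)  # the middle subsegment shared by U_i and U_{i+1}
    return flagged


# Toy series: mean 0 for 465 points, then mean 3; 30 subsegments of length 30.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(465)] + [random.gauss(3.0, 1.0) for _ in range(435)]
flagged = screen(x, K=29)
print(flagged)
```

In this toy series the jump lies inside the subsegment with 0-based index 15, which is among the flagged ones. Because the threshold is deliberately liberal, some change-free subsegments are flagged too; they are meant to be discarded later by the local search and single-peak recognition.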

Third, since the screening step leaves most subsegments without a change-point, we only need to search for change-points in each selected subsegment $Y_{s_j+1}$ $(j=1,2,\dots,T)$. To detect the exact location of a change-point, each selected subsegment is searched point by point. Let $h=[n/(K+1)]$, the largest integer not exceeding $n/(K+1)$, and let

$$C(x,h)=\left|\sum_{k=1}^{h}X_{x+1-k}/h-\sum_{k=1}^{h}X_{x+k}/h\right|$$

be the local CUSUM statistic used to detect change-points. For points $x\in(n_{s_j+1},n_{s_j+2})$, if $C(\hat\tau_k,h)$ is an $h$-local maximum, then $\hat\tau_k$ is an estimator of the change-point $\tau_k$. Putting all $h$-local maximizers together, we get the final estimator of the location vector of change-points, $\hat\tau=(\hat\tau_1,\dots,\hat\tau_{\hat N})$, where $\hat N$ is the estimator of $N$ after single-peak recognition.
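The point-by-point search can be illustrated as follows. The sketch assumes the usual SaRa-style definition of an $h$-local maximizer (a point whose statistic is at least as large as every other value within distance $h$), which the text does not spell out; the names `local_cusum` and `h_local_maximizers` are hypothetical, and treating the statistic as 0 where the window does not fit is a convention of this sketch. The search is written naively for clarity, not for speed.

```python
def local_cusum(x, pos, h):
    """C(pos, h): |mean of X_{pos-h+1..pos} - mean of X_{pos+1..pos+h}| (pos is 1-based)."""
    left = x[pos - h:pos]
    right = x[pos:pos + h]
    return abs(sum(left) / h - sum(right) / h)


def h_local_maximizers(x, lo, hi, h):
    """Positions pos in [lo, hi) whose statistic is maximal over (pos - h, pos + h)."""
    n = len(x)

    def C(p):
        # Outside the valid range the statistic is treated as 0 (sketch convention).
        return local_cusum(x, p, h) if h <= p <= n - h else 0.0

    found = []
    for pos in range(max(lo, h), min(hi, n - h + 1)):
        c = C(pos)
        if all(c >= C(q) for q in range(pos - h + 1, pos + h)):
            found.append(pos)
    return found


# Noise-free step: 50 zeros followed by 50 ones, searched inside one flagged window.
x = [0.0] * 50 + [1.0] * 50
print(h_local_maximizers(x, lo=40, hi=60, h=10))  # prints [50]
```

Restricting the search to `[lo, hi)` mirrors the idea that only flagged subsegments are scanned; on a flat stretch every point ties for the maximum, which is one reason a further check such as single-peak recognition is useful.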

A flow chart of FSSR algorithm is given in Figure 2.

Figure 2: Flow chart of the FSSR algorithm.

3.2. Robustness

The good performance of the CUSUM statistic relies on the assumption of normal errors. In practice, the data do not necessarily follow a normal distribution. Xiao et al. applied quantile normalization (QN) to the original intensities to meet the requirement of normality. Accordingly, two robustness procedures are embedded into our FSSR algorithm.

First, QN is used to bring the data in each subsegment close to a normal distribution. In the FSSR procedure, we rank the data in each subsegment. Then a sample of the same size as the subsegment is simulated from the standard normal distribution $N(0,1)$. Finally, we replace the data of each subsegment by the simulated sample from $N(0,1)$ and run our algorithm on the new data series.
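One way to read the QN step is as a rank-matching replacement: the smallest observation in a subsegment receives the smallest simulated value, and so on. The sketch below follows that reading, which is an assumption (the text does not state how ranks and simulated values are paired); the function name `quantile_normalize` is hypothetical.

```python
import random


def quantile_normalize(segment, rng):
    """Replace each value by the simulated N(0,1) value of the same rank."""
    reference = sorted(rng.gauss(0.0, 1.0) for _ in segment)
    # order[r] is the index of the observation with rank r (0 = smallest).
    order = sorted(range(len(segment)), key=lambda i: segment[i])
    out = [0.0] * len(segment)
    for rank, idx in enumerate(order):
        out[idx] = reference[rank]
    return out


rng = random.Random(0)
seg = [3.0, -1.0, 250.0, 0.5, 2.0]  # one heavy-tailed outlier
normed = quantile_normalize(seg, rng)
print(normed)
```

The ordering of the observations is preserved, while an extreme value such as 250.0 is pulled back to the largest of five standard normal draws.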

Second, single-peak recognition is used to enhance the robustness of the local maximizer. In most algorithms (such as BS, WBS, SaRa, and mSaRa), the local maximum principle and a threshold are used to confirm a change-point. In practice, the result is very sensitive to the choice of threshold. From Figure 3, we can see that the local CUSUM statistic exhibits a single peak at each change-point. In this paper, to further improve the robustness of change-point detection, we define a simple single-peak principle. For any local maximum point $x$, let $D_x^l=\sum_{i=1}^{h}I(C(x-i+1,h)>C(x-i,h))$ and $D_x^r=\sum_{i=1}^{h}I(C(x+i,h)>C(x+i+1,h))$. If $(D_x^l+D_x^r)/(2h)>\gamma$, where $\gamma$ is a cut-off value, the point $x$ is confirmed as a change-point. Obviously, the bigger $\gamma$ is, the stricter the rule. Simulation experiments show that the FSSR algorithm performs well when $\gamma\in(0.7,0.75)$. In practice, we use $\gamma=0.7$ in order to identify as many potential change-points as possible.
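The single-peak score is straightforward to compute once the local CUSUM values are available. In the sketch below, `C` is assumed to be a list holding the statistic at every position from `x - h` to `x + h + 1`; the function name `single_peak_score` is hypothetical.

```python
def single_peak_score(C, x, h):
    """(D_l + D_r) / (2h): share of monotone steps up to x and down from x."""
    d_left = sum(1 for i in range(1, h + 1) if C[x - i + 1] > C[x - i])
    d_right = sum(1 for i in range(1, h + 1) if C[x + i] > C[x + i + 1])
    return (d_left + d_right) / (2 * h)


# A clean triangular peak at position 20.
peak = [max(0, 10 - abs(p - 20)) for p in range(41)]
print(single_peak_score(peak, 20, 5))  # prints 1.0

# The same peak with one bump on its left flank.
bumpy = list(peak)
bumpy[18] = 9.5
print(single_peak_score(bumpy, 20, 5))  # prints 0.9

# A flat profile has no peak at all.
print(single_peak_score([1.0] * 41, 20, 5))  # prints 0.0
```

With the cut-off $\gamma=0.7$, both the clean and the slightly bumpy peak would be confirmed, while the flat profile would be rejected.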

Figure 3: A sketch of the local CUSUM statistic with 8 change-points.

3.3. Computational Complexity

The time complexity of FSSR has two parts. First, in the scan step, only $K$ local CUSUM statistics need to be calculated, so the computational complexity of this step is $O(K)$. In the second step, to detect the exact location of each change-point, we need to calculate the local CUSUM statistic at each point of each selected subsegment, so the computational complexity of this step is $O(nT/K)$. The computational complexity of the whole algorithm is therefore $O(nT/K+K)$. If the change-points are sparse enough to satisfy $N=o(n)$, we can assume that $T=o(n)$, because $T$ is the number of selected subsegments and is very close to $N$. For example, if $T=[M\log(n)]$ (where $M>0$ is a constant), setting $K=\sqrt{n}$ reduces the computational complexity to $O(\sqrt{n}\log n)$, which is well below $O(n)$. In practice, we use $K=[\sqrt{n}]$.
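To make the trade-off in the bound $O(nT/K+K)$ concrete, the following worked calculation treats $K$ as a free parameter. It is only an illustration under the cost model stated above; the numerical values of $n$ and $T$ are hypothetical.

```latex
% Cost model from the text: f(K) = nT/K + K  (local search + screening).
\begin{align*}
f(K) &= \frac{nT}{K} + K, \qquad
f'(K) = -\frac{nT}{K^{2}} + 1 = 0
\;\Longrightarrow\; K^{*} = \sqrt{nT}, \quad f(K^{*}) = 2\sqrt{nT}. \\[4pt]
\text{With } K=\sqrt{n}:\quad
f(\sqrt{n}) &= \sqrt{n}\,T + \sqrt{n} = \sqrt{n}\,(T+1). \\[4pt]
\text{Example } (n=10^{4},\ T=10):\quad
f(100) &= \frac{10^{5}}{100} + 100 = 1100, \qquad
f(K^{*}) = 2\sqrt{10^{5}} \approx 632,
\end{align*}
% both far below n = 10^4, the order of a full point-by-point scan.
```

The choice $K=\sqrt{n}$ does not require knowing $T$ in advance, at the price of a factor of roughly $\sqrt{T}/2$ relative to the balanced choice $K^{*}=\sqrt{nT}$.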

4. Simulation Study

Many papers, such as Niu and Zhang, Xiao et al., and Song et al., show that SaRa and mSaRa perform better than BS-type methods. Therefore, in this section, we compare the performance of FSSR with that of SaRa and mSaRa.

4.1. An Example

Before conducting large-scale simulation experiments, we first demonstrate the implementation process and the effect of our FSSR algorithm through an example with $n=2000$ and $N=5$. In Figure 4, the top plot is the initial result based on screening and the bottom plot is the final result after single-peak recognition. The screening process identifies 17 points that are very close to change-points. Based on these points, we carry out the local search and finally obtain 5 change-points through single-peak recognition. All detected change-points are marked by vertical lines.

Figure 4: Process of the FSSR algorithm.

From this example, we can see that our FSSR algorithm can quickly and accurately find the change-points. In order to show more comparisons, we consider the normal error case in Section 4.3 and t error case in Section 4.4, respectively.

4.2. Simulation Design

Before presenting the detailed comparison, we give the simulation design.

First, the basic data are generated from the standard normal distribution and from a Student $t(3)$ distribution.

Second, the jump size of each change-point, $\delta_i=\mu_{\tau_i}-\mu_{\tau_i+1}$, is generated by a random mechanism. We set $\delta_i=(2B_i-1)(qU_i+(1-q))$, where $q\in(0,1)$ controls the degree of heterogeneity of $\delta_i$ (in this paper we choose $q=0.2$), $U_i\sim N(0,1)$, $B_i\sim b(1,0.5)$, and $B_i$ and $U_i$ are independent of each other.

Third, we consider four sample sizes ($n=500, 3000, 5000, 8000$) and six numbers of change-points ($N=5, 10, 15, 20, 30, 50$). The change-points are scattered over the data according to a random mechanism: $N+1$ random numbers are drawn from the uniform distribution on the interval $(1,5)$ and recorded as $L_1,L_2,\dots,L_{N+1}$. We let the location of change-point $\tau_i$ be $\left[\sum_{j=1}^{i}L_j\big/\sum_{j=1}^{N+1}L_j\cdot n\right]$ $(i=1,\dots,N)$.
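The three design steps above can be put together in a short data generator. This is a sketch under stated assumptions: the function name `simulate`, the starting mean level of 0, and the construction of $t(3)$ noise as a standard normal divided by the square root of a scaled chi-square variable are choices made here for illustration and are not taken from the text.

```python
import math
import random


def simulate(n, N, q=0.2, dist="normal", seed=None):
    """Return (taus, mu, x): change-point locations, mean vector, and data."""
    rng = random.Random(seed)
    # Locations: tau_i = [ (L_1 + ... + L_i) / (L_1 + ... + L_{N+1}) * n ].
    L = [rng.uniform(1, 5) for _ in range(N + 1)]
    total = sum(L)
    taus, cum = [], 0.0
    for i in range(N):
        cum += L[i]
        taus.append(int(cum / total * n))
    # Jump sizes: delta_i = (2 B_i - 1)(q U_i + (1 - q)).
    deltas = []
    for _ in range(N):
        B = 1 if rng.random() < 0.5 else 0
        U = rng.gauss(0.0, 1.0)
        deltas.append((2 * B - 1) * (q * U + (1 - q)))
    # Piecewise-constant mean; delta_i = mu_{tau_i} - mu_{tau_i + 1}.
    mu, level, k = [], 0.0, 0  # starting level 0 is an assumption
    for t in range(1, n + 1):
        mu.append(level)
        if k < N and t == taus[k]:
            level -= deltas[k]
            k += 1

    def noise():
        z = rng.gauss(0.0, 1.0)
        if dist == "normal":
            return z
        # t(3) = Z / sqrt(V / 3) with V chi-square(3) = Gamma(shape 1.5, scale 2).
        return z / math.sqrt(rng.gammavariate(1.5, 2.0) / 3)

    x = [m + noise() for m in mu]
    return taus, mu, x


taus, mu, x = simulate(500, 5, seed=1)
print(taus)
```

Because every $L_j$ lies in $(1,5)$, consecutive change-points are at least $n/(5(N+1))$ apart, which is how this design keeps them from being too close to each other.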

4.3. Performance on Normal Data

In this case, because the error is normal, the QN process is not embedded into our algorithm. Table 1 leads to the following observations.

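Table 1: Distribution of $\hat N-N$ for the various competing algorithms and sample sizes (100 simulated sample paths), with BIC values and run times (normal error case).

| n | N | Method | ≤ -3 | -2 | -1 | 0 | 1 | 2 | ≥ 3 | BIC (×10^3) | Time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 500 | 5 | FSSR | 6 | 20 | 48 | 23 | 3 | 0 | 0 | 0.0662 | 0.0113 |
| 500 | 5 | SaRa | 0 | 1 | 5 | 14 | 22 | 32 | 26 | 0.1495 | 0.0231 |
| 500 | 5 | mSaRa | 48 | 30 | 11 | 10 | 0 | 1 | 0 | 0.1644 | 0.3802 |
| 3000 | 10 | FSSR | 1 | 5 | 37 | 45 | 10 | 2 | 0 | 0.1445 | 0.0583 |
| 3000 | 10 | SaRa | 0 | 0 | 2 | 13 | 7 | 2 | 76 | 0.6343 | 0.1339 |
| 3000 | 10 | mSaRa | 31 | 8 | 11 | 9 | 12 | 8 | 21 | 0.6626 | 1.3432 |
| 3000 | 15 | FSSR | 20 | 22 | 29 | 24 | 5 | 0 | 0 | 0.2644 | 0.0593 |
| 3000 | 15 | SaRa | 5 | 3 | 9 | 15 | 6 | 8 | 54 | 0.9871 | 0.1101 |
| 3000 | 15 | mSaRa | 88 | 6 | 2 | 1 | 1 | 0 | 2 | 0.8995 | 1.0129 |
| 5000 | 10 | FSSR | 0 | 1 | 19 | 67 | 10 | 3 | 0 | 0.1357 | 0.0794 |
| 5000 | 10 | SaRa | 0 | 0 | 1 | 10 | 4 | 2 | 83 | 0.9874 | 0.1869 |
| 5000 | 10 | mSaRa | 11 | 2 | 5 | 10 | 7 | 5 | 60 | 0.8727 | 2.3090 |
| 5000 | 20 | FSSR | 15 | 22 | 28 | 27 | 6 | 1 | 1 | 0.3125 | 0.1280 |
| 5000 | 20 | SaRa | 7 | 3 | 4 | 24 | 6 | 6 | 50 | 1.5053 | 0.2049 |
| 5000 | 20 | mSaRa | 86 | 1 | 2 | 2 | 1 | 0 | 8 | 1.4307 | 2.0621 |
| 5000 | 30 | FSSR | 44 | 6 | 9 | 16 | 7 | 9 | 9 | 0.5242 | 0.1317 |
| 5000 | 30 | SaRa | 30 | 8 | 12 | 20 | 10 | 5 | 15 | 3.1588 | 0.1973 |
| 5000 | 30 | mSaRa | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 1.9524 | 1.4564 |
| 8000 | 10 | FSSR | 0 | 0 | 25 | 55 | 17 | 3 | 0 | 0.1759 | 0.0920 |
| 8000 | 10 | SaRa | 0 | 0 | 6 | 9 | 5 | 6 | 74 | 1.9034 | 0.3024 |
| 8000 | 10 | mSaRa | 2 | 1 | 1 | 7 | 5 | 3 | 81 | 1.0359 | 2.3814 |
| 8000 | 20 | FSSR | 13 | 11 | 28 | 37 | 10 | 1 | 0 | 0.3645 | 0.2373 |
| 8000 | 20 | SaRa | 7 | 4 | 3 | 18 | 6 | 4 | 58 | 2.2859 | 0.4487 |
| 8000 | 20 | mSaRa | 64 | 8 | 7 | 3 | 2 | 0 | 16 | 1.8060 | 3.2446 |
| 8000 | 30 | FSSR | 12 | 4 | 19 | 11 | 12 | 7 | 35 | 0.5457 | 0.2480 |
| 8000 | 30 | SaRa | 17 | 3 | 11 | 32 | 5 | 3 | 29 | 3.9068 | 0.3850 |
| 8000 | 30 | mSaRa | 95 | 0 | 1 | 0 | 1 | 0 | 3 | 2.7051 | 2.5822 |
| 8000 | 50 | FSSR | 95 | 3 | 1 | 0 | 1 | 0 | 0 | 1.0258 | 0.3051 |
| 8000 | 50 | SaRa | 51 | 12 | 14 | 14 | 2 | 4 | 3 | 6.9394 | 0.3918 |
| 8000 | 50 | mSaRa | 99 | 1 | 0 | 0 | 0 | 0 | 0 | 3.3326 | 3.0239 |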

(1) FSSR has a significant speed advantage. For fixed $n$, the speed advantage of FSSR becomes more significant as $N$ becomes smaller. For fixed $N$, the speed advantage of FSSR becomes more significant as $n$ becomes larger. In summary, the sparser the change-points are, the more obvious the speed advantage of FSSR is.

(2) For fixed $n$, the consistency of change-point detection becomes better as $N$ becomes smaller. For example, the probability of $|\hat N-N|\le 1$ reaches 0.97 for FSSR when $n=8000$ and $N=10$.

(3) Under the BIC criterion, the change-point detection result of our FSSR algorithm always gives the best piecewise fit to the data, that is, the lowest BIC value in every setting of Table 1.

4.4. Robustness on t-Distribution

To investigate the behaviour of FSSR under heavy-tailed errors, we let the errors follow the $t$ distribution with 3 degrees of freedom.

Besides the advantages similar to the normal case, we get some new discoveries in Table 2.

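Table 2: Distribution of $\hat N-N$ for the various competing methods and sample sizes (100 simulated sample paths), with BIC values and run times ($t$ error case).

| n | N | Method | ≤ -3 | -2 | -1 | 0 | 1 | 2 | ≥ 3 | BIC (×10^3) | Time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 500 | 5 | FSSR | 23 | 21 | 27 | 20 | 8 | 0 | 1 | 0.3016 | 0.0266 |
| 500 | 5 | SaRa | 9 | 11 | 37 | 21 | 18 | 2 | 2 | 0.4719 | 0.0233 |
| 500 | 5 | mSaRa | 39 | 23 | 26 | 11 | 1 | 0 | 0 | 0.4054 | 0.2920 |
| 3000 | 10 | FSSR | 18 | 32 | 19 | 24 | 5 | 0 | 2 | 1.6737 | 0.1009 |
| 3000 | 10 | SaRa | 37 | 9 | 5 | 12 | 3 | 12 | 22 | 2.7135 | 0.1253 |
| 3000 | 10 | mSaRa | 47 | 11 | 8 | 8 | 8 | 3 | 15 | 2.0120 | 0.6151 |
| 3000 | 15 | FSSR | 55 | 28 | 12 | 2 | 2 | 1 | 0 | 1.6977 | 0.1240 |
| 3000 | 15 | SaRa | 68 | 3 | 9 | 5 | 2 | 1 | 12 | 3.1348 | 0.1283 |
| 3000 | 15 | mSaRa | 80 | 6 | 6 | 3 | 4 | 1 | 0 | 2.1958 | 0.6368 |
| 5000 | 10 | FSSR | 18 | 19 | 30 | 20 | 11 | 2 | 0 | 2.7748 | 0.1771 |
| 5000 | 10 | SaRa | 35 | 8 | 6 | 5 | 5 | 4 | 37 | 4.3186 | 0.2001 |
| 5000 | 10 | mSaRa | 38 | 4 | 8 | 10 | 6 | 4 | 30 | 3.3002 | 0.8318 |
| 5000 | 20 | FSSR | 66 | 23 | 9 | 7 | 1 | 0 | 0 | 2.7637 | 0.1921 |
| 5000 | 20 | SaRa | 77 | 4 | 4 | 1 | 6 | 2 | 6 | 5.0548 | 0.1948 |
| 5000 | 20 | mSaRa | 91 | 2 | 4 | 0 | 2 | 0 | 1 | 3.5509 | 0.8705 |
| 5000 | 30 | FSSR | 99 | 1 | 0 | 0 | 0 | 0 | 0 | 2.8936 | 0.2050 |
| 5000 | 30 | SaRa | 90 | 0 | 4 | 1 | 0 | 2 | 3 | 6.3680 | 0.1879 |
| 5000 | 30 | mSaRa | 99 | 1 | 0 | 0 | 0 | 0 | 0 | 4.2754 | 0.8828 |
| 8000 | 10 | FSSR | 10 | 15 | 37 | 27 | 9 | 2 | 0 | 4.4204 | 0.2831 |
| 8000 | 10 | SaRa | 17 | 5 | 3 | 5 | 6 | 5 | 59 | 6.8527 | 0.3153 |
| 8000 | 10 | mSaRa | 15 | 7 | 6 | 4 | 4 | 9 | 55 | 4.9348 | 1.3940 |
| 8000 | 20 | FSSR | 30 | 30 | 28 | 10 | 2 | 0 | 0 | 4.3621 | 0.3262 |
| 8000 | 20 | SaRa | 61 | 5 | 5 | 4 | 4 | 5 | 16 | 8.6200 | 0.3047 |
| 8000 | 20 | mSaRa | 76 | 4 | 4 | 5 | 2 | 1 | 8 | 5.6735 | 1.2837 |
| 8000 | 30 | FSSR | 93 | 5 | 2 | 0 | 0 | 0 | 0 | 4.4635 | 0.3484 |
| 8000 | 30 | SaRa | 90 | 2 | 4 | 1 | 1 | 0 | 2 | 10.5598 | 0.3003 |
| 8000 | 30 | mSaRa | 95 | 2 | 0 | 0 | 2 | 0 | 1 | 6.6622 | 1.1897 |
| 8000 | 50 | FSSR | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 4.7080 | 0.3608 |
| 8000 | 50 | SaRa | 98 | 1 | 0 | 1 | 0 | 0 | 0 | 12.0999 | 0.3109 |
| 8000 | 50 | mSaRa | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 7.3868 | 1.2347 |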

(1) Because the QN process is used, the speed advantage of FSSR is weakened, and FSSR is sometimes even slower than SaRa. Compared with mSaRa, the speed advantage of FSSR is still very obvious.

(2) Under the same conditions except for the error distribution, the consistency of change-point detection of all algorithms is not as good as that in Table 1.

5. Real Data

5.1. Application to Coriell Data

Several methods based on change-point (e.g., [7, 26]) have been widely studied and applied in copy number variation (CNV) detection.

As a source of genetic variation, copy number variation (CNV) plays an important role in phenotypic diversity and evolution. Moreover, many studies have shown that CNV is related to the pathogenic mechanism of some diseases, including cancer and schizophrenia. CNV usually refers to the deletion or amplification of a region of DNA sequence compared with a reference genome assembly. Recently, with the significant advances in DNA array technology, various techniques and platforms have been developed for analyzing DNA copy number, including array comparative genomic hybridization (aCGH), single nucleotide polymorphism (SNP) genotyping platforms, and next-generation sequencing, all of which provide large amounts of data. The goal of analyzing DNA copy number data is to divide the whole genome into segments such that the copy number varies between contiguous segments, and then to quantify the copy number of each segment. Hence, the target of change-point based methods is to identify the exact locations of copy number changes.

To demonstrate the efficiency and precision of FSSR, we use it to analyze the Coriell data set (available at http://www.nature.com/ng/journal/v29/n3/suppinfo/ng754S1.html), which was first studied by Snijders et al. This well-known data set has been widely used in evaluating CNV detection algorithms ([7, 11, 20, 23, 26, 43, 44], among others). The data consist of the logarithmic ratios of normalized intensities from disease versus control samples, indexed by the physical location of the probes on the genome. The goal is to identify segments of concentrated high or low log ratios. The data set comes from an experiment on 15 fibroblast cell lines. Each fibroblast cell line contains measurements for 2700 BACs (bacterial artificial chromosomes) spotted in triplicate. There are 15 chromosomes with partial alterations and 8 whole chromosomal alterations. All of these alterations but one (Chromosome 15 on GM07081) were confirmed by spectral karyotyping. As shown in Figure 5, we apply FSSR to four chromosomes: Chromosome 1 of GM13330, Chromosome 7 of GM07081, Chromosome 11 of GM05296, and Chromosome 14 of GM01750. In the diagram, the points are normalized log ratios, and the dashed lines are the locations of the change-points detected by our proposed method. As the results show, FSSR identifies all of the alterations in these four chromosomes. For the results of SaRa and other methods on this data set, see [7, 44], among others.

Figure 5: FSSR analysis of the fibroblast cell lines on Chromosome 1 of GM13330, Chromosome 7 of GM07081, Chromosome 11 of GM05296, and Chromosome 14 of GM01750.

5.2. Application to Electric Power System

In this section, we apply the proposed FSSR approach to a real industrial application in the electric power system. In this data analysis, the FSSR algorithm outperforms the SaRa and mSaRa algorithms.

In recent years, the electric distribution network (DN) has faced a new challenge from the integration of distributed generations (DGs), as distributed energy sources are connected to the power system. A reasonable and appropriate plan is needed to secure the DN in future years. However, in order to save cost, a few typical scenarios, which are used to guide planning in future years, have to be extracted from the massive set of existing scenarios. The power load data in the electric power system are typically time series, so typical scenario reduction can be treated as a change-point detection problem.

The real data are collected from the 220 kV grade DN of Sichuan province in China. Because the real data can only be stored for three months in practice, we take the data from April 20, 2016, 0:04:00, to May 31, 2016, 23:59:00. An observation is recorded every 5 minutes, and the sample size is $n=11802$.

We apply FSSR, SaRa, and mSaRa algorithms to the time series of the power data on two transformers, respectively. The results of active power and reactive power are presented in Figures 6 and 7, respectively. In Figures 6 and 7, the vertical line represents the location of change-point given by the algorithms.

Figure 6: FSSR compared with SaRa and mSaRa on the power time series data of transformer 1 for extracting typical scenarios, with vertical lines corresponding to the change-point locations given by the algorithms.

Figure 7: FSSR compared with SaRa and mSaRa on the power time series data of transformer 2 for extracting typical scenarios, with vertical lines corresponding to the change-point locations given by the algorithms.

Tables 3 and 4 show the fitting effect, the number of change-points selected, and the running time of the three algorithms. FSSR has the lowest BIC value and selects the smallest number of change-points, while its running time is almost as short as that of SaRa and clearly shorter than that of mSaRa.

Table 3: The performance of FSSR, SaRa, and mSaRa on transformer 1.

| Method | BIC (×10^4) | Number of change-points | Time |
|---|---|---|---|
| FSSR | 1.5014 | 44 | 0.3148 |
| SaRa | 1.6664 | 138 | 0.3055 |
| mSaRa | 1.5255 | 119 | 5.2266 |

Table 4: The performance of FSSR, SaRa, and mSaRa on transformer 2.

| Method | BIC (×10^3) | Number of change-points | Time |
|---|---|---|---|
| FSSR | 7.0256 | 42 | 0.2950 |
| SaRa | 7.2773 | 148 | 0.2214 |
| mSaRa | 7.7619 | 115 | 7.2700 |

6. Concluding Remarks

For multiple change-point detection problems, a method is mainly evaluated on two aspects: the criterion for detecting change-points and the design of the algorithm.

Regarding the detection criterion, most existing methods are based on maximizing a global CUSUM statistic (such as BS and CBS) or a local CUSUM statistic (such as SaRa and mSaRa). From Figure 3, we note that a change-point is not only a local maximum but should also be a local single peak of the CUSUM statistic. Therefore, our FSSR algorithm, which is based on single-peak recognition, is more robust than traditional methods that rely only on maximizing the CUSUM statistic. In addition, we apply QN to the raw data to further enhance robustness.

Regarding the algorithm design, a fast and efficient screening process is used, so that the subsegments likely to contain change-points can be selected at very low computational cost.

Finally, the proposed FSSR has a good performance compared to the comparable existing algorithms according to our simulation and practical application results.

Data Availability

The data used to support the findings of Subsection 5.1 are included within the article, and the data used to support the findings of Subsection 5.2 are included within the supplementary information file.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors’ Contributions

Youbo Liu provided the practical motivation for change-point detection and offered the real data for the application to the electric power system. Moreover, he provided many good suggestions for revising the manuscript.

Acknowledgments

This research project was supported by the National Natural Science Foundation of China (nos. 11471264, 11401148, and 51437003).

Supplementary Materials

The real data is the power data, which is collected from the integration of distributed generations of Sichuan province in China. The gathering time of the data is from April 20, 2016, 0:04:00 am, to May 31, 2016, 23:59:00 pm. An observation is recorded every 5 minutes, and the sample size is n = 11802. The data contains two parts: active power and reactive power according to the property of the data. Then, we apply FSSR, SaRa, and mSaRa algorithms to the time series of the power data on two transformers, respectively.

References

[1] Jarušková D. Change-point detection methods to environmental data. Environmetrics, 8(5), 469-483, 1997. doi:10.1002/(SICI)1099-095X(199709/10)8:5<469::AID-ENV265>3.0.CO;2-J
[2] Lu Q., Lund R., Lee T. C. An MDL approach to the climate segmentation problem. The Annals of Applied Statistics, 4(1), 299-319, 2010. doi:10.1214/09-AOAS289
[3] Jin H. J., Miljkovic D. An analysis of multiple structural breaks in US relative farm prices. Applied Economics, 42(25), 3253-3265, 2010. doi:10.1080/00036840802600368
[4] Caron F., Doucet A., Gottardo R. On-line changepoint detection and parameter estimation with application to genomic data. Statistics and Computing, 22(2), 579-595, 2012. doi:10.1007/s11222-011-9248-x
[5] Pezzatti G. B., Zumbrunnen T., Bürgi M., Ambrosetti P., Conedera M. Fire regime shifts as a consequence of fire policy and socio-economic development: An analysis based on the change point approach. Forest Policy and Economics, 29, 7-18, 2013. doi:10.1016/j.forpol.2011.07.002
[6] Yao Y.-C., Au S. T. Least-squares estimation of a step function. Sankhya: The Indian Journal of Statistics, Series A, 51(3), 370-381, 1989.
[7] Niu Y. S., Zhang H. The screening and ranking algorithm to detect DNA copy number variations. The Annals of Applied Statistics, 6(3), 1306-1326, 2012. doi:10.1214/12-AOAS539
[8] Fryzlewicz P. Wild binary segmentation for multiple change-point detection. The Annals of Statistics, 42(6), 2243-2281, 2014. doi:10.1214/14-AOS1245
[9] Ninomiya Y. Information criterion for Gaussian change-point model. Statistics & Probability Letters, 72(3), 237-247, 2005. doi:10.1016/j.spl.2004.10.037
[10] Yao Y. C. Estimating the number of change-points via Schwarz' criterion. Statistics & Probability Letters, 6(3), 181-189, 1988.
[11] Zhang N. R., Siegmund D. O. A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data. Biometrics, 63(1), 22-32, 2007. doi:10.1111/j.1541-0420.2006.00662.x
[12] Braun J. V., Braun R. K. Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation. Biometrika, 87(2), 301-314, 2000. doi:10.1093/biomet/87.2.301
[13] Bai J., Perron P. Computation and analysis of multiple structural change models. Journal of Applied Econometrics, 18(1), 1-22, 2003. doi:10.1002/jae.659
[14] Jackson B., Scargle J. D., Barnes D., Arabhi S., Alt A., Gioumousis P., Gwin E., Sangtrakulcharoen P., Tan L., Tsai T. T. An algorithm for optimal partitioning of data on an interval. IEEE Signal Processing Letters, 12(2), 105-108, 2005. doi:10.1109/LSP.2001.838216
[15] Antoch J., Jaruskova D. Testing for multiple change points. Computational Statistics, 28(5), 2161-2183, 2013. doi:10.1007/s00180-013-0401-1
[16] Killick R., Eckley I. A. Changepoint: An R package for changepoint analysis. Journal of Statistical Software, 58(3), 1-19, 2014.
[17] Maidstone R., Hocking T., Rigaill G., Fearnhead P. On optimal multiple changepoint algorithms for large data. Statistics and Computing, 27(2), 519-533, 2017. doi:10.1007/s11222-016-9636-3
[18] Maciak M., Mizera I. Regularization techniques in joinpoint regression. Statistical Papers, 57(4), 939-955, 2016. doi:10.1007/s00362-016-0823-2
[19] Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 58(1), 267-288, 1996.
[20] Huang T., Wu B., Lizardi P., Zhao H. Detection of DNA copy number alterations using penalized least squares regression. Bioinformatics, 21(20), 3811-3817, 2005. doi:10.1093/bioinformatics/bti646
[21] Tibshirani R., Wang P. Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics, 9(1), 18-29, 2008. doi:10.1093/biostatistics/kxm013
[22] Shen J., Gallagher C. M., Lu Q. Detection of multiple undocumented change-points using adaptive Lasso. Journal of Applied Statistics, 41(6), 1161-1173, 2014. doi:10.1080/02664763.2013.862220
[23] Li Q., Wang L. Robust change point detection method via adaptive LAD-LASSO. Statistical Papers, 1, 1-13, 2017.
[24] Venkatraman E. S. Consistency Results in Multiple Change-Point Situations [Ph.D. thesis]. Department of Statistics, Stanford University, 1992.
[25] Chen J., Gupta A. K. Statistical inference on covariance change points in Gaussian model. Statistics. A Journal of Theoretical and Applied Statistics, 38(1), 17-28, 2004. doi:10.1080/0233188032000158817
[26] Olshen A. B., Venkatraman E. S., Lucito R., Wigler M. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics, 5(4), 557-572, 2004. doi:10.1093/biostatistics/kxh008
[27] Lai W. R., Johnson M. D., Kucherlapati R., Park P. J. Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics, 21(19), 3763-3770, 2005. doi:10.1093/bioinformatics/bti611
[28] Batsidis A. Robustness of the likelihood ratio test for detection and estimation of a mean change point in a sequence of elliptically contoured observations. Statistics, 44(1), 17-24, 2010. doi:10.1080/02331880902758029
[29] Cho H., Fryzlewicz P. Multiple-change-point detection for high dimensional time series via sparsified binary segmentation. Journal of the Royal Statistical Society: Series B, 77(2), 475-507, 2015. doi:10.1111/rssb.12079
[30] Hao N., Niu Y. S., Zhang H. Multiple change-point detection via a screening and ranking algorithm. Statistica Sinica, 23(4), 1553-1572, 2013.
[31] Chen Z., Hu Y. Cumulative sum estimator for change-point in panel data. Statistical Papers, 58(3), 707-728, 2017. doi:10.1007/s00362-015-0722-y
[32] Cabrieto J., Tuerlinckx F., Kuppens P., Wilhelm F. H., Liedlgruber M., Ceulemans E. Capturing correlation changes by applying kernel change point detection on the running correlations. Information Sciences, 447, 117-139, 2018. doi:10.1016/j.ins.2018.03.010
[33] Chu C.-S. J. Time series segmentation: A sliding window approach. Information Sciences, 85(1-3), 147-173, 1995. doi:10.1016/0020-0255(95)00021-G
[34] Xiao F., Min X., Zhang H. Modified screening and ranking algorithm for copy number variation detection. Bioinformatics, 31(9), 1341-1348, 2015. doi:10.1093/bioinformatics/btu850
[35] Yau C. Y., Zhao Z. Inference for multiple change points in time series via likelihood ratio scan statistics. Journal of the Royal Statistical Society Series B, 78(4), 895-916, 2016. doi:10.1111/rssb.12139
[36] Song C., Min X., Zhang H. The screening and ranking algorithm for change-points detection in multiple samples. The Annals of Applied Statistics, 10(4), 2102-2129, 2016. doi:10.1214/16-AOAS966
[37] Diskin S. J., Hou C., Glessner J. T., Attiyeh E. F., Laudenslager M., Bosse K., Cole K., Mossé Y. P., Wood A., Lynch J. E., Pecor K., Diamond M., Winter C., Wang K., Kim C., Geiger E. A., McGrady P. W., Blakemore A. I. F., London W. B., Shaikh T. H., Bradfield J., Grant S. F. A., Li H., Devoto M., Rappaport E. R., Hakonarson H., Maris J. M. Copy number variation at 1q21.1 associated with neuroblastoma. Nature, 459(7249), 987-991, 2009. doi:10.1038/nature08035
[38] Kirov G. The role of copy number variation in schizophrenia. Expert Review of Neurotherapeutics, 10(1), 25-32, 2010. doi:10.1586/ern.09.133
[39] Ibáñez P., Bonnet A.-M., Débarges B., Lohmann E., Tison F., Pollak P., Agid Y., Dürr A., Brice P. A. Causal relation between α-synuclein gene duplication and familial Parkinson's disease. The Lancet, 364(9440), 1169-1171, 2004. doi:10.1016/s0140-6736(04)17104-3
[40] Lee J. A., Carvalho C. M. B., Lupski J. R. A DNA Replication Mechanism for Generating Nonrecurrent Rearrangements Associated with Genomic Disorders. Cell, 131(7), 1235-1247, 2007. doi:10.1016/j.cell.2007.11.037
[41] Redon R., Ishikawa S., Fitch K. R., Feuk L., Perry G. H., Andrews T. D., Fiegler H., Shapero M. H., Carson A. R., Chen W., Cho E. K., Dallaire S., Freeman J. L., González J. R., Gratacòs M., Huang J., Kalaitzopoulos D., Komura D., MacDonald J. R., Marshall C. R., Mei R., Montgomery L., Nishimura K., Okamura K., Shen F., Somerville M. J., Tchinda J., Valsesia A., Woodwark C., Yang F., Zhang J., Zerjal T., Zhang J., Armengol L., Conrad D. F., Estivill X., Tyler-Smith C., Carter N. P., Aburatani H., Lee C., Jones K. W., Scherer S. W., Hurles M. E. Global variation in copy number in the human genome. Nature, 444(7118), 444-454, 2006. doi:10.1038/nature05329
[42] Snijders A. M., Nowak N., Segraves R., Blackwood S., Brown N., Conroy J., Hamilton G., Hindle A. K., Huey B., Kimura K., Law S., Myambo K., Palmer J., Ylstra B., Yue J. P., Gray J. W., Jain A. N., Pinkel D., Albertson D. G. Assembly of microarrays for genome-wide measurement of DNA copy number. Nature Genetics, 29(3), 263-264, 2001. doi:10.1038/ng754
[43] Fridlyand J., Snijders A. M., Pinkel D., Albertson D. G., Jain A. N. Hidden Markov models approach to the analysis of array CGH data. Journal of Multivariate Analysis, 90(1), 132-153, 2004. doi:10.1016/j.jmva.2004.02.008
[44] Yin X.-L., Li J. Detecting copy number variations from array cgh data based on a conditional random field model. Journal of Bioinformatics and Computational Biology, 8(2), 295-314, 2010. doi:10.1142/S021972001000480X