
A Fast Screen and Shape Recognition (FSSR) algorithm is proposed with complexity down to

Change-point detection has been studied in various fields, including environmental science [

One class of classical methods estimates the number and locations of change-points by optimizing a fitting criterion, such as AIC [

To reduce the computational complexity, several stepwise approaches have been proposed. Since being proposed, LASSO [

The binary segmentation (BS) algorithm is another classical stepwise technique for multiple change-point detection, used in combination with a CUSUM statistic [

Recently, by using a sliding fixed window approach, Niu and Zhang [

Although the computational complexity of SaRa is down to

In this paper, we make two contributions. First, we show that our FSSR algorithm can make the computational complexity of the algorithm far less than

The paper is organized as follows. Our motivation is described in Section

Our motivation can be shown by Figure

A time series with four change-points.

Because two adjacent subsegments that contain no change-point share a common mean, the difference between the CUSUM statistics of these two subsegments should be small. On the other hand, if a small subsegment covers a change-point, the difference between the CUSUM statistics of this subsegment and its adjacent subsegment will be significant. We can therefore identify the subsegments that contain change-points by applying a suitable threshold. Let

In this section, we give a brief description of the FSSR.

First, for a given positive integer

Second, for each pair of two adjacent subsegments, the local CUSUM statistic is defined as follows.

Third, because the preceding screening step shows that most subsegments contain no change-point, we only need to search for change-points in each selected subsegment

A flow chart of FSSR algorithm is given in Figure

Flow chart of FSSR algorithm.
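The screen-then-search idea described above can be illustrated with a minimal Python sketch. The exact local statistic and threshold used by FSSR did not survive in this text, so the sketch makes its own assumptions: the screening statistic is taken to be the scaled absolute difference between the means of two adjacent subsegments of length `h`, and the refined search is a standard CUSUM maximization over each flagged pair. The names `cusum_argmax`, `screen_then_search`, `h`, and `threshold` are hypothetical, and the single-peak recognition step is not included.

```python
import numpy as np

def cusum_argmax(x):
    """Return the split t (1 <= t < n) maximizing the standard CUSUM statistic."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(1, n)
    csum = np.cumsum(x)
    left_mean = csum[:-1] / t
    right_mean = (csum[-1] - csum[:-1]) / (n - t)
    stat = np.sqrt(t * (n - t) / n) * np.abs(left_mean - right_mean)
    k = int(np.argmax(stat))
    return int(t[k]), float(stat[k])

def screen_then_search(y, h, threshold):
    """Screen adjacent length-h subsegments, then search only the flagged pairs."""
    y = np.asarray(y, dtype=float)
    m = len(y) // h                      # number of complete subsegments
    means = y[: m * h].reshape(m, h).mean(axis=1)
    # Assumed screening statistic: scaled difference of adjacent subsegment means.
    screen = np.sqrt(h / 2.0) * np.abs(np.diff(means))
    found = set()
    for j in np.flatnonzero(screen > threshold):
        start = int(j) * h
        # Refined search restricted to the two subsegments of the flagged pair.
        split, _ = cusum_argmax(y[start:start + 2 * h])
        found.add(start + split)         # index of the first point after the jump
    return sorted(found)

# Noise-free toy series with one jump of size 5 starting at index 130.
y = np.concatenate([np.zeros(130), 5.0 * np.ones(170)])
change_points = screen_then_search(y, h=20, threshold=3.0)
```

In this toy case the screening pass touches each observation once to form the subsegment means, and the more expensive CUSUM search runs only on the two flagged windows, which both locate the same jump.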

The good performance of the CUSUM statistic relies on the assumption of normally distributed errors. In practice, the data do not necessarily follow a normal distribution. Xiao et al. [

First, QN is used to make the data in each subsegment approximately normally distributed. In the FSSR procedure, we rank the data in each subsegment. Then a sample with the same size as each subsegment from the standard normal distribution

Second, single-peak recognition is used to enhance the robustness of the local maximizer. In most algorithms (such as BS, WBS, SaRa, and mSaRa), a local maximum principle and a threshold are used to confirm change-points. In practice, the result is very sensitive to the choice of threshold. From Figure

A sketch for local CUSUM statistic with 8 change-points.
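The rank-based QN step described above can be sketched in Python as follows. How the normal sample is matched to the ranked data is not fully stated in this text, so the sketch assumes the usual construction: the normal sample is sorted, and each observation is replaced by the normal value that has the same rank. The function name `quantile_normalize` and the use of NumPy's `default_rng` are illustrative choices, not taken from the paper.

```python
import numpy as np

def quantile_normalize(segment, rng):
    """Replace each value in a subsegment by the equally ranked standard normal draw."""
    segment = np.asarray(segment, dtype=float)
    # Rank of each observation within the subsegment (0 = smallest).
    ranks = np.argsort(np.argsort(segment, kind="stable"), kind="stable")
    # Sorted reference sample of the same size from N(0, 1).
    reference = np.sort(rng.standard_normal(len(segment)))
    return reference[ranks]

rng = np.random.default_rng(0)
x = np.array([10.0, -3.0, 250.0, 0.5])   # includes one extreme value
z = quantile_normalize(x, rng)
```

The transformed values keep the ordering of the original subsegment, while an extreme observation such as 250.0 is pulled back to the largest value of the normal sample.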

The time complexity of FSSR has two parts. First, the scan step only needs to calculate

Many papers show that SaRa and mSaRa outperform BS-type methods, such as Niu and Zhang [

Before conducting large-scale simulation experiments, we first demonstrate the implementation process and effect of our FSSR algorithm through an example. We consider an example with

Process of FSSR algorithm.

From this example, we can see that our FSSR algorithm can quickly and accurately find the change-points. In order to show more comparisons, we consider the normal error case in Section

Before presenting the detailed comparison, we give the simulation design.

First, the basic data are generated from the standard normal distribution and a Student

Second, the jump size of change-point

Third, we consider four sample sizes (

In this case, because the error is normal, the QN process is not embedded in our algorithm. From Table

Distribution of

| Sample Size | N | Method | ≤ -3 | -2 | -1 | 0 | 1 | 2 | ≥ 3 | BIC | Time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| n=500 | N=5 | FSSR | 6 | 20 | 48 | 23 | 3 | 0 | 0 | 0.0662× | 0.0113 |
| | | SaRa | 0 | 1 | 5 | 14 | 22 | 32 | 26 | 0.1495 | 0.0231 |
| | | mSaRa | 48 | 30 | 11 | 10 | 0 | 1 | 0 | 0.1644 | 0.3802 |
| n=3000 | N=10 | FSSR | 1 | 5 | 37 | 45 | 10 | 2 | 0 | 0.1445 | 0.0583 |
| | | SaRa | 0 | 0 | 2 | 13 | 7 | 2 | 76 | 0.6343 | 0.1339 |
| | | mSaRa | 31 | 8 | 11 | 9 | 12 | 8 | 21 | 0.6626 | 1.3432 |
| | N=15 | FSSR | 20 | 22 | 29 | 24 | 5 | 0 | 0 | 0.2644 | 0.0593 |
| | | SaRa | 5 | 3 | 9 | 15 | 6 | 8 | 54 | 0.9871 | 0.1101 |
| | | mSaRa | 88 | 6 | 2 | 1 | 1 | 0 | 2 | 0.8995 | 1.0129 |
| n=5000 | N=10 | FSSR | 0 | 1 | 19 | 67 | 10 | 3 | 0 | 0.1357 | 0.0794 |
| | | SaRa | 0 | 0 | 1 | 10 | 4 | 2 | 83 | 0.9874 | 0.1869 |
| | | mSaRa | 11 | 2 | 5 | 10 | 7 | 5 | 60 | 0.8727 | 2.3090 |
| | N=20 | FSSR | 15 | 22 | 28 | 27 | 6 | 1 | 1 | 0.3125 | 0.1280 |
| | | SaRa | 7 | 3 | 4 | 24 | 6 | 6 | 50 | 1.5053 | 0.2049 |
| | | mSaRa | 86 | 1 | 2 | 2 | 1 | 0 | 8 | 1.4307 | 2.0621 |
| | N=30 | FSSR | 44 | 6 | 9 | 16 | 7 | 9 | 9 | 0.5242 | 0.1317 |
| | | SaRa | 30 | 8 | 12 | 20 | 10 | 5 | 15 | 3.1588 | 0.1973 |
| | | mSaRa | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 1.9524 | 1.4564 |
| n=8000 | N=10 | FSSR | 0 | 0 | 25 | 55 | 17 | 3 | 0 | 0.1759 | 0.0920 |
| | | SaRa | 0 | 0 | 6 | 9 | 5 | 6 | 74 | 1.9034 | 0.3024 |
| | | mSaRa | 2 | 1 | 1 | 7 | 5 | 3 | 81 | 1.0359 | 2.3814 |
| | N=20 | FSSR | 13 | 11 | 28 | 37 | 10 | 1 | 0 | 0.3645 | 0.2373 |
| | | SaRa | 7 | 4 | 3 | 18 | 6 | 4 | 58 | 2.2859 | 0.4487 |
| | | mSaRa | 64 | 8 | 7 | 3 | 2 | 0 | 16 | 1.8060 | 3.2446 |
| | N=30 | FSSR | 12 | 4 | 19 | 11 | 12 | 7 | 35 | 0.5457 | 0.2480 |
| | | SaRa | 17 | 3 | 11 | 32 | 5 | 3 | 29 | 3.9068 | 0.3850 |
| | | mSaRa | 95 | 0 | 1 | 0 | 1 | 0 | 3 | 2.7051 | 2.5822 |
| | N=50 | FSSR | 95 | 3 | 1 | 0 | 1 | 0 | 0 | 1.0258 | 0.3051 |
| | | SaRa | 51 | 12 | 14 | 14 | 2 | 4 | 3 | 6.9394 | 0.3918 |
| | | mSaRa | 99 | 1 | 0 | 0 | 0 | 0 | 0 | 3.3326 | 3.0239 |
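As a reading aid for the table above, the short Python sketch below computes, for one setting (n=5000, N=10), the share of replications whose tabulated value falls in the columns -1, 0, or 1. The counts are copied from the table; the helper name `share_within_one` is hypothetical, and the sketch assumes only that the middle of the seven count columns is the one labeled 0.

```python
# Counts copied from the table above (n=5000, N=10, normal errors).
# The first and last entries are the two outer tail columns.
rows = {
    "FSSR":  [0, 1, 19, 67, 10, 3, 0],
    "SaRa":  [0, 0, 1, 10, 4, 2, 83],
    "mSaRa": [11, 2, 5, 10, 7, 5, 60],
}

def share_within_one(counts):
    """Fraction of replications in the three central columns (-1, 0, 1)."""
    centre = len(counts) // 2            # position of the column labeled 0
    return sum(counts[centre - 1 : centre + 2]) / sum(counts)

summary = {method: share_within_one(counts) for method, counts in rows.items()}
```

For this setting the shares are 0.96 for FSSR, 0.15 for SaRa, and 0.22 for mSaRa.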

To investigate the performance of FSSR under heavy-tailed errors, we let the errors follow the

Besides advantages similar to those in the normal case, we find some new results in Table

Distribution of

| Sample Size | N | Method | ≤ -3 | -2 | -1 | 0 | 1 | 2 | ≥ 3 | BIC | Time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| n=500 | N=5 | FSSR | 23 | 21 | 27 | 20 | 8 | 0 | 1 | 0.3016 | 0.0266 |
| | | SaRa | 9 | 11 | 37 | 21 | 18 | 2 | 2 | 0.4719 | 0.0233 |
| | | mSaRa | 39 | 23 | 26 | 11 | 1 | 0 | 0 | 0.4054 | 0.2920 |
| n=3000 | N=10 | FSSR | 18 | 32 | 19 | 24 | 5 | 0 | 2 | 1.6737 | 0.1009 |
| | | SaRa | 37 | 9 | 5 | 12 | 3 | 12 | 22 | 2.7135 | 0.1253 |
| | | mSaRa | 47 | 11 | 8 | 8 | 8 | 3 | 15 | 2.0120 | 0.6151 |
| | N=15 | FSSR | 55 | 28 | 12 | 2 | 2 | 1 | 0 | 1.6977 | 0.1240 |
| | | SaRa | 68 | 3 | 9 | 5 | 2 | 1 | 12 | 3.1348 | 0.1283 |
| | | mSaRa | 80 | 6 | 6 | 3 | 4 | 1 | 0 | 2.1958 | 0.6368 |
| n=5000 | N=10 | FSSR | 18 | 19 | 30 | 20 | 11 | 2 | 0 | 2.7748 | 0.1771 |
| | | SaRa | 35 | 8 | 6 | 5 | 5 | 4 | 37 | 4.3186 | 0.2001 |
| | | mSaRa | 38 | 4 | 8 | 10 | 6 | 4 | 30 | 3.3002 | 0.8318 |
| | N=20 | FSSR | 66 | 23 | 9 | 7 | 1 | 0 | 0 | 2.7637 | 0.1921 |
| | | SaRa | 77 | 4 | 4 | 1 | 6 | 2 | 6 | 5.0548 | 0.1948 |
| | | mSaRa | 91 | 2 | 4 | 0 | 2 | 0 | 1 | 3.5509 | 0.8705 |
| | N=30 | FSSR | 99 | 1 | 0 | 0 | 0 | 0 | 0 | 2.8936 | 0.2050 |
| | | SaRa | 90 | 0 | 4 | 1 | 0 | 2 | 3 | 6.3680 | 0.1879 |
| | | mSaRa | 99 | 1 | 0 | 0 | 0 | 0 | 0 | 4.2754 | 0.8828 |
| n=8000 | N=10 | FSSR | 10 | 15 | 37 | 27 | 9 | 2 | 0 | 4.4204 | 0.2831 |
| | | SaRa | 17 | 5 | 3 | 5 | 6 | 5 | 59 | 6.8527 | 0.3153 |
| | | mSaRa | 15 | 7 | 6 | 4 | 4 | 9 | 55 | 4.9348 | 1.3940 |
| | N=20 | FSSR | 30 | 30 | 28 | 10 | 2 | 0 | 0 | 4.3621 | 0.3262 |
| | | SaRa | 61 | 5 | 5 | 4 | 4 | 5 | 16 | 8.6200 | 0.3047 |
| | | mSaRa | 76 | 4 | 4 | 5 | 2 | 1 | 8 | 5.6735 | 1.2837 |
| | N=30 | FSSR | 93 | 5 | 2 | 0 | 0 | 0 | 0 | 4.4635 | 0.3484 |
| | | SaRa | 90 | 2 | 4 | 1 | 1 | 0 | 2 | 10.5598 | 0.3003 |
| | | mSaRa | 95 | 2 | 0 | 0 | 2 | 0 | 1 | 6.6622 | 1.1897 |
| | N=50 | FSSR | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 4.7080 | 0.3608 |
| | | SaRa | 98 | 1 | 0 | 1 | 0 | 0 | 0 | 12.0999 | 0.3109 |
| | | mSaRa | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 7.3868 | 1.2347 |

Several methods based on change-point (e.g., [

Generally, as a new source of genetic variation, copy number variation (CNV) plays an important role in phenotypic diversity and evolution. Moreover, many studies have shown that CNV is related to the pathogenicity mechanism of some diseases, including cancer, schizophrenia, and so on [

To demonstrate the efficiency and precision of FSSR, we use it to analyze the Coriell data set (download at

An FSSR analysis of the fibroblast cell line on Chromosome 1 of GM13330, Chromosome 7 of GM07081, Chromosome 11 of GM05296, and Chromosome 14 of GM01750.

In this section, we apply the proposed FSSR approach to a real industrial application in an electric power system. In the data analysis, the FSSR algorithm is seen to outperform the SaRa and mSaRa algorithms.

In recent years, the electric distribution network (DN) has faced a new challenge from the integration of distributed generations (DGs), following the connection of distributed energy sources to the power system. A reasonable and appropriate plan is needed to secure the DN in future years. However, to save cost, a few typical scenarios, which are used to guide planning for future years, must be extracted from the massive set of existing scenarios. Power load data in an electric power system are typically time series, so typical scenario reduction can be treated as a change-point detection problem.

The real data are collected from the 220 kV DN of Sichuan province in China. Because the real data can be stored for only three months in practice, we use the data from April 20, 2016, 0:04:00 am, to May 31, 2016, 23:59:00. An observation is recorded every 5 minutes; therefore the sample size is

We apply FSSR, SaRa, and mSaRa algorithms to the time series of the power data on two transformers, respectively. The results of active power and reactive power are presented in Figures

FSSR compared with SaRa and mSaRa applied in power time series data of transformer 1 to extract typical scenarios, with vertical lines corresponding to change-point locations given by the algorithms.

FSSR compared with SaRa and mSaRa applied in power time series data of transformer 2 to extract typical scenarios, with vertical lines corresponding to change-point locations given by the algorithms.

Tables

The performance of FSSR, SaRa, and mSaRa on transformer 1.

| Method | BIC | Number of Change-points | Time |
|---|---|---|---|
| FSSR | 1.5014 | 44 | 0.3148 |
| SaRa | 1.6664 | 138 | 0.3055 |
| mSaRa | 1.5255 | 119 | 5.2266 |

The performance of FSSR, SaRa, and mSaRa on transformer 2.

| Method | BIC | Number of Change-points | Time |
|---|---|---|---|
| FSSR | 7.0256 | 42 | 0.2950 |
| SaRa | 7.2773 | 148 | 0.2214 |
| mSaRa | 7.7619 | 115 | 7.2700 |

For multiple change-point detection problems, a method is mainly evaluated on two aspects: the criterion for detecting change-points and the design of the algorithm.

For the criterion of detecting change-points, most of the existing methods are based on the maximization criterion of global CUSUM statistic (such as BS and CBS) or local CUSUM statistic (such as SaRa and mSaRa). From Figure

In the algorithm design, a fast and efficient screening process is used, so that the approximate subsegments containing change-points can be selected at very low computational cost.

Finally, the proposed FSSR has a good performance compared to the comparable existing algorithms according to our simulation and practical application results.

The data used to support the findings of Subsection 5.1 are included within the article, and the data used to support the findings of Subsection 5.2 are included within the supplementary information file.

The author declares that there are no conflicts of interest regarding the publication of this paper.

Youbo Liu provided the practical motivation for change-point detection and supplied the real data for the electric power system application. He also offered many helpful suggestions for revising the manuscript.

This research project was supported by the National Natural Science Foundation of China (nos. 11471264, 11401148, and 51437003).

The real data are power data collected from the integration of distributed generations in Sichuan province, China. The data were gathered from April 20, 2016, 0:04:00 am, to May 31, 2016, 23:59:00. An observation is recorded every 5 minutes, and the sample size is n = 11802. According to their properties, the data comprise two parts: active power and reactive power. We then apply the FSSR, SaRa, and mSaRa algorithms to the time series of the power data on two transformers, respectively.