Evaluation of Performance of Different Methods in Detecting Abrupt Climate Changes

We compared and evaluated the performance of five methods for detecting abrupt climate changes using a time series with artificially generated abrupt characteristics. Next, we analyzed these methods using annual mean surface air temperature records from the Shenyang meteorological station. Our results show that the moving t-test (MTT), Yamamoto (YAMA), and LePage (LP) methods can correctly and effectively detect abrupt changes in means, trends, and dynamic structure; however, they cannot detect changes in variability. We note that the sample size of the subseries used in these tests can affect their results. When the sample size of the subseries ranges from one-quarter to three-quarters of the jump scale, these methods can effectively detect abrupt changes; they perform best when the sample size is one-half of the jump scale. The Cramer method can detect abrupt changes in the mean and trend of a series but not changes in variability or dynamic structure. Finally, we found that the Mann-Kendall test could not detect any type of abrupt change. We found no difference in the results of any of the methods following removal of the mean, creation of an anomaly series, or normalization. However, detrending and study period selection affected the results of the Cramer and Mann-Kendall methods; in the latter case, they could lead to a completely different result.


Introduction
Climate change includes not only continuous or gradual changes that can be described by a trend but also discontinuous or abrupt changes [1]. Effectively detecting and identifying points of abrupt change in climate records is important for understanding climate change, deducing causal relationships, and predicting future climate change.
Abrupt climate changes can be defined as abrupt shifts in climate from one stable state (or stable and continuous trend) to another stable state (or stable and continuous trend). These shifts are associated with changes in the statistical characteristics of climate variables in time and space. Common types of abrupt climate change include changes to the mean value, variability, seesaw behavior, and transitional change [2]. Since the theory of abrupt change was established by Thom in the 1960s [3], it has been widely used in various fields [4][5][6][7]. Beginning in the 1990s, many climate regime shifts have been detected in various regions of the world [8][9][10], and studies have continued to focus on characterizing such abrupt changes [11][12][13][14][15]. Abrupt climate changes have received extensive attention because of the abrupt shift identified in the North Pacific Ocean in 1977 [16,17]; this abrupt change could be related to a decadal-scale change in the Pacific Ocean [18]. Similar to the shift in the North Pacific Ocean, an abrupt change also occurred in the East Asian summer monsoon at the end of the 1970s [19,20]. Many studies have been conducted on abrupt climate changes in China. For example, Fu and Wang [21] analyzed the abrupt climate change in the South Asian summer monsoon in the 1920s and discussed three major abrupt climate shifts in the 20th century from several perspectives. Ding and Zhang [22] investigated trends and change points in temperature and precipitation on the Qinghai-Tibet Plateau and six other regions in China, finding that rapid warming in northeastern China occurred earlier than in other regions and that rapid changes on the Qinghai-Tibet Plateau lagged changes in the region north of the Yangtze River in eastern China. Similarly, Zhao and Xu [23] and Jia et al. [24] analyzed abrupt changes in climate in the region north of Lanzhou along the Yellow River and in the Hexi Corridor.
Since abrupt climate change has become the focus of many studies, several statistical methods have been used to detect abrupt climate changes and discontinuous points in climate data [25][26][27][28][29][30][31]. Karl and Riebsame [32] used Student's t-test to study abrupt climate change in the United States. Yamamoto et al. [33] analyzed abrupt changes in temperature in Japan using the signal-to-noise ratio. Goossens and Berger [34] used the Mann-Kendall method [35,36] to analyze global warming and abrupt climate changes during the 20th century. Other studies have compared the capability of various statistical methods to detect breakpoints. Easterling developed a new breakpoint detection method and compared it to several common statistical methods [37,38]. Rodionov compared the ability of the L method and the R method to detect several discontinuous points, showing that the two methods yielded relatively consistent results when the time series had no linear trend; when a time series had a linear trend, the R method yielded better results [39]. Reeves et al. [40] also systematically compared several breakpoint detection methods; they found that the two-phase regression and Sawa's Bayes methods performed best. In China, Zhang [41] also used several methods to detect abrupt climate change. However, these comparative analyses mainly addressed methods of detecting discontinuous points in meteorological data and focused on detecting abrupt changes in the mean. They did not systematically compare the methods traditionally used to detect abrupt climate changes or assess their ability to identify different types of abrupt climate change.
The climate system is nonlinear and nonstationary and has many components. Abrupt climate change has multidimensional characteristics and generally combines two or more types of abrupt change [2]. It is difficult to determine the presence of abrupt changes by simply analyzing measured data, and this limits the ability to compare the strengths and weaknesses of various detection methods. Therefore, in the present study, we constructed artificial time series with different types of abrupt climate changes. We used various methods to detect abrupt changes in these time series and explored their accuracy and effectiveness in identifying different types of abrupt change. We also analyzed the effects of data preprocessing on the performance of these methods. We aimed to comprehensively compare the performance of several methods in detecting abrupt climate changes and to provide a basis for selecting an appropriate method.
The paper is organized as follows. In Section 2, we briefly introduce the abrupt change detection methods and data used in this paper. Section 3 compares five abrupt change detection methods. Section 4 provides a summary and discussion.

Methods and Data
The most common methods for detecting abrupt climate changes are the moving t-test (MTT), Cramer's test (CRA), the Yamamoto test (YAMA) [33], LePage's test (LP) [42], and the Mann-Kendall test (MK) [43]. The MTT and CRA share similar principles; they both use the t-statistic to determine whether an abrupt change has taken place. The MTT detects abrupt change by determining whether there is a significant difference between the average values of two subseries. The CRA detects abrupt changes by determining the difference between a subseries and the whole series and concludes that abrupt change has taken place if the difference exceeds a chosen significance level. YAMA detects abrupt change in a series by defining a signal-to-noise ratio. This method has been used to detect abrupt changes in temperature, precipitation, and sunshine duration time series in Japan. The LP and MK methods are both nonparametric statistical methods. The LP method treats two subseries as two independent samples and determines whether abrupt change has occurred at the point separating them by evaluating whether the two subseries are significantly different. The MK method was initially used to detect the trend of a time series; following improvements, it is now also used for detecting abrupt changes in a time series. The strength of the MK method is that it does not require a time series to follow a certain distribution and it is not strongly affected by outliers. The first four methods depend to some extent on the choice of subseries and require several tests to improve the reliability of the results. Because the MK method uses an entire time series and does not require the artificial selection of subseries, it is widely applied.
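The three subseries-based parametric statistics described above can be sketched as follows. This is a minimal illustration that assumes the textbook forms of the statistics (a pooled two-sample t for the MTT, the ratio of the mean difference to the sum of the standard deviations for YAMA, and the usual Cramer transformation to a t-statistic); the function names, the equal window length n on both sides of a point, and the synthetic test series are assumptions made here, not details taken from the paper or from [44].

```python
import math
import random
import statistics


def moving_t(x, n):
    """t statistic between the n values before and the n values from point i on."""
    stats = {}
    for i in range(n, len(x) - n + 1):
        before, after = x[i - n:i], x[i:i + n]
        # Equal window sizes, so the pooled variance is the mean of the two variances.
        pooled_sd = math.sqrt((statistics.variance(before) + statistics.variance(after)) / 2)
        stats[i] = (statistics.mean(before) - statistics.mean(after)) / (pooled_sd * math.sqrt(2 / n))
    return stats


def yamamoto_snr(x, n):
    """Signal-to-noise ratio: |difference of means| / (sum of standard deviations)."""
    stats = {}
    for i in range(n, len(x) - n + 1):
        before, after = x[i - n:i], x[i:i + n]
        noise = statistics.stdev(before) + statistics.stdev(after)
        stats[i] = abs(statistics.mean(before) - statistics.mean(after)) / noise
    return stats


def cramer_t(x, n):
    """Cramer statistic for each n-point subseries, keyed by its start index."""
    total = len(x)
    mean_all, sd_all = statistics.mean(x), statistics.pstdev(x)
    stats = {}
    for start in range(total - n + 1):
        tau = (statistics.mean(x[start:start + n]) - mean_all) / sd_all
        stats[start] = tau * math.sqrt(n * (total - 2) / (total - n * (1 + tau * tau)))
    return stats


# Synthetic example (not the paper's series): a jump in the mean from 0 to 4 at index 100.
rng = random.Random(0)
series = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(4, 1) for _ in range(100)]
t_stat = moving_t(series, 50)
snr = yamamoto_snr(series, 50)
cra = cramer_t(series, 50)
peak = max(t_stat, key=lambda i: abs(t_stat[i]))
print("largest |t| at index", peak)
print("SNR at index 100:", round(snr[100], 2))
print("Cramer t for the first subseries:", round(cra[0], 2))
```

In each case a point is flagged when the statistic exceeds a chosen critical value, for example the two-sided 5% critical value of Student's t-distribution for the MTT and CRA statistics. Note that the MTT and YAMA compare two adjacent windows, whereas the CRA statistic compares one window with the whole series.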
The present study attempts to analyze the effectiveness of the five methods for detecting abrupt climate change and to evaluate the capability of each to detect different types of abrupt change. We aim to provide a guide that will help researchers select an appropriate method for detecting abrupt change. For further details about the methods, see Modern Climatological Statistic Diagnostic and Predictive Technologies [44].
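As a complement to the description of the nonparametric LP method, the sketch below computes a LePage statistic for two subseries. It assumes the standard definition (the sum of the squared standardized Wilcoxon rank-sum and Ansari-Bradley statistics, compared with a chi-square distribution with 2 degrees of freedom); the helper names are introduced here for illustration, and the implementation used in the paper may differ in detail.

```python
def average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def lepage(a, b):
    """LePage statistic for samples a and b (larger means more different)."""
    n1, n2 = len(a), len(b)
    total = n1 + n2
    ranks = average_ranks(list(a) + list(b))

    # Wilcoxon rank-sum part: sensitive to a shift in location.
    w = sum(ranks[:n1])
    w_mean = n1 * (total + 1) / 2
    w_var = n1 * n2 * (total + 1) / 12

    # Ansari-Bradley part: sensitive to a change in dispersion.
    ab = sum(min(r, total + 1 - r) for r in ranks[:n1])
    if total % 2 == 0:
        ab_mean = n1 * (total + 2) / 4
        ab_var = n1 * n2 * (total + 2) * (total - 2) / (48 * (total - 1))
    else:
        ab_mean = n1 * (total + 1) ** 2 / (4 * total)
        ab_var = n1 * n2 * (total + 1) * (total ** 2 + 3) / (48 * total ** 2)

    return (w - w_mean) ** 2 / w_var + (ab - ab_mean) ** 2 / ab_var


shifted = lepage([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
interleaved = lepage([1, 3, 5, 7], [2, 4, 6, 8])
print(shifted > 5.99, interleaved > 5.99)
```

The value 5.99 is the 5% critical value of the chi-square distribution with 2 degrees of freedom, which the statistic follows only approximately for subseries that are not too short. In a moving-window application, a and b would be the subseries immediately before and after each candidate point.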
The atmosphere is a complex system that is nonstationary and nonlinear. There are two major types of abrupt climate change. The first is characterized by abrupt change in climate statistics but no change in dynamic structure; that is, the climatic elements before and after a change point follow a similar distribution. Examples include abrupt changes in the mean value or standard deviation. The second type is abrupt change in the climate state that involves changes in the dynamic state of the climate system; in this type of abrupt change, the climatic elements may not have the same distribution before and after a change point. To assess the capability of each method for detecting abrupt climate changes, we constructed an artificial time series with a time span of 1000 years. We included abrupt changes in dynamics, mean, standard deviation, and tendency (Figure 1). The series includes two distributions: the logistic distribution and the normal distribution. Samples in [1, 100] and (800, 1000] follow a logistic model distribution [45], whereas samples in (100, 800] follow a normal distribution. The mean and standard deviation are M = 0 and SD = 1 (100-200); M = 2 and SD = 1 (200-300); M = 2 and SD = 4 (300-400); and M = 0 and SD = 1 (400-500). The linear trend of the normal distribution is 0.18/10 years (for samples 500-600) and −0.18/10 years (for samples 600-700). Thus, there are 8 abrupt changes in this time series: two changes in dynamic

Comparison of the Detection Capabilities of the Five Methods
Of the five methods for detecting abrupt change, the moving t-test (MTT), Cramer's test (CRA), Yamamoto test (YAMA), and LePage test (LP) use subseries of varying lengths to detect abrupt changes. Therefore, for these four methods, we report results for subseries with lengths of 50 and 100; we separately analyze the effects of subseries length on the ability to detect abrupt changes. The MTT, YAMA, CRA, and LP tests detect abrupt changes by constructing certain statistics. When a statistic passes a significance test, an abrupt change is considered to have occurred at a given point. Figure 2 shows the results of abrupt change detection in the constructed series using the MTT, YAMA, CRA, and LP methods for subseries lengths of 50 and 100. We note that for a subseries length of 50, the MTT, YAMA, and LP methods yield highly consistent results. Except for the abrupt change in standard deviation at t = 300, they detect all the other points of abrupt change, identifying the change in dynamic structure at t = 100 and 800; the change in trend at t = 500, 600, and 700; and the change in mean at t = 200 and 400. However, none of the three methods detects the change in standard deviation, and their ability to detect a simultaneous change in mean and standard deviation (at t = 400) is reduced. Compared with the MTT, YAMA, and LP methods, the CRA method yields poorer results. The CRA method cannot detect changes in dynamic structure (t = 100 and 800); although it detects the change in trend at t = 600 and 700, it fails to detect the change at t = 500. As for the abrupt changes in mean value at t = 200 and 400, the CRA method does not locate the change points correctly. Like the other three methods, the CRA method fails to detect the change in standard deviation. For a subseries length of 100, because the effective period of detection is shortened, none of the MTT, YAMA, and LP methods detects the abrupt change at t = 100. However, all three methods present a high value at this point, consistent with the results for a subseries length of 50. The three methods can detect the abrupt changes in mean and trend but detect the abrupt changes at t = 700 and 800 with lower accuracy. The CRA method can accurately detect the abrupt changes in mean at t = 200 and 400 and in trend at t = 500 and t = 700; however, it does not detect the changes in dynamic structure.
Figure 3 shows the abrupt changes in the artificially constructed time series detected using the MK method. In the MK method, if the value of UF or UB is greater than 0, the series has an increasing trend; if the value is smaller than 0, the series has a decreasing trend. When these statistics exceed the critical values, a significant increase or decrease is indicated, and the interval over which they exceed the critical value indicates the period in which the abrupt change occurred. If the UF and UB curves intersect and the point of intersection lies between the critical lines, the time at which they intersect is the time when an abrupt change began [44]. By these criteria, the MK test does not perform well in detecting abrupt changes. UF and UB do not intersect within the confidence interval, and the MK test fails to detect any abrupt changes in the artificially constructed time series. However, because the period under study has a large effect on the MK test's results, the MK test does detect some abrupt changes when different periods are selected for study. We discuss the effects of the study period on the results of the MK test in a later section.
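The UF and UB curves and the intersection criterion described above can be sketched as follows. The sketch assumes the commonly used sequential form of the Mann-Kendall test, in which UF is the standardized cumulative count of preceding smaller values and UB is obtained by applying the same calculation to the reversed series and changing the sign; the function names and the 1.96 critical value (5% level) are choices made here for illustration.

```python
import math


def mk_forward(x):
    """Standardized sequential Mann-Kendall statistic for x[0..k], k = 0..len(x)-1."""
    uf = [0.0]
    count = 0
    for k in range(1, len(x)):
        # Number of earlier values that are smaller than x[k], accumulated over k.
        count += sum(1 for j in range(k) if x[k] > x[j])
        m = k + 1
        expected = m * (m - 1) / 4
        variance = m * (m - 1) * (2 * m + 5) / 72
        uf.append((count - expected) / math.sqrt(variance))
    return uf


def mk_sequential(x):
    """Return the forward (UF) and backward (UB) curves on the original time axis."""
    uf = mk_forward(x)
    ub = [-u for u in mk_forward(x[::-1])][::-1]
    return uf, ub


def crossings(uf, ub, critical=1.96):
    """Indices where UF and UB cross while both lie between the critical lines."""
    points = []
    for k in range(1, len(uf)):
        d_prev, d_now = uf[k - 1] - ub[k - 1], uf[k] - ub[k]
        if d_prev * d_now < 0 and abs(uf[k]) < critical and abs(ub[k]) < critical:
            points.append(k)
    return points


# A purely increasing series: UF and UB cross, but outside the critical lines,
# so no abrupt change is reported under the criterion above.
trend = list(range(10))
uf, ub = mk_sequential(trend)
print(round(uf[-1], 3), crossings(uf, ub))
```

Because every value of UF and UB depends on all data in the chosen period, changing the start or end of the period changes both curves, which is consistent with the sensitivity to study period noted in the text.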
In summary, of the five methods used to detect abrupt change, the MTT, YAMA, and LP methods are more effective, with more consistent and accurate results. All three methods can detect abrupt changes in the mean, dynamic structure, and trend, though they cannot detect abrupt changes in standard deviation. The CRA method cannot detect abrupt changes in dynamic structure or standard deviation; it can identify changes in the mean and trend, though it has lower accuracy when detecting change in the mean. Of the five methods, the MK test is least effective, failing to detect any abrupt changes in the artificial series.

Effect of Subseries Length on the Detection of Abrupt Changes.
In the previous section, we showed that the MTT, YAMA, and LP methods are sensitive to the choice of subseries length. To further elucidate how to select an appropriate subseries length when testing for abrupt changes, we use the YAMA method as an example in this section. In an artificially constructed time series of the same length (1000), one abrupt change in mean value takes place every 100 years, as shown in Figure 4. Therefore, there are 9 abrupt changes in the mean of the series, at t = 100, 200, 300, 400, 500, 600, 700, 800, and 900.
Figure 5 shows the results of using 9 different subseries lengths. We note that, for a shorter subseries length, there are more peaks in the signal-to-noise ratio given by the YAMA method. In contrast, for a longer subseries length, the curve is smoother and there are fewer peak values. When the subseries length is 10 (1/10 of the dimension of abrupt change, that is, of the jump scale of 100), the YAMA method can only detect abrupt changes at t = 800 and 900; when the subseries length increases to 15, the method also detects the changes at t = 100 and 300, and the peak values at 200, 400, and 500 begin to emerge, though they do not pass the significance test. When the subseries length is 25 (1/4 of the dimension of abrupt change), all the abrupt changes are detected except those at t = 400, 600, and 700, and the testing efficiency begins to rise. When the subseries length is 50 (1/2 of the dimension of abrupt change), the YAMA method can accurately detect all the change points and has a testing efficiency of 100%. When the subseries length is 75 (3/4 of the dimension of abrupt change), the YAMA method can still detect all the change points but its recognition is poorer than at a length of 50; in particular, the abrupt change at t = 600 would be easily missed if this point were not known to be a change point in the artificially constructed time series. At a subseries length of 85, the testing curve is smoother; the peak value at t = 600 disappears, the peak values at t = 200, 300, 500, and 700 are not obvious, and the difficulty of identifying change points increases. When the subseries length is 100, the effective testing period is shortened and the abrupt changes at 100 and 900 are no longer within the scope of the test. Except for an obvious peak value at t = 800, the peak values of other change points are not clear. When the subseries length is 125 (exceeding the dimension of abrupt change), the peak values of the testing curve begin to drift; the peak values at 200 and 300 appear at approximately t = 275, and the peak values at 400 and 500 appear at approximately t = 430.
We conclude that subseries length has a significant effect on the testing efficiency and the ability to determine where abrupt changes occur. When the subseries length is 1/2 of the dimension of abrupt change, the testing efficiency of the YAMA method is maximized and the difficulty in identifying abrupt changes is minimized. For other subseries lengths between 1/4 and 3/4 of the dimension of abrupt change, the testing efficiency is lower but points of abrupt change can still be accurately determined. When the subseries length is less than 1/4 or greater than 3/4 of the dimension of abrupt change, the detection efficiency is reduced. In particular, when the subseries length is greater than the dimension of abrupt change, the effective test period is shortened and the probability of misrecognition is increased. Therefore, the subseries length should be from 1/4 to 3/4 of the dimension of abrupt changes in the series, with the best length being 1/2 of the dimension of abrupt change.
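The selection rule stated above reduces to simple arithmetic, illustrated in the short sketch below. The function names and the three labels are hypothetical and introduced only for this example; the jump scale of 100 and the listed subseries lengths are those discussed in the text.

```python
def subseries_length_guidance(jump_scale):
    """Return the (shortest, best, longest) recommended subseries lengths."""
    return jump_scale // 4, jump_scale // 2, 3 * jump_scale // 4


def rate_length(n, jump_scale):
    """Classify a subseries length n relative to the jump scale."""
    ratio = n / jump_scale
    if ratio == 0.5:
        return "best"
    if 0.25 <= ratio <= 0.75:
        return "usable"
    return "reduced efficiency"


print(subseries_length_guidance(100))
for n in (10, 15, 25, 50, 75, 85, 100, 125):
    print(n, rate_length(n, 100))
```

In practice the jump scale of an observed series is not known in advance, so a rule of this kind can only be applied after an initial estimate of the spacing between change points, for example from tests with several trial lengths.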

Effects of Data Preprocessing on the Detection of Abrupt Changes.
In the above section, we analyzed the effectiveness and reliability of five methods for detecting various types of abrupt changes in an artificially constructed time series. To further understand the effects of data preprocessing (detrending, normalization, and selection of the study period) on the results of the five methods, we analyze measured meteorological data from the Shenyang station. The previous section emphasizes the effect of subseries length on the MTT, CRA, YAMA, and LP tests. A longer subseries resulted in a shorter effective testing period. These results imply that datasets must be of a sufficient length to reliably detect abrupt changes [39]. Because the Shenyang station has a longer record compared to the artificially constructed time series, it is easy to use these data to analyze the effects of detrending and normalization on the detection of change points. Therefore, the Shenyang station has been used as an example for analysis. Figure 6 shows the original series of average temperature in Shenyang from 1906 to 2013, an anomaly series (relative to the period 1961-1990), a normalized series (relative to the period 1961-1990), and a detrended series.
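The three derived series mentioned above can be produced along the following lines. This is a sketch under stated assumptions: the anomaly and normalized series use the mean and sample standard deviation of the 1961-1990 base period, and detrending subtracts an ordinary least-squares line fitted to the whole record (so the detrended series also has zero mean). The paper does not specify these details, the function names are hypothetical, and the input values below are synthetic, not the Shenyang observations.

```python
import statistics


def base_values(values, years, base):
    """Values whose year falls inside the inclusive base period."""
    return [v for v, y in zip(values, years) if base[0] <= y <= base[1]]


def anomaly(values, years, base=(1961, 1990)):
    """Subtract the base-period mean from every value."""
    ref_mean = statistics.mean(base_values(values, years, base))
    return [v - ref_mean for v in values]


def normalize(values, years, base=(1961, 1990)):
    """Subtract the base-period mean and divide by the base-period standard deviation."""
    ref = base_values(values, years, base)
    ref_mean, ref_sd = statistics.mean(ref), statistics.stdev(ref)
    return [(v - ref_mean) / ref_sd for v in values]


def detrend(values):
    """Subtract the least-squares straight line fitted against the time index."""
    n = len(values)
    t_mean, v_mean = (n - 1) / 2, statistics.mean(values)
    slope = (sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [v - (v_mean + slope * (t - t_mean)) for t, v in enumerate(values)]


# Synthetic stand-in for an annual temperature record: a pure linear warming.
years = list(range(1906, 2014))
temps = [7.0 + 0.01 * (y - 1906) for y in years]
anom = anomaly(temps, years)
norm = normalize(temps, years)
detr = detrend(temps)
print(len(years), round(max(abs(v) for v in detr), 6))
```

Removing the mean and normalizing are affine transformations with a positive scale factor, whereas detrending changes the shape of the series; this difference is relevant to the results reported in the following subsections.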

Effects of Detrending on Different Methods of Detecting Abrupt Changes.
Figure 7 shows abrupt changes detected in the original and the detrended Shenyang series by the five different methods. As in the above analysis of abrupt change points, the subseries length affects the results of the MTT, CRA, YAMA, and LP methods; here, we define a subseries length as a period of 10 years.
The results demonstrate that, for the original series, the MTT, CRA, YAMA, and LP methods detected abrupt changes in temperature in 1917, whereas the MK method did not identify a change point at this time. In addition to the obvious abrupt change in 1917, the CRA method detected a change point between 2004 and 2007. Because of the selection of subseries, the period 2004-2013 is missing for the MTT, YAMA, and LP methods, and therefore, they do not identify an abrupt change during this period. We note that although the MK method tests the entire time period, it does not detect an abrupt change during this period either. Therefore, further study is required to determine whether an abrupt change occurred in approximately 2005. In the detrended series, the MTT method can still detect the abrupt change in temperature near 1917; the YAMA and LP methods also show a peak near 1917, but it does not pass the significance test. Detrending has a greater effect on the CRA method than on the MTT, YAMA, and LP methods. Throughout the test period, the CRA method does not detect any change points in the detrended series. Finally, whereas the MK method does not detect any change points in the original series, it does detect a change point near 1915 in the detrended series; this change point is earlier than the points detected by the other methods. The UF and UB curves also intersect at approximately 1908 and 2010, but the period before and after these intersections is too short to define an abrupt change. We note that the MTT, CRA, YAMA, and LP methods all detect an abrupt change in temperature in approximately 1917. However, this change point may result from climate variation or from the relocation of the station, instrument replacement, or a change in the time of observation. The purpose of this study was not to differentiate between human and natural causes of abrupt change but, rather, to compare the ability of five different methods to detect abrupt changes and to assess the effects of detrending.

Effects of Normalization on Different Methods of Detecting Abrupt Changes.
To analyze the effects of removing the mean and normalization on the detection of abrupt changes, we compared the abrupt changes detected in the original Shenyang temperature series, an anomaly series (with respect to the period 1961-1990), and a normalized series (see Figure 8). The results are similar for all three series, suggesting that removing the mean and normalizing the series do not affect the detection of abrupt changes.
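One way to see why this outcome is expected is the short derivation below. It is a sketch in notation introduced here (a constant offset c and a positive scale d, subseries means and standard deviations with subscripts 1 and 2, and whole-series mean and standard deviation without subscript) and it assumes the standard forms of the statistics rather than the exact formulas used in the paper.

```latex
\begin{align}
x'_i &= \frac{x_i - c}{d}, \quad d > 0
  &&\text{(anomaly: } d = 1\text{; normalization: } d = s_{\text{base}}\text{)} \\
\bar{x}'_1 - \bar{x}'_2 &= \frac{\bar{x}_1 - \bar{x}_2}{d}, \qquad
  s'_k = \frac{s_k}{d} \quad (k = 1, 2) \\
R'_{SN} &= \frac{\lvert \bar{x}'_1 - \bar{x}'_2 \rvert}{s'_1 + s'_2}
  = \frac{\lvert \bar{x}_1 - \bar{x}_2 \rvert}{s_1 + s_2} = R_{SN} \\
\tau' &= \frac{\bar{x}'_1 - \bar{x}'}{s'} = \frac{\bar{x}_1 - \bar{x}}{s} = \tau
\end{align}
```

The factor 1/d cancels in the same way in the two-sample t-statistic, so the MTT, YAMA, and CRA statistics are unchanged. The LP and MK statistics depend only on the ranks of the data, and ranks are preserved by any transformation with d > 0. Detrending is not of this form, which is consistent with its different effect on the results.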

Effects of the Study Period on Different Methods of Detecting Abrupt Changes.
We selected two study periods, 1906-2013 and 1961-2013, to assess the effect of changing the study period on the detection of abrupt changes in the Shenyang time series. Figure 9 shows that the MTT, YAMA, and LP tests are not sensitive to the choice of study period, giving similar statistical results for both periods. The CRA method yields similar curves for the different time periods, but the statistical results vary. For the period 1906-2013, the CRA method detects abrupt changes in 1917 and 2004, but for the period 1961-2013, it identifies abrupt changes in 1972 and 2004. The choice of study period strongly affects the results of the MK method. For the period 1906-2013, the MK method fails to detect any change points; for the period 1961-2013, the MK method detects a change point in approximately 1977. This change point is not identified by the other methods.
To evaluate whether this study period effect always applies to the results of the MK test, we applied the MK method to the artificially constructed series shown in Figure 1 and selected 8 different study periods for analysis. The results (Figure 10) show that different study periods indeed exert a strong effect on the results of the MK test. For a study period of 1-200, the MK test identifies the abrupt change in dynamic structure at t = 100; for study periods of 1-300 and 1-400, the MK test detects the abrupt change in mean at t = 200; for study periods of 1-600 and 1-700, the MK test detects the abrupt change in trend at t = 600, though with low accuracy; for study periods of 1-800 and 1-900, the method detects the abrupt changes in trend at t = 600 and t = 500. We conclude that selection of the study period has a strong effect on the results of the MK test. Different study periods can lead to completely different results, a conclusion consistent with previous studies [46].

Summary and Discussion
In this study, we constructed an artificial time series with different abrupt change characteristics. We used this series to assess the strengths and weaknesses of five common methods for detecting abrupt climate change: the moving t-test, Yamamoto, LePage, Cramer's, and Mann-Kendall methods. We also used a temperature series from the Shenyang station to analyze the effects of data preprocessing on the five abrupt change detection methods. Our results show the following:
(1) The moving t-test (MTT), Yamamoto (YAMA), and LePage (LP) methods are more effective and more accurate; the Cramer (CRA) and Mann-Kendall (MK) methods are less effective and accurate. The MTT, YAMA, and LP methods can accurately detect abrupt changes in mean, trend, and dynamic structure in a time series, though they cannot detect abrupt changes in standard deviation. For these three methods, the results are more sensitive to the method used than to the subseries length. The CRA method cannot detect abrupt changes in dynamic structure or standard deviation but performs relatively well in identifying abrupt changes in mean and trend. The MK method is very sensitive to the selection of study period, and therefore, the method is relatively ineffective and its results are inconsistent. Compared to the other four methods (MTT, CRA, YAMA, and LP), the MK method performs the worst in detecting abrupt changes when there are two or more change points in a time series. This poor performance may occur because the MK method is based on two series of ranked values that are sorted in forward and reverse directions. When a series includes two or more change points, the ranked series may be affected by the change points, causing the method to fail.
(2) The chosen subseries length affects the effectiveness of the MTT, YAMA, and LP methods; longer subseries lengths imply shorter effective testing periods. When the subseries length ranges between 1/4 and 3/4 of the dimension of abrupt change in a series, the methods can effectively detect the change points; subseries that are too long or too short increase the probability of omission. Therefore, the subseries length should be within 1/4-3/4 of the dimension of abrupt change in the series; ideally, 1/2 of the dimension of abrupt change should be chosen.
(3) Different data preprocessing methods have a relatively minor influence on the results of the MTT, YAMA, and LP methods. Detrending reduces the statistical values of the CRA method and may give rise to completely different results for the MK method.
Removal of the mean and normalization had no effect on the results of any of the five methods. The choice of study period had no effect on the MTT, YAMA, and LP methods but caused inconsistent results for the CRA method. Because the MK method strongly depends on the selected study period, different study periods led to completely different results. Relocation of the Shenyang meteorological station may have caused some of the observed change points in the time series. The station was relocated in 1970, 1976, 1989, and 2006. However, the MTT, CRA, YAMA, and LP methods indicate that an abrupt change occurred in 1917, suggesting that this change point is not related to these relocations. The goal of this study was to compare five methods of detecting abrupt climate changes, and we therefore do not focus on the causes of the change points detected in our analyses. These causes will be discussed in a future study.
In this study, we compared the effectiveness of five different abrupt change detection methods in an attempt to guide researchers in selecting an appropriate abrupt change detection method. We note, though, that our results are preliminary and largely based on an artificially constructed time series. Because the climate system is complex and nonlinear, it may include a variety of abrupt changes on different timescales. Therefore, accurately determining abrupt change points in climatic data series requires the use of historical data and climate information. Many new methods of detecting abrupt change have been developed in recent years [47][48][49]. He et al. [50] used approximate entropy and detrended fluctuation analysis to analyze various types of abrupt changes in dynamic structure. Feng et al. [51] used the heuristic segmentation algorithm to detect abrupt climate change. These studies achieved relatively good results. The abrupt climate change detection methods analyzed in the present study are all traditional statistical methods. In a future study, we will compare these traditional methods with newer methods to provide a more reliable theoretical basis for the study of abrupt climate change.

Figure 1: Time series with different abrupt characteristics generated artificially.

Figure 3: Abrupt changes detected in the artificial time series by the Mann-Kendall test.

Figure 4: Artificially generated time series with abrupt changes every 100 time steps.

Figure 5: Abrupt changes in the artificial time series detected using different subseries lengths.

Figure 6: Time series of temperature data from the Shenyang station from 1906 to 2013: (a) original series, (b) anomalous series, (c) normalized series, and (d) detrended series.

Figure 7: Effects of detrending on abrupt changes detected by five methods: (a) MTT, (b) CRA, (c) YAMA, (d) LP, (e) MK for the original series, and (f) MK for the detrended series.

Figure 8: Effects of removing the mean and normalization on five detection methods: (a) MTT, (b) CRA, (c) YAMA, (d) LP, and (e) MK for the original series and (f) MK for the normalized series.