Set Pair Analysis Based on Phase Space Reconstruction Model and Its Application in Forecasting Extreme Temperature

To improve the precision of time series forecasting, a set pair analysis based on phase space reconstruction (SPA-PSR) model is established. In the new model, chaos analysis is used to reconstruct the phase space with a delay time and an embedding dimension. On this basis, the history sets and current sets of the SPA-PSR model are rebuilt. Two cases of forecasting extreme temperature, at Mount Wutai and Datong, are taken to examine the performance of the SPA-PSR model. The results indicate that the mean relative error (MRE) of the SPA-PSR model decreases by 65.97%, 59.32%, and 7.79% in the case of Mount Wutai and by 29.11%, 32.82%, and 9.03% in the case of Datong, compared with the autoregression (AR) model, the Back-Propagation (BP) neural network model, and the rank set pair analysis (R-SPA) model, respectively. The model gives theoretical support to set pair analysis and improves the precision of numerical forecasting.


Introduction
The global and regional climates have already begun changing [1], and as an important factor of climate change, temperature plays a significant role in daily human life [2,3]. It is important to forecast extreme temperature accurately [4]. The process of temperature change is usually nonlinear, complex, and dynamic, so the accurate prediction of extreme temperature faces a high degree of scientific uncertainty, which traditional deterministic mathematical models cannot resolve well. Numerical simulation methods handle the problem better [4][5][6].
The autoregression (AR) model is the traditional method for time series forecasting [7]; for example, Bańbura et al. used Bayesian vector autoregressions for commercial forecasting [8]. In recent years, artificial neural network (ANN) algorithms have been widely used for forecasting meteorological objects [9]. Based on the genetic algorithm (GA) and the particle swarm algorithm, Yang designed Back-Propagation (BP) neural networks to establish a multifactor time series forecasting model [10]. At the same time, the set pair analysis (SPA) model, which is easy to operate and gives good prediction results, is also popular in meteorological forecasting [5,6]. Yang et al. proposed the set pair analysis based on similarity forecast (SPA-SF) model for forecasting the process of water resources change, and the application results showed that the statistical and physical concepts of SPA-SF were distinct and its precision was high [9]. Recently, Mei et al. used SPA to find an optimal choice of bioretention media [11], and Guo et al. employed it to assess ecoenvironment quality for uncertain problems [12]. Because the SPA model does not provide a unified standard for quantifying the set element symbols, the rank set pair analysis (R-SPA) model was proposed [5,9,10], and its results proved to be better.
However, for SPA (including R-SPA) [5,9], there is no accurate method to determine the dimension of the history sets and current sets, which is used to calculate the connection degree and affects the prediction results. Moreover, after finding the most similar history set, the SPA model uses its subsequent value as the forecast value of the current set, but it gives no satisfying theoretical justification for doing so. Phase space reconstruction, based on chaotic time series analysis, is a recent development for dealing with nonlinear time series [13][14][15][16][17]. It describes the system's dynamic behavior well by applying nonlinear dynamics theory and fractal theory. In particular, chaotic systems are dynamical systems that defy synchronization. They are ubiquitous in nature, and most of them do not have an explicit dynamical equation and can be understood only through the available time series [18]. Under some circumstances, such processes can create time series that appear to be completely random; the corollary of this is that some seemingly random series are in fact chaotic and thus to a certain extent predictable [14]. In 2012, Khatibi et al. studied chaos in river time series [16], and Di et al. discussed the chaos control and synchronization of a nonlinear system [17]. Based on multiple criteria decision making (MCDM), Yang et al. presented a chaotic Bayesian method for forecasting nonlinear hydrological time series [19]. Compared with the add-weighted one-rank local-region method (AOLM), the method improves the forecast accuracy of daily runoffs. By analyzing the chaos of time series, She and Yang also applied a new adaptive local linear prediction method to hydrological time series [20].
As discussed above, to obtain a more accurate numerical forecast of extreme temperature, a set pair analysis based on phase space reconstruction (SPA-PSR) model is proposed in this paper by using chaotic time series analysis: we use Takens embedding theory [21] to rebuild the history sets and current sets in SPA, so that the parameter (the dimension of the history sets) can be calculated and both kinds of sets have a theoretical meaning for prediction. Two cases of forecasting extreme temperature, at the Mount Wutai and Datong stations, are taken to examine the efficiency of the SPA-PSR model.

The Method of Phase Space Reconstruction (PSR)
For a scalar time series $x_1, x_2, \ldots, x_N$, according to Takens embedding theory [21], the phase space can be reconstructed as a multidimensional space, and the coordinate delay method is commonly used. The constructed $m$-dimensional state vector $X_i$ is

$$X_i = \left(x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau}\right), \tag{1}$$

where $N$ is the length of the time series, $\tau$ is the delay time, $m$ is the embedding dimension, $i = 1, 2, \ldots, M$, $M$ is the total number of points in the phase space, and $M = N - (m-1)\tau$.
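The coordinate delay method can be illustrated with a short Python sketch. The function name `delay_embed` and the toy series are illustrative assumptions, not code from the original study; the sketch only builds the delay vectors and counts them.

```python
def delay_embed(series, m, tau):
    """Build the delay vectors (x_i, x_{i+tau}, ..., x_{i+(m-1)tau}).

    `m` is the embedding dimension and `tau` the delay time (both assumed
    to be positive integers here).
    """
    count = len(series) - (m - 1) * tau  # number of points in the phase space
    if count <= 0:
        raise ValueError("series too short for this m and tau")
    return [tuple(series[i + j * tau] for j in range(m)) for i in range(count)]


vectors = delay_embed([1, 2, 3, 4, 5, 6], m=3, tau=1)
print(len(vectors))  # 4
print(vectors[0])    # (1, 2, 3)
```

With a series of length 44, dimension 3, and delay 1, the same function returns 42 vectors, which matches the point count used later in Case 1.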

The Determination of Delay Time.
In this study, we use the autocorrelation function to determine the delay time $\tau$ [22][23][24]. For the time series $x_1, x_2, \ldots, x_N$, the autocorrelation coefficient function with lag time $\tau$ is [20,25,26]

$$C(\tau) = \frac{1}{N-\tau}\sum_{i=1}^{N-\tau}\frac{(x_i-\mu)(x_{i+\tau}-\mu)}{\sigma^{2}}, \tag{2}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the time series, respectively. Plotting $C(\tau)$ against $\tau$, the delay time $\tau$ is selected as the lag at which the autocorrelation coefficient has dropped to $(1 - 1/e)$ [22] of its initial value ($e$ is the base of the natural logarithm).
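As a hedged illustration of the delay-time rule above, the following Python sketch computes a sample autocorrelation coefficient and returns the first integer lag at which it has fallen to the stated fraction of its lag-0 value. The normalization (covariance over `n - lag` terms divided by the population variance) and the names `autocorrelation` and `choose_delay` are assumptions made for this example.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation coefficient of `series` at a non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series) / n
    if variance == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    ) / (n - lag)
    return covariance / variance


def choose_delay(series, max_lag, fraction=1 - 1 / math.e):
    """Return the first integer lag whose coefficient is at or below
    `fraction` of the lag-0 value, or None if no lag up to max_lag qualifies."""
    initial = autocorrelation(series, 0)  # equals 1 with this normalization
    for lag in range(1, max_lag + 1):
        if autocorrelation(series, lag) <= fraction * initial:
            return lag
    return None


alternating = [1.0, -1.0] * 10
print(choose_delay(alternating, max_lag=5))  # 1
```

Because only integer lags are scanned, a curve that crosses the threshold before lag 1 yields a delay of 1, which is the choice made for the annual temperature series in both cases.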
To obtain good prediction results, we choose the minimum prediction error method [27] to determine the embedding dimension.
For a scalar time series $x_1, x_2, \ldots, x_N$, reconstruct it with (1). According to Takens embedding theory, when $\tau$ and $m$ are the best delay time and best embedding dimension, respectively, there exists a mapping $f: R^m \to R^m$ such that

$$X_{i+1} = f(X_i). \tag{3}$$

In this case, the average prediction error (the maximum prediction error or another prediction error can also be used)

$$E(m,\tau) = \frac{1}{M-1}\sum_{i=1}^{M-1}\left\|X_{i+1} - X_{\eta(i)+1}\right\| \tag{4}$$

should be at its minimum. In formula (4), $\eta(i)$ is determined by finding the nearest neighbor point $X_{\eta(i)}$ of $X_i$:

$$\left\|X_{\eta(i)} - X_i\right\| = \min_{j \neq i,\; j \le M-1}\left\|X_j - X_i\right\|, \tag{5}$$

where $\|\cdot\|$ denotes the norm used to measure distance in the reconstructed phase space. Because of the largest Lyapunov exponent and noise, the value of $E(m,\tau)$ is heavily influenced as the embedding dimension $m$ increases. We therefore let $m$ increase from two and take the first minimum point of the $E(m,\tau)$-$m$ curve as the best embedding dimension [27].
The PSR algorithm is shown as follows.
(4) Find the first minimum value of $E(m, \tau)$; the corresponding value of $m$ is the embedding dimension.
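The minimum prediction error idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the error is taken as the Euclidean distance between the true successor of each delay vector and the successor of its nearest neighbour, and the helper names (`prediction_error`, `first_local_minimum`, `choose_embedding_dimension`) are hypothetical.

```python
import math


def delay_embed(series, m, tau):
    """Delay vectors (x_i, x_{i+tau}, ..., x_{i+(m-1)tau})."""
    count = len(series) - (m - 1) * tau
    return [tuple(series[i + j * tau] for j in range(m)) for i in range(count)]


def prediction_error(series, m, tau):
    """Average one-step nearest-neighbour prediction error for given m and tau."""
    vectors = delay_embed(series, m, tau)
    usable = len(vectors) - 1  # the last vector has no successor to compare with
    if usable < 2:
        raise ValueError("not enough delay vectors")
    total = 0.0
    for i in range(usable):
        # Nearest neighbour of vector i among the other vectors with a successor.
        nearest = min(
            (k for k in range(usable) if k != i),
            key=lambda k: math.dist(vectors[i], vectors[k]),
        )
        total += math.dist(vectors[i + 1], vectors[nearest + 1])
    return total / usable


def first_local_minimum(errors):
    """Given {m: error}, scan m upward and return the first m whose error
    does not decrease at the next m (the largest m if the error keeps falling)."""
    dims = sorted(errors)
    for current, following in zip(dims, dims[1:]):
        if errors[current] <= errors[following]:
            return current
    return dims[-1]


def choose_embedding_dimension(series, tau, max_m):
    """Evaluate the error for m = 2, ..., max_m and pick the first minimum."""
    errors = {m: prediction_error(series, m, tau) for m in range(2, max_m + 1)}
    return first_local_minimum(errors), errors


print(first_local_minimum({2: 5.0, 3: 3.0, 4: 4.0, 5: 1.0}))  # 3
```

Separating `first_local_minimum` from the error computation makes the selection rule easy to check by hand on a small table of errors.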

Verification of the Chaos of Time Series.
In order to verify whether the time series is chaotic, we calculate the largest Lyapunov exponent of the studied time series by using the delay time and embedding dimension determined above [22]. The largest Lyapunov exponent is defined by the average divergence rate of nearest neighbors:

$$d_j(i) = C_j e^{\lambda i},$$

where $d_j(i)$ represents the distance between the $j$th point and its nearest neighbor after $i$ time units, and

$$d_j(0) = \min_{\hat{j}}\left\|X_j - X_{\hat{j}}\right\|,$$

where $|j - \hat{j}| > p$ and $C_j$ is the initial distance. Taking logarithms gives $\ln d_j(i) = \ln C_j + \lambda i$, so the slope $\lambda$ represents the largest Lyapunov exponent, which can be calculated by the least squares method.
If the largest Lyapunov exponent $\lambda$ is greater than 0, the time series is chaotic. If not, Takens embedding theory does not apply, and the set pair analysis based on phase space reconstruction (SPA-PSR) model cannot be used.
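The verification step can be sketched in Python along the lines of a nearest-neighbour divergence estimate: each delay vector is paired with its nearest neighbour outside a minimum index separation, the mean log-distance is tracked over a few steps, and the slope is fitted by least squares. The function names, the `min_separation` and `max_steps` parameters, and the logistic-map test series are assumptions for illustration only; the original study's settings beyond those stated in the text are not known.

```python
import math


def delay_embed(series, m, tau):
    """Delay vectors (x_i, x_{i+tau}, ..., x_{i+(m-1)tau})."""
    count = len(series) - (m - 1) * tau
    return [tuple(series[i + j * tau] for j in range(m)) for i in range(count)]


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def largest_lyapunov(series, m, tau, min_separation, max_steps):
    """Slope of the mean log-distance between neighbour pairs versus step."""
    vectors = delay_embed(series, m, tau)
    usable = len(vectors) - max_steps  # both points must be followable
    pairs = []
    for j in range(usable):
        candidates = [k for k in range(usable) if abs(j - k) > min_separation]
        if candidates:
            nearest = min(candidates, key=lambda k: math.dist(vectors[j], vectors[k]))
            pairs.append((j, nearest))
    steps, mean_logs = [], []
    for i in range(max_steps + 1):
        distances = [math.dist(vectors[j + i], vectors[k + i]) for j, k in pairs]
        logs = [math.log(d) for d in distances if d > 0]  # skip exact coincidences
        if logs:
            steps.append(i)
            mean_logs.append(sum(logs) / len(logs))
    if len(steps) < 2:
        raise ValueError("not enough data to fit a slope")
    return least_squares_slope(steps, mean_logs)


# Illustrative chaotic series: the logistic map x -> 4x(1 - x).
x, series = 0.1, []
for _ in range(300):
    series.append(x)
    x = 4.0 * x * (1.0 - x)

lam = largest_lyapunov(series, m=2, tau=1, min_separation=2, max_steps=4)
print(lam > 0)
```

A positive slope is read as evidence of chaos in the sense used above; for short, noisy series the estimate is sensitive to `max_steps` and should be treated with caution.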

Set Pair Analysis Based on Phase Space Reconstruction (SPA-PSR) Model
By analyzing the chaotic time series, the reconstructed phase space is applied to set pair analysis. The set pair analysis based on phase space reconstruction (SPA-PSR) model is as follows.
Step 1 (the phase space reconstruction). According to the methods described in Section 2, the reconstructed phase space is established, and the corresponding state vector $X_i$ is

$$X_i = \left(x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau}\right).$$

According to Takens embedding theory, when $\tau$ and $m$ are the best delay time and best embedding dimension, respectively, another corresponding smooth function $f_1: R^m \to R$ exists:

$$x_{i+(m-1)\tau+1} = f_1(X_i),$$

where $f_1$ represents the state transformation of the studied time series in the $m$-dimensional space.
Step 2 (rebuilt set pair $(B_k, B)$). According to the similarity principle, the SPA model uses the subsequent values of the history sets to predict future values. Based on the phase space reconstruction with delay time $\tau$ and embedding dimension $m$ in Step 1, we obtain the dimension $T$ and rebuild the history sets and the current set: the dimension is $T = m$, the history sets are $B_k = (x_k, x_{k+\tau}, \ldots, x_{k+(m-1)\tau})$, $k = 1, 2, \ldots, M-1$, and the current set is $B = (x_M, x_{M+\tau}, \ldots, x_{M+(m-1)\tau})$; all are shown in Table 1.
Then, we apply the rank transformation to the rebuilt sets $B_1, B_2, \ldots, B_{M-1}$. If some elements have the same rank, we assign them their average rank and round off the value. We thus obtain the rank sets $B'_1, B'_2, \ldots, B'_{M-1}$.
By combining the set $B'$ with each of the $M - 1$ sets $B'_k$ separately, we get the rank set pairs $(B'_k, B')$. The pairs $(B'_k, B')$ are the rebuilt set pairs in the constructed phase space. For simplicity, we let $(B_k, B)$ denote $(B'_k, B')$.
Step 3 (the calculation of connection degree). The connection degree formula of $B_k$, $B$ is [5,9]

$$\mu_{B_k-B} = \frac{S}{N} + \frac{F}{N}\,i + \frac{P}{N}\,j, \tag{12}$$

where $N$ is the number of elements in set $B_k$ or $B$, $S$ represents the number of identical elements, $P$ represents the number of contrary elements, $F$ represents the number of discrepant elements, and $i$ and $j$ represent the discrepancy degree and contrary degree, respectively. Use $d(x_k, x) = |x_k - x|$, $(x_k, x) \in (B_k, B)$, to describe the differences between the sets $(B_k, B)$; the codomain of $d(x_k, x)$ is $[0, N-1]$. So $\mu_{B_k-B}$ is calculated as follows.
According to the three principles above, take $N = m$; using the connection degree formula (12), $\mu_{B_k-B}$ can be obtained.
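The rank transformation and connection degree calculation can be sketched in Python as below. Several details are assumptions made for illustration and are labeled as such in the code: ties are rounded half up, a rank difference of 0 counts as identical, a difference larger than the set size minus 2 counts as contrary, everything else counts as discrepant, and the discrepancy and contrary coefficients default to 0.5 and -1. None of these values is confirmed by the text above.

```python
def rank_transform(values):
    """Rank from 1 (smallest); ties share their average rank, rounded half up."""
    ordered = sorted(values)
    ranks = []
    for value in values:
        first = ordered.index(value) + 1
        last = first + ordered.count(value) - 1
        ranks.append(int((first + last) / 2 + 0.5))
    return ranks


def connection_degree(history, current, i_coef=0.5, j_coef=-1.0):
    """Connection degree of a history set and the current set.

    ASSUMPTIONS (not taken from the source text): the thresholds on the rank
    difference d and the default coefficients i_coef and j_coef.
    """
    if len(history) != len(current):
        raise ValueError("sets must have the same number of elements")
    n = len(current)
    identical = discrepant = contrary = 0
    for a, b in zip(rank_transform(history), rank_transform(current)):
        d = abs(a - b)
        if d == 0:
            identical += 1
        elif d > n - 2:  # assumed threshold for "contrary"
            contrary += 1
        else:
            discrepant += 1
    return identical / n + discrepant / n * i_coef + contrary / n * j_coef


print(rank_transform([20.1, 18.5, 20.1]))             # [3, 1, 3]
print(connection_degree([10, 12, 14], [11, 13, 15]))  # 1.0
```

Two sets with the same rank pattern reach the maximum degree of 1 even when their values differ, which is why a separate scaling weight is applied at the prediction step.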
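Step 4 (the determination of similar sets and prediction). According to the connection degree maximum principle [5,9], the sets $B_k$ most similar to $B$ are chosen from all the history sets. Thus, the forecast value of $x_{M+(m-1)\tau+1}$, namely $x_{N+1}$, is $\hat{x}_{N+1}$:

$$\hat{x}_{N+1} = \frac{1}{n}\sum_{k} w_k\, x_{k+(m-1)\tau+1},$$

where the sum runs over the $n$ chosen similar sets $B_k$ and $w_k$ denotes the ratio of the mean of the elements in $B$ to the mean of the elements in $B_k$.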
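A minimal Python sketch of the prediction step follows. It assumes that the connection degrees have already been computed, that every history set tied for the maximum degree is treated as similar, and that the forecast is the plain average of the weighted subsequent values; the function name `forecast_next` and the tie tolerance are hypothetical choices for this example.

```python
def forecast_next(history_sets, subsequent_values, current_set, degrees,
                  tolerance=1e-12):
    """Forecast the value following `current_set`.

    Each selected history set contributes its subsequent value scaled by
    w_k = mean(current_set) / mean(history_set). Averaging the contributions
    is an assumption of this sketch.
    """
    best = max(degrees)
    current_mean = sum(current_set) / len(current_set)
    weighted = []
    for history, following, degree in zip(history_sets, subsequent_values, degrees):
        if best - degree <= tolerance:  # keep sets tied for the maximum degree
            weight = current_mean / (sum(history) / len(history))
            weighted.append(weight * following)
    return sum(weighted) / len(weighted)


history_sets = [(10, 12, 14), (20, 22, 24)]
subsequent_values = [16, 26]
current_set = (11, 13, 15)
print(forecast_next(history_sets, subsequent_values, current_set, [1.0, 0.2]))
```

In this toy call only the first history set is selected, so the forecast is its subsequent value 16 scaled by the mean ratio 13/12.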

Case 1
(1) Study Area and Data Description. The SPA-PSR model is applied to predicting the highest July temperature at Mount Wutai for 1956∼2010; these data are relatively accurate in measurement, easy to obtain, and useful in practice [32,33]. We use the temperatures of 1956-1999 as known data and thus obtain the time series $x_1, x_2, \ldots, x_{44}$. We then forecast the highest July temperature at Mount Wutai for 2000-2010.
(2) The Phase Space Reconstruction. For the scalar time series of the highest July temperature at Mount Wutai, $x_1, x_2, \ldots, x_{44}$, we use the autocorrelation method and the minimum prediction error method to calculate the corresponding delay time and embedding dimension. Figure 1 shows the autocorrelation of the highest July temperature at Mount Wutai. When the autocorrelation coefficient has dropped to $(1 - 1/e)$ of its initial value, the corresponding $\tau < 1$, which indicates that the best delay time is less than 1. In reality, however, the highest July temperature is observed once per year, which means that $\tau \geq 1$; thus, we take the nearest admissible value as the delay time, that is, $\tau = 1$.
From Figure 2, it is clear that when $\tau = 1$ and $m = 3$, the prediction error function $E(m, \tau)$ attains its first minimum point, which indicates that $m = 3$ is the best embedding dimension for the reconstruction of the phase space. Taking $p = 2$, the largest Lyapunov exponent is $\lambda = 0.00086 > 0$, which indicates that the time series of Mount Wutai is chaotic and satisfies Takens embedding theory.
In sum, with the delay time $\tau = 1$ and embedding dimension $m = 3$, the phase space of the highest July temperature at Mount Wutai is reconstructed, and the corresponding state vector is $X_i = (x_i, x_{i+1}, x_{i+2})$.
(3) Set Pair Analysis Based on Phase Space Reconstruction (SPA-PSR). Based on the reconstruction of the phase space in formula (1), the number of reconstructed points is $M = N - (m-1)\cdot\tau = 44 - 2 = 42$, and the rebuilt set pairs $(B_k, B)$ of the Mount Wutai highest July temperature series $x_1, x_2, \ldots, x_{44}$ are shown in Table 1. Moreover, to verify the prediction performance of the SPA-PSR model, we take the R-SPA model, AR model, and BP model for comparison. For comparability, the AR model [7] uses the temperatures in 1946∼1999 in every 6 years to obtain the linear regression equation for Mount Wutai, and this equation is used to predict the corresponding data in 2000∼2010; in the BP model [10], the temperatures in 1956∼1999 are used as the training set and the data in 2000∼2010 as the test set, with 500 training iterations and a learning rate of 0.3; the R-SPA model [9] uses the previous 6 years as the history sets to predict the temperatures in 2000∼2010.
Compared with the results of the AR model, R-SPA model, and BP model, the SPA-PSR model has the smallest deviation (RE) from the measured values over those 11 years, as shown in Table 2. The AR model and BP model have no predictions with relative errors below 10%, while the R-SPA model and SPA-PSR model have 5 and 8, respectively. According to these points, the SPA-PSR model gives the better forecasting result.
To make this clear, the precision of prediction is evaluated by two measurement indices in this paper, namely, the mean relative error (MRE) and the mean absolute error (MAE). The results are reported in Table 3.
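The two indices can be computed as in the following Python sketch. The text does not spell out the formulas, so the usual definitions are assumed here (absolute error divided by the observed value for the relative error), and `relative_decrease` is a hypothetical helper showing how a percentage reduction of one model's error against another's would be obtained. The numbers are toy values, not data from the study.

```python
def mean_relative_error(observed, predicted):
    """Mean of |predicted - observed| / |observed| (assumed definition of MRE)."""
    return sum(abs(p - o) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


def mean_absolute_error(observed, predicted):
    """Mean of |predicted - observed| (assumed definition of MAE)."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def relative_decrease(reference, improved):
    """Fraction by which `improved` is lower than `reference`."""
    return (reference - improved) / reference


observed = [20.0, 25.0]   # toy values
predicted = [22.0, 24.0]  # toy values
print(mean_absolute_error(observed, predicted))  # 1.5
print(round(mean_relative_error(observed, predicted), 4))  # 0.07
```

Under these definitions, an MRE that falls from 0.20 to 0.07 corresponds to a relative decrease of 65%.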
Table 3 shows that the AR model and BP model do not give good forecasting results for Mount Wutai: their MREs are all above 20% and their MAEs are above 50, while for R-SPA and SPA-PSR the prediction error is much lower. Compared with the AR model, BP model, and R-SPA model, the MRE of the SPA-PSR model is relatively decreased by 65.97%, 59.32%, and 7.79%, respectively, and its MAE is also the lowest, relatively decreased by 65.71%, 59.32%, and 6.42%. Both measurement indices indicate that the SPA-PSR model has the best forecasting results.

Case 2
(1) Study Area and Data Description. We also apply the SPA-PSR model to predict the highest July temperature in Datong for 2000∼2010, with data available for 1955-2010.
(2) The Phase Space Reconstruction. For the scalar time series of the highest July temperature in Datong, $x_1, x_2, \ldots, x_{45}$, the same methods are used to reconstruct the phase space. The corresponding delay time and embedding dimension are shown in Figures 3 and 4.
From Figure 3, we find that, as for Mount Wutai, the autocorrelation coefficient for Datong decreases sharply from $\tau = 0$; when it has dropped to $(1 - 1/e)$ of its initial value, the corresponding $\tau < 1$, so in the same way we take $\tau = 1$ as the delay time. In Figure 4, when $\tau = 1$ and $m = 5$, the prediction error function $E(m, \tau)$ attains its first minimum point, so the best embedding dimension for Datong is $m = 5$. Taking $p = 2$, the largest Lyapunov exponent is $\lambda = 0.00024 > 0$, which indicates that the Datong time series is chaotic and satisfies Takens embedding theory.
In sum, with the delay time $\tau = 1$ and embedding dimension $m = 5$, the phase space of the highest July temperature in Datong is reconstructed, and the corresponding state vector is $X_i = (x_i, x_{i+1}, x_{i+2}, x_{i+3}, x_{i+4})$. As for Mount Wutai, the set pairs are rebuilt according to the phase space reconstruction and used to forecast the highest July temperature in Datong for 2000-2010.
(3) The Prediction Results and Analysis. Prediction results of the highest temperature (P) in Datong in July 2000∼2010 and the corresponding relative errors (RE) are shown in Table 4. The computational methods of the other three models are similar to those in Case 1.
Compared with the results of the AR model, R-SPA model, and BP model, the SPA-PSR model generally has the smallest deviation (RE) from the measured values over those 11 years. The numbers of relative errors below 10% in the four models are, respectively, 6, 6, 8, and 8, and the majority of the SPA-PSR forecasts have relative errors below 5%. According to these points, the SPA-PSR model is better suited to forecasting.
Table 5 indicates that the AR model and BP model give better forecasting results for Datong than for Mount Wutai, since the corresponding MRE is within 10% and the MAE is also lower. Moreover, the BP model is not better than the AR model; the reason is likely the training function or other factors. At the same time, both indices indicate that the SPA-PSR model has the best forecasting results. Compared with the AR model, BP model, and R-SPA model, the MRE of the SPA-PSR model is relatively decreased by 29.11%, 32.82%, and 9.03%, respectively, and the MAE is relatively decreased by 29.93%, 32.56%, and 9.53%.

Conclusions
To obtain the optimum prediction results, a set pair analysis based on phase space reconstruction (SPA-PSR) model is proposed, in which the phase space is reconstructed with the embedding dimension and the delay time, and prediction results are obtained by combining SPA with the phase space reconstruction method. Two cases of forecasting the highest July temperature, at the Mount Wutai and Datong stations, are studied by using the new model. The main conclusions are as follows.
(1) By combining chaotic time series analysis and SPA in the SPA-PSR model, we obtain a clearer depiction of the complex nonlinear dynamical behavior of the original extreme temperature forecasting system, which is a good foundation for nonlinear forecasting.
(2) Case applications indicate that the SPA-PSR model gives better results for forecasting the highest July temperature at the Mount Wutai and Datong stations. In the two cases, compared with the AR model, BP model, and R-SPA model, the MRE of the SPA-PSR model is relatively decreased by 65.97%, 59.32%, and 7.79% in the case of Mount Wutai and by 29.11%, 32.82%, and 9.03% in the case of Datong, respectively, and the MAE is relatively decreased by 65.71%, 59.32%, and 6.42% in the case of Mount Wutai and by 29.93%, 32.56%, and 9.53% in the case of Datong, respectively. The results indicate that the SPA-PSR model gives the better forecasting values.
(3) Owing to the wide use of SPA [5,6] and the ubiquity of chaotic time series [18], the SPA-PSR model can also be used to predict other nonlinear time series, which will be studied further.

Figure 1: Autocorrelation coefficient in Case 1 (the dashed line marks where the autocorrelation function value is $(1 - 1/e)$ of its initial value).
Prediction error function $E(m, \tau)$

Figure 3: The autocorrelation coefficient in Case 2 (the dashed line marks where the autocorrelation function value is $(1 - 1/e)$ of its initial value).

Table 1: The rebuilt history sets $B_k$ and the current set $B$.

Table 2: Prediction results and the corresponding relative errors (REs).

Table 3: The error analysis in four models.

Table 4: Prediction results and the corresponding relative errors (REs).

Table 5: The error analysis in four models.