Autoregressive Prediction with Rolling Mechanism for Time Series Forecasting with Small Sample Size

Reasonable prediction is of significant practical value for the analysis of stochastic and unstable time series with small or limited sample size. Motivated by the rolling idea in grey theory and the practical relevance of very short-term forecasting or 1-step-ahead prediction, a novel autoregressive (AR) prediction approach with rolling mechanism is proposed. In the modeling procedure, a newly developed AR equation, which can be used to model nonstationary time series, is constructed in each prediction step. Meanwhile, the data window for the next step-ahead forecast rolls on by adding the most recently derived prediction result while deleting the first value of the previously used sample data set. This rolling mechanism is an efficient technique because of its improved forecasting accuracy, its applicability to limited and unstable data, and its small computational effort. The general performance, influence of sample size, nonlinearity dynamic mechanism, and significance of the observed trends, as well as innovation variance, are illustrated and verified with Monte Carlo simulations. The proposed methodology is then applied to several practical data sets, including multiple building settlement sequences and two economic series.


Introduction
Many planning activities require prediction of the behavior of variables, such as economic, financial, traffic, and physical ones [1,2]. Makridakis et al. [3] concluded that predictions supported the strategic decisions of organizations, which in turn sustained a practical interest in forecasting methods. To obtain a reasonable prediction, certain laws governing the phenomena must be discovered based on either natural principles or real observations [4]. However, finding applicable natural principles in the real world is extremely difficult, in either the physical system or the generalized system. Thus, forecasting the future system development directly from past and current data becomes a feasible means [5]. Pinson [6] further pointed out that statistical models based on historical measurements only, though taking advantage of physical expertise at hand, should be preferred for short lead time ahead forecasts. Since time series methods can be used to forecast the future based on historical observations [3], they have been widely utilized to model forecasting systems when there is not much information about the generation process of the underlying variable and when other variables provide no clear explanation of the studied variable [7].
In the literature, considerable efforts have been made for time series prediction, and special attention should be given to time series approaches [8,9], regression models [10,11], artificial intelligence methods [12], and grey theory [13]. Time series methods mainly involve the basic models of autoregressive (AR), moving average (MA), autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), and Box-Jenkins. Most analyses are based on the assumption that the probabilistic properties of the underlying system are time-invariant; that is, the focused process is steady. Although this assumption is very useful for constructing simple models, it does not seem to be the best strategy in practice, because systems with time-varying probabilistic properties are common in practical engineering. Although a regression model can be constructed with a few data points, accurate prediction cannot be achieved because of the simplicity of the linear model. Therefore, linear methodology is sometimes inadequate for situations where the relationships between the samples are not linear with time, and artificial intelligence techniques, such as expert systems and neural networks, have been developed [14]. However, abundant prediction rules, practical experience from specific experts, and large historical data banks are requisite for precise forecasting. Meanwhile, grey theory constructs a grey differential equation to predict with as few as four data points by the accumulated generating operation technique. Though the grey prediction model has been successfully applied in various fields and has demonstrated satisfactory results, its prediction performance still could be improved, because the grey forecasting model is constructed from an exponential function. Consequently, it may yield worse prediction precision when the data are more random.
Furthermore, recent developments in Bayesian time series analysis have been facilitated by computational methods such as Markov chain Monte Carlo. Huerta and West [15,16] developed a Markov chain Monte Carlo scheme based on the characteristic root structure of AR processes. McCoy and Stephens [17] extended this methodology by allowing for ARMA components and adopting a frequency domain approach. The fruitfully developed applications of predictability methods to physiological time series are also worth noticing. The K-nearest neighbor approach is a conceptually simple approach to pattern recognition problems, where an unknown pattern is classified according to the majority of the class memberships of its K-nearest neighbors in the training set [18,19]. Moreover, local prediction, proposed by Farmer and Sidorowich [4], derives forecasting based on a suitable statistic of the next values assigned to L previous samples. Porta et al. [20] further established an improved method to allow one to construct a reliable prediction of short biological series. The method is especially suitable when applied to signals that are stationary only during short periods (around 300 samples) or to historical series the length of which cannot be enlarged.
We place ourselves in a parametric probabilistic forecasting framework under small sample size, for which simple linear models are recommended, such as the AR model and the grey prediction model, because these simple linear models are frequently found to produce smaller prediction errors than techniques with complicated model forms due to their parsimonious form [21]. However, two issues should be noticed. On one hand, linear approaches may output unsatisfactory forecasting accuracy when the focused system illustrates a nonlinear trend. This newly arising problem indicates that the model structure is unstable. According to Clements and Hendry [22], this structural instability has become a key issue, which dominates the forecasting performance. Much work on model structural change has been conducted [23,24]. To settle this problem, similar to the basic idea of K-nearest neighbor and local prediction approaches, many scholars have recommended using only recent data to increase future forecasting accuracy if chaotic data exist. Based on this point of view, the GM(1,1) rolling model, called rolling check, was proposed by Wen [25]. In this approach, the GM(1,1) model is always built on the latest measurements. That is, on the basis of $x^{(0)}(k)$, $x^{(0)}(k+1)$, $x^{(0)}(k+2)$, and $x^{(0)}(k+3)$, the next data point $x^{(0)}(k+4)$ can be forecasted.
The first datum is always shifted to the second in the following prediction step; that is, $x^{(0)}(k+1)$, $x^{(0)}(k+2)$, $x^{(0)}(k+3)$, and $x^{(0)}(k+4)$ are used to predict $x^{(0)}(k+5)$. The same technique, called grey prediction with rolling mechanism (GPRM), was established for industrial electricity consumption forecasting by Akay and Atak [26]. However, this rolling grey model can only be utilized in one-step prediction. That is, the one-step-ahead value is always obtained from the observed data. On the other hand, an AR model can only be established for time series that satisfy the stationarity condition; that is, a stationary solution exists if and only if all roots of the corresponding AR characteristic equation exceed unity in absolute value (modulus) [27]. Consequently, AR models cannot be established for modeling nonstationary time series.
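The rolling check described above can be illustrated with a minimal sketch. It assumes the textbook GM(1,1) formulation (accumulated generating operation, mean background values, least-squares estimation of the development coefficient `a` and grey input `b`); the function names `gm11_next` and `gm11_rolling` and the test series are illustrative and not taken from [25] or [26].

```python
import math

def gm11_next(window):
    """Fit a textbook GM(1,1) to a short positive window; forecast the next value."""
    n = len(window)
    # Accumulated generating operation: x1(k) = x0(1) + ... + x0(k)
    x1, total = [], 0.0
    for v in window:
        total += v
        x1.append(total)
    # Background values z(k) = (x1(k) + x1(k-1)) / 2 for k = 2..n
    z = [(x1[k] + x1[k - 1]) / 2.0 for k in range(1, n)]
    y = list(window[1:])
    # Least squares for x0(k) = -a * z(k) + b (a simple linear regression)
    m = len(z)
    z_mean, y_mean = sum(z) / m, sum(y) / m
    szz = sum((zi - z_mean) ** 2 for zi in z)
    szy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    slope = szy / szz
    a, b = -slope, y_mean - slope * z_mean  # a == 0 (flat data) is not handled here

    def x1_hat(k):  # time response of the whitened equation, k is 1-based
        return (window[0] - b / a) * math.exp(-a * (k - 1)) + b / a

    # Inverse accumulation gives the forecast of the next original value
    return x1_hat(n + 1) - x1_hat(n)

def gm11_rolling(series, window_size=4):
    """One-step-ahead forecasts, each built on the latest observed window."""
    return [gm11_next(series[end - window_size:end])
            for end in range(window_size, len(series))]

data = [2.0 * 1.1 ** k for k in range(1, 8)]  # illustrative exponential series
preds = gm11_rolling(data)                    # forecasts for data[4], data[5], data[6]
```

Each forecast uses only observed values, which is why this scheme is limited to one-step-ahead prediction.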
In addition, we should also notice that very short-term forecasting or 1-step-ahead prediction is of significant importance in various applications. This is clear in economics. We take wind power forecasting and the prediction of heart period given systolic arterial pressure variations as two further examples to elucidate the practical relevance of very short-term forecasting or 1-step-ahead prediction in different fields of science. Since transmission system operators need to operate reserves optimally for the continuous balance of the power system, very short-term predictions of the power fluctuations at typical time steps of 1, 10, and 30 min are recognized as a current challenge [28]. More specifically, for the case of Denmark, the 10-min lead time has been defined as most important, because power fluctuations at this timescale are those that most seriously affect the balance in the power system [29]. The characterization of the relation between spontaneous heart period (HP) and systolic arterial pressure (SAP) variabilities provides important information about one of the short-term neural reflexes essentially contributing to cardiovascular homeostasis, namely, the baroreflex [30]. The difficulty in predicting HP given SAP variations depends on the forecasting time: the longer the forecasting time is, the more unreliable the prediction of HP given SAP changes is and the larger the information carried by HP given SAP modifications is [31]. More specifically, 1-step-ahead predictive information of heart period series given systolic arterial pressure dynamics better correlates with the activity of the autonomic nervous system in cardiovascular physiology.
Motivated by the GPRM approach and the practical relevance of very short-term forecasting or 1-step-ahead prediction elucidated above, the first objective of this study is to construct a novel prediction model with the rolling mechanism to improve the forecasting precision. Therefore, the sample data set and model parameters evolve in each prediction step. The second objective of this study is to develop an autoregressive model that can be used to model nonstationary time series. Consequently, this autoregression is different from the AR model in the time series analysis literature. We still call it autoregression, because the current value of the series is again a linear combination of several most recent past values of itself plus an "innovation" term that incorporates everything new in the series that is not explained by the past values; unlike the classical AR model, however, it can be used to model nonstationary time series.
The remainder of the paper is organized as follows. We start with a short introduction of the AR method in Section 2. Section 3 covers the basic idea of autoregressive prediction with rolling mechanism (ARPRM), model establishment, and parameter estimation. The best unbiased property of the prediction is also discussed. In Section 4, simulation studies, including general verification, performance impact of sample size, system nonlinearity dynamic mechanism, and variance, are conducted to assess the validity of the approach, whilst Section 5 contains applications to real data sets, including building settlement sequences and economic series. In Section 6, we close this paper with a discussion and conclusion.

AR Model Introduction
Let $x_1, x_2, \ldots, x_n$ denote a time series with sample size $n$. Autoregressive models, in the literature, are created with the idea that the present value of the series $x_t$ can be explained as a function of $p$ past values $x_{t-1}, x_{t-2}, \ldots, x_{t-p}$, where $p$ determines the number of steps into the past needed to forecast the current value. The AR($p$) model can be given as [8]

$$\Phi(B)\,x_t = \varphi_0 + \varepsilon_t, \tag{1}$$

where $B$ is the backshift operator, $\Phi(B) = 1 - \varphi_1 B - \cdots - \varphi_p B^p$ specifies the lag-polynomial with model order $p$, and $\varphi_0$ is a constant relating to the series mean. It is well known in the literature that a stationarity condition has to be satisfied for the AR($p$) process; that is, subject to the restriction that $\varepsilon_t$ is independent of $x_{t-1}, x_{t-2}, \ldots$ and that $\sigma_\varepsilon^2 > 0$, a stationary solution to (1) exists if and only if all roots of the AR characteristic equation exceed 1 in absolute value (modulus).
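According to the least-squares method, model parameters can be calculated by

$$\hat{\varphi}_0 = \bar{x}_0 - \sum_{i=1}^{p} \hat{\varphi}_i \bar{x}_i, \qquad (\hat{\varphi}_1, \hat{\varphi}_2, \ldots, \hat{\varphi}_p)^{\mathrm{T}} = L_{xx}^{-1} L_{xy},$$

where the means $\bar{x}_j$, $j = 0, 1, \ldots, p$, can be obtained by

$$\bar{x}_j = \frac{1}{n-p} \sum_{t=p+1}^{n} x_{t-j},$$

and $L_{xx} = (l_{ij})_{p \times p}$ is a $p$-order matrix, and $L_{xy} = (l_1, l_2, \ldots, l_p)^{\mathrm{T}}$ is a $p$-order column vector, whose elements can be determined through

$$l_{ij} = \sum_{t=p+1}^{n} (x_{t-i} - \bar{x}_i)(x_{t-j} - \bar{x}_j), \qquad l_i = \sum_{t=p+1}^{n} (x_{t-i} - \bar{x}_i)(x_t - \bar{x}_0).$$

Based on the estimated coefficients $\hat{\varphi}_0, \hat{\varphi}_1, \hat{\varphi}_2, \ldots, \hat{\varphi}_p$, the AR($p$) prediction equation can be determined as

$$\hat{x}_{n+l|n} = \hat{\varphi}_0 + \sum_{i=1}^{p} \hat{\varphi}_i\, \hat{x}_{n+l-i|n},$$

where $\hat{x}_{n+l|n}$ denotes the $l$-step-ahead prediction at time $n+l$, and $\hat{x}_{n+l-i|n} = x_{n+l-i}$ whenever $n+l-i \le n$.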
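As an illustration of the least-squares estimation and iterated prediction described above, the following sketch computes the centered sums and solves for the coefficients with NumPy. The function names `fit_ar_least_squares` and `predict_ar` and the test series are hypothetical, and the centered-sums form is the standard least-squares solution with an intercept rather than a transcription of the paper's equations.

```python
import numpy as np

def fit_ar_least_squares(x, p):
    """Return (phi0, phi) for x_t = phi0 + sum_i phi_i * x_{t-i} + eps_t."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column j holds x_{t-j} over the usable rows; column 0 is the response x_t
    lagged = np.column_stack([x[p - j:n - j] for j in range(p + 1)])
    means = lagged.mean(axis=0)                  # the means x-bar_j, j = 0..p
    centered = lagged - means
    L = centered[:, 1:].T @ centered[:, 1:]      # matrix of sums l_ij (p x p)
    l_vec = centered[:, 1:].T @ centered[:, 0]   # vector of sums l_i
    phi = np.linalg.solve(L, l_vec)
    phi0 = means[0] - phi @ means[1:]
    return float(phi0), phi

def predict_ar(x, phi0, phi, steps):
    """Iterate the fitted equation, feeding each prediction back in."""
    history = list(x)
    out = []
    for _ in range(steps):
        recent = history[::-1][:len(phi)]        # x_{t-1}, x_{t-2}, ...
        nxt = phi0 + float(np.dot(phi, recent))
        out.append(nxt)
        history.append(nxt)
    return out

# Noise-free illustrative series obeying x_t = 1 + 0.5 * x_{t-1}
series = [0.0]
for _ in range(9):
    series.append(1.0 + 0.5 * series[-1])
phi0, phi = fit_ar_least_squares(series, p=1)
forecast = predict_ar(series, phi0, phi, steps=2)
```

Note that the coefficients stay fixed over all forecasting steps here, which is the point of contrast with the rolling approach of Section 3.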

ARPRM Model Construction
For the 1-step-ahead forecasting, the ARPRM($p_1$) model is established on the sample $x_1, x_2, \ldots, x_n$ as

$$x_t = \varphi_{1,0} + \sum_{i=1}^{p_1} \varphi_{1,i}\, x_{t-i} + \varepsilon_{1,t},$$

where $p_1$ is the 1-step-ahead ARPRM model order that can be determined by the Akaike information criterion rule [32], $\{\varepsilon_{1,t}\}$ is the white noise series of the 1-step-ahead ARPRM model, and $\varphi_{1,i}$ ($i = 0, 1, \ldots, p_1$) are the autoregressive coefficients of the 1-step-ahead ARPRM equation. Then the 1-step-ahead prediction result can be calculated by

$$\hat{x}_{n+1|n} = \hat{\varphi}_{1,0} + \sum_{i=1}^{p_1} \hat{\varphi}_{1,i}\, x_{n+1-i}.$$

Then for the 2-step-ahead forecasting, according to the basic idea described in Section 3.1, a new sample $x_2, x_3, \ldots, x_n, \hat{x}_{n+1|n}$ is first constructed, and the 2-step-ahead ARPRM($p_2$) model can be found as

$$x_t = \varphi_{2,0} + \sum_{i=1}^{p_2} \varphi_{2,i}\, x_{t-i} + \varepsilon_{2,t}, \qquad \text{with } x_{n+1} \text{ replaced by } \hat{x}_{n+1|n},$$

where $p_2$, $\{\varepsilon_{2,t}\}$, and $\varphi_{2,i}$ ($i = 0, 1, \ldots, p_2$) are the model order, white noise series, and autoregressive coefficients of the 2-step-ahead ARPRM equation, respectively. Analogously, considering the $l$-step-ahead prediction, one can first form a new sample with general notations $x^*_l, x^*_{l+1}, \ldots, x^*_{n+l-1}$ according to the rolling mechanism mentioned above, where

$$x^*_t = \begin{cases} x_t, & t \le n, \\ \hat{x}_{t|n}, & t > n. \end{cases} \tag{9}$$

It can be seen that $x^*_t$ is an original observation if $t \le n$, while it is a prediction result of a previous step when $t > n$. Accordingly, for the 1st-step-ahead prediction (i.e., $l = 1$), the sample will be $x^*_1, x^*_2, \ldots, x^*_n$, that is, $x_1, x_2, \ldots, x_n$ based on (9). For the 2nd-step-ahead prediction (i.e., $l = 2$), the sample will be $x^*_2, x^*_3, \ldots, x^*_{n+1}$, that is, $x_2, x_3, \ldots, x_n, \hat{x}_{n+1|n}$ based on (9). This is consistent with the samples described above. Then, based on this new sample $x^*_l, x^*_{l+1}, \ldots, x^*_{n+l-1}$, the ARPRM($p_l$) equation can be established as

$$x^*_t = \varphi_{l,0} + \sum_{i=1}^{p_l} \varphi_{l,i}\, x^*_{t-i} + \varepsilon_{l,t},$$

where $p_l$, $\{\varepsilon_{l,t}\}$, and $\varphi_{l,i}$ ($i = 0, 1, \ldots, p_l$) are the model order, white noise series, and autoregressive coefficients of the $l$-step-ahead ARPRM model, respectively. Accordingly, we can obtain the $l$-step-ahead prediction result by

$$\hat{x}_{n+l|n} = \hat{\varphi}_{l,0} + \sum_{i=1}^{p_l} \hat{\varphi}_{l,i}\, x^*_{n+l-i}.$$

From the above description, we can see that a new autoregressive prediction model, with its own model order and autoregressive parameters, is constructed in each step. We should note that the existing AR model in the time series analysis literature can only model series that satisfy the stationarity requirement. In this study, however, we only adopt the basic idea of autoregression in each prediction step. This autoregressive model, rebuilt in each forecasting step, can be used to model series that do not satisfy the stationarity condition. It reduces to the traditional AR model when the focused time series is a stationary one.
While the aforementioned procedure adds one forecasting result and deletes one sample value in each prediction step, the numbers added and deleted can be unequal; that is, one can add one and delete more than one sample value at each prediction step without departing from the spirit of the proposed method. Redefining the model at each step updates the ARPRM coefficients according to the metabolic sample, and the prediction accuracy can consequently be improved.
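The rolling construction can be sketched as follows. This is a simplified illustration rather than the authors' implementation: the order `p` is held fixed instead of being reselected by AIC at every step, the fit uses `numpy.linalg.lstsq`, and the names `fit_ar`, `arprm_forecast`, and the `drop` parameter are hypothetical.

```python
import numpy as np

def fit_ar(sample, p):
    """Least-squares fit of x_t = phi0 + sum_i phi_i * x_{t-i} on one window."""
    s = np.asarray(sample, dtype=float)
    # Each row is [1, x_{t-1}, ..., x_{t-p}] for a usable time index t
    rows = [np.concatenate(([1.0], s[t - p:t][::-1])) for t in range(p, len(s))]
    coef, *_ = np.linalg.lstsq(np.array(rows), s[p:], rcond=None)
    return coef  # [phi0, phi1, ..., phip]

def arprm_forecast(observations, steps, p=1, drop=1):
    """Refit at every step on a window that gains the newest prediction."""
    window = list(observations)
    predictions = []
    for _ in range(steps):
        coef = fit_ar(window, p)              # new coefficients for this step
        recent = window[::-1][:p]             # most recent p values of the window
        nxt = float(coef[0] + np.dot(coef[1:], recent))
        predictions.append(nxt)
        # Rolling mechanism: delete the oldest value(s), append the prediction
        window = window[drop:] + [nxt]
    return predictions

# Noise-free illustrative data with an exact first-order linear recursion
observed = [float(np.exp(0.3 * t)) for t in range(1, 11)]
forecast = arprm_forecast(observed, steps=5, p=1)
```

Setting `drop` above 1 shrinks the window by `drop - 1` values per step, which corresponds to the unequal add and delete counts mentioned in the text.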

Parameter Estimation.
First, the model order $p_l$ for the $l$-step-ahead ARPRM($p_l$) equation (11) can be determined by the Akaike information criterion rule [32]: as $p_l$ increases from one, the selected order is the one at which the criterion attains its minimum.
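A minimal sketch of such an order search is given below. The criterion used here, `m * ln(RSS / m) + 2 * (p + 1)` with `m` fitted rows, is an assumed Gaussian-likelihood form of AIC and may differ from the formula in [32]; the helper names and the test sequence are illustrative.

```python
import math
import numpy as np

def residual_sum_of_squares(sample, p):
    """Fit an order-p autoregression with intercept; return (RSS, rows used)."""
    s = np.asarray(sample, dtype=float)
    design = np.array([np.concatenate(([1.0], s[t - p:t][::-1]))
                       for t in range(p, len(s))])
    target = s[p:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid), len(target)

def select_order_by_aic(sample, max_order):
    """Return the order in 1..max_order with the smallest assumed AIC value."""
    best_order, best_aic = None, math.inf
    for p in range(1, max_order + 1):
        rss, m = residual_sum_of_squares(sample, p)
        # Assumed AIC form; the floor avoids log(0) for an exact fit
        aic = m * math.log(max(rss, 1e-300) / m) + 2 * (p + 1)
        if aic < best_aic:
            best_order, best_aic = p, aic
    return best_order

# Illustrative sequence obeying x_t = x_{t-1} - x_{t-2}, so order 1 fits poorly
seq = [1.0, 1.0, 0.0, -1.0, -1.0, 0.0] * 2
order = select_order_by_aic(seq, max_order=3)
```

With very short samples the number of usable rows changes with `p`, so the comparison across orders is only approximate and `max_order` has to stay small.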
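Then, according to the least-squares method, the autoregressive coefficient $\varphi_{l,0}$ can be calculated by

$$\hat{\varphi}_{l,0} = \bar{x}_{l,0} - \sum_{i=1}^{p_l} \hat{\varphi}_{l,i}\, \bar{x}_{l,i},$$

where the means $\bar{x}_{l,j}$ ($j = 0, 1, \ldots, p_l$) can be obtained by

$$\bar{x}_{l,j} = \frac{1}{n - p_l} \sum_{t=l+p_l}^{n+l-1} x^*_{t-j},$$

where $x^*_t$ is derived by (9). The autoregressive coefficients $\varphi_{l,i}$, $i = 1, \ldots, p_l$, can be derived by

$$(\hat{\varphi}_{l,1}, \hat{\varphi}_{l,2}, \ldots, \hat{\varphi}_{l,p_l})^{\mathrm{T}} = L_{xx}^{-1} L_{xy},$$

where $(L_{xx})_{p_l \times p_l} = (l_{ij})_{p_l \times p_l}$ is a $p_l$-order matrix and $(L_{xy})_{p_l \times 1} = (l_1, l_2, \ldots, l_{p_l})^{\mathrm{T}}$ is a $p_l$-order column vector. The elements can be determined through

$$l_{ij} = \sum_{t=l+p_l}^{n+l-1} (x^*_{t-i} - \bar{x}_{l,i})(x^*_{t-j} - \bar{x}_{l,j}), \qquad l_i = \sum_{t=l+p_l}^{n+l-1} (x^*_{t-i} - \bar{x}_{l,i})(x^*_t - \bar{x}_{l,0}).$$

In addition, the estimator of $\sigma_l^2 = \operatorname{Var}(\varepsilon_{l,t})$ can be obtained from the residuals of the fitted equation. Then, the $l$-step-ahead prediction value $\hat{x}_{n+l|n}$ can be determined by

$$\hat{x}_{n+l|n} = \hat{\varphi}_{l,0} + \sum_{i=1}^{p_l} \hat{\varphi}_{l,i}\, x^*_{n+l-i},$$

and its mean square error $\sigma^2_{n+l|n}$ can be derived accordingly, with the convention that $\varphi_{l,j} = 0$ when $j > p_l$.

Best Unbiased Nature.
The prediction result obtained by the proposed ARPRM method has the best unbiased property, which can be deduced from the least-square error forecasting method.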

Simulation Study
ARPRM is a general approach for stochastic and unstable time series forecasting. In this section, we focus on the robustness of its performance. A general simulation study is first conducted to verify the general reasonableness of the ARPRM prediction method with 4 different models. ARPRM performance under varying sample size and innovation variance is also checked, because these two aspects are both of significant importance for the small sample prediction problem. Meanwhile, statistical analysis has been performed to demonstrate the significance of the observed trends. Results show that the trends do affect the performance of the proposed method, and we further found that the trend nonlinearity plays a much more important role in forecasting performance than trend significance. Consequently, we illustrate the nonlinearity dynamics monitoring in this simulation study. In addition, as the ARPRM model has to be constructed independently based on the metabolic sample in each prediction step, the method can be regarded as a nonlinear forecasting scheme with a stepwise linear form. Accordingly, we also conduct a simulation study to monitor the effect of this rolling mechanism, which can also illustrate the ability of the model to follow nonlinear dynamics.

General Verification.
The proposed ARPRM approach is quite general. We consider and study four simulation models to verify the general reasonableness of the ARPRM prediction method. The models assumed in each of the experiments are summarized in Table 1. We also considered an AR($p$) specification with $p > 1$ to study the forecasting effect of the established model, and the results were very similar to those reported in this paper. Since we focus on the rationality and validity of the ARPRM approach in small sample forecasting problems, we conduct an $l = 5$ step-ahead prediction based on an observational data set with sample size $n = 10$. Simulation results are analyzed by the index of percent relative error $\mathrm{err} = |x_t - \hat{x}_t| \times 100 / x_t$, which is presented in Table 2. The results are based on 50000 Monte Carlo simulations with innovations drawn from an IID Gaussian distribution. We consider the mean percent relative error of each forecasting step. It can be concluded from Table 2 that the percent relative error increases as the forecasting step $l$ increases from 1 to 5. The 1-step-ahead prediction is so accurate that the minimum percent relative error is 0.041%. The maximum percent relative error for the 5-step-ahead forecasting is 7.90%, which is still considered to be good and shows promise for future applications.
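The error index and its per-step averaging over Monte Carlo runs can be written as a short sketch. The function names are illustrative, the numbers in the usage lines are made up for demonstration, and positive actual values are assumed because the index divides by the actual value itself.

```python
def percent_relative_error(actual, predicted):
    """err = |x - x_hat| * 100 / x, assuming a positive actual value x."""
    return abs(actual - predicted) * 100.0 / actual

def mean_error_by_step(actual_runs, predicted_runs):
    """Average err over runs, separately for each forecasting step."""
    runs = len(actual_runs)
    steps = len(actual_runs[0])
    return [
        sum(percent_relative_error(a[k], p[k])
            for a, p in zip(actual_runs, predicted_runs)) / runs
        for k in range(steps)
    ]

# Two made-up runs with two forecasting steps each
actual = [[100.0, 200.0], [100.0, 200.0]]
predicted = [[99.0, 190.0], [101.0, 210.0]]
step_errors = mean_error_by_step(actual, predicted)
```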

Sample Size and Nonlinearity Dynamics Monitoring.
Simulation studies show that the sample size and the class of nonlinear mechanism generating the dynamics both play an important role in prediction performance, especially in small sample problems. Consequently, sample size checking and nonlinearity dynamics monitoring are both included in this section. Two simulation models with intuitively different nonlinear mechanisms, listed in Table 3, are considered. To make the illustration more distinct, the scale of the generated data set is controlled, because the same absolute prediction error leads to a larger relative error for data of lower order of magnitude. Let $x(t) = e^{0.3t}$ be a deterministic function (the dynamics generation mechanism of Experiment 1 in Table 3 without white noise); then it is obvious that a linear relationship exists between the current value $x(t)$ and the most recent past value $x(t-1)$; that is, $x(t) = e^{0.3}\,x(t-1)$. For Experiment 2, listed in Table 3, the data generating process is much more complicated. An AR(1) series $y_t$ (satisfying the stationarity condition) is first generated; then an accumulated sequence $z_t = z_{t-1} + y_t$ is derived by the accumulated generating operation process; finally the data set $x_t$ is obtained by adding a trend $\sqrt{e^{0.8t} - e^{-0.3t}}/2$ to $z_t$. Accordingly, it can be seen that the degree of nonlinearity of Experiment 2 is much higher than that of Experiment 1. The nonlinearity dynamics can be monitored through the prediction performance of these two models.
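The linear relationship claimed for Experiment 1 can be checked numerically. The sketch below assumes the deterministic part is `exp(0.3 * t)` with additive Gaussian white noise; the function name `experiment1`, the `sigma` and `seed` parameters, and the exact noise specification are assumptions, since Table 3 is not reproduced here.

```python
import math
import random

def experiment1(n, sigma=0.0, seed=0):
    """x_t = exp(0.3 * t) + eps_t for t = 1..n; sigma = 0 gives the noise-free case."""
    rng = random.Random(seed)
    return [math.exp(0.3 * t) + rng.gauss(0.0, sigma) for t in range(1, n + 1)]

clean = experiment1(10)
# Without noise, consecutive values differ by the constant factor exp(0.3)
ratios = [clean[t] / clean[t - 1] for t in range(1, len(clean))]
noisy = experiment1(10, sigma=0.1)  # same trend with white noise added
```

A first-order linear recursion therefore reproduces the noise-free series exactly, which is why this experiment is the easier of the two for a stepwise linear model.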
Furthermore, the impact of sample size is also checked. For comparison, three samples of $x_t$ of different sizes, the first being $t = 1, 2, \ldots, 10$, are compared in Table 4 to clarify how the statistical properties change with the sample size. The results are also based on 50000 Monte Carlo simulations with innovations drawn from an IID Gaussian distribution. From the comparative results summarized in Table 4, the following can be seen. (1) For a given experiment model and sample size, the percent relative error increases as the forecasting step $l$ increases from 1 to 5. (2) For either Experiment 1 or Experiment 2, forecasting performance decreases when the sample size becomes smaller. The change is distinct, because this is a small sample problem where 10 or fewer data points are used to conduct 5-step-ahead predictions. (3) Comparing the percent relative errors derived from the two classes of nonlinear mechanism generating the dynamics, the precision of Experiment 2 is significantly lower than that of Experiment 1. These conclusions are consistent with our general understanding and deduction.

Empirical Applications
In this section, we focus on the practical performance of the proposed ARPRM approach. Our experiments cover building settlement prediction in Section 5.1 and economic forecasting with two different data sets in Section 5.2.

Building Settlement Prediction.
The prediction of future building settlement, especially for high-rise buildings, is a hot topic in Structural Health Monitoring. However, it is very difficult to develop good mathematical models and thus yield accurate predictions for building settlement, because of the problems of small sample size (settlement observations are relatively few), nonstationarity (the statistical properties of the measurement change over time), and nonlinearity (it is difficult to use mathematical prediction models with linear structure) [33].
The application illustrated in this section is the future settlement forecasting for the business building of the Changchun branch of China Everbright Bank. The building has a steel structure, with 28 floors above ground and a base area of 2176 square meters. There are ten observation points marked 1, 2, . . ., 10. The observed settlement series $x^{(j)}_t$ ($j = 1, 2, \ldots, 10$, $t = 1, 2, \ldots, 18$), corresponding to each observation point 1, 2, . . ., 10, were obtained [34]. See Figure 1 for the data sets. It can be seen that the problems of small sample size, nonlinearity, and nonstationarity all exist.
The last 3 settlement values are predicted based on the former 15 data, and the GM(1,1) model, the GM(1,1) rolling model, and the AR model with linear interpolation [34] are used for comparison. It can be seen that the 32nd value of the interpolated sequence, obtained by the interpolation $x^{*(j)}_{32} = (x^{(j)}_{15} + x^{(j)}_{16})/2$, contains the information of $x^{(j)}_{16}$ that we want to predict. Actually, this interpolation value $x^{*(j)}_{32}$ cannot be utilized in the forecasting.
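The information leak caused by interpolation can be made concrete with a small sketch. The helper `interpolate_midpoints` and the placeholder values are illustrative only; positions are counted from the first observation, so they differ by a constant offset from the interpolated-sequence indices quoted in the text.

```python
def interpolate_midpoints(values):
    """Insert the average of each neighbouring pair between the pair."""
    out = []
    for left, right in zip(values, values[1:]):
        out.extend([left, (left + right) / 2.0])
    out.append(values[-1])
    return out

settlement = [float(t) for t in range(1, 19)]   # placeholder values for t = 1..18
dense = interpolate_midpoints(settlement)

pos_16 = 2 * (16 - 1)                  # 0-based position of x_16 in the dense series
midpoint_before_16 = dense[pos_16 - 1]  # average of x_15 and x_16

# Changing x_16 alone changes a point that precedes it in the dense series,
# so a model trained on values "before" x_16 would already have seen it.
altered = list(settlement)
altered[15] += 10.0
leaked = interpolate_midpoints(altered)[pos_16 - 1] != midpoint_before_16
```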

Economic Forecasting.
In this section, our empirical study is also devoted to the comparison of our forecasting procedure with the GM(1,1) model, the GM(1,1) rolling model, and the AR model with linear interpolation. The economic data sets include Chinese annual power load data [35] and Chinese annual gross domestic product (GDP) data [36] from 1987 to 2002, shown in Figures 2 and 3.
We predict the latest two data points from the former 14 for both the Chinese annual power load and GDP. The index of relative percentage error $\mathrm{RPE} = |x_t - \hat{x}_t| \times 100 / x_t$ is used to demonstrate the accuracy. The comparative results for each economic data set are listed in Tables 7 and 8. Table 7 shows that ARPRM yields the best prediction accuracy, and the GM(1,1) model and GM(1,1) rolling model also give good forecast results for the Chinese annual power load. Although the AR model with linear interpolation provides the worst prediction, its accuracy is still acceptable. A conclusion similar to that of Section 5.1 can be obtained for the Chinese annual GDP prediction results.
Based on the empirical study discussed above, it can be concluded that the prediction accuracy of ARPRM and of the AR model with linear interpolation is relatively stable. The precision of the GM(1,1) model and GM(1,1) rolling model is good when the data set shows an exponential trend, such as the Chinese annual power load. Otherwise, the prediction accuracy of the GM(1,1) model and GM(1,1) rolling model is unsatisfactory when the data are nonexponential. In addition, the GM(1,1) rolling approach yields a better prediction precision than the GM(1,1) model, which shows the superiority of the rolling mechanism.

Conclusions
This study presents a novel approach to settle the small sample time series prediction problem in the presence of unstable data.
One should construct an ARPRM model independently based on the metabolic sample in every prediction step. In other words, the model order and parameters are modified in each new forecasting step. This modification is very helpful for improving the prediction accuracy in small sample circumstances. We should note that the autoregression in this study, which can model nonstationary sequences, is different from the AR method in the literature, which has to satisfy the stationarity requirement.
The simulation study first conducts a general verification, then monitors the performance of the ARPRM method while varying the sample size and nonlinearity dynamic mechanism, and finally checks whether the reasonability depends on the variance of the innovation. We also performed statistical analysis to demonstrate the significance of the observed trends. Results show that factors including sample size, nonlinearity dynamic mechanism, significance of trends, and innovation variance do have an impact on the performance of the proposed methodology. Specifically, we found that the nonlinearity plays a much more important role in forecasting performance than trend significance. For the same simulation model, precision is enhanced by increasing the sample size or reducing the innovation variance. Meanwhile, the established approach performs better for dynamic mechanisms that show a stronger linearity. Comparing the results of the simulation study in Section 4, it can be seen that innovation variance (the uncertainty or randomness) has a greater effect on the performance.
Empirical applications with building settlement prediction and economic forecasting are also included. The results show that the proposed method outperforms the grey prediction model GM(1,1), the GM(1,1) rolling model, and the AR model with linear interpolation, and also show promise for future applications in small sample time series prediction.

Table 1: Model specifications by experiments for general verification.

Table 2: Results of ARPRM forecasting percent relative error for general verification (%).

Table 3: Model specifications by experiments for sample size and nonlinearity dynamics monitoring.

Table 4: Results of ARPRM forecasting percent relative error for sample size and nonlinearity dynamics monitoring (%).

Table 5: Comparative results of ARPRM forecasting percent relative error for different variance ($\times 10^{-5}$ %).

$\varepsilon_t \sim \mathrm{NID}[0, 1]$ does introduce considerable randomness into the first few sample values, and bigger innovation variances like 2, 5, 10 lead to lower performance because of the significant uncertainty of the sample data $x_1, x_2, \ldots, x_{10}$. The objective of adopting Experiment 1 as the simulation model is to illustrate the variance impact in a more intuitive and obvious way. The difference between err indexes will not be that distinct when the scale of the sample data is enlarged; for example, when the model $x_t = e^{0.5t} + \varepsilon_t$ is used to generate the data set.

The comparative results are listed in Table 6, which show that the most precise forecast is given by ARPRM, the next best is obtained by the AR model with linear interpolation, and a considerably poorer accuracy is obtained by the GM(1,1) model and GM(1,1) rolling model. One problem worth noticing is that, in the AR model with linear interpolation processing procedure, the former 32 values of the interpolation series are utilized to conduct a 5-step-ahead prediction, and the forecasting result of the last 3 settlement values at $t = 16, 17, 18$, which correspond to $t^* = 33, 35, 37$ in the interpolated sequence, can be derived.

Table 7: ARPE of the Chinese annual power load prediction results (%).