An Intelligence Optimized Rolling Grey Forecasting Model Fitting to Small Economic Dataset

Abstract and Applied Analysis

where $a$ is the development coefficient, $b$ is the driving coefficient, and $z^{(1)} = \left(z^{(1)}(2), z^{(1)}(3), \ldots, z^{(1)}(n)\right)$ is the generated sequence with $z^{(1)}(k) = \alpha x^{(1)}(k) + (1 - \alpha) x^{(1)}(k-1)$. In the original GM(1,1), $\alpha$ is set to 0.5, so that each $z^{(1)}(k) = 0.5\, x^{(1)}(k) + 0.5\, x^{(1)}(k-1)$ is the mean of adjacent data. In this paper, we propose using the PSO algorithm to find a more effective value of $\alpha$.

Step 3. From (5), we obtain the following system of equations:

$$x^{(0)}(2) + a z^{(1)}(2) = b, \quad x^{(0)}(3) + a z^{(1)}(3) = b, \quad \ldots, \quad x^{(0)}(n) + a z^{(1)}(n) = b.$$


Introduction
Forecasting is an important issue in many fields of economics. Accurate predictions can change the economic policies of large companies and governments and encourage more reasonable behavior by financial actors. Ideally, the prediction error would become ever smaller; in practice, we can only keep researching and developing prediction algorithms to improve prediction accuracy as much as possible.
Many forecasting models have been proposed; in general, they can be divided into two categories: causal models and time-series models [1]. Causal models assume that the historical relationship between dependent and independent variables will remain valid in the future. They include multiple linear regression analysis and econometric models, which assume that independent variables can explain the variations in the dependent variable. However, causal models are limited by the availability and reliability of the independent variables. Time-series models assume that history will repeat itself: future values of a system are forecast from the information contained in past and current data points. In the literature, the two main techniques for time series prediction are statistical approaches and artificial intelligence (soft computing) approaches. Well-known statistical models include AR (autoregressive), MA (moving average), ARMA (autoregressive moving average), ARIMA (autoregressive integrated moving average), and Box-Jenkins models. However, these statistical models are too weak to solve nonlinear problems and too complex to be used for predicting future values of a time series.
Widely used artificial intelligence approaches include neural networks (NN) [2][3][4], support vector machines (SVM) [5][6][7][8], fuzzy systems [9], linear regression, Kalman filtering [10], and hidden Markov models (HMM) [3]. All of these approaches update the model parameters from data. In recent years, several hybrid models [11][12][13][14] have been proposed to improve forecast accuracy. However, these artificial intelligence approaches demand a great deal of training data and a relatively long training period for robust generalization [11]. For economic prediction in particular, it is very difficult to construct a model with either conventional linear statistical methods or artificial neural networks, because economic time series are highly nonlinear, highly irregular, and highly nonstationary [15].
Grey system theory was introduced and developed by Deng in 1989 for the mathematical analysis of phenomena characterized by uncertainty and roughness. It requires only a small set of training data, which may be discrete or incomplete, to construct a model for future forecasting. Such uncertain, rough training data are called "grey" data [16]. Similarly, "white" data means that the information is completely clear, while "black" means that the information is completely unclear.
Grey system theory has been widely and successfully used to forecast data in many areas, such as economics, finance, agriculture, industry, and energy. In the past few years, grey system theory has been employed to solve economic forecasting problems. The GM(1,1) model, built on grey system theory, has proven very efficient for forecasting irregular and nonlinear economic time series. A combination of residual modification and residual sign estimation with an artificial neural network (ANN) has been proposed to improve the accuracy of the original GM(1,1) model [17][18][19]; however, this approach requires a long training period.
The rolling mechanism is one of the most effective methods for improving the performance of grey models and handling noisy data [7, 20, 21, 22]. The authors in [22] used the rolling mechanism to improve the forecast accuracy of a grey model for education expenditure. Zhao et al. [23] applied the rolling mechanism to forecast the per capita annual net income of rural households in China, proposed a differential evolution algorithm to optimize the rolling grey prediction model, and showed that their approach outperformed other traditional grey prediction models. The authors in [24][25][26] proposed improved rolling grey models that update the model parameters, applied to coal production forecasting and semiconductor industry production forecasting. These rolling-mechanism-based grey models can adapt to various economic time series because they take recent data into account, which improves forecast accuracy. However, they either keep their model parameters fixed throughout the prediction period, ignoring the impact of those parameters, or change the parameters only in a simple way, which can perform well on noiseless sequences but cannot adapt to noisy data.
In this paper, we propose an improved rolling-mechanism-based grey model optimized by particle swarm optimization (PSO) to improve forecast accuracy, especially for highly irregular and noisy data. PSO, a swarm intelligence method developed in [27] as an evolutionary computation technique, is regarded as a tool for modeling behavior and for solving difficult numerical optimization problems. PSO has been applied successfully in about 700 applications [28]. We chose PSO to optimize our model parameter for two significant reasons: it routinely delivers good optimization results, as NN methods do, and its simplicity lets it obtain better results faster and more cheaply than NN methods can.
This paper examines a rolling-mechanism-based grey model with PSO optimization on economic data. Section 2 outlines the original grey model GM(1,1) and the improved GM(1,1) model with the rolling mechanism. Section 3 presents the rolling-mechanism-based grey model with PSO optimization, including a PSO-based algorithm that searches for the best value of the model parameter. Section 4 shows that our model achieves much better performance than other grey-system-based models on three economic datasets: financial intermediation in Beijing, real estate in Beijing, and semiconductor production in Taiwan. Section 5 concludes the paper.

Grey Model Background
Grey system theory mainly focuses on extracting the realistic governing laws of a system from the available data, which generally contain white noise. A grey model in grey system theory is denoted by GM($n$, $m$), where $n$ indicates the order of the differential equation and $m$ indicates the number of variables.
GM(1,1) is the original grey model, and it has been widely applied to short-term prediction because of its computational efficiency. It uses a first-order differential equation to model an unknown system. The GM(1,1) algorithm is described below.
Step 1. The original time sequence is

$$x^{(0)} = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n)\right),$$

where $x^{(0)}(i)$ is the time series value at time $i$ and $n$ is the length of the sequence, which must be at least 4.
From the initial sequence $x^{(0)}$, a new sequence $x^{(1)} = \left(x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n)\right)$ is generated by the accumulated generating operation (AGO), which is monotonically increasing and weakens the variation tendency:

$$x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i), \quad k = 1, 2, \ldots, n.$$

In other words, grey system theory accumulates $x^{(0)}$ to obtain a new sequence $x^{(1)}$ with a clear growing tendency.
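In the above, $\hat{\beta} = [a, b]^{T}$ is the vector of coefficient parameters, which can be computed by the least squares method:

$$\hat{\beta} = [a, b]^{T} = \left(B^{T} B\right)^{-1} B^{T} Y_N,$$

where $Y_N$ is the constant vector and $B$ is the accumulated matrix:

$$Y_N = \begin{bmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(n) \end{bmatrix}, \qquad B = \begin{bmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{bmatrix}.$$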
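Step 4. Substituting $\hat{\beta}$ from (7) into (6), the predicted value of $x^{(1)}$ at time $k$ is

$$\hat{x}^{(1)}(k) = \left(x^{(0)}(1) - \frac{b}{a}\right) e^{-a(k-1)} + \frac{b}{a}.$$

After performing the inverse accumulated generating operation on (10), the predicted value of $x^{(0)}$ is

$$\hat{x}^{(0)}(k) = \hat{x}^{(1)}(k) - \hat{x}^{(1)}(k-1), \quad 2 \le k \le n.$$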
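The steps above can be condensed into a short NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `gm11_fit` and `gm11_predict` are hypothetical, the least-squares step uses `numpy.linalg.lstsq` instead of forming $(B^T B)^{-1}$ explicitly, and `alpha` is the background weight in $z^{(1)}(k)$, with 0.5 reproducing the original GM(1,1).

```python
import numpy as np


def gm11_fit(x0, alpha=0.5):
    """Estimate the GM(1,1) coefficients (a, b) for a positive series x0.

    alpha is the background weight in z1(k) = alpha*x1(k) + (1-alpha)*x1(k-1);
    alpha = 0.5 gives the original GM(1,1). (Hypothetical helper name.)
    """
    x0 = np.asarray(x0, dtype=float)
    if x0.size < 4:
        raise ValueError("GM(1,1) needs at least 4 data points")
    x1 = np.cumsum(x0)                                   # AGO
    z1 = alpha * x1[1:] + (1.0 - alpha) * x1[:-1]        # background values z1(2..n)
    B = np.column_stack([-z1, np.ones_like(z1)])         # accumulated matrix B
    Y = x0[1:]                                           # constant vector Y_N
    (a, b), *_ = np.linalg.lstsq(B, Y, rcond=None)       # least squares for [a, b]^T
    return a, b


def gm11_predict(x0, a, b, horizon=0):
    """Fitted values for k = 1..n plus `horizon` out-of-sample forecasts."""
    x0 = np.asarray(x0, dtype=float)
    k = np.arange(x0.size + horizon)                     # holds k - 1 = 0, 1, 2, ...
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a    # time response of x1
    x0_hat = np.empty_like(x1_hat)
    x0_hat[0] = x0[0]
    x0_hat[1:] = np.diff(x1_hat)                         # inverse AGO
    return x0_hat
```

For an exactly geometric series $x^{(0)}(k) = 10 \cdot 1.1^{k-1}$ with $\alpha = 0.5$, this sketch recovers $a = -0.1/1.05$, and the fitted values stay within 1% of the data. Using `lstsq` avoids explicitly inverting $B^T B$, which is numerically safer when the columns of $B$ are nearly collinear; the sketch does not guard against $a = 0$.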
GM(1,1) uses the whole data set for prediction. However, recent data can improve forecast accuracy in future prediction [21]. The rolling mechanism (RM) is a metabolism technique that updates the input data by discarding the oldest data in each loop of grey prediction, so that the model is always built on the latest information. Its purpose is to ensure that, in each rolling step, the data used for the next forecast are the most recent data. The rolling-mechanism grey model (RM-GM) is an efficient technique for increasing forecast accuracy when the data are noisy. Because data may exhibit different trends or characteristics at different times, it is preferable to study such noisy data with RM-GM, since RM guarantees that the input data are always the most recent values.
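A minimal sketch of the rolling mechanism, assuming that each rolling step drops the oldest observation, appends the newest actual value, and refits GM(1,1) with a fixed $\alpha$ to forecast one step ahead. The names `gm11_one_step` and `rolling_gm11` are hypothetical, and the GM(1,1) fit is repeated here so that the block runs on its own.

```python
import numpy as np


def gm11_one_step(window, alpha=0.5):
    """Fit GM(1,1) on `window` and return the one-step-ahead forecast."""
    x0 = np.asarray(window, dtype=float)
    x1 = np.cumsum(x0)                                   # AGO
    z1 = alpha * x1[1:] + (1.0 - alpha) * x1[:-1]        # background values
    B = np.column_stack([-z1, np.ones_like(z1)])
    (a, b), *_ = np.linalg.lstsq(B, x0[1:], rcond=None)
    n = x0.size
    # x1_hat(k) = c * exp(-a(k-1)) + b/a, so x0_hat(n+1) = x1_hat(n+1) - x1_hat(n)
    c = x0[0] - b / a
    return c * np.exp(-a * n) - c * np.exp(-a * (n - 1))


def rolling_gm11(series, window_len, n_forecasts, alpha=0.5):
    """Rolling GM(1,1): each step slides the window forward by one actual
    observation (assumption: actual values, not forecasts, are appended)."""
    series = list(series)
    forecasts = []
    for r in range(n_forecasts):
        window = series[r:r + window_len]                # most recent window_len points
        if len(window) < window_len:
            raise ValueError("series too short for the requested forecasts")
        forecasts.append(gm11_one_step(window, alpha))
    return forecasts
```

With a window of 12 points and 5 forecasts, the windows cover indices 0-11 through 4-15, mirroring a 1994-2005 window rolled forward to forecast 2006-2010. Only $a$ and $b$ change between steps here; $\alpha$ stays fixed, as in the basic RM-GM(1,1).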

PSO Optimized RM-GM Model
Because $\alpha$ directly influences the calculation of $a$ and $b$ in the GM(1,1) model and is one of the most important factors determining model performance, we present an algorithm based on RM-GM(1,1) combined with PSO, which optimizes the parameter $\alpha$ in each rolling period to improve forecast accuracy.
In the basic GM(1,1) model, $\alpha$ is customarily set to the mean value 0.5 for each $z^{(1)}(k) = \alpha x^{(1)}(k) + (1 - \alpha) x^{(1)}(k-1)$ in the generated sequence $z^{(1)} = \left(z^{(1)}(2), z^{(1)}(3), \ldots, z^{(1)}(n)\right)$. This means that each datum has an equal impact on every future predicted value. However, the authors in [29] found that GM(1,1) often performs very poorly and produces delay errors for rapidly growing sequences because of this mean value in the generated sequence $z^{(1)}$. To widen the adaptability of GM(1,1) to various kinds of time sequences, Tan proposed setting $\alpha$ to $(p - 1)/2$, where $p = \left(\sum_{k=2}^{n} x^{(1)}(k)/x^{(1)}(k-1)\right)$. The authors in [26] found that RM-GM with a variable $\alpha$ generates better forecasts than with a fixed $\alpha$; they determined $\alpha$ from the timely percent change. This study shows that, for trend prediction of nonmonotonic functions, the forecasts are much better if $\alpha$ is set appropriately. However, Tan's method uses the whole data set to calculate a single fixed $\alpha$ and does not consider the influence of recent data, which would improve accuracy.
In an improved RM-GM(1,1) algorithm, the strategy for finding $\alpha$ can be designed in a variety of ways. The basic RM-GM(1,1) sets $\alpha$ to 0.5, which ignores any influence of the sequence data. Although Tan's strategy can adapt to various sequences, it does not consider the impact of the recent data in the sequence. Chang's strategy considers only the timely percent change; it can perform well on regular, noiseless sequences, but it cannot adapt to noisy data sequences. In this paper, we select PSO as our strategy to find the value of $\alpha$ in each loop of RM-GM(1,1). We name our PSO-based algorithm PRGM(1,1).

Characteristics of PSO.
Two significant reasons for using PSO to calculate $\alpha$ are that it routinely delivers good optimization results and that it is simple. Compared with another commonly used swarm intelligence method, ant colony optimization (ACO), for which it is not easy to define variables for a given problem, PSO is a metaheuristic that makes few or no assumptions about the problem being optimized and can search very large spaces of candidate solutions. It does not require the optimization problem to be differentiable. Since the economic prediction problem is partially irregular, noisy, and changing over time, PSO is a better choice for optimizing $\alpha$. Another significant advantage of PSO is its relatively simple coding and low computational cost. Compared with other optimization algorithms such as ACO, which require massive computation, PSO can obtain better results faster and more cheaply [30]. Hence, PSO can perform well even in applications that need power-aware computing on smart or personal devices with limited computational, storage, and energy resources, while still guaranteeing prediction accuracy.

3.2. Calculating $\alpha$ by PSO.
PSO is a population-based optimization technique in which the optimal solution is found by iteration and solution quality is evaluated by a fitness function. In PSO, the potential solutions, called particles, fly through the problem space by following the current optimum particles. Each particle keeps track of its coordinates in the problem space, which are associated with the best solution (fitness) it has achieved so far. First, a $D$-dimensional space with $N$ particles is initialized, and the particles' positions and velocities are randomly initialized. The position of the $i$th particle is represented as $X_i = (x_{i1}, \ldots, x_{id}, \ldots, x_{iD})$ and its velocity as $V_i = (v_{i1}, \ldots, v_{id}, \ldots, v_{iD})$, where $1 \le i \le N$ and $1 \le d \le D$. Then the objective function values (forecast errors) of all particles are computed, and the particles are updated iteratively until the termination condition is satisfied. In each iteration, every particle updates its velocity and position according to the following two formulas:

$$v_{id}^{t+1} = w\, v_{id}^{t} + c_1\, \mathrm{rand}()\, \left(pBest_{id} - x_{id}^{t}\right) + c_2\, \mathrm{rand}()\, \left(gBest_{d} - x_{id}^{t}\right),$$
$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1},$$

where the fitness values of $pBest$ and $gBest$ are determined by the objective function, which should be chosen according to the actual problem; for prediction, it can be set to the prediction error, so that smaller is better. $pBest_{id}$ is the individual best position of the $i$th particle in the $d$th dimension found by the particle itself, and $gBest_{d}$ is the global best position, which records the best particle in the whole swarm; $t$ is the iteration counter; $c_1$ and $c_2$ are two positive acceleration constants; $\mathrm{rand}()$ is a uniform random value in $[0, 1]$; $v_{id}^{t}$ is the velocity of particle $i$ at iteration $t$, bounded by $v_d^{\min} \le v_{id}^{t} \le v_d^{\max}$; $x_{id}^{t}$ is the current position of the $i$th particle at iteration $t$; and $w$ ($0 \le w \le 1$) is the inertia weight, which determines how much of the particle's previous velocity is preserved. If a particle's current value is better (that is, it has a smaller forecast error), its best position and objective function value are updated with the current position and the corresponding objective function value. Finally, the best particle of the whole population is determined from the particles' best objective function values. If its objective function value is smaller than the current global optimal value $\mathrm{Fitness}(gBest)$, the best position and objective function value of the entire swarm are updated with that particle's position and objective function value.
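The particle update described above can be sketched as a small global-best PSO in NumPy. This is a generic illustration, not the authors' code: the function name `pso_minimize`, the default velocity bound of 20% of the search range, and the clipping of positions to the search box are assumptions. In PRGM(1,1), the search space would be one-dimensional ($\alpha \in [0, 1]$) and `fitness` would return the GM(1,1) forecast error for a candidate $\alpha$.

```python
import numpy as np


def pso_minimize(fitness, lower, upper, n_particles=10, iters=100,
                 w=0.5, c1=2.0, c2=2.0, v_max=None, seed=0):
    """Global-best PSO minimizing `fitness` over the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    if v_max is None:
        v_max = 0.2 * (upper - lower)                    # assumed velocity bound
    x = rng.uniform(lower, upper, size=(n_particles, dim))   # random positions
    v = rng.uniform(-v_max, v_max, size=(n_particles, dim))  # random velocities
    pbest = x.copy()
    pbest_f = np.array([fitness(p) for p in x])
    g = np.argmin(pbest_f)
    gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # inertia + cognitive (pBest) + social (gBest) terms
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -v_max, v_max)                    # v_min <= v <= v_max
        x = np.clip(x + v, lower, upper)                 # move, stay inside the box (assumption)
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f                           # smaller error is better
        pbest[improved] = x[improved]
        pbest_f[improved] = f[improved]
        g = np.argmin(pbest_f)
        if pbest_f[g] < gbest_f:                         # update the swarm's best
            gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    return gbest, gbest_f
```

For example, `pso_minimize(lambda p: (p[0] - 0.3) ** 2, [0.0], [1.0])` should return a position close to 0.3. The defaults $c_1 = c_2 = 2$ match the basic setting mentioned later in the text; $w = 0.5$ is an assumed constant.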

Parameter Selection.
In the PSO algorithm, the values of the cognitive weight ($c_1$), the social weight ($c_2$), and the inertia weight ($w$) must be selected, and they affect both the convergence speed and the algorithm's ability to find the optimum. However, different values may suit different problems. Many studies have sought combinations of values that work well across a wide range of problems, and both theoretical and empirical studies are available to guide the selection of proper values [31][32][33][34].
Generally, the individual and social weights $c_1$ and $c_2$ are both set to 2. A proper value of the inertia weight provides a balance between global and local exploration: a large inertia weight favors global search, while a small inertia weight favors local search [31, 35]. In general, settings near 1 facilitate global search, and settings in the range [0.2, 0.5] facilitate rapid local search. In practice, $w$ often decreases linearly from about 0.9 ($w_{\max}$) to 0.4 ($w_{\min}$). The authors in [31] suggested this LDW (linearly decreasing weight) policy, which greatly improves on the benchmark algorithm; however, it is not always the most suitable choice because it assumes that the search process is linear, so setting the inertia weight at each iteration according to the current search state may be a better choice.

The linearly decreasing weight (13) dynamically adapts the inertia weight, where $w^{+}$ and $w^{-}$ are usually set to 0.9 and 0.4:

$$w^{t} = \left(w^{+} - w^{-}\right) \frac{t_{\max} - t}{t_{\max}} + w^{-}.$$

The nonlinearly decreasing inertia weight (14) incorporates the hyperbolic tangent function (15) to update $w_i^{t}$ of each particle $i$, where $NI_i^{t}$ is the neighborhood index of particle $i$, calculated at each iteration by a formula involving $\mathrm{Worst}^{t}$, the global worst fitness value at the current iteration. A small $NI_i^{t}$ indicates that the current position is poor and needs global exploration with a large inertia weight; conversely, a large $NI_i^{t}$ indicates that local exploitation with a small inertia weight is required.

The constriction factor $\chi$ can be used instead of $w$ to control the magnitude of the velocities. The velocity update is then replaced with

$$v_{id}^{t+1} = \chi \left[ v_{id}^{t} + c_1\, \mathrm{rand}()\, \left(pBest_{id} - x_{id}^{t}\right) + c_2\, \mathrm{rand}()\, \left(gBest_{d} - x_{id}^{t}\right) \right],$$
$$\chi = \frac{2\kappa}{\left| 2 - \varphi - \sqrt{\varphi^{2} - 4\varphi} \right|},$$

where $\varphi = c_1 + c_2$ and generally $\kappa = 1$.
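The linearly decreasing weight and the constriction factor are simple enough to state directly in code. The function names below are hypothetical; the defaults $w^+ = 0.9$ and $w^- = 0.4$ follow the text, and the constriction coefficient uses the standard Clerc and Kennedy form with $\kappa = 1$.

```python
import math


def linear_decreasing_weight(t, t_max, w_plus=0.9, w_minus=0.4):
    """LDW: w falls linearly from w_plus at t = 0 to w_minus at t = t_max."""
    return (w_plus - w_minus) * (t_max - t) / t_max + w_minus


def constriction_factor(c1, c2, kappa=1.0):
    """Constriction coefficient chi for phi = c1 + c2 (requires phi >= 4)."""
    phi = c1 + c2
    if phi < 4:
        raise ValueError("constriction factor requires c1 + c2 >= 4")
    return 2.0 * kappa / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))


def constricted_velocity(v, x, pbest, gbest, c1, c2, r1, r2, chi):
    """Velocity update with chi applied to the whole bracket instead of w."""
    return chi * (v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x))
```

With $c_1 = c_2 = 2.05$, $\varphi = 4.1$ and $\chi \approx 0.7298$; with $c_1 = c_2 = 2$, $\varphi = 4$ and $\chi = 1$, so the constriction has no damping effect at that boundary.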

Experiments and Evaluations
4.1. Datasets.
Predicting the development of the tertiary industry is a very important topic in economic and financial areas. However, time series prediction in economics is generally very difficult because economic series are nonstationary, nonlinear, and highly noisy.
To show that our PRGM(1,1) algorithm performs better on both smooth and noisy data while using a small set of training data, we used three datasets: financial intermediation in Beijing from 1994 to 2010, which has relatively smooth trends; real estate in Beijing from 1994 to 2010, which is strongly nonlinear; and semiconductor industry production in Taiwan from 1994 to 2002, which is regular from 1994 to 2000 but irregular after 2000. All datasets are collected from the China Statistical Yearbook, National Bureau of Statistics of China [36].

Evaluation Metrics.
Prediction accuracy is an important criterion for evaluating a forecasting technique. In this paper, three metrics that are often adopted to assess model performance [6, 22], namely, mean absolute percentage error (MAPE), mean absolute deviation (MAD), and mean squared error (MSE), are used to evaluate prediction accuracy. MAPE is a generally accepted percentage measure of prediction accuracy:

$$\mathrm{MAPE} = \frac{1}{n} \sum_{k=1}^{n} \frac{\left| x^{(0)}(k) - \hat{x}^{(0)}(k) \right|}{x^{(0)}(k)} \times 100\%.$$

The MAPE criterion [37] is listed in Table 1.
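MAD and MSE both measure the average magnitude of the forecast errors, but MSE imposes a greater penalty on one large error than on several small errors. The smaller the values, the closer the predicted values are to the actual values [38]:

$$\mathrm{MAD} = \frac{1}{n} \sum_{k=1}^{n} \left| x^{(0)}(k) - \hat{x}^{(0)}(k) \right|, \qquad \mathrm{MSE} = \frac{1}{n} \sum_{k=1}^{n} \left( x^{(0)}(k) - \hat{x}^{(0)}(k) \right)^{2}.$$

In addition, the coefficient of determination, denoted $R^2$, is used to evaluate the models in our experiments:

$$R^{2} = 1 - \frac{\sum_{k=1}^{n} \left( x^{(0)}(k) - \hat{x}^{(0)}(k) \right)^{2}}{\sum_{k=1}^{n} \left( x^{(0)}(k) - \bar{x} \right)^{2}},$$

where $\bar{x} = \frac{1}{n}\sum_{k=1}^{n} x^{(0)}(k)$. The higher the value of $R^2$, the more successful the model is at predicting statistical data [23]. The maximum value of $R^2$ is 1.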
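The four evaluation metrics can be computed together. This sketch assumes the standard definitions of MAPE, MAD, MSE, and $R^2$ over paired actual and predicted values; the function name `forecast_metrics` is hypothetical.

```python
import numpy as np


def forecast_metrics(actual, predicted):
    """Return MAPE (%), MAD, MSE, and R^2 for paired actual/predicted values."""
    y = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = y - p
    mape = np.mean(np.abs(err) / np.abs(y)) * 100.0            # percent
    mad = np.mean(np.abs(err))                                 # mean absolute deviation
    mse = np.mean(err ** 2)                                    # penalizes large errors more
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)  # coefficient of determination
    return {"MAPE": mape, "MAD": mad, "MSE": mse, "R2": r2}
```

For actual values (100, 200, 300, 400) and predictions (110, 190, 300, 400), this returns MAPE = 3.75%, MAD = 5, MSE = 50, and $R^2 = 0.996$.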

Experimental Setup.
The experiments are divided into two parts, Experiment I and Experiment II. Experiment I used the financial intermediation and real estate datasets for Beijing: the data from 1994 to 2005 were used as sample data, and the data from 2006 to 2010 were used for prediction and testing. Experiment I compared three prediction models on these data: GM(1,1), RM-GM(1,1), and PRGM(1,1). Experiment II compared PRGM(1,1) variants with different parameter settings.
The following parameter values for PRGM(1,1) are used in both experiments. We set the number of candidate values of $\alpha$ in the particle search space to 1,000 and the maximum number of iterations $iter_{\max}$ to 100. For the basic PRGM(1,1), we set the two weights to $c_1 = 2$ and $c_2 = 2$.

Experiment I.
Table 2 shows the parameters calculated by the three prediction models, GM(1,1), RM-GM(1,1), and PRGM(1,1). In GM(1,1), which is constructed from all of the 1994-2005 data with the fixed value $\alpha = 0.5$, the parameters are fixed at $a = -0.148$ and $b = 165.061$ for all predicted years in financial intermediation. Similarly, $a = -0.254$ and $b = 42.152$ for all predicted years in real estate.
In RM-GM(1,1), we used a sample sequence of length $n = 12$, advanced by one year per rolling step and starting from 1994, to forecast the data from 2006 to 2010; hence, the number of rolling steps equals 5. The value of $\alpha$ is also fixed at 0.5 in RM-GM(1,1). However, the parameters $a$ and $b$ change for every predicted year because of the rolling mechanism.
In PRGM(1,1), as in RM-GM(1,1), a sample sequence of length $n = 12$, advanced by one year per step and starting with the 1994-2005 data, was used to predict the five years of data from 2006 onward. However, $\alpha$ is variable and differs among the predictions for 2006-2010. Hence, the parameters $a$ and $b$ change for every predicted year because of both the rolling mechanism and the variation of $\alpha$.
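Putting the pieces together, a PRGM(1,1)-style forecaster re-optimizes $\alpha$ by PSO in every rolling window. This is a hedged sketch, not the authors' implementation: the fitness (in-sample MAPE of GM(1,1) on the current window), the velocity bound of 0.2, the iteration count, and all function names are assumptions chosen for illustration; the defaults $w = 0.5$ and $c_1 = c_2 = 1.5$ are the constant settings the text reports as best.

```python
import numpy as np


def gm11_series(x0, alpha, horizon):
    """GM(1,1) fitted values for the window plus `horizon` forecasts."""
    x1 = np.cumsum(x0)                                        # AGO
    z1 = alpha * x1[1:] + (1.0 - alpha) * x1[:-1]             # background values
    B = np.column_stack([-z1, np.ones_like(z1)])
    (a, b), *_ = np.linalg.lstsq(B, x0[1:], rcond=None)
    k = np.arange(x0.size + horizon)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a        # time response
    return np.concatenate([[x0[0]], np.diff(x1_hat)])         # inverse AGO


def window_mape(x0, alpha):
    """Assumed fitness: in-sample MAPE (%) of GM(1,1) on the current window."""
    fit = gm11_series(x0, alpha, horizon=0)
    return np.mean(np.abs(fit[1:] - x0[1:]) / x0[1:]) * 100.0


def pso_alpha(x0, n_particles=10, iters=60, w=0.5, c1=1.5, c2=1.5, seed=0):
    """1-D global-best PSO for alpha in [0, 1] (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n_particles)
    v = rng.uniform(-0.1, 0.1, n_particles)
    pbest, pbest_f = x.copy(), np.array([window_mape(x0, a) for a in x])
    g = np.argmin(pbest_f)
    gbest, gbest_f = pbest[g], pbest_f[g]
    for _ in range(iters):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        v = np.clip(w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x), -0.2, 0.2)
        x = np.clip(x + v, 0.0, 1.0)                          # keep alpha in [0, 1]
        f = np.array([window_mape(x0, a) for a in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = np.argmin(pbest_f)
        if pbest_f[g] < gbest_f:
            gbest, gbest_f = pbest[g], pbest_f[g]
    return gbest


def prgm_forecast(series, window_len, n_forecasts, **pso_kwargs):
    """Rolling GM(1,1) whose alpha is re-optimized by PSO in every rolling step.

    Returns a list of (alpha, one-step-ahead forecast) pairs.
    """
    series = np.asarray(series, dtype=float)
    out = []
    for r in range(n_forecasts):
        x0 = series[r:r + window_len]                         # most recent window
        alpha = pso_alpha(x0, **pso_kwargs)
        out.append((alpha, gm11_series(x0, alpha, horizon=1)[-1]))
    return out
```

On an exactly geometric series $10 \cdot 1.1^{k}$, the in-window MAPE is minimized near $\alpha \approx 0.492$ (where $e^{-a}$ equals the growth ratio 1.1), so the PSO should settle there in every window. On real economic data, each window would generally yield a different $\alpha$, which is the behavior the text describes.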
Table 4 shows the forecasts of semiconductor industry production from 1998 to 2002 made by P-RM-GM(1,1) and PRGM(1,1) using the sample data of 1994-2002. We compared our PRGM(1,1) results with those produced by P-RM-GM(1,1) in the literature [26]. The MAPE of PRGM(1,1), 8.3787%, is better than the 10.52% of P-RM-GM(1,1). The prediction error, which indicates the percentage deviation of the predicted data from the actual data, is much lower for PRGM(1,1) than for P-RM-GM(1,1) in each year from 1998 to 2000. When the actual value suddenly fell by more than 10%, PRGM(1,1) caught the trend well, which shows that PRGM(1,1) has a remarkable ability to predict irregular sequences, especially to sense unexpected changes. In contrast, P-RM-GM(1,1) predicts the 2002 value with a very small percentage error of 0.48%, whereas the error of PRGM(1,1) is 11.839%. The reason is that PRGM(1,1) tends to match the trend, in this case the rebound of production from the 2001 slump. Overall, PRGM(1,1) is much better than P-RM-GM(1,1) at forecasting the trends of time series, which is significant for economic prediction.

Experiment II.
In this experiment, we evaluated PSO variants with different parameter configurations. We evaluated the effect of constant and linearly varying settings of $c_1$ and $c_2$ on prediction accuracy. Among the constant settings, $c_1 = c_2 = 1.5$ is the best, which is consistent with most previous conclusions. The linearly varying setting (see (12)) brings little improvement on the metrics compared with the constant setting. We also evaluated the forecasting performance for diverse combinations of the start values $c_1^{+}$ and $c_2^{-}$ and the end values $c_1^{-}$ and $c_2^{+}$, each ranging over [0.5, 4] with a step of 0.5, and found little difference among them for all three datasets.
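The linearly varying acceleration coefficients can be sketched in the same way, moving $c_1$ from its start value $c_1^+$ down to $c_1^-$ and $c_2$ from $c_2^-$ up to $c_2^+$ over the run. The function name and the default start and end values (2.5 and 0.5) are hypothetical choices within the [0.5, 4] range explored here, not the paper's settings.

```python
def time_varying_coefficients(t, t_max, c1_start=2.5, c1_end=0.5,
                              c2_start=0.5, c2_end=2.5):
    """Linearly move c1 from c1_start to c1_end and c2 from c2_start to c2_end.

    Defaults are illustrative assumptions, not values from the experiments.
    """
    frac = t / t_max                                   # progress through the run, 0..1
    c1 = c1_start + (c1_end - c1_start) * frac         # cognitive weight shrinks
    c2 = c2_start + (c2_end - c2_start) * frac         # social weight grows
    return c1, c2
```

Early iterations therefore weight each particle's own best position more heavily, and later iterations pull the swarm toward the global best; a grid search over start and end values, as in the experiment, would simply call this function with different defaults.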
We evaluated three kinds of $w$ settings: constant, linearly decreasing, and nonlinearly decreasing. In the constant setting, the optimal value is $w = 0.5$ for all datasets. We also observed that the performance is exactly the same when …

We also used the nonlinearly varying $w$ (14) and the constriction factor $\chi$ (18) to update the particles' velocities (17). Figure 1 shows that the nonlinearly varying setting, and the constriction factor setting combined with linearly varying $c_1$ and $c_2$, can improve prediction performance. The nonlinearly varying method does not require initial settings of $w^{-}$ or $w^{+}$; it calculates $w$ dynamically according to the current situation, setting a large $w$ if the current position is far from the global best position and a small $w$ if it is near the global best position. The constriction factor can slow down the velocities, but it needs to be combined with the linearly varying method to control the effects of $c_1$ and $c_2$ so that the search covers more of the space.
Figure 2 illustrates the evolution of the fitness in the first predicted year for all datasets and compares the convergence speed of the PSO variants. According to our empirical study, the maximum number of iterations $t_{\max}$ can be set to 60-80 in the single-particle PSO. There is no general rule among these PSOs across all datasets, but all of them converge within 60-80 iterations. The time complexity of the PSO is $O(t_{\max} \cdot N \cdot T_{\mathrm{fitness}})$, where $N$ is the population size and $T_{\mathrm{fitness}}$ is the cost of one fitness evaluation, so the runtime depends on both the population size and the number of iterations.

Conclusions
In this paper, we proposed a rolling-mechanism-based grey model whose parameter $\alpha$, which significantly affects forecast accuracy, is optimized by the PSO algorithm. The experiments show that PRGM(1,1) predicts very accurately on three economic datasets, whether regular or noisy. PRGM(1,1) achieves much better forecast accuracy than three widely used grey models: GM(1,1), which has a fixed $\alpha$ and ignores the impact of recent data; RM-GM(1,1), which considers the impact of recent data but keeps $\alpha$ fixed throughout the prediction period; and P-RM-GM(1,1), which not only considers recent data but also adjusts $\alpha$ in each rolling step.
We also evaluated other PSO variants with different parameter settings. Almost all metaheuristics require a number of parameters to be set, which may lead to different outcomes, for example, multiple locally optimal solutions in the parameter space in terms of solution quality. An extension of this work is to analyze the principles of balancing exploitation and exploration of metaheuristics in forecasting. We will focus on comparing in detail the effectiveness of exploitation and exploration among …

Figure 2: The convergence speed comparison of the fitness values among PSOs with various parameter setting methods.

Table 4: Forecasting data and evaluation metrics produced by PRGM(1,1) and P-RM-GM(1,1).

Figure 1: The influence of PSOs with various parameter settings on MAPE and $R^2$.

In the linearly decreasing setting (13), we varied the combinations of $w^{-}$ and $w^{+}$ over [0, 1] with a step of 0.1. The results showed nearly no difference in the metrics among the different combinations, which indicates that the historical setting has little impact on forecasting performance when $w$ is updated linearly.
the population size is 10 with different values of  ranging from [0, 1].