Multistep-Ahead Air Passengers Traffic Prediction with Hybrid ARIMA-SVMs Models

The hybrid ARIMA-SVMs prediction models have been established recently, which take advantage of the unique strength of ARIMA and SVMs models in linear and nonlinear modeling, respectively. Built upon this hybrid ARIMA-SVMs models alike, this study goes further to extend them into the case of multistep-ahead prediction for air passengers traffic with the two most commonly used multistep-ahead prediction strategies, that is, iterated strategy and direct strategy. Additionally, the effectiveness of data preprocessing approaches, such as deseasonalization and detrending, is investigated and proofed along with the two strategies. Real data sets including four selected airlines' monthly series were collected to justify the effectiveness of the proposed approach. Empirical results demonstrate that the direct strategy performs better than iterative one in long term prediction case while iterative one performs better in the case of short term prediction. Furthermore, both deseasonalization and detrending can significantly improve the prediction accuracy for both strategies, indicating the necessity of data preprocessing. As such, this study contributes as a full reference to the planners from air transportation industries on how to tackle multistep-ahead prediction tasks in the implementation of either prediction strategy.


Introduction
Forecasting is an important component of planning in any enterprise, but it is particularly critical in airline revenue management because of the direct influence forecasts have on the booking limits that determine airline profits. Unfortunately, the passengers behaviors and other complicating factors make air passengers traffic prediction extremely difficult.
In the past decades, academic researchers and practitioners have made many contributions to air passengers traffic prediction. Most of the prediction models abounded in the literature can mainly be classified into two categories, namely, econometric models (casual model) and time series models. In the econometrics community, traditional research has been focusing on the economic factors of the take-off and landing point, such as population, income, and price [1][2][3][4][5]. It should be noted that most econometric models aimed to reveal the underlying relationship between air passengers traffic flow and selected variables such as geoeconomic and service-related factors. Due to the complexity of econometric modeling in variables selection, time series approach is a promising alternative in air passengers traffic prediction [6]. However, these traditional time series methods always do not work in practice. The main reason is that the underlying assumption of these traditional time series methods is linearity and they cannot capture the nonlinear patterns hidden and recognize the irregularity well. Recent research focuses on modeling time series with complex nonlinearity, dynamic variation, and high irregularity [7][8][9][10][11][12][13][14].
Recently, the hybrid modeling approach integrating ARIMA, the most widely used linear model, and SVM, the well-established computational intelligence technique, has been proposed and justified in various areas [15][16][17]. Pai established a hybrid ARIMA and support vector machines model for stock price forecasting. Chen proposed a hybrid methodology that exploits the unique strength of the SARIMA (seasonal autoregressive integrated moving average) model and the support vector machines model in forecasting Taiwan's machinery industry production values. 2 The Scientific World Journal And Guo developed a hybrid Seasonal Autoregression Integrated Moving Average and Least Square Support Vector Machine (SARIMA-LSSVM) model to predict the mean monthly wind speed in Hexi Corridor, which shows that the developed method is simple and quite efficient. Additionally, Zhang combined the ARIMA and feedforward neural networks models in forecasting [18].
This hybrid approach arises from the fact that real-world time series are rarely pure linear or nonlinear but containing both linear and nonlinear patterns, and neither ARIMA nor SVM can be adequate in modeling and forecasting time series since the ARIMA model cannot deal with nonlinear relationships while the SVM model alone is not able to handle both linear and nonlinear patterns equally well. As such, this hybrid approach can take full advantages of the unique strength of ARIMA and SVM models in linear and nonlinear modeling, respectively. However, an important point to note from the past studies mentioned above is their preoccupation with one-step-ahead forecasting rather than the multistep-ahead one. In one-step-ahead case, the predictor uses all or some of the observations to estimate a variable of interests for the time step immediately following the latest observation. However, it provides no information as to the long-term future behavior of air passengers traffic.
In summary, this study extends the well-established ARIMA-SVMs model into the case of multistep-ahead prediction of air passengers traffic using the two modeling strategies, including iterated strategy and direct strategy and then goes a step forward by investigating the effectiveness of data preprocessing approaches, such as deseasonalization and detrending, along with the two strategies. The monthly air passengers traffic series from four airlines are used for the purpose of validation. The experimental results are judged on the basis of the prediction accuracy and the computational cost. The first contribution of this paper is that the proposed modeling framework is capable of capturing the complex dynamic of air passenger traffic considered, resulting in a more accuracy multistepahead prediction. The second contribution is to provide the first strong empirical evidence within the air passengers traffic literature on whether the superiority of ARIMA-SVMs modeling framework holds consistently in the case of multistep-ahead prediction. Last but not least, the third contribution is to provide the experimental evidence on the performance of different multistep-ahead prediction strategies, which may shed lights on the selection of modeling strategy while facing the different time horizons upon prediction.
The rest of this paper is organized as follows: Section 2 describes the methodologies used in this study including ARIMA and SVM along with the proposed hybrid methodology and multistep-ahead prediction strategies used in this study. Section 3 details the research design on data source, data preprocessing, input selection, the selected counterparts for comparison, statistical criteria, methodological implementations, and experiment procedures. Following that, in Section 4, the experimental results are discussed. Section 5 finally concludes this paper.
The basic idea of this model is to regard the time series as a random sequence and to use a certain mathematical model to describe the sequence approximately. Once the model is identified, we can use the past value to predict the future.
The basic steps are as follows.
(1) Identify smooth sequence using scatter chart, autocorrelation function, partial autocorrelation function, and ADF unit root test.
(2) Do smooth processing on nonstationary sequence such as difference.
(3) Establish the model according to the time-series model identification rules. If the partial correlation function of a smooth sequence is censored and the autocorrelation function is trailing, sequence can be concluded for the AR model. If the partial correlation of a smooth sequence is trailing and the autocorrelation function is censored, it will be determined for MA model. And if both partial correlation function and autocorrelation function are trailing, ARMA model is for the sequence.
(4) Estimate parameters and justify whether the statistical test is significant.
(5) Do hypothesis test and diagnose whether the sequence is the white noise.
(6) Predict with the model established.

Support Vector Machine (SVM) Model.
As a new method based on structural risk minimization principle, support vector machine is superior to those methods based on experience risk minimization principle. It can be converted to a salvation of a convex quadratic programming and make sure the extreme solution is the global optimal solutions. Support vector machines can handle higher dimension data better even with a relatively low amount of training samples and has a very good generalization. The models select limit The Scientific World Journal 3 support vectors from input data and so have a fast processing speed. For linear and regressive data set { , }, the SVM regression function is formulated as follows: The coefficients and are estimated by minimizing where is called the -intensive loss function and is formulated as follows: By introducing slack variables and * , Equation (3) can be transformed to the following constrained formulation: subject to When solving the above formula, we always use dual theory to transform it into a convex quadratic programming problem. Introducing Lagrange equation, (5) changed into the following form: subject to When the data set cannot be regressed linearly, we also map them to a high dimension feature space and make linear regress. Then the formulation is as follows: According to Karush-Kuhn-Tucker theory, the final SVM regression function can be the following form: It is reasonable to consider a time series to be composed of a linear autocorrelation structure and a nonlinear component. That can then be represented as follows: where denotes the linear component and denotes the nonlinear component. These two components have to be estimated from the data. First, we let ARIMA model the linear component, then the residuals from the linear model will contain only the nonlinear relationship. Let denote the residual at time from the linear model, and̂denotes the forecast value of the ARIMA model, then Zhang illustrated a linear model is not sufficient if there are still linear correlation structures left in the residuals. Moreover, even if a model has passed diagnostic checking, the model may still not be adequate in that nonlinear relationships have not been appropriately modeled [18]. So by modeling residuals using SVM, nonlinear relationships can be discovered. That can be represented as follows: where is a nonlinear function determined by SVM and is the random error. Therefore, the combined forecast is 4 The Scientific World Journal wherẽis the forecast value of the SVM model. In summary, the ultimate forecast value consists of linear forecast part and nonlinear forecast part.

Iterated Strategy and Direct Strategy.
Multistep-ahead time series forecasting can be described as an estimation on future time series +ℎ , (ℎ = 1, 2, . . . , ), while is an integer and more than one, given the current and previous observation , ( = 1, 2, . . . , ). In the present study, iterated strategy and direct strategy are frequently selected for multistep-ahead forecasting. For each selected strategies, there are a large number of variations proposed in the literatures, and it would be a hopeless task to consider all existing varieties. Our guideline was therefore to consider the basic version of each strategy (without the additions, or the modifications proposed by some other researchers). The reason for selecting the following two strategies is that they are some of the most commonly used strategies. The following subsection presents a detailed definition of the selected strategies.

Iterated Strategy.
The first is named as the iterated strategy by Chevillon [19] and is often advocated in standard time series textbooks [20,21]. This strategy constructs a prediction model by means of minimizing the squares of the in-sample one-step-ahead residuals and then uses the predicted value as a "known" input to forecast the next point and continues in this manner until reaching the horizon.
In more detail, iterated strategy first embeds the original series into an input-output format: where ⊂ { , . . . , − +1 }, = +1 . Then the iterated prediction strategy learns one-stepahead prediction model: where denotes the additive noise. After the learning process, the estimation of the next values is returned bŷ

Direct Strategy.
In contrast to the iterated strategy which uses a single model, the other commonly applied strategy, namely, direct strategy first suggested by Cox [22], constructs a set of prediction models for each horizon using only its past observations, where the associated squared multistep-ahead errors are minimized. Direct strategy estimates different models between the inputs and the outputs to predict { +ℎ , ℎ = 1, 2, . . . , }, respectively, which requires a sharply increased computational expense, especially in the case of a longer prediction horizon [23]. The direct strategy first embeds the original series into datasets where ⊂ { , . . . , − +1 }, ℎ = +ℎ . Then, the direct prediction strategy learns direct models on ℎ = { 1 , . . . , }, respectively. Consider where denotes the additive noise.
After the learning process, the estimation of the next values is returned by

ARIMA-SVMs Model for Multistep-Ahead Prediction.
In this study, we consider -step ahead forecasting by ARIMA-SVMs model. The proposed methodology of the multistepahead prediction hybrid system consists of two stages. In the first stage, an ARIMA model is used to analyze the linear part of the problem for each step. In the second stage, SVMs model is developed to model the residuals from the ARIMA model. Since the ARIMA model cannot capture the nonlinear structure of the data, the residuals of linear model will contain information about the nonlinearity. Assuming ( = 1, 2, . . . , ) is the original data set and according to ARIMA-SVMs model, includes the linear part and nonlinear part , namely, = + . First of all,step ahead prediction by ARIMA model will be employed and obtain the -step ahead forecasted valuẽ+ ℎ , then for each time , the residuals of +ℎ can be obtained by +ℎ = +ℎ − +ℎ . And for nonlinear time series ( = 1, 2, . . . , ),step ahead prediction by SVMs model will be employed and obtain the -step ahead forecasted values̃+ ℎ , the ultimate forecasted time series for +ℎ is̃+ ℎ , which consist of̃+ ℎ and̃+ ℎ . In this research, two multistep-ahead forecasting strategies will be employed, namely, iterated strategy and direct strategy, and it should be noted that if the multistepahead strategy has been selected for linear part, the same strategy should be employed in nonlinear part; therefore, the model should be ARIMA (iterated strategy)-SVMs(iterated strategy) or ARIMA(direct strategy)-SVMs(direct strategy).
The flowchart of proposed multistep-ahead ARIMA-SVMs modeling framework is depicted in Figure 1. As shown in Figure 1, the proposed modeling framework includes three parts: data preprocessing, multistep-ahead prediction, and PSO module for SVMs parameters optimization (details on this parameters tuning module can be found in  Section 3.3). In data preprocessing procedure, linear transference, deseasonalization, and detrending are performed on the data, and then the multistep-ahead forecasting model based on ARIMA-SVMs is adopted; in addition, in nonlinear forecasting part, the parameter optimization algorithm PSO is employed to optimize the parameters of SVM.

Data and Preprocessing
3.1.1. Data. In this study, four monthly air passenger traffic series of American airlines are chosen as experimental 6 The Scientific World Journal Considering the Severe Acute Respiratory Syndrome (SARS) which happened in 2003 and brought a heavy blow to American air industry, the four sampling data series cover a period from January 1990 to December 2001, each with a total of 144 observations. The original data of these four airlines are shown in Figures 2(a) to 2(d).
The descriptive statistics details of all the four data series are summarized in Table 1.

Preprocessing.
Normalization is a standard requirement for time series modeling and prediction with neural network. Thus, the data sets are first scaled by linear transference to map onto a range of [0, 1]. As mentioned in Section 1, the air passengers traffic data are monthly and with strong seasonal components and trend patterns. Therefore, after the linear transference, deseasonalization and detrending are performed on the data. Deseasonalization is performed by means of the revised multiplicative seasonal decomposition presented in [24]. In addition, detrending is performed by fitting a polynomial time trend and then subtracting the estimated trend from the series when trends are detected by the Mann-Kendall test [25].

Input Selection.
Filter method was employed for input selection in this study. In the case of the filter method, the best subset of inputs is selected a priori based only on the dataset. The input subset is chosen by a predefined criterion, which measures the relationship of each subset of input variables with the output [26]. Specifically, in terms of input selection criteria, the partial mutual information [27] is used for the prediction models. Mutual information (MI) is a commonly adopted measure of dependence between variables and has been widely used for input selection [26]. However, this raises a major redundancy issue because the MI criterion does not account for the interdependency between candidate variables. To address this issue, Sharma [27] developed an improved algorithm that exploits the concept of partial mutual information (PMI), which is the nonlinear statistical analog of partial correlation. The definitions of PMI are shown as follows: where and are generalized to represent time series ( ) and lagged time ( − ) with time step ( ≤ ) conditional on which is a set of remaining time-lag variables. In performing the PMI, the input variable with highest conditional PMI value at every iteration can be added to the inputs set.

Parameters Selection with PSO.
In this study, RBF has been chosen as the kernel function for SVM models, and thus the parameters , , and are to be optimized based on the training sets. Despite the new variants of PSO for parameter tuning such as in [28], in this study, we chose the standard PSO algorithm to tune the parameters with the lowest MAPE on training sets.
PSO based operators are used to explore the search space. Particle Swarm Optimization (PSO) is a population-based metaheuristic that simulates social behavior such as birds flocking to a promising position to achieve precise objectives (e.g., finding food) in a multidimensional space by interacting among them [29]. To search for the optimal solution, each particle adjusts their flight trajectories by using the following updating equations: where 1 , 2 ∈ R are acceleration coefficients, is inertia weight, and 1 and 2 are random numbers in the range of [0, 1]. V and denote the velocity and position of the th particle in th dimension at th iteration, respectively. is the value in dimension of the best parameters combination (a particle) found so far by particle .
is the value in The Scientific World Journal     dimension of the best parameters combination (a particle) found so far in the swarm ( ); = ⟨ 1 , . . . , ⟩ is considered as the global best (gbest) position. Note that each particle takes individual (lbest) and social (gbest) information into account for updating its velocity and position.
In the search space, particles track the individual's best values and the best global values. The process is terminated if the number of iterations reaches the predetermined maximum number of iterations.

The Selected Counterparts for Comparison.
Single SVR and monthly ARIMA are selected as counterparts for the purpose of comparison. It should be noted the reason for selecting single SVR and monthly ARIMA is to justify the effectiveness of ARIMA-SVM modeling framework. Additionally, the performances on both one-step-ahead (prediction horizon = 1) and multistep-ahead (prediction horizon = 2, 4, 6,8,12,18,24) prediction are compared across all the models to provide more evidences for justification. Note that the iterated strategy and direct strategy are employed in this study for comparison.

Statistical Criteria and Rank-Based Measures.
To compare the effectiveness of the different prediction models, no single accuracy measure can capture all the distributional features of the errors when summarized across data series.
For each forecast horizon ℎ, here, we consider three forecast accuracy measures. The first is the mean absolute percentage error (MAPE) defined as (25), and the second is symmetric mean absolute percentage error (SMAPE) defined as (26), and the last accuracy measure is the mean absolute scaled error (MASE), defined as (27). MAPE has the advantage of being scale-independent and so is frequently used to compare forecast performance across different datasets. Moreover, the MAPE also has the disadvantage that it puts a heavier penalty on positive errors than on negative errors. MASE has recently been suggested by Hyndman and Koehler [30] as a means of overcoming observation and errors around zero existing in some measures. The MASE has some features which are better than the SMAPE, which has been criticized for the fact that its treatment of positive and negative errors is not symmetric [31]. However, the MAPE and SMAPE are still used in this study due to their popularity in forecasting literature. The smaller the values of MAPE, SMAPE, and MASE, the closer the predicted time series values to the actual values. Consider 8 The Scientific World Journal where ( ) denotes the observation at period for time series ,̂( ) denotes the forecasted value of ( ), is the number of time series (in this case, = 4), is the number of observation in the hold-out sample (in this case, = 48), and is the number of observation in the estimation sample for time series .
In this study, we repeat running each model fifty times for airline passengers traffic dataset to even out the fluctuations. Then each of the fifty runs will produce a SMAPE for all 4 time series. Next, the mean and standard deviation of these fifty SMAPE are calculated and listed in the tables for examining the performance of different models. Similarly, the mean and standard deviation of MASE are also computed. Note that the error measures are computed after rolling back of the preprocessing step performed, such as deseasonalization and detrending.
Following the experimental settings in [32], we also conduct a number of statistical tests to compare each model based on the obtained fifty SMAPE and MASE at the 0.05 significance level. For each prediction horizon ( = 1 and 24) and performance measure (i.e., MAPE, SMAPE, and MASE), we perform a one-way analysis of variance (ANOVA) procedure to determine if there exists statistically significant difference among the eight models in out-ofsample forecasting. Then, to further identify the significant difference between any two models, the Tukey honestly significant difference (HSD) test [33] is employed to compare all pairwise difference simultaneously. Note that Tukey HSD test is a post-hoc test; this means that a researcher should not perform Tukey HSD test unless the results of ANOVA are positive. A rank-based performance measure termed as multiple comparisons with the best (MCB) [34] is used to test whether some models perform significantly worse than the best model. [35] is employed for SVR modeling here. We select the Radial basis function (RBF) as the kernel function in the prediction models. To determine the hyperparameters, namely, , , (in the case of RBF as the kernel function), the PSO algorithm is employed in the current study. Due to its simplicity and generality as no important modification was made for applying it to model selection, PSO has been recently established for parameter determination of SVR. In solving hyperparameter selection by the PSO, each particle is requested to represent a potential solution ( , , ). Fortunately, several empirical and theoretical studies have been performed about the parameters of PSO from which valuable information can be obtained [36]. In this study, the parameters are determined according to the recommendations in these studies and selected based on the prediction performance and computational time in a trial-error fashion. Through experiments, the parameter values of PSO are set as follows. Both the cognitive and interaction coefficients are set to 2. The swarm size and number of iterations are set to be 10 and 50, respectively. For monthly ARIMA estimation, the automatic model selection algorithm proposed by Hyndman and Khandakar and implemented in the R software package "forecast" 5 is used in this study. Based on these, we develop our proposed hybrid ARIMA-SVMs programs in Matlab, which is available upon request. Figure 3 shows the experimental procedure using the real time series. Each series is split into the estimation sample and the hold-out sample firstly. Then, the input selection and model selection for each series are conducted using aforementioned filter method, PSO algorithm, and fivefold cross-validation with iterated and direct strategies. Finally, the attained models are tested for hold-out samples, the measures MAPE ℎ , SMAPE ℎ , and MASE ℎ are computed for each prediction horizon ℎ (in our case ℎ = 1, 2, . . . , 24) over four time series. Furthermore, the modeling process for each series is repeated fifty times. Upon the termination of this loop, performance of the examined models with selected strategies at each prediction horizon is judged in terms of the mean, average by fifty, of the MAPE ℎ , SMAPE ℎ , and MASE ℎ . Analysis of variance (ANOVA) test procedures are used to determine if the means of performance measures are statistically different among the five models for each prediction horizon and dataset. If so, Tukey's honesty significant difference (HSD) tests [37] are then employed further to identify the significantly different prediction models in multiple pairwise comparisons at 0.05 significance level.

Results and Discussion
The forecasting performance of the hybrid ARIMA-SVMs and benchmarking method on the four airlines' air passenger series covered a period from January 1990 to December 2001 are shown in Table 2.
For each accuracy measure and prediction horizon, the strategies are rank order from 1 (the best) to 6 (the worst) in Table 2.
(i) As far as the comparison between ARIMA-SVMs with others benchmark methods is concerned, the results are mixing among the prediction measures examined. In terms of MAPE and the data without deseasonalization-detrending, I-ARIMA-SVM gets the best performance when horizon = 1, and D-SVM gets the best performance when horizon = 4, 8, 12 and in the average accuracy measures over the prediction horizon 1 to 24, while D-ARIMA-SVM gets the performance when horizon = 2, 6, 18, 24.
(ii) It is obvious that applying deseasonalization and detrending approach for multistep-ahead prediction obtains better performances.
Following the experimental procedure presented in Figure 3, an ANOVA procedure is performed to determine The Scientific World Journal 9

12
The Scientific World Journal  if there exists a statistically significant difference among the six modeling strategies in the hold-out sample for each of the performance measures and the prediction horizon. Furthermore, to identify the significant difference between any two strategies, the Tukey's HSD test is used to compare all pairwise differences simultaneously. Tukey's HSD test is a post-hoc test, meaning that Tukey's HSD test should not be performed unless the results of the ANOVA procedure are positive. Table 3 shows the results of the multiple comparison tests for three datasets. Several observations can be made from Table 3.
(i) As for the comparison between ARIMA-SVMs and the benchmark methods, the difference in prediction performance is significant at the 0.05 level and ARIMA-SVMs is better than SVM and ARIMA. (ii) Concerning the current two multistep-ahead forecasting strategies, the direct strategy significantly outperforms the iterated strategy for the majority of prediction horizons.
During the experiments, we also found that when we try to improve the forecast accuracy of SVM, there will be an overfitting on training sets. That means, though we can get a better forecast performance of the SVM model, we cannot get a better forecast performance or even get a worse performance of the hybrid ARIMA-SVM model. Therefore, the optimal parameters of the hybrid model should be further researched.

Conclusion
Due to the complex and dynamic pattern with uncertainty, time series prediction still remains as one of the most challenging tasks in field of time series analysis. The hybrid ARIMA-SVMs prediction models have been established recently, which has present good performance in time series prediction. In this study, the hybrid ARIMA-SVMs has been employed in multistep-ahead prediction which is more complex and difficult than one-step-ahead. The contribution of this study is employing ARIMA-SVMs in multistep-ahead prediction; moreover, air passengers traffic data is used for this study and a large scale comparative study has been conducted for validation. Quantitative and comprehensive assessments are performed with the air passengers traffic data on the basis of the several prediction accuracies. Experimental results and comparisons demonstrate the superiority of the proposed ARIMA-SVMs modeling strategy for multistep-ahead time series prediction.