With regard to the nonlinearity and irregularity, along with implicit seasonality and trend, in air passenger traffic forecasting, this study proposes an ensemble empirical mode decomposition (EEMD) based support vector machines (SVMs) modeling framework that incorporates a slope-based method to restrain the end effect occurring during the sifting process of EEMD, abbreviated as EEMD-Slope-SVMs. Real monthly air passenger traffic series for six selected airlines in the USA and the UK were collected to test the effectiveness of the proposed approach. Empirical results demonstrate that the proposed decomposition and ensemble modeling framework outperforms the selected counterparts, namely single SVMs (straightforward application of SVMs), Holt-Winters, and ARIMA, in terms of RMSE, MAPE, GMRAE, and DS. Additional evidence shows the improved performance compared with the EEMD-SVM model that does not restrain the end effect.
Air passenger traffic forecasting is of great importance for airlines and civil aviation authorities. For airlines, accurate forecasts play an increasingly important role in revenue management. They help reduce the airlines’ risk by objectively evaluating the demand for the air transportation business [
In the past decades, academic researchers and practitioners have made many contributions to air passenger traffic forecasting. Most of the quantitative forecasting models in the literature fall into two categories, namely, econometric models and time series models. In the econometric modeling area, pioneering works can be found in [
Compared with econometric modeling, little attention has been paid to time series models in air passenger traffic forecasting. Important research work was done by Grubb and Mason [
Usually, the above time series models provide good forecasts when the air passenger traffic series under study is linear or nearly linear with explicit seasonality and trend. However, real-world air passenger traffic series exhibit a great deal of nonlinearity and irregularity along with implicit seasonality and trend, and traditional time series methods frequently perform poorly on them in practice. The main reason is that these traditional methods assume linearity, so they can neither capture hidden nonlinear patterns nor recognize irregularity well. Recent research efforts on modeling time series with complex nonlinearity, dynamic variation, and high irregularity have suggested two promising directions. One is to establish emerging artificial intelligence models such as artificial neural networks (ANNs), support vector machines (SVMs), and genetic programming (GP). The earlier literature on air passenger traffic forecasting by ANNs can be found in [
Under the principle of the decomposition-ensemble modeling framework, this study proposes an ensemble empirical mode decomposition (EEMD) based support vector machines (SVMs) modeling approach for air passenger traffic forecasting. Specifically, the end effect issue ignored in [
To examine the forecasting performance of the proposed EEMD-Slope-SVMs, monthly air passenger traffic data of six selected airlines in the UK and the USA are used in the experiments. Forecast accuracy, measured by RMSE, MAPE, GMRAE, and DS, is compared with that of EEMD-SVMs (not restraining the end effect), single SVMs (straightforward application of SVMs), Holt-Winters, and ARIMA.
The rest of this paper is organized as follows. Section
Ensemble empirical mode decomposition as proposed by Wu and Huang [
EEMD is an empirical, intuitive, direct, and self-adaptive time series decomposition technique, suitable for decomposing nonlinear and nonstationary time series. It decomposes the original data series into intrinsic mode functions (IMFs) on the basis of the local characteristic time scale, that is, the distance between two successive local extrema. Each IMF must satisfy the following two requirements. (1) Over the whole data series, the number of extrema (the sum of maxima and minima) and the number of zero crossings must either be equal or differ by at most one. (2) At any point, the mean value of the envelopes defined by the local maxima and the local minima must be zero.
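Requirement (1) can be checked numerically. The sketch below assumes an equally spaced NumPy series; it counts interior extrema as sign changes of the first difference and zero crossings as sign changes of the series itself. Plateaus and crossings that pass through an exactly zero sample are ignored, which is a simplification. The function names are illustrative, and requirement (2) is not tested because it needs the envelopes.

```python
import numpy as np


def count_extrema(x):
    """Count interior local maxima and minima of a 1-D series."""
    d = np.diff(np.asarray(x, dtype=float))
    # A sign change in the first difference marks a local extremum;
    # flat steps (d == 0) give a zero product and are not counted.
    return int(np.sum(d[:-1] * d[1:] < 0))


def count_zero_crossings(x):
    """Count sign changes between consecutive samples.

    A crossing that passes through an exactly-zero sample is not counted
    (a simplification for illustration).
    """
    s = np.sign(np.asarray(x, dtype=float))
    return int(np.sum(s[:-1] * s[1:] < 0))


def satisfies_imf_condition_1(x):
    """Requirement (1): extrema and zero crossings are equal or differ by at most one."""
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1
```

A pure sinusoid passes this check, while the same sinusoid shifted well above zero fails it, because it keeps its extrema but loses all of its zero crossings.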
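Based on this definition, IMFs can be extracted from the data series $x(t)$ by the following sifting procedure:
(1) identify all the local extrema of $x(t)$, including local maxima and local minima;
(2) connect all the local maxima by a cubic spline to define the upper envelope $e_{\max}(t)$, and connect all the local minima in the same way to define the lower envelope $e_{\min}(t)$;
(3) compute the point-by-point local envelope mean $m(t) = \left(e_{\max}(t) + e_{\min}(t)\right)/2$;
(4) obtain the component $d(t) = x(t) - m(t)$;
(5) if $d(t)$ does not yet satisfy the IMF requirements (in practice, a stopping criterion), treat $d(t)$ as the new input and repeat steps (1) to (4);
(6) treat the final $d(t)$ as an IMF, $c_1(t)$;
(7) obtain the residue $r_1(t) = x(t) - c_1(t)$;
(8) then treat $r_1(t)$ as the new data series and repeat steps (1) to (7) to extract $c_2(t), c_3(t), \ldots$, until the residue $r_n(t)$ becomes a monotonic function from which no further IMF can be extracted.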
Generally, steps (1) to (6) are called the IMF extraction process, and steps (1) to (8) the whole sifting process. After the whole sifting process is finished, the data series $x(t)$ can be expressed as the sum of the $n$ IMFs $c_i(t)$ and the final residue $r_n(t)$:
$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t).$$
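The sifting steps can be sketched in Python with NumPy and SciPy's `CubicSpline`. This is a simplified illustration, not the implementation used in this study: both envelopes are naively pinned to the two end points of the series (so the end effect is not restrained), IMF extraction stops on a simple SD-type criterion with an assumed threshold or after a fixed number of iterations, and the function names and defaults (`emd`, `max_imfs`, `max_sift`, `sd_tol`) are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def _extrema(x):
    """Indices of interior local maxima and minima (step 1)."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def _envelope_mean(x):
    """Steps (1)-(3): cubic-spline envelopes and their point-by-point mean.

    Returns None when there are too few extrema to build both envelopes.
    The end points are naively added as knots of both envelopes, so no
    end-effect treatment (such as a slope-based method) is applied here.
    """
    t = np.arange(len(x))
    maxima, minima = _extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    up = np.r_[0, maxima, len(x) - 1]
    lo = np.r_[0, minima, len(x) - 1]
    upper = CubicSpline(up, x[up])(t)
    lower = CubicSpline(lo, x[lo])(t)
    return (upper + lower) / 2.0


def emd(x, max_imfs=8, max_sift=50, sd_tol=0.05):
    """Simplified EMD: returns (imfs, residue) with sum(imfs) + residue == x."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    while len(imfs) < max_imfs and _envelope_mean(residue) is not None:
        h = residue.copy()
        for _ in range(max_sift):
            m = _envelope_mean(h)
            if m is None:
                break
            h_new = h - m  # step (4): d(t) = x(t) - m(t)
            # SD-type stopping rule (assumed form): stop when successive components barely change.
            sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h_new  # step (5): sift d(t) again if needed
            if sd < sd_tol:
                break
        imfs.append(h)          # step (6): accept the component as an IMF
        residue = residue - h   # step (7): new residue
    # Step (8): the loop ends when the residue has too few extrema or max_imfs is reached.
    return np.array(imfs), residue
```

By construction, the extracted IMFs and the final residue sum back to the input series, whatever stopping rule is chosen.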
The sifting process described above is the core of the EEMD method. In implementation, several algorithmic issues arise, such as the stopping criteria for IMF extraction and for the whole sifting process; a recent detailed discussion of these issues can be found in references [
The key idea of the EEMD approach is as follows: the added white noise provides a uniform reference frame in the time-frequency (time-scale) space, so that signal components of comparable scale collate into the same IMF, and the noise then cancels itself out through ensemble averaging once it has served its purpose. Thus, the mode mixing problem of the original EMD method, in which a single IMF contains components of widely disparate scales or components of similar scale are split across different IMFs, can be reduced significantly.
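For a given data series $x(t)$, the EEMD procedure is as follows:
(1) generate a series with added white noise, $x_m(t) = x(t) + n_m(t)$, where $n_m(t)$ is the $m$th white noise series;
(2) decompose the noise-added series $x_m(t)$ into IMFs $c_{i,m}(t)$ by the sifting process described above;
(3) repeat step (1) and step (2) $M$ times, each time with a different white noise series;
(4) obtain the (ensemble) means of the corresponding IMFs of the decompositions as the final result, that is, the $i$th IMF is $c_i(t) = \frac{1}{M}\sum_{m=1}^{M} c_{i,m}(t)$.
In practice, the number of ensemble members is often set to 100, and the standard deviation of the added white noise to 0.1 or 0.2.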
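The ensemble steps can be sketched as follows. To keep the block self-contained, the EMD routine is passed in as a function `decompose` that must return a fixed number of components; a real EMD can return a different number of IMFs for each noise realization, so in practice that count has to be capped or padded. Scaling the noise by the standard deviation of the series is an assumption here, as are the names `eemd` and `n_components`.

```python
import numpy as np


def eemd(x, decompose, n_components, ensemble_size=100, noise_std=0.2, seed=0):
    """Ensemble mean of the decompositions of noise-added copies of x.

    decompose(series) must return an array of shape (n_components, len(series)).
    noise_std is interpreted relative to the standard deviation of x (an assumption).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    total = np.zeros((n_components, len(x)))
    for _ in range(ensemble_size):
        # Step (1): add a fresh white-noise series.
        noisy = x + rng.normal(0.0, noise_std * x.std(), size=len(x))
        # Step (2): decompose the noise-added series.
        comps = np.asarray(decompose(noisy), dtype=float)
        if comps.shape != total.shape:
            raise ValueError("decompose must return a fixed number of components")
        total += comps  # Step (3): accumulate over the repetitions.
    # Step (4): ensemble mean of the corresponding components.
    return total / ensemble_size
```

Averaging works because each white-noise realization is independent with zero mean, so its contribution to the ensemble mean shrinks roughly as one over the square root of the number of ensemble members.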
As discussed in [
Recently, a large number of studies have developed end condition methods for restraining the end effect [
Just as mentioned in Section
The EEMD with slope-based method.
Suppose the original time series $x(t)$ is to be forecast. The proposed approach proceeds in three steps. (1) The original time series is decomposed by EEMD with the slope-based method into $n$ IMF components $c_i(t)$, $i = 1, \ldots, n$, and one residual component $r_n(t)$. (2) SVMs are employed to model each IMF component and the residual component using a rolling-origin, rolling-window training strategy, yielding the model specification of each component. (3) To obtain the ensemble function, an SVM model is established to model the relationship between the actual value and the forecast values of all extracted components at the same time points, that is, $\hat{x}(t) = f\left(\hat{c}_1(t), \ldots, \hat{c}_n(t), \hat{r}_n(t)\right)$, where $f$ denotes the ensemble SVM.
We refer to the proposed approach as EEMD (decomposition)-Slope-based method (restraining the end effect)-SVMs (forecasting), abbreviated as EEMD-Slope-SVMs. Following the same naming convention, EEMD-SVMs refers to the model without any end condition method.
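The component-wise forecasting and the SVM-based ensemble can be sketched with scikit-learn's `SVR`, which is built on LIBSVM. This is a simplified illustration under several assumptions: a fixed lag window of `p = 12` past values, fixed `C` and `gamma`, no rolling-origin re-estimation, and an ensemble SVR trained on in-sample component fits. All function names are hypothetical.

```python
import numpy as np
from sklearn.svm import SVR


def lagged(series, p):
    """Sliding window: each row holds p consecutive values; the target is the next value."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + p] for i in range(len(series) - p)])
    return X, series[p:]


def fit_component_models(components, p=12, C=10.0, gamma=0.1):
    """One RBF-kernel SVR per extracted component (IMFs and residue)."""
    models = []
    for comp in components:
        X, y = lagged(comp, p)
        models.append(SVR(kernel="rbf", C=C, gamma=gamma).fit(X, y))
    return models


def component_forecasts(models, components, p=12):
    """One-step-ahead fitted values of every component, one column per component."""
    return np.column_stack([m.predict(lagged(c, p)[0]) for m, c in zip(models, components)])


def fit_ensemble(models, components, target, p=12, C=10.0, gamma=0.1):
    """Ensemble SVR mapping the component forecasts at time t to the actual value at t."""
    F = component_forecasts(models, components, p)
    return SVR(kernel="rbf", C=C, gamma=gamma).fit(F, np.asarray(target, dtype=float)[p:])


def forecast_next(models, ensemble, components, p=12):
    """Forecast each component one step ahead from its last p values, then combine them."""
    f = [m.predict(np.asarray(c, dtype=float)[-p:].reshape(1, -1))[0]
         for m, c in zip(models, components)]
    return float(ensemble.predict(np.array([f]))[0])
```

Using an SVR rather than a plain sum as the combiner lets the ensemble step correct systematic errors in individual component forecasts, at the cost of one more model to tune.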
Figure
The framework of the proposed EEMD-based SVM learning approach.
In this study, air passenger traffic series from six airlines in the USA and the UK are chosen as experimental samples. The UK data are freely available from the CAA (
For United Airlines, American Airlines, and Delta Airlines, the sample covers the period from January 1990 to March 2008, a total of 219 observations. The data from January 1990 to March 2006 are used as the training set (195 observations), and the remaining 24 observations are used as the testing set. For Southwest Airlines, we use a slightly longer monthly series, from January 1990 to June 2008, a total of 222 observations. The first 198 observations, from January 1990 to June 2006, are used as the training set, and the remaining 24 as the testing set.
For each of the two UK airlines, easyJet Airline and Virgin Atlantic Airways, the sample covers the period from January 1998 to September 2007, a total of 117 observations. We use the data from January 1998 to September 2005 as the training set (93 observations) and the remaining 24 observations as the testing set.
Normalization is a standard requirement for time series modeling and forecasting. Thus, the air passenger traffic series were first preprocessed by a linear transformation that scales the original data into the range of
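A linear (min-max) transformation of this kind can be sketched as below. The target range is not stated in this section, so `[0, 1]` is only an assumed default. The inverse is needed to map forecasts back to passenger numbers, for example before reporting RMSE in passenger counts. The names are hypothetical.

```python
import numpy as np


def minmax_scale(x, lo=0.0, hi=1.0):
    """Linearly map x onto [lo, hi]; also return the parameters needed to invert it."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    scaled = lo + (x - x_min) * (hi - lo) / (x_max - x_min)
    return scaled, (x_min, x_max, lo, hi)


def minmax_inverse(scaled, params):
    """Map values on the scaled axis back to the original units."""
    x_min, x_max, lo, hi = params
    return x_min + (np.asarray(scaled, dtype=float) - lo) * (x_max - x_min) / (hi - lo)
```

In a forecasting setting, the minimum and maximum would normally be taken from the training set only, so that no information from the testing period leaks into the preprocessing.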
As previously stated, most of the air passenger traffic series considered exhibit a strong seasonal component or trend pattern. After the linear transformation, the series were deseasonalized and detrended. Deseasonalizing was conducted by means of the revised multiplicative seasonal decomposition presented in [
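The revised multiplicative decomposition cited above is not reproduced here. As a point of reference, the sketch below shows the classical ratio-to-moving-average method for a multiplicative model: a centred 2x12 moving average estimates the trend, the ratios of the data to that trend are averaged by calendar month, and the resulting indices are normalized to average 1. It assumes an even period (12 for monthly data) and a series whose first value falls at calendar position 0; the names are hypothetical.

```python
import numpy as np


def seasonal_indices(x, period=12):
    """Classical ratio-to-moving-average seasonal indices (multiplicative model)."""
    if period % 2:
        raise ValueError("this sketch assumes an even period")
    x = np.asarray(x, dtype=float)
    # Centred 2 x period moving average: half weight on the two end points.
    w = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = np.convolve(x, w, mode="valid")          # length len(x) - period
    half = period // 2
    ratios = x[half:half + len(trend)] / trend       # seasonal-irregular ratios
    pos = (np.arange(len(ratios)) + half) % period   # calendar position of each ratio
    idx = np.array([ratios[pos == k].mean() for k in range(period)])
    return idx * period / idx.sum()                  # normalize so the indices average 1


def deseasonalize(x, idx):
    """Divide each observation by the index of its calendar position."""
    x = np.asarray(x, dtype=float)
    return x / idx[np.arange(len(x)) % len(idx)]
```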
The prediction performance is evaluated using the following statistical metrics, namely, the root mean squared error (RMSE), mean absolute percentage error (MAPE), and geometric mean relative absolute error (GMRAE). Let
Besides accuracy, we also consider directional predictions, since the ability to predict the direction of movement supports decision making. This ability can be measured by a directional statistic (DS).
Holt-Winters and ARIMA are used as benchmark forecasting methods to assess the performance of the proposed approach. Due to space limitations, details of Holt-Winters and ARIMA are omitted. Note that these two models are fitted to the original time series rather than to the decomposed components.
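The four evaluation metrics can be sketched as follows. Since their exact definitions are not reproduced in this section, the code follows common textbook forms, and two choices are assumptions: GMRAE is computed relative to a user-supplied benchmark forecast (for example, the naive no-change forecast), and DS counts a hit when the predicted change from the previous actual value has the same sign as the actual change. MAPE is returned in percent.

```python
import numpy as np


def rmse(actual, pred):
    a, p = np.asarray(actual, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((a - p) ** 2)))


def mape(actual, pred):
    """Mean absolute percentage error, in percent."""
    a, p = np.asarray(actual, float), np.asarray(pred, float)
    return float(100.0 * np.mean(np.abs((a - p) / a)))


def gmrae(actual, pred, benchmark):
    """Geometric mean of |e_t| / |e*_t|, where e*_t is the benchmark forecast error.

    Undefined if any model or benchmark error is exactly zero.
    """
    a = np.asarray(actual, float)
    r = np.abs(a - np.asarray(pred, float)) / np.abs(a - np.asarray(benchmark, float))
    return float(np.exp(np.mean(np.log(r))))


def ds(actual, pred):
    """Share of periods where the predicted change from x_{t-1} has the sign of the actual change."""
    a, p = np.asarray(actual, float), np.asarray(pred, float)
    hits = (a[1:] - a[:-1]) * (p[1:] - a[:-1]) >= 0
    return float(np.mean(hits))
```

Because RMSE is expressed in the units of the series while MAPE, GMRAE, and DS are unit-free, only the latter three can be compared or averaged across airlines of different sizes.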
In this study, a nonparametric Wilcoxon’s signed-rank test [
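As a check on the test tables, the sketch below applies SciPy's `scipy.stats.wilcoxon` to the MAPE and DS values of EEMD-SVMs and individual SVMs from the performance table. With six series, the exact two-tailed p-value when all six differences share the same sign is 2/2^6 = 0.03125, which matches the 0.0313 entries. For DS, the tie on the American series (1 versus 1) leaves five pairs and a p-value of 2/2^5 = 0.0625. Zero differences are dropped explicitly before the call, which is the standard Wilcoxon treatment of zero differences.

```python
from scipy.stats import wilcoxon

# Values from the performance table, in the order
# American, Delta, Southwest, United, easyJet, Virgin.
mape_eemd_svms = [1.8332, 4.552, 1.2937, 1.9475, 2.9321, 1.8766]
mape_svms = [2.096, 5.4158, 1.2956, 2.1505, 3.8019, 2.7149]
ds_eemd_svms = [1, 0.7391, 0.927, 0.9565, 0.8696, 0.8696]
ds_svms = [1, 0.7261, 0.913, 0.913, 0.78261, 0.7826]

# MAPE: all six differences have the same sign.
p_mape = wilcoxon(mape_eemd_svms, mape_svms).pvalue
print(round(p_mape, 5))  # 0.03125

# DS: drop the zero difference (American) before testing the remaining five pairs.
kept = [(a, b) for a, b in zip(ds_eemd_svms, ds_svms) if a != b]
p_ds = wilcoxon([a for a, _ in kept], [b for _, b in kept]).pvalue
print(round(p_ds, 5))  # 0.0625
```

This also shows that 0.0313 is the smallest two-tailed p-value attainable with six series, so no comparison in the tables could reach the 1% level regardless of how large the improvements are.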
In this study, we employ LIBSVM (version 2.86) [
The most important step in SVM training is tuning the kernel function parameters. In this study, we chose the radial basis function (RBF) as the kernel function. Because none of the data series is very long, training efficiency is not a concern, so we use grid search to tune the RBF parameters:
For Holt-Winters and ARIMA, parameter selection is carried out during training using the built-in automatic fitting functions of the forecast package in
The forecasting performance on the testing sets of all the examined models (EEMD-Slope-SVMs, EEMD-SVMs, individual SVMs, Holt-Winters, and ARIMA) in terms of RMSE, MAPE, GMRAE, and DS for the six airlines' monthly air passenger traffic series is shown in Table
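A grid search over the RBF parameters can be sketched with scikit-learn. The grid values below are hypothetical (exponentially spaced, in the spirit of the LIBSVM practical guide), and `TimeSeriesSplit` is an assumed validation scheme that keeps every validation fold after its training data; the original tuning setup may differ.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR


def tune_rbf_svr(X, y, n_splits=5):
    """Grid search over C and gamma of an RBF-kernel SVR with time-ordered CV folds."""
    param_grid = {
        "C": [2.0 ** k for k in range(-2, 9, 2)],      # hypothetical grid
        "gamma": [2.0 ** k for k in range(-8, 3, 2)],  # hypothetical grid
    }
    search = GridSearchCV(
        SVR(kernel="rbf"),
        param_grid,
        cv=TimeSeriesSplit(n_splits=n_splits),  # folds never train on future data
        scoring="neg_root_mean_squared_error",
    )
    search.fit(X, y)
    return search.best_estimator_, search.best_params_
```

Exponential spacing covers several orders of magnitude of `C` and `gamma` with few grid points; a finer grid around the best pair could follow.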
Forecasting performances of all models across all the data series.
| Airline | Metric | EEMD-Slope-SVMs | EEMD-SVMs | SVMs | Holt-Winters | ARIMA |
|---|---|---|---|---|---|---|
| American | MAPE | 1.501 | 1.8332 | 2.096 | 5.7316 | 3.1582 |
| | RMSE | 127192 | 141840 | 161874 | 449709 | 251254 |
| | GMRAE | 0.2165 | 0.2973 | 0.3351 | 0.8012 | 0.5268 |
| | DS | 1 | 1 | 1 | 0.6087 | 0.739 |
| Delta | MAPE | 3.815 | 4.552 | 5.4158 | 7.4413 | 6.9646 |
| | RMSE | 276491 | 308793 | 331704 | 403902 | 408285 |
| | GMRAE | 0.5018 | 0.5216 | 0.7141 | 0.8618 | 1.1068 |
| | DS | 0.8261 | 0.7391 | 0.7261 | 0.6957 | 0.6522 |
| Southwest | MAPE | 1.1369 | 1.2937 | 1.2956 | 5.8553 | 6.0091 |
| | RMSE | 150190 | 150777 | 151706 | 696873 | 632114 |
| | GMRAE | 0.208 | 0.251 | 0.2643 | 0.8126 | 1.1055 |
| | DS | 0.936 | 0.927 | 0.913 | 0.695 | 0.522 |
| United | MAPE | 1.3901 | 1.9475 | 2.1505 | 6.5151 | 4.0872 |
| | RMSE | 96132 | 106727 | 120961 | 393550 | 251327 |
| | GMRAE | 0.2091 | 0.2963 | 0.3811 | 1.1877 | 0.801 |
| | DS | 0.9774 | 0.9565 | 0.913 | 0.608 | 0.913 |
| easyJet | MAPE | 2.017 | 2.9321 | 3.8019 | 5.5213 | 5.6763 |
| | RMSE | 74361 | 81931 | 102859 | 175269 | 175404 |
| | GMRAE | 0.4401 | 0.5183 | 0.6872 | 0.526 | 0.525 |
| | DS | 0.9107 | 0.8696 | 0.78261 | 0.6087 | 0.6075 |
| Virgin | MAPE | 1.4078 | 1.8766 | 2.7149 | 4.1953 | 3.659 |
| | RMSE | 10191 | 11150 | 14150 | 22395 | 18455 |
| | GMRAE | 0.4109 | 0.4655 | 0.5412 | 0.6303 | 0.6515 |
| | DS | 0.8991 | 0.8696 | 0.7826 | 0.6522 | 0.7391 |
Wilcoxon’s signed-rank test for EEMD-SVMs against the three individual models.
| Metric | EEMD-SVMs versus SVMs | EEMD-SVMs versus Holt-Winters | EEMD-SVMs versus ARIMA |
|---|---|---|---|
| MAPE | 0.0313* | 0.0313* | 0.0313* |
| RMSE | 0.0313* | 0.0313* | 0.0313* |
| GMRAE | 0.0313* | 0.0313* | 0.0313* |
| DS | 0.0625 | 0.0313* | 0.0313* |

* Significant at the 5% level (two-tailed).
Wilcoxon’s signed-rank test for EEMD-Slope-SVMs against the counterparts.
| Metric | EEMD-Slope-SVMs versus EEMD-SVMs | EEMD-Slope-SVMs versus SVMs | EEMD-Slope-SVMs versus Holt-Winters | EEMD-Slope-SVMs versus ARIMA |
|---|---|---|---|---|
| MAPE | 0.0313* | 0.0313* | 0.0313* | 0.0313* |
| RMSE | 0.0313* | 0.0313* | 0.0313* | 0.0313* |
| GMRAE | 0.0313* | 0.0313* | 0.0313* | 0.0313* |
| DS | 0.0625 | 0.0625 | 0.0313* | 0.0313* |

* Significant at the 5% level (two-tailed).
Generally speaking, the experimental study has two goals. One is to examine how much improvement can be achieved by using the hybrid decomposition and ensemble framework. The other is to examine whether restraining the end effect, by incorporating the slope-based method into the EEMD-based SVM modeling framework, can further improve performance.
Regarding the first goal, comparing the forecasting performance of the hybrid EEMD-based models with that of the individual models leads to two conclusions. First, the clearest evidence comes from comparing EEMD-SVMs with individual SVMs, since the two differ only in the decomposition step. For example, the average MAPE of EEMD-SVMs over the six data series is 2.406, compared with 2.913 for individual SVMs, and GMRAE and DS show the same pattern. Note that RMSE is scale-dependent, so averaging it across series of very different magnitudes is not meaningful. These results indicate that EEMD facilitates forecasting by decomposing the original complex data series into several simpler time series. Second, individual SVMs generally outperform the other individual methods (the exception is GMRAE on the easyJet series, where Holt-Winters and ARIMA score lower), indicating that SVMs are a promising alternative for individual modeling tasks. The Wilcoxon’s signed-rank tests for EEMD-SVMs against the three individual models also statistically support the promising performance of the EEMD-SVMs with
Regarding the second goal, the comparison between EEMD-Slope-SVMs and EEMD-SVMs shows the improvement gained by restraining the end effect: the proposed EEMD-Slope-SVMs outperform EEMD-SVMs and the three individual models in all cases across all four metrics, except for DS on the American series, where EEMD-Slope-SVMs, EEMD-SVMs, and SVMs all reach 1. The Wilcoxon’s signed-rank tests for EEMD-Slope-SVMs against the counterparts statistically support the promising performance of the EEMD-Slope-SVM approach with
Due to its complex and dynamic patterns, with nonlinearity and nonstationarity as well as implicit seasonality, air passenger traffic forecasting remains one of the most challenging tasks in air transportation management. This study takes a step toward establishing a hybrid learning framework for time series modeling and forecasting, and contributes by examining the EEMD-based SVM modeling framework with the slope-based method through extensive experiments.
Generally speaking, based on the experimental results presented in this study, we can draw the following conclusions. (1) EEMD-based SVM modeling frameworks perform better than the individual models. (2) The proposed EEMD-Slope-SVM modeling framework outperforms EEMD-SVMs and the three individual models, achieving the best performance. This indicates that restraining the end effect occurring during the sifting process of EEMD can further improve prediction performance.
This study also has a limitation regarding the choice of methods for restraining the end effect. Several other methods exist in the literature, but only the slope-based method is examined in this study. A more extensive study of the other methods remains a topic for future research.
This work was supported by the Natural Science Foundation of China under Project No. 70771042, the Fundamental Research Funds for the Central Universities (2012QN208-HUST), and a Grant from the Modern Information Management Research Center at Huazhong University of Science and Technology.