Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelets and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first used to decompose the original time series into several subseries at different scales. Principal component analysis (PCA) is then used to process the subseries data that enter the MLR model, and particle swarm optimization (PSO) is used to determine the optimal parameters of the MLR model. To assess the effectiveness of this model, daily West Texas Intermediate (WTI) crude oil prices are used as a case study. The forecasting performance of the resulting wavelet multiple linear regression (WMLR) model is compared with that of the MLR, ARIMA, and GARCH models using several statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series.
Crude oil prices play a significant role in the global economy and are an important factor affecting government plans and commercial sectors. Forecasting the crude oil price is among the most important issues facing energy economists, so advance knowledge of its future fluctuations can lead to better decisions at several managerial levels.
The literature on forecasting crude oil prices is substantial. The application of classical time series models such as the autoregressive moving average (ARMA) (Yu et al. [
Due to the limitations of the classical and econometric models, soft-computing models, such as neural fuzzy (Ghaffari and Zare [
To remedy these shortcomings, hybrid methods have recently been used to predict crude oil prices and obtain better performance. In recent years, the wavelet transform has become a useful method for analyzing variations, periodicities, and trends in time series, and hybrid models that incorporate it have been developed to improve forecasting. Examples include the wavelet-neural network (Jammazi and Aloui [
A major drawback of using wavelet components for prediction is that the input variables lie in a high-dimensional feature space whose dimension depends on the number of sub-time series components. Because it is inadvisable to use too many wavelet sub-time series components as inputs, this study uses principal component analysis (PCA) to reduce their dimension.
The multiple linear regression (MLR) model, which is much easier to interpret, is considered here as an alternative to the ANN model. In this paper, a hybrid wavelet multiple linear regression (WMLR) model integrating wavelets and MLR is proposed for short-term daily crude oil price forecasting. The study applies particle swarm optimization (PSO) to determine the optimal parameters of the MLR model. For verification purposes, the West Texas Intermediate (WTI) crude oil spot price is used to test the effectiveness of the proposed WMLR ensemble learning methodology. Finally, to evaluate its forecasting ability, the proposed model is compared with individual MLR, ARIMA, and GARCH models.
The most comprehensive of all popular and widely known statistical methods used for time series forecasting are Box-Jenkins models (Box and Jenkins [
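Box-Jenkins ARIMA(p, d, q) models difference a series d times and fit an ARMA(p, q) model to the result. As a minimal sketch, assuming the simplest non-trivial case ARIMA(1, 1, 0) with no constant (not the ARIMA(2, 1, 5) specification reported later), the AR coefficient of the first differences can be estimated by least squares and the forecast difference added back to the last observed level. The function name `arima_110_forecast` is hypothetical.

```python
def arima_110_forecast(prices):
    """One-step forecast from an ARIMA(1, 1, 0) model without constant.

    The series is differenced once (d = 1), an AR(1) coefficient phi is fitted
    to the differences by least squares through the origin, and the forecast
    difference is integrated back to the price level.
    """
    if len(prices) < 3:
        raise ValueError("need at least three observations")
    diffs = [b - a for a, b in zip(prices, prices[1:])]
    # OLS estimate of phi in d_t = phi * d_{t-1} + e_t
    num = sum(d_prev * d_cur for d_prev, d_cur in zip(diffs, diffs[1:]))
    den = sum(d_prev ** 2 for d_prev in diffs[:-1])
    phi = num / den if den else 0.0
    # Undo the differencing: next level = last level + forecast difference
    return prices[-1] + phi * diffs[-1]


# Hypothetical prices whose differences 1, 0.5, 0.25, 0.125 halve each day
print(arima_110_forecast([10.0, 11.0, 11.5, 11.75, 11.875]))  # 11.9375
```

A full ARIMA(p, d, q) fit with moving-average terms needs maximum-likelihood estimation, which is what statistical packages provide; the sketch only shows the difference-fit-integrate structure.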
GARCH models have found extensive application in the literature, and the most popular volatility model is the GARCH (
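In the standard GARCH(1, 1) model, the conditional variance follows the recursion σ²_t = ω + α ε²_{t−1} + β σ²_{t−1}. A minimal sketch of that recursion follows, assuming ω, α, and β have already been estimated (in practice by maximum likelihood, which is not shown) and starting at the unconditional variance ω / (1 − α − β). The parameter values and residuals in the example are hypothetical.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances of a GARCH(1, 1) process.

    sigma2[t+1] = omega + alpha * residuals[t] ** 2 + beta * sigma2[t]
    The recursion starts at the unconditional variance omega / (1 - alpha - beta).
    Returns len(residuals) + 1 values; the last is the one-step-ahead forecast.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    sigma2 = [omega / (1.0 - alpha - beta)]
    for e in residuals:
        sigma2.append(omega + alpha * e ** 2 + beta * sigma2[-1])
    return sigma2


print([round(s, 4) for s in garch11_variance([1.0, -2.0], 0.1, 0.2, 0.7)])  # [1.0, 1.0, 1.6]
```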
The multiple linear regression (MLR) model is a modelling technique for investigating the relationship between a dependent variable and several independent variables. Let the MLR have
In this study, the particle swarm optimization (PSO) method is used to determine the optimal parameters of the MLR model. PSO methods have proven very effective in solving a variety of difficult global optimization problems in forecasting (Chen and Kao [
The classic solution of the MLR model involves minimizing the sum of squared errors between the model-predicted values and the corresponding observed values:
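For reference, a minimal sketch of this classic least-squares solution follows, assuming the usual form ŷ = b0 + b1 x1 + ... + bk xk (our notation, since the original symbols are not shown here). It minimizes SSE = Σ (y_i − ŷ_i)² in closed form by solving the normal equations, which is the deterministic counterpart of the PSO search used in this study. The names `fit_mlr_ols` and `sse` are hypothetical.

```python
def fit_mlr_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y ~ b0 + sum_j b_j * x_j.

    Solves the normal equations (A^T A) b = A^T y, where A is X with a leading
    column of ones, by Gaussian elimination with partial pivoting.
    """
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations (collinear inputs?)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b


def sse(b, X, y):
    """Sum of squared errors between observed y and the MLR predictions."""
    return sum((yi - b[0] - sum(bj * xj for bj, xj in zip(b[1:], row))) ** 2
               for row, yi in zip(X, y))


# Hypothetical data generated exactly by y = 1 + 2*x1 - 3*x2
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
b = fit_mlr_ols(X, y)
print([round(v, 6) for v in b])  # [1.0, 2.0, -3.0]
```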
Particle swarm optimization (PSO) is a population-based heuristic method inspired by the collective motion of biological organisms, such as bird flocking and fish schooling, that simulates their search for a food source (Bratton and Kennedy [
Each particle consists of three vectors: the position for
Flowchart of PSO algorithm.
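The PSO loop can be sketched as follows, assuming a global-best topology, inertia and acceleration values w = 0.7298 and c1 = c2 = 1.49618 (common constriction-equivalent settings, not values reported in this paper), and random initialization in [−5, 5]. Each particle here keeps a position, a velocity, and its personal best position, which is our reading of the three vectors per particle. The objective is the MLR sum of squared errors on a small hypothetical dataset, so the swarm should approach the least-squares coefficients. All names are hypothetical.

```python
import random


def pso_minimize(f, dim, n_particles=30, iters=500, bounds=(-5.0, 5.0),
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Global-best PSO; returns (best position, best objective value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # personal best positions
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm (global) best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical data lying exactly on y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]


def mlr_sse(b):
    """Sum of squared errors of the fit y = b[0] + b[1] * x."""
    return sum((yi - b[0] - b[1] * xi) ** 2 for xi, yi in zip(xs, ys))


coef, err = pso_minimize(mlr_sse, dim=2)
print(coef, err)  # compare with the least-squares solution [1, 2]
```

The fixed seed makes the run reproducible; in practice the swarm size, iteration count, and coefficients would be tuned for the actual MLR problem.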
Wavelet transforms provide a useful decomposition of the original time series by capturing information at various decomposition levels. The discrete wavelet transform (DWT) is preferred in most forecasting problems because of its simplicity and lower computational cost. The DWT can be defined as
For a discrete time series
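The abstract names Mallat's algorithm, but the mother wavelet is not stated in this section. As a minimal sketch assuming the Haar wavelet (our choice, for simplicity), the code below runs Mallat's pyramid algorithm for three levels and then reconstructs each detail level, plus the level-3 approximation, back to the original length, so that the components sum to the input series. With daily data, the three Haar detail levels correspond to the 2-, 4-, and 8-day scales of DW1 to DW3. All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """One Haar analysis step: (approximation, detail), each half the length."""
    a = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    return a, d


def haar_inverse(a, d):
    """One Haar synthesis step: exact inverse of haar_forward."""
    x = []
    for ai, di in zip(a, d):
        x.extend(((ai + di) / SQRT2, (ai - di) / SQRT2))
    return x


def mra_components(x, levels=3):
    """Full-length detail sub-series D1..D_levels and approximation A_levels."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels in this sketch")
    approx, coeffs = list(x), []
    for _ in range(levels):              # Mallat pyramid: split the approximation
        approx, d = haar_forward(approx)
        coeffs.append(d)

    def synthesize(a, d, from_level):
        # Reconstruct from `from_level` down to level 0 with zero details below it
        y = haar_inverse(a, d)
        for _ in range(from_level - 1):
            y = haar_inverse(y, [0.0] * len(y))
        return y

    details = [synthesize([0.0] * len(d), d, j) for j, d in enumerate(coeffs, start=1)]
    approximation = synthesize(approx, [0.0] * len(approx), levels)
    return details, approximation


x = [0.01, -0.02, 0.015, 0.0, -0.01, 0.02, 0.005, -0.005]  # hypothetical log returns
details, approx = mra_components(x, levels=3)
recon = [a + sum(d[t] for d in details) for t, a in enumerate(approx)]
print(max(abs(r - v) for r, v in zip(recon, x)))  # reconstruction error (rounding only)
```

Because this decimated transform is applied to the whole sample at once, a forecasting application would need to decompose only the data available up to each forecast origin to avoid look-ahead; the sketch does not handle that.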
In an MLR, one of the main tasks is to determine the input variables that significantly affect the output variable. The choice of input variables is generally based on a priori knowledge of causal variables, inspection of time series plots, and statistical analysis of potential inputs and outputs. PCA is a technique widely used for reducing the number of input variables when a large volume of information is available and a better interpretation of the variables is desired (Çamdevýren et al. [
Compared with a trial-and-error process, the PCA approach yields only a few combinations of model inputs. Given a set of centred input vectors
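The PCA step can be sketched in plain Python as follows, assuming PCA on the correlation matrix of standardized inputs (the eigenvalues in this section's table sum to 8, the number of sub-series, which is consistent with that choice). The eigen-decomposition uses the cyclic Jacobi method only to keep the sketch dependency-free; a numerical library would normally be used. Function names and data are hypothetical.

```python
import math


def standardize(columns):
    """Centre each column and scale it to unit sample variance."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        out.append([(v - mean) / sd for v in col])
    return out


def correlation_matrix(z):
    """Sample correlation matrix of standardized columns z."""
    n, p = len(z[0]), len(z)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]


def jacobi_eigen(a, max_sweeps=100, tol=1e-20):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (values, vectors) sorted by decreasing eigenvalue; vectors[k] is the
    unit eigenvector (loading vector) of the k-th principal component.
    """
    p = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # Angle that zeroes a[i][j]: tan(2*theta) = 2*a_ij / (a_jj - a_ii)
                diff = a[j][j] - a[i][i]
                if diff == 0.0:
                    theta = math.copysign(math.pi / 4, a[i][j])
                else:
                    theta = 0.5 * math.atan(2.0 * a[i][j] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # columns: A <- A J and V <- V J
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki - s * akj, s * aki + c * akj
                    vki, vkj = v[k][i], v[k][j]
                    v[k][i], v[k][j] = c * vki - s * vkj, s * vki + c * vkj
                for k in range(p):  # rows: A <- J^T A
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik - s * ajk, s * aik + c * ajk
    order = sorted(range(p), key=lambda k: a[k][k], reverse=True)
    values = [a[k][k] for k in order]
    vectors = [[v[r][k] for r in range(p)] for k in order]
    return values, vectors


def pca_scores(z, vectors, n_keep):
    """Scores of the first n_keep principal components for every observation."""
    n = len(z[0])
    return [[sum(vec[r] * z[r][t] for r in range(len(z))) for t in range(n)]
            for vec in vectors[:n_keep]]


# Hypothetical inputs: the first two columns are perfectly correlated
cols = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 3, 4, 1, 2]]
z = standardize(cols)
values, vectors = jacobi_eigen(correlation_matrix(z))
scores = pca_scores(z, vectors, n_keep=2)
print([round(val, 4) for val in values])
```

The retained scores, rather than the raw sub-series, would then serve as the MLR inputs.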
In this study, the West Texas Intermediate (WTI) crude oil price series was chosen as the experimental sample. The main reason for selecting WTI is that its prices are among the most famous benchmark prices and are widely used as the basis of many crude oil price formulae. Daily data from January 1, 1986, to September 30, 2006, excluding public holidays, with a total of 5237 observations, were employed as experimental data. For convenience of WMLR modeling, the data from January 1, 1986, to December 31, 2000, are used as the training set (3800 observations), and the remainder is used as the testing set (1437 observations). Figure
Daily crude oil prices from January 1, 1986, to September 30, 2006.
In practice, short-term forecasting results are more useful because they provide timely information for correcting forecast values. In this study, three main performance criteria are used to evaluate the accuracy of the models. These criteria are mean absolute error (MAE), root mean squared error (RMSE), and
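The two criteria named here are MAE = (1/n) Σ |y_t − ŷ_t| and RMSE = √((1/n) Σ (y_t − ŷ_t)²). A minimal sketch of both follows; the third criterion's name is lost from this section, so it is not shown, and the sample values are hypothetical.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error; penalizes large errors more than MAE does."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


actual = [1.0, 2.0, 3.0, 4.0]
forecast = [1.5, 2.0, 2.0, 4.0]
print(mae(actual, forecast), rmse(actual, forecast))  # MAE = 0.375, RMSE is about 0.559
```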
First, the MLR model without data preprocessing was used to model daily oil prices. Next, preprocessed data, namely the sub-time series components obtained by applying the discrete wavelet transform (DWT) to the original data, were entered into the MLR model to improve its accuracy. For the WMLR model, the original log return time series is decomposed into a certain number of sub-time series components. Choosing the optimal decomposition level in wavelet analysis plays an important role in preserving the information and reducing the distortion of the dataset. However, there is no existing theory that specifies how many decomposition levels are needed for a given time series.
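The log return series and the lagged inputs can be prepared as in the sketch below, assuming daily log returns r_t = ln(P_t / P_{t−1}) and input sets that use lags 1 through k, in the style of models M1 to M6 in the performance table. The prices and function names are hypothetical.

```python
import math


def log_returns(prices):
    """Daily log returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def lagged_inputs(series, n_lags):
    """Pair each target x_t with its lags [x_{t-1}, ..., x_{t-n_lags}]."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append([series[t - k] for k in range(1, n_lags + 1)])
        y.append(series[t])
    return X, y


prices = [25.0, 25.5, 25.2, 25.8, 26.1, 25.9, 26.4, 26.0]  # hypothetical prices
r = log_returns(prices)
# Input sets using lags 1, (1, 2), ..., (1, ..., 6)
designs = {f"M{k}": lagged_inputs(r, k) for k in range(1, 7)}
print({name: len(target) for name, (_, target) in designs.items()})
```

In the WMLR variant, each lagged column would be decomposed by the DWT before entering the regression.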
In the present study, the lagged log returns of the daily oil price series are decomposed into various sub-time series (DWs) at different decomposition levels by using the DWT, and these are used to estimate the current value. Three decomposition levels (2, 4, and 8 days) were considered in this study. For the WTI series, the 2-day mode (DW1), 4-day mode (DW2), 8-day mode (DW3), and approximation mode are presented in Figure
Decomposed wavelet sub-time series components (Ds) of WTI crude oil price data.
For the WTI series, six input combinations based on lagged log returns of daily oil prices are evaluated to estimate the current value. The input combinations evaluated in the study are (i)
Each DW series plays a distinct role in the original time series and has a different effect on the original oil price series. Selecting the dominant DWs as inputs to the MLR model is therefore important, as it strongly affects the model's predictive ability. The model becomes much more complex as the number of sub-time series used as input variables increases, so a large number of input variables should be avoided to save time and computational effort. Therefore, new series obtained by PCA are used as inputs to the MLR model. PCA reduces the original variables to a smaller set of new variables: its objective is to identify new variables, called principal components, each of which is a linear combination of the original variables. The number of new variables retained is the number of principal components that together account for 85% to 90% of the total variation.
For example, consider the two previous daily oil price values as random variables. Each lagged series is decomposed by the DWT into three decomposition levels, giving three detail sub-series and one approximation per lag, so 2 × 4 = 8 sub-series were considered in the PCA. The result of the PCA is shown in Table
Eigenvalues and cumulative variance contribution rates of the 8 principal components.
PC  1  2  3  4  5  6  7  8
Eigenvalue  1.97  1.79  1.59  1.33  0.67  0.41  0.21  0.03
Cumulative variance rate  0.25  0.47  0.67  0.84  0.92  0.97  1.00  1.00
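The cumulative rates in the table follow directly from the eigenvalues: each is the running sum of eigenvalues divided by their total (8.00). The short sketch below recomputes them and applies the 85% to 90% retention rule described earlier. With these eigenvalues the first four components reach only about 0.835, so either threshold keeps five components; this is our arithmetic on the tabulated values, not a result stated in the text. Function names are hypothetical.

```python
eigenvalues = [1.97, 1.79, 1.59, 1.33, 0.67, 0.41, 0.21, 0.03]  # from the table


def cumulative_variance(eigs):
    """Running share of total variance explained by the first k components."""
    total = sum(eigs)
    shares, running = [], 0.0
    for e in eigs:
        running += e
        shares.append(running / total)
    return shares


def n_components(eigs, threshold):
    """Smallest k whose cumulative share reaches the threshold."""
    for k, share in enumerate(cumulative_variance(eigs), start=1):
        if share >= threshold:
            return k
    return len(eigs)


print([round(s, 3) for s in cumulative_variance(eigenvalues)])
print(n_components(eigenvalues, 0.85), n_components(eigenvalues, 0.90))  # 5 5
```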
The structure of the WMLR model.
The forecasting performance of the MLR and WMLR models in terms of the MAE, RMSE, and
Forecasting performance indices of MLR and WMLR.
Model input  Lags  MLR MAE  MLR RMSE  MLR (third criterion)  WMLR MAE  WMLR RMSE  WMLR (third criterion)
M1  1  —  0.9514  0.4788  0.6660  0.9001  0.5198
M2  1, 2  0.6972  0.9517  0.4781  0.6448  0.8842  0.5003
M3  1, 2, 3  0.6985  0.9545  0.4816  0.5345  0.7505  0.5797
M4  1, 2, 3, 4  0.6979  0.9550  0.4753  —  —  —
M5  1, 2, 3, 4, 5  0.6976  0.9545  —  0.5770  0.8046  0.5734
M6  1, 2, 3, 4, 5, 6  0.6969  —  0.4850  0.5385  0.7389  0.6444
For further analysis, the best performance of the MLR, WMLR, ARIMA, and ARIMA-GARCH models was compared with the best results of the ARIMA and feedforward neural network (FNN) models studied by Yu et al. [
The RMSE and MAE comparisons for different models.
Model  RMSE  MAE
ARIMA (2, 1, 5)  1.3835  1.0207
GARCH (1, 1)  0.9513  0.6947
MLR  0.9450  0.6969
WMLR  —  —
Yu’s ARIMA (Yu et al., [  2.0350  —
Yu’s FNN (Yu et al., [  0.8410  —
Figure
The errors of MLR, WMLR, ARIMA, and GARCH models for crude oil price forecasting.
The accuracy of the wavelet multiple linear regression (WMLR) technique in forecasting daily crude oil prices has been investigated in this study. PCA is used to choose the principal component scores of the selected inputs, which serve as independent variables in the MLR model, and particle swarm optimization (PSO) is used to determine the optimal parameters of the MLR model. The performance of the proposed WMLR model was compared with that of the regular MLR, ARIMA, and GARCH models for crude oil forecasting. The comparison indicated that the WMLR model was substantially more accurate than the other models. The study concludes that the forecasting ability of the MLR model improves when the wavelet transform is adopted for data preprocessing. The decomposed periodic components obtained from the DWT were the most effective inputs for producing accurate forecasts with the MLR model. These results indicate that the WMLR model is a superior alternative to the other models considered and a potentially very useful new method for crude oil forecasting. The WMLR model presented in this study is a simple, explicit mathematical formulation; it is much simpler than an ANN model and can be used successfully for modeling short-term crude oil prices. In the present study, three resolution levels were employed for decomposing the crude oil time series. Using more resolution levels might improve the WMLR results further; this may be the subject of another study.
The authors declare that there is no conflict of interests regarding the publication of this paper.
The authors gratefully acknowledge the financial support provided by Universiti Teknologi Malaysia under GUP Grant (VOT 06J13). The authors would also like to thank the Department of Irrigation and Drainage, Ministry of Natural Resources and Environment, Malaysia, for helping to provide the data.