Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating the wavelet transform and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first used to decompose the original time series into several subseries at different scales. Then, principal component analysis (PCA) is used to process the subseries data that serve as inputs to the MLR model. Particle swarm optimization (PSO) is used to determine the optimal parameters of the MLR model. To assess the effectiveness of this model, daily West Texas Intermediate (WTI) crude oil prices are used as the case study. The time series prediction performance of the wavelet MLR (WMLR) model is compared with that of the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series.


Introduction
Crude oil prices play a significant role in the global economy and constitute an important factor affecting government plans and commercial sectors. Forecasting the crude oil price is among the most important issues facing energy economists, because advance knowledge of its future fluctuations can lead to better decisions at several managerial levels.
The literature dealing with crude oil forecasting is substantial. The application of classical time series models such as the autoregressive moving average (ARMA) model (Yu et al. [1], Mohammadi and Su [2], and Ahmad [3]) and of econometric models such as generalized autoregressive conditional heteroscedasticity (GARCH) type models (Agnolucci [4], Wei et al. [5], Liu and Wan [6]) to crude oil forecasting has received much attention in the last decade. However, because crude oil prices are volatile, nonlinear, and irregular, the classical and econometric models can lose accuracy.
Due to the limitations of the classical and econometric models, soft-computing models, such as neuro-fuzzy models (Ghaffari and Zare [7]), artificial neural networks (ANN) (Kaboudan [8], Mirmirani and Li [9], Shambora and Rossiter [10], and Yu et al. [11]), support vector machines (SVM) (Xie et al. [12]), and genetic programming (GP), provide powerful solutions to nonlinear crude oil price prediction. Many experiments found that the soft-computing models often had some advantages over statistical models. However, these AI models also have their own shortcomings. For example, ANN often suffers from local minima and over-fitting, while soft-computing models such as ANN, SVM, and GP are sensitive to parameter selection [1].
The Scientific World Journal
To remedy the above shortcomings, some hybrid methods have recently been used to predict crude oil prices and have achieved better performance. In recent years, the wavelet transform has become a useful method for analyzing variations, periodicities, and trends in time series, and hybrid models that include a wavelet transform step have been developed for forecasting. For example, wavelet-neural network (Jammazi and Aloui [13], Qunli et al. [14], and Yousefi et al. [15]), wavelet-least square support vector machines (LSVM) (Bao et al. [16]), and wavelet-fuzzy neural network (Liu et al. [17]) models have recently been employed in studies of crude oil forecasting. These studies observed that the wavelet transform improves forecasting accuracy.
A major drawback of the wavelet transform for direction prediction is that the input variables lie in a high-dimensional feature space whose dimension depends on the number of sub-time series components. Because it is inadvisable to use too many sub-time series components as inputs, principal component analysis (PCA) is proposed in this study to reduce their dimension.
The multiple linear regression (MLR) model, which is much easier to interpret, is considered as an alternative to the ANN model. In this paper, a hybrid wavelet multiple linear regression (WMLR) model integrating the wavelet transform and MLR is proposed for short-term daily crude oil price forecasting. The study applies particle swarm optimization (PSO) to determine the optimal parameters of the MLR model. For verification purposes, the West Texas Intermediate (WTI) crude oil spot price is used to test the effectiveness of the proposed WMLR methodology. Finally, to evaluate the model's ability, the proposed model is compared with individual ARIMA and GARCH models.

Methodology
2.1. The ARIMA Model. The Box-Jenkins models (Box and Jenkins [18]) are the most comprehensive of the popular and widely known statistical methods used for time series forecasting. They have achieved great success in both academic research and industrial applications during the last three decades. The general form of ARIMA models, written for the (differenced) series $y_t$, can be expressed as
$$y_t = \theta_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}, \tag{1}$$
where $p$ is the order of the autoregressive part, $q$ is the order of the moving average part, and $\varepsilon_t$ is the random error. The Box-Jenkins methodology is basically divided into four steps: identification, estimation, diagnostic checking, and forecasting.

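The GARCH Model.
GARCH models have found extensive application in the literature, and the most popular volatility model is the GARCH (1, 1) model proposed by Bollerslev [19]. The standard GARCH (1, 1) model can be described as follows:
$$r_t = \mu_t + \varepsilon_t, \qquad \varepsilon_t = h_t^{1/2} z_t, \qquad h_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 h_{t-1}, \qquad z_t \sim N(0, 1), \tag{2}$$
where $\mu_t$ denotes the conditional mean, $h_t$ is the conditional variance, $z_t$ is a standardized error, and $r_t = \ln(P_t / P_{t-1})$ is the log return of the price $P_t$.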
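The GARCH (1, 1) variance recursion can be illustrated with a minimal Python sketch. It is not the estimation procedure used in the study: the parameter values, the constant mean `mu`, the toy prices, and the choice to start the recursion from the unconditional variance are all assumptions made for illustration.

```python
import math


def log_returns(prices):
    """Log returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]


def garch11_variance(returns, mu, alpha0, alpha1, beta1):
    """Conditional variances h_t of a GARCH(1,1) model with constant mean mu."""
    if alpha1 + beta1 >= 1:
        raise ValueError("alpha1 + beta1 must be < 1 for a finite unconditional variance")
    eps = [r - mu for r in returns]  # residuals eps_t = r_t - mu
    # Assumption: initialise h_0 at the unconditional variance alpha0 / (1 - alpha1 - beta1).
    h = [alpha0 / (1.0 - alpha1 - beta1)]
    for t in range(1, len(eps)):
        h.append(alpha0 + alpha1 * eps[t - 1] ** 2 + beta1 * h[t - 1])
    return h


# Hypothetical prices and parameters, for illustration only.
prices = [20.0, 20.4, 19.9, 20.1, 21.0, 20.7]
r = log_returns(prices)
h = garch11_variance(r, mu=0.0, alpha0=1e-5, alpha1=0.1, beta1=0.85)
```

In practice the parameters would be estimated by maximum likelihood rather than fixed by hand.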

Multiple Linear Regressions.
The multiple linear regression (MLR) model is one of the modelling techniques used to investigate the relationship between a dependent variable and several independent variables. Let the MLR model have $p$ independent variables with $n$ observations. Thus the MLR model can be written as
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, 2, \ldots, n, \tag{3}$$
where $\beta_j$ are the regression coefficients, $y_i$ is the dependent variable, $x_{ij}$ are the independent variables, and $\varepsilon_i$ is the fitting error. The method of least squares is generally used to estimate the model coefficients. In many applications, the results of a least squares fit are unacceptable when the model is misspecified (Bozdogan and Howe [20]). In this study, the particle swarm optimization (PSO) method is used to determine the optimal parameters of the MLR model. PSO methods have proven to be very effective in solving a variety of difficult global optimization problems in forecasting (Chen and Kao [21] and Alwee et al. [22]), heat problems (Ma et al. [23] and Tyagi and Pandit [24]), and dynamic environments (Liu et al. [25]).
The classic solution of the MLR model involves the minimization of the sum of the squared errors between the model-predicted values and the corresponding data values:
$$f = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2, \tag{4}$$
where $n$ is the number of training data samples, $y_i$ is the actual value, and $\hat{y}_i$ is the forecasted value of the training data. The same criterion is used as the fitness function when the problem is solved with the PSO algorithm: a solution with a smaller fitness $f$ on the training data set has a better chance of surviving in the successive generations.

Particle Swarm Optimization.
Particle swarm optimization (PSO) is a population-based heuristic method inspired by the collective motion of biological organisms, such as bird flocking and fish schooling, which simulates the behavior of seeking a food source (Bratton and Kennedy [26]). The population of PSO is called a swarm, and each individual in the population is called a particle. PSO begins with a random population and searches for the fitness optimum, just like the genetic algorithm (GA). To find the optimum solution, each particle adjusts its direction using the best experience it has found itself ($p_{best}$) and the best experience found by all other members ($g_{best}$). Therefore, the particles fly around in a multidimensional space towards better areas over the search process.
Each particle consists of three vectors: the position of the $i$th particle, denoted as $X_i = (x_{i(1)}, x_{i(2)}, \ldots, x_{i(d)})$; the best previous position $p_{best}$ that the $i$th particle has found, $P_i = (p_{i(1)}, p_{i(2)}, \ldots, p_{i(d)})$; and the flying velocity of the $i$th particle, $V_i = (v_{i(1)}, v_{i(2)}, \ldots, v_{i(d)})$, where $d$ is the dimension of the search space. During the iterative procedure, the $i$th particle at iteration $k$ is updated by
$$V_i^{k+1} = w V_i^{k} + c_1 r_1 \left( P_i^{k} - X_i^{k} \right) + c_2 r_2 \left( P_g^{k} - X_i^{k} \right), \qquad X_i^{k+1} = X_i^{k} + V_i^{k+1}, \tag{5}$$
where $P_g^{k}$ is the best position found by the whole swarm ($g_{best}$), $w$ is called the inertia weight, $c_1$ and $c_2$ are acceleration constants, and $r_1$ and $r_2$ are random values in [0, 1]. In a PSO system, particles change their positions at each time step until a relatively unchanging position has been encountered or a maximum number of iterations has been reached. The performance of each particle is measured according to a fitness function, which is problem dependent. In the MLR model, (4) is the fitness function under consideration. Figure 1 shows the flowchart of the developed PSO algorithm. For further details regarding PSO, please refer to Kennedy and Eberhart [27] and Bratton and Kennedy [26].
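The PSO update can be sketched in Python as a minimal illustration of fitting MLR coefficients by minimizing the sum of squared errors. This is not the MATLAB implementation used in the study: the swarm size, inertia weight, acceleration constants, search range, seed, and toy data below are assumptions, and the global best is updated as soon as a particle improves on it.

```python
import random


def sse(coeffs, X, y):
    """Sum of squared errors of a linear model; coeffs[0] is the intercept."""
    total = 0.0
    for row, target in zip(X, y):
        pred = coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))
        total += (target - pred) ** 2
    return total


def pso_fit(X, y, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for MLR coefficients with a basic global-best PSO (assumed settings)."""
    rng = random.Random(seed)
    dim = len(X[0]) + 1  # intercept plus one coefficient per input
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [sse(p, X, y) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]  # best fitness after each iteration
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            fit = sse(pos[i], X, y)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history


# Toy data generated from y = 1 + 2*x1 - 3*x2 (noise-free, illustrative only).
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0], [0.5, 2.0], [3.0, 0.5]]
y = [1 + 2 * a - 3 * b for a, b in X]
coeffs, best_sse, history = pso_fit(X, y)
```

Because the swarm only ever replaces its best solution with a better one, `history` is non-increasing by construction; how close `coeffs` gets to the generating coefficients depends on the assumed settings.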

Wavelet Analysis.
Wavelet transformations provide a useful decomposition of the original time series by capturing useful information at various decomposition levels. The discrete wavelet transformation (DWT) is preferred in most forecasting problems because of its simplicity and low computation time. The DWT can be defined as
$$\psi_{m,n}(t) = \frac{1}{\sqrt{a_0^{m}}} \, \psi\!\left( \frac{t - n b_0 a_0^{m}}{a_0^{m}} \right), \tag{6}$$
where $m$ and $n$ are integers that control the scale and time, respectively. The most common choices for the parameters are $a_0 = 2$ and $b_0 = 1$. The function $\psi(t)$, called the mother wavelet, satisfies
$$\int_{-\infty}^{\infty} \psi(t) \, dt = 0. \tag{7}$$
For a discrete time series $x(t)$ of length $N$, where $x(t)$ occurs at discrete time $t$, the DWT becomes
$$W_{m,n} = 2^{-m/2} \sum_{t=0}^{N-1} \psi\!\left( 2^{-m} t - n \right) x(t), \tag{8}$$
where $W_{m,n}$ is the wavelet coefficient for the discrete wavelet at scale $a = 2^{m}$ and location $b = 2^{m} n$. According to Mallat's theory, the original discrete time series $x(t)$ can be decomposed into a series of linearly independent approximation and detail signals by using the inverse DWT. The inverse DWT is given by (Mallat [28])
$$x(t) = A_M(t) + \sum_{m=1}^{M} \sum_{n} W_{m,n} \, 2^{-m/2} \, \psi\!\left( 2^{-m} t - n \right), \tag{9}$$
or in a simple format as
$$x(t) = A_M(t) + \sum_{m=1}^{M} D_m(t), \tag{10}$$
where $A_M(t)$ is called the approximation subseries or residual term at level $M$ and $D_m(t)$ ($m = 1, 2, \ldots, M$) are the detail subseries, which can capture small features of interpretational value in the data.
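The additive decomposition of a series into one approximation subseries and several detail subseries can be illustrated with a short Python sketch. The Haar wavelet is assumed here purely for simplicity, since the text does not state which mother wavelet was used; the function name `haar_mra` and the toy series are hypothetical.

```python
def haar_mra(x, levels):
    """Haar multiresolution analysis.

    Returns (approximation, [D_1, ..., D_levels]) such that
    x[t] = approximation[t] + sum of D_m[t] over all levels m.
    """
    n = len(x)
    if n % (2 ** levels) != 0:
        raise ValueError("series length must be a multiple of 2**levels")
    details = []
    approx = list(x)  # level-0 approximation is the series itself
    for m in range(1, levels + 1):
        block = 2 ** m
        smoother = []
        # The Haar approximation at level m is the mean over blocks of 2**m points.
        for start in range(0, n, block):
            mean = sum(x[start:start + block]) / block
            smoother.extend([mean] * block)
        # The detail at level m is what is lost when moving to the coarser level.
        details.append([a - s for a, s in zip(approx, smoother)])
        approx = smoother
    return approx, details


series = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
A3, (D1, D2, D3) = haar_mra(series, levels=3)
rebuilt = [a + d1 + d2 + d3 for a, d1, d2, d3 in zip(A3, D1, D2, D3)]
```

Here `D1`, `D2`, and `D3` correspond to features at scales of 2, 4, and 8 time steps, and `rebuilt` reproduces `series`, which mirrors the additive form of the inverse DWT.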

Principal Component Analysis.
In an MLR model, one of the main tasks is to determine the model input variables that affect the output variables significantly. The choice of input variables is generally based on a priori knowledge of causal variables, inspections of time series plots, and statistical analysis of potential inputs and outputs. PCA is a technique widely used for reducing the number of input variables when a large volume of information is available and a better interpretation of the variables is wanted (Çamdevýren et al. [29]).
Compared with a trial and error process, the PCA approach introduces only a few combinations for the model input. Consider a set of centred input vectors $x_1, x_2, \ldots, x_m$ with $\sum_{t=1}^{m} x_t = 0$, each of dimension $n$, where usually $n < m$. Then the covariance matrix of the vectors is given by
$$C = \frac{1}{m} \sum_{t=1}^{m} x_t x_t^{T}. \tag{11}$$
The principal components (PCs) are computed by solving the eigenvalue problem of the covariance matrix $C$,
$$\lambda_i u_i = C u_i, \quad i = 1, 2, \ldots, n, \tag{12}$$
where $\lambda_i$ is one of the eigenvalues of $C$ and $u_i$ is the corresponding eigenvector. Based on the estimated $u_i$, the components of $z_t(i)$ are then calculated as the orthogonal transforms of $x_t$:
$$z_t(i) = u_i^{T} x_t, \quad i = 1, 2, \ldots, n. \tag{13}$$
The new components, $z_t(i)$, are called principal components. By using only the first several eigenvectors, sorted in descending order of the eigenvalues, the number of principal components in $z_t$ can be reduced, so PCA has a dimension reduction property. The principal components have the following properties: the $z_t(i)$ are linear combinations of the original variables, are uncorrelated, and have sequentially maximum variances (Jolliffe [30]). The variance contribution rate is
$$V_i = \frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j} \times 100\%. \tag{14}$$
The cumulative variance contribution rate is
$$V(k) = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{n} \lambda_j} \times 100\%. \tag{15}$$
The number of selected principal components is based on the cumulative variance contribution rate, which as a rule is over 85% to 90%.
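The selection rule based on the cumulative variance contribution rate can be sketched in a few lines of Python. The eigenvalues below are made-up numbers, not the values reported in Table 1, and the 0.85 threshold (expressed as a fraction rather than a percentage) is an assumed default taken from the lower end of the range quoted above.

```python
def select_components(eigenvalues, threshold=0.85):
    """Return (k, rates, cumulative) for eigenvalues of a covariance matrix.

    k is the smallest number of leading components whose cumulative
    variance contribution rate reaches the threshold.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    rates = [lam / total for lam in ordered]  # variance contribution rates
    cumulative = []
    running = 0.0
    for rate in rates:
        running += rate
        cumulative.append(running)
    # Fall back to all components if rounding keeps the sum just below the threshold.
    k = next((i + 1 for i, c in enumerate(cumulative) if c >= threshold), len(ordered))
    return k, rates, cumulative


# Hypothetical eigenvalues, for illustration only.
k, rates, cumulative = select_components([1.0, 4.0, 2.0, 1.0])
```

With these illustrative eigenvalues the contribution rates are 0.5, 0.25, 0.125, and 0.125, so three components are needed to pass the 0.85 threshold.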

An Application.
In this study, the West Texas Intermediate (WTI) crude oil price series was chosen as the experimental sample. The main reason for selecting WTI crude oil is that its prices are the best-known benchmark prices and are widely used as the basis of many crude oil price formulae. The daily data from January 1, 1986, to September 30, 2006, excluding public holidays, with a total of 5237 observations, were employed as experimental data. For convenience of WMLR modeling, the data from January 1, 1986, to December 31, 2000, are used as the training set (3800 observations), and the remainder is used as the testing set (1437 observations). Figure 2 shows the daily crude oil prices from January 1, 1986, to September 30, 2006. In practice, short-term forecasting results are more useful as they provide timely information for the correction of forecasting values. In this study, three main performance criteria are used to evaluate the accuracy of the models: the mean absolute error (MAE), the root mean squared error (RMSE), and the directional statistic $D_{stat}$. The MAE and RMSE can be defined by
$$\text{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|, \qquad \text{RMSE} = \sqrt{ \frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2 }, \tag{16}$$
where $y_t$ is the actual price, $\hat{y}_t$ is the forecasted price, and $n$ is the number of observations. In crude oil price forecasting, improved decisions usually depend on correctly forecasting the direction of price movement. The ability to predict movement direction can be measured by a directional statistic ($D_{stat}$) (Yu et al. [1]), which can be expressed as
$$D_{stat} = \frac{1}{N} \sum_{t=1}^{N} a_t \times 100\%, \tag{17}$$
where $a_t = 1$ if $(y_{t+1} - y_t)(\hat{y}_{t+1} - y_t) \geq 0$ and $a_t = 0$ otherwise, and $N$ is the number of evaluated steps.
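The three performance criteria can be computed with a short Python sketch. The directional statistic follows the common definition in which a forecast counts as correct when the forecasted move from the last actual price has the same sign as the actual move; it is returned here as a fraction rather than a percentage. The price values are made up for illustration.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def d_stat(actual, forecast):
    """Fraction of steps whose forecasted direction matches the actual direction."""
    steps = len(actual) - 1
    hits = 0
    for t in range(steps):
        actual_move = actual[t + 1] - actual[t]
        forecast_move = forecast[t + 1] - actual[t]
        if actual_move * forecast_move >= 0:
            hits += 1
    return hits / steps


# Hypothetical prices, for illustration only.
actual = [10.0, 11.0, 10.5, 12.0, 11.0]
forecast = [10.0, 10.5, 11.5, 11.0, 11.5]
scores = (mae(actual, forecast), rmse(actual, forecast), d_stat(actual, forecast))
```

For this toy example the forecast gets three of the four directions right, so `d_stat` returns 0.75, while the MAE is 0.6.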

Application and Result.
At first, the MLR model without data preprocessing was used to model daily oil prices. In the next step, the preprocessed data, that is, the sub-time series components obtained by applying the discrete wavelet transform (DWT) to the original data, were entered into the MLR model in order to improve the model accuracy. For the WMLR model, the original log return time series is decomposed into a certain number of sub-time series components. Deciding the optimal decomposition level of the time series data in wavelet analysis plays an important role in preserving the information and reducing the distortion of the datasets. However, there is no existing theory that tells how many decomposition levels are needed for a given time series.
In the present study, the previous log returns of the daily oil price time series are decomposed into various sub-time series (DWs) at different decomposition levels by using the DWT to estimate the current price value. Three decomposition levels (2, 4, and 8 days) were considered for this study. For the WTI series data, the time series of the 2-day mode (DW1), 4-day mode (DW2), and 8-day mode (DW3) and the approximate mode are presented in Figure 3.
For the WTI series, six input combinations based on previous log returns of daily oil prices are evaluated to estimate the current price value. The input combinations evaluated in the study are (i) [...] the total variation were considered as the number of new variables. For example, take two previous daily oil prices as input variables. Each previous daily oil price time series is decomposed using the DWT into three decomposition levels, which gives three detail subseries and one approximation subseries per input. Thus there were 8 subseries considered for the PCA. The result of the PCA is shown in Table 1. Table 1 shows that the first four principal components, retained because their eigenvalues are greater than 1, explain 84% of the variation in the data; all 4 PCs were included in the MLR model. Thus the 8 original variables can be replaced by 4 new uncorrelated variables. For training the MLR model, the PSO algorithm for estimating the regression coefficients was implemented, and the program code, which uses the wavelet toolbox, was written in MATLAB. The WMLR model structure developed in the present study is shown in Figure 4.
The forecasting performances of the MLR and WMLR models in terms of the MAE, RMSE, and $D_{stat}$ in the testing phase are compared in Table 2. For further analysis, the best performance of the MLR, WMLR, ARIMA, and ARIMA-GARCH models was compared with the best results of the ARIMA and feed-forward neural network (FNN) models studied by Yu et al. [1]. Table 3 shows that the WMLR model outperforms the MLR, ARIMA, and GARCH models, as well as the ARIMA and FNN models of Yu et al., in terms of the RMSE statistic. These results show that the new series obtained by the DWT have a significant positive effect on the MLR model results. Figure 5 shows box plots of the errors of the ARIMA, ARIMA-GARCH, MLR, and WMLR models for the testing period. It can be seen that the errors of the WMLR model are quite close to zero. Overall, it can be concluded that the WMLR model provided more accurate forecasting results than the other models for crude oil forecasting.

Conclusions
The accuracy of the wavelet multiple linear regression (WMLR) technique in forecasting daily crude oil prices has been investigated in this study. PCA is used to choose the principal component scores of the selected inputs, which are used as independent variables in the MLR model, and particle swarm optimization (PSO) is used to determine the optimal parameters of the MLR model. The performance of the proposed WMLR model was compared with that of the regular MLR, ARIMA, and GARCH models for crude oil forecasting. The comparison indicated that the WMLR model was more accurate than the other models. The study concludes that the forecasting ability of the MLR model improves when the wavelet transformation technique is adopted for data preprocessing. The decomposed periodic components obtained from the DWT technique are found to be most effective in yielding accurate forecasts when used as inputs to the MLR model. The accurate forecasting results indicate that the WMLR model provides a superior alternative to the other models and a potentially very useful new method for crude oil forecasting. The WMLR model presented in this study is a simple explicit mathematical formulation. It is much simpler than the ANN model and can be successfully used in modeling short-term crude oil prices. In the present study, three resolution levels were employed for decomposing the crude oil time series. If more resolution levels were used, the results from the WMLR model might be better. This may be a subject of another study.