Daily Crude Oil Price Forecasting Using Hybridizing Wavelet and Artificial Neural Network Model

A new method that integrates the discrete wavelet transform with an artificial neural network (the WANN model) is proposed for daily crude oil price forecasting. The discrete Mallat wavelet transform is used to decompose the crude oil price series into one approximation series and several detail series (DS). The new series obtained by adding the approximation series and the effective DS components is then used as input to the ANN model to forecast the crude oil price. The performance of the WANN model was compared with that of a regular ANN model at a lead time of 1 day for two main crude oil price series, the West Texas Intermediate (WTI) and Brent crude oil spot prices. In both cases, the WANN model provided more accurate crude oil price forecasts than the individual ANN model.


Introduction
Crude oil price fluctuations are of significant interest to both financial practitioners and market participants. However, the crude oil price is one of the most complex and difficult series to model because its fluctuations are irregular, nonlinear, nonstationary, and highly volatile. Thus, accurate forecasting of the crude oil price time series is one of the greatest challenges and most important issues facing energy economists seeking better decisions at several managerial levels. For this reason, many researchers have devoted considerable effort to developing different types of models for crude oil price forecasting.
The application of classical time series models such as the autoregressive moving average (ARMA) model (Mohammadi and Su [1], Ahmad [2], Wang et al. [3], and Xie et al. [4]) and generalized autoregressive conditional heteroscedastic (GARCH) type models (Morana [5], Sadorsky [6], and Agnolucci [7]) to crude oil price forecasting has received much attention in the last decade. However, these models provide good prediction results only when the price series under study is linear or near linear, and they have a limited ability to capture the nonlinearity and nonstationarity in crude oil price data.
Artificial neural network (ANN) techniques have shown great ability in modeling and forecasting nonlinear and complex time series. ANN offers an effective approach for handling large amounts of dynamic, nonlinear, and noisy data. Numerous papers have presented successful applications of ANN for modeling and forecasting the crude oil price series (Lackes et al. [8], Khazem and Mazouz [9], Mirmirani and Li [10], Kulkarni and Haidar [11], Yu et al. [12], and Hu et al. [13]) as well as for forecasting other real-world time series. Their experimental results show that the performance of ANN is superior to that of various traditional statistical models. ANN is able to learn complex and nonlinear time series that are difficult to model with conventional models. However, ANN has some disadvantages. Although ANN can forecast accurately, its performance in some specific situations is inconsistent (Khashei and Bijari [14]). ANN also often suffers from local minima and overfitting, and its network structure is difficult to determine and is usually chosen by a trial-and-error approach (Kişi [15]). In addition, the ANN model has a limitation with nonstationary data and may not be able to handle such data if the input data are not preprocessed (Adamowski and Sun [16]).
In the last decade, wavelet transforms have been investigated in diverse fields, such as economics [18], business [19], and hydrology [20]. Wavelet transforms have attracted considerable attention and have been found to be very effective with nonstationary time series data. Recently, different hybrid models based on the wavelet transform have been developed in the field of oil price forecasting. For example, Qunli et al. [21] proposed a hybrid model based on the wavelet transform and a radial basis function (RBF) neural network to forecast the future oil price. They found that the wavelet transform successfully decomposes the original price sequence for use as the input layer of the RBF neural network model. Bao et al. [22] developed a hybrid model integrating wavelets and least squares support vector machines (LSSVM) for crude oil forecasting. Yousefi et al. [23] applied a wavelet methodology to decompose the crude oil price and extended the components directly to make forecasts. de Souza e Silva et al. [24] applied wavelet analysis to remove high-frequency noise and then used a hidden Markov model to predict future price movements. He et al. [25] proposed a wavelet decomposed ensemble model to incorporate the Heterogeneous Market Hypothesis into the modeling process.
However, in the crude oil field, hybrid wavelet and ANN models have received very little attention, and there are only a few applications of such models to forecasting crude oil price series. Shambora and Rossiter [26] developed a hybrid wavelet and ANN model for crude oil forecasting. Jammazi and Aloui [27] combined the Haar à trous wavelet and a backpropagation neural network algorithm to develop a crude oil price forecasting model. Mingming and Jinliang [28] constructed a multiple wavelet recurrent neural network simulation model to analyze crude oil prices. Wavelet analysis was used to capture multiscale data characteristics, and a recurrent neural network model was used to simulate crude oil prices. The simulation results showed that the model has high prediction accuracy.
The main contribution of this paper is to propose a novel hybrid model integrating the wavelet transform and ANN for crude oil forecasting. To this end, the daily West Texas Intermediate (WTI) and Brent crude oil spot prices were decomposed into subseries at different scales by the Mallat algorithm. The effective subseries were then summed and used as inputs to the ANN model for crude oil forecasting. Finally, to evaluate its ability, the proposed model was compared with an individual ANN model.

Artificial Neural Network
An ANN is a flexible computing framework that has been extensively studied and used for time series forecasting in many areas of science and engineering since the early 1990s. An ANN is a mathematical model with a highly connected structure similar to that of brain cells. An ANN model usually consists of three layers: the first layer is the input layer, where the data are introduced to the network; the second layer is the hidden layer, where the data are processed; and the last layer is the output layer, where the results for a given input are produced. Figure 1 illustrates the architecture of the proposed ANN for crude oil price forecasting. The relationship between the output ($y_t$) and the inputs ($y_{t-1}, y_{t-2}, \ldots, y_{t-p}$) is given by [29]

$$y_t = a_0 + \sum_{j=1}^{q} a_j f\left(w_{0j} + \sum_{i=1}^{p} w_{ij}\, y_{t-i}\right),$$

where $a_j$ ($j = 0, 1, 2, \ldots, q$) and $w_{ij}$ ($i = 0, 1, 2, \ldots, p$; $j = 1, 2, \ldots, q$) are the model parameters, $p$ is the number of input nodes, $q$ is the number of hidden nodes, and $f(\cdot)$ is the transfer function.
Generally, an ANN may have different transfer functions for different neurons in the same or different layers. The majority of research uses the hyperbolic tangent sigmoid function for hidden neurons, and there is no consensus on which transfer function should be used for output neurons [30]. The hyperbolic tangent sigmoid is often used as the hidden layer transfer function, given by

$$f(x) = \frac{2}{1 + e^{-2x}} - 1,$$

and the single output unit uses a pure linear transfer function, given by

$$g(x') = x',$$

where $x$ is the weighted input of the hidden layer, $f(x)$ is the hyperbolic tangent sigmoid transfer function for the hidden layer, $x'$ is the weighted input of the output layer, and $g(x')$ is the pure linear transfer function for the output layer.
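The three-layer structure described above (tanh-sigmoid hidden units feeding one linear output unit) can be sketched in a few lines. This is only an illustration: the function names, the 2-2-1 layout, and all weight values are hypothetical and are not taken from the study.

```python
import math

def tansig(x):
    """Hyperbolic tangent sigmoid, written as 2 / (1 + exp(-2x)) - 1."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def ann_forecast(lags, hidden_weights, hidden_biases, output_weights, output_bias):
    """One-step forecast from a three-layer feedforward network.

    lags           : [y_{t-1}, ..., y_{t-p}]
    hidden_weights : q rows, each holding p input-to-hidden weights
    hidden_biases  : q hidden-layer bias terms
    output_weights : q hidden-to-output weights
    output_bias    : output-layer bias term
    """
    hidden = [
        tansig(bias + sum(w * y for w, y in zip(row, lags)))
        for row, bias in zip(hidden_weights, hidden_biases)
    ]
    # Pure linear output: weighted sum of hidden activations plus a bias.
    return output_bias + sum(a * h for a, h in zip(output_weights, hidden))

# Hypothetical 2-2-1 network with made-up weights, for illustration only.
forecast = ann_forecast(
    lags=[0.01, -0.02],
    hidden_weights=[[0.5, -0.3], [0.1, 0.8]],
    hidden_biases=[0.0, 0.1],
    output_weights=[1.2, -0.7],
    output_bias=0.05,
)
print(forecast)
```

In practice the weights would be estimated by training rather than fixed by hand.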
The most popular neural network training method is the backpropagation (BP) algorithm, which is essentially a steepest gradient descent method introduced by Rumelhart et al. [31]. This algorithm suffers from slow convergence, inefficiency, and lack of robustness [32]. Furthermore, it can be very sensitive to the choice of the learning rate. It is hard to choose a proper learning rate, because a low learning rate may result in long training times and a high learning rate may cause system instability [33].
To overcome the weaknesses of the BP algorithm, many researchers have investigated the use of the Levenberg-Marquardt (LM) algorithm. The LM algorithm is one of the most popular and efficient nonlinear optimisation methods. It is a variation of Newton's method that uses a Hessian-based algorithm for nonlinear least squares optimisation and is available in most optimisation packages. The LM algorithm takes large steps down the gradient where the gradient is small, such as near local minima, and takes small steps when the gradient is large. Its faster convergence, robustness, and ability to find good local minima make it attractive for ANN training. This optimisation technique is more powerful than the conventional gradient descent technique [34][35][36].
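The damped least-squares idea behind LM can be illustrated on a toy problem. The sketch below fits the two-parameter curve y = a·exp(b·x) rather than a neural network, so that the 2×2 linear system can be solved by hand. The function name `lm_fit`, the damping parameter `mu`, its starting value, and its update factors of 0.1 and 10 are assumptions chosen for illustration, not settings reported by the authors.

```python
import math

def lm_fit(xs, ys, a, b, mu=1e-3, max_iter=200, tol=1e-12):
    """Fit y = a * exp(b * x) with a basic Levenberg-Marquardt loop."""
    def sse(a_, b_):
        return sum((y - a_ * math.exp(b_ * x)) ** 2 for x, y in zip(xs, ys))

    cost = sse(a, b)
    for _ in range(max_iter):
        # Accumulate J^T J (2x2) and J^T r (2x1), with residual r = y - model.
        h11 = h12 = h22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e
            ja = e            # d(model)/da
            jb = a * x * e    # d(model)/db
            h11 += ja * ja
            h12 += ja * jb
            h22 += jb * jb
            g1 += ja * r
            g2 += jb * r
        # Solve (J^T J + mu * I) * delta = J^T r for the 2x2 case.
        d11 = h11 + mu
        d22 = h22 + mu
        det = d11 * d22 - h12 * h12
        da = (d22 * g1 - h12 * g2) / det
        db = (d11 * g2 - h12 * g1) / det
        new_cost = sse(a + da, b + db)
        if new_cost < cost:
            # Accept the step and reduce damping (closer to Gauss-Newton).
            a, b = a + da, b + db
            improvement = cost - new_cost
            cost = new_cost
            mu *= 0.1
            if improvement < tol:
                break
        else:
            # Reject the step and increase damping (closer to gradient descent).
            mu *= 10.0
    return a, b, cost

# Synthetic, noise-free data generated from a = 2, b = 0.5.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a_hat, b_hat, final_cost = lm_fit(xs, ys, a=1.0, b=0.1)
print(a_hat, b_hat, final_cost)
```

The same accept/reject logic carries over to network training, where the Jacobian is taken with respect to all weights and biases.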

Wavelet Transform
The wavelet transform is a powerful mathematical tool that provides a time-frequency representation of an analyzed signal in the time domain. This overcomes the basic shortcoming of Fourier analysis, which is that the Fourier spectrum contains only globally averaged information. Wavelet transforms provide a useful decomposition of the original time series by capturing information at various decomposition levels. Wavelet transforms can be divided into two categories: continuous wavelet transforms (CWT) and discrete wavelet transforms (DWT). The CWT of the time series $x(t)$ is given by

$$W(\tau, s) = \frac{1}{\sqrt{|s|}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\left(\frac{t - \tau}{s}\right) dt,$$

where $(*)$ indicates the complex conjugate, $t$ stands for time, $\tau$ stands for the time step, $s \in (0, \infty)$ stands for the wavelet scale, and $\psi(t)$ is the transforming function, called the mother wavelet. The term wavelet means small wave, where small refers to the condition that the function is of finite length and wave refers to the condition that the function is oscillatory. Several families of wavelets $\psi(t)$ that have proven useful for various applications are described in the related references (Mallat [37]).
The wavelets are chosen based on their shape and their ability to analyse the time series in a particular application. Typical wavelet families such as Haar, Daubechies, Coiflet, Symlet, Meyer, Morlet, and the Mexican Hat can be used as a mother wavelet. The CWT is not often used for forecasting because it is computationally complex and time-consuming to compute [38]. Instead, the wavelet is usually discretized in forecasting applications to simplify the numerical solution. The DWT requires less computation time and is simpler to apply. The discrete wavelet is given by

$$\psi_{m,n}(t) = \frac{1}{\sqrt{s_0^{m}}}\, \psi\left(\frac{t - n \tau_0 s_0^{m}}{s_0^{m}}\right),$$

where $m$ and $n$ are integers that control the scale and time, respectively.
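The most common and simplest choice for the parameters is $s_0 = 2$ and $\tau_0 = 1$. For a discrete time series $x(t)$ of length $N$, the DWT becomes

$$W_{m,n} = 2^{-m/2} \sum_{t=0}^{N-1} \psi\left(2^{-m} t - n\right) x(t),$$

where $W_{m,n}$ is the wavelet coefficient for the discrete wavelet at scale $s = 2^{m}$ and location $\tau = 2^{m} n$. According to Mallat's theory, the original discrete time series $x(t)$ can be decomposed into a series of linearly independent approximation and detail signals by using the inverse DWT. The inverse DWT is given by

$$x(t) = \bar{T} + \sum_{m=1}^{M} \sum_{n=0}^{2^{M-m}-1} W_{m,n}\, 2^{-m/2}\, \psi\left(2^{-m} t - n\right),$$

or in a simple format as

$$x(t) = A_M(t) + \sum_{m=1}^{M} D_m(t),$$

in which $A_M(t)$ is called the approximation subseries or residual term at level $M$ and $D_m(t)$ ($m = 1, 2, \ldots, M$) are the detail subseries, which can capture small features of interpretational value in the data.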
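The additive form of the decomposition, one approximation series plus M detail series that sum back to the original, can be checked with a small sketch. For simplicity it uses the Haar wavelet, for which the level-m approximation is just the mean over consecutive blocks of length 2^m. This is a swapped-in simplification for illustration, not the Daubechies wavelet used later in the study, and the function name `haar_mra` is hypothetical.

```python
def haar_mra(x, levels):
    """Return (A_M, [D_1, ..., D_M]) such that x = A_M + D_1 + ... + D_M.

    Haar case only: the level-m approximation is the mean of x over
    consecutive blocks of length 2**m. len(x) must be a multiple of
    2**levels.
    """
    n = len(x)
    if n % (2 ** levels) != 0:
        raise ValueError("series length must be a multiple of 2**levels")
    details = []
    previous = list(x)  # A_0 is the series itself
    for m in range(1, levels + 1):
        block = 2 ** m
        approx = []
        for start in range(0, n, block):
            mean = sum(x[start:start + block]) / block
            approx.extend([mean] * block)
        # D_m = A_{m-1} - A_m, so the components telescope back to x.
        details.append([p - a for p, a in zip(previous, approx)])
        previous = approx
    return previous, details

series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a3, (d1, d2, d3) = haar_mra(series, 3)
rebuilt = [a + u + v + w for a, u, v, w in zip(a3, d1, d2, d3)]
print(a3)
print(rebuilt)
```

With a smoother wavelet the components are no longer piecewise constant, but the same additive identity holds.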

Application
In this study, two main crude oil price series, the Brent crude oil spot price and the West Texas Intermediate (WTI) crude oil spot price (in US dollars per barrel), were chosen as the experimental sample. These two prices were selected because they are the most widely known benchmark prices and are used as the basis of many crude oil price formulae [12, 17]. For the WTI data, the daily data from January 1, 1986, to September 30, 2006, excluding public holidays, with a total of 5237 observations, were employed as experimental data. For convenience of WANN modeling, the data from January 1, 1986, to December 31, 2000, are used as the training set (3800 observations), and the remainder is used as the testing set (1437 observations). The daily Brent and WTI crude oil prices are plotted in Figure 2.
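In practice, short-term forecasting results are more useful as they provide timely information for the correction of forecast values. In this study, two performance criteria, the root mean square error (RMSE) and the mean absolute percentage error (MAPE), were used to evaluate the accuracy of the models. These criteria are given by

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left(y_t - \hat{y}_t\right)^2}, \qquad \mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left|\mathrm{BIAS}_t\right| \times 100,$$

where $\mathrm{BIAS}_t = (y_t - \hat{y}_t)/y_t$, $y_t$ is the actual and $\hat{y}_t$ is the forecasted value of period $t$, and $n$ is the total number of observations. The best model is the one with relatively small RMSE and MAPE in both the training and testing data.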
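The two criteria are simple to compute. The following sketch assumes the usual definitions (RMSE as the root of the mean squared error, MAPE as the mean absolute relative error in percent); the function names and the two-point example are illustrative only.

```python
import math

def bias(actual, forecast):
    """Relative error (y_t - yhat_t) / y_t for each period."""
    return [(y - f) / y for y, f in zip(actual, forecast)]

def rmse(actual, forecast):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)) / n)

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    b = bias(actual, forecast)
    return 100.0 * sum(abs(v) for v in b) / len(b)

# Made-up prices, for illustration only.
actual = [100.0, 200.0]
forecast = [110.0, 190.0]
print(rmse(actual, forecast), mape(actual, forecast))
```

RMSE is expressed in the units of the series, whereas MAPE is scale-free, which is why the two are reported together.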

Fitting ANN Model to the Data
At first, the multilayer perceptron (MLP) feedforward ANN model without any data preprocessing was used to model the crude oil price series. One of the most important steps in developing an ANN model is the determination of significant input variables. There are no fixed rules for selecting the input variables of an ANN model, although a general framework can be followed based on previous successful applications to crude oil price problems [39]. For the Brent and WTI data, six input combinations based on previous log returns of daily oil prices are evaluated to estimate the current price value. The ANN model used in this study is the standard three-layer feedforward network. To improve the network modeling and reduce the chance of being trapped in a local minimum, data normalization is applied to the ANN input and output. Because both crude oil price series are nonstationary (see Figure 2), each data point in the input neurons was preprocessed by normalizing and transforming it to within the range [−1, 1] to improve the accuracy of training and testing. There are no fixed rules as to which standardization approach should be used in particular circumstances. In this study, normalization for each raw input and output data set was calculated by taking the log return of the current oil price:

$$r_t = \ln\left(\frac{P_t}{P_{t-1}}\right),$$

where $P_t$ and $P_{t-1}$ are the current and one-period lagged prices.
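As a rough sketch of this preprocessing step, the code below converts a price series to log returns and arranges them into lagged input vectors with one target per row. The natural logarithm is assumed, and the helper names `log_returns` and `lagged_samples` are hypothetical.

```python
import math

def log_returns(prices):
    """Log return ln(P_t / P_{t-1}) for each consecutive pair of prices."""
    return [math.log(cur / prev) for prev, cur in zip(prices, prices[1:])]

def lagged_samples(series, p):
    """Build (inputs, targets) with inputs[k] = [s[t-1], ..., s[t-p]] and targets[k] = s[t]."""
    inputs, targets = [], []
    for t in range(p, len(series)):
        inputs.append([series[t - i] for i in range(1, p + 1)])
        targets.append(series[t])
    return inputs, targets

# Made-up prices, for illustration only.
returns = log_returns([100.0, 110.0, 121.0])
inputs, targets = lagged_samples([1, 2, 3, 4], p=2)
print(returns)
print(inputs, targets)
```

Each row of `inputs` corresponds to one input combination size p, so the same helper can be reused for every candidate number of input nodes.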
To obtain the optimal weights of the ANN, the LM algorithm provided by the MATLAB neural network toolbox is used to train the network. In this study, the hyperbolic tangent sigmoid function and a linear function are employed as activation functions for the hidden and output layers, respectively. The initial learning rate is set to 0.001, with a momentum coefficient of 0.9. The learning rate is increased by a factor of 10 or decreased by a factor of 0.1 until the change results in a reduced performance value. The number of training epochs is set to 1000. Training stops when the error reaches $10^{-5}$ or when the number of epochs reaches 1000.
The hidden layer plays an important role in many successful applications of ANN. It has been proven that one hidden layer is sufficient for an ANN to approximate any complex nonlinear function with any desired accuracy. For the popular one-hidden-layer network, several practical guidelines exist for choosing the number of hidden neurons that gives better forecasting accuracy. Berry and Linoff [40] claimed that the number of hidden nodes should never be more than $2p$, where $p$ is the number of inputs. Hecht-Nielsen [41] claimed that the number of hidden neurons is equal to $2p + 1$. However, there is currently no theory that determines the optimal number of nodes in the hidden layer. In the present study, the number of hidden nodes was progressively increased from 1 to $2p + 1$.
A program including the wavelet toolbox was written in MATLAB for the development of the ANN model. The optimal complexity of the ANN model, that is, the number of input and hidden nodes, was determined by a trial-and-error approach. Table 1 shows the best performance results of different ANNs with the number of hidden neurons varying from 1 to $2p + 1$ for both data series. The training set is used to estimate the parameters of each specific ANN architecture. The testing set is then used to select the best ANN among all numbers of hidden neurons considered.
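The trial-and-error search over architectures can be written as a small grid enumeration. The sketch below assumes one to six input nodes and 1 to 2p + 1 hidden nodes for each p; the `evaluate` callable passed to `select_best` is a hypothetical stand-in for training a network and returning its testing error.

```python
def candidate_architectures(max_inputs=6):
    """List (p, q, 1) architectures with p inputs and q = 1 .. 2p + 1 hidden nodes."""
    return [
        (p, q, 1)
        for p in range(1, max_inputs + 1)
        for q in range(1, 2 * p + 2)
    ]

def select_best(architectures, evaluate):
    """Return the architecture with the smallest error reported by `evaluate`."""
    return min(architectures, key=evaluate)

grid = candidate_architectures()
print(len(grid))

# Dummy error function (hypothetical): pretends that 2-4-1 has the lowest error.
best = select_best(grid, lambda arch: abs(arch[0] - 2) + abs(arch[1] - 4))
print(best)
```

Because every candidate requires a full training run, the grid is kept small by bounding the hidden layer at 2p + 1 nodes.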
Table 1 shows how the performance of the ANN varies with the number of neurons in the hidden layer. The prediction errors (MAPE) of the ANN range from 1.60% to 2.28% for the Brent data and from 1.67% to 1.80% for the WTI data. For the Brent data, the best architecture according to the RMSE and MAPE criteria for the training and testing data has 2 input layer neurons, 4 hidden layer neurons, and 1 output layer neuron (2-4-1). For this architecture, RMSE

Fitting Hybrid Wavelet-ANN Model to the Data
The hybrid wavelet and ANN (WANN) model is obtained by combining two methods, the discrete wavelet transform (DWT) and the ANN model. In WANN, the original crude oil price series is decomposed into a certain number of subtime series components, which are then entered into the ANN in order to improve the model accuracy. In this study, the Daubechies wavelet, one of the most widely used wavelet families, is chosen as the wavelet function to decompose the original series (Eynard et al. [42]; Bagheri et al. [43]). When conducting wavelet analysis, the number of decomposition levels appropriate for the data must be chosen. To choose the number of decomposition levels, the following formula is used [34]:

$$L = \operatorname{int}\left[\log(N)\right],$$

where $L$ is the level of decomposition and $N$ is the number of time series data. According to this formula, the optimal number of decomposition levels for both crude oil price series in this study is 3. Accordingly, the original oil price series is first decomposed into three detail components (D1, D2, and D3), which stand for different frequency components of the original series, and an approximation component (A3), as shown in Figures 3 and 4.
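The level-selection rule can be checked numerically. The sketch below assumes a base-10 logarithm, which is the reading that reproduces the three levels reported for a series of 5237 observations; the function name is hypothetical.

```python
import math

def decomposition_level(n_obs):
    """Integer part of log10(N), used here as the number of decomposition levels."""
    return int(math.log10(n_obs))

# 5237 is the number of WTI observations stated in the text.
print(decomposition_level(5237))
```

Under this rule any series with between 1000 and 9999 observations is decomposed into three levels.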
Each of the D series plays a distinct role in the original time series and has a different effect on the original crude oil price series. Instead of using each D component individually as model input, using the sum of the suitable D components is more useful and can greatly increase the forecast performance of hybrid models. In this study, the effectiveness of the wavelet components is determined using the coefficient of determination ($R^2$) between each D subseries and the original data. The $R^2$ values are given in Table 2. The $R^2$ of D1 is zero for the Brent data and 0.0291 for the WTI data, whereas the wavelet components D2 and D3 show significantly higher $R^2$ than D1 for both series. According to the $R^2$ analysis, D2 and D3 were selected as the dominant wavelet components. The significant wavelet components D2 and D3 and the approximation component (A3) were then added to each other to form the ANN model input for crude oil forecasting. Figure 5 shows the structure of the WANN model. Figures 3 and 4 show the original crude oil prices and their components, that is, the time series of the 2-day mode (D1), 4-day mode (D2), 8-day mode (D3), and approximation mode (A3), together with the combination of effective details and approximation components (DW = A3 + D2 + D3). Six different combinations of the new input series are used in this study. The WANN model was trained in the same way as the ANN model, except that the inputs were the combinations of effective details and approximation components (DW) after the appropriate wavelet decomposition was selected. In all cases, the input and output are transformed into the log return of current oil prices. The selection of the optimal number of nodes in both the input and hidden layers was done in the same way as for the ANN model. The model architecture for WANN consists of 1 to 6 nodes in the input layer ($p$), 1 to $2p + 1$ nodes in the hidden layer, and one neuron in the output layer. The data were partitioned into training and testing sets in the same manner as for the ANN model.
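The component-selection step can be sketched as follows. Two assumptions are made here that the text does not state: $R^2$ is computed as the squared Pearson correlation between a detail series and the original series, and a numeric `threshold` (hypothetical) decides which details count as effective, whereas the study selects them by comparing the values in Table 2.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two equally long series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov * cov / (var_a * var_b)

def effective_series(original, approximation, details, threshold):
    """Sum the approximation with every detail whose R^2 reaches the threshold."""
    combined = list(approximation)
    for d in details:
        if r_squared(d, original) >= threshold:
            combined = [c + v for c, v in zip(combined, d)]
    return combined

# Toy components, for illustration only.
original = [1.0, 2.0, 3.0, 4.0]
approximation = [1.0, 1.0, 1.0, 1.0]
d_noise = [1.0, -1.0, -1.0, 1.0]   # uncorrelated with the original series
d_trend = [0.1, 0.2, 0.3, 0.4]     # perfectly correlated with the original series
dw = effective_series(original, approximation, [d_noise, d_trend], threshold=0.01)
print(dw)
```

The resulting series plays the role of DW, which is then converted to log returns and lagged in the same way as the raw prices.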
The performance measures of WANN for both crude oil data sets are shown in Table 3. According to Table 3, the best WANN architecture according to the RMSE and MAPE criteria is 4-4-1 for the Brent data and 5-1-1 for the WTI data.
For further analysis, the best performances of the ANN and WANN models are compared in terms of the RMSE and MAPE in the testing phase. The statistical results of the different models are summarized in Table 4.
For the Brent data, WANN improved on the ANN forecast with reductions of about 17.7% and 20.1% in RMSE and MAPE, respectively. For the WTI data, WANN obtained the best RMSE and MAPE values during the testing phase, which are 20.1% and 22.3% lower, respectively, than those of ANN. Overall, the WANN model outperforms ANN for both data sets on both evaluation measures during the testing phase.
To compare forecasting accuracy further, the Diebold-Mariano (DM) test statistic is used to test the statistical significance of the difference between forecasting models [44]. The DM statistic can be defined as

$$\mathrm{DM} = \frac{\bar{d}}{\sqrt{V_d / T}},$$

where $d_t = (y_t - \hat{y}_{\mathrm{WANN},t})^2 - (y_t - \hat{y}_{\mathrm{ANN},t})^2$, $\bar{d} = (\sum_{t=1}^{T} d_t)/T$, $V_d = \gamma_0 + 2 \sum_{k=1}^{\infty} \gamma_k$, and $\gamma_k = \mathrm{cov}(d_t, d_{t-k})$. $\hat{y}_{\mathrm{ANN},t}$ and $\hat{y}_{\mathrm{WANN},t}$ are the forecast values from the ANN and WANN models, respectively. The null hypothesis of equal forecast accuracy is rejected if the test statistic is negative and statistically significant. Table 4 reports the estimated values of the DM test statistic for the testing performance of WANN compared with ANN. Table 4 shows that all DM values of the proposed WANN for the Brent and WTI data are below −2.326 and the $p$ values are far smaller than 1%, indicating that the proposed model performs best in short-term oil price forecasting at least at the 99% confidence level. The DM test statistic thus provides stronger support for the WANN model. Figure 6 shows the box plots of the BIAS values computed using ANN and WANN for the Brent and WTI data. Each plot shows the most extreme values in the data set, the lower and upper quartiles, and the median. The criterion for selecting a suitable model is the minimum median BIAS value and the smallest dispersion in BIAS. Figure 6 shows that WANN performs better than ANN for both the Brent and WTI data.
Finally, in order to evaluate the efficiency of the proposed hybrid model, the obtained results were also compared with the results of ANN, ARIMA [12], and GARCH [17] using the same data. The comparison is summarized in Table 5, which shows that the linear ARIMA and GARCH models were unable to fully capture the complex nonlinearity and high irregularity of the crude oil prices. When ANN is used, however, the model can capture both the nonlinear and complex features of the crude oil price series. The ANN model performs much better than the ARIMA and GARCH models, which demonstrates the existence and importance of nonlinear and nonstationary behavior in the crude oil price time series. However, the ANN model is less efficient than the proposed WANN model. The proposed model yields better results than the other models for both crude oil price series. This result shows that the new input series obtained from the discrete wavelet transform has a significant positive effect on the ANN model results.

Conclusions
In this study, the wavelet transform and the ANN model were combined to develop a hybrid model for modeling and forecasting the crude oil price series. In the first model, an ANN without any data preprocessing was used to model the crude oil price series. In the second, proposed hybrid model, the wavelet transform, which can capture the multiscale features of a time series, was used to decompose the crude oil price series. The new series obtained by adding the effective wavelet components was then used as input to the ANN model to forecast the crude oil price one time step ahead. It is recommended that the sum of the effective details and the approximation component be used as ANN input rather than all subtime series components. Effective wavelet components are selected based on the correlation between each component series (the approximation series and the detail series) and the crude oil price series. The performance of the proposed WANN model was compared with that of the ANN model. The hybrid model showed great improvement in crude oil price modeling and produced better forecasts than the ANN model alone. The study concludes that the forecasting ability of the WANN model improves when the wavelet transformation is adopted for data preprocessing. The decomposed periodic components obtained from the DWT are found to be most effective in yielding accurate forecasts when used as inputs to the ANN model. The accurate forecasting results indicate that the WANN model provides a superior alternative to other models and a potentially very useful new method for crude oil price forecasting.

Figure 2: Daily Brent and WTI crude oil prices series.

Figure 5: The structure of the WANN model (input time series $y_t = f(y_{t-1}, y_{t-2}, \ldots, y_{t-p})$; decompose input using DWT; the effectiveness of DWT as input for ANN).

Figure 6: Comparison of ANN and WANN models for the testing results.

Table 1: Training and testing performance of different network architectures of the ANN model.

Table 2: Coefficient of determination ($R^2$) for each subtime series for Brent and WTI data.

Table 3: Training and testing performance of different network architectures of the WANN model.

Table 4: Forecasting performance indices of ANN and WANN.

Table 5: The RMSE and MAPE comparisons for different models.