Forecasting Crude Oil Price with Multiscale Denoising Ensemble Model

1 School of Earth Science and Resources, Chang’an University, Xi’an 710054, China
2 School of Economics and Management, Beijing University of Chemical Technology, Beijing 100029, China
3 International Business School, Shaanxi Normal University, Xi’an 710062, China
4 Department of Management Sciences, City University of Hong Kong, Tat Chee Avenue, Kowloon Tong, Kowloon, Hong Kong
5 College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China


Introduction
Given its role as one of the most important industrial inputs, accurate and reliable forecasting of crude oil price movements has a profound impact throughout the national economy. Crude oil price forecasting has developed through several stages, including the structural approach, the time series approach, the artificial intelligence (AI) approach, and other recent developments. Earlier attempts were mainly linear, parametric, structural, and based on economic theory. They have been effective in understanding and forecasting crude oil price movements over medium to long term horizons, where the aggregated price behavior is comparatively stable and stationary, with acceptable approximation accuracy and computational efficiency. These models work within a much simplified macroeconomic equilibrium framework to analyze the economic relationships among participants in the crude oil markets and derive analytic equations to describe them. Typical examples include the supply and demand equilibrium model and, more recently, the cointegration model. For example, Chevillon and Rifflart [1] use cointegration to identify OPEC related relationships in the market and develop a forecasting equation based on them. Ye et al. [2] further incorporate cumulative excess production capacity as a ratchet variable into forecasting models and show that it improves performance.
Time series models have become popular in recent years. They use historical information to characterize and forecast the movement of the series itself. The random walk (RW) and autoregressive moving average (ARMA) models are two representative time series models [3]. RW and ARMA still serve as important benchmarks in crude oil price forecasting because there is no conclusive evidence that any econometric model consistently beats them [4]. For example, Alquist and Kilian [4] find the RW model to be by far the best model available. Meanwhile, another trend is the emergence of AI approaches such as the artificial neural network (ANN) and support vector regression (SVR). For example, Yu et al. [5] find that ANN outperforms the ARMA model but has room for further improvement using ensemble algorithms. Azadeh et al. [6] propose a flexible oil price forecasting algorithm based on ANN and fuzzy regression (FR) that incorporates economic indicators during the forecasting process and achieves improved forecasting performance.
Some recent developments include wavelet analysis and ensemble algorithms. Wavelet analysis can project data into the time-scale domain and conduct multiscale analysis [7]. In the crude oil forecasting literature, only limited efforts can be observed. Most of these attempts use wavelet analysis to preprocess data before time series and machine learning techniques are applied to analyze the data and make forecasts. Yousefi et al. [8] use wavelet analysis to decompose crude oil prices and extend the decomposed components directly to make forecasts. Meanwhile, the ensemble algorithm represents another interesting development. It attempts to incorporate the partial information sets captured by individual forecasters to produce more accurate forecasts [9]. Traditional linear ensemble approaches have achieved only a moderate level of accuracy, while the recently emerging nonlinear ensemble approaches based on intelligent techniques such as neural networks have achieved much improved performance. For example, Xiong et al. [10] propose a revised hybrid model that combines empirical mode decomposition (EMD) with the feed-forward neural network (FNN) modeling framework and incorporates the slope based method (SBM); it achieves improved prediction performance with acceptable computational load. He et al. [11] propose a wavelet decomposed ensemble model to improve the forecasting accuracy of crude oil prices with a deeper understanding of the market microstructure.
This paper explores the potential of multiscale signal processing techniques, more specifically the denoising technique, in forecasting price movements in crude oil markets. The main research problem addressed is the empirical evidence of multiscale heterogeneous data characteristics distinguishable by scale and the incorporation of this stylized fact into the construction of the forecasting algorithm. More specifically, wavelet denoising algorithms based on diverse wavelet families are used to construct the forecasting matrices, whose members are individual forecasts based on different time series specifications that capture different data features. The wavelet denoising algorithm can serve as a source separation tool to analyze the underlying structure of the data. Experiment results confirm that the model specification and parameters vary with the choice of parameters for the wavelet denoising algorithm. The forecasting accuracy also improves when noise is filtered out during the modeling process.
The key findings and contributions of this paper are threefold. Theoretically, this paper proposes the heterogeneous market hypothesis (HMH) as the basis for modeling the crude oil market and provides a more general framework that uses wavelet analysis with different parameters to reveal the hidden data components and analyzes them with their heterogeneous specifications and parameters. Technically, HMH has received less attention in the extant literature due to the lack of a methodology to test and model it. The wavelet denoising technique is introduced as an important paradigm shift that could provide empirical evidence for HMH and help gain further insights. Compared to other blind signal decomposition approaches such as morphological component analysis (MCA) and empirical mode decomposition (EMD), wavelet analysis uses theoretically well-defined wavelets to capture the matching data patterns, so results are more stable and easier to interpret. Practically, the work in this paper differs from the existing literature by introducing the nonlinear ensemble algorithm to reduce the instability and biases introduced by different wavelet families. The performance of wavelet based algorithms is sensitive to the parameters set, as revealed in empirical research. This is a less addressed issue in the literature and could cast doubt on results obtained in empirical research. The practical implication is that the proposed algorithm adapts better to real data and achieves better stability in the estimates.
The rest of the paper is organized as follows. Section 2 briefly reviews the relevant literature on the wavelet denoising algorithm and on the ICA-LSSVR based nonlinear ensemble methodology. Section 3 proposes the integrated WDN-ICA-LSSVR methodology. Section 4 reports the major findings and performance evaluation results of the empirical studies. Section 5 summarizes and concludes.

Relevant Theories
2.1. Wavelet Denoising Algorithm. Denoising algorithms have been one of the most important research topics in engineering, although they have not received much attention in economics and finance. The critical issue in denoising research is to set the right boundary and remove the noise while preserving the major data features. Many approaches have been developed in the literature, including the moving average filter, exponential smoothing filter, linear Fourier smoothing, simple nonlinear noise reduction, and nonlinear wavelet shrinkage [12]. During the denoising process, some data features are changed to a degree that depends on the filter used. For example, simple methods including the moving average filter and the exponential smoothing filter are known to modify the statistical properties of data significantly [13]. There are also other nonlinear denoising techniques with mixed results in the literature. Yahya Bey [14] and Bey [15] propose both modified and constant frequency extent denoising algorithms, which show superior performance compared to the wavelet denoising method in empirical studies. Verma and Ganguli [13] propose a nonlinear rational filter with a median preprocessor to improve the noise reduction ratio in gas turbine health monitoring applications. Wavelet shrinkage can provide more optimized and localized noise reduction. Boto-Giralda et al. [16] use stationary wavelet based denoising methods to improve the performance of traffic volume prediction models in intelligent transportation systems. Gao et al. [17] propose adaptive denoising algorithms and contend that they are superior to wavelet based approaches when applied to electroencephalogram (EEG) signals contaminated with noise. Kwon et al. [18] point out that existing denoising techniques assume a homogeneous error structure, and they propose a wavelet denoising method incorporating variance change point detection thresholding to deal with this in protein mass spectroscopy applications. Lotric and Dobnikar [19] and Lotrič [20] integrate the neural network with the wavelet denoising method to optimize the denoising parameters dynamically and find improved prediction accuracy. Overall, wavelet based denoising methods have become the predominant approach.
The unique feature of wavelet analysis is its localization over both time and scale. This is in contrast to traditional spectrum analysis tools, which reveal patterns at different frequencies at the cost of ignoring details in time. This capability stems from the wavelet functions used, which concentrate their energy in a short interval of time, in direct contrast to the globally time invariant sinusoid functions used in more traditional spectrum analysis tools such as Fourier analysis [21]. The forward and inverse operations of wavelet analysis form the multiresolution analysis, which provides unique perspectives into the data structure. As financial data usually exhibit nonstationary, time varying characteristics, wavelet analysis is appealing for financial data analysis. Wavelets are continuous functions in $L^2(\mathbb{R})$ that have vanishing moments and fluctuate within a certain time period [22]. They are characterized by features including vanishing moments, compact support, regularity, symmetry, and time frequency windows [22]. Typical wavelets include Haar, Daubechies, Coiflets, discrete Meyer, Biorthogonal, and reverse Biorthogonal [22]. Interested readers are referred to Percival and Walden [22] for more details.
Utilizing the wavelets and their appealing time scale localization characteristics, the original data $x(t)$ are projected into the multiscale domain using the wavelet transform as in the following:

$$W(u, s) = \int_{-\infty}^{\infty} x(t)\, \frac{1}{\sqrt{s}}\, \psi\!\left(\frac{t - u}{s}\right) dt, \tag{1}$$

where $\psi((t - u)/s)$ refers to the wavelet family translated by the location parameter $u$, which corresponds to time localization, and dilated by the scale parameter $s$, which corresponds to frequency localization.
The wavelet transform can be reversed to perfectly reconstruct the original data by inverse wavelet synthesis as in the following:

$$x(t) = \frac{1}{C_{\psi}} \int_{0}^{\infty} \int_{-\infty}^{\infty} W(u, s)\, \frac{1}{\sqrt{s}}\, \psi\!\left(\frac{t - u}{s}\right) du\, \frac{ds}{s^{2}}, \tag{2}$$

where $C_{\psi}$ is the admissibility constant of the wavelet. The wavelet denoising technique utilizes the wavelet transform and synthesis during the data denoising process. It first projects the original data series into different scales using the wavelet transform. Multiresolution analysis (MRA) can thus be defined as analyzing the time and frequency characteristics of the data, both the denoised part and the noise part, with finer details revealing patterns at smaller scales. The separation between denoised and noise components is then achieved by applying thresholds chosen specifically at each scale to either suppress or shrink the wavelet coefficients. Finally, the processed wavelet coefficients are reconstructed into a single data series using wavelet synthesis.
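The transform, threshold, and synthesis sequence described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the paper: it uses the discrete Haar wavelet only, a single global threshold `thr` rather than scale-specific thresholds, and the function names (`haar_dwt`, `haar_idwt`, `wavelet_denoise`) are hypothetical.

```python
import math

def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar transform.

    Returns (approximation, details), where details[0] holds the
    finest-scale coefficients and details[-1] the coarsest.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2 != 0:
            raise ValueError("length must be divisible by 2**levels")
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details

def haar_idwt(approx, details):
    """Wavelet synthesis: exact inverse of haar_dwt."""
    current = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for s, d in zip(current, detail):
            rebuilt.append((s + d) / math.sqrt(2))
            rebuilt.append((s - d) / math.sqrt(2))
        current = rebuilt
    return current

def hard_threshold(c, thr):
    # Keep the coefficient unchanged if it exceeds the threshold, else zero it.
    return c if abs(c) > thr else 0.0

def soft_threshold(c, thr):
    # Zero small coefficients and shrink the rest toward zero by thr.
    return math.copysign(max(abs(c) - thr, 0.0), c)

def wavelet_denoise(signal, levels, thr, shrink=hard_threshold):
    """Split a series into a denoised part and a noise part."""
    approx, details = haar_dwt(signal, levels)
    shrunk = [[shrink(c, thr) for c in level] for level in details]
    denoised = haar_idwt(approx, shrunk)
    noise = [x - d for x, d in zip(signal, denoised)]
    return denoised, noise
```

Passing `shrink=soft_threshold` switches the shrinkage rule; by construction the denoised and noise parts always sum back to the original series.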
Threshold selection rules and shrinkage algorithms are critical to the effective extraction of features during the denoising process. Typical threshold selection strategies include the universal, minimaxi, and Stein's unbiased risk estimate (SURE) thresholds. The differences among threshold selection rules lie in the trade-off between smoothness and accuracy of the denoised data. For example, the universal threshold achieves the maximal level of noise reduction at the cost of lower goodness-of-fit and the risk of disrupting valuable information in the original data. The minimaxi threshold, on the other hand, attempts to achieve the best fit to the original data by minimizing a fitness criterion such as the mean squared error (MSE), at the cost of lower smoothness. The shrinkage strategies include the hard and soft shrinkage algorithms, both widely used in the mainstream literature over the years [23, 24]. The hard shrinkage rule sets to zero the wavelet coefficients whose magnitude is below the chosen threshold and leaves the remaining coefficients intact. The soft shrinkage rule also sets to zero the coefficients below the threshold and, in addition, shrinks the magnitude of the remaining coefficients by the threshold value.
2.2. Nonlinear Ensemble Algorithm. Since the seminal work by Bates and Granger [9], empirical research has shown that the ensemble algorithm can improve the generalizability of a model by incorporating partial information sets. Given the forecasting set pairs $(x_i, y_i(x))$ $(i = 1, 2, \ldots, N)$, where $x_i$ is the input and $y_i(x)$ is the output, the goal of the ensemble model as in (3) is to find the hidden pattern that has the minimal generalization error and generalizes well out-of-sample:

$$E\left[\left(\hat{f}(x) - y\right)^{2}\right] = \left(E[\hat{f}(x)] - y\right)^{2} + E\left[\left(\hat{f}(x) - E[\hat{f}(x)]\right)^{2}\right], \tag{3}$$

where $\hat{f}(x)$ is the ensemble forecast, equal to the combination of individual forecasts $\hat{f}_i(x)$ as $\sum_{i=1}^{n} w_i \hat{f}_i(x)$, and $E[\hat{f}(x)]$ is the expected ensemble forecast. The performance improvement from ensemble forecasts is determined by two factors: the forecasting accuracy of the individual models and their level of independence. The greater the independence and heterogeneity of the individual models, the smaller the correlation among individual forecasters and thus the higher the generalizability of the estimated model. Both issues are taken into consideration in this paper. On the one hand, forecasting accuracy is addressed by the multiscale denoising based time series methodology. On the other hand, independent component analysis (ICA) is used to reduce the dimensionality of the forecasting matrix and improve the independence of ensemble members. Traditionally, principal component analysis (PCA) has been used to obtain uncorrelated ensemble members, but these are independent only in the Gaussian framework. For example, forecasts can have zero covariance and yet be interdependent at higher moments. In contrast, ensemble members transformed using ICA are statistically independent, that is, they have not just zero covariance but also independence at higher moments. ICA, also known as blind source separation (BSS), attempts to find the optimal linear projection of the original data to reduce the problem dimensionality and noise level. It recovers latent variables from the original mixed data without prior information except the statistical properties, as in the following:

$$X = AS, \tag{4}$$

where $X$ is the $[n \times m]$ observation matrix, $S$ is the $[n \times m]$ mutually independent source matrix, and $A$ is the matrix of real mixing coefficients $a_{ij}$ $(i = 1, \ldots, n;\ j = 1, \ldots, n)$.
ICA attempts to estimate $A$, whose inverse $W$ recovers the original source $S$ from the observations $X$ as $S = A^{-1}X = WX$.
Among the different approaches developed over the years, FastICA, based on a fixed point algorithm, is the most widely used and computationally efficient one [25]. Independent components can be estimated either in parallel or one after another following the Gram-Schmidt orthogonalization procedure. During the optimization process, higher order statistics rather than the second moment are used as guidance on the degree of statistical independence of the estimated components.
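The distinction between zero covariance and independence at higher moments can be illustrated with a small simulation. The sketch below is not part of the paper's method: the variables `x` and `y`, the sample size, and the random seed are assumptions chosen for illustration. Here `y` is fully determined by `x`, yet the two are (nearly) uncorrelated, so a second-moment criterion such as the one used by PCA would not detect the dependence, while a statistic built from higher moments does.

```python
import random

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def correlation(a, b):
    return covariance(a, b) / (covariance(a, a) * covariance(b, b)) ** 0.5

def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    m = mean(values)
    var = covariance(values, values)
    fourth = sum((v - m) ** 4 for v in values) / len(values)
    return fourth / var ** 2 - 3.0

rng = random.Random(0)                      # fixed seed, illustrative only
x = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
y = [xi ** 2 for xi in x]                   # deterministic function of x

corr_xy = correlation(x, y)                 # second moment: close to zero
corr_sq = correlation([xi ** 2 for xi in x], y)  # higher moment: equals one
kurt_x = excess_kurtosis(x)                 # non-Gaussianity of x (about -1.2)
```

Measures of non-Gaussianity such as the excess kurtosis are one example of the higher order statistics that ICA algorithms can use as contrast functions.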
Least squares support vector regression (LSSVR) is an emerging nonlinear ensemble algorithm. Compared to the traditional neural network based approach, its solution is more stable and globally optimal. It adopts the structural risk minimization principle during training and is formulated as a convex optimization problem that balances fitting accuracy against model generalizability. It thus alleviates the overfitting and local minima issues of traditional supervised learning algorithms such as neural network models, which are based on the empirical risk minimization principle. Interested readers are referred to [26] for details.
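The basic notion of LSSVR is as follows. First, using kernel functions that satisfy the Mercer condition, the nonlinear data are mapped into a higher dimensional feature space using the kernel trick, and a regression problem is formulated as in the following:

$$y = f(x) = w^{T}\varphi(x) + b, \quad \varphi: R^{n} \to F,\ w \in F, \tag{5}$$

where $\varphi(x)$ is the mapping function that maps the variable $x$ in the nonlinear input space into the linear feature space $F$, and $w$ and $b$ are coefficients.
A slack variable $e_i$ is introduced to model the estimation error; thus the LSSVR is formulated as in the following:

$$\min_{w, b, e}\ J(w, e) = \frac{1}{2} w^{T} w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^{2} \quad \text{s.t.}\quad y_i = w^{T}\varphi(x_i) + b + e_i,\ i = 1, \ldots, N, \tag{6}$$

where $e = (e_1; e_2; \ldots; e_N) \in R^{N}$ and $\gamma$ is the penalty parameter. It controls the trade-off between two objectives during the optimization process, that is, the minimization of the estimation error and the function smoothness.
The Lagrangian function for the dual problem is formulated as in the following:

$$L(w, b, e, \alpha) = J(w, e) - \sum_{i=1}^{N} \alpha_i \left\{ w^{T}\varphi(x_i) + b + e_i - y_i \right\}, \tag{7}$$

where $\alpha = (\alpha_1; \alpha_2; \ldots; \alpha_N)$ are the Lagrange multipliers. Optimality is achieved by solving the linear system obtained by differentiating $L$ in (7) with respect to the variables $w$, $b$, $e$, and $\alpha$, as in the following:

$$\frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{N} \alpha_i \varphi(x_i), \quad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{N} \alpha_i = 0, \quad \frac{\partial L}{\partial e_i} = 0 \Rightarrow \alpha_i = \gamma e_i, \quad \frac{\partial L}{\partial \alpha_i} = 0 \Rightarrow w^{T}\varphi(x_i) + b + e_i - y_i = 0. \tag{8}$$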
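In the standard LSSVR formulation, eliminating the weight vector and the slack variables from the optimality conditions leaves a single linear system in the bias and the Lagrange multipliers, with the kernel matrix plus a ridge term of one over the penalty parameter on the diagonal. The sketch below illustrates that system with a hand-written Gaussian elimination and an RBF kernel. It is a minimal illustration, not the authors' implementation; the function names and the parameters `gamma` and `sigma` are assumptions.

```python
import math

def rbf_kernel(u, v, sigma):
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol

def lssvr_fit(xs, ys, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(xs)
    system = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0]
        for j in range(n):
            k = rbf_kernel(xs[i], xs[j], sigma)
            row.append(k + (1.0 / gamma if i == j else 0.0))
        system.append(row)
    sol = solve_linear_system(system, [0.0] + list(ys))
    return sol[0], sol[1:]          # bias b, multipliers alpha

def lssvr_predict(x, xs, alpha, b, sigma):
    return sum(a * rbf_kernel(x, xi, sigma) for a, xi in zip(alpha, xs)) + b
```

Because training reduces to one linear solve, there are no local minima to escape; the cost is that every training point receives a nonzero multiplier, so the solution is not sparse.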
LSSVR, as an extension and improvement of SVR that is more efficient on large scale problems, has shown performance improvement similar to the SVR algorithm in many time series forecasting problems. In the crude oil market, Bao et al. [27] propose a hybrid model combining wavelet analysis and LSSVR in a two-stage process and demonstrate its superior performance in medium to long term empirical studies. However, there have been few attempts to apply LSSVR in the short term crude oil forecasting literature. Despite the positive results accumulated recently, both LSSVR and SVR suffer from issues similar to those of the neural network model. For example, it is often difficult to provide economic justification for the results obtained due to the black box nature of the training process. Their performance is also sensitive to the parameters, which are often chosen arbitrarily.

A Novel WDN-ICA-LSSVR Algorithm
Recent progress in the mixture modelling literature provides further evidence of heterogeneous data components with different distributions and specifications in financial markets. For example, Thavaneswaran et al. [28] point out that mixture distributions are better at capturing the heteroscedasticity in financial data and derive Markov models under mixture distributions. Shahbaba [29] points out that ignoring the latent data structure and mixture distributions would lead to wrong inference and proposes mixture models to effectively capture the underlying data generating process (DGP). Empirical research shows that the crude oil market has a heterogeneous market structure with underlying components corresponding to both normal market conditions and transient events.
Thus this paper proposes HMH as the theoretical basis for modeling the crude oil market. In HMH, the market is assumed to consist of heterogeneous agents with heterogeneous investment strategies and investment time horizons [30]. In contrast to the homogeneous reaction to news shocks assumed in the efficient market hypothesis (EMH), HMH assumes that agents or investors adopt different investment strategies based on their defining characteristics. Their investment time horizons and dealing frequencies are therefore intrinsically different; some tend to have lower dealing frequency and focus on a long time horizon, while others tend to have high dealing frequency with a short time horizon [31]. Although HMH is theoretically sound, in practice only limited studies have explored methodologies to systematically model the heterogeneous market structure. In investigating the crude oil markets, this paper addresses two major issues.
First, to recognize the heterogeneous market structure, the major influencing factors or criteria need to be chosen to disentangle the hidden market structure. As empirical studies suggest, the microstructure of the crude oil market is very complex, consisting of investors with different investment strategies targeting different investment time horizons [32]. The market prices reflect the combined influences of traders with different time horizons. These may include fundamentalist traders with long term strategies, short term traders aiming mainly at a limited time horizon (e.g., daily), and intraday traders dealing with high frequency shocks throughout the day. The statistical properties of the market process are dynamic and constantly changing due to this heterogeneity. Therefore, a high frequency shock influences only part of the traders in the market, while a low frequency shock has a far deeper impact on the whole market and on market makers. The line between the short lived nature of high frequency shocks and the long term nature of low frequency shocks is crucial in the sense that all investors and managers are essentially utility maximizers and should only be concerned with decisions and associated consequences within their own target investment horizon. Among the different features that can be used to distinguish the market structure, this paper proposes investment scales. Based on their investment scales, we assume that participants in the crude oil market can be categorized uniquely into fundamental investors (FI) and noise traders (NT). The implication of this assumption is that the trading activities of these two types of traders would exhibit distinctively different characteristics. The behavior of noise traders, restricted to a smaller band of frequency, is closely related to temporal, stochastic, and nonlinear patterns, usually resulting from highly speculative behavior and from operational and transaction recording errors. The behavior of fundamental investors, restricted to a larger band of frequency, is closely related to more stable, consistent, and long term patterns, which have been studied extensively over the years. These commonly exhibit characteristics such as autocorrelation, heteroscedasticity, mean reversion, and long memory, as described by mainstream econometric models.
Second, techniques that can effectively and efficiently model the heterogeneous market structure need to be identified and incorporated to model accurately the different market price movements characterized by different investment scales. In this paper, we introduce wavelet denoising algorithms since they provide a rough approximation to a crude oil market characterized by two major forces of influence. The main difficulty is to separate data from noise with a high level of accuracy, as the separation accuracy critically affects the generalizability of the fitted models. Proper removal of noise results in better behaved data, whereas inappropriate separation distorts the original data features. Traditional data denoising techniques include simple averages, Kalman filtering, and spectrum analysis methods such as Fourier transforms. However, these methods have not received significant attention in mainstream empirical research, for two reasons. First, with the imperfect extraction techniques currently prevalent in the literature, the data may be distorted during the denoising process by the assumptions imposed and the parameters used, which reduces forecasting accuracy. Second, they are mostly based on the spectrum and frequency domain and ignore the multiscale heterogeneous structure in the data and its underlying components, which correspond to different individual investment strategies and time horizons. The wavelet based denoising algorithm serves as an alternative, more refined modeling tool to extract data features with different characteristics. Meanwhile, as the true underlying DGP is unknown, each individual wavelet denoising model extracts only restricted information from the data with its own assumptions or perspective. These interpretations of the data offer only partial insights and are never complete unless unified. Since there is no consensus on the suitable parameters for the wavelet based denoising process with a particular data set, this paper proposes the LSSVR based nonlinear ensemble algorithm to combine the partial information sets captured by models with different parameters, which reduces the biases introduced and produces more stable and accurate results. In addition, ICA is used during the ensemble process to reduce the noise and the computational complexity arising from the high dimension of the forecast matrix, which consists of forecasts based on different parameters.
The numerical procedure of the WDN-ICA-LSSVR algorithm involves the following key steps.
(1) First, the wavelet based denoising algorithm is used to separate data from noise using a particular wavelet family. By decomposing data into the multiscale domain, data and noise are separated based on their different characteristics across scales, with noise concentrated at smaller scales. Thus more subtle distinctions between data and noise can be set.
(2) The conditional mean for the denoised data is modeled by ARMA processes as in the following:

$$\hat{r}_t = \sum_{i=1}^{p} \phi_i r_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j},$$

where $\hat{r}_t$ is the conditional mean at time $t$, $r_{t-i}$ is the lag $i$ return with parameter $\phi_i$, and $\varepsilon_{t-j}$ is the lag $j$ residual with parameter $\theta_j$.
(3) The optimal specification for the ARMA model (i.e., the model orders) is determined following the AIC and BIC minimization principle as in the following:

$$\mathrm{AIC} = -2\ln\hat{\mathcal{L}} + 2k, \qquad \mathrm{BIC} = -2\ln\hat{\mathcal{L}} + k\ln T,$$

where $\hat{\mathcal{L}}$ is the maximized likelihood, $k$ is the number of estimated parameters, and $T$ is the number of observations.
(4) With the high dimensional forecast matrix consisting of forecasts based on different wavelet families, ICA is used to remove noise and reduce dimensions. The original forecast matrix is transformed as in the following:

$$S = A^{-1}X,$$

where $S$ refers to the independent components projected using ICA, $X$ refers to the set of forecasts based on different wavelet parameters, and $A$ is the matrix of real mixing coefficients.
(5) The nonlinear ensemble algorithm is used to aggregate the individual conditional mean forecasts in the ICA transformed forecast matrix. The LSSVR algorithm is used during the nonlinear ensemble process to calculate the optimal weights with improved robustness and generalizability for the model.
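The order selection in steps (2) and (3) can be sketched as follows. This is a deliberate simplification rather than the paper's procedure: it fits pure autoregressive models (no moving average terms) by the Yule-Walker equations using the Levinson-Durbin recursion, and it evaluates the information criteria with the Gaussian approximation in which minus twice the log likelihood is replaced by $T\ln\hat{\sigma}^2$ up to a constant. All function names are hypothetical.

```python
import math
import random

def autocovariances(series, max_lag):
    n = len(series)
    m = sum(series) / n
    centered = [v - m for v in series]
    return [sum(centered[t] * centered[t - k] for t in range(k, n)) / n
            for k in range(max_lag + 1)]

def levinson_durbin(acov, max_order):
    """Yule-Walker AR fits for orders 0..max_order.

    Returns one (coefficients, innovation_variance) pair per order.
    """
    phi, var = [], acov[0]
    fits = [(phi, var)]
    for order in range(1, max_order + 1):
        partial = sum(phi[j] * acov[order - 1 - j] for j in range(order - 1))
        reflection = (acov[order] - partial) / var
        phi = [phi[j] - reflection * phi[order - 2 - j]
               for j in range(order - 1)] + [reflection]
        var = var * (1.0 - reflection ** 2)
        fits.append((phi, var))
    return fits

def select_ar_order(series, max_order):
    """Return (order by AIC, order by BIC, table of all candidates)."""
    n = len(series)
    fits = levinson_durbin(autocovariances(series, max_order), max_order)
    table = []
    for order, (phi, var) in enumerate(fits):
        aic = n * math.log(var) + 2 * order
        bic = n * math.log(var) + order * math.log(n)
        table.append((order, aic, bic, phi))
    best_aic = min(table, key=lambda row: row[1])[0]
    best_bic = min(table, key=lambda row: row[2])[0]
    return best_aic, best_bic, table

# Illustrative data: a simulated AR(1) series with coefficient 0.6.
rng = random.Random(1)
simulated = [0.0]
for _ in range(3000):
    simulated.append(0.6 * simulated[-1] + rng.gauss(0.0, 1.0))

order_aic, order_bic, candidates = select_ar_order(simulated, max_order=6)
```

Since BIC penalizes each extra parameter by $\ln T$ rather than 2, it never selects a larger order than AIC once $T$ exceeds about 8 observations.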

Empirical Studies
We conducted the empirical studies using daily observations in the West Texas Intermediate (WTI) crude oil market. The data source is the Energy Information Administration of the US Department of Energy. The data cover the period from January 2, 2002, to March 20, 2009, which includes 1814 daily observations. The data set is divided into three subsets: the training set for the wavelet denoising ARMA model (60%), the training set for the ICA-LSSVR based nonlinear ensemble model (24%), and the test set for the out-of-sample evaluation of the different models (16%). Since there is no consensus on econometric criteria for sample segmentation, the separation ratios follow the convention in the machine learning literature, where at least 60% of the data set is usually reserved for training and the remaining data set for the out-of-sample test is kept sufficiently large for the results to be statistically valid [33, 34]. The one day ahead forecast is performed using the rolling window method. Since autocorrelation function (ACF) and partial autocorrelation function (PACF) analyses indicate that the original data include trend factors, the price series $p_t$ is log differenced at the first order as $r_t = \ln(p_t / p_{t-1})$ when the data set is constructed. The resulting returns are scale free, correspond to percentage changes in financial positions, and have more attractive statistical properties such as stationarity. The holding period is assumed to be 1 day. For each experiment, a portfolio of one asset position worth 1 USD is assumed.
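The data preparation described above (log differencing, a chronological 60/24/16 split, and rolling windows for one day ahead forecasts) can be sketched as follows. The function names are hypothetical, and rounding the split boundaries to the nearest observation is an assumption, since the paper does not state how fractional boundaries are handled.

```python
import math

def log_returns(prices):
    """First-order log differences: r_t = ln(p_t / p_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

def chronological_split(series, fractions=(0.60, 0.24, 0.16)):
    """Split without shuffling so that the test set is strictly later in time.

    The last subset takes whatever remains after the first two.
    """
    n = len(series)
    first_end = round(n * fractions[0])
    second_end = first_end + round(n * fractions[1])
    return series[:first_end], series[first_end:second_end], series[second_end:]

def rolling_windows(series, window):
    """Yield (history, next_value) pairs for one-step-ahead forecasting."""
    for end in range(window, len(series)):
        yield series[end - window:end], series[end]
```

For example, `chronological_split(list(range(100)))` returns subsets of 60, 24, and 16 observations in their original order.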
The generalizability of the different models in out-of-sample forecast tests is evaluated using forecasting error measurements and statistical tests. These include the mean squared error (MSE), the Pesaran-Timmermann directional test (PT), and the Clark-West test of equal predictive accuracy [35, 36].
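A minimal sketch of these evaluation measures is given below. It computes the MSE, the directional hit ratio (the raw quantity on which a directional test such as PT is built; the PT statistic itself is not implemented here), and the t-statistic of the Clark-West adjusted loss differential between a benchmark and a candidate forecast. The function names and the convention that a zero forecast or zero return counts as a directional miss are assumptions.

```python
import math

def mse(actual, forecast):
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def hit_ratio(actual, forecast):
    """Share of periods in which forecast and realized return share a sign."""
    hits = sum(1 for a, f in zip(actual, forecast) if a * f > 0)
    return hits / len(actual)

def clark_west_statistic(actual, benchmark, candidate):
    """t-statistic of the Clark-West adjusted squared-error differential.

    Positive values favor the candidate over the (nested) benchmark.
    """
    adjusted = [(a - b) ** 2 - ((a - c) ** 2 - (b - c) ** 2)
                for a, b, c in zip(actual, benchmark, candidate)]
    n = len(adjusted)
    avg = sum(adjusted) / n
    var = sum((v - avg) ** 2 for v in adjusted) / (n - 1)
    return avg / math.sqrt(var / n)
```

With a random walk benchmark on prices, the benchmark forecast of the return is zero in every period, so `benchmark` can be passed as a list of zeros.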
Experiment results in Table 1 show some stylized facts. The market exhibits a moderate level of fluctuation, as suggested by the moderate volatility. The distribution of the market returns is fat tailed and leptokurtic, as suggested by the significant skewness and kurtosis. There is also a high level of market risk exposure due to extreme events, as reflected in the significant kurtosis. The market returns deviate from the normal distribution and exhibit nonlinear dynamics; this is confirmed by the rejection of the null hypotheses of both the Jarque-Bera test of normality and the BDS test of independence at the 95% confidence level. Therefore, both the denoised data component and the noise component are extracted from the original data series for further analysis. According to Table 1, the denoised data series exhibit more leptokurtic and fat tailed behavior. The rejection of both the Jarque-Bera and BDS tests suggests that the denoised series is characterized by nonlinear dynamics, so it is categorized as reflecting the underlying permanent factors. Meanwhile, the noise series is closer to the normal distribution than the original data based on the descriptive statistics, although the Jarque-Bera test still rejects normality. The null hypothesis of the BDS test is not rejected at a moderate level of statistical significance, so the noise series is categorized as reflecting the transient factors. However, the only moderate significance level of the BDS test statistic also suggests that some linear or nonlinear data patterns may remain. These analyses suggest that the model specifications for the denoised and noise data differ and that the original data are a mixture of different underlying DGPs.
The wavelet denoising ARMA model is applied to the testing data to investigate the effects of different parameters on model performance. Different combinations of parameter choices are pooled into the parameter pool, including three threshold selection rules (i.e., Stein's unbiased risk estimate (SURE), MiniMaxi, and Universal), two shrinkage rules (i.e., hard and soft), 6 decomposition levels (scales 1 to 6), and 7 wavelet families (Haar, Daubechies, Coiflets, Symlets, discrete Meyer, Biorthogonal, and reverse Biorthogonal). The model orders for the ARMA($p$, $q$) processes are determined by minimizing information criteria (IC) such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The decomposition level is chosen to be 5, the rolling window is set to 512, the lag order is set to 2, the threshold selection rule chosen is MiniMaxi, and the shrinkage algorithm chosen is hard. Several parameters need to be determined for the LSSVR model. The kernel chosen is the radial basis function (RBF). The grid search method is used to determine the penalty parameter $\gamma$ and the RBF kernel parameter $\sigma$; the selected values are 0.0518 and 2.8651.
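The size of the parameter pool follows directly from the counts above. The sketch below enumerates it as a full Cartesian product, which is an assumption: the paper does not state whether every combination is evaluated, nor which specific wavelet order is used within each family. The constant and function names are hypothetical.

```python
from itertools import product

THRESHOLD_RULES = ("SURE", "MiniMaxi", "Universal")
SHRINKAGE_RULES = ("hard", "soft")
DECOMPOSITION_LEVELS = tuple(range(1, 7))   # scales 1 to 6
WAVELET_FAMILIES = (
    "Haar", "Daubechies", "Coiflets", "Symlets",
    "discrete Meyer", "Biorthogonal", "reverse Biorthogonal",
)

def build_parameter_pool():
    """Enumerate every combination of denoising settings as a dict."""
    return [
        {"threshold": rule, "shrinkage": shrink, "level": level, "family": family}
        for rule, shrink, level, family in product(
            THRESHOLD_RULES, SHRINKAGE_RULES, DECOMPOSITION_LEVELS, WAVELET_FAMILIES
        )
    ]

pool = build_parameter_pool()   # 3 * 2 * 6 * 7 = 252 candidate settings
```

Each entry in `pool` would correspond to one denoised series and therefore one column of the forecast matrix that is later passed to the ensemble stage.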
Experiment results in Table 2 list the predictive accuracy, measured by MSE, for the original, denoised, and noise data separately in the WTI market.
Experiment results in Table 2 further confirm that the forecasting accuracy of the wavelet denoising ARMA model is sensitive to the wavelet parameters used to denoise the original data. Some wavelet parameters improve the forecasting performance to a level that beats the traditional benchmark models significantly. Meanwhile, the forecasting accuracy deteriorates when an improper denoising model is used. This result supports the HMH proposed in this paper, namely, that the price movement in the markets reflects the joint influence of heterogeneous agent behaviors, including their different beliefs, strategies, and risk appetites [37]. The insights into the distinct behaviors of different agents obtained by separating and modeling them with different wavelets may be redundant.
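As a small illustration of how a candidate forecast is compared with the Random Walk benchmark, the sketch below computes the MSE and a simple directional hit ratio. The price series and the candidate forecasts are hypothetical values chosen for the example, the function names are assumptions, and the hit ratio is only the raw share of correct directions, not the PT test statistic.

```python
def mse(actual, forecast):
    """Mean squared error between realized values and forecasts."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def random_walk_forecast(series):
    """One-step-ahead no-change forecast: predict y[t] with y[t-1]."""
    return series[:-1]

def hit_ratio(actual, forecast, previous):
    """Share of periods where the forecast direction matches the realized one."""
    hits = sum(
        1 for a, f, p in zip(actual, forecast, previous) if (a - p) * (f - p) > 0
    )
    return hits / len(actual)

# Hypothetical prices and model forecasts (not results from the paper).
prices = [50.0, 51.0, 50.5, 52.0, 53.0, 52.5]
actual = prices[1:]            # values being forecast
previous = prices[:-1]         # last observed value before each forecast
rw = random_walk_forecast(prices)
candidate = [50.8, 50.7, 51.5, 52.8, 52.9]

rw_mse = mse(actual, rw)
candidate_mse = mse(actual, candidate)
candidate_hits = hit_ratio(actual, candidate, previous)
```

A candidate model beats the benchmark in the sense used here when its MSE is lower than `rw_mse`; the no-change forecast itself never predicts a direction, which is why directional accuracy is reported separately.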
Since there is a lack of consensus on the appropriate selection of the denoising algorithm and its specifications, one approach common in the literature is to follow the trial and error method in the identification process, which is arbitrary and computationally expensive. To tackle this issue, this paper introduces the LSSVR based ensemble algorithm. Experiment results in both Tables 3 and 4 show the superior performance of the proposed multiple wavelet denoising based ensemble against the traditional benchmark RW and ARMA models, at a statistically significant level. The forecasting accuracy is higher in terms of lower MSE value, while the directional forecasting accuracy is better in terms of a higher ratio with a significant p value at the 95% confidence level. Meanwhile, the ensemble forecasts are more stable than the individual forecasts, as the partial information sets from the individual forecasters are integrated to produce optimal and stable forecasts with the proposed approach. The proposed algorithm helps produce the optimal estimate by combining the partial information sets in the constructed forecasting matrix.
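The combination step can be sketched with a minimal least squares support vector regression that maps a matrix of individual forecasts to the realized value. This is an illustrative sketch under stated assumptions, not the paper's implementation: it uses the standard LS-SVM linear system with an RBF kernel written as exp(-||x - z||^2 / (2 * sigma2)), omits the ICA dimension reduction, and the forecast matrix, the parameter values, and all function names are hypothetical.

```python
import math

def rbf(x, z, sigma2):
    """RBF kernel; the 2 * sigma2 scaling is one common convention."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * sigma2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def lssvr_fit(X, y, gamma, sigma2):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [
            rbf(X[i], X[j], sigma2) + (1.0 / gamma if i == j else 0.0)
            for j in range(n)
        ])
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]  # bias b, dual weights alpha

def lssvr_predict(X_train, b, alpha, sigma2, x):
    """Ensemble forecast for a new row of individual forecasts."""
    return b + sum(a * rbf(x, xi, sigma2) for a, xi in zip(alpha, X_train))

# Hypothetical forecast matrix: each row holds two individual forecasts
# for one period; y holds the realized prices (not data from the paper).
X = [[50.8, 51.2], [50.7, 50.2], [51.5, 52.3], [52.8, 53.4], [52.9, 52.2]]
y = [51.0, 50.5, 52.0, 53.0, 52.5]
gamma, sigma2 = 10.0, 2.0  # illustrative values, not the grid-searched ones
b, alpha = lssvr_fit(X, y, gamma, sigma2)
fitted = [lssvr_predict(X, b, alpha, sigma2, row) for row in X]
```

Because training reduces to a single linear system, the penalizing parameter and the kernel parameter are the only quantities left to tune, which is what makes a grid search over them practical.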
Theoretically, experiment results in this paper show that the HMH represents an important and more realistic alternative theory for the market data structure. Technically, experiment results confirm that HMH based modeling results in improved forecasting performance when the underlying data components are modeled with the noise removed in the multiscale domain. This also implies that the appropriate optimization of the weights assigned to different forecasts based on different wavelet parameters combines the partial information sets they capture and leads to more stable and improved forecasting results.

Conclusions
This paper proposes a novel wavelet denoising crude oil price forecasting algorithm based on both a wavelet based denoising algorithm and an ICA-LSSVR based nonlinear ensemble framework. Work in this paper provides further evidence in support of the HMH against the EMH and of the predictability of price movement in the crude oil market. The market structure is viewed and analyzed from the emerging HMH perspective, relaxing the assumptions of the traditional EMH. Under the framework of HMH, this paper proposes that the underlying heterogeneous market structure can be distinguished by different investment scales; that is, the crude oil price behaviors are affected by noises and main trends, which have different characteristics. Thus the separation of noises and data needs to be conducted in a multiscale manner to recover the useful data for further modeling by the ARMA model. The more accurate separation of data from noises leads to better behaved data and a higher level of model generalizability. This paper also argues that the performance of traditional approaches could be further improved by the appropriate removal of noises from data.
Meanwhile, this paper also finds that the behavior of the denoised data is sensitive to the wavelet parameters chosen and can differ completely across them, which corresponds to different model specifications and performance. Since there is no consensus in the literature on the criteria for choosing the appropriate parameters, this paper introduces the LSSVR based nonlinear ensemble algorithm and ICA into the modeling process to reduce the estimation biases and improve the robustness. The ICA effectively reduces the forecast matrix dimension and recovers the important components that contribute most of the data features. The LSSVR based ensemble algorithm achieves higher forecasting accuracy and robustness by converging faster to the global optimal solution in the problem domain. The proposed model shows competitive performance and improved robustness.

Table 1: Descriptive statistics and statistical tests for WTI market.

Table 2: In-sample forecasting accuracy comparison of wavelet models for WTI market. The test case with performance superior to that of the Random Walk model.

The LSSVR based ensemble algorithm is used to ensemble forecasts based on different wavelet parameters. Based on the individual forecast matrices constructed for the different wavelet families, ICA is used to reduce the matrix dimensions to produce the ensemble matrix, and the LSSVR is used to ensemble the individual matrix to produce the optimal result. The dimensionality of the forecasting matrices for both denoised data and noise data is reduced to 1. The nonlinearity function used is the cubic function. The parameters of the WDN-ICA-LSSVR algorithm are determined using the grid search algorithm: for the WTI market, the selected values are 0.3353 and 32.6345. Experiment results in Table 4 show that the proposed WDN-ICA-LSSVR outperforms the benchmark RW and ARMA models in terms of MSE and directional accuracy measured by PT directional test statistics.

Table 3: Out-of-sample forecasting accuracy comparison of different models.

Table 4: Out-of-sample directional predictive accuracy comparison of different models.