Forecasting Oil Price by Hierarchical Shrinkage in Dynamic Parameter Models

The aim of this paper is to forecast the monthly crude oil price with a hierarchical shrinkage approach, which uses not only LASSO for predictor selection but also a hierarchical Bayesian method to determine whether a constant coefficient (CC) or time-varying parameter (TVP) predictive regression should be employed in each out-of-sample forecasting step. This newly developed method combines model shrinkage with automatic switching between CC and TVP forecasting models and may therefore produce more accurate predictions of crude oil prices. The empirical results show that this hierarchical shrinkage model can outperform many commonly used forecasting benchmarks, such as AR, unobserved components stochastic volatility (UCSV), and multivariate regression models, in forecasting the crude oil price over various forecasting horizons.


Introduction
The crude oil price is one of the key indicators of the global macroeconomy and financial markets [1-6]. However, oil price prediction is a complex process, since various factors affect oil pricing [2] and the degree to which these factors influence the oil price varies over time [7-11]. Finding a proper oil price forecasting method, one that not only selects the important predictors but also reflects the dynamics of the predictors' impact, is therefore of interest for a wide range of applications [12-19].
A vast literature [2,4,5,11,13,18,20-25] indicates that, apart from previous oil prices, other factors such as basic oil supply, demand and oil stock effects, financial market forces, market sentiment and uncertainty, the macroeconomy, and geopolitical influences are also main influencing factors. Adding all these explanatory variables to a multivariate regression or autoregression (AR) class framework may lead to overfitting and misspecification problems and thereby constrain forecast accuracy [7,26,27]. Additionally, the time-varying effect of these factors should also be considered in oil price forecasting, but introducing time variation into regression models makes the overfitting problem worse [7,11,28].
In this study, we introduce a prevailing Bayesian approach that not only overcomes overparametrization and misspecification problems in oil price prediction but also addresses the time-varying properties of explanatory variables at both short and long oil price forecasting horizons. This study makes three main contributions to the literature on oil price forecasting.
First, we can estimate a large number of explanatory parameters with limited observations. Low-frequency datasets are usually easier to access and process than high-frequency datasets, and putting more informative explanatory variables into the model can help macroeconomists, politicians, and other market participants obtain more comprehensive information on the crude oil price. Further, we implement the least absolute shrinkage and selection operator (LASSO) shrinkage method to handle all the considered endogenous and exogenous explanatory factors and to select the most influential factors automatically. Although previous studies [6,29-33] show that LASSO-based approaches deliver better out-of-sample forecasts and surpass both AR class models and time-varying parameter models, it is unclear whether the LASSO operator also outperforms other commonly used benchmark models in oil price forecasting. Examining the effectiveness of the LASSO operator may help oil market decision-makers identify significant indicators efficiently and seize investment opportunities.
In addition, to explain the oil price better, we introduce a more comprehensive set of exogenous variables (see Table 1) and endogenous variables (e.g., observations from previous time steps) as regression predictors. On the one hand, bringing previous oil prices into the regression enables comparison with autoregression (AR) models and time-varying vector autoregression (TVP-VAR) models, which are commonly used in energy price prediction and have been shown to generate accurate forecasts [18,34-37]. On the other hand, we introduce a more comprehensive framework of exogenous factors, which avoids model misspecification. Most oil price forecasting studies [3,16,34,38-40] focus on only a few key oil price predictors and ignore the rest because of the limited capacity to process many variables, which leads to misspecification error. Using the LASSO operator, this study can shrink the coefficients on unimportant explanatory variables to zero and include all the exogenous variables in the model without having to worry about the capacity to process multiple indicators.
Second, it has been well documented that the predictive ability of the forecast variables for crude oil prices varies over time [7-11, 18, 41]. This motivates us to study the time-varying properties of the regression coefficients. The shrinkage model for time-varying parameters is described in [28] and is considered an effective forecasting method [32]. Accordingly, we apply LASSO to the time-varying regression model in the oil market and evaluate oil price forecasting performance. This Bayesian estimation method can produce forecasts at both long and short horizons from monthly information. With hierarchical shrinkage on oil price predictors, we can select the most relevant predictors and pick out time-varying parameters automatically. It is worth noting that few works investigate the dynamic properties of parameters while incorporating a large set of predictors in a single model. Our study provides empirical evidence on the most powerful contributors in forecasting oil prices and assesses their dynamic properties simultaneously.
Third, we use the mean of the log predictive likelihood (MLPL) to assess the robustness of the entire predictive distribution. This fills a gap left by the commonly used forecasting performance measures, the mean of the squared forecast errors (MSFE) and the mean of the absolute value of the forecast errors (MAFE) [19,23,27,28], which can only judge point forecasts. We also examine the forecasting performance when changing regressors, dependent variables, and rolling window estimation regimes as robustness checks. Our out-of-sample evidence indicates that LASSO hierarchical shrinkage models outperform other competing models in most cases; LASSO can select informative variables automatically and efficiently.
The remainder of this paper is organized as follows: Section 2 presents the econometric approach and comparison models. Section 3 introduces our data. Section 4 provides the out-of-sample empirical results and discussion. In Section 5, we present the robustness checks, and Section 6 concludes.

Empirical Models and Computation Processes
2.1. Empirical Models. Crude oil has both commodity and financial properties. As mentioned above, apart from previous oil prices, there are hundreds of potentially influential exogenous variables, and seasonal adjustment should also be taken into account. In this case, a suitable full model for forecasting the crude oil price is given by

$$\mathrm{Oil}_{t+h} = c + \sum_{k=1}^{K} \beta_k x_{kt} + \sum_{r=0}^{p-1} \alpha_r \mathrm{Oil}_{t-r} + \sum_{j=1}^{11} c_j \mathrm{dum}_j + \varepsilon_t, \tag{1}$$

where $\mathrm{Oil}_{t+h}$ is the future crude oil price we want to forecast $h$ periods ahead, $c$ is the intercept, and $\varepsilon_t \sim N(0, \sigma_t^2)$ is the error term. $\sum_{k=1}^{K} \beta_k x_{kt}$ is the exogenous variables part, $K$ is the number of explanatory variables, and $\beta_k$ is the $k$th regression parameter. $\sum_{r=0}^{p-1} \alpha_r \mathrm{Oil}_{t-r}$ is the sum of $p$ lags of the oil price, and $\alpha_r$ is the $r$th lag coefficient. $\sum_{j=1}^{11} c_j \mathrm{dum}_j$ is the sum of 11 monthly dummies, added for seasonal adjustment, and $c_j$ is the coefficient on the $j$th dummy variable. In total, the number of potential independent variables is $m = 1 + K + p + 11$.
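As a minimal sketch of how the regressor vector behind equation (1) might be assembled for one date, the Python function below stacks the intercept, the exogenous predictors, the oil price lags, and the 11 monthly dummies. The function name `build_design_row` and its arguments are hypothetical and are not taken from the authors' code; the omission of the January dummy follows the data description in this paper.

```python
def build_design_row(oil, x_row, month, t, p):
    """Stack the regressors of the full model for time index t.

    oil   : list of oil price observations (already transformed)
    x_row : list of the exogenous predictor values at time t
    month : calendar month of time t, 1..12
    p     : number of oil price lags (Oil_t, ..., Oil_{t-p+1})
    """
    if t < p - 1:
        raise ValueError("not enough history for p lags")
    intercept = [1.0]
    exog = list(x_row)
    lags = [oil[t - r] for r in range(p)]  # r = 0, ..., p-1
    # 11 monthly dummies; January (month 1) is the omitted base month.
    dummies = [1.0 if month == j else 0.0 for j in range(2, 13)]
    return intercept + exog + lags + dummies


# Illustrative call with made-up numbers: 12 predictors and 12 lags.
oil = [float(i) for i in range(20)]
row = build_design_row(oil, [0.5] * 12, month=3, t=15, p=12)
n_regressors = len(row)  # 1 + 12 + 12 + 11
```

With 12 predictors and 12 lags the row has 36 entries, which matches the coefficient count stated in the data section.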
Each part of the model (the intercept, $\sum_{j=1}^{11} c_j \mathrm{dum}_j$, $\sum_{r=0}^{p-1} \alpha_r \mathrm{Oil}_{t-r}$, or $\sum_{k=1}^{K} \beta_k x_{kt}$) can be excluded from the model. Briefly, the computation adds different terms to the model, judges their time-varying properties, and selectively applies LASSO shrinkage to the coefficients under both constant variance (homoskedastic $\sigma_t$) and stochastic variance (heteroskedastic $\sigma_t$) regimes. The model structure therefore specializes to the following three restricted forms:
(1) AR model. The model specifies that the future crude oil price depends linearly on its past values.
(2) Multivariate regression model. The model considers the effect of several key exogenous variables but excludes the influence of endogenous variables on the oil price.
(3) UCSV model, which assumes that the future oil price consists of components that have a direct interpretation but cannot be observed.
These three models are commonly used and have been shown to generate relatively accurate linear regression predictions [18,34-37,42-45]. Like the full model (equation (1)), the three restricted versions can also undergo hierarchical parameter shrinkage and decide which coefficients vary over time. In Section 4.2, we compare the prediction performance of the full model and the three restricted models under the same prior choices and basic model structures. The specific computation steps of the econometric method are as follows.
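$$\mathrm{Oil}_{t+h} = \beta^{*}_{t} z_t + \varepsilon_t, \qquad \beta^{*}_{t+1} = \beta^{*}_{t} + v_t. \tag{5}$$

In equation (5), $z_t$ is the vector of the $m$ regressors, and we assume $\varepsilon_t \sim N(0, \sigma_t^2)$ and $v_t \sim N(0, \Omega)$. $\sigma_t^2$ can be stochastic or constant volatility. The errors are assumed to be independent of each other and independent at all leads and lags. $\Omega$ is of dimension $m \times m$, which can be large relative to the number of observations. To keep the model parsimonious, we assume $\Omega$ is a diagonal matrix, $\Omega = \mathrm{diag}(\omega_1^2, \ldots, \omega_m^2)$. $\Omega$ governs the shrinkage of the time variation and thereby switches coefficients between constant and time-varying. If $\omega_i$ is zero, the $i$th ($i = 1, \ldots, m$) coefficient is constant over time, and larger values of $\omega_i$ mean more time variation. In order to elicit $\omega_i$, Belmonte et al. [28] separate the model into two parts, one constant (represented by $\beta z_t$) and the other time-varying (represented by $\beta_t z_t$). Equation (5) then becomes

$$\mathrm{Oil}_{t+h} = \beta z_t + \beta_t z_t + \varepsilon_t, \qquad \beta_{t+1} = \beta_t + v_t, \tag{6}$$

where $\beta = \beta^{*}_{0}$ and $\beta_t = \beta^{*}_{t} - \beta$. Then, let $\tilde{\beta}_{i,t} = \beta_{i,t}/\omega_i$ and transform equation (6) to

$$\mathrm{Oil}_{t+h} = \sum_{i=1}^{r} \beta_i z_{i,t} + \sum_{i=1}^{r} \omega_i \tilde{\beta}_{i,t} z_{i,t} + \varepsilon_t, \qquad \tilde{\beta}_{i,t+1} = \tilde{\beta}_{i,t} + \tilde{v}_{i,t}, \tag{7}$$

where $\tilde{v}_{i,t} \sim N(0, 1)$ for $i = 1, \ldots, r$.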
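The rescaling of the time-varying part by $\omega_i$ can be checked numerically. The sketch below, in plain Python with hypothetical names (`simulate_both_forms`, `beta_dev`, `beta_tilde`), drives a random-walk deviation with innovation standard deviation $\omega_i$ and a scaled state with unit-variance innovations using the same shocks, starting both from zero. It only illustrates the algebra of the transformation and is not the estimation algorithm of [28].

```python
import random


def simulate_both_forms(z, beta, omega, sigma, seed=0):
    """Simulate the centered and the omega-scaled TVP regressions with shared shocks."""
    rng = random.Random(seed)
    m = len(beta)
    beta_dev = [0.0] * m    # deviation beta_{i,t}; innovation sd is omega_i
    beta_tilde = [0.0] * m  # scaled state; innovation sd is 1
    y_centered, y_scaled = [], []
    for z_t in z:
        eps = rng.gauss(0.0, sigma)
        y_centered.append(
            sum((beta[i] + beta_dev[i]) * z_t[i] for i in range(m)) + eps
        )
        y_scaled.append(
            sum((beta[i] + omega[i] * beta_tilde[i]) * z_t[i] for i in range(m)) + eps
        )
        # Random-walk update: the same standard normal shock feeds both states.
        for i in range(m):
            shock = rng.gauss(0.0, 1.0)
            beta_tilde[i] += shock
            beta_dev[i] += omega[i] * shock
    return y_centered, y_scaled


# Two regressors: the first has omega = 0 (constant), the second varies over time.
z = [[1.0, 0.1 * t] for t in range(50)]
y6, y7 = simulate_both_forms(z, beta=[0.5, -0.2], omega=[0.0, 0.3], sigma=0.1)
max_gap = max(abs(a - b) for a, b in zip(y6, y7))
```

The two simulated paths agree up to floating-point rounding, and setting $\omega_i = 0$ for a regressor leaves its coefficient fixed at $\beta_i$, which is why shrinking $\omega_i$ to zero switches that coefficient from time-varying to constant.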
Through implementing LASSO in terms of equation (7), we can judge the time-varying properties and forecasting power of the predictors. Four possible cases arise:
(1) $\omega_i$ is shrunk to 0 but $\beta_i$ is not: the coefficient on the $i$th variable is constant over time.
(2) Both $\omega_i$ and $\beta_i$ are shrunk to 0: the $i$th variable is irrelevant for forecasting the oil price.
(3) $\omega_i$ is not shrunk to 0 but $\beta_i$ is: the coefficient on the $i$th variable has small time-varying characteristics (since $\beta_{i,0} = 0$, the coefficient fluctuates around zero).
(4) Neither $\omega_i$ nor $\beta_i$ is shrunk to 0: the $i$th variable is relevant for forecasting the oil price, and its time-varying coefficient is not restricted to fluctuate around zero.
Discrete Dynamics in Nature and Society
We can use Bayesian LASSO shrinkage priors to estimate these parameters. According to [28], LASSO shrinkage can be obtained by starting from normal hierarchical priors for $\beta$ and $\omega$.
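The four cases can be written as a small decision rule. The sketch below is illustrative only: the function name `classify_predictor` and the tolerance `tol` are assumptions, since posterior estimates under a Bayesian LASSO prior are shrunk toward zero rather than set exactly to zero, so some threshold for "shrunk to 0" has to be chosen by the analyst.

```python
def classify_predictor(omega_i, beta_i, tol=1e-3):
    """Map posterior estimates of omega_i and beta_i to one of the four cases.

    tol is a hypothetical threshold below which an estimate is treated as
    "shrunk to zero"; it is not a value taken from the paper.
    """
    omega_zero = abs(omega_i) < tol
    beta_zero = abs(beta_i) < tol
    if omega_zero and not beta_zero:
        return "constant coefficient"            # case (1)
    if omega_zero and beta_zero:
        return "irrelevant predictor"            # case (2)
    if beta_zero:
        return "time-varying around zero"        # case (3)
    return "relevant, time-varying coefficient"  # case (4)


labels = [
    classify_predictor(0.0, 0.8),
    classify_predictor(0.0, 0.0),
    classify_predictor(0.4, 0.0),
    classify_predictor(0.4, 0.8),
]
```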
Hierarchy shrinkage 1: for the constant coefficients, the prior for $\beta_i$ is normal, $\beta_i \mid \tau_i^2 \sim N(0, \tau_i^2)$, with exponential mixing density $\tau_i^2 \mid \lambda \sim \exp(\lambda^2/2)$. $\lambda$ is the shrinkage parameter for the constant coefficients, and we assume $\lambda^2 \sim \mathrm{Gamma}(a_1, a_2)$. The first hierarchy is thus conditional on $\lambda$: we estimate $\tau_i^2$ and then obtain $\beta_i$.
Hierarchy shrinkage 2: from equation (4), we can infer that the time variation of the parameters $\beta_t$ (for $t = 1, \ldots, T$) is governed by $\omega_i$, whose prior is normal, $\omega_i \mid \xi_i^2 \sim N(0, \xi_i^2)$, also with exponential mixing density $\xi_i^2 \mid \kappa \sim \exp(\kappa^2/2)$. The shrinkage parameter $\kappa$ lies at the bottom of the hierarchy, and we assume $\kappa^2 \sim \mathrm{Gamma}(b_1, b_2)$. The second hierarchy is conditional on $\kappa$: we derive in turn $\xi_i^2$ and $\omega_i$ and finally judge whether $\beta$ is time-varying or not.
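To illustrate why a normal prior with an exponential mixing density produces LASSO-type shrinkage, the sketch below draws from the first hierarchy for a fixed $\lambda$, reading $\exp(\lambda^2/2)$ as an exponential distribution with rate $\lambda^2/2$ (an assumption about the parametrization). Under that reading the marginal prior of $\beta_i$ is Laplace with variance $2/\lambda^2$. The function names are hypothetical, and the Gamma hyperprior on $\lambda^2$ is left out to keep the check simple.

```python
import math
import random


def draw_beta(lam, rng):
    """One draw of beta_i from the normal-exponential hierarchy, lambda fixed."""
    tau2 = rng.expovariate(lam ** 2 / 2.0)  # exponential with rate lambda^2 / 2
    return rng.gauss(0.0, math.sqrt(tau2))  # beta_i | tau_i^2 ~ N(0, tau_i^2)


def sample_prior(n, lam, seed=0):
    rng = random.Random(seed)
    return [draw_beta(lam, rng) for _ in range(n)]


lam = 2.0
draws = sample_prior(100_000, lam)
mean = sum(draws) / len(draws)
sample_var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
laplace_var = 2.0 / lam ** 2  # variance of a Laplace density (lam / 2) * exp(-lam * |b|)
```

The sample variance should be close to `laplace_var`; a larger $\lambda$ concentrates the prior more tightly around zero, which is the sense in which $\lambda$ (and likewise $\kappa$ for $\omega_i$) controls the degree of shrinkage.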
For the two hierarchical shrinkage processes described above, we set the prior hyperparameters $a_1 = a_2 = b_1 = b_2 = 0.001$, which implies proper but very noninformative priors. For the constant coefficient model, which removes the TVP part of the model, we set $b_1 = 100000$ so that $\omega_i$ shrinks very close to zero, and its prior variance is 0.1.
To complete these two hierarchical shrinkage computations, [28] provides Markov Chain Monte Carlo (MCMC) algorithm blocks and precise steps to draw the parameter posteriors. After using a nonparametric kernel smoothing algorithm on the parameter posteriors, we can obtain an approximation of the oil price predictive density.
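The paper does not state which kernel or bandwidth is used for the smoothing step. As a rough sketch under the assumption of a Gaussian kernel with a normal-reference bandwidth ($1.06\,\hat{\sigma}\,n^{-1/5}$), the log predictive density at a realized value could be approximated from posterior predictive draws as follows. The function name `kde_log_density` and the simulated draws are illustrative only.

```python
import math
import random


def kde_log_density(draws, x, bandwidth=None):
    """Log of a Gaussian kernel density estimate at x, built from predictive draws."""
    n = len(draws)
    if bandwidth is None:
        mean = sum(draws) / n
        sd = math.sqrt(sum((d - mean) ** 2 for d in draws) / (n - 1))
        bandwidth = 1.06 * sd * n ** (-0.2)  # normal-reference rule (assumed choice)
    kernel_sum = sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in draws)
    density = kernel_sum / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return math.log(density)


# Stand-in for MCMC predictive draws: standard normal samples.
rng = random.Random(1)
draws = [rng.gauss(0.0, 1.0) for _ in range(5000)]
log_lik_at_zero = kde_log_density(draws, 0.0)
```

For standard normal draws the result should be near $-\tfrac{1}{2}\log(2\pi) \approx -0.92$. If the realized value lies far outside the range of the draws, the estimated density can underflow to zero and the logarithm fails, so a practical implementation would need to guard against that case.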
As LASSO shrinkage can be applied to both constant and time-varying coefficients, the full model and the restricted models each give rise to several versions, including the following:
(1) LASSO on constant coefficients and time-varying parameters: both the constant and the time-varying parts use LASSO priors and undergo hierarchical shrinkage.
(2) LASSO only on constant coefficients: this model omits the LASSO prior on the time-varying part ($\sum_{i=1}^{r} \omega_i \tilde{\beta}_{i,t} z_{i,t}$ in equation (7)) and uses a relatively noninformative, nonhierarchical normal prior for $\omega_i$.
For the constant coefficient model, setting $b_1 = 100000$ implies an extremely tight prior on $\omega_i$, with the prior concentrated very close to 0.

Evaluation Criteria.
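The predictive densities or point forecasts from the previous steps are used to compare quantitatively the out-of-sample predictive performance of different models. Following the convention in the literature on prediction measurement, we use the point forecast loss functions MAFE and MSFE to rank model forecasts [17,23,29,31]. Further, since researchers and policymakers care more about the uncertainty of the whole forecast distribution than about a point forecast alone, we also adopt the mean of the log predictive likelihood (MLPL) to evaluate the entire predictive distribution. The three statistics are

$$\mathrm{MAFE} = \frac{1}{T - t_0 - h + 1} \sum_{t=t_0}^{T-h} \left| \widehat{\mathrm{Oil}}_{t+h} - \mathrm{Oil}^{0}_{t+h} \right|,$$

$$\mathrm{MSFE} = \frac{1}{T - t_0 - h + 1} \sum_{t=t_0}^{T-h} \left( \widehat{\mathrm{Oil}}_{t+h} - \mathrm{Oil}^{0}_{t+h} \right)^2,$$

$$\mathrm{MLPL} = \frac{1}{T - t_0 - h + 1} \sum_{t=t_0}^{T-h} \log p\left( \mathrm{Oil}_{t+h} = \mathrm{Oil}^{0}_{t+h} \mid \mathrm{Data}_t \right),$$

where $T$ is the end date, $t_0$ is the start date, $h$ is the prediction length, $\widehat{\mathrm{Oil}}_{t+h}$ is the predictive median of the oil price, $\mathrm{Oil}^{0}_{t+h}$ is the corresponding realized value, and $p(\cdot \mid \mathrm{Data}_t)$ is the predictive density given the information available at time $t$. Smaller MAFE and MSFE and larger MLPL indicate stronger forecasting ability.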
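The three criteria are simple averages over the out-of-sample forecasts. A minimal Python sketch, assuming the point forecasts, the realized values, and the log predictive likelihoods have already been collected into equal-length lists (the function and variable names are illustrative):

```python
def mafe(forecasts, actuals):
    """Mean absolute forecast error over the evaluation period."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


def msfe(forecasts, actuals):
    """Mean squared forecast error over the evaluation period."""
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)


def mlpl(log_pred_liks):
    """Mean of the log predictive density evaluated at each realized value."""
    return sum(log_pred_liks) / len(log_pred_liks)


# Toy numbers, not results from the paper.
forecasts = [1.0, 2.0, 3.0]
actuals = [1.0, 1.0, 5.0]
example_mafe = mafe(forecasts, actuals)
example_msfe = msfe(forecasts, actuals)
example_mlpl = mlpl([-1.0, -2.0])
```

MAFE and MSFE use only the point forecast, while MLPL rewards a model for placing high predictive density on the realized value, so two models with similar point forecasts can still differ markedly in MLPL.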

Data
This paper uses two prevailing proxies for the crude oil price: the monthly spot price of Brent crude oil as the dependent variable and West Texas Intermediate (WTI) oil futures as a robustness check. Both datasets span January 2004 to December 2018, yielding $t = 180$ observations; the out-of-sample evaluation period consists of the last 110 observations.
Building on previous studies [3,16,34,38-40], we select a relatively comprehensive framework of predictors to forecast the crude oil price and use available real-time data. The exogenous variable dataset consists of crude oil fundamentals (crude oil supply, demand, and stocks), capital market prices (gold price, exchange rate, and stock market price index), a substitute product price (natural gas price), a market sentiment index (volatility index), macroeconomic factors (industrial production and Kilian indexes), and political change (global policy uncertainty and Google trend). This variable set not only captures information on both the supply of and demand for crude oil but also includes activity in the financial market and the macroeconomy. Accordingly, these are widely used variables for crude oil price forecasting.
The ADF and PP tests in Table 2 indicate that no variable has a unit root after first-order logarithmic differencing, which means that all the series are stationary and can be used for further econometric modeling. The two dependent variables, WTI and Brent, are left-skewed, leptokurtic, and nonnormally distributed. Within 20 lags, the Q-statistics of both the WTI and the Brent spot price series show significant autocorrelation, which suggests that past oil prices influence the current oil price, so it is reasonable to include AR terms in the model.
To examine whether the current oil price is affected by the past oil prices, we further include the logged first difference of 12 lags of the Brent crude oil price index in the model. In addition, an intercept and 11 monthly dummies (omitting the January dummy) are designed to distinguish monthly or seasonal effects on the crude oil prices.
All the explanatory variables are standardized to have mean zero and variance one. The model can flexibly include an intercept, different numbers of lags, 11 monthly dummies, and the 12 predictors listed above. In addition, it can forecast oil prices a month ahead (short term) and a year ahead (long term).
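The two preprocessing steps described in this section, first-order logarithmic differencing and standardization, could be sketched as follows. The helper names are hypothetical, and the use of the population variance (dividing by $n$) in the standardization is an assumption, since the paper does not specify it.

```python
import math


def log_diff(series):
    """First-order logarithmic difference of a strictly positive series."""
    return [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]


def standardize(values):
    """Rescale a series to mean zero and variance one (population variance)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0.0:
        raise ValueError("cannot standardize a constant series")
    sd = math.sqrt(var)
    return [(v - mean) / sd for v in values]


# Toy price path, not data from the paper.
prices = [50.0, 52.0, 49.0, 55.0, 60.0]
growth = log_diff(prices)
z_scores = standardize(growth)
```

In a real-time forecasting exercise, the mean and standard deviation would normally be computed on the estimation window only, so that information from the evaluation period does not leak into the regressors.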
In summary, the full model has 36 coefficients to estimate with fewer than 15 years of data, which is a relatively short dataset. Omitting the 12 predictors and 11 dummies leads to AR or TVP-AR models. If the lags are excluded instead, the model becomes a TVP or multivariate regression model. If only the 11 dummies are left in the model, it takes the UCSV form. In total, for each rolling window sample size, we compute 20 different versions of the full model and 100 competing models to check the robustness of the models.

Time-Varying and Shrinkage Parameters Results.
This section focuses on the time-varying and shrinkage coefficients represented by $\omega_i^2$ and $\tau_i^2$. $\omega_i$ close to zero means that the $i$th ($i = 1, \ldots, m$) coefficient is constant over time; larger values of $\omega_i$ allow for more time variation. A smaller value of $\tau^2$ implies a higher degree of shrinkage, whereas a larger $\tau^2$ indicates that the prior is more dispersed and shrinkage is weaker. To better explain the time-varying and shrinkage properties, we report the results of the full model (LASSO shrinkage on both constant coefficients and time-varying parameters) for Brent oil as an example.
These results show moderate shrinkage in most coefficients, but the degree of shrinkage varies. Table 3 shows that in one-month-ahead ($h = 1$) forecasting, $\omega_i^2$ for crude oil consumption, gold price, and the industrial production index tends to shrink more than for the other variables, which indicates that the influence of these three variables on the crude oil price is relatively time-invariant. In contrast, in short-term forecasting, the impact of crude oil production on oil prices varies over time. The $\tau_i^2$ of gas price, the industrial production index, and the Kilian index shrink the most among all exogenous variables; this signifies that the substitute product for oil, the production level, and macroeconomic factors do not exert a significant effect on the crude oil price in the short term. Instead, the three representative market uncertainty variables (VIX, EPU, and Google trend) show a low level of shrinkage, so policy uncertainty, market sentiment, and topic popularity have a greater effect on oil prices in the short term.
In long-run ($h = 12$) forecasting, crude oil stocks, SP500, and the Kilian index show larger $\omega_i^2$ than the other variables, which means larger time variation in these coefficients. Moreover, the Kilian index has the largest $\tau^2$, which indicates that it is a powerful predictor for long-term oil price forecasting. In contrast, the trade-weighted US dollar index, gas price, and production level are relatively unimportant factors. Table 4 shows that oil prices from half a year earlier have a large impact on the current oil price in short-run forecasting, while the influences from the end and the beginning of the quarter are moderate. Table 5 shows that crude oil prices bear little relationship to the seasonal cycle, because all the monthly dummies shrink more than most of the other predictors and lags.

Forecasting Results Evaluation.
In the tables, all results are presented relative to the corresponding full model (LASSO on both constant coefficients and TVPs); a smaller MAFE or MSFE, or a larger MLPL, than the full model indicates that the restricted model forecasts better than the benchmark model. The upper panel of Table 6 indicates that in one-month-ahead forecasting, LASSO on constant coefficients outperforms the other restricted models under both stochastic and constant volatility, which meets the expectation for short-term forecasting that the majority of coefficients do not change over time. The results in Table 3 are consistent with Table 6 and support this view.
For the second set of forecast metrics (the annual forecasting horizon), coefficients tend to show more time variation, so the full model performs best.
It is worth noting that the TVP regression models and the constant coefficient models produce the worst forecasts in both cases according to MLPL. The results again confirm that the new Bayesian hierarchical LASSO outperforms its traditional counterparts and enhances prediction accuracy. Additionally, the poor performance of LASSO only on time-varying parameters indicates that the inclusion of time-variant parameters in the model is necessary for oil price forecasting.
To sum up, all results exhibit the advantages of Bayesian hierarchical shrinkage. Firstly, a LASSO prior allows the data to decide whether the coefficients are time-varying and by how much they vary, and it prevents the coefficients of TVP regression models from wandering too widely, which yields better forecast performance. Secondly, with regard to the misspecification problem, LASSO priors can automatically detect a lack of time variation in coefficients and shrink the coefficients of unnecessary variables to zero, which improves prediction accuracy and handles misspecification efficiently. Thirdly, hierarchical shrinkage in time-varying series lets researchers start with a very flexible model and a relatively short dataset; the model results allow researchers and practitioners to identify the most powerful predictors more efficiently and then make the right investment decisions.
To investigate whether forecast performance varies over time, we present Figure 1, which uses the model with LASSO priors on both constant coefficients and time-varying parameters (TVPs) at forecasting horizon $h = 1$ (similar patterns are found in the other computation results).
From panels (a) and (b) of Figure 1, it can be seen that the constant and stochastic volatility versions of the model forecast roughly as well as each other; however, many discrepancies occur around the shale oil revolution in 2014. MAFE, MSFE, and MLPL follow a similar pattern most of the time but diverge during periods of intense oil price volatility. The heteroskedastic version appears to build in too large an increase in volatility from the start of the shale oil revolution, and this shows up in MLPL because MLPL measures the predictive performance of the whole distribution.
It has little impact on the point forecast measures MAFE and MSFE, which differ little between the constant and stochastic volatility versions of the model.

Robustness to Different Models' Specification.
Firstly, we conduct the robustness check by changing the variable set; the out-of-sample performance of the AR, multivariate, and UCSV models is shown in the following tables.
Tables 7-9 indicate that, as in the full model, smaller MAFE and MSFE and larger MLPL are also observed for LASSO on constant coefficients and LASSO on both constant coefficients and TVPs in the AR, multivariate, and UCSV models. These results suggest that the hierarchical shrinkage method can outperform other competing models even when the model structure changes.

Robustness Check by Alternative Estimation Window.
In this section, we change the estimation window from the recursive rolling window to the rolling window; the results are shown in Table 10. LASSO on constant coefficients and TVPs and LASSO only on constant results are qualitatively similar in both rolling window and recursive rolling window.
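The difference between the two estimation schemes lies only in which observations enter each estimation sample. As a sketch, reading the "recursive rolling window" as an expanding sample that always starts at the first observation (an interpretation, not something the paper spells out), the two schemes could be generated as follows. The function names and the 0-based index convention are illustrative.

```python
def recursive_windows(n_obs, first_train_size, h=1):
    """Expanding estimation samples: always start at observation 0."""
    for end in range(first_train_size, n_obs - h + 1):
        yield range(0, end), end + h - 1  # (training indices, target index)


def rolling_windows(n_obs, window, h=1):
    """Fixed-length estimation samples that slide forward one step at a time."""
    for end in range(window, n_obs - h + 1):
        yield range(end - window, end), end + h - 1


# 180 monthly observations with the last 110 used for evaluation at h = 1.
recursive = list(recursive_windows(180, 70))
rolling = list(rolling_windows(180, 70))
n_forecasts = len(recursive)
```

With 180 observations and an initial estimation sample of 70, both schemes produce 110 one-step-ahead forecasts; the recursive scheme uses up to 179 observations for the last forecast, while the rolling scheme always uses the most recent 70.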
Further, we consider three in-sample window sizes suggested by [6,46,47] to check the robustness of the hierarchical shrinkage models. For out-of-sample evaluation periods of 40%, 50%, and 60%, the results show that models with LASSO shrinkage exhibit lower MAFE and MSFE and higher MLPL.
Table 11 reports the main out-of-sample forecasting results for another prevailing proxy of crude oil prices, WTI. The results are quite close to those for Brent oil, which provides further support for the superiority of the hierarchical shrinkage method when an alternative proxy of the crude oil price is forecast.
Note. The bold text indicates relatively larger values among all $\omega_i^2$ and $\tau^2$, while the underlined text indicates relatively smaller ones.

Conclusions
In this paper, we predict the crude oil price with the Bayesian hierarchical shrinkage method, using a relatively short dataset and a comprehensive framework of variables. This method avoids the overfitting and misspecification problems faced by linear regression prediction and improves oil price forecasting accuracy. It also takes the dynamic properties of the parameters into account, so practitioners and policymakers can more easily identify the most powerful indicators and adopt appropriate strategies in different periods. The point and distribution forecasting performance statistics suggest that the hierarchical shrinkage models exhibit significantly better out-of-sample forecasting performance than other competing models in both the homoskedastic and heteroskedastic versions. Our results are robust to a wide range of model settings, including various model structures, different out-of-sample sizes, alternative rolling estimation windows, and alternative crude oil proxies. Therefore, our study provides evidence on which indicators are informative and powerful for improving forecasting accuracy in the oil market.

Data Availability
The Brent and WTI crude oil price data are openly available on the website of EIA at https://www.eia.gov/dnav/pet/pet_pri_spt_s1_d.htm.
Note. The value in bold and underlined text indicates the model performing the best out of all models, while the bold and italic text indicates the model performing the worst. MSFE, MAFE, and MLPL refer to the mean squared forecast error, mean absolute forecast error, and mean log predictive likelihood, respectively.