Accurate time series forecasting is important because it helps organizations make timely decisions for better planning and management. Several classical econometric and computational approaches show promising results for ordinary time series forecasting tasks, but they are not satisfactory for crude oil price forecasting. Ensemble empirical mode decomposition (EEMD) addresses the nonlinearity and nonstationarity of time series but also creates some problems of its own (i.e., mode mixing and mode splitting). In this study, we propose a new hybrid method that combines median ensemble empirical mode decomposition with the group method of data handling (MEEMD-GMDH) to reduce mode splitting and forecast crude oil prices. MEEMD is obtained by replacing the mean operator with the median operator in the EEMD process. To test and validate the different models, two benchmark crude oil price series are used (i.e., Brent and West Texas Intermediate (WTI)). To assess the performance of the proposed model, several evaluation measures are used, including Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and the Diebold-Mariano (DM) test. All the forecasting accuracy measures confirm that the proposed model forecasts crude oil prices better than the other hybrid models.

The global market price fluctuates dramatically and increases in the long term. Commodity price fluctuations have a massive impact on the global economy, for example, by raising the cost of imports, stimulating inflation, slowing economic growth, and reducing the efficacy of macroeconomic policy. Thus, analyzing the characteristics of international commodity market variations in order to forecast prices and patterns is critical for the world economy. From the perspective of nations and governments, goods are vital for global growth and have a significant strategic effect on national economic stability. If commodity prices can be estimated more reliably, goods can be imported at low prices, which significantly relieves imported inflation pressures. As import prices drop, government subsidies to businesses may be lowered and fiscal policy stability increases. Moreover, the currency reserves of a nation could be used to accommodate the flexibility of exchange rates to improve the resilience of monetary policies regardless of the decline of foreign exchange spending.

From the perspective of producers, goods are raw materials for the aviation, shipping, and food processing sectors. They are also products of the oil mining industry and nonferrous metallurgy businesses. Fluctuations in product markets influence the costs and earnings of companies. Effectively anticipating fluctuations in the value of goods helps farmers to schedule production accurately, minimize costs, and achieve greater profitability. For exchange firms, sharp commodity price swings can also result in significant losses. The probability of market volatility is therefore underdetermined due to the lack of an analysis team, an operational team, and the required decision-making process. In fact, the expected price contrasts considerably with the real price when dealing with product prospects. By creating a projection model for consumer goods and a method for mitigating price fluctuations, trading firms are able to avoid risks and lower trade losses. In general, it is important and urgent to study commodity price prediction to provide rational support for government and business decision-makers.

Crude oil is one of the core major natural products, with demand and supply exceeding 80 million barrels per day, because it covers two-thirds of the world’s direct energy consumption [

In 2019, global oil consumption reached 1,0075 million barrels per day according to data from the International Energy Agency (IEA). Oil indeed plays the most important role in fulfilling global energy needs. Asian emerging-market countries have become the key contributors to the rising demand for crude oil, which their rapid economic development has raised dramatically. China’s oil consumption, for instance, rose from an average of 69,700 barrels per day in 2005 to 145,100 barrels per day in 2019. As a demand factor, rising crude oil prices would result in higher production costs for nonoil companies and shrinking profits [

Because crude oil prices are nonlinear and complex, their high volatility is difficult for humans to understand. The West Texas Intermediate (WTI) crude oil price peaked in July 2008 at USD 145.31 per barrel. By the end of 2008, however, it had fallen sharply to USD 30.28 per barrel because of the financial crisis, about 80 percent below the peak. Prices climbed to USD 113 per barrel in April 2011, when the economy boomed, but dropped again to USD 27 per barrel in February 2016, owing to certain political causes and variations in demand and supply [

In the past decades, future observation and prediction based on time series data have attracted great attention in many research fields. To predict the future behavior of a particular phenomenon, many techniques have been developed to address this issue, such as cointegration analysis, vector error correction model (VECM), vector autoregression (VAR), linear-regression (linR), random walk model, GARCH, and ARIMA models. Other than that, computational approaches such as empirical mode decomposition (EMD), artificial neural network (ANN), and ensemble empirical mode decomposition (EEMD) have also been used. Gülen [

Motivated by the capability of median ensemble empirical mode decomposition (MEEMD) in signal decomposition, in particular its ability to reduce mode splitting and mode mixing, we propose a new method for crude oil price prediction that combines MEEMD with the group method of data handling (MEEMD-GMDH). MEEMD-GMDH consists of three phases. First, MEEMD is used to decompose the raw daily crude oil price series into several relatively simple components. Second, GMDH, with autoregressive (lagged) terms as inputs, is used to forecast each component separately. Finally, the predicted results for all components are aggregated to give the final forecast.

This paper contains the following main contributions:

We propose a new MEEMD framework that uses the median operator rather than the mean operator when aggregating the ensemble of noisy intrinsic mode function (IMF) trials.

We forecast crude oil prices by integrating MEEMD and GMDH following the “decomposition and ensemble” framework. To the best of our knowledge, this combination is used for forecasting purposes for the first time.

Experimental outcomes show that the approach proposed is beneficial for forecasting crude oil prices.

The remainder of the study is structured as follows. In Section 2, we concisely explain the MEEMD, GMDH, ANN, and ARIMA models. Section 3 formulates the proposed MEEMD-GMDH model in detail. Results and discussion are presented in Section 4 to evaluate the proposed model, and Section 5 concludes the paper.

EMD is one of the most popular and widely used decomposition methods for nonlinear and nonstationary time series forecasting. EMD decomposes the data into IMFs along with a residual. Intermittent recurrence of signals in EMD is usually due to mode mixing and mode splitting. Mode mixing is defined as one IMF containing different scales, while mode splitting is defined as the spread of one scale over two or more IMFs. To remove the effect of mode mixing from EMD, EEMD adds a white noise term to the original signal before decomposition. The added white noise solves the problem of mode mixing; however, it inevitably creates new mode splitting for two main reasons: (i) Signals having a scale located in the overlapping region of the EMD equivalent filter would have a finite probability of mode splitting [

Create the ensemble:

For

Perform and decompose every member of

Use the median operator to obtain the final IMFs within MEEMD. The IMFs are computed as

In normal distribution

The added noise for MEEMD is as follows:

Flowchart of median ensemble empirical mode decomposition (MEEMD).
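The difference between the mean operator of EEMD and the median operator of MEEMD can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_decompose` is a hypothetical moving-average stand-in for the EMD sifting procedure (chosen so that every trial returns the same number of components), and the function names, ensemble size, and noise level are invented for the example.

```python
import numpy as np

def toy_decompose(signal, n_modes=3, window=5):
    """Hypothetical stand-in for EMD (NOT real sifting): peel off
    successive moving-average residuals, returning n_modes + 1 rows."""
    modes = []
    current = np.asarray(signal, dtype=float)
    for k in range(n_modes):
        width = window * 2 ** k
        kernel = np.ones(width) / width
        smooth = np.convolve(current, kernel, mode="same")
        modes.append(current - smooth)  # oscillation removed at this scale
        current = smooth
    modes.append(current)               # remaining trend (residual)
    return np.vstack(modes)

def ensemble_modes(signal, decompose, n_trials=50, noise_std=0.2,
                   seed=0, use_median=True):
    """Decompose noisy copies of the signal and aggregate the trials.

    use_median=False gives the EEMD-style ensemble mean;
    use_median=True gives the MEEMD-style ensemble median.
    """
    rng = np.random.default_rng(seed)
    signal = np.asarray(signal, dtype=float)
    scale = noise_std * signal.std()    # noise level relative to the signal
    trials = np.stack([
        decompose(signal + rng.normal(0.0, scale, signal.shape))
        for _ in range(n_trials)
    ])                                  # shape: (n_trials, n_components, n)
    if use_median:
        return np.median(trials, axis=0)
    return trials.mean(axis=0)

t = np.linspace(0.0, 20.0, 400)
x = np.sin(t) + 0.5 * np.sin(5.0 * t)
imfs_mean = ensemble_modes(x, toy_decompose, use_median=False)
imfs_median = ensemble_modes(x, toy_decompose, use_median=True)
```

Because the median is not a linear operator, the median-aggregated components are not guaranteed to sum exactly to the original signal, so in practice the reconstruction error is worth checking.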

Box and Jenkins (1976) introduced the Box-Jenkins technique (ARIMA models) in the area of time series analysis [^{th} previous values, the previous error, and current and previous values of other time series. Box-Jenkins models are highly versatile because they use both AR and MA concepts. The AR terms capture the dependence of the time series (Yt) on its own past values, while the MA terms capture its dependence on previous errors. An ARIMA model of order (p, d, q) for a univariate series takes the following form.

The ARIMA process of order (p, d, q) is defined as follows:

Here,
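For reference, the textbook form of an ARIMA(p, d, q) process can be written in backshift notation as below. The symbols are the standard ones and are an assumption here; they may differ from the notation of the original equation.

```latex
\phi(B)\,(1 - B)^{d}\, Y_t = c + \theta(B)\,\varepsilon_t,
\qquad
\phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^{i},
\qquad
\theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^{j}
```

In this form, $B$ is the backshift operator ($B Y_t = Y_{t-1}$), $d$ is the order of differencing, $\phi_i$ and $\theta_j$ are the AR and MA coefficients, $c$ is a constant, and $\varepsilon_t$ is a white noise error term.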

ANN is a very successful model for time series forecasting [

The architecture of the ANN model.

The idea of GMDH was first proposed by Ivakhnenko in 1966, as an inductive learning algorithm [

Here,

For partial description of GMDH, choose the new k variables

Many researchers regard the partial description as a transfer function, of which there are many types. In this study, we use the radial basis function (RBF). The radial basis function is the form of

Estimate the vector of coefficients of partial description using the least square method.

Here,

In this step, identify the new inputs (variables) for the second layer. The best variables are chosen according to a performance index, such as MSE or Root Mean Square Error (RMSE). The best neuron out of

Those values of

In criteria 2, ignore the weakest variables, and replace

In this step, check the stopping criterion. To determine whether the model can be improved further, the lowest MSE obtained in the current layer is compared with the lowest MSE obtained in the previous layer. If an improvement is achieved, steps 1 to 5 are repeated; otherwise, the process stops and the algorithm is complete. Finally, the GMDH model is shown in Figure

GMDH network flowchart.
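The layer-by-layer procedure described above (pairwise partial descriptions, least-squares coefficient estimation, selection by MSE, and stopping when the best MSE no longer improves) can be sketched as follows. This is an illustrative sketch only: it uses the classical quadratic Ivakhnenko polynomial as the partial description rather than the radial basis function used in this study, and the function names, the `keep` parameter, and the validation split are assumptions made for the example.

```python
import numpy as np
from itertools import combinations

def quad_features(xi, xj):
    """Design matrix of the quadratic two-input partial description."""
    return np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])

def fit_layer(X_tr, y_tr, X_va, y_va, keep):
    """Fit one neuron per input pair by least squares; keep the best few."""
    neurons = []
    for i, j in combinations(range(X_tr.shape[1]), 2):
        A_tr = quad_features(X_tr[:, i], X_tr[:, j])
        coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
        out_va = quad_features(X_va[:, i], X_va[:, j]) @ coef
        mse = float(np.mean((y_va - out_va) ** 2))  # selection criterion
        neurons.append((mse, A_tr @ coef, out_va))
    neurons.sort(key=lambda n: n[0])
    best = neurons[:keep]
    return (best[0][0],
            np.column_stack([n[1] for n in best]),
            np.column_stack([n[2] for n in best]))

def gmdh_fit(X_tr, y_tr, X_va, y_va, keep=4, max_layers=5):
    """Grow layers until the best validation MSE stops improving.

    Returns the best validation MSE of each accepted layer.
    """
    best_mse, history = np.inf, []
    for _ in range(max_layers):
        if X_tr.shape[1] < 2:           # a pair of inputs is required
            break
        layer_mse, X_tr_new, X_va_new = fit_layer(X_tr, y_tr, X_va, y_va, keep)
        if layer_mse >= best_mse:       # stopping criterion: no improvement
            break
        best_mse = layer_mse
        history.append(layer_mse)
        X_tr, X_va = X_tr_new, X_va_new  # neuron outputs feed the next layer
    return history

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] ** 2 + rng.normal(0.0, 0.05, 400)
layer_errors = gmdh_fit(X[:300], y[:300], X[300:], y[300:])
```

Selecting neurons on held-out data rather than on the training data is the usual way GMDH limits overfitting as layers are added.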

Due to nonlinear features, the classical time series models are not capable of predicting crude oil prices accurately [

Decomposition of data: MEEMD is applied to decompose the

Individual prediction: divide the data into training and testing sets. Construct the GMDH model using the training data and evaluate the estimated model on the testing data.

Ensemble prediction: the test-set predictions for all IMFs from Step 2 are summed to give the final prediction.

Flowchart of the proposed MEEMD-GMDH model.
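The three steps above follow a generic "decomposition and ensemble" pattern, which can be sketched as below. This is a simplified illustration under stated assumptions: the components are supplied directly instead of coming from MEEMD, a linear autoregression fitted by least squares stands in for GMDH, forecasts are one step ahead using observed lagged values, and all function names and the number of lags are hypothetical.

```python
import numpy as np

def lag_matrix(series, n_lags):
    """Rows are [y[t-n_lags], ..., y[t-1]]; targets are y[t]."""
    series = np.asarray(series, dtype=float)
    X = np.column_stack([series[k:len(series) - n_lags + k]
                         for k in range(n_lags)])
    return X, series[n_lags:]

def forecast_component(component, n_lags, train_frac=0.8):
    """Fit on the first 80% of samples; predict the remaining 20%."""
    X, y = lag_matrix(component, n_lags)
    split = int(train_frac * len(y))    # chronological split, no shuffling
    A_train = np.column_stack([np.ones(split), X[:split]])
    coef, *_ = np.linalg.lstsq(A_train, y[:split], rcond=None)
    A_test = np.column_stack([np.ones(len(y) - split), X[split:]])
    return A_test @ coef

def hybrid_forecast(components, n_lags=4):
    """Forecast every component separately, then sum the forecasts."""
    return np.sum([forecast_component(c, n_lags) for c in components], axis=0)

t = np.arange(500, dtype=float)
components = [np.sin(0.3 * t), 2.0 * np.cos(0.05 * t)]  # stand-ins for IMFs
prediction = hybrid_forecast(components, n_lags=2)
```

Forecasting each component with its own model is what lets the hybrid approach treat fast oscillations and slow trends separately before recombining them.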

In this study, two daily crude oil price series are used, namely, WTI and Brent. The WTI series consists of 8000 observations from Feb 10, 1989, to Oct 10, 2019; 80 percent (6400 observations) are used as the training set while 20 percent (1600 observations) are used as the testing set. The Brent series consists of 12000 observations from Dec 10, 1973, to Oct 10, 2019; 80 percent (9600 observations) are used as the training set whereas 20 percent (2400 observations) are used as the testing set to check model performance. The split between training and testing data is 80 percent and 20 percent, respectively [

Forecasting accuracy measures are the most important criteria when comparing competing models. In this study, the forecasting capacity of the models is measured using four criteria, as presented in Table

Criteria to assess the competing models.

Criterion | Formula |
---|---|

Root Mean Square Error | |

Mean Absolute Error | |

Mean Absolute Percentage Error | |

Diebold-Mariano (DM) |

The Diebold-Mariano (DM) test statistic compares the prediction error of the two models, where
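The four criteria can be computed as in the following sketch. RMSE, MAE, and MAPE use their standard definitions. The DM statistic shown here is one common variant and is an assumption, since the exact form used in the study is not recoverable from the text: it uses squared-error loss, one-step-ahead forecasts (so no autocorrelation correction), and a two-sided p-value from the normal approximation. The function names are hypothetical.

```python
import math
import numpy as np

def rmse(y, yhat):
    """Root Mean Square Error."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    """Mean Absolute Error."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs(y - yhat)))

def mape(y, yhat):
    """Mean Absolute Percentage Error, in percent (y must be nonzero)."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(100.0 * np.mean(np.abs((y - yhat) / y)))

def dm_test(y, yhat_1, yhat_2):
    """Diebold-Mariano test with squared-error loss, horizon 1 (assumed).

    A negative statistic means model 1 has the smaller average loss.
    Both forecasts must cover the same observations.
    """
    y = np.asarray(y, float)
    e1 = y - np.asarray(yhat_1, float)
    e2 = y - np.asarray(yhat_2, float)
    d = e1 ** 2 - e2 ** 2                       # loss differential
    stat = d.mean() / math.sqrt(d.var(ddof=1) / len(d))
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # two-sided, normal
    return float(stat), p_value

actual = np.array([100.0, 200.0, 300.0, 400.0])
forecast = np.array([110.0, 190.0, 310.0, 390.0])
scores = rmse(actual, forecast), mae(actual, forecast), mape(actual, forecast)
```

For multi-step forecasts, the variance in the DM denominator would normally include autocovariance terms of the loss differential, which this sketch omits.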

The decomposition results of WTI series using MEEMD.

The decomposition results of the Brent series using MEEMD.

In designing the GMDH model, one must determine the number of input variables. The selection of input corresponding to the number of variables plays an important role in many successful applications of the GMDH model. According to [

We proposed a hybrid model that includes two components: a decomposition by MEEMD and forecasting by the GMDH model. The compared single models include GMDH, ANN, and classical time series ARIMA. In this study, the comparison of the three competing single models concerning forecasting evaluation (accuracy) for testing datasets is presented in Table

Forecasting accuracy comparison of single models.

Models | Brent | WTI | ||||
---|---|---|---|---|---|---|

GMDH | ANN | ARIMA | GMDH | ANN | ARIMA | |

RMSE | 1.1372 | 2.2543 | 1.2397 | 2.0031 | ||

MAE | 1.0292 | 1.3469 | 0.9235 | 2.8719 | ||

MAPE | 1.2885 | 1.5164 | 1.5234 | 1.5035 |

From Table

The forecasting evaluation measures for single models are shown in Table

The forecasting evaluation performance (RMSE) for both crude oil markets is presented in Table

The MAPE values of the single GMDH model for the two markets (Brent and WTI), 0.2994% and 0.7584%, fall within the class of perfect forecasts.

The ANN model ranked second (second-lowest error values), and the ARIMA model ranked third in forecasting performance.

The most important comparison of hybrid models based on decomposition methods for both crude oil series is shown in Tables

Forecasting accuracy of hybrid models of WTI series.

Models | EEMD | MEEMD | ||||
---|---|---|---|---|---|---|

GMDH | ANN | ARIMA | GMDH | ANN | ARIMA | |

RMSE | 0.6580 | 0.6680 | 0.1043 | 0.4297 | ||

MAE | 0.6244 | 0.6061 | 0.1025 | 0.3915 | ||

MAPE | 0.4943 | 1.2306 | 0.3770 | 0.5850 |

Forecasting accuracy of hybrid models of Brent series.

Models | EEMD | MEEMD | ||||
---|---|---|---|---|---|---|

GMDH | ANN | ARIMA | GMDH | ANN | ARIMA | |

RMSE | 0.7142 | 0.7926 | 0.0643 | 0.6780 | ||

MAE | 0.6742 | 0.8045 | 0.1325 | 0.6820 | ||

MAPE | 0.4783 | 0.9762 | 0.5370 | 0.9227 |

Regarding the hybrid models (i.e., EEMD-GMDH, EEMD-ANN, EEMD-ARIMA, MEEMD-GMDH, MEEMD-ANN, and MEEMD-ARIMA), Tables

The forecasting evaluation criterion RMSE results are shown in Tables

The forecasting evaluation performance (MAE) for both crude oil markets is presented in Tables

The forecasting evaluation criterion MAPE on both markets for all selected models is presented in Tables

From the forecasting evaluation measures shown in Tables

Among all these models, the proposed method performed well based on RMSE, MAE, and MAPE presented in Tables

The hybrid models based on MEEMD have better performance than those based on EEMD.

The proposed model performs better for long-term dependence than other classical and machine-learning methods for predicting crude oil prices.

Both MEEMD and EEMD hybrid models performed better than single time series and machine-learning models.

The proposed MEEMD-GMDH framework clearly outperforms all other compared models in terms of MAPE, RMSE, and the DM test by exploiting the benefits of both MEEMD and GMDH. Together, these results indicate that MEEMD-GMDH can forecast crude oil prices effectively.

Next, to confirm the superiority of the proposed model, we apply the DM test. For WTI dataset, the DM test statistic and their

DM test results for WTI series.

Tested models | EEMD | MEEMD | ||
---|---|---|---|---|

ANN | ARIMA | ANN | ARIMA | |

GMDH | ||||

(0.000) | (0 | (0.000) | (0.000 | |

ANN | 8.342 | |||

(0.000) | (0.000) |

DM test results for Brent series.

Tested models | EEMD | MEEMD | ||
---|---|---|---|---|

ANN | ARIMA | ANN | ARIMA | |

GMDH | ||||

(0.000) | (0.000) | (0.000) | (0.000) | |

ANN | 7.456 | 4.321 | ||

(0.000) | (0.000) |

The DM test assumes that the two methods being compared produce the same number of predictions. The DM test confirmed the above conclusion. The MEEMD-based model statistically outperformed ANN and ARIMA models, and their

In this section, simulation is performed to check the robustness and generalizability of the proposed MEEMD-GMDH model [

In the first synthetic time series, a sine function represents the deterministic component, whereas normally distributed noise represents the stochastic component. That is,

In the second synthetic time series, a sine function represents the deterministic component, whereas an ARMA model with an error of 0.25 represents the stochastic component.
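The two kinds of synthetic series can be generated along the following lines. This is only a sketch: the sine period, the ARMA(1,1) order and coefficients, the reading of "an error of 0.25" as the noise standard deviation, and the function names are all assumptions, because the generating equations are not recoverable from the text.

```python
import numpy as np

def sine_plus_normal(n, period=50.0, noise_std=0.25, seed=0):
    """Deterministic sine wave plus independent Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    return np.sin(2.0 * np.pi * t / period) + rng.normal(0.0, noise_std, n)

def sine_plus_arma(n, period=50.0, phi=0.5, theta=0.3,
                   noise_std=0.25, seed=0):
    """Deterministic sine wave plus an ARMA(1,1) stochastic component."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, noise_std, n)     # innovations
    arma = np.zeros(n)
    arma[0] = eps[0]
    for t in range(1, n):
        # AR(1) term on the past value, MA(1) term on the past innovation
        arma[t] = phi * arma[t - 1] + eps[t] + theta * eps[t - 1]
    return np.sin(2.0 * np.pi * np.arange(n) / period) + arma

# One series of each kind for every sample size used in the simulation
sizes = [500, 1000, 2000, 5000, 10000]
series_1 = {n: sine_plus_normal(n) for n in sizes}
series_2 = {n: sine_plus_arma(n) for n in sizes}
```

Fixing the seed makes each generated series reproducible, which matters when several models are compared on the same data.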

Different time series are generated using equations (

Forecasting accuracy of all models for first synthetic data.

Number of observations | Measure | EEMD-GMDH | EEMD-ANN | EEMD-ARIMA | MEEMD-GMDH | MEEMD-ANN | MEEMD-ARIMA |
---|---|---|---|---|---|---|---|
500 | RMSE | 0.7543 | 1.0234 | 2.4567 | 0.3199 | 0.7089 | 1.4563 |
500 | MAE | 0.7432 | 1.0032 | 2.4321 | 0.2582 | 0.8694 | 1.0234 |
500 | MAPE | 0.9183 | 1.2313 | 2.0235 | 0.7253 | 1.0342 | 0.8456 |
1000 | RMSE | 1.2345 | 2.0934 | 3.5672 | 0.6462 | 0.7843 | 1.5321 |
1000 | MAE | 1.0326 | 2.0742 | 2.9531 | 0.3719 | 0.7653 | 1.3267 |
1000 | MAPE | 1.9482 | 2.6148 | 3.0219 | 0.6037 | 0.8723 | 1.7364 |
2000 | RMSE | 1.0234 | 2.3456 | 2.5342 | 0.3621 | 0.4932 | 1.3942 |
2000 | MAE | 0.9934 | 1.4532 | 2.4329 | 0.2899 | 0.3444 | 1.2345 |
2000 | MAPE | 0.8894 | 2.8743 | 1.6893 | 0.3523 | 0.6953 | 0.9654 |
5000 | RMSE | 0.8743 | 1.0043 | 2.0001 | 0.3945 | 0.5673 | 1.8721 |
5000 | MAE | 0.8362 | 0.8732 | 1.8863 | 0.3173 | 0.5542 | 1.8290 |
5000 | MAPE | 0.9795 | 1.3452 | 2.8743 | 0.8358 | 0.9731 | 1.2236 |
10000 | RMSE | 0.8456 | 1.4567 | 2.5693 | 0.2867 | 1.9567 | 3.1043 |
10000 | MAE | 0.8123 | 1.4221 | 2.7123 | 0.2224 | 1.9256 | 2.9876 |
10000 | MAPE | 0.7632 | 1.1345 | 1.4576 | 0.5211 | 2.1345 | 2.1376 |

Forecasting accuracy of all models for second synthetic data.

Number of observations | Measure | EEMD-GMDH | EEMD-ANN | EEMD-ARIMA | MEEMD-GMDH | MEEMD-ANN | MEEMD-ARIMA |
---|---|---|---|---|---|---|---|
500 | RMSE | 0.7943 | 1.6321 | 2.6523 | 0.1592 | 0.6321 | 1.7643 |
500 | MAE | 0.6793 | 1.6432 | 2.3987 | 0.1151 | 0.6124 | 1.7432 |
500 | MAPE | 1.0032 | 2.1953 | 2.4404 | 0.7757 | 0.8943 | 2.0032 |
1000 | RMSE | 1.9123 | 2.8621 | 2.8732 | 0.5319 | 1.9456 | 2.3452 |
1000 | MAE | 1.8943 | 2.5001 | 2.7632 | 0.4983 | 1.4387 | 1.0643 |
1000 | MAPE | 2.0232 | 2.7348 | 2.1198 | 0.6183 | 2.4532 | 1.8476 |
2000 | RMSE | 0.4323 | 1.7634 | 3.2123 | 0.1493 | 0.3456 | 2.2345 |
2000 | MAE | 0.2341 | 1.4532 | 2.2134 | 0.1245 | 0.3324 | 2.4321 |
2000 | MAPE | 0.4567 | 1.5567 | 2.6785 | 0.2346 | 0.6783 | 3.4321 |
5000 | RMSE | 1.5432 | 2.9783 | 4.1283 | 0.1734 | 2.3457 | 2.2198 |
5000 | MAE | 1.0003 | 2.4431 | 3.9991 | 0.1368 | 1.2963 | 1.8645 |
5000 | MAPE | 1.7634 | 3.1863 | 4.9921 | 0.9703 | 2.7234 | 2.0036 |
10000 | RMSE | 0.7431 | 2.5673 | 2.9528 | 0.1419 | 1.3176 | 3.2134 |
10000 | MAE | 0.7219 | 1.9090 | 2.1947 | 0.1126 | 1.0954 | 2.5674 |
10000 | MAPE | 0.8732 | 1.3421 | 2.4788 | 0.6234 | 0.8743 | 3.4532 |

From Tables

Time series forecasting is one of the most important quantitative modeling tasks and has attracted significant interest in the literature. Because of the role of crude oil in the world economy, oil prices are a crucial factor influencing the economic agenda and policies of governments and trading enterprises. Advance knowledge of their potential movements will also contribute to improved decision-making at all levels of government and management. Forecasting oil prices is very complicated because the financial time series is extremely unpredictable, nonlinear, and erratic.

Despite attempts to address the issue with new mathematical approaches, oil prices are still difficult to forecast because of their inherent complexity, apparently volatile behavior, and the many variables influencing fluctuations in crude oil demand. Computational approaches have therefore become more indispensable than ever for predicting prices and supporting investment decisions.

Achieving an accurate prediction of a time series is a very important but difficult task because of its nonlinearity and nonstationarity. In this paper, we proposed a hybrid model called MEEMD-GMDH for crude oil price forecasting. The MEEMD method uses the median operator instead of the mean operator when aggregating the ensemble of noisy IMF trials in the standard EEMD process. The advantage of MEEMD is that it reduces the mode splitting problem of IMFs. This is the first time that MEEMD-GMDH has been applied to predict crude oil prices. The experimental results show that the proposed methodology outperforms other decomposition hybrid models (EMD, EEMD, and CEEMD). This shows that MEEMD-GMDH is a superior and promising alternative to the autoregressive integrated moving average model, ANN, and other machine-learning approaches studied by other researchers. Beyond crude oil prices, the robustness of the MEEMD-GMDH methodology means it can be applied to more complex tasks. Both theoretical and empirical evidence in the literature indicates that a hybrid model has lower generalization error when it combines dissimilar models or models that disagree strongly. Moreover, the hybrid procedure can reduce the model instability that is usually present in statistical inference and time series prediction due to potentially unreliable or evolving data trends. A review of the crude oil forecasting literature reveals that there have been limited studies on AI and complex methods. The main objective of this approach is to help decision-makers reduce the risks associated with crude oil and improve the accuracy of crude oil price forecasts. Moreover, the research results of this study are crucial to national economic growth and sustainable development.

Future work could extend this study in two directions: (1) MEEMD-GMDH can be applied to predict other time series, such as gold prices, electricity, and wind speed; (2) to attain a more accurate decomposition of time series data, more advanced aggregation operators such as the weighted mean, quartiles, or the geometric mean can be applied in place of the median.

The data used to support the findings of this study are included within the supplementary information files.

The authors declare that there are no conflicts of interest regarding the publication of this paper.

The crude oil prices data used in this paper consist of WTI and Brent.