A New Approach for Forecasting Crude Oil Prices Using Median Ensemble Empirical Mode Decomposition and Group Method of Data Handling

The accuracy of time series forecasting is important because it helps organizations make timely decisions for better planning and management. Several classical econometric and computational approaches show promising results for ordinary time series forecasting tasks, but they are not satisfactory for crude oil price forecasting. Ensemble empirical mode decomposition (EEMD) addresses the nonlinearity and nonstationarity of time series, but it also introduces some problems (i.e., mode mixing and mode splitting). In this study, we propose a new hybrid method that combines median ensemble empirical mode decomposition and the group method of data handling (MEEMD-GMDH) to reduce the mode splitting problem and forecast crude oil prices. MEEMD is obtained by replacing the mean operator with the median operator during the EEMD process. Two benchmark crude oil price series (i.e., Brent and West Texas Intermediate (WTI)) are used to test and validate the different models. The performance of the proposed model is checked with several evaluation measures: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and the Diebold-Mariano (DM) test. All forecasting accuracy measures confirm that the proposed model forecasts crude oil prices better than the other hybrid models.


Introduction
Global commodity market prices fluctuate dramatically and tend to rise over the long term. Fluctuations in commodity prices have a massive impact on the global economy. For example, they raise the cost of imports, stimulate inflation, slow economic growth, and decrease the efficacy of macroeconomic policy.
Thus, analyzing the characteristics of international commodity market variations in order to forecast prices and patterns is critical for the world economy. From the perspective of nations and governments, commodities are vital for growth and have a significant strategic effect on national economic stability. If commodity prices can be estimated more reliably, goods can be imported at low prices, which significantly relieves imported inflation pressures. As import prices drop, government subsidies to businesses may be lowered and fiscal policy becomes more stable. Moreover, with lower foreign exchange spending, a nation's currency reserves could be used to accommodate exchange rate flexibility and improve the resilience of monetary policy.
From the producers' perspective, commodities are raw materials for the aviation, shipping, and food processing sectors. They are also products of the oil mining and nonferrous metallurgy industries. Fluctuations in commodity markets influence the costs and earnings of companies. Anticipating commodity price fluctuations effectively helps farmers schedule production accurately, minimize costs, and achieve greater profitability. For trading firms, sharp commodity price swings can also cause significant losses. The probability of market volatility is therefore underestimated when an analysis team, an operations team, and the required decision-making process are lacking. In fact, the expected price often differs considerably from the realized price when dealing with commodity futures. Trading firms can avoid risks and lower trading losses by building forecasting models for commodity prices and methods for mitigating price fluctuations. In general, studying commodity price prediction is important and urgent, because it provides rational support for government and business decision-makers.
Crude oil is one of the most important natural commodities. Its demand and supply exceed 80 million barrels per day, and it covers two-thirds of the world's direct energy consumption [1]. Oil plays an undeniably important role in the global economy, since about 66% of the world's energy use comes from unrefined petroleum and gasoline. Sharp oil price changes are likely to shake aggregate economic activity. In particular, since January 2004, world oil prices have risen rapidly and created striking fluctuations in the world economy. Consequently, oil price movements are of major interest to many analysts, research experts, and organizations. The price of crude oil is essentially dictated by demand and supply, but it is also strongly affected by numerous unpredictable past, present, and future events, such as climate change, stock levels, GDP growth, and political developments. These realities produce a highly variable and nonlinear market, and the mechanism that sustains its intricate dynamics is not understood.
In 2019, global oil consumption reached 100.75 million barrels per day according to data from the International Energy Agency (IEA). Oil plays the most important role in fulfilling global energy needs. Asian emerging-market countries have become key contributors to rising crude oil demand, because their fast economic development has dramatically raised their consumption of crude oil. China's oil consumption, for instance, has risen from an average of 69,700 barrels per day in 2005 to 145,100 barrels per day in 2019. As a demand factor, rising crude oil prices would result in higher production costs and lower profits for non-oil companies [2]. Crude oil is a critical commodity for the global economy, so many governments, investors, and scholars have invested considerable effort in building models to predict fluctuations in its price and other important properties. Crude oil prices are complex and susceptible to many factors, such as supply and demand, speculative activities, competition between suppliers, technological development, and ongoing wars [3,4].
The high volatility of crude oil prices is difficult to understand because of their nonlinear and complex nature. West Texas Intermediate (WTI) crude oil prices peaked in July 2008 at USD 145.31 per barrel. The financial crisis then drove the price down sharply to USD 30.28 per barrel by the end of 2008, a drop of about 80 percent from the peak. Prices climbed to USD 113 per barrel in April 2011, when the economy boomed. In February 2016, they dropped again to USD 27 per barrel, owing to political causes and variations in demand and supply [5]. The impact of crude oil price fluctuations on national economies is reflected in two aspects. First, soaring crude oil prices have seriously affected the economies of oil-importing countries. Second, declines in crude oil prices (such as the decline in 1998) have caused serious budget deficit problems for oil-exporting countries [6]. Crude oil price series are generally considered nonlinear and nonstationary, and many factors influence them; therefore, accurately predicting the price of oil is quite challenging. Oil prices display nonlinear, nonstationary, and multiscale features, so researchers have started to analyze oil price volatility with multiscale techniques such as wavelet analysis and empirical mode decomposition (EMD). These techniques have strong time and frequency resolution and make the regularity of the variations more apparent. The methods researchers have developed for analyzing and predicting oil prices can be roughly separated into single models and combined models. Single models include observational approaches, causal inference methods, time series methods, and mathematical methods. Combined models are built by integrating single models according to such principles.
Over the past decades, forecasting based on time series data has attracted great attention in many research fields. Many techniques have been developed to predict the future behavior of a particular phenomenon, such as cointegration analysis, the vector error correction model (VECM), vector autoregression (VAR), linear regression (linR), the random walk model, GARCH, and ARIMA models. Computational approaches such as empirical mode decomposition (EMD), artificial neural networks (ANN), and ensemble empirical mode decomposition (EEMD) have also been used. Gülen [7] used a cointegration methodology to predict the WTI crude oil price. Lanza et al. [8] used the error correction model (ECM) to predict crude oil prices. Another well-known methodology is the GARCH model; for example, [9] used GARCH properties to predict the Brent crude oil price. Mohammadi and Su [10] applied the ARIMA-GARCH model to weekly crude oil spot prices in eleven international markets to forecast the conditional mean and volatility. ANN and ARIMA models were used to predict the future price of WTI crude oil [11]. The authors compared the ANN and ARIMA models to identify the technique with the best results, using forecasting accuracy measures including Mean Absolute Error (MAE), Mean Square Error (MSE), and Mean Absolute Percentage Error (MAPE). They concluded that the ANN gave better prediction results than the ARIMA model. Mirmirani and Li [11] investigated US oil prices using vector autoregression (VAR) and ANN. They concluded that a backpropagation network trained with a genetic algorithm (BPN-GA) attains the best results. Ahmad [12] predicted Oman crude oil prices using the ARIMA model and showed that ARIMA (1, 1, 5) * (1, 1, 1) achieved the best results. Aamir et al. [13] used a hybrid model based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) to forecast crude oil prices in both the Brent and WTI markets.
A three-layer feedforward neural network (FNN) was used by [14] to forecast short-term crude oil prices. The authors of [15,16] forecast the crude oil price using a support vector machine (SVM) and compared the results with backpropagation neural network (BPNN) and ARIMA models. Their findings indicated that SVM outperformed the BPNN and ARIMA models. The authors of [17] aimed to improve the prediction of oil price volatility with a hybrid artificial neural network and generalized autoregressive conditional heteroscedasticity (ANN-GARCH) model combined with financial variables. They concluded that the hybrid model improves volatility prediction accuracy by more than 30%, as measured by the heteroscedasticity-adjusted mean square error (HMSE). Lin and Sun [2] used a CEEMDAN-MLGRU decomposition-based method to forecast the WTI crude oil price. Li et al. [18] used a new hybrid model, namely, EEMD-SBL-ADD, and concluded that it is promising for forecasting crude oil prices.
The authors of [19] also applied Box-Jenkins (ARIMA) and neural network (NN) models to forecast failures in repairable system analysis. They concluded that both models performed better for short-term forecasting, but the NN model gave more satisfactory performance than the ARIMA model. Moreover, the authors of [20] used machine-learning (decision tree) models to forecast crude oil prices. They concluded that the decision tree models achieved higher prediction accuracy than benchmark models such as multiple linR and ARIMA. Because of nonlinear features, classical time series models were not capable of predicting crude oil prices accurately [21,22].
Median ensemble empirical mode decomposition (MEEMD) can decompose a signal while reducing mode splitting and mode mixing problems. Motivated by this capability, we propose a new method for predicting crude oil prices, namely, the median ensemble empirical mode decomposition and group method of data handling (MEEMD-GMDH) model. MEEMD-GMDH has three phases. First, MEEMD decomposes the raw daily crude oil price series into several relatively simple components. Second, GMDH, with autoregressive (lagged) terms as inputs, forecasts each component separately. Finally, the predicted results of all components are aggregated to produce the final forecast.
This paper makes the following main contributions: (1) We propose a new MEEMD framework that uses the median operator rather than the mean operator when combining the ensemble of noisy intrinsic mode function (IMF) trials. (2) We forecast crude oil prices by integrating MEEMD and GMDH following the "decomposition and ensemble" framework. To the best of our knowledge, this is the first time this combination has been used for forecasting.
(3) Experimental results show that the proposed approach is beneficial for forecasting crude oil prices.
The remainder of the study is structured as follows. Section 2 concisely explains the MEEMD, GMDH, ANN, and ARIMA models. Section 3 formulates the proposed MEEMD-GMDH model in detail. Section 4 presents the results and discussion used to evaluate the proposed model, and finally, Section 5 concludes the paper.

MEEMD.
EMD is one of the most popular and widely used decomposition methods for nonlinear and nonstationary time series forecasting. EMD decomposes the data into IMFs along with a residual. In EMD, signal intermittency usually causes mode mixing and mode splitting. Mode mixing means that one IMF contains different scales, while mode splitting means that one scale is spread over two or more IMFs. EEMD removes the effect of mode mixing by adding a white noise term to the original signal before applying EMD. The added white noise solves the mode mixing problem, but it inevitably creates new mode splitting for two main reasons: (i) signals whose scale lies in the overlapping region of the EMD equivalent filters have a finite probability of mode splitting [23]; (ii) the added white noise cannot guarantee full uniformity across all scales, which may lead to unexpected signal intermittency and irregularity [24]. To reduce the mode splitting problem, [25] proposed MEEMD. This variation of EEMD uses the median operator instead of the mean operator when combining the ensemble of noisy IMF trials. The steps of MEEMD are as follows:
(i) Create the ensemble of noisy signals

$$x_n(t) = x(t) + \beta\, w_n(t), \qquad n = 1, 2, \ldots, N, \tag{1}$$

where $x(t)$ is the original series, $w_n(t)$ is the $n$-th realization of white noise, $\beta$ is the noise amplitude, and $N$ is the ensemble size.
(ii) Decompose each $x_n(t)$ into a maximum number $M_n$ of IMFs using standard EMD, which yields the IMFs $\{d_m^n(t)\}_{m=1}^{M_n}$ and one residue $r_n(t)$.
(iii) Use the median operator to obtain the final IMFs of MEEMD:

$$d_m(t) = \operatorname{median}\left\{d_m^1(t), d_m^2(t), \ldots, d_m^N(t)\right\}.$$

For a normal distribution $N(\mu, \sigma^2)$, the sample median $\tilde{m}$ of $N$ observations is asymptotically normal [2]; that is,

$$\tilde{m} \sim N\!\left(\mu, \frac{\pi \sigma^2}{2N}\right).$$

Consequently, the noise remaining in the MEEMD output decreases with the ensemble size approximately as

$$\sigma_N \approx \sqrt{\frac{\pi}{2N}}\,\beta,$$

where $\sigma_N$ is the final standard deviation of the residual noise, that is, the standard deviation of the difference between the input and the sum of the IMFs. The flowchart of MEEMD is presented in Figure 1.
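The three MEEMD steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation of [25]. The EMD is deliberately simplified: it uses linear rather than cubic-spline envelopes and a fixed number of sifting iterations, and it extracts a fixed number of IMFs per trial so that trials can be aligned index by index. The noise amplitude (`noise_std`, relative to the signal's standard deviation) and the ensemble size (`n_trials`) are hypothetical choices. Setting `operator` to `"mean"` gives EEMD-style aggregation, and `"median"` gives MEEMD-style aggregation.

```python
import numpy as np


def _envelope_mean(h):
    """Mean of upper and lower envelopes, or None if h has too few extrema."""
    n = len(h)
    t = np.arange(n)
    d = np.diff(h)
    rising = np.concatenate(([0.0], d))   # h[i] - h[i-1]
    falling = np.concatenate((d, [0.0]))  # h[i+1] - h[i]
    maxima = np.where((rising > 0) & (falling < 0))[0]
    minima = np.where((rising < 0) & (falling > 0))[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None
    # Simplifying assumption: linear envelopes instead of cubic splines,
    # with the end points held at the nearest extremum value.
    upper = np.interp(t, np.r_[0, maxima, n - 1],
                      np.r_[h[maxima[0]], h[maxima], h[maxima[-1]]])
    lower = np.interp(t, np.r_[0, minima, n - 1],
                      np.r_[h[minima[0]], h[minima], h[minima[-1]]])
    return (upper + lower) / 2.0


def emd(x, n_imfs=4, n_sift=10):
    """Simplified EMD (hypothetical helper): IMFs + residue sum exactly to x."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(n_imfs):
        h = residue.copy()
        for _ in range(n_sift):           # fixed sifting count (assumption)
            m = _envelope_mean(h)
            if m is None:
                break
            h = h - m
        imfs.append(h)
        residue = residue - h
    return np.array(imfs), residue


def ensemble_emd(x, n_trials=50, noise_std=0.2, n_imfs=4,
                 operator="median", seed=0):
    """EEMD (operator='mean') or MEEMD (operator='median') aggregation."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    beta = noise_std * np.std(x)          # noise amplitude relative to the signal
    trial_imfs, trial_res = [], []
    for _ in range(n_trials):
        noisy = x + beta * rng.standard_normal(x.size)   # step (i): x_n(t)
        imfs, res = emd(noisy, n_imfs=n_imfs)            # step (ii)
        trial_imfs.append(imfs)
        trial_res.append(res)
    aggregate = np.median if operator == "median" else np.mean
    # Step (iii): aggregate each IMF index across trials, pointwise in time.
    return (aggregate(np.array(trial_imfs), axis=0),
            aggregate(np.array(trial_res), axis=0))
```

For example, `ensemble_emd(x, operator="median")` returns an array of shape `(4, len(x))` with the median-aggregated IMFs, together with the aggregated residue. The median is taken separately for each IMF, so the aggregated IMFs need not sum exactly to the input. This is why the final noise level is measured through the difference between the input and the sum of the IMFs.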

ARIMA
The ARIMA(p, d, q) model applies an ARMA(p, q) model to the d-times differenced series:

$$Y_t = \phi_0 + \sum_{j=1}^{p}\phi_j Y_{t-j} + \epsilon_t - \sum_{j=1}^{q}\theta_j \epsilon_{t-j}.$$

Here, $Y_t$ is the value of the time series at time $t$, $Y_{t-j}$ are the lagged previous values, $\phi_0$ is a constant, $\phi_j$ are the coefficients of the lagged values, $\theta_j$ are the coefficients of the previous error terms, $\epsilon_t$ is an error term distributed as $N(0, \delta^2)$, and $\epsilon_{t-j}$ are the previous error terms. The ARIMA model requires stationary data, whereas most time series are nonstationary [27]. Stationarity can be achieved by differencing the series. The ACF and PACF plots can be used to select the appropriate orders of the AR and MA terms.
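The following sketch illustrates the differencing step and the AR part of the model above. It fits an ARIMA(p, 1, 0) model, that is, an AR(p) with intercept on the first differences, by ordinary least squares. It omits the MA terms and the maximum-likelihood estimation that full ARIMA software uses, so it is a simplification rather than the estimator used in this study. The function names are hypothetical.

```python
import numpy as np


def fit_ar_on_differences(y, p):
    """OLS fit of an AR(p) with intercept on the first difference of y,
    i.e. an ARIMA(p, 1, 0) without MA terms (simplifying assumption)."""
    dy = np.diff(np.asarray(y, dtype=float))          # d = 1 differencing
    # Row r holds [1, dy[p+r-1], ..., dy[r]] for the target dy[p+r].
    X = np.column_stack([np.ones(len(dy) - p)] +
                        [dy[p - j:len(dy) - j] for j in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, dy[p:], rcond=None)
    return coef                                        # [phi_0, phi_1, ..., phi_p]


def forecast_next(y, coef):
    """One-step-ahead forecast of the level Y_{T+1}."""
    y = np.asarray(y, dtype=float)
    p = len(coef) - 1
    lags = np.diff(y)[::-1][:p]                        # most recent difference first
    next_diff = coef[0] + coef[1:] @ lags
    return y[-1] + next_diff                           # undo the differencing
```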

Artificial Neural Network Model.
ANN is a very successful model for time series forecasting [28]. One characteristic of ANNs is universal approximation; that is, ANNs can approximate any continuous nonlinear function to any desired degree of accuracy [29,30]. The ANN most commonly used for time series forecasting is the single hidden layer feedforward neural network (SLFN). The relationship between the output $y_t$ and the inputs $(y_{t-1}, \ldots, y_{t-p})$ is

$$y_t = \phi_0 + \sum_{j=1}^{q}\phi_j\, g\!\left(\theta_{0j} + \sum_{i=1}^{p}\theta_{ij}\, y_{t-i}\right) + \epsilon_t,$$

where $\theta_{ij}$ ($i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, q$) and $\phi_j$ ($j = 0, 1, 2, \ldots, q$) are the weights, $\phi_0$ and $\theta_{0j}$ are the bias terms, $y_{t-i}$ ($i = 1, 2, \ldots, p$) are the input nodes, and $j = 1, 2, \ldots, q$ indexes the hidden nodes. The logistic function is used as the hidden layer activation function $g$. The first layer is the input layer, where the data are introduced to the network. The second layer is the hidden layer, and the last layer is the output layer, where the result for a given input is produced. The ANN architecture is shown in Figure 2.
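The SLFN equation above can be evaluated directly. The following sketch computes the forward pass for one time step with the logistic activation. The weights are taken as given; training them (for example, by backpropagation) is not shown. The function and argument names are illustrative assumptions.

```python
import numpy as np


def logistic(z):
    """Logistic activation g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def slfn_forecast(lags, theta, theta0, phi, phi0):
    """y_t = phi0 + sum_j phi_j * g(theta0_j + sum_i theta_ij * y_{t-i}).

    lags   : (p,)   inputs y_{t-1}, ..., y_{t-p}
    theta  : (p, q) input-to-hidden weights theta_ij
    theta0 : (q,)   hidden biases theta_0j
    phi    : (q,)   hidden-to-output weights phi_j
    phi0   : float  output bias phi_0
    """
    hidden = logistic(theta0 + np.asarray(lags) @ theta)   # q hidden activations
    return phi0 + hidden @ phi
```

With all input weights at zero, every hidden node outputs g(0) = 0.5. A single hidden node with weight 2 and output bias 1 therefore gives 1 + 2 × 0.5 = 2.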

Group Method of Data Handling (GMDH).
Ivakhnenko first proposed GMDH in 1966 as an inductive learning algorithm [31]. According to [32], the GMDH methodology builds high-order regression polynomials and can be used to solve modeling and classification problems. In time series forecasting, the GMDH algorithm identifies the relationships between variables based on their lagged values. After learning these relationships, the GMDH methodology automatically chooses the model structure. The authors of [33] showed that GMDH generalizes well and can fit the complexity of nonlinear systems.
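The Ivakhnenko polynomial is defined as follows:

$$y = \alpha_0 + \sum_{i=1}^{T}\alpha_i x_i + \sum_{i=1}^{T}\sum_{j=1}^{T}\alpha_{ij}\,x_i x_j + \sum_{i=1}^{T}\sum_{j=1}^{T}\sum_{k=1}^{T}\alpha_{ijk}\,x_i x_j x_k + \cdots$$

Here, $y$ is the response variable, $\alpha_0, \alpha_i, \alpha_{ij}, \ldots$ are the weights (coefficients), and $x_i, x_j, x_k$ are lagged values of the time series. The GMDH model consists of the following five steps.
Step 1. Select the input variables $x_1, x_2, \ldots, x_T$, where $T$ is the number of inputs, and divide the data into training and testing sets. Construct the GMDH model using the training data, and evaluate the estimated model using the testing data.
Step 2. For each pair of inputs $(x_i, x_j)$, construct a partial description of the form

$$z = \alpha_0 + \alpha_1 x_i + \alpha_2 x_j + \alpha_3 x_i x_j + \alpha_4 x_i^2 + \alpha_5 x_j^2.$$

Many researchers apply a transfer function to the partial description, and many types of transfer functions exist. In this study, we use the radial basis function (RBF), which has the form

$$f(z) = e^{-z^2}.$$

Step 3. Estimate the coefficient vector of each partial description using the least squares method:

$$A_i = \left(X_i^{\top} X_i\right)^{-1} X_i^{\top} Y,$$

where $A_i = (\alpha_0, \alpha_1, \ldots, \alpha_5)$ is the vector of polynomial coefficients, $Y = (y_1, y_2, \ldots, y_n)^{\top}$ is the vector of observed values, and $X_i$ is the design matrix whose rows contain the terms $(1, x_i, x_j, x_i x_j, x_i^2, x_j^2)$ of the partial description.
Step 4. Retain those values of $z$ whose MSE is less than a threshold. Then, under the second criterion, discard the weakest variables and replace $x_1, x_2, \ldots, x_k$ with the columns $z_1, z_2, \ldots, z_k$ that best estimate the response variable in the checking set.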
Step 5. Check the stopping criterion. To determine whether the model has improved further, compare the lowest MSE in the current layer with the lowest MSE in the previous layer. If an improvement is achieved, go back and repeat Steps 1 to 5; otherwise, stop the process, and the algorithm is complete. The GMDH model is shown in Figure 3.
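Steps 2 to 5 can be sketched as a small layer-by-layer search. This is a simplified illustration, not the exact GMDH configuration of this study. It fits the quadratic partial description by ordinary least squares and ranks candidates by checking-set MSE. It keeps a hypothetical number `keep` of the best outputs as inputs to the next layer and stops when the best MSE no longer improves. The RBF transfer function is omitted for brevity, and all function names are illustrative.

```python
import itertools
import numpy as np


def _design(xi, xj):
    # Quadratic partial description terms: 1, xi, xj, xi*xj, xi^2, xj^2
    return np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi ** 2, xj ** 2])


def gmdh_layer(X_tr, y_tr, X_ck, y_ck, keep):
    """Fit every pairwise partial description (Steps 2-3) and keep the
    `keep` best by checking-set MSE (Step 4)."""
    scored = []
    for i, j in itertools.combinations(range(X_tr.shape[1]), 2):
        D_tr = _design(X_tr[:, i], X_tr[:, j])
        A, *_ = np.linalg.lstsq(D_tr, y_tr, rcond=None)      # least squares
        z_tr = D_tr @ A
        z_ck = _design(X_ck[:, i], X_ck[:, j]) @ A
        scored.append((np.mean((y_ck - z_ck) ** 2), z_tr, z_ck))
    scored.sort(key=lambda s: s[0])                          # lowest MSE first
    best = scored[:keep]
    Z_tr = np.column_stack([s[1] for s in best])
    Z_ck = np.column_stack([s[2] for s in best])
    return best[0][0], Z_tr, Z_ck


def gmdh_fit(X_tr, y_tr, X_ck, y_ck, keep=4, max_layers=5):
    """Stack layers until the best checking MSE stops improving (Step 5)."""
    best_mse, best_pred = np.inf, None
    for _ in range(max_layers):
        if X_tr.shape[1] < 2:          # need at least one pair of inputs
            break
        mse, Z_tr, Z_ck = gmdh_layer(X_tr, y_tr, X_ck, y_ck, keep)
        if mse >= best_mse:            # no improvement over the previous layer
            break
        best_mse, best_pred = mse, Z_ck[:, 0]
        X_tr, X_ck = Z_tr, Z_ck        # survivors feed the next layer
    return best_mse, best_pred
```

In a forecasting setting, the columns of `X_tr` and `X_ck` would hold lagged values of a component, and `y_tr` and `y_ck` would hold the corresponding next values.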

The Proposed MEEMD-GMDH Model
Because of nonlinear features, classical time series models are not capable of predicting crude oil prices accurately [21]. Therefore, inspired by the advantages of MEEMD, this study uses a novel approach that integrates MEEMD and GMDH to forecast crude oil prices. The decomposition and ensemble framework of MEEMD-GMDH consists of the following steps and is shown in Figure 4.
Step 1. Decomposition of data: MEEMD decomposes the crude oil price series $\{X(t)\}$ into two parts: (i) the IMF components and (ii) one residue component $r_n(t)$.
Step 2. Individual prediction: divide each component into training and testing sets. Construct a GMDH model for each component using the training data, and evaluate the estimated model using the testing data.

Mathematical Problems in Engineering
Step 3. Ensemble prediction: add the test-set forecasts of all components (IMFs and residue) from Step 2 to obtain the final prediction results.
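The three steps of the decomposition and ensemble framework can be expressed as a generic pipeline. In the sketch below, `decompose` and `forecast_component` are placeholders. A trailing moving-average split stands in for MEEMD, and a least-squares AR forecaster stands in for GMDH, only so that the example runs on its own. Both stand-ins are hypothetical and are not the models used in the paper.

```python
import numpy as np


def trailing_mean_split(x, window=5):
    """Hypothetical stand-in for MEEMD: remainder + causal trailing-mean trend,
    which sum exactly to x."""
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    idx = np.arange(len(x))
    start = np.maximum(0, idx - window + 1)
    trend = (csum[idx + 1] - csum[start]) / (idx + 1 - start)
    return [x - trend, trend]


def ar_one_step(component, n_train, p=3):
    """Hypothetical stand-in for GMDH: AR(p) with intercept fitted by OLS on
    the training part, then one-step-ahead forecasts over the test part."""
    c = np.asarray(component, dtype=float)
    lags = np.column_stack([c[p - j:len(c) - j] for j in range(1, p + 1)])
    X = np.column_stack([np.ones(len(lags)), lags])
    y = c[p:]
    n_fit = n_train - p                      # rows whose target lies in training
    coef, *_ = np.linalg.lstsq(X[:n_fit], y[:n_fit], rcond=None)
    return X[n_fit:] @ coef                  # forecasts for c[n_train:]


def decompose_and_ensemble(series, decompose, forecast_component, n_train):
    components = decompose(series)                                    # Step 1
    forecasts = [forecast_component(c, n_train) for c in components]  # Step 2
    return np.sum(forecasts, axis=0)                                  # Step 3
```

Swapping in a real MEEMD decomposition and a GMDH forecaster only changes the two callables passed to `decompose_and_ensemble`. The summation in Step 3 stays the same.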

Evaluation Criteria.
Forecasting accuracy measures are the most important criteria for comparing competing models. In this study, the forecasting capacity of the models is measured using four criteria, as presented in Table 1, where $n$ is the number of data points, $Y_t$ represents the observed values, and $\hat{Y}_t$ represents the predicted values.
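Assuming the standard definitions of the three accuracy measures listed in Table 1, the following sketch computes them with NumPy. MAPE is returned as a fraction; multiply by 100 for a percentage. The function names are illustrative.

```python
import numpy as np


def rmse(y, y_hat):
    """Root mean square error."""
    e = np.asarray(y, float) - np.asarray(y_hat, float)
    return float(np.sqrt(np.mean(e ** 2)))


def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y, float) - np.asarray(y_hat, float))))


def mape(y, y_hat):
    """Mean absolute percentage error, as a fraction (assumes no zero in y)."""
    y = np.asarray(y, float)
    return float(np.mean(np.abs((y - np.asarray(y_hat, float)) / y)))
```

For observations `[1, 2, 4]` and predictions `[1, 3, 2]`, the errors are 0, -1, and 2. This gives MAE = 1, RMSE = sqrt(5/3) ≈ 1.291, and MAPE = (0 + 0.5 + 0.5)/3 ≈ 0.333.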
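The Diebold-Mariano (DM) test statistic compares the prediction errors of two models:

$$DM = \frac{\bar{d}}{\sqrt{\hat{V}(\bar{d})}}, \qquad \bar{d} = \frac{1}{n}\sum_{t=1}^{n} d_t, \qquad d_t = g(e_{1t}) - g(e_{2t}),$$

where $e_{1t}$ and $e_{2t}$ are the forecast errors of the two models, $g(\cdot)$ is a loss function (for example, the squared error), and $\hat{V}(\bar{d})$ is a consistent estimate of the variance of $\bar{d}$. Under the null hypothesis of equal forecast accuracy, DM is asymptotically standard normal.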
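The following is a minimal sketch of the DM test. It assumes the loss function g(e) = |e|^power (squared error by default). The long-run variance of d_t is estimated from its sample autocovariances up to lag h − 1, a common choice for h-step forecasts. The p-value uses the asymptotic standard normal distribution; small-sample corrections such as the Harvey, Leybourne, and Newbold adjustment are not applied. The function name is illustrative.

```python
import math
import numpy as np


def dm_test(e1, e2, h=1, power=2):
    """Diebold-Mariano test with loss g(e) = |e|**power.

    Returns (DM statistic, two-sided p-value from the standard normal).
    A positive statistic means model 1 has the larger average loss.
    """
    d = np.abs(np.asarray(e1, float)) ** power - np.abs(np.asarray(e2, float)) ** power
    n = d.size
    d_bar = d.mean()
    # Long-run variance of d from autocovariances at lags 0..h-1.
    dc = d - d_bar
    gamma = [np.dot(dc[k:], dc[:n - k]) / n for k in range(h)]
    long_run_var = gamma[0] + 2.0 * sum(gamma[1:])
    if long_run_var <= 0:
        raise ValueError("non-positive long-run variance of the loss differential")
    stat = d_bar / math.sqrt(long_run_var / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))   # 2 * (1 - Phi(|DM|))
    return stat, p_value
```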

Fitting Proposed Model to the Data.
In designing the GMDH model, one must determine the number of input variables.
The choice of the number of input variables plays an important role in many successful applications of the GMDH model. According to [35], no theory can guide the selection of the number of inputs. To build the MEEMD-GMDH model, we choose the best ARIMA order (p, d, q) for each k-th IMF based on AIC and BIC. These orders determine the input variables of the proposed model.
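The input-selection step can be sketched as follows. As a simplifying assumption, the sketch selects only a pure AR order p, not the full ARIMA (p, d, q) order used in the study. It minimizes Gaussian AIC and BIC from least-squares fits and then builds the lagged input matrix that a GMDH model would receive. `lag_matrix` and `select_ar_order` are hypothetical helper names.

```python
import numpy as np


def lag_matrix(x, p):
    """Rows [x[t-1], ..., x[t-p]] paired with target x[t], for t = p..n-1."""
    x = np.asarray(x, float)
    X = np.column_stack([x[p - j:len(x) - j] for j in range(1, p + 1)])
    return X, x[p:]


def select_ar_order(x, max_p=8):
    """Return the AR orders minimising AIC and BIC (Gaussian likelihood, OLS).

    All candidate orders are fitted on the same effective sample
    (the last n - max_p observations) so that the criteria are comparable.
    """
    x = np.asarray(x, float)
    n_eff = len(x) - max_p
    y = x[max_p:]
    scores = {}
    for p in range(1, max_p + 1):
        X, _ = lag_matrix(x, p)
        X = np.column_stack([np.ones(n_eff), X[-n_eff:]])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        sigma2 = np.mean((y - X @ coef) ** 2)
        k = p + 1                              # AR coefficients + intercept
        aic = n_eff * np.log(sigma2) + 2 * k
        bic = n_eff * np.log(sigma2) + k * np.log(n_eff)
        scores[p] = (aic, bic)
    p_aic = min(scores, key=lambda q: scores[q][0])
    p_bic = min(scores, key=lambda q: scores[q][1])
    return p_aic, p_bic
```

For one IMF, `p_aic, p_bic = select_ar_order(imf)` followed by `X, y = lag_matrix(imf, p_bic)` yields the lagged inputs and targets for that component's GMDH model.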

Predictive Performance of Single Models.
We propose a hybrid model with two components: decomposition by MEEMD and forecasting by the GMDH model. The single models used for comparison are GMDH, ANN, and the classical time series model ARIMA. Table 2 presents the forecasting accuracy of the three competing single models on the testing datasets.
Table 2 shows that, among all these models, the GMDH model attains the smallest values (best performance) on all metrics (RMSE, MAE, and MAPE). The single ANN model performed better than the classical ARIMA model. Table 2 also indicates that the ARIMA model performed worst, likely because classical econometric and time series methods do not perform well on nonlinear time series. Several conclusions can be drawn from the forecasting evaluation measures for the single models in Table 2: (1) The RMSE values show that the single GMDH model obtained the lowest value and outperformed the other single models (ANN and ARIMA) in both crude oil markets. (2) The ANN model attained the second-lowest values, and the ARIMA model ranked third in forecasting performance.
Tables 3 and 4 present the RMSE values of the hybrid models for both markets. The RMSE values show that the proposed MEEMD-GMDH model significantly outperformed the other models in both the Brent and WTI crude oil markets. The MEEMD-based models (MEEMD-ANN and MEEMD-ARIMA) also attained lower values (better performance) than the corresponding EEMD-based models (EEMD-ANN and EEMD-ARIMA).
Tables 3 and 4 also present the MAE values of all selected models for both markets. The MAE values show that the proposed model obtained the lowest value and outperformed the other models in both markets. The MEEMD-based models also perform better than the corresponding EEMD-based models.
Tables 3 and 4 present the MAPE values of all selected models for both markets. The MAPE values show that the proposed model significantly outperformed the other models in both markets. The MEEMD-ANN and MEEMD-ARIMA models also attain low values and perform better than the other benchmark models, ranking second and third, respectively, in terms of MAPE. The MAPE values of the proposed model are 0.0087 and 0.0051 for the Brent and WTI markets, respectively, which fall in the class of highly accurate forecasts.
Several conclusions can be drawn from the forecasting evaluation measures in Tables 3 and 4: (1) Among all these models, the proposed method performed best based on RMSE, MAE, and MAPE. (2) The hybrid models based on MEEMD perform better than those based on EEMD. (3) The proposed model captures long-term dependence in crude oil prices better than the other classical and machine-learning methods. (4) Both MEEMD and EEMD hybrid models performed better than the single time series and machine-learning models. (5) By exploiting the benefits of both MEEMD and GMDH, the MEEMD-GMDH framework clearly outperforms all other comparable models in terms of MAPE, RMSE, and the DM test. Overall, these results indicate that MEEMD-GMDH can effectively forecast crude oil prices.
Next, we apply the DM test to confirm the superiority of the proposed model. Table 5 shows the DM test statistics and their p-values for the WTI dataset, and Table 6 presents the corresponding results for the Brent series.
The DM test assumes that the two methods produce the same number of predictions. The DM test confirmed the above conclusions.
The MEEMD-GMDH model statistically outperformed the ANN and ARIMA models, with p-values less than 0.01 in both markets. The p-values of the comparisons with the EEMD hybrid models are also less than 0.01. Overall, the proposed model performed better than all other models in this study.

Monte Carlo Simulations.
This section uses simulations to check the robustness and generalizability of the proposed MEEMD-GMDH model [36]. Crude oil price data combine stochastic and deterministic components. The MEEMD and EEMD procedures divide the original time series into IMFs of decreasing randomness: the first IMF is more stochastic than the second, the second is more stochastic than the third, and so on. The last IMF is completely deterministic. Synthetic time series composed of additive noise and a sine function are generated under two different scenarios [37][38][39]: (1) The first synthetic time series, given by equation (15), consists of a sine function as the deterministic component and normally distributed noise as the stochastic component. (2) The second synthetic time series, given by equation (16), consists of a sine function as the deterministic component and an ARMA process with an error of 0.25 as the stochastic component.
(3) Series of different lengths (500, 1000, 2000, 5000, and 10000 observations) are generated using equations (15) and (16), and all of them are decomposed using MEEMD and EEMD. Each series is split into 80 percent training data and 20 percent testing data. Tables 7 and 8 present the forecast accuracy measures RMSE, MAE, and MAPE on the testing datasets for scenarios 1 and 2, respectively. The models evaluated are EEMD-GMDH, EEMD-ANN, EEMD-ARIMA, MEEMD-ARIMA, MEEMD-ANN, and the proposed MEEMD-GMDH model.
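The exact forms of equations (15) and (16) are not given here, so the generators below are hypothetical instances of the two scenarios. The sine period, the ARMA(1, 1) coefficients, the Gaussian noise level in scenario 1, and the reading of "an error of 0.25" as an innovation standard deviation are all assumptions. The chronological 80/20 split follows the proportions stated above.

```python
import numpy as np


def scenario_1(n, period=50.0, noise_sd=0.25, seed=0):
    """Hypothetical instance: sine (deterministic) + Gaussian white noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    return np.sin(2 * np.pi * t / period) + noise_sd * rng.standard_normal(n)


def scenario_2(n, period=50.0, ar=0.5, ma=0.3, noise_sd=0.25, seed=0):
    """Hypothetical instance: sine + ARMA(1, 1) noise (assumed innovation sd 0.25)."""
    rng = np.random.default_rng(seed)
    e = noise_sd * rng.standard_normal(n)
    u = np.zeros(n)
    for k in range(1, n):
        u[k] = ar * u[k - 1] + e[k] + ma * e[k - 1]
    t = np.arange(n)
    return np.sin(2 * np.pi * t / period) + u


def chronological_split(x, train_frac=0.8):
    """First 80% for training, last 20% for testing (no shuffling)."""
    cut = int(round(train_frac * len(x)))
    return x[:cut], x[cut:]
```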
Tables 7 and 8 show that MEEMD improved the performance of the ARIMA, ANN, and GMDH models compared with EEMD. Thus, MEEMD is recommended for data decomposition when forecasting crude oil prices. The MEEMD-GMDH model outperforms all other models for every number of observations (500, 1000, 2000, 5000, and 10000). The MAPE values of the MEEMD-GMDH model are less than 1 for all sets of observations, which indicates highly accurate forecasts [37,38]. The experimental findings of both scenarios demonstrate that all ensemble methodologies were effective, but MEEMD was more effective. Moreover, the MAE, RMSE, and MAPE values indicate that the MEEMD-GMDH model is the most efficient method for forecasting daily crude oil prices.

Conclusion
Time series forecasting is one of the most important quantitative modeling tasks and has attracted significant interest in the literature. Crude oil plays an important role in the world economy, so oil prices are a crucial factor influencing the economic agendas and policies of governments and trading enterprises. Anticipating their potential movements will also improve decision-making at all levels of government and management. Forecasting oil prices is very complicated because financial time series are extremely unpredictable, nonlinear, and erratic.
New mathematical approaches have been proposed, but oil prices remain difficult to forecast. The reasons are their inherent complexity, their apparently volatile nature, and the many variables that influence fluctuations in crude oil demand. For corporate strategy, computational prediction methods that support investment decisions have become more indispensable than ever. Achieving an accurate prediction of a time series is a very important but difficult task because of its nonlinearity and nonstationarity. In this paper, we proposed a hybrid model called MEEMD-GMDH for crude oil price forecasting. In the standard EEMD process, MEEMD uses a median operator instead of the mean operator when combining the ensemble of noisy IMF trials. The advantage of MEEMD is that it reduces the mode splitting problem of IMFs.
This is the first time that MEEMD-GMDH has been applied to predict crude oil prices. The experimental results show that the proposed methodology outperforms other decomposition-based hybrid models (EMD, EEMD, and CEEMD). This shows that MEEMD-GMDH is a superior and promising alternative to the autoregressive integrated moving average model, ANN, and other machine-learning approaches studied by other researchers. Given its robustness in routine testing, the MEEMD-GMDH methodology could also be applied to more complex tasks beyond crude oil prices. Both theoretical and empirical evidence in the literature indicate that hybrid models are less error-prone when they combine dissimilar or strongly differing models. Moreover, the hybrid procedure can reduce the model instability that potentially unreliable or evolving data trends usually cause in statistical inference and time series forecasting. A review of the crude oil forecasting literature reveals that few studies have used AI and complex methods. The main objective of this approach is to help decision-makers reduce crude oil risks and improve the accuracy of crude oil price forecasts. Moreover, the results of this study are important for national economic growth and sustainable development.
Future work could be extended in two directions: (1) MEEMD-GMDH can be applied to predict other time series, such as gold prices, electricity, and wind speed; (2) other aggregation operators, such as the weighted mean, quartiles, or geometric mean, can be applied to attain a more accurate and specialized decomposition of time series data.

Data Availability
The data used to support the findings of this study are included within the supplementary information files.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.

Supplementary Materials
The crude oil price data used in this paper consist of the WTI and Brent series.