A Hybrid LSTM-Based Ensemble Learning Approach for China Coastal Bulk Coal Freight Index Prediction

China Coastal Bulk Coal Freight Index (CBCFI) reflects how the coastal coal transporting market’s freight rates in China are fluctuated, significantly impacting the enterprise’s strategic decisions and risk-avoiding. ,ough trend analysis on freight rate has been extensively conducted, the property of the shipping market, i.e., it varies over time and is not stable, causes CBCFI to be hard to be accurately predicted. A novel hybrid approach is developed in the paper, integrating Long Short-TermMemory (LSTM) and ensemble learning techniques to forecast CBCFI. ,e hybrid LSTM-based ensemble learning (LSTM-EL) approach predicts the CBCFI by extracting the time-dependent information in the original data and incorporating CBCFI-related data, e.g., domestic and overseas thermal coal spot prices, coal inventory, the prices of fuel oil, and crude oil. To demonstrate the applicability and generality of the proposed approach, different time-scale datasets (e.g., daily, weekly, and monthly) in a rolling forecasting experiment are conducted. Empirical results show that domestic and overseas thermal coal spot prices and crude oil prices have great influences on daily, weekly, and monthly CBCFI values. And in daily, weekly, and monthly forecasting cases, the LSMT-EL approaches have higher prediction accuracy and a greater trend complying ratio than the relevant single ensemble learning algorithm. ,e hybrid method outperforms others when it works with information involving a dramatic market recession, elucidating CBCFI’s predictable ability. ,e present work is of high significance to general commerce, commerce-related, and hedging strategic procedures within the coastal shipping market.


Introduction
Nowadays, 90% of global trades are completed via sea transportation [1]. Sea transport is critical to the business system on the globe and also in the domestic trade system. China Coastal Bulk Coal Freight Index (CBCFI) [2] is built for timely reflecting how coastal coal transporting market's freight rates in China are fluctuated, by complying with the present mechanism of China Coastal Bulk Freight Index (CBFI) [3]. is system can publicize the complex index and spot ratios in terms of various routes/kinds pertaining to vessels of the coastal coal service market on a day-to-day basis. China (Coastal) Bulk Freight Index Panelist offers CBCFI freight data per weekday (Shanghai Shipping Exchange publicizes CBCFI at the official website and as well as http://www.chineseshipping.com.cn at 15:00 (Beijing Time) on each index publication day).
CBCFI represents the voyage-charter freight rate conditions pertaining to the market of bulk coal shipping in coastal areas. is indicates the unstable property pertaining to the coal bulk shipping market, as well as reflecting the developing states pertaining to the China economic condition and domestic business tendency. us, it refers to the "weatherglass" pertaining to the market of bulk coal shipping in coastal areas. Due to the mentioned feature, numerous internal personnel and specialists attempt at estimating the subsequent tendency pertaining to the market of bulk coal shipping in coastal areas by accurately predicting the bulk coal freight index for guiding company strategy decisions. Additionally, forecasting CBCFI value enables operating personnel and decision-making personnel for managing market trends and avoiding risks inside the coastal coal shipping market. Furthermore, it helps industries and manufactories with the domestic shipping system.
Previous studies indicate that the shipping freight index usually presents complicated instability features to be uncertain, cyclical, and nonlinear properties [4][5][6]. To achieve the satisfactory freight index prediction, a variety of freight indices predicting methods developed previously, where the econometric time series predicting approaches exhibit drawbacks in nonstationary and nonlinear properties, while the single neural network (NN) methods show disadvantages in overfitting, local minimum point, and parameter selection problems. Because both single econometric and NN approaches are limited in freight index prediction, recently, empirical studies consistently demonstrate the adaptability of hybrid techniques. Hybrid approaches are capable of combining individual approaches and make the respective advantage remedy others' deficiencies.
Recently, ensemble learning algorithms are extensively employed for analyzing the multivariable prediction. Ensemble learning algorithms (e.g., random forest (RF) and gradient boosting regression tree (GBRT)) are effective in determining the essential variables of a time series and investigate the inner relations among variables [7]. Over the past few years, ensemble learning algorithms have been extensively employed for studying the mentioned time series to be stock prices, Baltic Dry Index, and traffic flow, leading to the production of essential outcomes [8][9][10].
Ensemble learning algorithms' fast advancement presents one novel idea for the way of utilizing multidata and improves the readability from the original data. We here combine AI and ensemble learning algorithms for formulating one emerging hybrid approach, an ensemble learning (LSTM-EL) approach (e.g., LSTM-GBRT and LSTM-RF) by exploiting Long Short-Term Memory (LSTM) for CBCFI forecasting. Inside the approach, the LSTM layer obtains details dependent on time within the data, and the GBRT/RF layer shows great robustness of ensemble learning in terms of the training of approach. e present work has the following organization. Section 2 presents the review of the analyses of the market of shipping freight. Section 3 presents the designed approaches. Section 4 introduces the data collection. In Section 5, the approach performance receives the comparison and analysis. Lastly, Section 6 gives the concluding remarks.

Literature Review
Since freight rates are uncertain and unstable, the approaches to quantitatively analyze the rates arouse long concern from the shipping industry. As a result, increasing literature proposes approaches to predict freight rates, many of which use the Baltic Dry Index (BDI) [11]. Both BDI and CBCFI are shipping indices that effectively assess the current situation or shipping market, and they have some similar features. First, CBCFI and BDI are both daily number-issued leading indicators that measure the costs of shipping raw materials. Specifically, BDI mainly measures shipping costs for dry bulk commodities, including coal, grain, iron ore, and metals, while CBCFI mainly measures shipping costs for coal; second, in terms of the composition, BDI takes 23 shipping routes measured on a time charter and voyage basis while CBCFI takes 14 shipping routes measured on a voyage basis; third, as for the type of vessel, BDI looks at the ships that can carry 15,000 deadweight tons (DWT)-80,000 DWT of cargo (90 percent of the global fleet), while CBCFI focuses on the ships that can carry 15,000 DWT-60,000 DWT [3,12,13]. As the BDI and CBCFI share these similar features, we refer to the abundant research of the BDI to obtain a clear view of current prediction approaches. e analytic methodology of BDI among others is split into three types.
Traditional econometric approaches are initially covered, including vector error correction (VEC), generalized autoregressive conditional heteroskedasticity (GARCH), vector autoregression (VAR), and the autoregressive integrated moving average (ARIMA) approaches. Cullinane et al. [14] first developed a Baltic Dry Bulk Index (BDI) study approach method through the ARIMA approach. Kavussanos and Alizadeh-M [15] developed one season-related ARIMA approach for one independent variant as well as one VAR approach for investigating seasonality in the dry bulk shipping market. Batchelor et al. [16] discussed the performances of VAR and ARIMA as well as vector equilibrium correction (VECM) approaches for the prediction of spot and forward freight rates. To improve the accuracy of BDI forecasts, Tsioumas et al. [17] developed a multivariate vector autoregressive approach with exogenous variables (VARX) approach and the results demonstrate that the VARX approach outperforms the ARIMA approach. Chen et al. [18] applied the ARIMA and the VAR approaches to predict the spot rates of several dry bulk routes and found out that the VAR approach performed better than the ARIMA approach in test-sample forecasts. Adland et al. [19] showed a cointegration-based method for analyzing regional ocean freight rates' dynamic properties.
According to Stopford [20], sea prediction is difficult for statistical and traditional econometrical approaches for capturing the nonlinearity of dry bulk freight rates [21]. Accordingly, a second type of analytic approach currently employs several nonlinearity and artificial intelligence (AI) approaches, comprising artificial neural network (ANN), machine learning algorithm, and nonlinear methods. Li and Parsons [22] applied neural systems to predict monthly tanker freight rates from the short to the long term and found the neural systems outperforming the ARIMA time sequence approach for prediction in the long run. Yang et al. [23] investigated and predicted the freight rate instability alarming pertaining to China Coastal Bulk Freight Index (CCBFI), China Containerized Freight Index (CCFI), and Baltic Freight Index (BFI) by exploiting support vector machine (SVM).
alassinos et al. [24] adopted a chaos approach for predicting the BDI using the invariable coefficients pertaining to the strange attractor under reconstruction, governing the system's evolving process. Guan et al. [25] developed one SVM-based multistep prediction approach to predict weekly Baltic Supermax Index (BSI) data. Şahin et al. [26] develop three ANN approaches based on BDI data and results show that their performances are close, whereas the highest consistency pertains to the ANN by exploiting the past two weekly observations of the BDI data. Among the deep-learning approaches, the long shortterm memory (LSTM) neural network is regarded as a practical technique for handling time series problems [27][28][29]. For example, Nelson et al. [27] employed the LSMT network to predict the future trends of stock prices based on the historical price, alongside technical analysis indicators; Duan et al. [30] implemented the LSTM model for multistep ahead travel time prediction. e above studies demonstrated that NN methodologies generate more effective forecasting performance than conventional time series and econometric approaches.
ough NNs are capable of handling nonlinearity and show better robustness, it is hard to determine the configuration of the NN algorithm; in addition, it tends to fall into over or lacked training, easily resulting in local minimum trapping [11]. is prompts one shift into the third category hybrid methodologies, which usually integrates one noisereducing method by adopting one NN-based algorithm. Leonov and Nikolov [31] proposed a hybrid approach of wavelets and neural networks to investigate the fluctuation in the freight rates of the Baltic Panamax routes 2A and 3A. Bulut et al. [32] established one vector autoregressive fuzzy combined logical predicting approach in terms of time charter rates. Zeng et al. [33] propose a hybrid approach of empirical mode decomposing process (EMD) as well as ANN. Uyar et al. [34] presented one trained recurrent fuzzy neural system with the use of a genetic algorithm for improving long-term dry cargo freight rates prediction to be more accurate. As an efficient strategy to improve the forecasting ability of a single model, ensemble learning has been widely used to improve the model performance [35][36][37][38]. For example, Kamal et al. [35] developed a deep ensemble recurrent network of recurrent neural network (RNN), long-short-term memory (LSTM), and gated rectified unit neural network (GRU) to improve the BDI predictive performance, and results showed that the ensemble method outperforms the single deep-learning approach. Tan et al. [37] proposed an LSTM-based deep ensemble learning model that combined bagging, random subspace, and boosting to forecast ultra-short-term industrial power demand, and they found out that the proposed model obtains higher accuracy and robustness than LSTM, eXtreme Gradient Boosting (XGBoost), and other time series methods. Liu et al. [36] proposed a deep air pollution model for forecasting PM2.5 concentrations based on a wind-sensitive attention mechanism, LSTM, and XGBoost, and experiments illustrate that the proposed approach is superior to a single multilayer perceptron (MLP), SVM, LSTM, and XGBoost. e above studies indicate that models combining machine learning and neural networks usually obtain better predictive results than any single model.
Although much research has been conducted to solve the shipping indices forecasting issues, most do not consider different spot rates or other related factors information. Accordingly, these methods cannot effectively reflect the critical factors that contribute to strengthening the predictive performance. To address this issue, we propose a hybrid model of LSTM and ensemble learning algorithm to handle the CBCFI forecasting problems. e LSTM approach has the ability to acquire the data determined by time and noticeably impacts the predicting process of time series, whereas this approach fails to appropriately mine the implicit relations between exogenous variables in predicting inflection point data. us, to better utilize feature information, it is necessary to incorporate an applicable ensemble learning algorithm to optimize the feature combination in order to construct the feature set that reflects the short/longterm trend of the CBCFI. For the ensemble learning part, boosting and bagging are two of the important ideas; they both combine a set of weak learners to create a strong learner that obtains better performance. In all kinds of machine learning methods, Gradient Boosting Regression Tree (GBRT) [39] and Random Forest (RF) [40] have received much attention and are often used as representatives of ensemble learning algorithms. GBRT and RF are decisiontree-based ensemble learning algorithms that use a boosting and bagging framework, respectively. Because of the implementation of a gradient boosting algorithm or a bootstrap sampling method, GBRT or RF can handle variables fast, making it suitable for complicated tasks. Accordingly, this study adopts GBRT and RF as ensemble learning algorithms for CBCFI forecasting.

Data Description
is section firstly presents the data source. As this dataset includes both historical freight index and other impacted variables, variable correlations are introduced and conducted in this section as well.

Data Source.
All datasets in the present study are supported by the Wind Economic Database [41]. Wind pairs more than 1.3 million macroeconomy-related and industry time series based on effective graphics and data study equipment for elucidating China's economy to finance-related professionals.
e China (Coastal) Bulk Coal Freight Index (CBCFI) is a compound weekday index (null on holiday and weekend) that considers 14 shipping routes [2]. It takes 1st September 2011 as the base period, and the base index is 1,000 points. Now, CBCFI is extensively employed by practitioners of industry and considered a vital economic indicator in coastal coal transportation. Figure 1 displays the composite routes of CBCFI and its corresponding weights. It can be seen that CBCFI mainly describes the shipping market of transporting coal from the north to the south. According to the histogram at the bottom of Figure 1, we notice that six routes whose weights are over 10% hold dominant positions, in which most of them depart from Qinhuangdao Port. For easily reading, in Figure 1, the dominant routes are plotted in red, and for those weight values less than 10% but nonzero, routes are marked in dark grey, while those with zero weights are in light grey.
Previous studies of traditional shipping index (e.g., BDI) generally discuss the daily, weekly, and monthly forecasting cases to evaluate the performance of approaches in short-, mid-, and long-term predictions [11,16,22]. us, in this paper, daily, weekly, and monthly forecasting experiments are conducted and the full available CBCFI data from January 2012 to October 2020 are employed as the experimental datasets, with 2129 daily observations, 422 weekly observations, and 10 yearly observations. According to Figure 2, the CBCFI fluctuations exhibit irregular properties and high frequencies as well as large amplitudes. In the long run, from 2012 to the first half of 2016, CBCFI had a few inflection points. And in the fourth quarter of 2013, CBCFI reached the highest value of 1450.39 since its release and then returned to the low level at the end of the year. From the second half of 2016 to now, CBCFI has experienced a large fluctuation range, especially from 2017 to 2020. Compared with the previous years, the rise and fall of freight rates have been much larger, and the frequency of rise and fall also increases obviously, which reflects the changeable market situation of coastal coal transport. Moreover, the global Coronavirus Disease 2019 (COVID-19) outbreak has generated a public health crisis that started in December 2019, which has certain consequences for the economics of the shipping industry. For seasonal fluctuations, from the trend of each year, except for 2014, CBCFI shows the phenomenon that its index rises in summer and winter in other years, but the increase rate is different each year. e main reason is that demand for coal tends to rise in summer and winter, traditionally the peak seasons for electricity use, leading to a seasonal rise in coal transport prices. Although not every summer or winter in the past 7 years has seen an increase in rates, historical statistics indicate a high probability of rates rising in both summer and winter each year.
To justify the uncertain and nonlinear characteristics of the CBCFI series, descriptive statistics of the daily CBCFI and its rate of change are provided, and the results are listed    Table 1.
e rate of change for the CBCFI is R t � ((y t − y t−1 )/y t−1 ).
From the above statistics, it can be observed that, in terms of the skewness and kurtosis, the skewness of CBCFI series and its rate of change series are greater than zero, and the kurtosis of each is either less than or greater than three. erefore, the daily CBCFI and its rate of change series are left-skewed and right-skewed, respectively, which indicates that they have the characteristics of a sharp peak and a thick tail. In addition, the J-B test results indicate that both time series do not follow a normal distribution because they reject the hypothesis of the Jarque-Bera (J-B) statistic at 5% significant level. Moreover, both daily CBCFI and its rate of change series are indicated to be nonstationary, based on the results of the augmented Dickey-Fuller (ADF) unit root test; specifically, the t-statistics values summarized in Table 1 are less than the value of the critics (−2.86)). Furthermore, the Ljung-Box statistics to the squared residuals (also known as Q-Statistics of Square Residuals) is applied to test the nonlinearity of the CBCFI and its rate of change series. For both series at lag 6, we reject the hypothesis of no autocorrelation at the 5% significance level. Autocorrelated squared residuals are indications of nonlinearity [2]. e above statistics indicate that the CBCFI has the uncertain, nonstationary, and nonlinear properties, and those complicated features justify the need for the proposed ensemble approach.

CBCFI Impact Factor Data.
Within other markets, freight rates inside the shipping industry receive the formation based on the interacting processes pertaining to various factors, such as the prices of the transported cargo, the demand and supply, and the cost. us, if these elements influence the CBCFI, then they should be investigated. e demand for coastal transport arises from the need of exporters and importers to transport the coal to specifically domestic destinations. e "derived" demand is mainly affected by domestic economy and trade, such as the Import and Export Trade (IET), the industry production, domestic coal inventory, the contract rates, and spot rates of domestic thermal coal. As the domestic economy is improved, domestic trade will be promoted, and shipping transport will be more demanded. Moreover, random shocks based on emergency events (e.g., 2015 Tianjin explosions and 2019 COVID-19 outbreak) and cyclical and seasonal market movements of the coal transported by sea further substantiate that the demand for shipping transport depends on macroeconomic factors. Besides, transportation consumption and costs constitute other CBCFI determinates. On the promise of collecting data, the relevant data of the CBCFI are considered maximally, which could roughly be classified into three categories: (i) Domestic and Overseas ermal Coal Spot Prices. It is worth mentioning that Qinhuangdao Port is the greatest coal export port on the globe, and the routes of CBCFI are the dominant carriers for thermal coal transportation from north to south. us, the spot Note. J-B represents the Jarque-Bera (J-B) test in which a goodness-of-fit test is performed for examining whether the time series follows a normal distribution [42]. ADF represents the augmented Dickey-Fuller (ADF) test with a null hypothesis of the existence of autocorrelation in the sample series [43]. Q represents the Ljung-Box statistics to the squared residuals (also known as Q-Statistics of Square Residuals) in which a nonlinearity test [44] under the null hypothesis of linearity and residuals of a properly specified linear model should be independent [45]. Journal of Advanced Transportation price of thermal coal is considered as one of the critical factors affecting the fluctuation of CBCFI. Furthermore, with the enlarged opening of the market, the influence of the international market on the domestic market will be increasingly intensified, and the correlation between the domestic and foreign markets will be further enhanced. Even at this stage, the price of thermal coal in the international market will affect the export of coal, which will be reflected through the price of thermal coal. (ii) Coal Inventory. Coal inventory refers to a critical element that impacts the coal price based on the economic new normal [46]. Coal inventory is the result of coal production, transportation, consumption, and other factors. It is basically the same as the factors of price formation and has a leading effect on the price change of thermal coal. In recent years, Qinhuangdao coal inventory has become the weathervane of thermal coal price change [47]. (iii) Fuel Oil and Crude Oil Prices. On the one hand, fuel oil prices have a significant impact on coal prices as fuel oil prices influence the cost of shipping. When the price of fuel oil increases, the shipping cost will increase; when the price of fuel oil declines, the shipping cost will decrease. On the other hand, coal and crude oil are the most basic energy sources, and the sharp rise in oil prices has also contributed to the rise in thermal coal prices [48]. e influence of crude oil price on thermal coal price is reflected in the following aspects. First, crude oil, that is, one type of vital fossil energy source, refers to a dominant substitute to coal. us, crude oil price instability can impact coal demand and price in a corresponding manner [49]. When crude oil rises in price, the demand for coal would rise as the substitute. e fluctuation of coal price will influence the bulk coal shipping cost somehow and thus influence the CBCFI values. Note that the "prices" discussed in the present study contain the spot prices and futures prices (a spot price is an offer to complete a commodity transaction immediately, while a future contract locks in a price for future delivery). Table 2 summarizes the abovementioned factors in the basic feature set and will be optimized later. Variables are numbered sequentially for the feature description.

Variables Correlation Analysis.
To give a clear and simple view of the correlations between CBCFI and relevant impact factors, the Pearson correlation coefficient [50] is calculated and the results are presented in Table 2. Domestic and overseas thermal coal spot rates (u 1 ∼u 8 ) show obvious correlation with CBCFI, particularly, "Qinhuangdao Port-Q5000 Index-FOB," "Jintang Port-Q5000 Index-FOB," and "Guangzhou Port-Q5500 Index (Indian Coal)-EXT" with the largest Pearson correlation values; on the other hand, coal inventories (u 9 ∼ u 12 ) show negative correlations with the CBCFI series. Based on the results, we remove two variables with the smallest Pearson coefficient value (u 9 and u 11 ), and the rest of the 22 variables are selected as the input variables.
In addition, to demonstrate the necessary property for selecting the ensemble learning algorithm to deal with the CBCFI prediction problem, the relationships between the 24 variables and CBCFI are checked. en, one color-coded Pearson correlation matrix is generated. e numerical value one with the expression of dark blue indicates one overall positive linear correlation of two characteristics, whereas chartreuse indicates zero, demonstrating no linear correlation. As is shown in Figure 3, there is an interrelation of different degree between the 24 variables. For example, fuel oil spot and futures prices (u 14 ∼ u 19 ) are highly correlated with crude oil spot and futures prices (u 20 ∼ u 24 ). erefore, the ensemble learning algorithms dealing with multidata relations are considered to solve the CBCFI prediction problem.

Methodology
In this section, we first give a problem statement to present an overview of the prediction problem researched in this work.
en, the core concept and the flowchart with algorithm pseudocode of the proposed hybrid model structure are presented. At last, the prediction accuracy measurements are described.

Problem Statement.
e goal of this work is to predict the CBCFI values of the next day given historical data. We define the historical observations of the target CBCFI as Y � (y 1 , . . . , y t , . . . , y T ) T ∈ R T , where T represents time window size and y t is the CBCFI value at time t. Similarly, auxiliary factors are represented by X � (X 1 , . . . , X t , . . . , X T ) T ∈ R T×D , where D specifies the number of related factors. X t ∈ R D is the values of all D related factors at time t, and X d ∈ R T is the value of the d th factor in time window T. us, the prediction target y T+1 could be defined as follows: where F(·) is the mapping function we are aiming to learn.

Preparing Data. Given a CBCFI time series s with length
, the time series firstly should be preprocessed. Data cleaning and data normalization are the key data preparation tasks for further forecasting tasks. As the given time series has no missing value and to preserve the characteristics of real-world data, no noise reduction or data smoothing was performed on data. In this study, we only normalized the data with min-max normalization algorithm. e time series s(t i ) is normalized and the resulting normal data are expressed as s � s(t i ) N i�1 :

LSTM-Ensemble Learning (LSTM-EL) Approaches for CBCFI Forecasting.
Traditional freight indices prediction methods usually use the historical time series data of the target with ignorance of other impact factors. Generally speaking, the trend of CBCFI is reflected in two ways: historical CBCFI information and impact factor information. e historical time series information sometimes is sparse and thus not enough to produce accurate prediction, while some close impact factor information could reflect the movement of CBCFI from different aspects to a certain extent and with the support of the powerful database of Wind Economic Database (Wind Economic Database, refer to https://www.wind.com.cn/en/edb.html).
Our proposed LSTM-EL model is composed of two layers: in the first layer, a cluster of LSTMs is constructed to generate the embedding features, and in the second layer, an ensemble learning method for final CBCFI prediction. Figure 4 illustrates the overall framework of the bilevel hybrid LSTM-EL configuration; it consists of an LSTM method and two parallel ensemble learning methods. Note that the LSTM-EL model includes two different hybrid models, LSTM-GBRT and LSTM-RF. GBRT behaves similarly to RF in the manner of fitting multiple trees, but it instead fits them in a sequential manner, and hence another expectation of this paper aims to explore which ensemble algorithm fits the CBCFI forecasting problem better. For detailed descriptions of single RF, GBRT, and LSTM approaches, see Appendix A.
In the first layer, the dataset is first split into the insample and the out-of-sample. e preliminary embedded LSTM focuses on extracting the time-dependency information from variables of the in-sample and generates embedding features from the last LSTM layer, GBRT/RF is taken as an ensemble learning method to make the final Qinhuangdao Port-Q5500 Index-FOB (a spot price is an offer to complete a commodity transaction immediately, while a futures contract locks in a price for future delivery. It leaves its point of origin) refers to the price of coal shipped from the warehouse, including the price before the coal is put into the warehouse and the warehouse usage fee) e idea behind GBRT is that each iterator is used to reduce the previous residual. To reduce these residuals, a new tree in the direction of the gradient descent of the loss function is created. After LSTM forms the training samples, the recursive form regression tree is as the equation to calculate F g,m (s t+1 ′ ) in Algorithm 1.
Before building the bilevel LSTM-EL prediction architecture, several hyperparameters should be determined. For the upstream model LSTM, the LSTM network with optimization of multiple hyperparameters has achieved acceptable performance when applied on sequence data [51]. For the time series problem, the key hyperparameters include the number of LSTM layers, the number of nodes in each LSTM layer, the number of fully connected layers, and the time-lags, and for ensemble learning algorithm, the number of trees and the maximum depth of a tree are the most essential parameters. In our work, the time-lag and embedding size are the most important hyperparameters.
(1) Time-lag: the time-lag parameter has a significant impact on the performance of time series forecasting [52], as it determines the length of the historical sequence that should be included for the training. (2) Embedding size: that is, the number of neurons for the last layer in the LSTM network represents the input-data dimension of the downstream ensemble learning models and further determines the complexity of GBRT and RF. If the embedding size is very high, then the LSTM will be overfitting on training instances and increase the training difficulties of the downstream models, and if its size is too small, then it will be unable to memorize the time-dependency information collected from the time-lag sequences.
However, to the best of our knowledge, there are no general rules to choose the time-lag and the hidden layers' size. erefore, we investigated the effect of key parameters   Input: historical observations: one unit). s t+1 ′ is then being used to predict y t+1 in the second layer.
Step 2. LSTM prediction: y t+1 ′ � Dense(LSTM(S t ))//y t+1 ′ ∈ R is the prediction value given by LSTM part. Note that we just use y t+1 ′ to optimize the parameters of the embedding part and the final prediction of our method is obtained by the downstream methods, GBRT and RF.
Step 3. Optimization: min θ ‖y t+1 − y t+1 ′ ‖ 2 2 // e embedding part is trained by minimizing the objective function shown above and its parameters θ are updated via backpropagation. //GBRT or RF is taken as downstream method to make the final predictions. //Model 1: GBRT part Given the embedding features s t+1 ′ generated from S t : Input: training set s t+1 ′ T t�1 , differentiable loss function L(y t+1 , F g (s t+1 ′ )), and the maximum number of trees M. //F g is the decision tree mapping function, and the optimal F g can be obtained through minimizing the loss function.
. //ρ m is the step-size, g m is the base learner, and lr is the learning rate. For each step, a new decision tree is aimed at correcting the error made by its previous base learner. Journal of Advanced Transportation while keeping the other parameters fixed, and grid search [53] was applied to find the optimum hyperparameters. To do so, our model requires some basic settings. In our work, we built our model using two LSTM layers and one fully connected layer. e number of the neurons for the first layer is equal to that of the second layer. e search space of the above four critical parameters is illustrated in Table 3.
For each combination of the time-lag and embedding size, the LSTM-EL model is designed and trained, and the corresponding optimal combination of the number of trees and depth of trees is selected using the grid search. Here, for the sake of brevity, different combinations of the time-lag τ and the embedding size d ′ are evaluated by root mean square error (RMSE) and mean absolute percentage error (MAPE) and the results are shown in Figures 5 and 6. According to previous studies, Li et al. [54] showed that a small time-lag cannot guarantee enough long-term memory inputs for our LSTM-EL model; thus, the model cannot fully exploit the LSTM for long-term memory modeling. Large time-lags permit an increased number of unrelated inputs, which increases the model's complexity and the difficulty of learning useful features. It can be observed that the effect of the number of nodes in each neuron layer shows that, with the increase in the number of neuron nodes, the prediction performance improves slightly. us, we set different timelag τ and embedding size in the successive experiment to optimize both accuracy and time efficiency for both LSTM-GBRT and LSTM-RF, as indicated by the RMSE and MAPE. Based on the results of the experiments, Table 4 summarizes the best parameters of the obtained model.

Accuracy Measurement.
For measuring the prediction precision pertaining to the developed approach, several evaluation criteria are implemented, such as the RMSE as well as the MAPE. Generally, with the decline of MAPE and RMSE, the approach will be more precise. However, it is well known that, for a given prediction, actual outcomes above and below the prediction are treated asymmetrically when using MAPE and RMSE [55]. For the mentioned reason, to measure the data fluctuations (e.g., upward, stable, or downward), direction matching rate (Dsta) is employed. In addition, we utilize mean absolute scaled error (MASE) to assess if the developed prediction approach outclasses the naïve prediction method [56]. For detailed description of MAPE, RMSE, Dsta, and MASE criteria, see Appendix B.

Daily, Weekly, and Monthly CBCFI Forecasting.
After determining the best network architecture for the prediction task, the training set was utilized to train our LSTM-EL model until convergence. Evaluations were conducted using the test set. To analyze the generality of the hybrid LSTM-EL structure, we use a dataset with day-to-day, week-to-week, and month-to-month bases. Specifically, the weekly and monthly data are calculated as the average of daily CBCFI. In addition, to avoid overfitting problem, early stopping and validation sets are utilized in the present study, and the percentages for training, testing, and validating sets are 60%, 20%, and 20%, respectively.
In our prediction of each forecasting approach, LSTM-RF, LSTM-GBRT, GBRT, and RF models are applied, all of which are evaluated by calculating MAPE, RMSE, Dsta, and MASE. A rolling approach is implemented to conduct a next-day/ weekly/monthly CBCFI forecast. e approach uses the actual value of the predictor variable in the previous period for making a prediction in the testing set. Note that the time-lag is fixed, and new data are added for further t + 1 prediction. Figure 7 displays how the rolling approach works.
Note that, in weekly forecasting, each point represents the weekly CBCFI value and a new weekly CBCFI value is calculated by every new 5 daily CBCFI values (only workdays data). Likewise, in monthly forecasting, each point represents the monthly CBCFI value and a new monthly CBCFI value is calculated by the working days in each new month, automatically excluding weekends (Saturday and Sunday).
We next conduct the daily, weekly, and monthly CBCFI forecasting experiments, respectively. e CBCFI data from January 2012 to October 2020 are sample data. To evaluate the predictive performance of LSTM-EL models, we split the data into training data, validating data, and testing data. e ratio for each dataset is 6:2:2. Figure 8 shows the learning curves of mean square error (MSE) for the validation data and training data for 100 epochs in daily, weekly, and monthly forecasting cases. e learning curves show a good fit because of the decrease in training and validation loss to one stable data with a minimal gap between the two final loss values. Table 5 compares the predictive performance of two hybrid LSTM-EL approaches with the corresponding single EL approaches in the rolling forecasting approaches. For daily CBCFI forecasting cases, we found that the predictive performances for orientation matching and errors are enhanced through the introduction of the proposed hybrid structure. And all the values of MASE less than 1 indicate that four approaches outperform the average one-step naïve forecast. Note that the predictive performance in MAPE, RMSE, and Dsta is improved by considering hybrid structure forecasting. Compared to the predictive performances of corresponding single approaches, the improvement percentages (the improvement percentage is calculated by the difference between the evaluation index value of the hybrid model and the conventional model over that of the conventional model) in MAPE of LSTM-GBRT and LSTM-RF are 22.47% and 24.59%, respectively, of RMSE are 41.54% and 24.73%, respectively, and of Dsta are 72.84% and 69.36%, respectively. e above results indicate that LSTM-GBRT exhibits the most significant improvement level.  [1,50] For weekly CBCFI forecasting cases, consistent with daily CBCFI predicting processes, performance enhancement rates are positive for both errors and Dsta. e improvement percentages in MAPE of LSTM-GBRT and LSTM-RF are 54.62%, 39.48%, respectively, of RMSE are 32.12% and 11.77%, respectively, and of Dsta are 50.91% and 67.17%, respectively. Moreover, the improvement in MAPE for all two LSTM-EL approaches shows higher significance for the weekly CBCFI predicting processes than the daily prediction.
us, using a hybrid structure to extract the time-dependent characteristics between features in LSTMbased prediction enhances accuracy. In comparison with daily information, the hybrid approach shows significant improvements for the weekly CBCFI predicting process.
Since predicting long-term CBCFI with low-frequency time-scale data raises several challenges, this study examines how the hybrid approach promotes monthly data forecasting to be accurate. Consistent with daily and weekly forecasting, hybrid approaches outperform the single ensemble learning approaches. Notably, the hybrid structure promotes the precision noticeably, MAPE improvement for GBRT and RF are 53.72% and 61.40%, respectively. RMSE improvements for LSTM-GBRT and LSTM-RF are 50.32% and 57.83%, respectively. Moreover, the MASE of the monthly data of hybrid approaches is less than 1, indicating that hybrid approaches in this approach outperform the average onestep naïve forecast. Table 6 compares the general predictive performances of hybrid LSTM-EL approaches over three time scales. e MAPE and RMSE values indicate that hybrid LSTM-EL approaches perform best in weekly CBCFI forecasting and achieve the most obvious improvement of accuracy in monthly forecasting. And the performance indicators of different approaches (Table 5) show that the LSTM-GBRT approach outperforms LSTM-RF in daily forecasting while LSTM-RF achieves a higher accuracy in weekly and monthly forecasting, which indicates that LSMT-GBRT is better capable of dealing with high-frequency data while LSTM-RF is more suitable for mid-or low-frequency data. Figure 9 shows the comparisons between the real CBCFI values and the predicted values produced by LSTM-EL approaches and their corresponding single EL approaches. e inset square of Figure 9(a) shows how the hybrid and single EL approaches perform at the uptrend, bottom, and downtrend. It is found that four approaches are capable of predicting the daily, weekly, and monthly tendency of CBCFI, but the outputs from LSTM-EL approaches derive less than from the actual CBCFI values. In the three situations, single EL predictions show obvious fluctuations that actual CBCFI values do not have. For example, in situation I beginning in January 2019, the GBRT and RF predictions display heavily up and down while they do not happen in the actual CBCFI trend, LSTM-GBRT and LSTM-RF still keep close to the real CBCFI values, and similar phenomena are observed in situations II and III. Figures 9(b) and 9(c) show that single GBRT and RF models produce large errors in weekly and monthly forecasting, especially at the bottom or top of the trendline. In contrast, the hybrid LSTM-EL forecasting approach reproduces CBCFI trends and generates relatively small errors. For example, in weekly forecasting beginning in December 2019, the CBCFI dropped from approximately 1,000 points to a historic low of roughly 450 points, since the outbreak of COVID-19 hits the economy across the globe. And the CBCFI was kept at a low point until May 2020, despite the epidemic was gradually under control. In this case, only LSTM-GBRT and LSTM-RF predictions reproduce the CBCFI trend, while other approaches display large fluctuations not existing in the actual CBCFI itself. Likewise, in the monthly forecasting, the single EL approaches GBRT and RF predictions deviate, and the LSTM-GBRT and LSTM-RF predictions are very close to the actual CBCFI.

Diebold-Mariano (DM) Test.
To evaluate whether there is any statistically significant difference between the hybrid Window T -3: and conventional models, the Diebold-Mariano (DM) test [57] is implemented to compare the testing-sample prediction results. e DM test is widely used in determining whether the differences of time series predicting accuracy by different models are substantially crucial from a statistical perspective [58]. Table 7 summarizes the results of the DW test for the daily, weekly, and monthly CBCFI datasets, respectively. p value less than 0.05 indicates the rejection of the null hypothesis that there is no difference between the two compared forecasting models. It can be seen that LSTM-EL models have significantly different accuracies when compared to other benchmarks. For the daily dataset, LSTM-RF and LSTM-GBRT present statistical differences in predictive performance when compared with other models but there is no statistical difference in accuracies between LSTM-RF and LSTM-GBRT. Similarly, in weekly and monthly forecasting cases, the predictive improvement offered by incorporating ensemble learning algorithms is statistically significant while the predictive improvements between different conventional ensemble learning algorithms or LSTM-based hybrid models do not display statistically predictive performances.

Feature Importance
Analysis. Moreover, Figure 10 presents the features ranked by complying with the clarified variance the respective feature facilitates the LSTM-GBRT approach. In this case, the features are plotted against their relative importance, that is, the percent significance regarding the critical feature. For brevity, Figure 10 only presents the top 6 features with the sum of feature importance over 98%. It can be clearly seen that coal inventory at Tianjin Port influences the monthly CBCFI values but less important for daily and weekly data, which indicates that coal inventory is more likely to impact the long-term forecasting but not short-or midterm forecasting. In addition, domestic and overseas thermal coal spot rates and crude oil prices have obvious impacts on daily, weekly, and monthly CBCFI values, while coal inventory and fuel oil price are less important for daily and weekly CBCFI. Specifically, Guangzhou Port-Q5500 Index (Australian Coal)-EXT shows a great impact on daily, weekly, and monthly CBCFI values. Moreover, WTI crude oil spot price has an obvious impact on daily CBCFI values, while weekly CBCFI is more sensitive to Dubai crude oil spot price. is may result from these two crude oil indices that mainly serve       different regions. China, the world's biggest oil importer, has ramped up purchases of American crude for years [59]. WTI crude oil refers to oil extracted from wells in the US and sent via pipeline to Cushing, Oklahoma, which is the main benchmark for oil consumed in the United States [60]. Crude oil as a substitute for coal and thus Chinese domestic crude oil price would be more sensitive to the daily changes of the US crude oil prices. In addition, because crude oil is a substitute for coal, the fluctuation of coal price will influence the bulk coal shipping cost somehow, and thus the CBCFI values will be sensitive to the WTI crude oil prices in the short time scale. On the other hand, as Dubai crude oil price index is the main reference for Persian Gulf oil delivered to the Asian market and roughly half (44.8%) of Chinese imported crude oil originates from nine Middle Eastern nations [61], the CBCFI index will be affected by the fluctuation of Dubai crude oil price as well. Moreover, Nanovsky [62] introduced an oil price-distance interaction variable to explain how global trade behaves as a result of oil price changes. He found that when oil prices increase, international trade becomes more localized in that countries begin trading relatively more with their neighbors. In contrast, when they decrease, trade becomes more dispersed in that the distance between countries becomes less relevant. us, the price of crude oil usually presents volatility in one week, and thus, the weekly Chinese coastal shipping cost may look at the price of the Asian crude oil market more.

e Impact of the Supply and Demand on CBCFI.
As CBCFI is an index for shipping price, in addition to the factors discussed above, supply and demand are essential factors. For CBCFI, the supply should be available bulk fleet and the demand should be the amount of coal that needs to be shipped. On the supply side, specifically, according to the routes and ports of CBCFI, the supply should be the available bulk fleet at Tianjin Port, Jintang Port, Qinhuangdao Port, Caofeidain Port, and Huanghua Port. However, we did not find any available open-source fleet data.
On the demand side, specifically, the CBCFI represents the coal transportation need mainly from northern China to southern China. e national statistics ( Figure 11) indicate that most of the coal was used for thermal power generation (56.40%) and steel-making (18.1%). erefore, the southern thermal power generation and steel production are considered as indexes of the CBCFI-related demand. However, we only found the above data on a nationwide basis and cannot get the data only for southern China. To estimate the demand, we use the sum of utility electricity consumption of the southern coastal provinces with at least one port city included in the CBCFI routes, including Shanghai, Jiangsu, Zhejiang, Fujian, and Guangdong, as a substitution to the southern thermal power generation. For the steel production, we use the total domestic steel production as a substitution index of southern provinces' steel production.
Besides, the coal sources in China consist of 90% selfproduced mainly from northern China and 10% coal imports [47]. e coal production and coal imports also may have impacts on the CBCFI-related demand. Specifically, both the coal production increase and coal imports decrease may increase the CBCFI-related demand. erefore, domestic coal production and coal imports are also considered as the demand-related factors.
Note that the above demand and demand-related factors are only available on a monthly basis. Accordingly, southern utility electricity consumption, domestic steel production, domestic coal production, and coal imports are added to the monthly forecasting model.  demand and demand-related factors, particularly an obvious improvement in MAPE (12.67%) and RMSE (10.01%) of LSTM. And for hybrid models, the improvement percentages in MAPE of LSTM-GBRT and LSTM-RF are 5.32% and 5.16%, respectively, and of RMSE are 5.18% and 5.01%, respectively. Figure 12 presents the top 7 features with the sum of feature importance over 98%. Southern utility electricity consumption ranks the top, followed by Guangzhou Port-Q5500 Index (Australian Coal)-EXT, domestic steel consumption, and coal inventory at Tianjin Port. As not all self-produced coal needs to be transported, domestic coal production shows little importance for CBCFI forecasting. e above results illustrate that demand factors could lead to a higher monthly predictive accuracy.

Conclusion
e present study attempted at enhancing the forecasting accuracy of the CBCFI by formulating a novel hybrid LSTM-EL approach, which is capable of extracting the useful timedependent information in the data by combining the LSTM technique and ensemble learning algorithms. A rolling forecasting approach is developed for assessing LSTM-EL's forecasting accuracy in comparison with its corresponding single ensemble algorithms. Furthermore, critical factors that influence CBCFI values are discussed and experiments under daily, weekly, and monthly time scales in the rolling forecasting approach are conducted in order to test the performance generality of LSTM-EL approaches. e major intellectual advantages here consist of the emerging method by exploiting artificial neural network and ensemble learning methods to be the useful approach to obtain the shipping freight market's nonlinear and nonstationary features. According to the empirically achieved outcomes, domestic and overseas thermal coal spot rates and crude oil prices have obvious impacts on daily, weekly, and monthly CBCFI values, while coal inventory and fuel oil price are less important for daily and weekly CBCFI. In terms of forecasting accuracy, LSTM-EL approaches outperform the single EL models in three time-scale forecasting cases and generate better results than the naïve forecasts. Moreover, the accuracy improvement by LSTM-EL approaches for different CBCFI time-scale datasets is validated. Results indicate that hybrid LSTM-EL approaches perform best in daily CBCFI forecasting but achieve the most obvious improvement of accuracy in weekly forecasting. In addition, a DM test is implemented to evaluate whether there is any statistically significant difference between the hybrid and conventional models, and the results illustrate that LSTMbased hybrid models present statistical difference in predictive performance when compared with other models but there is no statistical difference in accuracies between LSTM-RF and LSTM-GBRT, so do the weekly and monthly forecasting cases. Overall, the LSTM-EL method has a high prospect to predict the CBCFI index in an accurate manner. e mentioned emerging method is capable of acting to be one effective tool to make the decisions regarding chartering and shipping based on uncertain properties and further being incorporated into management toolkit by shipping industry practitioners. e developed method and outcomes widen freight rates forecasting study and indicate probable subsequent study in relevant aspects fields. e deep-learning approach exhibits one prominent performance as opposed to the conventional statistics-related approach since it is capable of mapping the initial information for a nonlinear approach, which generates more effective influence. And long short-term memory (LSTM) based on the concept of recurrent neural network (RNN) presents an outstanding ability in time series predictions. On the other hand, ensemble learning methods refer to machine learning technique that combines several bases approaches in order to minimize the causes of error in learning approaches, such as noise, bias, and variance, for improving the overall predictive performance of the approach. In this paper, two prevailing approaches are focused on, (i) Random Forest (RF) and (ii) gradient boosting regression tree (GBRT).
(1) Long short-term memory (LSTM) LSTM, developed by Hochreiter and Schmidhuber, refers to on e special type of recurrent neural networks (RNN) [63]. LSTM consists of a unique set of memory cells that trains the data through a time backpropagation algorithm, capable of solving the longterm dependence problems of RNN, and thus is suitable for time series problems. e schematic diagram of the LSTM approach is displayed in Figure 13. LSTM has options for adding or deleting the data of its cell condition, as achieved with the use of cell gates. e standard LSTM can be expressed as follows. e respective step t and its corresponding input sequence are denoted as X � x 1 , x 2 , . . . , x t , and the three types of gates are input gate i t , output gate o t , and forget gate f t . e passed information can be determined whether to be remembered or forgotten by the output of the hidden layer h t−1 and input x t of the current layer. e activation function for limiting the outputted data inside the range of [0, 1] is as follows: e input gate aims at determining the appropriate input information (h t−1 , x t ) to the cell. Its activated function is to set the forgetting gate. C t denotes a "candidate" hidden state determined from the previous hidden state and the current input. C t expresses the internal memory of the unit. For achieving the purpose of retaining the corresponding information, C t integrates the previous memory, under the multiplication from the gate that is forgotten, and the newly hidden state below under the multiplication from the input gate.
(A.2) e output gate controls the data able to be outputted. Likewise, the activation function aims at setting the gate this is forgotten. After the memory cell state gets updated by the tan h activation function, the multiplication of dot determines the to-beoutput information. RF expresses one combined learning algorithm containing decision trees, introduced by Breiman [40]. us, each tree of RF receives the training in a separate manner on an independent training set under the selection in a random manner. As opposed to the conventional decision tree, RF superiority has the reflection within two aspects. On the one hand, the trees built show inconsistency as generated in various training sets of a bootstrap subsampling and various random subdivided sets pertaining to characteristics for the split in terms of the respective tree node. On the other hand, the subsets of features are selected randomly. In this case, RF can achieve both low bias and low variance output and is not easy to fall into overfitting [64]. For regression, the final prediction is the average of the predictions from the set of decision trees. In RF, out-of-bag (OOB) error rate generally serves to be one way for measuring evaluation indices' significance [64]. (3) Gradient boosting regression tree (GBRT) In terms of an established set D with n examples and m features D � (x i , y i ) (|D| � n, x i ∈ R m , y i ∈ R), a tree ensemble approach applies K additive functions for predicting the output.
where Γ � f(x) � ω q(x) (q: R m ⟶ T, ω ∈ R T ) denotes the regression tree space. Specifically, q denotes the configuration of the respective tree mapping one instance to the relevant leaf index, T expresses the number of leaves in the tree, and the respective f k represents one single tree structure q and leaf weight ω. Inconsistent with decision trees, the respective regression tree covers one continuous score on the respective leaf, and ω i is used for representing the score on the ith leaf.

B. Evaluation Criteria
Model performance evaluation criteria MAPE and RMSE have the following formulation: where n denotes the size pertaining to the dataset under the testing process and Y r and Y p represent the actual and forecasted data at time t, separately. Generally, the lower the RMSE and MAPE data are, the more accurate the approach will be. Nevertheless, specific to a set predicting process, practical results over and under the forecast receive the asymmetrical treat when using MAPE and RMSE [55]. us, direction matching rate (Dsta) is used to measure the data fluctuations (e.g., upward, stable, or downward), which is defined by Dsta � 1 n n t�1 u(t), e range of Dsta value is [0, 1]. e more approaching the Dsta data is to 1, the greater the precision pertaining to the direction-related predicting process concerned with the approach will be, and vice versa. In addition, we utilize mean absolute scaled error (MASE) for assessing whether the proposed predicting approach outperforms naïve forecasting method [56].
where e t denotes the prediction error determined to be (Y r − Y p ) and (Y p − Y p−1 ) refers to the naïve forecast's error. at is, MASE less than 1 indicates that the approach generates a more effective prediction than the calculated naïve predictions.
Data Availability e data applied here originate from the public Wind Economic Database, IEA, National Bureau of Statistics of China, and Statista, and the data could receive the assessment from https://www.wind.com.cn/en/edb.html, https:// www.iea.org/data-and-statistics, http://www.stats.gov.cn/ tjsj/, and https://www.statista.com/statistics, respectively.

Conflicts of Interest
e authors declare that they have no conflicts of interest.