The China Coastal Bulk Coal Freight Index (CBCFI) reflects fluctuations in freight rates in China's coastal coal shipping market and significantly affects enterprises' strategic decisions and risk management. Although freight-rate trends have been extensively analyzed, the shipping market is time-varying and unstable, which makes the CBCFI hard to predict accurately. This paper develops a novel hybrid approach that integrates Long Short-Term Memory (LSTM) and ensemble learning techniques to forecast the CBCFI. The hybrid LSTM-based ensemble learning (LSTM-EL) approach predicts the CBCFI by extracting time-dependent information from the original data and incorporating CBCFI-related data, e.g., domestic and overseas thermal coal spot prices, coal inventories, and fuel oil and crude oil prices. To demonstrate the applicability and generality of the proposed approach, rolling forecasting experiments are conducted on datasets at different time scales (daily, weekly, and monthly). Empirical results show that domestic and overseas thermal coal spot prices and crude oil prices strongly influence daily, weekly, and monthly CBCFI values. In the daily, weekly, and monthly forecasting cases, the LSTM-EL approaches achieve higher prediction accuracy and higher directional accuracy (trend-matching ratio) than the corresponding single ensemble learning algorithms. The hybrid method also outperforms the others on data that include a dramatic market recession, demonstrating the predictability of the CBCFI. This work is relevant to general commercial, commerce-related, and hedging strategy decisions in the coastal shipping market.
Nowadays, 90% of global trade is carried by sea [
The CBCFI represents voyage-charter freight rate conditions in the coastal bulk coal shipping market. It reflects the instability of this market as well as the development of China's economy and domestic business trends, and it is therefore regarded as the "barometer" of the coastal bulk coal shipping market. For this reason, many industry insiders and specialists try to anticipate the market's future trend by accurately predicting the bulk coal freight index in order to guide company strategy. Forecasting CBCFI values also enables operators and decision-makers to manage market trends and avoid risks in the coastal coal shipping market, and it helps industries and manufacturers that rely on the domestic shipping system.
Previous studies indicate that shipping freight indices usually exhibit complex, unstable features, including uncertainty, cyclicality, and nonlinearity [
Recently, ensemble learning algorithms have been widely employed for multivariable prediction. Ensemble learning algorithms such as random forest (RF) and gradient boosting regression tree (GBRT) are effective at identifying the essential variables of a time series and investigating the inner relations among variables [
The rapid advancement of ensemble learning algorithms offers a new way to exploit multisource data and to extract information from the original data. Here we combine artificial intelligence and ensemble learning algorithms to formulate a new hybrid approach, the LSTM-based ensemble learning (LSTM-EL) approach (e.g., LSTM-GBRT and LSTM-RF), which exploits Long Short-Term Memory (LSTM) for CBCFI forecasting. In this approach, the LSTM layer captures time-dependent information in the data, and the GBRT/RF layer contributes the robustness of ensemble learning to model training.
The present work has the following organization. Section
Because freight rates are uncertain and unstable, quantitative approaches to analyzing them have long been a concern of the shipping industry. As a result, a growing body of literature proposes approaches to predict freight rates, many of which use the Baltic Dry Index (BDI) [
Traditional econometric approaches are initially covered, including vector error correction (VEC), generalized autoregressive conditional heteroskedasticity (GARCH), vector autoregression (VAR), and the autoregressive integrated moving average (ARIMA) approaches. Cullinane et al. [
According to Stopford [
Although NNs can handle nonlinearity and show better robustness, the configuration of an NN is hard to determine; in addition, NNs tend to overfit or underfit and are easily trapped in local minima [
Although much research has addressed shipping-index forecasting, most studies do not consider different spot rates or information on other related factors. Consequently, these methods cannot effectively capture the critical factors that could strengthen predictive performance. To address this issue, we propose a hybrid model of LSTM and ensemble learning to handle the CBCFI forecasting problem. LSTM can capture time-dependent information and noticeably improves time series prediction, but it cannot adequately mine the implicit relations among exogenous variables when predicting inflection points. Therefore, to better utilize feature information, an appropriate ensemble learning algorithm is incorporated to optimize the feature combination and construct a feature set that reflects the short- and long-term trends of the CBCFI. In ensemble learning, boosting and bagging are two important ideas; both combine a set of weak learners into a strong learner with better performance. Among the many machine learning methods, Gradient Boosting Regression Tree (GBRT) [
This section first presents the data source. Because the dataset includes both the historical freight index and other influencing variables, a correlation analysis of the variables is also conducted in this section.
All datasets in the present study are supported by the Wind Economic Database [
The China (Coastal) Bulk Coal Freight Index (CBCFI) is a composite index published on weekdays only (no values on weekends or holidays) that covers 14 shipping routes [
Illustration of the CBCFI composition and the corresponding shipping routes (DWT in bracket represents the “deadweight tonnage (DWT)” of the ship).
Previous studies of traditional shipping indices (e.g., the BDI) generally consider daily, weekly, and monthly forecasting cases to evaluate the performance of approaches in short-, mid-, and long-term prediction [
According to Figure
Trend of the daily CBCFI dataset used for forecasting.
To characterize the uncertain and nonlinear features of the CBCFI series, descriptive statistics of the daily CBCFI and its rate of change are provided, and the results are listed in Table
Descriptive statistics of daily CBCFI and rate of change of CBCFI.
Statistic | Daily CBCFI | Rate of change for CBCFI |
---|---|---|
Mean | 722.6500 | −0.0002 |
Median | 675.4800 | −0.0012 |
Standard deviation | 223.8494 | 0.0228 |
Maximum | 1,706.2000 | 0.1186 |
Minimum | 370.9900 | −0.1337 |
Kurtosis | 1.7801 | 5.6084 |
Skewness | 1.2186 | 0.2550 |
J-B | 658.9484 | 626.4104 |
p-value | 0.000 | 0.000 |
ADF [lags] | −4.5298 [ | −11.3960 [ |
p-value | 0.0130 | 0.0164 |
| | 26.571 [ | 15.95 [ |
p-value | 0.000 | 0.000 |
Observation | 2129 | 2128 |
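The skewness, kurtosis, and Jarque–Bera (J-B) figures in the table above can be computed with a short sketch like the following. It is an illustration under our own assumptions, not the authors' code: it uses population moments and raw (non-excess) kurtosis, for which a normal distribution gives 3, and the p-value relies on the J-B statistic being asymptotically chi-square with 2 degrees of freedom, whose survival function is exp(-x/2). If the table reports excess kurtosis, subtract 3 before comparing. The helper names `describe` and `pct_change` are hypothetical; `pct_change` also shows why the rate-of-change series has one observation fewer (2128 vs 2129).

```python
import math

def pct_change(series):
    """Rate of change r_t = (x_t - x_{t-1}) / x_{t-1}; one element shorter than the input."""
    return [(cur - prev) / prev for prev, cur in zip(series[:-1], series[1:])]

def describe(series):
    """Return (skewness, kurtosis, Jarque-Bera statistic, J-B p-value) from population moments."""
    n = len(series)
    mean = sum(series) / n
    m2 = sum((x - mean) ** 2 for x in series) / n
    m3 = sum((x - mean) ** 3 for x in series) / n
    m4 = sum((x - mean) ** 4 for x in series) / n
    skew = m3 / m2 ** 1.5          # > 0 means right-skewed (long right tail)
    kurt = m4 / m2 ** 2            # raw kurtosis: 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square(2) survival function
    return skew, kurt, jb, p_value

# Toy usage (made-up numbers, not CBCFI data):
prices = [100.0, 110.0, 99.0, 120.0, 118.0]
print(len(prices), len(pct_change(prices)))        # 5 4
print(describe([1.0, 2.0, 3.0, 4.0, 5.0])[:2])      # (0.0, 1.7)
```

A p-value below 0.05 rejects normality at the 5% level, which is how the J-B row of the table is read.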
From the above statistics, both the daily CBCFI series and its rate-of-change series have positive skewness, i.e., both are right-skewed, and the rate-of-change series in particular has a kurtosis well above three, which indicates a sharp peak and fat tails. In addition, the J-B test results indicate that neither series follows a normal distribution, because both reject the null hypothesis of normality of the Jarque–Bera (J-B) test at the 5% significance level. Moreover, based on the results of the augmented Dickey–Fuller (ADF) unit root test, both the daily CBCFI and its rate-of-change series are indicated to be nonstationary; specifically, the
As in other markets, freight rates in the shipping industry are formed through the interaction of various factors, such as the prices of the transported cargo, supply and demand, and costs. If these elements influence the CBCFI, they should be investigated. Demand for coastal transport arises from the need of exporters and importers to move coal to specific domestic destinations. This "derived" demand is mainly affected by the domestic economy and trade, such as Import and Export Trade (IET), industrial production, domestic coal inventories, and the contract and spot rates of domestic thermal coal. As the domestic economy improves, domestic trade is promoted and demand for shipping increases. Moreover, random shocks from emergency events (e.g., the 2015 Tianjin explosions and the 2019 COVID-19 outbreak) and the cyclical and seasonal movements of the seaborne coal market further show that demand for shipping depends on macroeconomic factors. Transportation consumption and costs are further CBCFI determinants. Subject to data availability, CBCFI-related data are considered as comprehensively as possible, and they can be roughly classified into three categories:
Impact factors description and the corresponding Pearson coefficient values.
Variable category | Variable description | Pearson coefficient value |
---|---|---|
Domestic and overseas thermal coal spot rates | Qinhuangdao Port-Q5500 Index-FOB | 0.550 |
Domestic and overseas thermal coal spot rates | Qinhuangdao Port-Q5000 Index-FOB | 0.575 |
Domestic and overseas thermal coal spot rates | Qinhuangdao Port-Q5500K Index-FOB | 0.512 |
Domestic and overseas thermal coal spot rates | Qinhuangdao Port-Q5000K Index-FOB | 0.530 |
Domestic and overseas thermal coal spot rates | Jintang Port-Q5500 Index-FOB | 0.554 |
Domestic and overseas thermal coal spot rates | Jintang Port-Q5000 Index-FOB | 0.571 |
Domestic and overseas thermal coal spot rates | Guangzhou Port-Q5500 Index (Australian Coal)-EXT | 0.545 |
Domestic and overseas thermal coal spot rates | Guangzhou Port-Q5500 Index (Indian Coal)-EXT | 0.571 |
Coal inventories | Qinhuangdao Port | 0.007 |
Coal inventories | Caofeidian Port | −0.157 |
Coal inventories | Guangzhou Port | −0.021 |
Coal inventories | Tianjin Port | −0.295 |
Coal inventories | Jintang Port | −0.197 |
Fuel oil prices | Fuel Oil 180 Singapore-FOB | 0.202 |
Fuel oil prices | Fuel Oil 380 Singapore-FOB | 0.205 |
Fuel oil prices | China fuel oil futures closing price (continuous contract) | 0.265 |
Fuel oil prices | China fuel oil futures closing price (active contract) | 0.272 |
Fuel oil prices | China fuel oil futures settlement price (continuous contract) | 0.266 |
Fuel oil prices | China fuel oil futures settlement price (active contract) | 0.273 |
Crude oil prices | Brent crude oil spot price | 0.152 |
Crude oil prices | West Texas Intermediate (WTI) crude oil spot price | 0.157 |
Crude oil prices | Dubai crude oil spot price | 0.152 |
Crude oil prices | WTI crude oil futures settlement price | 0.181 |
Crude oil prices | Brent crude oil futures settlement price | 0.187 |

Notes: A spot price is an offer to complete a commodity transaction immediately, whereas a futures contract locks in a price for future delivery. FOB (free on board) prices are quoted for coal loaded onto the vessel at its point of origin. EXT (ex-tank) refers to the price of coal shipped from the warehouse, including the price before the coal is put into the warehouse and the warehouse usage fee. A closing price is the final price at which a contract trades during regular market hours on a given day. A continuous contract is a price series formed by linking successive futures contracts as each one approaches expiry, and an active contract is the contract month that is currently most actively traded. The settlement price is the reference price the exchange sets for each trading day (for Chinese commodity futures, typically the volume-weighted average price of that day's trades).
To give a clear and simple view of the correlations between CBCFI and relevant impact factors, the Pearson correlation coefficient [
In addition, to justify the choice of ensemble learning algorithms for the CBCFI prediction problem, the relationships between the 24 variables and the CBCFI are examined, and a color-coded Pearson correlation matrix is generated. A value of 1, shown in dark blue, indicates a perfect positive linear correlation between two features, whereas chartreuse indicates a value of 0, i.e., no linear correlation. As is shown in Figure
The Pearson correlation matrix of all features.
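For reference, the Pearson coefficient between two equally long series is their covariance divided by the product of their standard deviations. The stdlib-only sketch below computes the coefficient and a full correlation matrix over named feature columns; the helper names and the toy columns are hypothetical (made-up numbers, not the Wind data), and a constant column would cause a division by zero.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """columns: dict name -> list of values. Returns dict (name_a, name_b) -> r."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Toy usage with made-up columns:
cols = {"cbcfi": [600.0, 650.0, 700.0, 680.0],
        "coal_spot": [550.0, 570.0, 610.0, 600.0],
        "inventory": [30.0, 28.0, 25.0, 27.0]}
m = correlation_matrix(cols)
print(round(m[("cbcfi", "coal_spot")], 3), round(m[("cbcfi", "inventory")], 3))
```

Rendering `m` as a heat map gives the kind of color-coded matrix described above; the diagonal is always 1.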
In this section, we first give a problem statement that outlines the prediction problem studied in this work. Then, the core concept of the proposed hybrid model structure is presented together with its flowchart and algorithm pseudocode. Finally, the prediction accuracy measures are described.
The goal of this work is to predict the CBCFI values of the next day given historical data. We define the historical observations of the target CBCFI as
Given a CBCFI time series
Traditional freight index prediction methods usually use only the historical time series of the target and ignore other influencing factors. Generally, the trend of the CBCFI is reflected in two types of information: historical CBCFI values and impact factors. The historical series alone is sometimes too sparse to produce accurate predictions, while closely related impact factors can reflect the movement of the CBCFI from different aspects to a certain extent, and with the support of the powerful Wind Economic Database (Wind Economic Database, refer to
Our proposed LSTM-EL model is composed of two layers: in the first layer, a cluster of LSTMs is constructed to generate embedding features, and in the second layer, an ensemble learning method makes the final CBCFI prediction.
Figure
Structure diagram of LSTM-EL approach.
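The construction of training instances from historical CBCFI values and impact factors can be sketched as follows. The data layout is our assumption, not the authors' code: each instance is a window of `lag` consecutive time steps, each step holding the CBCFI value followed by the exogenous features observed at that step, and the label is the CBCFI value of the next period. `make_instances` is a hypothetical name.

```python
def make_instances(target, exog, lag):
    """Turn aligned series into (window, next value) training pairs.

    target: CBCFI values y_1..y_T
    exog:   list of T feature vectors (impact factors) aligned with target
    lag:    time-lag, i.e. the number of past periods in each window
    Each window is a list of `lag` rows [y_t, *exog_t]; its label is the value
    immediately after the window.
    """
    if len(target) != len(exog):
        raise ValueError("target and exog must have the same length")
    X, y = [], []
    for end in range(lag, len(target)):
        window = [[target[t]] + list(exog[t]) for t in range(end - lag, end)]
        X.append(window)
        y.append(target[end])
    return X, y

X, y = make_instances([1.0, 2.0, 3.0, 4.0], [[10.0], [20.0], [30.0], [40.0]], lag=2)
print(X)  # [[[1.0, 10.0], [2.0, 20.0]], [[2.0, 20.0], [3.0, 30.0]]]
print(y)  # [3.0, 4.0]
```

A series of length T yields T - lag instances, so a longer time-lag trades training instances for more context per instance.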
In the first layer, the dataset is first split into in-sample and out-of-sample sets. The embedded LSTM extracts time-dependency information from the in-sample variables and generates embedding features from its last layer. In the second layer, GBRT/RF is used as the ensemble learning method to make the final predictions from the LSTM embedding features, and the outputs of the individual trees are combined to obtain the final value. The details of the proposed LSTM-EL model are illustrated in Algorithm
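Algorithm: LSTM-EL
// Construct training instances for all available time intervals
Given a training instance, train the LSTM and take the output of its last layer as the embedding features
// LSTM-GBRT: given the embedding features
For m = 1, ..., M (boosting iterations):
    For each training sample: compute the residual of the current model
    Fit a regression tree to the residuals and add it to the model
Output model
// LSTM-RF: given the embedding features
For b = 1, ..., B (trees):
    Draw a bootstrap sample and grow a regression tree
    Prune tree to minimize out-of-bag error
Average all trees
Output model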
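The RF branch of the algorithm (bootstrap sampling, growing one tree per sample, and averaging the trees) can be sketched in plain Python as below. This is a simplified illustration under our own assumptions: each tree is a depth-1 regression tree (stump), the random feature subset is drawn once per tree (equivalent to once per split for a stump), and the out-of-bag pruning step is omitted. In LSTM-RF, the rows of `X` would be the LSTM embedding features. All names are hypothetical.

```python
import random

def fit_stump(X, y, features):
    """Depth-1 regression tree: the single split (over `features`) with the lowest squared error."""
    best = None  # (sse, feature index, threshold, left mean, right mean)
    for j in features:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values[:-1], values[1:]):
            thr = (lo + hi) / 2.0
            left = [t for row, t in zip(X, y) if row[j] <= thr]
            right = [t for row, t in zip(X, y) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:  # every candidate feature is constant: predict the mean
        mean = sum(y) / len(y)
        return lambda row: mean
    _, j, thr, lm, rm = best
    return lambda row: lm if row[j] <= thr else rm

def fit_random_forest(X, y, n_trees=50, max_features=1, seed=0):
    """Bagged stumps with a random feature subset per tree; predictions are averaged."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]           # bootstrap sample (with replacement)
        feats = rng.sample(range(d), min(max_features, d))   # random feature subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return lambda row: sum(tree(row) for tree in trees) / len(trees)

# Toy usage (made-up data):
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 0.0, 10.0, 10.0]
forest = fit_random_forest(X, y, n_trees=50, seed=0)
print(forest([0.0]), forest([3.0]))
```

Averaging many trees fitted on different bootstrap samples reduces variance, which is the bagging idea the RF layer relies on.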
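The idea behind GBRT is that each iteration reduces the residual left by the previous one. To reduce these residuals, a new tree is built in the direction of gradient descent of the loss function. After the LSTM forms the training samples, the regression trees are added recursively:

$$F_m(x) = F_{m-1}(x) + h_m(x), \quad m = 1, \ldots, M,$$

where $h_m$ is the regression tree fitted at iteration $m$ to the negative gradient of the loss (for squared-error loss, the residuals $y_i - F_{m-1}(x_i)$) and $F_M$ is the final model.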
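The residual-fitting idea of GBRT can be illustrated with a minimal gradient boosting sketch for squared-error loss, where the negative gradient is simply the residual y - F(x). Assumptions not stated in the paper: depth-1 trees (the tree depth reported for several configurations in the best-configuration table), a shrinkage factor `learning_rate` (set it to 1.0 for a plain additive update), and hypothetical function names. In LSTM-GBRT, the rows of `X` would be the LSTM embedding features.

```python
def fit_stump(X, y, features):
    """Depth-1 regression tree: the single split (over `features`) with the lowest squared error."""
    best = None  # (sse, feature index, threshold, left mean, right mean)
    for j in features:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values[:-1], values[1:]):
            thr = (lo + hi) / 2.0
            left = [t for row, t in zip(X, y) if row[j] <= thr]
            right = [t for row, t in zip(X, y) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:  # no split possible: predict the mean
        mean = sum(y) / len(y)
        return lambda row: mean
    _, j, thr, lm, rm = best
    return lambda row: lm if row[j] <= thr else rm

def fit_gbrt(X, y, n_trees=100, learning_rate=0.1):
    """Gradient boosting for squared-error loss: each stump is fitted to the current residuals."""
    f0 = sum(y) / len(y)                  # initial model: the mean of the targets
    features = range(len(X[0]))
    pred = [f0] * len(y)
    trees = []
    for _ in range(n_trees):
        residuals = [t - p for t, p in zip(y, pred)]   # negative gradient of (1/2)(y - F)^2
        tree = fit_stump(X, residuals, features)
        trees.append(tree)
        pred = [p + learning_rate * tree(row) for p, row in zip(pred, X)]
    return lambda row: f0 + learning_rate * sum(tree(row) for tree in trees)

# Toy usage (made-up data):
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 0.0, 10.0, 10.0]
model = fit_gbrt(X, y, n_trees=100, learning_rate=0.1)
print(round(model([0.0]), 3), round(model([3.0]), 3))
```

Unlike RF, the tree outputs here are summed rather than averaged, because each tree only corrects what the previous ones left over.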
Before building the bilevel LSTM-EL prediction architecture, several hyperparameters must be determined. For the upstream LSTM model, LSTM networks with optimized hyperparameters have achieved acceptable performance on sequence data [
Time-lag: the time-lag parameter has a significant impact on the performance of time series forecasting [
Embedding size: the number of neurons in the last LSTM layer sets the input dimension of the downstream ensemble learning models and thus determines the complexity of GBRT and RF. If the embedding size is too large, the LSTM may overfit the training instances and the downstream models become harder to train; if it is too small, the LSTM cannot retain the time-dependency information collected from the time-lag sequences.
However, to the best of our knowledge, there are no general rules for choosing the time-lag and the hidden layer size. We therefore investigated the effect of each key parameter while keeping the others fixed, and grid search [
Value specified for key hyperparameters.
Hyperparameters | Search range |
---|---|
Time-lag | [2, 4, 6, 8, 10] |
Embedding size | [16, 32, 64, 128] |
Number of trees | [1, 1000] |
Depth of trees | [1, 50] |
For each combination of the time-lag and embedding size, the LSTM-EL model is designed and trained, and the corresponding optimal combination of the number of trees and depth of trees is selected using the grid search. Here, for the sake of brevity, different combinations of the time-lag
Key parameter sensitivity analysis for LSTM-GBRT under three time scales: (a) daily, (b) weekly, and (c) monthly CBCFI forecasting.
Key parameter sensitivity analysis for LSTM-RF under three time scales: (a) daily, (b) weekly, and (c) monthly CBCFI forecasting.
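The outer grid search over time-lag and embedding size can be sketched as an exhaustive loop over the search ranges in the hyperparameter table. In this sketch, `evaluate` is a hypothetical callback that would train an LSTM-EL model with the given configuration (including any inner search over the number and depth of trees) and return its validation error; the toy error surface below is made up for illustration.

```python
from itertools import product

# Search ranges for the LSTM layer, taken from the hyperparameter table.
TIME_LAGS = [2, 4, 6, 8, 10]
EMBEDDING_SIZES = [16, 32, 64, 128]

def grid_search(evaluate, time_lags=TIME_LAGS, embedding_sizes=EMBEDDING_SIZES):
    """Score every (time_lag, embedding_size) pair; lower validation error is better.

    Returns (best_pair, best_error, all_results).
    """
    results = {(lag, emb): evaluate(lag, emb)
               for lag, emb in product(time_lags, embedding_sizes)}
    best = min(results, key=results.get)
    return best, results[best], results

# Toy usage: a made-up error surface with its minimum at (8, 64).
toy_error = lambda lag, emb: (lag - 8) ** 2 + (emb - 64) ** 2 / 100.0
best, err, results = grid_search(toy_error)
print(best, err)  # (8, 64) 0.0
```

The cost grows with the product of the range sizes (20 LSTM configurations here), which is why the remaining parameters are held fixed while the key ones are varied.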
The best configurations of the proposed model and benchmark models.
Algorithm | Component | Hyperparameter | Daily | Weekly | Monthly |
---|---|---|---|---|---|
LSTM-GBRT | LSTM layer | Time-lag | 8 | 10 | 6 |
LSTM-GBRT | LSTM layer | Number of hidden layers | 2 | 2 | 2 |
LSTM-GBRT | LSTM layer | Number of units in the hidden layers | 64 | 16 | 32 |
LSTM-GBRT | GBRT layer | Number of trees | 101 | 31 | 41 |
LSTM-GBRT | GBRT layer | Depth of trees | 1 | 1 | 1 |
LSTM-RF | LSTM layer | Time-lag | 8 | 10 | 6 |
LSTM-RF | LSTM layer | Number of hidden layers | 2 | 2 | 2 |
LSTM-RF | LSTM layer | Number of units in the hidden layers | 64 | 16 | 32 |
LSTM-RF | RF layer | Number of trees | 41 | 11 | 21 |
LSTM-RF | RF layer | Depth of trees | 11 | 11 | 1 |
GBRT | Single model | Number of trees | 191 | 11 | 11 |
GBRT | Single model | Depth of trees | 21 | 1 | 1 |
RF | Single model | Number of trees | 11 | 31 | 11 |
RF | Single model | Depth of trees | 1 | 1 | 1 |
LSTM | Single model | Time-lag | 8 | 10 | 8 |
LSTM | Single model | Number of hidden layers | 2 | 2 | 2 |
LSTM | Single model | Number of units in the hidden layers | 64 | 32 | 64 |
To measure the prediction accuracy of the developed approach, several evaluation criteria are used, including the RMSE and the MAPE; lower MAPE and RMSE indicate a more accurate approach. However, for a given prediction, MAPE treats actual outcomes above and below the prediction asymmetrically [
After the best network architecture for the prediction task was determined, the training set was used to train the LSTM-EL model until convergence, and evaluations were conducted on the test set. To analyze the generality of the hybrid LSTM-EL structure, we use daily, weekly, and monthly datasets; the weekly and monthly data are calculated as averages of the daily CBCFI. To avoid overfitting, early stopping with a validation set is used, and the training, validation, and test sets account for 60%, 20%, and 20% of the data, respectively.
For each forecasting case, the LSTM, LSTM-RF, LSTM-GBRT, GBRT, and RF models are applied, and all are evaluated using MAPE, RMSE, Dsta, and MASE. A rolling approach is used to produce next-day, next-week, and next-month CBCFI forecasts: to predict each point in the test set, the approach uses the actual values of the predictor variables from previous periods. Note that the time-lag is fixed, and new data are added for further
Illustration of rolling forecasting.
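The rolling scheme can be sketched as a loop over the test period in which the model only ever sees actual values observed before the target period and predicts from a window of fixed length. Refitting at every step is our assumption (the surviving text only says that new data are added); `fit` is a hypothetical callback that returns a `predict(window)` function, and the naive model below is for illustration only.

```python
def rolling_forecast(series, test_start, lag, fit):
    """One-step-ahead rolling forecasts over series[test_start:].

    fit(history) trains a model on the values observed so far and returns a
    function predict(window) -> float. Before each step the model is refit on
    all actual values up to the previous period, and it predicts from a window
    of fixed length `lag` (the time-lag does not change).
    """
    if test_start < lag:
        raise ValueError("test_start must be at least lag")
    predictions = []
    for t in range(test_start, len(series)):
        history = series[:t]          # only actual values before period t
        predict = fit(history)        # refit as new data are added
        predictions.append(predict(history[-lag:]))
    return predictions

# Toy usage with a naive "model" that repeats the last observed value.
naive_fit = lambda history: (lambda window: window[-1])
print(rolling_forecast([1.0, 2.0, 4.0, 7.0, 11.0], test_start=3, lag=2, fit=naive_fit))  # [4.0, 7.0]
```

Because each prediction uses actual rather than previously predicted values, errors do not compound across the test period.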
Note that in weekly forecasting, each point represents a weekly CBCFI value, and a new weekly value is calculated from every five new daily CBCFI values (workdays only). Likewise, in monthly forecasting, each point represents a monthly CBCFI value calculated from the working days of each new month, which automatically excludes weekends (Saturday and Sunday).
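The weekly and monthly aggregation described above can be sketched as follows. Assumptions: the daily input already contains workday values only, a trailing partial block of fewer than five values is dropped, and monthly averages are grouped by calendar month with weekends filtered out as a safeguard. The function names are hypothetical.

```python
from datetime import date

def weekly_average(daily_values, block=5):
    """Average every `block` consecutive workday values; a trailing partial block is dropped."""
    return [sum(daily_values[i:i + block]) / block
            for i in range(0, len(daily_values) - block + 1, block)]

def monthly_average(dated_values):
    """dated_values: iterable of (datetime.date, value). Averages per calendar month,
    skipping Saturdays and Sundays (weekday() 5 and 6)."""
    groups = {}
    for d, v in dated_values:
        if d.weekday() < 5:
            groups.setdefault((d.year, d.month), []).append(v)
    return {month: sum(vs) / len(vs) for month, vs in groups.items()}

print(weekly_average([float(v) for v in range(1, 11)]))  # [3.0, 8.0]
print(monthly_average([(date(2020, 1, 30), 10.0), (date(2020, 1, 31), 20.0),
                       (date(2020, 2, 1), 99.0),   # a Saturday, ignored
                       (date(2020, 2, 3), 30.0)]))  # {(2020, 1): 15.0, (2020, 2): 30.0}
```

Blocks of five published values only coincide with calendar weeks when there are no holidays; the sketch follows the five-value rule stated in the text.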
We next conduct the daily, weekly, and monthly CBCFI forecasting experiments. The CBCFI data from January 2012 to October 2020 are used as sample data. To evaluate the predictive performance of the LSTM-EL models, the data are split into training, validation, and test sets in a 6:2:2 ratio. Figure
MSE for validation and training data for (a) daily, (b) weekly, (c) monthly forecasting cases.
Table
Predictive performance of hybrid LSTM-EL and benchmark models for CBCFI prediction.
Forecasting case | Methods | MAPE (%) | RMSE | MASE | Dsta |
---|---|---|---|---|---|
Daily | LSTM | 9.15 | 0.2419 | 1.1532 | 0.5703 |
Daily | GBRT | 7.84 | 0.2342 | 0.8915 | 0.5145 |
Daily | LSTM-GBRT | 6.00 | 0.1369 | 0.9132 | 0.8893 |
Daily | RF | 8.62 | 0.2034 | 0.8370 | 0.5042 |
Daily | LSTM-RF | 6.50 | 0.1531 | 0.9560 | 0.8539 |
Weekly | LSTM | 9.47 | 0.1931 | 1.6532 | 0.4303 |
Weekly | GBRT | 8.99 | 0.1815 | 0.6173 | 0.5324 |
Weekly | LSTM-GBRT | 4.08 | 0.1232 | 0.9835 | 0.8034 |
Weekly | RF | 7.32 | 0.1673 | 0.5525 | 0.4853 |
Weekly | LSTM-RF | 4.43 | 0.1476 | 0.8941 | 0.8113 |
Monthly | LSTM | 11.53 | 0.2571 | 1.0126 | 0.5863 |
Monthly | GBRT | 10.89 | 0.2212 | 0.7940 | 0.5932 |
Monthly | LSTM-GBRT | 5.04 | 0.1099 | 0.9203 | 0.9024 |
Monthly | RF | 11.01 | 0.2236 | 0.7864 | 0.6072 |
Monthly | LSTM-RF | 4.25 | 0.0943 | 0.8825 | 0.9043 |
For weekly CBCFI forecasting, consistent with the daily case, the improvement rates are positive for both the error measures and Dsta. Relative to GBRT and RF, the MAPE improvements of LSTM-GBRT and LSTM-RF are 54.62% and 39.48%, the RMSE improvements are 32.12% and 11.77%, and the Dsta improvements are 50.91% and 67.17%, respectively. Moreover, the MAPE improvement of both LSTM-EL approaches is larger for weekly forecasting than for daily forecasting. Thus, using a hybrid structure in which the LSTM extracts the time-dependent characteristics of the features enhances accuracy, and the gain is more pronounced for weekly than for daily CBCFI forecasting.
Because predicting the long-term CBCFI from low-frequency data poses several challenges, this study examines how the hybrid approach improves the accuracy of monthly forecasting. Consistent with daily and weekly forecasting, the hybrid approaches outperform the single ensemble learning approaches. Notably, the hybrid structure improves precision markedly: the MAPE improvements of LSTM-GBRT and LSTM-RF over GBRT and RF are 53.72% and 61.40%, respectively, and the corresponding RMSE improvements are 50.32% and 57.83%. Moreover, the monthly MASE of both hybrid approaches is less than 1, indicating that they outperform the one-step naïve forecast.
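The improvement percentages quoted above are consistent with a simple relative-change formula applied to the values in the performance table; for example, the weekly MAPE values reproduce 54.62% and 39.48%. A small sketch with hypothetical helper names: for error measures (lower is better) the improvement is the relative reduction, and for Dsta (higher is better) it is the relative increase.

```python
def improvement_lower_better(benchmark, hybrid):
    """Percentage reduction of an error measure (MAPE, RMSE) relative to the benchmark."""
    return (benchmark - hybrid) / benchmark * 100.0

def improvement_higher_better(benchmark, hybrid):
    """Percentage increase of a score where higher is better (Dsta)."""
    return (hybrid - benchmark) / benchmark * 100.0

# Weekly MAPE, GBRT -> LSTM-GBRT and RF -> LSTM-RF, from the performance table:
print(round(improvement_lower_better(8.99, 4.08), 2))  # 54.62
print(round(improvement_lower_better(7.32, 4.43), 2))  # 39.48
# Monthly MAPE, GBRT -> LSTM-GBRT:
print(round(improvement_lower_better(10.89, 5.04), 2))  # 53.72
```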
Table
Predictive performance of hybrid LSTM-EL over three time scales.
Metric | Accuracy: daily | Accuracy: weekly | Accuracy: monthly | Improvement: daily | Improvement: weekly | Improvement: monthly |
---|---|---|---|---|---|---|
MAPE | 6.25% | 4.26% | 4.65% | 23.53% | 47.05% | 57.56% |
RMSE | 0.1450 | 0.1354 | 0.1021 | 33.14% | 21.95% | 54.08% |
Figure
Actual CBCFI and forecasting results through LSTM-EL approaches and corresponding EL approaches for (a) daily, (b) weekly, and (c) monthly data.
Figures
To evaluate whether there is any statistically significant difference between the hybrid and conventional models, the Diebold–Mariano (DM) test [
DM test results for hybrid models and the benchmarks.
Data type | Tested model | Ref.: LSTM | Ref.: GBRT | Ref.: RF | Ref.: LSTM-RF | Ref.: LSTM-GBRT |
---|---|---|---|---|---|---|
Daily CBCFI forecasting | LSTM | — | | | | |
Daily CBCFI forecasting | GBRT | 2.4328 | — | | | |
Daily CBCFI forecasting | RF | 2.4103 | −1.6784 | — | | |
Daily CBCFI forecasting | LSTM-RF | 2.5123 | 2.2341 | 2.1231 | — | |
Daily CBCFI forecasting | LSTM-GBRT | 2.2763 | 2.2910 | 2.8723 | −1.2432 | — |
Weekly CBCFI forecasting | LSTM | — | | | | |
Weekly CBCFI forecasting | GBRT | −2.7084 | — | | | |
Weekly CBCFI forecasting | RF | 2.2034 | −1.6110 | — | | |
Weekly CBCFI forecasting | LSTM-RF | 2.2361 | 2.3414 | 2.6535 | — | |
Weekly CBCFI forecasting | LSTM-GBRT | 2.2276 | 2.3012 | 2.5541 | 2.6287 | — |
Monthly CBCFI forecasting | LSTM | — | | | | |
Monthly CBCFI forecasting | GBRT | −4.5123 | — | | | |
Monthly CBCFI forecasting | RF | −5.2341 | −1.9883 | — | | |
Monthly CBCFI forecasting | LSTM-RF | −5.2011 | −6.1094 | −5.3312 | — | |
Monthly CBCFI forecasting | LSTM-GBRT | −5.1998 | −6.2014 | −6.4234 | −2.6536 | — |
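A minimal sketch of the Diebold–Mariano statistic for one-step-ahead forecasts is shown below. The loss function and sign convention used for the table are not stated in the surviving text, so this sketch assumes squared-error loss, forecast horizon h = 1 (no autocovariance correction), and d_t = loss of the tested model minus loss of the reference model, so a negative statistic favors the tested model. Under the normal approximation, |DM| > 1.96 indicates a significant difference at the 5% level. The small-sample correction of Harvey, Leybourne, and Newbold is not applied, and `dm_test` is a hypothetical name.

```python
import math

def dm_test(actual, pred_tested, pred_reference):
    """Diebold-Mariano test (squared-error loss, h = 1).

    Returns (statistic, two-sided p-value) using the normal approximation.
    A negative statistic means the tested model has the lower average loss.
    """
    d = [(y - a) ** 2 - (y - b) ** 2
         for y, a, b in zip(actual, pred_tested, pred_reference)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n   # h = 1: no autocovariance terms
    stat = mean_d / math.sqrt(var_d / n)            # fails if all loss differences are equal
    p_value = math.erfc(abs(stat) / math.sqrt(2.0)) # 2 * (1 - Phi(|stat|))
    return stat, p_value

# Toy usage (made-up forecasts): the tested model has much smaller errors.
stat, p = dm_test([0.0] * 4, [0.1, -0.1, 0.2, -0.2], [1.0, -1.5, 2.0, -1.0])
print(stat < 0, 0.0 < p < 1.0)  # True True
```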
Moreover, Figure
Feature importance for (a) daily, (b) weekly, and (c) monthly CBCFI forecasting cases.
Because the CBCFI is a freight rate index, supply and demand are also essential factors in addition to those discussed above. For the CBCFI, supply corresponds to the available bulk fleet, and demand corresponds to the amount of coal that needs to be shipped.
On the supply side, given the routes and ports covered by the CBCFI, supply corresponds to the available bulk fleet at Tianjin Port, Jintang Port, Qinhuangdao Port, Caofeidian Port, and Huanghua Port. However, we could not find any open-source fleet data.
On the demand side, specifically, the CBCFI represents the coal transportation need mainly from northern China to southern China. The national statistics (Figure
Energy consumption in China. (a) China’s energy consumption by source (data source: Samantha. W. Energy consumption in China from 2009 to 2019, by source
Besides, 90% of China's coal supply is domestically produced, mainly in northern China, and 10% is imported [
Note that the above demand and demand-related factors are only available on a monthly basis. Accordingly, southern utility electricity consumption, domestic steel production, domestic coal production, and coal imports are added to the monthly forecasting model.
Table
Improvement percentage of predictive performance with adding demand and demand-related factors.
Evaluation criteria | Models | Improvement percentage of monthly forecasting (%) |
---|---|---|
MAPE | LSTM | 12.67 |
MAPE | GBRT | 9.97 |
MAPE | LSTM-GBRT | 5.32 |
MAPE | RF | 9.81 |
MAPE | LSTM-RF | 5.16 |
RMSE | LSTM | 10.01 |
RMSE | GBRT | 8.90 |
RMSE | LSTM-GBRT | 5.18 |
RMSE | RF | 7.87 |
RMSE | LSTM-RF | 5.01 |
Feature importance for CBCFI monthly forecasting with adding demand and demand-related factors.
This study attempted to enhance the forecasting accuracy of the CBCFI by formulating a novel hybrid LSTM-EL approach that extracts useful time-dependent information from the data by combining LSTM with ensemble learning algorithms. A rolling forecasting approach is developed to assess the forecasting accuracy of LSTM-EL against the corresponding single ensemble algorithms. Furthermore, critical factors that influence CBCFI values are discussed, and experiments at daily, weekly, and monthly time scales are conducted within the rolling forecasting approach to test the generality of the LSTM-EL approaches.
The main intellectual contribution is a new method that combines artificial neural networks and ensemble learning to capture the nonlinear and nonstationary features of the shipping freight market. The empirical results show that domestic and overseas thermal coal spot rates and crude oil prices have clear impacts on daily, weekly, and monthly CBCFI values, while coal inventories and fuel oil prices are less important for the daily and weekly CBCFI. In terms of forecasting accuracy, the LSTM-EL approaches outperform the single EL models in all three time-scale forecasting cases and produce better results than the naïve forecasts. Moreover, the accuracy improvement of the LSTM-EL approaches is validated on CBCFI datasets at different time scales: averaged over the two hybrid models, the lowest MAPE is obtained in weekly forecasting (4.26%), and the largest improvement over the single EL models is obtained in monthly forecasting (57.56% in MAPE and 54.08% in RMSE). In addition, a DM test is implemented to evaluate whether there is a statistically significant difference between the hybrid and conventional models. The results show that the LSTM-based hybrid models differ significantly in predictive performance from the other models, whereas in daily forecasting there is no statistically significant difference between LSTM-RF and LSTM-GBRT. Overall, the LSTM-EL method shows strong potential for accurately predicting the CBCFI.
The proposed method can serve as an effective tool for chartering and shipping decisions under uncertainty and can be incorporated into the management toolkits of shipping industry practitioners. The developed method and results broaden freight rate forecasting research and suggest directions for future studies in related fields.
Deep-learning approaches can perform markedly better than conventional statistical approaches because they map the original information through nonlinear transformations, which captures more complex effects. Long short-term memory (LSTM), which builds on the recurrent neural network (RNN), shows outstanding ability in time series prediction. Ensemble learning methods, on the other hand, are machine learning techniques that combine several base models to reduce sources of learning error such as noise, bias, and variance, thereby improving overall predictive performance. This paper focuses on two prevailing ensemble approaches: (i) random forest (RF) and (ii) gradient boosting regression tree (GBRT).

Long short-term memory (LSTM)

LSTM, developed by Hochreiter and Schmidhuber, is a special type of recurrent neural network (RNN) [

LSTM can add information to or remove information from its cell state, and this is controlled by gates. The standard LSTM can be expressed as follows:

$$f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right)$$
$$i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right)$$
$$\tilde{c}_t = \tanh\left(W_c [h_{t-1}, x_t] + b_c\right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right)$$
$$h_t = o_t \odot \tanh(c_t)$$

where $x_t$ is the input at step $t$, $h_{t-1}$ and $c_{t-1}$ are the previous hidden and cell states, $W$ and $b$ are weight matrices and bias vectors, $\sigma$ is the sigmoid function, and $\odot$ denotes element-wise multiplication. At each step, the forget gate $f_t$ determines which information in the previous cell state is discarded. The input gate $i_t$ determines which new input information, represented by the candidate state $\tilde{c}_t$, is written to the cell state. The output gate $o_t$ controls which data are output: after the memory cell state $c_t$ is updated, it is passed through tanh and multiplied by $o_t$ to give the hidden state $h_t$.

Random Forest (RF)

RF is an ensemble learning algorithm composed of decision trees, introduced by Breiman [

Gradient boosting regression tree (GBRT)

In terms of an established set where
LSTM structure diagram.
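The standard LSTM gate equations can be written out directly. The stdlib-only sketch below performs one forward step and runs a time-lag window through the cell to obtain the last hidden state, which in the LSTM-EL design plays the role of the embedding feature vector passed to GBRT/RF. The weight layout (one matrix per gate acting on the concatenation [h_{t-1}, x_t]) and all names are our assumptions, and training (backpropagation through time) is not shown.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step. params maps 'f', 'i', 'c', 'o' to (W, b); each W has
    `hidden` rows of length hidden + input, applied to [h_prev, x_t]."""
    z = list(h_prev) + list(x_t)

    def layer(name, activation):
        W, b = params[name]
        return [activation(s + bias) for s, bias in zip(matvec(W, z), b)]

    f = layer("f", sigmoid)           # forget gate
    i = layer("i", sigmoid)           # input gate
    c_tilde = layer("c", math.tanh)   # candidate cell state
    o = layer("o", sigmoid)           # output gate
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c

def lstm_embed(sequence, params, hidden):
    """Run the LSTM over a time-lag window and return the last hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, params)
    return h

# Toy usage: hidden size 1, input size 1, all weights and biases zero,
# so every gate equals sigmoid(0) = 0.5 and the candidate state is tanh(0) = 0.
zero = {g: ([[0.0, 0.0]], [0.0]) for g in "fico"}
h, c = lstm_step([2.0], [0.0], [1.0], zero)
print(c)  # [0.5]
```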
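The model performance evaluation criteria MAPE and RMSE are formulated as follows:

$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{t=1}^{N}\left|\frac{y_t - \hat{y}_t}{y_t}\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(y_t - \hat{y}_t\right)^2},$$

where $y_t$ and $\hat{y}_t$ are the actual and predicted CBCFI values at time $t$ and $N$ is the number of test observations. The directional statistic Dsta is defined as

$$\mathrm{Dsta} = \frac{1}{N}\sum_{t=1}^{N} a_t, \qquad a_t = \begin{cases} 1, & (y_t - y_{t-1})(\hat{y}_t - y_{t-1}) \ge 0, \\ 0, & \text{otherwise.} \end{cases}$$

The range of Dsta is [0, 1]. The closer Dsta is to 1, the more accurately the approach predicts the direction of change, and vice versa.

In addition, we use the mean absolute scaled error (MASE) to assess whether the proposed prediction approach outperforms the naïve forecasting method [

$$\mathrm{MASE} = \frac{\frac{1}{N}\sum_{t=1}^{N}\left|y_t - \hat{y}_t\right|}{\frac{1}{T-1}\sum_{s=2}^{T}\left|y_s - y_{s-1}\right|},$$

where the denominator is the mean absolute error of the one-step naïve forecast over the $T$ in-sample observations $y_1, \ldots, y_T$; a MASE below 1 means the approach is more accurate than the naïve forecast.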
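The four evaluation criteria can be computed with the stdlib-only sketch below. It follows the formulas in this appendix under two stated assumptions: Dsta measures both the actual and the predicted direction from the previous actual value (the last in-sample value for the first test point), and MASE is scaled by the in-sample one-step naïve error. Function names are hypothetical.

```python
import math

def mape(y, yhat):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, yhat)) / len(y))

def dsta(y, yhat, y_last_in_sample):
    """Share of test points whose predicted direction matches the actual direction,
    both measured from the previous actual value."""
    prev = [y_last_in_sample] + list(y[:-1])
    hits = sum(1 for a, p, q in zip(y, yhat, prev) if (a - q) * (p - q) >= 0)
    return hits / len(y)

def mase(y, yhat, y_train):
    """Test MAE scaled by the in-sample MAE of the one-step naive forecast."""
    mae = sum(abs(a - p) for a, p in zip(y, yhat)) / len(y)
    naive_mae = sum(abs(y_train[s] - y_train[s - 1])
                    for s in range(1, len(y_train))) / (len(y_train) - 1)
    return mae / naive_mae

# Toy usage (made-up values):
print(round(mape([100.0, 200.0], [110.0, 180.0]), 6))  # 10.0
print(dsta([2.0, 3.0], [3.0, 1.0], 1.0))               # 0.5
```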
The data used here come from the public Wind Economic Database, IEA, National Bureau of Statistics of China, and Statista, and can be accessed from
The authors declare that they have no conflicts of interest.
This work was financially supported by the Key Research and Development Plan of the Ministry of Science and Technology, China (no. 2018YFB1601402), the Energy Foundation (project: Preliminary Investigation and Research on Freight Industry, no. VQ99QT1136Y17005), and the Open Research Fund for National Engineering Laboratory of Integrated Transportation Big Data Application Technology (no. CTBDAT201912).