Prediction of byproduct gas flow is of great significance to gas system scheduling in iron and steel plants. To quantify the associated prediction uncertainty, a two-step approach based on an optimized twin extreme learning machine (ELM) is proposed to construct prediction intervals (PIs). In the first step, the connection weights of the twin ELM are pretrained using a pair of symmetric weighted objective functions. In the second step, the output weights of the twin ELM are further optimized by particle swarm optimization (PSO). The objective function is designed to comprehensively evaluate PIs based on their coverage probability, width, and deviation. The capability of the proposed method is validated using four benchmark datasets and two real-world byproduct gas datasets. The results demonstrate that the proposed approach constructs higher-quality prediction intervals than three conventional methods.
In the iron and steel industry, the utilization of byproduct gas is of great significance to reduce the cost of fuel consumption and greenhouse gas emissions [
In recent years, artificial intelligence methods such as neural networks (NNs) [
However, most of these models focus on point forecasting, which provides only deterministic forecasts with no information about the prediction probability. In practice, there is a large amount of uncertainty originating from temperature fluctuations and measurement errors, for example. The forecasting uncertainty will affect the decision-making process and increase the risk of scheduling, so it is imperative to quantify the uncertainty of prediction. The prediction interval (PI) is a well-known tool for quantifying the uncertainty of prediction. The PI provides not only a range within which the target values are highly likely to lie but also an indication of their accuracy [
Although there have been only a few studies on byproduct gas flow interval forecasting, the construction of PIs in other application areas [
However, when using NNs to perform interval forecasting, the initial values of the connection weights are usually generated randomly [ A twin ELM is adopted to construct PIs and the output weights of the twin ELM are pretrained using a pair of symmetric weighted objective functions. The pretraining method offers reasonable initial values and, as a characteristic of the ELM, only the output weights need to be tuned, which benefits the subsequent optimization process. A modified cost function called CWDC that considers the deviation of the PIs is proposed. CWDC provides a more comprehensive description of the PI performance. Experiments based on four benchmark datasets and two real-world byproduct gas flow datasets are performed to illustrate the capability of the proposed approach.
The remainder of this paper is organized as follows. Section
ELM is a type of single-layer neural network proposed by Huang et al. [
The objective function for training the ELM is
According to the Moore-Penrose inverse theorem,
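As a minimal sketch of the basic ELM training idea, assuming a sigmoid hidden layer and input weights drawn uniformly from [-1, 1] (both are illustrative choices, not taken from the paper), the output weights can be obtained in one step from the Moore-Penrose pseudoinverse of the hidden-layer output matrix. The names `elm_fit` and `elm_predict` and the toy data are hypothetical.

```python
import numpy as np

def elm_fit(X, T, n_hidden, rng):
    """Fit a basic single-hidden-layer ELM; returns (W, b, beta)."""
    n_features = X.shape[1]
    # Assumption: input weights and biases are random and never updated.
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Assumption: sigmoid activation for the hidden layer.
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    # Least-squares output weights via the Moore-Penrose pseudoinverse.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Evaluate the fitted ELM on new inputs."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy regression problem (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
T = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1]
W, b, beta = elm_fit(X, T, n_hidden=30, rng=rng)
train_rmse = float(np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2)))
```

Because only `beta` is solved for, training reduces to a single linear least-squares problem, which is the property the later optimization stage relies on.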
Given a set
To evaluate the performance of PIs, PI coverage probability (PICP) and PI normalized average width (PINAW) are two typical indicators. The PICP is used to evaluate the reliability of the constructed PIs and is defined as
According to the concept of a PI, the value of the PICP is expected to be greater than or equal to the predetermined confidence level; otherwise, the PIs are invalid.
A relatively high PICP can be easily achieved if the width of the PIs is sufficiently large, but wider PIs are less informative in practice. The PINAW, which can quantitatively describe the width of the PIs, is defined as
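As a minimal sketch, and assuming the definitions commonly used in the PI literature (PICP as the fraction of targets that fall inside their intervals, PINAW as the mean interval width divided by the range of the targets), the two indices can be computed as follows. The function names and the toy arrays are illustrative and not taken from the paper.

```python
import numpy as np

def picp(y, lower, upper):
    """Fraction of targets covered by [lower, upper]."""
    return float(np.mean((y >= lower) & (y <= upper)))

def pinaw(y, lower, upper):
    """Mean interval width, normalized by the range of the targets."""
    target_range = np.max(y) - np.min(y)
    return float(np.mean(upper - lower) / target_range)

# Toy example: the third target (3.0) lies below its lower bound (3.2).
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
lower = np.array([0.5, 1.5, 3.2, 3.5, 4.0])
upper = np.array([1.5, 2.5, 4.2, 4.5, 5.5])

coverage = picp(y, lower, upper)   # 4 of 5 targets covered
width = pinaw(y, lower, upper)     # mean width 1.1 over a target range of 4.0
```

Widening every interval would raise `coverage` toward 1 but would also raise `width`, which is the trade-off between the two indices.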
To construct PIs to provide satisfactory performance, a higher PICP and lower PINAW are expected. However, the two indices conflict with each other. To find a compromise between them, a combined measure called coverage width-based criterion (CWC) has been proposed [
Our proposed method is derived from the LUBE method. As ELM is a new type of NN with simple structure and fast training speed, we adopt a twin ELM to construct the lower and upper bounds of PIs. The overall structure of the proposed method is shown in Figure
The overall structure of the proposed method.
Flowchart of the proposed method.
In the first step, the connection weights of the twin ELM are pretrained using a pair of symmetric weighted objective functions to construct raw prediction intervals. By using a symmetric pair of weights, two ELM models (
Then, the second step is to improve the quality of the PIs by further adjusting the output weights based on PSO.
In the LUBE method, CWC provides a good compromise between PICP and PINAW, but it cannot describe the deviation of the PIs. To evaluate PIs comprehensively, a modified cost function called CWDC that considers the deviation of the PIs is proposed. The CWDC is defined as
The training process is to minimize the objective function. However, when the PIs are poor-quality with zero-width (PINAW = 0%) and extremely low coverage probability (PICP = 0%), CWC will equal the minimum value 0. This will lead to a misleading evaluation of PIs. As for CWDC, by taking the deviation of PIs into consideration, the problem can be solved.
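The degenerate case can be checked numerically. The sketch below assumes the multiplicative form of CWC that is common in the LUBE literature, CWC = PINAW (1 + γ(PICP) exp(-η (PICP - μ))), with γ = 1 when PICP < μ and 0 otherwise; the values μ = 0.90 and η = 50 are illustrative assumptions, not the paper's settings.

```python
import math

def cwc(picp, pinaw, mu=0.90, eta=50.0):
    """Assumed multiplicative CWC; mu is the nominal coverage, eta a penalty factor."""
    gamma = 1.0 if picp < mu else 0.0
    return pinaw * (1.0 + gamma * math.exp(-eta * (picp - mu)))

# Zero-width intervals that cover nothing: the coverage penalty is huge,
# but it is multiplied by PINAW = 0, so the cost collapses to zero.
degenerate = cwc(picp=0.0, pinaw=0.0)

# A valid, reasonably narrow PI receives a larger (worse) cost.
reasonable = cwc(picp=0.92, pinaw=0.30)
```

Under this assumed form, a minimizer would prefer the useless zero-width intervals over the valid ones, which is the misleading evaluation described above.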
In the proposed method, the pretraining is designed to obtain reasonable initial values for the output weights before optimization. In the LUBE method, the initial values of the connection weights are usually generated randomly, and the training process is generally slow because the search space is large. In this paper, the output weights of the twin ELM are pretrained using a pair of symmetric weighted objective functions. At the beginning of the subsequent optimization process, the initial population of PSO is generated around the pretrained output weights, which serve as reasonable initial values.
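One simple way to generate a population around the pretrained weights is to perturb them with small Gaussian noise. The sketch below is only an illustration of that idea: the noise scale `spread`, the function name `init_swarm`, the example weight values, and the choice to stack the output weights of both ELMs into one particle are all assumptions rather than details given in the paper.

```python
import numpy as np

def init_swarm(beta_pre, n_particles, spread, rng):
    """Return (n_particles, dim) particle positions centered on beta_pre."""
    noise = spread * rng.standard_normal((n_particles, beta_pre.size))
    swarm = beta_pre + noise
    swarm[0] = beta_pre  # keep the pretrained solution itself in the swarm
    return swarm

rng = np.random.default_rng(0)
# Hypothetical pretrained output weights of the lower-bound and upper-bound ELMs.
beta_lower = np.array([0.2, -0.1, 0.4])
beta_upper = np.array([0.5, 0.3, 0.9])
# Assumption: one particle encodes the output weights of both ELMs.
beta_pre = np.concatenate([beta_lower, beta_upper])

swarm = init_swarm(beta_pre, n_particles=20, spread=0.05, rng=rng)
```

Keeping the unperturbed pretrained vector as one particle means the best initial cost can never be worse than the cost of the pretrained PIs.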
The details of the pretraining process are elaborated as follows. For a given set
The residual
The twin ELM consists of two ELMs, which have the same input weights and bias as shown in Figure
In the above two equations,
The Lagrangian of (
According to (
The method for solving (
The optimization stage is performed to further adjust the connection weights of the pretrained twin ELM. The goal of the optimization is to achieve the PIs of the best quality through optimizing the connection weights with respect to the objective function (
Suppose that there are a group of
In (
The details of how PSO tunes the output weights of the twin ELM are as follows:
(1) Initialize the parameters of the modified PSO. Specify the maximum number of iterations, the population size, and the parameters of the flying process, including the inertia weight
(2) Initialize the first population. Note that each particle in the population represents a set of output weights, including
(3) Calculate the fitness value of the target. For each particle in the population, the fitness value is calculated using (
(4) Update the optimal values. The local best position
(5) Update the particle position and velocity according to (
(6) Repeat steps (3)–(5) until the iteration number reaches the preset threshold value. Then, return the smallest value of the cost function CWDC and the best positions
(7) Evaluate the PIs that are generated by the optimal twin ELM based on the test data.
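The loop described in these steps can be sketched with a textbook global-best PSO. This is not the modified PSO used in the paper: the update rule is the standard one, the parameter values `w`, `c1`, and `c2` are illustrative, and a toy quadratic function stands in for the CWDC fitness.

```python
import numpy as np

def pso_minimize(fitness, init_positions, n_iter, rng, w=0.7, c1=1.5, c2=1.5):
    """Standard global-best PSO; returns (best_position, best_value)."""
    x = init_positions.copy()
    v = np.zeros_like(x)
    # Personal bests start at the initial positions.
    pbest = x.copy()
    pbest_val = np.array([fitness(p) for p in x])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    for _ in range(n_iter):
        # Velocity and position update.
        r1 = rng.random(x.shape)
        r2 = rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        # Fitness evaluation for every particle.
        vals = np.array([fitness(p) for p in x])
        # Update personal and global bests.
        improved = vals < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    return gbest, gbest_val

# Toy stand-in for the PI cost: squared distance to a known target vector.
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])
toy_fitness = lambda p: float(np.sum((p - target) ** 2))
start = rng.normal(0.0, 1.0, size=(20, 3))
initial_best = min(toy_fitness(p) for p in start)
best_pos, best_val = pso_minimize(toy_fitness, start, n_iter=100, rng=rng)
```

In the setting of this paper, each row of `init_positions` would hold a candidate set of output weights, and `fitness` would build the PIs from those weights and return their cost on the training data.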
The effectiveness of the proposed method is tested on four benchmarks. The details of the four benchmarks are described as follows.
Case 1 is a 5D mathematical function that is usually used for evaluating regression models, and it is defined as
Case 2 is a synthetic dataset used to model the data with heteroscedastic noise. The dataset is generated by the following function:
Case 3 investigates the relationship between the personal characteristics, dietary factors, and plasma levels of beta-carotene [
Case 4 is taken from the UCI repository [
The comparison results of various prediction methods are discussed in this section. To evaluate the performance of the pretraining method, the proposed method and the ELM without pretraining are first compared. Then the proposed method, which uses CWDC as the objective function, is compared with the ELM using CWC (ELM-CWC method). The effectiveness of the proposed method is also demonstrated by comparing it with the bootstrap method and the Bayesian method. The PI evaluation indices PICP, PINAW, and PINAD for the four cases are shown in the tables. Considering the inherent randomness of PSO and ELM, each experiment is repeated 10 times for all the methods, and the median values of the indices are reported.
The parameters of the PSO are assigned as follows:
For each dataset, the training set includes 200 samples and an additional 100 observations are used for testing the models. To determine the optimal number of hidden nodes of ELM,
In the proposed approach, the pretraining method is designed to provide reasonable initial values for the subsequent optimization process. To evaluate the effectiveness of the pretraining strategy, a randomly initialized ELM (RI-ELM method) is employed as a comparative approach. In RI-ELM, all the weights of the ELM are initialized randomly and the output weights are directly tuned by PSO. For a fair comparison, CWDC is employed as the cost function of both methods.
The iterative processes of PSO training of the two methods in the four cases are shown in Figure
Comparisons of proposed method and RI-ELM.
Case | Method | PICP | PINAW | PINAD |
---|---|---|---|---|
Case 1 | RI-ELM | 90.50% | 64.59% | 0.66% |
Case 1 | Proposed method | 91.50% | 62.14% | 0.61% |
Case 2 | RI-ELM | 91.50% | 30.84% | 0.28% |
Case 2 | Proposed method | 93.00% | 29.64% | 0.27% |
Case 3 | RI-ELM | 90.50% | 34.59% | 1.66% |
Case 3 | Proposed method | 91.00% | 30.18% | 1.63% |
Case 4 | RI-ELM | 91.50% | 24.88% | 0.14% |
Case 4 | Proposed method | 92.00% | 22.48% | 0.14% |
Training processes of proposed method and RI-ELM for case 1 (a), case 2 (b), case 3 (c), and case 4 (d).
To test the effectiveness of the proposed criterion CWDC, the proposed method is compared with the ELM using CWC (ELM-CWC). Because of its superior performance, the pretraining-based ELM is used to construct the PIs. The results of ELM-CWC and the proposed method are summarized in Table
Comparisons of proposed method and ELM-CWC.
Case | Method | PICP | PINAW | PINAD |
---|---|---|---|---|
Case 1 | ELM-CWC | 90.00% | 62.98% | 0.70% |
Case 1 | Proposed method | 91.50% | 62.14% | 0.61% |
Case 2 | ELM-CWC | 91.50% | 29.80% | 0.32% |
Case 2 | Proposed method | 93.00% | 29.64% | 0.27% |
Case 3 | ELM-CWC | 91.00% | 33.60% | 1.75% |
Case 3 | Proposed method | 91.50% | 30.18% | 1.63% |
Case 4 | ELM-CWC | 91.00% | 23.63% | 0.16% |
Case 4 | Proposed method | 92.00% | 22.48% | 0.14% |
PINAD is a new index that is considered in CWDC and is proposed to show the deviation of the PIs. PINAD and PICP are correlated to some extent: a higher PICP implies that more prediction points lie in the constructed PIs, which may lead to a smaller PINAD. However, the PICP can only represent the proportion of the prediction points lying in the constructed PIs, while the PINAD shows how far the points that fall outside the constructed PIs deviate from them. Thus, the PINAD is a necessary supplement to the PICP.
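A small numerical example makes the distinction concrete. The sketch below assumes that PINAD is the mean distance from each uncovered target to the nearest interval bound (zero for covered targets), normalized by the range of the targets; this reading of the index, the function names, and the toy data are assumptions.

```python
import numpy as np

def picp(y, lower, upper):
    """Fraction of targets covered by [lower, upper]."""
    return float(np.mean((y >= lower) & (y <= upper)))

def pinad(y, lower, upper):
    """Assumed PINAD: mean distance of uncovered targets to the nearest bound,
    normalized by the range of the targets."""
    below = np.maximum(lower - y, 0.0)   # shortfall under the lower bound
    above = np.maximum(y - upper, 0.0)   # excess over the upper bound
    return float(np.mean(below + above) / (np.max(y) - np.min(y)))

y = np.array([0.0, 1.0, 2.0, 3.0, 10.0])

# PI set A: the last interval [3.5, 4.5] misses the target 10.0 by 5.5.
lower_a = np.array([-0.5, 0.5, 1.5, 2.5, 3.5])
upper_a = lower_a + 1.0

# PI set B: same widths, but the last interval [8.5, 9.5] misses by only 0.5.
lower_b = lower_a.copy()
lower_b[-1] = 8.5
upper_b = lower_b + 1.0

picp_a, picp_b = picp(y, lower_a, upper_a), picp(y, lower_b, upper_b)
pinad_a, pinad_b = pinad(y, lower_a, upper_a), pinad(y, lower_b, upper_b)
```

Both sets cover four of the five targets and have identical widths, so PICP and PINAW cannot tell them apart, whereas the deviation index is eleven times larger for set A.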
PIs obtained by the proposed method of the four case studies are illustrated in Figure
PIs obtained by the proposed method for Case 1 (a), Case 2 (b), Case 3 (c), and Case 4 (d).
As shown in Table
Test results in the four benchmark datasets.
Methods | Index | Case 1 PICP | Case 1 PINAW | Case 1 PINAD | Case 2 PICP | Case 2 PINAW | Case 2 PINAD | Case 3 PICP | Case 3 PINAW | Case 3 PINAD | Case 4 PICP | Case 4 PINAW | Case 4 PINAD |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Bayesian | Median | 91.00% | 65.78% | 0.79% | 86.00% | 31.37% | 0.44% | 94.50% | 37.90% | 1.49% | 94.00% | 26.50% | 0.11% |
Bayesian | STDEV | 1.48% | 0.78% | 0.10% | 1.29% | 0.86% | 0.05% | 0.98% | 1.31% | 0.12% | 1.57% | 0.85% | 0.03% |
Bootstrap | Median | 88.00% | 62.50% | 0.78% | 92.00% | 29.98% | 0.28% | 91.50% | 36.93% | 1.69% | 92.00% | 24.11% | 0.17% |
Bootstrap | STDEV | 1.52% | 0.79% | 0.09% | 1.16% | 0.72% | 0.04% | 0.94% | 1.37% | 0.14% | 1.52% | 0.87% | 0.04% |
Proposed | Median | 91.50% | 62.14% | 0.61% | 93.00% | 29.64% | 0.27% | 91.50% | 30.18% | 1.63% | 92.00% | 22.48% | 0.14% |
Proposed | STDEV | 1.45% | 0.65% | 0.09% | 1.06% | 0.60% | 0.04% | 0.82% | 1.17% | 0.10% | 1.49% | 0.75% | 0.02% |
As reported in Table
In this section, the proposed method is used to solve a real-world problem; in this case, the method is used to handle the uncertainty of blast furnace gas (BFG) generation and consumption prediction in the iron and steel industry.
BFG is an important type of byproduct gas from the steel production process; it is used for coking, sintering, hot-rolling, and other production processes. The historical datasets of two typical BFG generation and consumption processes are chosen as case studies to illustrate the advantages of the proposed method in practical applications. The first dataset covers BFG generation. The second dataset covers BFG consumption in a hot-rolling process. Both datasets come from the energy management center of a steel plant in Jiangsu Province, China. The sampling interval is 5 min, and each dataset contains 288 sampling points. Each dataset is divided into two parts: a training set (188 points) and a test set (100 points). One-step-ahead forecasting using the proposed method is implemented and compared with the other three methods.
Inputs of an NN are usually determined based on the characteristics of the system. BFG generation and consumption prediction can be seen as a time series prediction problem. The autocorrelation function (ACF) and partial autocorrelation function (PACF) are adopted as the criteria to determine the time lags of the input. The ACF and PACF of the BFG generation are displayed in Figures
ACF of BFG generation series.
PACF of BFG generation series.
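Lag selection of this kind can be sketched as follows. The example computes only the sample ACF (not the PACF), keeps the lags whose autocorrelation exceeds the usual approximate 95% band of ±1.96/√n, and builds the lagged input matrix for one-step-ahead forecasting. The synthetic AR(1) series, the band rule, and the function names are assumptions used for illustration; they are not the plant data or the paper's exact procedure.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = float(np.dot(xc, xc))
    return np.array([np.dot(xc[: len(x) - k], xc[k:]) / denom
                     for k in range(max_lag + 1)])

def make_lagged(x, lags):
    """Build inputs [x[t-k] for k in lags] and targets x[t]."""
    x = np.asarray(x, dtype=float)
    max_lag = max(lags)
    X = np.column_stack([x[max_lag - k: len(x) - k] for k in lags])
    y = x[max_lag:]
    return X, y

# Synthetic stand-in for a 288-point gas flow series (AR(1), illustrative).
rng = np.random.default_rng(0)
n = 288
series = np.zeros(n)
for t in range(1, n):
    series[t] = 0.8 * series[t - 1] + rng.normal()

acf = sample_acf(series, max_lag=10)
band = 1.96 / np.sqrt(n)                      # approximate 95% significance band
lags = [k for k in range(1, 11) if abs(acf[k]) > band]
X, y = make_lagged(series, lags)

# Tiny deterministic check of the lag layout: row 0 is [x[2], x[1], x[0]] -> x[3].
X_small, y_small = make_lagged(np.arange(10.0), [1, 2, 3])
```

Each row of `X` can then be fed to the twin ELM as the input vector for predicting the corresponding entry of `y`.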
For these two cases, the parameters of the PSO are assigned as follows:
PIs with PINC = 95% obtained by the proposed method are shown in Figures
Relative to the Bayesian method, the proposed method provides satisfactory PICPs at all confidence levels in both cases. For example, at PINC = 90%, the PICPs of the PIs constructed by the proposed method are 91% for the blast furnace and 91% for the hot-rolling process, whereas the PICPs of the PIs constructed by the Bayesian method are 87% for the blast furnace and 89% for the hot-rolling process, both lower than the PINC. This result occurs because the Bayesian method makes assumptions about the error distributions, and these assumptions do not always hold in practice. In this respect, the proposed method performs better than the Bayesian method.
In both case studies, the proposed method provides better performance than the bootstrap method. Taking the blast furnace as an example, the tabulated PINAWs of the proposed method are 2.80, 3.08, and 6.58 percentage points lower than those of the bootstrap method at the three confidence levels (45.36% versus 48.16%, 53.73% versus 56.81%, and 65.54% versus 72.12%). This observation shows that the proposed method generates more informative PIs than the bootstrap method.
The PIs constructed by the proposed method are narrower than those of the LUBE method, and their PINADs are lower than or equal to those of the LUBE method. For example, in the hot-rolling process, the tabulated PINADs of the proposed method are 0.06, 0.05, and 0.01 percentage points lower than those of the LUBE method at the three confidence levels (0.44% versus 0.50%, 0.22% versus 0.27%, and 0.03% versus 0.04%). The reason is that the CWDC proposed in this study considers the deviation of the PIs.
Comparisons of different methods for BFG generation.
PINC | Method | PICP | PINAW | PINAD |
---|---|---|---|---|
90% | Bayesian | 87.00% | 43.22% | 0.60% |
90% | Bootstrap | 90.00% | 48.16% | 0.57% |
90% | LUBE | 90.00% | 46.45% | 0.52% |
90% | Proposed method | 91.00% | 45.36% | 0.46% |
95% | Bayesian | 95.00% | 57.50% | 0.16% |
95% | Bootstrap | 96.00% | 56.81% | 0.20% |
95% | LUBE | 96.00% | 54.33% | 0.21% |
95% | Proposed method | 96.00% | 53.73% | 0.18% |
99% | Bayesian | 99.00% | 70.74% | 0.01% |
99% | Bootstrap | 99.00% | 72.12% | 0.01% |
99% | LUBE | 99.00% | 68.56% | 0.01% |
99% | Proposed method | 99.00% | 65.54% | 0.01% |
Comparisons of different methods for BFG consumption in the hot-rolling process.
PINC | Method | PICP | PINAW | PINAD |
---|---|---|---|---|
90% | Bayesian | 89.00% | 32.76% | 0.47% |
90% | Bootstrap | 90.00% | 35.71% | 0.52% |
90% | LUBE | 91.00% | 33.52% | 0.50% |
90% | Proposed method | 91.00% | 32.29% | 0.44% |
95% | Bayesian | 96.00% | 41.90% | 0.23% |
95% | Bootstrap | 96.00% | 42.87% | 0.20% |
95% | LUBE | 95.00% | 40.90% | 0.27% |
95% | Proposed method | 96.00% | 39.06% | 0.22% |
99% | Bayesian | 99.00% | 59.17% | 0.03% |
99% | Bootstrap | 99.00% | 58.50% | 0.04% |
99% | LUBE | 99.00% | 55.62% | 0.04% |
99% | Proposed method | 99.00% | 54.60% | 0.03% |
PIs with PINC = 95% obtained by the proposed method for BFG generation.
PIs with PINC = 95% obtained by the proposed method for BFG consumption in the hot-rolling process.
According to the results of the analysis, all four methods can be used to construct PIs for byproduct gas generation and consumption forecasting. The PIs generated by the four methods in most cases have valid PICPs and reasonable PINAWs and PINADs. However, it is clear that the proposed approach constructs higher quality PIs than the other three conventional methods. Thus the proposed method is an effective way to improve the quality of PIs for byproduct gas flow forecasting applications.
To quantify the uncertainties of byproduct gas forecasting, a two-step approach based on a twin ELM is proposed to construct high-quality PIs. The twin ELM is first pretrained using a pair of symmetric weighted objective functions. Then, the output weights are further adjusted by PSO. During the procedure, a novel objective function, which accounts for the PI coverage probability, width, and deviation, is constructed to obtain optimal PIs. The effectiveness of the proposed method has been verified through tests and comparisons with well-established methods using both benchmark datasets and real-world byproduct gas datasets. The results show that the proposed method achieves valid PICPs and narrow PINAWs together with small PINADs, and thus constructs high-quality PIs for byproduct gas forecasting applications. The constructed PIs can be applied to gas system scheduling for making reliable decisions.
Although we have only verified the effectiveness of the proposed method in iron and steel plants, it could be easily extended to other fields such as wind speed and electric load forecasting. Furthermore, as with other applications of ELM, the performance of the proposed method might be improved through the combination of the proposed method with a structure selection strategy, for example, a pruning strategy [
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This research is supported by the Key Research Program of the Chinese Academy of Sciences Grant no. KFZD-SW-415 and Science and Technology Service Network Initiative no. KTJ-SW-STS-159.