Data-Driven Dynamic Modeling for Prediction of Molten Iron Silicon Content Using ELM with Self-Feedback

Silicon content ([Si] for short) of the molten metal is an important index reflecting the product quality and thermal status of the blast furnace (BF) ironmaking process. Since online detection of [Si] is difficult and the offline assay procedure involves a large time delay, quality modeling is required to achieve online estimation of [Si]. Focusing on this problem, a data-driven dynamic modeling method is proposed using an improved extreme learning machine (ELM) with the help of principal component analysis (PCA). First, data-driven PCA is introduced to pick out the most pivotal variables from the many influencing factors to serve as the secondary variables of modeling. Second, a novel data-driven ELM modeling technique with good generalization performance and nonlinear mapping capability is presented by applying a self-feedback structure to the traditional ELM. The fed-back outputs at previous times, together with input variables at different times, constitute a dynamic ELM structure that has a memory for handling data at different times and overcomes the static modeling limitation of the traditional ELM. Finally, industrial experiments demonstrate that the proposed method has better modeling and estimation accuracy as well as faster learning speed than other modeling methods with different model structures.


Introduction
The blast furnace (BF) is a giant countercurrent reactor and heat exchanger in the metallurgical industry and is the first step towards the production of steel [1]. During BF ironmaking, the solid raw materials, including iron ore and coke, are charged layer by layer from the top of the BF, while compressed air and some auxiliary fuels are introduced through tuyeres located just above the hearth for smelting to produce molten iron. Complex chemical reactions and transport phenomena take place in the different zones from the top to the bottom of the BF. Gas-solid, solid-solid, and solid-liquid phases interact in it, accompanied by high temperature, high pressure, multiphase coupling, and multiple coexisting physical fields [1][2][3]. As one of the most complex industrial reactors, the BF has received broad interest both theoretically and experimentally owing to its complexity and the key role of the iron and steel industry in the national economy. However, the operation and control of an industrial BF remains a serious problem and still relies on manual operation based on the experience of foremen [1,2]. So far, there remain open problems both in metallurgy and in control engineering, such as closed-loop control or operational optimization for the whole BF ironmaking process [4][5][6].
Undoubtedly, the most crucial obstacle to closed-loop control of the BF is that current regular instruments cannot meet the need for online measurement of molten iron quality, such as the silicon content ([Si]) in the final hot metal. In the past decades, through continuous efforts, a great number of models and algorithms have been developed to tackle the modeling problem for silicon content prediction. These existing methods include linear model based methods such as ARX and ARMAX models [6][7][8], partial least squares based methods [9], and nonlinear intelligent methods such as artificial neural network (ANN) models [10][11][12] and support vector machine (SVM) models [1,2,13,14]. Though these methods have achieved some success in practical application, most of these studies focus only on static modeling for [Si] prediction, while little attention has been paid to dynamic modeling of this quality parameter.
The BF ironmaking process is a complicated dynamic system with many influential factors and a large time lag. To capture the system dynamics, the time series and time delays of the relevant input and output variables should be taken into account during process modeling. This also means that the existing static prediction models cannot capture the nonlinear process dynamics well and thus do not provide accurate estimates. Therefore, a self-feedback structure, which turns the model into a dynamic system, is particularly important for a BF system with strongly nonlinear dynamics and a large time lag. Moreover, most of the existing prediction models are trained by gradient-based algorithms such as the back propagation (BP) algorithm and its variants. The learning speed of such models is often insufficient, particularly when a large amount of training data is required. In addition, BP-like algorithms usually suffer from high computational burden, poor generalization ability, local optima, and overfitting [15].
Based on the work on ELM by Huang et al. [15][16][17][18], this paper proposes a data-driven dynamic modeling method to predict molten iron silicon content using ELM with the help of principal component analysis (PCA) [28,29]. In the design of this predictive model, data-driven PCA is used to reduce the input variable space of the ELM. Moreover, an output self-feedback architecture is introduced to establish a dynamic ELM model for the practical BF dynamic system. This self-feedback structure enables ELM to overcome the static mapping limitation of its feedforward network structure, which further extends the application of ELM to dynamic time-series prediction. Lastly, the performance of the proposed dynamic ELM based prediction model is compared with other well-known modeling algorithms through industrial experiments on the No. 2 BF of Liuzhou Iron & Steel Group Co. of China.

Description of BF Ironmaking System
2.1. Process Description. BF ironmaking is a continuous production process conducted in a closed vertical furnace, where iron ore is continuously reduced to molten iron using coke and gas in a high-temperature, high-pressure environment. Owing to advantages such as simple technology, high productivity, and high production efficiency, BF smelting is, and will remain for a long period, the most important way of ironmaking. Indeed, because of the large production quantities, even small improvements of the process can result in considerable profit. Thus the ironmaking BF is regarded as a significant item in the economic development of any country.
Figure 1 is the schematic diagram of a typical BF ironmaking process, which mainly consists of the hot blast main, feeding system, air supply system, gas filtration system, slag treatment system, fuel injection system, and so forth. The interior of the furnace is divided into five zones from top to bottom: the throat, the stack, the belly, the bosh, and the hearth. When a BF ironmaking system runs, the solid raw materials consisting of coke and fresh ore are charged layer by layer in definite quantities from the top, while the preheated compressed air, together with pulverized coal, is introduced at the bottom through tuyeres, entering just above the hearth, which is the crucial region of the BF where the final molten metal product gathers. The hot air at approximately 1200 °C passes upward through the charge and reacts with the descending coke and the supplementary injected fuel to generate carbon dioxide, which is then converted to CO and H2 at high temperature. A large amount of heat is released during this period, which can heat the hearth to as high as 2000 °C. The generated CO and H2 further reduce the descending iron ore to form hot metal accumulating in the hearth, and some unreduced impurities (mainly SiO2) form the slag (mainly CaSiO3), which, being lighter, floats on the hot metal. The liquid hot metal and slag are periodically tapped out by opening clay-lined tapholes for subsequent processing. Generally, each period of BF ironmaking takes 6 to 8 hours [30].

Importance of Modeling for Silicon Content Prediction.
For many countries, such as China, the steel industry plays an important role in the national economy, and there is thus extensive interest in operational control and optimization of the ironmaking BF for saving energy and reducing cost. Generally speaking, control of the BF system means keeping the hot metal temperature and composition, such as the silicon, sulfur, and phosphorus contents of the hot metal, within acceptable bounds, among which the silicon content is the most important [31].
For a practical BF production process, silicon content ([Si]) is an important index indicating the chemical heat of the molten iron. High silicon content means a large quantity of slag, which makes it easier to remove phosphorus and sulfur from the hot metal. However, excessive silicon content makes cast iron stiff and brittle and can even lead to lower metal yield and easier splashing. In addition, high silicon content results in a corresponding increase of SiO2 in the slag, thereby affecting the slagging speed of lime, extending converting time, and intensifying corrosion of the furnace lining. From an energy point of view, it is desirable to operate the BF process at low molten metal silicon content while still avoiding the risk of cooling the hearth, which may result in a chilled hearth. Generally, the silicon content should be kept within 0.5% to 0.7%.
Closed-loop control of molten iron quality in the ironmaking BF is still an unsolved problem. The main bottleneck is that direct online measurement of this quality parameter is difficult to realize with existing conventional measuring means. Moreover, the offline assaying process for this index involves a long lag, usually more than 1 hour. Therefore, a molten iron quality model for online prediction must be established. Effective online prediction or estimation of silicon content not only offers useful information for operators to judge the inner smelting state and operational condition, but also plays a key role in realizing closed-loop control and operational optimization as well as energy saving and cost reduction.

Modeling Strategy Using PCA and ELM with Self-Feedback
A data-driven black-box model is a kind of input-output model. It relies on the development of novel nonlinear signal processing and data analysis technologies, along with computer hardware and software technologies, and does not require any prior information about the process. The main idea of a data-driven model is to approximate the input-output relationships using the strong nonlinear approximation power of mathematical tools or artificial intelligence technologies, such as artificial neural networks, fuzzy logic, and support vector machines [32].
The proposed data-driven modeling strategy for silicon content prediction is shown in Figure 2. First, since the BF is a complicated high-dimensional nonlinear dynamic system with numerous coupled factors, data-driven PCA, which handles high-dimensional correlated data well, is introduced to pick a few key factors as the input variables of the model, so as to reduce the dimension and difficulty of prediction modeling. Then, considering that the BF ironmaking system is a nonlinear system with time-varying dynamic characteristics, ELM, which offers good nonlinear mapping and fast processing capability, is adopted as the modeling technique. In addition, an output self-feedback structure is added to the traditional ELM: the outputs from previous times are fed back to the network input layer. These fed-back outputs, together with input variables at different times, constitute a dynamic ELM structure that has a memory and can handle data at different times, thus overcoming the static modeling limitation of the traditional ELM.
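Remark 1. As shown in Figure 2, the dynamic ELM based estimation model is developed to achieve the following nonlinear dynamic mapping:
$$\tilde{y}(t) = f_{\mathrm{ELM}}\big(y(t-1), \ldots, y(t-n_O),\ u(t), u(t-1), \ldots, u(t-n_I)\big), \tag{1}$$
where $u = [u_1, u_2, \ldots, u_n]$ are the values of the secondary variables selected by PCA and $y$ is the quality parameter that needs to be estimated. The values of $n_I, n_O \in \mathbb{Z}^+$ are selected according to the time delays and time series of the relevant input and output variables and the sampling frequency of the quality parameter $y$, which is generally sampled significantly more slowly than the process data $u$.
Remark 2. Note that, in the learning period of the proposed ELM based dynamic estimation model using the training databases, $y(t-1), \ldots, y(t-n_O)$ are the actual (sampled) values of $y$. After the proposed ELM model is trained and validated well, it will be applied in practice. Since the quality parameter $y$ cannot be measured online and the offline assaying process takes a long time, usually more than one hour, the actual value of $y$ cannot be obtained in real time in practice. Therefore, to achieve the desired dynamic estimation of $y$, the past estimates $\tilde{y}(t-1), \ldots, \tilde{y}(t-n_O)$ are used to construct the self-feedback structure of the proposed dynamic ELM based estimation model.
Remark 3. The proposed modeling strategy has two advantages.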
(i) The dynamic property of time series and time delays is taken into account by introducing the outputs and inputs at previous times through a self-feedback structure. This self-feedback connection enables ELM to overcome the static mapping limitation of its feedforward network structure. Thus the improved version of ELM can capture the nonlinear process dynamics well by remembering prior input and output states and using both the prior and current states to calculate the new output value.
(ii) Unlike BP-like modeling algorithms, which usually suffer from high computational burden, poor generalization ability, local optima, and overfitting, ELM based modeling benefits from much faster learning speed, higher generalization performance, and ease of implementation and use (no extra parameters need to be tuned except the predefined network architecture).
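The self-feedback structure described in Remarks 1 to 3 can be illustrated with a minimal sketch, assuming list-based data and two hypothetical helper names (`build_regressors`, `free_run`) that do not come from the paper. The first helper stacks current and delayed inputs with delayed measured outputs for training; the second feeds past estimates back in place of measurements, as is needed online when assay values are unavailable.

```python
def build_regressors(u, y, n_i, n_o):
    """Stack u(t), ..., u(t-n_i) and y(t-1), ..., y(t-n_o) into one row per t.

    u: list of input vectors (lists of floats), one per sampling instant.
    y: list of measured outputs aligned with u (used during training).
    Returns (rows, targets), starting at the first t with a full history.
    """
    start = max(n_i, n_o)
    rows, targets = [], []
    for t in range(start, len(y)):
        row = []
        for k in range(n_i + 1):        # current and delayed inputs
            row.extend(u[t - k])
        for k in range(1, n_o + 1):     # delayed outputs (self-feedback taps)
            row.append(y[t - k])
        rows.append(row)
        targets.append(y[t])
    return rows, targets


def free_run(model, u, y_init, n_i, n_o):
    """Predict recursively, feeding past estimates back instead of measurements.

    model: any callable mapping one regressor row to a scalar estimate
           (a stand-in for the trained dynamic ELM).
    y_init: initial outputs used to fill the feedback taps; it must contain
            at least max(n_i, n_o) values.
    """
    start = max(n_i, n_o)
    y_hat = list(y_init[:start])
    for t in range(start, len(u)):
        row = []
        for k in range(n_i + 1):
            row.extend(u[t - k])
        for k in range(1, n_o + 1):
            row.append(y_hat[t - k])    # estimated, not measured, outputs
        y_hat.append(model(row))
    return y_hat
```

The row layout (inputs first, feedback taps last) is an arbitrary choice made for this sketch; any fixed ordering works as long as training and online use agree.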

Selection of Secondary Variables by PCA-Based Dimension Reduction.
PCA is a statistical method that captures the dominant part of the variation in the data and identifies the main influencing factors among many variables in order to simplify a complex problem. The principal components produced by PCA are combinations of the column vectors of the input matrix picked by varimax. Since correlations and noise always exist in practical industrial data, principal components with a small variance usually carry mostly noise. Discarding them does not cause a crucial loss of information and can even achieve denoising to some extent. The data set is considered in the following form:
$$\mathbf{X} = \mathbf{t}_1 \mathbf{v}_1^{T} + \mathbf{t}_2 \mathbf{v}_2^{T} + \cdots + \mathbf{t}_m \mathbf{v}_m^{T} = \sum_{i=1}^{m} \mathbf{t}_i \mathbf{v}_i^{T}, \tag{2}$$
where $\mathbf{X}_{n \times m} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m]$ is the array of $n$ measurements on $m$ variables, $\mathbf{t}_i$ is the score vector, and $\mathbf{v}_i$ is the unit eigenvector of the covariance matrix $\mathbf{X}^{T}\mathbf{X}$, named the loading vector. The variance of $\mathbf{t}_i$ is $\lambda_i$, which is also the corresponding eigenvalue of $\mathbf{X}^{T}\mathbf{X}$ and satisfies $\mathrm{Var}(\mathbf{t}_i) = \lambda_i$, $\lambda_1 \ge \cdots \ge \lambda_m \ge 0$. PCA is also a procedure used to explain the variance in a single data matrix.
The principal component decomposition of $\mathbf{X}$ in (2) can be represented as follows:
$$\mathbf{X} = \sum_{i=1}^{k} \mathbf{t}_i \mathbf{v}_i^{T} + \mathbf{E}, \tag{3}$$
where $\mathbf{t}_i \mathbf{v}_i^{T}$ is the $i$th principal component and $\mathbf{E}$ is a matrix of residuals. Note that the score vectors are mutually orthogonal, and so are the loading vectors, which are of unit length.
Equation (3) indicates that $\mathbf{X}$ is approximated by the sum of $k$ rank-1 principal components, exactly so ($\mathbf{E} = \mathbf{0}$) when $\mathbf{X}$ has rank $k$. The number of principal components kept in (3) is determined by the total variance. The variance contribution and the total variance of the principal components can be represented as follows:
$$\eta_i = \frac{\lambda_i}{\sum_{j=1}^{m} \lambda_j}, \qquad \eta_{\Sigma}(k) = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{m} \lambda_j}, \tag{4}$$
where $\eta_i$ is the variance contribution of the $i$th principal component and $\eta_{\Sigma}(k)$ is the total variance of the first $k$ terms. Usually, the total variance should be larger than 85%. Only in this case can the data dimension be reduced without losing useful information.
After dimension reduction and noise filtering through PCA, the data measurements are represented as
$$\hat{\mathbf{X}} = \mathbf{U}_k \mathbf{V}_k^{T}, \tag{5}$$
where $k$ is the number of remaining principal components, $\mathbf{U}_k$ contains the score vectors of the first $k$ terms, and $\mathbf{V}_k$ contains the loading vectors of the first $k$ terms.

Remark 4. A limitation of PCA-based dimension reduction is that the resulting principal components are composite representations of the original higher-dimensional physical variables rather than physical variables themselves. However, by computing the component matrix, which contains the correlations between the principal components and the original physical variables, one can obtain the lower-dimensional set of physical variables most strongly related to the principal components, according to specific requirements.
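The variance-contribution rule and the component-matrix idea of Remark 4 can be sketched with NumPy as follows. This is an illustrative sketch only: the function names are hypothetical, the data are standardised first (an assumption, since the paper does not state its scaling), and the rule "take, for each retained component, the not-yet-chosen variable with the largest absolute loading" is one plausible reading of "most strongly related", not necessarily the authors' procedure.

```python
import numpy as np


def pca_variance(X):
    """Return eigenvalues (descending), loading vectors, and contribution ratios."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardise each variable
    cov = Xs.T @ Xs / (Xs.shape[0] - 1)                 # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(cov)              # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()                     # variance contribution
    return eigvals, eigvecs, ratio


def n_components_for(ratio, threshold=0.85):
    """Smallest k whose cumulative variance contribution reaches the threshold."""
    cumulative = np.cumsum(ratio)
    k = int(np.searchsorted(cumulative, threshold) + 1)
    return min(k, len(ratio))


def select_variables(eigvals, eigvecs, k):
    """For each retained component, pick the unused variable with largest |loading|."""
    # Component matrix: loadings scaled by sqrt(eigenvalue), i.e. the correlations
    # between standardised variables and principal components.
    component_matrix = eigvecs[:, :k] * np.sqrt(np.clip(eigvals[:k], 0.0, None))
    chosen = []
    for j in range(k):
        ranked = np.argsort(-np.abs(component_matrix[:, j]))
        chosen.append(int(next(i for i in ranked if i not in chosen)))
    return chosen
```

Called on the measured process data as an array with one column per candidate variable, `select_variables` returns column indices that can then be mapped back to physical variable names.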

ELM with Self-Feedback Connection.
Extreme learning machine (ELM) is a learning algorithm for single-hidden-layer feedforward networks (SLFNs) with additive or radial basis function (RBF) hidden nodes. Its learning speed can be thousands of times faster than that of conventional feedforward network learning algorithms such as BP, while reaching better approximation performance. In real applications, a network is trained on a finite data set. Huang and Babri proved that an SLFN with at most $N$ hidden nodes and almost any nonlinear activation function can learn $N$ distinct observations with zero error [18]. Based on the work of [17], SLFNs with $\tilde{N}$ hidden neurons and arbitrarily chosen input weights and biases were proved able to learn $N$ distinct observations with arbitrarily small error.
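The procedure of the algorithm used here can be summarized as follows: for $N$ arbitrary distinct samples $(\mathbf{X}_j, \mathbf{Y}_j)$, where $\mathbf{X}_j = [x_{j1}, x_{j2}, \ldots, x_{jn}]^{T} \in \mathbb{R}^{n}$ and $\mathbf{Y}_j = [y_{j1}, y_{j2}, \ldots, y_{jm}]^{T} \in \mathbb{R}^{m}$, the output of an SLFN with $\tilde{N}$ hidden nodes can be represented by
$$f_{\tilde{N}}(\mathbf{X}) = \sum_{i=1}^{\tilde{N}} \boldsymbol{\beta}_i\, G(\mathbf{a}_i, b_i, \mathbf{X}), \tag{6}$$
where $\mathbf{a}_i$ and $b_i$ are the learning parameters of the hidden nodes, $\boldsymbol{\beta}_i$ is the output weight, and $G(\mathbf{a}_i, b_i, \mathbf{X})$ is the output of the $i$th hidden node with respect to the input data $\mathbf{X}$.
(i) For an additive hidden node, $G(\mathbf{a}_i, b_i, \mathbf{X})$ is given by $G(\mathbf{a}_i, b_i, \mathbf{X}) = g(\mathbf{a}_i \odot \mathbf{X} + b_i)$, $b_i \in \mathbb{R}$, where $\mathbf{a}_i$ is the input weight vector connecting the input layer to the $i$th hidden node, $b_i$ is the bias of the $i$th hidden node, and $\mathbf{a}_i \odot \mathbf{X}$ denotes the inner product of the vectors $\mathbf{a}_i$ and $\mathbf{X}$ in $\mathbb{R}^{n}$.
(ii) For an RBF hidden node, $G(\mathbf{a}_i, b_i, \mathbf{X})$ is given by $G(\mathbf{a}_i, b_i, \mathbf{X}) = g(b_i \|\mathbf{X} - \mathbf{a}_i\|)$, $b_i \in \mathbb{R}^{+}$, where $\mathbf{a}_i$ and $b_i$ are the center and impact factor of the $i$th RBF node and $\mathbb{R}^{+}$ denotes the set of all positive real values.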
Remark 5. For the prediction modeling problem considered in this paper, $\mathbf{X}_j = [x_{j1}, x_{j2}, \ldots, x_{jn}]^{T}$ are the data of the model input variables selected by PCA, and $\mathbf{Y}_j = [y_{j1}, y_{j2}, \ldots, y_{jm}]^{T}$ are the data of the molten iron silicon content to be estimated online, which means that $m = 1$. Moreover, the additive hidden node is used in our prediction modeling because of its simple structure.
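In supervised batch learning, the learning algorithm uses a finite number of input-output samples for training. For $N$ arbitrary distinct samples $(\mathbf{X}_j, \mathbf{Y}_j)$, if an SLFN with $\tilde{N}$ additive hidden nodes can approximate these $N$ samples with zero error, then there exist $\boldsymbol{\beta}_i$, $\mathbf{a}_i$, and $b_i$ such that
$$\sum_{i=1}^{\tilde{N}} \boldsymbol{\beta}_i\, G(\mathbf{a}_i, b_i, \mathbf{X}_j) = \mathbf{Y}_j, \quad j = 1, \ldots, N. \tag{7}$$
Equation (7) can be written compactly as
$$\mathbf{H}\boldsymbol{\beta} = \mathbf{Y}, \tag{8}$$
where
$$\mathbf{H} = \begin{bmatrix} G(\mathbf{a}_1, b_1, \mathbf{X}_1) & \cdots & G(\mathbf{a}_{\tilde{N}}, b_{\tilde{N}}, \mathbf{X}_1) \\ \vdots & \ddots & \vdots \\ G(\mathbf{a}_1, b_1, \mathbf{X}_N) & \cdots & G(\mathbf{a}_{\tilde{N}}, b_{\tilde{N}}, \mathbf{X}_N) \end{bmatrix}_{N \times \tilde{N}}, \tag{9}$$
$$\boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_1^{T} \\ \vdots \\ \boldsymbol{\beta}_{\tilde{N}}^{T} \end{bmatrix}_{\tilde{N} \times m}, \tag{10}$$
$$\mathbf{Y} = \begin{bmatrix} \mathbf{Y}_1^{T} \\ \vdots \\ \mathbf{Y}_N^{T} \end{bmatrix}_{N \times m}. \tag{11}$$
The output weights are then obtained from this linear system using the Moore-Penrose generalized inverse of $\mathbf{H}$:
$$\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{Y}, \tag{12}$$
where $\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix $\mathbf{H}$ [15][16][17].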
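As a brief supporting note, the generalized-inverse solution for the output weights can be read as a least-squares fit. The following uses only standard properties of the Moore-Penrose inverse; the full-column-rank case is an added assumption for the closed form.

```latex
% Least-squares reading of the output-weight solution \hat{\beta} = H^{\dagger} Y.
\begin{align*}
\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{Y}
  &\;\Longrightarrow\;
  \|\mathbf{H}\hat{\boldsymbol{\beta}} - \mathbf{Y}\|
  = \min_{\boldsymbol{\beta}} \|\mathbf{H}\boldsymbol{\beta} - \mathbf{Y}\|, \\
% Among all minimisers, the pseudoinverse picks the one with the smallest norm.
\|\hat{\boldsymbol{\beta}}\|
  &\le \|\boldsymbol{\beta}\|
  \quad \text{for every least-squares minimiser } \boldsymbol{\beta}, \\
% Assumption: H has full column rank (which requires \tilde{N} \le N).
\mathbf{H}^{\dagger}
  &= (\mathbf{H}^{T}\mathbf{H})^{-1}\mathbf{H}^{T}
  \quad \text{if } \operatorname{rank}(\mathbf{H}) = \tilde{N}.
\end{align*}
```

When the hidden layer cannot reproduce the training targets exactly, the same formula therefore still returns the best fit in the least-squares sense.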
Remark 6. For simplicity, the prediction modeling process based on ELM with additive hidden nodes is summarized as follows: given a training set $Z = \{(\mathbf{X}_j, \mathbf{Y}_j) \mid \mathbf{X}_j \in \mathbb{R}^{n}, \mathbf{Y}_j \in \mathbb{R}^{m}, j = 1, \ldots, N\}$ for prediction modeling and the hidden neuron number $\tilde{N}$, the input weights $\mathbf{a}_i$ and biases $b_i$ are assigned arbitrarily and used to calculate the hidden layer output matrix $\mathbf{H}$ by (9). After that, the output weight $\boldsymbol{\beta}$ is calculated by (12); together with $\mathbf{a}_i$ and $b_i$, this is all that is needed to estimate the output from new inputs.
Remark 7. The hidden node number $\tilde{N}$ is the only parameter that needs to be predefined in the presented modeling method. To achieve good approximation ability in training and fast convergence on complex industrial data, a proper (possibly optimal) $\tilde{N}$ can be determined as the one that results in the lowest validation error over several rounds of training and validation.
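The procedure in Remarks 6 and 7 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the random weights are drawn uniformly from [-1, 1] (the paper only says "arbitrarily"), and a sigmoid is used as the additive-node activation.

```python
import numpy as np


def elm_train(X, Y, n_hidden, rng):
    """Train an additive-node ELM: random (a_i, b_i), least-squares output weights."""
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights a_i (assumed range)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # biases b_i (assumed range)
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))                   # hidden layer output matrix H
    beta = np.linalg.pinv(H) @ Y                             # beta = H^dagger Y
    return A, b, beta


def elm_predict(model, X):
    """Evaluate the trained network on new inputs."""
    A, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return H @ beta


def select_hidden_nodes(X_tr, Y_tr, X_val, Y_val, candidates, seed=0):
    """Return (n_hidden, validation RMSE, model) with the lowest validation error."""
    best = None
    for n_hidden in candidates:
        model = elm_train(X_tr, Y_tr, n_hidden, np.random.default_rng(seed))
        err = float(np.sqrt(np.mean((elm_predict(model, X_val) - Y_val) ** 2)))
        if best is None or err < best[1]:
            best = (n_hidden, err, model)
    return best
```

For the dynamic model, the rows of `X_tr` would be the lagged regressors (current and delayed inputs plus fed-back outputs) rather than raw process samples. Because the hidden weights are random, the selected number of nodes can vary with the seed; repeating the search over several seeds is a reasonable precaution.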

Model Development.
In this section, a medium-sized blast furnace (as shown in Figure 3) with a working volume of 2000 m³ at Liuzhou Iron & Steel Group Co. is chosen to validate the proposed silicon content prediction method. Based on the process mechanism and the status of existing monitoring instruments, the measurable parameters influencing silicon content are determined as blast temperature (°C), blast pressure (kPa), oxygen enrichment percentage (%), flow rate of rich oxygen (m³/h), gas permeability (m³/min·kPa), gas volume of bosh (m³/min), bosh gas index (m³/min·m²), blast kinetic energy (kJ/s), blast humidity (g/m³), flow rate of cold air (m³/h), feed blast ratio (wt%), resistance coefficient, volume of coal injection (kg/t), theoretical burning temperature (°C), actual wind speed (m/s), and furnace top pressure (kPa). Figure 4 is a schematic diagram of this BF ironmaking system and its measurement system, which gives an intuitive view of where each measurable parameter is detected. The directly detected parameters in Figure 4 are explained in Table 1, and the indirectly detected variables and the formulas used to calculate them from the directly detected parameters are listed in Table 2.
Considering the strong correlation among the selected 16 input variables, PCA is used to determine the key input variables that most influence the molten iron silicon content. According to (4), the eigenvalue and the variance contribution rate of each component can be calculated as shown in Figure 5. The cumulative variance contribution rate of the first 6 terms is 98.723%, which means that these 6 principal components are sufficient to describe the major variance in the data. Then, by computing the component matrix of the principal components, 6 process variables are determined as the secondary inputs of the [Si] prediction model. These secondary variables are hot blast pressure $u_1$ (kPa), hot blast temperature $u_2$ (°C), oxygen enrichment percentage $u_3$ (%), volume of coal injection $u_4$ (kg/t), blast humidity $u_5$ (g/m³), and gas volume of bosh $u_6$ (m³/min).
Figure 6 displays the modeling data sets collected from 9 to 21 October 2013. To better exhibit the performance of the ELM based prediction model, suitable learning parameters need to be chosen. For the sake of simplicity, we mainly discuss the selection of the number of hidden nodes for the ELM algorithm. The optimal number of hidden units is selected as the one that results in the lowest validation error. Through experimental analysis, the number of hidden nodes with sigmoidal activation function is set as $\tilde{N} = 25$. The corresponding modeling result of the developed ELM model with self-feedback structure is shown in Figure 7, which demonstrates good modeling accuracy on practical data.

Experiments Results.
The developed ELM based prediction model has been tested on the No. 2 blast furnace of Liuzhou Steel of China for quite a long time. Figure 8 shows the estimation results of the proposed modeling method for predicting [Si], comparing the predicted trend with the actual one. Moreover, in order to show the advantage of the proposed method more intuitively, comparisons with several popular prediction models have been made. Here, a back propagation neural network without self-feedback (SFB) connection (BP NN for short), a BP NN with SFB connection, and the traditional ELM without SFB connection have been chosen for comparison on the same observations. The Levenberg-Marquardt algorithm, which appears to be the fastest method for training moderate-sized feedforward neural networks [16], is used as the BP learning algorithm. From Figure 8, it can be seen that the proposed model has the best estimation performance among all the developed prediction models: it gives the best estimation trend and accuracy, and the shape of the estimated curve matches the measured one very well and better than those of the other three methods.
A good model should have an estimation-error autocorrelation close to that of white noise. We therefore plot the autocorrelation function of the estimation error of the different models in Figure 9. It can be seen that the autocorrelation results of BP NN without SFB connection and ELM (sigmoid) without SFB connection are much worse than those of their counterparts with an SFB structure. Although the estimation-error autocorrelations of the proposed ELM with SFB connection and of BP NN with SFB connection are both satisfactory and close to that of white noise, the estimation results above (as shown in Figure 8) confirm the effectiveness and advantage of the proposed method in prediction accuracy.
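The whiteness check described above can be sketched as follows. The helper names are hypothetical, and the ±1.96/√N band is the usual large-sample approximation for white noise rather than a criterion taken from the paper.

```python
import math


def autocorrelation(errors, max_lag):
    """Sample autocorrelation of the estimation error for lags 0..max_lag.

    Assumes the errors are not all identical (otherwise the variance is zero).
    """
    n = len(errors)
    mean = sum(errors) / n
    centred = [e - mean for e in errors]
    c0 = sum(c * c for c in centred)                 # lag-0 sum of squares
    return [
        sum(centred[t] * centred[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def white_noise_bound(n):
    """Approximate 95% band for the sample autocorrelation of white noise."""
    return 1.96 / math.sqrt(n)
```

If most values at nonzero lags fall inside the band, the residual carries little remaining structure, which is the behaviour reported for the models with a self-feedback connection.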
The estimation and generalization performance of the developed models can be further evaluated quantitatively by calculating the validation accuracy on the testing data set using standard statistical measures, such as the root mean squared error (RMSE)
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(\tilde{y}_i - y_i\right)^2}$$
and the standard deviation $\sigma$ of the estimation error $|\tilde{y}_i - y_i|$
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \left|\tilde{y}_i - y_i\right| - \frac{1}{N} \sum_{i=1}^{N} \left|\tilde{y}_i - y_i\right| \right)^2},$$
where $N$ stands for the number of data points in the time series to be estimated, $y_i$ is the actual value of the time series, and $\tilde{y}_i$ is the value estimated by the prediction model at time $i$. Table 3 shows the calculated RMSE, $\sigma$, and time consumed in the training and testing procedures for the different methods. It can be seen from this table that the ELM with feedback connection presented in this paper obtains a much smaller RMSE and $\sigma$ than the other models compared. Moreover, the proposed model (sigmoid) required the least CPU time, 0.000546 s for training and an even shorter 0.000089 s for testing, while obtaining a very reasonable result. The analysis also shows that, whichever algorithm is used, both the training and testing results of a model with dynamic feedback are much better than those of the model without a feedback structure when other conditions are the same, which confirms the effectiveness of the developed dynamic self-feedback model structure.
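The two accuracy measures used for Table 3 can be computed as in the following sketch. The function names are hypothetical; the 1/N normalisation follows the definitions of RMSE and of the standard deviation of the absolute estimation error given in the text.

```python
import math


def rmse(y_true, y_est):
    """Root mean squared error between actual and estimated values."""
    n = len(y_true)
    return math.sqrt(sum((e - a) ** 2 for a, e in zip(y_true, y_est)) / n)


def abs_error_std(y_true, y_est):
    """Standard deviation of the absolute estimation error |y_est - y_true|."""
    n = len(y_true)
    abs_err = [abs(e - a) for a, e in zip(y_true, y_est)]
    mean_abs = sum(abs_err) / n
    return math.sqrt(sum((d - mean_abs) ** 2 for d in abs_err) / n)
```

RMSE summarises the overall size of the error, while the standard deviation of the absolute error indicates how evenly that error is spread across samples.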
Moreover, the results of practical application indicate that the performance of the developed model is superior to that of the other models and that it handles the problem of overfitting well. The gap between training and testing results is small in every case, which supports the reliability of the proposed method. The method also avoids the blind selection of predefined parameters required by conventional algorithms, which is convenient for operators.

Conclusions
This paper proposed a data-driven modeling method for prediction of molten iron silicon content using PCA and ELM with a self-feedback structure. Compared with conventional algorithms used for silicon content prediction, the proposed method predicts silicon content more accurately and with an extremely fast speed, which meets the need for real-time control. Apart from the number of hidden nodes, no other control parameter has to be chosen, which is convenient for operators. Moreover, the modified ELM with self-feedback structure overcomes the static mapping limitation of the traditional ELM and so can cope well with dynamic time-series prediction problems. The performance of the proposed modified ELM based prediction model is compared with the BP algorithm and different model structures on practical industrial data obtained from the No. 2 BF of Liuzhou Steel Company of China. The accuracy can basically meet the requirements of actual operation.

Figure 2: Strategy diagram of nonlinear intelligent modeling for silicon content prediction.

Figure 4: Schematic diagram of blast furnace system.

Figure 5: Eigenvalue and variance contribution rate of each component.

Figure 7: Modeling results with the proposed method.

Figure 9: Autocorrelation function of estimating error of different models.

Table 1: Direct detecting parameters and their instrumentations.

Table 3: Some data statistics of each algorithm.