Research on the TFe Content of Hematite Based on LU-TELM-SOA and Band Selection

Iron ore is an important raw material for the steel industry, so determining the grade of iron ore quickly and accurately is of great economic significance. The TFe content is the main indicator that determines the grade of iron ore and whether the ore can be smelted directly. Unlike manual and chemical-analysis methods, the paper uses band selection for the near-infrared spectrum based on the pruning method, together with the two-hidden-layer extreme learning machine based on LU decomposition and the seagull optimization algorithm (LU-TELM-SOA), to identify the TFe content. First, the paper proposes band selection based on the pruning method to retain the sensitive bands of the near-infrared spectrum. Then, aiming at the poor stability and low accuracy of a single LU-TELM (two-hidden-layer extreme learning machine based on LU decomposition) model, the paper proposes LU-TELM-SOA. The experimental results show that LU-TELM-SOA has high accuracy and strong stability.


Introduction
Steel is an important metal material that is indispensable for the industrial development of a country and determines its industrial and technological level. Iron ore is the raw material of steel. At present, the iron ores with high industrial value include magnetite ore and hematite ore. The chemical formula of hematite is Fe2O3, and its theoretical TFe content is 70%. Hematite is the most important industrial iron ore in the world, accounting for more than 60% of total iron ore reserves. Iron ore is widely distributed, but the reserves of Australia, Russia, and Brazil together account for more than half of the global total. The TFe content of iron ore is the main indicator that determines the grade of the ore and whether it can be smelted directly. The TFe content of the iron ore used in industrial production ranges from approximately 23% to 70%, and the iron ore used in steel mills requires an iron grade of 55% or more with low contents of sulfur, phosphorus, and other elements. The traditional methods for determining the TFe content include chemical titration analysis, instrumental analysis, and X-ray fluorescence spectrometry, but they are time-consuming and labor-intensive. Therefore, the paper uses the near-infrared spectrum and the extreme learning machine (ELM) to propose a new method that can quickly and accurately detect the TFe content, applied mainly to hematite ore. The near-infrared spectrum is electromagnetic radiation between visible light (Vis) and the mid-infrared (MIR), and spectral analysis identifies substances and determines their chemical composition and relative content from their spectra.
Neural networks have strong learning ability, the ability to approximate nonlinear functions, and good fault tolerance and parallelism. The feedforward neural network, with a simple structure, is one type of neural network and can approximate any function with arbitrary precision when the number of neurons in the hidden layer is large enough. The simplest structure of the feedforward neural network is the single-hidden-layer feedforward network (SLFN). There are many training methods for SLFN, including the back-propagation algorithm (BP) and its improved variants. Although SLFN is widely used, its parameters need to be adjusted iteratively by the gradient descent algorithm; when the training data is large, gradient descent leads to slow learning and long training times. Lowe [1] pointed out that the centers of the radial basis function neural network (RBF) can be selected randomly and thus proposed the idea of randomly selecting SLFN parameters for the first time. The random vector functional-link net (RVFL) [2] only needs to calculate the output weights. ELM [3,4] was proposed based on SLFN; it randomly selects the input weights and hidden-layer thresholds and calculates the output weights by the least-squares method and the Moore-Penrose generalized inverse. Its parameter setting is simple, its generalization ability is better than that of traditional SLFN algorithms, and its training speed is faster. Therefore, ELM has been used in many fields, and the current research directions are as follows: (1) Dimensionality reduction and multiple hidden layers: Castano et al. [5] proposed the ELM based on PCA (PCA-ELM), using principal component analysis (PCA) to achieve dimensionality reduction. PCA-ELM removes the noise in the data and simplifies the structure of the model, thereby improving the learning speed and modeling accuracy.
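As a minimal sketch of the ELM idea described above (random hidden-layer parameters, output weights from the Moore-Penrose generalized inverse), the following Python code is an illustrative reimplementation; the paper's own experiments use MATLAB, and the function names and the tanh activation are assumptions, not the paper's exact setup.

```python
import numpy as np

def elm_train(X, T, n_hidden, rng=None):
    """Train a basic ELM: random input weights/thresholds, least-squares output weights.

    X: (n_samples, n_features) inputs; T: (n_samples, n_outputs) targets.
    """
    rng = np.random.default_rng(rng)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden thresholds
    H = np.tanh(X @ W + b)                                   # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                             # Moore-Penrose solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Predict with a trained ELM."""
    return np.tanh(X @ W + b) @ beta
```

Because the hidden layer is never trained, the only "learning" is one linear least-squares solve, which is why ELM training is fast compared with gradient descent.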
An effective filter, maximum relevance minimum redundancy (MRMR) [6], serves the same function as PCA. The two-hidden-layer ELM (TELM) [7] and the multiple-hidden-layer ELM (MELM) [8] were proposed by adding hidden layers; TELM and MELM calculate the weights and thresholds between the remaining hidden layers through the least-squares method.
(2) The solution of the optimal output weights based on swarm intelligence algorithms: a swarm intelligence algorithm simulates the group and individual behaviors of animals and plants and can be used to optimize parameters. At present, the swarm intelligence algorithms that have been applied to ELM include the particle swarm optimization algorithm (PSO) [9] and the differential evolution algorithm (DE) [10]. The swarm intelligence algorithm regards the weights and thresholds as individuals and finds the best individuals through global search and local search.
(3) The selection of the optimal number of hidden-layer nodes: different numbers of nodes lead to different training errors of ELM, so it is important to obtain the number of nodes with the smallest training error. The current improvement methods include the pruning method, the incremental method, and the adaptive method. Rong et al. [11] removed hidden-layer nodes with low correlation through statistical testing. Miche et al. proposed OP-ELM [12], which uses a fast leave-one-out cross-validation algorithm to gradually solve for the optimal hidden-layer nodes, and TROP-ELM [13], which adds the idea of norm regularization to OP-ELM. Huang and Chen proposed I-ELM [14] and EM-ELM [15]: a network with a small number of hidden-layer nodes is set first, and the number of nodes increases until the training error or the number of nodes reaches the preset value. Lan et al. proposed CS-ELM [16] and TS-ELM [17], which use different methods to screen randomly generated hidden-layer nodes.
(4) The improvement of stability based on ensemble learning: the main idea of ELM based on ensemble learning is to construct different ELM models and combine them into the final output by different methods. One way is to obtain multiple ELM models on the same training set and combine them by weighted linear combination [18,19] or a voting method [20]. Another way is to divide the whole training set into several small training sets and construct multiple ELM models on the different training sets [21,22]. The genetic ensemble of ELM proposed by Xue et al. [23] generates different ELM models by optimizing the parameters of the hidden layer and integrates the optimal models by sorting.
The paper studies the stability of ELM. Because of the random values in ELM, the stability of a single ELM model is poor, and the results and accuracy of different ELM models differ slightly; the accuracy of ELM also needs to be improved. Thus, weights are used to combine different ELM models to improve the stability and accuracy of ELM: an ELM model with high accuracy receives a larger weight, while an ELM model with low accuracy receives a smaller weight. The weights of the different ELM models must take their optimal values so that the combined model achieves the best performance. Weight optimization generally uses a swarm intelligence algorithm, and the monarch butterfly optimization (MBO) [24], the moth search algorithm (MSA) [25], the Harris hawks optimization (HHO) [26], and the seagull optimization algorithm (SOA) [27] have appeared in recent years. The paper selects SOA, proposed in 2019. The improvements of SOA include the evolutionary multiobjective seagull optimization algorithm (EMoSOA) [28], the whale optimization with seagull algorithm (WSOA) [29], the improved SOA (ISOA) [30,31], and the opposition-based seagull optimization algorithm (OSOA) [32], which involve only minor changes to the original algorithm.
Because of the high dimension of the near-infrared spectrum, band selection based on the pruning method is proposed to select the sensitive bands and thus achieve dimension reduction. Based on the idea of the LU triangularization extreme learning machine (LU-ELM) [33], which uses LU decomposition instead of the traditional generalized inverse methods, the paper takes the two-hidden-layer extreme learning machine based on LU decomposition (LU-TELM) as the modeling method for a single ELM model. Aiming at the poor stability and low accuracy of a single LU-TELM model, the paper proposes the two-hidden-layer extreme learning machine based on LU decomposition and seagull optimization algorithm (LU-TELM-SOA). In a word, LU-TELM-SOA and band selection based on the pruning method are proposed in the paper. The second section explains the principles of LU-TELM and LU-TELM-SOA. The third section describes the prediction of the TFe content of hematite, including band selection based on the pruning method, LU-TELM, and LU-TELM-SOA. The fourth section concludes the paper.

Materials and Methods
2.1. Two-Hidden-Layer Extreme Learning Machine Based on LU Decomposition. TELM [7] calculates the weights and thresholds between the remaining hidden layers through the least square method. The input parameters and the structure parameters are explained in Figure 1.
2.1.1. First Stage of TELM. In the first stage, TELM contains a single hidden layer and is equivalent to ELM. For the hidden layer, the output and the output weight are, respectively,

H_1 = g(W X + B_1), (1)
β = H_1^† T, (2)

where X is the input, W and B_1 are the randomly generated input weights and hidden-layer thresholds, g(·) is the activation function, T is the expected output, and H_1^† is the Moore-Penrose generalized inverse of H_1.

2.1.2. Second Stage of TELM. TELM contains two hidden layers. When another hidden layer is appended between the hidden layer of the first stage and the output layer, the prediction output is

f(x) = H_2 β, (3)

where H_2 is the output of the second hidden layer. According to the weight β, the expected output of the second hidden layer is

H_1^* = T β^†. (4)

Setting the actual output of the second hidden layer equal to H_1^*, the structural parameters between the two hidden layers are

W_HE = g^{−1}(H_1^*) H_E^†, (5)

where H_E = [H_1 1] denotes H_1 augmented with a column of ones and W_HE = [W_1 B_2] collects the weights W_1 and thresholds B_2 of the second hidden layer. Then, the actual output of the second hidden layer is

H_2 = g(W_HE H_E). (6)

The updated output weight is

β_new = H_2^† T. (7)

Therefore, the predicted output of TELM is

f(x) = H_2 β_new. (8)

Unlike formula (7), which uses the generalized inverse to solve β, LU-TELM solves β from the system of linear equations

M β = b, (9)

where M = H_2^T H_2 and b = H_2^T T. Therefore, the process of solving β can be replaced by solving a system of linear equations in the form of formula (9). According to the matrix operations of the sequential Gauss elimination method, the coefficient matrix M can be decomposed into a unit lower triangular matrix L and an upper triangular matrix U, namely,

M = L U. (10)

According to the rules of matrix multiplication, comparing the first-row elements on both sides of formula (10) gives

u_{1j} = m_{1j}, j = 1, 2, ⋯, n. (11)

Comparing the first-column elements on both sides of the equation gives

l_{i1} = m_{i1} / u_{11}, i = 2, 3, ⋯, n. (12)

Then, comparing the remaining elements of the second row on both sides of the equation gives

u_{2j} = m_{2j} − l_{21} u_{1j}, j = 2, 3, ⋯, n. (13)

Comparing the remaining elements of the second column on both sides of the equation gives

l_{i2} = (m_{i2} − l_{i1} u_{12}) / u_{22}, i = 3, 4, ⋯, n. (14)

Continuing in this way, after the elements of the first i − 1 rows of U and the first i − 1 columns of L have been calculated, the elements of the i-th row of U and the i-th column of L are

u_{ij} = m_{ij} − ∑_{k=1}^{i−1} l_{ik} u_{kj}, j = i, i + 1, ⋯, n,
l_{ji} = (m_{ji} − ∑_{k=1}^{i−1} l_{jk} u_{ki}) / u_{ii}, j = i + 1, i + 2, ⋯, n. (15)

After solving for L and U, it follows from formulas (9) and (10) that

L y = b, U β = y. (16)

Solving the first system of linear equations by comparing the elements on both sides, the element y_k of y is

y_1 = b_1, y_k = b_k − ∑_{j=1}^{k−1} l_{kj} y_j, k = 2, 3, ⋯, n, (17)

where b_k is the element at the corresponding position of b. Then, solving the second system of linear equations by comparing the elements on both sides gives

β_n = y_n / u_{nn}, β_k = (y_k − ∑_{j=k+1}^{n} u_{kj} β_j) / u_{kk}, k = n − 1, n − 2, ⋯, 1. (18)

Finally, the output weight β of LU-TELM is solved.
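The LU route to β just described (Doolittle decomposition, then forward and back substitution) can be sketched as follows. This is an illustrative Python reimplementation, not the paper's MATLAB code, and it assumes the coefficient matrix admits an LU factorization without pivoting (true for the positive definite normal-equation matrix M = H_2^T H_2 when H_2 has full column rank).

```python
import numpy as np

def lu_decompose(M):
    """Doolittle LU decomposition: M = L @ U with unit lower-triangular L."""
    n = M.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    for i in range(n):
        # row i of U, from the already-computed rows/columns (formula-style recursion)
        U[i, i:] = M[i, i:] - L[i, :i] @ U[:i, i:]
        # column i of L, below the diagonal
        if i + 1 < n:
            L[i + 1:, i] = (M[i + 1:, i] - L[i + 1:, :i] @ U[:i, i]) / U[i, i]
    return L, U

def lu_solve(L, U, b):
    """Solve M beta = b given M = L U, via L y = b then U beta = y."""
    n = len(b)
    y = np.zeros(n)
    for k in range(n):                      # forward substitution
        y[k] = b[k] - L[k, :k] @ y[:k]
    beta = np.zeros(n)
    for k in range(n - 1, -1, -1):          # back substitution
        beta[k] = (y[k] - U[k, k + 1:] @ beta[k + 1:]) / U[k, k]
    return beta
```

The two triangular solves replace the generalized-inverse computation of β in the TELM output layer.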
2.2. Seagull Optimization Algorithm. SOA [27] simulates the migration and attack behavior of seagulls. The migration is the global search, and the attack is the local search.

2.2.1. Migration. To prevent collisions, SOA uses an additional variable A to represent the motion behavior of seagulls:

A = f_c − t (f_c / max). (19)

In formula (19), f_c makes A decrease linearly, and the range of A is [0, 2]. And t is the current iteration number, t = 0, 1, 2, ⋯, max. The current position is P_S(t), so the new collision-free position is C_S(t) = A · P_S(t). For seagulls, the direction towards the best position is

M_S(t) = B · (P_bS(t) − P_S(t)). (20)

In formula (20), B = 2 A² r_d balances the global search and the local search, where r_d is a random value with r_d ∈ [0, 1], and P_bS(t) is the best position of the population. According to M_S(t), the distance of the seagull from the best position after moving is

D_S(t) = |C_S(t) + M_S(t)|. (21)

2.2.2. Attack. Seagulls prey through a spiral motion, and the motion in the x, y, and z directions is described as

x = r cos θ, y = r sin θ, z = r θ, r = u e^{θv}. (22)

In formula (22), r is the spiral radius, θ is a random angle value in the range [0, 2π], and u and v are correlation constants that define the spiral shape. Therefore, the attack position of the seagull is

P_S(t) = D_S(t) · x · y · z + P_bS(t). (23)

2.3. Two-Hidden-Layer Extreme Learning Machine Based on LU Decomposition and Seagull Optimization Algorithm. Since the hidden-layer thresholds and the input weights of LU-TELM are given at random, the model is different each time and the prediction accuracy also differs. In response to this problem, the idea of optimization is adopted to assign different weights to the different LU-TELM models: a model with a good prediction effect receives a greater weight. Thus, a soft-sensing model with higher forecast accuracy and higher stability is established.
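The migration and attack updates of Section 2.2 can be combined into one position update per seagull per iteration. The sketch below is an illustrative Python implementation of formulas (19)-(23) under the stated parameter defaults (f_c = 2, u = v = 1); the vectorized form and function name are assumptions.

```python
import numpy as np

def soa_step(P, P_best, t, max_iter, fc=2.0, u=1.0, v=1.0, rng=None):
    """One seagull position update: migration (global search) then attack (local search).

    P: (n_seagulls, dim) current positions; P_best: (dim,) best position found so far.
    """
    rng = np.random.default_rng(rng)
    A = fc - t * (fc / max_iter)                 # formula (19): decreases linearly to 0
    C = A * P                                    # collision-free positions C_S(t)
    rd = rng.uniform(0.0, 1.0, size=(P.shape[0], 1))
    B = 2.0 * A**2 * rd                          # balances global and local search
    M = B * (P_best - P)                         # formula (20): move towards the best
    D = np.abs(C + M)                            # formula (21): distance to the best
    theta = rng.uniform(0.0, 2.0 * np.pi, size=(P.shape[0], 1))
    r = u * np.exp(theta * v)                    # spiral radius, formula (22)
    x, y, z = r * np.cos(theta), r * np.sin(theta), r * theta
    return D * x * y * z + P_best                # formula (23): spiral attack position
```

Note that as t approaches max_iter, A, B, and hence D shrink to zero, so the population collapses onto P_best: the search turns from exploration into pure exploitation.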

Journal of Sensors
The basic idea of LU-TELM-SOA is to optimize the weights of multiple LU-TELM models through SOA so that the final error of the model is minimized. If the weight of a single model is too large, the final error will increase and the stability of the model will decrease. Therefore, the paper specifies the range of the weight parameters as [0, m] (m ∈ [0.15, 0.50]). The number of LU-TELM models is N (N ≥ 2). Finally, the predictions of the individual models are weighted with the optimal weights obtained by SOA and summed to obtain the LU-TELM-SOA model. The ultimate objective of LU-TELM-SOA is to reduce the error between the predicted output and the actual output:

x = a_1 x_1 + a_2 x_2 + ⋯ + a_N x_N. (24)

In formula (24), x is the expected output of LU-TELM-SOA, x_1, x_2, ⋯, x_N are the predicted outputs of the LU-TELM models, and a_1, a_2, ⋯, a_N are the weights of the LU-TELM models.
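Formula (24) and the RMSEc fitness that SOA minimizes are simple enough to state directly in code. The sketch below is illustrative (Python rather than the paper's MATLAB), with assumed function names:

```python
import numpy as np

def combine_predictions(preds, weights):
    """Formula (24): weighted sum of the N model predictions.

    preds: (N, n_samples) predicted outputs of the N LU-TELM models.
    weights: (N,) weights a_1, ..., a_N, each constrained to [0, m] by SOA.
    """
    return np.asarray(weights) @ np.asarray(preds)

def rmse(pred, actual):
    """Root-mean-square error, used as the SOA fitness function (RMSEc)."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(actual)) ** 2)))
```

SOA then searches the N-dimensional weight vector that minimizes rmse(combine_predictions(preds, weights), actual) on the training set.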
The flow chart of LU-TELM-SOA is shown in Figure 2.

Results and Discussion
3.1. Acquisition and Processing of Samples. The iron ore samples in the paper were collected from the Anshan Iron Mine in China, which is widely distributed and has large reserves. First, the core was obtained along the orientation of the banding formed by the siliceous and iron minerals, and the samples were obtained by cutting perpendicular to the core column. Then, the samples were ground, and the spectral data were obtained by using an SVC HR-1024 portable spectrometer. At the same time, chemical methods were used to obtain the TFe content of each sample. Finally, 91 hematite samples were obtained. As shown in Figure 3, the spectral curves of hematite are parallel, but there are severe data fluctuations around 1900 nm and 2500 nm. To remove the interference of these fluctuations, the paper constructs the fluctuating residual to eliminate the bands with severe data fluctuations. The fluctuating residual is defined as the absolute error between adjacent bands, namely,

E_C(i, j) = |GP(i, j) − GP(i − 1, j)|. (25)

In formula (25), i = 2, 3, ⋯, N represents the i-th band of a sample, j = 1, 2, ⋯, M represents the j-th sample, GP(i, j) represents the reflectance of the j-th sample in the i-th band, GP(i − 1, j) represents the reflectance of the j-th sample in the (i − 1)-th band, and E_C(i, j) represents the fluctuating residual of the i-th band of the j-th sample.
When the data fluctuates normally, the fluctuating residual floats around 0 and the floating range is small or changes regularly. On the contrary, for abnormal fluctuations, the floating range around 0 is large and the changes are irregular. Because the number of samples is large and the abnormal samples have been deleted, either the mean or the median of the fluctuating residual can be used. If the abnormal samples are not deleted, the median of the fluctuating residual is used.
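The fluctuating residual of formula (25) and the band-screening rule just described can be sketched as follows. This is an illustrative Python version; the numeric threshold and the decision rule attributing each residual to band i are assumptions, since the paper does not state an explicit cutoff.

```python
import numpy as np

def fluctuating_residual(GP):
    """Formula (25): absolute error between adjacent bands.

    GP: (n_bands, n_samples) reflectance matrix, row i = band i.
    Returns E_C of shape (n_bands - 1, n_samples).
    """
    return np.abs(np.diff(GP, axis=0))

def stable_bands(GP, threshold, use_median=True):
    """Keep bands whose residual statistic across samples stays below a threshold.

    use_median=True follows the text's advice when abnormal samples are not deleted;
    the threshold value itself is a hypothetical parameter.
    """
    ec = fluctuating_residual(GP)
    stat = np.median(ec, axis=1) if use_median else np.mean(ec, axis=1)
    keep = np.ones(GP.shape[0], dtype=bool)
    keep[1:] = stat <= threshold          # band i flagged via its residual E_C(i, ·)
    return keep
```

Bands around 1900 nm and 2500 nm, where the residual is large and irregular, would be dropped by such a rule.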

3.2. Selection of Band Based on the Pruning Method. As can be seen from the figure of the spectral data after deleting the abnormal bands, the spectral data has a total of 937 bands without serious data fluctuations, and each band has a different effect on the TFe content. If a band is sensitive, the evaluation indices of the model deteriorate significantly after the band is deleted; otherwise, they improve significantly, so that the evaluation indices are better than the limiting conditions. Based on this idea, a band is judged sensitive when, after it is deleted, the RMSEc of the TELM model becomes larger (and R2c becomes smaller), that is, when the limiting conditions are better than the evaluation indices of the model. The sensitive bands are selected exhaustively in this way: RMSE and R are the limiting conditions, and RMSEc and R2c are the evaluation indices of the model. As shown in Table 1, the bands meeting the selection condition are retained. The paper uses TELM and sets the ratio of the training set to the test set to 3:1. The input is all sensitive bands, and the output is the TFe content. The number of neurons in the hidden layer is 1000, and the activation function is the dsig function (g(x) = (1 − e^{−x})/(1 + e^{−x})). For the screening of a single band, the optimal result is selected through 1000 cycles. Let RMSE = 3.1648 and R = 0.7913. The paper takes PCA-TELM as a comparison. The number of principal components input to PCA-TELM is 10 because the cumulative contribution rate is greater than 99%, and the remaining parameters are the same as those of TELM. The results show that the TELM with the bands as input is better than PCA-TELM. As shown in Table 2, the TELM with 351 bands is selected based on the principle of simplifying the network structure, and the result of the band selection based on the pruning method is shown in Figure 5.

3.3. LU-TELM. LU-TELM uses LU decomposition to solve β; the input is the 351 selected bands, and the other settings are the same as those of the TELM with 351 bands as input.
As shown in Table 3, the experimental results show that LU-TELM is better than TELM.
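The pruning-based band selection of Section 3.2 can be sketched as the following loop. This is an illustrative Python outline under assumptions: train_model and evaluate stand in for the TELM training and scoring steps, and the keep/delete decision follows the described rule (a band is sensitive if deleting it pushes RMSEc above the limit and R2c below the limit).

```python
import numpy as np

def prune_bands(X, y, train_model, evaluate, rmse_limit, r_limit):
    """Pruning-based band selection (illustrative sketch).

    X: (n_samples, n_bands) spectra; y: (n_samples,) TFe contents.
    A band is kept (sensitive) when deleting it degrades the model beyond the
    limiting conditions: RMSEc > rmse_limit and R2c < r_limit.
    """
    keep = []
    for band in range(X.shape[1]):
        X_reduced = np.delete(X, band, axis=1)          # prune one candidate band
        model = train_model(X_reduced, y)
        rmse_c, r2_c = evaluate(model, X_reduced, y)
        if rmse_c > rmse_limit and r2_c < r_limit:      # model degraded: band is sensitive
            keep.append(band)
    return keep
```

In the paper, this exhaustive screening with RMSE = 3.1648 and R = 0.7913 as the limits reduced the 937 candidate bands to 351 sensitive bands.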
3.4. LU-TELM-SOA. Since the weights and thresholds are randomly selected, LU-TELM is unstable. Because the performance of each model is different, the models are assigned different weights. The seagull optimization algorithm (SOA) proposed in [27] outperforms the particle swarm optimization algorithm (PSO), the genetic algorithm (GA), and other algorithms, so SOA is used to find the best weight distribution. The paper sets the parameters of SOA based on [27]: f_c = 2, u = 1, and v = 1. A seagull population of 100 performs well [27], and the paper sets n = 1000 to improve the optimization performance. The larger the maximal number of iterations, the better the model, so the paper sets max = 1000. Let P = xyzC_1, with C_1 as an adjustment factor and C_1 = 0.000001. For a point C beyond the search space, if C > m, then C_new = |C − m|; if C ≤ 0, then C_new = −C. The paper stipulates that the range of the N (N ≥ 2) weight parameters is [0, m] (m ∈ [0.15, 0.50]). In SOA, RMSEc is used as the fitness function. With m = 0.4, the performance of the models with different numbers of LU-TELM models is compared. When N = 15 and m = 0.4, RMSEc becomes greater than 10, which is abnormal. Thus, the paper chooses the model with m = 0.2 and the best performance. When N = 10, the performance of LU-TELM-SOA is the best (Figure 6 and Table 4).
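The search-space boundary handling just described (C_new = |C − m| when a weight exceeds the upper bound m, and C_new = −C when it is nonpositive) can be stated directly; the function below is an illustrative sketch of that rule as written, with a hypothetical name.

```python
def reflect_weight(c, m):
    """Map a weight parameter back towards the range (0, m], per the stated rule.

    If c exceeds the upper bound m, reflect it to |c - m|; if c <= 0, negate it.
    Note: as written, a single application keeps values near the boundary inside
    the range but may need repeating for points far outside it.
    """
    if c > m:
        return abs(c - m)
    if c <= 0:
        return -c
    return c
```

Each candidate weight vector produced by an SOA update would pass through this mapping before its RMSEc fitness is evaluated.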
In the paper, PSO and GA are used in place of SOA, yielding the LU-TELM-PSO and LU-TELM-GA algorithms. The paper compares the performance of LU-TELM-SOA, LU-TELM-PSO, and LU-TELM-GA under the same conditions, namely N = 10 and m = 0.4. The inertia coefficient of PSO is 1.0 (ω = 1.0), and the acceleration constants of PSO are 1.0 (c_1 = c_2 = 1.0). The crossover probability of GA is 0.4, and the mutation probability of GA is 0.2. The other parameters of PSO and GA are the same as those of SOA. The experimental results show that the RMSEc of LU-TELM-SOA is better than that of LU-TELM-PSO and LU-TELM-GA, and LU-TELM-SOA avoids falling into a local minimum too early (Figure 7). Ordered from good to bad, the models rank LU-TELM-SOA, LU-TELM-PSO, and LU-TELM-GA (Table 4).
For the range of the weight parameters, there is no obvious performance advantage or disadvantage within m ∈ [0.15, 0.50] (Figure 8 and Table 5).

Conclusions
In the paper, band selection based on the pruning method and LU-TELM-SOA are used to establish a model of the TFe content of hematite with higher accuracy and better stability. The 351 sensitive bands are screened out by the pruning-based band selection, and the TELM with the 351 sensitive bands as input is better than PCA-TELM. LU-TELM is superior to TELM, and the LU-TELM-SOA proposed in the paper solves the stability problem of LU-TELM. Finally, the RMSEc of the TFe content of hematite obtained was 1.7109, achieving high accuracy. The SOA algorithm used in the paper has the problems inherent to swarm intelligence algorithms and needs to be improved in the future. For example, swarm intelligence algorithms have difficulty balancing global search and local search. Choosing a swarm intelligence algorithm with better performance than SOA is also a direction for future research.

Data Availability
The research data of the paper can be obtained from the corresponding author. The software used for data processing and programming is MATLAB R2018b.