Adaptive Hybrid Soft-Sensor Model of Grinding Process Based on Regularized Extreme Learning Machine and Least Squares Support Vector Machine Optimized by Golden Sine Harris Hawk Optimization Algorithm

Soft-sensor technology plays a vital role in tracking and monitoring the key production indicators of the grinding and classifying process. Least squares support vector machine (LSSVM), as a soft-sensor model with strong generalization ability, can be used to predict key production indicators in complex grinding processes. The traditional cross-validation method cannot obtain the ideal structure parameters of LSSVM. In order to improve the prediction accuracy of LSSVM, a golden sine Harris Hawk optimization (GSHHO) algorithm was proposed to optimize the structure parameters of LSSVM models with linear kernel, sigmoid kernel, polynomial kernel, and radial basis kernel, and the influences of GSHHO algorithm on the prediction accuracy under these LSSVM models were studied. In order to deal with the problem that the prediction accuracy of the model decreases due to changes of industrial status, this paper adopts moving window (MW) strategy to adaptively revise the LSSVM (MW-LSSVM), which greatly improves the prediction accuracy of the LSSVM. The prediction accuracy of the regularized extreme learning machine with MW strategy (MW-RELM) is higher than that of MW-LSSVM at some moments. Based on the training errors of LSSVM and RELM within the window, this paper proposes an adaptive hybrid soft-sensing model that switches between LSSVM and RELM. Compared with the previous MW-LSSVM, MW-neural network trained with extended Kalman filter (MW-KNN), and MW-RELM, the prediction accuracy of the hybrid model is further improved. Simulation results show that the proposed hybrid adaptive soft-sensor model has good generalization ability and prediction accuracy.


Introduction
As one of the most important operating procedures in a beneficiation plant, the grinding and classifying process grinds larger-sized metal ore to a reasonable size, exposes useful metal components in the ore, and prepares for the next stage of the flotation process [1]. Some production indicators (such as granularity, impurity content, and iron content) become decisive indicators of whether the grinding process can be performed normally and have an important impact on subsequent ore processing (especially flotation operations) [2]. Therefore, it is essential to monitor key production indicators in real time. Due to technical and economic conditions and the harsh industrial environment, there is no effective sensor to monitor the variables related to product quality in real time, which may reduce industrial control performance, increase production costs, and cause obstacles to green and healthy production [3]. With the development of data acquisition technologies and data processing algorithms, data-driven models have established themselves as an effective tool to solve this problem by estimating online variables that are difficult to measure directly [4]. These data-driven models are often referred to as "data-driven soft sensors," which are an important application of data analysis techniques designed to estimate difficult-to-measure variables by using easy-to-measure variables [5]. In the past ten years, soft-sensor technology has been widely used in industrial data modeling due to its low cost, high implementation efficiency, and easy operation [6][7][8][9].
In the grinding and classifying process, many experts and scholars have made great contributions to the development of soft-sensor models for the grinding process. In the initial stage of soft-sensor modeling of the grinding process, due to the limitations of theoretical knowledge and calculation tools, some researchers (such as Casali) tried to establish a mechanism model to predict the granularity of the grinding process [10][11][12]. When establishing the mechanism model, the entire grinding process needs to be carefully analyzed, and a large number of algebraic and differential equations are used to describe the entire grinding process.
The mechanism model assumes that all external conditions are in an ideal state. However, the actual industrial environment is complex and changeable, which causes a large deviation in the mechanism model and greatly reduces its prediction accuracy. Therefore, in recent years, few researchers have been willing to use mechanism soft-sensor models for the grinding and classifying process. Some scholars use the idea of case-based reasoning to build soft-sensing models [13][14][15]. The principle of case-based reasoning is to compare the similarity between the test samples and the existing samples in the database and infer the output value of the test samples from the most similar training samples in the database. This method is simple in theory, efficient to implement (little time is required to calculate the output value of a test sample), strongly resistant to interference, and of practical value. However, it requires accurate classification of training samples based on their numerical characteristics. When there are many samples in the database, it is difficult to find a suitable clustering algorithm to accurately classify the samples. Inaccurate classification causes case-based reasoning to have a large error when inferring the output of the test sample, which reduces the prediction accuracy of the model. Similar to the case-based reasoning model, Yan et al. built a soft-sensing model of uncertainty reasoning relationships based on the cloud model [16], which is more robust: even when part of the training data is missing, its prediction accuracy does not decrease significantly. However, this method still faces the problem of fuzzy membership function partitioning of the samples. When the membership of the samples is not accurate, the prediction accuracy of the model will still decrease.
With the continuous progress of statistical theory, some scholars have established soft-sensor models of the grinding process using a nonlinear multiple regression model. Partial least squares (PLS) model is used by some experts in soft-sensor modeling of grinding process due to its nonlinear fitting ability [17][18][19]. A common disadvantage of multiple regression models is that they are particularly sensitive to changes in the external environment. When the external environment changes, the existing model parameters will not accurately reflect the current state of industrial production, which will result in a decrease in the prediction accuracy of the model and a problem of "model degradation" [20]. Tang et al. selected multiple PLS models as the soft-sensor model on the grinding process to predict the load parameters of the ball mill [21,22], which determines the number of PLS submodels based on multiple features of the sample set, uses an adaptive weighted fusion algorithm to integrate multiple PLS results, and obtains output so as to improve the adaptability of the soft-sensing model and the adjustment ability when the industrial status changes suddenly. However, the model parameters that need to be determined by this method will increase geometrically, which increases the computational burden of the model, and the training time of the model is also greatly increased.
Compared with other types of soft-sensor models, a neural network soft sensor shows greater advantages in that the neural network model only needs to focus on the input and output of the model and does not need to pay too much attention to the specific industrial production process. Most neural network models have strong generalization ability. Once the number of neurons is determined, only the appropriate model parameters need to be found to track key industrial production indicators [23]. Therefore, more experts and scholars are willing to study neural network models to set up soft-sensing models for the grinding and classifying process. Common neural network soft-sensing models include BP neural network [15,24,25], RBF neural network [26][27][28][29][30], support vector machine (SVM) [31,32], and extreme learning machine (ELM) [33]. The traditional training methods of these neural networks often cannot find suitable model parameters, which prevents the prediction accuracy of the model from being further improved. Natural heuristic algorithms provide better ideas for solving this problem. The neural network is treated as a nonlinear function, the natural heuristic algorithm optimizes the error function formed by the nonlinear function, and the model parameters corresponding to the optimal value of the error function are taken as reasonable parameters. Some scholars applied this kind of algorithm to find the parameters of neural network models and obtained better experimental results [34][35][36]. Xie et al. proposed an improved black hole algorithm (GSLBH) with strong optimization ability [37] and used GSLBH to find suitable weights and biases for ELM models with different kernel functions [23]. The experimental results show that GSLBH-ELM has better prediction ability under the condition of stable industrial status.
In many industrial environments, the characteristics of the equipment and other processing behaviors frequently change, for example through equipment aging and changes of industrial raw materials [6]. Such changes in working conditions mean that the optimization algorithm can no longer find suitable structural parameters for the neural network model, and the accuracy of the model is difficult to improve by this method. Wang et al. introduced the idea of model migration to the wavelet neural network, which can revise structural parameters as the environment changes [38].
This method requires frequent updating of model parameters, which increases the computational burden. Dai et al. [39] proposed a robust random vector functional link network, which can maintain its generalization ability without major changes in the face of changes of the industrial status. Some other scholars have proposed hybrid soft-sensing models combining multiple neural networks [2,40], whose main advantage is that they can exploit the prediction capabilities of different neural networks in different states so that the prediction accuracy of the model is always maintained at a high level. However, the fatal disadvantages of such a model are that there are too many model parameters, the training time of the model is too long, and the working efficiency of the model is extremely low, which is contrary to the efficiency expected of a soft-sensor model. How to make the most of the advantages of the hybrid model while ensuring the implementation efficiency of the soft sensor has become a problem to be solved. Compared with the traditional SVM model, the least squares SVM (LSSVM) is widely used to predict key production indicators in the steel and chemical industries due to its small training volume, low training difficulty, and strong generalization ability [41]. LSSVM transforms the quadratic programming problem of SVM parameters into linear equations by constructing equality constraints, which reduces the difficulty of model solving [42]. However, the two types of parameters (the penalty factor and the kernel parameters) which affect the prediction accuracy of the model in LSSVM need to be set manually [43]. In the LSSVM model, the reasonable selection of penalty factors and kernel function parameters becomes the key to whether the prediction accuracy of the model can meet the requirements.
The most common parameter selection method is cross-validation [44]. The model parameters obtained by this method have strong limitations, and the resulting parameter combination is likely to fall far short of the required prediction accuracy. In contrast, the natural heuristic algorithm provides a better solution for finding the optimal model parameters of LSSVM.
The Harris Hawk optimization (HHO) algorithm is a new type of natural heuristic algorithm proposed by Mirjalili et al. [45]. This algorithm has strong exploration and weak exploitation, and it has been shown by Xie et al. [37] that the golden sine operator can expand the search range of agents. This paper therefore introduces the golden sine operator into HHO when HHO switches to the exploitation phase (GSHHO), which expands the exploitation range of the algorithm and enhances its exploitation.
This paper uses the GSHHO algorithm to optimize the error function of LSSVM, and the model parameters corresponding to the minimum value of the obtained error function are used as the optimal parameters of LSSVM. In order to explore the optimization ability of the GSHHO algorithm for LSSVM with different kernel functions, the GSHHO algorithm is used to optimize LSSVM models with linear kernel, sigmoid kernel, polynomial kernel, and radial basis kernel, and the prediction results of the different LSSVMs are compared.
The model with the best prediction result is taken as the most suitable LSSVM soft-sensor model. Through the observation and analysis of the dataset, the industrial status selected in this paper was found to be changing dynamically. Therefore, local LSSVM models are established by using the moving window (MW) strategy to predict the ore granularity at different periods.
This MW-LSSVM model can respond well to changes of industrial status. At the same time, it was found that the regularized extreme learning machine (RELM) based on the MW strategy (MW-RELM) also obtained prediction results similar to those of MW-LSSVM. In order to take advantage of different neural network soft-sensor models in different situations, based on the MW strategy, this paper proposes a hybrid soft-sensor model with LSSVM and RELM continuously switching (MW-LSSVM-RELM). Because the RELM model has an extremely short training time, the introduction of RELM into MW-LSSVM does not affect the implementation efficiency of the soft-sensing model and at the same time improves its prediction accuracy. The neural network trained with extended Kalman filter (KNN) has proven to be a predictive model with strong generalization ability [46,47]. This paper compares the experimental results of KNN with LSSVM and RELM and also compares the experimental results of MW-KNN with MW-LSSVM-RELM. Finally, simulation experiments show that this hybrid model based on the moving window strategy is more practical. The rest of this paper is organized as follows. Section 2 introduces LSSVM and RELM, Section 3 introduces the GSHHO algorithm, Section 4 introduces the working principle of the hybrid model proposed in this paper, Section 5 conducts simulation experiments, and Section 6 draws conclusions.

Least Squares Support Vector Machine Regression Model.
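The core idea of the support vector machine is to derive an estimate $f(x)$ of the unknown relationship $y = f(x)$ between the input variable $x$ and the output variable $y$ in a given training set:

$$f(x) = w^{T}\varphi(x) + b, \tag{1}$$

where $x \in R^{N \times p}$, $N$ is the number of samples, $p$ is the dimension of the variables, $f(x) \in R^{N \times 1}$, $\varphi(x)$ represents a high-dimensional nonlinear mapping of $x$, and $w$ and $b$ are constants. Let the given training sample set be $\{(x_1, y_1), \ldots, (x_N, y_N)\}$, where $x_i$ and $y_i$ are the corresponding input and output of the $i$th sample. The optimization problem of SVR can be expressed as follows:

$$\min_{w, b, \xi, \xi^{*}} \ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^{*}\right). \tag{2}$$

The inequality constraints of this problem can be expressed as follows:

$$\begin{cases} y_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i, \\ w^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\ \xi_i, \ \xi_i^{*} \ge 0, \quad i = 1, 2, \ldots, N, \end{cases} \tag{3}$$

where $\xi_i$ and $\xi_i^{*}$ are slack variables, $\varepsilon > 0$ is the allowable error threshold, and $C$ is the penalty factor.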
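Compared with the original SVR, LSSVM regression replaces the inequality constraints with equality constraints and applies the linear least squares criterion to the loss function. Its objective function can be expressed as follows:

$$\min_{w, b, e} \ J(w, e) = \frac{1}{2}w^{T}w + \frac{C}{2}\sum_{i=1}^{N}e_i^{2}. \tag{4}$$

The constraints of the objective function become

$$y_i = w^{T}\varphi(x_i) + b + e_i, \quad i = 1, 2, \ldots, N. \tag{5}$$

The following Lagrange function can be constructed:

$$L(w, b, e, \alpha) = \frac{1}{2}w^{T}w + \frac{C}{2}\sum_{i=1}^{N}e_i^{2} - \sum_{i=1}^{N}\alpha_i\left[w^{T}\varphi(x_i) + b + e_i - y_i\right], \tag{6}$$

where $\alpha_i$ $(i = 1, 2, \ldots, N)$ is the Lagrangian multiplier. The KKT optimality conditions of $L(w, b, e, \alpha)$ can be expressed as follows:

$$\begin{cases} \dfrac{\partial L}{\partial w} = 0 \ \Rightarrow \ w = \sum_{i=1}^{N}\alpha_i\varphi(x_i), \\ \dfrac{\partial L}{\partial b} = 0 \ \Rightarrow \ \sum_{i=1}^{N}\alpha_i = 0, \\ \dfrac{\partial L}{\partial e_i} = 0 \ \Rightarrow \ \alpha_i = Ce_i, \\ \dfrac{\partial L}{\partial \alpha_i} = 0 \ \Rightarrow \ w^{T}\varphi(x_i) + b + e_i - y_i = 0. \end{cases} \tag{7}$$

By eliminating the variables $w$ and $e_i$, the optimization problem can be transformed into the following linear equations:

$$\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + C^{-1}I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}, \quad \Omega_{ij} = \varphi(x_i)^{T}\varphi(x_j) = K(x_i, x_j). \tag{8}$$

After solving the equations shown in equation (8), the final result of LSSVM regression can be expressed as follows:

$$f(x) = \sum_{i=1}^{N}\alpha_i K(x, x_i) + b, \tag{10}$$

where $\alpha_i$ and $b$ are obtained by solving the linear equations in equation (8). In equation (10), $K(x, x_i)$ is a kernel function that satisfies the Mercer condition. The kernel functions commonly used in LSSVM are shown in Table 1 [48], which also lists the parameters of LSSVM that need to be manually adjusted for each kernel function. $C$ in Table 1 is the penalty factor for LSSVM. These parameters are generally adjusted by cross-validation. This method is simple in concept and easy to operate. However, it has strong limitations, and the obtained structural parameters are probably not optimal. Natural heuristic algorithms provide a good solution for finding the optimal parameters of the model. They work by optimizing the error function of the model, and the parameters corresponding to the minimum value of the obtained error function are taken as the optimal parameters. Commonly used algorithms include the genetic algorithm (GA) and the particle swarm optimization (PSO) algorithm, but a large number of experiments have shown that many algorithms have better optimization capabilities than GA and PSO [37,45,49].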
This paper proposes a golden sine Harris Hawk optimization algorithm to select reasonable structural parameters for the LSSVM soft-sensor model. This paper will use the golden sine Harris Hawk algorithm to optimize the LSSVM with the four kernel functions in Table 1 and compare the experimental results of these four LSSVMs. The specific implementation steps are described in the following sections.
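As an illustration of how the linear system in equation (8) yields the LSSVM regressor, the following is a minimal NumPy sketch rather than the authors' implementation. It assumes a radial basis kernel of the form exp(-||x - x'||^2 / (2σ^2)), which may differ from the expression in Table 1, and the function names `rbf_kernel`, `lssvm_fit`, and `lssvm_predict` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """Kernel matrix K[i, j] = exp(-||A[i] - B[j]||^2 / (2 * sigma2)) (assumed form)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma2))

def lssvm_fit(X, y, C, sigma2):
    """Solve the (N+1) x (N+1) LSSVM linear system for the bias b and multipliers alpha."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                                      # first row: sum(alpha) = 0
    A[1:, 0] = 1.0                                      # bias column
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / C
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                              # b, alpha

def lssvm_predict(X_train, b, alpha, sigma2, X_new):
    """Evaluate f(x) = sum_i alpha_i * K(x, x_i) + b for each row of X_new."""
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b
```

Here `C` and `sigma2` play the role of the structural parameters that the paper later tunes; in this sketch they are simply passed in by the caller.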

Regularized Extreme Learning Machine Regression Model.
Extreme learning machine (ELM) is a single hidden layer feed-forward network (SLFN) proposed by Huang et al. [50].
This neural network has a simple structure, extremely short training time, efficient execution, and good generalization ability [4,23,51,52]. The principles of ELM training and prediction have been explained in detail in [23]. The key work of training ELM is to solve the weight $\beta$ between the hidden layer and the output layer, and the calculation of $\beta$ is realized by

$$\beta = H^{+}T, \tag{11}$$

where $H^{+}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix $H$, $H^{+}$ can also be expressed as $(H^{T}H)^{-1}H^{T}$, and $T$ represents the output matrix. After solving the structural parameters, the trained model can be used to predict the output variables. The main disadvantage of ELM is its instability: it is too sensitive to data noise and is prone to overfitting. To solve this difficulty, Deng et al. [53] introduced a regularization parameter when calculating the weight $\beta$, which greatly enhanced the generalization ability of the ELM model and made it more practical. According to statistical learning theory, the actual prediction risk of learning is composed of empirical risk and structural risk. A model with good generalization capabilities should make the best trade-off between the two risks. Therefore, the actual risk can be expressed by the weighted sum of these two risks. By introducing a weighting factor $c$ for the empirical risk, their ratio can be adjusted [53]. Empirical risk can be expressed by the squared error $\|\varepsilon\|^{2}$. Structural risk is represented by $\|\beta\|^{2}$, which corresponds to maximizing the distance to the interface [54][55][56]. The mathematical model of the regularized ELM can be expressed as follows:

$$\min_{\beta, \varepsilon} \ \frac{1}{2}\|\beta\|^{2} + \frac{c}{2}\|\varepsilon\|^{2} \quad \text{s.t.} \quad H\beta - T = \varepsilon, \tag{12}$$

where $H$ is the $N \times \tilde{N}$ hidden layer output matrix, $N$ is the number of samples, $\tilde{N}$ is the number of neurons, and $\varepsilon = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N]^{T}$. To solve the optimization problem shown in equation (12), the following Lagrangian equation can be constructed:

$$L(\beta, \varepsilon, \alpha) = \frac{1}{2}\|\beta\|^{2} + \frac{c}{2}\|\varepsilon\|^{2} - \alpha\left(H\beta - T - \varepsilon\right), \tag{13}$$

where $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_N]$ and $\alpha_i$ is the Lagrangian multiplier.
The KKT optimality conditions of $L(\beta, \varepsilon, \alpha)$ can be expressed as follows:

$$\begin{cases} \dfrac{\partial L}{\partial \beta} = 0 \ \Rightarrow \ \beta^{T} = \alpha H, \\ \dfrac{\partial L}{\partial \varepsilon} = 0 \ \Rightarrow \ c\,\varepsilon^{T} + \alpha = 0, \\ \dfrac{\partial L}{\partial \alpha} = 0 \ \Rightarrow \ H\beta - T - \varepsilon = 0. \end{cases} \tag{14}$$

The three sets of equations in equation (14) can be combined to calculate the final expression of $\beta$:

$$\beta = \left(\frac{I}{c} + H^{T}H\right)^{-1}H^{T}T. \tag{15}$$

The ELM that obtains the weight $\beta$ between the hidden layer and the output layer through equation (15) is called a regularized ELM (RELM). Adjusting $c$ changes the ratio between empirical risk and structural risk. The model obtains the best generalization ability when these two risks reach the optimal compromise [53]. Therefore, compared with traditional ELM, RELM has better anti-interference ability, stronger generalization ability, and higher prediction accuracy. The implementation steps of RELM are described as follows:
Step 1: determine the number of hidden layer neurons $\tilde{N}$ of the RELM model according to the size of the samples, and determine the activation function $g(x)$.
Step 2: given a training set of samples, randomly initialize the weights $w_i$ $(i = 1, 2, \ldots, \tilde{N})$ and biases $b_i$ $(i = 1, 2, \ldots, \tilde{N})$ between the input layer and the hidden layer and calculate the output matrix $H$ of the hidden layer.
Step 3: calculate the weight β between the hidden layer and the output layer according to equation (15).
Step 4: input the query set into the RELM model to obtain the output.
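The four steps above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' code: the activation $g$ is taken to be a sigmoid, input weights and biases are drawn uniformly from [-1, 1], and the names `hidden_output`, `relm_fit`, and `relm_predict` are hypothetical. The output weights follow the closed form of equation (15).

```python
import numpy as np

def hidden_output(X, W, b):
    """Hidden layer output matrix H with a sigmoid activation (assumed choice of g)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def relm_fit(X, T, n_hidden, c, rng):
    """Steps 1-3: random input weights/biases, then beta = (I/c + H^T H)^{-1} H^T T."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = hidden_output(X, W, b)
    beta = np.linalg.solve(np.eye(n_hidden) / c + H.T @ H, H.T @ T)
    return W, b, beta

def relm_predict(X, W, b, beta):
    """Step 4: feed query samples through the fixed hidden layer and apply beta."""
    return hidden_output(X, W, b) @ beta
```

Solving the regularized normal equations with `np.linalg.solve` avoids forming an explicit inverse; a larger `c` puts more weight on the empirical risk and less on the norm of `beta`.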

Improved Harris Hawk Optimization Algorithm

Harris Hawk Optimization Algorithm. The Harris Hawk optimization (HHO) algorithm is a new natural heuristic algorithm proposed by Mirjalili et al., inspired by the hunting behavior of the Harris Hawk [23]. It has the characteristics of strong exploration and extremely high efficiency. Therefore, once the HHO algorithm was proposed, it attracted the attention of many scholars [57,58]. According to the current physical energy of the prey, the Harris Hawk adopts different hunting strategies to capture the prey. Let the energy of the current prey be $E$, which can be calculated by

$$E = 2E_0\left(1 - \frac{t}{T}\right), \tag{16}$$
where $E_0$ is a random number in $(-1, 1)$, $t$ is the current number of iterations, and $T$ is the specified total number of iterations. When $E_0$ is between $-1$ and 0, the prey is physically flagging. When $E_0$ is between 0 and 1, the prey is strengthening. It can be seen from equation (16) and Figure 1 that as $t$ increases, the prey energy $E$ decreases. The Harris Hawk judges whether it should stay in the exploration phase or switch to the exploitation phase based on $E$. When $|E| \ge 1$, the prey has more energy, the Harris Hawk is in the exploration phase, and the position update strategy in the exploration phase can be realized by

$$X(t+1) = \begin{cases} X_{rand}(t) - r_1\left|X_{rand}(t) - 2r_2X(t)\right|, & q \ge 0.5, \\ \left(X_{prey}(t) - X_m(t)\right) - r_3\left(LB + r_4(UB - LB)\right), & q < 0.5, \end{cases} \tag{17}$$

where $LB$ and $UB$ represent the lower and upper bounds, $r_1$, $r_2$, $r_3$, $r_4$, and $q$ are random numbers between 0 and 1, $X(t)$ and $X(t+1)$ represent the current location of the Harris Hawk and the location of the next iteration, respectively, $X_{rand}(t)$ represents the current position of a randomly selected agent in the Harris Hawk population, $X_{prey}(t)$ represents the current position of the prey (the current optimal location), and $X_m(t)$ represents the coordinate average of all agents in the Harris Hawk population, whose calculation formula is described as follows:

$$X_m(t) = \frac{1}{N}\sum_{i=1}^{N}X_i(t), \tag{18}$$

where $N$ is the population size. When $|E| < 1$, the Harris Hawk switches to the exploitation phase and begins to capture the prey. Scientific research shows that the Harris Hawk has seven ways to attack the prey [59]. In HHO, only four methods of attacking prey were selected: "Soft besiege," "Hard besiege," "Soft besiege with progressive rapid dives," and "Hard besiege with progressive rapid dives." A random number $r \in (0, 1)$ is also defined in the process of the hawk attacking the prey, which indicates whether the prey successfully escaped.

[Table 1: Kernel functions commonly used in LSSVM. Columns: kernel function, expression, parameters to be adjusted.]
In the exploitation phase, when $r \ge 0.5$ and $|E| \ge 0.5$, the prey has enough energy but still fails to escape. At this time, the Harris Hawk attacks the prey with the "soft besiege" method. The movement of the Harris Hawk is expressed as follows:

$$X(t+1) = \Delta X(t) - E\left|JX_{prey}(t) - X(t)\right|, \quad \Delta X(t) = X_{prey}(t) - X(t), \quad J = 2(1 - r_5), \tag{19}$$

where $r_5$ is a random number between 0 and 1. The other parameters have the same meaning as those in equation (17). When $r \ge 0.5$ and $|E| < 0.5$, the prey does not have enough energy to escape and the Harris Hawk uses the "hard besiege" method to attack the prey. The movement of the Harris Hawk can be expressed as follows:

$$X(t+1) = X_{prey}(t) - E\left|\Delta X(t)\right|, \tag{20}$$

where $\Delta X(t)$ has the same meaning as the parameter in equation (19). When $r < 0.5$ and $|E| \ge 0.5$, the prey is difficult for the Harris Hawk to catch. The Harris Hawk uses a more clever method, "Soft besiege with progressive rapid dives," to capture the prey. The movement pattern of this attack method is described as follows:

$$Y = X_{prey}(t) - E\left|JX_{prey}(t) - X(t)\right|, \quad Z = Y + S \times \mathrm{Levy}(D), \tag{21}$$

where $J$, $X_{prey}$, and $X(t)$ have the same meaning as in equation (19), $D$ represents the dimension of the agents, $S$ represents a $1 \times D$ random matrix, and Levy denotes the Levy flight function. The laws of movement of most animals in nature conform to the Levy flight function [60], which is expressed as follows:

$$\mathrm{Levy}(x) = 0.01 \times \frac{u \times \sigma}{|v|^{1/\beta}}, \quad \sigma = \left(\frac{\Gamma(1+\beta)\sin(\pi\beta/2)}{\Gamma\left(\frac{1+\beta}{2}\right)\beta \, 2^{(\beta-1)/2}}\right)^{1/\beta}, \tag{22}$$

where $u$ and $v$ are random numbers between 0 and 1 and the value of $\beta$ defaults to 1.5. In the end, the Harris Hawk determines the movement of the next iteration according to the fitness value of the objective function as follows:

$$X(t+1) = \begin{cases} Y, & F(Y) < F(X(t)), \\ Z, & F(Z) < F(X(t)), \end{cases} \tag{23}$$

where $F(x)$ is the calculated fitness function value, and the other parameters have the same meanings as the parameters in equation (21). When $r < 0.5$ and $|E| < 0.5$, although the prey is exhausted, it still tries to escape capture by the Harris Hawk. At this time, the Harris Hawk uses "Hard besiege with progressive rapid dives." The movement pattern of this attack method is expressed as follows:

$$Y = X_{prey}(t) - E\left|JX_{prey}(t) - X_m(t)\right|, \quad Z = Y + S \times \mathrm{Levy}(D), \tag{24}$$

where $X_m(t)$ has the same meaning as in equation (17), and the remaining parameters have the same meaning as in equation (21).
Similar to "Soft besiege with progressive rapid dives," in the end, the Harris Hawk determines the movement pattern of the next iteration according to the fitness value of the objective function shown in equation (25):

$$X(t+1) = \begin{cases} Y, & F(Y) < F(X(t)), \\ Z, & F(Z) < F(X(t)). \end{cases} \tag{25}$$

The implementation flowchart of the Harris Hawk optimization algorithm is shown in Figure 2.
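The phase logic described above (escape energy, the choice among exploration and the four besiege strategies, and the Levy flight step) can be sketched as follows. This is an illustrative sketch under assumptions: the energy formula is the form commonly given for HHO, $u$ and $v$ are drawn uniformly as the text states, and the function names are hypothetical.

```python
import math
import numpy as np

def escape_energy(e0, t, T):
    """Prey energy E = 2 * E0 * (1 - t / T); its magnitude shrinks as t grows."""
    return 2.0 * e0 * (1.0 - t / T)

def select_strategy(E, r):
    """Map the pair (|E|, r) to the phase or attack strategy described in the text."""
    if abs(E) >= 1.0:
        return "exploration"
    if r >= 0.5:
        return "soft besiege" if abs(E) >= 0.5 else "hard besiege"
    if abs(E) >= 0.5:
        return "soft besiege with progressive rapid dives"
    return "hard besiege with progressive rapid dives"

def levy_flight(dim, rng, beta=1.5):
    """Levy step 0.01 * u * sigma / |v|^(1/beta), with u, v uniform (v kept away from 0)."""
    sigma = (math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
             / (math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0))) ** (1.0 / beta)
    u = rng.random(dim)
    v = 1.0 - rng.random(dim)      # values in (0, 1], avoids division by zero
    return 0.01 * u * sigma / np.abs(v) ** (1.0 / beta)
```

Because the energy envelope $2(1 - t/T)$ drops below 1 once $t > T/2$, the second half of a run consists only of exploitation moves in this sketch.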

Golden Sine Operator. The golden sine operator is a search strategy derived from the golden sine algorithm (Gold-SA) [60]. This operation can expand the search range of agents. From the geometric meaning of the trigonometric function, it can be seen that continuously moving a point along a sine function is equivalent to continuously scanning this point around a unit circle. In Gold-SA, however, agents do not move according to a standard sine curve but according to a golden sine curve [37]. When the agents move according to this golden sine curve, their ability to search within a local region is expanded during each iteration. The shape of this golden sine curve path in two-dimensional space is shown by the red curves in Figure 3 [37].
In Gold-SA, the search of the agents according to the golden sine curve can be expressed as follows:

$$X(t+1) = X(t)\left|\sin(r_1)\right| - r_2\sin(r_1)\left|m_1X_{best}(t) - m_2X(t)\right|, \tag{26}$$

where $r_1$ is a random number between 0 and $2\pi$, $r_2$ is a random number between 0 and $\pi$, $X_{best}(t)$ is the current optimal location, and $m_1$ and $m_2$ are the coefficients obtained by the golden section method. This method can reduce the search space of the agents, improve their search efficiency, and make the agents move to the target location faster. In order to simplify the operation and improve the execution efficiency of the algorithm, Xie et al. [37] assign suitable constants to $m_1$ and $m_2$ in equation (26). This approach guarantees that the search ability of the agents will not weaken, while increasing the stability of the algorithm. The constant values given to $m_1$ and $m_2$ are set in equation (27). When $m_1$ and $m_2$ in equation (26) take the constant values in equation (27), the movement mode shown in equation (26) becomes the golden sine operator.
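A minimal sketch of the golden sine move applied to a whole population is given below. The constants `M1` and `M2` are an assumption: they are the initial golden-section coefficients of Gold-SA on the interval [-π, π], and the constants actually chosen in equation (27) may differ. The function name `golden_sine_step` is hypothetical.

```python
import numpy as np

TAU = (np.sqrt(5.0) - 1.0) / 2.0              # golden ratio conjugate, about 0.618
M1 = -np.pi + (1.0 - TAU) * 2.0 * np.pi       # assumed constant for m1
M2 = -np.pi + TAU * 2.0 * np.pi               # assumed constant for m2

def golden_sine_step(X, x_best, rng, m1=M1, m2=M2):
    """Move every agent (row of X) along the golden sine curve around x_best."""
    n = X.shape[0]
    r1 = rng.uniform(0.0, 2.0 * np.pi, size=(n, 1))   # one r1 per agent
    r2 = rng.uniform(0.0, np.pi, size=(n, 1))         # one r2 per agent
    return X * np.abs(np.sin(r1)) - r2 * np.sin(r1) * np.abs(m1 * x_best - m2 * X)
```

With these assumed constants, `M1` and `M2` are equal in magnitude and opposite in sign, so the last factor reduces to a scaled distance between the mirrored best position and the agent.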

Golden Sine Harris Hawk Optimization (GSHHO) Algorithm. HHO is an algorithm with very strong exploration. Its global search mechanism can effectively reduce the possibility of the algorithm falling into a local optimum. However, when HHO enters the exploitation phase, its exploitation is not particularly strong. In order to expand the exploitation range of HHO, this paper introduces the golden sine operator into the exploitation phase of HHO to enhance its exploitation. During each iteration, when $|E| < 1$, all agents first perform the golden sine operation in equation (26) and then select one of the four exploitation methods for the next operation according to the values of $r$ and $E$. Compared with the implementation steps of the original HHO in Figure 2, GSHHO only adds a golden sine search operation to the exploitation phase, but the exploitation of the algorithm is greatly improved. The pseudocode of the GSHHO algorithm is described as follows:

    Determine the population size N and the number of iterations T;
    Initialize population X_i (i = 1, 2, ..., N);
    while t < T
        Determine the optimal position according to the fitness value of all agents, and specify this optimal position as the position of the current prey;
        Calculate the value of E according to formula (16);
        if |E| ≥ 1
            The agents perform exploration according to formula (17);
        if |E| < 1
            The agents perform golden sine search according to formula (26);
            if r ≥ 0.5 and |E| ≥ 0.5
                The agents perform exploitation according to formula (19);
            else if r ≥ 0.5 and |E| < 0.5
                The agents perform exploitation according to formula (20);
            else if r < 0.5 and |E| ≥ 0.5
                The agents perform exploitation according to formula (23);
            else if r < 0.5 and |E| < 0.5
                The agents perform exploitation according to formula (25)
        t = t + 1
    Output the optimal value and optimal location found by GSHHO
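The pseudocode above can be turned into a compact runnable sketch. Several choices here are assumptions rather than details confirmed by the text: the energy $E$ is redrawn for every agent (as in common HHO implementations), positions are clipped to the bounds before evaluation, $m_1$ and $m_2$ use the initial Gold-SA golden-section coefficients, and the names `levy` and `gshho` are hypothetical.

```python
import math
import numpy as np

def levy(dim, rng, beta=1.5):
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u, v = rng.random(dim), 1.0 - rng.random(dim)      # v in (0, 1]
    return 0.01 * u * sigma / v ** (1.0 / beta)

def gshho(f, lb, ub, dim, n_agents=20, n_iter=200, seed=0):
    """Minimize f over the box [lb, ub]^dim; returns (best position, best value)."""
    rng = np.random.default_rng(seed)
    tau = (math.sqrt(5.0) - 1.0) / 2.0
    m1 = -math.pi + (1.0 - tau) * 2.0 * math.pi        # assumed golden sine constants
    m2 = -math.pi + tau * 2.0 * math.pi
    X = rng.uniform(lb, ub, size=(n_agents, dim))
    best_x, best_f = X[0].copy(), math.inf
    for t in range(n_iter):
        X = np.clip(X, lb, ub)
        fit = np.array([f(x) for x in X])
        i = int(fit.argmin())
        if fit[i] < best_f:                            # current prey = best position so far
            best_f, best_x = float(fit[i]), X[i].copy()
        x_mean = X.mean(axis=0)
        for k in range(n_agents):
            E = 2.0 * rng.uniform(-1.0, 1.0) * (1.0 - t / n_iter)
            if abs(E) >= 1.0:                          # exploration phase
                if rng.random() >= 0.5:
                    xr = X[rng.integers(n_agents)]
                    X[k] = xr - rng.random() * np.abs(xr - 2.0 * rng.random() * X[k])
                else:
                    X[k] = (best_x - x_mean) - rng.random() * (lb + rng.random() * (ub - lb))
                continue
            # exploitation phase: golden sine search first, then one of four attacks
            r1, r2 = rng.uniform(0.0, 2.0 * math.pi), rng.uniform(0.0, math.pi)
            X[k] = X[k] * abs(math.sin(r1)) - r2 * math.sin(r1) * np.abs(m1 * best_x - m2 * X[k])
            r, J = rng.random(), 2.0 * (1.0 - rng.random())
            if r >= 0.5 and abs(E) >= 0.5:             # soft besiege
                X[k] = (best_x - X[k]) - E * np.abs(J * best_x - X[k])
            elif r >= 0.5:                             # hard besiege
                X[k] = best_x - E * np.abs(best_x - X[k])
            else:                                      # progressive rapid dives
                ref = X[k] if abs(E) >= 0.5 else x_mean
                Y = best_x - E * np.abs(J * best_x - ref)
                Z = Y + rng.random(dim) * levy(dim, rng)
                fk = f(np.clip(X[k], lb, ub))
                if f(np.clip(Y, lb, ub)) < fk:
                    X[k] = Y
                elif f(np.clip(Z, lb, ub)) < fk:
                    X[k] = Z
    return best_x, best_f
```

A typical call would be `gshho(lambda x: float(np.sum(x ** 2)), -10.0, 10.0, dim=5)`, which searches for the minimum of a sphere function; the returned value is the best fitness among all positions evaluated during the run.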

Golden Sine Algorithm to Optimize LSSVM Model.
As mentioned at the end of Section 2.1, using a natural heuristic algorithm to select the structural parameters of LSSVM is a good idea. This paper adopts the GSHHO algorithm to optimize the structural parameters of LSSVM with the 4 kernel functions listed in Table 1. It can be seen from Table 1 that when the kernel function of LSSVM is different, the parameters to be adjusted are also different, and the differing parameters are the internal parameters of the kernel function. For each kernel function, the parameter set to be adjusted is denoted $\Theta$. For example, when the kernel function is the rbf function, $\Theta = \{C, \sigma^{2}\}$. The error function formed by LSSVM is given in equation (28), where $n$ represents the number of training samples, $X_i$ represents the input of the training samples, $Y_i$ represents the actual output corresponding to the training sample $X_i$, and $f$ represents the unknown nonlinear relationship derived from input $X_i$ and output $Y_i$ in the LSSVM model. When using the GSHHO algorithm, the size of $\Theta$ determines the dimension of the solution space, and the error function is the objective function that the GSHHO algorithm needs to optimize. The smaller the objective function value found in the solution space, the smaller the training error of the LSSVM model and the higher the prediction accuracy of the model for the query samples. When the iteration is completed, the optimal position obtained is the optimal $\Theta$ found by the GSHHO algorithm. When LSSVM models with the different kernel functions listed in Table 1 are optimized, the optimization results will differ. In the subsequent simulation experiments, this paper will compare the prediction results of GSHHO-LSSVM models with different kernel functions and find the optimal GSHHO-LSSVM model and the corresponding optimal parameters. The whole process by which the GSHHO-LSSVM model predicts query samples is divided into the following steps:
Step 1: construct the error function using the training sample set $(X_1, Y_1), \ldots, (X_n, Y_n)$ and any given $\Theta$ according to the function construction rule of $f$ in Section 2.1
Step 2: find the minimum value of the error function and the coordinate $\Theta_{best}$ corresponding to the minimum value according to the GSHHO algorithm implementation rule shown in Section 3.2.2
Step 3: after the iteration, output the optimal position $\Theta_{best}$ and use $\Theta_{best}$ as the optimal model parameter of the LSSVM
Step 4: train the LSSVM model again using the optimal parameter $\Theta_{best}$ and the training sample set $(X_1, Y_1), \ldots, (X_n, Y_n)$
Step 5: input the test samples to the trained LSSVM model to obtain prediction results
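Steps 1 to 3 can be sketched as an objective function over $\Theta$ plus a search routine. The sketch below makes loud assumptions: the error function is taken to be the mean squared training error (the exact form of equation (28) is not reproduced here), the kernel is an rbf kernel with $\Theta = (C, \sigma^2)$, and a plain random search stands in for GSHHO purely to keep the example short. The names `rbf`, `lssvm_train_error`, and `random_search` are hypothetical.

```python
import numpy as np

def rbf(A, B, sigma2):
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma2))

def lssvm_train_error(theta, X, Y):
    """Objective for the optimizer: mean squared training error of LSSVM at theta = (C, sigma2)."""
    C, sigma2 = theta
    n = len(Y)
    K = rbf(X, X, sigma2)
    A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                  [np.ones((n, 1)), K + np.eye(n) / C]])
    sol = np.linalg.solve(A, np.concatenate(([0.0], Y)))
    b, alpha = sol[0], sol[1:]
    return float(np.mean((Y - (K @ alpha + b)) ** 2))

def random_search(objective, low, high, n_trials, rng):
    """Stand-in optimizer (NOT GSHHO): sample theta uniformly in the box and keep the best."""
    best_theta, best_val = None, np.inf
    for _ in range(n_trials):
        theta = rng.uniform(low, high)
        val = objective(theta)
        if val < best_val:
            best_theta, best_val = theta, val
    return best_theta, best_val
```

Any population-based optimizer that accepts a callable objective and box bounds could replace `random_search` here; the dimension of the search box equals the number of entries in $\Theta$.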

Moving Window Strategy.
In the actual grinding and classifying process, changes in various external conditions (such as different types of ore, aging of processing equipment, and frequent switching of equipment working modes) cause the industrial process to change at any time, and historical databases cannot contain all possible states and conditions of the future industrial process [61]. When these problems occur, the existing soft-sensor models can no longer accurately describe the current status of industrial production, and the prediction accuracy of the model decreases. In order to cope with the frequent changes of industrial status, this paper introduces the moving window (MW) strategy to adaptively revise the parameters of the soft-sensor model. The idea of the MW strategy was proposed by Michalski et al. [62], and its main work is to select a specific set of data to adjust the parameters of the soft-sensor model. In the vast majority of cases, a fixed amount of the latest data is considered the most relevant to the current state of the industry [20]. Therefore, when predicting the current production indicators, the training samples that are most relevant to the current moment can be selected from the database to build a local soft-sensor model, and the value predicted by such a local model for the current production indicator is likely to be closest to the real value. The implementation process of the MW strategy is shown in Figure 4.
Suppose the size of the window is $L$ and the moving step is $S$. The training set in the window at the initial moment is
$$D_{ori} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_L, y_L)\}. \quad (29)$$
Model $y = f_{ori}(x)$ is trained according to $D_{ori}$, and the predicted value of a test sample $t_1$ that is closest to $D_{ori}$ in time is
$$\hat{y}_{t_1} = f_{ori}(t_1). \quad (30)$$
When new samples are added, the window slides forward according to the step size $S$. At this time, the training set in the window can be expressed as follows:
$$D_1 = \{(x_{1+S}, y_{1+S}), (x_{2+S}, y_{2+S}), \ldots, (x_{L+S}, y_{L+S})\}. \quad (31)$$
Model $y = f_1(x)$ is trained according to $D_1$, and the predicted value of a test sample $t_2$ that is closest to $D_1$ in time is
$$\hat{y}_{t_2} = f_1(t_2). \quad (32)$$
The model training phase of the MW strategy can be repeatedly applied in the online phase without the need for other online algorithms, which means that it can be combined with all existing soft-sensor models [20]. These advantages have made the MW strategy a very popular adaptive strategy.
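The sliding of the window can be illustrated with a short Python sketch. The helper names `moving_windows`, `mw_predict`, and `fit_mean` are hypothetical, and the sketch assumes that each local model predicts the samples that arrive before the next slide; the constant-mean model is only a stand-in for a real soft-sensor model.

```python
def moving_windows(n_samples, window_size, step):
    """Yield (train_indices, query_indices) for a window of length L moved by S."""
    start = 0
    while start + window_size < n_samples:
        train = range(start, start + window_size)
        stop = min(start + window_size + step, n_samples)
        query = range(start + window_size, stop)  # samples closest in time to the window
        yield train, query
        start += step

def mw_predict(X, Y, fit, window_size, step):
    """Retrain a local model on every window and predict the following samples."""
    preds = {}
    for train, query in moving_windows(len(X), window_size, step):
        model = fit([X[i] for i in train], [Y[i] for i in train])
        for i in query:
            preds[i] = model(X[i])
    return preds

def fit_mean(X_win, Y_win):
    """Stand-in local model: always predicts the mean output of the window."""
    mean = sum(Y_win) / len(Y_win)
    return lambda x: mean

# Illustrative drifting series with L = 90 and S = 15.
X = list(range(150))
Y = [float(i) for i in range(150)]
preds = mw_predict(X, Y, fit_mean, window_size=90, step=15)
```

Any model that exposes the same `fit(X_win, Y_win)` interface can be passed in place of `fit_mean`, which reflects the remark that the MW strategy can be combined with existing soft-sensor models.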

Hybrid Soft-Sensor Model Based on Moving Window Strategy.
Section 3.2.3 details the process by which the GSHHO algorithm optimizes LSSVM. Compared with the traditional cross-validation method, the GSHHO algorithm can find more suitable parameters for LSSVM, which further improves the prediction accuracy of the LSSVM soft-sensor model. However, in the face of changes in the working conditions, the ability of the algorithm to optimize LSSVM has a certain limit. Because the hyperplane formed by a single LSSVM model cannot characterize every industrial status in the training samples, there is a bottleneck in improving the accuracy of the soft-sensor model. The MW strategy can effectively solve this problem. Based on the optimal LSSVM structural parameters discovered by the GSHHO algorithm, introducing the MW strategy into LSSVM further improves the model prediction accuracy. When a query sample needs to be predicted, LSSVM selects the window data closest to the query sample to build a local model and uses this local model to predict the query sample. When the window slides m times, m local LSSVM models are established, which means that m hyperplanes are used to characterize the industrial status in different time periods. The hyperplanes of these local models reflect the industrial conditions of their periods more realistically and improve the prediction accuracy of the models. The specific implementation process of MW-LSSVM is as described in formulas (29)-(32) in Section 4.1. However, the MW strategy has a major drawback: the computational burden of training the model increases. If the window slides m times, the soft-sensor model needs to be retrained m times, so it is necessary to maintain a balance between improving the prediction accuracy of the model and reducing the computational burden. One of the biggest advantages of the SVM regression model is that its training time is very short once the structural parameters are determined. The training process of LSSVM is simpler and shorter than that of traditional SVM. Therefore, although the computational load of LSSVM increases after the MW strategy is introduced, the training time remains far below the acceptable training time. This advantage of LSSVM further enhances the utility of MW-LSSVM. It is found that, after the MW strategy is introduced, the RELM model can obtain good prediction results similar to those of the MW-LSSVM model. In order to further improve the prediction accuracy of MW-LSSVM, this paper proposes a hybrid model of LSSVM and RELM based on MW (MW-LSSVM-RELM), which can take advantage of LSSVM and RELM under different conditions. The implementation mechanism of this hybrid model is shown in Figure 5.
When using this hybrid model, the computational burden of the model further increases, but RELM is also a model with extremely high execution efficiency [53]. Although the data in each window require the training of two models, the time required to train them is still very short, much less than the acceptable time. The computational efficiency of the hybrid model is explained later in the simulation experiment section. The operation of this hybrid soft-sensor model is described as follows:
Step 1: establish a local LSSVM model (model1) and a local RELM model (model2) according to the training samples in the window
Step 2: compare the training errors of model1 and model2, and select the model with the smaller training error as the current soft-sensor model
Step 3: predict the query samples with the model identified in Step 2
Step 4: when a new sample is added, slide the window according to the specified step size, and then repeat Steps 1-3
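The switching rule in Steps 1-4 can be sketched as follows. This is an illustrative Python sketch, not the authors' code: the two candidate models are simple stand-ins (a constant-mean model and a one-dimensional least-squares line) rather than LSSVM and RELM, the training error is assumed to be the RMSE on the window, and all function names are hypothetical.

```python
def train_rmse(model, X, Y):
    """Root mean square error of a fitted model on its own training window."""
    return (sum((y - model(x)) ** 2 for x, y in zip(X, Y)) / len(Y)) ** 0.5

def select_local_model(X_win, Y_win, fitters):
    """Steps 1-2: fit every candidate and keep the one with the smallest training error."""
    best = None
    for name, fit in fitters.items():
        model = fit(X_win, Y_win)
        err = train_rmse(model, X_win, Y_win)
        if best is None or err < best[2]:
            best = (name, model, err)
    return best

def hybrid_mw_predict(X, Y, fitters, window_size, step):
    """Steps 3-4: predict with the selected model, then slide the window."""
    preds, chosen = {}, []
    start = 0
    while start + window_size < len(X):
        X_win = X[start:start + window_size]
        Y_win = Y[start:start + window_size]
        name, model, _ = select_local_model(X_win, Y_win, fitters)
        chosen.append(name)
        stop = min(start + window_size + step, len(X))
        for i in range(start + window_size, stop):
            preds[i] = model(X[i])
        start += step
    return preds, chosen

def fit_mean(X_win, Y_win):
    """Stand-in for model1: constant prediction equal to the window mean."""
    mean = sum(Y_win) / len(Y_win)
    return lambda x: mean

def fit_line(X_win, Y_win):
    """Stand-in for model2: one-dimensional least-squares line."""
    n = len(X_win)
    mx, my = sum(X_win) / n, sum(Y_win) / n
    sxx = sum((x - mx) ** 2 for x in X_win)
    sxy = sum((x - mx) * (y - my) for x, y in zip(X_win, Y_win))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

# Illustrative data: a constant regime followed by a linear regime.
X = [float(i) for i in range(60)]
Y = [5.0] * 30 + [2.0 * x for x in X[30:]]
preds, chosen = hybrid_mw_predict(
    X, Y, {"mean": fit_mean, "line": fit_line}, window_size=10, step=5
)
```

The list `chosen` records which candidate was selected in each window, which is the kind of information a model-switching plot would display.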

Technique of Grinding and Classifying Process.
Based on the processing of iron ore in a beneficiation plant, this paper studies the grinding and classifying process of iron ore. Xie et al. have introduced the grinding and classifying process in detail [23], so this article describes it only briefly; its schematic diagram is shown in Figure 6 [23]. The grinding and classifying process is mainly composed of two parts: the first part is the first stage of the closed-circuit grinding process, and the second part is the second stage of the closed-circuit grinding process. The iron ore is first sent by a belt to the first ball mill for grinding. After a period of grinding, the pulp produced by the first ball mill is sent to the spiral classifier to classify the ore. The ore with larger granularity (size of the ore) is sent back to the first ball mill for grinding again; the ore with smaller granularity flows out with the overflow product and, after fine sieving, enters the second stage ball mill for fine grinding. The hydrocyclone separates the ore particles whose granularity meets the production standards. The particles that do not meet the requirements enter the second stage ball mill to continue grinding until the size of the ore meets the standards. Similar to Ref. [23], this paper selects 10 easy-to-measure variables as secondary variables for the soft-sensor model and adopts these variables to predict the ore granularity during the grinding process. Information on the secondary variables and objective variables is shown in Tables 2 and 3 [23]. Table 3 shows some data of the grinding process soft-sensor modeling. The collection positions of these secondary variables are indicated in Figure 6.

Data Dimension Reduction.
For the neural network soft-sensor model, an excessively large input dimension makes the network topology very large and complicates the training process. At the same time, there may be redundancy between the variables, which interferes with the prediction process of the neural network, so the dimension of high-dimensional data needs to be reduced [23,63]. In the case of unstable industrial status, this paper also uses the KPCA dimensionality reduction method used in Ref. [23] to reduce the data in Table 3, and the results are listed in Table 4. The contribution percentage of the first 5 variables in Table 4

Simulation Experiments.
In this paper, five input variables obtained after KPCA processing are used to predict the ore granularity (200 mesh content) during the grinding and classifying process. This paper collected 1,700 samples from an existing database, of which 1,500 were used as training samples and 200 were used as test samples. In order to show that the prediction accuracy of the soft-sensor models proposed in this paper is higher than that of other models, three quantitative indicators were selected to describe the prediction accuracy of the soft-sensor model; their calculation methods are listed in Table 5. Many scholars use these three quantitative indicators to evaluate the predictive performance of a model [24,25,29,30]. In Table 5, $n$ represents the number of test samples, $\hat{y}_i$ represents the predicted value of the objective variable, $y_i$ represents the true value of the objective variable, and $\bar{y}$ represents the average of all objective variables. The smaller the MAE and RMSE, the higher the prediction accuracy of the model. The larger the $R^2$, the stronger the fitting ability of the soft-sensor model.
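For reference, the three indicators can be computed as below. Table 5 is not reproduced in this section, so the sketch assumes the standard definitions of MAE, RMSE, and R^2; the function names and the sample values are illustrative.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: (1/n) * sum |y_i - yhat_i|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error: sqrt((1/n) * sum (y_i - yhat_i)^2)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SSE / SST, with SST taken about the mean of y_true."""
    mean = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - sse / sst

# Illustrative values only, not data from the paper.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
scores = (mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```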

GSHHO Algorithm to Optimize LSSVM Model.
In order to study the optimization capability of GSHHO algorithm for LSSVM with different kernel functions, this paper uses GSHHO algorithm to optimize the four different LSSVMs listed in Table 1.
The traditional three-layer cross-validation (CV) method and HHO-optimized LSSVM were used as comparative experiments to demonstrate the superiority of the GSHHO algorithm. The population size of the GSHHO and HHO algorithms is set to 30, and the number of iterations is set to 100. Figures 7-10 show the results of optimizing the LSSVM parameters with these three methods, where (a) of each figure shows the prediction results of the soft-sensor model and (b) shows the prediction error of the soft-sensor model. The convergence curves obtained when optimizing the error function are shown in Figure 11. The quantitative indicators of the ore granularity predicted by CV-LSSVM, GSHHO-LSSVM, and HHO-LSSVM are listed in Table 6 to further illustrate the prediction accuracy of the different soft-sensor models. It can be seen from the experimental results in these figures and tables that LSSVM models with different kernel functions show different prediction capabilities. From Figures 8 and 9 and Table 6, it can be seen that, compared with traditional cross-validation, the GSHHO and HHO algorithms do not show great advantages in optimizing LSSVM with the linear and poly functions; the prediction results of all three methods are unsatisfactory, and real-time monitoring of the ore particle size is not possible with them. From Figure 10 and Table 6, for LSSVM with the sigmoid function, the optimization capability of the GSHHO algorithm is significantly better than that of HHO and CV. Compared with the traditional HHO algorithm, the GSHHO algorithm has both strong exploration and strong exploitation capabilities. It can be seen from the convergence curve of Figure 11(d) that GSHHO can find a more suitable minimum value of the error function in equation (28), which makes the prediction accuracy of GSHHO-LSSVM higher. However, compared with LSSVM with the rbf function, the prediction result of LSSVM with the sigmoid function is still not optimal. Combining the quantitative indicators in Table 5 and the results in Figures 9 and 10, it can be seen that the prediction result of LSSVM with the rbf function optimized by CV is better than that of LSSVM with the sigmoid function optimized by the GSHHO algorithm. Looking further at the results in Figure 10 and Table 6, it can be seen that, compared with HHO-LSSVM and CV-LSSVM, GSHHO-LSSVM with the rbf function shows better performance. In addition, the convergence curves of Figure 11 show that, when the GSHHO algorithm optimizes the error function, the convergence speed is faster and the convergence accuracy is higher. Table 7 shows the parameter values found by the GSHHO and HHO algorithms when optimizing the different soft-sensor models. Taken together, these results show that the GSHHO algorithm has a clear advantage in optimizing the LSSVM model and that the LSSVM model with the rbf function gives the best optimization result.

Simulation Experiments of Regularized Extreme Learning Machine Model.
As explained in Section 2.2, RELM is a neural network model with strong generalization ability and high implementation efficiency. This section studies the prediction accuracy of RELM for ore granularity. GSHHO-LSSVM with the rbf function has been shown to be a prediction model with good prediction accuracy, so it is compared with the prediction result of the RELM model. The experimental results of the traditional ELM are used as a baseline for the RELM model. The number of hidden layer neurons in the ELM and RELM models is 45. The sigmoid function is used as the activation function of ELM and RELM, and the regularization parameter 1/λ of RELM is 0.001. In the case of stable industrial status, the GSLBH-ELM proposed by Xie et al. [23] has been proven to be an effective soft-sensor model. When predicting the ore granularity, GSLBH-ELM with the arctan function has the highest prediction accuracy. This paper also uses GSLBH-ELM with the arctan function to predict the ore granularity when the industrial status is constantly changing, and the results obtained are compared with the experimental results of RELM, ELM, and LSSVM. The experimental results of these four models are shown in Figure 12 and Table 8.
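A minimal sketch of an RELM regressor with the settings quoted above (45 hidden neurons, sigmoid hidden units, regularization value 0.001) is given below. It assumes the common closed form β = (H^T H + rI)^{-1} H^T y with r = 0.001 and input weights drawn uniformly from [-1, 1]; the way 1/λ enters the formula in Section 2.2 may differ, and the function name `relm_fit` and the synthetic data are purely illustrative.

```python
import numpy as np

def relm_fit(X, y, n_hidden=45, reg=0.001, seed=0):
    """Regularized ELM: random hidden layer, ridge-regularized output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights, never trained
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases, never trained

    def hidden(Z):
        # Sigmoid response of the hidden layer.
        return 1.0 / (1.0 + np.exp(-(Z @ W + b)))

    H = hidden(X)
    # Output weights from the regularized normal equations (assumed form).
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return lambda Xq: hidden(Xq) @ beta

# Synthetic five-input data, standing in for the five KPCA components.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

model = relm_fit(X, y)
train_rmse = float(np.sqrt(np.mean((y - model(X)) ** 2)))
```

Because only the output weights are solved for, training reduces to one linear solve of size `n_hidden`, which is consistent with the high execution efficiency attributed to RELM.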
It can be seen from the experimental results that the GSLBH-ELM model does not show good prediction ability. The main reason is that the number of hidden layer neurons selected by GSLBH-ELM is too small. Such a small number of neurons cannot reflect the dynamic industrial characteristics of this experiment, so the GSLBH algorithm cannot find more reasonable weights and biases. If the number of neurons in GSLBH-ELM is increased, the number of parameters that the algorithm needs to optimize increases geometrically and the time required to train the model becomes too long, which is contrary to the efficiency required of a soft-sensor model. Therefore, when the industrial process changes dynamically, GSLBH-ELM is not suitable for predicting the ore granularity. Combining the results shown in Figure 12 and Table 8, the prediction performance of RELM is slightly better than that of ELM; in Figure 12(c) in particular, it can be clearly seen that the prediction value of RELM is more accurate than that of ELM at some moments. However, the prediction accuracy of GSHHO-LSSVM is still higher than that of RELM and ELM.
In order to study the ability of the neural network model trained with the extended Kalman filter (KNN) to predict the granularity in the grinding and classifying process, this paper compares the prediction results of KNN with those of RELM and LSSVM. In this experiment, the structural parameters of the KNN model and the hyperparameters of the Kalman filter method are shown in Table 9, where I represents the identity matrix, std = 0.01, the size of the matrix P is the square of the number of model parameters, and the size of the matrix Q is the square of the number of outputs. The experimental results are shown in Figure 13, and the quantitative indicators of the prediction results are shown in Table 8. As can be seen from the results in Figure 13 and Table 8, KNN does not show a large advantage over RELM and LSSVM: its prediction accuracy is better than that of RELM but still worse than that of LSSVM.

Simulation Experiment of Hybrid Soft-Sensor Model Based on Moving Window Strategy.
Because the industrial status may change at any time, this paper introduces the moving window strategy into the hybrid soft-sensor model to improve the prediction accuracy of the model. The operation of this hybrid model has been described in detail in Section 4.2. The window size in this experiment is 90, and the step size is 15. The LSSVM in the hybrid model uses the best performing model (the kernel function is the rbf kernel, and the structural parameters are those found by GSHHO in Table 6). The experimental results are listed in Tables 10 and 11.
From the experimental results in Figure 14 and Table 10, it can be seen that, when building a local soft-sensor model, the MW-ELM model is very prone to overfitting due to the small number of training samples. There are some abnormal points in the test results of MW-ELM, and the prediction accuracy of the model is greatly reduced, becoming even worse than the ELM prediction results in Section 5.3.3. MW-RELM overcomes the shortcomings of MW-ELM: the introduction of the regularization parameter 1/λ helps the model resist the interference of noise and greatly improves its generalization ability. Because MW-RELM can adaptively adjust the structural parameters of the model according to changes in the working conditions, its prediction accuracy is greatly improved compared with the single model in Section 5.3.3. From the results of Figure 15 and Table 11, it can be seen that the prediction results of MW-LSSVM are slightly better than those of MW-RELM. The hybrid model can achieve the purpose of online prediction of the ore granularity in a dynamic industrial environment.
In order to show which model is selected each time the window slides, the model switching process is plotted in Figure 16, where "1" indicates that the LSSVM model is used as the local soft-sensor model and "−1" indicates that the RELM model is used as the local soft-sensor model. It can be seen from the results in Figure 16 that the LSSVM model is used as the local soft-sensor model in the majority of windows, and the role of the RELM model is to "revise" the prediction when LSSVM performs poorly. In order to explore the working efficiency of the MW-LSSVM-RELM model, the training time required for the MW-LSSVM-RELM model to predict a new query sample is listed in Table 12. The results in Table 12 show that, although the training time of MW-LSSVM-RELM is longer than that of MW-RELM and MW-LSSVM, it is still very short, much less than the acceptable training time, so the model remains highly efficient.
Based on the above experimental results, it can be seen that the MW-LSSVM-RELM model has better prediction ability than the single soft-sensor models and the models that introduce the MW strategy into a single soft-sensor model. It has higher prediction accuracy and higher work efficiency and can achieve online prediction of ore granularity.
It can be seen from the experimental results in Figure 13 and Table 8 that, after the extended Kalman filtering method is used to adjust the parameters of the neural network, the neural network has good generalization ability, and its prediction accuracy is even better than that of RELM. In this section, in order to study the influence of the moving window strategy on the prediction accuracy of KNN, this paper introduces the moving window strategy into KNN. The window size and sliding step are the same as those of MW-LSSVM-RELM. In order to improve the working efficiency of the model, the number of training iterations is set to 100, and the other parameters remain unchanged.
The experimental results of MW-KNN are shown in Figure 17 and Table 11. It can be seen from the results that the prediction accuracy of KNN is greatly improved after the introduction of the moving window strategy. The prediction accuracy of MW-KNN is better than that of MW-RELM and MW-LSSVM, but still not as good as that of MW-LSSVM-RELM. Compared with MW-LSSVM-RELM, MW-KNN has a major disadvantage: it is less efficient. From the results in Table 12, it can be seen that, although the number of iterations of KNN is reduced from 200 to 100, the time for MW-KNN to predict a test sample is still much longer than that of MW-LSSVM-RELM. In terms of both work efficiency and prediction accuracy, the performance of the MW-LSSVM-RELM model is better than that of MW-KNN.
Combining all the above results, it can be concluded that the predictive ability of the MW-LSSVM-RELM model is superior to that of ELM, RELM, LSSVM, and KNN. Even when the moving window strategy is introduced into these four models, the prediction accuracy of MW-LSSVM-RELM is still better, and MW-LSSVM-RELM is more efficient than the other models.

Conclusions
This paper proposes the GSHHO algorithm and also introduces the optimization process of LSSVM by GSHHO. In order to improve the adaptive ability of soft-sensor models, this paper introduces the MW strategy into the LSSVM model and the RELM model to revise the structural parameters of the models. Based on the MW strategy, this paper proposes the MW-LSSVM-RELM model to further improve the prediction accuracy of the models. The following conclusions are drawn from the simulation experiment results:
(1) Compared with the traditional CV method and the HHO algorithm, the GSHHO algorithm can find more suitable structural parameters for LSSVM, which makes the prediction accuracy of LSSVM higher
(2) The introduction of the MW strategy into the LSSVM model, the RELM model, and the KNN model can greatly improve the ability of the models to respond to changes of the industrial status and make these models maintain good generalization ability in different states
(3) The MW-LSSVM-RELM model proposed in this paper combines the advantages of the LSSVM model and the RELM model in different industrial statuses, which gives this hybrid model better prediction capabilities than a single soft-sensor model
In the database of the grinding and classifying process, there are many samples in which data are missing for various reasons. Conventional soft-sensor models cannot take advantage of these samples with missing data, which contain a lot of key information that can characterize the state of the industry. Abandoning these samples means losing a lot of useful information and wasting data resources. Based on the existing research work, this team will focus on deep neural network models, because deep neural networks can better describe the strong nonlinearity and dynamic changes of industrial processes, and the team will study semisupervised versions of the existing soft-sensor models so that these samples with missing data can be used for high-quality online prediction of key variables in the grinding and classifying process.

Data Availability
There are no data available for this paper.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.

Authors' Contributions
Wei Xie participated in the algorithm simulation and draft writing. Jie-Sheng Wang participated in the concept, design, interpretation, and commented on the manuscript and the critical revision of this paper. Cheng Xing, Sha-sha Guo, Meng-wei Guo, and Ling-feng Zhu participated in the data collection and analysis of the paper.