Improving the Solution of Least Squares Support Vector Machines with Application to a Blast Furnace System

The solution of least squares support vector machines (LS-SVMs) is characterized by a specific linear system, that is, a saddle point system. Approaches for its numerical solution, such as conjugate gradient methods (Suykens and Vandewalle, 1999) and null space methods (Chu et al., 2005), have been proposed. To speed up the solution of LS-SVM, this paper employs the minimal residual (MINRES) method to solve the above saddle point system directly. Theoretical analysis indicates that the MINRES method is more efficient than the conjugate gradient method and the null space method for solving the saddle point system. Experiments on benchmark data sets show that, compared with mainstream algorithms for LS-SVM, the proposed approach significantly reduces the training time while keeping comparable accuracy. Finally, the LS-SVM based on the MINRES method is used to tackle a practical problem originating from the blast furnace iron-making process: predicting the changing trend of silicon content in hot metal. The MINRES method-based LS-SVM can effectively perform feature reduction and model selection simultaneously, so it is a practical tool for the silicon trend prediction task.


Introduction
As a kernel method, the SVM works by embedding the input data $x, z \in X$ into a Hilbert space $\mathcal{H}$ through a high-dimensional mapping $\Phi(\cdot)$ and then trying to find a linear relation among the embedded data points [1, 2]. This process is performed implicitly by specifying a kernel function that satisfies $k(x, z) = \Phi(x)^T \Phi(z)$, that is, the inner product of the embedded points. Given observed samples $\{(x_i, y_i)\}_{i=1}^{n}$ of size $n$, the SVM formulates the learning problem as a variational problem of finding a decision function $f$ that minimizes the regularized risk functional [3, 4].

Formulation of LS-SVM
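The primal problem of LS-SVM can be formulated in the following unified format for both regression analysis and pattern classification:

$$\min_{w, b, e} \; \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{n} e_i^2 \quad \text{s.t.} \quad y_i = w^T \Phi(x_i) + b + e_i, \quad i = 1, \ldots, n. \tag{2.1}$$

In (2.1), $n$ is the total number of training samples, $x_i$ is the $i$th input vector, $y_i$ is the $i$th output value (regression) or label (classification), $e_i$ is the $i$th error variable, $C > 0$ is the regularization parameter, and $b$ is the bias term. The Lagrangian of (2.1) is given below:

$$L(w, b, e; \alpha) = \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{n} e_i^2 - \sum_{i=1}^{n} \alpha_i \left( w^T \Phi(x_i) + b + e_i - y_i \right), \tag{2.2}$$

where $\alpha_i$ is the $i$th Lagrange multiplier. For the convex program (2.1), the Slater constraint qualification clearly holds. Therefore, the optimal solution of (2.1) satisfies its Karush-Kuhn-Tucker system

$$\begin{aligned}
\frac{\partial L}{\partial w} = 0 &\;\longrightarrow\; w = \sum_{i=1}^{n} \alpha_i \Phi(x_i), \\
\frac{\partial L}{\partial b} = 0 &\;\longrightarrow\; \sum_{i=1}^{n} \alpha_i = 0, \\
\frac{\partial L}{\partial e_i} = 0 &\;\longrightarrow\; \alpha_i = C e_i, \quad i = 1, \ldots, n, \\
\frac{\partial L}{\partial \alpha_i} = 0 &\;\longrightarrow\; y_i = w^T \Phi(x_i) + b + e_i, \quad i = 1, \ldots, n.
\end{aligned} \tag{2.3}$$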
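After eliminating the variables $w$ and $e$, the Karush-Kuhn-Tucker system (2.3) can be reformulated as the following saddle point system [10]:

$$\begin{bmatrix} K + \frac{1}{C} I & 1_n \\ 1_n^T & 0 \end{bmatrix} \begin{bmatrix} \alpha \\ b \end{bmatrix} = \begin{bmatrix} y \\ 0 \end{bmatrix}, \tag{2.4}$$

where $K_{ij} := k(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$, $I$ stands for the unit matrix, $1_n$ denotes the $n$-dimensional vector of all ones, and $y = (y_1, \ldots, y_n)^T$.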
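As a minimal sketch of how such a saddle point system can be assembled, the snippet below builds the matrix from the blocks $H = K + I/C$, $1_n$, $1_n^T$, and $0$, with the unknowns ordered as $(\alpha, b)$, and solves it with a dense direct solver for reference. The linear kernel, the synthetic data, the value $C = 10$, and the function name `saddle_point_system` are illustrative assumptions and are not taken from the experiments of this paper.

```python
import numpy as np

def saddle_point_system(K, y, C):
    """Assemble [[H, 1_n], [1_n^T, 0]] [alpha; b] = [y; 0] with H = K + I/C."""
    n = K.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = K + np.eye(n) / C   # symmetric positive definite block H
    A[:n, n] = 1.0                  # column of ones multiplying the bias b
    A[n, :n] = 1.0                  # constraint row 1_n^T alpha = 0
    rhs = np.append(y, 0.0)
    return A, rhs

# Synthetic regression data (assumption, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)

K = X @ X.T                          # linear kernel used as a simple stand-in
A, rhs = saddle_point_system(K, y, C=10.0)

sol = np.linalg.solve(A, rhs)        # dense reference solution
alpha, b = sol[:-1], sol[-1]

# The matrix is symmetric but indefinite: exactly one eigenvalue is negative.
n_negative = int(np.sum(np.linalg.eigvalsh(A) < 0))
print("order:", A.shape[0], "negative eigenvalues:", n_negative)
```

The eigenvalue count illustrates why the system is called indefinite: with a positive definite block $H$, the Schur complement $-1_n^T H^{-1} 1_n$ is negative, so the assembled matrix has $n$ positive eigenvalues and one negative eigenvalue.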

Solution of LS-SVM
In this section, we give a brief review and some analysis of the three mentioned numerical algorithms for solution of LS-SVM.

Conjugate Gradient Methods
The kernel matrix $K$ is symmetric positive semidefinite and the diagonal term $1/C$ is positive, so the matrix $H := K + (1/C) I$ is symmetric positive definite. Through a matrix transformation, the saddle point system (2.4) can be factorized into a positive definite system (3.3) [11]. Suykens et al. suggested the use of the CG method for the solution of (3.3) and proposed to solve two positive definite systems of order $n$. More exactly, their algorithm can be described as follows.
Step 1. Employ the CG algorithm to solve the linear equations $H\eta = 1_n$ and get the intermediate variable $\eta$.
Step 2. Solve for the intermediate variable $\mu$ from $H\mu = y$ by the CG method.
Step 3. Obtain the bias term $b = 1_n^T \mu / 1_n^T \eta$ and the Lagrange dual variables $\alpha = \mu - b\eta$.
The output for any new data point $x$ can subsequently be deduced by computing the decision function

$$f(x) = \sum_{i=1}^{n} \alpha_i k(x, x_i) + b.$$
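The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Suykens et al.: the textbook CG routine `conjugate_gradient`, the tolerance, the linear kernel, and the synthetic labels are all hypothetical choices made for the example.

```python
import numpy as np

def conjugate_gradient(H, rhs, tol=1e-10, max_iter=None):
    """Textbook CG for a symmetric positive definite matrix H, started from zero."""
    x = np.zeros_like(rhs, dtype=float)
    r = rhs - H @ x
    p = r.copy()
    rs_old = r @ r
    rhs_norm = np.linalg.norm(rhs)
    for _ in range(max_iter or 10 * len(rhs)):
        if np.sqrt(rs_old) <= tol * rhs_norm:
            break
        Hp = H @ p
        step = rs_old / (p @ Hp)
        x += step * p
        r -= step * Hp
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def lssvm_cg(K, y, C):
    """Solve the LS-SVM saddle point system via two order-n SPD systems."""
    n = K.shape[0]
    H = K + np.eye(n) / C
    ones = np.ones(n)
    eta = conjugate_gradient(H, ones)   # Step 1: H eta = 1_n
    mu = conjugate_gradient(H, y)       # Step 2: H mu = y
    b = (ones @ mu) / (ones @ eta)      # Step 3: bias term
    alpha = mu - b * eta                # Step 3: dual variables
    return alpha, b

def decision_function(K_new, alpha, b):
    """Evaluate f(x) = sum_i alpha_i k(x, x_i) + b; rows of K_new index new points."""
    return K_new @ alpha + b

# Synthetic two-class data (assumption, for illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
K = X @ X.T                             # linear kernel used as a simple stand-in

alpha, b = lssvm_cg(K, y, C=10.0)
scores = decision_function(K, alpha, b)
print("training accuracy:", np.mean(np.sign(scores) == y))
```

Note that $b$ has to be computed before $\alpha$, since $\alpha = \mu - b\eta$ depends on it.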

Null Space Methods
As described above, obtaining the intermediate variables $\eta$ and $\mu$ requires solving two positive definite systems of order $n$ by CG methods. Chu et al. [8] proposed an interesting method for the numerical solution of LS-SVM that solves a single reduced system of linear equations of order $n - 1$. The improved method suggested by Chu et al. can be seen as one kind of null space method. The saddle point system (2.4) can be written as

$$H\alpha + 1_n b = y, \qquad 1_n^T \alpha = 0. \tag{3.4}$$
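Chu et al. specified a particular solution of $1_n^T \alpha = 0$ as $\alpha = 0$ and represented the null space of $1_n^T \alpha = 0$ by the columns of a matrix $Z \in \mathbb{R}^{n \times (n-1)}$ (3.5).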
By solving the following reduced system of order $n - 1$ for the auxiliary unknown $\nu$,

$$Z^T H Z \nu = Z^T y, \tag{3.6}$$

the solution of the saddle point system (2.4) can be obtained as $\alpha = Z\nu$ and $b = \frac{1}{n} 1_n^T (y - H\alpha)$.
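A minimal sketch of the null space approach is given below. It assumes the basis $Z = [I_{n-1};\, -1_{n-1}^T]$, whose columns sum to zero and therefore span $\{\alpha : 1_n^T \alpha = 0\}$; this is one valid choice and not necessarily the matrix specified by Chu et al. For brevity the reduced system is solved with a dense direct solver rather than CG, and the data and kernel are synthetic stand-ins.

```python
import numpy as np

def lssvm_null_space(K, y, C):
    """Null space solve: alpha = Z nu with Z^T H Z nu = Z^T y."""
    n = K.shape[0]
    H = K + np.eye(n) / C
    # Assumed basis of the null space of 1_n^T: each column is e_j - e_n.
    Z = np.vstack([np.eye(n - 1), -np.ones((1, n - 1))])
    reduced = Z.T @ H @ Z                    # SPD matrix of order n - 1
    nu = np.linalg.solve(reduced, Z.T @ y)   # direct solve (CG could be used instead)
    alpha = Z @ nu
    b = np.mean(y - H @ alpha)               # b = (1/n) 1_n^T (y - H alpha)
    return alpha, b

# Synthetic data (assumption, for illustration only).
rng = np.random.default_rng(2)
X = rng.normal(size=(25, 3))
y = X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=25)
K = X @ X.T                                  # linear kernel used as a simple stand-in

alpha, b = lssvm_null_space(K, y, C=10.0)
print("sum of alpha:", alpha.sum(), "bias:", b)
```

Multiplying $H\alpha + 1_n b = y$ from the left by $Z^T$ removes the bias term because $Z^T 1_n = 0$, which is where the reduced system comes from.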

Minimal Residual Methods
The vector sequences in the CG method correspond to a factorization of a tridiagonal matrix similar to the coefficient matrix. Therefore, a breakdown of the algorithm, corresponding to a zero pivot, can occur if the matrix is indefinite. Furthermore, for indefinite matrices the minimization property of the CG method is no longer well defined. The MINRES method proposed by Paige and Saunders [9] is a variant of the CG method that avoids the LU factorization and does not suffer from breakdown. It minimizes the residual in the 2-norm and is an efficient numerical algorithm for solving symmetric but indefinite systems; the corresponding convergence behavior of the MINRES method for indefinite systems has been analyzed by Van der Vorst [12]. The purpose of this paper is to employ the MINRES method to solve the saddle point system (2.4) directly.
Next we give a brief review of the MINRES algorithm. Let $x_0$ be an initial guess for the solution of the symmetric indefinite linear system $Ax = b$. One can obtain the iterative sequence $x_m \in x_0 + \mathcal{K}_m(A, r_0)$, $m = 1, 2, \ldots$, such that

$$\|r_m\|_2 = \min_{x \in x_0 + \mathcal{K}_m(A, r_0)} \|b - Ax\|_2, \tag{3.7}$$

where $r_m = b - Ax_m$ is the $m$th residual for $m = 1, 2, \ldots$, and

$$\mathcal{K}_m(A, r_0) = \operatorname{span}\{r_0, Ar_0, \ldots, A^{m-1} r_0\} \tag{3.8}$$

is the $m$th Krylov subspace. Lanczos methods can be used to generate an orthonormal basis of $\mathcal{K}_m(A, r_0)$, and then only two basis vectors are needed to compute $x_m$; see, for example, [12]. The detailed implementation of the MINRES algorithm can be found in [12].
It has been shown that rounding errors are propagated to the approximate solution with a factor proportional to the square of the condition number of the coefficient matrix [12]; therefore, one should be careful with the MINRES method for ill-conditioned systems.
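To make the short recurrences concrete, the following is a minimal sketch of MINRES in its common textbook form (Lanczos three-term recurrence combined with Givens rotations), applied to a small saddle point system. It is not the implementation used in this paper; the function name `minres`, the stopping rule, the zero initial guess, and the synthetic matrix are assumptions made for illustration.

```python
import numpy as np

def minres(A, b, tol=1e-10, max_iter=None):
    """MINRES for symmetric (possibly indefinite) A, started from x0 = 0."""
    n = len(b)
    x = np.zeros(n)
    v_prev = np.zeros(n)
    v_hat = b.astype(float)            # r0 = b - A x0 with x0 = 0
    gamma = np.linalg.norm(v_hat)
    eta = gamma                        # |eta| tracks the current residual norm
    b_norm = gamma
    w_prev, w = np.zeros(n), np.zeros(n)
    c_prev, c = 1.0, 1.0
    s_prev, s = 0.0, 0.0
    for _ in range(max_iter or 10 * n):
        if abs(eta) <= tol * b_norm:
            break
        # Lanczos step: only the two most recent basis vectors are kept.
        v = v_hat / gamma
        Av = A @ v
        delta = v @ Av
        v_hat = Av - delta * v - gamma * v_prev
        gamma_next = np.linalg.norm(v_hat)
        # Apply the two previous Givens rotations, then build the new one.
        a0 = c * delta - c_prev * s * gamma
        a1 = np.hypot(a0, gamma_next)
        a2 = s * delta + c_prev * c * gamma
        a3 = s_prev * gamma
        c_prev, c = c, a0 / a1
        s_prev, s = s, gamma_next / a1
        # Update the search direction, the iterate, and the residual estimate.
        w_prev, w = w, (v - a3 * w_prev - a2 * w) / a1
        x += c * eta * w
        eta = -s * eta
        v_prev, gamma = v, gamma_next
    return x

# Small synthetic saddle point system (assumption, for illustration only).
rng = np.random.default_rng(3)
n = 40
M = rng.normal(size=(n, 5))
H = M @ M.T + np.eye(n) / 10.0         # stands in for K + I/C with C = 10
y = rng.normal(size=n)
A = np.block([[H, np.ones((n, 1))], [np.ones((1, n)), np.zeros((1, 1))]])
rhs = np.append(y, 0.0)

sol = minres(A, rhs)
alpha, b = sol[:-1], sol[-1]
print("residual norm:", np.linalg.norm(rhs - A @ sol))
```

Library implementations such as `scipy.sparse.linalg.minres` follow the same idea and are preferable in practice; the hand-written version is shown only to expose the recurrences.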

Some Analysis on These Three Numerical Algorithms
The properties of short recurrences and optimality [12] make the CG method the first choice for the solution of a symmetric positive definite system. Suykens et al. transformed the saddle point system (2.4) of order $n + 1$ into two positive definite systems of order $n$, which are solved by CG methods. However, solving two positive definite systems of order $n$ is time consuming for large-scale problems. To overcome this shortcoming, Chu et al. [8] equivalently transformed the original system of order $n + 1$ into a symmetric positive definite system of order $n - 1$, to which the CG method can then be applied. This method can be seen as a null space method. Unfortunately, the transformation may heavily destroy the sparse structure and greatly increase the condition number of the original system, which can slow down the convergence of the CG algorithm considerably. A theoretical analysis of the influence of the transformation on the condition number is indispensable, but it is rather difficult; we leave it as an open problem. In this paper, the MINRES method is applied directly to the original saddle point problem of order $n + 1$. Similar to the CG method, the MINRES method also has the properties of short recurrences and optimality.
In light of the analysis above, the MINRES method should be the first choice for the solution of the LS-SVM model, since it avoids both solving two linear systems and destroying the sparse structure of the original saddle point system.

Experiments on Benchmark Data Sets
In this section we give the experimental test results on the accuracy and efficiency of our method. For comparison purposes, we implement the CG method proposed by Suykens and Vandewalle [6] and the null space method suggested by Chu et al. [8]. All experiments are implemented in the MATLAB version 7.8 programming environment running on an IBM-compatible PC under the Windows XP operating system, configured with an Intel Core 2.1 GHz CPU and 2 GB of RAM. The commonly used Gaussian RBF kernel $k(x, z) = \exp(-\|x - z\|^2 / \sigma^2)$ is selected as the kernel function. We use the default setting for the kernel width $\sigma^2$, that is, we set the kernel width to the dimension of the inputs.
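For concreteness, the kernel and its default width can be sketched as follows. The function name `gaussian_rbf_kernel` and the random inputs are assumptions made for illustration; the experiments reported here were run in MATLAB, not with this code.

```python
import numpy as np

def gaussian_rbf_kernel(X, Z, sigma2=None):
    """Kernel matrix with entries exp(-||x - z||^2 / sigma2).

    If sigma2 is None, the default setting described in the text is used:
    the kernel width equals the input dimension.
    """
    if sigma2 is None:
        sigma2 = X.shape[1]
    diff = X[:, None, :] - Z[None, :, :]
    return np.exp(-np.sum(diff ** 2, axis=2) / sigma2)

# Random inputs (assumption, for illustration only).
rng = np.random.default_rng(4)
X = rng.normal(size=(20, 5))
K = gaussian_rbf_kernel(X, X)            # sigma2 defaults to 5 here
print(K.shape, K.diagonal()[:3])
```

With this convention, two points at squared distance equal to the input dimension have kernel value $e^{-1}$.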
We first compare the three algorithms on three benchmark data sets, Boston, Concrete, and Abalone, which are downloaded from the UCI repository [13]. Each data set is randomly partitioned into a 70% training set and a 30% test set. We also list the condition numbers of the coefficient matrices solved by the three methods in order to analyze their computing efficiency. As shown in Tables 1-3, the condition number for the CG method is the smallest, and the condition number for the null space method increases significantly.
The Cond columns in Tables 1, 2, and 3 show that, compared with the CG method, the condition number for the MINRES method increases slightly, but by much less than the condition number for the null space method. The orders of the linear systems solved by the CG method, the null space method, and the MINRES method are $n$, $n - 1$, and $n + 1$, respectively. The condition numbers for the CG method and the MINRES method are very close, but the CG method has to solve two systems of order $n$. Hence, the running time of the MINRES method should be less than that of the CG method. The CPU columns in Tables 1-3 show that the MINRES method-based LS-SVM model costs much less running time than the CG method-based and null space method-based LS-SVM models for all settings of $C$. The MINRES method is therefore the preferable algorithm for solving the LS-SVM model. In the next subsection, we will employ the MINRES method-based LS-SVM model to solve a practical problem.
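The three coefficient matrices being compared can be formed side by side as in the sketch below, which prints their orders and 2-norm condition numbers. It uses synthetic data and an assumed null space basis $Z = [I_{n-1};\, -1_{n-1}^T]$, so it makes no attempt to reproduce the values in Tables 1-3.

```python
import numpy as np

# Synthetic inputs (assumption, for illustration only).
rng = np.random.default_rng(5)
n, C = 50, 10.0
X = rng.normal(size=(n, 4))

sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
K = np.exp(-sq_dists / X.shape[1])       # Gaussian RBF kernel, width = input dimension
H = K + np.eye(n) / C                    # order n (CG approach)

Z = np.vstack([np.eye(n - 1), -np.ones((1, n - 1))])    # assumed null space basis
reduced = Z.T @ H @ Z                    # order n - 1 (null space approach)

ones = np.ones((n, 1))
saddle = np.block([[H, ones], [ones.T, np.zeros((1, 1))]])  # order n + 1 (MINRES)

cond = {}
for name, M in [("H", H), ("Z^T H Z", reduced), ("saddle", saddle)]:
    cond[name] = np.linalg.cond(M)
    print(f"{name:8s} order {M.shape[0]:3d}  cond {cond[name]:.3e}")
```

The resulting numbers depend on the data, the kernel width, $C$, and the choice of $Z$, so they should be read only as an illustration of how such a comparison can be set up.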

Application on Blast Furnace System
A blast furnace (BF) is one kind of metallurgical reactor used for producing pig iron, which is often called hot metal. Chemical reactions and heat transport phenomena take place throughout the furnace as the solid materials move downwards and the hot combustion gases flow upwards. The main principle involved in the BF iron-making process is the thermochemical reduction of iron oxide ore by carbon monoxide. During the iron-making period, a great deal of heat energy is produced, which can raise the BF temperature to nearly 2000 °C. The end products, consisting of slag and hot metal, sink to the bottom and are tapped periodically for subsequent refining. One iron-making cycle takes about 6-8 h [11]. The BF iron-making process is a highly complex nonlinear process characterized by high temperature, high pressure, and the concurrence of transport phenomena and chemical reactions. The complexity of the BF and the occurrence of a variety of process disturbances have been obstacles to the adoption of modeling and control in the process. Generally speaking, controlling a BF system means keeping the hot metal temperature and components, such as the silicon, sulfur, and carbon contents of the hot metal, within acceptable bounds. Among these indicators, the silicon content often acts as the chief indicator of the thermal state of the BF: an increasing silicon content means that the BF is heating up, while a decreasing silicon content indicates that the BF is cooling down [11, 14]. Thus, the silicon content is a reliable measure of the thermal state of the BF, and predicting it is a key step in regulating that thermal state. Building silicon prediction models has therefore been an active research issue in recent decades, including numerical prediction models [15] and trend prediction models [11].
In this subsection, the trend prediction of silicon content in hot metal is cast as a binary classification problem. Samples with increasing silicon content are denoted by 1, whereas samples with decreasing silicon content are denoted by −1. In the present work, the experimental data are collected from a medium-sized BF with an inner volume of about 2500 m³. The variables closely related to the silicon content are measured as the candidate inputs for modeling. Table 4 presents the variable information from the studied BF. A total of 801 data points are collected, with the first 601 points used as the training set and the remaining 200 points as the testing set. The sampling interval is about 1.5 h for the current BF. Figure 1 illustrates the evolution of the silicon content in hot metal. There are 15 candidate variables in total listed in Table 4 from which to select model inputs. Generally, too many input parameters increase the complexity of the model, while too few inputs reduce its accuracy, so a tradeoff has to be made between model complexity and accuracy when selecting the inputs. Therefore, it is necessary to screen out the less important variables from these 15 candidates. Here, the inputs are screened in an integrated way that combines the F-score method [16] for variable ranking with cross-validation for the selection of variables and model parameters.

Journal of Applied Mathematics
The F-score is an effective tool for feature selection in data mining and can rank features by evaluating the discrimination of two sets of real values. For the 15 candidate variables in Table 4, the F-scores are defined as in (4.1). Table 4 gives the F-scores of all 15 variables, which are ranked according to their F-score values. Since LS-SVM is a kernel-based learning model, the kernel parameter $\sigma^2$ and the regularization parameter $C$ play an important role, so one should pay attention to selecting proper parameters. Grid search-based ten-fold cross-validation is executed on the training set to search for the optimal $(\sigma^2, C)$. The search grid for the model parameters is set as

$$\{2^{-5}, 2^{-4}, \ldots, 2^{10}\} \times \{2^{-5}, 2^{-4}, \ldots, 2^{10}\}. \tag{4.2}$$
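The variable ranking and the search grid can be sketched as follows. The sketch assumes that (4.1) is the standard F-score used for feature ranking, namely the squared deviations of the positive-class and negative-class means from the overall mean, divided by the sum of the two within-class sample variances. The function name `f_scores` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def f_scores(X, labels):
    """F-score of every column of X for labels in {+1, -1} (assumed standard form)."""
    pos, neg = X[labels == 1], X[labels == -1]
    mean_all = X.mean(axis=0)
    mean_pos, mean_neg = pos.mean(axis=0), neg.mean(axis=0)
    # Between-class separation of the class means from the overall mean.
    numerator = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Within-class sample variances (divisor n - 1).
    denominator = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    return numerator / denominator

# Synthetic data: column 0 is informative, column 1 is pure noise (assumption).
rng = np.random.default_rng(6)
labels = np.where(rng.random(200) < 0.5, 1, -1)
X = np.column_stack([labels + 0.5 * rng.normal(size=200), rng.normal(size=200)])

scores = f_scores(X, labels)
ranking = np.argsort(-scores)            # variable indices, highest F-score first
print("F-scores:", scores, "ranking:", ranking)

# Search grid (4.2): sixteen powers of two per parameter, 256 pairs in total.
grid = [2.0 ** k for k in range(-5, 11)]
param_grid = [(sigma2, C) for sigma2 in grid for C in grid]
print("number of grid points:", len(param_grid))
```

A larger F-score suggests that a variable separates the ascending and descending classes better on its own, although it does not capture interactions between variables.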
Mean accuracy in Table 4 stands for the average accuracy of the LS-SVM model under ten-fold cross-validation at the best-performing grid points. In the current work, we first select the variable with the highest F-score as the model input and then add variables one by one according to their F-scores. The mean accuracy for each input set is obtained in this way, and the results are shown in Table 4. The mean accuracy column shows the following: (1) at the beginning, the mean accuracy increases gradually as more candidate variables are taken as model inputs; (2) the largest mean accuracy appears when CO₂ is included in the input set; (3) once the maximum has been passed, the mean accuracy fluctuates as the remaining variables are added in turn to the input set. These results indicate that, as far as the studied BF is concerned, the optimal input set is {Si, S, BI, FS, BV, CO₂} with the model parameters $(\sigma^2, C) = (2^9, 2^8)$.
Table 5 lists the accuracy of the LS-SVM model on the testing set with and without feature and model selection. In the version without feature and model selection, all candidate variables are selected as inputs, and we use the default setting for the LS-SVM model; that is, the kernel width $\sigma^2$ is set equal to the dimension of the input variables and the regularization parameter $C$ is set to 1. An entry such as 34/42 in the second row of this table means that an ascending trend was predicted 42 times and that 34 of these predictions were correct. The confidence level of the LS-SVM model without model and feature selection fluctuates severely between the ascending and descending predictions, from 80.95% to 58.86%. With model and feature selection, the difference in confidence level between ascending and descending predictions is reduced to 2.19%, indicating that the model and feature selection procedure clearly enhances the stability of the LS-SVM model. As the last column of Table 5 shows, the testing set accuracy (TSA) of the LS-SVM model with the feature and model selection procedure is significantly improved compared with the LS-SVM model without it, so the selection procedure is indispensable for the current practical application.
Table 6 lists the running time of the three mentioned numerical algorithms when performing the feature and model selection procedure. The running time of the MINRES method is reduced significantly compared with the other algorithms. In a word, the feature and model selection procedure can be performed effectively with the MINRES method-based LS-SVM, which is meaningful for practical use.
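The combined feature and model selection loop can be sketched as below: variables are added in ranked order, and for every input set each grid point is scored by ten-fold cross-validation. This is a simplified illustration under several assumptions: the saddle point system is solved with a dense direct solver instead of MINRES, the data are synthetic, the ranking is hypothetical, the grid is much coarser than (4.2), and all function names are invented for the example.

```python
import numpy as np

def rbf(X, Z, sigma2):
    d2 = np.sum((X[:, None, :] - Z[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / sigma2)

def fit_lssvm(X, y, sigma2, C):
    """Solve the saddle point system for (alpha, b); direct solve for brevity."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = rbf(X, X, sigma2) + np.eye(n) / C
    A[:n, n] = A[n, :n] = 1.0
    sol = np.linalg.solve(A, np.append(y, 0.0))
    return sol[:n], sol[n]

def predict(X_train, alpha, b, X_new, sigma2):
    return np.sign(rbf(X_new, X_train, sigma2) @ alpha + b)

def cv_accuracy(X, y, sigma2, C, n_folds=10, seed=0):
    """Mean classification accuracy over n_folds cross-validation folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    acc = []
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        alpha, b = fit_lssvm(X[train], y[train], sigma2, C)
        pred = predict(X[train], alpha, b, X[val], sigma2)
        acc.append(np.mean(pred == y[val]))
    return float(np.mean(acc))

def forward_selection(X, y, ranking, grid):
    """Add variables in ranked order; keep the best (accuracy, size, parameters)."""
    best = (-1.0, None, None)
    for m in range(1, len(ranking) + 1):
        cols = ranking[:m]
        for sigma2 in grid:
            for C in grid:
                acc = cv_accuracy(X[:, cols], y, sigma2, C)
                if acc > best[0]:
                    best = (acc, m, (sigma2, C))
    return best

# Synthetic data: only variable 0 carries the trend information (assumption).
rng = np.random.default_rng(7)
X = rng.normal(size=(60, 3))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
ranking = [0, 1, 2]                          # hypothetical F-score ranking
grid = [2.0 ** k for k in (-1, 1, 3)]        # much coarser than the grid (4.2)

best_acc, n_inputs, (sigma2, C) = forward_selection(X, y, ranking, grid)
print("best mean accuracy:", best_acc, "inputs:", n_inputs, "sigma2, C:", sigma2, C)
```

Because every candidate input set and grid point requires ten model fits, the cost of the linear solver dominates this procedure, which is why the running time of the solver matters for feature and model selection.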

Conclusions and Points of Possible Future Research
In this paper, we have proposed an alternative, the MINRES method, for the solution of the LS-SVM model, which is formulated as a saddle point system. Numerical experiments on UCI benchmark data sets show that the proposed numerical solution method for the LS-SVM model is more efficient than the algorithms proposed by Suykens and Vandewalle [6] and Chu et al. [8]. Furthermore, the MINRES method-based LS-SVM model, including feature selection from an extensive set of candidates and model parameter selection, is proposed and employed for the silicon content trend prediction task. The practical application to a typical real BF indicates that the proposed MINRES method-based LS-SVM model is a good candidate for predicting the trend of silicon content in BF hot metal with low running time. However, it should be pointed out that, although the MINRES method-based LS-SVM model displays low running time, the lack of metallurgical information may be the root cause of the limited accuracy of the current prediction model. There is therefore much work worth investigating in the future to further improve the model accuracy and increase the model transparency, such as constructing predictive models that integrate domain knowledge and extracting rules. The extracted rules can account for the output results with detailed and definite input information, which may further serve control purposes by linking the output results with controlled variables. These investigations are expected to help further improve the efficiency of the predictive model.
$$y_i = w^T \Phi(x_i) + b + e_i, \quad i = 1, \ldots, n. \tag{2.1}$$

Figure 1: Evolution of silicon content in hot metal.

In (4.1), $\bar{x}_i$, $\bar{x}_i^{+}$, and $\bar{x}_i^{-}$ stand for the mean of the $i$th attribute over the whole training set, the positive examples, and the negative examples, respectively, while $x^{i}_{j,+}$ and $x^{i}_{j,-}$ are the $i$th variable of the $j$th positive and negative instance, respectively. Hence, a variable ranking can be achieved through the F-score method.

Table 1: Experimental results of three methods on Boston data set.
Cond† denotes the condition number, CPU‡ stands for running time, and MSE* is the mean square error.

Table 2: Experimental results of three methods on Concrete data set.

Table 3: Experimental results of three methods on Abalone data set.

Table 4: A list of input variables.

Table 5: Predictive results of LS-SVM model with/without feature and model selection.
* means 99 observations are ascending trend; TSA† stands for testing set accuracy.

Table 6: Running time of three numerical methods on model identification.