The solution of least squares support vector machines (LS-SVMs) is characterized by a specific linear system, namely a saddle point system. Approaches for its numerical solution, such as conjugate gradient methods (Suykens and Vandewalle, 1999) and null space methods (Chu et al., 2005), have been proposed. To speed up the solution of LS-SVM, this paper employs the minimal residual (MINRES) method to solve the above saddle point system directly. Theoretical analysis indicates that the MINRES method is more efficient than the conjugate gradient method and the null space method for solving the saddle point system. Experiments on benchmark data sets show that, compared with mainstream algorithms for LS-SVM, the proposed approach significantly reduces the training time while keeping comparable accuracy. Finally, the MINRES-based LS-SVM is applied to a practical problem originating from the blast furnace ironmaking process: predicting the changing trend of the silicon content in hot metal. The MINRES-based LS-SVM can effectively perform feature reduction and model selection simultaneously, so it is a practical tool for the silicon trend prediction task.
As a kernel method, the SVM works by embedding the input data into a high-dimensional feature space.
The primal problem of LS-SVM can be formulated in the following unified format:
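For reference, a standard form of this primal for LS-SVM regression (following Suykens and Vandewalle; the symbols γ for the regularization parameter, e_i for the error variables, and Ω for the kernel matrix are the conventional ones, introduced here for illustration) is:

```latex
\min_{w,\,b,\,e}\; \frac{1}{2} w^{\top} w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^2
\quad \text{s.t.} \quad y_i = w^{\top}\varphi(x_i) + b + e_i,\; i = 1,\dots,N.
```

Eliminating $w$ and $e$ through the KKT conditions yields the saddle point system referred to throughout the paper:

```latex
\begin{bmatrix} 0 & \mathbf{1}^{\top} \\ \mathbf{1} & \Omega + \gamma^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ y \end{bmatrix},
\qquad \Omega_{ij} = K(x_i, x_j).
```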
In this section, we give a brief review and some analysis of the three aforementioned numerical algorithms for the solution of LS-SVM.
The kernel matrix
Suykens et al. suggested the use of the CG method for the solution of (
Employ the CG algorithm to solve the linear equations
Solve the intermediate variable
Obtain Lagrange dual variables
The output of any new data
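The four steps above can be sketched in Python, assuming the Suykens-style reduction in which both auxiliary systems share the symmetric positive definite matrix H = Ω + I/γ; the kernel helper, parameter names, and data are illustrative assumptions, not from the paper:

```python
import numpy as np
from scipy.sparse.linalg import cg

def rbf_kernel(X, Z, sigma=1.0):
    # Gaussian RBF kernel matrix between row sets X and Z (illustrative choice).
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_train_cg(X, y, gamma=10.0, sigma=1.0):
    n = len(y)
    # H = Omega + I/gamma is SPD, so the CG method applies to both systems.
    H = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    ones = np.ones(n)
    # Step 1: employ CG to solve the two linear systems H eta = 1 and H nu = y.
    eta, _ = cg(H, ones)
    nu, _ = cg(H, y)
    # Step 2: intermediate variable s = 1^T eta, then the bias term b.
    s = ones @ eta
    b = (ones @ nu) / s
    # Step 3: Lagrange dual variables alpha = nu - b * eta (so 1^T alpha = 0).
    alpha = nu - b * eta
    return alpha, b

def lssvm_predict(X_train, alpha, b, X_new, sigma=1.0):
    # Step 4: output of any new data x is f(x) = sum_i alpha_i K(x, x_i) + b.
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b
```

Note that this route solves two n-by-n systems per training run, which is exactly the overhead the direct MINRES approach avoids.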
As mentioned previously, to get the intermediate variable
The vector sequences in the CG method correspond to a factorization of a tridiagonal matrix similar to the coefficient matrix. Therefore, a breakdown of the algorithm can occur corresponding to a zero pivot if the matrix is indefinite. Furthermore, for indefinite matrices the minimization property of the CG method is no longer well defined. The MINRES method proposed by Paige and Saunders [
It has been shown that rounding errors are propagated to the approximate solution with a factor proportional to the square of the condition number of the coefficient matrix [
The properties of short recurrences and optimization [
In light of the above analysis, the MINRES method should be the first choice for solving the LS-SVM model, since it avoids both solving two linear systems and destroying the sparse structure of the original saddle point system.
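A minimal sketch of this direct approach: assemble the (n+1)-by-(n+1) saddle point system (with first row [0, 1ᵀ], as in the usual LS-SVM formulation) and hand it to MINRES in one shot. All names and the kernel construction are illustrative assumptions:

```python
import numpy as np
from scipy.sparse.linalg import minres

def lssvm_train_minres(K, y, gamma=10.0):
    n = len(y)
    # Assemble the saddle point matrix A = [[0, 1^T], [1, K + I/gamma]].
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    # A is symmetric but indefinite, so MINRES applies directly
    # while plain CG may break down on a zero pivot.
    sol, info = minres(A, rhs)
    b, alpha = sol[0], sol[1:]
    return alpha, b
```

Only one linear solve is performed, and the (typically sparse) block structure of the system is left intact.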
In this section, we present experimental results on the accuracy and efficiency of our method. For comparison purposes, we implement the CG method proposed by Suykens and Vandewalle [
We first compare the three algorithms on three benchmark data sets: Boston, Concrete, and Abalone, which are downloaded from the UCI repository [
Experimental results of three methods on Boston data set.

Boston data set, 506 samples, 13-d inputs.

|  | Conjugate gradient method |  |  | Null space method |  |  | MINRES method |  |  |
|---|---|---|---|---|---|---|---|---|---|
|  | Cond† | CPU‡ | MSE* | Cond | CPU | MSE | Cond | CPU | MSE |
|  | 4 | 0.3281 | 49.1027 | 366.9451 | 0.8438 | 49.1027 | 45.6283 | 0.2500 | 49.1027 |
|  | 8 | 0.4688 | 39.4132 | 369.0926 | 0.6250 | 39.4132 | 31.6150 | 0.3438 | 39.4132 |
|  | 15 | 0.3438 | 29.7686 | 388.1770 | 0.7656 | 29.7686 | 26.3388 | 0.3125 | 29.7686 |
|  | 28 | 0.4531 | 24.2532 | 460.1920 | 0.7656 | 24.2532 | 29.2813 | 0.3281 | 24.2532 |
|  | 60 | 0.2500 | 21.0322 | 474.6254 | 1.0625 | 21.0322 | 61.3493 | 0.4219 | 21.0322 |
| 0 | 116 | 0.3438 | 15.5875 | 566.2504 | 1.2500 | 15.5875 | 119.071 | 0.1875 | 15.5875 |
| 1 | 234 | 0.7188 | 13.6449 | 946.4564 | 1.1250 | 13.6449 | 239.374 | 0.4375 | 13.6449 |
| 2 | 472 | 0.9531 | 13.0252 | 1945.300 | 1.0625 | 13.0252 | 482.447 | 0.6875 | 13.0252 |
| 3 | 924 | 0.9375 | 10.9810 | 2244.342 | 1.4063 | 10.9810 | 944.042 | 0.6406 | 10.9810 |
| 4 | 1734 | 1.3594 | 10.3168 | 5229.460 | 1.2500 | 10.3168 | 1776.31 | 0.8906 | 10.3168 |
| 5 | 3801 | 1.5469 | 10.2063 | 10785.92 | 1.4844 | 10.2063 | 3876.97 | 1.1406 | 10.2063 |
| 6 | 7530 | 2.0469 | 11.3937 | 24998.71 | 1.9063 | 11.3937 | 7682.07 | 1.2969 | 11.3937 |
| 7 | 14618 | 2.4531 | 11.7750 | 47781.41 | 2.2188 | 11.7750 | 14932.2 | 1.6875 | 11.7750 |
| 8 | 29769 | 3.0625 | 12.9925 | 61351.85 | 2.9844 | 12.9925 | 30382.8 | 2.3750 | 12.9925 |
| 9 | 58387 | 3.4063 | 14.0194 | 101181.8 | 3.5938 | 14.0194 | 59619.0 | 2.6875 | 14.0194 |
| 10 | 119285 | 4.0313 | 17.2330 | 285440.0 | 4.8281 | 17.2330 | 121708 | 3.5313 | 17.2330 |

Cond† denotes the condition number, CPU‡ the running time, and MSE* the mean square error.
Experimental results of three methods on Concrete data set.

Concrete data set, 1030 samples, 8-d inputs.

|  | Conjugate gradient method |  |  | Null space method |  |  | MINRES method |  |  |
|---|---|---|---|---|---|---|---|---|---|
|  | Cond | CPU | MSE | Cond | CPU | MSE | Cond | CPU | MSE |
|  | 7 | 2.0781 | 140.8498 | 738.137223 | 3.6719 | 140.8498 | 51.5204280 | 1.6406 | 140.8498 |
|  | 13 | 2.3594 | 111.2384 | 745.054714 | 3.6563 | 111.2384 | 39.7005872 | 1.8906 | 111.2383 |
|  | 25 | 2.8125 | 89.3458 | 796.627246 | 3.8281 | 89.3458 | 31.7895802 | 2.0938 | 89.3459 |
|  | 50 | 2.4844 | 74.4146 | 850.938881 | 3.9688 | 74.4146 | 51.4318149 | 2.0469 | 74.4146 |
|  | 102 | 3.0000 | 60.2984 | 954.604170 | 4.3906 | 60.2984 | 104.293122 | 2.2969 | 60.2984 |
| 0 | 199 | 3.4219 | 50.4491 | 1397.13474 | 4.8438 | 50.4491 | 202.202543 | 2.8281 | 50.4490 |
| 1 | 399 | 4.0625 | 43.5416 | 1737.97400 | 5.7188 | 43.5416 | 406.110983 | 3.2500 | 43.5416 |
| 2 | 787 | 4.8750 | 41.5463 | 2369.65769 | 6.4219 | 41.5463 | 799.643656 | 3.8594 | 41.5463 |
| 3 | 1561 | 6.3125 | 36.5797 | 5375.87469 | 7.3750 | 36.5797 | 1586.70628 | 4.2500 | 36.5797 |
| 4 | 3197 | 8.0000 | 33.4861 | 7342.75323 | 8.7188 | 33.4861 | 3247.47638 | 5.0156 | 33.4861 |
| 5 | 6411 | 10.3281 | 33.1452 | 18274.6591 | 10.8438 | 33.1452 | 6510.73913 | 6.2188 | 33.1452 |
| 6 | 12530 | 13.8750 | 33.4936 | 37192.6189 | 12.9063 | 33.4936 | 12732.7611 | 8.2813 | 33.4936 |
| 7 | 25614 | 18.5781 | 33.8690 | 73008.8645 | 15.8594 | 33.8690 | 26010.1838 | 11.0156 | 33.8690 |
| 8 | 51260 | 25.1250 | 32.6925 | 126475.189 | 19.5938 | 32.6925 | 52056.7280 | 14.8906 | 32.6925 |
| 9 | 101053 | 33.9531 | 35.1044 | 249234.605 | 25.2969 | 35.1044 | 102657.615 | 19.9219 | 35.1043 |
| 10 | 199734 | 46.3125 | 40.4777 | 557864.123 | 32.9219 | 40.4777 | 202961.396 | 27.0625 | 40.4777 |
Experimental results of three methods on Abalone data set.

Abalone data set, 4177 samples, 7-d inputs.

|  | Conjugate gradient method |  |  | Null space method |  |  | MINRES method |  |  |
|---|---|---|---|---|---|---|---|---|---|
|  | Cond | CPU | MSE | Cond | CPU | MSE | Cond | CPU | MSE |
|  | 42.341 | 20.343 | 5.3623 | 2955.9655 | 39.7344 | 5.3623 | 369.9433 | 12.9531 | 5.3623 |
|  | 84.059 | 22.984 | 5.1143 | 3028.4495 | 41.1406 | 5.1143 | 332.0862 | 14.2344 | 5.1143 |
|  | 167.846 | 26.343 | 4.7978 | 3043.4615 | 42.6719 | 4.7978 | 306.5691 | 16.0156 | 4.7978 |
|  | 337.691 | 33.281 | 4.6923 | 3567.1431 | 46.8125 | 4.6923 | 338.1679 | 18.4688 | 4.6923 |
|  | 666.823 | 39.315 | 4.4360 | 4888.3227 | 50.8438 | 4.4360 | 667.7842 | 22.3125 | 4.4360 |
| 0 | 1327.351 | 47.531 | 4.4744 | 5355.5805 | 54.6563 | 4.4744 | 1329.291 | 26.2656 | 4.4744 |
| 1 | 2700.547 | 58.015 | 4.4217 | 9894.2450 | 59.8438 | 4.4217 | 2704.345 | 34.1719 | 4.4217 |
| 2 | 5275.703 | 74.859 | 4.3948 | 8239.2388 | 69.2031 | 4.3948 | 5283.506 | 42.5469 | 4.3948 |
| 3 | 10709.216 | 94.765 | 4.4169 | 18279.897 | 80.4219 | 4.4169 | 10724.46 | 54.7813 | 4.4169 |
| 4 | 21357.750 | 124.359 | 4.5053 | 24472.420 | 97.7500 | 4.5053 | 21388.43 | 71.3906 | 4.5053 |
| 5 | 42427.822 | 177.171 | 4.6144 | 105161.60 | 133.2656 | 4.6144 | 42489.70 | 103.6406 | 4.6144 |
| 6 | 85153.757 | 221.468 | 4.6857 | 185913.18 | 155.9219 | 4.6857 | 85276.97 | 129.3750 | 4.6857 |
| 7 | 171369.064 | 312.078 | 4.7145 | 212162.90 | 212.4531 | 4.7145 | 171614.1 | 181.8750 | 4.7145 |
| 8 | 344731.082 | 430.640 | 4.8621 | 705659.56 | 289.4531 | 4.8621 | 345216.0 | 260.5469 | 4.8621 |
| 9 | 681509.920 | 602.765 | 5.2294 | 1162595.7 | 395.6250 | 5.2294 | 682494.5 | 360.5625 | 5.2294 |
| 10 | 1363883.053 | 840.625 | 5.6517 | 3106655.0 | 549.4844 | 5.6517 | 1365853 | 488.6250 | 5.6517 |
The columns of Cond in Tables
In this subsection, the tendency prediction of silicon content in hot metal is transformed into a binary classification problem. Samples with increasing silicon content are labeled +1, whereas those with decreasing silicon content are labeled −1. In the present work, the experimental data are collected from a medium-sized BF with an inner volume of about 2500 m³. The variables closely related to the silicon content are measured as candidate inputs for modeling. Table
A list of input variables.

| Variable name [unit] | Abbreviation | Range |  | Mean accuracy |
|---|---|---|---|---|
| Latest silicon content (wt%) | Si | 0.13–1.13 | 0.1269 | 81.786% |
| Sulfur content (wt%) | S | 0.012–0.077 | 0.0570 | 82.857% |
| Basicity of ingredients (wt%) | BI | 0.665–1.609 | 0.0229 | 81.786% |
| Feed speed (mm/h) | FS | 16.725–297.510 | 0.0132 | 83.214% |
| Blast volume (m³/min) | BV | 1454.30–5580.200 | 0.0054 | 83.747% |
| CO2 percentage in top gas (wt%) | CO2 | 7.921–22.892 | 0.0048 | 83.750% |
| Pulverized coal injection (ton) | PCI | 0.230–98.533 | 0.0037 | 83.214% |
| CO percentage in top gas (wt%) | CO | 9.267–27.374 | 0.0036 | 82.500% |
| Blast temperature (°C) | BT | 1086.100–1239.700 | 0.0031 | 83.571% |
| Oxygen enrichment percentage (wt%) | OEP | −0.001–14.688 | 0.0019 | 83.393% |
| H2 percentage in top gas (wt%) | H2 | 2.564–4.065 | 0.0005 | 83.214% |
| Coke load of ingredients (wt%) | CLI | 2.032–5.071 | 0.0004 | 82.857% |
| Furnace top temperature (°C) | TP | 62.703–264.130 | 0.0002 | 82.679% |
| Blast pressure (kPa) | BP | 59.585–367.780 | 0.0001 | 83.214% |
| Furnace top pressure (kPa) | TP | 8.585–199.790 | 0.0001 | 82.679% |
Evolution of silicon content in hot metal.
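The labeling rule described above (rising silicon content mapped to +1, falling to −1) can be sketched as follows; the handling of ties (a non-increase mapped to −1) is an assumption, since the paper does not specify it:

```python
def trend_labels(silicon):
    # Label each consecutive pair of silicon-content readings:
    # +1 if the content rises at the next tap, -1 otherwise.
    labels = []
    for prev, curr in zip(silicon, silicon[1:]):
        labels.append(+1 if curr > prev else -1)
    return labels
```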
There are in total 15 candidate variables listed in Table
Mean accuracy in Table
Predictive results of LS-SVM model with/without feature and model selection.

| Inputs | ( ) | Ascend (99*) | Descend (101) | TSA† |
|---|---|---|---|---|
| 15 | (15, 1) | 34/42 = 80.95% | 93/158 = 58.86% | 127/200 = 63.5% |
| 6 | (29, 28) | 73/94 = 77.66% | 80/106 = 75.47% | 153/200 = 76.5% |

99* means 99 observations have an ascending trend; TSA† stands for testing set accuracy.
Running time of three numerical methods on model identification.

| Algorithm | Conjugate gradient method | Null space method | MINRES method |
|---|---|---|---|
| CPU | 1948 | 2800 | 1488 |
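The model-identification times above come from repeatedly training LS-SVM over a grid of hyperparameter pairs. A minimal hold-out grid-search sketch is given below; the solver callables, the split ratio, and the grid values are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def grid_search(X, y, train, predict, gammas, sigmas, split=0.7):
    """Score every (gamma, sigma) pair on a hold-out split and keep the best.

    `train(X, y, gamma=..., sigma=...)` must return (alpha, b);
    `predict(X_train, alpha, b, X_new, sigma=...)` must return predictions.
    """
    n = int(len(y) * split)
    best = (None, np.inf)
    for g in gammas:
        for s in sigmas:
            alpha, b = train(X[:n], y[:n], gamma=g, sigma=s)
            pred = predict(X[:n], alpha, b, X[n:], sigma=s)
            mse = np.mean((pred - y[n:]) ** 2)  # hold-out error for this pair
            if mse < best[1]:
                best = ((g, s), mse)
    return best
```

Since each grid point requires one full training run, a faster linear solver (such as MINRES on the saddle point system) directly shortens the whole model-selection loop.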
In this paper, we have proposed an alternative, namely the MINRES method, for solving the LS-SVM model, which is formulated as a saddle point system. Numerical experiments on UCI benchmark data sets show that the proposed numerical solution of the LS-SVM model is more efficient than the algorithms proposed by Suykens and Vandewalle [
However, it should be pointed out that although the MINRES-based LS-SVM model achieves a low running time, the lack of metallurgical information may be the root cause of the limited accuracy of the current prediction model. Much work therefore remains to further improve the model's accuracy and transparency, such as constructing a predictive model that integrates domain knowledge and extracts rules. The extracted rules can account for the output results in terms of detailed and definite input information, which may further serve the control purpose by linking the output results with controlled variables. These investigations are expected to further improve the predictive model.
This work was partially supported by the National Natural Science Foundation of China under Grant no. 11126084, the Natural Science Foundation of Shandong Province under Grant no. ZR2011AQ003, the Fundamental Research Funds for the Central Universities under Grant no. 12CX04082A, and the Public Benefit Technologies R&D Program of the Science and Technology Department of Zhejiang Province under Grant no. 2011C31G2010136.