The Forward Least-Squares Approximation (FLSA) SVM is a recently developed variant of the Least-Squares SVM (LS-SVM) whose solution is extremely sparse. The algorithm uses the number of support vectors as the regularization parameter and ensures the linear independence of the support vectors which span the solution. This paper proposes a variant of the FLSA-SVM, namely the Reduced FLSA-SVM (RFLSA-SVM), which has reduced computational complexity and memory requirements. The strategy of “contexts inheritance” is introduced to improve the efficiency of tuning the regularization parameter for both the FLSA-SVM and the RFLSA-SVM algorithms. Experimental results on benchmark datasets show that, compared to the SVM and a number of its variants, the RFLSA-SVM solutions contain a reduced number of support vectors while maintaining competitive generalization ability. With respect to the time cost of tuning the regularization parameter, the RFLSA-SVM was empirically demonstrated to be the fastest among the RFLSA-SVM, FLSA-SVM, LS-SVM, and SVM algorithms.
As with the standard Support Vector Machine (SVM), the Least Squares Support Vector Machine (LS-SVM) optimizes the tradeoff between the model complexity and the squared error loss functional [
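For concreteness, the LS-SVM primal problem in Suykens' standard formulation (generic notation, not necessarily the symbols used later in this paper) is

$$\min_{w,b,e}\;\frac{1}{2}w^{\top}w+\frac{\gamma}{2}\sum_{i=1}^{N}e_i^{2}\qquad\text{s.t.}\quad y_i\bigl(w^{\top}\varphi(x_i)+b\bigr)=1-e_i,\quad i=1,\dots,N,$$

where $\gamma$ is the regularization constant. Because every equality constraint is active at the optimum, every training sample receives a nonzero Lagrange multiplier, which is the root of the nonsparseness addressed next.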
A range of algorithms aimed at easing the nonsparseness of LS-SVM solutions is available. Suykens et al. proposed pruning the training samples with the smallest Lagrange multipliers [
Another class of sparse LS-SVM algorithms views each column of the kernel matrix as the output of a specific “basis function” evaluated on the training samples. Basis functions are selected iteratively into the solution. Among them is the kernel matching pursuit algorithm, which adopts a squared-error loss function [
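In this view, the $j$-th basis function is the kernel section anchored at the $j$-th training sample (illustrative notation):

$$g_j(\cdot)=k(x_j,\cdot),\qquad K_{ij}=g_j(x_i),\qquad f(x)=\sum_{j\in S}\beta_j\,g_j(x),$$

so the $j$-th column of the kernel matrix $K$ lists $g_j$ evaluated on all training samples, and selecting basis functions amounts to selecting columns of $K$ for the index set $S$.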
Unfortunately, the exhaustive search for the optimal basis function at each iteration of the FLSA-SVM is computationally expensive. To tackle this problem, the Reduced FLSA-SVM (RFLSA-SVM) is proposed, in which the basis functions are selected at random. The RFLSA-SVM also has a lower memory requirement, since the input Gram matrix for training is rectangular, in contrast to the square one of the FLSA-SVM. Compared to the FLSA-SVM, the RFLSA-SVM risks an increased number of support vectors. Nevertheless, the paper empirically demonstrates that the FLSA-SVM and the RFLSA-SVM variant both provide sparse solutions in comparison to the conventional LS-SVM and the standard SVM, as well as to the other sparse SVM algorithms built on the idea of “basis functions.”
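The memory argument can be made concrete with a small sketch (the Gaussian kernel, sizes, and variable names here are illustrative assumptions): drawing n candidate basis functions at random yields an N × n rectangular kernel matrix in place of the full N × N one.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix with K[i, j] = exp(-||A[i]-B[j]||^2 / (2*sigma^2))."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 2))                # N = 400 training samples
n = 60                                           # candidate basis functions
idx = rng.choice(len(X), size=n, replace=False)  # random candidate support vectors
K = rbf_kernel(X, X[idx])                        # 400 x 60 rectangular Gram matrix
print(K.shape)                                   # (400, 60) instead of (400, 400)
```

Storage thus drops from O(N²) to O(Nn), which is the source of the reduced memory requirement.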
Further, the technique of “contexts inheritance” is proposed as another means of reducing the training time of the proposed RFLSA-SVM and of the FLSA-SVM. Taking the RFLSA-SVM algorithm as an example, “contexts inheritance” takes advantage of the connection between any two RFLSA-SVMs whose kernel functions are identical but whose settings of the regularization parameter differ. The intermediate variables produced as by-products of training the RFLSA-SVM with the smaller regularization parameter can be inherited as the starting point for training the RFLSA-SVM with the larger one. This property, referred to as “contexts inheritance”, can further be exploited when tuning the regularization parameter of both the RFLSA-SVM and the FLSA-SVM.
The paper is organized as follows. Section
Given a set of
Introducing
The constraints of (
Equation (
The algorithm iteratively selects basis functions, one at a time, to span the solution. The following describes how a basis function is selected at each iteration.
At the end of the
For the calculation of
With introduction of the residue matrices
After the identification of all the
Other than selecting an extra basis function
After
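The flavour of the selection step can be conveyed with a generic greedy forward least-squares loop: every remaining candidate column is scored by how much adding it would reduce the squared residual, and the best scorer is taken. This is only a sketch of the idea, using an explicit orthonormal basis in place of the paper's residue-matrix recursion; all names and tolerances are illustrative.

```python
import numpy as np

def forward_select(K, y, n_basis, tol=1e-10):
    """Greedy forward least-squares selection: at each iteration add the
    candidate column of K giving the largest drop in the squared residual."""
    N, M = K.shape
    selected = []
    Q = np.empty((N, 0))             # orthonormal basis of the selected span
    r = y.astype(float).copy()       # current residual
    for _ in range(n_basis):
        P = K - Q @ (Q.T @ K)        # candidates orthogonalized against the span
        norms = np.linalg.norm(P, axis=0)
        norms[selected] = 0.0        # never re-select a column
        gain = np.zeros(M)
        ok = norms > tol             # skip columns dependent on the span
        gain[ok] = np.abs(P[:, ok].T @ r) / norms[ok]
        j = int(np.argmax(gain))
        if gain[j] <= 0.0:
            break                    # no independent candidates remain
        q = P[:, j] / norms[j]
        Q = np.hstack([Q, q[:, None]])
        r = r - q * (q @ r)          # residual shrinks by its projection on q
        selected.append(j)
    beta, *_ = np.linalg.lstsq(K[:, selected], y, rcond=None)
    return selected, beta
```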
At each iteration of the FLSA algorithm, a major share of the computational effort is spent solving the optimization problem formulated by (
Meanwhile, it is noted that, in the FLSA algorithm, the sequence of local approximation errors
Assuming a uniform distribution of z, the maximum of a sample
The proposition suggests that the probability of reaching a value that has a quantile of
It is thus proposed to select basis functions randomly from
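The standard calculation behind such random-sampling arguments is as follows: if $m$ candidate scores $z_1,\dots,z_m$ are drawn independently, the probability that the best of them exceeds the $q$-quantile $z_q$ of the whole candidate pool is

$$\Pr\bigl[\max(z_1,\dots,z_m)\ge z_q\bigr]=1-q^{m},$$

so, for instance, $m = 59$ random candidates already land in the top 5% of all candidates with probability $1-0.95^{59}\approx 0.95$. Past this point, exhaustively scoring every candidate buys very little.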
INPUT: (i) the data set; (ii) …; (iii) a dictionary of … candidate basis functions.
INITIALIZATION: (i) generate a permutation of the integers between 1 and …; (ii) the current residue vector, initialized from the training data: …; (iii) the matrix …; (iv) a variable ….
FOR each iteration: (v) the residue vector is reduced by …; (vi) update the dictionary matrix and prune the candidate basis functions which can be represented as linear combinations of the previously selected ones (FOR … IF …).
BACK SUBSTITUTION: performed for the solution: FOR ….
OUTPUT: the solution is defined by ….
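A compact Python sketch of the loop above (a paraphrase under the structure just listed, not the paper's implementation; the incremental orthogonalization stands in for the residue-matrix recursion, and back substitution is delegated to a least-squares solve):

```python
import numpy as np

def rflsa_train(K, y, n_sv, seed=0, tol=1e-10):
    """RFLSA-style sketch: visit candidate basis functions in a random order,
    keep those linearly independent of the selected span, stop at n_sv."""
    rng = np.random.default_rng(seed)
    N, M = K.shape
    order = rng.permutation(M)       # random visiting order of candidates
    selected = []
    Q = np.empty((N, 0))             # orthonormal basis of the selected span
    r = y.astype(float).copy()       # current residue vector
    for j in order:
        if len(selected) == n_sv:
            break
        p = K[:, j] - Q @ (Q.T @ K[:, j])   # component outside the span
        norm = np.linalg.norm(p)
        if norm < tol:               # dependent column: prune it
            continue
        q = p / norm
        Q = np.hstack([Q, q[:, None]])
        r = r - q * (q @ r)          # residue vector reduced by the projection
        selected.append(j)
    beta, *_ = np.linalg.lstsq(K[:, selected], y, rcond=None)
    return selected, beta, (Q, r, order)
```

A pruned column adds nothing to the span of the selected basis functions, so discarding it leaves the achievable residual unchanged. The returned tuple (Q, r, order) is the “context” reused for contexts inheritance below.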
The RFLSA-SVM also differs from the FLSA-SVM with respect to the interpretation of the value of the regularization parameter. For the FLSA-SVM, the value of the parameter, provided it does not exceed the column rank of
In terms of a single round of training, the time complexity of RFLSA-SVM is
Assuming an RFLSA-SVM whose solution contains
Now consider the training of the RFLSA-SVM whose solution is parameterized by
It can be seen that the upper triangular submatrix on the upper left corner is in fact
Hence, in order to construct the linear system of the coefficient matrix the residue matrices for the first the target vector
The residue matrix
It is clear that the FLSA-SVM can also benefit from the technique of “contexts inheritance”, which makes the tuning of the regularization parameter
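Continuing the sketch above (function and variable names remain illustrative): the cached context of a model with n1 support vectors, i.e. the selected indices, the orthonormal factor Q, and the residue vector r, is exactly the intermediate state of a longer run, so a model with n2 > n1 support vectors can resume from it rather than start from scratch.

```python
import numpy as np

def rflsa_extend(K, y, selected, context, n_sv, tol=1e-10):
    """Resume RFLSA-style training from the context (Q, r, order) cached by
    a smaller model; `order` must be the visiting order that produced it."""
    Q, r, order = context
    for j in order:
        if len(selected) == n_sv:
            break
        if j in selected:            # already part of the smaller model
            continue
        p = K[:, j] - Q @ (Q.T @ K[:, j])
        norm = np.linalg.norm(p)
        if norm < tol:               # still dependent on the span: prune
            continue
        q = p / norm
        Q = np.hstack([Q, q[:, None]])
        r = r - q * (q @ r)
        selected.append(j)
    beta, *_ = np.linalg.lstsq(K[:, selected], y, rcond=None)
    return selected, beta, (Q, r, order)
```

Tuning the regularization parameter over a grid n1 < n2 < … then costs a single incremental pass: train once at the smallest value and repeatedly extend the same context, which is precisely the saving measured in the timing experiments below.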
A set of experiments was performed to evaluate the performance of the proposed RFLSA-SVM algorithm. It was first applied to the two-spiral benchmark [
The 2D “two-spiral” benchmark is known to be difficult for pattern recognition algorithms and poses great challenges for neural networks [
Two-spiral dataset.
Figure
(a) the two-spiral pattern recognized by the RFLSA-SVM using 180 support vectors with
In conclusion, on the small but challenging “two-spiral” problem, the RFLSA-SVM achieved outstanding generalization performance when the number of support vectors was large enough. Given a smaller set of support vectors, chosen to ease the nonsparseness of its solution, the RFLSA-SVM still achieved acceptable generalization performance. The RFLSA-SVM thus offers more flexibility in choosing the number of support vectors.
The FLSA-SVM algorithm was applied to four binary classification problems: Banana, Image, Splice, and Ringnorm, which are all accessible at
Benchmark information.
|  | #Training | #Test | #Feature |
|---|---|---|---|
| Banana | 400 | 4900 | 2 |
| Splice | 1000 | 2175 | 60 |
| Image | 1300 | 1010 | 18 |
| Ringnorm | 3000 | 4400 | 20 |
Tables
Test correctness (%).
|  | RFLSA-SVM | FLSA-SVM | SVM | LS-SVM | FSALS-SVM | PFSALS-SVM | D-OFR |
|---|---|---|---|---|---|---|---|
| Banana | 89.29 (24, 1) |  |  | 88.92 (23, 0.6369) | 89.14 (25, 2^-1) | 89.12 (23, 2^-1) | 89.10 (40, 2^-2) |
| Splice |  |  | 89.75 (23, 2^-7) | 89.84 (23, 0.0135) | 89.93 (23, 2^-6) | 89.93 (28, 2^-6) | 89.33 (380, 2^-6) |
| Image | 97.82 (20, 2^-4) | 97.92 (180, 2^-5) | 97.82 (27, 2^-3) | 97.92 (27, 0.0135) |  | 98.02 (25, 2^-2) | 97.92 (480, 2^-3) |
| Ringnorm |  | 98.66 (27, 2^-5) | 98.68 (2^-6, 2^-5) | 97.07 (29, 0.1192) |  |  | 98.59 (47, 2^-5) |
Number of support vectors (best in bold).
|  | RFLSA-SVM | FLSA-SVM | SVM | LS-SVM | FSALS-SVM | PFSALS-SVM | D-OFR |
|---|---|---|---|---|---|---|---|
| Banana | 24 |  | 94 | 400 | 145 | 141 | 40 |
| Splice | 680 |  | 595 | 1000 | 507 | 539 | 380 |
| Image | 380 |  | 221 | 1300 | 272 | 278 | 480 |
| Ringnorm |  | 27 | 1624 | 3000 | 556 | 575 | 47 |
The test correctness for the RFLSA-SVM, as well as the FLSA-SVM with the number of support vectors ranging from
The value of
(Table: FLSAs versus RFLSAs on Image, best accuracy 98.32, and on Splice, best accuracy 89.98.)
These statistics show that, if a slight degradation in classification accuracy is allowed, the sparseness of the RFLSA-SVM’s solutions can be enhanced further.
To demonstrate the merits of the “contexts inheritance” technique, the RFLSA-SVM was compared with the SVM, the LS-SVM, and the FLSA-SVM, in terms of the time cost of tuning the regularization parameter denoted as
Table
Training time (in CPU seconds) of the FLSA-SVM, the RFLSA-SVM, the SVM, and the LS-SVM on the banana dataset.
|  | FLSA-SVMs | RFLSA-SVMs | log2(·) | SVMs | LS-SVMs (SMO) | LS-SVMs (CG) |
|---|---|---|---|---|---|---|
| 1 | 0.0630 | 0.0000 |  | 0.0620 | 3.0930 | 0.0888 |
| 20 | 0.1880 | 0.0150 |  | 0.0630 | 1.6880 | 0.0948 |
| 40 | 0.1880 | 0.0160 |  | 0.0620 | 0.9840 | 0.1067 |
| 60 | 0.1870 | 0.0160 |  | 0.0630 | 0.5940 | 0.1247 |
| 80 | 0.1720 | 0.0310 |  | 0.0620 | 0.3280 | 0.1418 |
| 100 | 0.1720 | 0.0320 |  | 0.0780 | 0.2190 | 0.1625 |
| 120 | 0.1560 | 0.0310 |  | 0.0630 | 0.1400 | 0.1901 |
| 140 | 0.1560 | 0.0470 |  | 0.0620 | 0.0780 | 0.2279 |
| 160 | 0.1400 | 0.0470 |  | 0.0460 | 0.0620 | 0.2621 |
| 180 | 0.1250 | 0.0470 |  | 0.0460 | 0.0470 | 0.3100 |
| 200 | 0.1410 | 0.0620 |  | 0.0310 | 0.0470 | 0.3929 |
| 220 | 0.1250 | 0.0620 |  | 0.0310 | 0.0470 | 0.4734 |
| 240 | 0.1090 | 0.0790 |  | 0.0460 | 0.0310 | 0.5691 |
| 260 | 0.1090 | 0.0780 |  | 0.0460 | 0.0320 | 0.6891 |
| 280 | 0.1100 | 0.0930 |  | 0.0310 | 0.0310 | 0.8653 |
| 300 | 0.1100 | 0.0940 |  | 0.0460 | 0.0310 | 1.0743 |
| 320 | 0.0940 | 0.0930 |  | 0.0460 | 0.0310 | 1.2615 |
| 340 | 0.0780 | 0.1090 |  | 0.0620 | 0.0320 | 1.6641 |
| 360 | 0.0780 | 0.1250 |  | 0.0620 | 0.0310 | 2.0141 |
| 380 | 0.0780 | 0.1250 |  | 0.0780 | 0.0310 | 2.4151 |
| 400 | 0.0630 | 0.1410 |  | 0.1250 | 0.0310 | 3.0980 |
|  | 0.6260 | 0.0470 |  | NA | NA | NA |
|  | 1.9220 | 0.4850 |  | NA | NA | NA |
|  | 2.6420 | 1.3430 |  | 1.2110 | 7.6080 | 16.2263 |
For the RFLSA-SVM algorithm, the row entry starting with
In contrast to the RFLSA-SVM, the FLSA-SVM selects support vectors into the solution by solving an optimization problem rather than by randomly sampling the training set. Thus, for the FLSA-SVM, these two rows correspond to the time cost of selecting
It can be seen that the time cost of tuning the regularization parameter, given a dictionary matrix of 60 columns, was only 0.047 seconds for the RFLSA-SVM and 0.626 seconds for the FLSA-SVM. This was much less than the 1.211 seconds required by the SVM and the 7.608 seconds (SMO) and 16.2263 seconds (CG) required for the LS-SVM. Using the full dictionary matrix of 240 columns, the RFLSA-SVMs took 0.485 seconds, which is still a much lower time cost in comparison to the LS-SVM and the SVM.
Tables
Training time (in CPU seconds) of the FLSA-SVM, the RFLSA-SVM, the SVMs, and the LS-SVM on the splice dataset.
|  | FLSA-SVMs | RFLSA-SVMs | log2(·) | SVMs | LS-SVMs (SMO) | LS-SVMs (CG) |
|---|---|---|---|---|---|---|
| 1 | 1.1400 | 0.0000 |  | 1.0470 | 0.5000 | 0.8369 |
| 50 | 3.6100 | 0.0940 |  | 1.0320 | 0.5310 | 0.8666 |
| 100 | 3.5780 | 0.2500 |  | 1.0310 | 0.5150 | 0.9016 |
| 150 | 3.5150 | 0.2820 |  | 1.0310 | 0.5160 | 0.9350 |
| 200 | 3.4530 | 0.3910 |  | 1.0320 | 0.5000 | 1.0185 |
| 250 | 3.3750 | 0.5160 |  | 1.0310 | 0.5000 | 1.1168 |
| 300 | 3.2820 | 0.6090 |  | 1.0470 | 0.4690 | 1.2397 |
| 350 | 3.1560 | 0.7350 |  | 0.9690 | 0.4530 | 1.4596 |
| 400 | 3.0160 | 0.8440 |  | 0.8600 | 0.4380 | 1.7516 |
| 450 | 2.8750 | 2.2340 |  | 0.7660 | 0.4060 | 2.1006 |
| 500 | 2.7040 | 1.1090 |  | 0.7500 | 0.3750 | 2.6293 |
| 550 | 2.5150 | 1.2500 |  | 0.7820 | 0.3750 | 3.3424 |
| 600 | 2.3120 | 1.3910 |  | 0.8280 | 0.3590 | 4.3024 |
| 650 | 2.0930 | 1.5940 |  | 0.9380 | 0.3280 | 5.5413 |
| 700 | 1.8440 | 1.7040 |  | 0.9380 | 0.2970 | 6.9829 |
| 750 | 1.5940 | 1.8280 |  | 0.9380 | 0.2970 | 8.7769 |
| 800 | 1.3120 | 1.9690 |  | 0.9530 | 0.2970 | 10.4062 |
| 850 | 1.0310 | 2.1250 |  | 0.9530 | 0.2970 | 11.8561 |
| 900 | 0.7040 | 2.2660 |  | 0.9380 | 0.3120 | 12.3971 |
| 950 | 0.4060 | 2.4220 |  | 0.9370 | 0.3120 | 13.4612 |
| 1000 | 0.0310 | 2.5940 |  | 0.9530 | 0.3120 | 13.7259 |
|  | 8.3280 | 0.3440 |  | NA | NA | NA |
|  | 18.6710 | 1.5330 |  | NA | NA | NA |
|  | 47.5460 | 26.2070 |  | 19.7540 | 8.3890 | 105.6486 |
Training time (in CPU seconds) of the FLSA-SVM, the RFLSA-SVM, the SVMs, and the LS-SVM on the image benchmark.
|  | FLSA-SVMs | RFLSA-SVMs | log2(·) | SVMs | LS-SVMs (SMO) | LS-SVMs (CG) |
|---|---|---|---|---|---|---|
| 1 | 1.0940 | 0.0000 |  | 0.9060 | 9.8130 | 1.2326 |
| 65 | 10.2810 | 0.1400 |  | 0.9060 | 5.9070 | 1.2312 |
| 130 | 10.2340 | 0.3750 |  | 0.9060 | 3.7820 | 1.3601 |
| 195 | 10.0160 | 0.6560 |  | 0.9220 | 2.5630 | 1.5557 |
| 260 | 9.7500 | 0.8750 |  | 0.8910 | 1.8440 | 1.8066 |
| 325 | 9.4540 | 1.3280 |  | 0.7970 | 1.2970 | 2.0622 |
| 390 | 9.0780 | 1.4220 |  | 0.6720 | 0.9840 | 2.3108 |
| 455 | 8.7040 | 1.6720 |  | 0.5780 | 0.7810 | 2.9718 |
| 520 | 8.3750 | 2.0470 |  | 0.4840 | 0.6090 | 3.2617 |
| 585 | 7.9210 | 2.3280 |  | 0.4530 | 0.5310 | 3.9884 |
| 650 | 7.4070 | 2.6880 |  | 0.4220 | 0.4680 | 4.7408 |
| 715 | 6.8290 | 2.9220 |  | 0.4060 | 0.4530 | 5.8988 |
| 780 | 6.2350 | 3.0620 |  | 0.4530 | 0.4370 | 7.9950 |
| 845 | 5.5940 | 3.4690 |  | 0.4530 | 0.4540 | 10.4827 |
| 910 | 4.8910 | 3.8280 |  | 0.4220 | 0.4530 | 14.0951 |
| 975 | 4.0930 | 4.1720 |  | 0.4530 | 0.4370 | 18.2100 |
| 1040 | 3.3750 | 4.3590 |  | 0.4690 | 0.3910 | 24.0141 |
| 1105 | 2.5470 | 4.7970 |  | 0.5160 | 0.4070 | 32.1312 |
| 1170 | 1.7350 | 4.8900 |  | 0.5940 | 0.4220 | 39.7957 |
| 1235 | 0.5620 | 5.2340 |  | 0.5470 | 0.4220 | 54.6313 |
| 1300 | 0.0000 | 0.0000 |  | 0.6720 | 0.4220 | 76.5178 |
|  | 11.3750 | 0.1400 |  | NA | NA | NA |
|  | 41.3750 | 2.0460 |  | NA | NA | NA |
|  | 47.5460 | 26.2070 |  | 19.7540 | 8.3890 | 105.6486 |
Training time (in CPU seconds) of the FLSA-SVM, the RFLSA-SVM, the SVMs, and the LS-SVM on the ringnorm benchmark.
|  | FLSA-SVMs | RFLSA-SVMs | log2(·) | SVMs | LS-SVMs (SMO) | LS-SVMs (CG) |
|---|---|---|---|---|---|---|
| 1 | 7.0310 | 0.0150 |  | 5.9840 | 2.5620 | 8.1861 |
| 150 | 235.1560 | 2.1250 |  | 6.0000 | 2.5470 | 8.4898 |
| 300 | 228.1410 | 16.5630 |  | 5.9840 | 2.5630 | 9.1043 |
| 450 | 219.4060 | 13.5160 |  | 5.9690 | 2.5620 | 9.7888 |
| 600 | 210.4220 | 36.3430 |  | 5.9840 | 2.5620 | 10.5138 |
| 750 | 199.8600 | 23.6100 |  | 5.9690 | 2.5160 | 11.8843 |
| 900 | 189.3440 | 29.1720 |  | 6.3280 | 2.5310 | 13.8879 |
| 1050 | 179.0150 | 67.1560 |  | 5.8590 | 2.4690 | 16.5239 |
| 1200 | 167.9220 | 41.3430 |  | 5.2190 | 2.4060 | 20.5536 |
| 1350 | 156.5000 | 62.6090 |  | 5.1100 | 2.3590 | 25.7184 |
| 1500 | 143.8430 | 63.1100 |  | 5.0150 | 2.4070 | 32.0827 |
| 1650 | 131.2180 | 83.4070 |  | 5.0160 | 2.3600 | 38.5026 |
| 1800 | 119.0470 | 82.5930 |  | 5.0160 | 2.3290 | 48.0788 |
| 1950 | 105.7820 | 85.5470 |  | 5.0000 | 2.3910 | 62.9366 |
| 2100 | 92.4840 | 136.6720 |  | 5.0160 | 2.3430 | 77.4289 |
| 2250 | 78.6570 | 107.9530 |  | 5.0310 | 1.9530 | 92.2615 |
| 2400 | 65.0000 | 105.4850 |  | 5.0310 | 1.8130 | 111.8921 |
| 2550 | 50.7500 | 133.1870 |  | 5.0310 | 1.7970 | 124.1563 |
| 2700 | 36.6560 | 142.8280 |  | 5.0150 | 1.8120 | 131.0737 |
| 2850 | 22.2180 | 149.6720 |  | 5.0310 | 1.7970 | 135.9099 |
| 3000 | 7.7030 | 170.1560 |  | 5.0150 | 1.8290 | 138.8800 |
|  | 242.1870 | 2.1400 |  | NA | NA | NA |
|  | 470.3280 | 18.7030 |  | NA | NA | NA |
|  | 2646.1550 | 1553.0620 |  | 113.6230 | 47.9080 | 1127.8540 |
For the FLSA-SVM algorithm, given a dictionary matrix of 60 columns, the training cost is
In Table
While maintaining generalization performance competitive with the SVM and the Least-Squares SVM (LS-SVM), the proposed Reduced Forward Least-Squares Approximation (RFLSA) SVM uses only a random sample of the training data, rather than all of it, as the candidate set for support vectors during training. This strategy of random selection was shown to be statistically justified.
Meanwhile, when an RFLSA-SVM is trained whose solution is spanned by
The experiments confirmed that, for the RFLSA-SVM and FLSA-SVM algorithms, the technique of contexts inheritance makes tuning the regularization parameter much faster than it is for the SVM and the LS-SVM.
This work was supported by Grant LQ13F030011 of the Zhejiang Natural Science Foundation and Project 2012AY1022 of the Jiaxing Science and Technology Bureau, China.