A New SVM Multiclass Incremental Learning Algorithm

A new support vector machine (SVM) multiclass incremental learning algorithm is proposed. For each class of training samples, a hyperellipsoidal classifier that encloses as many samples as possible while pushing outlier samples away is trained in the feature space. When new samples are added to the classification system, the algorithm reuses the old classifiers that are unrelated to the classes of the new samples. To classify a sample, the Mahalanobis distance is used to decide its class. If the sample point is not enclosed by any hyperellipsoid, or is enclosed by more than one hyperellipsoid, a membership value is used to determine its class. The experimental results show that the algorithm achieves higher classification precision and classification speed than the compared algorithms.


Introduction
SVMs [1], a machine learning method based on statistical learning theory, deliver state-of-the-art performance in real-world pattern recognition [2,3] and data mining applications such as text categorization, handwritten character recognition, image classification, and bioinformatics, and have even been applied in the control field [4,5]. However, the general training method for SVMs does not work when the number of training samples is too large to fit into the computer's RAM. To solve this problem and speed up SVM training, incremental learning has become one of the key techniques for training SVMs on large data sets, especially for multiclass problems.
When SVMs are used for multiclass classification, the four most popular approaches are one-against-one [6], one-against-rest [7], DAGSVM [8], and binary tree SVM [9][10][11]. Several incremental learning algorithms have been proposed, such as Batch SVM [12,13], the online recursive algorithm [14], the divisional training SVM algorithm [15], and the fast incremental learning algorithm [16]. However, these multiclass approaches are built on binary classifiers, so when new samples are added to the classification system, the whole classifier model must be retrained. Reference [17] proposed a class-incremental learning algorithm that reuses the old classifier models and trains only one binary classifier when a new class arrives. However, it is not suitable for large data sets, and the new sample set cannot include samples of old classes. Reference [18] proposed a multiclass incremental learning algorithm based on hypersphere SVMs (HSSVMIL). It reuses the old classifier models and supports class-incremental learning and incremental learning of old-class samples at the same time. However, its precision drops unless the samples of every class are distributed in a hypersphere shape and are dense in the feature space. To address this disadvantage, [19] proposed a multiclass incremental learning algorithm based on hyperellipsoids (HEIL), but that algorithm does not consider the influence of outlier samples.
In this paper, a Mahalanobis hyperellipsoidal SVM multiclass incremental learning algorithm (MSVMIL) is proposed. For every class of samples, the smallest hyperellipsoid that contains as many samples as possible while pushing outlier samples away is trained in the feature space. Mahalanobis distances are used to determine the class of a sample to be classified.
This paper is organized as follows. Section 2 reviews the hyperellipsoidal SVM. Section 3 discusses the new multiclass incremental learning algorithm in detail. Section 4 gives experimental results on Reuters 21578. Finally, the conclusion is outlined.

Hyperellipsoidal Support Vector Machine
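Given a set of training samples of one class $X = \{x_i\}_{i=1}^{l}$, where $x_i \in R^n$, let $X$ be the $n \times l$ sample matrix. A hyperellipsoid $(a, R)$ is trained in the feature space, where $a$ is the center of the hyperellipsoid and $R$ is its radius. The hyperellipsoid should contain most of the samples, with the radius $R$ as small as possible. If there are no remote points, the hyperellipsoid contains all samples. If there are remote points, some samples are allowed to lie outside, and the smallest hyperellipsoid that contains most of the samples is trained. Because it is uncertain whether there are remote points, nonnegative slack variables $\xi_i$ $(i = 1, 2, \ldots, l)$ are introduced to allow some samples outside the hyperellipsoid. The smallest hyperellipsoid is obtained by a method similar to finding the optimal hyperplane [19][20][21]. The formulation is as follows:
\[ \min_{a, R, \xi} \; R^2 + C \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad (x_i - a)^T \Sigma^{-1} (x_i - a) \le R^2 + \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, l, \tag{1} \]
where $C$ trades off the number of noise samples outside the hyperellipsoid against the radius of the hyperellipsoid, and $\Sigma$ is the covariance matrix of the samples.
To solve the optimization problem above, one can construct the Lagrange function as follows:
\[ L(a, R, \xi, \alpha, \beta) = R^2 + C \sum_{i=1}^{l} \xi_i - \sum_{i=1}^{l} \alpha_i \left[ R^2 + \xi_i - (x_i - a)^T \Sigma^{-1} (x_i - a) \right] - \sum_{i=1}^{l} \beta_i \xi_i, \tag{2} \]
where $\alpha_i$ $(\alpha_i \ge 0)$ and $\beta_i$ $(\beta_i \ge 0)$ are the Lagrange multipliers.
According to the Karush-Kuhn-Tucker (KKT) conditions in optimization theory, the following conditions are satisfied:
\[ \sum_{i=1}^{l} \alpha_i = 1, \qquad a = \sum_{i=1}^{l} \alpha_i x_i, \qquad C - \alpha_i - \beta_i = 0, \quad i = 1, 2, \ldots, l. \tag{3} \]
Substituting (3) into (2), the dual optimization problem is obtained as follows:
\[ \max_{\alpha} \; \sum_{i=1}^{l} \alpha_i x_i^T \Sigma^{-1} x_i - \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j x_i^T \Sigma^{-1} x_j \quad \text{s.t.} \quad \sum_{i=1}^{l} \alpha_i = 1, \quad 0 \le \alpha_i \le C, \quad i = 1, 2, \ldots, l. \tag{4} \]
The kernel form of (4) is obtained by expressing the inner products $x_i^T \Sigma^{-1} x_j$ through the kernel function. The samples that lie on or outside the boundary of the hyperellipsoid are those whose corresponding $\alpha_i$ are nonzero; these samples are called support vectors.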
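As a rough numerical illustration of the dual problem, the sketch below assumes the dual has the standard form: maximize $\sum_i \alpha_i G_{ii} - \sum_i \sum_j \alpha_i \alpha_j G_{ij}$ subject to $\sum_i \alpha_i = 1$ and $0 \le \alpha_i \le C$, with $G_{ij} = x_i^T \Sigma^{-1} x_j$. It further assumes $C \ge 1$, so the upper bound is never active, and it uses a plain Frank-Wolfe iteration as a stand-in solver chosen for brevity, not the solver used in this paper. The function names and the toy data (with $\Sigma$ taken as the identity) are illustrative only.

```python
def solve_dual(G, iters=2000):
    """Frank-Wolfe ascent for max sum_i a_i*G[i][i] - a^T G a over the simplex.

    Assumes C >= 1, so the constraint a_i <= C can never be active.
    """
    n = len(G)
    alpha = [1.0 / n] * n
    for t in range(iters):
        # Gradient of the dual objective with respect to alpha.
        grad = [G[i][i] - 2.0 * sum(G[i][j] * alpha[j] for j in range(n))
                for i in range(n)]
        best = max(range(n), key=lambda i: grad[i])  # best simplex vertex
        step = 2.0 / (t + 2.0)                       # standard Frank-Wolfe step
        alpha = [(1.0 - step) * a for a in alpha]
        alpha[best] += step
    return alpha


def center_and_radius_sq(points, alpha, G):
    """Center a = sum_i alpha_i x_i; R^2 is taken as the dual objective value."""
    n, dim = len(points), len(points[0])
    center = [sum(alpha[i] * points[i][k] for i in range(n)) for k in range(dim)]
    linear = sum(alpha[i] * G[i][i] for i in range(n))
    quadratic = sum(alpha[i] * alpha[j] * G[i][j]
                    for i in range(n) for j in range(n))
    return center, linear - quadratic


# Toy data; the covariance is taken as the identity, so G is the plain Gram matrix.
points = [(-1.0, 0.0), (1.0, 0.0), (0.0, 0.5)]
G = [[sum(p * q for p, q in zip(x, y)) for y in points] for x in points]
alpha = solve_dual(G)
center, radius_sq = center_and_radius_sq(points, alpha, G)
```

For these three points the enclosing region is centered near the origin with squared radius near 1. The third point lies strictly inside and receives zero weight, so it is not a support vector.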

Remark 1. The Mahalanobis distance measures the distance between a data point and the centroid of a multivariate distribution, that is, the overall mean, with each direction scaled by the covariance of the data.
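To make the remark concrete, here is a minimal two-dimensional sketch of the Mahalanobis distance in the input space (not the kernel feature space used by the algorithm). The function names and the four sample points are illustrative assumptions; the covariance is the ordinary sample covariance.

```python
import math


def mean_and_covariance(points):
    """Sample mean and 2x2 sample covariance of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis(x, center, cov):
    """sqrt((x - c)^T Sigma^{-1} (x - c)) using the closed-form 2x2 inverse."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - center[0], x[1] - center[1]
    # Sigma^{-1} = (1 / det) * [[d, -b], [-b, a]]
    q = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.sqrt(q)


# Points spread twice as widely along the first axis as along the second.
samples = [(-2.0, 0.0), (2.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
center, cov = mean_and_covariance(samples)
d_far = mahalanobis((2.0, 0.0), center, cov)
d_near = mahalanobis((0.0, 1.0), center, cov)
```

The points (2, 0) and (0, 1) have Euclidean distances 2 and 1 from the centroid, yet their Mahalanobis distances coincide, because the data vary more along the first axis.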
Assume that $X_m$ $(m = 1, 2, \ldots, N)$ is a subset of $X$ and that all samples in $X_m$ belong to the $m$th class. For every subset $X_m$, the smallest hyperellipsoid $(a_m, R_m)$ is trained in the feature space, where $a_m$ is the center of the hyperellipsoid and $R_m$ is its radius. $SV_m$ is the corresponding support vector set.
Suppose a new sample set $X'$ is added to the old classification system, where $X'_m$ $(m = 1, 2, \ldots, N, N + 1, \ldots, M)$ is a subset of $X'$ and all samples of $X'_m$ belong to the $m$th class. The multiclass incremental learning algorithm based on the hyperellipsoidal SVM is described in detail as follows.
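A minimal sketch of the reuse rule described in this paper: classes that appear in the new sample set are retrained on the new samples together with the old support vectors of that class, and all other classifiers are kept unchanged. The `train` callable and the one-dimensional `toy_train` are hypothetical placeholders for the hyperellipsoidal SVM training routine, and the data are made up.

```python
def incremental_update(models, support_vectors, new_samples, train):
    """Return updated (models, support_vectors) after one incremental step.

    models:          dict mapping class label -> trained classifier
    support_vectors: dict mapping class label -> list of support vectors
    new_samples:     dict mapping class label -> list of new samples
    train:           hypothetical callable, samples -> (classifier, support vectors)
    """
    updated_models = dict(models)        # untouched classes are reused as-is
    updated_svs = dict(support_vectors)
    for label, samples in new_samples.items():
        # Old support vectors of this class (none if the class is new).
        old_svs = support_vectors.get(label, [])
        classifier, svs = train(list(samples) + list(old_svs))
        updated_models[label] = classifier
        updated_svs[label] = svs
    return updated_models, updated_svs


def toy_train(samples):
    """Stand-in trainer for 1-D data: (center, radius) interval, extremes as SVs."""
    lo, hi = min(samples), max(samples)
    return ((lo + hi) / 2.0, (hi - lo) / 2.0), [lo, hi]


models = {"acq": (1.0, 1.0), "earn": (10.0, 2.0)}
svs = {"acq": [0.0, 2.0], "earn": [8.0, 12.0]}
new = {"acq": [3.0], "grain": [20.0, 24.0]}
models2, svs2 = incremental_update(models, svs, new, toy_train)
```

Here the "earn" classifier is carried over untouched, "acq" is retrained from its old support vectors plus the new sample, and "grain" is trained from scratch as a new class.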
If no hyperellipsoid contains the sample point, that is, $d(x, a_m) > R_m$ for every class $m$, the membership of the sample in the $m$th class is computed according to (9), and the class of the sample is then determined according to (10), which selects the class by an arg min over $m$.
If two or more hyperellipsoids contain the sample point, the membership of the sample in the $m$th class is first computed according to (11) for every $m$ with $d(x, a_m) \le R_m$, and the class of the sample is then determined according to (10).
For a sample $x$ to be classified, the multiclass classification algorithm is described as follows.
Step 2. If the sample point is contained by exactly one hyperellipsoid $(a_m, R_m)$, the sample belongs to the $m$th class; go to Step 5. Otherwise, go to Step 3.
Step 3. If the sample point is not contained by any hyperellipsoid, compute the membership of the sample in the $m$th class for every class $m$ according to (9), and then determine the class of the sample according to (10); go to Step 5. Otherwise, go to Step 4.
Step 4. If the sample point is contained by two or more hyperellipsoids $(a_m, R_m)$, compute the membership of the sample in each such class $m$ according to (11), and then determine the class of the sample according to (10); go to Step 5.
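The decision logic of Steps 2 to 4 can be sketched as follows. The membership used below, the ratio of distance to radius, is a hypothetical stand-in for (9) and (11), and the arg min rule of (10) is applied to it. The function name, the one-dimensional distance, and the two toy classes are assumptions for illustration.

```python
def classify(x, ellipsoids, distance, membership=None):
    """Assign x to a class given per-class (center, radius) pairs.

    ellipsoids: dict mapping class label -> (center, radius)
    distance:   callable (x, center) -> distance, e.g. a Mahalanobis distance
    membership: callable (distance, radius) -> score, smaller is better;
                defaults to distance / radius (an assumed stand-in)
    """
    if membership is None:
        membership = lambda d, r: d / r
    dists = {label: distance(x, c) for label, (c, _) in ellipsoids.items()}
    inside = [label for label, (_, r) in ellipsoids.items() if dists[label] <= r]
    if len(inside) == 1:
        return inside[0]                       # Step 2: exactly one container
    # Step 3: no container, compare all classes.
    # Step 4: several containers, compare only those.
    candidates = inside if inside else list(ellipsoids)
    return min(candidates,
               key=lambda label: membership(dists[label], ellipsoids[label][1]))


# Toy 1-D setup: class "A" covers [-1, 1] and class "B" covers [0, 4].
toy = {"A": (0.0, 1.0), "B": (2.0, 2.0)}
abs_dist = lambda x, c: abs(x - c)
label_single = classify(-0.5, toy, abs_dist)   # inside A only
label_overlap = classify(0.9, toy, abs_dist)   # inside both A and B
label_outside = classify(10.0, toy, abs_dist)  # inside neither
```

With this stand-in membership, a point in the overlap goes to the class whose boundary it is relatively farther inside, and a point outside every region goes to the class whose boundary it is relatively closest to.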

Experiments
Experiments are carried out on Reuters 21578, using five categories and 2302 texts. Of these, 1536 texts are used as the training set and the remaining 766 as the testing set (see Table 1). Information gain is used to reduce the feature dimension, and the weight of every word is computed by TF-IDF.
To verify the efficiency of the proposed method, the same task is carried out with HSSVMIL, HEIL, and MSVMIL. The computational experiments were run on a 1.6 GHz Pentium with 512 MB of memory. The kernel function is the radial basis function (RBF) $K(x, y) = e^{-\gamma \|x - y\|^2}$, where $\gamma = 0.01$. The penalty parameter of MSVMIL is $C = 100$. The system parameter of HSSVMIL is $\nu = 0.1$.
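The kernel evaluation can be sketched in a few lines. The form $\exp(-\gamma \|x - y\|^2)$ with $\gamma = 0.01$ is assumed here; the function name `rbf_kernel` and the sample vectors are illustrative.

```python
import math


def rbf_kernel(x, y, gamma=0.01):
    """RBF kernel exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


k_same = rbf_kernel((1.0, 2.0), (1.0, 2.0))  # identical vectors
k_diff = rbf_kernel((0.0, 0.0), (3.0, 4.0))  # squared distance 25
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart; with $\gamma = 0.01$ the decay is slow, so a squared distance of 25 still gives $e^{-0.25}$.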
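The macroaverage precision (MAAP), macroaverage recall (MAAR), and macroaverage $F_1$ (MAAF) are used to evaluate the classification performance of the algorithms. They are defined as follows:
\[ \mathrm{MAAP} = \frac{1}{n} \sum_{i=1}^{n} P_i, \qquad \mathrm{MAAR} = \frac{1}{n} \sum_{i=1}^{n} R_i, \qquad \mathrm{MAAF} = \frac{1}{n} \sum_{i=1}^{n} F_1^i, \]
where $n$ is the number of classes, $P_i$ is the precision of the $i$th class, $R_i$ is the recall of the $i$th class, and $F_1^i$ is the $F_1$ value of the $i$th class.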
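As an illustration of these measures, the sketch below computes them from per-class counts of true positives, false positives, and false negatives, taking each macroaverage as the unweighted mean of the per-class values. The function name and the counts are made up for the example and are not taken from the experiments.

```python
def macro_average(per_class_counts):
    """Return (MAAP, MAAR, MAAF) from a dict of label -> (tp, fp, fn)."""
    precisions, recalls, f1_values = [], [], []
    for tp, fp, fn in per_class_counts.values():
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2.0 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_values.append(f1)
    n = len(per_class_counts)
    return sum(precisions) / n, sum(recalls) / n, sum(f1_values) / n


# Illustrative counts for two classes: (true pos., false pos., false neg.).
counts = {"acq": (8, 2, 2), "earn": (5, 5, 0)}
maap, maar, maaf = macro_average(counts)
```

Every class contributes equally to a macroaverage regardless of how many samples it has, so small classes weigh as much as large ones.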
In the experiments, the original dataset includes samples of two classes (Acq and Earn), and three rounds of incremental learning are performed. The first incremental sample set includes three classes (Acq, Earn, and Grain). The second includes four classes (Acq, Earn, Grain, and Crude). The third includes four classes (Acq, Earn, Crude, and Trade). The macroaverage precision, macroaverage recall, and macroaverage $F_1$ of the three algorithms are given in Table 2. The training time and testing time of the three algorithms are given in Table 3.
The experimental results show that the classification precision and recall of MSVMIL are higher than those of HEIL and HSSVMIL. The main reasons are that MSVMIL reduces the volume enclosing the sample points by pushing the outlier samples away and accounts for the distribution of the samples by using the Mahalanobis distance. The classification speed of MSVMIL is faster than that of HSSVMIL and roughly equal to that of HEIL. The training speed of MSVMIL is faster than that of HEIL and roughly equal to that of HSSVMIL.

Conclusion
To solve the SVM multiclass incremental learning problem, a novel algorithm based on the Mahalanobis hyperellipsoidal SVM is proposed. During incremental learning, only the new samples and the old support vectors of classes that appear in the new samples take part in the training. The historical classifiers that are unrelated to the new samples are reused. During classification, the Mahalanobis distance is used to determine the class of the sample. The experimental results show that, compared with HEIL and HSSVMIL, the proposed algorithm achieves higher classification precision and classification speed. Data-driven methods [22,23] are another approach worth considering, and this idea will be explored in future work.

Mathematical Problems in Engineering

Table 1: Training set and testing set.

Table 3: Training time and testing time of three algorithms.