Micro Learning Support Vector Machine for Pattern Classification: A High-Speed Algorithm

Support vector machine theory has by now developed into a very mature system. In this paper, the optimization problem solved by the original support vector machine is transformed into a direct calculation formula for the separating line, and the resulting model has O(n²) time complexity. In our model, weighted learning, the multiclassification problem, and online learning all become direct inferences, and we have applied the new model to the UCI data sets. We hope that in the future this model will be useful in real-world problems, such as stock forecasting, which require nonlinear high-speed algorithms.


Introduction
Since the establishment of the support vector machine by Vapnik et al. in 1995 ([1][2][3]), the support vector machine (SVM) has been a focus of researchers in data mining. The classical SVM handles the classic binary classification problem, where SVM labels unlabeled points by solving for an optimal line. One of the basic principles of SVM is the use of kernel technology, by which the specific mapping need not be known explicitly: through a simple inner product, a nonlinear problem in the original space can be solved as a linear problem in the space to which the data are mapped. Moreover, the model can significantly improve the time complexity of small-sample problems through dual theory, and one of its classic ideas, the maximum margin, is also widely used in various models¹.
In 2007, Jayadeva et al. ([4][5][6][7]) established the twin support vector machine (TWSVM), which also solves the classic binary classification problem. Unlike SVM, TWSVM is mainly used to solve nonparallel problems. The support surfaces of the SVM are two parallel hyperplanes, whereas TWSVM solves for two nonparallel hyperplanes. This model no longer uses the maximum margin principle of SVM: TWSVM solves for two lines, each as close as possible to one of the two classes of points, and a new point is classified by determining which line it is closer to. SVM has a time complexity of O(n³ + d) when the number of samples is n and the number of features is d, making SVM very suitable for high-dimensional small-sample problems. Some other algorithms are also suitable for small samples, but they have mutual advantages and disadvantages with SVM². The kernel technique also makes SVM very suitable for nonlinear and high-dimensional problems. There are also some algorithms suitable for high-dimensional problems, but again with mutual advantages and disadvantages relative to SVM³. However, a common problem with the range of algorithms based on SVM is the inability to solve large-sample problems, due to the limitations of the optimization algorithm. Therefore, we want to provide an algorithm that maintains the good properties of SVM while reducing the time required for large sample sizes. We give an SVM-style model that requires no optimization and has a time complexity of O(n² + d). This improvement goes some way towards circumventing the problem that SVM cannot be applied to large-scale data. Our model can be applied to many high-dimensional or large-sample problems⁴ and has substantial implications for solving real-world problems with high dimensions, large samples, and tight time demands.
In this paper, we consider a kind of nonoptimization machine learning model from the viewpoint of having only one positive point and one negative point. The model trains a submodel from one positive and one negative point, repeats this for all combinations of positive and negative points, and finally combines the multiple submodels into a single classifier.
The basic logic of the model is that it can be used to train several classification submodels. On this basis, we construct a classification model in which kernel technology can be used. It is interesting to note that in this model, common machine learning problems such as the classification problem, the weighted problem, and the fitting problem all become straightforward to handle. The details of our research are shown in Figure 1.

Classical Model
Consider the classic binary classification problem: given a training set (x_i, y_i) ∈ R^d × {−1, 1}, i = 1, 2, 3, . . . , n, where y_i is the label, we have to look for a decision function f(x) that infers, for any new input x, the corresponding output y. To simplify the notation, A denotes the data set of positive-class points and B denotes the data set of negative-class points.
First, we review the classical linear SVM model. The model aims to establish a straight line between the positive and negative samples. One of the principles of SVM is the maximum margin principle, that is, to maximize the distance between the two support planes. We assume that the dividing surface is wx + b = 0 and that the two support surfaces are wx + b = 1 and wx + b = −1.
Solving the SVM is transformed into the following optimization problem:

\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i, \quad \text{s.t. } y_i(w x_i + b) \ge 1 - \xi_i, \; \xi_i \ge 0. \tag{1}

Its dual is the following optimization problem:

\max_{\alpha} \; \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i \alpha_j y_i y_j (x_i x_j), \quad \text{s.t. } \sum_{i=1}^{n}\alpha_i y_i = 0, \; 0 \le \alpha_i \le C. \tag{2}

Then, we consider kernel technology in the dual problem and transform x_i x_j into K(x_i, x_j). The Euclidean space is mapped into another space, and the nonlinear problem is transformed into a linearly separable problem in a high-dimensional space.
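The kernel substitution in the dual can be made concrete with a small sketch (not from the original paper): for the degree-2 polynomial kernel K(x, z) = (x·z)² on R², the implicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²), and evaluating the kernel in the original space agrees with taking an inner product in the mapped space.

```python
import numpy as np

# Hedged illustration of the kernel trick: K(x, z) = (x . z)^2 in R^2
# equals phi(x) . phi(z) for the explicit map
# phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2).

def poly2_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel evaluated in the original space."""
    return float(np.dot(x, z)) ** 2

def phi(x):
    """Explicit feature map corresponding to poly2_kernel (d = 2 case)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

lhs = poly2_kernel(x, z)             # kernel in the original space
rhs = float(np.dot(phi(x), phi(z)))  # inner product in the mapped space
print(lhs, rhs)  # both are (1*3 + 2*(-1))^2, i.e. approximately 1.0
```

This is why the dual only needs K(x_i, x_j): the mapped space is never constructed explicitly.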
We look back on another machine learning algorithm, the TWSVM, which focuses on solving the nonparallel problem, in which the two classes of sample points cluster near two nonparallel lines. The model aims to find two nonparallel straight lines xw_1 + b_1 = 0 and xw_2 + b_2 = 0, and a new point is classified according to which line it is closer to. The optimization problems of the model are as follows:

\min_{w_1,b_1,\xi} \; \frac{1}{2}\|A w_1 + e_1 b_1\|^2 + c_1 e_2^{T}\xi, \quad \text{s.t. } -(B w_1 + e_2 b_1) + \xi \ge e_2, \; \xi \ge 0,

\min_{w_2,b_2,\eta} \; \frac{1}{2}\|B w_2 + e_2 b_2\|^2 + c_2 e_1^{T}\eta, \quad \text{s.t. } (A w_2 + e_1 b_2) + \eta \ge e_1, \; \eta \ge 0,

where e_1 and e_2 are vectors of ones of appropriate dimensions. The dual problem of the first of these is

\max_{\alpha} \; e_2^{T}\alpha - \frac{1}{2}\alpha^{T} G (H^{T}H)^{-1} G^{T}\alpha, \quad \text{s.t. } 0 \le \alpha \le c_1 e_2,

where H = [A \; e_1] and G = [B \; e_2]. In order to introduce kernel technology, we consider replacing the two straight lines xw_1 + b_1 = 0 and xw_2 + b_2 = 0 with the kernel-generated surfaces K(x, X^{T})u_1 + b_1 = 0 and K(x, X^{T})u_2 + b_2 = 0, where X is the union of A and B. The dual problem then takes the same form, with the kernel matrices K(A, X^{T}) and K(B, X^{T}) in place of A and B in H and G.

New Model
First, we consider the process of learning. If we have only one sample point, for example, if our problem is to determine whether the person in a picture is male or female and the training set is just one picture of a lady, then we cannot judge whether the character in another picture is male or female. Classification is difficult when we have only one class of points. By the same token, even if our training set contains ten thousand photographs of women but not a single photo of a man, we are still unable to train a model that distinguishes men from women. In normal human thinking, it is difficult to compare one man with ten thousand women; we can only compare the difference between one male and one female. Therefore, we consider one positive point and one negative point: the training set has only one point of each class. Using the maximum margin idea of SVM, the optimal line is the perpendicular bisector of the segment joining the two points, and the functional distance from each of the two points to the line is 1, as shown in Figure 2.
Obviously, denoting the positive point by x_{+} and the negative point by x_{-}, we can write the dividing line directly:

f(x) = \frac{2(x_{+} - x_{-}) x - \|x_{+}\|^2 + \|x_{-}\|^2}{\|x_{+} - x_{-}\|^2},

so that f(x_{+}) = 1, f(x_{-}) = -1, and f(x) = 0 on the bisector. Then, we consider a very interesting classification problem, shown in Figure 3.
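The two-point submodel can be sketched in a few lines; the function name `bisector_decision` and the example points are ours for illustration.

```python
import numpy as np

# Minimal sketch of the two-point submodel: the separating line is the
# perpendicular bisector of the segment joining one positive point x_pos
# and one negative point x_neg, scaled so the functional distance of each
# training point to the line is 1.

def bisector_decision(x_pos, x_neg):
    """Return f with f(x_pos) = 1, f(x_neg) = -1, and f = 0 on the bisector."""
    gap = np.dot(x_pos - x_neg, x_pos - x_neg)  # ||x_pos - x_neg||^2
    def f(x):
        return (2.0 * np.dot(x_pos - x_neg, x)
                - np.dot(x_pos, x_pos) + np.dot(x_neg, x_neg)) / gap
    return f

x_pos = np.array([2.0, 1.0])
x_neg = np.array([0.0, -1.0])
f = bisector_decision(x_pos, x_neg)
print(f(x_pos), f(x_neg))  # 1.0 -1.0
midpoint = (x_pos + x_neg) / 2.0
print(f(midpoint))         # 0.0 (the midpoint lies on the bisector)
```

Note that f depends on the data only through inner products, which is what makes the later kernelisation immediate.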
From Figure 3, we can see that building a subline from each pair of one positive point and one negative point, and finally combining all of these sublines, is a reasonable approach. So we consider the following algorithm for the general situation (to simplify the discussion, we assume that there are M positive points and N negative points): take each positive point together with each negative point, and compute the perpendicular bisector of the pair.
We compute this for all pairs of points and then use every subline to decide the classification. The core idea of the algorithm is to build one subline from each positive point a_i and each negative point b_j, then average all the sublines, and take the sign of the average as the classification result:

F(x) = \operatorname{sign}\left(\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} f_{ij}(x)\right),

where f_{ij} is the subline built from a_i and b_j.
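The full pairwise averaging scheme can be sketched as follows (the function names and toy data are ours); each query costs O(M·N) inner products, in line with the stated O(n²) complexity.

```python
import numpy as np

# Sketch of the pairwise model: every (positive, negative) pair contributes
# one bisector submodel, the M*N submodel values are averaged, and the sign
# of the average is the predicted label.

def pairwise_decision(A, B, x):
    """Average the bisector submodels of all pairs (a in A, b in B) at x."""
    total = 0.0
    for a in A:
        for b in B:
            gap = np.dot(a - b, a - b)          # ||a - b||^2
            total += (2.0 * np.dot(a - b, x)
                      - np.dot(a, a) + np.dot(b, b)) / gap
    return total / (len(A) * len(B))

def predict(A, B, x):
    return 1 if pairwise_decision(A, B, x) >= 0.0 else -1

A = np.array([[2.0, 2.0], [3.0, 1.0], [2.5, 2.5]])        # positive class
B = np.array([[-2.0, -1.0], [-1.0, -2.0], [-2.5, -2.0]])  # negative class

print(predict(A, B, np.array([2.0, 1.5])))    # 1
print(predict(A, B, np.array([-2.0, -1.5])))  # -1
```

No optimization problem is solved at any point: training amounts to storing the points, and prediction is a closed-form average.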
Consider now the sample points in Figure 4. If we train on these points with the model computed so far, it leads to the situation on the left side of the figure, and the trained model is not reasonable. The foundation of our model is the two-point training division, so we consider a linear translation, namely the introduction of a parameter C ∈ (−1, 1) that shifts each trained subline:

f_{ij}(x) - C = 0,

translating every bisector toward one of the two classes. Then, we introduce the kernel into each subline. Since our model is composed only of inner products of two vectors, we can replace every product xy with K(x, y), which gives

f_{ij}(x) = \frac{2K(a_i, x) - 2K(b_j, x) - K(a_i, a_i) + K(b_j, b_j)}{K(a_i, a_i) - 2K(a_i, b_j) + K(b_j, b_j)}.

It is evident that the time complexity of the model is O(n²). Because the result is an average over a large number of points, the weighted sum of the sample points is a direct inference of the model. Similarly, if we want an online learning model, or want to discard some of the sample points, the additional time cost is very low. Under this premise, the training complexity of the multiclassification problem is also very low.
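The kernelised subline can be sketched as below; the RBF kernel and the parameter values are our illustrative choices, with the offset C playing the translation role described above.

```python
import numpy as np

# Kernelised sketch of the pairwise model: every submodel is built purely
# from inner products, so each occurrence of x.y is replaced by K(x, y).
# The offset C in (-1, 1) translates every subline toward one class.

def rbf(x, y, gamma=0.5):
    d = x - y
    return np.exp(-gamma * np.dot(d, d))

def kernel_pairwise_decision(A, B, x, K=rbf, C=0.0):
    total = 0.0
    for a in A:
        for b in B:
            denom = K(a, a) - 2.0 * K(a, b) + K(b, b)  # squared feature-space gap
            f = (2.0 * K(a, x) - 2.0 * K(b, x)
                 - K(a, a) + K(b, b)) / denom
            total += f - C
    return total / (len(A) * len(B))

A = np.array([[1.5, 1.5], [2.0, 1.0]])     # positive class
B = np.array([[-1.5, -1.0], [-1.0, -2.0]]) # negative class

print(kernel_pairwise_decision(A, B, np.array([1.8, 1.2])) > 0)   # True
print(kernel_pairwise_decision(A, B, np.array([-1.2, -1.5])) > 0) # False
```

Setting K(x, y) = x·y recovers the linear formula exactly, so the kernel version is a strict generalisation.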

Data Testing
First, we compute the linear kernel on the UCI data sets; the accuracy and variance comparison is shown in Table 1.
We can see that the new model has a clear advantage. Next, we consider the computation with the nonlinear RBF kernel on the UCI data sets in Table 2.
In theory, we have shown that the new algorithm has a strong advantage in time complexity. Here, we run some timing experiments, and it can be seen that the advantage is substantial on some data sets. Then, we measure the computation time of the linear kernel on data sets with different numbers of samples in Table 3.
In addition to analyzing the time of the linear problem, we also measure the time of the nonlinear case, applying the nonlinear kernel in Table 4. The time complexity of our model is O(n² + d), whereas that of the SVM and the TWSVM is O(n³ + d). Based on Tables 3 and 4, our model is faster than the SVM and the TWSVM even on small-sample problems with fewer than 1000 samples. Moreover, since our algorithm does not need to solve an optimization problem, its running time is stable with respect to the growth of the number of samples. It can be expected that on large-scale samples, our algorithm will be much faster than SVM and TWSVM.
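A rough timing sketch (ours, not the paper's benchmark) illustrates why the running time is predictable: prediction is a fixed, vectorisable O(M·N·d) computation with no iterative solver.

```python
import time
import numpy as np

# Vectorised linear pairwise model: for each query x_l, average the
# submodels f_ij(x) = (2(a_i - b_j).x - ||a_i||^2 + ||b_j||^2) / ||a_i - b_j||^2
# over all (i, j) pairs. The cost is a fixed dense computation, with no
# data-dependent iteration count.

def pairwise_scores(A, B, X):
    """Decision values of the linear pairwise model at the rows of X."""
    diff = A[:, None, :] - B[None, :, :]            # (M, N, d): a_i - b_j
    gap = np.einsum('ijk,ijk->ij', diff, diff)      # ||a_i - b_j||^2
    na = np.einsum('ik,ik->i', A, A)[:, None]       # ||a_i||^2 as column
    nb = np.einsum('jk,jk->j', B, B)[None, :]       # ||b_j||^2 as row
    proj = np.einsum('ijk,lk->ijl', diff, X)        # (a_i - b_j) . x_l
    f = (2.0 * proj - (na - nb)[:, :, None]) / gap[:, :, None]
    return f.mean(axis=(0, 1))                      # average over all pairs

rng = np.random.default_rng(0)
for n in (100, 400):
    A = rng.normal(loc=2.0, size=(n, 10))
    B = rng.normal(loc=-2.0, size=(n, 10))
    X = rng.normal(size=(20, 10))
    t0 = time.perf_counter()
    scores = pairwise_scores(A, B, X)
    elapsed = time.perf_counter() - t0
    print(n, round(elapsed, 4), scores.shape)
```

Actual timings depend on hardware, but the quadratic growth in n is visible and, unlike an SVM solver, there is no variance from convergence behaviour.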

Conclusion
Based on the point-to-point model, this paper establishes a micro learning support vector machine. The model differs from the traditional SVM in the way it is solved: it is not necessary to solve an optimization problem. This distinguishes the model from traditional machine learning algorithms. For both neural networks and SVM, the computation time is not known in advance, whereas the micro learning support vector machine runs in a fixed time once the feature dimension and the number of sample points are fixed. This is of great benefit to the stability of designs and practical applications. From the viewpoint of time complexity, the algorithm is better than SVM. Extending the micro learning support vector machine to weighted problems, multiclassification problems, and fitting problems is very simple and straightforward.
Our algorithm outperforms both SVM and TWSVM in terms of model accuracy and computation time, and it also has good nonlinear generalisation because we likewise use a kernel function. The computation time of this algorithm is explicit because it does not require solving an optimization problem. Overall, this algorithm is well suited to problems such as stock prediction and face recognition, which involve nonlinear, high-dimensional data and require high computational speed.
Based on [8, 9], we can extend the model to a semisupervised problem in the future; we just have not come up with a suitable modelling idea yet. We believe that the ideas used to build our model can also be extended to the field of feature extraction in the future and applied to many related problems (e.g., [10, 11]).
We likewise believe that our model can be used to solve regression problems after an SVM-to-SVR-like transformation (e.g., [12][13][14][15][16][17]). Of particular interest is the fact that our algorithm is well suited to applications in financial forecasting (e.g., [18][19][20]), a field that requires algorithms with controllable computation times and good performance on nonlinear problems.
In the same way as SVM, our algorithm can be used to solve multiclassification problems, and we hope that other researchers will apply it to multiclassification-related problems in the future (e.g., [21][22][23][24]). It is worth noting that this algorithm can be used for face recognition (e.g., [25, 26]). Similarly, our model can be used to solve the multilabel problem (e.g., [27, 28]).

Data Availability
The data are from the UCI Machine Learning Repository.

Conflicts of Interest
The authors declare that they have no conflicts of interest.