In modeling, only the information from the deviation between the output of the support vector regression (SVR) model and the training sample is considered, whereas other prior information in the training sample, such as its probability distribution, is ignored. Probability distribution information describes the overall distribution of the sample data in a training sample that contains different degrees of noise and potential outliers, and it can help develop a high-accuracy model. To mine and use the probability distribution information of a training sample, a new support vector regression model that incorporates probability distribution information, the probability-distribution-information-weighted SVR (PDISVR), is proposed. In the PDISVR model, the probability of each sample is treated as a weight and is introduced into the error coefficient and slack variables of SVR. Thus, both the deviation and the probability distribution information of the training sample are used in the PDISVR model to eliminate the influence of noise and outliers in the training sample and to improve predictive performance. Furthermore, examples with different degrees of noise were employed to demonstrate the performance of PDISVR, which was compared with that of three SVR-based methods. The results showed that PDISVR performs better than the other three methods.
Since its proposal by Vapnik, the support vector machine (SVM) has been used in many areas, including both pattern recognition and regression estimation [
The SVR model has also been improved by incorporating prior knowledge [
Inevitably, even small noise can degrade the accuracy of the model. Furthermore, in some situations, part of the noisy data may be ten or even dozens of times larger than the normal data. These outliers introduce bias and inaccuracy into SVR. Nevertheless, the probability distribution of the sample data is a good indicator of noise. From this perspective, normal data and data that contain the least noise have the highest probability in the sample, whereas data that contain a large amount of noise have relatively small probability. Thus, outliers in the sample data have the smallest probability of all. The probability distribution is therefore prior knowledge that helps weaken the influence of noise and outliers in the sample data, and we use this information to modify our SVR model.
This article is structured as follows: Section
To better describe the proposed algorithm, we first provide the mathematical formulation of the basic concepts of SVR and the use of deviation information.
SVR was originally designed to solve linear regression problems. For given training samples
However, the previous solution applies only to linear regression problems. Nonlinear regression requires a kernel function in the SVR model [
Changing the fitting function leads to the following constrained optimization problem:
The standard SVR is a compromise between structural risk minimization and empirical risk minimization. In particular, for the support vector regression learning algorithm, the structural risk term is
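This compromise can be sketched concretely. In the minimal example below, the function names, the 1-D linear model, and the sample values are illustrative assumptions, not the paper's notation: the objective adds the structural risk term to C times the empirical risk under the epsilon-insensitive loss.

```python
# Sketch of the SVR primal objective: structural risk (0.5 * w^2) plus
# C times the empirical risk under the epsilon-insensitive loss.
# Names (svr_objective, eps, C) are illustrative, not from the paper.

def eps_insensitive_loss(residual, eps):
    """Loss is zero inside the eps-tube and linear outside it."""
    return max(0.0, abs(residual) - eps)

def svr_objective(w, b, xs, ys, C=1.0, eps=0.1):
    """Primal objective for a linear SVR model f(x) = w*x + b (1-D case)."""
    structural = 0.5 * w * w
    empirical = sum(eps_insensitive_loss(y - (w * x + b), eps)
                    for x, y in zip(xs, ys))
    return structural + C * empirical

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 1.0, 2.1, 2.9]
print(svr_objective(1.0, 0.0, xs, ys))  # residuals all inside the tube
```

With the slope-1 fit, every residual lies within the 0.1-wide tube, so only the structural term contributes.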
Traditional SVR has no special mechanism for addressing noise in sample data. An efficient way to weaken noise is to adjust the parameters of the SVR model, called hyperparameters, which exert a considerable impact on algorithm performance. The general way to test the performance of hyperparameters is via the deviation between the model output and the sample data [
In most circumstances, the deviation between the model output and the sample data is represented by the correlation coefficient
The formula for mean square error (MSE) is as follows:
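Both criteria are standard and can be computed directly; the implementation below uses our own variable names as a compact reference.

```python
import math

def mse(y_true, y_pred):
    """Mean square error between model output and sample data."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def corr_coef(y_true, y_pred):
    """Pearson correlation coefficient R between outputs and targets."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(mse(y_true, y_pred), corr_coef(y_true, y_pred))
```

A larger R and a smaller MSE both indicate a closer fit between the model output and the sample data.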
Although DMSVR can reduce the influence of noise, it also has some weaknesses. Its main disadvantage is training time: many parameters must already be optimized in SVR, and extra parameters make the training process inefficient. To resolve the uncertainty of the error parameter
The probability distribution information, in the form of the probability density function, describes the likelihood that the output of a continuous random variable lies near a certain point. Integrating the probability density function over a region gives the probability that the random variable falls in that region. From the sample data, we can count the frequency with which the output takes different values. Then, we take the frequency as
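A minimal sketch of this estimation step follows, assuming a simple histogram estimator; the bin count and the normalisation are illustrative choices, since the paper does not fix them here.

```python
# Sketch: estimate each sample's probability weight from the empirical
# frequency of its output value, using a simple histogram.
# The bin count (n_bins) is an illustrative assumption.

def probability_weights(outputs, n_bins=10):
    lo, hi = min(outputs), max(outputs)
    width = (hi - lo) / n_bins or 1.0   # guard against a constant sample
    counts = [0] * n_bins
    bins = []
    for y in outputs:
        k = min(int((y - lo) / width), n_bins - 1)
        counts[k] += 1
        bins.append(k)
    n = len(outputs)
    # Each sample's weight is the relative frequency of its bin.
    return [counts[k] / n for k in bins]

outputs = [1.0, 1.1, 1.05, 1.2, 5.0]   # 5.0 acts as an outlier
weights = probability_weights(outputs, n_bins=4)
print(weights)  # the outlier receives the smallest weight
```

As the text argues, high-frequency (normal) outputs receive large weights, while the outlier falls in a sparsely populated bin and receives the smallest weight.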
Once the probability distribution of the output has been obtained, it should be integrated into the basic SVR model. In the basic SVR model, the error parameter
Define the
Linear problem illustration of a PDISVR model.
To further improve the performance of the SVR model, we consider adding an extra component to the PDISVR framework. The PDISVR model has only a single error parameter
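The weighting idea can be sketched as follows. This is a hedged illustration, not the paper's exact formulation: the function name, the inverse scaling of the tube width, and the multiplicative slack penalty are our assumptions about how probability weights could enter the error coefficient and slack variables.

```python
# Hedged sketch: fold probability weights into the epsilon-insensitive
# loss, so that low-probability (likely noisy) samples get a wider tube
# and a cheaper slack penalty. Illustrative only; the exact formulation
# follows the paper's optimization problem.

def weighted_empirical_risk(residuals, weights, eps=0.1, C=1.0):
    risk = 0.0
    for r, w in zip(residuals, weights):
        eps_i = eps / max(w, 1e-6)        # low weight -> wider tube
        slack = max(0.0, abs(r) - eps_i)  # deviation beyond the tube
        risk += C * w * slack             # low weight -> cheaper slack
    return risk

residuals = [0.05, -0.08, 2.5]            # the last sample is an outlier
weights = [0.45, 0.45, 0.10]              # from the probability estimate
print(weighted_empirical_risk(residuals, weights))
```

The two high-probability samples stay inside their tubes and contribute nothing, while the outlier's large residual is down-weighted by its small probability, so it no longer dominates the risk.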
Normally, the performance of the different SVRs depends strongly on parameter selection. A PSO-based hyperparameter selection method from [
Typical parameters for PSO algorithms.

Parameter | Value
Maximum generation | 200
Population size | 20
Cognitive coefficient | 1.5
Maximum velocity | 1
Social coefficient | 1.7
Initial inertia weight | 0.9
Final inertia weight | 0.4
Minimum velocity | 10
Workflow of hyperparameter selection.
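A minimal PSO sketch using the settings from the table above (cognitive coefficient 1.5, social coefficient 1.7, inertia decaying from 0.9 to 0.4, population 20, 200 generations). The sphere objective is a stand-in assumption; in the paper the objective would be the cross-validated SVR error over the hyperparameters.

```python
import random

# Minimal particle swarm optimization (PSO) sketch with the table's
# settings. The sphere function stands in for the SVR validation error.

def pso(objective, dim, pop=20, gens=200, vmax=1.0,
        c1=1.5, c2=1.7, w_init=0.9, w_final=0.4, seed=0):
    rng = random.Random(seed)
    xs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop)]
    vs = [[0.0] * dim for _ in range(pop)]
    pbest = [x[:] for x in xs]
    pbest_f = [objective(x) for x in xs]
    g = min(range(pop), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for t in range(gens):
        # Linearly decay the inertia weight from w_init to w_final.
        w = w_init - (w_init - w_final) * t / max(gens - 1, 1)
        for i in range(pop):
            for d in range(dim):
                vs[i][d] = (w * vs[i][d]
                            + c1 * rng.random() * (pbest[i][d] - xs[i][d])
                            + c2 * rng.random() * (gbest[d] - xs[i][d]))
                vs[i][d] = max(-vmax, min(vmax, vs[i][d]))  # clamp velocity
                xs[i][d] += vs[i][d]
            f = objective(xs[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = xs[i][:], f
    return gbest, gbest_f

best, best_f = pso(lambda x: sum(v * v for v in x), dim=2)
print(best_f)  # close to 0
```

Each particle is pulled toward its own best position (cognitive term) and the swarm's best position (social term), while the decaying inertia shifts the search from exploration to exploitation.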
To verify the effect of the probability distribution information on the standard SVR model, we performed three kinds of numerical experiments with real datasets. In these experiments, we considered three kernels, namely, the linear, polynomial, and Gaussian kernels, as the SVRs' kernel functions. All experiments were run in MATLAB on an Intel i5 CPU with 6 GB of memory.
The experimental studies mainly compared different SVR models, including basic SVR, MKSVR, and heuristic weighted SVR [
Example 1 tested the previous four methods with one high-dimensional function, as shown in [
After obtaining the optimal hyperparameters, we determined the influence of noise on the different SVR algorithms. The range of the magnitude of the noise
Four models’ prediction results with the linear kernel under different noise levels: (a) is the average value of the correlation coefficient
Four models’ prediction results with the radial basis function kernel under different noise levels: (a) is the average value of the correlation coefficient
Four models’ prediction results with the polynomial kernel under different noise levels: (a) is the average value of the correlation coefficient
In these three figures, the average criterion values indicate the general performance of each algorithm, and the standard deviations represent each algorithm's stability. From Figures
The effects of rough errors cannot be ignored in real production processes. To better simulate real production conditions and reveal the robustness of the proposed PDISVR when the training samples contain outliers, a rough error term was added to the function in the previous model. A total of 80 data points with a noise intensity of 1 were randomly generated by (
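This data-generation step can be sketched as below. The target function (sinc), the sampling interval, and the outlier magnitudes are illustrative stand-ins for the paper's benchmark setup; only the sample size (80), the unit noise intensity, and the idea of injecting a few rough errors follow the text.

```python
import math
import random

# Hedged sketch of the rough-error experiment: 80 training points from
# a known function with noise of intensity 1, then three points inflated
# to act as outliers. The target function and outlier sizes are
# illustrative assumptions, not the paper's exact benchmark.

random.seed(1)

def target(x):
    """Stand-in target function (sinc)."""
    return math.sin(x) / x if x != 0 else 1.0

xs = [random.uniform(-10.0, 10.0) for _ in range(80)]
ys = [target(x) + random.gauss(0.0, 1.0) for x in xs]

# Inject rough errors roughly ten times the normal noise level.
for i in random.sample(range(80), 3):
    ys[i] += random.choice([-1.0, 1.0]) * 10.0

print(len(xs), len(ys))
```

Training the four SVR variants on such a sample exposes how strongly each one is pulled off course by the three rough-error points.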
To better compare the predictive performance of the different SVR algorithms, the same four algorithms were each trained ten times on samples containing three outliers. The average values and standard deviation values of
As indicated in Tables
Testing results of SVR algorithms with rough error (linear kernel).

Criterion | PDISVR | SVR | HWSVR | MKSVR
R, average value | - | 0.86844 | 0.84402 | 0.85828
R, standard deviation | - | 0.002867 | 0.002707 | 0.001998
MSE, average value | - | 0.52311 | 0.58221 | 0.45612
MSE, standard deviation | - | 0.111161 | 0.045897 | 0.028321
Testing results of SVR algorithms with rough error (radial basis function kernel).

Criterion | PDISVR | SVR | HWSVR | MKSVR
R, average value | - | 0.78126 | 0.75976 | 0.64346
R, standard deviation | 0.000297 | - | 0.000774 | 0.000244
MSE, average value | 0.70686 | 0.73748 | - | 0.84296
MSE, standard deviation | 0.00316 | 0.00150 | 0.00506 | -
Testing results of SVR algorithms with rough error (polynomial kernel).

Criterion | PDISVR | SVR | HWSVR | MKSVR
R, average value | - | 0.79399 | 0.75874 | 0.8137
R, standard deviation | 0.000377 | 0.004657 | 0.009121 | -
MSE, average value | - | 0.727 | 1.48208 | 0.62433
MSE, standard deviation | - | 0.076291 | 1.462867 | 0.028654
To illustrate the quality of PDISVR's weighting scheme, we compared the weighting results in Table
Weights of the training sample for PDISVR.
Weights of the training sample for HWSVR.
To test our regression model under more realistic conditions, we imported six real-world datasets from the UCI Machine Learning Repository [
Details of the experimental datasets.

Data | Number of instances | Number of attributes | Dataset information
Combustion side reaction | 144 | 6 | The combustion side reaction is the process by which the solvent and reactants are oxidized to carbon dioxide, carbon monoxide, and water
AutoMPG dataset | 398 | 8 | Data concerning city-cycle fuel consumption
Computer hardware dataset | 209 | 9 | Relative CPU performance in terms of cycle time, memory size, and other parameters
Soil samples measured by Near-Infrared Spectroscopy (NIR) | 108 | 1050 | Soil organic matter (SOM) data from northern Sweden (681210N, 181490E)
Energy efficiency | 768 | 9 | Data for assessing the heating load and cooling load requirements of buildings (that is, energy efficiency) as a function of building parameters
Crude oil distillation process | 30 | 1 | Data related to different components with different boiling points
As shown in Tables
Comparative results of previous SVR models in real datasets (linear kernel).

Dataset | Criterion | PDISVR | SVR | HWSVR | MKSVR
Combustion side reaction | R | - | 0.6790 | 0.6797 | 0.6939
Combustion side reaction | MSE | 0.0051 | 0.0050 | 0.0046 | -
AutoMPG | R | - | 0.8578 | 0.8992 | 0.8554
AutoMPG | MSE | - | 25.8051 | 23.7999 | 20.4802
Computer hardware | R | - | 0.8750 | 0.8748 | 0.9012
Computer hardware | MSE | 7987.2 | 8373.7 | 8381.5 | -
NIRSOIL | R | - | 0.7335 | 0.7712 | 0.7657
NIRSOIL | MSE | - | 2230.3 | 2085.1 | 2385.8
Energy efficiency | R | - | 1 | 1 | 0.9999
Energy efficiency | MSE | - | 1.1092 | 0.0221 | 0.0818
Crude oil distillation process | R | 0.9372 | 0.7684 | 0.9372 | -
Crude oil distillation process | MSE | 1016.2 | - | 14510 | 8162.8
Comparative results of previous SVR models in real datasets (radial basis function kernel).

Dataset | Criterion | PDISVR | SVR | HWSVR | MKSVR
Combustion side reaction | R | - | 0.425 | 0.6031 | 0.5827
Combustion side reaction | MSE | - | 0.0095 | 0.6031 | 0.0578
AutoMPG | R | 0.5594 | 0.614 | 0.6261 | -
AutoMPG | MSE | - | 26.1598 | 169.7247 | 22.0482
Computer hardware | R | 0.7815 | 0.8597 | 0.8647 | -
Computer hardware | MSE | 27961 | 19986 | 15355 | -
NIRSOIL | R | - | 0.6707 | 0.5565 | 0.5666
NIRSOIL | MSE | - | 1302.3 | 1601.1 | 3523.5
Energy efficiency | R | - | 0.9796 | 0.9901 | 0.9997
Energy efficiency | MSE | - | 4.1875 | 3.019 | 0.0474
Crude oil distillation process | R | 0.9899 | 0.9739 | - | 0.9573
Crude oil distillation process | MSE | - | 78.7604 | 1119.8 | 317.6248
Comparative results of previous SVR models in real datasets (polynomial kernel).

Dataset | Criterion | PDISVR | SVR | HWSVR | MKSVR
Combustion side reaction | R | 0.5796 | 0.6853 | - | 0.5518
Combustion side reaction | MSE | - | 0.0106 | 0.3546 | 0.1237
AutoMPG | R | 0.9124 | 0.9338 | - | 0.8604
AutoMPG | MSE | - | 6793.3 | 6814.4 | 4855.9
Computer hardware | R | 0.8337 | - | 0.8519 | 0.8423
Computer hardware | MSE | - | 19.5298 | 19.8641 | 13.9781
NIRSOIL | R | - | 0.6342 | 0.6468 | 0.6429
NIRSOIL | MSE | - | 3104.7 | 2918.9 | 3007.3
Energy efficiency | R | - | 0.9964 | 0.9988 | 0.9867
Energy efficiency | MSE | - | 0.4598 | 0.1676 | 4.1448
Crude oil distillation process | R | 0.9571 | 0.8071 | 0.9807 | -
Crude oil distillation process | MSE | 321.8016 | 112.9353 | - | 1105.4000
In traditional SVR modeling, the deviation between the model outputs and the real data is the only signal used to represent the influence of noise. Other information, such as probability distribution information, is not exploited. Therefore, we proposed a method that uses the probability distribution information to modify the basic error parameter and slack variables of SVR. Because these parameters are weighted by the probability distribution of the model output points, they can be adjusted by SVR itself and no longer require tuning by an intelligent optimization algorithm. The proposed algorithm outperformed other SVR-based algorithms in handling noisy data and outliers in both simulated and real datasets.
The authors declare that they have no conflicts of interest.