Parameter Selection Method for Support Vector Regression Based on Adaptive Fusion of the Mixed Kernel Function

Support vector regression (SVR) is widely used in rolling bearing fault diagnosis. This paper proposes a new model parameter selection method for SVR based on adaptive fusion of the mixed kernel function. A mixed kernel function is chosen as the kernel function of SVR. The fusion coefficients of the mixed kernel function, the kernel function parameters, and the regression parameters are combined into the state vector, which transforms the model selection problem into a nonlinear system state estimation problem. A 5th-degree cubature Kalman filter is then used to estimate the parameters. In this way, the weighted fusion coefficients of the mixed kernel function, the kernel parameters, and the regression parameters are selected adaptively. Compared with single-kernel SVR, unscented Kalman filter (UKF) based SVR algorithms, and genetic algorithms, the decision regression function obtained by the proposed method has better generalization ability and higher prediction accuracy.


Introduction
Under complex operating conditions and harsh working environments, the core components and important mechanical structures of mechanical equipment inevitably suffer varying degrees of failure, and equipment failure may cause huge economic losses and casualties. Rolling bearings are widely used in rotating machinery, and their running state directly affects the accuracy, reliability, and life of the machine. Timely and accurate fault diagnosis of rolling bearings helps improve equipment reliability and reduce the probability of accidents. Because the relationship between faults and their characteristic features is highly nonlinear, fault diagnosis methods based on machine learning have been applied increasingly widely in automation in recent years [1][2][3]. Vapnik proposed support vector regression (SVR), which has shown superior performance [4], and an improved SVR algorithm can therefore substantially improve the accuracy of fault diagnosis.
The key to SVR is the selection of the kernel function and its parameters. Methods that have been used to optimize the kernel and select the regression parameters include cross-validation learning [5,6], gradient descent learning [7,8], evolutionary learning [9,10], and positive semidefinite programming learning [11,12]. Studies on SVR model selection and kernel parameter selection are relatively few, and they mainly use grid-search cross-validation and evolutionary methods. However, these methods are inefficient because they search exhaustively for the optimal parameters [13]. When there are more than two parameters, such searches become almost impractical, as with the genetic algorithm [14] and the particle swarm optimization algorithm [15]. More seriously, evolutionary algorithms easily fall into local optima; that is, they obtain only a suboptimal solution rather than the optimal one. Reference [16] offers another approach: it treats the tuning of multiple LS-SVM parameters as a parameter identification problem for a nonlinear dynamic system and, exploiting the smoothness of the system model, adjusts the kernel parameters and regression parameters automatically with the extended Kalman filter (EKF). Reference [17] proposes an SVR model selection method based on the unscented Kalman filter (UKF-SVR) to handle loss functions that are nondifferentiable with respect to the hyperparameters. However, the accuracy of the UKF can be insufficient for practical applications, and when the system dimension is high its performance degrades significantly, leading to the curse of dimensionality. In 2009, Arasaratnam and Haykin proposed the cubature Kalman filter (CKF), which uses a spherical-radial cubature rule to select the cubature points and weights, improving both the accuracy of high-dimensional nonlinear state estimation and stability. The CKF is based on a 3rd-degree spherical-radial cubature rule, so its filtering accuracy is still limited [18]. Recently, [19] proposed a class of high-degree spherical-radial cubature Kalman filters and showed that the fifth-degree cubature Kalman filter (5th-degree CKF) achieves high accuracy and high stability at low computational cost.
Although the approaches above improve the algorithm for specific problems, they are all based on a single kernel function. Kernel functions can be divided into local kernel functions and global kernel functions: local kernels have stronger learning ability, while global kernels have stronger extrapolation ability. The choice of kernel function significantly affects the generalization ability of SVR, and using only one kernel function is often limiting. A linear fusion of a local kernel function and a global kernel function forms a new kernel, the mixed kernel function, which combines the advantages of both and can reflect the actual characteristics of the samples more accurately. Introducing a hybrid kernel function requires determining the fusion coefficient between the local and global kernel functions, and an appropriate fusion coefficient is needed to realize the advantages of the hybrid kernel. At present, the combination weights are often determined by experience [20,21]. For example, Reference [20], which deals with pulmonary nodules, selects the initial mixed kernel coefficients based on gray features, morphological features, and texture features. SVR based on a hybrid kernel constructed in this way cannot guarantee the best performance.
For these reasons, this paper uses the mixed kernel function as the kernel function of SVR and the 5th-degree CKF as the basic framework to adaptively adjust the fusion coefficients, kernel parameters, and regression parameters of the mixed kernel function. The remainder of this paper is organized as follows: Section 2 describes the problem; Section 3 reviews typical kernel functions; Section 4 presents the parameter selection method for SVR based on adaptive fusion of the mixed kernel function; Section 5 analyzes the algorithm; Section 6 gives a simulation example; and Section 7 concludes the paper.

Problem Description
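2.1. Support Vector Regression. The goal of support vector regression is to find a regression function $f: \mathbb{R}^{n} \to \mathbb{R}$,

$$f(x) = w^{T}\varphi(x) + b, \tag{1}$$

where $\varphi(x)$ maps the data $x$ from the low-dimensional input space into a high-dimensional feature space, $w$ is the weight vector, and $b$ is the bias (offset) term. Given training samples $(x_i, y_i)$, $i = 1, \dots, l$, the parameters are obtained by solving

$$\min_{w,\,b,\,\xi,\,\xi^{*}}\ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{l}(\xi_i + \xi_i^{*})$$
$$\text{s.t.}\ \ y_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i,\quad w^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*},\quad \xi_i,\ \xi_i^{*} \ge 0, \tag{2}$$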
where $\xi_i$ and $\xi_i^{*}$ are slack (relaxation) variables. When the fitting error of a sample exceeds $\varepsilon$, the corresponding slack variable is greater than 0; otherwise both are equal to 0. The first term of the objective keeps the fitting function flat (smooth), which improves generalization. The second term reduces the fitting error; the constant $C > 0$ determines how strongly samples whose error exceeds $\varepsilon$ are penalized.
The performance of support vector regression depends on the error penalty parameter C, which sets how heavily samples lying outside the ε-tube are penalized. C trades off model complexity against training error. When C is small, the empirical error on the original data is penalized lightly: the complexity of the learning machine is low, but the empirical risk is high. When C is large, the empirical error is penalized heavily and the empirical risk is small, but this can lead to high computational complexity and poor generalization ability. It is therefore important to choose an appropriate penalty coefficient C for the practical problem at hand.
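To make the roles of the slack variables and of C concrete, the following minimal Python sketch evaluates the primal objective of (2) for a fixed w and b. It assumes a linear feature map φ(x) = x purely to keep the arithmetic small; the function names `slacks` and `primal_objective` are illustrative and not part of the method.

```python
def slacks(y, f, eps):
    """Return (xi, xi_star) for one sample with target y and prediction f."""
    xi = max(0.0, y - f - eps)        # prediction too low by more than eps
    xi_star = max(0.0, f - y - eps)   # prediction too high by more than eps
    return xi, xi_star


def primal_objective(w, b, X, Y, C, eps):
    """0.5*||w||^2 + C * sum(xi + xi_star) for f(x) = w.x + b (phi(x) = x)."""
    reg = 0.5 * sum(wk * wk for wk in w)
    penalty = 0.0
    for x, y in zip(X, Y):
        f = sum(wk * xk for wk, xk in zip(w, x)) + b
        xi, xi_star = slacks(y, f, eps)
        penalty += xi + xi_star
    return reg + C * penalty


X = [[0.0], [1.0], [2.0], [3.0]]
Y = [0.0, 1.0, 2.0, 4.0]              # the last sample lies off the line y = x
w, b, eps = [1.0], 0.0, 0.1

for C in (0.1, 10.0):
    print(C, primal_objective(w, b, X, Y, C, eps))
```

With w and b held fixed, only the last sample lies outside the ε-tube and contributes 0.9 to the slack sum, so its share of the objective grows linearly with C; a training procedure using a large C will therefore bend the fit toward such samples.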
The structure of support vector regression is shown in Figure 1. Another key factor affecting the performance of support vector regression is the kernel function and its parameters, since the introduction of the kernel function is the core of the SVR algorithm. High SVR performance is difficult to obtain with a single kernel function: the characteristics of actual samples are complex and variable and cannot be fully captured by either a local or a global kernel function alone. The mixed kernel function combines a global kernel function and a local kernel function in a given ratio, so it can reflect the characteristics of actual samples more accurately than either kernel alone. Therefore, the mixed kernel function can offer both good learning ability and good generalization ability.

5th-Degree CKF Principle. Different nonlinear filters have different performance characteristics, and the estimation performance depends on the specific type of nonlinear filter. The 5th-degree cubature Kalman filtering algorithm achieves high accuracy and high stability at low computational cost, so it is selected to adaptively adjust the fusion coefficients, kernel parameters, and regression parameters. Consider the general nonlinear system

$$x_k = f(x_{k-1}) + w_{k-1},\qquad z_k = h(x_k) + v_k,$$

where $x_k$ is the $n$-dimensional state vector and $z_k$ is the $m$-dimensional observation vector; $f$ and $h$ are known nonlinear functions; and $\{w_k\}$ and $\{v_k\}$ are mutually independent zero-mean Gaussian white-noise sequences.
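Similar to the CKF, the 5th-degree CKF consists of two steps: state prediction (time update) and measurement update. The key difference is that the 5th-degree CKF uses a higher-degree spherical-radial cubature rule, with correspondingly different cubature points and weights, which addresses the limited accuracy of the 3rd-degree rule noted in the Introduction. The fifth-degree spherical rule satisfies

$$\int_{U_n} f(s)\,d\sigma(s) \approx w_1 \sum_{j=1}^{n(n-1)/2}\left[f(s_j^{+}) + f(-s_j^{+}) + f(s_j^{-}) + f(-s_j^{-})\right] + w_2 \sum_{j=1}^{n}\left[f(e_j) + f(-e_j)\right],$$

where $U_n$ is the unit sphere in $\mathbb{R}^{n}$, $e_j$ is the $j$th column of the $n \times n$ identity matrix, and $s_j^{+}$ and $s_j^{-}$ are the point sets

$$\{s_j^{+}\} = \left\{\sqrt{\tfrac{1}{2}}\,(e_k + e_l) : k < l;\ k, l = 1, \dots, n\right\},\qquad \{s_j^{-}\} = \left\{\sqrt{\tfrac{1}{2}}\,(e_k - e_l) : k < l;\ k, l = 1, \dots, n\right\}.$$

The weights $w_1$ and $w_2$ are

$$w_1 = \frac{A_n}{n(n+2)},\qquad w_2 = \frac{(4-n)A_n}{2n(n+2)},$$

where $A_n = 2\sqrt{\pi^{n}}/\Gamma(n/2)$ is the surface area of the unit sphere. For the radial integral $\int_0^{\infty} f(r)\,r^{n-1}e^{-r^{2}}\,dr \approx \sum_{i=1}^{N_r} a_i f(r_i)$, the moment matching method with $N_r = 2$ gives the points $r_1 = 0$ and $r_2 = \sqrt{(n+2)/2}$ with weights

$$a_1 = \frac{\Gamma(n/2)}{n+2},\qquad a_2 = \frac{n\,\Gamma(n/2)}{2(n+2)}.$$

Table 1: Four types of kernel functions and their characteristics.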
Linear kernel function, $K(x_i, x_j) = x_i^{T}x_j$: the simplest kernel function; it has few parameters and is fast to compute [28].
Polynomial kernel function, $K(x_i, x_j) = (x_i^{T}x_j + 1)^{q}$: a global kernel function that becomes a linear kernel (up to an additive constant) when $q = 1$. The larger $q$ is, the higher the dimension of the mapping and the greater the amount of computation. When $q$ is too large, the complexity of the learning machine increases, the generalization ability of SVR decreases, and overfitting easily occurs [29].
Gauss kernel function (RBF kernel), $K(x_i, x_j) = \exp(-\gamma\|x_i - x_j\|^{2})$, $\gamma > 0$: a strongly local kernel function whose extrapolation ability decreases as the kernel parameter increases. Compared with other common kernel functions, it has only one parameter to determine, so the kernel model is relatively simple to construct; it is therefore currently the most widely used kernel function [30].
Sigmoid kernel function, $K(x_i, x_j) = \tanh(v\,x_i^{T}x_j + c)$: $v$ and $c$ are the kernel parameters and satisfy $v > 0$, $c < 0$.
The theoretical basis of support vector regression (a convex quadratic optimization problem) ensures that its solution is the global optimum rather than a local minimum. Its good generalization to unseen samples also helps it avoid overlearning (overfitting) [31].
This paper first chooses the mixed kernel function as the kernel function of support vector regression. Constructing this kernel requires selecting the fusion coefficients of the local and global kernel functions, the kernel parameters, and the penalty parameter C. The fusion coefficients are then combined with the kernel hyperparameters into a state vector, so that constructing the kernel function and selecting its parameters become a nonlinear filtering problem that can be solved by the 5th-degree CKF. In this way, the fusion coefficients are adjusted adaptively, and the kernel parameters and the penalty parameter are estimated.
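The fifth-degree cubature rule on which this filtering approach relies can be sketched in a few lines of Python. This is a minimal sketch, assuming the rule is applied to a standard Gaussian N(0, I_n) by combining the spherical rule with the two radial points 0 and √((n+2)/2); the resulting scaled points are 0, ±√(n+2)·e_j, and ±√(n+2)·s_j^±, with weights 2/(n+2), (4−n)/(2(n+2)²), and 1/(n+2)², respectively. The helper names are hypothetical.

```python
import itertools
import numpy as np


def fifth_degree_cubature(n):
    """Points (rows) and weights of a fifth-degree spherical-radial rule for
    E[g(x)], x ~ N(0, I_n). Returns 2n^2 + 1 points."""
    r = np.sqrt(n + 2.0)               # radial point sqrt((n+2)/2), scaled by sqrt(2)
    eye = np.eye(n)
    points = [np.zeros(n)]             # radial point r = 0 (centre)
    weights = [2.0 / (n + 2.0)]
    w_axis = (4.0 - n) / (2.0 * (n + 2.0) ** 2)   # negative when n > 4
    for j in range(n):                 # +/- e_j directions
        for sign in (1.0, -1.0):
            points.append(sign * r * eye[j])
            weights.append(w_axis)
    w_pair = 1.0 / (n + 2.0) ** 2
    for k, l in itertools.combinations(range(n), 2):   # s_j^+ and s_j^- sets
        for s in (eye[k] + eye[l], eye[k] - eye[l]):
            for sign in (1.0, -1.0):
                points.append(sign * r * s / np.sqrt(2.0))
                weights.append(w_pair)
    return np.array(points), np.array(weights)


def gaussian_expectation(g, mean, cov):
    """Approximate E[g(x)] for x ~ N(mean, cov) with the rule above."""
    mean = np.asarray(mean, dtype=float)
    xi, w = fifth_degree_cubature(mean.size)
    S = np.linalg.cholesky(cov)        # cov = S S^T
    samples = mean + xi @ S.T          # map standard points to N(mean, cov)
    return sum(wi * g(x) for wi, x in zip(w, samples))


pts, w = fifth_degree_cubature(3)
print(pts.shape, w.sum())              # shape (19, 3); sum should be 1 up to rounding
print(gaussian_expectation(lambda x: x[0] ** 4, np.zeros(3), np.eye(3)))  # exact value is 3
```

The rule reproduces Gaussian moments up to degree five exactly. For n > 4 the axis weights become negative, which is worth keeping in mind for parameter vectors with more than four components.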

Review of Classic Kernel Functions
The key to support vector regression is the introduction of the kernel function. Data that are difficult to separate in a low-dimensional space often become easier to separate after being mapped into a high-dimensional space, but computing directly in that space is very expensive. The kernel function computes inner products in the high-dimensional feature space directly from the original data, which avoids the "curse of dimensionality". The kernel function is denoted K(x_i, x_j), where x_i and x_j are sample data. Four types of kernel functions are widely used in the research and application of support vector regression [21], as shown in Table 1.
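The kernel trick described above can be checked numerically. In this minimal Python sketch (an illustration, not taken from the paper), the homogeneous degree-2 polynomial kernel (x·y)² on 2-D inputs equals the inner product of the explicit 3-D feature map φ(x) = (x₁², x₂², √2·x₁x₂), so the kernel yields the feature-space inner product without ever forming φ.

```python
import math


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel K(x, y) = (x . y)^2 for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def phi(x):
    """Explicit feature map whose inner product reproduces poly2_kernel."""
    return (x[0] ** 2, x[1] ** 2, math.sqrt(2.0) * x[0] * x[1])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, y = (1.0, 2.0), (3.0, 4.0)
print(poly2_kernel(x, y))    # computed in the 2-D input space
print(dot(phi(x), phi(y)))   # same value up to rounding, computed in the 3-D feature space
```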
Different kernel functions yield different support vector regressions. The linear, polynomial, and Gauss kernel functions have been widely used, and the most widely used is the RBF kernel function, which has good learning ability. RBF kernel functions are applicable whether the problem is low- or high-dimensional and whether the sample is small or large. The RBF kernel function has a wide convergence region and is an ideal classification basis function. The sigmoid kernel function, which comes from neural networks, is of limited use in practice: only when its parameters v and c satisfy certain conditions does it meet the requirements of a symmetric, positive definite kernel function. The sigmoid kernel has been shown to give good global classification performance in neural networks, but its classification performance in SVM requires further research [23].
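The following Python sketch implements the four kernels of Table 1 and checks the Mercer (positive semidefinite) property numerically through the smallest eigenvalue of the Gram matrix. It assumes the common forms K = x·y, (x·y + 1)^q, exp(−γ‖x−y‖²), and tanh(v·x·y + c); the sample points and parameter values are arbitrary choices for illustration. With v = 1 > 0 and c = −1 < 0, the sigmoid Gram matrix still has a negative eigenvalue, because its diagonal entry for the origin is tanh(−1) < 0, which illustrates why the sigmoid kernel is a valid kernel only under specific conditions.

```python
import numpy as np


def linear_kernel(a, b):
    return float(np.dot(a, b))


def poly_kernel(a, b, q=2):
    return float((np.dot(a, b) + 1.0) ** q)


def rbf_kernel(a, b, gamma=1.0):
    d = np.asarray(a, float) - np.asarray(b, float)
    return float(np.exp(-gamma * np.dot(d, d)))


def sigmoid_kernel(a, b, v=1.0, c=-1.0):
    return float(np.tanh(v * np.dot(a, b) + c))


def gram(kernel, X, **params):
    """Gram matrix K[i, j] = kernel(X[i], X[j])."""
    return np.array([[kernel(a, b, **params) for b in X] for a in X])


def min_eig(K):
    """Smallest eigenvalue of a symmetric matrix (>= 0 for a Mercer kernel)."""
    return float(np.linalg.eigvalsh(K).min())


X = np.array([[0.0, 0.0], [0.5, 0.2], [1.0, -1.0], [2.0, 1.0]])
print("RBF     min eigenvalue:", min_eig(gram(rbf_kernel, X, gamma=0.5)))
print("sigmoid min eigenvalue:", min_eig(gram(sigmoid_kernel, X, v=1.0, c=-1.0)))
```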
The kernel function neatly avoids the curse of dimensionality that arises when low-dimensional vectors are mapped into a high-dimensional space, and it improves the nonlinear processing ability of machine learning. However, each kernel function has its own characteristics, and support vector regressions based on different kernel functions have different generalization abilities. Kernel functions are currently divided into two categories: global kernel functions and local kernel functions. A local kernel function is effective at extracting the local characteristics of the samples: its value is affected mainly by data points that are very close to each other, so its interpolation ability, and hence its learning ability, is strong. The Gauss kernel function is a local kernel function. A global kernel function is effective at extracting the global characteristics of the samples: data points far from each other also affect its value, so its generalization ability is strong [24], but its learning ability is weaker than that of a local kernel function. The linear, polynomial, and sigmoid kernel functions are all global kernel functions. In short, a local kernel function has strong learning ability but weak generalization ability, whereas a global kernel function has strong generalization ability but weak learning ability.
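The local/global distinction can be seen by fixing one point and moving the other away. This minimal Python sketch assumes an RBF kernel exp(−γ(x−y)²) as the local kernel and a polynomial kernel (xy + 1)^q as the global kernel, on scalar inputs; the parameter values are arbitrary.

```python
import math


def rbf(x, y, gamma=1.0):
    """Local kernel: decays to zero as |x - y| grows."""
    return math.exp(-gamma * (x - y) ** 2)


def poly(x, y, q=2):
    """Global kernel: distant points still produce large values."""
    return (x * y + 1.0) ** q


x0 = 1.0
for y in (1.0, 2.0, 5.0, 10.0):
    print(f"y={y:>4}: RBF={rbf(x0, y):.3e}  poly={poly(x0, y):.1f}")
```

The RBF value is numerically zero for distant points, so only nearby samples influence it, whereas the polynomial value keeps growing with distance, so far-away samples still contribute.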

Parameter Selection Method for Support Vector Regression Based on Adaptive Fusion of Mixed Kernel Function
In the mixed kernel function (8), ρ1 and ρ2 are the weights of the two kernel functions, with 0 ≤ ρ1, ρ2 ≤ 1 and ρ1 + ρ2 = 1.
The mixed kernel function is still a Mercer kernel.
It has been proven that any symmetric function satisfying the Mercer condition can be used as a kernel function. A nonnegative combination of positive semidefinite Gram matrices is itself positive semidefinite, so the mixed kernel function (8), a convex combination of the local and global kernel functions, satisfies the Mercer condition whenever both component kernels do, and can therefore be chosen as a kernel function. The mixed kernel function is intended to overcome the deficiencies of using a single global or local kernel function.
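The convexity argument can be checked numerically. In this minimal Python sketch, the local kernel is assumed to be an RBF kernel and the global kernel a polynomial kernel (x·y + 1)^q; the paper's choice of global kernel is not fixed here, and the helper names are hypothetical. For every ρ1 in [0, 1] (with ρ2 = 1 − ρ1), the smallest eigenvalue of the mixed Gram matrix is nonnegative up to rounding error, and ρ1 = 1 or ρ1 = 0 recovers a single kernel.

```python
import numpy as np


def rbf_gram(X, gamma):
    """Local kernel (assumed RBF): exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def poly_gram(X, q):
    """Global kernel (assumed polynomial): (x_i . x_j + 1)^q."""
    return (X @ X.T + 1.0) ** q


def mixed_gram(X, rho1, gamma, q):
    """Mixed kernel rho1 * K_local + rho2 * K_global with rho2 = 1 - rho1."""
    if not 0.0 <= rho1 <= 1.0:
        raise ValueError("rho1 must lie in [0, 1]")
    return rho1 * rbf_gram(X, gamma) + (1.0 - rho1) * poly_gram(X, q)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
for rho1 in (0.0, 0.3, 0.7, 1.0):
    K = mixed_gram(X, rho1, gamma=0.5, q=2)
    print(f"rho1={rho1:.1f}  smallest eigenvalue={np.linalg.eigvalsh(K).min():.2e}")
```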
When ρ1 = 0 or ρ2 = 0, the mixed kernel function reduces to a single kernel function. Model selection for single-kernel support vector regression involves only the internal parameters of that kernel. Model selection for mixed-kernel support vector regression, however, involves the internal parameters of both the local and the global kernel functions as well as the fusion coefficients of the two, which must be chosen so that the mixed-kernel support vector regression performs as well as possible. The weighted fusion coefficients must be determined before the support vector regression is trained. The coefficients ρ1 and ρ2 in (8) are usually determined from past experience; if they are poorly chosen, the mixed kernel function does not describe the properties of the training samples well, and regression prediction performance degrades. Currently, there is no analytical method for selecting the fusion coefficients: they are usually chosen by experience and are difficult to estimate online.

Mixed Kernel Function Based Support Vector Regression.
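Unlike support vector regression based on a single kernel function, support vector regression based on a fused kernel function uses a kernel containing both a local and a global kernel function; that is, $\varphi(x)$ in (2) is the high-dimensional feature mapping induced by the mixed kernel function. To solve the convex quadratic optimization problem (2), we introduce the Lagrange multipliers $\alpha_i$, $\alpha_i^{*}$. The problem can then be transformed into the following dual problem [25]:

$$\max_{\alpha,\,\alpha^{*}}\ -\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}(\alpha_i-\alpha_i^{*})(\alpha_j-\alpha_j^{*})(x_i\cdot x_j) - \varepsilon\sum_{i=1}^{l}(\alpha_i+\alpha_i^{*}) + \sum_{i=1}^{l}y_i(\alpha_i-\alpha_i^{*}) \tag{11}$$
$$\text{s.t.}\ \ \sum_{i=1}^{l}(\alpha_i-\alpha_i^{*}) = 0,\quad 0 \le \alpha_i,\ \alpha_i^{*} \le C,\quad i = 1, \dots, l.$$

Solving this dual problem yields the solution $\alpha = (\alpha_1, \alpha_1^{*}, \alpha_2, \alpha_2^{*}, \dots, \alpha_l, \alpha_l^{*})^{T}$ of the original problem (2). Replacing the inner product $(x_i \cdot x_j)$ in the objective function (11) with the mixed kernel function $K_{mix}(x_i, x_j)$, we obtain the decision function

$$f(x) = \sum_{i=1}^{l}(\alpha_i-\alpha_i^{*})K_{mix}(x_i, x) + b, \tag{12}$$

where $b$ is calculated as follows. Select some $\alpha_j$ or $\alpha_k^{*}$ lying in the open interval $(0, C)$. If $\alpha_j$ is selected, then

$$b = y_j - \sum_{i=1}^{l}(\alpha_i-\alpha_i^{*})K_{mix}(x_i, x_j) - \varepsilon; \tag{13}$$

if $\alpha_k^{*}$ is selected, then

$$b = y_k - \sum_{i=1}^{l}(\alpha_i-\alpha_i^{*})K_{mix}(x_i, x_k) + \varepsilon. \tag{14}$$

For $k$-fold cross-validation, the sample data set $D = (X, Y)$ is divided into $k$ groups $D_i = (X_i, Y_i)$, where $i \in \{1, 2, \dots, k\}$, $X_1 \cup X_2 \cup \dots \cup X_k = X$, and $Y_1 \cup Y_2 \cup \dots \cup Y_k = Y$. In each iteration, one group $D_i$ is chosen as the prediction set and the remaining $k-1$ groups form the training set. Given the initial parameter $\theta_0$, we use LIBSVM [26] to train the support vector regression. Denoting the training result by $\hat{\alpha}$ and $\hat{b}$, the decision function is

$$\hat{f}(x) = \sum_{j}(\hat{\alpha}_j-\hat{\alpha}_j^{*})K_{mix}(x_j, x) + \hat{b}, \tag{16}$$

where the sum runs over the training samples and $\hat{\alpha} = (\hat{\alpha}_1, \hat{\alpha}_1^{*}, \hat{\alpha}_2, \hat{\alpha}_2^{*}, \dots, \hat{\alpha}_l, \hat{\alpha}_l^{*})^{T}$. Substituting $X_i$ into (16) gives the prediction output of $X_i$:

$$\tilde{Y}_i = \hat{f}(X_i). \tag{17}$$

Each $D_i$, $i \in \{1, 2, \dots, k\}$, is chosen in turn as the prediction group, with the remaining groups $D_1, \dots, D_{i-1}, D_{i+1}, \dots, D_k$ as the training groups. After $k$-fold cross-validation, every sample in $D$ has exactly one prediction output. Therefore, for a parameter vector $\theta$, we can define the prediction output function

$$\tilde{y} = h(\theta) = (\tilde{y}_1, \tilde{y}_2, \dots, \tilde{y}_l)^{T}. \tag{18}$$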
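The prediction output function of the k-fold procedure can be sketched as follows. This is a minimal Python sketch under explicit assumptions: the mixed kernel is taken as ρ1·RBF + (1 − ρ1)·polynomial, θ is simplified to (ρ1, γ, q, C), and kernel ridge regression with ridge term 1/C stands in for LIBSVM's ε-SVR training, which is not reproduced here. `cv_prediction_output` and `mixed_kernel` are hypothetical names. What the sketch does show is the structure of the procedure: every sample is predicted exactly once by a model trained on the other k − 1 groups, so the function maps a parameter vector to a single output vector ỹ.

```python
import numpy as np


def mixed_kernel(A, B, rho1, gamma, q):
    """Hypothetical mixed kernel matrix: rho1 * RBF + (1 - rho1) * polynomial."""
    d2 = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    rbf = np.exp(-gamma * np.maximum(d2, 0.0))
    poly = (A @ B.T + 1.0) ** q
    return rho1 * rbf + (1.0 - rho1) * poly


def cv_prediction_output(theta, X, y, k=5, seed=0):
    """Sketch of h(theta): k-fold cross-validated predictions, one per sample.

    theta = (rho1, gamma, q, C). Kernel ridge regression with ridge term 1/C is
    a stand-in for the LIBSVM epsilon-SVR training step."""
    rho1, gamma, q, C = theta
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    y_tilde = np.full(n, np.nan)
    for test_idx in folds:                         # each group is predicted once
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        K = mixed_kernel(X[train_idx], X[train_idx], rho1, gamma, q)
        coef = np.linalg.solve(K + np.eye(train_idx.size) / C, y[train_idx])
        K_test = mixed_kernel(X[test_idx], X[train_idx], rho1, gamma, q)
        y_tilde[test_idx] = K_test @ coef
    return y_tilde


rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(40, 2))
y = np.sin(2.0 * X[:, 0]) + 0.5 * X[:, 1]
y_tilde = cv_prediction_output((0.8, 1.0, 2, 100.0), X, y, k=5)
print("CV RMSE:", np.sqrt(np.mean((y - y_tilde) ** 2)))
```

In the method described here, this vector plays the role of the nonlinear observation function of the parameter filter. The polynomial degree q is kept an integer in this sketch, since a non-integer power of a negative base is undefined.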

The Establishment of Parameter Filter Model.
In this paper, the weighted fusion coefficients ρ1 and ρ2 of the kernel functions, the parameters of the local kernel function, the parameters of the global kernel function, and the penalty parameter C are combined into the support vector regression parameter vector θ. For convenience, the kernel parameters of the local kernel function and those of the global kernel function are kept as separate components of θ. Selecting the whole parameter vector can then be treated as a filtering estimation problem for a nonlinear dynamic system, established as follows:

$$\theta(k+1) = \theta(k) + w(k), \tag{19}$$
$$Y(k) = h(\theta(k)) + v(k), \tag{20}$$

where $\theta(k)$ is the $n$-dimensional parameter state vector, $Y(k)$ is the output observation, and $h(\cdot)$ is the cross-validation prediction output function defined above. The process noise $w(k)$ and the observation noise $v(k)$ are zero-mean Gaussian white-noise sequences with known covariances $Q$ and $R$.
Because the optimal kernel parameters for a specific practical object can be regarded as fixed constants, the state equation (19) with respect to the parameters is linear. For any state vector, every original sample obtains a predictive output after training and prediction with LIBSVM, so the nonlinear observation equation (20) can be established. To run the 5th-degree cubature Kalman filtering algorithm, artificial process noise and observation noise must be added to the system model.

Parameter Selection Method for Support Vector Regression Based on Adaptive Fusion of Mixed Kernel Function.
In this section, the method for selecting the model parameters of support vector regression and the specific steps of the proposed algorithm are described. The design of the parameter adjustment system is shown in Figure 2.

Algorithm 1: The detailed algorithm steps of the parameter selection method for support vector regression based on adaptive fusion of mixed kernel function.
The parameter selection method for support vector regression based on adaptive fusion of mixed kernel function
Initialization: (1) For the original data set D, select the mixed kernel function, and set the initial parameter state value θ0.
End while
End
First, the k-fold cross-validation method is used to divide the original data set into k groups. The local and global kernel functions are selected to determine the mixed kernel function, and the data set is trained with k LIBSVM sub-models based on the mixed kernel function. The resulting predictive output is then fed into the 5th-degree cubature Kalman filter. All parameters of the model are used as the state vector of the system, so the selection of model parameters becomes a nonlinear state estimation problem.
In the parameter system model (19) and (20), the real value of the observation vector Y(k) is the same in every iteration: it is the target value vector of the original sample data, Y(k) = (y1, y2, ..., yl)^T. The parameter state vector θ is estimated optimally from the real observation values and the predicted output values so that the error variance between the real values Y(k) and the predicted outputs ỹ is minimized; the 5th-degree cubature Kalman filter is used for this estimation. As shown in Algorithm 1, the method consists of two processes: the time update and the measurement update. In summary, the proposed algorithm combines the fusion coefficients of the mixed kernel function, the kernel parameters, and the penalty parameter C into the state parameter vector, obtains the predicted output of the data set by k-fold cross-validation with LIBSVM, and computes the optimal parameter state vector iteratively with the 5th-degree cubature Kalman filtering algorithm. Its goal is to search recursively for the optimal state vector θ that minimizes the error variance between the real target values Y(k) and the support vector regression output ỹ.
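A minimal Python sketch of the filtering loop follows. It assumes the random-walk parameter model θ(k+1) = θ(k) + w(k) with observation Y = h(θ) + v, where the same target vector Y is used at every iteration. To keep the example fast and self-contained, a hypothetical two-parameter exponential model replaces the LIBSVM cross-validation output, and the noise covariances Q and R are arbitrary illustrative values; the function names are hypothetical.

```python
import itertools
import numpy as np


def fifth_degree_cubature(n):
    """Unit points (rows) and weights of the fifth-degree rule for N(0, I_n)."""
    r, eye = np.sqrt(n + 2.0), np.eye(n)
    pts, wts = [np.zeros(n)], [2.0 / (n + 2.0)]
    for j in range(n):
        pts += [r * eye[j], -r * eye[j]]
        wts += [(4.0 - n) / (2.0 * (n + 2.0) ** 2)] * 2
    for k, l in itertools.combinations(range(n), 2):
        for s in (eye[k] + eye[l], eye[k] - eye[l]):
            pts += [r * s / np.sqrt(2.0), -r * s / np.sqrt(2.0)]
            wts += [1.0 / (n + 2.0) ** 2] * 2
    return np.array(pts), np.array(wts)


def ckf5_step(theta, P, h, Y, Q, R):
    """One time update + measurement update for the random-walk parameter
    model theta(k+1) = theta(k) + w(k), Y = h(theta) + v."""
    xi, w = fifth_degree_cubature(theta.size)
    P_pred = P + Q                                   # time update (identity transition)
    pts = theta + xi @ np.linalg.cholesky(P_pred).T  # cubature points around theta
    Z = np.array([h(p) for p in pts])                # propagate through h
    z_hat = w @ Z                                    # predicted observation
    dX, dZ = pts - theta, Z - z_hat
    Pzz = dZ.T @ (w[:, None] * dZ) + R               # innovation covariance
    Pxz = dX.T @ (w[:, None] * dZ)                   # cross covariance
    K = np.linalg.solve(Pzz, Pxz.T).T                # gain K = Pxz Pzz^{-1}
    theta_new = theta + K @ (Y - z_hat)              # measurement update
    P_new = P_pred - K @ Pzz @ K.T
    return theta_new, 0.5 * (P_new + P_new.T)        # keep P symmetric


t = np.linspace(0.0, 1.0, 10)


def h(theta):
    """Hypothetical toy observation model standing in for the CV output."""
    return theta[0] * np.exp(-theta[1] * t)


theta_true = np.array([2.0, 1.5])
Y = h(theta_true)                                    # fixed target vector
theta, P = np.array([1.5, 1.0]), 0.25 * np.eye(2)
Q, R = 1e-4 * np.eye(2), 1e-3 * np.eye(t.size)       # artificial noise levels
for _ in range(50):
    theta, P = ckf5_step(theta, P, h, Y, Q, R)
print(theta)
```

The sketch does not enforce constraints such as 0 ≤ ρ1 ≤ 1 or C > 0, and the text does not state how such constraints are handled, so a practical implementation would need its own treatment. With n = 2, all cubature weights are positive, so the updated covariance stays positive definite.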

Algorithm Analysis
In support vector regression based on a local kernel function, the value of the kernel function is influenced mainly by data points that are close to each other, whereas in support vector regression based on a global kernel function, data points that are far from each other also affect the value of the kernel function. Using only a global or only a local kernel function therefore has limitations in practical problems: it often cannot accurately describe the characteristics of the actual samples, which degrades the performance of the support vector regression. The mixed kernel function contains both types, a local kernel function and a global kernel function, and can describe the properties of actual samples considerably better. Support vector regression based on the mixed kernel function therefore has both good learning ability and good generalization ability. However, choosing the fusion coefficients of the mixed kernel function remains a difficult problem. Expert prior knowledge and simple cross-validation are commonly used, but these methods cannot guarantee a high-performance support vector regression.
In this paper, the kernel function is constructed from a combination of parameters: the fusion coefficients, the parameters of the local kernel function, the parameters of the global kernel function, and the penalty parameter C together form the parameters of the support vector regression. By adjusting the weighted fusion coefficients of the local and global kernel functions, the mixed kernel function can describe the actual samples according to their specific characteristics. Because the 5th-degree cubature Kalman filter applies a higher-degree spherical-radial cubature rule, it estimates parameters more accurately than the UKF and the 3rd-degree cubature Kalman filter, which in turn strengthens the extrapolation ability of the mixed-kernel support vector regression. Therefore, the estimated parameter state vector is more accurate, better parameters are supplied to the support vector regression, and its predictive ability improves. The proposed parameter adjustment algorithm can also be understood in another way: all components of the state parameter vector, namely the fusion coefficient, the parameters of the local kernel function, the parameters of the global kernel function, and the penalty parameter C, can be regarded as generalized kernel parameters of the mixed kernel function. The high-precision 5th-degree cubature Kalman filter estimates this parameter vector and yields the optimal kernel parameter values of the support vector regression.

Simulation Example
6.1. Subjects. We used the rolling bearing experimental data of the Electrical Engineering Laboratory of Case Western Reserve University for analysis and verification [27]. The tested rolling bearing is an SKF6205. Single-point faults were introduced by spark erosion on the outer ring, the inner ring, and the rolling element of the drive-end bearing. The fault depth is 0.2794 mm and the fault diameter is 0.1778 mm. The bearing has 9 balls. The bearing operates in four states: normal, inner ring fault, outer ring fault, and rolling element fault. Vibration signals were measured with acceleration sensors using conventional signal acquisition at a sampling frequency of 12 kHz. The data obtained are shown in Figure 3.

Feature Extraction.
The characteristic parameters used to represent the fault state (12 in total) are listed in Table 2. We take 50 groups of data directly from the drive-end vibration data, each consisting of 4096 sample points, and calculate the characteristic parameters of each group. For the proposed method, the parameter vector θ contains the fusion coefficients ρ1 and ρ2, the kernel parameters, and the penalty parameter C, while for a single RBF kernel the parameter vector contains only C and the RBF kernel parameter. To illustrate the effectiveness of the proposed algorithm, it is compared with support vector regression with a single RBF kernel tuned by a genetic algorithm (RBF-GA-SVR), with a single RBF kernel tuned by the UKF (RBF-UKF-SVR), with a single RBF kernel tuned by the CKF (RBF-CKF-SVR), with the mixed kernel function tuned by the UKF (MKF-UKF-SVR), and with the mixed kernel function tuned by the CKF (MKF-CKF-SVR). The prediction results for the four states are shown in Figures 4-7.
For convenience of description, we adopt the following simplified definitions for the filtering algorithms:

To show the robustness of the proposed method under typical noise levels, we ran experiments with three noise levels: 0.1, 0.3, and 0.5. The prediction results of these algorithms are shown in Figures 8-10. The kernel parameter estimation results and the prediction error results are shown in Tables 3 and 4, respectively.
The simulation results in Figures 4-7 and Tables 3 and 4 show that the kernel parameter estimated by Algo5 is larger, so Algo5 has poor generalization ability and a larger prediction error. Algo4 achieves higher prediction accuracy than Algo5, mainly because it estimates the kernel parameters within a filtering framework. Because the 5th-degree cubature Kalman filter estimates the parameters with high accuracy, the kernel parameter values of Algo3 are smaller, and its predictive ability is better than that of Algo5 and Algo4. However, its accuracy is lower than that of Algo2, which benefits from the mixed kernel. The proposed Algo1 characterizes the sample information using both local and global kernel function information; it has stronger generalization ability and the smallest prediction error. As for the fusion coefficients, the coefficient of the local RBF kernel function is 0.902, mainly because the actual samples favor the local kernel function. Even so, the prediction accuracy of Algo1 is greatly improved compared with Algo3. This reflects the key role of the global kernel function in the mixed kernel function, which allows the actual sample information to be described more completely and accurately.
Figures 8, 9, and 10 show that the kernel parameter estimation error increases with the noise level. This is expected, since larger noise degrades the accuracy of any nonlinear filter. Nevertheless, the estimation errors remain within the allowable range.

Conclusions
The proposed parameter selection method for support vector regression based on adaptive fusion of the mixed kernel function combines the fusion coefficients of the mixed kernel function, the kernel function parameters, and the regression parameters into the state vector, and obtains the predicted output of the original data set with LIBSVM. The fusion coefficients are adjusted adaptively by the 5th-degree cubature Kalman filter, and the kernel function parameters and regression parameters are selected from the estimated parameter values. The bearing fault diagnosis experiment shows that the kernel function and parameters selected by the proposed method give the support vector regression stronger generalization ability and higher prediction accuracy.

Figure 1: Schematic diagram of support vector regression.

Figure 2: Parameter adjustment structure of support vector regression based on adaptive fusion of the mixed kernel function.
4.1. Mixed Kernel Function. Based on the above analysis, we fuse the local and global kernel functions so that the mixed kernel function has both strong learning ability and strong generalization ability. In this section, we propose a mixed kernel function based on adaptive fusion. Here, adaptive means that the weight of each kernel function in the mixed kernel function is estimated by a filter rather than determined from past experience. Denote the local and global kernel functions by $K_{l}(x_i, x_j)$ and $K_{g}(x_i, x_j)$, respectively. The mixed kernel function can then be expressed as

$$K_{mix}(x_i, x_j) = \rho_1 K_{l}(x_i, x_j) + \rho_2 K_{g}(x_i, x_j). \tag{8}$$

Table 3: Results of parameter estimation.

Table 4: Results of sample prediction error. Here, MAE represents mean absolute error, RMSE represents root mean square error, and SD represents standard deviation.
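The error measures reported in Table 4 can be computed as below. This minimal Python sketch assumes the errors are e_i = ỹ_i − y_i and that SD is the population standard deviation of these errors; the paper does not say which convention it uses, and `prediction_errors` is a hypothetical name.

```python
import math


def prediction_errors(y_true, y_pred):
    """Return (MAE, RMSE, SD) of the errors e_i = y_pred_i - y_true_i.

    SD is taken here as the population standard deviation of the errors."""
    e = [p - t for t, p in zip(y_true, y_pred)]
    n = len(e)
    mae = sum(abs(x) for x in e) / n                  # mean absolute error
    rmse = math.sqrt(sum(x * x for x in e) / n)       # root mean square error
    mean_e = sum(e) / n
    sd = math.sqrt(sum((x - mean_e) ** 2 for x in e) / n)
    return mae, rmse, sd


print(prediction_errors([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]))
```

When the errors have zero mean, as in this example, SD coincides with RMSE; a constant bias raises MAE and RMSE but leaves SD unchanged.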