JCSE

Journal of Control Science and Engineering

1687-5257 1687-5249

Hindawi

10.1155/2017/3614790

3614790

Research Article

Parameter Selection Method for Support Vector Regression Based on Adaptive Fusion of the Mixed Kernel Function

Wang

Hailun

¹ ²

http://orcid.org/0000-0002-9699-7006

Daxing

¹ ³ Yao

Yuan

College of Electrical and Information Engineering

Quzhou University

Quzhou 324000

China

qzu.zj.cn

Logistics Engineering College

Shanghai Maritime University

Shanghai 200000

China

shmtu.edu.cn

Department of Automation

Zhejiang University of Technology

Hangzhou 310023

China

zjut.edu.cn

Received 30 June 2017; Revised 14 September 2017; Accepted 8 October 2017; Published 2 November 2017

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Support vector regression is widely used in the fault diagnosis of rolling bearings. This paper proposes a new model parameter selection method for support vector regression based on adaptive fusion of the mixed kernel function. The mixed kernel function is chosen as the kernel function of support vector regression. The fusion coefficients of the mixed kernel function, the kernel function parameters, and the regression parameters are combined into a single state vector, so that the model selection problem is transformed into a nonlinear system state estimation problem. A 5th-degree cubature Kalman filter is used to estimate the parameters. In this way, the weighting coefficients of the mixed kernel function, the kernel parameters, and the regression parameters are selected adaptively. Compared with support vector regression algorithms based on a single kernel function, the unscented Kalman filter (UKF), and genetic algorithms, the decision regression function obtained by the proposed method has better generalization ability and higher prediction accuracy.

National Natural Science Foundation of China

61403229

Natural Science Foundation of Zhejiang Province

LY13F030011

1. Introduction

The core components and key mechanical structures of mechanical equipment inevitably suffer varying degrees of failure under complex operating conditions and harsh working environments. Equipment failure may cause huge economic losses and casualties. Rolling bearings are widely used in rotating machinery, and their running state directly affects the accuracy, reliability, and life of the machine. Timely and accurate diagnosis of rolling bearing faults helps to improve the reliability of equipment and reduce the probability of accidents. Because the relationship between faults and their characteristics is highly nonlinear, fault diagnosis methods based on machine learning have been applied more and more widely in the field of automation in recent years [1–3]. Vapnik proposed the support vector regression (SVR) method, which offers superior performance [4]. Improving the SVR algorithm can therefore greatly improve the accuracy of fault diagnosis.

The key to SVR is the selection of the kernel function and its parameters. Several methods have been used to optimize the kernel and select the regression parameters, such as cross validation learning [5, 6], gradient descent learning [7, 8], evolutionary learning [9, 10], and positive semidefinite programming learning [11, 12]. Studies on model selection and kernel parameter selection for support vector regression are relatively few, and they primarily use the grid search cross validation method and evolutionary methods. However, the efficiency of these methods is very low because they search exhaustively for the optimal parameters [13]. When there are more than two parameters, methods such as the genetic algorithm [14] and the particle swarm optimization algorithm [15] become almost impractical. More seriously, evolutionary algorithms easily fall into local optima; that is, they obtain only a suboptimal solution rather than the optimal solution. Reference [16] provides another way to estimate kernel parameters by treating the adjustment of the multiple parameters of LS-SVM as the parameter identification problem of a nonlinear dynamic system. By exploiting the smoothness of the system model, the kernel parameters and the regression parameters are automatically adjusted by the extended Kalman filter (EKF). Reference [17] puts forward a support vector regression model selection method based on the unscented Kalman filter (UKF-SVR) to handle the problem that the loss function is nondifferentiable with respect to the hyperparameters. However, the accuracy of the UKF makes it difficult to meet the needs of practical applications, and when the system dimension is high, the performance of the UKF degrades significantly, leading to a curse of dimensionality. In 2009, Arasaratnam and Haykin proposed the cubature Kalman filter (CKF), which uses spherical-radial cubature rules to determine the sigma points and weights, improving the accuracy and stability of high dimensional nonlinear state estimation. However, the CKF is based on the 3rd-degree spherical-radial rule, and its filtering accuracy is still limited [18]. Recently, [19] proposed a class of high-degree spherical-radial cubature Kalman filters and showed that the 5th-degree cubature Kalman filter (5th-degree CKF) can obtain high accuracy and high stability with low computational cost.

Although the above methods improve the algorithm for specific problems, they are all based on support vector regression with a single kernel function. Kernel functions can be divided into local kernel functions and global kernel functions. Local kernel functions have stronger learning ability, whereas global kernel functions have stronger extrapolation ability. The selection of the kernel function significantly affects the generalization ability of support vector regression, and using only one kernel function often has limitations. A linear fusion of a local kernel function and a global kernel function constitutes a new kernel function, the mixed kernel function, which combines the advantages of local and global kernel functions and can accurately reflect the actual characteristics of a sample. Using a mixed (hybrid) kernel function requires determining the fusion coefficients of the local kernel and the global kernel. An appropriate fusion coefficient allows the advantages of the mixed kernel function to be fully exploited. At present, the combination weights are often determined by experience [20, 21]. Reference [20], in a study on pulmonary nodules, indicates that the initial mixed kernel function coefficients can be selected based on gray features, morphological features, and texture features. However, support vector regression based on a mixed kernel function constructed in this way cannot be guaranteed to achieve the best performance.

For this reason, this paper uses the mixed kernel function as the kernel function of the SVR and the 5th-degree CKF as the basic framework to adaptively adjust the fusion coefficients, the kernel parameters, and the regression parameters of the mixed kernel function. The remainder of this article is organized as follows: the first part describes the problem; the second part reviews the typical kernel functions; the third part presents the parameter selection method for support vector regression based on the adaptive fusion of the mixed kernel function; the fourth part analyzes the algorithm; the fifth part gives a simulation example; and the sixth part is the summary.

2. Problem Description

2.1. Support Vector Regression

The ultimate goal of support vector regression is to find a regression function $f: \mathbb{R}^{D} \rightarrow \mathbb{R}$:

$$y = f(x) = w^{T}\varphi(x) + b, \tag{1}$$

where $\varphi(x)$ maps the data $x$ from the low dimensional input space to a high dimensional feature space, $w$ is the weight vector, and $b$ is a scalar offset. Standard support vector regression adopts the ε-insensitive loss function; that is, it is assumed that all the training data can be fitted by a linear function with accuracy $\varepsilon$. The problem is then formulated as the following minimization problem [22]:

$$\begin{aligned} \min_{w \in \mathbb{R}^{n},\, \xi,\, \xi^{*},\, b \in \mathbb{R}} \quad & \frac{1}{2}\|w\|^{2} + \frac{C}{2}\sum_{i=1}^{l}\left(\xi_{i}^{2} + \xi_{i}^{*2}\right) \\ \text{s.t.} \quad & w^{T}\varphi(x_{i}) + b - y_{i} \le \varepsilon + \xi_{i}, \quad i = 1,2,\ldots,l, \\ & y_{i} - w^{T}\varphi(x_{i}) - b \le \varepsilon + \xi_{i}^{*}, \quad i = 1,2,\ldots,l, \\ & \xi_{i}, \xi_{i}^{*} \ge 0, \quad i = 1,2,\ldots,l, \end{aligned} \tag{2}$$

where $\xi_{i}, \xi_{i}^{*}$ are the slack variables. When a sample is fitted with an error larger than $\varepsilon$, the corresponding slack variable is greater than 0; otherwise, $\xi_{i}$ and $\xi_{i}^{*}$ are both equal to 0. The first term of the objective function makes the fitting function smoother and thereby improves generalization. The second term reduces the fitting error; the constant $C > 0$ determines the degree of penalty for samples whose error exceeds $\varepsilon$.
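To make the roles of the slack variables and the penalty constant concrete, the following minimal sketch evaluates the objective of (2) for a given linear model. It assumes the feature map is the identity (so f(x) = w·x + b), and the function name `svr_primal_objective` and the sample data are hypothetical, chosen only for illustration.

```python
def svr_primal_objective(w, b, xs, ys, C, eps):
    """Evaluate the objective of (2) for f(x) = w.x + b (identity feature map assumed)."""
    total_slack_sq = 0.0
    for x, y in zip(xs, ys):
        f = sum(wi * xi for wi, xi in zip(w, x)) + b
        # Smallest slacks that satisfy the two inequality constraints of (2).
        slack = max(0.0, f - y - eps)        # xi_i: prediction above the eps-tube
        slack_star = max(0.0, y - f - eps)   # xi_i^*: prediction below the eps-tube
        total_slack_sq += slack ** 2 + slack_star ** 2
    regularizer = 0.5 * sum(wi * wi for wi in w)
    return regularizer + 0.5 * C * total_slack_sq


# Hypothetical one-dimensional data: the third sample lies outside the tube.
xs = [[0.0], [1.0], [2.0]]
ys = [0.0, 1.0, 3.0]
print(svr_primal_objective([1.0], 0.0, xs, ys, C=10.0, eps=0.1))
```

Only the third sample contributes a nonzero slack here, so increasing C raises the objective through that sample alone, while samples inside the ε-tube are not penalized.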

The performance of support vector regression is affected by the error penalty parameter C, which sets the degree of punishment applied to samples whose fitting error exceeds ε. C is a tradeoff between the complexity of the model and the empirical error on the training samples. When the value of C is small, the punishment for the empirical error on the original data is small; the complexity of the learning machine is low, but the empirical risk is high. When the value of C is large, the empirical error penalty is large and the empirical risk is small. However, this can lead to high computational complexity and poor generalization ability. Therefore, it is very important to choose an appropriate penalty coefficient C for practical problems.

The structure of support vector regression is shown in Figure 1. Another key factor that affects the performance of support vector regression is the kernel function and its parameters. The core of the support vector regression algorithm is the introduction of the kernel function, which involves two aspects: the construction of the kernel function and the selection of the kernel function model. In fact, an appropriate choice of the model is the key to improving the performance of support vector regression. Model selection determines, before training, the kernel function that best suits the characteristics of the original sample data. It involves two steps: first, determine the type of kernel function, and then select the relevant parameters of the kernel function. Current research focuses on the choice of the kernel function model. However, because different samples may have different characteristics, the construction of the kernel function is more important than the choice among existing kernel functions. Constructing a good kernel function is still a challenging problem.

Figure 1

Schematic diagram of support vector regression.

High performance is difficult to obtain from support vector regression with a single kernel function. The characteristics of actual samples are complicated and changeable and cannot be completely characterized by either a local kernel function or a global kernel function alone. The mixed kernel function combines the global kernel function and the local kernel function in a ratio chosen so that the combination accurately reflects the characteristics of the actual sample. Therefore, the mixed kernel function has both good learning ability and good generalization ability.

2.2. 5th-Degree CKF Principle

Different nonlinear filters have different performance characteristics, and the estimation performance depends on the specific type of nonlinear filter. The 5th-degree cubature Kalman filtering algorithm can obtain high accuracy and high stability with low computational cost; thus, it is selected to adaptively adjust the fusion coefficients, kernel parameters, and regression parameters. The general nonlinear system is given as follows:

$$\begin{aligned} x_{k} &= f(x_{k-1}) + w_{k}, \\ z_{k} &= h(x_{k}) + v_{k}, \end{aligned} \tag{3}$$

where $x_{k}$ is the $n$-dimensional state vector, $z_{k}$ is the $m$-dimensional observation vector, and $f$ and $h$ are known nonlinear functions. The process noise $w_{k}$ and the observation noise $v_{k}$ are mutually independent zero mean Gaussian white noise sequences.

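Similar to the CKF, the 5th-degree CKF consists of two steps: state prediction (time update) and measurement update. The core difference is that the high-degree cubature Kalman filter uses a higher-degree spherical cubature rule and the corresponding weight coefficients to overcome the accuracy limitation described in the Introduction. The high-degree spherical cubature rule satisfies

$$I_{U_{n}}(g_{s}) = \bar{w}_{s1}\sum_{j=1}^{n(n-1)/2}\left[g_{s}(s_{j}^{+}) + g_{s}(-s_{j}^{+}) + g_{s}(s_{j}^{-}) + g_{s}(-s_{j}^{-})\right] + \bar{w}_{s2}\sum_{j=1}^{n}\left[g_{s}(e_{j}) + g_{s}(-e_{j})\right], \tag{4}$$

where $e_{j}$ is the $j$th column of the $n$-dimensional identity matrix. The point sets $s_{j}^{+}$ and $s_{j}^{-}$ are defined as follows:

$$\begin{aligned} \{s_{j}^{+}\} &= \left\{\sqrt{\tfrac{1}{2}}\,(e_{k} + e_{l}) : k < l,\ k, l = 1,2,\ldots,n\right\}, \\ \{s_{j}^{-}\} &= \left\{\sqrt{\tfrac{1}{2}}\,(e_{k} - e_{l}) : k < l,\ k, l = 1,2,\ldots,n\right\}. \end{aligned} \tag{5}$$

The weights $\bar{w}_{s1}$ and $\bar{w}_{s2}$ are

$$\bar{w}_{s1} = \frac{A_{n}}{n(n+2)}, \qquad \bar{w}_{s2} = \frac{(4-n)A_{n}}{2n(n+2)}, \tag{6}$$

where $\Gamma(z) = \int_{0}^{\infty} \exp(-\lambda)\lambda^{z-1}\,d\lambda$ and $A_{n} = 2\sqrt{\pi^{n}}/\Gamma(n/2)$ is the surface area of the unit sphere. According to the moment matching method, the weights of the two-point radial rule are

$$w_{1} = \frac{\Gamma(n/2)}{n+2}, \qquad w_{2} = \frac{n\Gamma(n/2)}{2(n+2)}. \tag{7}$$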
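As an illustration of how the weights in (6) and (7) combine, the sketch below builds the resulting point set for a standard Gaussian N(0, I) and prints the number of points and the weight sum. It rests on assumptions not stated in the text: the two radial nodes are taken as r₁ = 0 and r₂ = √(n/2 + 1) (the values that moment matching yields for the weights in (7)), and the points are scaled by √2 to pass from the weight exp(−xᵀx) to the Gaussian density. The function name `fifth_degree_cubature` is hypothetical.

```python
import math
from itertools import combinations


def fifth_degree_cubature(n):
    """Points and weights approximating E[g(x)] for x ~ N(0, I_n)."""
    gamma_half = math.gamma(n / 2.0)
    area = 2.0 * math.pi ** (n / 2.0) / gamma_half     # A_n, unit-sphere surface area
    ws1 = area / (n * (n + 2))                          # spherical weight, eq. (6)
    ws2 = (4 - n) * area / (2.0 * n * (n + 2))          # spherical weight, eq. (6)
    wr1 = gamma_half / (n + 2)                          # radial weight, eq. (7)
    wr2 = n * gamma_half / (2.0 * (n + 2))              # radial weight, eq. (7)
    radius = math.sqrt(n + 2.0)    # sqrt(2) * r_2, with r_2 = sqrt(n/2 + 1) (assumed)
    norm = math.pi ** (-n / 2.0)   # Gaussian normalization after the sqrt(2) scaling

    # At r_1 = 0 all spherical points coincide, so their weights add up to A_n.
    points = [[0.0] * n]
    weights = [norm * wr1 * area]
    for j in range(n):                                  # points +/- e_j
        for sign in (1.0, -1.0):
            p = [0.0] * n
            p[j] = sign * radius
            points.append(p)
            weights.append(norm * wr2 * ws2)
    h = radius * math.sqrt(0.5)
    for k, l in combinations(range(n), 2):              # points +/- s_j^+ and +/- s_j^-
        for sk, sl in ((1, 1), (-1, -1), (1, -1), (-1, 1)):
            p = [0.0] * n
            p[k], p[l] = sk * h, sl * h
            points.append(p)
            weights.append(norm * wr2 * ws1)
    return points, weights


pts, wts = fifth_degree_cubature(3)
print(len(pts), sum(wts))
```

A quick sanity check of such a rule is that the weights sum to one and that the second and fourth moments of a coordinate match those of a standard Gaussian (1 and 3, respectively).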

This paper first chooses the mixed kernel function as the kernel function of the support vector regression. Constructing this kernel function amounts to selecting the fusion coefficients of the local kernel function and the global kernel function, the kernel parameters, and the penalty parameter C. The fusion coefficients are then embedded, together with the kernel hyperparameters, into the state vector so that the construction of the kernel function and the selection of its parameters are transformed into a nonlinear filtering problem that can be solved by the 5th-degree CKF. Finally, the fusion coefficients are adjusted adaptively and the kernel parameters and the penalty parameter are estimated.

3. Review of Classic Kernel Functions

The key to support vector regression is the introduction of the kernel function. A data set in a low dimensional space is usually difficult to separate; when the data set is mapped to a high dimensional space, the new data set is more easily separated, but the computational effort of working in that space is huge. The kernel function avoids explicit computation in the high dimensional feature space after the transformation and thus avoids the “curse of dimensionality” problem. The kernel function is denoted by $K(x_{i}, x_{j})$, where $x_{i}, x_{j}$ are the sample data. Four types of kernel functions are widely used in the research and application of support vector regression [21], as shown in Table 1.

Table 1

Four types of kernel functions and their characteristics.

Kernel function	Characteristics
Linear kernel function: $K(x_{i}, x_{j}) = x_{i} \cdot x_{j}$.	It is a special case of the kernel function. It has few parameters and is fast [28].
Polynomial kernel function: $K(x_{i}, x_{j}) = (x_{i} \cdot x_{j} + c)^{q}$, where $c$ and $q$ are the kernel parameters and satisfy $c \ge 0$, $q \in \mathbb{N}$.	It is a global kernel function, and it becomes a linear kernel function when $q = 1$. The greater the value of $q$, the higher the dimension of the mapping and the greater the amount of computation. When $q$ is too large, the complexity of the learning machine increases, the generalization ability of the support vector regression is reduced, and overfitting easily occurs [29].
Gauss kernel function (RBF kernel): $K(x_{i}, x_{j}) = \exp(-\|x_{i} - x_{j}\|^{2}/\sigma^{2})$, where $\sigma > 0$.	The RBF kernel function is a strongly local kernel function, and its extrapolation ability decreases as the parameter increases. Compared with the other kernel functions, the Gauss kernel function requires only one parameter to be determined, so constructing the kernel function model is relatively simple. Therefore, the RBF kernel function is currently the most widely used one [30].
Sigmoid kernel function: $K(x_{i}, x_{j}) = \tanh(\lambda (x_{i} \cdot x_{j}) + \varphi)$, where $\lambda$, $\varphi$ are the kernel parameters and satisfy $\lambda > 0$, $\varphi < 0$.	The theoretical basis of support vector regression guarantees that the solution is the global optimum rather than a local minimum. It also ensures good generalization to unknown samples, so that overlearning does not occur [31].
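The four kernel functions of Table 1 can be written down directly. The following is a minimal sketch in plain Python; the default parameter values are arbitrary placeholders rather than values recommended by the paper, and the helper `dot` is introduced only for this example.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def linear_kernel(xi, xj):
    return dot(xi, xj)


def polynomial_kernel(xi, xj, c=1.0, q=2):
    # Global kernel; requires c >= 0 and a positive integer q.
    return (dot(xi, xj) + c) ** q


def rbf_kernel(xi, xj, sigma=1.0):
    # Local kernel; the value decays with the squared distance between samples.
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / sigma ** 2)


def sigmoid_kernel(xi, xj, lam=0.5, phi=-1.0):
    # Table 1 requires lam > 0 and phi < 0.
    return math.tanh(lam * dot(xi, xj) + phi)


a, b = [1.0, 2.0], [3.0, 4.0]
for kernel in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(a, b))
```

Evaluating the RBF kernel for samples that are far apart gives a value close to zero, whereas the polynomial kernel keeps growing with the inner product, which is the local versus global distinction drawn in the text.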

Different kernel functions lead to different support vector regressions. The linear kernel function, the polynomial kernel function, and the Gauss kernel function have been widely used. The most widely used is the RBF kernel function, which has good learning ability and is applicable to low dimensional, high dimensional, small-sample, and large-sample problems alike. The RBF kernel function has a wide convergence region and is an ideal basis function for classification. The sigmoid kernel function, which originates from neural networks, is limited in practical application: only under specific conditions (when its parameters $\lambda$ and $\varphi$ satisfy certain conditions) does it meet the requirements of a symmetric positive definite kernel function. The sigmoid kernel function has been proven to have good global classification performance in neural network applications, but its classification performance in SVM needs further research [23].

The kernel function neatly avoids the curse of dimensionality that arises when low dimensional vectors are mapped into a high dimensional space, and it improves the ability of the learning machine to handle nonlinear problems. However, each kind of kernel function has its own characteristics, and support vector regressions based on different kernel functions have different generalization abilities. At present, kernel functions are divided into two categories: global kernel functions and local kernel functions. A local kernel function is effective in extracting the local characteristics of the sample. Its value is affected only by data points that are very close to each other, so its interpolation ability and therefore its learning ability are strong. The Gauss kernel function is a local kernel function. A global kernel function is effective in extracting the global characteristics of the samples. Its value is also affected by data points that are far from each other, so its generalization ability is strong [24]. Compared with the local kernel function, however, the learning ability of the global kernel function is weak. The linear kernel function, the polynomial kernel function, and the sigmoid kernel function are all global kernel functions. In short, the learning ability of a local kernel function is strong, and its generalization ability is weak; the generalization ability of a global kernel function is strong, but its learning ability is weak.

4. Parameter Selection Method for Support Vector Regression Based on Adaptive Fusion of Mixed Kernel Function

4.1. Mixed Kernel Function

Based on the above analysis, we fuse the local and global kernel functions so that the mixed kernel function has both strong learning ability and strong generalization ability. In this section, we propose a mixed kernel function based on adaptive fusion. Here, adaptive means that the weight of each kernel function in the mixed kernel function is estimated by a filter rather than determined according to past experience.

Theorem 1.

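Denote the local and global kernel functions by $K_{\mathrm{local}}(x_{i}, x_{j})$ and $K_{\mathrm{global}}(x_{i}, x_{j})$, respectively. Then, the mixed kernel function can be expressed by

$$K_{\mathrm{mix}}(x_{i}, x_{j}) = p_{1} \cdot K_{\mathrm{local}}(x_{i}, x_{j}) + p_{2} \cdot K_{\mathrm{global}}(x_{i}, x_{j}), \tag{8}$$

where $p_{1}$ and $p_{2}$ are the weights of the two kernel functions in the mixed kernel function, with $0 \le p_{1}, p_{2} \le 1$ and $p_{1} + p_{2} = 1$. The mixed kernel function is still a Mercer kernel.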

Proof.

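Since $K_{\mathrm{local}}(x_{i}, x_{j})$ and $K_{\mathrm{global}}(x_{i}, x_{j})$ are kernel functions, they both satisfy the Mercer condition [23]; that is, for any $\varphi(x) \ne 0$ with $\int \varphi^{2}(x)\,dx < \infty$, (9) is satisfied:

$$\begin{aligned} \iint K_{\mathrm{local}}(x, x')\varphi(x)\varphi(x')\,dx\,dx' &\ge 0, \\ \iint K_{\mathrm{global}}(x, x')\varphi(x)\varphi(x')\,dx\,dx' &\ge 0. \end{aligned} \tag{9}$$

Since $0 \le p_{1}, p_{2} \le 1$, it can be derived that

$$p_{1} \cdot \iint K_{\mathrm{local}}(x, x')\varphi(x)\varphi(x')\,dx\,dx' + p_{2} \cdot \iint K_{\mathrm{global}}(x, x')\varphi(x)\varphi(x')\,dx\,dx' \ge 0; \tag{10}$$

that is, $\iint K_{\mathrm{mix}}(x, x')\varphi(x)\varphi(x')\,dx\,dx' \ge 0$.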

It has already been proven that any function can be chosen as a kernel function as long as it satisfies the Mercer condition. Therefore, mixed kernel function (8) can be chosen as a kernel function since it satisfies the Mercer condition. The mixed kernel function is the convex combination of the local and global kernel functions. The introduction of the mixed kernel function eliminates the deficiencies in using a single global or local kernel function.
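The convex combination in (8) and the nonnegativity argument of the proof can be checked numerically on a finite sample, where the Mercer condition becomes the requirement that the quadratic form zᵀKz of the Gram matrix is nonnegative. The sketch below is only an illustration under stated assumptions: it uses an RBF kernel as the local kernel and a polynomial kernel as the global kernel (both are known Mercer kernels), and all names, parameter values, and random data are hypothetical.

```python
import math
import random


def rbf_kernel(xi, xj, sigma):
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / sigma ** 2)


def polynomial_kernel(xi, xj, c, q):
    return (sum(a * b for a, b in zip(xi, xj)) + c) ** q


def mixed_kernel(xi, xj, p1, sigma=1.0, c=1.0, q=2):
    """Convex combination (8) with p2 = 1 - p1 and 0 <= p1 <= 1."""
    p2 = 1.0 - p1
    return p1 * rbf_kernel(xi, xj, sigma) + p2 * polynomial_kernel(xi, xj, c, q)


def quadratic_form(kernel, xs, z):
    """z^T K z for the Gram matrix K[i][j] = kernel(xs[i], xs[j])."""
    n = len(xs)
    return sum(z[i] * z[j] * kernel(xs[i], xs[j]) for i in range(n) for j in range(n))


rng = random.Random(0)
samples = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(8)]
smallest = min(
    quadratic_form(lambda a, b: mixed_kernel(a, b, p1=0.6),
                   samples,
                   [rng.gauss(0, 1) for _ in samples])
    for _ in range(200)
)
print(smallest >= -1e-9)
```

With a sigmoid kernel as the global component, as used later in the simulation, the same check may fail for some parameter values, because the sigmoid kernel satisfies the Mercer condition only under specific conditions.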

When $p_{1} = 0$ or $p_{2} = 0$, the mixed kernel function reduces to a single kernel function. Model selection for support vector regression based on a single kernel function concerns only the internal parameters of that kernel function. Model selection for support vector regression based on the mixed kernel function, however, concerns not only the internal parameters of both the local and global kernel functions but also the fusion coefficients of the two kernel functions, which must be chosen so that the resulting support vector regression performs best. Before training the support vector regression, we need to determine the weighted fusion coefficients. The coefficients $p_{1}$ and $p_{2}$ in (8) are usually determined by past experience. If the resulting mixed kernel function does not describe the properties of the training samples well, the regression forecast performance will be degraded. Currently, there is no analytical method for selecting the fusion coefficients; they are usually selected according to experience, and it is difficult to estimate them online.

4.2. The Establishment of Parameter Filter Model

4.2.1. Mixed Kernel Function Based Support Vector Regression

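Differing from support vector regression based on a single kernel function, support vector regression based on a fused kernel function uses a kernel that contains both local and global kernel functions; that is, $\varphi(x)$ in (2) is the high dimensional feature map induced by the mixed kernel function. To solve the convex quadratic optimization problem defined by (2), we introduce the Lagrange multipliers $\alpha_{i}, \alpha_{i}^{*}$. The optimization problem can then be transformed into the following dual problem [25]:

$$\begin{aligned} \min_{\alpha,\, \alpha^{*}} \quad & \frac{1}{2}\sum_{i,j=1}^{l}(\alpha_{i}^{*} - \alpha_{i})(\alpha_{j}^{*} - \alpha_{j})(x_{i} \cdot x_{j}) + \varepsilon\sum_{i=1}^{l}(\alpha_{i}^{*} + \alpha_{i}) - \sum_{i=1}^{l} y_{i}(\alpha_{i}^{*} - \alpha_{i}) \\ \text{s.t.} \quad & \sum_{i=1}^{l}(\alpha_{i} - \alpha_{i}^{*}) = 0, \\ & 0 \le \alpha_{i}, \alpha_{i}^{*} \le C, \quad i = 1,2,\ldots,l. \end{aligned} \tag{11}$$

By solving the dual problem above, we obtain the solution $\bar{\alpha} = (\bar{\alpha}_{1}, \bar{\alpha}_{1}^{*}, \bar{\alpha}_{2}, \bar{\alpha}_{2}^{*}, \ldots, \bar{\alpha}_{l}, \bar{\alpha}_{l}^{*})^{T}$ of the original optimization problem defined by (2). Replacing the inner product $x_{i} \cdot x_{j}$ in objective function (11) by the mixed kernel function $K_{\mathrm{mix}}(x_{i}, x_{j})$, we can construct the decision function

$$f(x) = \sum_{i=1}^{l}(\bar{\alpha}_{i}^{*} - \bar{\alpha}_{i})K_{\mathrm{mix}}(x_{i}, x) + \bar{b}, \tag{12}$$

where $\bar{b}$ is calculated in the following way. Select a component $\bar{\alpha}_{j}$ or $\bar{\alpha}_{k}^{*}$ of $\bar{\alpha}$ that lies in the open interval $(0, C)$. If $\bar{\alpha}_{j}$ is selected, then

$$\bar{b} = y_{j} - \sum_{i=1}^{l}(\bar{\alpha}_{i}^{*} - \bar{\alpha}_{i})K_{\mathrm{mix}}(x_{i}, x_{j}) + \varepsilon. \tag{13}$$

If $\bar{\alpha}_{k}^{*}$ is selected, then

$$\bar{b} = y_{k} - \sum_{i=1}^{l}(\bar{\alpha}_{i}^{*} - \bar{\alpha}_{i})K_{\mathrm{mix}}(x_{i}, x_{k}) - \varepsilon. \tag{14}$$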

4.2.2. Predictive Output Function

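Suppose the sample data set of support vector regression is $D = \{(x_{i}, y_{i}) \mid i \in I\}$, where $I = \{1,2,\ldots,N\}$ is the index set and $y_{i}$ is the target value of the $i$th sample. Divide the sample data into $k$ groups by the $k$-fold cross validation method; that is,

$$D_{j} = \{(x_{i}, y_{i}) \mid i \in I_{j}\}, \tag{15}$$

where $j \in \{1,2,\ldots,k\}$, $I_{1} \cup I_{2} \cup \cdots \cup I_{k} = I$, and $D_{1} \cup D_{2} \cup \cdots \cup D_{k} = D$. In each iteration, one group of data $D_{p}$ is chosen as the prediction set and the remaining $k-1$ groups are used as the training set. Given the initial parameter $\gamma_{0}$, we use LIBSVM [26] to train the support vector regression. Suppose the training result is $\tilde{\alpha}$ and $\tilde{b}$. Then, the decision function is

$$f(x) = \sum_{i=1}^{l}(\tilde{\alpha}_{i}^{*} - \tilde{\alpha}_{i})K(x_{i}, x) + \tilde{b}, \tag{16}$$

where $\tilde{\alpha} = (\tilde{\alpha}_{1}, \tilde{\alpha}_{1}^{*}, \tilde{\alpha}_{2}, \tilde{\alpha}_{2}^{*}, \ldots, \tilde{\alpha}_{l}, \tilde{\alpha}_{l}^{*})^{T}$.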

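Substituting the samples of $D_{p}$ into (16), we obtain the prediction output of $D_{p}$ as follows:

$$\tilde{y}_{t} = \sum_{i=1}^{l}(\tilde{\alpha}_{i}^{*} - \tilde{\alpha}_{i})K(x_{i}, x_{t}) + \tilde{b}, \quad t \in I_{p}, \tag{17}$$

where the sum runs over the $l$ training samples of the remaining $k-1$ groups.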

Choose each $D_{i}$, $i \in \{1,2,\ldots,k\}$, in turn as the prediction group and the remaining groups $D_{1}, \ldots, D_{i-1}, D_{i+1}, \ldots, D_{k}$ as the training groups. After the $k$-fold cross validation regression prediction, every sample in data set $D$ has one and only one prediction output. Therefore, for the parameter vector $\gamma$, we can define a prediction output function as follows:

$$\tilde{y} = h(\gamma). \tag{18}$$
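The construction of the prediction output function can be sketched as follows. This is a minimal illustration, not the paper's implementation: the LIBSVM training step is replaced by a hypothetical stand-in `fit_shrunken_mean` (a trivial model that predicts a scaled mean of the training targets), and the function names, the seed argument, and the fold assignment by striding are assumptions made for this example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split the index set I into k disjoint groups I_1, ..., I_k."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[j::k] for j in range(k)]


def prediction_output(gamma, xs, ys, k, fit, seed=0):
    """h(gamma): every sample is predicted exactly once, by a model not trained on it."""
    y_tilde = [None] * len(xs)
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit(gamma, [xs[i] for i in train], [ys[i] for i in train])
        for t in held_out:
            y_tilde[t] = model(xs[t])
    return y_tilde


def fit_shrunken_mean(gamma, train_x, train_y):
    """Hypothetical stand-in for LIBSVM training: predicts gamma[0] * mean(train_y)."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: gamma[0] * mean_y


xs = [[float(i)] for i in range(10)]
ys = [2.0] * 10
print(prediction_output([0.5], xs, ys, k=3, fit=fit_shrunken_mean))
```

In the setting of the paper, `fit` would train an SVR with the mixed kernel for the parameters in γ, so that the returned vector is the observation predicted by the filter for that γ.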

4.2.3. The Establishment of Parameter Filter Model

In this paper, the weighted fusion coefficients $p_{1}, p_{2}$ of the kernel function, the parameters of the local kernel function, the parameters of the global kernel function, and the penalty parameter C are combined together as the support vector regression parameter vector, denoted by $\gamma$. For convenience, let $k_{1}$ and $k_{2}$ be the kernel parameters of the local kernel function and of the global kernel function, respectively. The selection of the whole parameter vector can be treated as a filtering estimation problem for a nonlinear dynamic system. The nonlinear parameter system is established as follows:

$$\gamma_{k} = \gamma_{k-1} + w_{k}, \tag{19}$$

$$y_{k} = h(\gamma_{k}) + v_{k}, \tag{20}$$

where $\gamma_{k}$ is the $n$-dimensional parameter state vector and $y_{k}$ is the output observation. The process noise $w_{k}$ and the observation noise $v_{k}$ are both Gaussian white noise sequences with zero mean and known covariances $Q$ and $R$, respectively.

Because the optimal kernel parameters can be regarded as fixed constants for a specific practical problem, we can establish the linear state equation (19) for the parameters. For any state vector, every original sample has a predictive output after training and prediction by LIBSVM, so the nonlinear observation equation (20) can be established. To run the 5th-degree cubature Kalman filtering algorithm, artificial process noise and observation noise need to be added to the system model.

4.3. Parameter Selection Method for Support Vector Regression Based on Adaptive Fusion of Mixed Kernel Function

In this section, the method for selecting model parameters of support vector regression and the specific steps of the proposed algorithm are described. The design of the parameter adjustment system is shown in Figure 2.

Figure 2

Parameter adjustment structure of support vector regression based on adaptive fusion of the mixed kernel function.

First, the k-fold cross validation method is used to divide the original data set into k groups. The local kernel function and the global kernel function are selected to determine the mixed kernel function. The data set is then trained with k sub-LIBSVMs based on the mixed kernel function, and the predictive output is fed to the 5th-degree cubature Kalman filter. All parameters of the model are used as the state vector of the system; thus, the selection of model parameters becomes a nonlinear state estimation problem.

In the parameter system model (19) and (20), the real value of the observation vector $y_{k}$ is the same in every iteration: it is the vector of target values of the original sample data, $y_{k} = (y_{1}, y_{2}, \ldots, y_{N})^{T}$. We can compute the optimal state estimate of the parameter state vector $\gamma$ from the real observation values and the predicted output values so as to minimize the variance between the real values $y_{k}$ and the predicted output values $\tilde{y}$. The 5th-degree cubature Kalman filter algorithm is used to estimate $\gamma$. The parameter selection method for support vector regression based on adaptive fusion of the mixed kernel function includes two processes, the time update process and the measurement update process, as shown in Algorithm 1.

Algorithm 1: The detailed steps of the parameter selection method for support vector regression based on adaptive fusion of the mixed kernel function.

Initialization:
(1) For the original data set $D$, select the mixed kernel function, and set the initial parameter state value $\gamma_{0}$.
(2) Divide $D$ into $k$ groups, denoted by $D_{1}, D_{2}, \ldots, D_{k}$, by using the k-fold cross validation method.
While (the parameter state value does not meet the set conditions) do
    Time update process:
    (3) Calculate the weights $\bar{w}_{s1}$, $\bar{w}_{s2}$, $w_{1}$, $w_{2}$ using formulas (6)-(7).
    Measurement update process:
    (4) Decompose the one-step prediction error covariance matrix $P_{k|k-1}$ and evaluate the cubature points $\xi_{i}$ according to formula (14) in reference [19].
    (5) Train the data set based on the LIBSVM algorithm to obtain the final prediction output.
    (6) Using the predicted output $\tilde{y}$, compute the one-step prediction by using formula (12).
    (7) Use formulas (8)–(14) of reference [19] to implement the subsequent measurement update.
End while
End

The proposed algorithm combines the fusion coefficients of the mixed kernel function, the kernel parameters, and the penalty parameter C into the state parameter vector and then obtains the predicted output of the data set using the k-fold cross validation method based on LIBSVM. Finally, the optimal parameter state vector is calculated iteratively by the 5th-degree cubature Kalman filtering algorithm. The goal of the proposed algorithm is to search for the optimal state vector $\gamma$ recursively so as to minimize the error variance between the real target values of the samples $y_{k}$ and the predictive output of the support vector regression $\tilde{y}$.
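The filtering loop can be illustrated with a deliberately small sketch for a single scalar parameter. Everything here is an assumption made for illustration: the observation function `toy_h` is a hypothetical stand-in for the cross-validated SVR prediction, the three-point rule (points γ and γ ± √(3P) with weights 2/3, 1/6, 1/6) is what the 5th-degree rule reduces to for a scalar state, and the noise levels Q and R and the helper names are arbitrary. It is not the multi-parameter implementation used in the paper.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ckf_step(gamma, p, y, h, q, r):
    """One time update and one measurement update for a scalar parameter."""
    p_pred = p + q                                   # random-walk state model (19)
    spread = math.sqrt(3.0 * p_pred)
    pts = [gamma, gamma + spread, gamma - spread]    # scalar cubature points
    wts = [2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0]
    zs = [h(g) for g in pts]                         # predictions at each point, model (20)
    m = len(y)
    z_hat = [sum(w * z[i] for w, z in zip(wts, zs)) for i in range(m)]
    dz = [[z[i] - z_hat[i] for i in range(m)] for z in zs]
    p_zz = [[sum(w * d[i] * d[j] for w, d in zip(wts, dz)) + (r if i == j else 0.0)
             for j in range(m)] for i in range(m)]
    p_xz = [sum(w * (g - gamma) * d[i] for w, g, d in zip(wts, pts, dz))
            for i in range(m)]
    gain = solve(p_zz, p_xz)                         # K^T = P_zz^{-1} P_xz (P_zz symmetric)
    gamma_new = gamma + sum(k * (yi - zi) for k, yi, zi in zip(gain, y, z_hat))
    p_new = p_pred - sum(k * c for k, c in zip(gain, p_xz))
    return gamma_new, p_new


xs = [0.5, 1.0, 1.5, 2.0]


def toy_h(g):
    # Hypothetical stand-in for the cross-validated SVR prediction h(gamma).
    return [x * (g + 0.1 * g * g) for x in xs]


y = toy_h(1.0)            # fixed target vector; the "true" parameter is 1.0
gamma, p = 0.3, 1.0       # initial state value and covariance
for _ in range(20):
    gamma, p = ckf_step(gamma, p, y, toy_h, q=1e-4, r=1e-2)
print(gamma, p)
```

As in the text, the observation vector is the same in every iteration, and the estimate moves toward the parameter value whose predicted output best matches it. In the full method the state would have one entry per fusion coefficient, kernel parameter, and penalty parameter, and each evaluation of `h` would involve k-fold training.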

5. Algorithm Analysis

In support vector regression based on a local kernel function, the value of the kernel function is influenced only by data points that are close to each other, whereas in support vector regression based on a global kernel function, data points that are far from each other also affect the value of the kernel function. Using only a global kernel function or only a local kernel function has limitations in solving practical problems: it often cannot accurately describe the characteristics of the actual sample, which leads to poor performance of the support vector regression. The mixed kernel function contains two different types of kernel function, the local kernel function and the global kernel function, and can therefore describe the properties of actual samples much better. Support vector regression based on the mixed kernel function has both good learning ability and good generalization ability. However, choosing the fusion coefficients of the mixed kernel function remains a difficult problem. Expert prior knowledge and simple cross validation are commonly used, but these methods cannot achieve high performance support vector regression.

In this paper, we use a combination of parameters to construct the kernel function. The fusion coefficients, the parameters of the local kernel function, the parameters of the global kernel function, and the penalty parameter C together form the parameters of the support vector regression. By adjusting the weighted fusion coefficients of the local kernel function and the global kernel function, the mixed kernel function can describe the actual samples accurately according to their specific characteristics. Because the 5th-degree cubature Kalman filter algorithm applies high-degree spherical and radial integration rules, it has higher parameter estimation accuracy than the UKF algorithm and the 3rd-degree cubature Kalman filter algorithm. Therefore, the estimate of the parameter state vector is more accurate, the parameters supplied to the support vector regression are better, and the predictive ability of the support vector regression is better. The proposed parameter adjustment algorithm can also be understood in another way. All components of the state parameter vector can be regarded as kernel parameters of the mixed kernel function, including the combination parameters, the parameters of the local kernel function, the parameters of the global kernel function, and the penalty parameter C. The state estimation of the parameter vector is performed by the high-precision 5th-degree cubature Kalman filter, which yields the optimal kernel parameter values of the support vector regression.

6. Simulation Example

6.1. Subjects

We selected the rolling bearing experimental data from the Electrical Engineering Laboratory of Case Western Reserve University for analysis and verification [27]. The measured rolling bearing type is SKF6205. Single-point faults were seeded by spark erosion on the outer ring, the inner ring, and the rolling element of the drive-end bearing. The fault depth is 0.2794 mm and the fault diameter is 0.1778 mm. The number of balls is 9. The rolling bearing works in four states: normal, inner ring fault, outer ring fault, and rolling element fault. Acceleration sensors are used to measure the vibration signals with conventional signal acquisition at a sampling frequency of 12 kHz. The data obtained are shown in Figure 3.

Figure 3

Vibration signals for four different types of faults.

6.2. Feature Extraction

Twelve characteristic parameters are extracted to represent the fault state, as listed in Table 2. We take 50 groups of data directly from the drive-end vibration data, each group containing 4096 sample points, and then calculate the characteristic parameters of each group.

Table 2

Sensitive feature parameters.

Quantity symbol	Characteristic index
T1	Mean
T2	Absolute average
T3	Peak
T4	Square root amplitude
T5	Root mean square value
T6	Variance
T7	Skewness
T8	Kurtosis factor
T9	Waveform factor
T10	Margin factor
T11	Peak factor
T12	Pulse factor
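The indices in Table 2 can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: the formulas are the definitions commonly used for time-domain vibration features (the paper does not state them), T7 is interpreted as skewness, the helper names `time_domain_features` and `segment` are hypothetical, and a synthetic random signal stands in for the drive-end bearing data.

```python
import numpy as np

def time_domain_features(x):
    """Compute T1..T12 for one vibration segment (commonly used definitions, assumed here)."""
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    mean = x.mean()                              # T1: mean
    abs_mean = abs_x.mean()                      # T2: absolute average
    peak = abs_x.max()                           # T3: peak
    sra = np.sqrt(abs_x).mean() ** 2             # T4: square root amplitude
    rms = np.sqrt(np.mean(x ** 2))               # T5: root mean square value
    var = x.var()                                # T6: variance (population form)
    std = np.sqrt(var)
    centered = x - mean
    skew = np.mean(centered ** 3) / std ** 3     # T7: skewness (assumed meaning of T7)
    kurt = np.mean(centered ** 4) / std ** 4     # T8: kurtosis factor
    return {
        "T1": mean, "T2": abs_mean, "T3": peak, "T4": sra,
        "T5": rms, "T6": var, "T7": skew, "T8": kurt,
        "T9": rms / abs_mean,                    # waveform factor
        "T10": peak / sra,                       # margin factor
        "T11": peak / rms,                       # peak factor
        "T12": peak / abs_mean,                  # pulse factor
    }

def segment(signal, n_segments=50, length=4096):
    """Cut a 1-D signal into n_segments consecutive groups of `length` samples."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < n_segments * length:
        raise ValueError("signal is too short for the requested segmentation")
    return signal[: n_segments * length].reshape(n_segments, length)

# Synthetic stand-in for one measured vibration record (assumption, not real data).
rng = np.random.default_rng(0)
signal = rng.standard_normal(50 * 4096)

# One row of 12 features per group of 4096 samples.
features = np.array([list(time_domain_features(s).values()) for s in segment(signal)])
print(features.shape)
```

Each row of `features` would then serve as one input sample for the regression model.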

6.3. Algorithms Comparison

In this simulation, the local kernel function of the mixed kernel function is the RBF kernel function, and the global kernel function is the sigmoid kernel function. The parameter vector of the mixed kernel function is γ = [p1, p2, σ, λ, φ, C]ᵀ, while the parameter vector for a single RBF kernel function is γ = [σ, C]ᵀ. To illustrate the effectiveness of the proposed algorithm, five algorithms are compared: support vector regression for a single RBF kernel function based on the genetic algorithm (RBF-GA-SVR), support vector regression for a single RBF kernel function based on the UKF algorithm (RBF-UKF-SVR), support vector regression for a single RBF kernel function based on the CKF algorithm (RBF-CKF-SVR), support vector regression for the mixed kernel function based on the UKF algorithm (MKF-UKF-SVR), and support vector regression for the mixed kernel function based on the CKF algorithm (MKF-CKF-SVR). The prediction results for the four states are shown in Figures 4–7.
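As an illustration of the mixed kernel described above, the following sketch builds the Gram matrix as a weighted sum of an RBF kernel and a sigmoid kernel. Several points are assumptions rather than statements from the paper: the RBF form exp(−‖x−y‖²/(2σ²)), the sigmoid form tanh(λ⟨x, y⟩ + φ), the function names, and the reading of the Algo1 row of Table 3 in the order [p1, p2, σ, λ, φ, C]. Random inputs stand in for the 12-dimensional feature vectors.

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """Local kernel (assumed form): exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))

def sigmoid_kernel(X, Y, lam, phi):
    """Global kernel (assumed form): tanh(lam * <x, y> + phi)."""
    return np.tanh(lam * (X @ Y.T) + phi)

def mixed_kernel(X, Y, gamma):
    """Weighted fusion of the local and global kernels.

    gamma = [p1, p2, sigma, lam, phi, C]; C is the SVR penalty and is not used here.
    """
    p1, p2, sigma, lam, phi, _C = gamma
    return p1 * rbf_kernel(X, Y, sigma) + p2 * sigmoid_kernel(X, Y, lam, phi)

# Algo1 row of Table 3, read as [p1, p2, sigma, lam, phi, C] (ordering assumed).
gamma = np.array([0.902, 0.098, 0.708, 1.034, 2.127, 73.364])

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 12))   # 5 hypothetical samples with 12 features each
K = mixed_kernel(X, X, gamma)
print(K.shape)
```

A Gram matrix of this kind can be handed to an SVR solver that accepts precomputed kernels, with the last component of γ used as the penalty parameter C. Note that the sigmoid kernel is not positive semidefinite for all parameter values, so the fused matrix is not guaranteed to be either.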

Figure 4

Prediction result for state 1.

Figure 5

Prediction result for state 2.

Figure 6

Prediction result for state 3.

Figure 7

Prediction result for state 4.

For the convenience of description, we have the following simplified definitions about the filtering algorithms:

Algo1: MKF-5th-degree-CKF-SVR algorithm

Algo2: MKF-UKF-SVR algorithm

Algo3: RBF-5th-degree-CKF-SVR algorithm

Algo4: RBF-UKF-SVR algorithm

Algo5: RBF-GA-SVR algorithm.

To show the robustness of the proposed method under typical noise levels, we ran the experiment with three different noise levels, R1=0.1, R2=0.3, and R3=0.5. The prediction results for these noise levels are shown in Figures 8–10. The kernel parameter estimation results and the prediction error results are shown in Tables 3 and 4, respectively.

Table 3

Results of parameter estimation.

Algorithms	Parameter vector γ
Algo5	[99.992, 99.999]
Algo4	[1.683, 27.723]
Algo3	[0.681, 18.230]
Algo2	[0.912, 0.088, 1.501, 24.342]
Algo1	[0.902, 0.098, 0.708, 1.034, 2.127, 73.364]

Figure 8

Prediction result of Algo1 for R1=0.1.

Figure 9

Prediction result of Algo1 for R2=0.3.

Figure 10

Prediction result of Algo1 for R3=0.5.

From the simulation results in Figures 4–7 and Tables 3 and 4, it can be seen that the kernel parameter estimated by Algo5 is large, so Algo5 has poor generalization ability and a larger prediction error. Compared with Algo5, Algo4 has higher prediction accuracy, mainly because it uses the filtering framework to estimate the kernel parameters. Because the 5th-degree cubature Kalman filter estimates the parameters with high accuracy, the kernel parameter values of Algo3 are smaller, and its predictive ability is better than that of Algo5 and Algo4. However, its accuracy is lower than that of Algo2, which benefits from the mixed kernel. The proposed Algo1 characterizes the sample information using both local and global kernel function information; it has the strongest generalization ability and the smallest prediction error. In terms of the fusion coefficients, the coefficient of the local RBF kernel function is 0.902, which indicates that the actual samples are described mainly by the local kernel function. Nevertheless, compared with Algo3, the prediction accuracy of Algo1 is greatly improved; for example, the MAE on the test data falls from 0.0171 to 0.0095 (Table 4). This reflects the key role played by the global kernel function in the mixed kernel function, which makes the description of the actual sample information more complete and accurate.

Table 4

Results of sample prediction error.

Data	Statistical indicator	Algo1	Algo2	Algo3	Algo4	Algo5
Train data	MAE	0.0053	0.0073	0.0094	0.0156	0.0200
Train data	RMSE	0.0098	0.0146	0.0198	0.0298	0.0399
Train data	SD	0.0060	0.0090	0.0110	0.0179	0.0228
Test data	MAE	0.0095	0.0128	0.0171	0.0208	0.0274
Test data	RMSE	0.0111	0.0152	0.0195	0.0241	0.0319
Test data	SD	0.0110	0.0150	0.0203	0.0252	0.0331

Here, MAE represents mean absolute error, RMSE represents root mean square error, and SD represents standard deviation.
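The three indicators can be computed as in the sketch below. It assumes that RMSE is the square root of the mean squared prediction error and that SD is the standard deviation of the prediction error itself; the paper does not give these formulas, so both are assumptions, as are the function name `prediction_errors` and the small example arrays.

```python
import numpy as np

def prediction_errors(y_true, y_pred):
    """Return MAE, RMSE, and SD of the prediction error (definitions assumed)."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return {
        "MAE": np.mean(np.abs(err)),          # mean absolute error
        "RMSE": np.sqrt(np.mean(err ** 2)),   # root mean square error
        "SD": np.std(err),                    # standard deviation of the error
    }

# Hypothetical targets and predictions, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(prediction_errors(y_true, y_pred))
```

Under these definitions RMSE² equals SD² plus the squared mean error, so RMSE and SD coincide whenever the predictions are unbiased.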

Figures 8–10 show that the kernel parameter estimation error grows as the noise level increases. This is expected, because strong noise degrades the accuracy of the nonlinear filter. Nevertheless, the estimation errors remain within the allowable range.

7. Conclusions

The proposed parameter selection method for support vector regression based on adaptive fusion of the mixed kernel function combines the fusion coefficients of the mixed kernel function, the kernel function parameters, and the regression parameters into a single state vector, and obtains the predicted output of the original data set using LIBSVM. The fusion coefficients are adaptively adjusted by the 5th-degree cubature Kalman filter, and the kernel function parameters and the regression parameters are selected from the estimated parameter values. Finally, the bearing fault diagnosis experiment shows that the kernel function and parameters obtained with the proposed method give the support vector regression stronger generalization ability and higher prediction accuracy.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61403229) and Natural Science Foundation of Zhejiang Province (LY13F030011).

References

[1] Widodo, Yang. Support vector machine in machine condition monitoring and fault diagnosis. Mechanical Systems and Signal Processing, 2007, 21(6), 2560–2574. doi:10.1016/j.ymssp.2006.12.007

[2] Kang, Kim, Tan A. C., Kim E. Y., Choi. Reliable fault diagnosis for low-speed bearings using individually trained support vector machines with kernel discriminative feature analysis. IEEE Transactions on Power Electronics, 2015, 30(5), 2786–2797. doi:10.1109/tpel.2014.2358494

[3] Yan, Gao R. X., Chen. Wavelets for fault diagnosis of rotary machines: a review with applications. Signal Processing, 2013, 96, 1–15. doi:10.1016/j.sigpro.2013.04.015

[4] Vapnik V. N. Statistical Learning Theory. Adaptive and Learning Systems for Signal Processing, Communications, and Control. New York, NY, USA: Wiley-Interscience, 1998. MR1641250, Zbl0935.62007

[5] Refaeilzadeh, Tang, Liu. Cross-validation. Springer-US, 2009.

[6] Shao, M. J., Wang. An effective semi-cross-validation model selection method for extreme learning machine with ridge regression. Neurocomputing, 2015, 151(2), 933–942. doi:10.1016/j.neucom.2014.10.002

[7] Sopyła, Drozda. Stochastic gradient descent with Barzilai-Borwein update step for SVM. Information Sciences, 2015, 316, 218–233. doi:10.1016/j.ins.2015.03.073

[8] Villmann, Haase, Kaden. Kernelized vector quantization in gradient-descent learning. Neurocomputing, 2015, 147(1), 83–95. doi:10.1016/j.neucom.2013.11.048

[9] Froelich, Salmeron J. L. Evolutionary learning of fuzzy grey cognitive maps for the forecasting of multivariate, interval-valued time series. International Journal of Approximate Reasoning, 2014, 55(6), 1319–1335. doi:10.1016/j.ijar.2014.02.006. MR3210106, Zbl06302194

[10] Bosch O. J. H., Nguyen N. C., Maeno, Yasui. Managing Complex Issues through Evolutionary Learning Laboratories. Systems Research and Behavioral Science, 2013, 30(2), 116–135. doi:10.1002/sres.2171

[11] Rakotomamonjy, Bach, Canu, Grandvalet. More efficiency in multiple kernel learning. Proceedings of the 24th International Conference on Machine Learning, ICML 2007, June 2007, 775–782. doi:10.1145/1273496.1273594

[12] Lee J. R., Raghavendra, Steurer. Lower bounds on the size of semidefinite programming relaxations. Proceedings of the 47th Annual ACM Symposium on Theory of Computing, STOC 2015, June 2015, 567–576. doi:10.1145/2746539.2746599

[13] Hsu C.-W., Chang C.-C., Lin C.-J. A practical guide to support vector classification. Bioinformatics, 2003, 1, 1.

[14] Whitley. An executable model of a simple genetic algorithm. Foundations of Genetic Algorithms, 2014, 1519, 45–62.

[15] Kuang, Zhang, Jin. A novel SVM by combining kernel principal component analysis and improved chaotic particle swarm optimization for intrusion detection. Soft Computing, 2015, 19(5), 1187–1199. doi:10.1007/s00500-014-1332-7

[16] Nandi A. K. Automatic tuning of L2-SVM parameters employing the extended Kalman filter. Expert Systems with Applications, 2009, 26(2), 160–175. doi:10.1111/j.1468-0394.2009.00469.x

[17] Huang D. Y., Chen X. Y. A novel approach of model selection for SVR. Journal of Fuzhou University, 2011, 39(4), 527–532, 538. MR2896018

[18] Wang, Zhang. Parameter optimization of SVR based on DRVB-ASCKF. Proceedings of the International Conference on Estimation, Detection and Information Fusion, ICEDIF 2015, January 2015, 141–145. doi:10.1109/ICEDIF.2015.7280178

[19] Jia, Xin, Cheng. High-degree cubature Kalman filter. Automatica, 2013, 49(2), 510–518. doi:10.1016/j.automatica.2012.11.014. MR3004718

[20] Research on Multi-core Learning SVM Algorithm and Identification of Pulmonary Nodules. Jilin, China: College of Communication Engineering, Jilin University, 2014.

[21] Lin, Yang C. Z. Face recognition technology based on support vector machine. Journal of Infrared and Laser Engineering, 2001, 30(5), 318–322.

[22] Pour S. G., Girosi. Joint Prediction of Chronic Conditions Onset: Comparing Multivariate Probits with Multiclass Support Vector Machines. Proceedings of the Symposium on Conformal and Probabilistic Prediction with Applications. Springer International Publishing, 2016.

[23] Hsieh C.-J., Dhillon I. S. A divide-and-conquer solver for kernel support vector machines. Proceedings of the 31st International Conference on Machine Learning, ICML 2014, June 2014, 855–870.

[24] Huang. The study on Kernel in Support Vector Machine [Ph. D. dissertation]. Soochow University, 2008.

[25] Cortes, Vapnik. Support-vector networks. Machine Learning, 1995, 20(3), 273–297. doi:10.1007/BF00994018. Zbl0831.68098

[26] Chang, Lin. LIBSVM: a Library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2011, 2(3), article 27. doi:10.1145/1961189.1961199

[27] The experimental data of rolling bearings for Electrical Engineering Laboratory of Case Western Reserve University. 2008, https://case.edu/bulletin/

[28] Loutas T. H., Roulias, Georgoulas. Remaining useful life estimation in rolling bearings utilizing data-driven probabilistic E-support vectors regression. IEEE Transactions on Reliability, 2013, 62(4), 821–832. doi:10.1109/TR.2013.2285318

[29] Jacobs J. P. Bayesian support vector regression with automatic relevance determination kernel for modeling of antenna input characteristics. IEEE Transactions on Antennas and Propagation, 2012, 60(4), 2114–2118. doi:10.1109/TAP.2012.2186252. MR2953020, Zbl1369.62138

[30] Ghaedi, Rahimi M. R., Ghaedi A. M., Tyagi, Agarwal, Gupta V. K. Application of least squares support vector regression and linear multiple regression for modeling removal of methyl orange onto tin oxide nanoparticles loaded on activated carbon and activated carbon prepared from Pistacia atlantica wood. Journal of Colloid and Interface Science, 2016, 461, 425–434. doi:10.1016/j.jcis.2015.09.024

[31] Benkedjouh, Medjaher, Zerhouni, Rechak. Health assessment and life prediction of cutting tools based on support vector regression. Journal of Intelligent Manufacturing, 2015, 26(2), 213–223. doi:10.1007/s10845-013-0774-6