This paper proposes a fuzzy weighted least squares support vector regression (FW-LSSVR) method with data reduction for nonlinear system modeling based only on measured data. The proposed method combines the advantages of data reduction with ideas from the fuzzy weighting mechanism. It not only possesses the capability of illuminating the local characteristics of the modeled plant but also deals with the boundary effects that arise in the local LSSVR method when the modeled data lie at the boundary of a data subset. Furthermore, in comparison with SVR, the proposed method utilizes fewer hyperparameters to construct the model, and the overlap factor λ can be chosen relatively small to further reduce computational time. First, the original input space is partitioned into several fuzzy regions by applying the Gustafson-Kessel clustering algorithm (GKCA), which forms the foundation for data reduction, and the overlap factor is introduced to reduce the size of the subsets. Following that, the subset regression models (SRMs), which can be solved simultaneously by LSSVR, are integrated by fuzzy weighting into an overall output of the estimated nonlinear system. Finally, the proposed method is demonstrated by experimental analysis and compared with the local LSSVR, weighted SVR, and global LSSVR methods using the indexes of computational time and root-mean-square error (RMSE).
1. Introduction
It is well known that in a large number of applications, such as advanced control, process simulation, fault detection, and other research areas, a significant problem is to construct a mathematical model of the estimated system based only on its measured data. Major theories and methods for identifying nonlinear systems have been independently developed in various research fields, including fuzzy systems [1], neural networks [2], and other approaches [3]. The LSSVR method [4], like SVR [5], adopts structural risk minimization and achieves a good balance between sparsity and modeling accuracy. Furthermore, by substituting a set of equality constraints for the complex inequality ones, LSSVR translates a complex quadratic optimization into simple linear programming, which greatly relieves the computational load. As shown in [6], the generalization power of LSSVR is no worse than that of SVR.
Therefore, LSSVR has attracted extensive attention and has been applied successfully to time series prediction [7, 8], subspace identification [9, 10], signal processing [11, 12], and other applications [13, 14] during the past few years. Although the LSSVR approach, referred to here as the global LSSVR (G-LSSVR) approach, has become an effective tool in various applications and can identify an estimated model whose accuracy is guaranteed by an appropriate mathematical model [15] and a proper hyperparameter set, usually consisting of two variables, the kernel width (σ) and the penalty factor (γ), it is generally difficult for G-LSSVR to capture good local behavior. The literature [15, 16] shows that G-LSSVR also has some defects in illuminating local behavior.
Recently, local modeling approaches have become desirable as an efficient alternative, thanks to their superiority in identifying the various regions of an estimated nonlinear system. In [17], a local fusion modeling method based on LSSVR and neuro-fuzzy systems is proposed; it employs LSSVR and a two-layer learning method to construct each local model. In [18], another approach is adopted that identifies a local model from training points of interest instead of all points, applying the vector-norm distance [19] to search for the K nearest points. To seek a set of optimal data points, a Euclidean distance measurement method [20, 21] has been proposed, in which the local model is set up from these neighboring points. From the viewpoint of capturing local behavior and connecting several models, the localized support vector regression (LSVR) method [22] has been proposed. Considering the large computational time of G-LSSVR, a local grey SVR [23] was developed to speed up the computation. Further, by introducing regularization, a general local and global learning framework [24] formulates multiple classifiers on the data in each neighborhood.
Nevertheless, although local modeling approaches such as local-LSSVR are superior to global SVR or LSSVR in identifying local characteristics, their global modeling capability is still unsatisfactory. First, owing to the criterion used to select the K nearest training data in each subset, local-LSSVR does not perform well when the training data lie in the boundary area. Second, because the number of local models to construct equals the size of the testing set, the local LSSVR approach generally incurs a heavy computational load [16]. Third, the local LSSVR method generates boundary effects when the modeled data lie at the boundary of a data subset.
Based on the above considerations, we present an FW-LSSVR method for nonlinear system modeling based only on the obtained measured data. The paper integrates the advantages of GKCA, the weighted average mechanism, and some ideas from LSSVR. First, the original input space is partitioned into several fuzzy regions by applying GKCA, which forms the foundation for data reduction. Following that, the subset regression models (SRMs), which can be solved simultaneously by LSSVR, are integrated by fuzzy weighting into an overall output of the estimated nonlinear system. The proposed method not only possesses the capability of illuminating the local characteristics of the estimated models but also deals with the boundary effects that arise in the local LSSVR method. Furthermore, in comparison with support vector regression (SVR), the proposed method utilizes fewer hyperparameters to construct the model, and the overlap factor λ can be chosen relatively small to further reduce computational time. Finally, experimental analysis demonstrates that our approach not only overcomes the disadvantages of the local LSSVR, weighted SVR, and global LSSVR methods in modeling nonlinear systems but also has better root-mean-square error (RMSE) performance and needs less computational time.
The paper is organised as follows. Section 2 gives brief descriptions of LSSVR and GKCA, Section 3 introduces the proposed method in detail, Section 4 presents several examples demonstrating our approach, and Section 5 summarizes the whole paper.
2. Preliminaries
2.1. Least Squares Support Vector Regression
It has been shown through a meticulous empirical study [6] that the generalization performance of the LSSVR presented in [25] is comparable to that of SVR. We now briefly introduce LSSVR for the training set

(x_1, y_1), …, (x_N, y_N),    (1)

where x_k ∈ R^n is the input pattern and y_k the corresponding target. For a test input x, the LSSVR model is

f(x) = ∑_{k=1}^{N} α_k K(x, x_k) + b.    (2)

With the Gaussian kernel, the kernel function K(x, x_k) in (2) gives

f(x) = ∑_{k=1}^{N} α_k exp(−‖x − x_k‖² / (2σ²)) + b,    (3)

where the support-value vector α = [α_1, α_2, …, α_N]^T and the bias b are obtained from the following optimization problem [25]:

min_{w,b,ξ} J_1(w, b, ξ) = (1/2) w^T w + (γ/2) ∑_{k=1}^{N} ξ_k²
s.t. y_k = w^T Φ(x_k) + b + ξ_k,  k = 1, 2, …, N,    (4)

where Φ(·) is the feature map by which the nonlinear input space is transformed into a high-dimensional linear space and γ ∈ R⁺ is the regularization constant governing the relative importance of data fitting versus smoothness of the solution. Applying the Lagrange multiplier method to (4) gives rise to the unconstrained problem

L_1(w, b, ξ; α) = J_1(w, b, ξ) − ∑_{k=1}^{N} α_k [w^T Φ(x_k) + b + ξ_k − y_k].    (5)

From the KKT conditions, one derives

∂L_1/∂w = 0 → w = ∑_{k=1}^{N} α_k Φ(x_k),
∂L_1/∂b = 0 → ∑_{k=1}^{N} α_k = 0,
∂L_1/∂ξ_k = 0 → α_k = γ ξ_k,
∂L_1/∂α_k = 0 → w^T Φ(x_k) + b − y_k + ξ_k = 0,  k = 1, 2, …, N.    (6)

Consequently, the learning process of LSSVR corresponding to (5) reduces to solving the linear system

[ 0      1_N^T       ] [ b ]   [ 0 ]
[ 1_N  Ω + γ⁻¹ I_N ] [ α ] = [ y ],    (7)

where y = [y_1, y_2, …, y_N]^T, 1_N = [1, 1, …, 1]^T, Ω_ij = K(x_i, x_j) = Φ(x_i)^T Φ(x_j) for all i, j = 1, 2, …, N, and K(·,·) is a positive-definite Mercer kernel satisfying Mercer's theorem.
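As a concrete illustration, the linear system (7) can be solved directly with a general-purpose linear solver. The following sketch (in Python with NumPy, not the toolbox-based implementation used in the paper's experiments) trains and evaluates a Gaussian-kernel LSSVR:

```python
import numpy as np

def lssvr_train(X, y, sigma=1.0, gamma=1000.0):
    """Solve the LSSVR linear system (7) for (b, alpha).

    X: (N, n) training inputs, y: (N,) targets.
    sigma: Gaussian kernel width, gamma: regularization constant."""
    N = X.shape[0]
    # Kernel matrix Omega_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    Omega = np.exp(-d2 / (2.0 * sigma ** 2))
    # Assemble [[0, 1^T], [1, Omega + I/gamma]] [b; alpha] = [0; y]
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = Omega + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]          # bias b, support values alpha

def lssvr_predict(Xtest, X, b, alpha, sigma=1.0):
    """Evaluate f(x) = sum_k alpha_k K(x, x_k) + b as in (2)-(3)."""
    d2 = np.sum((Xtest[:, None, :] - X[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2)) @ alpha + b
```

Note that the constraint ∑_k α_k = 0 from (6) is enforced by the first row of the system, so the recovered α sums to zero up to numerical precision.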
2.2. Gustafson-Kessel Clustering Algorithm
Clustering analysis plays an important role in classification and regression problems. To study the important characteristics of a complex system, it is crucial to decompose the original data set into several subsets that reflect the system's behavior well. In particular, GKCA [26] can extract cluster centers of different shapes and orientations from a large data set [27] and is superior to conventional FCM. GKCA minimizes the objective function

J = ∑_{k=1}^{R} ∑_{i=1}^{N} μ_ik^m (x_i − ν_k)^T M_k (x_i − ν_k),    (8)

subject to

∑_{k=1}^{R} μ_ik = 1,  i = 1, 2, …, N,    (9)

where μ_ik is a component of the fuzzy partition matrix U, M_k is defined by (12), and R is the number of cluster centers ν_k, which must be predefined. In a nutshell, GKCA can be boiled down to the following steps:

(1) Calculate the cluster centers

ν_k^(l) = ∑_{i=1}^{N} (μ_ik^(l−1))^m x_i / ∑_{i=1}^{N} (μ_ik^(l−1))^m,  1 ≤ k ≤ R,    (10)

where l denotes the iteration number and N is the number of data points.

(2) Compute F_k according to the definition of covariance:

F_k = ∑_{i=1}^{N} (μ_ik^(l−1))^m (x_i − ν_k^(l))(x_i − ν_k^(l))^T / ∑_{i=1}^{N} (μ_ik^(l−1))^m,  1 ≤ k ≤ R.    (11)

(3) Compute the distances

M_k = det(F_k)^{1/n} F_k^{−1},
d²_{ikM_k} = (x_i − ν_k^(l))^T M_k (x_i − ν_k^(l)).    (12)

(4) Revise the components μ_ik of the fuzzy partition matrix:

μ_ik^(l) = 1 / ∑_{j=1}^{R} (d_{ikM_k} / d_{ijM_j})^{2/(m−1)},  1 ≤ i ≤ N, 1 ≤ k ≤ R.    (13)

The iteration stops when the difference between the fuzzy partition matrices U^(l) and U^(l−1) in successive iterations falls below δ.
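The steps above can be sketched in Python/NumPy as follows. This is a minimal illustration under assumed defaults (fuzzifier m = 2, random initial partition), not the implementation used in the paper:

```python
import numpy as np

def gk_cluster(X, R, m=2.0, delta=1e-5, max_iter=100, seed=0):
    """Gustafson-Kessel clustering: a minimal sketch of steps (10)-(13).

    X: (N, n) data, R: number of clusters.
    Returns the fuzzy partition matrix U (R, N) and centers V (R, n)."""
    N, n = X.shape
    rng = np.random.default_rng(seed)
    U = rng.random((R, N))
    U /= U.sum(axis=0)                                   # columns sum to 1, eq. (9)
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)     # centers, eq. (10)
        D2 = np.empty((R, N))
        for k in range(R):
            diff = X - V[k]
            # fuzzy covariance, eq. (11)
            Fk = (Um[k][:, None] * diff).T @ diff / Um[k].sum()
            # volume-normalized Mahalanobis norm matrix, eq. (12)
            Mk = np.linalg.det(Fk) ** (1.0 / n) * np.linalg.inv(Fk)
            D2[k] = np.einsum('ij,jk,ik->i', diff, Mk, diff)
        # membership update, eq. (13): 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        Unew = 1.0 / ((D2[:, None, :] / D2[None, :, :]) ** (1.0 / (m - 1.0))).sum(axis=1)
        if np.max(np.abs(Unew - U)) < delta:             # termination criterion
            U = Unew
            break
        U = Unew
    return U, V
```

A cluster whose points are (nearly) collinear would make F_k singular; a production implementation would add a small regularization to F_k, which this sketch omits.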
3. Fuzzy Weighted Least Squares Support Vector Regression
The paper develops a new method combining the respective advantages of global and local learning to formulate an overall framework. The procedure of the proposed FW-LSSVR approach is depicted in Figure 1.
An illustration of the proposed FW-LSSVR approach.
3.1. Constructing Fuzzy Weighted with Triangle Membership Functions
Applications of fuzzy concepts were first developed by Zadeh [28]. A triangular fuzzy number Ã can be parametrized by a triplet (a_L, a_M, a_R), where a_L and a_R denote the left and right bounds, respectively, and a_M represents the mode of Ã. The membership function of the triangular fuzzy number Ã is defined by

μ_Ã(u) = 0,                       u < a_L,
         (u − a_L)/(a_M − a_L),   a_L ≤ u ≤ a_M,
         (a_R − u)/(a_R − a_M),   a_M ≤ u ≤ a_R,
         0,                       u ≥ a_R.    (14)

The α-cut A_α of the fuzzy set Ã in the universe of discourse U is defined by

A_α = {u_i ∣ μ_Ã(u_i) ≥ α, u_i ∈ U},    (15)

where α ∈ [0, 1].
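For concreteness, the membership function (14) and the α-cut (15) can be written as a small sketch (illustrative code, with a discretized universe assumed for the α-cut):

```python
def tri_mf(u, aL, aM, aR):
    """Triangular membership function of the fuzzy number (aL, aM, aR), eq. (14)."""
    if aL < u <= aM:
        return (u - aL) / (aM - aL)   # rising edge, equals 1 at the mode aM
    if aM < u < aR:
        return (aR - u) / (aR - aM)   # falling edge
    return 0.0                        # outside the support [aL, aR]

def alpha_cut(universe, mf, alpha):
    """Crisp set {u : mu(u) >= alpha} over a discretized universe, eq. (15)."""
    return [u for u in universe if mf(u) >= alpha]
```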
In general, fuzzy partition is implemented by clustering methods, and GKCA [26] is commonly used to decompose the original data set. It has been found that GKCA is superior to FCM (fuzzy c-means) and subtractive clustering. GKCA extends the standard FCM algorithm by adopting a flexible distance measure computed from covariance matrices, as exhibited in (11). As a result, clusters of various shapes and orientations in the original data set are detected by GKCA.
In this paper, GKCA, which is based on the minimization of (8), is used. As stated in Section 2.2, the iteration stops when the termination criterion ‖U^(l) − U^(l−1)‖ < δ is satisfied, and an appropriate fuzzy membership matrix is finally obtained. Following that, the cluster centers ν_kj and spread widths ρ_kj are calculated, respectively, as [29]

ν_kj = ∑_{i=1}^{N} x_ij u_ki / ∑_{i=1}^{N} u_ki,    (16)

ρ_kj = sqrt( ∑_{i=1}^{N} u_ki (x_ij − ν_kj)² / ∑_{i=1}^{N} u_ki ),    (17)

i = 1, 2, …, N;  k = 1, 2, …, R;  j = 1, 2, …, n,    (18)

where N is the number of training data, R is the number of clusters, u_ki is the degree of membership of x_i in cluster k, x_i is the ith training datum, n is the feature dimensionality, and ‖·‖ measures the distance between two vectors.
From (16) and (17), the weighted values can be calculated by applying triangular membership functions. To derive the weighted values, the triangular membership function μ_k(x_ij) is constructed according to (14) as

μ_k(x_ij) = 0,                                        x_ij ≤ ν_kj − λ·ρ_kj,
            (x_ij − (ν_kj − λ·ρ_kj)) / (λ·ρ_kj),      ν_kj − λ·ρ_kj < x_ij < ν_kj,
            1,                                        x_ij = ν_kj,
            ((ν_kj + λ·ρ_kj) − x_ij) / (λ·ρ_kj),      ν_kj < x_ij < ν_kj + λ·ρ_kj,
            0,                                        x_ij ≥ ν_kj + λ·ρ_kj.    (19)

Instead of the α-cut A_α of the fuzzy set Ã in (15), the overlap factor λ is introduced into the triangular membership functions to more readily control the size of the data subsets, and the degree of fulfilment β_k(x_i) is calculated as

β_k(x_i) = μ_k(x_i1) · μ_k(x_i2) ⋯ μ_k(x_in) = ∏_{j=1}^{n} μ_k(x_ij),
i = 1, 2, …, N;  j = 1, 2, …, n;  k = 1, 2, …, R.    (20)

By normalizing the firing level of the kth fuzzy set, the weighted value ω_k(x) is finally calculated as

ω_k(x) = β_k(x) / ∑_{k=1}^{R} β_k(x).    (21)
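The weighting scheme (19)-(21) can be sketched compactly; note that the ramps in (19) reduce to 1 − |x_ij − ν_kj| / (λ·ρ_kj) clipped to [0, 1]. The following is an illustrative implementation (the function and variable names are ours, not from the paper):

```python
import numpy as np

def tri_weight(x, v, rho, lam):
    """Per-dimension triangular membership (19), centered at v with half-width lam * rho."""
    return np.clip(1.0 - np.abs(x - v) / (lam * rho), 0.0, 1.0)

def fuzzy_weights(x, centers, widths, lam=1.5):
    """Normalized fuzzy weights omega_k(x), eqs. (19)-(21).

    x: (n,) input; centers, widths: (R, n) cluster centers and spreads from (16)-(17)."""
    mu = tri_weight(x[None, :], centers, widths, lam)   # (R, n) memberships, eq. (19)
    beta = mu.prod(axis=1)                              # firing degrees, eq. (20)
    s = beta.sum()
    # fall back to uniform weights if x is outside every cluster's support
    return beta / s if s > 0.0 else np.full(len(centers), 1.0 / len(centers))
```

For an input exactly at a cluster center the corresponding membership is 1, and inputs outside the λ-scaled support of a cluster contribute zero weight to that cluster, which is what makes λ control the effective overlap of the subsets.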
In some applications [30], the Gaussian membership function

μ_k(x_ij) = exp(−θ (x_ij − ι_kj)²),  i = 1, 2, …, N;  k = 1, 2, …, R;  j = 1, 2, …, n,    (22)

is adopted instead, where θ = 4/τ² and τ ≥ 0 describes the width of the Gaussian fuzzy function, usually chosen in the interval [0.3, 0.5].
3.2. FW-LSSVR with Data Reduction
Takagi-Sugeno fuzzy models [31] have recently become a powerful practical instrument for identifying complex systems. Based on the fuzzy partition, a nonlinear description of the estimated system can be expanded into several simple linear descriptions by applying if-then rules:

R_i: If x_i1 is μ_i1(x_i1) and ⋯ and x_in is μ_in(x_in), then y_i = a_i^T x + b_i,  i = 1, 2, …, R.    (23)

Here μ_i1, μ_i2, …, μ_in are the fuzzy sets assigned to the corresponding input variables, y_i represents the output of the ith rule, and a_i and b_i are the parameters of the consequent function.
For an input x, the fuzzy weighted output f̂ can be summarized as

f̂(x) = ∑_{i=1}^{R} ω_i(x) (a_i^T x + b_i),    (24)

where ω_i(x) represents the normalized firing strength of the ith rule and is computed by (20) and (21).
Next, instead of linearizing around a point, the proposed method makes use of subset regression models (SRMs), which are solved simultaneously by LSSVR in each fuzzy partition region. First, the original input data set is divided into several subsets by fuzzy partition, and in each region an SRM is trained independently by LSSVR. Based on the centers ν_kj and spread widths ρ_kj obtained from (16) and (17), the original data set is decomposed into new subsets △_k by introducing the overlap factor λ to reduce the size of the subsets. The partition can be performed by the following pseudocode:

(25)
for k = 1 : R
    for i = 1 : N
        if ν_k1 − λ·ρ_k1 ≤ x_i1 ≤ ν_k1 + λ·ρ_k1 && ⋯ && ν_kn − λ·ρ_kn ≤ x_in ≤ ν_kn + λ·ρ_kn
            (x_i, y_i) ∈ △_k
        else
            (x_i, y_i) ∉ △_k
        end
    end
end

The overlap factor λ thus reduces the size of the subsets and yields new, reduced training sets.
Then, the new training subsets are used to construct each SRM_k by (3) as follows:

SRM_k(x) = f(x)|_{x ∈ △_k} = ∑_{i=1}^{m_k} α_ki exp(−‖x − x_i‖² / (2σ²)) + b_k,  k = 1, 2, …, R,    (26)

where SRM_k denotes the kth subset regression model, the parameters α_ki and b_k are derived by the LSSVR approach, and m_k is the size of the new subset △_k. Following that, the weighted values ω_k(x) computed by (21) are combined with the SRM_k to form the global predicted output:

ŷ(x_i) = ∑_{k=1}^{R} ω_k(x_i) SRM_k(x_i).    (27)

It is clear from (27) that each SRM_k is solved by LSSVR and all of them can be computed simultaneously, which largely improves the computational efficiency of the proposed method. In brief, the proposed approach can be summarized as follows.
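Putting the pieces together, the partition (25), the per-subset LSSVR models (26), and the fuzzy-weighted combination (27) can be sketched as follows. This is an illustrative single-process sketch with predefined centers and widths (in practice they come from GKCA via (16)-(17)), and it does not exploit the parallelism noted above:

```python
import numpy as np

def lssvr_train(X, y, sigma, gamma):
    """Solve the LSSVR linear system (7) on one subset, returning (b, alpha)."""
    N = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = A[1:, 0] = 1.0
    A[1:, 1:] = np.exp(-d2 / (2.0 * sigma ** 2)) + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]

def partition(X, y, centers, widths, lam):
    """Data reduction (25): point i joins subset k iff every coordinate of x_i
    lies within lam * rho_kj of the cluster center nu_kj."""
    subsets = []
    for v, rho in zip(centers, widths):
        mask = np.all(np.abs(X - v) <= lam * rho, axis=1)
        subsets.append((X[mask], y[mask]))
    return subsets

def fw_lssvr_predict(x, subsets, models, centers, widths, lam, sigma):
    """Fuzzy-weighted global output (27): sum_k omega_k(x) * SRM_k(x)."""
    mu = np.clip(1.0 - np.abs(x - centers) / (lam * widths), 0.0, 1.0)  # eq. (19)
    beta = mu.prod(axis=1)                                              # eq. (20)
    if beta.sum() == 0.0:        # x outside every subset's support
        return 0.0
    w = beta / beta.sum()                                               # eq. (21)
    out = 0.0
    for wk, (Xk, _), (bk, ak) in zip(w, subsets, models):
        if wk > 0.0:
            K = np.exp(-np.sum((x - Xk) ** 2, axis=1) / (2.0 * sigma ** 2))
            out += wk * (K @ ak + bk)                                   # eqs. (26)-(27)
    return out
```

Because each call to `lssvr_train` touches only one subset, the R solves are independent and could be dispatched in parallel, which is the source of the speedup claimed for the method.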
Step 1.
Define the overlap factor λ and select the number of clustering subsets R (generally R ≥ 2).
Step 2.
Obtain the fuzzy partition matrix U by applying GKCA until the termination criterion ‖U^(l) − U^(l−1)‖ < δ is satisfied, and set U = U^(l).
Step 3.
Compute the cluster centers ν_kj and the spread widths ρ_kj using (16) and (17) from the obtained matrix U and the training data X.
Step 4.
Determine the new training subsets △_k by (25) based on the overlap factor λ, the cluster centers, and the spread widths.
Step 5.
Set the two hyperparameters {σ, γ} of the LSSVR.
Step 6.
Construct each subset regression model SRM_k(x) by the LSSVR approach, obtaining (26).
Step 7.
Compute the global predicted output according to (26) and (27).
4. Experimental Studies
For the purpose of illustrating our approach, both RMSE (root-mean-square error) and computational time are considered in four simulated data experiments. All numerical experiments are carried out on a personal computer with a 2.50 GHz Intel(R) Celeron(R) CPU and 2 GB of memory, running Windows XP with MATLAB R2012a and the VC++ 6.0 compiler installed. The LSSVR implementation from the LS-SVMlab Matlab Toolbox was used (available at http://www.esat.kuleuven.be/sista/lssvmlab/).
We evaluate the performance of the proposed approach on four benchmark data sets [15]. The index adopted for measuring modeling accuracy is

RMSE = sqrt( (1/N) ∑_{k=1}^{N} (y_k − ŷ_k)² ).    (28)

Another index is the total computational time for constructing the proposed model, together with the local running time for constructing the SRMs. These two indexes of the proposed method are compared with those of G-LSSVR, local-LSSVR, and [15]. In addition, the effect of selecting different overlap factors λ is compared. To obtain a fair comparison, the hyperparameters are set to the same values. The local-LSSVR is briefly introduced as follows. Let the training data X = {(x_i, y_i) ∣ i = 1, 2, …, N} be obtained by experiment or from a real system, and let x_t be a test input generated from the testing data set. In the regions closest to x_t, P training inputs are selected by applying the norm-distance approach, and the local-LSSVR model corresponding to each testing output is derived from those P training inputs.
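The accuracy index (28) is straightforward to compute; a minimal helper (illustrative, not from the paper's toolchain):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error, eq. (28)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```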
Example 1.
The approximated function is

y(x) = 10 · exp(−x) · cos(2·π·x),  x ∈ [0, 5].    (29)
For this function, 501 training points and 1001 testing points are obtained from (29). With the proposed approach, only two hyperparameters (i.e., σ and γ) need to be chosen, whereas the SVR approach needs three (i.e., σ, γ, and ε). For comparison, Figure 2 shows the results of the FW-LSSVR and G-LSSVR methods. The two indexes, RMSE and computational time, for local-LSSVR, G-LSSVR, [15], and our approach are summarized in Tables 1 and 2, respectively. Additionally, the effect of selecting different overlap factors λ is compared in Table 3.
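The benchmark (29) and its sample sets can be generated as follows (an illustrative sketch; uniform sampling on [0, 5] is our assumption, since the paper only states the point counts):

```python
import numpy as np

def target(x):
    """Benchmark function (29): 10 * exp(-x) * cos(2*pi*x) on [0, 5]."""
    return 10.0 * np.exp(-x) * np.cos(2.0 * np.pi * x)

# 501 training and 1001 testing points, assumed equally spaced on [0, 5]
x_train = np.linspace(0.0, 5.0, 501)
x_test = np.linspace(0.0, 5.0, 1001)
y_train, y_test = target(x_train), target(x_test)
```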
Table 1: Comparison results of the proposed method, [15], G-LSSVR, and Local-LSSVR with the hyperparameter set {1.5, 1000} and the overlap factor λ = 1.5 for Example 1.

                                RMSE (Training)           RMSE (Testing)
R: the number of SRMs           our approach   [15]       our approach   [15]
3-SRMs                          0.2172         0.8184     0.2151         0.8186
5-SRMs                          0.0368         0.5446     0.0363         0.5447
7-SRMs                          0.0146         0.3554     0.0140         0.3556
9-SRMs                          0.0079         0.3389     0.0076         0.3391
12-SRMs                         0.0042         0.1957     0.0040         0.1961
15-SRMs                         0.0026         0.1471     0.0026         0.1472
G-LSSVR                         1.7295         –          1.7298         –
Local-LSSVR, M=21 points        3.7635         –          3.7630         –
Local-LSSVR, M=41 points        1.1742         –          1.1622         –
Local-LSSVR, M=61 points        0.0054         –          0.0050         –
Local-LSSVR, M=81 points        0.0028         –          0.0026         –
Table 2: Computational time of the proposed method, G-LSSVR, and Local-LSSVR with the hyperparameter set {1.5, 1000} and the overlap factor λ = 1.5 for Example 1, where T-T denotes the total computational time for building the overall process of the proposed method and L-T the computational time for constructing all SRMs.

Proposed approach
R: the number of SRMs    3         5         7         9         12        15
L-T (s)                  0.1406    0.1563    0.1875    0.2344    0.2813    0.2656
T-T (s)                  0.6406    0.8750    1.2813    1.2969    1.5469    1.7031

G-LSSVR: T-T (s) = 0.2500; L-T (s): –

Local-LSSVR (L-T: –)
M: training points       21        41        61        81
T-T (s)                  16.2500   17.8125   17.3125   18.0625
Table 3: Comparison results for different overlap factors λ with our approach for Example 1, where R denotes the number of SRMs, L-T the computational time for constructing all SRMs, and △i the number of data points in each training subset. The total number of training data points is △ = 501.

R=5      △1    △2    △3    △4    △5    △6    △7    RMSE (Training)   RMSE (Testing)   L-T (s)
λ=1.5    128   159   160   160   128   –     –     0.0146            0.0142           0.1875
λ=2.5    268   183   266   183   267   –     –     0.2086            0.2081           0.3906
R=7      △1    △2    △3    △4    △5    △6    △7
λ=1.5    101   134   134   133   101   133   133   0.0132            0.0129           0.2156
λ=2.5    147   215   222   222   147   215   223   0.1875            0.1869           0.5103
Comparison of the proposed model outputs, Global-LSSVR outputs, and testing output data for Example 1.
Taking these tables into account, we find that our approach obtains a better nonlinear function approximation than G-LSSVR, local-LSSVR, and [15] in terms of the RMSE in Table 1. In addition, the running time of the proposed method in Table 2 is at least roughly ten times shorter than that of the local-LSSVR method; that is, the proposed method requires less computational time than local-LSSVR. As shown in Table 2, local-LSSVR needs more computational time, mainly because the number of required local models is large, being equal to the size of the testing set. From Table 3 we also see that the overlap factor λ = 1.5 performs better than λ = 2.5 in both RMSE and computational time. That is to say, provided the training data are covered, a larger λ does not necessarily lead to better performance. These results confirm the superiority of the proposed method over the other methods.
Example 2.
The approximated function with two variables is

f(x_1, x_2) = sin(x_1² + x_2²) / (x_1² + x_2²).    (30)

For this function, x_1 and x_2, equally sampled on the interval [−5, 5], are used as training inputs. The number of training data obtained is 1681 (i.e., 41 × 41), and the number of test data used is 6561 (i.e., 81 × 81). For comparison, the same indexes as in Example 1 for our approach, local-LSSVR, G-LSSVR, and [15] are summarized in Tables 4 and 5. Additionally, the effect of selecting different overlap factors λ is compared in Table 6. These results show that the proposed method (FW-LSSVR) outperforms G-LSSVR, local-LSSVR, and [15].
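The two-variable benchmark (30) and its training grid can be generated as follows (an illustrative sketch; note that the equally spaced 41-point grid on [−5, 5] contains the origin, where we take the limit value 1 for sin(r²)/r²):

```python
import numpy as np

def f(x1, x2):
    """Benchmark function (30): sin(r2) / r2 with r2 = x1^2 + x2^2 (limit 1 at the origin)."""
    r2 = x1 ** 2 + x2 ** 2
    # guard the 0/0 case at the origin; elsewhere compute sin(r2)/r2 directly
    return np.where(r2 == 0, 1.0, np.sin(r2) / np.where(r2 == 0, 1.0, r2))

# 41 x 41 = 1681 training points on an equally spaced grid over [-5, 5]^2
g = np.linspace(-5.0, 5.0, 41)
X1, X2 = np.meshgrid(g, g)
y_train = f(X1, X2)
```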
Table 4: Comparison results of the proposed method, [15], G-LSSVR, and Local-LSSVR with the hyperparameter set {3, 15} and the overlap factor λ = 1.5 for Example 2.

                                RMSE (Training)           RMSE (Testing)
R: the number of SRMs           our approach   [15]       our approach   [15]
4-SRMs                          0.0179         0.0325     0.0180         0.0327
6-SRMs                          0.0180         0.0323     0.0180         0.0325
8-SRMs                          0.0177         0.0316     0.0178         0.0317
10-SRMs                         0.0175         0.0305     0.0175         0.0306
G-LSSVR                         0.0289         –          0.0284         –
Local-LSSVR, M=49 points        0.1210         –          0.1288         –
Local-LSSVR, M=81 points        0.0477         –          0.0558         –
Local-LSSVR, M=121 points       0.0249         –          0.0282         –
Local-LSSVR, M=169 points       0.0160         –          0.0190         –
Table 5: Computational time of the proposed method, G-LSSVR, and Local-LSSVR with the hyperparameter set {3, 15} and the overlap factor λ = 1.5 for Example 2, where T-T denotes the total computational time for building the overall process of the proposed method and L-T the computational time for constructing all SRMs.

Proposed approach
R: the number of SRMs    4          6          8          10
L-T (s)                  0.4688     0.7031     0.7969     1.0156
T-T (s)                  3.0469     4.8125     5.1875     6.5781

G-LSSVR: T-T (s) = 3.2969; L-T (s): –

Local-LSSVR (L-T: –)
M: training points       49         81         121        169
T-T (s)                  107.3438   115.6563   127.4063   155.1563
Table 6: Comparison results for different overlap factors λ with our approach for Example 2, where R denotes the number of SRMs, L-T the computational time for constructing all SRMs, and △i the number of data points in each training subset. The total number of training data points is △ = 1681.

R=6      △1     △2     △3     △4     △5     △6     △7     △8     RMSE (Training)   RMSE (Testing)   L-T (s)
λ=1.5    880    880    880    920    920    1000   –      –      0.0180            0.0180           0.7031
λ=3.0    1320   1320   1360   1600   1360   1600   –      –      0.0189            0.0189           1.0781
R=8      △1     △2     △3     △4     △5     △6     △7     △8
λ=1.5    760    840    760    880    840    760    760    880    0.0177            0.0178           0.7969
λ=3.0    1280   1600   1280   1280   1280   1360   1360   1600   0.0179            0.0179           1.4375
Furthermore, Figure 3 demonstrates that the predicted outputs of the local modeling approach based on LSSVR suffer from boundary effects for M = 49, 81, 121, and 169. Figure 4 gives the estimated values of the proposed FW-LSSVR approach with 4, 6, 8, and 10 SRMs. From Tables 4 and 5 we can also see that only a slightly worse training RMSE was obtained by our approach than by the local-LSSVR method with M = 169 training data points, but the running time of the local-LSSVR is significantly longer, no less than 107.3438 seconds in the experiment. From Table 6, the overlap factor λ = 1.5 performs better than λ = 3.0 in both RMSE and computational time. That is to say, provided the training data are covered, a larger λ does not necessarily lead to better performance; on the contrary, it requires a large number of training data points and more computational time to construct all SRMs. Therefore, compared with the other methods, our method achieves better testing RMSE with less computational time and also has relatively good generalization ability.
The predicted output corresponding to the local-LSSVR technique, where M=49, 81, 121, and 169.
Outputs of the proposed FW-LSSVR approach with R = 4, 6, 8, and 10 SRMs are shown, respectively.
Example 3.
In this example, the following nonlinear dynamic system is used:

y(k+1) = y(k) y(k−1) (y(k) + 2.5) / (1 + y²(k) + y²(k−1)) + sin(2πk/50) + v(k),
y(0) = 0, y(1) = 0, 1 ≤ k ≤ 100.    (31)

A total of 501 data points are obtained from (31), where v(k) is Gaussian noise with variance Var(v) = 0.25, as shown in Figure 5. The number of test data used is 1001. To compare the performance of the proposed method with the other approaches, the results and the curves are given in Table 7 and Figure 5, respectively. These results show that the proposed method outperforms G-LSSVR, local-LSSVR, and [15], and the RMSE in Table 7 indicates that the proposed method has the best generalization performance. In addition, the running time of the proposed method in Table 8 is at least roughly ten times shorter than that of the local-LSSVR method; that is, the proposed method requires less computational time than local-LSSVR. Additionally, the effect of selecting different overlap factors λ is compared in Table 9.
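The recursion (31) can be simulated directly; the following sketch (illustrative, with an assumed random seed for reproducibility) generates a trajectory from the zero initial conditions:

```python
import numpy as np

def simulate(n_steps, noise_var=0.25, seed=0):
    """Simulate the nonlinear dynamic system (31) from y(0) = y(1) = 0.

    Returns y(0), ..., y(n_steps); v(k) is Gaussian noise with the given variance."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n_steps + 1)
    for k in range(1, n_steps):
        v = rng.normal(0.0, np.sqrt(noise_var))
        y[k + 1] = (y[k] * y[k - 1] * (y[k] + 2.5)
                    / (1.0 + y[k] ** 2 + y[k - 1] ** 2)
                    + np.sin(2.0 * np.pi * k / 50.0) + v)
    return y
```

The rational term is bounded, so the trajectory stays finite and is driven mainly by the sinusoidal forcing plus noise.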
Table 7: Comparison results of the proposed method, [15], G-LSSVR, and Local-LSSVR with the hyperparameter set {1, 1000} and the overlap factor λ = 1.5 for Example 3.

                                RMSE (Training)           RMSE (Testing)
R: the number of SRMs           our approach   [15]       our approach   [15]
3-SRMs                          0.2523         0.2715     0.2546         0.2832
5-SRMs                          0.2531         0.2659     0.2661         0.2762
7-SRMs                          0.2287         0.2487     0.2537         0.2645
9-SRMs                          0.2298         0.2465     0.2464         0.2583
G-LSSVR                         0.2922         –          0.3185         –
Local-LSSVR, M=21 points        1.8583         –          1.8721         –
Local-LSSVR, M=41 points        0.2332         –          0.3383         –
Local-LSSVR, M=61 points        0.2259         –          0.3267         –
Local-LSSVR, M=81 points        0.2433         –          0.3561         –
Table 8: Computational time of the proposed method, G-LSSVR, and Local-LSSVR with the hyperparameter set {1, 1000} and the overlap factor λ = 1.5 for Example 3, where T-T denotes the total computational time for building the overall process of the proposed method and L-T the computational time for constructing all SRMs.

Proposed approach
R: the number of SRMs    3          5          7          9
L-T (s)                  0.2500     0.2813     0.2656     0.3594
T-T (s)                  0.3281     1.0938     1.3438     1.5313

G-LSSVR: T-T (s) = 0.1875; L-T (s): –

Local-LSSVR (L-T: –)
M: training points       49         81         121        169
T-T (s)                  16.5469    16.7813    17.9063    18.1719
Table 9: Comparison results for different overlap factors λ with our approach for Example 3, where R denotes the number of SRMs, L-T the computational time for constructing all SRMs, and △i the number of data points in each training subset. The total number of training data points is △ = 501.

R=5      △1    △2    △3    △4    △5    △6    △7    RMSE (Training)   RMSE (Testing)   L-T (s)
λ=1.1    106   106   118   118   117   –     –     0.2531            0.0142           0.2813
λ=2.5    268   183   266   183   267   –     –     0.2756            0.2881           0.3438
R=7      △1    △2    △3    △4    △5    △6    △7
λ=1.1    83    99    98    97    98    98    83    0.2287            0.2537           0.2656
λ=2.5    222   147   147   215   222   221   215   0.2818            0.2894           0.4063
Comparison of the proposed model outputs and original output data for Example 3.
Example 4.
In this section, 296 data points from the real Box-Jenkins gas furnace system [32] are applied to the proposed method. These data points consist of the gas flow rate signal x(k) and the CO2 concentration, described as the output y(k). Figure 6 shows the training data, including the input signal x(k) and the output signal y(k). To identify the model, we choose x_k = [x(k), x(k−1), y(k−1), y(k−2)] as the input variables and y(k) as the output variable. In this example, 5-fold cross-validation is employed to evaluate the performance. Under cross-validation, the (training/testing) RMSE and the computational time of the global LSSVR (G-LSSVR) approach, the local-LSSVR (L-LSSVR) approach with M = 41, 61, 81, and the proposed approach with R = 4, 8, 12 SRMs are summarized in Tables 10 and 11. From Tables 10 and 11, the training RMSE of our technique is slightly larger than that of the other approaches, but the testing RMSE of our approach is smaller, and the running time of our technique is shorter than that of the other local modeling approaches, though a little longer than that of the global LSSVR-based technique. Figure 7 compares the actual output with the predicted output of our technique. The effect of selecting different overlap factors λ is also compared in Table 12. As shown in Table 12, although the training RMSE for the overlap factor λ = 3.5 is less than that for λ = 2.8, the testing RMSE for λ = 2.8 is less than that for λ = 3.5. That is to say, the generalization performance with a relatively smaller λ outperforms that with a larger value. Additionally, in comparison with [15], the proposed method utilizes fewer hyperparameters to construct the model, and the overlap factor λ can be chosen relatively small to further reduce computational time.
Table 10: Comparison results of the proposed method, [15], G-LSSVR, and Local-LSSVR with the hyperparameter set {25, 1000} and the overlap factor λ = 2.8 for Example 4.

                                RMSE (Training)           RMSE (Testing)
R: the number of SRMs           our approach   [15]       our approach   [15]
4-SRMs                          0.2544         0.1589     0.3304         0.5163
8-SRMs                          0.2506         0.1365     0.3321         0.4861
12-SRMs                         0.2491         0.1248     0.3338         0.4218
G-LSSVR                         0.2480         –          0.3355         –
Local-LSSVR, M=41 points        0.3328         –          0.3805         –
Local-LSSVR, M=61 points        0.3234         –          0.3666         –
Local-LSSVR, M=81 points        0.3237         –          0.3366         –
Table 11: Computational time of the proposed method, G-LSSVR, and Local-LSSVR with the hyperparameter set {25, 1000} and the overlap factor λ = 2.8 for Example 4, where T-T denotes the total computational time for building the overall process of the proposed method and L-T the computational time for constructing all SRMs.

Proposed approach
R: the number of SRMs    4          8          12
L-T (s)                  1.5938     2.1407     3.3437
T-T (s)                  2.4375     3.6875     4.8438

G-LSSVR: T-T (s) = 0.8281; L-T (s): –

Local-LSSVR (L-T: –)
M: training points       41         61         81
T-T (s)                  25.4375    25.7813    26.3906
Comparison of different overlap factors λ for the proposed approach on Example 4, where R is the number of SRMs, L-T the computational time for constructing all SRMs, and △i the number of data points in the i-th training subset.

Total training data points △ = 296, λ = 2.8, R = 4
Fold                  △1     △2     △3     △4     RMSE (training)   RMSE (testing)   L-T (s)
CV1                   162    197    194    160    0.2236            0.4429           0.2969
CV2                   155    190    190    161    0.2801            0.2374           0.3438
CV3                   196    191    163    148    0.2883            0.2004           0.3750
CV4                   146    162    196    176    0.2563            0.3282           0.2656
CV5                   197    162    160    194    0.2236            0.4429           0.3125
(CV1+CV2+CV3+CV4+CV5)/5                           0.2544            0.3304           1.5938

Total training data points △ = 296, λ = 3.5, R = 4
Fold                  △1     △2     △3     △4     RMSE (training)   RMSE (testing)   L-T (s)
CV1                   208    193    219    217    0.2151            0.4381           0.4219
CV2                   215    213    197    200    0.2637            0.2728           0.3125
CV3                   218    222    200    192    0.2761            0.2066           0.3906
CV4                   196    183    218    205    0.2465            0.3437           0.3438
CV5                   227    219    208    193    0.2151            0.4381           0.2969
(CV1+CV2+CV3+CV4+CV5)/5                           0.2433            0.3399           1.7747
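Each SRM in the tables above is obtained by solving the standard LSSVR linear system in the dual variables. The following is a minimal sketch, not the authors' exact implementation; the RBF kernel convention K(x, z) = exp(-||x - z||²/σ²) and the default hyperparameters {σ², γ} = {25, 1000} are assumptions taken from the experimental setup:

```python
import numpy as np

def _rbf(A, B, sigma2):
    # RBF kernel K(x, z) = exp(-||x - z||^2 / sigma2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma2)

def lssvr_fit(X, y, gamma=1000.0, sigma2=25.0):
    """Solve the LSSVR dual system
       [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = _rbf(X, X, sigma2) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return {"b": sol[0], "alpha": sol[1:], "X": X, "sigma2": sigma2}

def lssvr_predict(model, Xq):
    # f(x) = sum_i alpha_i K(x, x_i) + b
    Xq = np.atleast_2d(np.asarray(Xq, dtype=float))
    K = _rbf(Xq, model["X"], model["sigma2"])
    return K @ model["alpha"] + model["b"]
```

Because the constraints are equalities, training reduces to one linear solve per subset, which is the source of the small L-T values compared with Local-LSSVR.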
The data set of Box and Jenkins is also used to validate the proposed method.
Comparison of the proposed model outputs and the original output data for the Box and Jenkins example.
5. Conclusion
In this paper, a fuzzy weighted least squares support vector regression (FW-LSSVR) method for nonlinear system modeling has been proposed and illustrated, combining the advantages of a fuzzy weighted mechanism with ideas from LSSVR. Because the training subsets are mutually independent, all SRMs can be constructed simultaneously, which largely reduces computational time. The experimental results show that the proposed approach offers a better trade-off between computational time and modeling accuracy than local or global modeling methods: its run time is smaller than that of other local modeling approaches, and only slightly larger than that of global LSSVR-based modeling, while its modeling accuracy improves considerably on the other techniques. Furthermore, in comparison with SVR, the proposed method uses fewer hyperparameters to construct the model, and the overlap factor λ can be chosen smaller than in SVR to further reduce computational time.
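The fuzzy weighted mechanism summarized above combines the R subset regression models into a single overall output by normalizing the fuzzy memberships of the query point and taking the weighted sum of the SRM predictions. A minimal sketch of this combination step, under the assumption of standard normalized fuzzy weighting:

```python
import numpy as np

def fuzzy_weighted_output(memberships, srm_outputs):
    """Overall model output y(x) = sum_r mu_r(x) * y_r(x) / sum_r mu_r(x).

    memberships: length-R fuzzy memberships of the query point in each
                 region (e.g., from Gustafson-Kessel clustering)
    srm_outputs: length-R predictions of the subset regression models
    """
    mu = np.asarray(memberships, dtype=float)
    yr = np.asarray(srm_outputs, dtype=float)
    return float(mu @ yr / mu.sum())
```

At the boundary between two regions both memberships are significant, so the output blends the neighboring SRMs smoothly, which is how the method avoids the boundary effects of purely local LSSVR.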
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding this work, and no commercial or associative interest that represents a conflict of interest in connection with the work submitted.
Acknowledgments
The research was partially funded by the Training Program of High Level Innovative Talents of Guizhou ([2017]3, [2017]19), the Guizhou Province Natural Science Foundation in China (KY[2016]018, KY[2016]254), the Science and Technology Research Foundation of Hunan Province (13C333), the Science and Technology Project of Guizhou ([2017]1207, [2018]1179), and the Doctoral Fund of Zunyi Normal University (BS[2015]13, BS[2015]04).
References
[1] Zhang D., Zhou Z., Jia X., "Networked fuzzy output feedback control for discrete-time Takagi-Sugeno fuzzy systems with sensor saturation and measurement noise."
[2] De Gregorio M., Giordano M., "Background estimation by weightless neural networks."
[3] Li K., Peng J.-X., Bai E.-W., "A two-stage algorithm for identification of nonlinear dynamic systems."
[4] Suykens J., Lukas L., Van Dooren P., De Moor B., Vandewalle J., "Least squares support vector machine classifiers: a large scale algorithm," in Proceedings of the European Conference on Circuit Theory and Design (ECCTD '99), 1999, pp. 839-842.
[5] Burges C. J. C., "A tutorial on support vector machines for pattern recognition."
[6] van Gestel T., Suykens J. A. K., Baesens B., Viaene S., Vanthienen J., Dedene G., de Moor B., Vandewalle J., "Benchmarking least squares support vector machine classifiers."
[7] Zhang L., Zhou W.-D., Chang P.-C., Yang J.-W., Li F.-Z., "Iterated time series prediction with multiple support vector regression models."
[8] Quan T., Liu X., Liu Q., "Weighted least squares support vector machine local region method for nonlinear time series prediction."
[9] Falck T., Dreesen P., de Brabanter K., Pelckmans K., de Moor B., Suykens J. A. K., "Least-squares support vector machines for the identification of Wiener-Hammerstein systems."
[10] Bako L., Mercère G., Lecoeuche S., Lovera M., "Recursive subspace identification of Hammerstein models based on least squares support vector machines."
[11] Sun B.-Y., Huang D.-S., Fang H.-T., "Lidar signal denoising using least-squares support vector machine."
[12] Sun B.-Y., Huang D.-S., Fang H.-T., Yang X.-M., "A novel robust regression approach of lidar signal based on modified least squares support vector machine."
[13] Zheng H. B., Liao R. J., Grzybowski S., Yang L. J., "Fault diagnosis of power transformers using multi-class least square support vector machines classifiers with particle swarm optimization."
[14] De Brabanter K., De Brabanter J., Suykens J. A. K., De Moor B., "Approximate confidence and prediction intervals for least squares support vector regression."
[15] Chuang C.-C., "Fuzzy weighted support vector regression with a fuzzy partition."
[16] Ouyang A., Peng X., Liu Y., Fan L., Li K., "An efficient hybrid algorithm based on HS and SFLA."
[17] Miranian A., Abdollahzade M., "Developing a local least-squares support vector machines-based neuro-fuzzy model for nonlinear and chaotic time series prediction."
[18] Cleveland W. S., Devlin S. J., "Locally weighted regression: an approach to regression analysis by local fitting."
[19] Kecman V., Yang T., "Adaptive local hyperplane for regression tasks," in Proceedings of the 2009 International Joint Conference on Neural Networks (IJCNN 2009), June 2009, USA, pp. 1566-1570.
[20] Ouyang A., Peng X., Liu J., Sallam A., "Hardware/software partitioning for heterogeneous MPSoC considering communication overhead."
[21] Elattar E. E., Goulermas J., Wu Q. H., "Electric load forecasting based on locally weighted support vector regression."
[22] Yang H., Huang K., King I., Lyu M. R., "Localized support vector regression for time series prediction."
[23] Jiang H., He W., "Grey relational grade in local support vector regression for financial time series prediction."
[24] Wang F., "A general learning framework using local and global regularization."
[25] Suykens J. A. K., Vandewalle J., "Least squares support vector machine classifiers."
[26] Gustafson D. E., Kessel W. C., "Fuzzy clustering with a fuzzy covariance matrix," in Proceedings of the IEEE Conference on Decision and Control including the 17th Symposium on Adaptive Processes, January 1979, San Diego, Calif, USA, pp. 761-766, doi: 10.1109/CDC.1978.268028.
[27] Liu X., Fang H., Chen Z., "A novel cost function based on decomposing least-square support vector machine for Takagi-Sugeno fuzzy system identification."
[28] Zadeh L. A., "Fuzzy sets."
[29] Leski J. M., "TSK-fuzzy modeling based on ε-insensitive learning."
[30] Chiu S. L., "Fuzzy model identification based on cluster estimation."
[31] Takagi T., Sugeno M., "Fuzzy identification of systems and its applications to modeling and control."
[32] Box G. E. P., Jenkins G. M.