An optimal weight learning machine with growth of hidden nodes and incremental learning (OWLM-GHNIL) is given by adding random hidden nodes to single hidden layer feedforward networks (SLFNs) one by one or group by group. During the growth of the networks, input weights and output weights are updated incrementally, which can implement conventional optimal weight learning machine (OWLM) efficiently. The simulation results and statistical tests also demonstrate that the OWLM-GHNIL has better generalization performance than other incremental type algorithms.
1. Introduction
Feedforward neural networks (FNNs) have been extensively used in classification applications and regressions [1]. As a specific type of FNNs, single hidden layer feedforward networks with additive models can approximate any target continuous function [2]. Owing to excellent learning capabilities and fault tolerant abilities, SLFNs play an important role in practical applications and have been investigated extensively in both theory and application aspects [3–7].
Perhaps the most popular training methods for SLFN classifiers in recent years have been gradient-based back-propagation (BP) algorithms [8–12]. BP algorithms can easily be implemented recursively in real time; however, their slow convergence is a bottleneck where fast training is essential. In [13], a novel learning machine named the extreme learning machine (ELM) was proposed for training SLFNs, in which the learning parameters of the hidden nodes, including input weights and biases, are randomly assigned and need not be tuned, while the output weights are obtained by a simple generalized inverse computation. It has been proven that, even without updating the parameters of the hidden layer, SLFNs with randomly generated hidden neurons and tunable output weights maintain their universal approximation capability and excellent generalization performance [14]. ELM has been shown to be faster than most state-of-the-art training algorithms for SLFNs and has been applied widely to practical problems such as classification, regression, clustering, recognition, and relevance ranking [15, 16].
Since the input weights and hidden layer biases of SLFNs trained with ELM are randomly assigned, minor changes in the input vectors may cause large changes in the hidden layer output matrix of the SLFNs, which in turn lead to large changes in the output weight matrix. According to statistical learning theory [17–21], large changes in the output weight matrix greatly increase both the structural and empirical risks of the SLFNs, which decreases the robustness of the SLFNs with regard to input disturbances. Indeed, simulations have shown that SLFNs trained with ELM sometimes exhibit poor generalization performance and robustness with regard to input disturbances. In view of this, the OWLM [22] was proposed, in which both the input weights and output weights of the SLFNs are globally optimized with batch-type least squares. All feature vectors of the classifier can then be placed at prescribed positions in feature space such that the separability of nonlinearly separable patterns is maximized, and better generalization performance can be achieved compared with conventional ELM.
However, one major issue remains in the OWLM: it incurs more computational cost than ELM, since the input weights of SLFNs trained with the OWLM are not randomly selected. With the advent of the big data age, data sets have become larger and more complex [23, 24], which further reduces the learning efficiency of the OWLM. To implement the OWLM efficiently, this paper proposes an incremental learning machine referred to as the optimal weight learning machine with growth of hidden nodes and incremental learning (OWLM-GHNIL). Whenever new nodes are added, the input weights and output weights are updated incrementally, which implements the conventional OWLM algorithm efficiently. At the same time, owing to the advantages of the OWLM, the OWLM-GHNIL has better generalization performance than other incremental algorithms such as EM-ELM [25] (an approach that automatically determines the number of hidden nodes in generalized single hidden layer feedforward networks) and I-ELM [14], which adds random hidden nodes to SLFNs one at a time.
The rest of this paper is organized as follows. In Section 2, the OWLM is briefly described. In Section 3, we present the OWLM-GHNIL in detail and analyze its computational complexity. Simulation results are presented in Section 4, showing that the proposed approach is more efficient and has better generalization performance than some existing methods. In Section 5, we give the conclusion.
2. Brief of the Optimal Weight Learning Machine
In this section, we briefly describe the OWLM.
For N given input pattern vectors x_1(k), x_2(k), …, x_N(k) and N corresponding desired output vectors o_1(k), o_2(k), …, o_N(k), the N linear output equations of the SLFNs in Figure 1 can be written as
$$H\beta = O, \tag{1}$$
where
$$H = [x_1(k)\; x_2(k)\; \cdots\; x_N(k)]^T W = \begin{bmatrix} x_1^T w_1 & x_1^T w_2 & \cdots & x_1^T w_{\tilde N} \\ x_2^T w_1 & x_2^T w_2 & \cdots & x_2^T w_{\tilde N} \\ \vdots & \vdots & & \vdots \\ x_N^T w_1 & x_N^T w_2 & \cdots & x_N^T w_{\tilde N} \end{bmatrix} = XW, \tag{2}$$
with
$$X = [x_1(k), x_2(k), \ldots, x_N(k)]^T, \quad x_i(k) = [x_{i1}(k), x_{i2}(k), \ldots, x_{in}(k)], \tag{3}$$
the input weight matrix
$$W = [w_1, w_2, \ldots, w_{\tilde N}], \quad w_i = [w_{i1}, w_{i2}, \ldots, w_{in}]^T, \tag{4}$$
and the output weight matrix
$$\beta = [\beta_1, \beta_2, \ldots, \beta_m], \quad \beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{i\tilde N}]^T. \tag{5}$$
Figure 1: A single hidden layer neural network with linear nodes and an input tapped delay line.
Let y_1, y_2, …, y_N be the N feature vectors corresponding to the input data vectors x_1, x_2, …, x_N. Then we have
$$[y_1\; y_2\; y_3\; \cdots\; y_N] = W^T [x_1\; x_2\; x_3\; \cdots\; x_N], \tag{6}$$
or
$$Y = W^T X. \tag{7}$$
Let the N reference feature vectors be described by
$$Y_d = [y_{d1}\; y_{d2}\; y_{d3}\; \cdots\; y_{dN}]. \tag{8}$$
Generally, as described in [22], the selection of the N desired feature vectors in (8) mainly depends on the characteristics of the input vectors. By optimizing the input weights of the SLFNs in Figure 1, the OWLM can place the feature vectors of the SLFNs at the “desired position” in feature space. The purpose of the assignment is to further maximize the separability of the vectors in the feature space so that the generalization performance and robustness, seen from the output layer of the SLFNs, can be greatly improved, compared with the SLFNs trained with ELM.
The design of the optimal input weights of the SLFNs can be formulated as the following optimization problem:
$$\text{Minimize } \frac{1}{2}\|\varepsilon_f\|^2 + \lambda_1 \frac{1}{2}\|W\|^2 \quad \text{subject to } \varepsilon_f = Y_d - Y = Y_d - W^T X, \tag{9}$$
where λ_1 is a positive real regularization parameter.
The optimal input weight matrix is derived as
$$W = (\lambda_1 I + X X^T)^{-1} X Y_d^T. \tag{10}$$
Similarly, to minimize the error between the desired output pattern T and the actual output pattern O, the design of the optimal output weights of the SLFNs can be formulated as the following optimization problem:
$$\text{Minimize } \frac{1}{2}\|\varepsilon\|^2 + \lambda_2 \frac{1}{2}\|\beta\|^2 \quad \text{subject to } \varepsilon = O - T = H\beta - T, \tag{11}$$
where λ_2 is a positive real regularization parameter.
The optimal output weight matrix is derived as
$$\beta = (\lambda_2 I + H^T H)^{-1} H^T T. \tag{12}$$
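As a minimal numerical sketch of the two closed-form solutions (10) and (12), the Python snippet below computes both weight matrices via regularized least squares. The dimensions and the randomly generated data standing in for a real training set are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: d input features, N samples, m hidden nodes, h outputs.
d, N, m, h = 5, 200, 10, 2
lam1, lam2 = 1e-3, 1e-3            # regularization parameters lambda_1, lambda_2

X = rng.standard_normal((d, N))    # input patterns, one per column
Yd = rng.standard_normal((m, N))   # desired feature vectors, eq. (8)
T = rng.standard_normal((N, h))    # desired output patterns

# Optimal input weights, eq. (10): W = (lambda_1 I + X X^T)^{-1} X Yd^T
W = np.linalg.solve(lam1 * np.eye(d) + X @ X.T, X @ Yd.T)    # d x m

# Hidden layer output matrix: H = X^T W
H = X.T @ W                                                   # N x m

# Optimal output weights, eq. (12): beta = (lambda_2 I + H^T H)^{-1} H^T T
beta = np.linalg.solve(lam2 * np.eye(m) + H.T @ H, H.T @ T)   # m x h
```

Using `np.linalg.solve` instead of forming the explicit inverse is the standard numerically stable way to evaluate these expressions.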
The optimal weight learning machine [22] can be summarized as follows.
Algorithm OWLM. Given a training set {(x_i, t_i)}_{i=1}^N and hidden node number Ñ, do the following steps.

Step 1. Randomly generate an initial input weight matrix W.

Step 2. Calculate the hidden layer output matrix H by (2).

Step 3. Calculate the input weight matrix W_new by (10).

Step 4. Recalculate the hidden layer output matrix H_new using W_new.

Step 5. Calculate the output weight matrix β by (12).
Obviously, the OWLM needs more training time than ELM, since it requires additional computation for the input weight matrix.
3. Growing Hidden Nodes and Incrementally Updating Weights
Given SLFNs with m initial hidden nodes and a training set {(x_i, t_i)}_{i=1}^N, let N be the number of input patterns, l the length of the input patterns, and h the length of the output patterns.
We have
$$W_0 = (X X^T + \lambda_1 I)^{-1} X Y_{d0}^T, \tag{13}$$
$$H_0 = X^T W_0, \tag{14}$$
$$\beta_0 = (H_0^T H_0 + \lambda_2 I)^{-1} H_0^T T, \tag{15}$$
where Y_{d0} is an m×N matrix consisting of the N desired feature vectors.
Let $E(H_i) = \min_{\beta_i} \|H_i \beta_i - T\|$ be the network output error. If E(H_0) is less than the target error ε > 0, no new hidden nodes need to be added and the learning procedure is complete. Otherwise, we add n new nodes to the existing SLFNs; then
$$W_1 = (X X^T + \lambda_1 I)^{-1} X \begin{bmatrix} Y_{d0} \\ Y_{d1} \end{bmatrix}^T = \left[ W_0,\; (X X^T + \lambda_1 I)^{-1} X Y_{d1}^T \right] = [W_0,\; \Delta W_0], \tag{16}$$
where Y_{d1} is an n×N matrix consisting of the N desired feature vectors of the new nodes. Then
$$H_1 = X^T [W_0,\; \Delta W_0] = [H_0,\; X^T \Delta W_0] = [H_0,\; \Delta H_0],$$
$$\beta_1 = \left([H_0, \Delta H_0]^T [H_0, \Delta H_0] + \lambda_2 I\right)^{-1} [H_0, \Delta H_0]^T T = \begin{bmatrix} H_0^T H_0 + \lambda_2 I & H_0^T \Delta H_0 \\ \Delta H_0^T H_0 & \Delta H_0^T \Delta H_0 + \lambda_2 I \end{bmatrix}^{-1} \begin{bmatrix} H_0^T T \\ \Delta H_0^T T \end{bmatrix}. \tag{17}$$
The Schur complement
$$P = \Delta H_0^T \Delta H_0 + \lambda_2 I - \Delta H_0^T H_0 (H_0^T H_0 + \lambda_2 I)^{-1} H_0^T \Delta H_0$$
of $H_0^T H_0 + \lambda_2 I$ is invertible for a suitable choice of λ_2. Then, writing $A = H_0^T H_0 + \lambda_2 I$ and using the result on the inversion of 2×2 block matrices [26], we get
$$\begin{bmatrix} H_0^T H_0 + \lambda_2 I & H_0^T \Delta H_0 \\ \Delta H_0^T H_0 & \Delta H_0^T \Delta H_0 + \lambda_2 I \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + A^{-1} H_0^T \Delta H_0 P^{-1} \Delta H_0^T H_0 A^{-1} & -A^{-1} H_0^T \Delta H_0 P^{-1} \\ -P^{-1} \Delta H_0^T H_0 A^{-1} & P^{-1} \end{bmatrix}; \tag{18}$$
then, since $\beta_0 = A^{-1} H_0^T T$,
$$\beta_1 = \begin{bmatrix} \beta_0 + A^{-1} H_0^T \Delta H_0 P^{-1} \Delta H_0^T H_0 \beta_0 - A^{-1} H_0^T \Delta H_0 P^{-1} \Delta H_0^T T \\ -P^{-1} \Delta H_0^T H_0 \beta_0 + P^{-1} \Delta H_0^T T \end{bmatrix} = \begin{bmatrix} \beta_0 + Q_0 - R_0 \\ V_0 - U_0 \end{bmatrix}. \tag{19}$$
To save computational cost, Q_0, R_0, U_0, and V_0 in (19) should be computed in the following sequence:
$$U_0 = P^{-1} \Delta H_0^T H_0 \beta_0, \quad Q_0 = A^{-1} H_0^T \Delta H_0 U_0, \quad V_0 = P^{-1} \Delta H_0^T T, \quad R_0 = A^{-1} H_0^T \Delta H_0 V_0. \tag{20}$$
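The correctness of the Schur-complement update can be checked numerically. The sketch below (all dimensions, regularization values, and random data are illustrative assumptions) computes β_1 once by the incremental formulas (16)–(20) and once by direct recomputation, and the two agree:

```python
import numpy as np

rng = np.random.default_rng(1)
d, N, m, n, h = 4, 100, 6, 3, 2   # n new hidden nodes added to m existing ones
lam1, lam2 = 1e-2, 1e-2

X = rng.standard_normal((d, N))
Yd0 = rng.standard_normal((m, N))   # desired features, existing nodes
Yd1 = rng.standard_normal((n, N))   # desired features, new nodes
T = rng.standard_normal((N, h))

A_in = lam1 * np.eye(d) + X @ X.T
W0 = np.linalg.solve(A_in, X @ Yd0.T)       # eq. (13)
dW0 = np.linalg.solve(A_in, X @ Yd1.T)      # eq. (16): W1 = [W0, dW0]
H0, dH0 = X.T @ W0, X.T @ dW0               # eq. (17): H1 = [H0, dH0]

A = H0.T @ H0 + lam2 * np.eye(m)
B = H0.T @ dH0
beta0 = np.linalg.solve(A, H0.T @ T)        # eq. (15)

# Schur complement of A, eq. (18)
P = dH0.T @ dH0 + lam2 * np.eye(n) - B.T @ np.linalg.solve(A, B)

# Incremental update in the order of eq. (20)
U0 = np.linalg.solve(P, dH0.T @ H0 @ beta0)
Q0 = np.linalg.solve(A, B @ U0)
V0 = np.linalg.solve(P, dH0.T @ T)
R0 = np.linalg.solve(A, B @ V0)
beta1 = np.vstack([beta0 + Q0 - R0, V0 - U0])   # eq. (19)

# Direct recomputation for comparison
H1 = np.hstack([H0, dH0])
beta1_direct = np.linalg.solve(H1.T @ H1 + lam2 * np.eye(m + n), H1.T @ T)
```

Here `beta1` and `beta1_direct` coincide up to floating-point error, which is exactly the algebraic identity behind the incremental update.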
Given the number of initial hidden nodes m0, the maximum number of hidden nodes mmax, and the expected output error ε, the OWLM-GHNIL for the SLFNs with the mechanism of growing hidden nodes can be summarized as the following two steps.
Algorithm OWLM-GHNIL
Step 1 (initialization step).
Compute W0,H0,β0 by (13)–(15).
Compute the corresponding output error $E(H_0) = \min_{\beta} \|H_0 \beta - T\|$.
Step 2 (recursively incremental step).
Let k = 0. While m_k < m_max and E(H_k) > ε:

k = k + 1;

randomly add n (n need not be kept constant) hidden nodes to the existing network; then W_k, H_k, and β_k can be calculated by (16)–(19).

End while.
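The two steps above can be sketched as a short Python loop. The data, sizes, and growth increment are illustrative assumptions; the `grow` helper applies the incremental formulas (16)–(19) instead of recomputing the weights from scratch:

```python
import numpy as np

rng = np.random.default_rng(2)
d, N, h = 4, 300, 1
lam1, lam2 = 1e-2, 1e-2
m0, m_max, eps = 4, 32, 1e-3        # initial nodes, node cap, target error

X = rng.standard_normal((d, N))
T = rng.standard_normal((N, h))

A_in = lam1 * np.eye(d) + X @ X.T    # fixed: factored once and reused (Sec. 3)

def grow(H0, beta0, n):
    """Add n random hidden nodes and update beta incrementally, eqs. (16)-(20)."""
    Yd1 = rng.standard_normal((n, N))             # desired features of new nodes
    dH0 = X.T @ np.linalg.solve(A_in, X @ Yd1.T)  # new hidden-layer columns
    A = H0.T @ H0 + lam2 * np.eye(H0.shape[1])
    B = H0.T @ dH0
    P = dH0.T @ dH0 + lam2 * np.eye(n) - B.T @ np.linalg.solve(A, B)
    U0 = np.linalg.solve(P, dH0.T @ H0 @ beta0)
    Q0 = np.linalg.solve(A, B @ U0)
    V0 = np.linalg.solve(P, dH0.T @ T)
    R0 = np.linalg.solve(A, B @ V0)
    return np.hstack([H0, dH0]), np.vstack([beta0 + Q0 - R0, V0 - U0])

# Step 1 (initialization): m0 nodes, eqs. (13)-(15)
Yd0 = rng.standard_normal((m0, N))
H = X.T @ np.linalg.solve(A_in, X @ Yd0.T)
beta = np.linalg.solve(H.T @ H + lam2 * np.eye(m0), H.T @ T)

# Step 2 (recursively incremental): grow while the error is too large
while H.shape[1] < m_max and np.linalg.norm(H @ beta - T) > eps:
    H, beta = grow(H, beta, n=4)
```

Because each update is exact algebra, the final `beta` equals the batch OWLM solution for the grown network; only the cost per step differs.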
Unlike the conventional OWLM, which must recalculate the input weight matrix and output weight matrix whenever the network architecture changes, the OWLM-GHNIL only updates the input weight matrix and output weight matrix incrementally each time; this is why it reduces the computational complexity significantly. Moreover, the convergence of the OWLM-GHNIL is guaranteed by the convergence theorem in [25].
We now analyze the computational complexity of the update procedure.
The computational complexity considered here is the total number of required scalar multiplications. Some matrix computations, including the inversion of the matrix in (13), need not be repeated, since they were already obtained in the process of computing W_k, H_k, β_k. The updates then require lNn, lNn, and 2m_k n² + m_k nN + 3m_k nh + 2m_k² n + 2n²N + nNh + n³ multiplications for W_{k+1}, H_{k+1}, and β_{k+1}, respectively. Thus, the total computational complexity for the weights W_{k+1} and β_{k+1} is
$$C_{\text{OWLM-GHNIL}} = 2lNn + 2m_k n^2 + m_k nN + 3m_k nh + 2m_k^2 n + 2n^2 N + nNh + n^3. \tag{21}$$
If we compute W_{k+1} and β_{k+1} by (10) and (12) directly, it costs
$$C_{\text{OWLM}} = l^3 + 2l^2 N + 2lN(m_k + n) + 2(m_k + n)^2 N + (m_k + n)^3 + (m_k + n)Nh$$
multiplications.
Since in most applications m_k and n are much smaller than the number of training samples N (m_k, n ≪ N), and h and l are often small in practice, with the growth of m_k and n,
$$\frac{C_{\text{OWLM}}}{C_{\text{OWLM-GHNIL}}} \approx \frac{2(m_k + n)^2}{m_k n + 2n^2}; \tag{22}$$
when n = (1/2)m_k,
$$\frac{C_{\text{OWLM}}}{C_{\text{OWLM-GHNIL}}} \approx 4.5; \tag{23}$$
when n = (1/4)m_k,
$$\frac{C_{\text{OWLM}}}{C_{\text{OWLM-GHNIL}}} \approx 8.3. \tag{24}$$
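The dominant-term ratio in (22) can be verified with a few lines of Python; the concrete value of m_k below is an arbitrary assumption, since only the ratio n/m_k matters:

```python
def ratio(mk, n):
    # Dominant terms of eq. (22): C_OWLM / C_OWLM-GHNIL ~ 2(mk + n)^2 / (mk*n + 2n^2)
    # (valid for large N, with h and l small relative to mk and n)
    return 2 * (mk + n) ** 2 / (mk * n + 2 * n ** 2)

print(round(ratio(1000, 500), 2))   # n = mk/2 -> 4.5, matching eq. (23)
print(round(ratio(1000, 250), 2))   # n = mk/4 -> 8.33, matching eq. (24)
```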
It can be seen that the OWLM-GHNIL is much more efficient than the conventional OWLM in such cases.
Similarly, we can obtain the computational complexities of the EM-ELM and ELM, respectively:
$$C_{\text{EM-ELM}} = lNn + 2m_k n^2 + m_k nN + 3m_k nh + 2m_k^2 n + 2n^2 N + nNh + n^3,$$
$$C_{\text{ELM}} = lN(m_k + n) + 2(m_k + n)^2 N + (m_k + n)^3 + (m_k + n)Nh. \tag{25}$$
Then we have
$$C_{\text{OWLM-GHNIL}} - C_{\text{EM-ELM}} = lNn, \qquad C_{\text{OWLM}} - C_{\text{ELM}} = l^3 + 2l^2 N + lN(m_k + n). \tag{26}$$
Obviously, the difference on computational complexity between OWLM-GHNIL and EM-ELM is much less than the difference between OWLM and ELM.
4. Simulation Results
In our experiments, all algorithms are run in the following computer environment: (1) operating system: Windows 7 Enterprise; (2) CPU: Intel i5-3570, 3.8 GHz; (3) memory: 8 GB; (4) simulation software: Matlab R2013a.
The performance of the OWLM-GHNIL has been compared with other growing algorithms including the EM-ELM, I-ELM, and the conventional OWLM.
In order to investigate the performance of the proposed OWLM-GHNIL, some benchmark problems are presented in this section.
The OWLM-GHNIL, EM-ELM, and OWLM have first been run to approximate the artificial "SinC" function, a popular benchmark for illustrating neural networks:
$$y(x) = \begin{cases} \dfrac{\sin x}{x}, & x \neq 0, \\ 1, & x = 0. \end{cases} \tag{27}$$
A training set and a testing set, each with 5000 samples, are generated from the interval (−10, 10); random noise uniformly distributed in [−0.2, 0.2] is added to the training data, while the testing data remain noise-free. The performance of each algorithm is shown in Figures 2 and 3. In this case, the initial SLFNs are given five hidden nodes, and one new hidden node is added at each step until 30 hidden nodes are reached.
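For reference, data sets of this form can be generated as in the following sketch; the random seed and the exact sampling calls are assumptions consistent with the description above:

```python
import numpy as np

rng = np.random.default_rng(42)

def sinc(x):
    # eq. (27): y(x) = sin(x)/x for x != 0, and y(0) = 1
    # (inner where avoids a divide-by-zero at x == 0)
    return np.where(x == 0, 1.0, np.sin(x) / np.where(x == 0, 1.0, x))

# 5000 training samples on (-10, 10) with uniform noise in [-0.2, 0.2];
# the 5000 testing samples are left noise-free
x_train = rng.uniform(-10, 10, 5000)
y_train = sinc(x_train) + rng.uniform(-0.2, 0.2, 5000)
x_test = rng.uniform(-10, 10, 5000)
y_test = sinc(x_test)
```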
Figure 2: Average testing RMSE of different algorithms in the SinC case.
Figure 3: Computational complexity comparison of different algorithms in the SinC case.
It can be seen from Figure 2 that the OWLM and the OWLM-GHNIL obtain similar, lower testing root mean square errors (RMSE) than the EM-ELM in most cases. Figure 3 shows the training time comparison of the three methods in the SinC case. We can see that, with the growth of hidden nodes, the OWLM-GHNIL spends a similar training time to the EM-ELM but much less than the OWLM for the same number of nodes.
In the following, nine real benchmark problems, including five regression applications and four classification applications, are used for further comparison; all of them are available on the Web. For each case, the training and testing data sets are randomly generated from the whole data set before each trial, and average results are obtained over 30 trials for all cases. The features of the benchmark data sets are summarized in Table 1.
Table 1: Details of the data sets used for regression and classification.

| Data sets | Classes | Attributes | Types | # of training data | # of testing data |
|---|---|---|---|---|---|
| Delta Ailerons | — | 6 | Regression | 3000 | 4129 |
| Delta Elevators | — | 6 | Regression | 4517 | 5000 |
| California Housing | — | 8 | Regression | 8000 | 12,460 |
| Computer activity | — | 8 | Regression | 4000 | 4192 |
| Bank domains | — | 8 | Regression | 4500 | 3692 |
| COLL20 | 20 | 1024 | Classification | 1080 | 360 |
| G50C | 2 | 50 | Classification | 414 | 136 |
| USPST(B) | 2 | 256 | Classification | 1509 | 498 |
| Satimage | 6 | 36 | Classification | 3217 | 3218 |
The generalization performance comparison between the OWLM-GHNIL and two other popular incremental ELM-type algorithms, EM-ELM and I-ELM, on the regression and classification cases is given in Tables 2 and 3. In the implementation of the EM-ELM and OWLM-GHNIL, the initial SLFNs are given 50 hidden nodes, and 25 new hidden nodes are added at each step until 150 hidden nodes are reached. In the case of the I-ELM, the initial SLFNs are given 1 hidden node, and hidden nodes are then added one by one up to 150 hidden nodes. As observed from the test results of average RMSE and accuracy in Tables 2 and 3, it appears that the OWLM-GHNIL obtained better generalization performance than the EM-ELM and I-ELM. In order to obtain an objective statistical measure, we apply a Student's t-test to each data set to check whether the differences between the OWLM-GHNIL and the other two algorithms are statistically significant (p value = 0.05, i.e., confidence of 95%). Table 2 shows that, in four of the regression data sets (Delta Ailerons, Delta Elevators, California Housing, and Bank domains) and three of the classification data sets (COLL20, USPST(B), and Satimage), the t-test gave a significant difference between OWLM-GHNIL and EM-ELM, with superior generalization performance of the OWLM-GHNIL, whereas no significant difference was found in the two other data sets (Computer activity and G50C). In Table 3, the t-test results show that there was a significant difference between OWLM-GHNIL and I-ELM, with superior generalization performance of the OWLM-GHNIL, in all data sets except the Computer activity data set.
Table 2: Comparison of average testing RMSE/accuracy and Student's t-test for the data sets between OWLM-GHNIL and EM-ELM.

| Data sets | OWLM-GHNIL RMSE/accuracy | OWLM-GHNIL Std. | EM-ELM RMSE/accuracy | EM-ELM Std. | t-test p value |
|---|---|---|---|---|---|
| Delta Ailerons | 0.0583 | 0.0058 | 0.1023 | 0.0081 | 0.003627 |
| Delta Elevators | 0.0896 | 0.0063 | 0.1423 | 0.0154 | 0.025436 |
| California Housing | 0.1841 | 0.0032 | 0.2321 | 0.0045 | 0.000048 |
| Computer activity | 0.0467 | 0.0021 | 0.0392 | 0.0018 | 0.628794 |
| Bank domains | 0.0212 | 0.0043 | 0.0611 | 0.0032 | 0.007694 |
| COLL20 | 91.25% | 0.0332 | 86.33% | 0.0537 | 0.000000 |
| G50C | 85.31% | 0.0553 | 87.23% | 0.0463 | 0.065632 |
| USPST(B) | 90.45% | 0.0158 | 85.21% | 0.0547 | 0.000000 |
| Satimage | 88.15% | 0.0368 | 85.37% | 0.0446 | 0.007625 |
Table 3: Comparison of average testing RMSE/accuracy and Student's t-test for the data sets between OWLM-GHNIL and I-ELM.

| Data sets | OWLM-GHNIL RMSE/accuracy | OWLM-GHNIL Std. | I-ELM RMSE/accuracy | I-ELM Std. | t-test p value |
|---|---|---|---|---|---|
| Delta Ailerons | 0.0583 | 0.0058 | 0.1364 | 0.0223 | 0.000493 |
| Delta Elevators | 0.0896 | 0.0063 | 0.2265 | 0.0196 | 0.003743 |
| California Housing | 0.1841 | 0.0032 | 0.4365 | 0.0223 | 0.000000 |
| Computer activity | 0.0467 | 0.0021 | 0.0733 | 0.0072 | 0.065475 |
| Bank domains | 0.0212 | 0.0043 | 0.1128 | 0.0341 | 0.000000 |
| COLL20 | 91.25% | 0.0332 | 78.04% | 0.0663 | 0.000000 |
| G50C | 85.31% | 0.0553 | 76.45% | 0.0772 | 0.000482 |
| USPST(B) | 90.45% | 0.0158 | 84.33% | 0.0472 | 0.000000 |
| Satimage | 88.15% | 0.0368 | 80.33% | 0.0682 | 0.001356 |
5. Conclusion
In this paper, we have developed an efficient method, the OWLM-GHNIL, which can grow hidden nodes one by one or group by group in SLFNs. The analysis of computational complexity and the simulation results on an artificial problem show that the OWLM-GHNIL significantly reduces the computational complexity of the OWLM. The simulation results on nine real benchmark problems, including five regression applications and four classification applications, also show that the OWLM-GHNIL has better generalization performance than the two other incremental algorithms, EM-ELM and I-ELM. The t-test further confirmed a significant difference, with superior generalization performance of the OWLM-GHNIL.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This research was supported by Zhejiang Provincial Natural Science Foundation of China under Grant no. LY18F030003, Foundation of High-Level Talents in Lishui City under Grant no. 2017RC01, Scientific Research Foundation of Zhejiang Provincial Education Department under Grant no. Y201432787, and the National Natural Science Foundation of China under Grant no. 61373057.
References

[1] C. M. Bishop.
[2] G.-B. Huang and H. A. Babri, "Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions."
[3] X.-F. Hu, Z. Zhao, S. Wang, F.-L. Wang, D.-K. He, and S.-K. Wu, "Multi-stage extreme learning machine for fault diagnosis on hydraulic tube tester."
[4] T.-Y. Kwok and D.-Y. Yeung, "Objective functions for training new hidden units in constructive neural networks."
[5] E. J. Teoh, K. C. Tan, and C. Xiang, "Estimating the number of hidden neurons in a feedforward network using the singular value decomposition."
[6] X. Luo, J. Deng, W. Wang, J.-H. Wang, and W. Zhao, "A quantized kernel learning algorithm using a minimum kernel risk-sensitive loss criterion and bilateral gradient technique."
[7] Y. Xu, X. Luo, W. Wang, and W. Zhao, "Efficient DV-HOP localization for wireless cyber-physical social sensing system: a correntropy-based neural network learning scheme."
[8] S. Haykin.
[9] S. Kumar.
[10] X. Yao, "Evolving artificial neural networks."
[11] G. P. Zhang, "Neural networks for classification: a survey."
[12] J. Hertz, A. Krogh, and R. G. Palmer.
[13] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: theory and applications."
[14] G.-B. Huang, L. Chen, and C.-K. Siew, "Universal approximation using incremental constructive feedforward networks with random hidden nodes."
[15] S.-F. Ding, X.-Z. Xu, and R. Nie, "Extreme learning machine and its applications."
[16] X. Luo, Y. Xu, W. Wang, M. Yuan, X. Ban, Y. Zhu, and W. Zhao, "Towards enhancing stacked extreme learning machine with sparse autoencoder by correntropy."
[17] M. Anthony and P. L. Bartlett.
[18] V. N. Vapnik.
[19] L. Devroye, L. Györfi, and G. Lugosi.
[20] R. O. Duda, P. E. Hart, and D. G. Stork.
[21] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone.
[22] Z. Man, K. Lee, D. Wang, Z. Cao, and S. Khoo, "An optimal weight learning machine for handwritten digit image recognition."
[23] X. Luo, J. Deng, J. Liu, W. Wang, X. Ban, and J. Wang, "A quantized kernel least mean square scheme with entropy-guided learning for intelligent data analysis."
[24] W. Zhao, R. Lun, C. Gordon, A.-B. M. Fofana, D. D. Espy, M. A. Reinthal, B. Ekelman, G. D. Goodman, J. E. Niederriter, and X. Luo, "A human-centered activity tracking system: toward a healthier workplace."
[25] G. Feng, G.-B. Huang, Q. Lin, and R. Gay, "Error minimized extreme learning machine with growth of hidden nodes and incremental learning."
[26] G. H. Golub and C. F. van Loan.