To apply single hidden-layer feedforward neural networks (SLFNs) to the identification of time-varying systems, an online regularized extreme learning machine (ELM) with forgetting mechanism (FORELM) and an online kernelized ELM with forgetting mechanism (FOKELM) are presented in this paper. FORELM updates the output weights of the SLFN recursively via the Sherman–Morrison formula and combines the advantages of the online sequential ELM with forgetting mechanism (FOS-ELM) and the regularized online sequential ELM (ReOS-ELM): it captures the latest properties of the identified system by learning from a fixed number of the newest samples, and it avoids ill-conditioned matrix inversion through regularization. FOKELM tackles the matrix-expansion problem of the kernel-based incremental ELM (KB-IELM) by deleting the oldest sample, according to the block matrix inverse formula, as new samples arrive continually. The experimental results show that the proposed FORELM and FOKELM have better stability than FOS-ELM and higher accuracy than ReOS-ELM in nonstationary environments; moreover, FORELM and FOKELM are more time-efficient than the dynamic regression extreme learning machine (DR-ELM) under certain conditions.
Plenty of research has shown that single hidden-layer feedforward neural networks (SLFNs) can approximate any function and form decision boundaries of arbitrary shape, provided the activation function is chosen properly [
For some practical fields where training data are generated gradually, online sequential learning algorithms are preferred over batch learning algorithms, since sequential algorithms do not require retraining whenever a new sample arrives. Hence, Liang et al. developed an online sequential ELM (OS-ELM) using recursive least squares [
As a variant of ELM, regularized ELM (RELM) [
If the feature mapping in SLFN is unknown to users, the kernel based ELM (KELM) can be constructed [
However, in time-varying or nonstationary applications, newer training data usually carry more information about the system, while older data may carry less, or even misleading, information; that is, training samples usually have timeliness. ReOS-ELM and KB-IELM cannot reflect the timeliness of sequential training data well. Moreover, if a huge number of samples arrive, the storage required by KB-IELM for its kernel matrix grows without bound as learning goes on, so storage overflow eventually occurs and KB-IELM becomes unusable in such circumstances.
In this paper, we combine the advantages of FOS-ELM and ReOS-ELM and propose an online regularized ELM with forgetting mechanism (FORELM) for time-varying applications. FORELM overcomes the potential matrix-singularity problem by regularization and eliminates the effect of outdated data on the model by incorporating a forgetting mechanism. As in FOS-ELM, an ensemble technique may also be employed in FORELM to enhance its stability; that is, FORELM comprises
It should be noted that our methods adjust the output weights of the SLFN as samples are added and deleted one by one; that is, they learn and forget samples sequentially while the network architecture stays fixed. They are completely different from the offline incremental ELMs (I-ELM) [
The rest of this paper is organized as follows. Section
For simplicity, ELM based learning algorithm for SLFN with multiple input single output is discussed.
The output of a SLFN with
For a given set of distinct training data
Based on the KKT theorem, the constrained optimization of (
In order to reduce computational costs, when
If the feature mapping
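To make the batch RELM solution concrete, here is a small sketch in Python with NumPy (the paper's experiments use MATLAB; this reconstruction, including the function names, the sigmoid node type, and the parameter values, is illustrative only): the hidden parameters are assigned randomly, and the output weights are obtained as the regularized least-squares solution β = (HᵀH + I/C)⁻¹HᵀT.

```python
import numpy as np

def relm_train(X, T, n_hidden=30, C=1e3, rng=None):
    """Batch regularized ELM: random sigmoid hidden layer + ridge solution.

    Returns (W, b, beta): random input weights, random biases, and output
    weights computed as beta = (H'H + I/C)^{-1} H'T.
    """
    rng = np.random.default_rng(rng)
    n_features = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # sigmoid hidden outputs
    A = H.T @ H + np.eye(n_hidden) / C                       # regularized normal matrix
    beta = np.linalg.solve(A, H.T @ T)                       # output weights
    return W, b, beta

def relm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# usage: approximate y = sin(x) on [-3, 3]
X = np.linspace(-3, 3, 200).reshape(-1, 1)
T = np.sin(X).ravel()
W, b, beta = relm_train(X, T, n_hidden=40, C=1e6, rng=0)
rmse = np.sqrt(np.mean((relm_predict(X, W, b, beta) - T) ** 2))
```

With N training samples and L̃ hidden nodes, this solves an L̃ × L̃ system, which is the cheaper of the two equivalent RELM forms when N > L̃.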
ReOS-ELM, that is, SRELM and LS-IELM, can be restated as follows.
For time
For time
For time
For time
When an SLFN is employed for online modeling of a time-varying system, training samples are not only generated one by one but also often have the property of timeliness; that is, training data are valid only for a certain period. Therefore, during online sequential learning, the older or outdated training data, whose effectiveness diminishes or vanishes after several unit times, should be discarded; this is the idea of the forgetting mechanism [
After RELM has studied a given number
Let
Suppose that FORELM may consist of
Initialization: choose the hidden output function; randomly assign the hidden parameters; determine
Incrementally learn the initial samples: get the current sample; calculate; calculate
Online modeling and prediction: repeat the following procedure at every step: acquire the current sample; prediction: form; delete the oldest sample
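Under my reading of the steps above, the FORELM update loop can be sketched as follows (a minimal single-output Python/NumPy reconstruction, not the authors' MATLAB code; the class and variable names are illustrative): the matrix P = (HᵀH + I/C)⁻¹ is maintained recursively, and the Sherman–Morrison formula is applied once to learn the newest sample and once to forget the oldest.

```python
import numpy as np
from collections import deque

class FORELMSketch:
    """Sliding-window regularized RLS over hidden-layer outputs (single output)."""

    def __init__(self, n_hidden, C=1e3, window=50):
        self.P = C * np.eye(n_hidden)   # P = (H'H + I/C)^{-1}; H is empty initially
        self.beta = np.zeros(n_hidden)  # output weights
        self.window = window
        self.buf = deque()              # stored (h, t) pairs for later forgetting

    def _add(self, h, t):
        # Sherman–Morrison: (A + h h')^{-1} = P - P h h' P / (1 + h' P h)
        Ph = self.P @ h
        self.P -= np.outer(Ph, Ph) / (1.0 + h @ Ph)
        self.beta += self.P @ h * (t - h @ self.beta)

    def _remove(self, h, t):
        # Sherman–Morrison: (A - h h')^{-1} = P + P h h' P / (1 - h' P h)
        Ph = self.P @ h
        self.P += np.outer(Ph, Ph) / (1.0 - h @ Ph)
        self.beta -= self.P @ h * (t - h @ self.beta)

    def learn(self, h, t):
        self.buf.append((h, t))
        self._add(h, t)
        if len(self.buf) > self.window:      # forget the oldest sample
            h_old, t_old = self.buf.popleft()
            self._remove(h_old, t_old)

    def predict(self, h):
        return h @ self.beta

# usage: track a (here stationary) linear map t = h . beta_true
rng = np.random.default_rng(0)
model = FORELMSketch(n_hidden=5, C=1e8, window=50)
beta_true = np.array([0.5, -1.0, 2.0, 0.3, -0.7])
for _ in range(200):
    h = rng.uniform(-1, 1, 5)
    model.learn(h, h @ beta_true)
```

Each update costs O(L̃²) thanks to the rank-one Sherman–Morrison corrections; no matrix is inverted online.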
After KELM has studied the given number
Let
Moreover, using the block matrix inverse formula, the following equation can be obtained:
Rewrite
Compare (
Next time, compute
Integrating KB-IELM with the decremental KELM above, we can obtain FOKELM as follows.
Initialization: choose kernel
Incrementally learn initial
Online modeling and prediction: acquire the new sample; prediction: form; delete the oldest sample
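The grow-and-shrink bookkeeping described above can be sketched as follows (an illustrative Python/NumPy reconstruction, not the authors' code; the RBF kernel, parameter names, and window length are assumptions): `grow_inverse` enlarges the inverse of the regularized kernel matrix via the block matrix inverse formula when a new sample is learned, and `shrink_inverse` removes the oldest sample's row and column using the complementary identity.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def grow_inverse(Kinv, k, kappa):
    """Inverse of [[K, k], [k', kappa]] from Kinv, via the block inverse formula."""
    if Kinv.size == 0:
        return np.array([[1.0 / kappa]])
    u = Kinv @ k
    s = kappa - k @ u                      # Schur complement (scalar)
    top = Kinv + np.outer(u, u) / s
    return np.block([[top, -u[:, None] / s],
                     [-u[None, :] / s, np.array([[1.0 / s]])]])

def shrink_inverse(Minv):
    """Inverse of the trailing submatrix when the first (oldest) row/column is
    deleted: for Minv = [[e, f'], [f, G]], the result is G - f f' / e."""
    e, f, G = Minv[0, 0], Minv[1:, 0], Minv[1:, 1:]
    return G - np.outer(f, f) / e

# usage: maintain (K + I/C)^{-1} over a sliding window of samples
C, window = 100.0, 5
X = [np.array([float(i)]) for i in range(8)]
buf, Kinv = [], np.zeros((0, 0))
for x in X:
    k = np.array([rbf(xi, x) for xi in buf])
    Kinv = grow_inverse(Kinv, k, rbf(x, x) + 1.0 / C)   # learn the new sample
    buf.append(x)
    if len(buf) > window:                               # forget the oldest sample
        Kinv = shrink_inverse(Kinv)
        buf.pop(0)
```

Maintaining the inverse this way costs O(ℓ²) per step instead of the O(ℓ³) required to re-invert the window's kernel matrix from scratch.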
In this section, the performance of the proposed FORELM and FOKELM is verified via simulations on identification of a time-varying nonlinear process. The simulations assess the accuracy, stability, and computational complexity of the proposed FORELM and FOKELM in comparison with FOS-ELM, ReOS-ELM (i.e., SRELM or LS-IELM), and the dynamic regression extreme learning machine (DR-ELM) [
All the performance evaluations were executed in MATLAB 7.0.1 environment running on Windows XP with Intel Core i3-3220 3.3 GHz CPU and 4 GB RAM.
The system (
Denote
In all experiments, the output of a hidden node with respect to the input
The root-mean-square error (RMSE) of prediction and the maximal absolute prediction error (MAPE) are adopted as measures of model accuracy and stability, respectively. Consider
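Under the paper's definitions, these two indices can be computed as in the small Python sketch below (note that MAPE here denotes the maximal absolute prediction error as defined in this paper, not the more common mean absolute percentage error):

```python
import numpy as np

def rmse(y_pred, y_true):
    """Root-mean-square error: accuracy index."""
    e = np.asarray(y_pred) - np.asarray(y_true)
    return float(np.sqrt(np.mean(e ** 2)))

def mape(y_pred, y_true):
    """Maximal absolute prediction error: stability index (paper's definition)."""
    e = np.asarray(y_pred) - np.asarray(y_true)
    return float(np.max(np.abs(e)))

# usage
y_true = [1.0, 2.0, 3.0]
y_pred = [1.1, 1.9, 3.3]
accuracy, stability = rmse(y_pred, y_true), mape(y_pred, y_true)
```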
ReOS-ELM does not discard any old sample; thus it has
In offline modeling, the training sample set is fixed; thus one may search for relatively optimal values of the model parameters of the ELMs. Nevertheless, in online modeling of a time-varying system, the training sample set keeps changing, and it is difficult to choose optimal parameter values in practice. Therefore, we manually set the shared parameters of these ELMs to the same values; for example, their parameters
RMSE and MAPE of the proposed ELMs and other aforementioned ELMs are listed in Tables
RMSE comparison between the proposed ELMs and other ELMs on identification of process (
| Algorithm | 50 | 70 | 100 | 150 | 200 |  |
|---|---|---|---|---|---|---|
| FOS-ELM | 0.3474 | 0.0967 | 0.0731 | 0.0975 | 0.0817 | — |
| ReOS-ELM | — | — | — | — | — | 0.0961 |
| DR-ELM | 0.0681 | 0.0683 | 0.0712 | 0.0833 | 0.0671 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | 0.9070 | 0.1808 | 0.1324 | 0.0821 | 0.0883 | — |
| ReOS-ELM | — | — | — | — | — | 0.0804 |
| DR-ELM | 0.0543 | 0.0670 | 0.0789 | 0.0635 | 0.0656 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | × | × | × | 9.9541 | 0.4806 | — |
| ReOS-ELM | — | — | — | — | — | 0.0529 |
| DR-ELM | 0.0377 | 0.0351 | 0.0345 | 0.0425 | 0.0457 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | × | × | × | × | × | — |
| ReOS-ELM | — | — | — | — | — | 0.0359 |
| DR-ELM | 0.0298 | 0.0306 | 0.0308 | 0.0391 | 0.0393 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | × | × | × | × | × | — |
| ReOS-ELM | — | — | — | — | — | 0.0351 |
| DR-ELM | 0.0268 | 0.0270 | 0.0281 | 0.0417 | 0.0365 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | × | × | × | × | × | — |
| ReOS-ELM | — | — | — | — | — | 0.0344 |
| DR-ELM | 0.0259 | 0.0284 | 0.0272 | 0.0363 | 0.0351 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | × | × | × | × | × | — |
| ReOS-ELM | — | — | — | — | — | 0.0296 |
| DR-ELM | 0.0240 | 0.0255 | 0.0249 | 0.0362 | 0.0327 | — |
| FORELM | 0.0231 | 0.0248 | 0.0256 | 0.0372 | 0.0310 | — |
| FOS-ELM | × | × | × | × | × | — |
| ReOS-ELM | — | — | — | — | — | 0.0292 |
| DR-ELM | 0.0233 | 0.0253 | 0.0270 | 0.0374 | 0.0320 | — |
| FORELM | 0.0223 | 0.0256 | 0.0252 | 0.0407 | 0.0318 | — |
| FOS-ELM | × | × | × | × | × | — |
| ReOS-ELM | — | — | — | — | — | 0.0298 |
| DR-ELM | 0.0234 | 0.0263 | 0.0278 | 0.0418 | 0.0323 | — |
| FORELM |  |  |  |  |  |  |
| FOKELM |  |  |  |  |  | — |
“—” indicates that the corresponding entry is undefined or does not exist in that case.
“×” indicates an invalidated result owing to excessively large RMSE or MAPE.
MAPE comparison between the proposed ELMs and other ELMs on identification of process (
| Algorithm | 50 | 70 | 100 | 150 | 200 |  |
|---|---|---|---|---|---|---|
| FOS-ELM | 5.8327 | 0.6123 | 0.5415 | 1.0993 | 0.7288 | — |
| ReOS-ELM | — | — | — | — | — | 0.3236 |
| DR-ELM | 0.3412 | 0.4318 | 0.3440 | 0.7437 | 0.4007 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | 5.0473 | 1.7010 | 2.4941 | 0.9537 | 0.9272 | — |
| ReOS-ELM | — | — | — | — | — | 0.4773 |
| DR-ELM | 0.3418 | 0.3158 | 0.4577 | 0.5680 | 0.3127 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | × | × | × | 209.8850 | 3.9992 | — |
| ReOS-ELM | — | — | — | — | — | 0.2551 |
| DR-ELM | 0.3876 | 0.3109 | 0.3808 | 0.5486 | 0.3029 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | × | × | × | × | × | — |
| ReOS-ELM | — | — | — | — | — | 0.2622 |
| DR-ELM | 0.2473 | 0.2748 | 0.2163 | 0.5471 | 0.3381 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | × | × | × | × | × | — |
| ReOS-ELM | — | — | — | — | — | 0.3360 |
| DR-ELM | 0.2725 | 0.2365 | 0.2133 | 0.5439 | 0.3071 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | × | × | × | × | × | — |
| ReOS-ELM | — | — | — | — | — | 0.2687 |
| DR-ELM | 0.2539 | 0.2454 | 0.2069 | 0.5443 | 0.3161 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | × | × | × | × | × | — |
| ReOS-ELM | — | — | — | — | — | 0.2657 |
| DR-ELM | 0.2520 | 0.2456 | 0.2053 | 0.5420 | 0.3260 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | × | × | × | × | × | — |
| ReOS-ELM | — | — | — | — | — | 0.3784 |
| DR-ELM | 0.2388 | 0.2401 | 0.3112 | 0.5415 | 0.3225 | — |
| FORELM | 0.2128 | 0.2325 | 0.3006 | 0.5416 | 0.3247 | — |
| FOS-ELM | × | × | × | × | × | — |
| ReOS-ELM | — | — | — | — | — | 0.4179 |
| DR-ELM | 0.2230 | 0.2550 | 0.3412 | 0.5431 | 0.3283 | — |
| FORELM |  |  |  |  |  |  |
| FOKELM |  |  |  |  |  | — |
Running time (second) comparison between the proposed ELMs and other ELMs on identification of process (
| Algorithm | 50 | 70 | 100 | 150 | 200 |  |
|---|---|---|---|---|---|---|
| FOS-ELM | 0.0620 | 0.0620 | 0.0621 | 0.0621 | 0.0622 | — |
| ReOS-ELM | — | — | — | — | — | 0.0160 |
| DR-ELM | 0.2650 | 0.3590 | 0.5781 | 1.0790 | 1.6102 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | 0.0621 | 0.0621 | 0.0621 | 0.0623 | 0.0623 | — |
| ReOS-ELM | — | — | — | — | — | 0.0167 |
| DR-ELM | 0.2500 | 0.3750 | 0.6094 | 1.0630 | 1.6250 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | × | × | × | 0.1250 | 0.1250 | — |
| ReOS-ELM | — | — | — | — | — | 0.0470 |
| DR-ELM | 0.2650 | 0.3906 | 0.6250 | 1.1100 | 1.6560 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | × | × | × | × | × | — |
| ReOS-ELM | — | — | — | — | — | 0.1250 |
| DR-ELM | 0.2813 | 0.4219 | 0.6719 | 1.2031 | 1.7380 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | × | × | × | × | × | — |
| ReOS-ELM | — | — | — | — | — | 0.2344 |
| DR-ELM | 0.3125 | 0.4540 | 0.7500 | 1.2970 | 1.9220 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | × | × | × | × | × | — |
| ReOS-ELM | — | — | — | — | — | 0.3901 |
| DR-ELM | 0.3290 | 0.5160 | 0.7810 | 1.3440 | 1.9690 | — |
| FORELM |  |  |  |  |  | — |
| FOS-ELM | × | × | × | × | × | — |
| ReOS-ELM | — | — | — | — | — | 1.7350 |
| DR-ELM | 0.3906 | 0.5781 | 0.8750 | 1.4680 | 2.2820 | — |
| FORELM | 6.6410 | 6.4530 | 6.2030 | 5.6090 | 5.1719 | — |
| FOS-ELM | × | × | × | × | × | — |
| ReOS-ELM | — | — | — | — | — | 8.7190 |
| DR-ELM | 0.5470 | 0.7970 | 1.2660 | 2.0938 | 2.9840 | — |
| FORELM | 32.7820 | 31.6720 | 30.1400 | 27.5780 | 27.7500 | — |
| FOS-ELM | × | × | × | × | × | — |
| ReOS-ELM | — | — | — | — | — | 13.6094 |
| DR-ELM | 0.6250 | 1.0000 | 1.4530 | 2.4380 | 3.4060 | — |
| FORELM |  |  |  |  |  |  |
| FOKELM |  |  |  |  |  | — |
From Tables, the following observations can be made:
(i) The RMSE and MAPE of FORELM are smaller than those of FOS-ELM with the same
(ii) The RMSE of FORELM and FOKELM is smaller than that of ReOS-ELM with the same
(iii) When FORELM requires nearly the same time as FOS-ELM, but more time than ReOS-ELM. This is because both FORELM and FOS-ELM involve
(iv) Both FORELM and DR-ELM use the regularization trick, so theoretically they should obtain the same or similar prediction performance, and Tables
(v) When
To intuitively observe and compare the accuracy and stability of these ELMs with the same
APE curves of ELMs on process (
(a) FOS-ELM; (b) ReOS-ELM; (c) DR-ELM; (d) FORELM; (e) FOKELM
On the whole, FORELM and FOKELM have higher accuracy than FOS-ELM and ReOS-ELM.
In the simulation, with the growth rate parameter
Let
Set
APE curves of ELMs on process (
(a) FOS-ELM; (b) ReOS-ELM; (c) DR-ELM; (d) FORELM; (e) FOKELM
Through many comparative trials, we obtain the same conclusions as in Simulation 1.
ReOS-ELM (i.e., SRELM or LS-IELM) can yield good generalization models and will not suffer from matrix singularity or ill-posed problems, but it is unsuitable in time-varying applications. On the other hand, FOS-ELM, thanks to its forgetting mechanism, can reflect the timeliness of data and train SLFN in nonstationary environments, but it may encounter the matrix singularity problem and run unstably.
In this paper, the forgetting mechanism is incorporated into ReOS-ELM, yielding FORELM, which blends the advantages of ReOS-ELM and FOS-ELM. In addition, the forgetting mechanism is also added to KB-IELM; consequently, FOKELM is obtained, which overcomes the matrix-expansion problem of KB-IELM.
Performance comparison between the proposed ELMs and other ELMs was carried out on identification of time-varying systems in terms of accuracy, stability, and computational complexity. The experimental results show that, statistically, FORELM and FOKELM have better stability than FOS-ELM and higher accuracy than ReOS-ELM in nonstationary environments. When the number
The authors declare that there is no conflict of interests regarding the publication of this paper.
The work is supported by the Hunan Provincial Science and Technology Foundation of China (2011FJ6033).