
Kernel extreme learning machine (KELM) is a feedforward neural network that is widely used in classification problems. To some extent, it resolves the problems of invalid hidden nodes and large computational complexity in the original ELM. However, the traditional KELM classifier usually achieves low test accuracy on multiclass classification problems. To address this problem, a new classifier, the Mexican Hat wavelet KELM classifier, is proposed in this paper. The proposed classifier improves training accuracy and reduces training time on multiclass classification problems. Moreover, the validity of the Mexican Hat wavelet as an ELM kernel function is rigorously proved. Experimental results on different data sets show that the proposed classifier significantly outperforms the compared classifiers.

Extreme learning machine, which was proposed by Huang et al. [

Therefore, both ELM and its variants suffer from some inherent problems.

In order to solve the above problems, Huang et al. [

The basic principle of ELM and some theorems are shown in Section

Let us suppose that there are arbitrary distinct samples

If the output weights are

According to KKT theory, the above problem can be transformed into a Lagrange function
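For reference, the standard equality-constrained optimization of kernel ELM and the closed-form output weights obtained from the KKT conditions can be sketched as follows (this follows Huang et al.'s usual KELM derivation; the symbols H, T, C follow the common notation and may not match this paper's equation numbering):

```latex
\min_{\beta,\,\xi}\;\; \tfrac{1}{2}\lVert\beta\rVert^{2}
   + \tfrac{C}{2}\sum_{i=1}^{N}\lVert\xi_{i}\rVert^{2}
\quad\text{s.t.}\quad h(x_{i})\,\beta = t_{i}^{T} - \xi_{i}^{T},\quad i=1,\dots,N,

L = \tfrac{1}{2}\lVert\beta\rVert^{2}
    + \tfrac{C}{2}\sum_{i=1}^{N}\lVert\xi_{i}\rVert^{2}
    - \sum_{i=1}^{N}\alpha_{i}\bigl(h(x_{i})\beta - t_{i}^{T} + \xi_{i}^{T}\bigr),
\qquad
\beta = H^{T}\Bigl(\tfrac{I}{C} + HH^{T}\Bigr)^{-1}T .
```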

The kernel method is often used in SVM to replace explicit dot products. According to the Mercer theorem (see [

The kernel functions which satisfy (

A translation-invariant kernel

The kernel function selection criteria of ELM are the same as those of SVM. Therefore, the above theorem can also be used to determine whether a function is an admissible ELM kernel. Two commonly used kernel functions are the Gauss kernel function and the polynomial kernel function; of these two, only the Gauss kernel is translation-invariant. The expressions of the two kernel functions can be given as

In (
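As an illustrative sketch of the two kernels above (the bandwidth `sigma`, offset `c`, and degree `degree` below are generic placeholders, not the paper's tuned values):

```python
import numpy as np

def gauss_kernel(x, y, sigma=1.0):
    # Gauss kernel: depends only on the difference x - y,
    # hence it is translation-invariant.
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))

def poly_kernel(x, y, c=1.0, degree=2):
    # Polynomial kernel: depends on the dot product <x, y>,
    # so it is NOT translation-invariant.
    return (np.dot(x, y) + c) ** degree

x = np.array([1.0, 2.0])
print(gauss_kernel(x, x))                                     # 1.0 at zero distance
print(poly_kernel(x, np.array([0.0, 1.0]), c=1.0, degree=2))  # (2 + 1)^2 = 9.0
```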

In the original ELM model, the linear weighted hidden-layer output function

In this section, the Mexican Hat wavelet kernel function is proposed, and it is proved that the Mexican Hat wavelet is an admissible ELM kernel.

Let

If it satisfies the translation-invariant kernel theorem, the following translation-invariant kernel function can be obtained:

The proof of Theorem

As a translation-invariant kernel function, the Mexican Hat wavelet is an admissible ELM kernel.

Firstly, it should be proved that the Fourier transform of Mexican Hat wavelet is nonnegative (see (
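For the one-dimensional mother wavelet $\psi(t) = (1 - t^{2})e^{-t^{2}/2}$, this nonnegativity is a classical fact, sketched here with the convention $\hat{\psi}(\omega) = \int \psi(t)e^{-i\omega t}\,dt$:

```latex
\hat{\psi}(\omega)
  = \int_{-\infty}^{\infty} (1 - t^{2})\, e^{-t^{2}/2}\, e^{-i\omega t}\, dt
  = \sqrt{2\pi}\;\omega^{2}\, e^{-\omega^{2}/2} \;\ge\; 0 .
```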

Equation (

The integral term in (

According to the translation invariance of the integral, it is easy to get (

Substituting (

Then, substituting (

From (

We have already proved that Mexican Hat wavelet is an admissible ELM kernel. So, we can substitute (
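To make the kernel concrete, here is a minimal sketch of one common tensor-product form of the Mexican Hat wavelet kernel, $K(x,y) = \prod_i \psi((x_i - y_i)/a)$ with $\psi(t) = (1 - t^{2})e^{-t^{2}/2}$; the scale `a` is an illustrative placeholder, not the paper's tuned value:

```python
import numpy as np

def mexican_hat_kernel(x, y, a=1.0):
    # Tensor-product Mexican Hat wavelet kernel (one common form):
    # K(x, y) = prod_i psi((x_i - y_i) / a),  psi(t) = (1 - t^2) * exp(-t^2 / 2).
    t = (np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) / a
    return float(np.prod((1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)))

x = np.array([0.5, 1.0])
y = np.array([0.7, 0.4])
print(mexican_hat_kernel(x, x))                              # psi(0)^d = 1.0
print(mexican_hat_kernel(x, y) == mexican_hat_kernel(y, x))  # symmetric: True
```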

Moreover, this classifier can also be used for multiclass classification problems, and the output function is

Equation (
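The closed-form training and multiclass prediction can be sketched as follows. This is a minimal Python illustration of the standard KELM solution $\beta = (I/C + \Omega)^{-1}T$ with one-vs-all targets; the tensor-product kernel form and the parameters `C` and `a` are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def mh_kernel_matrix(X, Y, a=1.0):
    # Pairwise tensor-product Mexican Hat wavelet kernel matrix (assumed form).
    T = (X[:, None, :] - Y[None, :, :]) / a
    return np.prod((1.0 - T ** 2) * np.exp(-T ** 2 / 2.0), axis=2)

class KELM:
    """Minimal kernel ELM: beta = (I/C + Omega)^-1 * T (standard closed form)."""
    def __init__(self, C=1.0, a=1.0):
        self.C, self.a = C, a

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.classes = np.unique(y)
        # One-vs-all target matrix: +1 for the true class, -1 elsewhere.
        T = np.where(y[:, None] == self.classes[None, :], 1.0, -1.0)
        Omega = mh_kernel_matrix(self.X, self.X, self.a)
        n = len(self.X)
        self.beta = np.linalg.solve(np.eye(n) / self.C + Omega, T)
        return self

    def predict(self, Xnew):
        K = mh_kernel_matrix(np.asarray(Xnew, dtype=float), self.X, self.a)
        return self.classes[np.argmax(K @ self.beta, axis=1)]

# Toy check on two well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
model = KELM(C=100.0, a=1.0).fit(X, y)
print((model.predict(X) == y).mean())  # training accuracy on separable toy data
```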

This section analyzes the performance of MHW-KELM and compares it with the traditional Gauss-KELM, Poly-KELM, the original ELM, and a BP classifier. All algorithms run in MATLAB R2014a on a Core i7 2.6 GHz CPU with 8 GB RAM. We choose the scaled conjugate gradient (SCG) algorithm to train the BP neural network, which is faster than standard BP. To obtain good performance, the number of hidden nodes of the original ELM and BP is set to 100% and 30% of the number of training samples, respectively. The data sets used in the experiments are from the UCI database [

Basic features of 12 data sets.

| Data set | Training number | Testing number | Attributes | Categories |
|---|---|---|---|---|
| Abalone | 2000 | 2177 | 8 | 3 |
| Auto MPG | 200 | 198 | 7 | 5 |
| Bank | 2000 | 2521 | 16 | 2 |
| Car Evaluation | 1000 | 728 | 6 | 4 |
| Wine | 100 | 78 | 13 | 3 |
| Wine Quality | 2000 | 4497 | 11 | 7 |
| Iris | 100 | 50 | 4 | 3 |
| Glass | 100 | 114 | 9 | 2 |
| Image | 100 | 110 | 19 | 7 |
| Yeast | 1000 | 484 | 8 | 4 |
| Zoo | 50 | 51 | 16 | 7 |
| Letter | 2000 | 18000 | 16 | 26 |

Then, we use the 12 data sets given in Table

Performance comparison with statistical test on Abalone.

| Abalone (2000, 3) | MHW-KELM | Gauss-KELM | Poly-KELM | Original ELM | SCG-BP |
|---|---|---|---|---|---|
| Mean (%) | | | 77.24 | 62.53 | 64.20 |
| Std. (%) | | | ±0.92 | ±2.89 | ±3.13 |
| | | | 7.35 | 2.12 | 3.47 |
| Time (s) | | | 0.968 | 3.357 | 7.835 |

Performance comparison with statistical test on Auto MPG.

| Auto MPG (200, 5) | MHW-KELM | Gauss-KELM | Poly-KELM | Original ELM | SCG-BP |
|---|---|---|---|---|---|
| Mean (%) | | 73.55 | 79.01 | 56.71 | <10 |
| Std. (%) | | ±1.22 | ±0.90 | ±3.24 | |
| | | 1.15 | 7.35 | 5.16 | 0 |
| Time (s) | 0.070 | 0.071 | 0.075 | | 1.235 |

Performance comparison with statistical test on Bank.

| Bank (2000, 2) | MHW-KELM | Gauss-KELM | Poly-KELM | Original ELM | SCG-BP |
|---|---|---|---|---|---|
| Mean (%) | | | 86.50 | 65.85 | 87.99 |
| Std. (%) | | | ±0.64 | ±1.43 | ±1.25 |
| | | | 4.47 | 1.09 | 0.03 |
| Time (s) | | | 0.8917 | 3.227 | 7.611 |

Performance comparison with statistical test on Car Evaluation.

| Car Evaluation (1000, 4) | MHW-KELM | Gauss-KELM | Poly-KELM | Original ELM | SCG-BP |
|---|---|---|---|---|---|
| Mean (%) | | 96.12 | 92.98 | 31.94 | 70.25 |
| Std. (%) | | ±0.90 | ±1.11 | ±12.36 | ±5.53 |
| | | 0.01 | 1.97 | 8.24 | 3.68 |
| Time (s) | | | 0.240 | 0.548 | 2.751 |

Performance comparison with statistical test on Wine.

| Wine (100, 3) | MHW-KELM | Gauss-KELM | Poly-KELM | Original ELM | SCG-BP |
|---|---|---|---|---|---|
| Mean (%) | | 83.63 | | 50.10 | 36.87 |
| Std. (%) | | ±0.81 | | ±2.93 | ±1.28 |
| | | 5.52 | | 4.14 | 8.05 |
| Time (s) | 0.070 | 0.072 | | | 1.088 |

Performance comparison with statistical test on Wine Quality.

| Wine Quality (2000, 7) | MHW-KELM | Gauss-KELM | Poly-KELM | Original ELM | SCG-BP |
|---|---|---|---|---|---|
| Mean (%) | | 49.69 | 52.14 | 45.79 | <10 |
| Std. (%) | | ±0.52 | ±0.28 | ±0.85 | |
| | | 1.04 | 7.82 | 3.27 | 0 |
| Time (s) | | | 1.372 | 3.520 | 7.159 |

Performance comparison with statistical test on Iris.

| Iris (100, 3) | MHW-KELM | Gauss-KELM | Poly-KELM | Original ELM | SCG-BP |
|---|---|---|---|---|---|
| Mean (%) | | | 98.85 | 61.34 | 35.41 |
| Std. (%) | | | ±0.12 | ±0.78 | ±0.33 |
| | | | 0.01 | 4.59 | 6.21 |
| Time (s) | 0.071 | 0.075 | 0.062 | | 1.290 |

Performance comparison with statistical test on Glass.

| Glass (100, 2) | MHW-KELM | Gauss-KELM | Poly-KELM | Original ELM | SCG-BP |
|---|---|---|---|---|---|
| Mean (%) | 98.11 | | | 92.83 | 75.76 |
| Std. (%) | ±0.31 | | | ±1.79 | ±3.63 |
| | 0.02 | | | 3.50 | 9.93 |
| Time (s) | 0.072 | 0.074 | 0.065 | | 1.074 |

Performance comparison with statistical test on Image.

| Image (100, 7) | MHW-KELM | Gauss-KELM | Poly-KELM | Original ELM | SCG-BP |
|---|---|---|---|---|---|
| Mean (%) | | 85.12 | 87.58 | 35.56 | 16.13 |
| Std. (%) | | ±0.78 | ±0.46 | ±1.94 | ±3.25 |
| | | 6.64 | 1.78 | 8.91 | 2.45 |
| Time (s) | 0.075 | 0.072 | 0.061 | | 1.193 |

Performance comparison with statistical test on Yeast.

| Yeast (1000, 4) | MHW-KELM | Gauss-KELM | Poly-KELM | Original ELM | SCG-BP |
|---|---|---|---|---|---|
| Mean (%) | | | | 37.23 | 33.95 |
| Std. (%) | | | | ±3.71 | ±2.11 |
| | | | | 1.57 | 7.84 |
| Time (s) | | 0.201 | 0.235 | 0.457 | 3.005 |

Performance comparison with statistical test on Zoo.

| Zoo (50, 7) | MHW-KELM | Gauss-KELM | Poly-KELM | Original ELM | SCG-BP |
|---|---|---|---|---|---|
| Mean (%) | | | | 92.30 | 35.56 |
| Std. (%) | | | | ±0.53 | ±1.21 |
| | | | | 1.50 | 1.58 |
| Time (s) | 0.075 | 0.076 | | | 1.135 |

Performance comparison with statistical test on Letter.

| Letter (2000, 26) | MHW-KELM | Gauss-KELM | Poly-KELM | Original ELM | SCG-BP |
|---|---|---|---|---|---|
| Mean (%) | | | 68.79 | 15.51 | <10 |
| Std. (%) | | | ±3.88 | ±5.48 | |
| | | | 0.01 | 2.43 | 0 |
| Time (s) | | 1.833 | 2.132 | 4.559 | 7.270 |

By plotting the running times from all tables as a line graph, we obtain Figure

Comparison of running time of 4 algorithms.

From all tables and Figure

From Tables

In this paper, we propose a classifier, the Mexican Hat wavelet kernel ELM (MHW-KELM) classifier, which can be applied to multiclass classification problems, and we prove its validity as an admissible ELM kernel. This classifier resolves the inherent problems of the original ELM by replacing the linear weighted mapping with the Mexican Hat wavelet. The experimental results show that the training time of the MHW-KELM classifier is much shorter than that of the original ELM, which alleviates the dimension explosion problem of the original ELM. Meanwhile, its training accuracy is superior to that of the traditional Gauss-KELM and the original ELM on multiclass classification problems.

In future work, in order to reduce the impact of imbalanced training data on performance, we plan to utilize the boosting weighted ELM proposed by Li et al. [

The authors declare that there are no competing interests regarding the publication of this paper.

The authors gratefully acknowledge the support of the following foundations: 973 Project of China (2013CB733605), the National Natural Science Foundation of China (21176073 and 61603343), and the Fundamental Research Funds for the Central Universities.