Extreme learning machine (ELM) is a competitive machine learning technique that is simple in theory and fast in implementation; it can identify faults quickly and precisely compared with traditional identification techniques such as support vector machines (SVM). As verified by simulation results, ELM tends to have better scalability and achieves much better generalization performance and much faster learning speed than traditional SVM. In this paper, we introduce a multiclass AdaBoost based ELM ensemble method. In our approach, the ELM algorithm is selected as the basic ensemble predictor due to its rapid speed and good performance. Compared with the existing boosting ELM algorithms, our algorithm can be used directly on multiclass classification problems. We also carried out comparative experiments on face recognition datasets. The experimental results show that the proposed algorithm not only makes the prediction results more stable but also achieves better generalization performance.
Much research has been done on feedforward neural networks, showing that they are able not only to approximate complex nonlinear mappings but also to provide models for natural and artificial problems that classic parametric techniques are unable to handle.
Recently, Huang et al. [ ] proposed the extreme learning machine (ELM), a fast learning algorithm for single-hidden-layer feedforward neural networks (SLFNs).
In addition, Huang [ ] further developed the ELM framework.
In view of the advantages of the algorithm, Cao et al. applied it to several areas, such as landmark recognition [ ].
AdaBoost [ ] is one of the most influential ensemble learning algorithms; it combines a sequence of weak classifiers into a single strong classifier.
However, until now, not much work has been done to apply AdaBoost to ELM directly for multiclass classification problems. In Freund and Schapire's work [ ], each weak classifier is required to achieve a weighted error below 1/2, which is often too demanding in the multiclass case.
This paper is an extension of our previous work [ ].
The rest of the paper is organized as follows: first, a review of the original ELM, PCA, multiclass AdaBoost, and LBP based face recognition; next, the proposed MAELM algorithm and face recognition structure; then, the experimental results; and finally, a comparison with related methods and the conclusions.
In this section, a review of the original ELM algorithm, PCA, multiclass AdaBoost, and LBP based face recognition is presented.
For $N$ arbitrary distinct samples $(\mathbf{x}_j, \mathbf{t}_j)$, where $\mathbf{x}_j \in \mathbb{R}^n$ is the input vector and $\mathbf{t}_j \in \mathbb{R}^m$ is the target vector, standard SLFNs with $L$ hidden nodes and activation function $g(x)$ are modeled as
$$\sum_{i=1}^{L} \boldsymbol{\beta}_i \, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{o}_j, \quad j = 1, \ldots, N.$$
Here, $\mathbf{w}_i$ is the weight vector connecting the $i$th hidden node and the input nodes, $\boldsymbol{\beta}_i$ is the weight vector connecting the $i$th hidden node and the output nodes, $b_i$ is the bias of the $i$th hidden node, and $\mathbf{o}_j$ is the network output for the $j$th sample.
The standard SLFNs with $L$ hidden nodes can approximate these $N$ samples with zero error; that is, there exist $\boldsymbol{\beta}_i$, $\mathbf{w}_i$, and $b_i$ such that $\sum_{i=1}^{L} \boldsymbol{\beta}_i \, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{t}_j$ for $j = 1, \ldots, N$. These $N$ equations can be written compactly as $\mathbf{H}\boldsymbol{\beta} = \mathbf{T}$, where $\mathbf{H}$ is the hidden layer output matrix.
Different from the conventional gradient-based solutions of SLFNs, ELM randomly generates the input weights $\mathbf{w}_i$ and biases $b_i$ and simply solves the equation by $\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T}$, where $\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized inverse of $\mathbf{H}$.
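To make the procedure concrete, the following minimal sketch implements the ELM training and prediction described above in NumPy. The function names, the sigmoid activation, and the seeded random generator are our own illustrative choices, not the authors' code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, seed=0):
    """Random input weights and biases; closed-form output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # input-to-hidden weights w_i
    b = rng.standard_normal(n_hidden)                # hidden biases b_i
    H = sigmoid(X @ W + b)                           # hidden layer output matrix H
    beta = np.linalg.pinv(H) @ T                     # beta = H^dagger T
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta                 # network outputs o_j
```

For classification, $\mathbf{T}$ is typically a one-hot target matrix and the predicted class is the index of the largest output.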
Since the original ELM randomly generates the weights between the input layer and the hidden layer, as well as the bias of the activation function, its performance may not be stable. Other ways of constructing these weights, such as the PCA algorithm, are therefore worth trying.
Principal component analysis (PCA) was invented in 1901 by Pearson [ ].
The procedure of PCA is as follows:
Step 1. Compute the covariance matrix of the (centered) data.
Step 2. Find the eigenvalues of the covariance matrix.
Step 3. Compute the corresponding standardized (orthonormal) eigenvectors.
Step 4. Yield the principal components by projecting the data onto the leading eigenvectors.
PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. This transformation is defined in such a way that the first principal component has the largest possible variance, and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to (i.e., uncorrelated with) the preceding components. Principal components are guaranteed to be independent if the dataset is jointly normally distributed. PCA is sensitive to the relative scaling of the original variables.
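The four steps above can be written in a few lines of NumPy. This is a generic sketch of PCA with illustrative function names, not code from the paper:

```python
import numpy as np

def pca_fit(X, d):
    """Steps 1-3: covariance matrix, eigenvalues, orthonormal eigenvectors."""
    mean = X.mean(axis=0)
    Xc = X - mean                              # center the observations
    cov = (Xc.T @ Xc) / (X.shape[0] - 1)       # Step 1: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # Steps 2-3: eigenpairs, ascending order
    return mean, eigvecs[:, ::-1][:, :d]       # keep the d largest-variance directions

def pca_transform(X, mean, components):
    """Step 4: project the data to obtain the principal components."""
    return (X - mean) @ components
```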
AdaBoost has been applied very successfully to binary classification problems. The original AdaBoost was proposed in [ ].
The AdaBoost algorithm is summarized as follows.
Given the training data $\{(x_i, y_i)\}_{i=1}^{N}$ with $y_i \in \{-1, +1\}$:
(1) Initialize the observation weights $w_i = 1/N$, $i = 1, \ldots, N$.
(2) For $m = 1, \ldots, M$:
(a) fit a classifier $G_m(x)$ to the training data using weights $w_i$;
(b) compute the weighted error $\mathrm{err}_m = \sum_{i=1}^{N} w_i I(y_i \neq G_m(x_i)) / \sum_{i=1}^{N} w_i$;
(c) compute the weight of the classifier, $\alpha_m = \log\left((1 - \mathrm{err}_m)/\mathrm{err}_m\right)$;
(d) update the weights of the sample data, $w_i \leftarrow w_i \exp\left(\alpha_m I(y_i \neq G_m(x_i))\right)$, for all $i$;
(e) renormalize $w_i$ so that $\sum_i w_i = 1$.
(3) Output $G(x) = \mathrm{sign}\left(\sum_{m=1}^{M} \alpha_m G_m(x)\right)$.
Here, $\alpha_m$ measures the importance assigned to the $m$th classifier $G_m$. For the weight update in step (2)(d), if the $i$th sample is misclassified by $G_m$, its weight is multiplied by $e^{\alpha_m}$ and thus increased; otherwise its weight is unchanged.
However, for a $K$-class problem with $K > 2$, requiring every weak classifier to achieve a weighted error below $1/2$ is much more demanding than in the binary case, since random guessing is correct only with probability $1/K$. The multiclass AdaBoost algorithm [ ] therefore modifies the classifier weight. Similar to the binary condition, for the $K$-class case the weight of the $m$th classifier becomes $\alpha_m = \log\left((1 - \mathrm{err}_m)/\mathrm{err}_m\right) + \log(K - 1)$, so that $\alpha_m > 0$ as long as $\mathrm{err}_m < 1 - 1/K$, that is, as long as the weak classifier is better than random guessing. The final output becomes $G(x) = \arg\max_k \sum_{m=1}^{M} \alpha_m I(G_m(x) = k)$.
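As a small illustration of this modification (a sketch; the clipping constant is our own safeguard), the multiclass classifier weight can be computed as:

```python
import numpy as np

def samme_alpha(err, n_classes):
    """Multiclass AdaBoost classifier weight: log((1-err)/err) + log(K-1).

    Positive whenever err < 1 - 1/K, i.e., whenever the weak classifier
    is better than random guessing among K classes.
    """
    err = np.clip(err, 1e-12, 1 - 1e-12)   # guard against log(0)
    return np.log((1.0 - err) / err) + np.log(n_classes - 1)
```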
The original LBP operator goes through each pixel of an image, thresholds the $3 \times 3$ neighborhood of the pixel against the value of the center pixel, and reads the resulting eight binary digits as a label for the center pixel; the histogram of these labels can then be used as a texture descriptor.
Basic LBP operator.
To apply the LBP operator to the face recognition problem, Ahonen et al. [ ] divided each face image into several windows, computed an LBP histogram within each window, and concatenated the histograms into a single feature vector describing the face.
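A minimal sketch of this descriptor is given below; the basic 3x3 operator, the bit ordering, and the grid size are illustrative assumptions rather than the exact configuration used in the paper.

```python
import numpy as np

def lbp_histogram(gray, n_grid=7):
    """Basic 3x3 LBP codes followed by a per-window histogram descriptor."""
    c = gray[1:-1, 1:-1]                              # center pixels
    neighbours = [gray[:-2, :-2], gray[:-2, 1:-1], gray[:-2, 2:],
                  gray[1:-1, 2:], gray[2:, 2:], gray[2:, 1:-1],
                  gray[2:, :-2], gray[1:-1, :-2]]     # 8 neighbours, clockwise
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbours):
        codes |= (n >= c).astype(np.uint8) << bit     # threshold against the center
    hists = []                                        # one 256-bin histogram per window
    for rows in np.array_split(codes, n_grid, axis=0):
        for win in np.array_split(rows, n_grid, axis=1):
            hists.append(np.bincount(win.ravel(), minlength=256))
    return np.concatenate(hists)
```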
In this part, the multiclass AdaBoost ELM (MAELM) algorithm is proposed, and a face recognition structure based on LBP and ELM is also presented.
By applying the multiclass AdaBoost to ELM, this paper proposes the multiclass AdaBoost ELM (MAELM) algorithm. The algorithm takes a number of ELM classifiers as the weak classifiers:
(1) Initialize the observation weights $w_i = 1/N$, $i = 1, \ldots, N$.
(2) For $m = 1, \ldots, M$:
(a) fit an ELM classifier $G_m(x)$ to the training data using weights $w_i$;
(b) compute the weighted error $\mathrm{err}_m = \sum_{i=1}^{N} w_i I(y_i \neq G_m(x_i)) / \sum_{i=1}^{N} w_i$;
(c) compute the weight of the classifier, $\alpha_m = \log\left((1 - \mathrm{err}_m)/\mathrm{err}_m\right) + \log(K - 1)$;
(d) update the weights of the sample data, $w_i \leftarrow w_i \exp\left(\alpha_m I(y_i \neq G_m(x_i))\right)$, for all $i$;
(e) renormalize $w_i$.
(3) Output $G(x) = \arg\max_k \sum_{m=1}^{M} \alpha_m I(G_m(x) = k)$.
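Putting the pieces together, the boosting loop above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: `fit_weighted_elm` and `predict_elm` are caller-supplied stand-ins, and one possible weighted fit is sketched after the weighted ELM solution below.

```python
import numpy as np

def maelm_train(X, y, n_classes, n_rounds, fit_weighted_elm, predict_elm):
    """Multiclass AdaBoost over ELM weak classifiers (steps (1)-(2) above)."""
    n = len(y)
    w = np.full(n, 1.0 / n)                       # (1) uniform observation weights
    models, alphas = [], []
    for _ in range(n_rounds):                     # (2) boosting rounds
        model = fit_weighted_elm(X, y, w)         # (a) fit ELM to weighted data
        miss = predict_elm(model, X) != y
        err = np.clip(w[miss].sum() / w.sum(), 1e-12, 1 - 1e-12)   # (b)
        alpha = np.log((1 - err) / err) + np.log(n_classes - 1)    # (c)
        w *= np.exp(alpha * miss)                 # (d) boost misclassified samples
        w /= w.sum()                              # (e) renormalize
        models.append(model)
        alphas.append(alpha)
    return models, alphas

def maelm_predict(models, alphas, X, n_classes, predict_elm):
    """Step (3): weighted majority vote over the ELM weak classifiers."""
    votes = np.zeros((len(X), n_classes))
    for model, alpha in zip(models, alphas):
        votes[np.arange(len(X)), predict_elm(model, X)] += alpha
    return votes.argmax(axis=1)
```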
Part (2)(a) of the proposed algorithm deserves particular attention, since it requires training an ELM on weighted sample data. Both [ ] leave this step unspecified; the weighted ELM described below makes it explicit.
The proposed method maintains the advantages of the original ELM: (1) it is simple in theory and convenient in implementation; (2) a wide range of feature mapping functions or kernels is available for the proposed framework; (3) the proposed method can be applied directly to multiclass classification tasks. In addition, after integrating the weighting scheme, the weighted ELM is able to deal with data with imbalanced class distributions while maintaining the good performance of the unweighted ELM on well-balanced data; by assigning different weights to each example according to the users' needs, the weighted ELM can be generalized to cost-sensitive learning.
Under the weighted circumstance, the solution of the output weights becomes
$$\boldsymbol{\beta} = \left(\frac{\mathbf{I}}{C} + \mathbf{H}^{T}\mathbf{W}\mathbf{H}\right)^{-1}\mathbf{H}^{T}\mathbf{W}\mathbf{T},$$
where $\mathbf{W}$ is an $N \times N$ diagonal matrix whose $i$th diagonal element is the weight $w_i$ of the $i$th training sample and $C$ is the regularization constant in the generalized inverse.
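In code, this solution is a single weighted least-squares solve. The following sketch, with illustrative names and dense linear algebra, could serve as the weighted fit in step (2)(a) of the MAELM sketch above:

```python
import numpy as np

def weighted_elm_beta(H, T, w, C=1.0):
    """beta = (I/C + H^T W H)^{-1} H^T W T with W = diag(w).

    Samples with larger boosting weights w_i contribute more to the fit,
    which is exactly what step (2)(a) of MAELM requires.
    """
    HtW = H.T * w                                   # H^T @ diag(w) without forming W
    A = np.eye(H.shape[1]) / C + HtW @ H
    return np.linalg.solve(A, HtW @ T)
```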
This paper combines LBP based feature vectors with ELM to build a face recognition structure. There have been some papers [ ] that apply ELM to face recognition.
In order to get better generalization performance, the proposed face recognition structure uses the LBP based method to obtain the feature vector and ELM as the classifier. It has been shown in [ ] that the LBP based representation is well suited to face recognition.
The proposed face recognition structure consists of two steps. The first step is to train on the training samples by ELM or MAELM: the training samples are represented by LBP based feature vectors, and these feature vectors are then used to train the classifier model by ELM or MAELM (see the first figure below). The second step is to predict the labels of the test samples: each test sample is likewise represented by an LBP based feature vector and classified by the trained model (see the second figure below). A sketch of the two steps follows the figures.
Training the samples by ELM or MAELM.
Predicting the labels of test samples.
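Reusing the illustrative helpers sketched earlier (`lbp_histogram`, `elm_train`, `elm_predict`), the two steps can be wired together as follows; the names and the default hidden-layer size are our assumptions, not the paper's settings:

```python
import numpy as np

def train_face_model(train_images, train_labels, n_hidden=500):
    """Step 1: LBP feature vectors for the training faces, then ELM training."""
    X = np.stack([lbp_histogram(img) for img in train_images]).astype(float)
    T = np.eye(int(train_labels.max()) + 1)[train_labels]   # one-hot targets
    return elm_train(X, T, n_hidden)

def predict_faces(test_images, W, b, beta):
    """Step 2: same LBP representation for test faces, then label prediction."""
    X = np.stack([lbp_histogram(img) for img in test_images]).astype(float)
    return elm_predict(X, W, b, beta).argmax(axis=1)
```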
In this paper, two of the most widely used face recognition datasets, Yale and ORL, are used to demonstrate the efficiency of the proposed algorithm. To make the results valid, except for one experiment noted below, each experiment is repeated several times and the average result is reported.
The parameters to be set in the experiments and their meanings are listed in the table below.
Parameter list.
Parameter | Meaning
---|---
$M$ | Number of the basic classifiers
$C$ | Constant value in the generalized inverse of $\mathbf{H}$
$L$ | Number of hidden nodes in ELM
$n$ | Number of training images of each person
$k$ | Number of windows each face image is divided into ($k \times k$)
$d$ | The dimension after PCA reduction
Although ELM is comparatively less sensitive to its arguments than SVM, its performance still changes with the number of hidden nodes $L$ and the constant $C$.
Suppose we have a number of images of each person; some of them are used for training and the rest for testing.
In this part, the experiment is conducted on the Yale dataset, with the arguments of ELM and MAELM varied over a grid while the other parameters are fixed.
The performance of ELM (a). The performance of MAELM (b).
It is obvious that neither ELM nor MAELM is sensitive to the change of arguments; the difference between them appears mainly in a limited region of the parameter space.
Having seen PCA's good performance in the area of face recognition, we wonder whether PCA could deliver stable and better performance when it replaces the random construction of the input weight matrix and hidden biases.
The experiment is also conducted on the Yale dataset with the same parameters. Besides, the new parameter $d$, the dimension after PCA reduction, is varied from 10 to 60.
Performance of ELM and MAELM with PCA.
[Table: performance of ELM and MAELM with PCA; columns correspond to $d$ = 10, 20, 30, 40, 50, and 60, and each entry is a pair of accuracy rates.]
It is clear that both ELM and MAELM with PCA are not very sensitive to the change of arguments; again, the two differ mainly in a limited region of the parameter space.
Since the original ELM randomly generates the weights between the input layer and the hidden layer, as well as the bias of the activation function, its performance changes from run to run even for the same training and test set. That is to say, the performance of the original ELM may be unstable. The proposed algorithm successfully reduces this instability.
As shown in the figure below, under the same training set and test set, MAELM yields more stable and more accurate results than ELM across repeated runs.
Performance of ELM and MAELM under the same training set and test set.
The table below reports the mean accuracy rate and the standard deviation of each algorithm over these runs; note that the PCA based variants are deterministic, so their standard deviation is zero.
Performance of ELM and MAELM under the same training set and test set.
Algorithm | Mean accuracy rate | Standard deviation
---|---|---
ELM | 0.8972 | 0.0213
MAELM | 0.9361 | 0.0157
ELM_PCA | 0.9222 | 0
MAELM_PCA | 0.9222 | 0
In order to evaluate how the performance changes as the number of basic classifiers $M$ increases, the experiment varies $M$ while keeping the other parameters fixed.
MAELM’s performance.
MAELM with PCA’s performance.
The two figures above show how the performance of MAELM and of MAELM with PCA changes as $M$ increases.
In this part, experiments are done on both the Yale and ORL datasets, with the parameters of the algorithms fixed to the settings listed above.
The experiments indicate that MAELM has better generalization performance on both the Yale and ORL datasets under different window sizes; see the two figures below.
Performances in Yale and ORL.
Performances in Yale and ORL.
From all these experiments, we can conclude that although MAELM with PCA does not perform as well as the original MAELM, ELM with PCA performs much better than the original ELM, especially in the stability experiment above.
Moreover, since the original ELM randomly generates the weights between the input layer and the hidden layer, as well as the bias of the activation function, its performance is not stable. The proposed use of PCA successfully reduces this instability, which is very important in real-world applications.
Although PCA improves the performance of ELM to a certain degree, it still cannot reach the ability of MAELM with random weights and biases. This leads to the conclusion that the proposed MAELM algorithm performs much better in solving multiclass classification problems.
Very similar to MAELM, the DAEELM [ ] algorithm also applies AdaBoost to ELM; however, it relies on decomposing the multiclass problem into binary ones.
Many methods have been developed to apply a binary classifier to the multiclass problem. One-against-all (OAA) [ ] is one of the most common: it converts a $K$-class problem into $K$ binary problems, each of which separates one class from all the others.
Suppose that both MAELM and DAEELM use $M$ basic classifiers. With the OAA decomposition, DAEELM has to train a binary ensemble for each of the $K$ classes, that is, on the order of $KM$ ELMs in total, whereas MAELM trains only $M$ multiclass ELMs. For the Yale dataset with $K = 15$ subjects, for example, this amounts to roughly 15 times as many base classifiers for DAEELM as for MAELM.
The authors of DAEELM have not published its code, and DAEELM has its own arguments that MAELM does not have. DAEELM also does not provide details of how it trains weighted data with ELM, so a direct performance comparison between MAELM and DAEELM would be unfair. However, the conclusion that MAELM is much faster than DAEELM for multiclass classification problems can be drawn from the complexity analysis above.
Section
Toh in [
In Section
This paper proposes a new boosting ELM algorithm named MAELM, which applies multiclass AdaBoost to an ELM ensemble to solve multiclass classification problems directly. A face recognition structure combining the LBP based method and ELM is also presented. Moreover, this paper proposes a way of combining ELM with PCA instead of using random weights between the input layer and the hidden layer and a random bias of the activation function.
Experiments on LBP based face recognition show stable and good performance to a certain degree. Although PCA improves the performance of ELM, it still cannot outperform MAELM with random weights and biases. The experiments show that in the LBP based face recognition problem, the recognition result of MAELM is more stable than that of the original ELM and better than that of any other algorithm listed in the paper.
In conclusion, the proposed MAELM algorithm, which applies multiclass AdaBoost to ELM and combines it with the LBP method, performs much better in solving multiclass classification problems.
Also, MAELM is compared with DAEELM on the multiclass classification problem in theory, which indicates that MAELM has much lower computational complexity than DAEELM. Moreover, this paper clarifies how to train ELM on weighted data.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This research is based on work supported in part by the National Natural Science Foundation of China (61370173, 61173123) and the Natural Science Foundation Project of Zhejiang Province under Project LR13F030003.