Evolutionary Voting-Based Extreme Learning Machines

The voting-based extreme learning machine (V-ELM) was proposed to improve learning efficiency by employing majority voting. V-ELM assumes that all individual classifiers contribute equally to the ensemble decision. However, in many real-world scenarios this assumption does not hold. In this paper, we aim to enhance V-ELM by introducing weights that distinguish the importance of each individual ELM classifier in decision making. A genetic algorithm is used to optimize these weights. The resulting evolutionary V-ELM is named EV-ELM. Results on several benchmark databases show that EV-ELM achieves higher classification accuracy than both V-ELM and ELM.

Ensemble learning has been popular for decades. Jain et al. [24] wrote a concise yet informative introduction to classifier combination. Polikar [25] comprehensively reviewed the area of multiple classifier systems (ensemble systems) for decision making. Majority voting [26] is one of the most commonly used combining strategies. This rule seeks the class that receives the highest number of votes and assigns it as the predicted label of the testing pattern.
Since each classifier in an ensemble does not necessarily contribute equally to the final decision, Littlestone and Warmuth [27] proposed weighing individual classifiers to make them discriminative. Ensemble learning-based ELM algorithms [16, 18, 19] were reported to successfully resolve the problems of predictive instability and overfitting.
Among ensemble-based extensions, voting-based ELM (V-ELM) [18] was proposed to perform multiple independent ELM trainings using a simple and effective learning architecture, with the decision made by majority voting. V-ELM not only enhanced the classification performance but also lowered the variance. In this paper, we investigate the introduction of weights for all individual ELM classifiers to enhance the V-ELM algorithm. The hypothesis is that individual ELM classifiers present different levels of confidence in decision making. In our proposed method, each weight represents the importance of an ELM classifier, and the final decision is made with the weighted majority voting scheme.
The remainder of this paper is organized as follows. Section 2 briefly reviews the ELM and V-ELM algorithms. Section 3 presents the proposed evolutionary V-ELM (EV-ELM) algorithm. Section 4 demonstrates the performance of EV-ELM and compares it with ELM and V-ELM. Section 5 concludes this study.

Algorithm 1:
Inputs: Training samples $\mathcal{L} = \{(\mathbf{x}_j, y_j)\}$, $j = 1, \ldots, N$, with labels $y_j \in \{1, \ldots, C\}$; $P(t)$: population of candidates at generation $t$; $N_p$: population size of each generation; $p_c$: crossover probability; $p_m$: mutation probability.
Initialization: Set $t = 0$ and initialize the population $P(t)$ at random.
Evolutionary Process: Evaluate the fitness values of $P(t)$ using (5). Increase $t$ by 1 in each iteration. To create a new generation $P(t+1)$, the operations of selection, crossover, and mutation are applied to $P(t)$. Repeat the following steps until the termination criterion of the genetic algorithm (GA) is met.
(i) First, a fraction $(1 - p_c)$ of the members of $P(t)$ is probabilistically selected into $P(t+1)$ according to fitness.
(ii) Second, the crossover operator is applied to half of the candidates in $P(t)$ that were not selected. The offspring of crossover are added to $P(t+1)$.
(iii) Last, chromosomes in $P(t+1)$ are subject to mutation with probability $p_m$.
Store the weights $\{w_1^{\mathrm{opt}}, \ldots, w_K^{\mathrm{opt}}\}$ as the outputs, where "opt" indicates "optimal".
Decision Making: Given a testing sample $\mathbf{x}$, use the following equation to predict its label, where $d_{k,c} \in \{0, 1\}$ is the vote of the $k$th ELM for class $c$:
$$\hat{y} = \arg\max_{c \in \{1, \ldots, C\}} \sum_{k=1}^{K} w_k^{\mathrm{opt}}\, d_{k,c}$$

Background
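2.1. Extreme Learning Machine. In training a single-hidden-layer feedforward network (SLFN), ELM randomly selects the weights and biases of the hidden nodes and then analytically determines the output weights by finding the least-squares solution. Given a training set of $N$ samples $\{(\mathbf{x}_j, \mathbf{t}_j) \mid \mathbf{x}_j \in \mathbb{R}^n, \mathbf{t}_j \in \mathbb{R}^m, j = 1, 2, \ldots, N\}$, where $\mathbf{x}_j$ is an $n \times 1$ input vector and $\mathbf{t}_j$ is an $m \times 1$ target vector, an SLFN with $\tilde{N}$ hidden nodes is formulated as

$$f_{\tilde{N}}(\mathbf{x}_j) = \sum_{i=1}^{\tilde{N}} \boldsymbol{\beta}_i\, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i), \quad j = 1, \ldots, N, \tag{1}$$

where the additive hidden node is employed and $g(\cdot)$ is the activation function. Weight vector $\mathbf{w}_i$ connects the $i$th hidden node and the input neurons, $b_i$ is the bias of the $i$th hidden node, and $\boldsymbol{\beta}_i$ is its output weight. If the $N$ samples are approximated with zero error using $\tilde{N}$ hidden nodes, then $\boldsymbol{\beta}_i$, $\mathbf{w}_i$, and $b_i$ exist such that $f_{\tilde{N}}(\mathbf{x}_j) = \mathbf{t}_j$ for all $j$. Consequently, (1) can be written as

$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T}, \tag{2}$$

where $\mathbf{H}(\mathbf{w}_1, \ldots, \mathbf{w}_{\tilde{N}}, b_1, \ldots, b_{\tilde{N}}, \mathbf{x}_1, \ldots, \mathbf{x}_N)$ is the hidden layer output matrix of the network; $h_{ji} = g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i)$ is the output of the $i$th hidden neuron with respect to $\mathbf{x}_j$, $i = 1, 2, \ldots, \tilde{N}$ and $j = 1, 2, \ldots, N$; $\boldsymbol{\beta} = [\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_{\tilde{N}}]^{\mathsf{T}}$ and $\mathbf{T} = [\mathbf{t}_1, \ldots, \mathbf{t}_N]^{\mathsf{T}}$ are the output weight matrix and the target matrix, respectively. The ELM algorithm can be summarized in three steps: (1) generate the parameters $\mathbf{w}_i$ and $b_i$ randomly for $i = 1, \ldots, \tilde{N}$; (2) calculate the hidden layer output matrix $\mathbf{H}$; and (3) calculate the output weights using $\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T}$, where $\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized inverse of $\mathbf{H}$. It has been shown in [28] that any continuous target function in $\mathbb{R}^n$ can be universally approximated by a single SLFN with randomly chosen additive hidden nodes.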
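To make step (3) concrete, the hidden layer output matrix can be written out entry by entry, with one row per training sample and one column per hidden node. The closed-form expression on the right is the standard Moore-Penrose formula and is given here only as an illustration: it relies on the assumption, not stated in the text, that $\mathbf{H}$ has full column rank. Otherwise a general pseudoinverse computation is needed.

```latex
\mathbf{H} =
\begin{bmatrix}
g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_{\tilde{N}} \cdot \mathbf{x}_1 + b_{\tilde{N}}) \\
\vdots & \ddots & \vdots \\
g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_{\tilde{N}} \cdot \mathbf{x}_N + b_{\tilde{N}})
\end{bmatrix}_{N \times \tilde{N}},
\qquad
\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T}
= \left(\mathbf{H}^{\mathsf{T}}\mathbf{H}\right)^{-1}\mathbf{H}^{\mathsf{T}}\mathbf{T}
\quad \text{(assuming } \mathbf{H} \text{ has full column rank)}
```

Under that assumption, $\boldsymbol{\beta}$ is the unique minimizer of $\lVert \mathbf{H}\boldsymbol{\beta} - \mathbf{T} \rVert$, which is the least-squares solution referred to above.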

2.2. Voting-Based Extreme Learning Machine. In ELM, randomized hidden nodes are used and remain unchanged during training. Some testing samples could be misclassified in certain situations, for example, when they lie near the classification boundary. To tackle this issue, V-ELM incorporates multiple individual ELMs and makes decisions by majority voting. V-ELM uses a fixed number of hidden nodes for all individual ELMs. All these ELMs are trained on the same dataset, and the learning parameters of each ELM are randomly initialized. The predicted class label is then determined by majority voting over the results obtained from all ELMs.
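The voting step can be sketched in a few lines of Python. This is a minimal illustration rather than the V-ELM implementation: the function name `plurality_vote` is hypothetical, class labels are assumed to be the integers 1 to `num_classes`, and ties are broken in favour of the smallest label, which is an assumption because the text does not specify a tie-breaking rule.

```python
def plurality_vote(predictions, num_classes):
    """Return the class label (1..num_classes) that receives the most votes.

    predictions: labels predicted by the individual ELMs for one sample.
    Ties go to the smallest label (assumed; the text gives no tie rule).
    """
    counts = [0] * (num_classes + 1)  # index 0 is unused
    for label in predictions:
        counts[label] += 1
    # max() returns the first maximal element, i.e. the smallest tied label.
    return max(range(1, num_classes + 1), key=lambda c: counts[c])


if __name__ == "__main__":
    # Seven ELMs vote on one testing sample with three possible classes.
    print(plurality_vote([1, 2, 2, 3, 2, 1, 3], num_classes=3))
```

Note that the winning class only needs the largest vote count; it does not need more than half of the votes.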

Evolutionary Voting-Based ELM
Given a learning set $\mathcal{L}$ consisting of samples $\{(\mathbf{x}_j, y_j)\}$, $j = 1, 2, \ldots, N$, where $y_j$ is the class label, we assume that $\mathbf{x}$ is the input and $y$ is predicted by $\varphi(\mathbf{x}, \mathcal{L})$. In V-ELM, the aim is to predict $y$ better with multiple ELMs than with a single one. Suppose that $\varphi(\mathbf{x}, \mathcal{L})$ predicts a class label $c \in \{1, 2, \ldots, C\}$ and that the prediction of the $k$th classifier is $d_{k,c} \in \{0, 1\}$, where $k = 1, 2, \ldots, K$ and $d_{k,c} = 1$ if the $k$th classifier votes for class $c$. The ensemble decision can then be defined as

$$\hat{y} = \arg\max_{c \in \{1, \ldots, C\}} \sum_{k=1}^{K} d_{k,c}. \tag{3}$$

The voting is the plurality version, which means that the output is the class with the highest number of votes, whether or not that number exceeds half of the votes.

In many applications, not all classifiers contribute equally to decision making. The overall performance of the ensemble system can be improved by weighing the decisions prior to combination [27]. In this section, an evolutionary voting-based ELM (EV-ELM) using weighted majority voting is proposed. The general algorithm is elaborated in Algorithm 1. We denote $w_k$ as the weight of the $k$th individual ELM. The weighted voting rule is

$$\hat{y} = \arg\max_{c \in \{1, \ldots, C\}} \sum_{k=1}^{K} w_k\, d_{k,c}. \tag{4}$$

In the framework of weighted majority voting, the weights $w_k$ need to be optimized to improve the generalization performance. If we knew that certain classifiers worked better, we could assign larger weights to them. However, such knowledge is usually absent. Conventional parameter updating methods can search for weights that provide better generalization performance, but the optimization process risks becoming trapped in local optima. Methods that search for the global optimum can be used to further improve classification accuracy. The genetic algorithm (GA) [29] is a class of optimization procedures inspired by the biological mechanisms of reproduction. Many applications exploit GA to find optimal solutions, for example, face recognition [30] and clustering [31]. GA is used in this paper for demonstration purposes. In practice, many emerging techniques, such as differential evolution [32] and particle swarm optimization [33], are potential alternatives.
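The difference between plain and weighted voting is easiest to see on a small example. The sketch below is illustrative only: `weighted_vote` is a hypothetical name, labels are assumed to be the integers 1 to `num_classes`, and ties are broken toward the smallest label (an assumption). Each classifier adds its weight, instead of a single vote, to the score of the class it predicts.

```python
import math


def weighted_vote(predictions, weights, num_classes):
    """Return the class with the largest sum of classifier weights.

    predictions[k] is the label predicted by the k-th ELM and weights[k]
    is its weight. Ties go to the smallest label (assumed).
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight is required per classifier")
    scores = [0.0] * (num_classes + 1)  # index 0 is unused
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(range(1, num_classes + 1), key=lambda c: scores[c])


if __name__ == "__main__":
    votes = [1, 1, 2]
    # Equal weights reduce to plurality voting: class 1 wins two to one.
    print(weighted_vote(votes, [1.0, 1.0, 1.0], num_classes=2))
    # A heavily weighted third classifier overrules the other two.
    print(weighted_vote(votes, [1.0, 1.0, math.e], num_classes=2))
```

With all weights equal, the rule reduces to the unweighted plurality vote, so V-ELM is the special case of constant weights.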
To use GA to select proper weights, each chromosome is formed by $w_1, w_2, \ldots, w_K$. At the beginning, a population of $N_p$ chromosomes is generated randomly. Then, the fitness function $f$ is calculated for each chromosome by (5), where $w_k$ and $a_k$ are the weight and the training accuracy of the $k$th ELM, respectively. Maximizing the fitness implies that, by choosing appropriate weights, we achieve the best normalized training accuracy across all $K$ ELM classifiers in the ensemble. Such a set of optimal weights is expected to give the decision ensemble good generalization performance on unseen testing samples. The fitness function is the most important measure for determining the composition of the next generation and for guiding the entire evolutionary process. After chromosome selection, part of the current population is inherited by the next generation. The remaining chromosomes are reproduced, with some of the parent chromosomes undergoing the crossover operation. The crossover probability is defined as $p_c$. To extend the search space toward the global optimum, mutation is applied randomly to some of the offspring with a very small mutation probability $p_m$. Mutation introduces a degree of diversity into the population and prevents premature convergence. The evolutionary process is terminated after $N_g$ generations, which is considered enough for convergence.
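The evolutionary loop described above can be sketched as follows. This is a simplified illustration under several assumptions, not the authors' implementation. The fitness is assumed to be the weighted average of the training accuracies, $\sum_k w_k a_k / \sum_k w_k$, which is one reading of "normalized training accuracy" and may differ from (5). Each gene $u \in [0, 1]$ is assumed to encode a weight $w = \exp(u)$. Roulette-wheel selection, single-point crossover, and uniform gene resampling are assumed operators, and crossover is simplified to filling the rest of the population with offspring. The names `fitness` and `evolve_weights` are hypothetical; the default parameter values are those reported in the experimental settings.

```python
import math
import random


def fitness(genes, accuracies):
    """Assumed fitness: weighted average of the per-ELM training accuracies.

    Each gene u in [0, 1] encodes a weight w = exp(u), so every weight lies
    in [exp(0), exp(1)]. This form is an assumption, not the paper's Eq. (5).
    """
    weights = [math.exp(u) for u in genes]
    return sum(w * a for w, a in zip(weights, accuracies)) / sum(weights)


def evolve_weights(accuracies, pop_size=100, generations=150,
                   p_c=0.65, p_m=0.004, seed=0):
    """Return (weights, fitness) of the best chromosome found by a simple GA.

    accuracies must contain positive values so that roulette-wheel
    selection has positive selection probabilities.
    """
    rng = random.Random(seed)
    k = len(accuracies)
    population = [[rng.random() for _ in range(k)] for _ in range(pop_size)]
    best = list(max(population, key=lambda g: fitness(g, accuracies)))

    for _ in range(generations):
        scores = [fitness(g, accuracies) for g in population]
        # Selection: copy a fraction (1 - p_c) of the members, drawn with
        # probability proportional to fitness (roulette wheel, assumed).
        n_keep = round((1 - p_c) * pop_size)
        chosen = rng.choices(population, weights=scores, k=n_keep)
        next_gen = [g[:] for g in chosen]
        # Crossover (simplified): fill the remaining slots with single-point
        # offspring of fitness-proportionally chosen parent pairs.
        while len(next_gen) < pop_size:
            mum, dad = rng.choices(population, weights=scores, k=2)
            cut = rng.randint(1, k - 1) if k > 1 else 0
            next_gen.append(mum[:cut] + dad[cut:])
            if len(next_gen) < pop_size:
                next_gen.append(dad[:cut] + mum[cut:])
        # Mutation: each gene is redrawn uniformly with probability p_m.
        for genes in next_gen:
            for i in range(k):
                if rng.random() < p_m:
                    genes[i] = rng.random()
        population = next_gen
        champion = max(population, key=lambda g: fitness(g, accuracies))
        if fitness(champion, accuracies) > fitness(best, accuracies):
            best = list(champion)

    return [math.exp(u) for u in best], fitness(best, accuracies)


if __name__ == "__main__":
    # Hypothetical training accuracies of four individual ELMs.
    weights, score = evolve_weights([0.9, 0.6, 0.8, 0.5],
                                    pop_size=30, generations=40, seed=1)
    print([round(w, 3) for w in weights], round(score, 4))
```

Under the assumed fitness, a weighted average is largest when accurate classifiers receive the upper-bound weight and the others the lower-bound weight, so the search tends to push weights toward the ends of the allowed range.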

Performance Evaluation
Experiments were carried out in MATLAB 7 on a desktop computer equipped with a 3.2 GHz Intel CPU and 4 GB of RAM. The learning and testing processes were repeated 50 times, and the means and standard deviations are reported in the results. In the experiments, the range of each weight was [exp(0), exp(1)], where the exponential function was used to enhance the difference between the lower-bound and upper-bound values of the weight. Moreover, the crossover probability $p_c$ and the mutation probability $p_m$ were chosen as 0.65 and 0.004, respectively. The population size $N_p$ was 100 and the number of generations in GA was 150 for all experiments.
4.1. Databases. We evaluated the proposed EV-ELM with two types of data: UCI machine learning data and face recognition data. The UCI data was used to test the methods on general-purpose classification problems; the face data was used to examine how well the methods handle the small sample size problem (i.e., each pattern class has only a few samples). Details of these databases are presented in Table 1. A total of 12 real-world datasets were downloaded from the UCI database [34]. Five benchmark face databases were used, namely, ORL [35], UMIST [36], Yale [37], FERET [38], and the Georgia Tech face database (GTFD) [39]. ORL, UMIST, and Yale formed a combo database for testing. The combo set consisted of training samples and 575 testing samples in total, and all images belonged to 75 different classes with large variations in illumination, pose, and facial expression. The FERET database used in this paper was a preprocessed subset [40] composed of 2713 images from 320 subjects. In GTFD, each of the 50 subjects had 15 images. Before the experimental evaluation, we cropped and resized the images in the Yale and GTFD databases to 112 × 92 to make their dimensions identical to those of the samples in ORL and UMIST. Furthermore, we applied the discrete cosine transform (DCT) [41] to convert the 2D face images into low-dimensional vectors of DCT coefficients.
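As an illustration of the DCT step, the following sketch computes an orthonormal 2D DCT-II directly from its definition and keeps a small block of low-frequency coefficients as the feature vector. The text does not say which coefficients were retained, so taking the top-left `block` × `block` corner is an assumption, and the names `dct_2d` and `dct_features` are hypothetical.

```python
import math


def dct_2d(image):
    """Orthonormal 2D DCT-II of an image given as a list of equal-length rows."""
    rows, cols = len(image), len(image[0])

    def scale(index, size):
        # Normalization factors that make the transform orthonormal.
        return math.sqrt(1.0 / size) if index == 0 else math.sqrt(2.0 / size)

    coeffs = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0.0
            for r in range(rows):
                for c in range(cols):
                    total += (image[r][c]
                              * math.cos(math.pi * (2 * r + 1) * u / (2 * rows))
                              * math.cos(math.pi * (2 * c + 1) * v / (2 * cols)))
            coeffs[u][v] = scale(u, rows) * scale(v, cols) * total
    return coeffs


def dct_features(image, block=2):
    """Flatten the top-left block x block coefficients (assumed selection)."""
    coeffs = dct_2d(image)
    return [coeffs[u][v] for u in range(block) for v in range(block)]


if __name__ == "__main__":
    # A constant 4 x 4 image has all of its energy in the DC coefficient.
    flat = [[1.0] * 4 for _ in range(4)]
    print([round(x, 6) for x in dct_features(flat, block=2)])
```

The direct four-loop evaluation is meant for reading, not speed; for 112 × 92 images a fast library transform would be the practical choice.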

4.2. Results and Discussion. Table 2 presents the comparison results, where both training time and testing accuracy are averaged across 50 repeats of the evaluation process. ELM is the fastest learner but performs poorly in classification, whereas V-ELM and EV-ELM achieve much better performance. Since both V-ELM and EV-ELM create an ensemble of individual ELM classifiers, they run slower than ELM unless a parallel computing structure is implemented. On all databases, EV-ELM outperforms V-ELM in terms of both accuracy and variance. However, this improvement is smaller than that of the voting-based methods over the original ELM algorithm. The results also show that the evolutionary weighing method increases classification accuracy while bringing down the variance. In general, EV-ELM needs more time than V-ELM to train a model. In applications where online training is not required, EV-ELM is a good alternative to V-ELM.
Figure 1 illustrates five examples of the changes in classification performance during the evolutionary process. Within 150 generations, the GA process is usually able to converge. As reflected in the figure, the classification accuracy tends to become stable toward the end of the evolutionary process. The weights selected after 150 generations cannot guarantee the best classification accuracy, but they are able to provide a consistent output. This characteristic is important because testing accuracy is not available during classifier training and therefore cannot be used to guide the evolutionary process. Figure 2 depicts the changes of three example weights during the GA evolution. Initially, these weights are randomly generated. As the number of generations increases, they tend to converge to either the upper- or the lower-bound value so that the fitness is maximized. The variety among the weights creates a dynamic ELM ensemble for decision making.

Conclusions
In this paper, we proposed an enhanced V-ELM method. Weights were introduced to distinguish among the individual ELM classifiers, and a genetic algorithm was used to optimize them. Experimental results demonstrated the effectiveness of EV-ELM in terms of classification accuracy. However, the slow training speed prohibits the use of EV-ELM in applications that require online training. This is a preliminary study on optimizing ELM ensembles; many evolutionary algorithms are potentially useful for optimizing the weights. Furthermore, reducing the training time is of great interest for future work.

Figure 1: Five examples of the changes in classification performance during the evolutionary process. The x-axis is the number of generations and the y-axis is the classification accuracy in each generation. The results are based on the UCI Heart dataset.

Figure 2: Three example weights and the changes in their values during the evolutionary process. The results are based on the UCI Heart dataset.

Table 1: Databases used in the experiments.

Table 2: Comparison results among ELM, V-ELM, and EV-ELM algorithms using benchmark UCI and face databases.
Columns: Database; Algorithms; Number of nodes; Number of ensembles; Training time (s); Testing accuracy (%); Standard deviation (%).