The optimisation for local coupled extreme learning machine using differential evolution

Many strategies have been exploited for the task of reinforcing the effectiveness and efficiency of extreme learning machine (ELM), from both methodology and structure perspectives. By activating all the hidden nodes with different degrees, local coupled extreme learning machine (LC-ELM) is capable of decoupling the link architecture between the input layer and the hidden layer in ELM. Such activated degrees are jointly determined by the associated addresses and fuzzy membership functions assigned to the hidden nodes. In order to further refine the weight searching space of LC-ELM, this paper implements an optimisation, entitled evolutionary local coupled extreme learning machine (ELC-ELM). This method makes use of the differential evolution (DE) algorithm to optimise the hidden node addresses and the radiuses of the fuzzy membership functions, until the qualified fitness or the maximum iteration step is reached. The efficacy of the presented work is verified through systematic simulated experimentations in both regression and classification applications. Experimental results demonstrate that the proposed technique outperforms three ELM alternatives, namely, the classical ELM, LC-ELM, and OSFuzzyELM, according to a series of reliable performances.


Introduction
Owing to its high efficiency and simple implementation, extreme learning machine (ELM) [1,2] has recently enjoyed much attention as a powerful tool in regression and classification applications (e.g., [3,4]). A variety of extensions of ELM have therefore been developed in an attempt to improve its performance. In general, there are two approaches: one is to optimise the methodology of ELM (e.g., online sequential ELM [5] and evolutionary ELM [6]); the other is to refine the hidden layer of ELM for optimising the learning model (e.g., incremental ELM [7], pruned-ELM [8], and two-stage ELM [9]). Several promising performances have been observed through these two schemes, at both theoretical and empirical levels.
Local coupled extreme learning machine (LC-ELM) further develops the classical ELM algorithm by assigning an address to each hidden node in the input space. Given a learning sample, the hidden nodes will be activated at different levels in accordance with the distances from their locations to the input sample. In so doing, the fully coupled architecture between the input layer and the hidden layer in ELM is simplified, and the complexity of the weight searching space is reduced correspondingly. In fact, when the input information is modified, only those highly relevant hidden nodes will be influenced. This process is similar to the learning process of a brain: when a new learning sample is received, only the relevant knowledge needs to be revised, with different memory-inspired degrees.
In LC-ELM, the addresses and the window radiuses are currently preset empirically or randomly. However, nonoptimal addresses and radiuses may yield an inappropriate underlying model. As a type of metaheuristic, the differential evolution (DE) approach [10] entails few or no assumptions regarding the problem being optimised and has the ability to search for candidate solutions in very large spaces. Accordingly, this paper presents an approach termed evolutionary local coupled extreme learning machine (ELC-ELM). The proposed method makes use of DE in an attempt to address the challenges raised by the stochastically predetermined addresses and radiuses. Specifically, in ELC-ELM, DE is utilised to optimise the addresses and radiuses, according to the resulting root mean squared error (RMSE). Hence, the associated activation degrees are improved. This optimisation procedure searches for a superior framework of ELC-ELM (consisting of the addresses and radiuses), until the qualified fitness or the maximum iteration step is reached. To evaluate the performance of this approach, comparative studies between ELC-ELM and the alternative ELM-based techniques (including the classical ELM, LC-ELM, and OSFuzzyELM [11]) are also presented through systematic experimental investigations. The results demonstrate that the proposed work entails improved performances in both regression and classification applications.
The remainder of this paper is structured as follows. An outline of the relevant background materials is presented in Section 2, including LC-ELM and the differential evolution algorithm. The optimisation of LC-ELM, termed evolutionary local coupled extreme learning machine (ELC-ELM), is then described in Section 3. In Section 4, systematic comparisons between ELC-ELM and several relevant ELM-based algorithms (ELM, LC-ELM, and OSFuzzyELM) are carried out in an experimental evaluation. Section 5 concludes the paper with a short discussion of potential further work.

Theoretical Background
For completeness, the basic ideas of local coupled extreme learning machine and differential evolution (DE) [10] are briefly recalled first.

Local Coupled Extreme Learning Machine. Conventionally, extreme learning machine (ELM) algorithms [1,2] are implemented with a fully coupled framework since, in general, a single input activates all hidden nodes. Such a structure leads to a computational cost proportional to the scale of a given network. In LC-ELM, a strategy to decouple the framework linking the input layer to the hidden layer in ELM was proposed. Different from the classical ELM, LC-ELM introduces a parameter, termed "address," to each hidden node in the input space. Given a learning sample, the distances from the hidden nodes to the input sample are converted by the fuzzy membership functions into the activated degrees of the relevant hidden nodes. With these two additions, this strategy structurally simplifies the weight searching space in LC-ELM.
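For a dataset which contains $N$ distinct objects $(\mathbf{x}_i, \mathbf{t}_i)$, where $\mathbf{x}_i \in \mathbb{R}^n$ and $\mathbf{t}_i \in \mathbb{R}^m$, the output of an $L$-hidden-node nonlinear LC-ELM is

$$f(\mathbf{x}_i) = \sum_{j=1}^{L} \beta_j \, g(\mathbf{w}_j \cdot \mathbf{x}_i + b_j) \, F(S(\mathbf{x}_i, \mathbf{d}_j)), \quad i = 1, \ldots, N, \tag{1}$$

where $g(\cdot)$ denotes the activation function. $\mathbf{w}_j$, $b_j$, and $\beta_j$ are the network weights. $F(\cdot)$ is a fuzzy membership function. $S(\mathbf{x}_i, \mathbf{d}_j)$ is the similarity between the $i$th input and the $j$th hidden node. $\mathbf{d}_j \in \mathbb{R}^n$ is the address of the $j$th hidden node.
Here, $g(\cdot)$ is said to be piecewise continuous if it has only a finite number of discontinuities in any interval, and its left and right limits are defined (not necessarily equal) at each discontinuity [2]. In order to adjust the width of the activated area, the underlying radius parameter $r$ is employed in $F(\cdot)$.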
Note that, in (1), when $F(\cdot)$ is a constant function equal to 1, LC-ELM reduces to the classical ELM. Moreover, when $\mathbf{w}_j$ in (1) is equal to zero, the fuzzy membership function $F(\cdot)$ is nonconstant, and the similarity function $S(\mathbf{x}, \mathbf{d})$ is determined by the norm distance $\|\mathbf{x} - \mathbf{d}\|$, the framework of LC-ELM reduces to the ELM with RBF hidden nodes [2]. In [12], both of these cases of ELM are proven to possess universal approximation capabilities. Therefore, it is reasonable to expect that, for an arbitrary multivariate continuous function, LC-ELM may be able to approximate the function to a given accuracy.
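The two special cases can be written out explicitly. The following is a short worked check rather than part of the original derivation; it assumes that (1) has the form $f(\mathbf{x}_i) = \sum_{j} \beta_j\, g(\mathbf{w}_j \cdot \mathbf{x}_i + b_j)\, F(S(\mathbf{x}_i, \mathbf{d}_j))$, and the rescaled weight $\tilde{\beta}_j$ is introduced here only for illustration.

```latex
% Case 1: constant membership, F \equiv 1  (classical ELM)
f(\mathbf{x}_i) = \sum_{j=1}^{L} \beta_j \, g(\mathbf{w}_j \cdot \mathbf{x}_i + b_j)

% Case 2: \mathbf{w}_j = \mathbf{0} and S(\mathbf{x}, \mathbf{d}) = \|\mathbf{x} - \mathbf{d}\|
% g(b_j) no longer depends on the input, so it can be absorbed into the output weight
f(\mathbf{x}_i) = \sum_{j=1}^{L} \beta_j \, g(b_j) \, F(\|\mathbf{x}_i - \mathbf{d}_j\|)
               = \sum_{j=1}^{L} \tilde{\beta}_j \, F(\|\mathbf{x}_i - \mathbf{d}_j\|),
\qquad \tilde{\beta}_j = \beta_j \, g(b_j)
```

In the second case the address $\mathbf{d}_j$ plays the role of an RBF centre and the radius inside $F(\cdot)$ plays the role of its width, provided $g(b_j) \neq 0$ (which holds, for example, for a sigmoid activation).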
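For the linear system $\mathbf{H}\boldsymbol{\beta} = \mathbf{T}$ generated by LC-ELM, the hidden-layer output matrix in LC-ELM is

$$\mathbf{H} = \left[ g(\mathbf{w}_j \cdot \mathbf{x}_i + b_j) \, F(S(\mathbf{x}_i, \mathbf{d}_j)) \right]_{N \times L}, \quad i = 1, \ldots, N, \; j = 1, \ldots, L. \tag{2}$$

$\boldsymbol{\beta} = [\beta_1, \ldots, \beta_L]^{T}_{L \times m}$ is the matrix of output weights and $\beta_j$ denotes the weight vector connecting the $j$th hidden node and the output layer. $\mathbf{T} = [\mathbf{t}_1, \ldots, \mathbf{t}_N]^{T}_{N \times m}$ is the matrix of target outputs. Given this representation, in the initialisation phase of LC-ELM, the hidden node addresses $\mathbf{d}_j$ are assigned randomly, as are the hidden layer parameters $(\mathbf{w}_j, b_j)$.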
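The construction of the hidden-layer output matrix and the analytical solution for the output weights can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the sigmoid activation, the Euclidean distance as similarity, the Gaussian-type membership `exp(-(s / r)**2)`, the default radius, and all function names are choices made here for the example.

```python
import numpy as np

def lc_elm_hidden(X, W, b, D, r):
    """Hidden-layer output matrix H with shape (N, L)."""
    # Sigmoid activation g(w_j . x_i + b_j) (assumed choice of g)
    G = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
    # Similarity taken as the Euclidean distance ||x_i - d_j|| (assumed choice of S)
    dist = np.linalg.norm(X[:, None, :] - D[None, :, :], axis=2)
    # Gaussian-type membership with radius r (assumed form of F)
    F = np.exp(-(dist / r) ** 2)
    return G * F

def lc_elm_train(X, T, L, r=0.7, rng=None):
    """Three steps: random (w, b, d), compute H, solve for beta."""
    rng = np.random.default_rng(rng)
    n = X.shape[1]
    W = rng.uniform(0.0, 1.0, (L, n))   # input weights
    b = rng.uniform(0.0, 1.0, L)        # hidden biases
    D = rng.uniform(0.0, 1.0, (L, n))   # hidden node addresses
    H = lc_elm_hidden(X, W, b, D, r)
    beta = np.linalg.pinv(H) @ T        # Moore-Penrose generalised inverse
    return W, b, D, beta

def lc_elm_predict(X, W, b, D, beta, r=0.7):
    """Network output H @ beta for new inputs X."""
    return lc_elm_hidden(X, W, b, D, r) @ beta
```

With a very large radius the membership term tends to 1 for every node, so `lc_elm_hidden` returns the plain sigmoid hidden layer, which mirrors the reduction to the classical ELM noted earlier.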
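Following the above discussion, a three-step LC-ELM algorithm can be summarised in Algorithm 1.

Differential Evolution. Differential evolution (DE) [10] is known as one of the most efficient evolutionary algorithms [13]. It has been widely used to tune the parameters in neural networks [14,15]. Given a set of parameter vectors $\{\theta_{i,G} \mid i = 1, 2, \ldots, NP\}$ as a population at each generation $G$, the basic learning process of DE involves the iteration of the following procedures.
(i) Mutation. For each target vector $\theta_{i,G}$, $i = 1, 2, \ldots, NP$, a mutant vector is generated according to

$$\mathbf{v}_{i,G+1} = \theta_{r_1,G} + \eta \, (\theta_{r_2,G} - \theta_{r_3,G}),$$

with random and mutually different indices $r_1, r_2, r_3 \in \{1, 2, \ldots, NP\}$, where $\eta > 0$ is a constant factor controlling the amplification of the differential variation $(\theta_{r_2,G} - \theta_{r_3,G})$.
(ii) Crossover. In this procedure, the $D$-dimensional trial vector $\mathbf{u}_{i,G+1} = (u_{1i,G+1}, \ldots, u_{Di,G+1})$ is formed such that

$$u_{ji,G+1} = \begin{cases} v_{ji,G+1}, & \text{if } \mathrm{rand}(j) \le CR \text{ or } j = \mathrm{rnbr}(i), \\ \theta_{ji,G}, & \text{otherwise}, \end{cases} \quad j = 1, \ldots, D,$$

where $\mathrm{rand}(j)$ is the $j$th evaluation of a uniform random number generator with an outcome in [0, 1], $CR$ is the crossover constant in [0, 1], which is specified independently of the algorithm, and $\mathrm{rnbr}(i)$ is a randomly chosen integer index in $\{1, \ldots, D\}$ which ensures that $\mathbf{u}_{i,G+1}$ obtains at least one parameter from $\mathbf{v}_{i,G+1}$.
(iii) Selection. The trial vector $\mathbf{u}_{i,G+1}$ replaces the target vector $\theta_{i,G}$ in the next generation if it yields a better fitness; otherwise, the target vector is retained.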
Overall, DE is an approach that optimises a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality (i.e., a fitness function). As a type of metaheuristic, such a strategy entails few or no assumptions regarding the problem being optimised and has the ability to search large spaces of candidate solutions (such as the weight searching space of LC-ELM) [10].
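The DE loop described above can be sketched as follows. This is an illustrative sketch of the standard rand/1/bin scheme, assuming a minimisation problem; the function name, the parameter names (`scale`, `cr`, `target`), and their default values are hypothetical choices made here and are not taken from the paper.

```python
import numpy as np

def differential_evolution(fitness, dim, pop_size=20, scale=0.5, cr=0.9,
                           max_gen=100, target=1e-10, seed=0):
    """Minimise `fitness` over R^dim; returns (best vector, best fitness)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(0.0, 1.0, (pop_size, dim))      # initial population in [0, 1]
    fit = np.array([fitness(p) for p in pop])
    for _ in range(max_gen):
        new_pop, new_fit = pop.copy(), fit.copy()
        for i in range(pop_size):
            # Mutation: three mutually different indices, all different from i
            others = [k for k in range(pop_size) if k != i]
            r1, r2, r3 = rng.choice(others, size=3, replace=False)
            mutant = pop[r1] + scale * (pop[r2] - pop[r3])
            # Crossover: take each component from the mutant with probability cr,
            # and force at least one mutant component into the trial vector
            mask = rng.uniform(size=dim) <= cr
            mask[rng.integers(dim)] = True
            trial = np.where(mask, mutant, pop[i])
            # Selection: keep the trial only if it improves the fitness
            f_trial = fitness(trial)
            if f_trial < fit[i]:
                new_pop[i], new_fit[i] = trial, f_trial
        pop, fit = new_pop, new_fit
        if fit.min() <= target:                       # qualified fitness reached
            break
    best = int(np.argmin(fit))
    return pop[best], float(fit[best])
```

For instance, `differential_evolution(lambda p: float(np.sum((p - 0.3) ** 2)), dim=3, max_gen=200)` searches for the minimiser of a shifted sphere function. The loop stops either when the target fitness is met or when the maximum number of generations is reached, which corresponds to the two stopping conditions used for ELC-ELM.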

Evolutionary Local Coupled Extreme Learning Machine
In LC-ELM, the strategy to decouple the linking architecture between the input layer and the hidden layer is guided by the predetermined addresses and radiuses. Such parameters are preset randomly or empirically. However, nonoptimal addresses and radiuses may yield an inappropriate model. In order to optimise these addresses and radiuses, an evolutionary local coupled extreme learning machine (ELC-ELM) method is proposed in this paper. This approach treats the tuples of addresses and radiuses as the solutions of an optimisation problem and searches for them by means of DE.
In so doing, ELC-ELM can expect a more reliable implementation in a variety of applications.
For a dataset which contains $N$ distinct objects $(\mathbf{x}_i, \mathbf{t}_i)$, where $\mathbf{x}_i \in \mathbb{R}^n$ and $\mathbf{t}_i \in \mathbb{R}^m$, the main procedure of an $L$-hidden-node ELC-ELM algorithm consists of the following.
(i) Random Generation of a Population of Individuals. Each individual in the population is composed of a set of the addresses and radiuses

$$\theta = [\mathbf{d}_1, \ldots, \mathbf{d}_L, \mathbf{r}],$$

where $\mathbf{d}_j$, $j \in \{1, \ldots, L\}$, and $\mathbf{r} \in \mathbb{R}^L$ are initialised within the range of [0, 1] at random. Then, these parameters are employed to measure the activated degrees of the hidden nodes. Note that, in this step, the input weights $\mathbf{w}$ and hidden node biases $b$ are chosen within the range of [0, 1] randomly as well. However, they are excluded from the underlying populations in ELC-ELM.

(ii) Analytical Computation of the Output Weights for Each Individual. As with many other ELM algorithms, this step is implemented by the use of the Moore-Penrose generalised inverse, instead of running any iterative tuning.
(iii) Evaluation of Each Individual. The resulting root mean squared error (RMSE) of ELC-ELM is employed to assess the fitness of the individuals in this method, leading to a fitness value for each individual in the population. The mapping between the datasets and the fitness values is termed the fitness function below. Specifically, in this paper, the RMSE is defined as

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left\| \sum_{j=1}^{L} \beta_j \, g(\mathbf{w}_j \cdot \mathbf{x}_i + b_j) \, F(S(\mathbf{x}_i, \mathbf{d}_j)) - \mathbf{t}_i \right\|_2^2}{m \times N}}.$$

Here, the parameters are defined the same as those in (2).
(iv) Application of the Three Steps of DE: Mutation, Crossover, and Selection. In addition to the RMSE, the norm of the output weights $\|\boldsymbol{\beta}\|$ is also used as a criterion to reinforce the selection procedure. In so doing, when the differences in fitness between distinct individuals are insignificant, the one that leads to the minimum $\|\boldsymbol{\beta}\|$ is selected.
Algorithm 2: Evolutionary local coupled extreme learning machine.
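(v) Determination of a New Population $\theta_{i,G+1}$. This is computed as follows:

$$\theta_{i,G+1} = \begin{cases} \mathbf{u}_{i,G+1}, & \text{if the RMSE of } \mathbf{u}_{i,G+1} \text{ is significantly lower than that of } \theta_{i,G}, \\ \mathbf{u}_{i,G+1}, & \text{if the difference in RMSE is insignificant and } \|\boldsymbol{\beta}^{\mathbf{u}_{i,G+1}}\| < \|\boldsymbol{\beta}^{\theta_{i,G}}\|, \\ \theta_{i,G}, & \text{otherwise}. \end{cases}$$

The same as LC-ELM, the implementation of ELC-ELM is highly flexible in dealing with a variety of problems. Specifically, a collection of commonly used similarity measures [16,17] are listed as follows: [similarity-measure equations missing in the source text]. As well as the similarity relations, the fuzzy membership functions in ELC-ELM also admit a variety of implementations. For instance, the Gaussian function (10), the reversed sigmoid function (11), and the reversed tanh function (12) are alternatives in practice: [equations (10)-(12) missing in the source text].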
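Steps (ii) to (v) can be illustrated with a short sketch of how one individual might be evaluated and how the reinforced selection rule might be coded. Several details below are assumptions made only for this example: the layout of an individual (flattened addresses followed by one radius per hidden node), the sigmoid activation, the Euclidean similarity, the Gaussian-type membership, the relative tolerance `eps` used to decide when a fitness difference is "insignificant", and all function names.

```python
import numpy as np

def decode(theta, L, n):
    """Split an individual into addresses D of shape (L, n) and radii r of shape (L,)."""
    return theta[:L * n].reshape(L, n), theta[L * n:]

def evaluate(theta, X, T, W, b):
    """Return (RMSE, ||beta||) of the network defined by individual theta."""
    L, n = W.shape
    D, r = decode(theta, L, n)
    r = np.maximum(np.abs(r), 1e-6)                   # guard against zero radius (assumption)
    G = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))          # sigmoid activation (assumption)
    dist = np.linalg.norm(X[:, None, :] - D[None, :, :], axis=2)
    H = G * np.exp(-(dist / r) ** 2)                  # Gaussian-type membership (assumption)
    beta = np.linalg.pinv(H) @ T                      # step (ii): analytical output weights
    rmse = float(np.sqrt(np.mean((H @ beta - T) ** 2)))  # step (iii): fitness
    return rmse, float(np.linalg.norm(beta))

def select(theta, u, f_old, f_new, nb_old, nb_new, eps=0.02):
    """Step (v): choose between target theta and trial u."""
    if f_old - f_new > eps * f_old:                   # clearly better fitness
        return u
    if abs(f_old - f_new) < eps * f_old and nb_new < nb_old:
        return u                                      # similar fitness, smaller ||beta||
    return theta
```

Plugging `evaluate` in as the fitness and `select` in place of the plain greedy selection of a standard DE loop gives one possible realisation of the procedure; the input weights `W` and biases `b` stay fixed throughout, since they are excluded from the population.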

Experimental Evaluation
This section presents a systematic experimental evaluation of ELC-ELM. The results and discussions are divided into three parts. Each of them is carried out as a comparison between ELC-ELM and three alternative ELM algorithms: the classical ELM, OSFuzzyELM [11], and LC-ELM.
OSFuzzyELM is a variant of ELM which is based on fuzzy rules.
The first part evaluates ELC-ELM in the aspect of function approximation. The comparison on real-world regression problems is performed in the second part. The third part provides an investigation of the classification performance of ELC-ELM on several benchmark datasets. Note that the fuzzy membership functions and the similarity relations in the following OSFuzzyELM, LC-ELM, and ELC-ELM methods are assigned empirically.

Function Regression. This task is to approximate the Gabor function. In this example, 51 × 51 training and testing patterns are stochastically selected from a [−0.5, 0.5] × [−0.5, 0.5] square region, respectively. The specific configurations of the algorithms involved in this experiment are introduced in Table 1. For simplicity, RN stands for "random number."
A series of experiments are carried out in order to ascertain the variation in the resulting regression RMSEs as the number of hidden nodes, or fuzzy rules, in the relevant algorithms changes. The number of hidden nodes (or fuzzy rules) takes the multiples of ten from 10 to 150. For each of these values, 10 trials have been conducted for the four ELM approaches. The average testing RMSEs are illustrated in Figure 1. Furthermore, across all the 15 numbers of hidden nodes of each method, the lowest result is reported in Table 2. Overall, it can be observed from Figure 1 and Table 2 that ELC-ELM consistently outperforms the alternative methods with fewer hidden nodes at the levels of both the lowest and the average RMSEs. In particular, even with only 10 hidden nodes/rules, ELC-ELM is still able to achieve a comparable performance (RMSE = 0.0887).

Real-World Regression Problems. This section presents the comparative studies between the proposed approach and the other ELM algorithms on the benchmark regression datasets taken from UCI Machine Learning Repository [18] and Statlib [19]. The specifications of these datasets are shown in Table 3.
In this experiment, 10 trials are conducted for each problem. The training and testing data of the corresponding datasets are reshuffled at each trial of simulation. The configurations of the tested ELM algorithms are roughly the same as those in Table 1. However, the fuzzy membership functions of LC-ELM and ELC-ELM are the reversed tanh function (12), and the classical ELM algorithm is constructed with RBF hidden nodes with the multiquadric function $g(\mathbf{x}) = (\|\mathbf{x} - \mathbf{w}\|^2 + b^2)^{1/2}$. The average RMSE, the corresponding standard deviation (SD), the average training/testing time, and the number of hidden nodes or fuzzy rules, over the training data and the testing data across the 10 trials, are listed in Table 4.
In Table 4, the superior RMSE results, which are lower than their counterparts by more than 0.005, are shown in boldface. It can be seen from these results that, compared to the remaining three ELM algorithms, ELC-ELM generally performs better, with the lowest RMSE and SD results. In particular, ELC-ELM gains significant improvements over the others for all datasets except the Abalone dataset. Occasionally, for the Autoprice and CPU datasets, the training RMSE results of OSFuzzyELM are better than those of ELC-ELM. However, given the associated testing RMSE results, this advantage may be caused by overfitting of OSFuzzyELM. Although, due to the complexity of DE, the evolution procedure in ELC-ELM is more time consuming than the conventional ones, the generalisation ability of ELC-ELM is improved.

Classification Problems. In this section, the classification performances of ELC-ELM are compared against those of ELM, OSFuzzyELM, and LC-ELM on several benchmark datasets [18]. The specifications of the datasets are displayed in Table 5. Again, in this experiment, each problem is run for 10 trials, with the training and testing data reshuffled at each trial. Different from the configuration in Section 4.2, sigmoid hidden nodes are adopted in ELM and the window radius in LC-ELM is fixed empirically at 0.7. For these four ELM-based methods, the average classification accuracy (Accy), the standard deviation (SD), the average training/testing time, and the numbers of hidden nodes or fuzzy rules, over the training data and the testing data across the 10 trials, are listed in Table 6. The same numbers of hidden nodes are used in ELM, LC-ELM, and ELC-ELM.
Likewise, in Table 6, the superior classification accuracies, which are higher than their counterparts by more than 0.5%, are marked in boldface. Overall, the ELC-ELM method outperforms the other three ELM-based algorithms in all testing results. In particular, for the Ecoli dataset, ELC-ELM also results in the best training accuracy. This indicates that the model of ELC-ELM generalises well. Although OSFuzzyELM yields the better training results on 3 of the 6 datasets, the corresponding testing results suggest that it suffers from overfitting. Again, ELC-ELM takes more time to build its classification models, which in turn perform better on the testing data.
In summary, examining all of the results obtained, it is clear that, owing to a weight searching space evolved by DE, ELC-ELM is more reliable than the others in addressing the regression and classification problems. Although, since DE is adopted in ELC-ELM, the evolution procedure is more time consuming than the conventional ones, the resulting model enjoys a greater generalisation ability. Even with a small number of hidden nodes, ELC-ELM is still able to achieve competitive performance. Additionally, given the large number of available similarity relations and fuzzy membership functions, ELC-ELM can be implemented in various forms, the same as LC-ELM. This mechanism allows ELC-ELM to generate solutions for different problems flexibly.

Conclusion
This paper has presented an approach entitled evolutionary local coupled extreme learning machine (ELC-ELM), in an attempt to address the challenges raised by the stochastically predetermined addresses and radiuses of LC-ELM. Such nonoptimal parameters may yield an inappropriate model of LC-ELM. In ELC-ELM, the differential evolution (DE) algorithm is utilised to optimise this tuple (address and radius) and the associated activated degrees, according to the resulting root mean squared errors. This optimisation procedure improves the underlying model of ELC-ELM, until a satisfactory solution (population) or the maximum iteration step is reached. Owing to the wide variety of available fuzzy membership functions and similarity relations, the implementation of ELC-ELM is highly flexible. Experimental results demonstrate that the proposed algorithm achieves better performances, compared to three alternative ELM-based approaches. Though promising, further research will help strengthen the potential of the proposed approach. In particular, due to the use of DE, as the scale of the problem increases, the training process of ELC-ELM will become more time consuming than the alternative methods in this paper. Although ELC-ELM enjoys a significant generalisation ability, its efficiency still requires enhancing in the future. Topics for further research also include a more comprehensive study of how ELC-ELM would perform with other fuzzy membership functions and similarity relations [20] as alternatives. Correspondingly, the sensitivity of the method to the choice of these functions also needs to be investigated theoretically. Furthermore, a more complete comparison of ELC-ELM against the other state-of-the-art learning techniques over different datasets from real application domains [3,21] would form the basis for a wider series of topics for future studies.

(1) Randomly assign hidden node parameters (w, b) and the hidden node address d.
(2) Calculate the hidden layer output matrix H.
(3) Calculate the output weight β.
Algorithm 1: Local coupled extreme learning machine.

Figure 1: Testing RMSEs of ELM, OSFuzzyELM, LC-ELM, and ELC-ELM with respect to different numbers of hidden nodes.

Table 3: Specifications of tested regression problems.

Table 5: Specifications of tested classification problems.