New Dandelion Algorithm Optimizes Extreme Learning Machine for Biomedical Classification Problems

Inspired by the sowing behavior of dandelions, this paper proposes a novel swarm intelligence algorithm, the dandelion algorithm (DA), for global optimization of complex functions. In DA, the dandelion population is divided into two subpopulations, and each subpopulation follows a different sowing behavior. In addition, another sowing method is designed to help the algorithm escape from local optima. To validate DA, we compare it with existing algorithms, including the bat algorithm, particle swarm optimization, and the enhanced fireworks algorithm. Simulations show that the proposed algorithm appears clearly superior to the other algorithms. The proposed algorithm is also applied to optimize the extreme learning machine (ELM) for biomedical classification problems, with considerable improvement. Finally, we use different fusion methods to build different fusion classifiers, which achieve higher accuracy and better stability to some extent.


Introduction
Nature has evolved over hundreds of millions of years and shows remarkable efficiency. People learn a great deal from studying natural systems and use them to develop new algorithms and models for solving complex problems. By imitating biological intelligent behavior and borrowing its underlying mechanisms, researchers keep producing new ways to solve complex problems. Through the modeling of natural intelligence, a number of intelligent algorithms have been proposed, including genetic algorithms [1], the ant colony algorithm [2], the particle swarm algorithm [3,4], the center gravity search algorithm [5,6], and quantum computing [7]. Each intelligent algorithm corresponds to an actual source of inspiration. For example, DNA computing is based on the double helix structure proposed by Watson and Crick, who won the Nobel Prize in Physiology or Medicine, and on the polymerase chain reaction developed by Nobel laureate Mullis [8]. The artificial bee colony algorithm is based on decoding the dance behavior of bees [9]. The artificial immune algorithm is based on immune network theory [10]. The bat algorithm simulates the echolocation behavior of bats [11]. Inspired by fireworks explosions, the enhanced fireworks algorithm was proposed for global optimization of complex functions [12]. In recent years, many intelligent algorithms have been successfully applied to engineering problems [13][14][15][16][17][18][19][20]. They not only reduce the time consumed but also achieve better performance than manual tuning.
All of the above intelligent algorithms search for the optimal solution in parallel. However, within each algorithm all individuals use the same search mechanism. In this paper, inspired by the sowing behavior of dandelions, we propose a novel swarm intelligence algorithm called the dandelion algorithm (DA) for function optimization. DA has a simple computational process and is easy to understand. In DA, the dandelion population is divided into two subpopulations, one suitable for sowing and one unsuitable for sowing, and each subpopulation sows in a different way. In addition, another sowing method is applied to the subpopulation that is suitable for sowing, to avoid falling into local optima. To validate the performance of DA, we run simulations on twelve standard benchmark functions and compare DA with the bat algorithm (BA), particle swarm optimization (PSO), and the enhanced fireworks algorithm (EFWA). The results show that DA has better overall performance on the test functions.
The extreme learning machine (ELM) is an advanced neural network learning algorithm [21]. Its input weights and hidden layer biases are randomly generated according to the numbers of input neurons and hidden layer nodes. Its output weight matrix is then calculated from the Moore-Penrose generalized inverse of the hidden layer output matrix. Although ELM has many advantages over traditional neural networks, the random input weights and hidden layer biases make its performance unstable. To obtain higher accuracy, this paper proposes a method that uses DA to optimize ELM for biomedical classification problems. Moreover, we combine multiple classifiers into fusion classifiers using different fusion methods, and the results show that the fusion classifiers perform better.
The rest of the paper is organized as follows. Section 2 introduces the dandelion algorithm. Section 3 presents the simulation experiments and analyzes the results in detail. Section 4 describes how DA is used to optimize ELM and how multiple classifiers are combined into fusion classifiers with different fusion methods. Finally, the conclusions are given in the last section.

DA Framework.
In DA, we assume that the land is divided into two types: regions suitable for dandelion sowing and regions unsuitable for dandelion sowing. The dandelion living in the suitable environment is called the core dandelion (CD), and all other dandelions are called assistant dandelions (ADs).
Without loss of generality, consider a minimization problem: the objective is to find an optimal solution with the minimal evaluation (fitness) value. When a dandelion sows, its seeds are scattered around it. In our view, dandelion sowing can be seen as searching for an optimum in a particular region around a point. For example, to find a point x that satisfies y = f(x), the dandelions keep sowing seeds in the potential space until they find a point arbitrarily close to x. A rough framework of DA that mimics the dandelion sowing process is depicted in Figure 1.
In each generation of sowing, N dandelions are first selected to sow. After sowing, the locations of the seeds are obtained and evaluated. The algorithm stops when the optimal location is found. Otherwise, N dandelions are selected from all seeds and dandelions for the next generation of sowing.
Computational Intelligence and Neuroscience 3
From Figure 1, we can see that the process of sowing and selection strategy are important for DA, and they are, respectively, described in detail in the following.

Design of DA.
In this section, we will introduce the design of the various operators of the dandelion algorithm and the mathematical model in detail. In the DA, we assume that there are only two types of dandelion: core dandelion (CD) and assistant dandelions (AD), and different types of dandelions perform different sowing ways. Meanwhile, another way of sowing, called mutation sowing, is designed to avoid falling into local optimum. Finally, the selection strategy is designed to select dandelions to enter the next generation.
To sum up, the dandelion algorithm consists of normal sowing, mutation sowing, and selection strategy.

Normal Sowing.
In DA, we stipulate that the core dandelion produces more seeds and the assistant dandelions produce fewer seeds, because the land where the core dandelion lives is suitable for seeds to grow. The number of seeds each dandelion produces is calculated from its fitness value relative to the other dandelions in the population. Let the maximum number of seeds be S_max and the minimum number of seeds be S_min. The number of seeds N_i for dandelion x_i is then calculated as follows:
$$N_i = \max\left( S_{\max} \times \frac{y_{\max} - f(x_i) + \varepsilon}{y_{\max} - y_{\min} + \varepsilon},\; S_{\min} \right) \tag{2}$$
where y_max = max(f(x_i)), y_min = min(f(x_i)), and ε is the machine epsilon, which keeps the denominator from being 0. From (2), for a minimization problem, a dandelion with a small fitness value sows more seeds. A dandelion with a large fitness value sows fewer seeds, but never fewer than S_min.
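The seed-count rule can be sketched in a few lines of Python. The sketch assumes the linear scaling between the worst and best fitness values described above, with ε added to both numerator and denominator. It also assumes that fractional counts are rounded to the nearest integer, which the text does not specify. The names seed_counts, s_max, and s_min are illustrative.

```python
import sys

EPS = sys.float_info.epsilon  # machine epsilon


def seed_counts(fitness, s_max, s_min):
    """Number of normal seeds per dandelion for a minimization problem.

    Assumptions: linear scaling between the worst (y_max) and best (y_min)
    fitness, eps in numerator and denominator, rounding to nearest integer.
    """
    y_max, y_min = max(fitness), min(fitness)
    counts = []
    for f in fitness:
        share = (y_max - f + EPS) / (y_max - y_min + EPS)
        # The best dandelion gets s_max seeds; none gets fewer than s_min.
        counts.append(max(int(round(s_max * share)), s_min))
    return counts
```

For fitness values [1.0, 3.0, 5.0] with s_max = 40 and s_min = 2, the best dandelion receives 40 seeds, the middle one 20, and the worst is lifted to the floor of 2. When all fitness values are equal, the ε terms make the share equal to 1 rather than 0/0.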
In DA, dandelions are divided into two types: assistant dandelions and the core dandelion. The core dandelion is the dandelion with the best fitness:
$$x_{CD} = \arg\min_{x_i} f(x_i) \tag{3}$$
The sowing radii of the assistant dandelions and the core dandelion are calculated differently. The sowing radius R_i(t) of each assistant dandelion (every dandelion except the CD) is calculated by Eq. (4). Here t is the iteration number, UB and LB are the upper and lower bounds of the search space, and the infinity norm takes the maximum over all dimensions. From (4), at the beginning of the algorithm the sowing radius of the assistant dandelions is set to the diameter of the search space. After that, it is set to the difference between the current assistant dandelion and the core dandelion, measured with the infinity norm. Moreover, to slow convergence and improve global search performance, we add the assistant dandelion's sowing radius from the previous generation. This term is weighted by a factor w, which dynamically adjusts the impact of the previous radius on the current one. The weight factor w is calculated by Eq. (5),
where Fe is the current number of function evaluations and Fe_max is the maximum number of function evaluations. The value of w decreases from large to small, so the sowing radius of the previous generation has less and less impact on the current sowing radius. The sowing radius of the CD is calculated differently: it is adjusted from the CD's radius in the previous generation as follows:
$$R_{CD}(t) = \begin{cases} UB - LB, & t = 1 \\ \alpha \times R_{CD}(t-1), & \lambda = 1 \\ \beta \times R_{CD}(t-1), & \lambda \neq 1 \end{cases} \tag{6}$$
where R_CD(t) is the sowing radius of the CD in generation t. At the beginning of the algorithm, the sowing radius of the CD is also set to the diameter of the search space. α and β are the withering factor and the growth factor, respectively, and λ reflects the growth trend:
$$\lambda = \frac{f(x_{CD}(t)) + \varepsilon}{f(x_{CD}(t-1)) + \varepsilon} \tag{7}$$
where ε is the machine epsilon, which keeps the denominator from being 0. When λ = 1, the algorithm has not found a better solution. The place is then not suitable for sowing, so the sowing radius is reduced; the withering factor α describes this situation. α should not be too small, and its value should lie in [0.9, 1). Conversely, when λ ≠ 1, the algorithm has found a better solution than in the last generation. The place is then suitable for sowing, so the sowing radius is enlarged, which can speed up convergence; the growth factor β serves this purpose. β should not be too large, and its value should lie in (1, 1.1].
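The two radius rules can be sketched as follows. This is a hedged illustration, not the authors' code. The exact forms of Eqs. (4) and (5) are not reproduced in this text, so the sketch makes three assumptions: the assistant-dandelion distance term is ||x_CD − x_i||_∞; the weight factor decays linearly as w = 1 − Fe/Fe_max; and the bounds UB and LB are scalars shared by all dimensions. The core-dandelion rule follows the description: shrink by α when λ = 1 and grow by β otherwise. The defaults α = 0.95 and β = 1.05 are illustrative values inside the stated ranges, not the Table 1 settings.

```python
import sys

EPS = sys.float_info.epsilon


def weight_factor(fe, fe_max):
    # Assumed linear decay; the text only says w shrinks as Fe grows.
    return 1.0 - fe / fe_max


def assistant_radius(t, prev_radius, x_i, x_cd, ub, lb, w):
    """Sowing radius of an assistant dandelion (assumed form of Eq. (4))."""
    if t == 1:
        return ub - lb  # diameter of the search space
    # Assumption: infinity-norm distance between the AD and the CD.
    dist = max(abs(a - b) for a, b in zip(x_cd, x_i))
    return w * prev_radius + dist


def core_radius(t, prev_radius, f_cd_now, f_cd_prev, ub, lb,
                alpha=0.95, beta=1.05):
    """Sowing radius of the core dandelion (withering/growth rule)."""
    if t == 1:
        return ub - lb
    lam = (f_cd_now + EPS) / (f_cd_prev + EPS)  # growth trend
    if lam == 1.0:
        return alpha * prev_radius  # no improvement: wither, alpha in [0.9, 1)
    return beta * prev_radius       # improvement: grow, beta in (1, 1.1]
```

Because the best location is always kept, the CD fitness can only stay the same or decrease. A ratio of exactly 1 therefore signals stagnation.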
Algorithm 1 describes the normal sowing process of DA. Here x_min,k and x_max,k are the lower and upper bounds of the search space in dimension k.
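A hedged Python sketch of normal seed generation follows; it is not the authors' listing. It assumes that each seed perturbs a randomly chosen, nonempty subset of dimensions by a uniform offset in [−R, R], where R is the dandelion's sowing radius. Any coordinate that leaves [x_min,k, x_max,k] is re-drawn uniformly inside the bounds. The function name normal_seeds is illustrative.

```python
import random


def normal_seeds(x, n_seeds, radius, x_min, x_max, rng=random):
    """Generate n_seeds seeds around dandelion x (a list of floats).

    Assumptions: uniform offsets in [-radius, radius] on a random subset
    of dimensions; out-of-range coordinates are re-drawn uniformly.
    """
    d = len(x)
    seeds = []
    for _ in range(n_seeds):
        seed = list(x)
        # Pick which dimensions to perturb (at least one).
        for k in rng.sample(range(d), rng.randint(1, d)):
            seed[k] += rng.uniform(-radius, radius)
            if not x_min[k] <= seed[k] <= x_max[k]:
                seed[k] = x_min[k] + rng.random() * (x_max[k] - x_min[k])
        seeds.append(seed)
    return seeds
```

The same function serves both dandelion types. The CD is called with R_CD, and each AD is called with its own R_i.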

Algorithm 1: Generating normal seeds.
(1) Calculate the number of seeds N_i
(2) Calculate the sowing radius of the ADs
(3) Calculate the sowing radius of the CD, R_CD
(4) Set z = rand(1, d)
(5) For k = 1 : d do
(6) If k ∈ z then
(7) If x_i is the core dandelion then

Mutation Sowing for the Core Dandelion.
To avoid falling into local optima and to keep the population diverse, another sowing method, called mutation sowing, is proposed for the CD. It is defined by Eq. (8), where Levy(·) is a random number drawn from the Levy distribution with parameter 1.5. Algorithm 2 performs mutation sowing for the CD to generate seed locations. It is performed M times, where M is a constant that controls the number of mutation seeds.

Algorithm 2: Generating mutation seeds.
(1) Find the core dandelion x_CD in the current population
(8) End if
(9) End if
(10) End for
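Mutation sowing can be sketched as follows. The Levy steps with exponent 1.5 are drawn with Mantegna's algorithm, a common way to sample Levy-stable steps; the text does not say which sampler the authors used. The update x_k ← x_k × (1 + Levy) on a random subset of dimensions is an assumption standing in for Eq. (8), and so is the out-of-bounds re-draw. The names levy_step and mutation_seeds are illustrative.

```python
import math
import random


def levy_step(exponent=1.5, rng=random):
    """One Levy-distributed step via Mantegna's algorithm."""
    num = math.gamma(1 + exponent) * math.sin(math.pi * exponent / 2)
    den = math.gamma((1 + exponent) / 2) * exponent * 2 ** ((exponent - 1) / 2)
    sigma_u = (num / den) ** (1 / exponent)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / exponent)


def mutation_seeds(x_cd, m, x_min, x_max, rng=random):
    """m mutation seeds around the core dandelion x_cd.

    Assumptions: multiplicative Levy mutation of a random subset of
    dimensions; out-of-range values are re-drawn uniformly in the bounds.
    """
    seeds = []
    for _ in range(m):
        seed = list(x_cd)
        for k in rng.sample(range(len(x_cd)), rng.randint(1, len(x_cd))):
            seed[k] = x_cd[k] * (1 + levy_step(rng=rng))
            if not x_min[k] <= seed[k] <= x_max[k]:
                seed[k] = x_min[k] + rng.random() * (x_max[k] - x_min[k])
        seeds.append(seed)
    return seeds
```

One consequence of the multiplicative form assumed here is that a coordinate equal to 0 is never moved by the mutation. An additive form, x_k + Levy × scale, would avoid this.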

Selection Strategy.
In DA, the current best location is always kept for the next iteration. To keep diversity, the remaining N − 1 locations are selected with a disruptive selection operator. For location x_i, the selection probability is calculated as follows:
$$p(x_i) = \frac{\lvert f(x_i) - f_{avg} \rvert}{\sum_{j \in SN} \lvert f(x_j) - f_{avg} \rvert} \tag{9}$$
where f(x_i) is the fitness value of the objective function, f_avg is the mean fitness of the population in generation t, and SN is the set of all candidates (dandelions, normal seeds, and mutation seeds). Probabilities defined this way give both good and poor individuals more chances to be selected for the next iteration, while individuals with midrange fitness values tend to be eliminated. This keeps the population diverse and strengthens the global search ability.
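The selection strategy can be sketched as elitism followed by disruptive roulette-wheel sampling, with weights |f(x) − f_avg| over all candidates. Sampling the N − 1 remaining locations with replacement is an assumption of this sketch, as is the fallback to uniform weights when all fitness values are equal. The name select_next is illustrative.

```python
import random


def select_next(candidates, fitness, n, rng=random):
    """Elitist disruptive selection for a minimization problem.

    Keeps the best candidate, then draws n - 1 more with probability
    proportional to |f(x) - f_avg| (with replacement: an assumption).
    """
    best = min(range(len(candidates)), key=lambda i: fitness[i])
    f_avg = sum(fitness) / len(fitness)
    weights = [abs(f - f_avg) for f in fitness]
    if sum(weights) == 0:  # all fitness values equal: fall back to uniform
        weights = [1.0] * len(fitness)
    picks = rng.choices(range(len(candidates)), weights=weights, k=n - 1)
    return [candidates[best]] + [candidates[i] for i in picks]
```

With fitness values [1, 5, 5, 9], the mean is 5. The two midrange candidates get zero weight and are never drawn, while the best and the worst share the selection probability.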

Summary.
Assume that the population contains N dandelions. Algorithm 3 summarizes the framework of DA. In each sowing, two types of seeds are generated, according to Algorithms 1 and 2, respectively. For the first type, the number of seeds and the sowing radius are calculated from the quality of the corresponding dandelion. The second type is generated with a Levy mutation, which helps avoid falling into local optima. After that, N locations are selected for the next generation. Let the total number of normal seeds be S and the number of mutation seeds be M. Then approximately N + S + M function evaluations are performed in each generation. If the optimum of a function is found in T generations, the complexity of DA is O(T × (N + S + M)).

Algorithm 3: Framework of DA.
(5) Obtain CD by Eq. (5)
(6) Obtain the number of seeds by Eq. (2)
(7) Produce normal seeds using Algorithm 1
(8) Produce mutation seeds using Algorithm 2
(9) Assess all seeds' fitness
(10) Retain the best seed as a dandelion
(11) Select the other N − 1 dandelions randomly by Eq. (9)
(12) Until the termination condition is satisfied
(13) Return the best fitness and a dandelion location
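The pieces above can be combined into one compact DA loop, tested here on the Sphere function. This is a hedged sketch under the same assumptions as the individual operators: a linear weight factor, an infinity-norm AD distance, a multiplicative Levy mutation, out-of-bounds re-draw, and selection with replacement. In addition, AD radii are tracked per population slot rather than per individual, which simplifies "the radius in the previous generation". The default parameters are illustrative, not the settings of Table 1.

```python
import math
import random
import sys

EPS = sys.float_info.epsilon


def sphere(x):
    """Sphere benchmark: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def levy(rng, exponent=1.5):
    """Levy-distributed step via Mantegna's algorithm."""
    sigma = (math.gamma(1 + exponent) * math.sin(math.pi * exponent / 2)
             / (math.gamma((1 + exponent) / 2) * exponent
                * 2 ** ((exponent - 1) / 2))) ** (1 / exponent)
    return rng.gauss(0.0, sigma) / abs(rng.gauss(0.0, 1.0)) ** (1 / exponent)


def redraw_if_outside(v, lo, hi, rng):
    """Keep v if it lies in [lo, hi], otherwise re-draw it uniformly."""
    return v if lo <= v <= hi else lo + rng.random() * (hi - lo)


def dandelion_algorithm(f, dim, lb, ub, n=5, s_max=40, s_min=2, m=5,
                        alpha=0.95, beta=1.05, fe_max=20000, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n)]
    fit = [f(x) for x in pop]
    fe = n
    r_ad = [ub - lb] * n  # one AD radius per population slot (simplification)
    r_cd = ub - lb        # CD radius starts at the search-space diameter
    prev_best = min(fit)
    history = [prev_best]
    t = 1
    while fe < fe_max:
        cd = min(range(n), key=lambda i: fit[i])
        y_max, y_min = max(fit), min(fit)
        w = 1.0 - fe / fe_max  # assumed linear decay of the weight factor
        if t > 1:              # withering/growth rule for the CD radius
            lam = (fit[cd] + EPS) / (prev_best + EPS)
            r_cd *= alpha if lam == 1.0 else beta
            prev_best = fit[cd]
        cand, cand_fit = list(pop), list(fit)
        for i, x in enumerate(pop):  # normal sowing
            if i == cd:
                radius = r_cd
            else:
                if t > 1:
                    dist = max(abs(a - b) for a, b in zip(pop[cd], x))
                    r_ad[i] = w * r_ad[i] + dist
                radius = r_ad[i]
            share = (y_max - fit[i] + EPS) / (y_max - y_min + EPS)
            for _ in range(max(int(round(s_max * share)), s_min)):
                s = list(x)
                for k in rng.sample(range(dim), rng.randint(1, dim)):
                    s[k] = redraw_if_outside(
                        s[k] + rng.uniform(-radius, radius), lb, ub, rng)
                cand.append(s)
        for _ in range(m):  # mutation sowing around the CD
            s = list(pop[cd])
            for k in rng.sample(range(dim), rng.randint(1, dim)):
                s[k] = redraw_if_outside(s[k] * (1 + levy(rng)), lb, ub, rng)
            cand.append(s)
        new_fit = [f(s) for s in cand[n:]]
        fe += len(new_fit)
        cand_fit += new_fit
        best = min(range(len(cand)), key=lambda i: cand_fit[i])  # elitism
        avg = sum(cand_fit) / len(cand_fit)
        weights = [abs(v - avg) for v in cand_fit]  # disruptive selection
        if sum(weights) == 0:
            weights = [1.0] * len(cand)
        picks = [best] + rng.choices(range(len(cand)), weights=weights, k=n - 1)
        pop = [cand[i] for i in picks]
        fit = [cand_fit[i] for i in picks]
        history.append(min(fit))
        t += 1
    return min(fit), history
```

For example, dandelion_algorithm(sphere, dim=10, lb=-100.0, ub=100.0) returns the best fitness and the best value of each generation. Because the best location is always retained, that history never increases.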

Experiments
To assess the performance of DA, it is compared with BA, EFWA, and PSO.

Experiment Settings.
The parameters of DA, BA, EFWA, and PSO are set as shown in Table 1, and these settings are used in all comparison experiments.
In Table 1, N is the population size, A is the loudness, r is the pulse emission rate, m is the total number of sparks, and a and b are fixed constant parameters that confine the range of the population size. A_max is the maximum explosion amplitude, and M is the number of mutation dandelions. In the experiments, each algorithm is run 51 times on each function, and the final results after 300,000 function evaluations are reported. To verify the performance of the proposed algorithm, we use 12 test functions of different types. They are listed in Table 2, and their expressions are given in the appendix.
All experiments were run in MATLAB R2014a on a PC with a 3.2 GHz CPU (Intel Core i5-3470), 4 GB RAM, and Windows 7 (64-bit).

Table 3 shows the optimization accuracy of the four algorithms on the twelve benchmark functions, averaged over 51 independent runs. The proposed DA clearly outperforms BA, EFWA, and PSO on most functions. On the Six-Hump Camel-Back function, the four algorithms achieve almost the same accuracy.

Comparison of Convergence Speed.
Besides optimization accuracy, convergence speed is essential for an optimizer. To validate the convergence speed of DA, we conducted more thorough experiments. Figure 2 shows the convergence curves of DA, BA, EFWA, and PSO on the twelve benchmark functions, averaged over 51 runs. On the Six-Hump Camel-Back function, the four algorithms converge at the same speed. On the other functions, DA converges much faster than BA, EFWA, and PSO.
3.3. Discussion.
As the experiments show, the proposed DA is a very promising algorithm. It is potentially more powerful than the bat algorithm, particle swarm optimization, and the enhanced fireworks algorithm. There are two main reasons.
(1) In DA, the dandelion population is divided into two subpopulations, the core dandelion and the assistant dandelions, which sow seeds in different ways. The two subpopulations complement each other and coevolve to extend the search range, which increases the probability of finding the optimal location.
(2) Two types of seeds are generated to avoid falling into local optima and to keep the seeds diverse, and the selection strategy also maintains diversity. Therefore, DA is able to avoid premature convergence.

Brief Introduction of Extreme Learning Machine.
The extreme learning machine (ELM) is a neural network learning algorithm proposed by Huang [21]. Its main advantages are faster learning and better generalization than traditional neural network learning algorithms, especially for single-hidden-layer feedforward neural networks (SLFNs). For N distinct training samples (x_j, t_j), an SLFN with L hidden nodes and activation function g(x) is modeled as
$$\sum_{i=1}^{L} \beta_i \, g(w_i \cdot x_j + b_i) = o_j, \quad j = 1, \dots, N \tag{10}$$
where w_i is the input weight vector of the i-th hidden node, β_i is its output weight, and b_i is its bias.
The purpose of neural network training is to minimize the error of the output:
$$\sum_{j=1}^{N} \lVert o_j - t_j \rVert = 0 \tag{11}$$
From (11), there exist β_i, w_i, and b_i such that
$$\sum_{i=1}^{L} \beta_i \, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, \dots, N \tag{12}$$
which can be written in matrix form as Hβ = T, where H is the hidden layer output matrix and T is the target matrix.
In ELM, once the input weights and hidden layer biases are randomly determined, the hidden layer output matrix H is uniquely determined. Training the SLFN then reduces to solving the linear system Hβ = T. The output weights are determined as β̂ = H†T, where H† is the Moore-Penrose generalized inverse of H.
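A minimal NumPy sketch of basic ELM training follows. The sigmoid activation and the uniform [−1, 1] initialization of the input weights and biases are assumptions; the text does not fix them. The output weights are the least-squares solution β̂ = H†T computed with numpy.linalg.pinv. The function names are illustrative.

```python
import numpy as np


def elm_hidden(X, W, b):
    """Hidden layer output matrix H with a sigmoid activation g (assumed)."""
    return 1.0 / (1.0 + np.exp(-(X @ W.T + b)))


def elm_train(X, T, n_hidden, rng):
    """Basic ELM: random input weights/biases, output weights via pinv."""
    n_features = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n_hidden, n_features))  # input weights w_i
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases b_i
    H = elm_hidden(X, W, b)
    beta = np.linalg.pinv(H) @ T  # least-squares solution of H beta = T
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Network outputs; the predicted class is the argmax over columns."""
    return elm_hidden(X, W, b) @ beta
```

Here T holds one-hot class targets. Because β̂ minimizes ||Hβ − T||, the training residual can never exceed that of the trivial solution β = 0.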

Optimization for Extreme Learning Machine with DA.
In ELM, the input weights and hidden layer biases are randomly generated according to the numbers of input neurons and hidden layer nodes, and the output weight matrix is then calculated. Only a few of the randomly generated input weights and biases are good choices. Some may even be 0, which directly renders the corresponding hidden layer nodes ineffective.
To address these problems of ELM, the proposed dandelion algorithm is used to optimize ELM, giving DA-ELM. DA is a new evolutionary algorithm that showed advantages in accuracy and convergence performance in the experiments above. DA iteratively selects the best input weights and biases. These form a new matrix, from which the output weight matrix is then calculated.
The specific steps of the DA-ELM algorithm are as follows.
Step 1. Set the initial parameters of the ELM, including the number of hidden layer nodes L and the activation function g(x).
Step 2. Initialize the parameters of the DA (refer to Table 1).
Step 3. Initialize the dandelion population and randomly generate the initial solutions. Each solution has dimension L × (n + 1), where n is the number of input neurons. The first L × n dimensions represent the input weights, and the remaining L dimensions represent the hidden layer biases.
Step 4. Run the dandelion algorithm to search for the optimal solution, using the root mean square error (RMSE) on the training samples as the fitness function.
Step 5. Check whether DA has reached the maximum number of iterations. If so, go to Step 6; otherwise, return to Step 4.
Step 6. Obtain the optimal input weights and hidden layer biases from the returned optimal solution.
Step 7. Use these input weights and hidden layer biases to train the ELM.
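The encoding in Step 3 and the RMSE fitness in Step 4 can be sketched as follows. The sketch assumes row-major packing: the first L × n entries are the input weights, reshaped so that row i holds the weights of hidden node i, followed by the L biases. It also assumes a sigmoid activation. The names decode and da_elm_fitness are illustrative; DA would minimize da_elm_fitness over such vectors.

```python
import numpy as np


def decode(solution, n_hidden, n_features):
    """Split a flat DA solution into input weights (L x n) and biases (L)."""
    solution = np.asarray(solution, dtype=float)
    if solution.size != n_hidden * (n_features + 1):
        raise ValueError("solution length must be L * (n + 1)")
    W = solution[: n_hidden * n_features].reshape(n_hidden, n_features)
    b = solution[n_hidden * n_features:]
    return W, b


def da_elm_fitness(solution, X, T, n_hidden):
    """Training RMSE of an ELM whose hidden layer is fixed by `solution`."""
    W, b = decode(solution, n_hidden, X.shape[1])
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))  # sigmoid activation (assumed)
    beta = np.linalg.pinv(H) @ T              # output weights, as in basic ELM
    return float(np.sqrt(np.mean((H @ beta - T) ** 2)))
```

Only the input weights and biases are searched by DA. The output weights are still obtained analytically for every candidate, which keeps the search dimension at L × (n + 1).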

Parameter Settings.
In order to measure the relative performance of the DA-ELM, a comparison among the DA-ELM, ELM, PSO-ELM, BA-ELM, and EFWA-ELM is conducted on the biomedical datasets. The algorithms compared here are described as follows.
(1) ELM: basic ELM with randomly generated input weights and hidden layer biases
(2) PSO-ELM: ELM optimized by PSO
(3) BA-ELM: ELM optimized by BA
(4) EFWA-ELM: ELM optimized by EFWA
In this simulation, DA-ELM is evaluated on classification problems from 4 real biomedical datasets in the UCI repository: the EEG Eye State dataset (EEG), the Blood Transfusion Service Center dataset (Blood), the Statlog (Heart) dataset (Statlog), and the SPECT Heart dataset (SPECT). These 4 biomedical datasets are described below.
(1) EEG: the dataset consists of 14 EEG values and a value indicating the eye state.
(2) Blood: the dataset is taken from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan.
(3) Statlog: this dataset is used to predict the presence of heart disease in patients from 13 attributes.
(4) SPECT: data on cardiac Single Photon Emission Computed Tomography (SPECT) images, with each patient classified as normal or abnormal.
The specifications of these 4 datasets are shown in Table 4. In our simulations, all attributes (inputs) are normalized to the range [−1, 1]. For each trial, the training and testing sets are randomly drawn from the whole dataset with the partition sizes shown in Table 4.
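The preprocessing can be sketched with NumPy. Each attribute is min-max scaled to [−1, 1], and each trial uses a fresh random train/test partition. Two details are assumptions: scaling is computed over the whole dataset before splitting, and a constant attribute is mapped to −1. The text states neither. The function names are illustrative.

```python
import numpy as np


def scale_to_unit_range(X):
    """Min-max scale each column of X to [-1, 1]."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns map to -1
    return 2.0 * (X - lo) / span - 1.0


def random_split(X, y, n_train, rng):
    """Random partition into n_train training rows and the rest for testing."""
    idx = rng.permutation(len(X))
    return X[idx[:n_train]], y[idx[:n_train]], X[idx[n_train:]], y[idx[n_train:]]
```

Calling random_split with a new generator state in each trial gives the independently drawn partitions over which the results are averaged.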
The parameters of BA, EFWA, PSO, and DA are set as in Table 1, and the algorithms all have 1000

Optimization of ELM by Dandelion Algorithm for Biomedical Classification.
In this section, we propose a new method that optimizes the extreme learning machine with DA. DA is used to optimize the input weights and hidden layer biases of ELM. Combining the advantages of DA and ELM yields the DA-ELM algorithm.
The average classification results over multiple trials on all four datasets are shown in Table 5. For each dataset, the best testing rate and the best deviation are shown in boldface. Among the five algorithms, DA-ELM achieves higher classification accuracy and better stability on these biomedical classification problems.

Comparison between DA-ELM and Fusion Classifiers for Biomedical Classification.
To further improve the accuracy and stability of classification, we combine five classifiers into a fusion classifier. Several fusion methods are available, such as the majority voting method [22], the maximum method [22], the minimum method [22], the median method [22], a method for fusing scores corresponding to different detectors [23], and fusion of nonindependent detectors [24]. Here we select some simple and effective fusion methods to form fusion classifiers. The classifiers compared here are described as follows.
The average classification results for DA-ELM and the four fusion classifiers are shown in Table 6. For each dataset, the best testing rate and the best deviation are shown in boldface. Max-ELM (a fusion classifier) achieves higher accuracy and the smallest deviation on these datasets, and it is more stable than the other fusion classifiers and DA-ELM.
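The simple fusion rules named above can be sketched as follows. The sketch assumes that each of the combined classifiers outputs one score per class for each sample; for ELM, this would be the output vector Hβ. Majority voting counts each classifier's argmax class. The maximum, minimum, and median rules first combine the scores class by class across classifiers and then take the argmax. Reading Max-ELM as the maximum rule is our interpretation, and the function name fuse is illustrative.

```python
import numpy as np


def fuse(scores, method):
    """Combine classifier outputs into one class label per sample.

    scores: array of shape (n_classifiers, n_samples, n_classes).
    method: "vote", "max", "min", or "median".
    """
    scores = np.asarray(scores, dtype=float)
    if method == "vote":
        votes = scores.argmax(axis=2)  # (n_classifiers, n_samples)
        n_classes = scores.shape[2]
        counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)],
                          axis=1)      # (n_samples, n_classes)
        return counts.argmax(axis=1)   # ties go to the lowest class index
    combine = {"max": np.max, "min": np.min, "median": np.median}[method]
    return combine(scores, axis=0).argmax(axis=1)
```

The rules can disagree. In the test data, two of three classifiers vote for class 1 on the first sample, but one classifier gives class 0 a score of 0.9. The maximum rule therefore picks class 0 while majority voting picks class 1.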

Conclusions
The major contribution of this article is a new dandelion algorithm (DA) for function optimization, which is also used to optimize the extreme learning machine for biomedical classification problems. The test results show that DA usually finds solutions correctly. It clearly outperforms BA, EFWA, and PSO on twelve benchmark functions in terms of both optimization accuracy and convergence speed. Moreover, when DA is used to optimize ELM, the results also show that DA performs well in unknown, challenging search spaces. Finally, we combine five classifiers into different fusion classifiers with different fusion methods. The results show that the fusion classifier Max-ELM not only achieves relatively high accuracy but also has better stability.
In future work, we will pursue a deeper theoretical analysis of DA and try to apply it to more practical engineering problems. The DA proposed in this article may not yet be fully developed, and we sincerely hope that more researchers will take part in improving and testing it. Moreover, we will combine other neural networks with DA-ELM to achieve higher classification accuracy and better stability.

Appendix
The expression of the Sphere function is
$$f(x) = \sum_{i=1}^{D} x_i^2$$
where D is the dimension of x.

Conflicts of Interest
The authors declare no conflicts of interest.