Developing a Novel Hybrid Biogeography-Based Optimization Algorithm for Multilayer Perceptron Training under Big Data Challenge

A Multilayer Perceptron (MLP) is a feedforward neural network model consisting of one or more hidden layers between the input and output layers. MLPs have been successfully applied to solve a wide range of problems in the fields of neuroscience, computational linguistics, and parallel distributed processing. While MLPs are highly successful in solving problems which are not linearly separable, two of the biggest challenges in their development and application are the local-minima problem and the problem of slow convergence under big data challenges. In order to tackle these problems, this study proposes a Hybrid Chaotic Biogeography-Based Optimization (HCBBO) algorithm for training MLPs for big data analysis and processing. Four benchmark datasets are employed to investigate the effectiveness of HCBBO in training MLPs. The accuracy of the results and the convergence of HCBBO are compared to three well-known heuristic algorithms: (a) Biogeography-Based Optimization (BBO), (b) Particle Swarm Optimization (PSO), and (c) Genetic Algorithms (GA). The experimental results show that training MLPs using HCBBO is better than the other three heuristic learning approaches for big data processing.


Introduction
The term big data [1][2][3] was coined to describe the phenomenon of increasingly massive datasets in scientific experiments, financial trading, and networks. Since big data is always of large volume, with multiple varied types and a fast update velocity [4], there is an urgent need for tools that can extract meaningful information from it. Neural networks (NNs) [5,6] are one of the most popular machine learning approaches; they are composed of many simple, interconnected processing elements and loosely model the neuronal structures of the human brain. A neural network can be represented as a highly complex nonlinear dynamic system [5] with some unique characteristics: (a) high dimensionality, (b) extensive interconnectivity, (c) adaptability, and (d) the ability to self-organize.
In the last decade, feedforward neural networks (FNNs) [6] have gained popularity in various areas of machine learning [7] and big data mining [1] for solving classification and regression problems. While the two-layered FNN is the most popular neural network used in practical applications, it is not suitable for solving nonlinear problems [7,8]. The Multilayer Perceptron (MLP) [9,10], a feedforward neural network with one or more hidden layers between the input and the output layers, is more successful in dealing with nonlinear problems such as pattern classification, big data prediction, and function approximation. Previous research [11] shows that MLPs with one hidden layer are able to approximate any continuous or discontinuous function. Therefore, the study of MLPs with one hidden layer has gained a lot of attention from the research community.
Theoretically, the goal of the learning process of an MLP is to find the best combination of connection weights and biases in order to achieve minimum error on the given training and test data. However, one of the most common problems when training an MLP is the tendency of the algorithm to converge on a local minimum. Since the error surface of an MLP can contain multiple local minima, the training process can easily become trapped in one of them rather than converging on the global minimum. This is a common problem in most gradient-based learning approaches such as backpropagation (BP) based NNs [12]. According to Mirjalili's research [13], the initial values of the learning rate and the momentum can also affect the convergence of BP based NNs, with unsuitable values for these variables resulting in divergence. Thus, many studies focus on using novel heuristic optimization methods or evolutionary algorithms to resolve the problems of MLP learning algorithms [14]. Classical applied approaches are Particle Swarm Optimization (PSO) algorithms [15,16], Ant Colony Optimization (ACO) [17], and Artificial Bee Colony (ABC) [18]. However, the No Free Lunch (NFL) theorem [19,20] states that no heuristic algorithm is best suited for solving all optimization problems. Most of these approaches have their own side effects, and overall there has been no significant improvement [13] from using them. For example, Genetic Algorithms (GA) may reduce the probability of getting trapped in a local minimum, but they still suffer from slow convergence rates.
Recently, a novel optimization method called Biogeography-Based Optimization (BBO) [21] has been proposed. It is based on the observation that the geographical distribution of biological organisms can be described by mathematical equations. It is a distributed paradigm which simulates the collective behavior of unsophisticated individuals interacting locally with their environment to efficiently identify optimal solutions in complex search spaces. Several related studies [22][23][24][25] show that BBO is a type of evolutionary algorithm which offers a specific evolutionary mechanism for each individual in a population. This mechanism makes the BBO algorithm more successful and robust on nonuniform training procedures than gradient-based algorithms. Moreover, compared with PSO or ACO, the mutation operator of the BBO algorithm enhances its exploitation capability, which allows BBO to outperform PSO in training MLPs. This has led to great interest in applying BBO to the training of MLPs. In 2010, Ovreiu and Simon [24] trained a neuro-fuzzy network with BBO for classifying P-wave features for the diagnosis of cardiomyopathy. Mirjalili et al. [13] used 11 standard datasets to provide a comprehensive test bed for investigating the abilities of the BBO algorithm in training MLPs. In this paper, we propose a hybrid BBO with chaotic maps trainer (HCBBO) for MLPs. Our approach employs chaos theory to improve the performance of BBO with very little additional computational burden. In our algorithm, the migration and mutation mechanisms are combined to enhance the exploration and exploitation abilities of BBO, and a novel migration operator is proposed to improve BBO's performance in training MLPs.
The rest of this paper is organized as follows. In Section 2, a brief review of the MLP notation and a simple first-order training method is provided. In Sections 3 and 4, the HCBBO framework is introduced and analyzed. In Section 5, computational results demonstrating the effectiveness of the proposed hybrid algorithm are provided. Finally, Section 6 offers concluding remarks and suggests some directions for future research.

Review of the MLP Notation
The notation used in the rest of the paper represents a fully connected feedforward MLP network with a single hidden layer (as shown in Figure 1). This MLP consists of an input layer, an output layer, and a single hidden layer, and is trained using a backpropagation (BP) learning algorithm. Let $n$ denote the number of input nodes, $h$ denote the number of hidden nodes, and $m$ denote the number of output nodes. Let the input weights $w_{ij}$ connect the $i$th input to the $j$th hidden unit and the output weights $w^{\text{out}}_{jk}$ connect the $j$th hidden unit to the $k$th output. The weighted sums of the inputs are first calculated by

$$s_j = \sum_{i=1}^{n} w_{ij} x_i - \theta_j, \quad j = 1, 2, \ldots, h, \tag{1}$$

where $n$ is the number of input nodes, $w_{ij}$ is the connection weight from the $i$th node in the input layer to the $j$th node in the hidden layer, $x_i$ indicates the $i$th input, and $\theta_j$ is the threshold of the $j$th hidden node. The output of each hidden node is calculated with a sigmoid activation:

$$S_j = \operatorname{sigmoid}(s_j) = \frac{1}{1 + e^{-s_j}}, \quad j = 1, 2, \ldots, h. \tag{2}$$

After calculating the outputs of the hidden nodes, the final output is defined as

$$o_k = \sum_{j=1}^{h} w^{\text{out}}_{jk} S_j - b_k, \quad k = 1, 2, \ldots, m, \tag{3}$$

where $w^{\text{out}}_{jk}$ is the connection weight from the $j$th hidden node to the $k$th output node and $b_k$ is the bias of the $k$th output node.
The learning error $E$ (fitness function) is calculated as

$$E = \frac{1}{Q} \sum_{q=1}^{Q} \sum_{k=1}^{m} \left( o_k^q - d_k^q \right)^2, \tag{4}$$

where $Q$ is the number of training samples, $m$ is the number of outputs, $d_k^q$ is the desired output of the $k$th output node when the $q$th training sample is used, and $o_k^q$ is the actual output of the $k$th output node when the $q$th training sample is used.
From the above equations, it can be observed that the final value of the output in MLPs depends upon the parameters of the connecting weights and biases.Thus, training an MLP can be defined as the process of finding the optimal values of the weights and biases of the connections in order to achieve the desirable outputs from certain given inputs.
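The forward pass and the learning error described above can be sketched in a few lines of NumPy (a minimal illustration; the sigmoid hidden activation and the array shapes are assumptions consistent with the notation above, and the function names are not from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(X, W_in, theta, W_out, b):
    """Forward pass of a single-hidden-layer MLP.
    X: (Q, n) inputs; W_in: (n, h); theta: (h,); W_out: (h, m); b: (m,)."""
    S = sigmoid(X @ W_in - theta)   # hidden-layer outputs, shape (Q, h)
    return S @ W_out - b            # final outputs, shape (Q, m)

def mse(O, D):
    """Learning error: mean over Q samples of the squared error summed over m outputs."""
    return float(np.mean(np.sum((O - D) ** 2, axis=1)))
```

Training the MLP then amounts to searching over the flattened vector of `W_in`, `theta`, `W_out`, and `b` for the values that minimize `mse`.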

The Proposed Hybrid BBO for Training an MLP
Biogeography-Based Optimization (BBO) is a population-based optimization algorithm inspired by biogeography: the migration of species between habitats and the equilibrium they reach in different ecosystems. Experiments show that the results obtained using BBO are at least competitive with those of other population-based algorithms, and it has been shown to outperform some well-known heuristic algorithms such as PSO, GA, and ACO on some real-world problems and benchmark functions [21].
The steps of the BBO algorithm can be described as follows. In the beginning, BBO generates a population of randomly initialized search agents named habitats, which are represented as vectors of the problem variables (analogous to chromosomes in GA). Next, each agent is assigned emigration, immigration, and mutation rates, which simulate the characteristics of different ecosystems. In addition, a variable called the habitat suitability index (HSI) is defined to measure the fitness of each habitat. A higher value of HSI indicates that the habitat is more suitable for the residence of biological species. In other words, a BBO solution with a high HSI value is a superior result, while a solution with a low HSI value is an inferior one.
During the course of the iterations, a set of solutions is maintained from one iteration to the next, and each habitat sends habitants to and receives habitants from other habitats based on its immigration and emigration rates, which are probabilistically adapted. In each iteration, a random number of habitants are also occasionally mutated. This makes each solution adapt itself by learning from its neighbors as the algorithm progresses. Each solution parameter is denoted as a suitability index variable (SIV).
The process of BBO is composed of two phases: migration and mutation. During the migration phase, the immigration rate (λ) and emigration rate (μ) of each habitat follow the model depicted in Figure 2: a high number of habitants in a habitat increases the probability of emigration and decreases the probability of immigration. During the mutation phase, the mutation factor in BBO keeps the distribution of habitants in a habitat as diverse as possible. In contrast with the mutation factor in GA, the mutation factor of BBO is not set randomly; it depends on the probability of the number of species in each habitat.
The immigration rate $\lambda_k$ and emigration rate $\mu_k$ of a habitat $H_k$ can be written as

$$\lambda_k = I \left( 1 - \frac{k}{S_{\max}} \right), \tag{5}$$

$$\mu_k = E \, \frac{k}{S_{\max}}, \tag{6}$$

where $I$ is the maximum immigration rate, $E$ is the maximum emigration rate, $S_{\max}$ is the maximum number of habitants, and $k$ is the habitant count of $H_k$.
The mutation of each habitat, which improves the exploration of BBO, is defined as

$$m(H_i) = m_{\max} \left( 1 - \frac{P_i}{P_{\max}} \right), \tag{7}$$

where $m_{\max}$ is the maximum mutation value defined by the user, $P_{\max}$ is the greatest mutation probability among all the habitats, and $P_i$ is the mutation probability of the $i$th habitat, which can be obtained from the species-count probabilities of the habitat. The complete process of the BBO algorithm is described in Algorithm 1. Here $\mathcal{I} : \Theta \to \{H_i, \mathrm{HSI}_i\}$ initializes an ecosystem of habitats and computes each corresponding HSI, and $\Gamma = (N, D, \lambda, \mu, \Omega, M)$ is a transition function which modifies the ecosystem from one optimization iteration to the next. The elements of the 6-tuple are defined as follows: $N$ is the number of habitats; $D$ is the number of SIVs; $\lambda$ is the immigration rate; $\mu$ is the emigration rate; $\Omega$ is the migration operator; and $M$ is the mutation operator.
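The linear migration model and the mutation rate above can be sketched as follows (a minimal sketch; the function and variable names are illustrative rather than taken from the paper):

```python
def migration_rates(k, I, E, S_max):
    """Linear BBO model: immigration falls and emigration rises
    with the habitant count k of habitat H_k."""
    lam = I * (1.0 - k / S_max)   # immigration rate (high for empty habitats)
    mu = E * (k / S_max)          # emigration rate (high for full habitats)
    return lam, mu

def mutation_rate(P_i, P_max, m_max):
    """Mutation rate of habitat i from its probability P_i; habitats with
    unlikely species counts are mutated more aggressively."""
    return m_max * (1.0 - P_i / P_max)
```

Note how a fit, fully populated habitat (k = S_max) only emigrates, while an empty one (k = 0) only immigrates, so good solutions tend to share their SIVs with poor ones.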

The Proposed Hybrid CBBO Algorithm for Training an MLP
There are three different approaches for using heuristic algorithms to train MLPs. In the first approach, heuristic algorithms are employed to find a combination of weights and biases that provides the minimum error for an MLP. In the second approach, heuristic algorithms are utilized to find the proper architecture for an MLP applied to a particular problem. In the third approach, heuristic algorithms are used to tune the parameters of a gradient-based learning algorithm.

Algorithm 1: Pseudocode of BBO for optimization problems.

    I : Θ → {H_i, HSI_i}
    while (termination condition is not met)
        Γ = (N, D, λ, μ, Ω, M)
    end
Mirjalili et al. [13] employed the basic BBO algorithm to train an MLP using the first approach, and the results demonstrate that BBO is significantly better at avoiding local minima than the PSO, GA, and ACO algorithms. However, the basic BBO algorithm still has some drawbacks, such as (a) the large number of iterations needed to reach the global optimal solution and (b) the tendency to converge to solutions which may only be locally optimal. Many methods have been proposed to improve the exploration and exploitation capabilities of the BBO algorithm.

Chaotic Systems. Chaos theory [26] refers to the study of chaotic dynamical systems, which is embodied by the so-called "butterfly effect." As nonlinear dynamical systems, chaotic systems are highly sensitive to their initial conditions, and tiny changes to their initial conditions may result in significant changes in their final outcomes.

In this paper, chaotic systems are applied to the initialization of BBO in place of random values [25][26][27]. This means that chaotic maps substitute for the random values to provide chaotic behavior to the heuristic algorithm. During the execution of the BBO algorithm, the most important random values are those used to choose a habitat from which habitants emigrate during the migration phase. We utilize a chaotic map based on the logistic model, choosing a value from the interval [0, 1] whenever there is a need for a random value:

$$x_{t+1} = a \, x_t \left( 1 - x_t \right). \tag{8}$$

Here $x_{t+1} \in [0, 1]$ and $a$ is the logistic parameter. When $a$ equals 4, the iterations produce values which follow a pseudorandom distribution, and a tiny difference in the initial value $x_1$ gives rise to a large difference in the long-term behavior. We employ this feature to help the BBO algorithm avoid local convergence.
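The chaotic sequence used in place of uniform random numbers can be generated as follows (a minimal sketch; the seed value is an illustrative assumption, and in practice it is chosen away from the map's fixed points such as 0, 0.25, 0.5, 0.75, and 1):

```python
def logistic_map(x0, n, a=4.0):
    """Generate n chaotic values in [0, 1] with the logistic map
    x_{t+1} = a * x_t * (1 - x_t); a = 4 gives fully chaotic behavior."""
    xs, x = [], x0
    for _ in range(n):
        x = a * x * (1.0 - x)
        xs.append(x)
    return xs
```

Two sequences started from nearly identical seeds diverge quickly, which is exactly the sensitivity that helps the trainer escape local convergence.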

Habitat Suitability Index (Fitness Function).
During the training phase of an MLP, each training data sample is involved in calculating the HSI of each candidate solution.
In this work, the Mean Square Error (MSE) over all training samples is utilized:

$$\mathrm{MSE} = \frac{1}{Q} \sum_{q=1}^{Q} \sum_{k=1}^{m} \left( o_k^q - d_k^q \right)^2, \tag{9}$$

where $Q$ is the number of training samples, $m$ is the number of outputs, $d_k^q$ is the desired output of the $k$th output node when the $q$th training sample is used, and $o_k^q$ is the actual output of the $k$th output node when the $q$th training sample is used. Thus, the HSI value for the $i$th candidate is given by $\mathrm{HSI}(H_i) = \mathrm{MSE}(H_i)$.

Opposition-Based Learning.
To improve the convergence of the BBO algorithm during the mutation phase, a method named opposition-based learning (OBL) is used, as in [22]. The main idea of opposition-based learning is to consider an estimate and its opposite at the same time in order to achieve a better approximation of the current candidate solution. Thus, a vector and its opposite vector are evaluated simultaneously, and the fitter one is retained.
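Opposition-based learning can be sketched as below. This is a minimal sketch: the rule that the opposite of a candidate x bounded by [a, b] in each dimension is a + b − x is the standard OBL definition, not a detail spelled out in this paper, and the function names are illustrative.

```python
import numpy as np

def opposite(x, lower, upper):
    """Opposite point of candidate x within per-dimension bounds [lower, upper]."""
    return lower + upper - x

def obl_select(x, lower, upper, fitness):
    """Evaluate x and its opposite; keep whichever is fitter (lower MSE is better)."""
    x_opp = opposite(x, lower, upper)
    return x if fitness(x) <= fitness(x_opp) else x_opp
```

Evaluating both points costs one extra fitness call but doubles the chance that at least one candidate starts near the basin of the global optimum.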

Outline of HCBBO for MLP.
In this section, the main procedure of HCBBO is described. To guarantee an initial population of sufficient quality and diversity, the initial population is generated using a combination of the chaotic system and the OBL approach. By fusing local search strategies with the migration and mutation phases of the BBO algorithm, the exploration and exploitation capabilities of HCBBO can be well balanced. The main procedure of the proposed HCBBO for training an MLP is described in Algorithm 2.
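Putting the pieces together, the overall training loop can be sketched as below. This is a simplified illustration under several assumptions not fixed by the paper: weights are bounded in [-1, 1] (so the OBL opposite of x is -x), immigration rates are assigned by fitness rank, the emigrating habitat is picked uniformly rather than in proportion to its emigration rate, and all names are illustrative. Here `fitness` would be the MSE of an MLP decoded from the weight vector.

```python
import numpy as np

def hcbbo_train(fitness, dim, n_hab=30, n_iter=100, m_max=0.05, seed=0.123):
    """Sketch of an HCBBO-style loop: chaotic + opposition-based initialization,
    rank-based migration, and chaotic mutation. Lower fitness is better."""
    x = seed
    def chaos():                           # logistic map as the number source
        nonlocal x
        x = 4.0 * x * (1.0 - x)
        return x
    # chaotic initialization in [-1, 1], plus the opposite population (OBL)
    pop = np.array([[2.0 * chaos() - 1.0 for _ in range(dim)]
                    for _ in range(n_hab)])
    pop = np.vstack([pop, -pop])           # opposite of x in [-1, 1] is -x
    fit = np.array([fitness(h) for h in pop])
    keep = np.argsort(fit)[:n_hab]         # keep the fitter half
    pop, fit = pop[keep], fit[keep]

    for _ in range(n_iter):
        order = np.argsort(fit)            # best habitat first
        pop, fit = pop[order], fit[order]
        lam = np.arange(1, n_hab + 1) / n_hab   # worse rank -> more immigration
        new_pop = pop.copy()
        for i in range(1, n_hab):          # index 0 is the elite, left untouched
            for d in range(dim):
                if chaos() < lam[i]:       # immigrate an SIV from another habitat
                    j = int(chaos() * n_hab) % n_hab
                    new_pop[i, d] = pop[j, d]
                if chaos() < m_max:        # chaotic mutation of the SIV
                    new_pop[i, d] = 2.0 * chaos() - 1.0
        pop = new_pop
        fit = np.array([fitness(h) for h in pop])
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```

Keeping the elite habitat untouched each generation mirrors step (6) of Algorithm 2: the best MSE found so far can only improve.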

Experimental Analysis
This study focuses on finding an efficient training method for MLPs. To evaluate the performance of the proposed HCBBO algorithm, a series of experiments was developed in the Matlab software environment (V2009). The system configuration is as follows: (a) CPU: Intel i7; (b) RAM: 4 GB; (c) operating system: Windows 8. Based on the works described in [13,28,29], we chose four publicly available big data classification datasets to benchmark our system: (1) balloon, (2) iris, (3) heart, and (4) vehicle. All these datasets are freely available from the University of California at Irvine (UCI) Machine Learning Repository [30], thus ensuring replicability. The characteristics of these datasets are listed in Table 1.
Algorithm 2: The framework of the HCBBO algorithm.

(1) Input: the habitat size N, the maximum emigration and immigration rates E and I, and the maximum mutation rate m_max.
(2) Initialize the set of MLPs (habitats) using the chaotic maps of Eq. (8).
(3) For each habitat, calculate its mean square error from the relevant parameters based on Eq. (9); the basic rule of the fitness function is that better performance corresponds to a smaller MSE. Then identify the elite habitats by their HSI values.
(4) Combine MLPs according to the immigration and emigration rates based on Eq. (6); probabilistically use immigration and emigration to modify each non-elite habitat based on Eq. (7).
(5) Select a number of MLPs and recompute (mutate) some of their weights or biases using chaotic maps.
(6) Save some of the MLPs with low MSE.
(7) Terminate the loop if a predefined number of generations is reached or an acceptable solution has been found; otherwise go to step (3) for the next iteration.
(8) Output: the MLP with the minimum MSE (best HSI).

In this paper, we compare the performances of four algorithms, BBO, PSO, GA, and HCBBO, over the benchmark datasets described in Table 1. Since manually choosing appropriate parameters for each of these algorithms is time-consuming, the initial parameters and property structures of both the classical BBO and HCBBO algorithms (listed in Table 2) were chosen as in [13].
In order to increase the accuracy of the experiment, each algorithm was run 20 times, and different MLP structures were used for the different datasets, as listed in Table 3.
The running time (RT) and convergence curves of each algorithm are shown in Figures 3-7. From Figure 3, it can be observed that the average computational time of HCBBO is 8 to 13% lower than the best time obtained for BBO; it is also lower than the computational time of all the other algorithms compared in this experiment. This decrease in running time can be attributed to the fact that HCBBO's search ability is enhanced by OBL. The convergence curves in Figures 4-7 show that, among all the algorithms, HCBBO has the fastest convergence behavior on all the datasets. In Figure 4, under the same experimental conditions, HCBBO achieved the optimal values for its parameters after 150 generations, while BBO could not converge to an optimal value even after 200 generations. The same pattern of faster convergence for HCBBO was observed for the other classical problems (Figures 5-7).
The mean classification rates are provided in Table 4. Statistically speaking, HCBBO achieves the best results on all of the classification datasets because it avoids local minima more effectively.

Figure 1: An MLP with one hidden layer.

Figure 3: Total running time of each algorithm.

Table 2: The main parameters of BBO and HCBBO.

Table 4: Experimental results for classification rate.