An Improved Brain Storm Optimization with Differential Evolution Strategy for Applications of ANNs

The Brain Storm Optimization (BSO) algorithm is a swarm intelligence algorithm inspired by the human brainstorming process. The performance of BSO is maintained by the idea-creating process, but when it cannot find a better solution for several successive iterations, the search becomes so inefficient that the population may be trapped in local optima. In this paper, we propose an improved BSO algorithm with a differential evolution strategy and a new step size method. Firstly, the differential evolution strategy is incorporated into the idea-creating operator to allow BSO to jump out of stagnation, owing to its strong searching ability. Secondly, we introduce a new step size control method that better balances exploration and exploitation at different searching generations. Finally, the proposed algorithm is first tested on 14 benchmark functions of CEC 2005 and then applied to train artificial neural networks. Comparative experimental results illustrate that the proposed algorithm performs significantly better than the original BSO.


Introduction
In the past few decades, swarm intelligence algorithms, which are derived from the concepts, principles, and mechanisms of nature-inspired computation, have attracted more and more attention from researchers. Swarm intelligence is a newer category of stochastic, population-based optimization algorithms than evolutionary algorithms (EAs) such as Genetic Algorithms (GA) [1], Evolutionary Programming (EP) [2], Evolutionary Strategies (ES) [3], Genetic Programming (GP) [4], and differential evolution (DE) [5]. Nevertheless, many swarm intelligence optimization algorithms, such as Particle Swarm Optimization (PSO) [6,7], have already been proposed to tackle challenging complex computational problems in science and industry.
The Brain Storm Optimization (BSO) algorithm [8,9] is a swarm intelligence algorithm inspired by the human brainstorming process. Derived from the human creative problem-solving process, BSO has attracted more and more attention and has been applied to function optimization, optimal satellite formation reconfiguration [10], the design of DC brushless motors [11], economic dispatch considering wind power [12], multiobjective optimization [13,14], and so forth. Owing to the flexibility and scalability of the original BSO algorithm [15], many new versions of BSO have been designed and implemented. To improve the performance and reduce the computational burden of the BSO algorithm, Zhan et al. [16] modified the grouping operator and the creating operator and designed a BSO variant named Modified BSO (MBSO). Cheng et al. [17] used two kinds of partial reinitialization strategies to improve the population diversity in the BSO algorithm. Recently, Yang et al. [18] proposed the advanced discussion mechanism-based brain storm optimization (ADMBSO), which incorporates intercluster and intracluster discussions into BSO to control global and local searching ability. However, optimization problems have become more and more complex, from simple unimodal functions to hybrid rotated shifted multimodal functions. To the best of our knowledge, no variant of BSO has been tested on the complex benchmark functions of CEC2005 [19]. Hence, more effective improvements to the original BSO algorithm are still necessary.

Mathematical Problems in Engineering
Among existing metaheuristics, differential evolution (DE) is a simple yet powerful global optimization technique with successful applications in various areas [5]. To achieve the most satisfactory optimization performance, an improved BSO algorithm with differential evolution strategy and new step size method named BSODE, which can maintain both the exploration ability and the diversity of population, is proposed in the paper. The improved algorithm will be first tested on 14 benchmark functions of CEC 2005 and then will be applied to train artificial neural networks.
The rest of this paper is organized as follows. Section 2 briefly introduces the original BSO algorithm and the basic DE operator. Section 3 describes the improved BSO with differential evolution strategy (BSODE). Section 4 presents the tests on 14 benchmark functions. The applications for artificial neural network are given in Section 5, followed by conclusions in Section 6.

Brain Storm Optimization.
The BSO algorithm is motivated by the philosophy of brainstorming, a widely used tool for increasing creativity in organizations that has achieved wide acceptance as a means of facilitating creative thinking [20]. A potential solution in the solution space represents an idea in BSO. BSO follows the rules of idea interchange within a team and uses clustering, replacing, and creating operators to approach the global optimum generation by generation. In the procedure of BSO, firstly, $N$ ideas are randomly initialized within the solution space, and then each idea is evaluated according to its fitness function. Next, $M$ cluster center points are also randomly selected and initialized like ideas, where $M$ is less than $N$. As presented in [8,9], the rest of the process in BSO can be described as follows.

Clustering Individuals.
Clustering is a process of grouping similar objects together. During each generation, all $N$ ideas are clustered into $M$ clusters according to idea (or individual) features. The best idea in each cluster is chosen as its cluster center, and the clustering operation can refine a search area. $k$-means is a popular clustering algorithm; herein it is used in the clustering operation.
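The clustering step can be illustrated with a minimal sketch: a plain k-means loop groups the ideas, and the best idea of each cluster then becomes its center. The names `kmeans_labels` and `cluster_centers`, the fixed number of k-means iterations, the `sphere` test function, and the minimization convention are assumptions made for this example and are not taken from the paper.

```python
import random

def kmeans_labels(ideas, m, rng, iters=20):
    """Assign each idea to one of m clusters with a basic k-means loop."""
    centroids = [list(p) for p in rng.sample(ideas, m)]
    labels = [0] * len(ideas)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, x in enumerate(ideas):
            labels[i] = min(
                range(m),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(m):
            members = [ideas[i] for i in range(len(ideas)) if labels[i] == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_centers(ideas, labels, fitness, m):
    """Map each non-empty cluster to the index of its best idea (minimization)."""
    centers = {}
    for c in range(m):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            centers[c] = min(members, key=lambda i: fitness(ideas[i]))
    return centers

def sphere(x):
    """Hypothetical test objective: sum of squares."""
    return sum(v * v for v in x)

rng = random.Random(0)
ideas = [[rng.uniform(-5, 5) for _ in range(2)] for _ in range(30)]
labels = kmeans_labels(ideas, 3, rng)
centers = cluster_centers(ideas, labels, sphere, 3)
```

Note that the cluster center used by BSO is an actual idea from the population, not the geometric centroid that k-means computes internally.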

Disrupting Cluster Center.
The cluster center disrupting operation, also called the replacing operation, randomly chooses a cluster center and replaces it with a newly generated idea with a probability of p_replace. The value of p_replace thus controls the probability of replacing a cluster center by a randomly generated solution. This operation is used to avoid premature convergence and to help individuals "jump out" of local optima.
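A minimal sketch of the replacing operation follows. The function name `disrupt_cluster_center`, the box bounds `lower` and `upper`, and the uniform resampling are assumptions for illustration; the paper only states that the new idea is randomly generated.

```python
import random

def disrupt_cluster_center(ideas, center_indices, p_replace, lower, upper, rng):
    """With probability p_replace, overwrite one randomly chosen cluster
    center with a freshly sampled idea. Returns the replaced index or None."""
    if rng.random() < p_replace:
        idx = rng.choice(center_indices)
        # Assumption: the replacement is drawn uniformly inside the search box.
        ideas[idx] = [rng.uniform(lower, upper) for _ in ideas[idx]]
        return idx
    return None
```

After a replacement, the fitness of the new idea has to be re-evaluated before the next creating step uses it.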

Creating Individuals.
To maintain the diversity of the population, a new idea (individual) can be generated based on one idea in one cluster or on two ideas in two clusters. In the creating operation, BSO first randomly chooses one cluster or two according to a probability of p_one. Then, depending on whether one cluster or two were chosen, the cluster center or a random idea is selected with a probability of p_one_center or p_two_center, respectively. The selecting operation is defined as
$$X_{\text{selected}} = \begin{cases} X_i, & \text{one cluster}, \\ \text{rand} \times X_{1i} + (1 - \text{rand}) \times X_{2i}, & \text{two clusters}, \end{cases} \quad (1)$$
where rand is a random value between 0 and 1. After choosing one idea or two, the selected idea(s) is updated according to
$$X_{\text{new}} = X_{\text{selected}} + \xi \times \text{normrnd}(0, 1), \quad (2)$$
where normrnd is the Gaussian random value with mean 0 and variance 1 and $\xi$ is an adjusting factor that slows the convergence speed down as the evolution goes on, which can be expressed as
$$\xi = \text{logsig}\left(\frac{0.5 \times \text{max\_iteration} - \text{current\_iteration}}{k}\right) \times \text{rand}, \quad (3)$$
where rand is a random value between 0 and 1, and max_iteration and current_iteration denote the maximum number of iterations and the current number of iterations, respectively. The logsig is a logarithmic sigmoid transfer function; this form favors global search at the beginning of the evolution and enhances local search as the process approaches the end. $k$ is a predefined parameter for changing the slope of the logsig function. The newly created idea is evaluated, and if its fitness value is better than that of the current idea, the old idea is replaced by the new one.
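The creating step can be sketched as follows, assuming the commonly used BSO form: the adjusting factor is logsig((0.5 × max_iteration − current_iteration) / k) × rand, two ideas are combined with random weights rand and (1 − rand), and Gaussian noise scaled by the adjusting factor is added. The function names, the slope value used in the usage note, and the choice to draw a single random factor per idea (rather than per dimension) are assumptions for this example.

```python
import math
import random

def logsig(a):
    """Logarithmic sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-a))

def step_size(current_iteration, max_iteration, k, rng):
    """Adjusting factor: close to rand early on, close to 0 near the end."""
    return logsig((0.5 * max_iteration - current_iteration) / k) * rng.random()

def combine_two(x1, x2, rng):
    """Weighted combination of two selected ideas (two-cluster case)."""
    r = rng.random()
    return [r * a + (1.0 - r) * b for a, b in zip(x1, x2)]

def create_idea(x_selected, current_iteration, max_iteration, k, rng):
    """Perturb the selected idea with Gaussian noise scaled by the step size."""
    xi = step_size(current_iteration, max_iteration, k, rng)
    return [v + xi * rng.gauss(0.0, 1.0) for v in x_selected]
```

For example, `create_idea([1.0, 2.0], 10, 1000, 20.0, random.Random(0))` returns a perturbed copy of the selected idea; the slope value 20.0 is only a placeholder.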

Differential Evolution.
Differential evolution (DE) [5] is a population-based and stochastic optimization algorithm. Like other EAs, DE begins with an initial population vector containing a number of target individuals. The current generation evolves into the next generation through evolutionary operations repeatedly until the termination condition is attained. For the original DE, the mutation, crossover, and selection operators are defined as follows.

Mutation Operation.
The mutation operation of the classical DE scheme (DE/rand/1/bin) can be summarized as follows:
$$V_i^G = X_{r1}^G + F \times (X_{r2}^G - X_{r3}^G), \quad (4)$$
where $G$ is the generation number, $V_i^G$ is the mutant vector, the indices $r1$, $r2$, and $r3$ are mutually exclusive integers randomly chosen from the range between 1 and $N$, and $F$ is a mutation scaling factor which affects the differential variation between two individuals.

Crossover Operation.
After mutation, the crossover operation is applied to each pair of target vector and mutant vector in order to enhance the potential diversity of the population. DE applies a crossover operator on $X_i^G$ and $V_i^G$ to generate the offspring individual $U_i^G$ at the $G$th generation. The crossover operation is defined componentwise as
$$U_{i,j}^G = \begin{cases} V_{i,j}^G, & \text{rand} \le CR, \\ X_{i,j}^G, & \text{otherwise}, \end{cases} \quad (5)$$
where rand is a random value between 0 and 1 and CR is a parameter of crossover probability.

Selection Operation.
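Greedy selection is employed by comparing a parent with its corresponding offspring. The selection operation at the $G$th generation is described as
$$X_i^{G+1} = \begin{cases} U_i^G, & f(U_i^G) \le f(X_i^G), \\ X_i^G, & \text{otherwise}, \end{cases} \quad (6)$$
where $f(U_i^G)$ is the objective function value of the trial vector $U_i^G$.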
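The three DE operators can be combined into one generation step, sketched below for a minimization problem. Two details are conventions assumed here rather than statements from the text: the three random indices exclude the target index, and one randomly chosen component (`j_rand`) is always taken from the mutant so that the trial vector differs from its parent. The function names and the `sphere` objective are hypothetical.

```python
import random

def sphere(x):
    """Hypothetical test objective: sum of squares."""
    return sum(v * v for v in x)

def de_generation(pop, f, F, CR, rng):
    """One generation of DE with rand/1 mutation, binomial crossover,
    and greedy one-to-one selection (minimization)."""
    n, dim = len(pop), len(pop[0])
    new_pop = []
    for i in range(n):
        # Mutation: three mutually exclusive indices (assumed to differ from i).
        r1, r2, r3 = rng.sample([j for j in range(n) if j != i], 3)
        mutant = [pop[r1][d] + F * (pop[r2][d] - pop[r3][d]) for d in range(dim)]
        # Crossover: take the mutant component with probability CR.
        j_rand = rng.randrange(dim)
        trial = [
            mutant[d] if (rng.random() <= CR or d == j_rand) else pop[i][d]
            for d in range(dim)
        ]
        # Selection: keep the trial only if it is not worse than the parent.
        new_pop.append(trial if f(trial) <= f(pop[i]) else pop[i])
    return new_pop
```

Because selection is greedy, the fitness of each population slot can never get worse from one generation to the next.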

The Improved Algorithm: BSODE
In order to achieve better performance, an improved BSO algorithm needs to make use of both the global search information about the search space and the local search information of solutions found so far. The global search information can guide the search towards promising areas, while the local search information of solutions can be helpful for exploitation. In this paper, we integrate the mutation and crossover scheme of DE into the intracluster and intercluster idea-creating operators of BSO. The key reason for integrating the differential evolution strategy into BSO is that DE is mainly based on distance and direction information and has the advantage of not being biased towards any predefined guide. Although the differential strategy has been utilized in [16] to design a BSO variant named Modified BSO (MBSO) and in [10] to develop another BSO variant termed Close Loop BSO (CLBSO), our BSODE algorithm differs significantly from MBSO and CLBSO. On one hand, in MBSO and CLBSO, a new idea is created by adding the difference of two distinct random ideas $X_{r1}$ and $X_{r2}$ from all the current ideas to a current idea according to
$$X_{\text{new}} = X_{\text{selected}} + \text{rand} \times (X_{r1} - X_{r2}), \quad (7)$$
where rand is a random value between 0 and 1 and the indices $r1$ and $r2$ are mutually exclusive integers randomly chosen from the range between 1 and $N$. Even though formula (7) is similar to the mutation formula (4) of the DE algorithm, the two are different in essence. In formula (4), the mutation scaling factor $F$ is a real constant between 0 and 2, and the idea behind it is to control the amplification of the differential variation. In formula (7), however, the factor rand represents a random direction scaling, and the scope of the distance is smaller in some cases. In its mutation operator, our BSODE strictly follows formula (4) of the DE algorithm.
On the other hand, MBSO and CLBSO do not take the crossover operator between ideas into account, whereas our BSODE uses the crossover operator to enhance the performance of local exploitation according to formula (5). Furthermore, in our BSODE, we mimic Osborn's rule that unusual ideas, generated from different perspectives and by suspending assumptions, are welcome in human brainstorming. Following this inspiration, we add the differential evolution strategy to the normal ideas to generate more diverse ideas. According to the above analysis, the differential evolution operators of the BSODE algorithm include the intracluster differential evolution operator and the intercluster differential evolution operator. These operators are described in detail below.
Figure 1: Flowchart showing the working of the intracluster differential evolution operator (labeled points: X_center, X_r1, X_r2).

Intracluster Differential Evolution Operator.
In the creating operator of BSO, if the selected idea is a normal one in one cluster, according to the probabilities p_one and (1 − p_one_center), we let the idea learn from the differential value of two randomly selected ideas and the cluster center. The differential evolution operator of a normal idea is defined as
$$X_{\text{new}} = X_{\text{center}} + F \times (X_{r1} - X_{r2}), \quad (8)$$
where $F$ is the mutation scaling factor which affects the differential variation between two ideas and the indices $r1$ and $r2$ are mutually exclusive integers randomly chosen in the selected cluster. Then, according to formula (5), the crossover operation is used to generate new solutions by shuffling competing vectors and also to increase the diversity of the population. Figure 1 shows the working process of the intracluster differential evolution operator of a normal idea. The search space of the normal idea includes the area between the two random ideas $X_{r1}$ and $X_{r2}$ and the cluster center $X_{\text{center}}$, and it can be seen that the local exploitation extends within the cluster space.

Intercluster Differential Evolution Operator.
In the creating operator of BSO, if the selected ideas are two normal ones in two clusters, according to the probabilities (1 − p_one) and (1 − p_two_center), we let the idea learn from the differential value of two randomly selected normal ideas in the two clusters and the best idea in all clusters.
Algorithm 1: Pseudocode of the differential evolution operator().
begin
  if rand() < p_one
    Select a cluster according to the probabilities based on the number of individuals
    if rand() < p_one_center
      execute the original operator
    else
      execute the intracluster DE operator to create X_new using formulas (8) and (5)
    endif
  else
    Randomly select two clusters
    if rand() < p_two_center
      execute the original operator
    else
      execute the intercluster DE operator to create X_new using formulas (9) and (5)
    endif
  endif
end

Figure 2: Flowchart showing the working of the intercluster differential evolution operator (labeled elements: X_r1, X_r2, GlobalIdea, Cluster 1, Cluster 2).
The differential evolution operator of two normal ideas is defined as
$$X_{\text{new}} = \text{GlobalIdea} + F \times (X_{r1} - X_{r2}), \quad (9)$$
where $F$ is the mutation scaling factor, GlobalIdea is the best idea in all clusters, and $X_{r1}$ and $X_{r2}$ denote the normal ideas of cluster 1 and cluster 2, respectively. Then, according to formula (5), the crossover operation is used to generate new solutions. Figure 2 shows the working process of two normal ideas in the intercluster differential evolution operator. The search space of the two normal ideas includes the area between the global best idea and the two normal ideas, and it can be seen that the global exploration extends between the two clusters.
According to the above analysis, the pseudocode of differential evolution operator about intracluster and intercluster is summarized in Algorithm 1.
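The branching of Algorithm 1 can be sketched in code as follows. The sketch assumes that the intracluster operator has the form cluster center + F × (difference of two random ideas of the cluster), that the intercluster operator has the form global best + F × (difference of one random idea from each cluster), and that the crossover partner is the current idea; these forms, the function names, the dictionary of probabilities, and the fallback for clusters with fewer than two ideas are assumptions for illustration.

```python
import random

def crossover(mutant, current, CR, rng):
    """Binomial crossover between the mutant and the current idea."""
    return [m if rng.random() <= CR else c for m, c in zip(mutant, current)]

def de_operator(clusters, centers, global_best, current, p, F, CR, rng):
    """Return a base idea X_new following the branches of Algorithm 1.
    clusters: list of non-empty lists of ideas; centers: best idea per cluster;
    p: probabilities with keys 'one', 'one_center', 'two_center'."""
    if rng.random() < p["one"]:
        # Roulette selection: larger clusters are chosen more often.
        sizes = [len(c) for c in clusters]
        c = rng.choices(range(len(clusters)), weights=sizes)[0]
        if rng.random() < p["one_center"] or len(clusters[c]) < 2:
            return list(centers[c])  # original operator: use the cluster center
        x_r1, x_r2 = rng.sample(clusters[c], 2)
        mutant = [z + F * (a - b) for z, a, b in zip(centers[c], x_r1, x_r2)]
        return crossover(mutant, current, CR, rng)
    c1, c2 = rng.sample(range(len(clusters)), 2)
    if rng.random() < p["two_center"]:
        # Original operator: random weighted combination of two centers.
        r = rng.random()
        return [r * a + (1.0 - r) * b for a, b in zip(centers[c1], centers[c2])]
    x_r1, x_r2 = rng.choice(clusters[c1]), rng.choice(clusters[c2])
    mutant = [g + F * (a - b) for g, a, b in zip(global_best, x_r1, x_r2)]
    return crossover(mutant, current, CR, rng)
```

The returned idea is only the base of the new candidate; in the full algorithm it is subsequently perturbed with the step size and accepted only if it improves on the current idea.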

A New Step Size Method.
To adjust the convergence speed as the evolution proceeds in idea generation, the original BSO algorithm defines an adjusting factor described by formula (3). Figure 3(a) shows this adjusting factor, which controls the scale of the step. We can observe that at first the adjusting factor stays around 1, whereas once half of the generations have passed, it rapidly drops to nearly 0. This step size control can balance exploration and exploitation at different searching generations, but the transition takes effect only over a very short interval. Hence, we introduce a simple new step size method, which is shown in Figure 3(b). The dynamic mutation function is described as follows:
$$\xi = \text{rand} \times \exp\left(1 - \frac{\text{max\_iteration}}{\text{max\_iteration} - \text{current\_iteration} + 1}\right),$$
where rand is a random value between 0 and 1, and max_iteration and current_iteration denote the maximum number of iterations and the current number of iterations, respectively.
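To make the difference between the two schedules concrete, the sketch below evaluates their deterministic envelopes (the random factor rand is omitted). The exponential form exp(1 − max_iteration / (max_iteration − current_iteration + 1)) used for the new step size and the logsig slope `k = 20` are assumptions of this example, as are the function names.

```python
import math

def logsig(a):
    """Logarithmic sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-a))

def original_scale(t, t_max, k=20.0):
    """Envelope of the original adjusting factor (assumed slope k)."""
    return logsig((0.5 * t_max - t) / k)

def new_scale(t, t_max):
    """Envelope of the new step size (assumed exponential form)."""
    return math.exp(1.0 - t_max / (t_max - t + 1.0))

# Print both envelopes at a few points of a 1000-iteration run.
for t in (0, 250, 400, 500, 600, 750, 1000):
    print(t, round(original_scale(t, 1000), 4), round(new_scale(t, 1000), 4))
```

Under these assumptions the original envelope stays near 1 until shortly before the midpoint and is near 0 shortly after it, while the new envelope decays gradually over the whole run.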

Flowchart and Pseudocode of the BSODE.
Based on the above analysis, the complete flowchart of the BSODE algorithm is shown in Figure 4. The intercluster differential evolution operator improves the global search capability and helps avoid convergence to local minima. At the same time, the intracluster differential evolution operator extends the capability of local exploitation and accelerates convergence. The pseudocode of the BSODE is summarized in Algorithm 2.

Figure 4: Flowchart of the BSODE algorithm.
Algorithm 2: Pseudocode of the BSODE().
begin
  Randomly initialize N ideas and evaluate their fitness
  Initialize M cluster centers (M < N)
  while (stopping condition not met)
    Cluster N ideas into M clusters    % clustering operation
    Record the best idea in each cluster as the cluster center
    for (i = 1 to N)    % creating operation
      Execute Algorithm 1: differential evolution operator()
      Create X_new1 using X_new according to formula (9)
      Accept X_new1 if f(X_new1) is better than f(X_i)
    endfor
  endwhile
end

Benchmark Functions Optimization Problem
The proposed BSODE is compared with BSO, PSO, DE, CoDE [21], and SaDE [22]. To reduce the influence of statistical errors, each algorithm is run independently 25 times on each function, which is a prescribed evaluation criterion in CEC2005 [19]. For all approaches, the population size is set to 30. The same stopping criterion is used in all algorithms, that is, reaching a certain number of iterations or function evaluations (FEs). In our experiments, we have run all algorithms on the benchmark functions using the same budget of 10000 × $D$ FEs for a fair comparison, where $D$ is the problem dimension. The parameter settings of all the algorithms are given in Table 2.

The Contribution of the New Step Size Method to BSODE.
To verify the rationality and effectiveness of the new step size in BSODE, we investigate its contribution to the algorithm. For consistency with the rest of the paper, we compare the variants on all 14 benchmark functions with 25 independent runs. We test BSO, BSO with the new step size (BSO-newstep), BSODE, and BSODE with the original adjusting factor, that is, without the new step size (BSODE-nonewstep).
The experimental results are presented in Table 3. Table 3 shows that BSO-newstep performs better than the original BSO and that BSODE performs better than BSODE-nonewstep, so it can be concluded that the new step size plays a significant role in the BSODE algorithm. This is in good agreement with our analysis in Section 3.2 that the new step size control can balance exploration and exploitation at different searching generations.

Comparisons on Solution Accuracy.
The results of solution accuracy are given in Table 4 in terms of the mean optimum solution and the standard deviation of the solutions obtained in the 25 independent runs by each algorithm over 300,000 FEs on 14 benchmark functions. In all experiments, the dimensions of all problems are 30. In each row of the table, the mean values are listed in the first part, and the standard deviations are listed in the last part, and the two parts are divided with a symbol "±." The best results among the algorithms are displayed in bold.
From Table 4, it can be observed that, in terms of the mean value and the standard deviation, the original BSO algorithm performs well compared with BSODE only for function 6. The BSODE algorithm performs better than the other algorithms for four functions (4, 7, 8, and 10); for function 4 in particular, its mean value and standard deviation are far better than those of the other five algorithms. PSO outperforms the other algorithms in terms of solution accuracy only for function 12, and DE does so only for functions 3 and 5. Although CoDE performs better for five functions (1, 6, 9, 11, and 14) and SaDE performs better for three functions (1, 2, and 13), we can conclude that the BSODE algorithm still maintains good solution accuracy on complex shifted and rotated benchmark functions. Figure 5 shows the convergence graphs of the six compared algorithms in terms of the mean fitness values of 25 runs. Due to page limitations, only the graphs for three functions (1, 7, and 14), which cover shifted, shifted rotated, and expanded rotated extended functions, are shown. From Figure 5, we can observe that BSODE converges faster than the original BSO and at a speed similar to that of the other four algorithms.

The Comparison Results of $t$-Tests.
To statistically compare BSODE with the other five algorithms, a $t$-test [23,24] is carried out to show the differences between BSODE and each of them. The results on every function of a two-tailed test with a significance level of 0.05 between BSODE and the other algorithms are given in Table 5. "Better," "same," and "worse" give the number of functions on which BSODE performs significantly better than, almost the same as, and significantly worse than the compared algorithm under the same conditions, respectively. Table 5 reports the $t$ value, $p$ value, and $h$ value of 25 independent runs of the 6 algorithms over 300,000 FEs on the 14 test functions. The best results among the algorithms are shown in bold. For example, comparing BSODE and DE, the former significantly outperforms the latter on 7 functions (2, 4, 7, 8, 10, 11, and 12), performs the same on 3 functions (6, 13, and 14), and performs worse on 4 functions (1, 3, 5, and 9), yielding a "General Merit" value of 7 − 4 = 3, which indicates that BSODE generally outperforms DE. Although it performs slightly worse on some functions, BSODE in general offers better performance than the other compared algorithms, as confirmed in Table 5.
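The bookkeeping behind the "General Merit" value can be sketched as below. The sketch assumes a Student two-sample t statistic with pooled variance and compares it with the two-tailed 5% critical value for 48 degrees of freedom (two samples of 25 runs each); whether the paper used the pooled or the unequal-variance form is not stated, so this choice, like the function names, is an assumption. Smaller values are treated as better.

```python
import math
from statistics import mean, variance

T_CRIT_DF48 = 2.0106  # two-tailed 5% critical value, 48 degrees of freedom

def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1.0 / na + 1.0 / nb))

def compare(a, b, t_crit=T_CRIT_DF48):
    """Return +1 if a is significantly better (smaller) than b,
    -1 if significantly worse, and 0 if there is no significant difference."""
    t = pooled_t(a, b)
    if abs(t) <= t_crit:
        return 0
    return 1 if t < 0 else -1

def general_merit(outcomes):
    """Number of 'better' outcomes minus number of 'worse' outcomes."""
    return outcomes.count(1) - outcomes.count(-1)
```

With the counts quoted for BSODE versus DE, `general_merit([1] * 7 + [0] * 3 + [-1] * 4)` evaluates to 3.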

Applications of ANN Using BSODE
An artificial neural network (ANN) [25] simulates the learning pattern of the human brain to solve difficult problems in applications such as the prediction of principal ground-motion parameters [26] and the prediction of peak ground acceleration [27]. An ANN is a flexible mathematical scheme that is capable of identifying complex nonlinear relationships between input data and output data. Training an ANN requires robust optimization techniques because the search space is high dimensional and multimodal and the data are generally noisy. The most commonly used training method is the Back-Propagation (BP) algorithm [28]. However, the BP algorithm has some potential problems. On one hand, for complex pattern classification problems [29], the BP algorithm may easily get trapped in local minima. On the other hand, the training time of the BP algorithm is very long for complex nonlinear problems. In recent decades, in order to deal with the complicated training problem of the ANN, many metaheuristic optimization algorithms, such as Simulated Annealing (SA) [26,27], GA [30], and PSO [31], have been utilized to optimize the weights and biases of the ANN. Besides, the recently proposed EvoNN [32,33] algorithm, which utilizes a multiobjective optimization technique in the training process of a feedforward neural network, ensures correct neural training by working out a Pareto tradeoff between the accuracy of the training and the complexity of the network. In this section, we use our BSODE algorithm to adjust the connection weights and biases of the ANN. To validate the usefulness and the efficiency of our BSODE algorithm, two types of applications, time series prediction and system identification, are illustrated.

Training ANN with BSODE Algorithm.
The structure of training ANN using our BSODE algorithm is a three-layer feed-forward network. The basic structure of the proposed scheme is given in Figure 6.
In our experiments, the activation function used in the input layer is a linear combination of the input variables, the activation function used in the hidden layer is the sigmoid function, and the activation function used in the output layer is a linear combination of the output variables of the hidden layer. In training the ANN, the aim is to find a set of weights and biases with the smallest error value. Herein, the objective function is the mean sum of squared errors (MSE), which is given as follows:
$$\text{MSE} = \frac{1}{Q \times K} \sum_{i=1}^{Q} \sum_{j=1}^{K} (d_{ij} - y_{ij})^2,$$
where $Q$ is the number of training data sets, $K$ is the number of output units, $d_{ij}$ is the desired output, and $y_{ij}$ is the output inferred from the ANN.
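The objective that an optimizer such as BSODE minimizes during training can be sketched as follows: a candidate idea is a flat vector of weights and biases, which is decoded into a three-layer network with sigmoid hidden units and linear outputs, and the error is averaged over all samples and output units. The weight layout (hidden weights and bias per hidden unit, then output weights and bias per output unit), the absence of any scaling factor in the error, and the function names are assumptions for this example.

```python
import math

def n_weights(n_in, n_hidden, n_out):
    """Length of the flat parameter vector (weights plus biases)."""
    return n_hidden * (n_in + 1) + n_out * (n_hidden + 1)

def ann_forward(weights, x, n_in, n_hidden, n_out):
    """Forward pass: sigmoid hidden layer, linear output layer."""
    idx = 0
    hidden = []
    for _ in range(n_hidden):
        s = weights[idx + n_in]  # bias of this hidden unit
        s += sum(w * v for w, v in zip(weights[idx:idx + n_in], x))
        idx += n_in + 1
        hidden.append(1.0 / (1.0 + math.exp(-s)))
    out = []
    for _ in range(n_out):
        s = weights[idx + n_hidden]  # bias of this output unit
        s += sum(w * h for w, h in zip(weights[idx:idx + n_hidden], hidden))
        idx += n_hidden + 1
        out.append(s)
    return out

def mse(weights, inputs, targets, n_in, n_hidden, n_out):
    """Mean squared error over all training samples and output units."""
    total = 0.0
    for x, d in zip(inputs, targets):
        y = ann_forward(weights, x, n_in, n_hidden, n_out)
        total += sum((dj - yj) ** 2 for dj, yj in zip(d, y))
    return total / (len(inputs) * n_out)
```

For a network with three inputs, five hidden units, and one output, each idea therefore has `n_weights(3, 5, 1)` = 26 dimensions.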

Time Series Prediction of Lorenz Chaotic System.
The first application is the time series prediction of the Lorenz chaotic system described by the following equations [34]:
$$\frac{dx}{dt} = \sigma (y - x), \quad \frac{dy}{dt} = x(\rho - z) - y, \quad \frac{dz}{dt} = xy - \beta z,$$
where $\sigma$, $\rho$, and $\beta$ are parameters of the Lorenz chaotic system. The system is chaotic when $\sigma = 10$, $\rho = 28$, and $\beta = 8/3$. The differential equations were solved numerically using 4th-order Runge-Kutta integration with a step size of 0.01 and initial values $x(0) = 0$, $y(0) = -1$, and $z(0) = 0$. The $x$-coordinate of the Lorenz time series is chosen for prediction. The goal is to construct the prediction model of the chaotic time series according to the following formula:
$$x(t + 1) = f(x(t - 18), x(t - 12), \ldots, x(t - 6), x(t)).$$
In order to train the ANN, we generate 10000 data sets according to the real model. The first 8000 data sets are discarded, and the remaining 2000 data sets are normalized and used as experiment data. The first 1500 points are selected as the training data points and the remaining 500 points are selected as the testing data points to test the validity of the model.
In this experiment, the parameter settings of the six algorithms are the same as those in Section 4.2, and the maximal generation used as the termination condition is 100 for BSODE, BSO, PSO, DE, CoDE, and SaDE. The results are shown in Table 6 in terms of the mean optimum solution, the standard deviation of the solutions, and the run time over the independent runs of each algorithm.
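The data preparation can be sketched in a few lines: integrate the Lorenz system with a 4th-order Runge-Kutta scheme, discard the transient, normalize, and build lagged input/target pairs. The use of the x-coordinate, the lag set (18, 12, 6, 0), min-max normalization to [0, 1], and all function names are assumptions for this example; the standard parameter values 10, 28, and 8/3 and the initial state (0, −1, 0) follow the description above.

```python
def lorenz_rhs(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations."""
    x, y, z = s
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(s, h):
    """One classical 4th-order Runge-Kutta step of size h."""
    k1 = lorenz_rhs(s)
    k2 = lorenz_rhs(tuple(a + 0.5 * h * b for a, b in zip(s, k1)))
    k3 = lorenz_rhs(tuple(a + 0.5 * h * b for a, b in zip(s, k2)))
    k4 = lorenz_rhs(tuple(a + h * b for a, b in zip(s, k3)))
    return tuple(
        a + h / 6.0 * (b + 2.0 * c + 2.0 * d + e)
        for a, b, c, d, e in zip(s, k1, k2, k3, k4)
    )

def lorenz_series(n, h=0.01, s0=(0.0, -1.0, 0.0)):
    """Return n successive x-coordinates of the trajectory starting at s0."""
    s, xs = s0, []
    for _ in range(n):
        s = rk4_step(s, h)
        xs.append(s[0])
    return xs

def make_samples(series, lags=(18, 12, 6, 0)):
    """Build (inputs, target) pairs: lagged values predict the next value."""
    return [
        ([series[t - lag] for lag in lags], series[t + 1])
        for t in range(max(lags), len(series) - 1)
    ]

xs = lorenz_series(10000)[8000:]          # discard the first 8000 points
lo, hi = min(xs), max(xs)
xs = [(v - lo) / (hi - lo) for v in xs]   # min-max normalization (assumed)
train_pts, test_pts = xs[:1500], xs[1500:]
train, test = make_samples(train_pts), make_samples(test_pts)
```

Because the largest lag is 18 and one further point is needed as the target, each segment yields 19 fewer samples than it has points.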
As is seen in Table 6, SaDE has the worst MSE in both training and testing cases. BSODE performs better than BSO, PSO, DE, CoDE, and SaDE in the mean of the MSE. The model trained by six algorithms follows the chaotic behavior of the Lorenz chaotic time series which is demonstrated in Figure 7. It can be concluded that BSODE is the most effective learning algorithm for Lorenz chaotic time series even though BSODE needs more running time than PSO algorithm.

Nonlinear System Identification.
The second application is to build a three-layer feed-forward ANN with three input units, five hidden units, and one output unit. The ANN is used to model the curve of a nonlinear function which is given by the following equation: ) .
To train the ANN, we choose 200 pairs of data from the real model. In addition, to further evaluate the performance of the BSODE algorithm, we add 20 dB additive white Gaussian noise to the 200 pairs of data.
In this experiment, the population size of each algorithm is 30, and the maximal generation used as the termination condition of each algorithm is 1000 for BSODE, BSO, PSO, DE, CoDE, and SaDE. The parameter settings of the six algorithms are also the same as in Section 4.2. The results are given in Table 7 in terms of the mean MSE and the standard deviation obtained in the 25 independent runs of the six algorithms. From Table 7, in the case without noise, we observe that BSODE performs worse than SaDE, CoDE, and PSO but better than BSO and DE in terms of the mean MSE. However, in the case of 20 dB additive white Gaussian noise, BSODE performs better than all other five algorithms. Figure 8 shows that the output of the identified system tracks the target output for three algorithms. We can conclude that BSODE has the best performance in the noisy case and a better performance than BSO and DE in the noise-free case in building the approximation model.
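Adding white Gaussian noise at a prescribed level can be sketched as follows. The sketch interprets "20 dB" as a signal-to-noise ratio measured against the average power of the clean signal; this interpretation and the function name `add_awgn` are assumptions, since the paper does not spell out the definition.

```python
import math
import random

def add_awgn(signal, snr_db, rng):
    """Add zero-mean white Gaussian noise so that the ratio of average
    signal power to noise power equals snr_db (in decibels)."""
    power = sum(v * v for v in signal) / len(signal)
    noise_power = power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]
```

At 20 dB the noise power is 1% of the signal power, so the noise standard deviation is one tenth of the root-mean-square signal amplitude.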

Conclusion
In this paper, an improved BSO with a differential evolution strategy and a new step size method, named BSODE, is proposed for solving complex optimization problems and training artificial neural networks. Experiments on 14 widely used benchmark functions of CEC 2005 and on two real-world applications of artificial neural networks are carried out to compare the proposed BSODE with five state-of-the-art algorithms. From these results, we can draw the conclusion that BSODE significantly improves the performance of the original BSO. Moreover, BSODE gives good performance on the training of artificial neural networks compared with the other algorithms. We expect that BSODE will be used to solve more real-world global optimization problems. Further work includes research into neighbor search to make the algorithm more efficient. In the near future, we also plan to apply the algorithm to large scale optimization domains.