A Hybrid Backtracking Search Optimization Algorithm with Differential Evolution

The backtracking search optimization algorithm (BSA) is a new nature-inspired method that possesses a memory, which lets it take advantage of experiences gained from previous generations to guide the population toward the global optimum. BSA is capable of solving multimodal problems, but it converges slowly and exploits solutions poorly. The differential evolution (DE) algorithm is a robust evolutionary algorithm and converges fast when it uses exploitative mutation strategies that utilize the information of the best solution found so far. In this paper, we propose a hybrid backtracking search optimization algorithm with differential evolution, called HBD. In HBD, DE with an exploitative strategy is used to accelerate convergence by optimizing one worse individual, selected according to its probability, at each iteration. A suite of 28 benchmark functions is employed to verify the performance of HBD, and the results show that hybridizing BSA and DE improves both effectiveness and efficiency.


Introduction
Optimization plays an important role in many fields, for example, decision science and physical systems, and can be abstracted mathematically as the minimization or maximization of objective functions subject to constraints on their variables. Generally speaking, optimization algorithms are employed to find the solutions. Stochastic relaxation optimization algorithms, such as the genetic algorithm (GA) [1], particle swarm optimization (PSO) [2, 3], the ant colony algorithm (ACO) [4], and differential evolution (DE) [5], are among the methods that solve such problems effectively, and almost all of them are nature-inspired optimization techniques. For instance, DE, one of the most powerful stochastic optimization methods, employs mutation, crossover, and selection operators at each generation to drive the population toward the global optimum. In DE, the mutation operator is one of the core components and includes many differential mutation strategies which reveal different characteristics. For example, the strategies that utilize the information of the best solution found so far converge fast and favor exploitation. These strategies are classified as exploitative strategies [6].
Inspired by the success of GA, PSO, ACO, and DE in solving optimization problems, new nature-inspired algorithms have become a hot topic in the development of stochastic relaxation optimization techniques, such as artificial bee colony [7], cuckoo search [8], the bat algorithm [9], the firefly algorithm [10], social emotional optimization [11-13], harmony search [14], and biogeography-based optimization [15]. A survey has pointed out that there are about 40 different nature-inspired algorithms [16].
The backtracking search optimization algorithm (BSA) [17] is a new stochastic method for solving real-valued numerical optimization problems. Similar to other evolutionary algorithms, BSA uses mutation, crossover, and selection operators to generate trial solutions. When generating trial solutions, BSA employs a memory to store experiences gained from previous generations. Taking advantage of this historical information to guide the population toward the global optimum, BSA focuses on exploration and is capable of solving multimodal optimization problems. However, utilizing experiences may make BSA converge slowly and impair exploitation in the later iteration stage.

Algorithm 1 (crossover of BSA), Step 1:
Input: …; output: …
Step 1. Initialize $map_{1:N,1:D} = 1$. Generate $a$, $b$ drawn from the uniform distribution on $[0, 1]$.
If $a > b$ then
    For $i$ from 1 to $N$ do
        Generate a vector containing a random permutation of the integers from 1 to $D$.
        Generate a number drawn from the uniform distribution on $[0, 1]$.
        …

On the other hand, researchers have paid more and more attention to combining different search optimization algorithms or machine learning methods to improve performance on real-world optimization problems. Some good surveys of hybrid metaheuristics and machine learning methods can be found in the literature [18-20]. In this paper, we also concentrate on a hybrid metaheuristic algorithm, called HBD, which combines BSA and DE. HBD employs DE with an exploitative mutation strategy to improve the convergence speed and to favor exploitation. Furthermore, in HBD, DE is invoked to optimize only one worse individual, selected according to its probability, at each iteration. We use 28 benchmark functions to verify the performance of HBD, and the results show that hybridizing BSA and DE improves both effectiveness and efficiency. The major advantages of our approach are as follows. (i) DE with exploitative strategies helps HBD converge fast and favor exploitation. (ii) Since DE optimizes one individual, HBD spends only one more function evaluation at each iteration and does not increase the overall complexity of BSA. (iii) DE is embedded behind BSA; therefore HBD does not destroy the structure of BSA and remains very simple.
The remainder of this paper is organized as follows. Section 2 describes BSA and DE. Section 3 presents the HBD algorithm. Section 4 reports the experimental results. Section 5 concludes this paper.

Preliminary
2.1. BSA. The backtracking search optimization algorithm is a stochastic search technique developed recently [17]. BSA has a single control parameter and a simple structure that is effective and capable of solving different optimization problems. Furthermore, BSA is a population-based method and possesses a memory in which it stores a population from a randomly chosen previous generation for generating the search-direction matrix. In addition, BSA is a nature-inspired method employing three basic genetic operators: mutation, crossover, and selection.
BSA employs a random mutation strategy that uses only one direction individual for each target individual, formulated as follows:
$$M = P + F \cdot (oldP - P) \qquad (1)$$
where $M$ is the mutant population, $P$ is the current population, $oldP$ is the historical population, and $F$ is a coefficient which controls the amplitude of the search-direction matrix $(oldP - P)$.
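As an illustration only, the mutation step can be sketched in Python under the assumption that it takes the usual BSA form: the current population plus a scaled difference to a randomly permuted historical population. The function name `bsa_mutation`, the list-of-lists layout, and the toy data are hypothetical, and the amplitude `f` is simply passed in by the caller because the way it is sampled is not specified here.

```python
import random


def bsa_mutation(pop, old_pop, f, rng):
    """Return the mutant population M = P + F * (oldP - P)."""
    # Shuffle a copy of the historical population so that each target
    # individual is paired with one randomly chosen direction individual.
    shuffled = list(old_pop)
    rng.shuffle(shuffled)
    return [
        [p + f * (o - p) for p, o in zip(target, direction)]
        for target, direction in zip(pop, shuffled)
    ]


# Hypothetical toy data: 3 individuals in 2 dimensions.
rng = random.Random(0)
pop = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
old_pop = [[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]]
mutants = bsa_mutation(pop, old_pop, 0.5, rng)
```

With `f = 0` the mutants coincide with the current population, and with `f = 1` they coincide with the permuted historical population, which makes the role of the amplitude easy to check.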
BSA also uses a nonuniform and more complex crossover strategy. There are two steps in the crossover process. Firstly, a binary integer-valued matrix ($map$) of size $N \times D$ ($N$ and $D$ are the population size and the problem dimension, respectively) is generated to indicate which elements of the mutant individuals are to be manipulated by using the relevant individuals. Secondly, the relevant dimensions of each mutant individual are updated by using the relevant individual. This crossover process is summarized in Algorithm 1.
BSA has two types of selection operators. The first type is employed to select the historical population used for calculating the search direction. The rule is that the historical population is replaced with the current population when one uniformly drawn random number is smaller than another one. The second type is a greedy selection that determines the better individuals to go into the next generation.
According to the above descriptions, the pseudocode of BSA is summarized in Algorithm 2.
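The two-step crossover can be sketched as follows. This is a hedged sketch rather than the authors' implementation: the number of cleared map entries, $\lceil mixrate \cdot c \cdot D \rceil$, and the `else` branch (one random dimension per individual) follow the original BSA description in [17] as we understand it and are assumptions here, as are the names `bsa_crossover` and `mix_rate`.

```python
import math
import random


def bsa_crossover(pop, mutant, mix_rate, rng):
    """Build the trial population T from P and the mutant M via a binary map."""
    n, d = len(pop), len(pop[0])
    # Step 1: start with map = 1 everywhere, then clear the entries
    # that will take their value from the mutant.
    cross_map = [[1] * d for _ in range(n)]
    a, b = rng.random(), rng.random()
    if a > b:
        for i in range(n):
            perm = rng.sample(range(d), d)  # random permutation of dimensions
            c = rng.random()
            # Assumed size of the cleared set: ceil(mix_rate * c * D).
            for j in perm[:math.ceil(mix_rate * c * d)]:
                cross_map[i][j] = 0
    else:
        # Assumed branch: clear exactly one random dimension per individual.
        for i in range(n):
            cross_map[i][rng.randrange(d)] = 0
    # Step 2: map = 1 keeps the current value, map = 0 takes the mutant value.
    return [
        [pop[i][j] if cross_map[i][j] == 1 else mutant[i][j] for j in range(d)]
        for i in range(n)
    ]


# Hypothetical toy data: 4 individuals in 5 dimensions.
rng = random.Random(1)
pop = [[0.0] * 5 for _ in range(4)]
mutant = [[1.0] * 5 for _ in range(4)]
trial = bsa_crossover(pop, mutant, 1.0, rng)
```

Because the map starts at 1, a trial individual keeps most of its current values unless the map is cleared, so `mix_rate` bounds how many dimensions can come from the mutant.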

DE.
DE is a powerful evolutionary algorithm for global optimization over continuous spaces. When used to solve optimization problems, it evolves a population of $N$ candidate solutions with $D$-dimensional parameter vectors, denoted as $X$. In DE, the population is initialized by uniform sampling within the prescribed minimum and maximum bounds.
After initialization, DE steps into the iteration process, where the evolutionary operators, namely, mutation, crossover, and selection, are invoked in turn.
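DE employs a mutation strategy to generate a mutant vector $V_i$. So far, there are several mutation strategies, and the most well-known and widely used ones are listed as follows [21, 22]:

"DE/best/1":
$$V_i = X_{best} + F \cdot (X_{r_1} - X_{r_2}) \qquad (2)$$
"DE/current-to-best/1":
$$V_i = X_i + F \cdot (X_{best} - X_i) + F \cdot (X_{r_1} - X_{r_2}) \qquad (3)$$
"DE/best/2":
$$V_i = X_{best} + F \cdot (X_{r_1} - X_{r_2}) + F \cdot (X_{r_3} - X_{r_4}) \qquad (4)$$
"DE/rand/1":
$$V_i = X_{r_1} + F \cdot (X_{r_2} - X_{r_3}) \qquad (5)$$
"DE/current-to-rand/1":
$$V_i = X_i + F \cdot (X_{r_1} - X_i) + F \cdot (X_{r_2} - X_{r_3}) \qquad (6)$$
"DE/rand/2":
$$V_i = X_{r_1} + F \cdot (X_{r_2} - X_{r_3}) + F \cdot (X_{r_4} - X_{r_5}) \qquad (7)$$

where the indices $r_1$, $r_2$, $r_3$, $r_4$, and $r_5$ are uniformly random mutually different integers from 1 to $N$, $F$ is the mutation factor, $X_{best}$ denotes the best individual obtained so far, and $X_i$ and $V_i$ are the $i$th vectors of $X$ and $V$, respectively.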
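The three strategies that use the best individual ("DE/best/1", "DE/current-to-best/1", and "DE/best/2") can be sketched in Python in their standard textbook forms. The function name `de_mutant` and the list-based representation are hypothetical; excluding the target index from the random indices is a common convention that is assumed here, and the sketch needs at least five individuals.

```python
import random


def de_mutant(pop, i, best, f, strategy, rng):
    """Mutant vector for target index i under one exploitative strategy."""
    # Four mutually different random indices, all different from i (assumed).
    r1, r2, r3, r4 = rng.sample([k for k in range(len(pop)) if k != i], 4)
    x, xb = pop[i], pop[best]
    dims = range(len(x))
    if strategy == "best/1":
        return [xb[j] + f * (pop[r1][j] - pop[r2][j]) for j in dims]
    if strategy == "current-to-best/1":
        return [x[j] + f * (xb[j] - x[j]) + f * (pop[r1][j] - pop[r2][j])
                for j in dims]
    if strategy == "best/2":
        return [xb[j] + f * (pop[r1][j] - pop[r2][j])
                + f * (pop[r3][j] - pop[r4][j]) for j in dims]
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical toy population: 6 individuals in 2 dimensions, best at index 0.
rng = random.Random(4)
pop = [[0.0, 0.0], [1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 0.5], [5.0, 5.0]]
v = de_mutant(pop, 3, 0, 0.8, "best/1", rng)
```

All three variants perturb around, or move toward, the best individual, which is why they tend to converge faster than the purely random strategies.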
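The crossover operator is performed to generate a trial vector $U_i$ from each pair of $X_i$ and $V_i$ after the mutant vector $V_i$ is generated. The most popular strategy is the binomial crossover, described as follows:
$$u_{i,j} = \begin{cases} v_{i,j}, & \text{if } rand_j(0,1) \le C_r \text{ or } j = j_{rand}, \\ x_{i,j}, & \text{otherwise}, \end{cases} \qquad (8)$$
where $C_r$ is called the crossover rate, $j_{rand}$ is randomly sampled from 1 to $D$, and $u_{i,j}$, $v_{i,j}$, and $x_{i,j}$ are the $j$th elements of $U_i$, $V_i$, and $X_i$, respectively. Finally, DE uses a greedy mechanism to select the better vector from each pair of $X_i$ and $U_i$. This can be described as follows:
$$X_i = \begin{cases} U_i, & \text{if } f(U_i) \le f(X_i), \\ X_i, & \text{otherwise}. \end{cases} \qquad (9)$$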
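A minimal Python sketch of the binomial crossover and the greedy selection, in their standard DE forms for a minimization problem. The function names and the sphere objective are hypothetical and serve only to make the example runnable.

```python
import random


def binomial_crossover(x, v, cr, rng):
    """Trial vector: take v[j] with probability cr, and always at j_rand."""
    j_rand = rng.randrange(len(x))  # guarantees one component from the mutant
    return [
        v[j] if (rng.random() <= cr or j == j_rand) else x[j]
        for j in range(len(x))
    ]


def greedy_select(x, u, fitness):
    """Keep the trial vector u only if it is not worse than the target x."""
    return u if fitness(u) <= fitness(x) else x


def sphere(z):
    """Hypothetical test objective: sum of squares."""
    return sum(t * t for t in z)


rng = random.Random(2)
target = [1.0, 1.0, 1.0, 1.0]
mutant = [0.5, 2.0, 0.0, 1.5]
trial = binomial_crossover(target, mutant, 0.9, rng)
survivor = greedy_select(target, trial, sphere)
```

The forced index `j_rand` ensures that the trial vector differs from the target in at least one component, so a function evaluation is never spent on an exact copy.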

HBD
In this section, we describe the HBD algorithm in detail. First, the motivations of this paper are given. Second, the framework of HBD is shown.

Motivations.
BSA uses an external archive to store experiences gained from previous generations and makes use of them to guide the population toward the global optimum. In BSA, randomly permuting the positions of the individuals in the historical population means that the individuals used in the mutation operator are chosen randomly; therefore, the algorithm focuses on exploration and is capable of solving multimodal optimization problems. However, precisely because of this random selection, utilizing experiences may lead BSA to converge slowly and may impair exploitation in the later iteration stage. This motivates our approach, which aims to accelerate the convergence speed and to enhance the exploitation of the search space so as to balance the exploration and exploitation capabilities of BSA.
On the other hand, some studies have investigated the exploration and exploitation abilities of different DE mutation strategies and pointed out that the mutation operators that incorporate the best individual (e.g., (2), (3), and (4)) favor exploitation because the mutant individuals are strongly attracted toward the best individual [6, 23]. This motivates us to hybridize these exploitative mutation strategies to enhance the exploitation capability of BSA. In addition, this paper is also inspired by studies which have shown that combining other optimization methods is an effective way to improve performance on real-world optimization problems [24-27].

Framework of HBD.
Generally speaking, there are many ways to hybridize BSA with DE. In this study, we propose another hybrid schema between BSA and DE. In this schema, HBD employs DE with an exploitative strategy behind BSA at each iteration to share information between BSA and DE. However, the more individuals are optimized by DE, the more function evaluations are spent. In that case, HBD would converge prematurely, which impairs exploration. Thus, to keep the exploration capability of HBD, DE is used to optimize only one worse individual, selected according to its probability. In addition, (2) is used as the default mutation strategy in HBD because (3) and (4) have stronger exploration capabilities, either by introducing more perturbation with random individuals [6] or by being a modification that combines "DE/best/1" and "DE/rand/1" [28]. The influence of different exploitative strategies on performance is discussed in Section 4.3.
In order to select one individual for DE, in this work, we assign a probability to each individual according to its fitness. It is formulated as (10), where $N$ is the population size and $R_i$ is the ranking value of each individual when the population is sorted from the worst fitness to the best one. Note that the probability equation is similar to the selection probability in DE with ranking-based mutation operators [29]. In general, the worse individuals are farther away from the best individual than the better ones; thus, they are given higher probabilities so that they can get around the best one. The selection strategy is defined as (11), where the selected individual is the one optimized by DE.
It is worth pointing out that our previous work [30], called BSADE, splits the whole iteration process into two parts: the first two-thirds and the last one-third. BSA is used in the first stage, and DE is employed in the second stage. In this case, DE does not share the population information with BSA. Moreover, it is difficult to decide how to split the whole iteration process into two parts. Thus, the difference between HBD and BSADE is that HBD shares the population information between BSA and DE, while BSADE does not. The comparison can be found in Section 4.4.
According to the above descriptions, the pseudocode of HBD is described in Algorithm 3.
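The DE step that HBD appends to each BSA iteration can be sketched as below. Several details are assumptions, since the probability and selection equations are not reproduced in this text: the sketch takes $p_i = R_i / N$ with the worst individual ranked $N$ and the best ranked 1, and draws the individual by roulette wheel on these probabilities. The use of "DE/best/1", binomial crossover, and the acceptance test with `<=` follow the description above. All function names are hypothetical.

```python
import random


def rank_probabilities(fit):
    """Assumed linear model p_i = R_i / N (worst has rank N, best has rank 1)."""
    n = len(fit)
    order = sorted(range(n), key=lambda i: fit[i])  # best first (minimization)
    probs = [0.0] * n
    for rank, i in enumerate(order, start=1):
        probs[i] = rank / n
    return probs


def hbd_de_step(pop, fit, objective, f, cr, rng):
    """Refine one rank-selected individual with DE/best/1, in place."""
    n, d = len(pop), len(pop[0])
    best = min(range(n), key=lambda i: fit[i])
    # Roulette-wheel draw on p_i; the exact selection rule is an assumption.
    s = rng.choices(range(n), weights=rank_probabilities(fit), k=1)[0]
    r1, r2 = rng.sample([k for k in range(n) if k != s], 2)
    j_rand = rng.randrange(d)
    trial = []
    for j in range(d):
        v = pop[best][j] + f * (pop[r1][j] - pop[r2][j])
        trial.append(v if (rng.random() <= cr or j == j_rand) else pop[s][j])
    trial_fit = objective(trial)  # the single extra evaluation per iteration
    if trial_fit <= fit[s]:
        pop[s], fit[s] = trial, trial_fit
    return s


def sphere(z):
    """Hypothetical test objective: sum of squares."""
    return sum(t * t for t in z)


rng = random.Random(3)
pop = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(6)]
fit = [sphere(p) for p in pop]
selected = hbd_de_step(pop, fit, sphere, 0.8, 0.9, rng)
```

Only one objective evaluation is spent per call, which matches the claim that the hybrid adds one function evaluation per iteration to BSA.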

Experimental Verifications
In this section, to verify the performance of HBD, we carry out comprehensive experimental tests on a suite of 28 benchmark functions proposed in the CEC-2013 competition [31]. These 28 benchmark functions include 5 unimodal functions $f_1$-$f_5$, 15 basic multimodal functions $f_6$-$f_{20}$, and 8 composition functions $f_{21}$-$f_{28}$. More details about the 28 functions can be found in [31].
To make a fair comparison, we use the same parameters for BSA and HBD, unless a change is mentioned. Each algorithm is run 25 times on each function with the dimensions $D = 10$, 30, and 50, respectively. The population size $N$ of each algorithm is $D$ when $D = 30$ and $D = 50$, while it is 30 in the case of $D = 10$. The maximum number of function evaluations is $10000 \times D$. The mutation factor $F$ and the crossover factor $C_r$ are 0.8 and 0.9 for HBD, respectively. In addition, we use the boundary handling method given in [17].
To evaluate the performance of the algorithms, we first use Error as an evaluation indicator. Error, which is the function error value of the solution $x$ obtained by an algorithm, is defined as $f(x) - f(x^*)$, where $x^*$ is the global optimum of the function. The average and standard deviation of the best error values, presented as "AVG_Er ± STD_Er," are used in the different tables. Second, convergence graphs are employed to show the mean error values of the best solutions during the iteration process over all runs. Third, a Wilcoxon signed-rank test at the 5% significance level ($\alpha = 0.05$) is used to show the significant differences between two algorithms. The "+" symbol shows that the null hypothesis is rejected at the 5% significance level and HBD outperforms BSA, the "−" symbol shows that the null hypothesis is rejected at the 5% significance level and BSA outperforms HBD, and the "=" symbol shows that the null hypothesis cannot be rejected at the 5% significance level and HBD ties with BSA. Additionally, we give the total number of statistically significant cases at the bottom of each table.
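As a small illustration of the Error indicator, the following Python sketch computes the average and standard deviation of the error over repeated runs. The function names and the sample values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator is used.

```python
import statistics


def error_summary(best_values, f_star):
    """Mean and sample standard deviation of the errors f(x) - f(x*)."""
    errors = [value - f_star for value in best_values]
    return statistics.mean(errors), statistics.stdev(errors)


def format_summary(avg, std):
    """Render the pair in the 'AVG_Er ± STD_Er' style used in the tables."""
    return f"{avg:.2e} ± {std:.2e}"


# Hypothetical best objective values from three runs, with f(x*) = -1400.
avg, std = error_summary([-1399.0, -1398.0, -1397.0], -1400.0)
print(format_summary(avg, std))
```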

The Effect of HBD.
To show the effect of the proposed algorithm, Table 1 lists the average error values obtained by BSA and HBD for the 30-dimensional benchmark functions. For the unimodal functions $f_1$-$f_5$, HBD overall obtains better average error values than BSA does. For instance, HBD reaches the global optimum on $f_5$ and brings high-quality solutions to $f_2$-$f_4$ in terms of average error values. HBD is slightly inferior to BSA on $f_1$, but the difference between the two approaches is not significant. For the 15 basic multimodal functions $f_6$-$f_{20}$, in terms of average error values, HBD brings superior solutions to 10 out of 15 functions, equal ones to 2 out of 15 functions, and inferior ones to 3 out of 15 functions. However, according to the results of the Wilcoxon test, the differences between HBD and BSA are not significant for the 3 functions on which HBD gains lower solution quality.
In order to further show the convergence speed of HBD, the convergence curves of the two algorithms for six selected benchmark functions are given in Figure 1.
It is observed that the selected functions can be divided into four groups and that, overall, the convergence performance of HBD is better than that of BSA. For the first group of functions, for example, $f_6$ and $f_{27}$, on which HBD has significantly better average error values than BSA, HBD converges faster than BSA according to the convergence curves in Figures 1(c) and 1(f). For $f_{20}$ and $f_{23}$, which belong to the second group, where HBD does not bring significantly higher-quality solutions, HBD still converges faster than BSA does. Third, for $f_5$, on which both algorithms reach the global optimum, the convergence performance of HBD is better than that of BSA. Additionally, HBD outperforms BSA according to the convergence curves in Figure 1(a), although the average error values obtained by HBD are inferior, but not significantly so, to those of BSA.
All in all, HBD overall outperforms BSA in terms of solution quality and convergence speed. This is because DE with an exploitative mutation strategy enhances the exploitation capability of HBD without spending too many function evaluations.

Scalability of HBD.
In this section, to analyze how the performance of HBD is affected by the problem dimensionality, a scalability study is carried out on the 28 functions with 10 and 50 dimensions, since they are defined for up to 50 dimensions [31]. The results are tabulated in Table 2.
In the case of $D = 10$, according to the average error values shown in Table 2, HBD is superior on the majority of functions and inferior on a handful of them. Additionally, in terms of the total of "+/=/−," HBD wins against and ties with BSA on 9 and 19 out of 28 functions, respectively.
When $D = 50$, HBD still brings solutions with higher quality than BSA does on most of the benchmark functions. Moreover, HBD outperforms and ties with BSA on 13 and 15 out of 28 functions, respectively.
In summary, these results suggest that the advantage of HBD over BSA is stable when the dimensionality of the problems increases.

The Effect of Mutation Strategy.
In HBD, the "DE/best/1" mutation strategy is used by default to enhance the exploitation capability. To show how the performance of HBD is influenced by other exploitative mutation strategies, experiments are carried out on the benchmark functions, and the results are listed in Table 3, where cHBD and bHBD mean that HBD uses "DE/current-to-best/1" and "DE/best/2," respectively. The results obtained by cHBD and bHBD that are more accurate than those obtained by HBD are marked in bold.
From Table 3, in terms of the average error values, bHBD shows higher accuracy than HBD on a few functions, since "DE/best/2" usually exhibits better exploration than "DE/best/1" because of the additional difference of randomly selected individuals in the former [23]. cHBD also gains higher solution accuracy than HBD on a handful of functions because "DE/current-to-best/1," a modification combining "DE/best/1" and "DE/rand/1" [28], shows better exploration than "DE/best/1." In other words, for a few functions, "DE/best/2" and "DE/current-to-best/1" balance the exploration and exploitation capabilities of HBD better. For example, bHBD and cHBD bring solutions with higher quality to $f_1$, $f_3$, $f_8$, $f_{10}$, $f_{11}$, and $f_{16}$; in particular, they reach the global optimum. However, for most of the functions, HBD with "DE/best/1" performs better than cHBD and bHBD.
Additionally, Table 4 reports the results of the multiple-problem Wilcoxon test, carried out similarly to [29, 32], between HBD and its variants for all functions. We can see from Table 4 that HBD is significantly better than bHBD and that, compared with cHBD, HBD gets a higher $R^+$ value than $R^-$ value, although the difference is not significant. Therefore, HBD adopts "DE/best/1" as the tradeoff.
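For readers unfamiliar with the $R^+$ and $R^-$ values of the multiple-problem Wilcoxon test, the Python sketch below computes them in the standard signed-rank way from per-function average errors. It does not compute the p-value. Splitting the ranks of zero differences evenly between the two sums is a common convention that is assumed here, and the function name and sample data are hypothetical.

```python
def wilcoxon_rank_sums(errors_a, errors_b):
    """Return (R+, R-): rank sums where A has lower / higher error than B."""
    diffs = [b - a for a, b in zip(errors_a, errors_b)]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    k = 0
    while k < len(order):
        # Tied absolute differences share the average of their rank positions.
        m = k
        while (m + 1 < len(order)
               and abs(diffs[order[m + 1]]) == abs(diffs[order[k]])):
            m += 1
        for t in range(k, m + 1):
            ranks[order[t]] = (k + m) / 2 + 1
        k = m + 1
    r_plus = sum(r for r, dv in zip(ranks, diffs) if dv > 0)
    r_minus = sum(r for r, dv in zip(ranks, diffs) if dv < 0)
    # Assumed convention: zero differences are split evenly between both sums.
    zero = sum(r for r, dv in zip(ranks, diffs) if dv == 0)
    return r_plus + zero / 2, r_minus + zero / 2


# Hypothetical average errors of two algorithms on four functions.
r_plus, r_minus = wilcoxon_rank_sums([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 2.5, 4.0])
```

A larger $R^+$ than $R^-$ indicates that the first algorithm tends to obtain lower errors, and the two sums always add up to $n(n+1)/2$ for $n$ functions.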

The Effect of Hybrid Schema.
In this section, we analyze how the performance of HBD is affected by the hybrid schema. Firstly, to show the effect of optimizing more than one individual with DE, an algorithm called aHBD, which uses DE to optimize the whole population, is compared with HBD. Secondly, we add a probability to aHBD to control the use of DE and propose paHBD. In paHBD, if a random number drawn from the uniform distribution between 0 and 1 is less than this probability, then DE is invoked. The probability is defined in (12) in terms of the number of function evaluations that have been spent and the maximum number of function evaluations. Additionally, BSADE is compared with HBD to show their differences. Table 5 lists the error values obtained by aHBD, paHBD, BSADE, and HBD for the 28 functions at $D = 30$.
It can be observed that HBD wins against, ties with, and loses to aHBD on 10, 12, and 6 out of 28 functions in terms of "+/=/−," respectively. This indicates that optimizing more individuals with DE costs more function evaluations when DE is embedded directly behind BSA, which reduces the number of iteration cycles and leads to poor performance on most functions. Regarding BSADE, since BSA and DE are invoked in different stages where they cannot exchange population information, this schema cannot balance exploitation and exploration well. Thus, compared with BSADE, HBD brings solutions with higher accuracy on most functions. Moreover, HBD wins against, ties with, and loses to BSADE on 8, 17, and 3 out of 28 functions in terms of "+/=/−," respectively. In contrast, paHBD uses the probability to control the use of DE, which decreases the cost of function evaluations at the early evolution stage. Thus, paHBD is almost similar to HBD according to "+/=/−." In addition, we also perform the multiple-problem Wilcoxon test for HBD, aHBD, paHBD, and BSADE on the 28 functions and list the results in Table 6.
It can be found from Table 6 that the differences between HBD and aHBD, paHBD, and BSADE are not significant. HBD gets higher $R^+$ values than $R^-$ values compared with aHBD and BSADE, respectively. However, HBD obtains a slightly lower $R^-$ value than $R^+$ value in comparison with paHBD. This is because HBD brings slightly less accurate solutions on $f_2$, $f_3$, $f_4$, and $f_{15}$, resulting in a higher ranking. Nevertheless, the results indicate that the hybrid schema used in HBD is a reasonable choice.

The Effect of Probability Model.
In HBD, the linear model in (10) is used to select the individual to optimize. It is worth pointing out that other models, for example, nonlinear ones, can also be adopted in our algorithm. In this section, we do not seek the optimal probability model but only analyze how the performance is influenced by different models. Thus, two models, similar to those used in [29, 33], are employed. They are the quadratic model and the sinusoidal model, formulated in (13) and (14), respectively. The average error values and the results of the multiple-problem Wilcoxon test are reported in Tables 7 and 8, respectively, where qHBD is HBD with the quadratic model and sHBD is HBD with the sinusoidal one.
From Table 7, we can find that qHBD brings higher-quality solutions to 11 out of 28 functions compared with HBD, although the differences between them are not significant in terms of "+/=/−." In addition, qHBD gets lower $R^-$ values than the $R^+$ values HBD gained, though the differences are not significant at the 5% and 10% significance levels. This indicates that the linear model is a reasonable choice compared with the quadratic model. However, it is not the optimal one compared with … [17], namely, PSO2011 [34], CMAES [35, 36], ABC [7], JDE [37], CLPSO [38], and SADE [39]. Moreover, to make the comparison fair and convenient, we use the 25 functions and the parameters which are employed and suggested in [17]. More details about these 25 functions can be found in the CEC-2005 competition [40]. Table 9 lists the minimal fitness and average fitness of the 7 approaches, where the results of the 6 non-BSA algorithms are adopted from [17] directly. In addition, the results of the multiple-problem Wilcoxon test and the Friedman test, carried out similarly to [29], for the seven algorithms are listed in Tables 10 and 11, respectively.
From Table 9, we find that each algorithm does well on some functions according to its average error value. For instance, PSO2011, CMAES, ABC, JDE, CLPSO, SADE, and HBD perform better on 8, 5, 9, 3, 3, 3, and 7 out of 25 functions, respectively. However, Table 10 shows that HBD gets higher $R^+$ values than $R^-$ values in all cases. This suggests that HBD is better than the other 6 algorithms. Moreover, the Wilcoxon test at $\alpha = 0.05$ and $\alpha = 0.01$ shows significant differences in three cases for the CEC-2005 functions. Furthermore, with respect to the average rankings of the different algorithms by the Friedman test, it can be seen clearly from Table 11 that HBD offers the best overall performance, while SADE is the second best, followed by ABC, PSO2011, CLPSO, JDE, and CMAES.
Table 12 lists the average error values, which are derived from [46], and the average rankings of the six algorithms by the Friedman test for the CEC-2013 functions at $D = 30$ are given in Table 13. Since NBIPOP-aCMA is one of the top three performing algorithms for the CEC-2013 functions [47], as seen from Table 12, it shows promising performance on almost all of the functions. The other algorithms bring solutions with higher accuracy on a handful of functions. For example, fk-PSO, SPSO2011, SPSOABC, PVADE, and HBD yield the better performance on 3, 2, 6, 4, and 5 out of 28 functions in terms of the average error values. However, according to the average rankings of the different algorithms by the Friedman test in Table 13, NBIPOP-aCMA is the best, and HBD offers the second best overall performance, followed by SPSOABC, fk-PSO, PVADE, and SPSO2011.
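The average rankings used in the Friedman test can be computed as in the following Python sketch: on each function the algorithms are ranked by their average error (rank 1 is the best, ties share the average rank), and the ranks are then averaged over the functions. The function name and the error values are hypothetical and do not reproduce any table of this paper.

```python
def friedman_average_ranks(errors):
    """errors[name] holds per-function errors; lower error gets a lower rank."""
    names = list(errors)
    n_funcs = len(errors[names[0]])
    totals = {name: 0.0 for name in names}
    for k in range(n_funcs):
        ordered = sorted(names, key=lambda nm: errors[nm][k])
        pos = 0
        while pos < len(ordered):
            # Algorithms with equal error on function k share the average rank.
            end = pos
            while (end + 1 < len(ordered)
                   and errors[ordered[end + 1]][k] == errors[ordered[pos]][k]):
                end += 1
            for t in range(pos, end + 1):
                totals[ordered[t]] += (pos + end) / 2 + 1
            pos = end + 1
    return {name: totals[name] / n_funcs for name in names}


# Hypothetical errors of three algorithms on three functions.
sample = {"A": [1.0, 2.0, 0.0], "B": [2.0, 1.0, 0.0], "C": [3.0, 3.0, 5.0]}
avg_ranks = friedman_average_ranks(sample)
```

The algorithm with the smallest average rank is the one reported as offering the best overall performance.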

Conclusion
In this paper, we presented a hybrid BSA, called HBD, which combines BSA and DE with an exploitative mutation strategy. At each iteration, DE is embedded behind BSA to optimize one individual, selected according to its probability, in order to enhance the convergence of BSA and to bring solutions with higher quality.
Comprehensive experiments have been carried out on 28 benchmark functions proposed in the CEC-2013 competition. The experimental results reveal that the hybridization of BSA and DE is effective and efficient on most of the functions, contributing to solutions with higher accuracy, faster convergence speed, and more stable scalability. HBD was also compared with other evolutionary algorithms and has shown promising performance.
There are several interesting directions for future work. Experimentally, the linear probability model used to select the individual to optimize is a reasonable but not optimal one; thus, firstly, comprehensive tests will be performed on various probability models in HBD. Secondly, although the experimental results have shown that HBD has stable scalability, we plan to investigate HBD on large-scale optimization problems. Last but not least, we plan to apply HBD to some real-world optimization problems for further examination.

Figure 1: The convergence curves of BSA and HBD for selected benchmark functions.

Algorithm 1 (crossover of BSA), Step 2:
For $i$ from 1 to $N$ do
    For $j$ from 1 to $D$ do
        If $map_{i,j} = 1$ then $T_{i,j} = P_{i,j}$ (keep the current value)
        Else $T_{i,j} = M_{i,j}$ (take the mutant value)

Algorithm 2: Pseudocode of BSA.
Initiate the population $P$ and the historical population $oldP$ randomly sampled from the search space.
While (the stop condition is not met)
    Perform the first type selection: $oldP = P$ in the case of $a < b$, where $a$ and $b$ are drawn from the uniform distribution on $[0, 1]$.
    Randomly permute the positions of the individuals in $oldP$.
    Generate the mutant according to (1).
    Generate the population $T$ based on Algorithm 1.
    Perform the second type selection: select the population with better fitness from $T$ and $P$.

Algorithm 3: Pseudocode of HBD.
Initiate the population $P$ and the historical population $oldP$ randomly sampled from the search space.
While (the stop condition is not met)
    Perform the first type selection: $oldP = P$ in the case of $a < b$, where $a$ and $b$ are drawn from the uniform distribution on $[0, 1]$.
    Randomly permute the positions of the individuals in $oldP$.
    Generate the mutant according to (1).
    Generate the population $T$ based on Algorithm 1.
    Perform the second type selection: select the population with better fitness from $T$ and $P$.
    Update the best solution.
    // Invoke DE with exploitative strategy
    Select one individual according to its probability.
    Optimize the selected individual with the help of DE, and get the DE-optimized individual.
    If (fitness(DE-optimized individual) <= fitness(selected individual)) …

Table 1: Error values obtained by BSA and HBD for 30-dimensional CEC-2013 benchmark functions.

For the composition functions $f_{21}$-$f_{28}$, HBD and BSA draw a tie on $f_{26}$ and $f_{28}$ in terms of average error values; however, HBD significantly outperforms BSA according to the results of the Wilcoxon test. Moreover, according to average error values, HBD performs better than BSA on $f_{23}$, $f_{24}$, $f_{25}$, and $f_{27}$ but worse than BSA on $f_{21}$ and $f_{22}$. Nevertheless, the differences between the two algorithms are mostly not significant for these 8 composition functions in terms of the results of the Wilcoxon test. In summary, according to "+/=/−," HBD wins against and ties with BSA on 12 and 16 out of 28 benchmark functions, respectively.

Table 2: Error values obtained by BSA and HBD for 10- and 50-dimensional CEC-2013 benchmark functions.

Table 10: Results of the multiple-problem Wilcoxon test for seven algorithms for CEC-2005 functions at $D = 10$.

Table 11: Average ranking of seven algorithms by the Friedman test for CEC-2005 functions at $D = 10$.

Table 12: Error values obtained by HBD and 5 compared algorithms for CEC-2013 benchmark functions at $D = 30$.

Table 13: Average ranking of six algorithms by the Friedman test for CEC-2013 functions at $D = 30$.