Multiple-Try Simulated Annealing Algorithm for Global Optimization

Simulated annealing is a widely used algorithm for the computation of global optimization problems in computational chemistry and industrial engineering. However, global optimum values cannot always be reached by simulated annealing without a logarithmic cooling schedule. In this study, we propose a new stochastic optimization algorithm, i.e., simulated annealing based on the multiple-try Metropolis method, which combines simulated annealing and the multiple-try Metropolis algorithm. The proposed algorithm functions with a rapidly decreasing schedule, while guaranteeing global optimum values. Simulated and real data experiments including a mixture normal model and nonlinear Bayesian model indicate that the proposed algorithm can significantly outperform other approximated algorithms, including simulated annealing and the quasi-Newton method.


Introduction
Since the 21 st century, the modern computers have greatly expanded the scientific horizon by facilitating the studies on complicated systems, such as computer engineering, stochastic process, and modern bioinformatics.A large volume of high dimensional data can easily be obtained, but their efficient computation and analysis present a significant challenge.
With the development of modern computers, Markov chain Monte Carlo (MCMC) methods have enjoyed a enormous upsurge in interest over the last few years [1,2].During the past two decades, various advanced MCMC methods have been developed to successfully compute different types of problems (e.g., Bayesian analysis, high dimensional integral, and combinational optimization).As an extension of MCMC methods, the simulated annealing (SA) algorithm [1][2][3] has become increasingly popular since it was first introduced by Kirkpatrick et al. (1983).As Monte Carlo methods are not sensitive to the dimension of data sets, the SA algorithm plays an important role in molecular physics, computational chemistry, and computer science.It has also been successfully applied to many complex optimization problems.
Several improved optimization methods have been proposed recently [4,5] and successfully applied to polynomial and vector optimization problems, min-max models, and so on [6][7][8][9][10].Although Sun and Wang (2013) discussed the error bound for generalized linear complementarity problems, all of these improved methods were designed for special optimization problems and not for global optimization problems.The SA algorithm is a global optimization algorithm that can obtain global optimization results with slowly decreasing temperature schedule.However, Holley et al. (1989) pointed out that only with the use of a "logarithmic" cooling schedule could the SA algorithm converge to the global minimum with probability one [11,12].Liang et al. (2014) improved the SA algorithm by introducing the simulated stochastic approximation annealing (SAA) algorithm [13,14].Such algorithm can work with a square-root cooling schedule in which the temperature can decrease much faster than that in a "logarithmic" cooling schedule.Karagiannis et al. (2017) extended the SAA algorithm by using population Monte Carlo ideas and introduced the parallel and interacting stochastic approximation annealing (PISAA) algorithm [15].
In the present study, we propose a variation of the SA algorithm, i.e., the multiple-try Metropolis based simulated annealing (MTMSA) algorithm, for global optimization.The MTMSA algorithm is a combination of the SA algorithm and multiple-try Metropolis (MTM) algorithm [16].The MTM algorithm, which allows several proposals from different proposal distributions in the multiple-try step simultaneously, achieves a higher rate of convergence than the standard Metropolis algorithm (e.g., random walk Metropolis algorithm) [1,2,17].Thus, the MTMSA algorithm can guarantee that global minima are reached with a rapidly decreasing cooling schedule.Comparing with PISAA, which should run on multicore computer to their advantage, the MTMSA often owns high convergent rate by use of the efficient vector operation with MATLAB or R on personal and super computer.Simulation and real data examples show that, under the framework of the multiple-try algorithm, the MTMSA algorithm can reach global minima under a rapidly decreasing cooling schedule relative to that of the SA algorithm.
The remainder of this paper is organized as follows.Section 2 describes the framework of the MTMSA algorithm.Section 3 illustrates the comparison of the MTMSA algorithm with other optimization methods in a mixture normal model.Section 4 presents the application of the MTMSA algorithm to Bayesian analysis and half-space depth computation through real data sets.Finally, Section 5 summarizes the conclusions derived from the study.

MTMSA Algorithm
2.1.Overview of SA Algorithm.The SA algorithm originates from the annealing process, which is a thermodynamic process used to attain a low energy state in condensed matter physics [1].The process comprises two steps.The first state is the high temperature state, in which solids transform into liquid and particles move freely to ward ideal.Then the temperature drops to zero slowly and the movement of the particles become restricted such that the desired structure is achieved.Realizing that the Metropolis algorithm [18] can be used to simulate the movements of particles, Kirkpatrick et al. (1983) proposed a computer simulation based physical annealing process, i.e., the SA algorithm.
Suppose our goal is to find the minimum value of ℎ(x), x ∈ .This goal is equivalent to the search for the maximum value of exp{−ℎ(x)/}, x ∈ , with any positive temperature .Let  1 >  2 > ⋅ ⋅ ⋅ >   > ⋅ ⋅ ⋅ be a decreasing temperature sequence, with large  1 and lim →+∞   = 0.In every temperature   , we use the Metropolis-Hastings (MH) algorithm (or Gibbs sampling algorithm [19]) to update the Markov chain   times, with   (x) ∝ exp{−ℎ(x)/  } as its stationary distribution.When  is increasing, an increasing number of samples concentrate in the maximum value nearby.The SA algorithm can be summarized as follows: (1) Initialize x (0) with starting temperature  1 .
(2) At current temperature   , update the Markov chain   times, with   (x) as its stationary distribution, and transmit the last state x to the next temperature.
The SA algorithm can reach the global optimum when the temperature sequence decreases slowly (e.g., the inverse logarithmic rate, i.e., the order of (log(  ) −1 ), where   =  1 + ⋅ ⋅ ⋅ +   [1,2,11,12]).However, no one can afford to use such a slow cooling schedule.Various improved SA algorithms have thus been designed and to overcome the excessively slow cooling schedule and to successfully resolve various optimization problems in industries and commerce [14,[20][21][22][23]. Realizing that the MTM algorithm can overcome the "local-trap" problem and enjoy a high convergence rate, we propose the MTMSA algorithm, which is a combination of the MTM algorithm and the SA algorithm.

MTMSA Algorithm.
The Metropolis algorithm is the first iterative sampling algorithm of the MCMC algorithm.Hastings extended this algorithm by allowing the proposal distribution to be an asymmetric distribution, i.e., the MH algorithm [24].
The MH algorithm is an iterative MCMC sampling algorithm, whose iterative points { 0 ,  1 , . . .,   , . ..} show a limiting distribution ().The challenge of using the MH algorithm is that it tends to suffer from the "local-trap" problem when the target distribution function () is a multimodal distribution [2,19].It eventually impedes the convergence of the SA algorithm.The MTM algorithm can overlap the "local-trap" problem.The MTM algorithm allows several proposal points simultaneously and selects the best one as the next sampling point while keeping the stationary distribution unchanged.Thus, combining the MTM algorithm and SA algorithm yields the MTMSA algorithm.
Suppose the global optimization problem is The whole procedure of the MTMSA algorithm for problem (1) is summarized below.
Then update the current state of the Markov chain with probability .Set x = x * ; otherwise, reject it, and keep x unchanged.
(3) If   <   , output the last solution x and the minimum value of (1) of the whole procedure; otherwise, update  to  + 1, and proceed to step (2).
Furthermore, Algorithm 1 gives the pseudocode of MTMSA algorithm for the computation of problem (1).
The convergence of the MTMSA algorithm can be obtained from the stationary distribution of the MTM algorithm (i.e., the detailed balance condition of the MTM algorithm [1]).Theoretically, when   approaches zero and the step number of the MTM algorithm is sufficiently large, all samples drawn from   would be in the vicinity of the global minimum of ℎ(x) in .
The next proposition gives the computation complex of the MTMSA algorithm.

Proposition 1. The computation complex of the MTMSA algorithm is
where  is the length of decreasing cooling temperature,  is the frequency of the Markov chain update, and  is the number of multiple-try points.
Proof.The proof of the proposition directly follows the procedure of the MTMSA algorithm described above.The decreasing cooling temperature and the length of the Markov chain are the external loops of the MTMSA algorithm.By combining the external loops with the internal loop of the multiple-try model, we then complete the proof of the proposition.
The proposition indicates that the computation complex of the MTMSA algorithm is a polynomial in  and .Given the computation complex of a stationary distribution   (x), the computation complex of the MTMSA algorithm is not greater than the polynomial in  (where  is the dimension of x).
The MTMSA algorithm has many advantages over other approximation algorithms.Compared with traditional optimization algorithms (such as the Nelder-Mead (NM) method [25] and the quasi-Newton (QN) method [26]), which are local optimization methods, the MTMSA algorithm gets more accurate results often, as shown in our simulated multimodal experiment.In practice, the speed of the MTMSA algorithm is generally high, particularly for an efficient vector operation (or parallel computing) with MATLAB or R in the evaluation of multiple-try points.The MTMSA algorithm clearly outperforms the SA algorithm in our experiment.Furthermore, by setting the number of multiple-try points  = 1, we can obtain a special case of the MTMSA algorithm, that is, the SA algorithm.Simulated and real data examples in subsequent sections show the advantage of the MTMSA over other approximated algorithms.

Simulated Experiment Results
This section presents a simulation example (i.e., mixture normal model).The main purpose of this example is to demonstrate that the MTMSA algorithm could compute the optimization problem in the case of multiple local maxima and outperform its counterparts in terms of accuracy and efficiency.All results are obtained using R language (version X64 2.3.4) and MATLAB (version R2017a) on a Dell Opti-Plex7020MT desktop computer with Intel(R) Core(TM) i7-4790 CPU @ 3.6 GHz, RAM 8.00 GB, and Windows 7 Ultimate with Service Pack 1 (x64).The R and MATLAB codes in this work are available upon request to the corresponding author.
In this simulation example, we consider a two-dimensional multimodal mixture normal model modified from [27,28].In this model, the objective function is the probability density function, which is the combination of 20 normal models where Figure 1 illustrates the mesh and contour plots of ( 5), which contains 20 modes in this objective function.This example poses a serious challenge for optimization because many classical optimization methods may converge on the local optimum in this multimodal example.Clearly, the global maximum point is the last mode (0.183, 7.501) with the maximum value of 2.449.
Four methods are used to find the global maximum point of this optimization problem: the NM method, modified QN method, SA algorithm, and MTMSA algorithm.The NM and QN methods are commonly applied numerical optimization algorithms, and the QN method allows box constraints.The SA and its improved version, i.e., the MTMSA algorithm, are stochastic optimization algorithms.Apart from the minimum temperature   , another commonly used parameter that controls the degree of decreasing temperature is   : For the SA algorithm, we set the degree of decreasing temperature   = 75, the starting temperature   = 10, the decreasing parameter  = 0.9, the length of the Markov chain   = 100, and the proposal variance of the Metropolis algorithm V  = 2.For the MTMSA algorithm, we set the number of multiple-tries   = 100.The other parameters are   = 25,  = 0.8,   = 1,   = 100, and V  = 2.With different   (75 and 25) and  (0.9 and 0.8, respectively) values, the SA and MTMSA algorithms have similar   .We tested these four algorithms (NM, QN, SA, and MTMSA algorithms) to compute the optimization problem and repeated this computation 50 times.The computation results are summarized in Figure 2 and Table 1.
The mean value, standard deviation (sd), mean square error (MSE), total CPU time (in seconds), and average CPU time for one accurate result (in seconds) of the results from different algorithms are summarized in Table 1, where , and   = 2.449 is the exact maximum value in the model (5). Figure 2 illustrates the boxplot of 50 computation results from these four algorithms.
Suffering from the "local-trap" problem, NM and QN algorithms cannot find the global maximum successfully in 50 computations (they often find other local modes in ( 5)).Compared with the MTMSA algorithm, the SA algorithm uses a slowly decreasing temperature schedule (  = 75) and consumes more CPU time.However, only 10 results of 50 repetitions from the SA algorithm converge to the We compared the differences in the decreasing temperature schedules used in SA and MTMSA algorithms."The slower the better" was found to be applicable to the decreasing temperature schedules.A rapidly decreasing temperature schedule may result in the "local-trap" problem.In the next simulation, we set the temperature schedules to decrease from 10 to 5 × 10 −3 , and the length of decreasing temperature was set to 500, 75 and 25 for SA and MTMSA algorithms.Each computation was repeated 50 times.
The length of decreasing temperature of the SA algorithm is set to 75, 500, and denoted as the SA1 and SA2 algorithm, respectively.The SA2 algorithm shows the slowest decreasing schedule.It uses 500 steps to drop from the highest temperature to the lowest one.Thus almost all 50 results converge to the global maximum (about 94% percent of computation results escape from local optima and reaches the global maximum).The SA1 algorithm uses a rapidly decreasing schedule, and only about half of the 50 results converge to the global maximum (about 54% percent of computation results escape from local optima and reaches the global maximum).By contrast, the MTMSA algorithm only uses 25 steps in decreasing temperature, but all of the 50 results converge to the global maximum.Figure 3 shows the decreasing schedules and convergence paths of the three algorithms.We find that when the temperature decreases to about 0.02 (corresponding to the 50th, 400th, and 20th steps in SA1, SA2, and MTMSA), all sample paths from the three algorithms converge to their local and global optima.All the sample paths of MTMSA converge to the global optima, and lots of sample paths of SA1 and SA2 converge to the local optima because the average sample path of MTMSA in Figure 3 is the highest and at the level about 2.43.The MTMSA algorithm uses the rapidly decreasing schedule and achieves the fastest convergence rate.Therefore, the MTMSA algorithm is the most efficient and accurate in this simulation example.

Real Data Examples
4.1.Bayesian Analysis Using MTMSA.In this section, we illustrate the application of the MTMSA algorithm in Bayesian analysis with real data from [29].In this example, we fit a nonlinear model derived from exponential decay (7) with a fixed rate that is constant to a real data set [30] (Table 2).
The variables BOD (mg/I)) and time (days) in Table 2 are the response and control variables in model (7) (denoted as the BOD problem) with a constant variance  2 for independent normal errors.The likelihood for the BOD problem is where  = ( 1 , . . .,  6 ) and  = ( 1 , . . .,  6 ).
Step   While choosing the flat prior for the parameters  2 and ( 1 ,  2 ) (i.e., the uniform distribution in (0, +∞) and [−20, 50] × [−2, 6], respectively) and integrating out  2 , we obtain the following (improper) posterior distribution of ( 1 ,  2 ): where For a Bayesian analysis, one often treats the parameters ( 1 ,  2 ) as random variables.In this work, we use the posterior distribution of ( 1 ,  2 ) for their statistical inference and use the posterior mode of (9) as the estimation of ( 1 ,  2 ), which coincides with the maximum likelihood estimation.The Bayesian statistical inference of the parameters In addition, we use the MTMSA algorithm to compute the global optimization problem (11).The parameters of the MTMSA algorithm are set to be   = 20,   = 25,  = 0.6,   = 1, and   = 1000.The computation is then repeated 20 times.Figure 4 and Table 3 illustrate the decreasing temperature schedule and the convergence paths of 20 repetitions from the MTMSA algorithm.After 20 steps, all 20 computation paths become convergent to 1.48 × 10 −3 , which has the largest mean and smallest sd.
Figure 5 shows the mesh (a) and contour (b) plots of the posterior distribution (9). Figure 6 and Table 4 show the locations of the scatters ( 1 ,  2 ) from 20 repetitions at different temperature steps.With the temperature decreasing from 0.6 to 2.8×10 −6 , all scatters converge to the optimization point (19.15,0.53).

Half-Space Depth Computation
Using MTMSA.As a powerful tool for nonparametric multivariate analysis, halfspace depth (HD also known as Tukey depth) has been eliciting increased interest since it was introduced by Tukey [31,32].HD, which extends univariate order-related statistics to multivariate settings, provides a center-outward ordering    of multivariate samples and visualizes data in high dimensional cases [33,34].However, the computation of HD is challenging, and the exact algorithm is often inefficient, especially when the dimension is high [35].In this subsection, we use MTMSA to compute HD and compared MTMSA with other approximated and exact algorithms.
Given a sample data set of size  X  = {X 1 , X 2 , . . ., X  } in R  , x is a point in R  , and the HD of x with respect to (w.r.t.) X  is defined by where S −1 = { ∈ R  | ‖‖ = 1}, N = {1, 2, . . ., }, and #{⋅} denotes the counting measure.Then, the computation of HD ( 12) is a global optimization problem in S −1 .Next, we considered a concrete data set (Table 6) obtained from [35] and can be found in the Records Office of the Laboratory School of the University of Chicago.The original data consisted of 64 subjects' scores obtained from eighthgrade levels to eleventh-grade levels.Then, we compared MTMSA with three approximated algorithms (NM, QN, and SA) and the exact algorithm from [35] for the HD computation of the first data point w.r.t. the data set.
We tested two sets of parameters for the SA algorithm.The first is   = 20,   = 50,   = 1, and  = 0.7 and denoted as the SA1 algorithm.The second one is  r = 20,   = 200,   = 1, and  = 0.7 and denoted as the SA2 algorithm.For the MTMSA algorithm, we set the parameter to be   = 20,  = 100,   = 30,   = 1, and  = 0.7.The three algorithms (SA1, SA2, and MTMSA) use the same decreasing temperature schedule.Then, we used the six algorithms (exact, NM, QN, SA1, SA2, and MTMSA) for this computation and repeated the computation 50 times.Figure 7 and Table 5 show the computation results.Figure 7 and Table 5 show that the exact algorithm consumed the most CPU time (about 2450 seconds) and obtained exact computation results (0.2344).MTMSA also obtained the exact results but consumed only 15.87 seconds.The SA algorithms (SA1 and SA2) consumed suitable CPU time (5.741 and 23.103 seconds, respectively) but obtained only 6 and 24 exact results, respectively.The results of NM and QN fell into the local optima because all of them were larger than the exact result.With regard to the average CPU time, MTMSA used only 0.3174 for the computation of one exact result, which is the least amount of time compared with the time for the other exact and approximated algorithms.Hence, MTMSA outperformed the other algorithms in this experiment example.

Conclusions
We developed the MTMSA algorithm for global optimization problems in the fields of mathematical/biological sciences, engineering, Bayesian data analysis, operational research, life sciences, and so on.The MTMSA algorithm is a combination of the SA algorithm and the MTM algorithm.Using simulated and real data examples, it demonstrated that, relative to the QN and SA algorithm, the MTMSA algorithm can function with a rapidly decreasing cooling schedule while guaranteeing that the global energy minima are reached.Several directions can be taken for future work.First, combined with the quasi-Monte Carlo method, the lowdiscrepancy sequences and experimental design [36][37][38][39] can be used to accelerate the convergence of the SA algorithm.Second, aside from the MTM algorithm, the MTMSA algorithm can also be implemented with several parallel interacting Markov chains to improve the SA algorithm by making full use of modern multicore computer [40,41].Third, we anticipate that a parallel SA algorithm can be used efficiently for variable selection in high dimensional cases [42][43][44][45] because the variable selection problem is a special case of the optimization problem.Finally, data depth [32,33,35,46] is an important tool for multidimensional data analysis, but the computation of data depth in high dimensional cases is challenging.The example of half-space depth computation in Section 4 shows the advantage of the MTMSA algorithm in low dimensional case.Hence, we believe that the MTMSA algorithm can be successfully applied to compute highly complex data depths (e.g., projection and regression depths) in high dimensional cases.Further analysis along these directions would be interesting.
(a) Mesh plot of objective function Contour plot of objective function

Figure 1 :
Figure 1: Mesh and contour plot of the objective function.
0.1 and ( 1 ,  2 , . . .,  20 ), which are the weights of the 20 normal models, are chosen to be the arithmetic progression from 1 to 5, except the last one  20 = 10.The 20 mean vectors are independently sampled from the uniform distribution from [0, 10].
The convergence paths of different algorithms

Figure 3 :
Figure 3: Decreasing temperature schedules (a) and convergence paths (b) of the SA1 algorithm, SA2 algorithm, and MTMSA algorithm.The convergence paths are the average of 50 paths.

Figure 4 :
Figure 4: The decreasing temperature schedule (a) and the convergence paths of 20 repetitions (b) from the MTMSA algorithm.

Table 1 :
Computation results (mean, sd, MSE, consumed total CPU time, and average CPU time (in seconds)) of the 50 computations from different algorithms.
global maximum point (0.183, 7.501), and the mean of the SA algorithm is 1.7523.By contrast, the MTMSA algorithm functions with a rapidly decreasing temperature schedule.The MTMSA algorithm consumes minimal CPU time (only about 8 min), but it yields highly accurate results (all 50 results converge to the global maximum).Furthermore, the MTMSA algorithm only needs approximately 10 seconds to compute one accurate result, whereas the SA algorithm requires about 130 seconds.All results from NM and QN algorithms suffer from the "local-trap" problem.

Table 2 :
Biochemical oxygen demand versus time.

Table 5 :
Computation results (mean, sd, MSE, consumed total CPU time, and average CPU time (in seconds)) of the 50 computations from different algorithms.

Table 6 :
Concrete data set.