A Cuckoo Search Algorithm for Multimodal Optimization

Interest in multimodal optimization is expanding rapidly, since many practical engineering problems demand the localization of multiple optima within a search space. The cuckoo search (CS) algorithm, on the other hand, is a simple and effective global optimization algorithm that cannot be directly applied to multimodal optimization problems. This paper proposes a new multimodal optimization algorithm called the multimodal cuckoo search (MCS). Under MCS, the original CS is enhanced with multimodal capacities by means of (1) the incorporation of a memory mechanism to efficiently register potential local optima according to their fitness value and the distance to other potential solutions, (2) the modification of the original CS individual selection strategy to accelerate the detection process of new local minima, and (3) the inclusion of a depuration procedure to cyclically eliminate duplicated memory elements. The performance of the proposed approach is compared to several state-of-the-art multimodal optimization algorithms on a benchmark suite of fourteen multimodal problems. Experimental results indicate that the proposed strategy provides better and more consistent performance than existing well-known multimodal algorithms for the majority of test problems, while avoiding any serious computational deterioration.


Introduction
Optimization is a field with applications in many areas of science, engineering, economics, and others, where mathematical modelling is used [1]. In general, the goal is to find an acceptable solution of an objective function defined over a given search space. Optimization algorithms are usually broadly divided into deterministic and stochastic ones [2]. Since deterministic methods only provide a theoretical guarantee of locating a local minimum for the objective function, they often face great difficulties in solving optimization problems [3]. On the other hand, stochastic methods are usually faster in locating a global optimum [4]. Moreover, they adapt easily to black-box formulations and extremely ill-behaved functions, whereas deterministic methods usually rest on at least some theoretical assumptions about the problem formulation and its analytical properties (such as Lipschitz continuity) [5].
Evolutionary algorithms (EA), which are considered to be members of the stochastic group, have been developed by a combination of rules and randomness that mimics several natural phenomena. Such phenomena include evolutionary processes, as in the evolutionary algorithm (EA) proposed by Fogel et al. [6], de Jong [7], and Koza [8]; the genetic algorithm (GA) proposed by Holland [9] and Goldberg [10]; the artificial immune system proposed by de Castro and von Zuben [11]; and the differential evolution algorithm (DE) proposed by Storn and Price [12]. Other methods based on physical processes include simulated annealing proposed by Kirkpatrick et al. [13], the electromagnetism-like algorithm proposed by Birbil and Fang [14], and the gravitational search algorithm proposed by Rashedi et al. [15]. There are also methods based on animal-behavior phenomena, such as the particle swarm optimization (PSO) algorithm proposed by Kennedy and Eberhart [16] and the ant colony optimization (ACO) algorithm proposed by Dorigo et al. [17].
Most research work on EA aims at locating the global optimum [18]. Despite its best performance, a global optimum may consist of parameter values that are considered impractical or prohibitively expensive, limiting its adoption in a real-world application. Therefore, from a practical point of view, it is desirable to have access to not only the global optimum but also as many local optima as possible (ideally all of them). Under such circumstances, a local optimum with acceptable performance quality and modest cost may be preferred over a costly global solution with marginally better performance [19]. The process of finding the global optimum and multiple local optima is known as multimodal optimization.
EA perform well for locating a single optimum but fail to provide multiple solutions [18]. Several methods have been introduced into the EA scheme to achieve multimodal optimization, such as fitness sharing [20][21][22], deterministic crowding [23], probabilistic crowding [22,24], clustering based niching [25], clearing procedure [26], species conserving genetic algorithm [27], and elitist-population strategies [28]. However, most of these methods have difficulties that need to be overcome before they can be employed successfully in multimodal applications. Some identified problems include difficulties in tuning some niching parameters, difficulties in maintaining discovered solutions in a run, extra computational overheads, and poor scalability when dimensionality is high. An additional problem is that such methods are devised for extending the search capacities of popular EA such as GA and PSO, which fail to find a balance between exploration and exploitation, mainly for multimodal functions [29]. Furthermore, they do not explore the whole region effectively and often suffer from premature convergence or loss of diversity.
As alternative approaches, other researchers have employed artificial immune systems (AIS) to solve multimodal optimization problems. Some examples are the clonal selection algorithm [30] and the artificial immune network (AiNet) [31,32]. Both approaches use operators and structures which attempt to find multiple solutions by mimicking the natural immune system's behavior.
Every EA needs to address the issue of exploration and exploitation of the search space [33]. Exploration is the process of visiting entirely new points of a search space, whilst exploitation is the process of refining those points within the neighborhood of previously visited locations in order to improve their solution quality. Pure exploration degrades the precision of the evolutionary process but increases its capacity to find new potential solutions. On the other hand, pure exploitation allows refining existent solutions but adversely drives the process to local optimal solutions.
Multimodal optimization requires a sufficient amount of exploration of the population agents in hyperspace so that all the local and global attractors can be successfully and quickly detected [34,35]. However, an efficient multimodal optimization algorithm should exhibit not only good exploration tendency but also good exploitative power, especially during the last stages of the search, because it must ensure an accurate, distributed convergence to different optima in the landscape. Therefore, the ability of an EA to find multiple solutions depends on its capacity to reach a good balance between the exploitation of found-so-far elements and the exploration of the search space [36]. So far, the exploration-exploitation dilemma has been an unsolved issue within the framework of EA.
Recently, a novel nature-inspired algorithm, called the cuckoo search (CS) algorithm [37], has been proposed for solving complex optimization problems. The CS algorithm is based on the obligate brood-parasitic strategy of some cuckoo species. One of the most powerful features of CS is the use of Lévy flights to generate new candidate solutions. Under this approach, candidate solutions are modified by employing many small changes and occasionally large jumps. As a result, CS can substantially improve the relationship between exploration and exploitation, still enhancing its search capabilities [38]. Recent studies show that CS is potentially far more efficient than PSO and GA [39]. Such characteristics have motivated the use of CS to solve different sorts of engineering problems such as mesh generation [40], embedded systems [41], steel frame design [42], scheduling problems [43], thermodynamics [44], and distribution networks [45].
This paper presents a new multimodal optimization algorithm called the multimodal cuckoo search (MCS). The method combines the CS algorithm with a new memory mechanism which allows an efficient registering of potential local optima according to their fitness value and the distance to other potential solutions. The original CS selection strategy is mainly conducted by an elitist decision where the best individuals prevail. In order to accelerate the detection process of potential local minima in our method, the selection strategy is modified to be influenced by individuals that are contained in the memory mechanism. During each generation, eggs (individuals) that exhibit different positions are included in the memory. Since such individuals could represent the same local optimum, a depuration procedure is also incorporated to cyclically eliminate duplicated memory elements. The performance of the proposed approach is compared to several state-of-the-art multimodal optimization algorithms on a benchmark suite of 14 multimodal problems. Experimental results indicate that the proposed strategy provides better and more consistent performance than the existing well-known multimodal algorithms for the majority of test problems, while avoiding any serious computational deterioration.
The paper is organized as follows. Section 2 explains the cuckoo search (CS) algorithm, while Section 3 presents the proposed MCS approach. Section 4 exhibits the experimental set and its performance results. Finally, Section 5 establishes some concluding remarks.

Cuckoo Search (CS) Method
CS is one of the latest nature-inspired algorithms, developed by Yang and Deb [37]. CS is based on the brood parasitism of some cuckoo species. In addition, this algorithm is enhanced by the so-called Lévy flights [46], rather than by simple isotropic random walks. Recent studies show that CS is potentially far more efficient than PSO and GA [39].
Cuckoo birds lay their eggs in the nests of other host birds (usually other species) with amazing abilities such as selecting nests containing recently laid eggs and removing existing eggs to increase the hatching probability of their own eggs. Some of the host birds are able to combat this parasitic behavior of cuckoos and throw out the discovered alien eggs or build a new nest in a distinct location. This cuckoo breeding analogy is used to develop the CS algorithm. Natural systems are complex, and therefore they cannot be modeled exactly by a computer algorithm in its basic form. Simplification of natural systems is necessary for successful implementation in computer algorithms. Yang and Deb [39] simplified the cuckoo reproduction process into three idealized rules.
(1) An egg represents a solution and is stored in a nest. An artificial cuckoo can lay only one egg at a time.
(2) The cuckoo bird searches for the most suitable nest in which to lay its eggs (solutions) to maximize their survival rate. An elitist selection strategy is applied, so that only high-quality eggs (best solutions near the optimal value) which are more similar to the host bird's eggs have the opportunity to develop (next generation) and become mature cuckoos.
(3) The number of host nests (population) is fixed. The host bird can discover the alien egg (worse solutions away from the optimal value) with a probability of $p_a \in [0, 1]$, and these eggs are thrown away or the nest is abandoned and a completely new nest is built in a new location. Otherwise, the egg matures and lives to the next generation. New eggs (solutions) laid by a cuckoo choose the nest by Lévy flights around the current best solutions.
From the implementation point of view, in the CS operation, a population, $\mathbf{E}^k$ ($\{\mathbf{e}_1^k, \mathbf{e}_2^k, \ldots, \mathbf{e}_N^k\}$), of $N$ eggs (individuals) is evolved from the initial point ($k = 0$) to a total of gen iterations ($k = 2 \cdot \text{gen}$). Each egg, $\mathbf{e}_i^k$ ($i \in [1, \ldots, N]$), represents an $n$-dimensional vector, $\{e_{i,1}^k, e_{i,2}^k, \ldots, e_{i,n}^k\}$, where each dimension corresponds to a decision variable of the optimization problem to be solved. The quality of each egg, $\mathbf{e}_i^k$ (candidate solution), is evaluated by using an objective function, $f(\mathbf{e}_i^k)$, whose final result represents the fitness value of $\mathbf{e}_i^k$. Three different operators define the evolution process of CS: (A) Lévy flight, (B) replacement of some nests by constructing new solutions, and (C) elitist selection strategy.

Lévy Flight (A).
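One of the most powerful features of cuckoo search is the use of Lévy flights to generate new candidate solutions (eggs). Under this approach, a new candidate solution, $\mathbf{e}_i^{k+1}$ ($i \in [1, \ldots, N]$), is produced by perturbing the current $\mathbf{e}_i^k$ with a change of position $\mathbf{c}_i$. In order to obtain $\mathbf{c}_i$, a random step, $\mathbf{s}_i$, is generated by a symmetric Lévy distribution. For producing $\mathbf{s}_i$, Mantegna's algorithm [47] is employed as follows:

$$\mathbf{s}_i = \frac{\mathbf{u}}{|\mathbf{v}|^{1/\beta}}, \tag{1}$$

where $\mathbf{u}$ ($\{u_1, \ldots, u_n\}$) and $\mathbf{v}$ ($\{v_1, \ldots, v_n\}$) are $n$-dimensional vectors and $\beta = 3/2$. Each element of $\mathbf{u}$ and $\mathbf{v}$ is calculated by considering the following normal distributions:

$$u \sim N(0, \sigma_u^2), \quad v \sim N(0, \sigma_v^2), \quad \sigma_u = \left( \frac{\Gamma(1 + \beta) \cdot \sin(\pi \beta / 2)}{\Gamma((1 + \beta)/2) \cdot \beta \cdot 2^{(\beta - 1)/2}} \right)^{1/\beta}, \quad \sigma_v = 1, \tag{2}$$

where $\Gamma(\cdot)$ represents the gamma function. Once $\mathbf{s}_i$ has been calculated, the required change of position $\mathbf{c}_i$ is computed as follows:

$$\mathbf{c}_i = 0.01 \cdot \mathbf{s}_i \oplus (\mathbf{e}_i^k - \mathbf{e}^{\text{best}}), \tag{3}$$

where the product $\oplus$ denotes entrywise multiplication, whereas $\mathbf{e}^{\text{best}}$ is the best solution (egg) seen so far in terms of its fitness value. Finally, the new candidate solution, $\mathbf{e}_i^{k+1}$, is calculated by using

$$\mathbf{e}_i^{k+1} = \mathbf{e}_i^k + \mathbf{c}_i. \tag{4}$$

Replacement of Some Nests by Constructing New Solutions (B).
Under this operation, each individual, $\mathbf{e}_i^k$ ($i \in [1, \ldots, N]$), can be selected and replaced with a probability of $p_a \in [0, 1]$. In order to implement this operation, a uniform random number, $r_1$, is generated within the range [0, 1]. If $r_1$ is less than $p_a$, the individual $\mathbf{e}_i^k$ is selected and modified according to (5). Otherwise, $\mathbf{e}_i^k$ remains without change. This operation can be summarized by the following model:

$$\mathbf{e}_i^{k+1} = \begin{cases} \mathbf{e}_i^k + \text{rand} \cdot (\mathbf{e}_{d_1}^k - \mathbf{e}_{d_2}^k), & \text{with probability } p_a, \\ \mathbf{e}_i^k, & \text{with probability } (1 - p_a), \end{cases} \tag{5}$$

where rand is a normally distributed random number, whereas $d_1$ and $d_2$ are random integers from 1 to $N$.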
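The two perturbation operators can be illustrated with a minimal Python sketch. It assumes the usual Mantegna form $s_j = u_j / |v_j|^{1/\beta}$ with $\sigma_v = 1$ and the 0.01 step scaling commonly used in CS implementations; the function names (`mantegna_sigma_u`, `levy_step`, `operator_a`, `operator_b`) and the list-of-lists representation of eggs are illustrative choices, not part of the original method description.

```python
import math
import random

BETA = 1.5  # beta = 3/2, as stated in the text


def mantegna_sigma_u(beta=BETA):
    """Standard deviation of u in Mantegna's algorithm (sigma_v is taken as 1)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(n, rng=random):
    """Draw an n-dimensional step with components u / |v|^(1/beta)."""
    sigma_u = mantegna_sigma_u()
    step = []
    for _ in range(n):
        u = rng.gauss(0.0, sigma_u)
        v = rng.gauss(0.0, 1.0)
        step.append(u / abs(v) ** (1 / BETA))
    return step


def operator_a(egg, best, scale=0.01, rng=random):
    """Levy flight: e + scale * s (entrywise) (e - e_best). The 0.01 scale is an assumption."""
    s = levy_step(len(egg), rng)
    return [e + scale * sj * (e - b) for e, sj, b in zip(egg, s, best)]


def operator_b(population, p_a=0.25, rng=random):
    """Replace each egg with probability p_a using two randomly chosen eggs."""
    n_pop = len(population)
    new_pop = []
    for egg in population:
        if rng.random() < p_a:
            d1, d2 = rng.randrange(n_pop), rng.randrange(n_pop)
            r = rng.gauss(0.0, 1.0)  # the text describes rand as normally distributed
            egg = [e + r * (a - b)
                   for e, a, b in zip(egg, population[d1], population[d2])]
        new_pop.append(list(egg))
    return new_pop
```

Note that under this form of operator A the best egg itself does not move, because its difference to $\mathbf{e}^{\text{best}}$ is zero; only operator B can displace it.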

Elitist Selection Strategy (C).
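After producing $\mathbf{e}_i^{k+1}$ either by operator A or by operator B, it must be compared with its past value $\mathbf{e}_i^k$. If the fitness value of $\mathbf{e}_i^{k+1}$ is better than that of $\mathbf{e}_i^k$, then $\mathbf{e}_i^{k+1}$ is accepted as the final solution. Otherwise, $\mathbf{e}_i^k$ is retained. This procedure can be summarized by the following statement (for a minimization problem):

$$\mathbf{e}_i^{k+1} = \begin{cases} \mathbf{e}_i^{k+1}, & \text{if } f(\mathbf{e}_i^{k+1}) < f(\mathbf{e}_i^k), \\ \mathbf{e}_i^k, & \text{otherwise.} \end{cases} \tag{6}$$

This elitist selection strategy denotes that only high-quality eggs (best solutions near the optimal value) which are more similar to the host bird's eggs have the opportunity to develop (next generation) and become mature cuckoos.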

Complete CS Algorithm.
CS is a relatively simple algorithm with only three adjustable parameters: the probability $p_a$, the population size $N$, and the number of generations gen. According to Yang and Deb [39], the convergence rate of the algorithm is not strongly affected by the value of $p_a$ and it is suggested to use $p_a = 0.25$. The operation of CS is divided into two parts: initialization and the evolution process. In the initialization ($k = 0$), the first population, $\mathbf{E}^0$ ($\{\mathbf{e}_1^0, \mathbf{e}_2^0, \ldots, \mathbf{e}_N^0\}$), is produced. The values, $\{e_{i,1}^0, e_{i,2}^0, \ldots, e_{i,n}^0\}$, of each individual, $\mathbf{e}_i^0$, are randomly and uniformly distributed between the prespecified lower initial parameter bound, $b_j^{\text{low}}$, and upper initial parameter bound, $b_j^{\text{high}}$:

$$e_{i,j}^0 = b_j^{\text{low}} + \text{rand} \cdot (b_j^{\text{high}} - b_j^{\text{low}}); \quad i = 1, 2, \ldots, N; \; j = 1, 2, \ldots, n. \tag{7}$$

In the evolution process, operators A (Lévy flight), B (replacement of some nests by constructing new solutions), and C (elitist selection strategy) are iteratively applied until the number of iterations $k = 2 \cdot \text{gen}$ has been reached. The complete CS procedure is illustrated in Algorithm 1.

Algorithm 1 (partial): (1) Input: $p_a$, $N$, and gen. (2) …
From Algorithm 1, it is important to remark that the elitist selection strategy (C) is used two times, just after operator A or operator B is executed.
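The overall loop, with elitist selection applied after each of the two perturbation operators, can be sketched as follows. This is a minimal illustration and not the authors' listing: the operators are passed in as callables (`op_a`, `op_b`), the `jitter` helper is a hypothetical Gaussian stand-in used only to make the sketch runnable on its own, and bound handling after perturbation is omitted.

```python
import random


def initialize(n_pop, bounds, rng):
    """Uniform initialization of each coordinate between its lower and upper bound."""
    return [[lo + rng.random() * (hi - lo) for lo, hi in bounds]
            for _ in range(n_pop)]


def operator_c(f, old_pop, new_pop):
    """Elitist selection: keep the new egg only if it is strictly better (minimization)."""
    return [new if f(new) < f(old) else old
            for old, new in zip(old_pop, new_pop)]


def cuckoo_search(f, bounds, n_pop, gen, op_a, op_b, seed=0):
    """Apply A then C, and B then C, for `gen` generations; return the best egg."""
    rng = random.Random(seed)
    pop = initialize(n_pop, bounds, rng)
    for _ in range(gen):  # each pass applies two operators, so k advances by 2
        best = min(pop, key=f)
        pop = operator_c(f, pop, op_a(pop, best, rng))
        pop = operator_c(f, pop, op_b(pop, rng))
    return min(pop, key=f)


def jitter(pop, rng, scale=0.1):
    """Hypothetical stand-in perturbation (Gaussian noise), NOT a Levy flight."""
    return [[x + rng.gauss(0.0, scale) for x in egg] for egg in pop]
```

Because operator C compares each egg only with its own previous value, the best fitness in the population can never get worse from one iteration to the next, whichever perturbation operators are plugged in.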

The Multimodal Cuckoo Search (MCS)
In CS, individuals emulate eggs which interact in a biological system by using evolutionary operations based on the breeding behavior of some cuckoo species. One of the most powerful features of CS is the use of Lévy flights to generate new candidate solutions. Under this approach, candidate solutions are modified by employing many small changes and occasionally large jumps. As a result, CS can substantially improve the relationship between exploration and exploitation, still enhancing its search capabilities. Despite such characteristics, the CS method still fails to provide multiple solutions in a single execution. In the proposed MCS approach, the original CS is adapted to include multimodal capacities. In particular, this adaptation contemplates (1) the incorporation of a memory mechanism to efficiently register potential local optima according to their fitness value and the distance to other potential solutions, (2) the modification of the original CS individual selection strategy to accelerate the detection process of new local minima, and (3) the inclusion of a depuration procedure to cyclically eliminate duplicated memory elements.
In order to implement these modifications, the proposed MCS divides the evolution process into three asymmetric states. The first state ($s = 1$) includes 0 to 50% of the evolution process. The second state ($s = 2$) involves 50 to 75%. Finally, the third state ($s = 3$) lasts from 75 to 100%. The idea of this division is that the algorithm can react in a different manner depending on the current state. Therefore, in the beginning of the evolutionary process, exploration can be privileged, while, at the end of the optimization process, exploitation can be favored. Figure 1 illustrates the division of the evolution process according to MCS.
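As a small illustration, the state can be derived from the iteration counter as below. The function name `evolution_state` is hypothetical, and assigning the exact 50% and 75% boundary iterations to the earlier state is an assumption of this sketch, since the text does not say which side they belong to.

```python
def evolution_state(k, gen):
    """Return the state s in {1, 2, 3} for iteration k out of 2 * gen iterations."""
    total = 2 * gen
    if 2 * k <= total:        # first 50% of the process
        return 1
    if 4 * k <= 3 * total:    # from 50% to 75%
        return 2
    return 3                  # last 25%
```

Integer comparisons are used instead of a floating-point ratio so that the boundary iterations are classified deterministically.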
The next sections examine the operators suggested by MCS as adaptations of CS to provide multimodal capacities. These operators are (D) the memory mechanism, (E) new selection strategy, and (F) depuration procedure.

Memory Mechanism (D).
In the MCS evolution process, a population, $\mathbf{E}^k$ ($\{\mathbf{e}_1^k, \mathbf{e}_2^k, \ldots, \mathbf{e}_N^k\}$), of $N$ eggs (individuals) is evolved from the initial point ($k = 0$) to a total of gen iterations ($k = 2 \cdot \text{gen}$). Each egg, $\mathbf{e}_i^k$ ($i \in [1, \ldots, N]$), represents an $n$-dimensional vector, $\{e_{i,1}^k, e_{i,2}^k, \ldots, e_{i,n}^k\}$, where each dimension corresponds to a decision variable of the optimization problem to be solved. The quality of each egg, $\mathbf{e}_i^k$ (candidate solution), is evaluated by using an objective function, $f(\mathbf{e}_i^k)$, whose final result represents the fitness value of $\mathbf{e}_i^k$. During the evolution process, MCS also maintains the best, $\mathbf{e}^{\text{best},k}$, and the worst, $\mathbf{e}^{\text{worst},k}$, eggs seen so far, such that

$$\mathbf{e}^{\text{best},k} = \arg\min_{i \in \{1, \ldots, N\},\, a \in \{1, \ldots, k\}} f(\mathbf{e}_i^a), \qquad \mathbf{e}^{\text{worst},k} = \arg\max_{i \in \{1, \ldots, N\},\, a \in \{1, \ldots, k\}} f(\mathbf{e}_i^a). \tag{8}$$

Global and local optima possess two important characteristics: (1) they have a significantly good fitness value and (2) they represent the best fitness value inside a determined neighborhood. Therefore, the memory mechanism allows efficiently registering potential global and local optima during the evolution process, involving a memory array, $\mathbf{M}$, and a storage procedure. $\mathbf{M}$ stores the potential global and local optima, $\{\mathbf{m}_1, \mathbf{m}_2, \ldots, \mathbf{m}_T\}$, during the evolution process, with $T$ being the number of elements so far contained in the memory $\mathbf{M}$. On the other hand, the storage procedure indicates the rules that the eggs, $\{\mathbf{e}_1^k, \mathbf{e}_2^k, \ldots, \mathbf{e}_N^k\}$, must fulfill in order to be captured as memory elements. The memory mechanism operates in two phases: initialization and capture.

Initialization Phase.
This phase is applied only once within the optimization process. Such an operation is achieved in the null iteration ($k = 0$) of the evolution process. In the initialization phase, the best egg, $\mathbf{e}_B$, of $\mathbf{E}^0$, in terms of its fitness value, is stored in the memory $\mathbf{M}$ ($\mathbf{m}_1 = \mathbf{e}_B$), where $\mathbf{e}_B = \arg\min_{i \in \{1, 2, \ldots, N\}} f(\mathbf{e}_i^0)$, for a minimization problem.

Capture Phase.
This phase is applied from the first iteration ($k = 1$) to the last iteration ($k = 2, 3, \ldots, 2 \cdot \text{gen}$), at the end of each operator (A and B). At this stage, eggs, $\{\mathbf{e}_1^k, \mathbf{e}_2^k, \ldots, \mathbf{e}_N^k\}$, corresponding to potential global and local optima are efficiently registered as memory elements, $\{\mathbf{m}_1, \mathbf{m}_2, \ldots, \mathbf{m}_T\}$, according to their fitness value and the distance to other potential solutions. In the operation, each egg, $\mathbf{e}_i^k$, of $\mathbf{E}^k$ is tested in order to evaluate whether it must be captured as a memory element. The test considers two rules: (1) significant fitness value rule and (2) nonsignificant fitness value rule.
Significant Fitness Value Rule. Under this rule, the solution quality of $\mathbf{e}_i^k$ is evaluated according to the worst element, $\mathbf{m}^{\text{worst}}$, that is contained in the memory $\mathbf{M}$, where $\mathbf{m}^{\text{worst}} = \arg\max_{i \in \{1, 2, \ldots, T\}} f(\mathbf{m}_i)$, in case of a minimization problem. If the fitness value of $\mathbf{e}_i^k$ is better than that of $\mathbf{m}^{\text{worst}}$ ($f(\mathbf{e}_i^k) < f(\mathbf{m}^{\text{worst}})$), $\mathbf{e}_i^k$ is considered a potential global or local optimum. The next step is to decide whether $\mathbf{e}_i^k$ represents a new optimum or is very similar to an existent memory element, $\{\mathbf{m}_1, \mathbf{m}_2, \ldots, \mathbf{m}_T\}$ (that is, whether it is already contained in the memory $\mathbf{M}$). Such a decision is specified by an acceptance probability function, $\Pr(\delta_{i,u}, s)$, that depends, on one side, on the distance $\delta_{i,u}$ from $\mathbf{e}_i^k$ to the nearest memory element $\mathbf{m}_u$ and, on the other side, on the current state $s$ of the evolution process (1, 2, and 3). Under $\Pr(\delta_{i,u}, s)$, the probability that $\mathbf{e}_i^k$ would be part of $\mathbf{M}$ increases as the distance $\delta_{i,u}$ enlarges. Similarly, the probability that $\mathbf{e}_i^k$ would be similar to an existent memory element $\{\mathbf{m}_1, \mathbf{m}_2, \ldots, \mathbf{m}_T\}$ increases as $\delta_{i,u}$ decreases. On the other hand, the indicator $s$ that relates a numeric value with the state of the evolution process is gradually modified during the algorithm to reduce the likelihood of accepting inferior solutions. The idea is that in the beginning of the evolutionary process (exploration), large distance differences can be considered, while only small distance differences are tolerated at the end of the optimization process.
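In order to implement this procedure, the normalized distance $\delta_{i,q}$ ($q \in [1, \ldots, T]$) is calculated from $\mathbf{e}_i^k$ to all the elements of the memory $\mathbf{M}$ $\{\mathbf{m}_1, \mathbf{m}_2, \ldots, \mathbf{m}_T\}$. $\delta_{i,q}$ is computed as follows:

$$\delta_{i,q} = \sqrt{\left( \frac{e_{i,1}^k - m_{q,1}}{b_1^{\text{high}} - b_1^{\text{low}}} \right)^2 + \left( \frac{e_{i,2}^k - m_{q,2}}{b_2^{\text{high}} - b_2^{\text{low}}} \right)^2 + \cdots + \left( \frac{e_{i,n}^k - m_{q,n}}{b_n^{\text{high}} - b_n^{\text{low}}} \right)^2}, \tag{9}$$

where $\{m_{q,1}, m_{q,2}, \ldots, m_{q,n}\}$ represent the $n$ components of the memory element $\mathbf{m}_q$, whereas $b_j^{\text{high}}$ and $b_j^{\text{low}}$ indicate the upper parameter bound and the lower parameter bound ($j \in [1, \ldots, n]$), respectively. One important property of the normalized distance $\delta_{i,q}$ is that its values fall into the interval [0, 1]. By using the normalized distance $\delta_{i,q}$, the nearest memory element $\mathbf{m}_u$ to $\mathbf{e}_i^k$ is defined, with $u = \arg\min_{j \in \{1, 2, \ldots, T\}} (\delta_{i,j})$. Then, the acceptance probability function $\Pr(\delta_{i,u}, s)$ is calculated by using the following expression:

$$\Pr(\delta_{i,u}, s) = (\delta_{i,u})^s. \tag{10}$$

In order to decide whether $\mathbf{e}_i^k$ represents a new optimum or is very similar to an existent memory element, a uniform random number $r_1$ is generated within the range [0, 1]. If $r_1$ is less than $\Pr(\delta_{i,u}, s)$, the egg $\mathbf{e}_i^k$ is included in the memory $\mathbf{M}$ as a new optimum. Otherwise, it is considered that $\mathbf{e}_i^k$ is similar to $\mathbf{m}_u$. Under such circumstances, the memory $\mathbf{M}$ is updated by the competition between $\mathbf{e}_i^k$ and $\mathbf{m}_u$, according to their corresponding fitness values. Therefore, $\mathbf{e}_i^k$ would replace $\mathbf{m}_u$ in case $f(\mathbf{e}_i^k)$ is better than $f(\mathbf{m}_u)$. On the other hand, if $f(\mathbf{m}_u)$ is better than $f(\mathbf{e}_i^k)$, $\mathbf{m}_u$ remains with no change. The complete procedure of the significant fitness value rule can be summarized by the following statement:

$$\mathbf{M} = \begin{cases} \mathbf{m}_{T+1} = \mathbf{e}_i^k, & \text{with probability } \Pr(\delta_{i,u}, s), \\ \mathbf{m}_u = \mathbf{e}_i^k \text{ if } f(\mathbf{e}_i^k) < f(\mathbf{m}_u), & \text{with probability } 1 - \Pr(\delta_{i,u}, s). \end{cases} \tag{11}$$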
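The significant fitness value rule can be sketched in Python as below. This is an illustrative reading of the description above, not the authors' code: it assumes the acceptance probability takes the form $\delta^s$, represents eggs and memory elements as lists, and clips the distance to 1 before exponentiation so that the result is always a valid probability (the clip is a guard added in this sketch). All function names are hypothetical.

```python
import math
import random


def normalized_distance(egg, mem, bounds):
    """Euclidean distance with each axis scaled by its bound range (hi - lo)."""
    return math.sqrt(sum(((e - m) / (hi - lo)) ** 2
                         for e, m, (lo, hi) in zip(egg, mem, bounds)))


def acceptance_probability(delta, s):
    """Assumed form Pr = delta ** s; delta is clipped to 1 as a guard."""
    return min(1.0, delta) ** s


def significant_rule(egg, memory, f, bounds, s, rng=random):
    """Update `memory` in place, assuming f(egg) is better than the worst memory element."""
    dists = [normalized_distance(egg, m, bounds) for m in memory]
    u = min(range(len(memory)), key=dists.__getitem__)  # index of nearest element
    if rng.random() < acceptance_probability(dists[u], s):
        memory.append(list(egg))      # far enough: registered as a new optimum
    elif f(egg) < f(memory[u]):
        memory[u] = list(egg)         # close: replaces its nearest neighbour if better
    return memory
```

Since $\delta < 1$ in the typical case, raising it to a larger state value $s$ lowers the acceptance probability, which matches the stated intent of tolerating only small distance differences late in the run.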
In order to demonstrate the significant fitness value rule process, Figure 2 illustrates a simple minimization problem that involves a two-dimensional function, $f(\mathbf{x})$ ($\mathbf{x} = \{x_1, x_2\}$). As an example, assume a population, $\mathbf{E}^k$, of two different eggs ($\mathbf{e}_1^k$, $\mathbf{e}_2^k$), a memory with two memory elements ($\mathbf{m}_1$, $\mathbf{m}_2$), and the execution of the first state ($s = 1$). According to Figure 2, both eggs $\mathbf{e}_1^k$ and $\mathbf{e}_2^k$ maintain a better fitness value than $\mathbf{m}_1$, which possesses the worst fitness value of the memory elements. Under such conditions, the significant fitness value rule must be applied to both eggs. In case of $\mathbf{e}_1^k$, the first step is to calculate the corresponding distances $\delta_{1,1}$ and $\delta_{1,2}$. $\mathbf{m}_1$ represents the nearest memory element to $\mathbf{e}_1^k$. Then, the acceptance probability function $\Pr(\delta_{1,1}, 1)$ is calculated by using (10). Since the value of $\Pr(\delta_{1,1}, 1)$ is high, there exists a great probability that $\mathbf{e}_1^k$ becomes the next memory element ($\mathbf{m}_3 = \mathbf{e}_1^k$). On the other hand, for $\mathbf{e}_2^k$, $\mathbf{m}_2$ represents the nearest memory element. As $\Pr(\delta_{2,2}, 1)$ is very low, there exists a great probability that $\mathbf{e}_2^k$ competes with $\mathbf{m}_2$ for a place within $\mathbf{M}$. In such a case, $\mathbf{m}_2$ remains with no change considering that $f(\mathbf{m}_2) < f(\mathbf{e}_2^k)$.

Algorithm 2: Nonsignificant fitness value rule procedure.
(1) Input: $\mathbf{e}_i^k$, $\mathbf{e}^{\text{best},k}$, $\mathbf{e}^{\text{worst},k}$
(2) Calculate $p(\mathbf{e}_i^k, \mathbf{e}^{\text{best},k}, \mathbf{e}^{\text{worst},k}) = 1 - (f(\mathbf{e}_i^k) - f(\mathbf{e}^{\text{best},k})) / (f(\mathbf{e}^{\text{worst},k}) - f(\mathbf{e}^{\text{best},k}))$
(3) Calculate $P(p) = p$ if $0.5 \le p \le 1$; $P(p) = 0$ if $0 \le p < 0.5$
(5) if (rand(0, 1) ≤ $P$) then
(6)   $\mathbf{e}_i^k$ is considered a local optimum (with probability $P$)
(7) else
(8)   $\mathbf{e}_i^k$ is ignored (with probability $1 - P$)
(9) end if
Nonsignificant Fitness Value Rule. Unlike the significant fitness value rule, the nonsignificant fitness value rule allows capturing local optima with low fitness values. It operates if the fitness value of $\mathbf{e}_i^k$ is worse than that of $\mathbf{m}^{\text{worst}}$ ($f(\mathbf{e}_i^k) \ge f(\mathbf{m}^{\text{worst}})$). Under such conditions, it is necessary, as a first step, to test which eggs could represent local optima and which must be ignored as a consequence of their very low fitness value. Then, if the egg represents a possible local optimum, its inclusion inside the memory $\mathbf{M}$ is explored.
The decision on whether $\mathbf{e}_i^k$ represents a new local optimum or not is specified by a probability function, $P$, which is based on the relationship between $f(\mathbf{e}_i^k)$ and the so far valid fitness value interval ($f(\mathbf{e}^{\text{worst},k}) - f(\mathbf{e}^{\text{best},k})$). Therefore, the probability function $P$ is defined as follows:

$$p(\mathbf{e}_i^k, \mathbf{e}^{\text{best},k}, \mathbf{e}^{\text{worst},k}) = 1 - \frac{f(\mathbf{e}_i^k) - f(\mathbf{e}^{\text{best},k})}{f(\mathbf{e}^{\text{worst},k}) - f(\mathbf{e}^{\text{best},k})}, \qquad P(p) = \begin{cases} p, & 0.5 \le p \le 1, \\ 0, & 0 \le p < 0.5, \end{cases} \tag{12}$$

where $\mathbf{e}^{\text{best},k}$ and $\mathbf{e}^{\text{worst},k}$ represent the best and worst eggs seen so far, respectively. In order to decide whether $\mathbf{e}_i^k$ represents a new local optimum or must be ignored, a uniform random number, $r_2$, is generated within the range [0, 1]. If $r_2$ is less than $P$, the egg $\mathbf{e}_i^k$ is considered to be a new local optimum. Otherwise, it must be ignored. Under $P$, the so far valid fitness value interval ($f(\mathbf{e}^{\text{worst},k}) - f(\mathbf{e}^{\text{best},k})$) is divided into two sections: I and II (see Figure 3). Considering this division, the function $P$ assigns a valid probability (greater than zero) only to those eggs that fall into the zone of the best individuals (part I) in terms of their fitness value. Such a probability value increases as the fitness value improves. The complete procedure can be reviewed in Algorithm 2.
If the egg represents a possible local optimum, its inclusion inside the memory $\mathbf{M}$ is explored. In order to consider whether $\mathbf{e}_i^k$ could represent a new memory element, another procedure that is similar to the significant fitness value rule process is applied. Therefore, the normalized distance $\delta_{i,q}$ ($q \in [1, \ldots, T]$) is calculated from $\mathbf{e}_i^k$ to all the elements of the memory $\mathbf{M}$ $\{\mathbf{m}_1, \mathbf{m}_2, \ldots, \mathbf{m}_T\}$, according to (9). Afterwards, the nearest distance $\delta_{i,u}$ to $\mathbf{e}_i^k$ is determined. Then, by using $\Pr(\delta_{i,u}, s)$ (10), the following rule can be applied:

$$\mathbf{M} = \begin{cases} \mathbf{m}_{T+1} = \mathbf{e}_i^k, & \text{with probability } \Pr(\delta_{i,u}, s), \\ \text{no change}, & \text{with probability } 1 - \Pr(\delta_{i,u}, s). \end{cases} \tag{13}$$

Under this rule, a uniform random number, $r_3$, is generated within the range [0, 1]. If $r_3$ is less than $\Pr(\delta_{i,u}, s)$, the egg $\mathbf{e}_i^k$ is included in the memory $\mathbf{M}$ as a new optimum. Otherwise, the memory does not change.
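A minimal sketch of the nonsignificant fitness value rule follows. It is written under two labeled assumptions: sections I and II are taken to be equal halves of the fitness interval (so the cut is at $p = 0.5$), and a zero-width fitness interval yields probability 0. The acceptance probability against the nearest memory element is supplied as a callable, `nearest_pr`, which is a hypothetical parameter of this sketch, as are the function names.

```python
import random


def local_optimum_probability(f_egg, f_best, f_worst, cut=0.5):
    """p = 1 - (f - f_best) / (f_worst - f_best); P = p if p >= cut, else 0.

    The cut of 0.5 (equal halves) and the zero-interval guard are assumptions.
    """
    if f_worst == f_best:
        return 0.0
    p = 1.0 - (f_egg - f_best) / (f_worst - f_best)
    return p if p >= cut else 0.0


def nonsignificant_rule(egg, f_egg, memory, f_best, f_worst, nearest_pr, rng=random):
    """Two-stage test: is the egg a plausible local optimum, and is it far enough?

    nearest_pr(egg, memory) must return the acceptance probability of `egg`
    with respect to its nearest memory element.
    """
    if rng.random() >= local_optimum_probability(f_egg, f_best, f_worst):
        return memory                   # ignored: fitness too poor
    if rng.random() < nearest_pr(egg, memory):
        memory.append(list(egg))        # captured as a new memory element
    return memory                       # otherwise the memory does not change
```

Unlike the significant rule, a rejected egg never competes with its nearest memory element here; by construction its fitness is no better than the worst element already stored.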

New Selection Strategy (E).
The original CS selection strategy is mainly conducted by an elitist decision where the best individuals in the current population prevail. Such an operation, defined in this paper as operator C (Section 2.3), is executed two times, just after operators A and B in the original CS method. This effect allows incorporating interesting convergence properties to CS when the objective considers only one optimum. However, in case of multiple-optimum detection, such a strategy is not appropriate. Therefore, in order to accelerate the detection process of potential local minima in our method, the selection strategy is modified to be influenced by the individuals contained in the memory M.
Under the new selection procedure (operator E), the final population $\mathbf{E}^{k+1}$ is built by considering the first $N$ elements from the memory $\mathbf{M}$ instead of using the best individuals between the current $\mathbf{E}^{k+1}$ and $\mathbf{E}^k$. In case the number of elements in $\mathbf{M}$ is less than $N$, the rest of the individuals are completed by considering the best elements from the current $\mathbf{E}^{k+1}$.
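The selection step described above can be sketched as follows. The function name `operator_e` and its signature are illustrative; the sketch assumes a minimization problem and does not check whether a fill-in egg duplicates a memory element, a detail the text leaves open.

```python
def operator_e(memory, new_pop, f, n_pop):
    """Build the next population from the memory, topping up from new_pop if needed."""
    selected = [list(m) for m in memory[:n_pop]]   # first N memory elements
    missing = n_pop - len(selected)
    if missing > 0:
        # Fill the remaining slots with the best eggs of the current population.
        selected.extend(list(e) for e in sorted(new_pop, key=f)[:missing])
    return selected
```

For example, with a single memory element and a population size of three, the result is that memory element followed by the two fittest eggs of the current population.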

Depuration Procedure (F).
During the evolution process, the memory M stores several individuals (eggs). Since such individuals could represent the same local optimum, a depuration procedure is incorporated at the end of each state (1, 2, and 3) to eliminate similar memory elements. The inclusion of this procedure allows (a) reducing the computational overhead during each state and (b) improving the search strategy by considering only significant memory elements.
Memory elements tend to concentrate on optimal points (good fitness values), whereas element concentrations are enclosed by areas holding bad fitness values. The main idea in the depuration procedure is to find the distances among concentrations. Such distances, considered as depuration ratios, are later employed to delete all elements inside them, except for the best element in terms of their fitness values.
The method used by the depuration procedure to determine the distance between two concentrations is based on the element comparison between the concentration corresponding to the best element and the concentration of the nearest optimum in the memory. In the process, the best element $\mathbf{m}^{\text{best}}$ in the memory is compared to a memory element, $\mathbf{m}_b$, which belongs to one of both concentrations (where $\mathbf{m}^{\text{best}} = \arg\min_{i \in \{1, 2, \ldots, T\}} f(\mathbf{m}_i)$). If the fitness value of the medium point, $f((\mathbf{m}^{\text{best}} + \mathbf{m}_b)/2)$, between both is not worse than both $f(\mathbf{m}^{\text{best}})$ and $f(\mathbf{m}_b)$, the element $\mathbf{m}_b$ is part of the same concentration as $\mathbf{m}^{\text{best}}$. However, if $f((\mathbf{m}^{\text{best}} + \mathbf{m}_b)/2)$ is worse than both, the element $\mathbf{m}_b$ is considered as part of the nearest concentration. Therefore, if $\mathbf{m}_b$ and $\mathbf{m}^{\text{best}}$ belong to different concentrations, the Euclidean distance between $\mathbf{m}_b$ and $\mathbf{m}^{\text{best}}$ can be considered as a depuration ratio. In order to avoid the unintentional deletion of elements in the nearest concentration, the depuration ratio $D_R$ is slightly shortened. Thus, the depuration ratio is defined as follows:

$$D_R = 0.85 \cdot \left\| \mathbf{m}^{\text{best}} - \mathbf{m}_b \right\|. \tag{14}$$

The proposed depuration procedure only considers the depuration ratio between the concentration of the best element and the nearest concentration. In order to determine all ratios, preprocessing and postprocessing methods must be incorporated and iteratively executed. The preprocessing method must (1) obtain the best element $\mathbf{m}^{\text{best}}$ from the memory in terms of its fitness value, (2) calculate the Euclidean distances from the best element to the rest of the elements in the memory, and (3) sort the distances according to their magnitude. This set of tasks allows identification of both concentrations: the one belonging to the best element and that belonging to the nearest optimum, so they must be executed before the depuration ratio $D_R$ calculation. Such concentrations are represented by the elements with the shortest distances to $\mathbf{m}^{\text{best}}$.
Once $D_R$ has been calculated, it is necessary to remove all the elements belonging to the concentration of the best element. This task is executed as a postprocessing method in order to configure the memory for the next step. Therefore, the complete depuration procedure can be represented as an iterative process that at each step determines the distance of the concentration of the best element with regard to the concentration of the nearest optimum.
A special case can be considered when only one concentration is contained within the memory. This case can happen because the optimization problem has only one optimum or because all the other concentrations have been already detected. Under such circumstances, the condition where $f((\mathbf{m}^{\text{best}} + \mathbf{m}_b)/2)$ is worse than $f(\mathbf{m}^{\text{best}})$ and $f(\mathbf{m}_b)$ would never be fulfilled.
In order to find the distances among concentrations, the depuration procedure is conducted in Procedure 1.
At the end of the above procedure, the vector Y will contain the depurated memory which would be used in the next state or as a final result of the multimodal problem.
In order to illustrate the depuration procedure, Figure 4 shows a simple minimization problem that involves two different optimal points (concentrations). As an example, a memory $\mathbf{M}$ with six memory elements is assumed, whose positions are shown in Figure 4(a). According to the depuration procedure, the first step is (1) to build the vector $\mathbf{Z}$ and (2) to calculate the corresponding distances $\Delta^{a}_{1,j}$ among the elements. Following such operation, the vector $\mathbf{Z}$ and the set of distances are configured as $\mathbf{Z} = \{\mathbf{m}_5, \mathbf{m}_1, \mathbf{m}_3, \mathbf{m}_4, \mathbf{m}_6, \mathbf{m}_2\}$ and $\{\Delta^{1}_{1,2}, \Delta^{2}_{1,3}, \Delta^{3}_{1,5}, \Delta^{4}_{1,4}, \Delta^{5}_{1,6}\}$, respectively. Figure 4(b) shows the configuration of X where, for the sake of simplicity, only the two distances $\Delta^{1}_{1,2}$ and $\Delta^{3}_{1,5}$ have been represented. Then, the depuration ratio $D_R$ is calculated. This process is an iterative computation that begins with the shortest distance $\Delta^{1}_{1,2}$. The distance $\Delta^{1}_{1,2}$ (see Figure 4(c)), corresponding to $\mathbf{z}_1$ and $\mathbf{z}_2$, produces the evaluation of their midpoint $(\mathbf{z}_1 + \mathbf{z}_2)/2$. Since the fitness of this midpoint is worse than $f(\mathbf{z}_1)$ but not worse than $f(\mathbf{z}_2)$, the element $\mathbf{z}_2$ is considered to be part of the same concentration as $\mathbf{z}_1$. The same conclusion is obtained for $\Delta^{2}_{1,3}$ in the case of $\mathbf{z}_3$, after considering the corresponding midpoint $\mathbf{v}$. For $\Delta^{3}_{1,5}$, the midpoint $(\mathbf{z}_1 + \mathbf{z}_5)/2$ is produced. Since its fitness is worse than both $f(\mathbf{z}_1)$ and $f(\mathbf{z}_5)$, the element $\mathbf{z}_5$ is considered to be part of the concentration corresponding to the next optimum. The iterative process ends here, after assuming that the same result is produced with $\Delta^{4}_{1,4}$ and $\Delta^{5}_{1,6}$, for $\mathbf{z}_4$ and $\mathbf{z}_6$, respectively. Therefore, the depuration ratio $D_R$ is calculated as 85% of the distance $\Delta^{3}_{1,5}$. Once the elements inside $D_R$ have been removed from $\mathbf{Z}$, the same process is applied to the new $\mathbf{Z}$. As a result, the final configuration of the memory is shown in Figure 4(d).
Procedure 1:
(1) Define two new temporary vectors $\mathbf{Z}$ and $\mathbf{Y}$. The vector $\mathbf{Z}$ will hold the results of the iterative operations, whereas $\mathbf{Y}$ will contain the final memory configuration. The vector $\mathbf{Z}$ is initialized with the elements of $\mathbf{M}$ sorted according to their fitness values, so that the first element represents the best one. On the other hand, $\mathbf{Y}$ is initialized empty.
(2) Store the best element $\mathbf{z}_1$ of the current $\mathbf{Z}$ in $\mathbf{Y}$.
(3) Calculate the Euclidean distances $\Delta_{1,j}$ between $\mathbf{z}_1$ and the rest of the elements of $\mathbf{Z}$ ($j \in \{2, \ldots, |\mathbf{Z}|\}$), where $|\mathbf{Z}|$ represents the number of elements in $\mathbf{Z}$.
(4) Sort the distances $\Delta_{1,j}$ according to their magnitude. A new index $a$ is thereby attached to each distance, $\Delta^{a}_{1,j}$, where $a$ indicates the place of $\Delta_{1,j}$ after the sorting operation ($a = 1$ represents the shortest distance).
(5) Calculate the depuration ratio $D_R$:
    for $a = 1$ to $|\mathbf{Z}| - 1$
        Obtain the element $\mathbf{z}_j$ corresponding to the distance $\Delta^{a}_{1,j}$
        if ($f((\mathbf{z}_1 + \mathbf{z}_j)/2)$ is worse than both $f(\mathbf{z}_1)$ and $f(\mathbf{z}_j)$)
            $D_R = 0.85 \cdot \Delta^{a}_{1,j}$; exit the loop
        else if ($a = |\mathbf{Z}| - 1$)
            There is only one concentration
        end if
    end for
(6) Remove all elements inside $D_R$ from $\mathbf{Z}$.
Fragment of the MCS algorithm listing: if (… has changed) (12) $\mathbf{M} \leftarrow$ OperatorF($\mathbf{M}$) (Section 3.3) (13) end if (14) end until
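The depuration procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `depurate`, `two_wells`, and `shrink` are hypothetical, memory elements are assumed to be tuples of floats, the problem is assumed to be a minimization (so "worse" means a larger fitness value), and an element is taken to lie "inside" the depuration ratio when its distance to the best element does not exceed it.

```python
import math


def depurate(memory, f, shrink=0.85):
    """Keep one representative per concentration (minimization assumed)."""
    Z = sorted(memory, key=f)   # working vector, best element first
    Y = []                      # depurated memory
    while Z:
        best = Z[0]
        Y.append(best)
        # Remaining elements, ordered from the shortest distance to the longest.
        rest = sorted(Z[1:], key=lambda z: math.dist(best, z))
        ratio = None
        for z in rest:
            mid = tuple((a + b) / 2 for a, b in zip(best, z))
            # Midpoint worse than both ends: z belongs to the nearest concentration.
            if f(mid) > f(best) and f(mid) > f(z):
                ratio = shrink * math.dist(best, z)
                break
        if ratio is None:
            break               # special case: only one concentration is left
        # Remove everything inside the depuration ratio, then re-sort by fitness.
        Z = sorted((z for z in rest if math.dist(best, z) > ratio), key=f)
    return Y


def two_wells(p):
    """Hypothetical test function with minima at (-1, 0) and (1, 0)."""
    return (p[0] ** 2 - 1) ** 2 + p[1] ** 2
```

Applied to a memory holding three elements near each minimum of `two_wells`, the sketch keeps only the best element of each group, which mirrors the behavior illustrated in Figure 4.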

Experimental Results
This section presents the performance of the proposed algorithm. Section 4.1 describes the experimental methodology. For the sake of clarity, the results are divided into two sections, Section 4.2 and Section 4.3, which compare the MCS experimental results with the outcomes produced by other multimodal metaheuristic algorithms.

Experimental Methodology.
This section examines the performance of the proposed MCS by using a test suite of fourteen benchmark functions with different complexities. Table 3 in the appendix presents the benchmark functions used in our experimental study. In the table, NO indicates the number of optimal points in the function and $S$ indicates the search space (a subset of $\mathbb{R}^2$). The experimental suite contains representative, complicated multimodal functions with several local optima. Such functions are considered difficult to optimize, as they are particularly challenging to the applicability and efficiency of multimodal metaheuristic algorithms. A detailed description of each function is given in the appendix. In the study, five performance indexes are compared: the effective peak number (EPN), the maximum peak ratio (MPR), the peak accuracy (PA), the distance accuracy (DA), and the number of function evaluations (NFE). The first four indexes assess the accuracy of the solution, whereas the last measures the computational cost.
The effective peak number (EPN) expresses the number of detected peaks. An optimum $\mathbf{o}_j$ is considered as detected if the distance between the identified solution $\mathbf{z}_j$ and the optimum $\mathbf{o}_j$ is less than 0.01 ($\|\mathbf{o}_j - \mathbf{z}_j\| < 0.01$). The maximum peak ratio (MPR) is used to evaluate the quality and the number of identified optima. It is defined as follows:
$$\mathrm{MPR} = \frac{\sum_{i=1}^{t} f(\mathbf{z}_i)}{\sum_{j=1}^{q} f(\mathbf{o}_j)},$$
where $t$ represents the number of identified solutions (identified optima) for the algorithm under testing and $q$ represents the number of true optima contained in the function. The peak accuracy (PA) specifies the total error produced between the identified solutions and the true optima. Therefore, PA is calculated as follows:
$$\mathrm{PA} = \sum_{j=1}^{q} \left| f(\mathbf{o}_j) - f(\mathbf{z}_j) \right|.$$
Peak accuracy (PA) may lead to erroneous results, mainly if the peaks are close to each other or hold an identical height. Under such circumstances, the distance accuracy (DA) is used to avoid such error. DA is computed as PA, but fitness values are replaced by the Euclidean distance. DA is thus defined by the following model:
$$\mathrm{DA} = \sum_{j=1}^{q} \left\| \mathbf{o}_j - \mathbf{z}_j \right\|.$$
The number of function evaluations (NFE) indicates the total number of function computations performed by the algorithm under testing over the whole optimization process.
The experiments compare the performance of MCS against the crowding differential evolution (CDE) [22], the fitness sharing differential evolution (SDE) [21,22], the clearing procedure (CP) [26], the elitist-population strategy (AEGA) [28], the clonal selection algorithm (CSA) [30], and the artificial immune network (AiNet) [31].
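The four accuracy indexes defined in this section can be illustrated with a short Python sketch. It is not the evaluation code used in the study. The function names are hypothetical, solutions and optima are assumed to be tuples of floats, and, as an assumption not stated in the text, each true optimum is paired with its nearest identified solution before EPN, PA, and DA are computed.

```python
import math


def match_optima(optima, solutions):
    """Pair every true optimum with its nearest identified solution (assumption)."""
    return [(o, min(solutions, key=lambda z: math.dist(o, z))) for o in optima]


def epn(optima, solutions, tol=0.01):
    """Effective peak number: optima with an identified solution closer than tol."""
    return sum(1 for o, z in match_optima(optima, solutions) if math.dist(o, z) < tol)


def mpr(f, optima, solutions):
    """Maximum peak ratio: fitness sum of identified solutions over that of true optima."""
    return sum(f(z) for z in solutions) / sum(f(o) for o in optima)


def pa(f, optima, solutions):
    """Peak accuracy: accumulated absolute fitness error over the true optima."""
    return sum(abs(f(o) - f(z)) for o, z in match_optima(optima, solutions))


def da(optima, solutions):
    """Distance accuracy: accumulated Euclidean distance over the true optima."""
    return sum(math.dist(o, z) for o, z in match_optima(optima, solutions))
```

With this pairing rule an undetected optimum still contributes to PA and DA through its nearest solution, which is consistent with the observation that PA can change drastically when peaks are missed; other pairing rules are possible.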
Since the approach solves real-valued multimodal functions and a fair comparison must be assured, we have used for the GA approaches a consistent real coding variable representation and uniform operators for crossover and mutation. A crossover probability of 0.8 and a mutation probability of 0.1 have been used. We have employed the standard tournament selection operator with a tournament size of 2 for implementing the sequential fitness sharing, the clearing procedure, and the elitist-population strategy (AEGA). On the other hand, the parameter values for the AiNet algorithm have been defined as suggested in [31], with a mutation strength of 100, a suppression threshold of 0.2, and an update rate of 40%. Algorithms based on DE use a scaling factor of 0.5 and a crossover probability of 0.9. The crowding DE employs a crowding factor of CF = 50 and the sharing DE considers $\alpha = 1.0$ with a share radius of $\sigma_{share} = 0.1$.
In the case of the MCS algorithm, the parameters are set to $p_a = 0.25$, a population size of $N = 50$, and a number of generations of gen = 500. Once determined experimentally, these values are kept for all the test functions through all experiments.
To avoid relating the optimization results to the choice of a particular initial population and to conduct fair comparisons, we perform each test 50 times, starting from various randomly selected points in the search domain as it is commonly done in the literature.
All algorithms have been tested in MATLAB on the same Dell Optiplex GX520 computer with a 2.66 GHz Pentium 4 processor and 1 GB of memory, running the Windows XP operating system. The sections below present experimental results for multimodal optimization problems which have been divided into two groups. The first one considers functions $f_1$–$f_7$, while the second gathers functions $f_8$–$f_{14}$.

Comparing MCS Performance for Functions $f_1$–$f_7$
This section presents a performance comparison for different algorithms solving the multimodal problems $f_1$–$f_7$ that are shown in Table 3. The aim is to determine whether MCS is more efficient and effective than other existing algorithms for finding all multiple optima of $f_1$–$f_7$. All the algorithms employ a population size of 50 individuals using 500 successive generations. Table 1 provides a summarized performance comparison among several algorithms in terms of the effective peak number (EPN), the maximum peak ratio (MPR), the peak accuracy (PA), the distance accuracy (DA), and the number of function evaluations (NFE). The results are averaged over 50 different executions.
Considering the EPN index, for all functions $f_1$–$f_7$, MCS always performs better than or equal to the other algorithms. Analyzing the results of function $f_1$, the CDE, AEGA, and MCS algorithms reach all optima. In the case of function $f_2$, only CSA and AiNet have not been able to detect all the optima each time. Considering function $f_3$, only MCS can detect all optima at each run. In the case of function $f_4$, most of the algorithms detect only half of the total optima, but MCS can recognize most of them. Analyzing the results of function $f_5$, CDE, CP, CSA, and AiNet present a similar performance, whereas SDE, AEGA, and MCS obtain the best EPN values. In the case of $f_6$, almost all algorithms present a similar performance; however, only the CDE, CP, and MCS algorithms have been able to detect all optima. Considering function $f_7$, the MCS algorithm is able to detect most of the optima, whereas the rest of the methods reach different performance levels.
By analyzing the MPR index in Table 1, MCS has reached the best performance for all the multimodal problems. On the other hand, the rest of the algorithms present different accuracy levels, with CDE and SDE being the most consistent.
Considering the PA index, MCS presents the best performance. Since PA evaluates the accumulated differences of fitness values, it could change drastically when one or several peaks are not detected (function $f_3$) or when the function under testing presents peaks with high values (function $f_4$). For the case of the DA index in Table 1, it is evident that the MCS algorithm presents the best performance, providing the shortest distances among the detected optima.
Analyzing the NFE measure in Table 1, it is clear that CSA and AiNet need fewer function evaluations than the other algorithms under the same termination criterion. This is explained by the fact that neither algorithm implements any additional process in order to detect multiple optima. On the other hand, the MCS method maintains a slightly higher number of function evaluations than CSA and AiNet due to the inclusion of the depuration procedure. The rest of the algorithms present a considerably higher NFE value.
These results suggest that the MCS algorithm is able to produce better search locations (i.e., a better compromise between exploration and exploitation) more efficiently and effectively than other multimodal search strategies while using an acceptable number of function evaluations.

Comparing MCS Performance for Functions $f_8$–$f_{14}$
This section presents a performance comparison for different algorithms solving the multimodal problems $f_8$–$f_{14}$ that are shown in Table 3. The aim is to determine whether MCS is more efficient and effective than its competitors for finding multiple optima in $f_8$–$f_{14}$. All the algorithms employ a population size of 50 individuals using 500 successive generations. Table 2 provides a summarized performance comparison among several algorithms in terms of the effective peak number (EPN), the maximum peak ratio (MPR), the peak accuracy (PA), the distance accuracy (DA), and the number of function evaluations (NFE). The goal of multimodal optimizers is to find as many global optima and good local optima as possible. The main objective in these experiments is to determine whether MCS is able to find not only optima with prominent fitness values, but also optima with low fitness values.
Considering the EPN measure, it is observed that MCS finds more optimal solutions for the multimodal problems $f_8$–$f_{14}$ than the other methods. Analyzing function $f_8$, only MCS can detect all optima, whereas CP, AEGA, CSA, and AiNet exhibit the worst EPN performance.
Functions $f_9$–$f_{12}$ represent a set of special cases which contain a few prominent optima (with good fitness values). However, such functions also present several optima with bad fitness values. In these functions, MCS is able to detect the highest number of optimum points. On the contrary, the rest of the algorithms can find only prominent optima.
For function $f_{13}$, four algorithms (CDE, SDE, CP, and MCS) can recognize all optima in each execution. In the case of function $f_{14}$, numerous optima are featured with different fitness values. However, MCS can still detect most of the optima.
In terms of the maximum peak ratio (MPR), MCS has obtained the best score for all the multimodal problems. On the other hand, the rest of the algorithms present different accuracy levels.
A close inspection of Table 2 also reveals that the proposed MCS approach is able to achieve the smallest PA and DA values in comparison to all other methods.
Similar conclusions to those in Section 4.2 can be established regarding the number of function evaluations (NFE). All results demonstrate that MCS achieves the overall best balance in comparison to other algorithms, in terms of both the detection accuracy and the number of function evaluations.

Conclusions
The cuckoo search (CS) algorithm has recently been presented as a new heuristic algorithm with good results in real-valued optimization problems. In CS, individuals emulate eggs (contained in nests) which interact in a biological system by using evolutionary operations based on the breeding behavior of some cuckoo species. One of the most powerful features of CS is the use of Lévy flights to generate new candidate solutions. Under this approach, candidate solutions are modified by employing many small changes and occasionally large jumps. As a result, CS can substantially improve the relationship between exploration and exploitation, thereby enhancing its search capabilities. Despite such characteristics, the CS method still fails to provide multiple solutions in a single execution. In order to overcome this limitation, this paper proposes a new multimodal optimization algorithm called the multimodal cuckoo search (MCS). Under MCS, the original CS is enhanced with multimodal capacities by means of (1) incorporation of a memory mechanism to efficiently register potential local optima according to their fitness value and the distance to other potential solutions, (2) modification of the original CS individual selection strategy to accelerate the detection process of new local minima, and (3) inclusion of a depuration procedure to cyclically eliminate duplicated memory elements.
MCS has been experimentally evaluated over a test suite of fourteen benchmark multimodal functions. The performance of MCS has been compared to some other existing algorithms including the crowding differential evolution (CDE) [22], the fitness sharing differential evolution (SDE) [21,22], the clearing procedure (CP) [26], the elitist-population strategy (AEGA) [28], the clonal selection algorithm (CSA) [30], and the artificial immune network (AiNet) [31]. The experiments indicate that MCS generally outperforms the other multimodal metaheuristic algorithms in detection accuracy while requiring an acceptable number of function evaluations. The remarkable performance of MCS is explained by two different features: (i) operators (such as Lévy flight) allow a better exploration of the search space, increasing the capacity to find multiple optima, and (ii) the diversity of solutions contained in the memory $\mathbf{M}$ in the context of multimodal optimization is maintained and further improved through an efficient mechanism.