A Methodology for Classifying Search Operators as Intensification or Diversification Heuristics

Selection hyper-heuristics are generic search tools that dynamically choose, from a given pool, the most promising operator (low-level heuristic) to apply at each iteration of the search process. The performance of these methods depends on the quality of the heuristic pool. Two types of heuristics can be part of the pool: diversification heuristics, which help to escape from local optima, and intensification heuristics, which effectively exploit promising regions in the vicinity of good solutions. An effective search strategy needs a balance between these two strategies. However, it is not straightforward to categorize an operator as an intensification or diversification heuristic on complex domains. Therefore, we propose an automated methodology to do this classification. This brings methodological rigor to the configuration of an iterated local search hyper-heuristic featuring diversification and intensification stages. The methodology considers the empirical ranking of the heuristics based on an estimation of their capacity to either diversify or intensify the search. We incorporate the proposed approach into a state-of-the-art hyper-heuristic solving two domains: course timetabling and vehicle routing. Our results indicate improved performance, including new best-known solutions for the course timetabling problem.


Introduction
Hyper-heuristics are powerful tools for solving complex optimization problems [1][2][3]. The goal is to reduce the role of the human expert by means of more generally applicable search methodologies. Selection hyper-heuristics are high-level strategies that autonomously choose at run time the best-suited heuristic to apply at each step of the search process. A pool of heuristics to select from should be provided.
There are, however, no clear guidelines in the literature about how to construct a potentially successful pool of heuristics [4]. It is well known that successful heuristic search methods should have a dynamic balance between diversification and intensification [5,6]. Diversification refers to exploring new promising areas of the search space, whereas intensification refers to focusing the search by exploiting the area near current good solutions. Move operators can thus be either predominantly diversifying or intensifying. However, for complex domains and solution representations, it is not straightforward to categorize a given operator as belonging to one of these two categories.
We consider an iterated local search hyper-heuristic for testing our proposed methodology. Iterated local search (Section 3.3) is a simple yet powerful search strategy [7]; it works by iteratively alternating between a diversification stage and an intensification stage. This algorithmic template makes it clear where to implement move operators of each category. Several variants of iterated local search hyper-heuristics have been proposed with good results [8,9]. Soria-Alcaraz et al. [10] proposed a methodology for determining the best subset of heuristics from a given pool, but they considered that all operators were intensification heuristics and the diversification stage contained a single fixed heuristic.
This article proposes an empirical methodology to classify a given operator as an intensification or diversification heuristic. The aim is to produce an effective operator pool in an iterated local search hyper-heuristic framework.
The idea is to automatically divide the given complete set of operators into two groups: diversification and intensification. Then, these groups are assigned to the respective stages of the hyper-heuristic framework. To the best of our knowledge, this is the first time such a task has been addressed in the literature. Our proposal is to have a preprocessing step where, for a given problem domain, each heuristic in the pool is tested by means of a simple and fast probing technique, namely, a random walk with the given move operator. Based on this probing tool and statistical techniques, a set of measurements is collected indicating the effectiveness of each operator to either diversify or intensify the search. The proposed methodology is tested within a state-of-the-art iterated local search hyper-heuristic [10] on two problem domains with different representations, namely, course timetabling and vehicle routing. Our results indicate improved performance, including new best-known solutions for the course timetabling problem.
The next section introduces relevant concepts and algorithms. The proposed methodology is described in Section 3, detailing the distance metrics, ranking criteria, and high-level search method used. Sections 4 and 5 report the empirical setup, results, and analysis for the two selected case studies, course timetabling and vehicle routing, respectively. Finally, Section 6 summarizes our findings and gives suggestions for future work.

Low-Level Heuristics.
Heuristics are simple problem-solving techniques or "rules-of-thumb" that aim to produce good enough solutions in a reasonable time.
This study considers perturbative heuristics, also called move operators. Within a hyper-heuristic framework, these perturbative heuristics are called low-level, as they operate directly on the candidate solutions of the underlying optimization problem, leaving the higher-level decisions, such as which heuristic to apply next, to another mechanism. In this context, the low-level heuristics are below a domain barrier, allowing a higher-level strategy to operate on a wide range of problems. Low-level heuristics are thus simpler procedures that perform changes in the incumbent solution and inform a higher-level mechanism about their performance. Within a hyper-heuristic framework, not all move operators have the same role: some operators are aimed at intensifying the search around the incumbent solution, while others are aimed at exploring new regions of the search space with potentially better solutions.

Fitness Landscape Probing.
Fitness landscapes constitute a widely used metaphor to describe the dynamics of search and optimization algorithms. Formally, a fitness landscape [11] is a triplet $(S, N, f)$, where $S$ is a set of potential solutions, i.e., a search space; $N: S \to 2^S$, a neighborhood structure, is a function that assigns to every $s \in S$ a set of neighbors $N(s)$; and $f: S \to \mathbb{R}$ is a fitness function that can be pictured as the height of the corresponding solutions. When several move operators are considered (as is the case of hyper-heuristics), each of them will induce a different fitness landscape. A common technique to gather fitness landscape data is to conduct random walks. Formally, a random walk is a sequence of solutions $(s_0, s_1, \ldots, s_m)$ in which each $s_{i+1}$ is a randomly chosen neighbor of $s_i$, i.e., $s_{i+1} \in N(s_i)$.
Algorithm 1 outlines the random-walk probing technique we used to quantify the behavior of operators. The stopping condition is based on a fixed number of objective function calls. In line 3, a given low-level heuristic (move operator) is used to modify the incumbent solution, and after the stopping condition is reached, a function named metric is applied to compute a distance metric between the initial solution and the final solution.
The random-walk algorithm (Algorithm 1) is used to estimate the extent to which an operator changes the state of a solution. Several distance metrics (described in Section 3.1) are computed between the initial and the final solution of the walk and aggregated across several walks. The intuition is that the larger these measurements are, the more an operator is diversifying. These metrics are used both to rank the heuristics from the least to the most diversifying and to identify, by means of post hoc statistical tests, if it is possible to separate the heuristics into two sets.
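The probing idea can be illustrated with a minimal Python sketch. It assumes that solutions are lists of integers and that each step of the walk consumes one unit of the evaluation budget; the names `random_walk_probe`, `reassign_one`, and `average_distance`, as well as the toy move operator itself, are hypothetical and are not taken from Algorithm 1.

```python
import random
from typing import Callable, List, Sequence

Solution = List[int]


def random_walk_probe(
    initial: Solution,
    operator: Callable[[Solution, random.Random], Solution],
    metric: Callable[[Sequence[int], Sequence[int]], float],
    max_steps: int,
    rng: random.Random,
) -> float:
    """Apply `operator` repeatedly, then measure how far the walk ended from its start."""
    current = list(initial)
    for _ in range(max_steps):  # stands in for the budget of objective function calls
        current = operator(current, rng)
    return metric(initial, current)


def hamming(x: Sequence[int], y: Sequence[int]) -> int:
    """Number of positions at which the two solutions differ."""
    return sum(a != b for a, b in zip(x, y))


def reassign_one(solution: Solution, rng: random.Random, q: int = 10) -> Solution:
    """Hypothetical move operator: set one random position to a random value in 0..q-1."""
    neighbor = list(solution)
    neighbor[rng.randrange(len(neighbor))] = rng.randrange(q)
    return neighbor


def average_distance(operator, metric, n_walks, n_vars, max_steps, seed=0, q=10) -> float:
    """Average the end-to-end distance over several walks from random starting points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walks):
        start = [rng.randrange(q) for _ in range(n_vars)]
        total += random_walk_probe(start, operator, metric, max_steps, rng)
    return total / n_walks
```

Calling `average_distance(reassign_one, hamming, n_walks=20, n_vars=30, max_steps=50)` returns one aggregated measurement for the toy operator; a larger value would be read as more diversifying behavior.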

Hyper-Heuristics.
Hyper-heuristics, initially conceptualized as heuristics to choose heuristics in [1], can be seen as methodologies that reduce the need for a human expert in designing effective solution schemes and, consequently, raise the level of generality at which search methodologies can operate. A recent definition considers a hyper-heuristic as "automated methodologies for selecting or generating heuristics to solve computational search problems" [3]. Two types of hyper-heuristics have been studied in the literature:
(i) Heuristic selection: methodologies for choosing or selecting existing heuristics
(ii) Heuristic generation: methodologies for generating new heuristics from given components
This study focuses on the first type, i.e., selection hyper-heuristics. Figure 1 illustrates the traditional framework for selection hyper-heuristics, with the domain barrier insulating the high-level search strategy from the underlying problem domain. The framework requires a pool of low-level heuristics from which the high-level strategy selects and applies to the incumbent solution. It is worth mentioning, however, that low-level heuristics usually encapsulate domain-specific information.
When a hyper-heuristic uses feedback from the search process to rank the performance of the pool of heuristics, it can be considered as a learning algorithm. According to the source of the feedback during learning, we can distinguish between online and offline learning hyper-heuristics [2].

Complexity
Online learning or adaptation takes place on the fly while the algorithm is solving a problem instance, whereas offline learning requires a training process prior to the execution of the hyper-heuristic. In this work, we use online learning through adaptive operator selection.

Methodology
Once a pool of low-level heuristics is selected for the problem domain under consideration, our approach studies the behavior of each heuristic separately. This is done by conducting several runs of the random-walk algorithm (Algorithm 1) with each operator from a fixed set of initial solutions generated uniformly at random. These runs produce a set of measurements that are used to rank operators from the least perturbative to the most perturbative. The operators producing low distance measurements will be ranked at the top, while those producing high distance measurements will be ranked at the bottom. The top-ranked operators are categorized as intensification operators, as they are the least perturbative. Section 3.1 describes the distance metrics used to measure the behavior of heuristics. We considered two genotype-based metrics and one fitness-based metric. The genotype-based metrics measure the solution differences in representation space, while the fitness-based metric measures the solution difference in fitness value.
Once the measurements are taken, the operators are ranked using nonparametric ranking techniques. We use three statistical ranking methods: Friedman, aligned Friedman, and Quade, detailed in Section 3.2, following the guidelines in [12]. Thereafter, we apply post hoc procedures in order to check for statistically significant differences between pairs of operator rankings. We identify those heuristics that do not show a statistical difference in their ranking. Thereafter, we separate heuristics into two groups: intensification heuristics and diversification heuristics. Finally, we execute the high-level iterated local search hyper-heuristic using the two groups of heuristics identified. The sections below describe in more detail the components and steps of our proposed methodology.

Distance Metrics.
We consider three distance metrics; the first two operate at the solution (genotype) level, while the last metric simply calculates the fitness (objective function) difference.
Hamming distance: it measures the number of substitutions required to change one string into the other [13]. Formally, given two strings $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ of length $n$ over a $q$-ary alphabet $\{0, 1, \ldots, q-1\}$, it is the number of positions at which the strings differ, as described in the following equation:

$$d_H(x, y) = \left|\{\, i : x_i \neq y_i \,\}\right|. \qquad (1)$$

Lee distance: given two strings $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ of length $n$ over a $q$-ary alphabet $\{0, 1, \ldots, q-1\}$ of size $q \geq 2$ [14,15], this distance is defined as

$$d_L(x, y) = \sum_{i=1}^{n} \min\left(|x_i - y_i|,\; q - |x_i - y_i|\right). \qquad (2)$$

Fitness distance: it measures the difference in the fitness value between two given solutions using the absolute value operator. In our study, this metric evaluates how much the fitness of an initial solution is affected by a given heuristic. Equation (3) defines this metric. Given two strings $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ of length $n$ over a $q$-ary alphabet $\{0, 1, \ldots, q-1\}$,

$$d_F(x, y) = |f(x) - f(y)|, \qquad (3)$$

where $f$ is the fitness function. This function is domain-dependent and corresponds to a quality measurement of solutions, the objective of the optimization process.

Figure 1: General framework of a selection hyper-heuristic based on [1].

Algorithm 1 (random-walk probing). Require: metric: Solution × Solution ⟶ R; InitialSolution: a given solution to work with; h_i: heuristic or operator.
We selected the two distance metrics at the solution level as they are well known in the study of meta-heuristics and offer a simple yet descriptive approach to measure the variation induced by the different operators. The Hamming distance measures the number of raw differences between two given solutions. For example, let us consider two integer-based strings, namely, "5623" and "5827"; the Hamming distance between these strings is 2 since they differ in two locations. The Lee distance is more sophisticated since it reports not only how many differences there are between two given strings but also how large these differences are. Consider the same two strings, "5623" and "5827"; the Lee distance between them is 6 because |6 − 8| + |3 − 7| = 2 + 4 = 6. For binary strings, the Hamming and Lee distances produce the same values. In our study, solutions are represented as integer-based strings. The third metric gauges the solution differences in terms of quality or fitness; it is complementary to the two genotype-based metrics. The stopping condition of each random walk is fixed to 200 function calls. On each test instance, the random-walk algorithm is run 500 times from different randomly generated initial solutions. This produces 500 distance values for each metric, and the average distance values are calculated. Finally, these average values are used to rank the heuristics from the least to the most perturbative in the next phase.
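The worked example above can be checked with a short Python sketch of the three metrics. It assumes the standard Lee distance over a decimal alphabet (q = 10) for the digit strings; the function names are illustrative only.

```python
from typing import Callable, Sequence


def hamming_distance(x: Sequence[int], y: Sequence[int]) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(x, y))


def lee_distance(x: Sequence[int], y: Sequence[int], q: int) -> int:
    """Sum, over positions, of the shorter way around a cycle of q symbols."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    total = 0
    for a, b in zip(x, y):
        diff = abs(a - b)
        total += min(diff, q - diff)
    return total


def fitness_distance(x, y, fitness: Callable[[Sequence[int]], float]) -> float:
    """Absolute difference in fitness; `fitness` is the domain-dependent objective."""
    return abs(fitness(x) - fitness(y))


x = [5, 6, 2, 3]  # the string "5623"
y = [5, 8, 2, 7]  # the string "5827"
print(hamming_distance(x, y))    # 2
print(lee_distance(x, y, q=10))  # 6
```

With q = 2, `lee_distance` reduces to `hamming_distance`, which matches the remark about binary strings.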

Ranking Heuristics with Nonparametric Statistics.
Once the distance metrics are gathered, we apply statistical tests in order to rank the heuristics. Parametric tests are sometimes used when contrasting the performance of stochastic algorithms. However, they assume independence, normality, and homoscedasticity of the data, which are not guaranteed in the case of low-level heuristics. In such cases, nonparametric statistics overcome this limitation. We used CONTROLTEST and MULTITEST [12], which are specially designed for nonparametric comparison among heuristic algorithms, and considered the three tests proposed: Friedman, aligned Friedman, and Quade. The Friedman test relies mainly on the arithmetic mean of the ranks, aligned Friedman uses a value of location computed as the average performance achieved by all heuristics in each problem, and Quade considers that some instances might be more difficult than others. All these tests operate on ranks; therefore, a lower distance value produces a higher rank. In our context, this means that heuristics with less perturbative behavior (i.e., intensification heuristics) will be ranked at the top.
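To make the rank-based view concrete, the following Python sketch computes Friedman-style average ranks: heuristics are ranked within each instance by their distance value (smallest first, ties sharing the mean rank), and the ranks are averaged over instances. It covers only the ranking step, not the test statistic or the aligned Friedman and Quade variants, and the function names are hypothetical.

```python
from typing import Dict, List


def rank_row(values: List[float]) -> List[float]:
    """Ranks 1..k for one instance, smallest value first; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def friedman_average_ranks(distances: Dict[str, List[float]]) -> Dict[str, float]:
    """Average, over instances, of the within-instance rank of each heuristic."""
    names = list(distances)
    n_instances = len(distances[names[0]])
    totals = {name: 0.0 for name in names}
    for inst in range(n_instances):
        row = [distances[name][inst] for name in names]
        for name, rank in zip(names, rank_row(row)):
            totals[name] += rank
    return {name: totals[name] / n_instances for name in names}
```

A heuristic whose average rank is close to 1 is the least perturbative on most instances and would be placed at the top of the ranking.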
We applied post hoc tests in order to determine if two operators are similar in terms of their perturbative behavior. The null hypothesis is that the two operators do not differ in this respect. A p value under 0.05 (0.95 confidence level) rejects this null hypothesis, indicating that the given pair of operators are significantly different in their perturbative behavior. Therefore, those operators are not grouped together. Operator pairs for which the p value is greater than 0.05 are classified in the same group. We applied the Holm procedure to adjust the p values. In our experiments, only two well-defined groups were detected by the post hoc tests. We defined the group including the top-ranking heuristics as the intensification group and the other group as the diversification group.
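The grouping step can be sketched in Python as follows. The Holm adjustment is the standard step-down procedure; treating "not significantly different" pairs as links and taking connected components as groups is an assumption of this sketch, since the text does not spell out the exact grouping rule. The function names and any p values passed in are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def holm_adjust(p_values: Dict[Pair, float]) -> Dict[Pair, float]:
    """Holm step-down adjustment: scale sorted p values by m, m-1, ... and keep them monotone."""
    items = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(items)
    adjusted: Dict[Pair, float] = {}
    running_max = 0.0
    for i, (pair, p) in enumerate(items):
        running_max = max(running_max, min(1.0, (m - i) * p))
        adjusted[pair] = running_max
    return adjusted


def group_operators(names: List[str], adjusted: Dict[Pair, float], alpha: float = 0.05) -> List[List[str]]:
    """Join operators whose pairwise adjusted p value exceeds alpha (assumed grouping rule)."""
    parent = {name: name for name in names}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for (a, b), p in adjusted.items():
        if p > alpha:  # null hypothesis not rejected: same group
            parent[find(a)] = find(b)
    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())
```

With four operators where only the pairs (A, B) and (C, D) keep adjusted p values above 0.05, the sketch returns the two groups {A, B} and {C, D}.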

Iterated Local Search Hyper-Heuristic.
In order to test the performance of the previously defined intensification and diversification groups of heuristics, we follow the experimental setup in [10] with some adjustments detailed in this section. We also provide a description of the high-level strategy and adaptive operator selection mechanism used. The pool of heuristics is problem specific, and the heuristics are described in the respective case study sections.

High-Level Strategy.
We utilize iterated local search (ILS) as the high-level strategy [8,10,16]. Iterated local search is a simple yet effective strategy, which works by iteratively alternating between an exploration move (diversification) and an exploitation move (intensification) from the perturbed solution [7]. With this search strategy, it is straightforward to identify where to apply the intensification and diversification heuristic groups.
Our implementation is outlined in Algorithm 2. Two independent adaptive operator selection steps are used: one in the local search phase (lines 2 and 5) using the intensification group of heuristics and another in the perturbation phase (line 4) using the diversification group. This implementation differs from our previous ILS hyper-heuristic [9,10] in that adaptive operator selection is used in both algorithm stages instead of only in the intensification stage. The comparison in line 6 must be set to > for maximization problems or < for minimization problems.
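A compact Python sketch of this two-stage template is given below. It is not a transcription of Algorithm 2: it assumes a single operator application per stage, uses a uniform random choice (`pick_uniform`) as a placeholder where the adaptive operator selection would go, and runs on a toy minimization problem with two hypothetical operators.

```python
import random
from typing import Callable, List

Solution = List[int]
Operator = Callable[[Solution, random.Random], Solution]


def pick_uniform(pool: List[Operator], rng: random.Random) -> Operator:
    """Placeholder selection rule; an adaptive rule would be plugged in here."""
    return rng.choice(pool)


def iterated_local_search(initial, fitness, intensifiers, diversifiers,
                          iterations, rng, select=pick_uniform) -> Solution:
    """Alternate perturbation and local search, keeping the incumbent on no improvement."""
    best = select(intensifiers, rng)(list(initial), rng)       # initial local search
    for _ in range(iterations):
        perturbed = select(diversifiers, rng)(best, rng)       # diversification stage
        candidate = select(intensifiers, rng)(perturbed, rng)  # intensification stage
        if fitness(candidate) < fitness(best):                 # '<' minimizes; use '>' to maximize
            best = candidate
    return best


def decrement_one(s: Solution, rng: random.Random) -> Solution:
    """Toy intensifier: lower one random entry by 1 if it is positive."""
    t = list(s)
    i = rng.randrange(len(t))
    if t[i] > 0:
        t[i] -= 1
    return t


def scramble_two(s: Solution, rng: random.Random) -> Solution:
    """Toy diversifier: overwrite two random entries with random digits."""
    t = list(s)
    for i in rng.sample(range(len(t)), 2):
        t[i] = rng.randrange(10)
    return t


rng = random.Random(0)
start = [9] * 6
result = iterated_local_search(start, sum, [decrement_one], [scramble_two], 200, rng)
print(sum(start), "->", sum(result))
```

Because the incumbent is replaced only on strict improvement, the returned solution is never worse than the one produced by the first local search step.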

Adaptive Operator Selection.
Adaptive operator selection [17,18] allows high-level algorithms to autonomously select the next heuristic to apply to the incumbent solution. Two cooperating mechanisms are required: a selection rule, which defines how to select the next operator or low-level heuristic from the pool according to their estimated qualities, and credit assignment, which defines how to estimate the operators' quality based on the impact brought by their recent application. The mechanisms we implemented are described in detail as follows.

Selection rule: we use dynamic multiarmed bandits (DMAB) [19] as the selection rule, where each operator is viewed as an arm. Let $l_{i,t}$ denote the number of times the $i$th arm (heuristic) has been played and $r_{i,t}$ the average reward it has received up to time $t$. At each time step $t$, from $K$ alternative arms (heuristics), the algorithm selects the arm maximizing the quantity computed by the following expression:

$$r_{i,t} + C \sqrt{\frac{2 \log \sum_{k=1}^{K} l_{k,t}}{l_{i,t}}}. \qquad (4)$$

Factor $C$ is used to balance the inner exploration and exploitation phases. The DMAB algorithm uses a Page-Hinkley statistical test, where two parameters are introduced: $c_{ph}$, which controls the trade-off between false alarms and unnoticed changes, and $\delta$, which enforces the robustness when dealing with slowly varying environments. Parameters $C$ and $c_{ph}$ need to be tuned for every problem. We found in preliminary experiments that the values $C = 10$ and $c_{ph} = 100$ obtain encouraging results consistently. For the parameter $\delta$, we used the value suggested in [18] ($\delta = 0.15$) for all our experiments.

Credit assignment: we use extreme value criteria for determining the operator's credit [17,18]. When a heuristic op is selected by the selection rule and applied to the current solution, its reward value must be updated with the most recent behavior information. Rewards are updated as follows depending on the ILS phase (diversification or intensification).
For the intensification phase, the fitness of the new solution is computed and the change in fitness $\Delta f$ is added to a FIFO list of size $W$. For the diversification phase, the Hamming distance between the new solution and the initial one is calculated and added to a FIFO list of size $W$. A separate list is kept for each operator in both phases. The FIFO data structure guarantees that only the latest $W$ observations of fitness improvement or Hamming distance are considered in operator selection computations; older observations are erased from the credit assignment memory. We kept a list for each operator in order to identify which operator has achieved the best performance in the last $W$ iterations.
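Thereafter, the specific operator reward is updated to the maximal value in the list. Formally, let $t$ be the current step and $\mathrm{metric}(t)$ be the metric value ($\Delta f$ or Hamming) estimated at time $t$ for a given heuristic op; the expected reward $r_t$ for heuristic op is computed using the following equation:

$$r_t = \max\{\mathrm{metric}(t_i),\; i = 1, \ldots, W\}, \qquad (5)$$

where $t_1, \ldots, t_W$ are the time steps of the $W$ most recent observations stored for op.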
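The interplay between the selection rule and the extreme value credit can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it uses an upper-confidence-bound score with scaling factor C, takes the maximum of a sliding window of size W directly as the reward estimate, and omits the Page-Hinkley restart of DMAB entirely. The class name `ExtremeValueBandit` and its methods are hypothetical.

```python
import math
from collections import deque
from typing import Dict, List


class ExtremeValueBandit:
    """UCB-style operator selection with a sliding-window maximum as reward."""

    def __init__(self, names: List[str], scaling_c: float = 10.0, window: int = 5):
        self.c = scaling_c
        self.plays: Dict[str, int] = {n: 0 for n in names}       # times each arm was played
        self.reward: Dict[str, float] = {n: 0.0 for n in names}  # current reward estimate
        self.windows = {n: deque(maxlen=window) for n in names}  # FIFO list of size W per arm

    def select(self) -> str:
        """Play every arm once, then pick the arm with the largest UCB score."""
        for name, count in self.plays.items():
            if count == 0:
                return name
        total = sum(self.plays.values())
        return max(
            self.plays,
            key=lambda n: self.reward[n]
            + self.c * math.sqrt(2.0 * math.log(total) / self.plays[n]),
        )

    def update(self, name: str, metric_value: float) -> None:
        """Store the latest observation and set the reward to the window maximum."""
        self.windows[name].append(metric_value)  # oldest value drops out automatically
        self.plays[name] += 1
        self.reward[name] = max(self.windows[name])
```

In an ILS hyper-heuristic, one such object would be kept per stage: the intensification instance would be fed fitness changes and the diversification instance Hamming distances.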

Course Timetabling Case Study
We first apply the proposed methodology to the course timetabling problem. This problem requires the assignment of a fixed number of subjects into a number of time slots. The objective is to obtain a timetable minimizing the number of conflicts. Our formulation uses a generic modeling approach where solutions are represented as vectors of integer numbers of length equal to the number of events (courses) [9]. As test instances, we use the 24 instances from the International Timetabling Competition (ITC) 2007 track 2 (postenrollment course timetabling) [20]. Many metaheuristic [21,22] and hyper-heuristic [9,10] approaches have been proposed for solving variants of educational timetabling. Recent surveys have also been published [4,23].
Here, we improve the performance of hyper-heuristics by incorporating our automated approach for categorizing low-level heuristics into intensification and diversification groups.

Low-Level Heuristics.
Our implementation considers the following set of five low-level heuristics [10]. They range from simple randomized exchange or swap neighborhoods to greedy and more informed procedures:
MLC (move to less conflict): it locates the variable producing the most conflicts and changes its value to that causing the minimum possible conflict
BSP (best single perturbation): it chooses a variable following a sequential order and changes its value to that producing the minimum conflict
WMLC (worst move to less conflict): it locates the variable and value that, once modified, cause the least conflict
MLS (move to less size): it changes the value of a given variable to that causing the event to move to the least occupied time slot
Two points: it selects uniformly at random two indexes in the integer string representation and modifies all variables between the indexes randomly

Table 1 shows the nonparametric ranking of the heuristics according to the genotypic and fitness distance metrics. The ranking indicates that closeness in genotypic space correlates with closeness in fitness space, a desirable property for heuristic search methods.

Hyper-Heuristic Performance Comparison.
…
(7) s* = s*′
(8) end if
(9) end while
(10) return s*
ALGORITHM 2: High-level strategy: iterated local search.

The selected groups of heuristics are then deployed within the iterated local search hyper-heuristic framework described in Section 3.3. Following the ITC-2007 competition rules, each hyper-heuristic variant is run 10 times and the stopping condition corresponds to a time limit of about 10 minutes, as determined by the benchmark algorithm provided on the competition website (http://www.cs.qub.ac.uk/itc2007/). We compare our approach against the following methods:
Cambazard: the winner of the ITC-2007 competition [24], a multistage local search algorithm considering several neighborhoods
Ceschia: a single-step meta-heuristic approach based on simulated annealing, with a neighborhood composed of moves that reschedule one event or swap two events [21]
AdapExAP: an adaptive iterated local search hyper-heuristic coupled with an adaptive mechanism based on the adaptive pursuit selection rule [9]
Goh: an iterative two-stage algorithm that uses tabu search and simulated annealing [25]
Nagata: a local search-based algorithm with a mechanism for adapting the size of the search neighborhood [26]
HHADL: an iterated local search hyper-heuristic with Add-Delete list, which generates heuristics based on a fixed number of add and delete operations [27]
HHDMAB: an iterated local search with dynamic multiarmed bandits, which selects from a pool of heuristics using an autonomous strategy [10]

The main difference between our proposal and other recent methodologies is the application of a categorization process to a predefined set of low-level heuristics. This categorization, detailed in Section 3, leads us to the empirical construction of a reasonably good group of heuristics to use as intensification and diversification operators in a selection hyper-heuristic approach. Our selection hyper-heuristic has an empirically effective group of operators determined prior to any exhaustive experimentation; this characteristic enhances the performance of our approach compared with other methodologies configured solely through human expertise. Table 3 shows comparative results of our proposed approach HH2DMAB against state-of-the-art solvers.
The evaluation of the best-found solutions is shown, with the average and standard deviation results reported in brackets in the form (s σ), when available. The best solutions are given in bold font.
Configurations designed by human experts are represented by the other entries in Table 3. Our approach offers an automated operator grouping and selection of the heuristics with no expert knowledge required. Results for instances 3 and 23 are new best-known solutions found by our approach. Consistently, our iterated local search hyper-heuristic with automatic selection of heuristic groups, HH2DMAB, presents lower average and standard deviation values than previous approaches. We argue that this improved performance is due to having additional heuristics at the diversification stage, which gives the algorithm more alternatives to escape from local optima. Figures 2 and 3 show the dynamics of selection probabilities for the intensification group (a) and diversification group (b) of heuristics during a HH2DMAB run on a selected instance (instance 2). In Figures 2 and 3, the X-axis shows the iterations of each run (×1000) and the Y-axis shows the probability of selection of each heuristic (colored line); the sum of selection probabilities for all heuristics is 1 at each iteration. In the intensification group (Figure 2), the heuristics WMLC and MLC are selected more frequently across the run than the third heuristic, BSP, and WMLC and MLC dominate at different stages of the run. The diversification group dynamics (Figure 3) show that both heuristics are useful during the search, with heuristic MLS having prominence.

Vehicle Routing Case Study
In order to illustrate the generality of the proposed methodology, we considered a second case study, the vehicle routing problem with time windows (VRPTW). In this problem, a set of customer demands must be addressed using as few vehicles as possible. The time window constraint indicates that each customer demand can only be served within a given time window. The formulation and experimental setting follow the rules of the Cross-Domain Heuristic Search Competition (CHeSC) 2011 [28]. CHeSC instances were taken from [29] and include 5 from the Solomon data set and 5 from the Gehring and Homberger data set.

Low-Level Heuristics.
Our implementation uses the following four heuristics:
TimeRR: it removes a number of locations based on time window proximity, reinserting into the best route possible
TwoOptStar: it takes the end sections of two routes and swaps them to create two new routes
locRR: it removes a number of locations based on location proximity, reinserting into the best route possible
ShiftMutate: it moves a single location from one route to another

Ranking and Grouping of the Low-Level Heuristics.
The experimental design resembles that of the previous case study. Each sampling procedure is executed 500 times, using 200 function calls for every instance-heuristic pair. The time spent in this phase was about 37 minutes on an Intel Core i7 machine with 8 GB of RAM running Linux, with the implementation in Java. Table 4 shows the collected metric rankings.
As Table 4 indicates, heuristics locRR and TimeRR are the top two according to all metrics. This suggests they are more suitable as intensification heuristics. Heuristics TwoOpt* and Shift are the bottom two in the rank for all metrics. Table 5 shows the post hoc tests for heuristic pairs; the null hypothesis states that between two heuristics (row and column), there is no significant difference in terms of perturbative behavior. A p value less than 0.05 means the null hypothesis is rejected. According to this evidence, the groups {locRR, TimeRR} (intensification) and {TwoOpt*, Shift} (diversification) are defined.

Hyper-Heuristic Performance Comparison.
The selected groups of heuristics are deployed within the iterated local search hyper-heuristic framework. In order to compare its performance, we consider the contestants in the Cross-Domain Heuristic Search Competition (CHeSC) 2011 (http://www.asap.cs.nott.ac.uk/external/chesc2011/results.html). Following the competition rules, 31 runs are conducted, each lasting 600 CPU seconds according to the benchmark tool provided by the competition webpage. The algorithms are ranked according to their median performance and receive points according to a system inspired by Formula 1 [28]. The top eight performing algorithms receive 10, 8, 6, 5, 4, 3, 2, and 1 point, respectively. In case of ties, the points of the concerned positions are summed and equally shared.
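The tie-sharing rule can be made concrete with a small Python sketch for a single instance. It assumes that a lower median objective value is better; the function name `award_points` and the example values are hypothetical.

```python
from typing import Dict

POINTS = [10, 8, 6, 5, 4, 3, 2, 1]  # points for positions 1..8


def award_points(medians: Dict[str, float]) -> Dict[str, float]:
    """Score algorithms on one instance; tied algorithms share the points of their positions."""
    ordered = sorted(medians, key=medians.get)  # best (lowest median) first
    scores = {name: 0.0 for name in medians}
    pos = 0
    while pos < len(ordered):
        end = pos
        while end + 1 < len(ordered) and medians[ordered[end + 1]] == medians[ordered[pos]]:
            end += 1
        # Sum the points of the tied positions (positions beyond the eighth earn nothing).
        pool = sum(POINTS[p] for p in range(pos, end + 1) if p < len(POINTS))
        share = pool / (end - pos + 1)
        for j in range(pos, end + 1):
            scores[ordered[j]] = share
        pos = end + 1
    return scores


print(award_points({"A": 1.0, "B": 2.0, "C": 2.0, "D": 3.0}))
# {'A': 10.0, 'B': 7.0, 'C': 7.0, 'D': 5.0}
```

Here B and C tie for second place, so they split the points of positions 2 and 3, (8 + 6) / 2 = 7 each. The total score of an algorithm would be the sum of such per-instance scores.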
According to this scoring system, our method HH2DMAB achieves 28 points (Table 6), which represents a tie for 2nd position when compared against CHeSC contestants [30]. Moreover, HH2DMAB achieves better performance when compared with our previous state-of-the-art HHDMAB [10] hyper-heuristic, which obtained the 3rd position. Again, the evidence suggests that our current approach with more heuristics at the diversification stage allows the hyper-heuristic to achieve better results.
Our selection hyper-heuristic was enhanced by the application of an a priori phase where the diversification and intensification operators were selected using the methodology detailed in Section 3. This allowed our approach to work with an empirically well-selected group of operators. Table 7 contrasts the best results obtained by HH2DMAB against the previous HHDMAB hyper-heuristic [10] and the best-known results for each instance. Consistently, HH2DMAB outperforms the previous HHDMAB and produces results that are close to the best-known solutions. This is a good result since the best-known solutions are achieved by problem-specific algorithms. Our approach, instead, is a more general methodology that usually requires only changing the set of low-level heuristics to address other problem domains.

Conclusions
We have proposed an empirical methodology to classify a set of operators into intensification and diversification heuristics, to be used within hyper-heuristic methods.
Starting from a suitable set of low-level heuristics or search operators for a given domain, the proposed methodology probes their performance using fitness landscape probing and distance metrics and ranks the heuristics through nonparametric statistics instead of human expertise. This contributes to increasing the methodological rigor and automation for deploying hyper-heuristic approaches. Our methodology was tested within a state-of-the-art hyper-heuristic framework over two complex combinatorial optimization problems, namely, course timetabling and the vehicle routing problem with time windows, achieving new best-known solutions for ITC 2007 track 2 course timetabling instances and better results than previous studies in the case of CHeSC vehicle routing instances. Our results indicate improved hyper-heuristic performance on both domains when our methodology is used to empirically identify a group of intensification and diversification operators.
This suggests that well-designed hyper-heuristic methods are not only more general but can also be more effective than problem-specific meta-heuristics. In the future, we will investigate additional ranking methods and methodologies to automatically tune hyper-heuristic parameters. Finally, it is necessary to test this approach on other problem domains and heuristic pools.

Data Availability
The result data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.