A Methodology to Determine the Subset of Heuristics for Hyperheuristics through Metalearning for Solving Graph Coloring and Capacitated Vehicle Routing Problems



Introduction
In Computer Science, a heuristic is a technique designed to solve a problem when classical methods fail to find an exact solution or are too slow. Currently, there is great interest from the scientific community in offering ad hoc heuristic solutions for real-world optimization problems. Achieving this requires a priori knowledge of the problem, and it often yields computationally efficient solutions in a reasonable time. However, the no free lunch theorem [1] states that no single methodology or algorithm can solve all problems; that is, ad hoc heuristics are usually not generalizable, and they do not always work well when applied to other problems, even when those problems share similar characteristics. This fact has led research efforts towards the development of general-purpose search methodologies known as hyperheuristics, whose main characteristic is that they are independent of the problem domain.
Hyperheuristics can be classified according to their learning methods: no learning, online learning [2], and offline learning [3]. In the context of combinatorial optimization, hyperheuristics are defined as "heuristics to choose heuristics" [4], or as "an automated methodology for selecting or generating heuristics to solve computational search problems" [4]. According to Pillay [2], the generality of a hyperheuristic can be seen at three levels: generalization over instances of a problem, generalization for a particular problem, and generalization across different types of problems, the latter being the highest level. Variations of hyperheuristics depend on the type of learning used (e.g., online learning [2] or offline learning [3]) or on the nature of the heuristics.
One of the main problems in hyperheuristics is to propose methodologies for generating and/or selecting the minimum set of heuristics that performs well for the problem at hand; this heuristic set is usually selected by expert researchers in the field [2]. In order to automatically select the heuristic that performs best for the problem, an approach called metalearning was proposed [5,6], and its use in hyperheuristics can be found in Amaya et al. [7]. Likewise, the different meanings and taxonomies for each interaction between metalearning and optimization were studied by Song et al. [8].
As there is no methodology or algorithm that can solve all problems, our objective is to draw on information about the problem and the performance of the algorithms and to provide this knowledge to hyperheuristics. Metalearning generates metaknowledge, and we use it to select better heuristics for solving problems. With this approach, we aim to propose methodologies that resemble human intelligence: humans learn from problems, their characteristics, their variables, and their restrictions, and after analysis and discernment, the "human expert" proposes the best tool and solves the problem.
In this paper, we propose a methodology to determine a subset of heuristics for hyperheuristics through metalearning and partitioning, for solving different problems (described below) without ad hoc adjustments, by providing information about the problem and the performance of the heuristics to the hyperheuristic. It is well known that correct characterization is the key to selecting the best heuristic [6]. Consequently, this affects the hyperheuristic design, which is why our approach uses offline learning. Metalearning consists of two basic parts, the metafeatures and the metalearner; the former are generated from information about the problem and the solution algorithms, while the latter uses a grouping technique. Our methodology extends beyond the classic metalearner approach: we apply nonparametric statistical tests to determine which heuristics provide the same performance as applying the full set of heuristics.
In order to test our proposal, we used two different well-known problem domains: the capacitated vehicle routing problem and the graph coloring problem. The capacitated vehicle routing problem (CVRP) has different restrictions, such as minimizing distance, time, capacity, and delivery. This problem aims to find subtours of n cities, without repeating a city on the same tour or across different tours. In the CVRP state of the art, there exist different variants that build on this basic definition with extra restrictions. On the other hand, the graph coloring problem consists of labeling each vertex of a given graph with k colors; it is a well-known problem that has been solved by exact methods, heuristics, metaheuristics, and hyperheuristics. Although each problem can be solved by ad hoc heuristics, to date, there is no general methodology capable of solving all variants of both problems. The use of partitions for constraint satisfaction problems, such as university timetabling and VRP, has had good results [9,10]. Although there are taxonomies of the said problems, characterizing and classifying the instances under a hyperheuristic context with metalearning and a statistical test is an approach that has not been explored in the literature.
Finally, it is worth mentioning that, to achieve a better design or choice of hyperheuristics and to predict the best algorithm according to the classification of a previous instance, we propose to use metalearning with a statistical analysis of the heuristics, which allows improving both points. Our proposal also provides information that helps us understand the performance of heuristics and hyperheuristics on the problem of interest. The remainder of this article contains a description of related work in Section 2, which covers a review of heuristics, hyperheuristics, and metalearning. Problem definitions and theory related to heuristics are reported in Section 3. Sections 4 and 5 present the proposed methodology. The results and findings, including a performance comparison, are described in Section 6. Finally, concluding remarks are presented in Section 7.

Related Work
In this section, we give an overview of heuristic and hyperheuristic algorithms, define some basic concepts of metalearning, and, finally, discuss the pros and cons of the presented methods.

Heuristics.
We made an extensive review of different heuristics applied to the CVRP and the GCP. After a preliminary experimental analysis, we selected a total of 11 heuristics that apply to both problems; we list them below.
The k-flip or k-opt heuristic was proposed by Lin and Kernighan [11] for the traveling salesman problem (TSP).
This heuristic is based on the general interchange transformation, i.e., a city exchanges its position with another city on the same tour. It is one of the most popular heuristics for the TSP [12]; it has also been applied to other problems such as planar graphs and unconstrained binary quadratic programming, and its complexity has been studied for SAT and MAX-SAT. The two-point perturbation is a special case of k-flip; we give a detailed description and algorithm for these heuristics in the following sections. The k-swap heuristic is similar to, and frequently confused with, k-flip. The k-swap heuristic improves its performance as a perturbation move when it uses two or three movements [13]. The move to less conflict heuristic, also known as minimizing conflicts, was proposed by Minton et al. [14]. It has been applied to different areas of Computer Science such as hyperheuristics, graph coloring problems, pickup-and-delivery problems, and scheduling problems. The move to less conflict heuristic is a variant of first fit; the only difference is that the former takes a random variable and changes its value to another that generates the least cost. The first-fit heuristic was studied by Baker [15] for the bin-packing problem. In recent decades, this heuristic has been applied to well-known problems such as bin packing, the virtual machine relocation problem, and cutting stock. A notable variant is worst fit, which was studied by Baker [15] and Csirik [16], in particular its application to the bin-packing problem.
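To make the interchange idea concrete, the following minimal Python sketch (our own illustration, not code from [11,12]) implements a 2-opt style move, the k = 2 member of this family, over a tour and a distance matrix:

```python
def two_opt_move(tour, i, j):
    """Return a new tour with the segment tour[i:j+1] reversed (a 2-opt move)."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def tour_length(tour, dist):
    """Total length of a closed tour under a distance matrix."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))

def two_opt_improve(tour, dist):
    """Apply the best improving single 2-opt move, if any exists."""
    best, best_len = tour, tour_length(tour, dist)
    for i in range(1, len(tour) - 1):
        for j in range(i + 1, len(tour)):
            cand = two_opt_move(tour, i, j)
            cand_len = tour_length(cand, dist)
            if cand_len < best_len:
                best, best_len = cand, cand_len
    return best
```

For example, starting from the crossing tour [0, 2, 1, 3] on a small symmetric instance, a single segment reversal recovers the cheaper tour [0, 1, 2, 3].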
Soria-Alcaraz et al. [17] proposed three heuristics for university course timetabling: best single perturbation (BSP), static-dynamic perturbation (SDP), and double dynamic perturbation, as part of the pool of low-level heuristics for a hyperheuristic. Moreover, these heuristics were applied to the VRP in later research [18].

Hyperheuristics.
We focus on offline learning selection hyperheuristics with perturbation heuristics, whose aim is to gather knowledge, in the form of rules or programs, from training set instances. Usually, offline selection hyperheuristics rely on machine learning methods, which are trained to create a tuned methodology for a problem domain [3]. Yates and Keedwell [19] demonstrated that subsequences of heuristics found in an offline learning database are effective for some problem domains.
They used an Elman network to compute sequences of heuristics, which were evaluated on unseen HyFlex example problems; the resulting sequences are capable of intradomain learning and generalization with 99% confidence.
One of the crucial issues in hyperheuristic design is the quality and size of the heuristic pool [20]. Soria-Alcaraz et al. [20] proposed a methodology using nonparametric statistics and fitness landscape measurements for hyperheuristic design. This methodology was tested on course timetabling and vehicle routing problems; their hyperheuristic proposal had a compact heuristic pool and competed with some traditional methods in course timetabling. In the course timetabling problem, they obtained five best-known solutions out of 24 PATAT instances [21]. Finally, a recent report by Amaya et al. [7] documented a model for creating selection hyperheuristics with constructive heuristics. The effectiveness of the model proposed by Amaya depends on the delta values used; it is most useful with higher deltas.

Metalearning.
The relationship between metalearning, machine learning, and optimization has been studied by Song et al. [8]. The aim of metalearning is to accumulate and adapt experience on the performance of multiple applications of a learning system. The metalearning field is also known as "learning to learn" [22], and it brings systems that can help by searching for patterns across different tasks to control the process of exploiting cumulative expertise. The metalearning concept has been present in the field of heuristics and metaheuristics for the TSP [23], the quadratic assignment problem, and hyperheuristics.
On the other hand, Gutierrez-Rodríguez et al. [23] worked on the VRP with time windows and proposed a methodology based on metalearning to select the best metaheuristic for each instance. Their main contributions were to propose a set of features for characterizing VRPTW instances and to design a classification process that predicts the most suitable metaheuristic for each instance. Nevertheless, they assumed that the solutions of the instance set, from academia and industry, could be stored, shared, and exploited in an offline scheme for predicting good solvers for new unseen instances. The aim of this paper is not to present a survey on heuristics or hyperheuristics; our proposal is slightly different. Our proposal considers some vital aspects of prior research, including the work of Yates and Keedwell [19]. We took the offline hyperheuristic approach from Soria-Alcaraz et al. [20] and the statistical approach to selecting a heuristic pool from Kanda et al. [5]. The offline hyperheuristic approach is an effective and popular method in the machine learning area [8], and the statistical approach to selecting a heuristic pool is useful and reliable because it draws statistical information from the input data.

Combinatorial Problems
Our methodology is a general approach aiming for competitive performance across several classes of problems. Thus, we used two problem domains: the graph coloring and vehicle routing problems. In the following sections, we review the formal definition of each of these problems as well as their benchmark instances.

Graph Coloring Problem.
The graph coloring problem was shown to be NP-hard by Karp [24]. According to [25], a vertex coloring of a graph G = (V, E) is a function c: V → N that assigns different colors to any two adjacent vertices u, v ∈ V, that is, c(u) ≠ c(v) for every edge (u, v) ∈ E, where E is a finite set of unordered pairs of vertices called edges. The function c is the coloring function, and a graph G for which there exists a vertex coloring that requires k colors is called k-colorable. The coloring function induces a partition of the graph G into independent subsets V_1, ..., V_k. The benchmark instances can be found at http://mat.gsia.cmu.edu/COLOR/instances.html. The above lets the partitioning methodology work on the input design, and it is possible to avoid ad hoc modifications to the heuristics, since only a different objective function that adequately evaluates the instances of this problem needs to be passed.
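The definition above translates directly into code; the sketch below (an illustrative Python example with assumed function names) checks the condition c(u) ≠ c(v) over the edges and builds a coloring with the classic greedy scheme:

```python
def is_proper_coloring(edges, coloring):
    """c(u) != c(v) must hold for every edge (u, v)."""
    return all(coloring[u] != coloring[v] for u, v in edges)

def greedy_coloring(n_vertices, edges):
    """Color vertices in index order with the smallest color unused by a neighbor."""
    adj = {v: set() for v in range(n_vertices)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    coloring = {}
    for v in range(n_vertices):
        used = {coloring[u] for u in adj[v] if u in coloring}
        c = 0
        while c in used:
            c += 1
        coloring[v] = c
    return coloring
```

On a triangle (three mutually adjacent vertices), the greedy scheme correctly needs three colors, matching the partition of G into three independent subsets.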

Capacitated Vehicle Routing Problem.
The capacitated vehicle routing problem (CVRP) is a variant of the VRP [26]. In this problem, we have an undirected graph G, m vehicles, a capacity Q, and a set of cities C = {c_0, c_1, ..., c_n}. Formally, the city c_0 is the depot, and each vehicle must visit cities starting from the depot and coming back to it. Alba and Dorronsoro [27] define a distance or travel time matrix D = (d_{i,j}) between cities c_i and c_j. Each city c_i has a demand q_i. We denote a solution as a set of routes R = {r_0, r_1, ..., r_m}, where each r_i is a permutation of cities, starting and finishing at the depot c_0, and r_i ∩ r_j = {c_0} for i ≠ j. The cost of a problem solution is the sum of the costs of each route of R, f(R) = Σ_{i=1}^{k} cost(r_i), where k is the total number of vehicles. This problem aims to determine, for each vehicle, the lowest-cost tour in distance or travel time (see equations (1) and (2)), considering the maximum capacity. Note that the hard constraints are the capacity of each vehicle and that two vehicles cannot visit the same city. The CVRP has several constraints and a specific formal definition. These two characteristics let us apply our methodology with a designed partition of cities, where each vehicle is related to one part. Since heuristics work with solutions that are already complete and respect important restrictions such as capacity, if any movement violates or exceeds this capacity, that solution is penalized in the objective function.
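A minimal Python sketch of the solution cost and the hard constraints described above (function names are illustrative; city 0 plays the role of the depot c_0):

```python
def route_cost(route, dist):
    """Cost of one route: depot -> cities -> depot (city 0 is the depot)."""
    full = [0] + list(route) + [0]
    return sum(dist[full[k]][full[k + 1]] for k in range(len(full) - 1))

def solution_cost(routes, dist):
    """f(R): the sum of the costs of each route of R."""
    return sum(route_cost(r, dist) for r in routes)

def is_feasible(routes, demand, capacity):
    """Hard constraints: vehicle capacity, and no city served by two vehicles."""
    visited = [c for r in routes for c in r]
    if len(visited) != len(set(visited)):
        return False
    return all(sum(demand[c] for c in r) <= capacity for r in routes)
```

A heuristic move that pushes a route over capacity would make `is_feasible` return False, which is exactly the situation the objective function penalizes.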

Methodology
For our methodology, it is important to know, from the constraint modeling phase, whether the problem can be solved by partitions. The API-Carpio methodology and the methodology proposed by Soria-Alcaraz et al. [28] let us transform the instances of the problem, with their restrictions, into inputs for the proposed methodology. The MMA matrix generated by applying the API-Carpio methodology lets us visualize the hard restrictions of the problem and evaluate the costs of visiting cities or nodes. For the soft constraints of the problem, the methodology proposed by Soria-Alcaraz et al. [28] considers them in the list of restrictions.
In this section, we describe how the input problem information used in the experimentation is modeled for the two combinatorial problems. We integrated the API-Carpio methodology [29] and the methodology of design proposed by Soria-Alcaraz et al. [28].

API-Carpio Methodology.
This methodology was designed to solve the university course timetabling problem, and it considers three factors: students, teachers, and institutions (infrastructure). The methodology uses several structures for the equations previously described. One of the most important structures in this work is the MMA matrix.
This matrix is constructed from information about the cities or nodes. For graph coloring, we use the information of the adjacency matrix, while for the CVRP the cost matrix is considered. Table 1 shows an example of an MMA matrix. The algorithm to construct this matrix is given in [29].
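As a sketch of how such a matrix can be used (the exact construction algorithm is in [29]; the pairwise summation below is our simplified illustration), the cost of placing a set of nodes in the same part can be evaluated directly from an MMA-style matrix:

```python
def part_conflict_cost(part, mma):
    """Cost of placing the nodes in `part` together, summed over node pairs.
    With the adjacency matrix as MMA (GCP), this counts coloring conflicts;
    with the cost matrix as MMA (CVRP), it accumulates travel cost terms."""
    nodes = sorted(part)
    return sum(mma[u][v] for i, u in enumerate(nodes) for v in nodes[i + 1:])
```

For a triangle graph, putting all three nodes in one part yields cost 3 (three conflicting edges), while a singleton part costs 0.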

Methodology of Design.
This methodology extends the proposal by Carpio [29], and its formal definition was given by Soria-Alcaraz et al. [28]. The methodology of design by Soria allows us to consider the objectives of course timetabling and to satisfy the different restrictions by converting them into lists of time and space restrictions, seeking to minimize student conflicts.
This methodology relies on two structures that encode the restrictions and variables: the MMA matrix and the LPH.
The LPH holds information about the parts that can be assigned to each node or city. An example of this list can be found in Table 2. Each row of the list shows the part numbers available to a variable; i.e., the node N_2 can be assigned to parts 1, 2, 4, 5, 7, or 8, but not to 6. The algorithm for generating artificial LPH instances can be found in Ortiz-Aguilar [30].
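A minimal Python representation of an LPH like the one in Table 2 (variable and part names are illustrative):

```python
# LPH: for each variable (node or city), the set of parts
# (colors or vehicles) it may be assigned to.
lph = {
    "N1": {1, 2, 3, 4, 5, 6, 7, 8},
    "N2": {1, 2, 4, 5, 7, 8},  # per Table 2's example, N2 cannot go in part 6
}

def assignment_allowed(variable, part, lph):
    """Check a single assignment against the hard-restriction list."""
    return part in lph[variable]
```

A heuristic move only needs this membership test to stay inside the hard restrictions.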

Metalearning for Selecting a Subset of Heuristics for Hyperheuristics
According to Brazdil et al. [22], the aim of metalearning (ML) is to assist the selection of an algorithm for a set of instances by means of metadata. According to Alpaydin [31], metalearning seeks the best classifier for a set of data, given a characterization of those data. In our proposal, the data are associated with the problem instances of different problems, and the classifier is associated with the set of heuristics. Our objective is to select the best set of heuristics for a hyperheuristic on a set of instances. In Figure 1, we show a diagram of the metalearning process used to obtain metaknowledge for the selection of heuristics (diagram adapted to our methodology from Brazdil and Giraud-Carrier [32]).
In this work, we use metalearning to select a set of heuristics for hyperheuristics on a dataset. We call the set of characterized instances of the two problem domains the "metacharacteristics," and the model that maps each instance to the corresponding group of heuristics for hyperheuristics the "metalearner." In this case, the metalearner selected is the k-means algorithm. The methodology proposed in this article consists of the following steps in the metalearning stage:
(i) Step 1: obtain the set of instances to be worked on. In this case, the criterion is to select those instances that are amenable to being solved by partitions.
(ii) Step 2: evaluation and extraction of characteristics of the instances. In this step, the characteristics of the heuristics and the instances are generated. Heuristics that apply to both problems are selected; this task becomes simple with the use of the partitioning methodology, because it always allows working with generic inputs where the variables and restrictions are modeled. Later, the heuristics work with these generic inputs and solutions, which only require a fitness function corresponding to their problem (where the objectives are evaluated).
(iii) Step 3: generation of metacharacteristics. Based on the characteristics of each problem and the performance of the heuristics applied to all instances, we generate vectors of characteristics that will be our metadata.
(iv) Step 4: metalearning and the recommended model of heuristics. In the state of the art, research is limited to applying only a clustering technique for the recommendation of the algorithm model. We propose to incorporate a statistical analysis together with the clustering algorithm to improve the design of the basic subset of heuristics.

Problem Definition of Metalearning for Heuristic Selection.
Consider a problem P_i that belongs to the problem set (GCP). Let H_i = {h_{i,1}, h_{i,2}, ..., h_{i,n_i}} be a subset of low-level heuristics, used in the state of the art, to solve the problem P_i. We denote by RS_{H_i} a random selection of heuristics from H_i, to be applied in the solution of P_i, with |RS_{H_i}| = |H_i|. Let V_0 = {λ}, where λ represents the empty string. We define recursively V_{n+1} = {w h_{i,m} | w ∈ V_n, h_{i,m} ∈ H_i}, where n ≥ 0 and m ∈ {1, 2, 3, ..., n_i}. Then V_n represents the set of all strings of length n ∈ Z^+ = {0, 1, 2, 3, ...} formed from the symbols in H_i. The Kleene closure of H_i is [33] H_i^* = ∪_{n≥0} V_n. When V_0 is omitted from the union, we get the Kleene plus closure H_i^+ = ∪_{n≥1} V_n. In other words, H_i^+ is the collection of all possible nonempty strings of finite length generated from the symbols in H_i.
Let HH be a heuristic selection hyperheuristic with offline training, where the training considers the set H_i. After training, the HH provides a methodology M_{H_i} ∈ H_i^+, with the best order of application of low-level heuristics, which we denote by BM_{H_i} ∈ H_i^+, for the solution of the problem P_i; this improves on the performance of the application of a simple RS_{H_i}. We take two problems P_i and P_j that belong to the problem set (GCP), that comply with level 3 of generality proposed by Pillay [2], and that are amenable to being solved by partitions. Let H_i = {h_{i,1}, h_{i,2}, ..., h_{i,n_i}} and H_j = {h_{j,1}, h_{j,2}, ..., h_{j,n_j}} be subsets of low-level heuristics, used in the state of the art, to solve the problems P_i and P_j, and let UH = H_i ∪ H_j. We denote by RS_UH a random selection of the heuristics in UH, to be applied in the solution of P_i and P_j, with |RS_UH| = |UH|, in contrast with RSUH, a reduced set of heuristics, that is, RSUH ⊆ UH.
The heuristic selection hyperheuristic is denoted HH, with offline training, where the training considers the UH set. After training, the HH provides a methodology M_UH ∈ UH^+, with the best order of application of low-level heuristics, which we denote by BM_UH ∈ UH^+, for solving the problems P_i and P_j; this improves on the performance of the application of a simple RS_UH, where |BM_UH| ≥ |UH|.
Our objective is to propose a methodology that provides the HH with a reduced subset RSUH ⊆ UH for its training, such that the HH provides a methodology M_RSUH ∈ RSUH^+, with the best order of application of the reduced set of heuristics RSUH, which we denote by BM_RSUH ∈ RSUH^+, for solving the problems P_i and P_j, respectively, equaling the performance obtained with the application of the methodology M_UH ∈ UH^+, where |BM_RSUH| ≥ |RSUH|.
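The finite slices of H^+ over which a hyperheuristic searches can be enumerated directly; the following Python sketch (our illustration, with hypothetical heuristic labels) generates all nonempty application orders up to a given length:

```python
from itertools import product

def heuristic_sequences(heuristics, max_len):
    """Yield every nonempty application order of length <= max_len:
    a finite slice of the Kleene plus closure H+."""
    for n in range(1, max_len + 1):
        for seq in product(heuristics, repeat=n):
            yield seq

H = ["h1", "h2", "h3"]
seqs = list(heuristic_sequences(H, 2))
# |H|^1 + |H|^2 = 3 + 9 = 12 candidate sequences
```

This also makes the motivation for a reduced pool RSUH concrete: shrinking |H| shrinks the search space of sequences exponentially.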
To this end, we proposed the independent application of each of the heuristics of H_i and H_j, measuring their performance in solving the problems P_i and P_j. We then apply statistical tests to contrast the performance of the independent heuristics and thereby discard from each set H_i and H_j those heuristics that obtained the lowest performance.

Next, we focus on describing the stage of extracting characteristics from heuristics and instances. Our methodology improves the metalearning stage (step 4) with the application of nonparametric statistical tests to determine which heuristics provide the same performance as applying the full set of heuristics. This means that, if there are redundant heuristics, it is possible to leave them out and consider only those that enhance the speed of the search for solutions to the problem. The metalearning process proposed in this work to select the pool of heuristics includes the following steps:
(1) The problems are the source of information for the basic features.
(2) Given a set of instances denoted I, for each instance I_i, apply each heuristic H_j a number k of times. The results are the inner features. For the CVRP, a greedy heuristic is used to build feasible initial solutions to the problem; for graph coloring, we initialize with a random construction heuristic. The next step is to apply the heuristics. It is possible that, after this step, some instances are already solved to the best solution, owing to their complexity. This means that it is possible to avoid executing a complete and expensive computation process when a problem can be solved by the application of a simple heuristic.
(3) With the information obtained in points 1 and 2, feature vectors are formed, which are our metadata.
(4) For a better treatment of the metadata from the previous step, the following steps are carried out:
(a) The patterns that will be used in the k-means are scaled with the formula in [34], where x ∈ R are the values of the original variables (features).
(b) Generate the pattern based on the inner and basic features per instance. The basic features are the problem information, and the inner features are the fitness obtained by the heuristics applied k times to the problem.
(5) Pass the feature vectors to the clustering algorithm to form classes. According to Brazdil and Giraud-Carrier [32], k-means is a simple learning method, which we apply to carry out the grouping of instances into classes. To determine the number of classes to form, we use Sturges' rule, since it approximates well for data whose size is a power of 2. The number of subclasses is determined with Sturges' rule [35], k = 1 + log_2(T), where T is the total amount of data.
As the distance metric for the k-means, we use the Mahalanobis distance; this distance has properties such as being invariant to scaling by nonsingular linear transformations. An in-depth study of different metrics [36,37] is left as future work, to investigate whether they can improve the performance of the proposed methodology.
(6) Label each pattern according to the group number in which each pattern (instance) was classified.
(7) Apply the three statistical tests again to the results of the heuristics per problem, according to the formulas in [38]. The tests assign rank 1 to the best-performing heuristic, 2 to the second best, and n to the worst-performing heuristic. From these tests, we take the ranks of the heuristics, and the ranks are now considered inner features. Together with the class label, they form new patterns.
(8) Determine a cutoff point for each class based on the rank; in this case, it is the average of the minimum and maximum rank. Choose those heuristics that pass the cutoff criterion to be part of the minimum set.
(9) The output is the minimum set of heuristics per class.
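The scaling, class-count, and cutoff steps above can be sketched as follows (a minimal Python illustration; the min-max form of the scaling is our assumption, since the formula from [34] is not reproduced here, and the heuristic names are hypothetical):

```python
import math

def min_max_scale(values):
    """Scale a feature column to [0, 1] before clustering
    (min-max scaling assumed; the exact formula is given in [34])."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def sturges_classes(t):
    """Sturges' rule: number of classes k = 1 + log2(T) for T data points."""
    return 1 + math.ceil(math.log2(t))

def select_pool(mean_ranks):
    """Step 8: keep heuristics whose mean rank passes the cutoff, i.e., the
    average of the minimum and maximum rank in the class (lower rank = better)."""
    ranks = list(mean_ranks.values())
    cutoff = (min(ranks) + max(ranks)) / 2
    return sorted(h for h, r in mean_ranks.items() if r <= cutoff)
```

For example, with mean ranks {h1: 1.2, h2: 4.8, h3: 2.0, h4: 3.9}, the cutoff is 3.0 and the minimum set for the class is {h1, h3}.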
This process is shown in detail in Figure 2. The two important aspects of metalearning in our work are the low-level heuristics and the metafeatures.

Low-Level Heuristics.
An important part of the hyperheuristic approach is the selection of the heuristic set. This article proposes to extract information from heuristics and problems to generate the metafeatures [32]. This lets us improve the design and testing of the hyperheuristic algorithm. The goal at this stage is to generate metafeatures with which the heuristics can perform better individually over all problem instances. This improves the next part, in which the hyperheuristic must choose the application sequence for each heuristic using a minimal pool of heuristics, which is a fundamental part of it [2]. Each heuristic is applied k times to every instance. The heuristics applied to the two problems and their respective instances are as follows:
(1) K-Flip (H1). The heuristic changes the value of one or more variables (in some cases k) to another feasible value. In the GCP, the aim is to change the color of a certain node to another [39]. For the CVRP, the movement implies moving a city to another specific vehicle [12].

(2) K-Swap/Kempe Chain Neighborhood/S-Chain Neighborhood (H2). Two or more variables must be selected, and their values are then interchanged among them when possible; otherwise, the change is not made. For the GCP, we exchange the colors between previously selected nodes. This heuristic is used in works related to the TSP or CVRP, where it is also called k-interchange [40,41].
(3) Best Single Perturbation (BSP) (H3). This heuristic chooses a variable according to the list of hard restrictions (LPH) and changes its value. This exchange produces a better cost or, in the worst case, the same cost [17]. The next time this heuristic is applied, the next variable is chosen according to the position following the last variable modified. For the graph coloring problem, the next node to change color is selected according to the last variable chosen; for the CVRP, a city is moved from one vehicle to another.
(4) Static-Dynamic Perturbation (SDP) (H4). It is based on variable selection with a probability distribution over the change frequency in the last k iterations. This heuristic chooses a variable and changes its value randomly [17]. Variables with fewer changes have a higher probability of being selected. Applied to the GCP, this would be a node with fewer color changes; for the CVRP, a city moved few times to another vehicle.
(5) Two-Point Perturbation (2pp) (H5). It is also known as k-opt, and it is a particular case of k-swap with k = 2.
(6) Double Dynamic Perturbation (H6). This heuristic is based on the SDP: it receives a solution and modifies the value of a variable according to a probability distribution. The difference is that a copy of the initial solution is kept and, in the end, the best of the two solutions is returned [17].
(7) Move to Less Conflict (H7). This heuristic selects a random variable and assigns it to the part that generates the least cost [18]. In the GCP, the color is changed to another one that improves the fitness; in the CVRP, the city is moved to another vehicle such that the total distance of the route is minimized.
(8) (H8). This heuristic selects a random variable and assigns it to the part that generates the cheapest cost [18]. In the GCP, the heuristic changes the color of the selected node to another that improves the result. In the CVRP, the selected city is moved to the vehicle with the lowest cost, minimizing the total distance of the route.
(9) First-Fit (H 9 ). It changes the value of a variable to another, which is the least repeated in other variables [18], i.e., in CVRP, the heuristic will take a city and it will change it to the vehicle that has fewer cities in its route. For the GCP, it will select a node and it will assign the color that is least repeated. (10) Worse Fit (H 10 ). It assigns the most repeated value if possible, without violating the hard constraints on a randomly selected variable [42]. For GCP and CVRP, we assign a node or city to the most repeated timeslot, color, or vehicle.
(11) (H11). This heuristic was proposed by Abdullah et al. [43]. It chooses a variable by applying the Fail-First or Brelaz heuristic [44], and its value is changed according to whichever of the following algorithms has obtained the better performance: minimum conflict, random selection, sequential selection, and least constrained.
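Following the descriptions of H9 and H10 above (least repeated value vs. most repeated value), a minimal Python sketch of both moves over a part assignment (names are illustrative, and tie-breaking by part number is our choice):

```python
from collections import Counter

def first_fit_move(assignment, variable, allowed_parts):
    """H9-style: move `variable` to the allowed part least used by the others."""
    counts = Counter(p for v, p in assignment.items() if v != variable)
    assignment[variable] = min(allowed_parts, key=lambda p: (counts[p], p))
    return assignment

def worst_fit_move(assignment, variable, allowed_parts):
    """H10-style: move `variable` to the allowed part most used by the others."""
    counts = Counter(p for v, p in assignment.items() if v != variable)
    assignment[variable] = max(allowed_parts, key=lambda p: (counts[p], -p))
    return assignment
```

In CVRP terms, `allowed_parts` are the vehicles a city may join; in GCP terms, the colors permitted for a node by the LPH.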

Metafeatures.
The description and generation of characteristics permit differentiating at least two groups of instances within the same problem class. We use the terms basic feature and inner feature based on the proposal of Gutierrez-Rodríguez et al. [23]. The basic features are given by the problem, e.g., the number of nodes, colors, vehicles, and so on, depending on the information of each problem. For both classes of problems, the basic features are summarized in Table 3.
The fitness values achieved by all heuristics are the key inner features. Finally, the pattern per instance is basic features + inner features. The final pattern is shown in Table 4. For example, instance 1 has the pattern (3, 3, 8, 50, 3, 2, 1, 4), where the number of inner features corresponds to the heuristic pool (eight features in total for the given example).
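The pattern assembly is straightforward; the sketch below (hypothetical feature values and heuristic names, assuming the first four entries of the Table 4 example are basic features and the last four are heuristic fitness values) reproduces the example pattern:

```python
def build_pattern(basic_features, heuristic_fitness):
    """Pattern per instance = basic features + inner features
    (fitness achieved by each heuristic in the pool, in a fixed order)."""
    inner = [heuristic_fitness[h] for h in sorted(heuristic_fitness)]
    return list(basic_features) + inner

# Hypothetical instance with four basic features and a pool of four heuristics.
pattern = build_pattern([3, 3, 8, 50], {"h1": 3, "h2": 2, "h3": 1, "h4": 4})
```

Keeping the heuristics in a fixed order is what makes the patterns comparable across instances for the metalearner.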

Methodology for Determining a Subset of Heuristics
In this section, we propose a new approach for selecting and determining a subset of heuristics to solve GCP and CVRP instances. We describe our methodology in the following steps; a graphical representation is shown in Figure 3. (1) Problem Modeling. To model the GCP, the values in the MMA matrix represent the weights of the edges between nodes; a zero at position (x, y) means there is no connection between those nodes. For graph coloring, each node is colored so that adjacent nodes do not share the same color. CVRP fits our methodology because its aim is to obtain subroutes in which the tour cost of each subgroup is minimal. (2) Problem's Restriction Modeling. In both problems, we must design a partition of nodes or cities. First, it is necessary to model the restrictions for each variable in an LPH, e.g., a node that cannot be colored with a specific color or a city restricted for a tour. Then, the MMA must be designed, which represents the edge or connection weight between nodes or cities. In GCP, the adjacency matrix corresponds to the MMA; in CVRP, the MMA is the matrix of node-to-node distances. For GCP, our LPH is constructed from the number of colors with which the nodes can be labeled; if the problem restricts colors to five, the list looks like the one shown in Table 2. Similarly, this list is built for the CVRP, where the number of vehicles is the number of parts represented in the list (see Table 2). For the problems used in this work, it was not necessary to elaborate additional structures for soft restrictions. For an extensive review and guidance on modeling additional restrictions, the research by Ortiz [10] details all possible cases and features.
(3) Apply the metalearning process described in Section 5.1. (4) Separate the patterns (step 6) into training and test sets to proceed to the classification phase; it is important to include at least one pattern of each class in the test set. (5) Use the classifier on the training set to make the necessary adjustments to it. After describing and obtaining all pattern characteristics per instance, the next step is to train and test on all instances with a classifier. For our approach, we preferred a simple classifier such as Naive Bayes because our objective was neither to compare the performance of classification algorithms nor to design ad hoc classifiers. The NBC simplifies learning by assuming that, within each class, all features are independent [45]. In our methodology, we assume that each heuristic's performance is independent because we applied each heuristic in independent experiments: in the previous stage, each experiment was run with only one heuristic, so two or more heuristics were never applied at a time. Finally, all features in the created dataset were normalized before applying the classifier. (6) Finally, the classifier assigns a "class" to each test instance, which is then solved with its corresponding set of heuristics.
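The role of the MMA in GCP can be illustrated with a minimal sketch, under our own assumption that the MMA is a plain square matrix where a zero entry means no edge:

```python
def is_valid_coloring(mma, colors):
    """Check the GCP hard constraint against the MMA: any nonzero entry
    (x, y) is an edge, so nodes x and y must receive different colors."""
    n = len(mma)
    for i in range(n):
        for j in range(i + 1, n):
            if mma[i][j] != 0 and colors[i] == colors[j]:
                return False
    return True
```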

Designing and Testing the Hyperheuristic Offline Learning with K-Folds.
To choose the minimal set of heuristics and design the hyperheuristic for each class, our methodology relies on hyperheuristics with offline training, which have shown good results for constraint satisfaction problems in terms of generality of the solution [3]. A random constructive heuristic was used to generate initial solutions for GCP, and a greedy algorithm for CVRP. A selection hyperheuristic has three components: a pool of operators (low-level heuristics), a high-level search strategy, and a control mechanism that selects the operator to apply at each search step.

High-Level Search Strategy.
The iterated local search algorithm was used as the high-level search strategy. This metaheuristic, proposed by Lourenço et al. [46], constructs a sequence of solutions generated by an embedded heuristic, yielding better solutions than if they were constructed purely at random. The essence of the algorithm is to intensify around an initial solution by exploring its neighboring solutions. The algorithm is shown in Algorithm 1, taken from El-Ghazali [47]. In hyperheuristics with offline learning, the high-level search strategy searches for a methodology (a sequence of heuristics) that solves a set of instances and then applies it to a given set of instances; in contrast, online learning builds the sequence of heuristics as the instances are presented.
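A generic skeleton of iterated local search in the spirit of Algorithm 1 (a sketch only; the function parameters are our own abstraction, not the paper's interface):

```python
import random

def iterated_local_search(init, local_search, perturb, cost, iters, seed=0):
    """ILS skeleton: intensify an initial solution with local search,
    then repeatedly perturb the incumbent and re-intensify, keeping
    the candidate only if it improves the cost."""
    rng = random.Random(seed)
    best = local_search(init)
    for _ in range(iters):
        cand = local_search(perturb(best, rng))
        if cost(cand) < cost(best):
            best = cand
    return best
```

As a toy usage, minimizing (x - 3)^2 over the integers with a bounded-neighborhood local search converges to 3 within a few iterations.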

Selection Operator.
In the perturbation phase (step 4 in Algorithm 2), it is necessary to choose a variable following a probability distribution based on the frequency with which each variable was selected in the last k iterations. This simple heuristic allows
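One possible reading of this selection operator, as a hedged sketch: keep the last k selections in a bounded history and weight each variable inversely to its recent selection frequency (the exact distribution used in the paper may differ):

```python
import random
from collections import Counter, deque

def select_variable(variables, history, rng):
    """Pick a variable with probability inversely related to how often
    it was selected in the last k iterations (history has maxlen=k)."""
    freq = Counter(history)
    weights = [1.0 / (1 + freq[v]) for v in variables]
    return rng.choices(variables, weights=weights, k=1)[0]
```

In use, `history` would be a `deque(maxlen=k)` that records each selected variable, so older selections age out automatically.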

Experimental Results
This section describes our experiments in detail for the graph coloring and CVRP benchmarks used in this paper. We give the configuration for the implementation of the iterated local search hyperheuristic. Finally, we describe the statistical tests used to compare our results under the experimental methodology.
Our approach was implemented in the Java language with JDK 1.8 using NetBeans IDE 8.2. The experiments were executed on a computer with an Intel i7-7700U processor at 2.6 GHz, 16 GB of DDR3 RAM, and the Windows 10 Home operating system. The tests presented in this work were executed on a common notebook with a single processor, which shows the effectiveness of the proposed methodology.
For each heuristic, a limit of 100,000 function calls was set in each test run for all instances. We applied the Shapiro-Wilk test to check whether the results were normally distributed, in order to choose the better representative (mean or median): if the data follow a normal distribution, the mean was taken as representative; otherwise, the median.
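The mean-or-median decision can be sketched as follows, assuming SciPy is available (`scipy.stats.shapiro` returns the test statistic and p-value); the helper name is our own:

```python
from scipy import stats

def representative(values, alpha=0.05):
    """Shapiro-Wilk check: if normality cannot be rejected use the mean,
    otherwise fall back to the median."""
    _, p = stats.shapiro(values)
    if p > alpha:
        return sum(values) / len(values)
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
```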

Heuristics Results for Graph Coloring and CVRP
7.1.1. Graph Coloring. We used the benchmark proposed for the second DIMACS challenge on graph coloring [48], tested over 41 runs. Tables 5 and 6 show our results. The best results are denoted in bold face; only the myciel2 instance was solved to its optimum by applying the individual heuristics.

Capacitated Vehicle Routing Problem (CVRP).
Three sets of state-of-the-art instances were used and tested over 41 runs. Table 9 shows the fitness values for the instances; the lowest tour cost is indicated in bold, where n is the number of nodes, Q is the capacity of each vehicle, and k is the number of vehicles (colors in the case of graph coloring). The time of each run is reported in Table 10.
We applied the same procedure with the Friedman (FT), Aligned Friedman (AFT), and Quade (QT) statistical tests to distinguish the behavior of the heuristic set. We set α = 0.05, with h0: there are no differences between the performances of the heuristics, and ha: there are differences between the performances of the heuristics. Table 11 shows the ranks obtained in the three statistical tests.
In this case, heuristic H5 has the lowest rank in all tests, and H6 has the second-lowest rank for QT and FT.
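The per-instance ranking that underlies these Friedman-style tests can be sketched as follows (our own helper; ties receive the average rank, lower fitness is better):

```python
def average_ranks(results):
    """Average rank of each heuristic over instances.
    results[i][j] = fitness of heuristic j on instance i."""
    n_h = len(results[0])
    totals = [0.0] * n_h
    for row in results:
        order = sorted(range(n_h), key=lambda j: row[j])
        ranks = [0.0] * n_h
        i = 0
        while i < n_h:
            # extend j over a run of tied values
            j = i
            while j + 1 < n_h and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1  # ranks are 1-based
            for k in range(i, j + 1):
                ranks[order[k]] = avg
            i = j + 1
        for h in range(n_h):
            totals[h] += ranks[h]
    return [t / len(results) for t in totals]
```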

Selection of Features and Classes by Statistical Tests.
According to the steps described in Section 5.1, we must first determine the number of clusters or classes into which to split all our test instances. In this case, T = 139 and c = 8.
We considered 8 classes and used k-means clustering, expecting the instances to be uniformly distributed across the clusters. The k-means algorithm was applied with a maximum of 500 iterations and random initial starting points.
To obtain a uniform distribution of classes into clusters, we used the Manhattan distance, which gave the best results in our experimental work. Table 12 contains the class details: the number of instances per cluster/class, the number of GCP (3rd column) or CVRP (4th column) instances per class, the minimum and maximum numbers of nodes, and the minimum and maximum numbers of colors. In this experimentation, clusters 1, 5, 6, and 7 contain only GCP instances; clusters 3, 4, and 8 contain only CVRP instances; and only cluster 2 contains both problem domains.
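The cluster assignment step under the Manhattan distance can be sketched as follows (a minimal sketch; the k-means center-update step is omitted):

```python
def manhattan(a, b):
    """L1 distance between two pattern vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def assign_clusters(points, centers):
    """Assign each pattern to the index of its nearest center (L1)."""
    return [min(range(len(centers)), key=lambda c: manhattan(p, centers[c]))
            for p in points]
```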

Training and Test Classifiers for the Instance's Classes.
After the heuristic pool design phase for the hyperheuristics, we split our dataset into training and test sets. The training dataset consisted of 125 instances with 15 features (basic + inner), and the unseen set consisted of 15 instances. The results of the Naive Bayes classification are reported in Table 13, and Table 14 contains the confusion matrix of the classification process. We observed that the patterns of classes 3, 4, 7, and 8 were classified correctly. The remaining classes have some misclassified patterns, but, e.g., the 3 patterns of class 1 classified into classes 5 and 6 use the same pool of low-level heuristics, so this does not represent an issue for the next step.
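A self-contained sketch of the Gaussian Naive Bayes step, written without external libraries (the paper does not specify its classifier implementation beyond Naive Bayes; the variance-smoothing constant is our addition):

```python
import math
from collections import defaultdict

def train_gnb(X, y):
    """Fit per-class feature means/variances and log-priors, assuming
    feature independence within each class (the NB assumption)."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    n = len(X)
    stats = {}
    for c, rows in by_class.items():
        means = [sum(col) / len(rows) for col in zip(*rows)]
        vars_ = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                 for col, m in zip(zip(*rows), means)]
        stats[c] = (math.log(len(rows) / n), means, vars_)
    return stats

def predict_gnb(stats, x):
    """Return the class with the highest Gaussian log-posterior."""
    def log_post(c):
        prior, means, vars_ = stats[c]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, vars_))
    return max(stats, key=log_post)
```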
The hyperheuristic configuration was 10 iterations of local search and 100,000 function calls. For some GCP instances, we obtained the optimal number of colors (denoted in bold in Table 16). Moreover, for the instances A-n45-k6, A-n55-k9, A-n62-k8, A-n64-k9, CMT13, GWKC1, GWKC2, GWKC3, and X-n148-k46, we obtained values near the optimum, within at most 20% of the optimal distance.

Classification of the Test Instances and Application of the Hyperheuristic to the Corresponding Instance.
Finally, for the 14 unseen instances, we used the Naive Bayes classifier to determine their classes. We then applied the hyperheuristic with the corresponding pool of heuristics according to the previous design and obtained the results shown in Tables 17 and 18. Table 17 shows the confusion matrix, TP rate, FP rate, and precision of the classification test. In these results, two patterns belonging to class 5 were classified incorrectly, but this does not affect the hyperheuristic solution because this class shares the same heuristics with class 6.

7.6. Statistical Comparison of Results.
Finally, to check whether there are differences between the results with and without the methodology, an experiment was carried out in which the hyperheuristics were executed 33 times with the entire set of heuristics and 100,000 function calls. The results are shown in Table 19.
First, the statistical distribution of each set of results per class was analyzed; that is, the Shapiro-Wilk test [53] was applied to determine whether the results of the methodology, of the hyperheuristic without the methodology (HHPC), and the optima of the state of the art followed a normal distribution. The outcomes are summarized in Table 20, and the data shown in Tables 16, 18, and 19 were used for the tests. It should be noted that the results of the methodology were normal only for clusters 4 and 5, the state-of-the-art optima only for cluster 4, and the HHPC results for clusters 1, 4, 5, and 7.
Student's t-tests were applied for the methodology versus the state of the art and versus HHPC, with α = 0.05 as the level of significance. The null and alternative hypotheses for methodology versus HHPC are as follows: (i) h0: there are no differences between the performance of the hyperheuristic with and without the methodology; (ii) ha: there are differences between the performance of the hyperheuristic with and without the methodology. For methodology versus state of the art: (i) h0: there are no differences between the performance of the hyperheuristic and the optima of the state of the art; (ii) ha: there are differences between them. The statistical results of the tests are shown in Table 21. From these values, we can observe the following: (i) Methodology and HHPC. The results of the methodology are significantly different from those of the hyperheuristic with the whole set of heuristics; this means that the methodology improved performance and allowed limiting the set of heuristics for each of the clusters. (ii) Methodology and State of the Art. No statistical evidence was found that the results of the methodology differ from the optima of the state of the art, except in clusters 5 and 7; these are the clusters with more atypical or misclassified data, which opens an area of opportunity for refining the methodology.

Conclusion
In this work, a methodology is proposed to select low-level heuristics for a hyperheuristic approach with offline learning, oriented to the solution of instances of different constraint satisfaction problems. The proposal was applied to two problems well known and studied in the state of the art: graph coloring and the capacitated vehicle routing problem (GCP and CVRP, respectively). The methodology focuses on optimizing the number of heuristics that can be applied to different constraint satisfaction problems in a hyperheuristic approach. Information on the performance of an original set of heuristics is obtained from instances of the different problems. This performance information is used to generate characteristic vectors for each instance, which in turn are used to generate equivalence classes of problem instances.
The grouping into classes makes it possible to identify the heuristics that apply to each class and, from that information, to reduce the number of heuristics necessary to obtain good solutions for the instances of each class, as well as the total number of heuristics applied in the hyperheuristic approach to solving the problems involved.
In the application to the GCP and CVRP, information on the performance of the heuristics was obtained through a metalearning process, and this information was used to obtain the basic and inner characteristics of the instances. The instances were grouped into 8 classes using the k-means algorithm with the Manhattan metric. For each class, the sets of heuristics applicable to all its instances were identified, and through a ranking process and cutoff criteria, the number of heuristics per class was reduced.
For training and testing, the Naive Bayes classifier and information on the characteristics of the instances were used. The experimental results show that the hyperheuristic could efficiently solve each instance in each class, and the classifier was able to predict the class of each problem instance.
The identification and reduction of heuristics for solving complex problems is an optimization strategy that can make the search for solutions to constraint satisfaction problems efficient. The presented methodology allows generating a framework with a level of generality that can be trained to solve different constraint satisfaction problems simultaneously under the hyperheuristic approach. Once trained, it can find good solutions to different problems with a common base of heuristics for instances grouped by the efficiency of the solution heuristics.
Finally, the methodology makes it possible to improve the search for solutions to sets of problems by exploring the diversification of some of its components, such as classification algorithms, metrics, heuristics, and selection criteria, which may differ between sets of different problems. A study of these possibilities is proposed as future work.

Conflicts of Interest
The authors declare that they have no conflicts of interest.