Evolutionary Hyperheuristics for Location-Routing Problemwith Simultaneous Pickup and Delivery

+is paper presents an evolution-based hyperheuristic (EHH) for addressing the capacitated location-routing problem (CLRP) and one of its more practicable variants, namely, CLRP with simultaneous pickup and delivery (CLRPSPD), which are significant and NP-hard model in the complex logistics system. +e proposed approaches manage a pool of low-level heuristics (LLH), implementing a set of simple, cheap, and knowledge-poor operators such as “shift” and “swap” to guide the search. Quantum (QS), ant (AS), and particle-inspired (PS) high-level learning strategies (HLH) are developed as evolutionary selection strategies (ESs) to improve the performance of the hyperheuristic framework. Meanwhile, random permutation (RP), tabu search (TS), and fitness rate rank-based multiarmed bandit (FRR-MAB) are also introduced as baselines for comparisons. We evaluated pairings of nine different selection strategies and four acceptance mechanisms and monitored the performance of the first four outstanding pairs in 36 pairs by solving three sets of benchmark instances from the literature. Experimental results show that the proposed approaches outperform most fine-tuned bespoke state-of-the-art approaches in the literature, and PS-AM and AS-AM perform better when compared to the rest of the pairs in terms of obtaining a good trade-off of solution quality and computing time.


Introduction
As one of the complex systems, logistics network design brings plenty of benefits in terms of economy, society, and environment. Recent years have witnessed great success in applying various tools for optimizing and managing complex logistics system to drive profit and service quality of freight transport [1]. One of such tools is location-routing problem (LRP) [2], integrating two types of decisions: facility location problem and vehicle routing problem (VRP) [3]. Among various extensions to the basic LRP, CLRP and CLRPSPD have been proposed in order to represent different features of practical problems in the complex logistics systems. Meanwhile, an effective solution method should be developed for providing competitive solutions for benchmark instances within reasonable computing time for both CLRP and CLRPSPD. Hence, this paper presents a novel approach to solve both problems.
Both CLRP and CLRPSPD are NP-hard, and they are more complicated and time-consuming to solve, especially CLRPSPD. erefore, it is unlikely to achieve the proven optimality for large instances within reasonable computing time [4]. Our previous work [2] applied a novel framework of hyperheuristic to solve CLRPSPD using tabu search, FRR-MAB, and combination of both as selection strategies and five acceptance criteria. However, the main difference between this paper and our previous work can be obtained: (1) the high-level strategies were inspired from evolutionary algorithms instead of the above strategies; (2) four basic acceptance criteria were developed in this paper which were not used by our previous work; (3) this paper aims at solving the benchmark instances of the CLRP and CLRPSPD, but our previous work only tackled the latter; (4) the pool of operators in our previous work was classified under the premise of nature of operators which was given in advance, but in this paper, the nature of operators was unknown. Moreover, the difference between this paper and Ref. [4] can be drawn that our paper focuses on a different solution method for both CLRP and CLRPSPD, namely, hyperheuristic approach. erefore, the main contributions are as follows: (i) Aiming at solving two basic models in complex logistics system, we explore a generality-oriented and emerging heuristic for solving both subjects in complex logistics network, namely, evolutionary hyperheuristic-based iterated local search. (ii) Inspired from the process of metaheuristics, several ESs are developed for implementing an online learning strategy for adaptively selecting promising sequence of LLHs to attempt to realize the global search, that is, QS, AS, and PS. Other most recent and relevant nonevolutionary selection strategies (RP, TS, and FRR-MAB) [5][6][7] are also chosen as the baseline for comparison. (iii) Four basic acceptance criteria are used to pair up with ES and non-ES to examine their performance, that is, All Moves (AM), Naïve Acceptance (NA), Great Deluge (GD), and Simulated Annealing (SA).
e remainder of this study is organized as follows. Section 2 provides a brief review about the approaches for problem domains and hyperheuristic over recent decades. In Section 3, the three-index MIP formulation for CLRPSPD is defined. e proposed evolution-based selection strategies are described in Section 4, and evaluation and discussion are presented in Section 5. Finally, conclusions are given in Section 6.

Literature Review
Two types of reviews are concerned in this section: one for problem domains (CLRP/LRPSPD) and one for approaches (i.e., hyperheuristic and HH). e former mainly focuses on effective approaches for solving CLRP and CLRPSPD, while the latter reviews recent technologies in developing HH and applications of HH in real world.

Location-Routing Problem.
In the complex logistics, CLRPSPD is a more practical variant version of CLRP by considering reverse logistics, that is, goods need to be picked up from clients when vehicles deliver goods to clients. Firstly, we review some recent approaches for CLRP, and then recent state of CLRPSPD is investigated.
As the main component in CLRP, VRP has been studied for decades. In the last years, several researchers have devoted to the development of VRP. Ant colony algorithm was applied to solve network intensive vehicle service [8]. Multiple population-based genetic algorithm was utilized to solve train-set circulation plan problem [9]. A hybrid tabu search algorithm was developed for a real-world problem [10], that is, open VRP considering fuel consumption. Meanwhile, several approaches were developed for solving a variant of VRP [11], namely, VRP with time windows. However, the above papers did not analyse the effects of depots on the logistics networks.
Among the variants of LRP, the CLRP has recently emerged as one of the most addressed, initiated in Ref. [12], applied in many practical applications, such as, glass recycling [13], food distribution [14], obnoxious waste [15], disaster relief [16], and so on. Several surveys on CLRP are provided in Refs. [17][18][19][20][21]. e above surveys proposed classification schemes in aspects of either structural characteristics or solution approaches and also summarized variants and extensions of CLRP, but summary on the best approaches for solving CLRP was not provided.
Considering the tailored solution methods of solving the CLRP, several exact methods have been devoted to solving CLRP, such as, branch-and-cut-and-price [22], branch-andcut [23,24], and dynamic programming [25], which are the most effective exact methods in tackling CLRP. However, no one is able to achieve proven optimality to large-scale instances within reasonable CPU time.
e most effective heuristics have been suggested to tackle large instances and mentioned hereafter.
A two-phase metaheuristic method with TS architecture was proposed by decomposing CLRP into two subproblems [26]. An iterative two-phase approach was applied in Ref. [27]. In their algorithm, clients were clustered into supercustomers, then the Lagrangian relaxation method was used in the location phase, and granular TS (GTS) constraints were suggested to improve the routes in the second phase. A clustering analysis was presented to generate the routes data, and then depot location is solved with the collapsed routes [28]. In the method proposed in Ref. [29], a neural network combining with a self-organizing map and TS was applied to solve single-depot LRP. An iterative heuristic was presented for LRP on the plane by iteratively processing feedback information of two decision problems [30].
More recently, a hybridized GRASP was developed [31], and an evolutionary local search (ELS) procedure was used to search within two solution spaces. SA heuristic based on three random neighbourhoods was studied [32]. An adaptive large neighbourhood search heuristic by combining hierarchical structure and several operators was developed [33]. Two-phase hybrid heuristic was presented by constructing two phases: construction phase and improvement phase [34]. e method was developed using a hierarchical ant colony structure in multiple ant colony optimization algorithm [35]. GTS with a variable neighbourhood search algorithm was for solving CLRP [36]. A unique genetic algorithm was proposed for CLRP [37], using local search procedures in the mutation phase without changing the standard GA framework. Besides, the aforementioned exact and heuristic approaches have been applied successfully to CLRP by many researchers, and most require efficient constructive methods for obtaining the initial population.
Little attention has been received in developing the CLRPSPD. is subject was first addressed in Ref. [38]. In their paper, an exact algorithm named branch-and-cut method combined SA (BC-SA) was studied. Next year, the same authors developed two types of formulas for CLRPSPD: node-and flow-based formulas [39].

Complexity
Multistart SA was proposed to incorporate multistart hill climbing strategies into SA framework [40]. Two-echelon CLRPSPD has been addressed to tackle the instances with less than 50 clients [41]. However, no instances with more than 100 clients have been solved to prove optimality by the above four approaches. e SA employing three local neighbourhood search mechanisms was developed to solve few instances with more than 100 clients [4]. e aforementioned methods have devoted to the development of CLRPSPD and achieved significant success in obtaining high solution quality at the expense of significantly high computing times. Our previous work [2] could outperform the above papers in terms of computing times and solution quality by developing a general framework of hyperheuristics using iterated local search procedures.

Hyperheuristic.
Conclusion derived from the aforementioned approaches for CLRP and CLRPSPD can be drawn that different search operators are involved in those methods, e.g., hill climbers, mutation heuristics, and crossovers, which may be effective in finding global optima. However, it is hard for them to obtain trade-off between exploration (exploring new solutions) and exploitation (exploiting better solutions) rooting in that no corresponding strategies are suggested to efficiently evaluate and manage those operators at right time, showing poor performance concerning computing times. Moreover, existing search-based approaches are generally domain-dependent, resulting in a hard task for tester without a deep knowledge in domain. e ideal of hyperheuristic was defined as "heuristics to choose heuristics" [5,42]. Posteriorly, an extensive version was developed as a methodology and classified into two types: heuristic selection and heuristic generation (heuristics to generate heuristics) [43]. is paper focuses on the former based on single-point-search method, and brief description and review is provided hereafter.
In the framework of selection HH, two levels are concerned: HLH and LLH. e HLH manipulates the space consisting of a fixed pool of LLHs which directly modify the space of solutions [44]. Two main categories can be considered in HLH: selection strategies and acceptance criteria [45]. e role of heuristic selection mechanism is to intelligently construct effective sequences of heuristics from the pool of LLHs, while acceptance criterion aims at deciding whether to accept or reject new solution after applying the chosen LLH [46]. By analysing the source of feedback information, three modules can be considered: online, offline, and no-learning. Choice function [5,47], reinforcement learning [48], TS [2,6,[49][50][51][52], and FRR-MAB [2,[53][54][55][56][57] are examples for online selection strategies, and simple random, random descent, RP, etc., are viewed as no-learning methods. Several metaheuristic-based strategies for designing hyperheuristics have been proposed in the literature: ant colony-, particle-, and quantum-inspired hyperheuristic. Characteristics of most evolutionary hyperheuristics in the literature are presented in Table 1, where for each   EHH category, application domain, source of publication,  main feature, and publishing year are provided. As far as  acceptance criterion is concerned, two types are involved:  determinate and nondeterminate methods.  e former  determinately accepts the resultant result, such as all  moves, only improving, improving, and equal while NA,  SA, GD, and Monte Carlo are instanced as the nondeterminate methods for accepting the new solutions. With the popularity of hyperheuristic, it has been widely applied in practice, such as 2D regular and irregular packing problems [73], nurse rostering [74], vehicle routing problem [54], construction levelling problem [52], software project scheduling problem [56], t-ways test suite generation [6,75], deriving products for variability test of feature models [55], fast machine reassignment [76], and timetabling [77][78][79][80][81]. We refer interested readers to these papers [43,82,83] for extensive review on hyperheuristic.
To the best of our knowledge, evolutionary hyperheuristics have not been used thus far to address the basic models of complex logistics network, i.e., CLRP and CLRPSPD.

Mathematical Formulation
CLRPSPD is a more practical variant of CLRP considering the reverse logistics in which the pickup and delivery take place at the same time for each client [4]. e CLRPSPD and CLRP are defined on a complete directed graph G � (V, E), where V � J∪I is a set of nodes in which J and I represent the potential depot and client nodes, respectively, and E � {(i, j): i, j ∈ V, i ≠ j}\{(i, j): i, j ∈ J} is the set of arcs. Each arc (i, j) ∈ E has a nonnegative distance c ij . Each client i ∈ I has a positive delivery demand q i and pickup demand p i (only for the CLRPSPD). A storage capacity w j and a fixed opening cost f j are associated with each potential depot j ∈ J. e index set of vehicle types is denoted by K consisting of homogeneous vehicles with capacity Q. e objective is to determine the optimal plan of the set of depots and routes to minimize the total costs consisting of opened depot, vehicle, and travel costs.
Two types of MIP formulations for CLRPSPD are defined in Refs. [38,39]: node-and flow-based formulations, whose distinction depends on the definition of additional variables in the graph: nodes or arcs, and similarity of the above two is built on two-index concept. In this paper, threeindex flow-based MIP formulation is defined for CLRPSPD. Before describing this formulation, some assumptions should be considered: (i) Each route is served by one vehicle, and each client is served only once. (ii) Each vehicle must return to the departure depot at the end of the route. (iii) Goods of each client are delivered and picked up at the same location, and the goods are delivered to each client before being picked up. (iv) e total vehicle load at any arc must not exceed the vehicle capacity.
Complexity 3 (v) e total delivery and pickup demands of clients served by each depot cannot exceed the depot capacity.
Decision variables: Additional variables: U ijk : the dynamic load of each arc (i, j) operated by vehicle k.
Based on the aforementioned assumptions and notations, the proposed flow-based formulation [2] is as follows: min Fitness � j∈J f j y j + i∈J j∈I k∈K subject to i∈V k∈K i∈J j∈J i∈J j∈I i∈S j∈S j∈I k∈K i∈J j∈I i∈I j∈J k∈K i∈V In this three-index flow-based MIP formulation, objective function (1) represents the fitness value Fitness consisting of the total costs composed of fixed lease cost of depots and vehicles and travelling cost of edges; constraint (2) guarantees that each client is served only once; constraint (3) eliminates routes between depots; constraint (4) ensures  [61] Population/paired 2D bin packing [62] Population/paired Travelling tournament [63] Population/paired P-median [64] Population/paired WDM networks [65] Population & single-point/paired WDM networks [66] Population/paired Dynamic environment [67] Single-point/paired Travelling salesman [68] Population/paired Flow shop scheduling [69] Population/single Particle-based HH Academic scheduling [70] Population/multi-heuristics Examinations scheduling [71] Population/multi-heuristics Project scheduling [72] Population/multi-heuristics Grid resource scheduling [73] Population/single-heuristic Quantum-based HH Energy-aware scheduling [74] Population/single-heuristic 4 Complexity that entering and leaving edge to each node is equal; constraint (5) requires that only one route is assigned to one vehicle and the vehicle only departs from one depot; subcircuits in each route are forbidden by constraint (6); constraint (7) eliminates the situation that one client is served by plural different depots; constraint (8) specifies that plural different depots must not exist in the same route; constraints (9) and (10) enforce that clients are assigned to the selected depots and each depot to open must serve at least one client; each vehicle is assigned to the chosen depots and each selected depot must have at least one vehicle, which is ensured by constraints (11) and (12); constraint (13) ensures two consecutive clients of each route are served by the same depot; constraint (14) forbids that total delivery and pickup demands of clients must not exceed the depot's capacity; constraint (15) guarantees that the total delivery demands of all clients served by each depot should equal to depot's total delivery load; constraint (16) requires that the total pickup demands of all clients served by each depot should equal to depot's total pickup load; equation (17) expresses the proper movement of load delivery and/or pickup; constraint (18) makes sure that the total load of each arc of vehicles must not exceed vehicle's capacity.

Proposed Approach
In the following sections, we introduce HH by describing the adopted solution representation, effective constructive heuristic, low-level heuristics for domains, nine selection strategies, and four acceptance criteria for high-level heuristic.

Solution
Representation. e solution representation plays an important role in improving the performance of an algorithm for the CLRPSPD, which should include all routes and the chosen depot of each route. erefore, a string of cells are used to represent a complete solution which determines the assigned clients to each vehicle, the depots to be selected, and the sequence of clients to be visited by a specific vehicle starting and ending at the same depot. e solution representation in this paper applies a simple and efficient encoding [2], which is a complete set of vehicle routes, that is, R � {r 1 , r 2 , . . ., r K } with inserting the chosen depot at two ends of each route. Meanwhile, we also store the attributes of each route in the second subcell of each route. It is worth noting that our chromosome representation can meet the constraints (2)-(18) for avoiding restoring the feasibility of solutions and allowing fast evaluation of its fitness value without the need for decoding.

Initial Solution.
e initialization method is an improved greedy heuristic (inspired from the approach in Ref. [4]), named here as regret-k greedy heuristic (RKGH), proposed by our previous work [2]. After implementing greedy heuristic [4], configuration of depots (m ≥ 2) has been determined. Let x ik ∈ {1, 2, . . ., m} be a variable that indicates the depot for client i that has the k th lowest cost, that is,c(i, x ik ) ≤ c(i, x ik′ ). Using this notation, we can define a regret- In other words, the regret-k value is the difference in the cost for client placed in the first best depot and its k th best depot. e RKGH chooses to assign the client i that maximizes e multiplier α depends the remaining capacity of each depot, α � 1 representing that client i is served by this depot without violating the capacitated constraints; otherwise, α is set to 0. In this paper, the value of k is also set to 2, namely, R2GH is applied.

Hyperlevel Heuristics.
In this paper, we use different pairings of operators' vector and acceptance criterion. ere are 36 possible pairings of nine operators' vectors and four acceptance criteria. Several strategies are involved: RP, TS, FRR-MAB, QS, AS, and PS, and three variants are also developed.
e strategies implementing QS, PS, AS, and their variants can be categorised as ES. Four acceptance criteria are provided in this paper: AM, NA, GD, and SA. e brief descriptions are presented in the following sections.

Operator Selection Vector Design.
Operator selection strategies in the literature generally assign a probability to each operator and use a roulette wheel or tournament-like process to select the LLHs according to them [54], named as operator selection vector. e selection vector consists of an array of operators, each with a probability of selection. In this paper, selection vector is designed by assigning a probability to each LLH, and the initial selection vector of LLHs has an equal probability of selection.
(1) RP: the initial vector is not changed during the run, and all LLHs have an equal selection probability regardless of performance, also called as fixed selector [84], so the random one is chosen as a baseline for comparison with others. (2) TS: tabu list is used to prohibit LLHs with recently poor performance from being applied too soon. e LLHs belonging to be tabu (i.e., the selection probability is 0) are released (i.e., the selection probability is initial value) whenever other LLHs make a change in objective function, which is slightly different from the aspiration criterion [51]. (3) FRR-MAB [7]: FRR-MAB was proposed to adaptively select appropriate LLH based on its recent performance storing in slide windows implementing FIFO mechanism. Two successive stages are discussed: credit assignment and selection mechanism. e former offers the method to measure the impact on the quality in search process caused by the application of recent LLHs, while the latter selects one LLH with maximizing received credit values for generating new solution. For applying selection vector, the reward value of each LLH is normalized as selection probability. e other version based on Complexity FRR-MAB is combined with TS (viewed as FMT) proposed by our previous work [2], absorbing the rapid response capability of TS to exclude LLHs with poor performance. (4) QS: a heuristics search space with ξ LLHs in QS is defined as where h i � β i α i represents h i Q-bit and |β i | 2 and |α i | 2 give the probability that h i selects and disuses state, respectively, and guarantee |α i | 2 + |β i | 2 � 1. A learning strategy based on Q-gate is utilized to update selection probability |β i | 2 of h i based on realtime performance.
where α t i β t i is the Q-bit of h i during t th iteration and θ t i is the rotation angle of each Q-bit; Δθ t is rotation angle in t th iteration; k is uniformly distributed within 0 1 ; t max is the maximum iteration; π t and π t best are, respectively, sequence of current selected LLHs and selected LLHs in obtaining the best solution; x t i and bx t i are the state of h i of current and global sequence of selected LLHs; π t � 4 6 7 1 represents the current sequence of LLHs; and x t can be illustrated asx t � 1 0 0 1 0 1 1 if ξ � 7. e other version based on QS is combined with TS (named as QS2), absorbing the rapid response capability of TS to exclude LLHs with poor performance. (5) AS: inspired from application of ant colony optimization (ACO) in travelling salesman problem (TSP) and 0/1 knapsack problem, two versions of AS are developed into adaptively making decisions on selecting favourable sequence of LLHs. e first variant (paired-AS, AS2) highlights the joint performance of pairs of LLHs by determining next LLH (one at a time) based on probabilities proportional to the pheromone levels of each heuristic pair shown in equation (24); in this version, pheromone laying criterion [59] is modified for laying the pheromone for pairs which reach an improvement to the previous function, that is, if an ant performs heuristics h x , h y , and h z and h y leads to a nonimproving solution and h z provides a better solution, pheromone will be laid on edge x-z and neither x-y nor y-z, while the latter (single-AS, AS) emphasizes individual performance of each LLH by viewing the performance information of individual LLH as its pheromone trail [67]. Possible sequence of LLHs is selected by normalizing pheromone trail as selection probabilities.
where τ t ij (ξ × ξ) and p t ij (ξ × ξ) define, respectively, the pheromone trail and selection probability of i-j edge in t th iteration; 0 < ρ ≤ 1 represents the evaporation rate for pheromone; FIR j is fitness improvement rate compared to previous solutions [7,55]; and η ij (ξ × ξ) is the visibility information illustrating heuristic information.
e above equation (25) is used to update pheromone trail with η ij � T i + T j (T i indicates the running time of h i ), while equation (25) can be modified for AS with η j � T j by ignoring the previous h i . (6) PS: PS has been nominated as HLH strategies to measure effectiveness of their order placement and selection mechanism. Each particle is a vector of ξ numbers representing the selection probabilities of LLHs obtained by normalizing the position (using the original formulations) of this particle. e particles impose the order that the LLHs are applied to solution domain. Each particle is evaluated using performance indicator FIR.
Among the last seven selection strategies, probabilities of selecting LLHs are utilized to order LLHs, with the maximum utility score/weight being placed in the front of the list. e chosen LLHs are then applied in sequence following this order. However, the sequence of selected LLHs are randomly permutated to modify the current solution for RP and TS.

Acceptance Criteria Design.
After recent application of the LLHs, the obtained solution is considered for accepting as incumbent solution into next iteration. If the new solution is at least as good as the previous solution it will replace, and then it is automatically accepted as current solution regardless of the nature of an acceptance mechanism; otherwise, acceptance criterion is utilized to whether discard the 6 Complexity new solution or not [65,84]. e following four acceptance criteria are proposed to pair up with the above selection strategies, aiming at pointing out which pairs perform better than others.
(1) AM [5]: the current result is replaced by all new results with probability value at 1. (2) NA [84]: the current result is replaced by no-improving results with probability value at 0.5. (3) GD [50,84]: a no-improving result is accepted if it is better than a dynamically changing threshold value which depends on the current and historical fitness information. For determining the range of accepting a child solution, a threshold value is applied, which decreases with the increase in iteration.
where t and t max indicate the current steps and the maximum iteration and G best (t) represents the global best solution objective value found so far. (4) SA [65]: new solutions are accepted as the current solutions if the probability criterion is met, that is, improving results are accepted with probability value at 1 while no-improving results are accepted if a uniform value within [0, 1] is less the critical value. e probability of accepting a no-improving solution in SA is calculated for making a decision on whether accepting it or not [65]: where Δf is the difference in the quality between the new solution and current solution and ΔF t represents the expected range for maximum solution quality change, ΔF t = G best (t) − G best (0).

Low-Level
Heuristics. e pool of LLHs can be viewed as a "black box" which are used to perturb the incumbent solution by either intensifying or diversifying the search in the search region, which is used in Ref. [2]. ey normally are a set of simple, cheap, and knowledge-poor LLHs [5]. e module of this paper is composed of 13 LLHs h 1 , h 2 , . . ., h 13 across two pools of heuristics: mutational heuristic (MH) and local search/hill climber (HC), which are identified by the role in improving or worsening the solution. e first six LLHs are implemented to explore new region: 2-opt, or-opt, shift, interchange, add, and Shaw. Others are utilized to exploit better solutions: relocation, 2-opt, 3-opt, relocate, and 1-interchange algorithm, which apply intra and interroute moves. e detailed information of h 1 ∼h 4 can be obtained from Ref. [54], which are simple and basic mutational heuristics. e h 5 heuristic diversifies the open depots by opening a new one and randomly assigning between 1 and 2/3 of the routes to it [37]. As to the last diversified LLH h 6 [85], it is a method named as destroy-and-repair operator proposed to remove related clients, i.e., clients that are geographically close to each other and reinserted into best positions. In our paper, basic greedy heuristic [86] is suggested to reinsert removal clients back into routes. e disturbed solutions are accepted as long as the above acceptance criteria and the vehicle capacity at each arc and depots capacity are obeyed. e relocation h 7 is developed to reselect the appropriate depots for determined routes. Each route is collapsed into a cluster and the smallest insertion costs can be calculated as the distance of the depots in the original routes. e depots with the larger number of clusters take priority to open. And then other unassigned clusters (from large to small order) sharing same closest depot will be arranged to its closest depot, unless the assigned depots' capacity and cost are not satisfied. e 2-opt [87] inside the routes is equivalent to the well-known 2-opt move (h 8 ) [88], whereas the other version of 2-opt implemented between different routes is identified (h 9 ). Due to reduced CPU time, the intral-3-opt (h 10 ) [89] is only considered. e relocate heuristic [90] reinserts a single client in another position inside the incumbent route (h 11 ) or in another route (h 12 ). e 1-interchange [91] heuristic swaps each position of client from one route with each client in another route if not sharing the same depot (h 13 ). If improvements are found and depot capacity and vehicle load at each arc are met, the above moves are implemented, allowing them to improve incumbent solutions.
e MHs often perform simple random moves to perturb the incumbent solution without guaranteeing competitive results. Although insufficient for achieving competitive results, they are useful for providing randomization and helpful for navigating out of local optima. However, HCs play a key role in quickly obtaining much better solutions at each step. Moreover, it might be desirable to apply HCs after application of any MH and to apply MHs if it is hard for HCs to provide promising solutions. Considering the above features, the proposed nine selection strategies are utilized to adaptively manage the HCs, inspired from the mechanism on management of local search heuristics [5,54] and the method for automatically classifying according to the performance proposed in Ref. [57]. e probability for selecting any one MH as guider for getting rid of local optimality is set to 1 if and only if no HC can provide an improvement to previous objective value; otherwise, the probability value is set to 0. Hence, Figure 1 is the framework of hyperheuristics applied in this paper.
Moreover, a stopping mechanism is defined by evaluation limits with maximum number of fitness evaluation (α max ), aiming at providing fair comparison.

Computational Evaluation
In this paper, we explore the performance of a set of hyperheuristics in solving CLRP and CLRPSPD. e experiments consist of four parts. In the first part, an overall picture related to the selection strategies and acceptance alternatives is provided to rank the performance of 36 pairs and determine first four outstanding pairs for the following experiments. en, the efficiency and features are analysed for the chosen pairs by implementing on the three sets of Complexity 7 CLRP benchmark instances. In the third part, we explore the performance of selected pairs in solving CLRPSPD benchmark sets. Finally, the performance of selected pairs is tested against recent and effective tailored approaches in the literature.

Experimental Design.
e presented HH approach with 36 pairs is coded in Matlab 8.6 and runs on a desktop computer with Intel Core (TM) i7-6700K (4.00 GHz) and 8 GB RAM, under Windows 10; it is embedded in the CLOR tool, available by emailing to us.
As far as calibration is considered, two types of parameters are concerned: specific selection strategy parameters and common hyperheuristic parameters. e former relates to the setting of selection strategies, while the latter accounts for the common parameters between LLHs and HLH, and they are summarized in Table 2.
Some of the specified parameters follow defaults suggested in the literature, and others were determined by conducting an initial experiment with various settings aiming at obtaining relative better results within reasonable CPU time. As to common hyperheuristic parameters, initial selection probability p is set to 0.5 for each LLH in line with the initial probability amplitudes of QS; tabu list size l t is set to 7 which is the number of HCs; as single-point-based search hyperheuristic framework is investigated in this paper, the sizes of population used in the LLH and HLH are set at 1. For fairly comparing the running time of each pair, the maximum number of fitness evaluation for each instance is provided, which was different from the method in Ref. [2] (i.e., the maximum iterations without improving solution), depending directly on the number of clients, depots, and vehicles: Aiming at obtaining a good trade-off between solution quality and computing time, the multiplier a is taking on the value 5 for all instances in this paper.

Benchmark Instances.
ree sets of CLRP benchmark instances were adopted to evaluate the efficiency and features of each pair for hyperheuristic. ese sets are provided in Refs. [26][27][28]. One separation approach is used to generate the datasets of CLRPSPD from the above CLRP benchmarks, instanced as W type in Ref. [92] with setting 0.8 for β.
e first benchmark instances [28] contain 19 cases. e number of clients ranges from 12 to 318, and the number of depots m varies between 2 and 15, and the fixed vehicle cost f v is not given. In the randomly generated set [27], the number of clients ranges from 20 to 200, the number of depots m is 5 and 10, and vehicle capacity Q is {70, 150} and f v � 1000. e benchmark instances [26] are also randomly generated which contain 36 cases with n ∈ {100, 150, 200}, m ∈ {10, 20}, Q � 150, and f v � 10. And the clients' demand of this benchmark ranges from 1 to 20. e above three  CLRPSPD, the obtained distances are multiplied by 100 (rounded up to the next integer in CLRP) and the total costs are rounded up to the next integer, which is different from the sets obtained in Refs. [4,38]. e working cost of vehicle for each instance with less than 100 clients in the first set increases to 20, namely, f v � 20.

Experiment on 36 Pairs.
e first set of experiment is performed to compare the performance of 36 pairs of hyperheuristics aiming at determining the first four outstanding pairs for the following experiments. Nine CLRP instances in Barreto set with the number of clients ranging from 50 to 318 were used to test performance of each pair, implementing fifty runs on each instance. e maximum number of fitness evaluation is set to 10000. Aiming at comparing and scoring for all pairs, formula (1) scoring system was adopted, which was used to rank the approaches by CHESC (Cross-domain Heuristic Search Challenge) before 2010 (http://www.asap.cs.nott.ac.uk/external/ chesc2011/). e top eight approaches are given a score of 10, 8, 6, 5, 4, 3, 2, and 1 point for each problem from the best to the worst, successively [65]. e rest of the methods receive a score of 0. In this paper, the score of 10 points is assigned to the pairs which can exploit the best known solution (BKS), and the scores for the rest of pairs are similar to formula (1) scoring system. erefore, 90 is the maximum overall score a pair can get.
Experiment results are shown in Table 3, where the overall scores of the best four pairs, nine selection strategies, and four acceptance criteria are provided. As can be seen from the results, HH performing AS-AM is the best clear winner, and PS-AM, QS2-SA, and TS-SA also outperform other pairs. e hyperheuristic using AS, FRR-MAB-TS, TS, and RP as selection components performs better than others regardless of the acceptance component, and several interesting conclusions can be drawn from the overall scores that (1) TS could promote better performance for selection strategies by excluding the LLHs with poor performance, when compared with the performance of selection strategies without TS, reaching an average improvement about score 25 point; (2) the performance of AS using 0/1 knapsack problem outperforms AS2 applying TSP, in other words, compared to the one laying ant pheromone trails at routes, the mechanism laying ant pheromone trails on vertices (i.e., LLHs) have a greater chance of success in guiding ants to select the promising sequence of LLHs; (3) PS-AM and QS2-SA ranks the second/third place in overall scores of pairings, even if the performance of PS/QS2 is barely satisfactory. From the last four scores of acceptance component regardless of the selection component, AM and SA rank, respectively, the first and second with a score 529 and 518, while the other two receive a score of 473 and 429, respectively. From such analysis and answering the first experiments, we identify AS-AM as the best pair considering most cases and scores, and additional PS-AM, QS2-SA, and TS-SA are also selected to conduct the following experiments.

Experiment on CLRP.
e first four outstanding pairs were tested performing fifty runs on each instance, from which best found solution, gaps to BKS, and computing time are shown, providing the insufficient page space. Results for three CLRP benchmark instances are shown in Tables 4-6. e first column of each table is the name of each instances, followed by BKS, and results concerning four pairs which contain best found results (Best), gap (in percentage) to the BKS, and average computing time in seconds (CPU). Average and median values for gaps and CPU are displayed in the last two rows.
For the first three pairs (Table 4), the BKSs of instances with less than 150 clients are exploited with average gap at − 0.04% for TS-SA, − 0.07% for QS2-SA, and − 0.05% for AS-AM. Only one BKS cannot be found by the last pair with highest gap at 0.06% and average value at − 0.04%. Concerning the median values of gaps to BKS, they take on the value 0. Computing times are on average less than 29, 25, 27, and 17 s, and median values are lower than 6 s even if the largest instances exist. Two new BKS are found by four pairs, obtaining an improvement of over 0.5% for Perl83-318 × 4 by PS-AM and over 0.7% for Perl83-318 × 4-2 by QS2-SA.
In the second benchmark (Table 5), at least 16 BKS can be found by the four pairs, with difference in finding the Best of 50-5-2b. Four new BKSs in this set are obtained, respectively, reaching an improvement of over 0.3% for 200-10-1b by TS-SA and PS-AM and 0.1% for 200-10-2b and  (Table 6), TS-SA and PS-AM tie for first place in the number of new BKS found with 13 new BKSs, and QS2-SA tails the winner to take the second place by exploiting 12 new BKSs and AS-AM is in the last place. Concerning the number of solution less than or equal to BKS, TS-SA, PS-AM, and QS2-SA, respectively, rank the first, second, and third with 29, 28, and 27 solutions less than or equal to BKS, and AS-AM ranks the last place with 26 solutions less than or equal to BKS. What surprised us is that the number of new BKS accounts for over 30%, except for AS-AM with 27.8% of 36 instances in this benchmark. Concerning gaps of Best, average values are, respectively, − 0.01% for TS-SA, 0.01% for QS2-SA and PS-   Tables 4-6 through Table 7 based on multiple pairwise comparisons with 95% confidence level (i.e., α � 0.05), namely, Friedman tests. In the Friedman test, the null hypothesis (H 0 ) is only rejected if the Friedman statistic (χ 2 ) is greater than the critical value, looking up in χ 2 table through sample size and confidence level, and post hoc test based on Wilcoxon Rank-Sum should be carried out to detect the significant difference among samples, when needed; otherwise, the null hypothesis (H 0 ) is accepted indicating that no significant difference is concerned among four pairs. In Table 7

Experiment on CLRPSPD.
e CLRPSPD benchmark instances, which were first converted in Refs. [2,38], were also used to evaluate the features of the best four pairs. However, this paper only considered the W type of the above sets for the CLRPSPD. e main reason is that the sets of X, Y, and Z types show the same characteristics with the original sets of CLRP, and it is most difficult to obtain BKSs for the instances using W type separation approach. All BKSs were obtained from Ref. [1]. Results for the three CLRPSPD benchmark instances are shown in Tables 8-10. e first column of each table is the name of each instances, followed by the best known solution (BKS), data concerning the first four best pairs which contain best results (Best), gap (in percentage) to the BKS, and average computing time in seconds (CPU). Average (AV) and median values (MD)for gaps or gap and CPU are displayed in the last two rows.
For the Barreto benchmark set, computing times are on average lower than 35, 30, 29, and 18 s and median values are 3.9, 4, 3.4, and 3.1 s, respectively. Concerning gaps of Best,       Table 10, the computing times are on average less than 84, 83, 64, and 56 s, and median values are 66.6, 68.6, 58.6, and 47.7 s, respectively. As far as gap to BKS is concerned, average/median values are 0.1/0.05%, 0.16/ 0.09%, 0.14/0.08%, and 0.21/0.08%, respectively. Meanwhile, new BKSs of nine instances were obtained by our proposed algorithms, and the largest improvement was over 1% for P23 obtained by QS2-SA. e AS-AM outperformed others obtaining five new BKSs, followed by TS-SA and QS2-SA, and PS-AM performed the worst. However, the best-performing pairs was TS-SA in terms of average gap to BKSs.
Statistical analysis is carried out for obtained results of three sets in Tables 8-10 through Table 11 based on Friedman test. In Table 11 erefore, we can strongly conclude that the proposed pairs could achieve high-quality solutions for the CLRPSPD benchmark instances.
Overall, among the four pairs, two types of discussion and conclusions are conducted: one for problem domains and one for hyperheuristic pairs. From perspective of both subjects, the amount of CPU time depends directly on the            (2) number of depots has slight impact on the running time, which may be resulted by few depots; (3) computing time increases with the decrease of the vehicle capacity in the form of polynomial; (4) the order of impact of the above factors effecting the running time is n, Q, and m. e first and third conclusions were also shared in Ref. [2]. Moreover, the time complexity and solution quality may depend in part on the complexity of the mathematical model. Figure 6 illustrates the impact of complexity of mathematical model on the running time. More specifically, the average computing time of tackling CLRPSPD instances increases by 4.2-21.5% for Barreto set, 30.6-46.8% for Prins set, and 5.3-10.2% for Tuzun and Burke, respectively, compared with the average running times of tackling CLRP cases. Concerning solution quality and average standard deviation, the complexity of instances is the direct acting factor consisting of the number of clients, depots and vehicles, and the distribution of clients and depots. Furthermore, the complexity of approaches or pairs of selection strategies and acceptance criteria also significantly influence the running time, standard deviation, and solution quality. PS-AM seems to be among the fastest with declining 36, 33, and 20% average running time compared to others, and AS-AM and QS2-SA rank the second and third in aspects of average running time. In terms of solutions quality, there is no significant difference among these four pairs concluded by the above Friedman tests, indicating that PS-AM and AS-AM outperform the others in obtaining a better trade-off solution quality and running time, but all pairs can obtain high solution quality within reasonable running time.
Even though the time execution comparison can be unfair, depending on many factors (computer configuration, programming languages, the number of fitness evaluation, etc.), it is a significant performance indicator for judging the advantage of approaches. As shown in Table 12, our four pairs of hyperheuristic seem to be the fastest among the above methods. Furthermore, our approaches are the among best in aspects of solution quality on all CLRP and CLRPSPD benchmark sets with very small gaps. In conclusion, we can strongly conclude that our hyperheuristics are able to outperform the most recent and effective methods in terms of either solution quality or computing time.

Conclusion and Future Work
In this study, evolution-based hyperheuristics are proposed to tackle the CLRP and one of its variants, namely, CLRPSPD. Five evolutionary and four nonevolutionary selection strategies are developed as high-level selection strategies to pair up with four acceptance criteria to make a right decision on constructing effective sequences of LLHs at each step, guiding the proposed approaches to achieve the optimal values. e first experiment was carried out to determine the top four pairs which can rightly reflect to performance information of each LLH and quickly respond to select the promising LLHs. Results show that (1) AM and SA as acceptance criteria performed better when compared to the rest two; (2) AS, FRR-MAB-TS, and TS as selection strategies performed better when compared to the rest of the selection strategies; (3) average performance of AS-AM, PS-AM, QS2-SA, and TS-SA as HLHs perform significantly better than the rest of the pairs. e first four outstanding pairs were picked out to implement the remaining two experiments. PS-AM and AS-AM outperform others in terms of obtaining a good trade-off between solution quality and computing time, and all four pairs can achieve solution of high quality within reasonable running time, providing several new BKSs accounted for over 30% in Tuzun and Burke CLRP set. A comparative analysis is also conducted with tailored approaches in the published literature, and the results illustrate that the performance of our proposed approaches outperform the above methods in terms of both solution quality and computing time. As advocated in Ref. [93] for hyperheuristic, the proposed approaches have capability in exploiting good enough, soon enough, and cheap enough solutions. e proposed hyperheuristic (36 pairs) has been already been implemented in a decision-support tool, and we refer the interested readers to consult us about the package. e next research content will focus on the adaptive selection of MHs and other selection strategies of hill climbers, e.g., choice function [5,47], adaptive pursuit [94], and fair-share method [95]. For bringing CLRP/CLRPSPD closer to the reality, more practical constraints will also be taken into account in future works, such as time windows, dynamic demands, period delivery, and pickup.

Data Availability
e data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
e authors declare that there are no conflicts of interest regarding the publication of this paper.