Finding 𝑝 -Hub Median Locations: An Empirical Study on Problems and Solution Techniques

,


Introduction
Hub location problems deal with the location of hub facilities in a network [1].These problems are highly relevant in several fields, such as transportation [2][3][4][5][6] and telecommunication [7,8]; see [9][10][11] for some excellent surveys.In general, flows are routed from origin nodes to destination nodes through a "spoke-hub-hub-spoke" structure.First, flows from each origin node are collected by their hubs; then, the flows are transferred through at most another hub; and finally, flows are distributed to their destinations.Because of the economies of scale, there are cost discounts while transporting flows between hubs; the total cost with the "spoke-hub-hub-spoke" structure may be less than that for transporting flows from origins to destinations directly [10].
If the number of hubs  is fixed at design time, the hub location problem is called -hub median problem (pHMP) [12].In this paper, we consider hub location problems where the number of hubs  is fixed.Based on the constraints for node assignment, hub location problems have two basic structures: single allocation (SA) and multiple allocation (MA) [13].In SA problems, each node is allocated to a specific hub, and all inbound or outbound flows of each node must travel through its hub.In MA problems, on the other hand, the flows for different origin-destination (OD) pairs can be routed through different hubs.An example for the SA and MA problem is shown in Figure 1.Over time, many variations of the core problem, including uncertainty and reliability, have been considered by researchers.In early research, it is assumed that all nodes can always function properly.However, in reality, each hub can be disrupted because of possible accidents.For instance, in air transportation networks, nodes (i.e., airports) can be disrupted because of bad weather or equipment outage [14,15].Therefore, the reliability of nodes should be considered for more realistic modeling [11,16], taking into account the actual topology [17].In early research, the hub networks are assumed to be complete; that is, hubs are fully connected with each other.In recent years, incomplete  hub networks have also been studied [18].In this case, some hubs are not directly connected.
Apart from a few exceptions, hub location problems are NP-hard [19,20].Therefore, many researchers have focused on proposing reasonable models and efficient algorithms for solving hub location problems.Reference [1] published the first paper about hub location problems, and the first mathematical formulation for pHMP in the following year was proposed [12].After that, the pHMP has been studied by many researchers.Various types of efficient models and algorithms have been proposed.In early studies for hub location problems, researchers applied different solution methods for solving them.For instance, [21] proposed a heuristic method based on tabu search (TS) for solving single allocation problems.Reference [22] proposed a new linear formulation for single allocation model with fewer variables and constraints than the classical formulations.A heuristic based on simulated annealing algorithm (SAA) and a linear program based on branch-and-bound (BB) method are presented for solving the problem as well.After that, branch-and-bound methods and Lagrangian relaxation (LR) algorithm started to be applied to hub location problems.Reference [23] proposed a sophisticated approach based on Lagrangian relaxation for solving single allocation hub problems.Compared with simulated annealing and branch-and-bound method, the problem with 25 nodes can be solved within a reasonable time and solutions with high quality can be obtained.Reference [24] presented a branch-and-price algorithm (BP) for solving a capacitated single allocation problem.To obtain a tight lower bound, Lagrangian relaxation was applied to the problem.Instances with up to 200 nodes were solved optimally with their algorithm.Reference [25] studied the combination of congestion and capacity decisions in hub-and-spoke network design.They decomposed the problem into subproblems with Lagrangian relaxation.Reference [26] proposed new models with the consideration of uncertainties (hub failures and backup hubs) for single and multiple allocation problems.Based on the heuristics proposed by [23], they used Lagrangian relaxation and branch-and-bound method to solve reliable models for these two allocation problems.
In addition to Lagrangian relaxation and branch-andbound method, genetic algorithms (GA) are also commonly used for solving hub location problems.Authors of [27] proposed an effective method based on a genetic search algorithm.They showed that the GA-based method outperforms other approaches in their study on both computation time and solution quality.The method has a potential for solving large problem instances.Based on their research, [28] proposed two genetic algorithms for solving an uncapacitated single allocation -hub median problem.They compared their methods with a tabu search, simulated annealing, and path relinking method (PRM) [35].Reference [30] proposed a genetic algorithm approach for solving a capacitated single allocation -hub median problem, which can guarantee the optimality for networks with up to 50 nodes.The fuzzy capacitated -hub center problem was considered by [36].They provided a genetic algorithm method to solve it.Reference [37] proposed an efficient genetic algorithm for solving the liner hub-and-spoke shipping network design problem.The authors of [31] presented the formulation and a GAbased algorithm for solving a planar hub location problem in reasonable time.The Weiszfeld algorithm (WA) [38] and the location-allocation algorithm (LAA) [39] were also compared with their heuristic on different datasets.In addition, genetic algorithms were also used to solve the reliable hub location problem by [33].With the consideration of hub disruptions, they proposed a mathematical model for single allocation cases.The proposed method performs better than CPLEX.
Based on traditional models, several variants of the problems with new constraints were also studied, such as reliability.Reference [8] presented new reliable models for maximizing expected flow between city nodes.Reference [40] proposed a compact mixed integer program model and a continuum approximation model for the reliable uncapacitated hub location problem.With a customdesign Lagrangian relaxation method, near-optimal solutions can be obtained.In addition to reliability, the incomplete hub networks have also been studied.Reference [18] proposed models for several single allocation problems under √ incomplete hub network design and the mathematical formulations with ( 3 ) variables were presented.Reference [34] also modeled the incomplete hub location problem with and without hop-constraints.Benders decomposition technique was used to solve the problem in large scales.
In recent years, new algorithms and applications on hub location problems have been proposed.Reference [41] proposed approximation algorithms for solving the single allocation problem.Reference [42] studied single allocation problems with multiple capacity levels, where the decision making process includes the size of hubs.An enhanced Benders decomposition (EBD) algorithm was used to solve multiple allocation uncapacitated hub location problem for large-scale networks by [29].The algorithm was evaluated in computational experiments on different instances with up to 500 nodes, compared with Benders decomposition [43], adjustment procedure [44], relax-and-cut algorithm (RCA) [45], and CPLEX.Reference [46] proposed two formulations for capacitated ordered median hub location problems.Reference [32] proposed a new method for single allocation hub location analysis.Based on the spatial and flow properties of nodes, a clustering-based potential hub set (CBS) can be generated, which can help narrow the solution sets and reduce the computational complexity of the problems.In addition to the standard hub location problems above, several variants of hub location problems have also been studied.Reference [47] developed a model with equilibrium constraints for the intermodal hub-and-spoke problem and proposed a hybrid genetic algorithm for solving it.Reference [48] studied hub location problems under uncertainty.Generic models were proposed for single and multiple allocation problems.Reference [49] also presented a programming formulation for hierarchical multimodal hub location problem with timedefinite deliveries and analyzed the sensitivity on Turkish network.Reference [50] studied the hub location problem under competition with Civil Aeronautics Board (CAB) and Turkish Postal (TR) datasets as case studies.
In Tables 1 and 2, a summary of allocation problems and representative solution techniques in the literature is shown.Most of other papers in the references that are covered by them are not shown here.Various methods have been proposed for solving hub location problems, such as LR, GA, and CBS.Methods are often tuned for specific datasets, specific hardware properties, or specific parameters.Although newly published methods compare with few prior works, the selection of these works is often suboptimal, and comparisons are carried out on different datasets with different parameters; datasets are not often made publicly available, which means that results are not reproducible.Moreover, getting an overview over the state of the art in this field is difficult, as results are published in various domains without much cross-talk and often study variations of the core problems.It is difficult for new researchers to choose the best solution algorithm for a given problem instance, because of the lack of common benchmarks and the dispersal of research in different communities caused by the heterogeneity of problems and approaches.
In this paper, the formulations of five types of p-hub median problem variations (classical single, classical multiple, reliable single, reliable multiple, and incomplete single allocation problems) are surveyed first.Then, three different solution techniques from the state of the art, GA, LR, and restricted CBS (RCBS) method, for solving -hub median problems are provided.In addition, we use CPLEX as a benchmark reference.Finally, to compare the performance of these methods, the commonly used TR dataset [51], CAB dataset [12], and Australia Post (AP) dataset [52] are selected as case studies.In our experiment, the solution qualities, computation time, and main memory usage are evaluated.
We would like to emphasize that this paper is not intended as a theory-focused survey of solution techniques.There exist plenty of excellent works in this area [9][10][11].The major contributions of our study are as follows:  We synthesize the essence of solution algorithms and adapt them for different problem instances, to obtain five competitors for each method.
(2) All competitors are evaluated against three standard datasets from the literature: TR, CAB, and AP.Running all experiments with the same setup (hardware/software) parameters allows us to provide a comprehensive evaluation of all techniques.Moreover, we provide an unbiased view on the state of the art, as it is perceived by the papers and their methodological descriptions.
(3) The wealth of experiments we performed and the number of techniques we compared allow us to draw several insights into scalability, solution quality, and the possibility of exploiting parallelism.
We would like to point out that a comparison of different methodologies has to be taken with caution.Some methodologies were proposed to find provably optimal solutions, while others are pure heuristics without any guarantees on optimality.Comparing their solution quality and solution time allows us to estimate the trade-off between running time and solution quality by each method for different datasets.This view is novel to the literature, where, usually, methodologies are not cross-compared.In addition, we have implemented several standard solution techniques in the literature, a considerable amount of work, given that the source code is not published alongside of papers.Clearly, we implemented only a subset of existing methods.Finally, the hub location models chosen for our study make simplified assumptions about the network and the modeling level of detail, for instance, ignoring delay and propagation of delay through a network [53].Therefore, our study should be understood as a first step into the comparative evaluation of hub location problems.
The remainder of this paper is organized as follows.The formulations of five types of hub location problem models are provided in Section 2. Selected solution techniques, that is, genetic algorithm (GA), Lagrangian relaxation (LR), and clustering-based (CBS) method, are abstracted and synthesized in Section 3. To compare these methods, the evaluation of TR dataset, AP dataset, and CAB dataset as case studies is presented in Section 4. The paper concludes with Section 5.

𝑝-Hub Median Problem
The -hub median problems discussed in this study have two distinct structures: single allocation (SA) and multiple allocation (MA).In addition, each problem can also be considered in the classical models and as a reliable model, depending on whether the reliability of hubs is taken into account.In classical models, all hubs are assumed to function properly, while each hub can be disrupted in the reliable models [54].An example is shown in Figure 2: There is a regular route from Pittsburgh to San Francisco with two hubs, New York City and Los Angeles.If New York City hub is disrupted, an alternative route "Pittsburgh → Chicago → Los Angeles → San Francisco" can be used, and node Chicago is the backup hub of New York City.At last, the single allocation problem with incomplete hub networks is also considered.All five formulations of -hub median problems are introduced and formalized in this section: classical single allocation (CSA), classical multiple allocation (CMA), reliable single allocation (RSA), reliable multiple allocation (RMA), and incomplete single allocation (ISA).The major reason for revisiting the mathematical formulation is that many slightly different problem formulations have been used across existing studies, where modifications often have a significant impact on expressivity and time complexity.

Classical Single Allocation Problem.
In classical single allocation problem, each node is allocated to a single hub and all the inbound or outbound flows of each node must go through its hub.Let  = (, ) be a network.Here  and  are the set of nodes and links, respectively.The numbers of nodes and hubs are  and .Let   and   be the cost and the flow between nodes  and  for each pair of (, ).The formulation for CSA is shown as follows [55].
≤   , ∀, ∈  (6) ∈ {0, 1} , ∀,,,  <  ∈ .(8) Here, (1) is the objective function of total costs.Let   =   +   +   be the cost of route (, , , ) [56].Parameter  < 1 is the cost coefficient (economies of scale discount factor, see, e.g., [57]) for transporting the flows between two hubs.Variables   and   are the route variable and assignment variable, respectively.They are defined as follows [58]: Equations ( 2)-(3) ensure that if node  is assigned to hub  (or  assigned to ), the flow from  (or to ) must go through hub  (or ).Equation ( 4) ensures that each node can be associated with one hub only.Equation ( 5) indicates the number of hubs is .Equation (6) indicates that each node can only be assigned to a hub.Based on the fundamental formulation for CSA presented by [59], [22] also formulated CSA with a new model, in which they reduced the number of variables for linear program from ( 4 ) to ( 3 ).Finally, an assumption has been used in many studies:   =   , ∀,  ∈ .Therefore, in this paper, we consider only the case of  < .

Classical Multiple Allocation Problem.
In classical multiple allocation problem, the flows from one origin to different destinations can be routed through different hubs.Thus, we do not need to assign each node to a specific hub.Let   be the variable that is used to indicate whether node  is a hub, and the route variable   is still used.The formulation for the CMA problem is shown as follows [45,60].

Reliable Single Allocation
Problem.The problems above are both without the consideration of hub reliability.In real-world cases, however, each hub can be disrupted for many reasons.For instance, in air transportation networks, nodes (i.e., airports) can be disrupted due to bad weather or equipment outage.In this case, backup hubs need to be considered.Let   ∈ [0, 1] be the disruption probability of node .Let    be   if  ̸ =  and 0 otherwise.For the backup assignment, we use the strategy proposed by [26].They made two major assumptions: first, only single disruptions are considered; second, for the routes with two hubs, the unaffected hub must still be inside the alternative route.Assume   and   are the backup hub variables that are defined as follows: subject to Here, the first two terms of the objective function are the cost for regular routes and the left term is the cost for alternative routes.Equations ( 19)-( 22) are the same as (2)- (5).Equations ( 23)- (24) ensure that nodes can be assigned only to hub nodes, and backup nodes must be different from the corresponding hubs.Equations ( 25)- (26) indicate that one hub needs an alternative hub only if it is different from the OD pair (, ) and it cannot be disrupted otherwise.It shows that RSA-P is a quadratic integer program.
It can be linearized with the linearization method proposed by [61].

Reliable Multiple Allocation
Problem.Similar to Section 2.2, variable   is used to denote whether node  is a hub.Therefore, the formulation of the RMA problem is shown as follows [26].

Incomplete Single Allocation Problem.
For the incomplete single allocation problem, in addition to hub location and assignment, a fixed number  hub links needs to be determined.The formulation for this problem is quite different from the formulations introduced above.Let   = ∑    be the total flow from source node ,   = ∑    be the total flow to destination node , and    be the flow originating from node  which is routed on hub link (, ).The decision variable   for hub links is defined as follows: Therefore, the ISA problem is formulated as follows [18].
≤   , ∀, ∈  (41) +    ≤     , ∀,  < ,  ∈  (45) ∈ {0, 1} , ∀, ∈  (47) In (37), the first two terms are the cost of flows between hubs and nonhub nodes, and the third term is the cost of flows in hub networks.Equations ( 38)-( 39) are the constraints of single allocation and number of hubs.Equation ( 40) is the constraint for the number of hub links; that is,  links are established between hubs.Equations ( 41)- (43) indicate that nodes can be assigned to a hub only and hub links can be established between hubs only.Equations ( 44)-( 45) are the flow equilibrium constraint and flow capacity constraint, respectively.

Solution Techniques
In Section 2, we revisited the formulations of five different -hub median problems.To solve these problems, various solution techniques have been proposed, such as GA and LR; see the introduction and relevant surveys for details.In most literature, however, researchers usually compared their approaches with very few/outdated methods instead of new algorithms from the state of the art.Furthermore, the properties of some new methods were compared only with certain optimization solvers (e.g., CPLEX).Several approaches from the state of the art are summarized and revised accordingly in this section, such as GA (Section 3.2), LR (Section 3.3), and CBS (Section 3.4), all tailored towards the five standard problem definitions.

Overview of Solution Techniques.
In this section, the idea of each method for solving five -hub median problems is revisited.Let us start with GA.A large number of nearoptimal solutions can be obtained with GA, which has been used to solve many complicated problems and its feasibility for solving hub locations problems has been shown in the literature.When solving hub location problems with GA, chromosomes are used to represent solutions [62].In the CSA problem, each node is allocated to one hub.Thus, node assignments can be modeled as chromosomes directly.The crossover and mutation operators are used to change the node assignments and generate new solutions.For the CMA problem, the chromosomes are represented by the routes between OD pairs.In addition, in the reliable cases, we need to add an operator for selecting the backup hubs, which is used to find an optimal alternative route for each given regular route.For incomplete problem, the connectivity of hub networks is considered first while establishing hub links.In LR, the formulations of five allocation problems can be transformed into two independent subproblems, by adding several Lagrangian multipliers.Then, the lower bounds and upper bounds of the original problem can be obtained by solving these subproblems.The Lagrangian multipliers can be updated iteratively, until reaching the optimal solution.The CBS method is a very novel approach for analyzing hub location problem.This method was proposed for solving single allocation problems.However, it can also be applied to multiple cases.In this method, based on spatial properties and travel demands of nodes, a subset of potential hubs is obtained.Then, several constraints based on these subsets are added to the problem formulations, and the computational complexity is reduced significantly.
Note most solution methods that have been proposed by researchers.We use the original procedures to describe them.However, if a solution method has not been published for solving a given problem, we do some modification based on previous versions of the method.For instance, GA has not yet been published for solving incomplete problems and the procedure of this method is modified for incomplete problems in our study.

Genetic Algorithms.
To solve hub location problems, several researchers presented GA-based methods in recent years.Because most methods are proposed for solving single allocation problems, the standard strategy for CSA using GA is introduced first.Then, based on this simple strategy, we develop an extension for RSA by adding a method for the backup hub selection.The GA strategy developed in this section is based mostly on the approaches proposed by [27,33].Initial Population.In the single allocation problem, each node is assigned to a particular hub.Thus, each individual has two arrays: hub array (HS) and assignment array (AY).The hub array HS has a length of  and represents the hub set, and the assignment array AY has a length of  and represents the assignment of nodes to hubs.If node  is assigned to hub , then AY[] = .Here we assume that if node  is a hub, then it must be assigned to itself; that is, AY[] = .Based on the solution representation above, the initial population can be generated.However, before that, the hub set must be determined.To reduce the total cost, large flows should travel through the routes with less cost.Let   = ∑ ∈,<   and   = ∑ ∈,<   be the total outbound flow and inbound flow of node , respectively.Thus, it should be more likely to locate hubs at those nodes with large flows to travel (i.e.,  with large   +   ).Then, in each chromosome, we assign each node to its closest hub.
Crossover.With initial population set, new offspring chromosomes can be generated by the crossover operator.To choose the parent chromosomes, we use roulette wheel sampling [27] and the fitness value of each individual is represented by the reciprocal of the total cost of the solution.The crossover operator is introduced with a small example.As shown in Figure 3, two parent chromosomes are selected randomly and the number of nodes and hubs is  = 10 and  = 3, respectively.First, we choose a swapping number  ℎ from 1 to /2 and a crossover point   from 1 to , randomly.Then,  ℎ hubs are swapped between the two hub arrays.Each assignment array can be divided into two parts by the point   , and each part is combined with the opposite part of the other assignment array.For instance, in this example, we assume that  ℎ = 1 and   = 4.One hub is swapped between HS 1 and HS 2 .If they are 4 and 6, then new hub arrays are HS   1 = [6, 7, 9] and HS  2 = [4,7,8].Next, we combine the left part of AY 1 with the right part of AY 2 , and new assignment array AY  1 is generated.With the same operator, AY  2 can also be obtained.Now, there may exist an infeasible case where some nodes are assigned to a nonhub node.For instance, in AY  1 , node 4 is assigned to itself which is not a hub in the new individual.Therefore, to avoid the infeasibility, we need to do a reallocation after each crossover operator.Each node assigned to a nonhub node needs to be reallocated to its closest hub in the new hub array.In addition, each hub must be reallocated to itself, obviously.
Mutation.To avoid the early convergence near a local optimal solution, the mutation needs to be operated on the chromosomes with a probability.In this paper, two mutation operators are applied to the assignment arrays only: (A) Select two nonhub nodes randomly and swap their assignment.(B) Select one nonhub node randomly and reallocate it to another hub.
Because each hub node must be assigned to itself, the mutation operator is applied only to nonhub nodes (i.e., spokes).For each chosen individual, these two mutation operators are both used, and the result with more total cost is discarded.Using a crossover operator and a mutation operator, two offspring individuals can be generated.Then, elitism is applied in the population.The individuals with worse fitness values are replaced by those with better fitness values.Backup Hub Selection.With the GA strategy introduced above, the CSA problem can be solved.However, while solving the RSA problem, the backup hubs need to be considered after determining the hub set and assignment.Because the alternative route of each OD pair (, ) is independent, we can find the best backup hubs for each individual (, ) by considering the route cost.For each OD pair (, ), the regular route (, , , ) is determined by GA.Then, the best backup hubs for  can be found by computing the costs of all possible backup hubs which is different from  and selecting the minimum one.A similar strategy is used for selecting the best backup hub for .
GA Strategy for Multiple Allocation Problems.Each node is assigned to one hub in single allocation problems.However, in multiple allocation problems, each node can be assigned to different hubs for different OD pairs.Thus, the chromosome size is equal to the number of OD pairs here.Because we only consider the case  < , the size is equal to ( − 1)/2.While generating the initial population, we use the same method as single allocation problem to construct the hub array.Then for each (, ), where ,  ∈  and  < , we choose a pair of hubs (, ) as its regular hubs from the hub array randomly and the backup hubs can also be selected with the method above similarly.

GA Strategy for ISA Problem. GA has not yet been published
for solving ISA problems and we propose a new strategy here.In addition to the hub array and assignment array, the hub links should also be determined in the ISA problem.Thus, a set of hub links is added to each individual.We use the same method as single allocation problem to construct the hub array and assignment array while generating the initial population.In order to guarantee the connectivity of the hub network, a randomized version of the Prim algorithm [63] is used to generate hub links (See Algorithm 1).
In crossover step, after obtaining offspring hub sets with the same strategy as Figure 3, the link sets of offspring can be generated as follows: Let HUB  1 and HUB  2 be hub sets of two offspring.Let Link 1 and Link 2 be the link sets of two parent chromosomes.Then, Link  = Link 1 ∩ Link 2 , Link  = Link 1 ∪ Link 2 , and Link 1 = Link  ∩ {links in complete graph in HUB 1 }.For the first offspring, we remove several links from Link 1 with the guarantee of connectivity of HUB 1 network.The remaining  links construct the link set of the first offspring individual.If Link 1 is not connective or the number of its links is less than , we can add several links to it randomly first.The same strategy is applied to the second offspring.In the mutation step, in addition to the operation on assignment array, a link is selected from the link set randomly and replaced by another random link on the basis of connectivity of hub network.

Lagrangian Relaxation.
As a widely used algorithm, Lagrangian relaxation has also been applied to hub location problems [23,26].In this section, we introduce the procedure of LR for solving hub location problems mostly based on the formulation CSA-P (see ( 1)-( 8)) presented in Section 2.1.The methods for solving the other three problems are similar.
(1) Input hub set H, empty link set  1 = 0, complete link set  2 = {links in complete graph of H}, empty hub set  1 = 0, set  2 = , number of hubs p, number of hub links q.
Here (54) is used to reduce the size of the solution set in the model, and the result qualities can be improved significantly.The lower bound of the CSA-P can be obtained by solving the two subproblems with the methods proposed by [23].To obtain the optimal solution efficiently, we need a method for finding the upper bound.
Upper Bound.The upper bound can be obtained based on the solution of LSub1.After solving this subproblem, a set of  hubs and some assignments   are determined.However, in these results, some nodes may be assigned to more than one hub, and some nodes may be unassigned.Therefore, Algorithm 2 is used to find an upper bound with a feasible solution.
With the methods above, the lower bound and upper bound can be obtained for a group of particular Lagrangian multipliers.In this section, the complete procedure of LR with multiplier updating is shown as follows.
(a) Initialization includes the following: iteration number  = 0, max number of iterations  ite , max gap  gap , max iteration number for no improvement of lower bound  imp , step size multiplier  = 6, and initial Lagrangian multipliers , ,  = 0.
(b) Finding the lower bound LB is by solving LSub1 and LSub2.
Finding the upper bound UB is with Algorithm 2. (c) Updating Lagrangian multipliers is as follows [64]: where (d) If the lower bound LB is not improved in  imp continuous iterations, then  = /2, and , ,  also need to be set to the current best values.
The CSA problem can be solved with the procedure above.CMA, RSA, and RMA are solved with similar strategies proposed by [23,26].LR-based technique has not yet been published for solving ISA problems and, in fact, it is nontrivial to develop such a method.We need to define Lagrangian multipliers for several constraints and construct the Lagrangian problems.The constraint like (54) that can increase the convergence speed is necessary to be added to the Lagrangian problems but difficult to construct.Therefore, LR for the ISA problem is not evaluated in our study.

Clustering-Based Method.
Reference [32] proposed a novel method for analyzing hub location problems by considering clustering-based potential hub sets.They define the travel demand of node  as (  +   ).Then, the following observations are obtained: (1) Nodes with larger travel demands are more likely to be chosen as hubs, especially those nodes far away from other nodes.However, some of the nodes are rarely chosen as hubs in the optimal solutions.(2) In a cluster of nodes that are close to each other, at most one of them can be selected as a hub usually, although the travel demand of this hub is not the largest.(3) The hubs prefer to be spread across the whole network.(4) The optimal hub locations are almost insensitive to the value of .
Thus, the solutions of hub location problems strongly depend on both the spatial properties and the travel demands of nodes.Based on these properties, a heuristic is developed for generating a subset of potential hubs.The hubs for the optimal solutions can be selected from this subset and the computational complexity of the problem can be reduced.To generate the potential hub sets, all nodes need to be sorted by their importance, since the node importance depends on spatial properties and travel demands.The node importance can be measured with different formulations, and one of the best measures identified by [32] is as follows: where   = ∑ ∈   .Here, those nodes with large travel demands and far away from other nodes tend to be chosen as hubs.Then, we list all the nodes in descending order with the value of importance Imp  and the potential hub set   is generated with 2 most important nodes.The hubs are more likely to be selected from   (note that if 2 ≥ , then   = ).
Let  = ∑ ∈  (min ∈  , ̸ =   )/2 be the average shortest distances between nodes in   , and   [] is the th most important node in   .Therefore, the clustering-based potential hub set algorithm (CBS) can be shown as Algorithm 3.
Here, each potential hub has a "hub circle" with a radius of .Node  will be assigned to set   , if there are no other nodes from   in its hub circle; node  will be assigned to set , if there are two or more nodes from   that are less important than itself; node  will be assigned to set   , if  is within the hub circle of node  and node  is more important than .Therefore, some subsets of potential hub nodes are generated based on the clustering properties, and several constraints for hub location can be added to the problem formulations as follows.

CBS
There must be at least one hub in each set   , ∀, and at least one hub in set   .There is a restricted variant of CBS, called RCBS.In addition to (58), it adds another constraint where all the hubs must be selected from the potential hub sets, and the formulation is shown as follows: Note that CBS adds constraints independently of the problems at hand.Therefore, it is a prefiltering method for hubs and can be used for all problems without modification.In addition, there are several other variants of CBS, such as augmented CBS and improved RCBS.The details can be found in [32].

Evaluation
In this section we report the results of our experimental evaluation of all implementations.Section 4.1 introduces the three datasets used in our study and describes the experimental setup.The results and analysis of the CSA, CMA, RSA, RMA, and ISA problems in the TR dataset are provided from Sections 4.2 to 4.6.In Section 4.7, we present the results for the AP dataset and CAB dataset that corroborate our findings for the TR dataset.The hub assignments for selected problem instances and datasets are visualized in Section 4.8.The analysis of parallel computation is shown in Section 4.9.

Introduction of Datasets and Experiments.
To compare the performance of the methods introduced in Section 3, three commonly used datasets are utilized in this study.The TR (Turkish Postal) dataset includes 81 cities with distance and flows between them in Turkish Postal System [51].The AP (Australia Post) dataset provides 200 nodes representing postcode districts with their coordinates and the flows between them in Australia [22].The CAB (Civil Aeronautics Board) dataset is based the airline passenger interactions between 25 cities in USA in 1970 [12].The numbers of nodes for the TR, AP, and CAB dataset are set to  ∈ {10, 25, 50, 81},  ∈ {10, 25, 50, 100}, and  ∈ {10, 15, 20, 25}.We set the cost discount coefficient  ∈ {0.3, 0.5, 0.7} for the TR dataset and CAB dataset, as well as  = 0.75 (  = 3  + 0.75  + 2  ) for the AP dataset.For complete (CSA, CMA, RSA, and RMA) problems, we set the numbers of hubs for three datasets  ∈ {3, 5, 7}.For ISA problems, we set (, ) ∈ {(2, 1), (3, 2), (3, 3), (4, 4), (4, 5), (4, 6), (5, 6), (5, 7), (5, 8), (5, 9), (5, 10)} [18].The visualizations of all three datasets are shown in Figure 4.For the reliable models, the disruption probability   of each node  is set to a number randomly from [0.01, 0.05].The rerouting coefficient  is set to 2. We use the settings for GA and LR here as recommended in the literature.In GA, we set the population size   = 200 for single allocation problem and   = 1000 for multiple allocation problem.The maximum number of generations is 200 [27].If the best fitness value cannot be improved within 10 consecutive generations, the algorithm will stop.In LR, the step size multiplier  is set to 6, and the maximum iteration number is 3000 [23].If the lower bound cannot be improved  within 50 consecutive iterations, then  = /2 and the Lagrangian multipliers will be set to the current best values.If the maximum iteration number is reached or the gap between the lower bound and upper bound is less than a certain value (≤0.1%) or all Lagrangian multipliers are equal to zero, then LR can be terminated.Note that the results for both GA and LR could be further improved by tuning the parameters.
Since CBS provides a refinement of potential hubs, we use it with CPLEX together as a preselection method.In this experiment, its restricted variant RCBS is tested because of its short computation time compared with CBS and other variants.In addition, the results obtained by CPLEX are used as a benchmark.
GA and LR are implemented with Python, while the RCBS and LP are solved with CPLEX.Because CPLEX uses multiple threads as default, we use a single thread in all experiments to obtain comparable results.Finally, the maximum computation time is set to 3,600 seconds; that is, an experiment will be terminated once the running time exceeds 3,600 seconds.The results for the TR dataset, AP dataset, and CAB dataset are presented in Figures 5-8.The solution gaps, computation time, and main memory usage are used to measure the performances of four methods [65].Because the observations on three datasets are generally similar, we only report the results for TR in detail here and summarize the differences to other datasets subsequently.All experiments were executed  on a server with 32 cores and 386 GB RAM, running Fedora 24 (Linux 4.8.4-200.fc24.x8664).

Classical Single Allocation
Problems.The solution gaps, computation time, and main memory usage of four methods for CSA problems in the TR dataset with different numbers of nodes ( ∈ {10, 25, 50, 81}) are shown in the first column (a, f, k) of Figure 5.Note that -axis of all three figures is in log scale.In Figure 5(f), it can be seen that these methods all have similar computation time (around 1 second) while solving the problems with small size ( = 10).With an increasing number of nodes, the computation time for CLPEX-LP, LR, and RCBS increases significantly.In particular, when the number of nodes is up to 50, both CPLEX-LP and LR reach the maximum computation time of 3,600 seconds and they cannot provide good solutions in this case.However, GA performs well here.It provides very good solutions with small gaps within a reasonable computation time for different numbers of nodes.Note that RCBS can also terminate within 3,600 seconds for  ≤ 50.However, it sometimes provides a poor solution with a large gap even for small number of nodes ( ∈ {10, 25}) because its additional constraints limit the search space too much.In Figure 5(k), the main memory usage of CPLEX-LP and RCBS increases significantly with an increasing number of nodes.When  = 81, the memory usage of these two CPLEX-based methods both is over 10 GB while GA and LR use much less memory than them.Therefore, GA performs the best for solving CSA problems in the TR dataset.
In addition, it needs more computation time to obtain good solutions for larger numbers of hubs  with CPLEX-LP, GA, and RCBS.

Classical Multiple Allocation
Problems.The computational complexity of multiple allocation problems is much lower than for single allocation problems.Therefore, CMA is solved in polynomial time if the hub set is determined [26], because the problem can be solved by finding the best route for each OD pair independently in this case.The results for CMA problems in the TR dataset are presented in the second column (b, g, l) of Figure 5.The results show that CPLEX-LP and RCBS take much shorter time for solving CMA than CSA.Thus, CPLEX-LP can be used to solve CMA problems within a reasonable time for small numbers of nodes ( ≤ 50).However, just like the performance in CSA    problems, RCBS sometimes provides relatively poor solutions with large gaps.Note that GA becomes the worst method with large-gap solutions and long computation time because in the case of multiple allocation, the size of chromosomes is up to the number of OD pairs (( 2 )).It is more difficult to reach the near-optimal solutions with random exploration in this case.We have increased the population size to 1000 in additional experiments with GA; however, results did not improve significantly.LR still needs a long computation time for large number of nodes.In Figure 5(l), similar to Figure 5(i), CPLEX-LP and RCBS need much more memory with an increasing number of nodes, while the memory usage of LR is always much less than them.Because of the larger size of chromosomes, GA uses more memory for CMA than CSA problems.Finally, similar to Section 4.2, it is more difficult to obtain good solutions for larger  with CPLEX-LP, GA, and RCBS.

Reliable Single Allocation
Problems.The reliable problems have more computation complexity than the classical problems.Thus, the maximum number of nodes for the evaluation of two reliable problems is set to  = 50.The performance of four approaches for solving RSA problems is shown in the third column (c, h, m) of Figure 5, indicating that when the number of nodes is  ≥ 25, we cannot obtain any optimal solution with CPLEX-LP directly.The results provided by RCBS are slightly better, but it cannot still guarantee the optimality of the solutions even for networks with small sizes.LR can provide acceptable solutions in networks of small size ( = 10).However, with an increasing   number of nodes, the solution gaps become large because of the slow computation for each iteration.The observation in Figure 5(m) is still similar to Figures 5(k) and 5(l): the main memory usage of CPLEX-LP and RCBS increases fast with increasing numbers of nodes, while GA and LR use much less memory than them.Finally, similar to the case of CSA problems, GA presents solutions with high quality within a reasonable time.

Reliable Multiple Allocation Problems.
The results for the RMA problems are presented in the fourth column (d, i, n) of Figure 5. Similar to the condition in RSA problems, the optimal solutions cannot be obtained with CPLEX-LP within 3,600 seconds for  ≥ 25.However, the gaps are smaller than in Figure 5(c), because of the lower complexity of multiple allocation problems.GA still performs the worst and RCBS cannot still guarantee the optimality always.These observations are similar to Section 4.3.The influence of the number of hubs  is still similar to the sections above.
In general, RMA problems with  = 10 can be solved by CPLEX-LP or LR with small gaps (<0.1%) within 3,600 seconds.However, for the cases of  ≥ 25, the problem cannot be solved within 3,600 seconds by any methods proposed in this paper.Therefore, an additional evaluation for the convergence is performed below.
The convergence progress of CPLEX-LP and LR for RMA problems with  = 25 within 3,600 seconds is shown in Figure 6 which indicates that with the increase of time the  solutions obtained by LR converge gradually.Finally, the gaps approach small numbers.However, although CPLEX-LP can provide gaps less than 10% within 1,500 seconds for all 9 cases ( ∈ {3, 5, 7},  ∈ {0.3, 0.5, 0.7}), the gaps stay nearly stable within the remaining computation time; that is, providing more computational resources does not improve the solution qualities.In addition, GA and RCBS cannot provide a nearoptimal solution for RMA because of the algorithm structure.Therefore, LR is the best technique to solve reliable multiple allocation problems.

Incomplete Single Allocation Problems.
For the ISA problems, the number of hub links  also needs to be considered in addition to the number of hub .We evaluate the following combinations of  and : (, ) ∈ {(2, 1), (3, 2), (3, 3), (4, 4), (4, 5), (4, 6), (5,6), (5, 7), (5, 8), (5, 9), (5, 10)}, similar to existing work on ISA problems [18].The results are shown in the fifth column (e, j, o) of Figure 5.We only evaluate CPLEX-LP, GA, and RCBS here, because LR is a nontrivial algorithm and has not yet been published for solving ISA problems.CPLEX-LP still performs well in small networks ( ≤ 25) and RCBS cannot guarantee the optimality.GA strategy proposed by us performs worse than CPLEX-LP for  ≤ 25.However, it can guarantee small gaps (≤3%) for all the instances while CPLEX-LP and RCBS often provide gaps over 10%.The scale of variables of ISA problems is ( 3 ) in ( 37)- (48).Thus, the main memory usage of these three methods for ISA problems all is much less than that for the other four problems.

4.7.
Results for the AP Dataset and CAB Dataset.In addition to the TR dataset, the results for the AP dataset and CAB dataset are presented in Figures 7 and 8.The results corroborate our findings for the TR dataset proposed in Sections 4.2 to 4.6.CPLEX-LP computes optimal solutions for CSA, CMA, and ISA problems in small instances because of their low complexity.GA still has the best performance, with short computation time and low solution gaps for solving CSA, RSA, and ISA problems, because the problem formulations can be neatly encoded with chromosomes of reasonable sizes.LR needs a long computation time for all problems with large network scales.However, similar to the results in Section 4.5, RMA with large networks scales cannot be solved by any of these methods within 3,600 seconds.From the observation in Figure 6, LR should be used to solve RMA problems.The computation time of RCBS is shorter than LP.However, RCBS sometimes provides poor solutions with high gaps because of the additional constraints in the method.
4.8.Visualization of Results.In the sections above, we analyzed the solution gaps, computation time, and main memory usage of four solution techniques for five types of hub location problems.In this section, the results of CSA and ISA problems for the three datasets are visualized in Figures 9 and  10.The number at the end of the caption of each subfigure is the relative difference between the solution and the minimum one.
In the first row (a-d) of Figure 9, results of CSA problems for the TR dataset with  = 50,  = 7,  = 0.5 are proposed.CPLEX-LP only provides a feasible solution in 3,600 seconds because of the large network scale.Thus, the assignment in Figure 9(a) is rather useless.The assignments of other three methods also have some difference, and their solutions are not far away from each other.The results of CSA problems for AP dataset with  = 50,  = 7,  = 0.75 are shown in the second row (e-h) of Figure 9. CPLEX-LP and RCBS provide only feasible solutions with bad assignments.The results of GA and LR have a difference on one hub, but the difference between the values of their objective functions is only 2.45%.In the third row (i-l) of Figure 9, results of CSA problems for the CAB dataset with  = 25,  = 5,  = 0.5 are shown.It indicates that the hub sets and node assignments obtained by CPLEX-LP, GA, and LR are all optimal.However, the results of RCBS are different: Detroit (DTT) is selected as a hub because of its high node importance (obtained by ( 57)) instead of Dallas (DFW).
In the first row (a-c) of Figure 10, results of ISA problems for the TR dataset with  = 50,  = 5,  = 6,  = 0.5 are presented.The structures of links obtained by three methods are similar to each other.Only a few of hubs are different between three figures.In the second row (d-f) of Figure 10, results for the AP dataset with  = 50,  = 5,  = 6,  = 0.75 are shown.The optimal solution is obtained with CPLEX-LP and GA.RCBS selects node 35 and node 16 as hubs because of their high importance.Results for the CAB dataset with  = 25,  = 5,  = 6,  = 0.5 with three methods are shown in the third row (g-i) of Figure 10.Here, the optimal solution is obtained with CPLEX-LP.Although the assignment provided by GA looks quite different from that by CPLEX-LP, their solutions only have a difference of 1.33%.Detroit (DTT) is still selected as a hub instead of Dallas (DFW) by RCBS.

Analysis of Parallel Computation.
The results from Sections 4.2 to 4.7 are obtained using a single thread only.However, the number of threads can be modified easily for CPLEX-LP and RCBS.In this section, we analyze the performance of these two solution techniques with different numbers of threads.
As shown in Figure 11, two classical problems with  = 25,  = 5,  = 0.7, two reliable problems with  = 10,  = 5,  = 0.7, and the ISA problem with  = 25,  = 5,  = 6,  = 0.7 for the CAB dataset are used as case studies.CPLEX-LP and RCBS provide good solutions, with gaps less than 0.1% within 3,600 seconds for all five problems.Thus, we consider the computation time in this section.It can be seen that in the RMA problem (Figure 11 significantly with an increasing number of threads.For the ISA problem (Figure 11(e)), the computation time can be reduced by adding the threads when the number of threads is less than a fixed number (here, the fixed number is 8).However, in the other three figures, there is no significant correlation between computation time and the number of threads.For future work, the parallel computation of GA and LR should be implemented.The impact of the number of threads will also be analyzed.

Conclusions
In this paper, we reviewed, implemented, and compared several methods (genetic algorithm, Lagrangian relaxation, clustering-based method, and CPLEX) for solving five types of hub location problems (CSA, CMA, RSA, RMA, and ISA).Because most methods were proposed for solving one or two types of hub location problems, we made appropriate modifications for these methods to solve all kinds of hub location problems.In order to evaluate the scalability and solution quality of these methods under the same computational environment, three standard datasets (TR, AP, and CAB) were used as case studies.In the evaluation, gaps of the objective functions, computation time, and main memory usage were used to compare the performance of these methods.There are seven major observations obtained from our evaluation: (1) CPLEX-LP performs best on solving two classical (CSA, CMA) problems and incomplete single allocation (ISA) problems with small size because of the low problem complexity.
(2) GA provides good solutions within short time for solving three single allocation (CSA, RSA, and ISA) problems, because small chromosomes are sufficient for modeling single allocation problems.However, it might be a poor choice for solving multiple allocation (CMA and RMA) problems, because the size of the chromosomes is in ( 2 ) (here  is the number of nodes).
(3) LR should be used to solve reliable multiple allocation (RMA) problems, since other methods perform poorly for solving these problems, because the computational complexity of RMA and CMA is similar for LR.However, while using other methods, the complexity of RMA is much higher than CMA.
(4) RCBS can be used to obtain a solution within a shorter time than CPLEX-LP.However, because there are additional constraints in the algorithm, the optimality of the solution cannot be guaranteed with RCBS.Although these constraints can reduce the computation time, the problems may be changed already.
Because of the constraints of potential hub set, some nodes with high importance are more likely to be selected as hubs no matter whether they are the optimal hubs, such as Detroit (DTT) in Figures 9(l) and 10(i).
(5) CPLEX-LP, GA, and RCBS use more main memory for larger networks, whereas the memory usage of LR remains almost stable for different numbers of nodes.CPLEX-LP and RCBS need much more memory than GA and LR.
(6) The structures of solutions obtained with several methods are sometimes completely different, yet, the values of their objective functions usually are not so far away from each other.For a given problem, more computation time is needed to obtain good solutions for large numbers of hubs  with CPLEX-LP, GA, and RCBS.
(7) For reliable multiple allocation (RMA) problems, the computation time of CPLEX-LP and RCBS can be reduced significantly with an increasing number of threads.For incomplete single allocation (ISA) problems, the computation time of CPLEX-LP can be reduced by adding the threads when the number of threads is less than a fixed number.However, there is no significant correlation between computation time and the number of threads for other three types of problems.
Finally, note that several other types of hub location problems, such as the uncapacitated hub location problem and the -allocation -hub median problem [66], are not evaluated in this paper.For further work, these different problems and solution techniques should be considered.Hub location problems can be applied to other network design problems, multimodal systems, or multiple airport systems [67][68][69][70], and this can also be studied for future work.In this paper, we used CPLEX directly; future work could implement more sophisticated methods, such as Benders decomposition.In addition, GA performs well in single allocation problems in the current study.However, the computation time and the usage of main memory increase quickly with the growth of the network.Therefore, the combination of GA and CBS can be considered, that is, operating the GA evolution in the initial population obtained by CBS.In addition, the compression for representation of allocation arrays can reduce the usage of main memory.The parallel computation of several independent subpopulations in multiple threads may also speed up the algorithms.

Figure 1 :
Figure 1: Single allocation problem and multiple allocation problem: (a) Flights from Pittsburgh must go through its hub New York City first for different destinations in single allocation problems.(b) The flow can be assigned to New York City for destination of San Francisco and to Chicago for destination of Houston in multiple allocation problems.

1 )
We perform an exhaustive evaluation of five types of hub location problems and four solution techniques.

Figure 2 :
Figure 2: Hub-and-spoke structure and alternative routes: there exists a regular route from Pittsburgh to San Francisco with New York City and Los Angeles as hubs.If hub of New York City is disrupted, an alternative route "Pittsburgh → Chicago → Los Angeles → San Francisco" can be used.Note that, the Los Angeles hub is still in the alternative route.

Figure 3 :
Figure 3: A small example of crossover and mutation.Hubs are encoded by numbers: (a) Hubs 4 and 6 are swapped between two hub arrays.The assignments of the nodes after the red lines are also swapped between two assignment arrays.After the reallocation, the offspring individuals are generated.(b) The assignments of two spokes are exchanged in the first mutation; a random spoke is reassigned to a random hub in the second mutation.

Figure 4 :
Figure 4: Visualization of three datasets used in this study: TR dataset, AP dataset, and CAB dataset.

1e − 01 CPLEX, p = 3 CPLEX, p = 5 CPLEXFigure 5 :
Figure 5: Results of five types of problems for TR dataset.The problems in five columns are CSA, CMA, RSA, RMA, and ISA.The measurements of three rows are solution gaps, computation time, and main memory usage, respectively.In each subfigure, the results for different numbers of nodes ( ∈ {10, 25, 50, 81}) obtained with four methods are presented.
for CPLEX-LP and LR in CAB dataset (c) Convergence for the CAB dataset

Figure 7 :
Figure 7: Results of five types of problems for AP dataset.The problems in five columns are CSA, CMA, RSA, RMA, and ISA.The measurements of three rows are solution gaps, computation time, and main memory usage, respectively.In each subfigure, the results for different numbers of nodes ( ∈ {10, 25, 50, 100}) obtained with four methods are presented.

Figure 8 :
Figure 8: Results of five types of problems for CAB dataset.The problems in five columns are CSA, CMA, RSA, RMA, and ISA.The measurements of three rows are solution gaps, computation time, and main memory usage, respectively.In each subfigure, the results for different numbers of nodes ( ∈ {10, 15, 20, 25}) obtained with four methods are presented.

Figure 9 :
Figure9: Visualization of CSA problems for the TR dataset with  = 50,  = 7, and  = 0.5, AP dataset with  = 50,  = 7, and  = 0.75, and CAB dataset with  = 25,  = 5, and  = 0.5: red points are hubs and blue points are nonhub nodes.The red links are hub links.For the TR dataset, CPLEX-LP provides a feasible solution only, and therefore the assignment is rather useless.The results obtained by GA, LR, and RCBS are quite similar to each other.For the AP dataset, CPLEX-LP and RCBS provide poor solutions, while the results of GA and LR are different regarding one hub.For the CAB dataset, CPLEX-LP, GA, and LR are all up to the optimality, while RCBS selects Detroit (DTT) as a hub because of its high node importance instead of Dallas (DFW).

Figure 10 :
Figure10: Visualization of ISA problems for the TR dataset with  = 50,  = 5,  = 6, and  = 0.5, AP dataset with  = 50,  = 5,  = 6, and  = 0.75, and CAB dataset with  = 25,  = 5,  = 6, and  = 0.5: red points and blue points are hubs and nonhub nodes.The red links are the links between hubs.For the TR dataset, the structures of links obtained by three methods are similar to each other.For the AP dataset, the optimal solution is obtained with CPLEX-LP and GA while RCBS selects node 35 and node 16 as hubs because of their high importance.For the CAB dataset, CPLEX-LP provides an optimal solution.The solutions of GA and CPLEX-LP only have a difference of 1.33% while having significantly different hub assignments in the figures.RCBS still selects Detroit (DTT) as a hub instead of Dallas (DFW).

Figure 11 :
Figure 11: Computation time of LP and RCBS with different number of threads for the CAB dataset: for the RMA problem, the computation time of two solution techniques can be reduced significantly with an increasing number of threads.For the ISA problem, the computation time of CPLEX-LP can be reduced by adding the threads when the number of threads is less than a fixed number (here, the fixed number is 8).

Table 1 :
Summary of different allocation problems in the literature.

Table 2 :
Comparison of solution techniques in the literature: a method is marked with "√" in the corresponding row, if it was compared in a paper.The methods marked with "O" are proposed by the researchers in the first column.See Abbreviation Summary of Methods for abbreviations of solution techniques.