Graphs are an important complex network model for describing the relationships among entities in real applications, including knowledge graphs, social networks, and traffic networks. The shortest path query is a fundamental problem over graphs and has been well studied. This paper studies a special case of the shortest path problem: finding the shortest path that passes through a set of vertices specified by the user, which is NP-hard. Most existing methods enumerate all permutations of the given vertices and then select the shortest path among them. However, the computational cost is extremely high when the graph or the given vertex set is large. In this paper, we first propose a novel exact heuristic algorithm based on best-first search and then give two optimizing techniques to improve its efficiency. Moreover, we propose an approximate heuristic algorithm that runs in polynomial time for this problem over large graphs. We prove that the ratio bound of our approximate algorithm is 3. We confirm the efficiency of our algorithms by extensive experiments on real-life datasets. The experimental results show that our algorithms outperform the existing methods in our experiments, even when the graph or the given vertex set is large.
Funding: National Natural Science Foundation of China (61402323, 61572353, U1736103); Opening Project of State Key Laboratory of Digital Publishing Technology; Australian Research Council Discovery (DP130103051).
1. Introduction
Graph is an important complex network model to describe the relationship among various entities in real applications, including knowledge graph, RDF graph, linked data, social network, biological network, and traffic network [1–4]. Shortest path query is a basic problem on graph model. For example, in knowledge graphs, it is to find the closest connection between two entities or concepts; in social networks, it is to find the closest relationships such as friendship between two individuals; in traffic networks, it is to compute the shortest route between two locations.
Shortest path routing is an important problem in location-based services (LBS) and has been well studied in the past decades [5–7]. However, a special kind of shortest path query with a vertex constraint is increasingly important in real life. For instance, in knowledge graphs, a data miner may want to investigate the closest relationship between two entities that is connected through some specified entities or concepts. In traffic networks, carpooling has become a common business with the rapid development of the sharing economy. A driver may carry several passengers on the way home from work, and the passengers get off at distinct locations. A critical problem is therefore how to find a route of minimum length that passes through these locations. In the above examples, both the knowledge graph and the traffic network can be modeled as a large graph G(V,E). The shortest path query with vertex constraint is defined as follows: given a starting vertex vs, an ending vertex ve, and a subset Vs⊆V, find a path with the minimum length among all the paths from vs to ve that pass through every vi∈Vs. The subset Vs is called the vertex constraint; that is, the shortest path must pass through every vertex in Vs.
The above problem is a special case of the Generalized Traveling Salesman Path (GTSP) problem [8], which is known to be NP-hard. In the GTSP problem, all the vertices in G are partitioned into several categories, and the objective is to find a path that visits at least one vertex of every category specified by the user. For example, a tourist plans to travel through three kinds of locations, e.g., a coffee shop, a gas station, and a bank. Because he/she may have several choices for every location category, it is necessary to find an optimal route. The basic idea of most existing works on the GTSP problem is as follows. They first compute all permutations of the given categories; each permutation represents a class of paths that visit the categories in the same order. Next, for every permutation, these methods enumerate all possible paths from source to destination by concatenating the subpaths between vertices in two successive categories. Finally, they select the optimal one among these paths. In our problem, every vertex in G forms a category of its own. Thus these methods need to calculate all the permutations of the vertices to be visited, which incurs a heavy computational cost. However, most of these permutations are unnecessary for computing the shortest path. Therefore, the main challenge is how to avoid computing unnecessary permutations when finding the shortest path with vertex constraint.
In this paper, we propose a novel efficient algorithm based on best-first search to compute the shortest path with vertex constraint. The main idea of our method is to discard unnecessary permutations as early as possible. We also propose an approximate algorithm that runs in polynomial time and is more efficient for large graphs. The contributions of this paper are summarized below.
We propose a novel and efficient exact heuristic algorithm with two optimizing techniques to find the shortest path with vertex constraint.
We also propose an approximate algorithm in polynomial time for our problem over large graphs. We prove the ratio bound of our approximate algorithm is 3.
We conduct extensive experiments on several real-life datasets. We compare our algorithms with the state-of-the-art methods. The experimental results validate the efficiency and effectiveness of our algorithms.
The rest of this paper is organized as follows. Section 2 gives the problem statement. Section 3 introduces the CH technique for preprocessing graphs. Section 4 proposes the best-first searching algorithm with two optimizing techniques. Section 5 proposes the approximate algorithm and analyzes the ratio bound. The experimental results are presented in Section 6. The related work is in Section 7. Finally, we conclude this paper in Section 8.
2. Problem Statement
An undirected weighted graph is denoted as G(V,E,w) (or G for short), where V={vi} is the set of vertices and E⊆V×V is the set of edges in G. w is a function that assigns a nonnegative weight wi,j to every edge (vi,vj)∈E; i.e., w((vi,vj))=wi,j. Note that (vi,vj) is equivalent to (vj,vi) because G is an undirected graph. The number of vertices (or edges) in G is denoted as |V| (or |E|). A path p in G is a sequence of vertices; i.e., p=(v1,v2,…,vk), where every (vi,vi+1) is an edge in G for 1≤i≤k-1. The weight of path p, denoted as w(p), is the sum of the weights of all the edges in p; i.e., w(p)=∑_{1≤i≤k-1} wi,i+1. We say a path p is simple if and only if there is no repeated vertex in p. The shortest path between vi and vj is a path with the minimum w(p) among all the paths between vi and vj. For simplicity, in the following, we use wi,j∗ to denote the weight of the shortest path between vi and vj in G.
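As a concrete reading of these definitions, the following minimal Python sketch stores G as an adjacency map and evaluates w(p) and the simplicity test. The representation and the helper names (`add_edge`, `path_weight`, `is_simple`) are illustrative assumptions, not part of the paper.

```python
from typing import Dict, List

# Assumed representation: vertex -> {neighbor: weight}
Graph = Dict[str, Dict[str, float]]


def add_edge(g: Graph, u: str, v: str, w: float) -> None:
    """Insert the undirected edge (u, v) with nonnegative weight w."""
    if w < 0:
        raise ValueError("edge weights must be nonnegative")
    g.setdefault(u, {})[v] = w
    g.setdefault(v, {})[u] = w  # (u, v) is equivalent to (v, u)


def path_weight(g: Graph, p: List[str]) -> float:
    """Return w(p): the sum of the weights of consecutive edges in p.

    Raises KeyError if two consecutive vertices are not adjacent.
    """
    return sum(g[a][b] for a, b in zip(p, p[1:]))


def is_simple(p: List[str]) -> bool:
    """A path is simple iff no vertex is repeated."""
    return len(set(p)) == len(p)
```

For instance, after inserting edges (a,b) of weight 2 and (b,c) of weight 3, `path_weight(g, ["a", "b", "c"])` evaluates to 5.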
In this paper, we study the problem of finding the shortest path with vertex constraint. Table 1 summarizes the symbols in this paper. We first give the definition below.
Table 1: List of notations.
| Symbol | Meaning |
|---|---|
| G(V,E,w) | An undirected weighted graph |
| vs, ve, Vs | Starting vertex, ending vertex, vertex constraint |
| wi,j, w(p) | Weight of edge (vi,vj), weight of path p |
| ps,e∗ | Shortest path between vs and ve with vertex constraint |
| π | Permutation of a vertex subset |
| ps,e∗π | Shortest path between vs and ve under permutation π |
Definition 1 (shortest path with vertex constraint).
Given a graph G, a vertex subset Vs⊆V, a starting vertex vs, and an ending vertex ve in G, a path is called the shortest path between vs and ve with vertex constraint of Vs, denoted as ps,e∗, if it satisfies the following two conditions: (1)ps,e∗ travels through all the vertices in Vs; i.e., v∈ps,e∗ for every vertex v∈Vs and (2)ps,e∗ is with the minimum weight among all the paths satisfying the condition (1).
Figure 1 illustrates an example of the shortest path with vertex constraint. In this example, Vs is {v3,v4,v5,v6} and these vertices are colored with yellow in Figure 1(b). Two gray vertices, v1 and v8, are the starting vertex and the ending vertex, respectively. Therefore, the shortest path between v1 and v8 with vertex constraint of Vs is p=(v1,v2,v3,v5,v4,v6,v8), which is shown as the green path in Figure 1(b).
Figure 1: An example of the shortest path with vertex constraint. (a) Undirected graph. (b) Shortest path.
The Hamilton path problem is a special case of our problem, so the following theorem follows directly.
Theorem 2.
The problem of finding the shortest path with vertex constraint over graphs is NP-hard.
Proof.
We prove it by a reduction from the Hamilton path problem, which is NP-complete. Given an undirected graph G=(V,E,w), let vs and ve denote the starting vertex and the ending vertex, respectively. The weight of every edge in G is set to one. The vertex subset Vs⊆V is set as Vs=V\{vs,ve}. Obviously, there exists a Hamilton path from vs to ve in G if and only if the shortest path from vs to ve with vertex constraint of Vs has length |V|-1. This reduction can be done in polynomial time. Therefore, the problem of finding the shortest path with vertex constraint over graphs is NP-hard.
3. CH Technique for Preprocessing Graphs
Contraction Hierarchies (CH) proposed in [9] is a well-known technique for speeding up the traditional shortest path query effectively. It essentially builds an index by maintaining the shortest paths for some pairs of vertices. In this paper, we use CH technique for preprocessing graphs to make our method more efficient.
Given a graph G(V,E,w), CH first sorts all vertices in an ascending order and then contracts the vertices one by one under this order. Contraction of vertex vi can be described as removing vi from a graph by adding new edges which represent the shortest path between two vertices adjacent to vi. Such edges are called shortcut edges. Specifically, for each pair of incoming edge (vj,vi) and outgoing edge (vi,vk) of vi, if (vj,vi,vk) is a unique shortest path, then a new shortcut edge (vj,vk) is added with weight wj,i+wi,k to obtain a new graph G′.
We use an example in Figure 2 to illustrate the process of vertex contraction. Figure 2(a) shows a graph before the contraction of v1. Note that there are two shortest paths between v4 and v5, which are (v4,v1,v5) and (v4,v2,v5), respectively. Thus it is unnecessary to add the edge from v4 to v5 when removing v1. We also note that there is only one shortest path from v3 to v4. Because this path goes through v1, a new edge from v3 to v4 can be constructed by removing v1. Similarly, a new edge from v3 to v5 also can be constructed. Both the weights of such two new edges are 2. The result graph after contraction of v1 is shown in Figure 2(b).
Figure 2: Contraction of vertex. (a) Before contraction. (b) After contraction.
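The contraction step described above can be sketched in Python as follows. This is a simplified illustration, not the implementation of [9]: the function names are hypothetical, the witness search is a plain Dijkstra that avoids the contracted vertex, and a shortcut is added only when the path through the contracted vertex is strictly shorter than every alternative, which matches the unique-shortest-path condition in the text.

```python
import heapq
from itertools import combinations
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]  # assumed: vertex -> {neighbor: weight}


def dist_avoiding(g: Graph, src: str, dst: str, banned: str, limit: float) -> float:
    """Shortest src-dst distance that never visits `banned`.

    The search is cut off once distances exceed `limit`; in that case
    the returned value is only guaranteed to be greater than `limit`.
    """
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > best.get(u, float("inf")) or d > limit:
            continue
        for v, w in g[u].items():
            if v == banned:
                continue
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def contract(g: Graph, v: str) -> List[Tuple[str, str, float]]:
    """Remove v from g in place and return the shortcut edges that were added."""
    shortcuts = []
    for a, b in combinations(list(g[v]), 2):
        via = g[v][a] + g[v][b]
        # Shortcut only if (a, v, b) is the unique shortest path from a to b.
        if dist_avoiding(g, a, b, v, via) > via:
            shortcuts.append((a, b, via))
    for n in list(g[v]):
        del g[n][v]
    del g[v]
    for a, b, w in shortcuts:
        g[a][b] = g[b][a] = w
    return shortcuts
```

On a small hypothetical graph in which v1 is adjacent to v3, v4, and v5 and v2 is adjacent to v4 and v5 (all weights 1), contracting v1 adds shortcuts (v3,v4) and (v3,v5) of weight 2 and none between v4 and v5, which is consistent with the behavior described for Figure 2.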
After contracting vertices, CH divides G′ into an upward graph Gu and a downward graph Gd. The shortest paths can be calculated on Gu and Gd. Given a starting vertex vs and an ending vertex ve, a forward Dijkstra [10] search from vs and a backward Dijkstra search from ve are executed on Gu and Gd, respectively. More details about the CH technique are given in [9].
4. Permutation-Expanding Algorithm
In this section, we propose an algorithm to find the shortest path with vertex constraint. We first introduce the definition of permutation expanding, which is the basis of our algorithm, and then we explain the algorithm Permutation-Expanding. Two optimizing techniques are proposed in Section 4.3, and we analyze the time and space complexity of our algorithm in Section 4.4.
4.1. Permutation-Expanding
Given a vertex subset Vs on G with |Vs|=r, a permutation π of Vs is a sequence v1v2⋯vr of all vertices in Vs, where every vi∈Vs and vi≠vj for 1≤i,j≤r, i≠j. Obviously, there are r! permutations for a given Vs. A permutation is essentially an order of the vertices in Vs, and we use vi≺vj to denote that vi is before vj in π. We say a path p is under a permutation π, denoted as pπ, if it satisfies the following two conditions: (1) vi∈p for every vi∈Vs and (2) there exists a subpath vi⇝vj from vi to vj if vi≺vj in π. Given a pπ, path p can be divided into r+1 subpaths v0⇝v1,v1⇝v2,…,vr⇝vr+1, where v0 and vr+1 are the starting vertex and the ending vertex of p, respectively. Each vi⇝vi+1 (0≤i≤r) is called a “segment” of p. We use Spπ to denote the set of all the segments of p.
In the example of Figure 1, p=(v1,v2,v3,v5,v4,v6,v8) is a path under permutation π=v3v5v4v6. Here, Spπ={(v1,v2,v3),(v3,v5),(v5,v4),(v4,v6),(v6,v8)}.
A path is called the shortest path between vs and ve under permutation π, denoted as ps,e∗π, if every segment vi⇝vj∈Sps,e∗π is a shortest path. Then we have the following theorem.
Theorem 3.
Given an undirected graph G, a vertex subset Vs, a starting vertex vs, and an ending vertex ve in G, the shortest path ps,e∗ between vs and ve with vertex constraint of Vs is exactly ps,e∗π with the minimum weight among all the permutations of Vs; i.e., ps,e∗=min{ps,e∗π∣π∈Π(Vs)}, where Π(Vs) is the set of all permutations of Vs.
Proof.
Assume that ps,eπ′ is a path from vs to ve under a permutation π′ and that the weight of ps,eπ′ is less than that of ps,e∗π. There are four cases.
If π′ and π are the same permutation and every segment vi⇝vj∈Sps,e∗π′ is a shortest path, then ps,eπ′ and ps,e∗π obviously have the same weight. This contradicts the assumption.
If π′ and π are the same permutation and not all of the segments vi⇝vj∈Sps,e∗π′ are shortest paths, then the weight of ps,eπ′ is obviously not smaller than that of ps,e∗π. This contradicts the assumption.
If π′ and π are different permutations and every segment vi⇝vj∈Sps,e∗π′ is a shortest path, then, because ps,e∗π is the path with the minimum weight among all the permutations, the weight of ps,e∗π is no greater than that of ps,eπ′. This contradicts the assumption.
If π′ and π are different permutations and not all of the segments vi⇝vj∈Sps,e∗π′ are shortest paths, then the weight of ps,eπ′ is obviously not smaller than that of ps,e∗π. This contradicts the assumption.
To sum up, ps,e∗π is the shortest path between vs and ve with vertex constraint of Vs.
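Theorem 3 also describes the baseline that enumerates every permutation. A minimal Python sketch of that baseline is given below, assuming an adjacency-map graph and using plain Dijkstra in place of the CH query; `dijkstra` and `brute_force` are illustrative names, and the approach is only practical for very small Vs because it examines all r! permutations.

```python
import heapq
from itertools import permutations


def dijkstra(g, src):
    """Single-source shortest distances (stand-in for a CH query)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in g[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def brute_force(g, vs, ve, constraint):
    """Return (weight, permutation) minimizing the path weight over all permutations of Vs."""
    dist = {v: dijkstra(g, v) for v in {vs, ve, *constraint}}
    best_w, best_pi = float("inf"), None
    for pi in permutations(constraint):
        stops = (vs, *pi, ve)
        # Every segment between consecutive stops is a shortest path.
        w = sum(dist[a].get(b, float("inf")) for a, b in zip(stops, stops[1:]))
        if w < best_w:
            best_w, best_pi = w, pi
    return best_w, best_pi
```

The result is the weight of ps,e∗ together with the permutation that attains it; the vertex sequence itself can be recovered by concatenating the shortest paths of the segments.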
For two vertex subsets Vs and Vs′ on G, if Vs⊆Vs′, then for every permutation π of Vs there must exist a permutation π′ of Vs′ such that vi≺vj in π′ if vi≺vj in π. In this case, π is a subpermutation of π′, denoted as π⊆π′. If |Vs|=r, π is also called an r-permutation of vertex set Vs. Specifically, π is called a prefix of π′, denoted as π⊆pπ′, if π⊆π′ and π is at the beginning of π′. For example, v3v4 is a prefix of v3v4v5v6.
Given a permutation π, π′=π⊕v is an expanded permutation with one vertex v from π, where ⊕ is the concatenation operator appending v at the end of π. Obviously, π⊆pπ′ and |π′|=|π|+1. This process is called permutation expanding. For the example in Figure 1, given a permutation π=v3v4, π′=v3v4v5 and π′′=v3v4v6 are two expanded permutations with one vertex v5 and v6, respectively.
4.2. Main Algorithm
We propose an algorithm, Permutation-Expanding, to find the shortest path with vertex constraint by expanding permutations incrementally. The main idea of the algorithm is a best-first search over the shortest paths under the 1-permutations to r-permutations of Vs, which stops as soon as the optimal one has been found.
The pseudocode of Permutation-Expanding is shown in Algorithm 1. Algorithm 1 utilizes a min priority queue Q to maintain a set of tuples (π,w(π)) (line 1), where π is a subpermutation of Vs and w(π) is the weight of the shortest path under π from vs to the last vertex of π. If π=v1⋯vk, then w(π)=w(ps,1∗)+∑_{i=1}^{k-1} w(pi,i+1∗). Here pi,j∗ represents the shortest path without vertex constraint, and w(pi,j∗) can be easily calculated by the CH technique discussed in Section 3. Initially, Q contains only the 1-permutations π of Vs with their w(π) (lines 2-3). Algorithm 1 dequeues tuples iteratively in ascending order of w(π): in each iteration, the (π,w(π)) with the minimum w(π) is dequeued from Q (lines 4 and 11). Let Vπ be the vertex set of π. If Vπ≠Vs, the algorithm generates every permutation π′ by appending a vertex v∈Vs-Vπ at the end of π and enqueues (π′,w(π′)) into Q. Otherwise, π is a permutation of Vs, and Algorithm 1 generates π⊕ve and enqueues it into Q (lines 6-10). Algorithm 1 terminates when a permutation π⊕ve is dequeued for the first time, where π is a permutation of Vs (line 5). At this moment, w(π⊕ve) is the weight of the shortest path ps,e∗ with vertex constraint of Vs, and we can obtain ps,e∗ by the CH technique (line 12). In the special case where no path exists between vs (or ve) and some vi∈Vs, Algorithm 1 detects this while computing the shortest path between the two vertices and reports that the problem has no solution.
Algorithm 1: Permutation-Expanding (G,Vs,vs,ve).
Input: G, Vs, vs, ve.
Output: ps,e∗.
// Input: G: an undirected weighted graph
//        Vs: a vertex subset of V
//        vs, ve: starting vertex and ending vertex respectively
// Output: ps,e∗: the shortest path between vs and ve with vertex constraint of Vs
1: Let Q be a min priority queue with entries in the form (π,w(π)), sorted in ascending order of w(π);
2: for each vj∈Vs do
3:     Enqueue an entry (vj,w(ps,j∗)) into Q;
4: Dequeue the first entry (π,w(π)) from Q and let vi be the last vertex of π;
5: while vi≠ve do
6:     if Vπ≠Vs then
7:         for each vj∈Vs-Vπ do
8:             Enqueue an entry (π⊕vj,w(π)+w(pi,j∗)) into Q;
9:     else
10:        Enqueue an entry (π⊕ve,w(π)+w(pi,e∗)) into Q;
11:    Dequeue the first entry (π,w(π)) from Q and let vi be the last vertex of π;
12: Generate the shortest path ps,e∗ between vs and ve under a permutation π;
13: return ps,e∗;
Example 4.
Given a graph G shown in Figure 1(a), let vs=v1, ve=v8, and Vs={v4,v6}. Algorithm 1 first enqueues (v4,4) and (v6,7) into Q and then dequeues the first entry (v4,4) from Q. (v4v6,7) is enqueued into Q. Then the entry (v6,7) is dequeued from Q. (v6v4,10) is enqueued into Q. Then the entry (v4v6,7) is dequeued from Q. Due to Vπ=Vs where π=v4v6, (v4v6v8,8) is enqueued into Q. Then the entry (v4v6v8,8) is dequeued from Q. Due to the fact that the last vertex of π is the ending vertex v8, where π=v4v6v8, Algorithm 1 returns ps,e∗=(v1,v2,v3,v4,v6,v8) as the shortest path with vertex constraint of Vs.
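The best-first search of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' C++ implementation: the graph is an adjacency map, plain Dijkstra stands in for the CH query, all pairwise distances are precomputed instead of being computed on demand, and termination is detected by the length of the dequeued permutation rather than by comparing its last vertex with ve. The function returns the optimal weight and the permutation of Vs that attains it.

```python
import heapq
from itertools import count


def dijkstra(g, src):
    """Single-source shortest distances; stand-in for the CH query of Section 3."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in g[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def permutation_expanding(g, vs, ve, constraint):
    """Return (weight, permutation of Vs) for a shortest constrained path, or None."""
    inf = float("inf")
    targets = frozenset(constraint)
    dist = {v: dijkstra(g, v) for v in targets | {vs}}

    def d(a, b):
        return dist[a].get(b, inf)

    if not targets:  # no constraint: ordinary shortest path
        w = d(vs, ve)
        return None if w == inf else (w, ())

    tie = count()  # tie-breaker so the heap never compares permutations
    queue = [(d(vs, v), next(tie), (v,)) for v in targets]  # lines 2-3
    heapq.heapify(queue)
    while queue:
        w, _, pi = heapq.heappop(queue)  # lines 4 and 11
        if w == inf:
            return None  # some required vertex is unreachable
        if len(pi) == len(targets) + 1:  # pi is a full permutation followed by ve
            return w, pi[:-1]
        last = pi[-1]
        if len(pi) < len(targets):  # lines 6-8: expand by one unvisited vertex
            for v in targets - set(pi):
                heapq.heappush(queue, (w + d(last, v), next(tie), pi + (v,)))
        else:  # lines 9-10: all of Vs visited, append ve
            heapq.heappush(queue, (w + d(last, ve), next(tie), pi + (ve,)))
    return None
```

Because edge weights are nonnegative, w(π) never decreases when π is expanded, so the first dequeued entry that ends with ve carries the minimum weight.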
4.3. Optimizing Techniques
We give two optimizing techniques to improve the efficiency of Permutation-Expanding algorithm.
Cache Mechanism. Given two different permutations π and π′, the shortest paths under π and π′ may share overlapping segments. The weights of these overlapping shortest subpaths do not need to be recalculated each time during permutation expanding, so a cache is used to maintain these values. For the example in Figure 1(a), v1 and v8 are the starting and ending vertices, respectively, and Vs={v3,v4,v5,v6} is the vertex constraint. Let π=v3v4v5v6 and π′=v6v4v5v3. Obviously, π and π′ are two permutations of Vs. When the shortest path between v4 and v5 is calculated for the first time, the distance between v4 and v5 is stored, so it is calculated only once even though both π and π′ are expanded in Permutation-Expanding. The experimental results validate that Cache Mechanism avoids redundant calculation effectively.
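One way to realize such a cache is a memo table keyed by the unordered vertex pair, as in the hedged Python sketch below. `DistanceCache` and the `oracle` callable are hypothetical names; the oracle stands for whatever routine computes a shortest distance (a CH query in the paper).

```python
class DistanceCache:
    """Memoize shortest-distance queries on an undirected graph."""

    def __init__(self, oracle):
        self._oracle = oracle  # callable (a, b) -> shortest distance
        self._memo = {}
        self.misses = 0        # number of times the oracle was actually invoked

    def __call__(self, a, b):
        # The graph is undirected, so d(a, b) == d(b, a): normalize the key.
        key = (a, b) if a <= b else (b, a)
        if key not in self._memo:
            self.misses += 1
            self._memo[key] = self._oracle(*key)
        return self._memo[key]
```

With this wrapper, asking for the distance between v4 and v5 while expanding π and again (in either direction) while expanding π′ triggers a single oracle call.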
Permutation Filtering. When a permutation π is dequeued from Q in an iteration, Permutation-Expanding generates all expanded permutations π′=π⊕v by appending every vertex v∈Vs-Vπ at the end of π. Note that it is unnecessary to enqueue every π′ into Q in this iteration. For two expanded permutations πi′=π⊕vi and πj′=π⊕vj, vi≠vj, if the shortest path ps,i∗ under πi′ between vs and vi is a subpath of ps,j∗, then permutation πj′ can be filtered and it does not need to be enqueued into Q. The following theorem guarantees the correctness of permutation filtering.
Theorem 5.
For two expanded permutations πi′=π⊕vi and πj′=π⊕vj, vi≠vj, if the shortest path ps,i∗ under πi′ from vs to vi is a subpath of ps,j∗, then for any permutation πj′′ of Vs, πj′⊆pπj′′, there exists a permutation πi′′ of Vs, πi′⊆pπi′′, such that the weight of the shortest path under πj′′ from vs to ve must not be less than the weight of the shortest path under πi′′ from vs to ve.
Proof.
Given a shortest path p∗ under πj′′ from vs to ve, ps,j∗ is obviously a prefix subpath of p∗. Let pi,j∗ denote the shortest path from vi to vj. Consider the path ps,i∗⊕pi,j∗ obtained by concatenating ps,i∗ and pi,j∗. Because ps,i∗ is a subpath of ps,j∗, we have w(ps,i∗⊕pi,j∗)≤w(ps,j∗), where w(p) represents the weight of path p. Next, consider the subpath vj⇝ve of p∗, which must go through vi. Let vi- and vi+ represent the precursor and successor of vi in subpath vj⇝ve. A new path pj,e∗ can be obtained by replacing the part vi-→vi→vi+ in vj⇝ve with the shortest path from vi- to vi+. It is obvious that w(pj,e∗)≤w(vj⇝ve). We concatenate ps,i∗, pi,j∗, and pj,e∗ to get a path p′ from vs to ve. Obviously, p′ is a path under a permutation πi′′ of Vs and we have w(p′)≤w(p∗). This completes the proof of Theorem 5.
The conclusion of Theorem 5 is obvious. For the example in Figure 1, let vs=v1, ve=v8, and Vs={v4,v6}. We consider two permutations v1v4 and v1v6, which are expanded from v1. The shortest paths under v1v4 and v1v6 are p=(v1,v2,v3,v4) and p′=(v1,v2,v3,v4,v6), respectively. Because p is a subpath of p′, v1v6 does not need to be enqueued into Q in the iteration when π=v1 is dequeued from Q. The reason is that all the paths under the permutations expanded from v1v6 cannot be the shortest path with vertex constraint.
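The filtering test can be sketched as a walk over a shortest-path tree. The Python sketch below assumes that the subpath condition is checked on one particular shortest-path tree rooted at the last vertex of π (or at vs for the first expansion); this choice, and the names `shortest_path_tree` and `filter_candidates`, are illustrative assumptions rather than details given in the paper.

```python
import heapq


def shortest_path_tree(g, src):
    """Dijkstra that also records each vertex's parent on its shortest path from src."""
    dist, parent = {src: 0.0}, {src: None}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in g[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, parent


def filter_candidates(parent, candidates):
    """Keep only candidates whose tree path from the root passes through no other candidate."""
    cand = set(candidates)
    kept = []
    for vj in candidates:
        u = parent.get(vj)
        # Walk towards the root; stop early if another candidate lies on the path.
        while u is not None and u not in cand:
            u = parent[u]
        if u is None:
            kept.append(vj)
    return kept
```

On a hypothetical chain v1, v2, v3, v4, v6, the candidate v6 is filtered because the tree path to v6 passes through v4, which mirrors the situation described for Figure 1.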
4.4. Complexity Analysis
In this section, we analyze the complexity of Algorithm 1. We first analyze the time complexity and then analyze the space complexity.
Time Complexity. Because Algorithm 1 may calculate the shortest path for every two vertices in Vs in the worst case, it needs at most (r+1)(r+2) calculations for the shortest paths, where r=|Vs|. For each shortest path calculation, CH runs in O(n log n+m) time, where n=|V| and m=|E|. In addition, at most r! permutations of Vs may be created, and every permutation is maintained as a tuple, which can be done in O(1) time. Therefore, Algorithm 1 runs in O(r²(n log n+m)+r!) time. It is worth noting that r is always far less than n in real applications.
Space Complexity. Algorithm 1 mainly needs to maintain the expanded permutations and expand at most r! permutations. Therefore, the space complexity of Algorithm 1 is O(r!).
5. Approximate-Path Algorithm
In this section, we propose an approximate algorithm Approximate-Path to find the shortest path with vertex constraint in polynomial time. In the following, we first define query graph and then explain our approximate algorithm in detail. Next, we prove that the ratio bound of our approximate algorithm is 3. Finally, we analyze the time and space complexity of Approximate-Path.
Given a graph G, a vertex subset Vs⊆V, a starting vertex vs, and an ending vertex ve in G, a query graph Gq(Vq,Eq) is a complete graph on Vq, where Vq=Vs∪{vs,ve} and Eq={(vi,vj)∣vi,vj∈Vq,vi≠vj}. In Gq, the weight wi,j of every edge (vi,vj) is the shortest distance wi,j∗ between vi and vj in G. Here, wi,j∗ is the weight of the shortest path between vi and vj without vertex constraint in G.
The following theorem indicates that we only need to find the shortest path with vertex constraint over Gq.
Theorem 6.
The weight of the shortest path between vs and ve with vertex constraint of Vs is identical in G and Gq.
The main idea of Approximate-Path is as follows. We first compute the minimum spanning tree T of Gq and then “adjust” some edges in T such that T is converted into a path p satisfying the vertex constraint. The pseudocode of Approximate-Path is shown in Algorithm 2. In Algorithm 2, the minimum spanning tree T of Gq is first generated in a similar way as Prim Algorithm [11] (lines 1-14). Next, Algorithm 2 executes a preorder traversal on T and then we have a permutation π corresponding to the order of vertices in such preorder traversal on T (line 15). Note that in π the ending vertex ve may not be the last one. In this case, ve is put into the end of π and we get a new permutation π′ (line 16). Finally, Algorithm 2 returns the shortest path under permutation π′ as a result (lines 17-18), which is an approximate solution for our problem.
Algorithm 2: Approximate-Path (G,Vs,vs,ve).
Input: G, Vs, vs, ve.
Output: p.
// Input: G: an undirected weighted graph
//        Vs: a vertex subset of V
//        vs, ve: starting vertex and ending vertex respectively
// Output: p: the approximate shortest path between vs and ve with vertex constraint of Vs
1: Let Q be a min priority queue with the entries in the form (vi,vj,wi,j∗), sorted in the ascending order of wi,j∗, where wi,j∗ is the shortest distance between vi and vj;
2: Vq←Vs∪{vs,ve}, m←|Vq|;
3: for each vk∈Vq-{vs} do
4:     Enqueue an entry (vs,vk,ws,k∗) into Q;
5: ET←∅, VT←{vs};
6: while VT≠Vq do
7:     Dequeue the first entry (vi,vj,wi,j∗) from Q;
8:     if vj∈VT then
9:         continue;
10:    else
11:        VT←VT∪{vj}, ET←ET∪{(vi,vj)};
12:        for each vk∈Vq-VT do
13:            Enqueue an entry (vj,vk,wj,k∗) into Q;
14: T←(VT,ET);
15: Traverse T by preorder and let π=v1v2⋯vm be a permutation corresponding to the order of vertices in preorder traversal on T;
16: Move the ending vertex ve to the end of π to get π′=v1′v2′⋯vm′;
17: Generate the shortest path p between vs and ve under a permutation π′;
18: return p;
Example 7.
Figures 3(a) and 3(b) show the query graph Gq and the minimum spanning tree T of Gq, respectively. Let π=v1v3v4v5v8v6 be a permutation corresponding to the preorder traversal on T shown in Figure 3(c). Then Approximate-Path moves the ending vertex v8 to the end of π to get π′=v1v3v4v5v6v8. The path between v1 and v8 under π′ in Gq is (v1,v3,v4,v5,v6,v8), shown in Figure 3(d), and its weight is 12. The shortest path with vertex constraint Vs for the input graph is shown in Figure 1(b) and its weight is 10.
Figure 3: An example of an approximate path. (a) Query graph Gq. (b) Minimum spanning tree T. (c) Preorder traversal of T. (d) Approximate path.
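The steps of Algorithm 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the graph is an adjacency map, plain Dijkstra replaces the CH query, vs and ve are assumed to be distinct and not contained in Vs, and the function returns the weight together with the visiting order π′ instead of the full vertex path.

```python
import heapq


def dijkstra(g, src):
    """Single-source shortest distances; stand-in for the CH query."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in g[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def approximate_path(g, vs, ve, constraint):
    """Return (weight, visiting order) of the MST-based approximate path."""
    vq = {vs, ve, *constraint}
    dist = {v: dijkstra(g, v) for v in vq}  # edge weights of the query graph Gq

    def w(a, b):
        return dist[a].get(b, float("inf"))

    # Lines 1-14: Prim-style minimum spanning tree of Gq rooted at vs.
    children = {v: [] for v in vq}
    in_tree = {vs}
    heap = [(w(vs, v), vs, v) for v in vq if v != vs]
    heapq.heapify(heap)
    while len(in_tree) < len(vq):
        _, a, b = heapq.heappop(heap)
        if b in in_tree:
            continue
        in_tree.add(b)
        children[a].append(b)
        for c in vq - in_tree:
            heapq.heappush(heap, (w(b, c), b, c))

    # Line 15: preorder traversal of T.
    order, stack = [], [vs]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))

    # Line 16: move ve to the end of the permutation.
    order.remove(ve)
    order.append(ve)

    # Lines 17-18: weight of the path under the resulting permutation.
    total = sum(w(a, b) for a, b in zip(order, order[1:]))
    return total, order
```

The returned order starts with vs and ends with ve; concatenating the shortest paths between consecutive vertices of this order yields the approximate path in G.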
Next, we prove that Approximate-Path is a 3-approximation algorithm for shortest path problem with vertex constraint.
Theorem 8.
Approximate-Path is a 3-approximation algorithm for finding the shortest path with vertex constraint.
Proof.
Let ps,e∗ denote a shortest path with vertex constraint of Vs in Gq. Obviously, ps,e∗ is a spanning tree of Gq. Therefore, the weight of the minimum spanning tree T of Gq, computed by Approximate-Path, provides a lower bound on the weight of ps,e∗:
$$w(T) \le w(p^{*}_{s,e}) \tag{1}$$
The preorder traversal π of T is essentially a vertex permutation of Vq. Let π=v1v2⋯v|Vq|, where vi∈Vq for 1≤i≤|Vq|. We use pT,π to denote a path on T under permutation π. Note that pT,π may not be a simple path and every edge in pT,π appears at most twice. For the example in Figure 3, π=v1v3v4v5v8v6 and its pT,π=(v1,v3,v4,v3,v5,v8,v6). Here, the edge (v3,v4) (or (v4,v3)) appears twice in pT,π. Because pT,π travels through every edge in T at most twice, we have
$$w(p_{T,\pi}) \le 2\,w(T) \tag{2}$$
Based on inequalities (1) and (2), we have
$$w(p_{T,\pi}) \le 2\,w(p^{*}_{s,e}) \tag{3}$$
Because Gq is a complete graph, we can generate a simple path pGq,π on Gq under permutation π. Note that if π=v1v2⋯v|Vq|, then pGq,π=(v1,v2,…,v|Vq|) and every (vi,vi+1) is an edge in Gq for 1≤i≤|Vq|-1. Additionally, the weight of every edge (vi,vj) in Gq is equal to the weight of the shortest path between vi and vj in G; thus, the weight of edge (vi,vj) cannot be larger than the weight of the subpath between vi and vj in pT,π. It means
$$w(p_{G_q,\pi}) \le w(p_{T,\pi}) \tag{4}$$
Given the permutation π of the preorder traversal of T, Algorithm 2 obtains another permutation π′ by moving the ending vertex ve to the end of π. For the last two vertices v|Vq| and ve of π′, if (v|Vq|,ve) is an edge in T, its weight cannot exceed the weight of T. Otherwise, there must exist a simple path between v|Vq| and ve in T, and its weight cannot be less than the shortest distance between v|Vq| and ve. Therefore, in both cases, w|Vq|,e∗≤w(T), and then we have
$$w(p_{G_q,\pi'}) \le w(p_{G_q,\pi}) + w(T) \le 3\,w(T) \le 3\,w(p^{*}_{s,e}) \tag{5}$$
Because w(pGq,π′) is exactly the weight of the approximate shortest path returned by Algorithm 2, the proof is completed.
Complexity Analysis. We first analyze the time complexity of Algorithm 2. In order to construct the minimum spanning tree of Gq, we utilize the CH technique to calculate the weight of the shortest path between any two vertices in Vs. This needs O(r²(n log n+m)) time, where n=|V|, m=|E|, and r=|Vs|, so the time complexity of Algorithm 2 is O(r²(n log n+m)). In order to construct the minimum spanning tree, Algorithm 2 needs to maintain the weight of the shortest path for any two vertices in Vs, so the space complexity of Algorithm 2 is O(r²).
6. Experiments
This section experimentally evaluates our algorithms against the current state-of-the-art methods. Section 6.1 explains the experimental settings. Section 6.2 presents the performance of algorithms.
6.1. Experimental Settings
All methods are implemented in C++ and tested on a Linux machine with an Intel(R) Core(TM) i7-4770K and 32GB RAM. We repeat each experiment 100 times and report the average result. If a method requires more than 24 hours or more than 32GB RAM to preprocess a dataset D, we omit the method from the experiments on D.
Datasets. We test 4 real road networks from the 9th DIMACS Implementation Challenge (http://www.dis.uniroma1.it/challenge9/index.shtml) and an email network (http://snap.stanford.edu/data/) as shown in Table 2. In each road network, each vertex represents a road junction and each edge represents a road segment. Table 2 describes the properties of the datasets, where |V|, |E|, and d are the number of vertices, the number of edges, and the average vertex degree, respectively. The full name of each dataset is given in the description column.
Table 2: Datasets.
| Dataset | \|V\| | \|E\| | d | Description |
|---|---|---|---|---|
| NY | 264,346 | 733,846 | 2.78 | New York City Road Network |
| BAY | 321,270 | 800,172 | 2.49 | San Francisco Bay Area Road Network |
| COL | 435,666 | 1,057,066 | 2.43 | Colorado Road Network |
| FLA | 1,070,376 | 2,712,798 | 2.53 | Florida Road Network |
| EMAIL | 265,214 | 420,045 | 1.25 | Email network of EU research institution |
Query Set. In this paper, we investigate the query efficiency by varying the size of the vertex constraint, that is, the number of vertices in Vs. We test 15 query sets Q1 to Q15, where every query set is a set of queries with a fixed size of Vs. For each query set, we test 100 random queries and report the average querying time and space consumption as the results for that query set. Specifically, the sizes of Vs for Q1-Q5 are 4, 5, 6, 7, and 8, respectively, and the sizes of Vs for Q6-Q10 are 12, 14, 16, 18, and 20, respectively. The starting and ending vertices of every query are additionally selected at random. Q11-Q15 are generated as follows. We first randomly select 500 pairs of starting vertex vs and ending vertex ve and then calculate the distance for every pair. We sort these distances in ascending order and generate Q11-Q15 by dividing the pairs into five query sets of 100 pairs each. For example, Q11 contains the queries for the 100 pairs of vs and ve that come first in this order, and so on. For each query, we randomly select six vertices as Vs; that is, the size of Vs is 6.
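The construction of Q11-Q15 amounts to sorting the sampled pairs by distance and cutting the sorted list into five equal parts. A minimal Python sketch is shown below; the function name and the (vs, ve, distance) tuple layout are assumptions made for illustration.

```python
def split_into_query_sets(pairs, n_sets=5):
    """Split (vs, ve, distance) tuples into n_sets equally sized groups.

    Pairs are ranked by ascending distance, so the first group holds the
    closest pairs and the last group the farthest ones.
    """
    ranked = sorted(pairs, key=lambda p: p[2])
    size = len(ranked) // n_sets
    return [ranked[i * size:(i + 1) * size] for i in range(n_sets)]
```

With 500 sampled pairs this yields five groups of 100 pairs, corresponding to Q11 through Q15.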
For a query, if the starting vertex and ending vertex are the same, we call this starting-to-starting query (STS query); otherwise, we call this starting-to-ending query (STE query). In this paper, we present the experimental results of our algorithms for both STS query and STE query.
Compared Methods. For each experiment, we compare Permutation-Expanding (PE) and Approximate-Path (AP) against three algorithms: unidirectional Dijkstra Search (U.Dijkstra) [8], Level-Sweeping Search (LESS) [8], and Nearest Neighbor Algorithm (ANN) [12]. We use the CH technique to preprocess the input graphs. The first two compared algorithms are exact algorithms and the last one is an approximation algorithm. Other methods are not included in our comparison for the following reasons: (1) INC [13] computes a simple path, which does not contain repeated vertices, whereas we do not require a simple path in this problem, and (2) P-LESS [8] is an optimization of LESS that mainly targets the size of the search space, which typically grows in proportion to the density of each category. When each category contains only one vertex, P-LESS is equivalent to LESS.
6.2. Experimental Results
Exp-1. Query Efficiency. We investigate the impact of the size of Vs and show the experimental results for STE queries in Figure 4(a). On each dataset, U.Dijkstra has the longest querying time for every query set. PE outperforms LESS by a large margin that depends on the size of Vs, and their maximum difference is close to two orders of magnitude. The reason is that LESS calculates all the permutations of Vs. In contrast, PE finds the shortest path with vertex constraint by expanding permutations incrementally, which avoids calculating unnecessary permutations as early as possible. PE begins to degrade as the size of the graph increases. Despite this degradation, it requires no more than 3 seconds in the worst case (Q5 on FLA).
Figure 4: Query efficiency on Q1-Q5. (a) STE query. (b) STS query.
On each dataset, AP has the lowest time cost among all algorithms on every query set. Specifically, AP outperforms ANN by one order of magnitude. When the size of Vs is small, our exact algorithm PE runs faster than the approximate algorithm ANN, and AP answers these queries in subsecond time. The querying times of ANN and AP are insensitive to the size of Vs, as shown in Figure 4(a).
As shown in Figure 4(b), the query efficiency for STS queries is similar to that for STE queries. PE is faster than the other exact algorithms, and AP has the lowest time cost among all algorithms on every query set. For the same size of Vs and the same dataset, the querying time of an STE query is less than that of an STS query. The reason is that, given a starting vertex, PE performs a best-first search over the shortest paths under the 1-permutations to the r-permutations of Vs until the optimal one is found. PE gradually expands the path, so that each vertex in Vs is eventually arranged according to its shortest distance from the starting vertex. An STS query, however, must finally return to the starting vertex, so it generates more permutations than an STE query, which increases the running time.
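The idea of expanding permutations in best-first order, rather than enumerating all of them, can be illustrated with a small sketch. It is not the PE algorithm of this paper: it omits the two optimizing techniques, orders partial permutations only by their cost so far, and assumes a hypothetical table `dist[u][v]` of precomputed shortest-path distances between the starting vertex, the ending vertex, and the vertices of Vs.

```python
import heapq
from itertools import count

def best_first_order(dist, vs, ve, required):
    """Return (cost, order) for the cheapest walk vs -> all of `required` -> ve.

    `dist[u][v]` is assumed to hold the shortest-path distance from u to v
    (for example, precomputed with Dijkstra); it is a hypothetical input.
    """
    required = frozenset(required)
    tie = count()  # unique tie-breaker so the heap never compares vertices
    # Heap entries: (cost so far, tie, last vertex, visited set, order, finished)
    heap = [(0, next(tie), vs, frozenset(), (), False)]
    settled = set()
    while heap:
        cost, _, last, visited, order, finished = heapq.heappop(heap)
        if finished:
            # First completed walk popped is optimal because costs are non-negative.
            return cost, list(order)
        key = (last, visited)
        if key in settled:
            continue  # a cheaper partial permutation already reached this state
        settled.add(key)
        if visited == required:
            # All constraint vertices are covered: close the walk at ve.
            heapq.heappush(heap, (cost + dist[last][ve], next(tie),
                                  ve, visited, order, True))
            continue
        for v in required - visited:
            # Extend the k-permutation to a (k+1)-permutation.
            heapq.heappush(heap, (cost + dist[last][v], next(tie),
                                  v, visited | {v}, order + (v,), False))
    return None
```

Passing the same vertex as `vs` and `ve` gives the STS variant. Because the cheapest partial permutation is always expanded first, many full permutations are never generated, which is the effect the comparison with LESS above relies on.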
When the size of Vs becomes large, as in Q6-Q10, the runtime of the exact algorithms is too long, so we compare only the approximate algorithms. Figure 5 shows the results for these queries. AP again outperforms ANN by an order of magnitude, and the querying time of AP does not exceed 2 seconds in the worst case for both STE and STS queries.
Figure 5: Query efficiency on Q6-Q10. (a) STE query. (b) STS query.
Q11-Q15 have the same size of Vs, and their query times are shown in Figure 6. As the distance between the starting vertex and the ending vertex increases, the query time does not increase. This shows that the query time is not related to the distance between the starting and ending vertices but only to the size of Vs and the scale of the graph. PE and AP find the shortest path with vertex constraint by expanding permutations incrementally, which avoids calculating unnecessary permutations as early as possible. Moreover, AP can quickly give a solution by using the query graph. Therefore, AP and PE are more efficient than the other algorithms.
Figure 6: Query efficiency on Q11-Q15.
Figure 7 shows the space consumption of the algorithms on Q1-Q5. The space consumption of STE queries and STS queries is nearly the same on every dataset. On every dataset, U.Dijkstra has the largest space consumption. PE has the smallest space consumption among the exact algorithms, and ANN has the smallest among all algorithms. Because ANN only needs to calculate |Vs|+1 shortest subpaths and does not save any intermediate results, it consumes less space than AP. Apart from ANN, our approximation algorithm AP has the lowest space consumption.
Figure 7: Space consumption on Q1-Q5. (a) STE query. (b) STS query.
Exp-2. Effectiveness of Optimizing Techniques. For PE, we design two optimizing techniques, and their effectiveness is shown in Figure 8. The speedup ratio is the ratio of the query time without the optimizing techniques to the query time with them. The optimizing techniques greatly reduce the query time. Figure 8(a) shows their effectiveness on STE queries. The results show that the optimizing techniques make PE several times faster, depending on the size of Vs and the dataset. Except for COL, the speedup ratio increases with the size of Vs. COL has a larger diameter but a narrower width, which means that this traffic network is strip-shaped, so PE performs well even without any optimizing technique. In the extreme case where the network degenerates into a line, PE achieves its best performance without any optimizing technique. Of course, this kind of network is very rare in real life. Figure 8(b) shows the speedup ratio on STS queries. Since an STS query needs to calculate more permutations than an STE query, the speedup ratio on STS queries is relatively small.
Figure 8: Optimizing effectiveness. (a) STE query. (b) STS query.
Exp-3. Relative Error. The relative error is (Wa - Wo)/Wo, where Wa and Wo are the weights of the approximate solution and the optimal solution, respectively. For every query in this group of experiments, we first use PE to calculate the optimal result and then use ANN and AP to calculate approximate results. Figure 9 shows the relative errors of the two approximation algorithms on the different datasets. For STE queries, the relative errors on NY and FLA do not differ much. On BAY and COL, the relative errors of ANN are lower than those of AP. As the size of Vs increases, the relative errors of both algorithms gradually increase. On all datasets, the relative errors of AP do not exceed 25%. For STS queries, the relative errors are smaller than for STE queries, and those of AP do not exceed 15%. On FLA, the relative errors of AP are lower than those of ANN.
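The relative-error measure defined above can be computed as in the following sketch. The function names and the averaging over a list of (Wa, Wo) pairs are assumptions for illustration; the text does not state how the per-query errors are aggregated for the figure.

```python
def relative_error(approx_weight, optimal_weight):
    """Relative error (Wa - Wo) / Wo of an approximate path weight."""
    if optimal_weight <= 0:
        raise ValueError("optimal weight must be positive")
    return (approx_weight - optimal_weight) / optimal_weight

def mean_relative_error(weight_pairs):
    """Average relative error over (Wa, Wo) pairs, one pair per query.

    Averaging is an assumption; the aggregation is not specified in the text.
    """
    errors = [relative_error(wa, wo) for wa, wo in weight_pairs]
    return sum(errors) / len(errors)
```

For instance, an approximate path of weight 125 against an optimal weight of 100 gives a relative error of 0.25, which corresponds to the 25% bound reported for AP on STE queries.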
Figure 9: Relative error on Q1-Q5. (a) STE query. (b) STS query.
7. Related Work
In this section, we introduce existing works and categorize them as follows.
Traveling Salesman Problem (TSP). The traveling salesman problem is a classic problem in graph theory. Many algorithms have been proposed to solve it, including exact and approximate algorithms [14]. TSP can be formulated as a linear programming problem and solved by linear programming methods [15–17]. Dorigo [18] solves TSP using an ant colony algorithm, in which the ants of an artificial colony deposit pheromone on the edges of the graph. The shorter a path, the more pheromone is deposited on it, so as time progresses the pheromone trail yields a shorter feasible solution of TSP. There are also approximate algorithms that can quickly give a good solution to TSP [19–21]. However, TSP is a special case of the problem studied in this paper, and the methods for TSP cannot solve our problem when Vs≠V. Additionally, these methods cannot be used for large graphs.
Generalized Traveling Salesman Problem (GTSP). The Generalized Traveling Salesman Problem is a variant of the classical Traveling Salesman Problem. It was first introduced in the late 1960s [22]. There are some exact algorithms to solve the GTSP [23–25]. A related variant is the Traveling Salesman Problem with Precedence Constraint (TSPPC): a salesman travels through n cities, visiting each city only once, and eventually has to return to the starting city. Given the distances between the n cities, the goal is to find an optimal route that satisfies certain constraints (for example, before visiting city 1, the salesman must have visited city 2 and city 3). Ascheuer et al. [26] propose a branch-and-cut algorithm to solve the asymmetric traveling salesman problem with constraints. Moon et al. [27] and Wang et al. [28] solve the traveling salesman problem with constraints by a genetic algorithm and integer programming, respectively. The Hamiltonian path problem with precedence constraints is also known as the sequential ordering problem: find the shortest path between a specified starting point and a specified ending point that passes through every point exactly once and satisfies the sequence constraints. Karan et al. [29] propose a branch-and-bound algorithm to solve the sequential ordering problem. The existing algorithms for solving GTSP essentially enumerate every possible path and cannot be applied to large graphs, whereas our algorithms scale well to large graphs.
Trip Planning Query (TPQ). All vertices in a graph are divided into groups, each representing a category. A Trip Planning Query finds a minimum-cost route that contains at least one vertex from each given category. Li et al. [12] introduce four algorithms for answering TPQ; these algorithms achieve various approximation ratios with respect to m and ρ, where m is the number of categories and ρ is the maximum cardinality of any category. Our algorithm is a 3-approximation algorithm, and its ratio bound is lower than that of the algorithms in [12]. Rice et al. [8] present two exact algorithms to solve this problem. These algorithms search for the optimal path exhaustively, which adds many unnecessary calculations and greatly increases the running time. Vardhan et al. [13] propose a heuristic algorithm that follows the divide-and-conquer approach to compute a simple path that passes through all vertices specified by the user. The original problem is divided into two subproblems, and the algorithm consists of two main steps: (1) for a given set of must-visit vertices and the corresponding visiting order, treat each pair of consecutive vertices as a subpath of the entire end-to-end path and calculate all candidate subpaths; (2) concatenate candidate subpaths, one for each pair of consecutive vertices, to establish a simple path from the starting vertex to the ending vertex. Since the path we seek is not required to be simple, this algorithm does not apply to our problem. Cao et al. [30] introduce algorithms for solving Keyword-aware Optimal Route (KOR) queries. A KOR query adds a cost constraint to the category constraint; that is, the optimal path returned should satisfy the user-specified cost budget. Shang et al. [31] propose and study a novel problem of dynamically monitoring the shortest path in a spatial network, with the aim of accelerating shortest path computation in a dynamic spatial network. Shang et al. [32] design an exact algorithm and an approximation algorithm to solve the Collective Travel Planning query problem. The query finds the lowest-cost route connecting multiple sources and a destination with up to k meeting points.
8. Conclusion
To find the shortest path with vertex constraint, we propose an exact algorithm named Permutation-Expanding and give two optimizing techniques to improve its efficiency. We also propose a polynomial-time approximate algorithm named Approximate-Path for this problem over large graphs. We conduct extensive experiments on real-life datasets and compare our algorithms with the state-of-the-art methods. The experimental results validate that our algorithms outperform the existing methods even when the graph or the given set of vertices is large. In future work, we will study index techniques to facilitate the queries so that our algorithms become more time and space efficient on larger graphs.
Data Availability
The road network datasets used to support the findings of this study are included within the article. They can be downloaded from http://www.dis.uniroma1.it/challenge9/index.shtml.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work is supported by the grants of the National Natural Science Foundation of China nos. 61402323, 61572353, and U1736103, the Opening Project of State Key Laboratory of Digital Publishing Technology, and the Australian Research Council Discovery Grant DP130103051.
[1] Z. Tan, X. Zhao, Y. Fang, and W. Xiao, "GTrans: generic knowledge graph embedding via multi-state entities and dynamic relation spaces," vol. 6, no. 99, pp. 8232–8244, 2018. doi: 10.1109/ACCESS.2018.2797876
[2] W. Yuan, K. He, D. Guan, and G. Han, "Edge-dual graph preserving sign prediction for signed social networks," vol. PP, no. 99, pp. 1–1, 2017.
[3] J. Zhang, H. Yang, H. Song, and Y. Zhang, "An improved archaeology algorithm based on integrated multi-source biological information for yeast protein interaction network," vol. 5, no. 99, pp. 15893–15900, 2017. doi: 10.1109/ACCESS.2017.2690664
[4] Y.-C. Chen and C. Lee, "Skyline path queries with aggregate attributes," vol. 4, pp. 4690–4706, 2016. doi: 10.1109/ACCESS.2016.2602702
[5] Y. Yuan, L. Chen, and G. Wang, "Efficiently answering probability threshold-based shortest path queries over uncertain graphs," pp. 155–170, 2010.
[6] Y. Yuan, X. Lian, L. Chen, Y. Sun, and G. Wang, "RSkNN: kNN search on road networks by incorporating social influence," vol. 28, no. 6, pp. 1575–1588, 2016. doi: 10.1109/TKDE.2016.2518692
[7] X. Cao, L. Chen, C. Gao, and X. Xiao, "Keyword-aware optimal route search," vol. 5, no. 11, pp. 1136–1147, 2012.
[8] M. N. Rice and V. J. Tsotras, "Exact graph search algorithms for generalized traveling salesman path problems," International Symposium on Experimental Algorithms, Springer, pp. 344–355, 2012.
[9] R. Geisberger, P. Sanders, D. Schultes, and D. Delling, "Contraction hierarchies: Faster and simpler hierarchical routing in road networks," International Workshop on Experimental and Efficient Algorithms, Springer, pp. 319–333, 2008.
[10] E. W. Dijkstra, "A note on two problems in connexion with graphs," vol. 1, pp. 269–271, 1959. doi: 10.1007/BF01386390
[11] R. C. Prim, "Shortest connection networks and some generalizations," vol. 36, no. 6, pp. 1389–1401, 1957. doi: 10.1002/j.1538-7305.1957.tb01515.x
[12] F. Li, D. Cheng, M. Hadjieleftheriou, G. Kollios, and S. Teng, "On trip planning queries in spatial databases," International Symposium on Spatial and Temporal Databases, Springer, pp. 273–290, 2005.
[13] H. Vardhan, S. Billenahalli, W. Huang, M. Razo, A. Sivasankaran, L. Tang, P. Monti, M. Tacca, and A. Fumagalli, "Finding a simple path with multiple must-include nodes," Modeling, Analysis & Simulation of Computer and Telecommunication Systems, MASCOTS'09, IEEE International Symposium on, IEEE, pp. 1–3, 2009.
[14] Y. Lu, U. Benlic, and Q. Wu, "A hybrid dynamic programming and memetic algorithm to the traveling salesman problem with hotel selection," vol. 90, pp. 193–207, 2018. doi: 10.1016/j.cor.2017.09.008
[15] C. Chekuri and K. Quanrud, 2018.
[16] M. Diaby and M. H. Karwan, "Advances in combinatorial optimization: Linear programming formulations of the traveling salesman and other hard combinatorial optimization problems," 2016.
[17] A. Chaudhuri and K. De, "Fuzzy multi-objective linear programming for traveling salesman problem," vol. 4, pp. 64–70, 2011.
[18] M. Mahi, Ö. K. Baykan, and H. Kodaz, "A new hybrid method based on particle swarm optimization, ant colony optimization and 3-Opt algorithms for traveling salesman problem," vol. 30, pp. 484–490, 2015. doi: 10.1016/j.asoc.2015.01.068
[19] Y. Wang and P. Y. Yin, "An approximate algorithm for triangle tsp with a four-vertex-three-line inequality," vol. 6, no. 1, pp. 35–46, 2015.
[20] S. Deb, S. Fong, Z. Tian, R. K. Wong, S. Mohammed, and J. Fiaidhi, "Finding approximate solutions of NP-hard optimization and TSP problems using elephant search algorithm," vol. 72, no. 10, pp. 1–33, 2016. doi: 10.1007/s11227-016-1739-2
[21] G. Zhang, M. Gheorghe, and J. Cheng, "An approximate algorithm combining p systems and ant colony optimization for traveling salesman problems," pp. 321–340, 2010.
[22] S. S. S. Srivastava, S. Kumar, R. C. Garg, and P. Sen, "Generalized traveling salesman problem through n sets of nodes," 1969.
[23] M. F. Tasgetiren, P. N. Suganthan, and Q. K. Pan, "An ensemble of discrete differential evolution algorithms for solving the generalized traveling salesman problem," vol. 215, no. 9, pp. 3356–3368, 2010. doi: 10.1016/j.amc.2009.10.027
[24] M. F. Tasgetiren, P. N. Suganthan, and Q. Pan, "A discrete particle swarm optimization algorithm for the generalized traveling salesman problem," Proceedings of the 9th Annual Genetic and Evolutionary Computation Conference (GECCO '07), New York, NY, USA, pp. 158–167, July 2007. doi: 10.1145/1276958.1276980
[25] M. N. Rice and V. J. Tsotras, "Engineering generalized shortest path queries," Proceedings of the 2013 29th IEEE International Conference on Data Engineering (ICDE 2013), pp. 949–960, April 2013. doi: 10.1109/ICDE.2013.6544888
[26] N. Ascheuer, M. Jünger, and G. Reinelt, "A branch & cut algorithm for the asymmetric traveling salesman problem with precedence constraints," vol. 17, no. 1, pp. 61–84, 2000.
[27] C. Moon, J. Kim, G. Choi, and Y. Seo, "An efficient genetic algorithm for the traveling salesman problem with precedence constraints," vol. 140, no. 3, pp. 606–617, 2002. doi: 10.1016/S0377-2217(01)00227-2
[28] X. Wang and A. C. Regan, "The traveling salesman problem with separation requirements," 2002, 1405.
[29] M. Karan and N. Skorin-Kapov, "A branch and bound algorithm for the sequential ordering problem," MIPRO, 2011 Proceedings of the 34th International Convention, IEEE, pp. 452–457, 2011.
[30] X. Cao, L. Chen, G. Cong, J. Guan, N. Phan, and X. Xiao, "KORS: Keyword-aware optimal route search system," 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, pp. 1340–1343, April 2013.
[31] S. Shang, L. Chen, Z.-W. Wei, D.-H. Guo, and J.-R. Wen, "Dynamic shortest path monitoring in spatial networks," vol. 31, no. 4, pp. 637–648, 2016. doi: 10.1007/s11390-016-1653-3
[32] S. Shang, L. Chen, Z. Wei, C. S. Jensen, J.-R. Wen, and P. Kalnis, "Collective travel planning in spatial networks," vol. 28, no. 5, pp. 1132–1146, 2016. doi: 10.1109/TKDE.2015.2509998