Spider Covers and Their Applications

We introduce two new combinatorial optimization problems: the Maximum Spider Problem and the Spider Cover Problem; we study their approximability and illustrate their applications. In these problems we are given a directed graph $G$, a distinguished vertex $s$, and a family $\mathcal{D}$ of subsets of vertices. A spider centered at vertex $s$ is a collection of arc-disjoint paths all starting at $s$ but ending at pairwise distinct vertices. We say that a spider covers a subset of vertices $X$ if at least one of the endpoints of the paths constituting the spider, other than $s$, belongs to $X$. In the Maximum Spider Problem the goal is to find a spider centered at $s$ that covers the maximum number of elements of the family $\mathcal{D}$. Conversely, the Spider Cover Problem consists of finding the minimum number of spiders centered at $s$ that cover all subsets in $\mathcal{D}$. We motivate the study of the Maximum Spider and Spider Cover Problems by pointing out a variety of applications. We show that a natural greedy algorithm gives a 2-approximation algorithm for the Maximum Spider Problem and a $(\log_2 |\mathcal{D}| + 1)$-approximation algorithm for the Spider Cover Problem.


Introduction
Given a digraph $G = (V, E)$ and a vertex $s \in V$, a spider centered at $s$ is a subgraph $S$ of $G$ consisting of arc-disjoint paths sharing the initial vertex $s$ and ending at pairwise distinct vertices. The vertex $s$ is called the center of the spider. The endpoints of the paths composing the spider $S$ (other than the center $s$) are called the terminals of the spider. In other words, a spider is a subdivision of $K_{1,m}$, where $m$ is the number of terminals. Given a spider $S$, we say that $S$ reaches a vertex $x \in V$ if $x$ is a terminal of $S$; we say that the spider $S$ covers a subset of vertices $D \subseteq V$ if $S$ reaches at least one vertex in $D$.
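The definitions above can be illustrated with a minimal sketch. It assumes that a spider is handed over as a list of vertex paths and that the digraph is given as a set of arcs; the function names `spider_terminals` and `covers` are hypothetical and not part of the paper.

```python
def spider_terminals(paths, center, graph_arcs):
    """Return the set of terminals if `paths` form a spider centered at
    `center` in the digraph with arc set `graph_arcs`; otherwise return None."""
    used_arcs = set()
    terminals = set()
    for path in paths:
        # Every path starts at the center and ends at some other vertex.
        if len(path) < 2 or path[0] != center or path[-1] == center:
            return None
        for arc in zip(path, path[1:]):
            # Arcs must exist in G and must not be reused (arc-disjointness).
            if arc not in graph_arcs or arc in used_arcs:
                return None
            used_arcs.add(arc)
        # Endpoints must be pairwise distinct.
        if path[-1] in terminals:
            return None
        terminals.add(path[-1])
    return terminals


def covers(terminals, subset):
    """A spider covers `subset` if at least one terminal lies in it."""
    return not set(terminals).isdisjoint(subset)
```

For instance, two paths that both use the arc from the center to the same neighbor are rejected, since the paths of a spider may share vertices but not arcs.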

ISRN Discrete Mathematics
In this paper we consider the approximability of the following problems.

Maximum Spider Problem (MSP)
We are given a digraph $G = (V, E)$, a distinguished node $s$, and a family $\mathcal{D} \subseteq 2^{V \setminus \{s\}}$ of subsets of vertices. The objective is to find a spider $S$ centered at $s$ such that the number of subsets $D \in \mathcal{D}$ covered by $S$ is maximum among all possible spiders centered at $s$.
We also consider the related minimization problem, where one wants to cover all the elements of D.

Spider Cover Problem (SCP)
As before, we are given a digraph $G = (V, E)$, a distinguished vertex $s \in V$, and a family $\mathcal{D} \subseteq 2^{V \setminus \{s\}}$ of subsets of vertices. The goal is to find a minimum cardinality collection of spiders centered at $s$ such that each subset $D \in \mathcal{D}$ is covered by at least one spider in the collection.

Motivations
The Maximum Spider and the Spider Cover Problems are far-reaching generalizations and unifications of several Maximum Coverage and Set Cover Problems which, in turn, are fundamental algorithmic and combinatorial problems that arise frequently in a variety of settings [3]. To start, recall that in the basic formulation of the Maximum Coverage Problem [3], one is given a ground set $X$, a collection of sets $\mathcal{S} = \{S_1, S_2, \ldots, S_m\}$, where each $S_j \subseteq X$, for $j = 1, \ldots, m$, and an integer $k$. The goal is to find $\ell \le k$ sets $S_{i_1}, \ldots, S_{i_\ell}$ such that the cardinality $|\bigcup_{j=1}^{\ell} S_{i_j}|$ of their union is maximum. To see that the Maximum Coverage Problem is a very particular case of the Maximum Spider Problem, let us consider the digraph $G = (V, E)$ of Figure 1, with node set $V = \{s, x_1, \ldots, x_k, S_1, \ldots, S_m\}$. The vertex $s$ is connected to each of the nodes $x_1, \ldots, x_k$, and each $x_i$ is connected to every $S_j$, for $i = 1, \ldots, k$ and $j = 1, \ldots, m$. The family $\mathcal{D}$ consists of the subsets of vertices $D_u = \{S_i : u \in S_i\}$, one for each $u \in X$.
One can see that the Maximum Spider Problem in $G$ is equivalent to the Maximum Coverage Problem on the original instance $X$, $\mathcal{S}$, and $k$. To that purpose, let us proceed as follows. Let $S$ be a spider in $G$ that covers a maximum number $\mu$ of subsets $D \in \mathcal{D}$. Let $D_{u_1}, \ldots, D_{u_\mu}$ be these subsets. By our definition of spider cover, the at most $k$ terminals of $S$ in $G$ correspond to some $S_{i_1}, \ldots, S_{i_\ell}$, $\ell \le k$, such that for any $D_{u_t} \in \{D_{u_1}, \ldots, D_{u_\mu}\}$ there exists $S_{i_j} \in \{S_{i_1}, \ldots, S_{i_\ell}\}$ for which $S_{i_j} \in D_{u_t}$. This implies that for any $u_t \in \{u_1, \ldots, u_\mu\}$ there exists $S_{i_j}$ such that $u_t \in S_{i_j}$; consequently $\bigcup_{j=1}^{\ell} S_{i_j} \supseteq \{u_1, \ldots, u_\mu\}$ and $|\bigcup_{j=1}^{\ell} S_{i_j}| \ge \mu$. Conversely, let $S_{i_1}, \ldots, S_{i_\ell}$, $\ell \le k$, be a solution to the Maximum Coverage Problem on the original instance $X$, $\mathcal{S}$, and $k$. Let $\bigcup_{j=1}^{\ell} S_{i_j} = \{u_1, \ldots, u_\mu\}$. Consider now the spider $S$ in $G$ starting at $s$ and having terminal nodes equal to $S_{i_1}, \ldots, S_{i_\ell}$. By definition, the spider $S$ covers at least the $\mu$ subsets $D_{u_1}, \ldots, D_{u_\mu}$.
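The reduction just described can be sketched in a few lines. The sketch assumes that the sets are given as a dictionary from (hypothetical) set names to their elements, and that the family to cover is indexed by the elements $u$ of the ground set; the function name and the tuple encoding of nodes are illustrative choices, not taken from the paper.

```python
def coverage_to_spider_instance(ground_set, sets, k):
    """Build the digraph of the Maximum Coverage reduction.

    `sets` maps a set name to the elements it contains.
    Returns (vertices, arcs, family), where family[u] is the subset D_u
    of set-nodes whose sets contain the element u.
    """
    xs = [("x", i) for i in range(1, k + 1)]          # intermediate nodes x_1..x_k
    set_nodes = [("S", name) for name in sets]        # one node per set S_j
    vertices = ["s"] + xs + set_nodes
    arcs = {("s", x) for x in xs}                     # s -> x_i
    arcs |= {(x, S) for x in xs for S in set_nodes}   # x_i -> S_j for all i, j
    family = {
        u: {("S", name) for name, members in sets.items() if u in members}
        for u in ground_set
    }
    return vertices, arcs, family
```

Since only $k$ arcs leave `"s"`, any spider in this digraph has at most $k$ terminals, which is how the budget $k$ of the coverage instance is enforced.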
Thus, the Maximum Coverage Problem corresponds to the Maximum Spider Problem in a very simple digraph $G$. By allowing more flexibility in the structure of $G$, one can describe many more combinatorial optimization problems in this framework. For instance, Chekuri and Kumar in [4] considered the following generalization of Maximum Coverage.

Maximum Coverage with Group Budget Constraints (MCG) (see [4])
We are given a ground set $X$ and a collection $\mathcal{S} = \{S_1, \ldots, S_m\}$ of subsets of $X$. We are also given sets $G_1, \ldots, G_\ell$, each $G_i \subseteq \mathcal{S} = \{S_1, \ldots, S_m\}$, with $G_i \cap G_j = \emptyset$ for $i \ne j$, and integer bounds $k, k_1, \ldots, k_\ell$. A solution is a subset $\mathcal{H} \subseteq \{S_1, \ldots, S_m\}$, such that $|\mathcal{H}| \le k$ and $|\mathcal{H} \cap G_i| \le k_i$, for $i = 1, \ldots, \ell$. The goal is to find a solution $\mathcal{H}$ such that $|\bigcup_{H \in \mathcal{H}} H|$ is maximized.
Before showing how MCG easily fits into our scenario, let us mention that the MCG problem itself was introduced and studied in [4] since it represents a useful generalization of several combinatorial optimization problems, like the multiple depot $k$-traveling repairmen problem with covering constraints [5] and the orienteering problem with time windows [6-8].
Given an instance . . , and, in case $\sum_{i=1}^{\ell} k_i < k$, there is a complete bipartite graph between $\{x_1, \ldots, x_k\}$ and $\mathcal{S} \setminus \bigcup_{i=1}^{\ell} G_i$. As before, the family $\mathcal{D}$ is defined as consisting of the subsets of vertices $D_u = \{S_i : u \in S_i\}$, for each $u \in X$. Figure 2 below depicts the situation.
Again, it is not hard to see that MCG is equivalent to the Maximum Spider Problem in the graph $G$. At this point it should be clear that by varying the structure of the graph between the vertex $s$ and the family of subsets $\{S_1, \ldots, S_m\}$, one can describe many more covering problems.
Just as the Maximum Spider Problem encompasses a variety of coverage problems formulated in terms of maximization of the objective function, the related Spider Cover minimization problem includes as particular cases variants and extensions of the well-known Set Cover Problem. One such extension was considered in [4, 9, 10].

Set Cover with Group Budget (SCG)
We are given a ground set $X$ and a family $\mathcal{S} = \{S_1, \ldots, S_m\}$ of subsets of $X$. The family $\mathcal{S}$ is partitioned into subfamilies $G_1, \ldots, G_\ell$. The goal is to find an $\mathcal{H} \subseteq \mathcal{S}$ such that all elements of $X$ are covered by sets in $\mathcal{H}$, and $\max_{i = 1, \ldots, \ell} |\mathcal{H} \cap G_i|$ is minimized. Elkin and Kortsarz [9] studied the SCG problem as a preliminary tool for their multicasting algorithm in synchronous directed networks. Gargano et al. [10] studied the SCG problem in the context of multicasting in optical networks. Interestingly, Gargano et al. [10] also noticed that SCG naturally arises in airline scheduling problems [11]. We trust that the experienced reader can now appreciate the flexibility of our approach by checking that the SCG is equivalent to the Spider Cover Problem in the graph shown in Figure 3. The family $\mathcal{D}$ to cover is $\mathcal{D} = \{D_u : u \in X\}$, where for each $u \in X$ we have $D_u = \{S \in \mathcal{S} : u \in S\}$.
In general, we expect the capability of our approach to easily describe and deal with diverse requirements in covering problems to be quite useful. In any case, it seems to provide a nice and unified view of many different questions.

Our Results in Comparison with Previous Work
To the best of our knowledge, the Maximum Spider and the Spider Cover Problems have not been considered before, apart from the different special cases mentioned in the previous section. Our results are the following.
(1) We show that the greedy approach yields a 2-approximation algorithm for the Maximum Spider Problem. (In this paper approximation ratios for both maximization problems and minimization problems will be greater than 1.) It is remarkable that we achieve the same approximation ratio obtained in [4] for the Maximum Coverage with Group Budget Constraints, although our Maximum Spider Problem is much more general. Since the Maximum Spider Problem contains the classical Maximum Coverage Problem as a particular case, from results of [12] it follows that it is hard to approximate within a factor of $e/(e - 1) - o(1)$, unless $NP \subset DTIME(n^{\log\log n})$. In the paper [4] it is additionally proved that the approximation factor 2 is tight for their problem in the oracle model. Obviously, this tightness of analysis transfers also to our Maximum Spider Problem.
(2) We give a greedy algorithm for the Spider Cover Problem with approximation ratio $\log_2 |\mathcal{D}| + 1$. Again, we match the results of [4, 9, 10], who obtained the same result in case the graph $G$ is the simple tree of Figure 3. Since the Spider Cover Problem includes the Set Cover Problem as a particular case, from [12] one gets a $(1 - \epsilon) \ln |\mathcal{D}|$ factor for the hardness of its approximation, for any $\epsilon > 0$. We also observe that our algorithm for the Spider Cover Problem provides an $O(\log |\mathcal{D}|)$-approximation algorithm for the Multicasting-to-Groups Problem considered in [10], extending the main result of the same paper from trees to general networks. The problem considered therein was to find a set of paths from a source node to at least one node in each subset of a set of groups $\mathcal{D}$ and assignments of wavelengths to paths so that paths sharing the same physical link of the network are assigned different wavelengths. The goal is to minimize the number of wavelengths. It can be seen that the paths constituting the spiders covering the family $\mathcal{D}$, and an assignment of different wavelengths to paths in different spiders, give an admissible solution to the Multicasting-to-Groups Problem in general optical networks.

A Greedy Algorithm for the Maximum Spider Problem
In this section we will present a 2-approximation greedy algorithm for the Maximum Spider Problem (MSP).
Given an instance $(G, s, \mathcal{D})$ of the MSP, where $G = (V, E)$ is a digraph, $s$ is a designated vertex in $V$, and $\mathcal{D}$ is a family of subsets of $V \setminus \{s\}$, we say that a subset of vertices $X \subseteq V$ is reachable if there exists a spider in $G$, with center in $s$, such that each node $v \in X$ is reached by such a spider. In other words, $X$ is reachable if there is a spider in $G$ whose set of terminals includes $X$. For any set $X \subseteq V$ (not necessarily reachable) we define $C(X)$ as the number of elements in $\mathcal{D}$ covered by $X$, that is,
$$C(X) = |\{D \in \mathcal{D} : D \cap X \ne \emptyset\}|.$$
In terms of the function $C(\cdot)$, our original objective is essentially that of finding a reachable set $X$ of maximum value $C(X)$.

For any $X, Y \subseteq V$, we define the covering improvement $C(Y \mid X)$ of $Y$ over $X$ as
$$C(Y \mid X) = C(X \cup Y) - C(X).$$
Definition 2.1. Given a reachable set $X$ we say that: (i) a node $y \in V \setminus X$ improves on $X$ if $X \cup \{y\}$ is reachable; (ii) a node $x \in V \setminus X$ maximally improves on $X$ if $x$ improves on $X$ and $C(\{x\} \mid X) = \max_y C(\{y\} \mid X)$, where the maximum is taken on all nodes $y$ that improve on $X$; (iii) $X$ is maximal if no node improves on $X$.
We can now describe the skeleton of our 2-approximation algorithm (Algorithm 1). We point out that the algorithm could also stop as soon as it finds a first node $x$ maximally improving on $X$ with the property that $C(\{x\} \mid X) = 0$. However, we let MAX SP generate a maximal set $X$ to make the analysis cleaner.
In the rest of this section we will show how to efficiently implement step 2 of the above greedy algorithm and how to compute a spider centered at $s$ and with set of terminals $X$, and we will also show that the number of sets in $\mathcal{D}$ covered by the terminals in $X$ is at least half of the optimum number.
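As a small illustration of the two quantities just introduced, the following sketch computes the coverage $C(X)$ and the covering improvement, assuming the improvement is $C(X \cup Y) - C(X)$ (which agrees with the identity used later in the polynomiality argument). The family is assumed to be a list of Python sets, and the function names are hypothetical.

```python
def coverage(X, family):
    """C(X): number of subsets in the family that contain a node of X."""
    return sum(1 for D in family if not set(D).isdisjoint(X))


def improvement(Y, X, family):
    """C(Y | X): how many additional subsets are covered when Y is added to X."""
    return coverage(set(X) | set(Y), family) - coverage(X, family)
```

Note that the improvement of a node can be zero even when the node itself belongs to some subset, namely when all such subsets are already covered by $X$.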
Let us first check that the algorithm is polynomial. Proof. In order to compute the node $x \in V \setminus X$ that maximally improves on $X$ we proceed as follows. First, for each $y \in V \setminus X$ we check whether $X \cup \{y\}$ is reachable, that is, whether there is a spider centered at $s$ and with set of terminals equal to $X \cup \{y\}$. This can be done by constructing a flow network from $G$ (for undefined terminology about flows in networks, see for example [13]), assigning the source node at $s$, connecting all nodes in $X \cup \{y\}$ to a sink node $t$, setting all flow capacities equal to 1, and by verifying whether or not in this flow network there exists a flow of value $|X| + 1$. This entire procedure can clearly be performed in polynomial time. Subsequently, among all $y$'s for which $X \cup \{y\}$ is reachable, we compute the one that maximally improves on $X$ by using the identity $C(X \cup \{y\}) = C(X) + C(\{y\} \mid X)$. Finally, the spider that reaches the set $X$ (the output of the algorithm MAX SP) is computed from the executions of the maximum flow algorithm, and it consists of all the flow paths from $s$ to $X$ with assigned flow value equal to 1.
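The reachability test described in this proof can be sketched as follows. The paper only requires some maximum flow routine; the choice of a breadth-first augmenting path search (Edmonds-Karp) here is an assumption made for brevity, and the names `max_unit_flow`, `is_reachable`, and the sink label `"__t__"` are hypothetical.

```python
from collections import defaultdict, deque


def max_unit_flow(arcs, source, sink):
    """Value of a maximum flow when every arc has capacity 1."""
    cap = defaultdict(int)
    adj = defaultdict(set)
    for u, v in arcs:
        cap[(u, v)] += 1
        adj[u].add(v)
        adj[v].add(u)  # allow traversal of residual (reverse) arcs
    value = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return value
        # Push one unit of flow along the path found.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        value += 1


def is_reachable(arcs, s, targets, sink="__t__"):
    """True if a flow of value |targets| exists from s to the added sink."""
    targets = set(targets)
    network = set(arcs) | {(v, sink) for v in targets}
    return max_unit_flow(network, s, sink) == len(targets)
```

In the greedy algorithm this test would be called with `targets` equal to $X \cup \{y\}$ for each candidate node $y$.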
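In order to show that Algorithm MAX SP$(G, s, \mathcal{D})$ is a 2-approximation algorithm for the Maximum Spider Problem, we first need the following technical result.
Lemma 2.3. Let $\mathcal{R}$ be the family of all reachable subsets of $V$. For any $X, Y \in \mathcal{R}$ with $|X| > |Y|$, there exists $x \in X \setminus Y$ such that $Y \cup \{x\} \in \mathcal{R}$.
Proof. Consider two arbitrary sets $X, Y \in \mathcal{R}$, such that $|X| > |Y|$. Let $S_X$ denote a spider reaching $X$, and let $S_Y$ be a spider reaching $Y$. We will show that there exists a new spider $S_W$, with terminals $W = Y \cup \{x\}$, where $x \in X \setminus Y$. Hence, we will get that $W \in \mathcal{R}$.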

Starting from $G$, let us construct the flow network
$$G' = (V \cup \{t\},\; E \cup \{(v, t) : v \in X \cup Y\}),$$
where $s$ is the source of the flow network, $t$ is the sink, and each arc has capacity 1.
The existence of the spider $S_Y$ in $G$ centered in $s$ and reaching all nodes in $Y$ implies the existence of a flow $f$ in $G'$ such that
$$f(e) = \begin{cases} 1 & \text{if } e \text{ is an arc of } S_Y \text{ or } e = (y, t) \text{ with } y \in Y, \\ 0 & \text{otherwise.} \end{cases} \qquad (2.4)$$
The value of $f$ is $|Y|$.
In the same way, the existence of the spider $S_X$ in $G$ implies the existence of a flow of value $|X|$ in $G'$. Since $|X| > |Y|$, we know that the maximum flow in $G'$ is at least $|Y| + 1$. Hence, the flow $f$ given in (2.4) can be augmented. Consider then the residual graph $D_f$ obtained starting from the initial flow $f$; $D_f$ must contain an augmenting path $P$ from $s$ to $t$. Moreover, the path $P$ must use the arc $(x, t)$ for some $x \in X \setminus Y$, since for any $y \in Y$ the residual graph $D_f$ only contains the reverse arc $(t, y)$. Consider then the augmented flow $g$ implied by $f$ and $P$. Since it modifies the values of $f$ only on arcs on $P$, we get that $g$ induces a set of arc-disjoint paths in $G$ from $s$ to the nodes in $Y \cup \{x\}$. This gives the desired spider $S_W$ covering $W = Y \cup \{x\}$.
We notice that the family $\mathcal{R}$ is hereditary, that is, any subset of a reachable set is reachable. This fact and Lemma 2.3 tell us that $(V, \mathcal{R})$ is a matroid. However, the set system associated to our optimization problem is not $(V, \mathcal{R})$, but it is $(\mathcal{D}, \mathcal{G})$, where $\mathcal{G} = \{\mathcal{D}' \subseteq \mathcal{D}$ : all subsets in $\mathcal{D}'$ are covered by a spider in $G$ centered at $s\}$, which is hereditary but not a matroid.
Nonetheless, the fact that $(V, \mathcal{R})$ is a matroid is useful for us. Indeed our coverage function is submodular, that is, for any $X, Y \subseteq V$ it holds that
$$C(X) + C(Y) \ge C(X \cup Y) + C(X \cap Y).$$
Hence the Maximum Spider Problem corresponds to the maximization of the submodular function $C(\cdot)$ on the independent sets of the matroid $(V, \mathcal{R})$. By a well-known result of Nemhauser et al. [14] we have that the greedy algorithm MAX SP given in Algorithm 1 returns a set $X$ such that
$$C(X) \ge \frac{C(X^*)}{2},$$
where $X^*$ represents an optimal solution to the problem. Hence, we have proved the desired approximation result.

Algorithm MAX SP$(G, s, \mathcal{D})$
(1) Set $X \leftarrow \emptyset$
(2) while $X$ is not maximal
(3)     Let $x \in V \setminus X$ be the node that maximally improves on $X$
(4)     Set $X \leftarrow X \cup \{x\}$
(5) Output $X$, $C(X)$, and the spider with set of terminals $X$.

Algorithm 1: The algorithm for the Maximum Spider Problem on $G$, $s$, and $\mathcal{D}$.
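The greedy loop of Algorithm 1 can be sketched as below. To keep the sketch self-contained, reachability is abstracted into a caller-supplied oracle `is_reachable(Z)`; this oracle and the function name `max_sp_greedy` are hypothetical, and in the setting of the paper the oracle would be the unit-capacity flow test. The spider itself is not reconstructed here.

```python
def max_sp_greedy(vertices, family, is_reachable):
    """Greedy skeleton of MAX SP: returns (X, C(X)).

    `family` is a list of sets; `is_reachable(Z)` is an assumed oracle
    telling whether the node set Z is the terminal set of some spider.
    """
    def C(Z):
        return sum(1 for D in family if not D.isdisjoint(Z))

    X = set()
    while True:
        # Nodes that improve on X, i.e. whose addition keeps X reachable.
        candidates = [y for y in vertices
                      if y not in X and is_reachable(X | {y})]
        if not candidates:  # X is maximal
            return X, C(X)
        # Node that maximally improves on X.
        best = max(candidates, key=lambda y: C(X | {y}) - C(X))
        X.add(best)
```

As a usage example, an oracle accepting every set of at most two nodes mimics a center with two outgoing arcs into a complete bipartite layer.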

The Spider Cover Problem
In this section we will build on the results for the Maximum Spider Problem in order to design an $O(\log |\mathcal{D}|)$-approximation algorithm for the Spider Cover Problem. Recall that in this latter problem we are given a digraph $G$, a vertex $s$, a family $\mathcal{D} \subseteq 2^{V \setminus \{s\}}$, and the goal is to cover all elements in $\mathcal{D}$ by using the minimum number of spiders centered at $s$. Our first step will be to introduce a parametrized family of digraphs $\{H_t\}_{t \ge 1}$ and reduce the problem of determining the minimum number of spiders in $G$ necessary to cover all elements of $\mathcal{D}$ to the problem of determining the minimum value of $t$ for which $H_t$ contains a single spider covering all vertices in a designated subset of vertices of $H_t$. Subsequently, using iteratively the approximation algorithm MAX SP on certain $H_t$'s, plus some additional constructions, will allow us to construct an approximation algorithm for the Spider Cover Problem.

Constructing the Digraph H t
Let $(G = (V, E), s, \mathcal{D})$ be an instance of the Spider Cover Problem, and let $t \ge 1$ be an integer. We first construct $t$ graphs $G_1 = (V_1, E_1), \ldots, G_t = (V_t, E_t)$ as follows: for any $v \in V$ the vertex set $V_i$ of the $i$th digraph $G_i$ contains a corresponding vertex $v_i$, for $i = 1, \ldots, t$. Vertex $v_i$ will be called the $i$th copy of $v$ in the final digraph $H_t$. If the designated vertex $s$ is connected to $k$ vertices $v^1, \ldots, v^k$ in $G$, then each $V_i$ contains $k$ copies of $s$; let $s_i^1, \ldots, s_i^k$ be such copies, for $i = 1, \ldots, t$.
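Now for the arcs in the $G_i$'s. For each arc $(u, v) \in E$, $u \ne s \ne v$, we insert a corresponding arc $(u_i, v_i)$ in $E_i$. We also insert in $E_i$ the arcs $(s_i^j, v_i^j)$, for $j = 1, \ldots, k$. For the final construction of $H_t$ we introduce new nodes $n_t(v)$, for each $v \in \bigcup_{D \in \mathcal{D}} D$, and a special node $z$. There is an arc from $z$ to each $s_i^j$, and for each $v \in \bigcup_{D \in \mathcal{D}} D$ there is an arc $(v_i, n_t(v))$ from $v_i$ to $n_t(v)$, for each $i = 1, \ldots, t$.
Formally, $H_t = (U_t, A_t)$ is a directed graph where
$$U_t = \{z\} \cup \bigcup_{i=1}^{t} V_i \cup \Big\{ n_t(v) : v \in \bigcup_{D \in \mathcal{D}} D \Big\},$$
$$A_t = \bigcup_{i=1}^{t} E_i \cup \{(z, s_i^j) : 1 \le i \le t,\ 1 \le j \le k\} \cup \Big\{ (v_i, n_t(v)) : 1 \le i \le t,\ v \in \bigcup_{D \in \mathcal{D}} D \Big\}.$$
An example of digraph $G$ and associated graph $H_2$ is presented in Figure 4. The relevance of digraph $H_t$ to our questions is explained by the following two evident results. Notice that the $t$ spiders in $G$ can also be easily constructed from the "big" spider in $H_t$ and vice versa.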
Given an instance $(G, s, \mathcal{D})$ of the Spider Cover Problem, let $n_t(\mathcal{D})$ be the family of subsets of nodes of digraph $H_t$ consisting of all subsets $n_t(D) = \{n_t(v) : v \in D\}$, for any $D \in \mathcal{D}$.
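The construction of $H_t$ can be sketched as follows. The sketch rests on an assumption that is not fully legible in the text: each copy $s_i^j$ of the center is taken to have a single outgoing arc, to the $i$th copy of the $j$th out-neighbor of $s$. The tuple encodings of nodes and the function name `build_H` are hypothetical.

```python
def build_H(arcs, s, family, t):
    """Return (nodes, arcs) of H_t for the digraph with arc set `arcs`.

    Node encodings (illustrative): (v, i) is the i-th copy of v,
    ("s", j, i) is the j-th copy of s in G_i, ("n", v) is the collector
    node for v, and "z" is the new center.
    """
    out_nbrs = sorted({v for (u, v) in arcs if u == s})
    targets = set().union(*family)  # all nodes occurring in some subset
    H_arcs = set()
    for i in range(1, t + 1):
        # Copy every arc of G that does not touch s.
        for (u, v) in arcs:
            if u != s and v != s:
                H_arcs.add(((u, i), (v, i)))
        # One copy of s per out-neighbor; assumed arc s_i^j -> j-th neighbor.
        for j, v in enumerate(out_nbrs, start=1):
            s_copy = ("s", j, i)
            H_arcs.add(("z", s_copy))
            H_arcs.add((s_copy, (v, i)))
        # Every copy of a target node points to the shared collector node.
        for v in targets:
            H_arcs.add(((v, i), ("n", v)))
    nodes = {x for arc in H_arcs for x in arc}
    return nodes, H_arcs
```

Because each collector node can be a terminal of the big spider only once, a node $v$ is counted a single time even if several copies $v_i$ are reached.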

Theorem 3.2. An instance $(G, s, \mathcal{D})$ of the Spider Cover Problem admits an optimal solution with $t^*$ spiders if and only if $t^*$ is the minimum integer for which an optimal solution of the Maximum Spider Problem on the instance $(H_{t^*}, z, n_{t^*}(\mathcal{D}))$ consists of a spider covering all elements in the family of subsets $n_{t^*}(\mathcal{D})$.

The Spider Cover Algorithm
Our spider cover algorithm SP COV$(G, s, \mathcal{D})$ is presented in Algorithm 2. The algorithm consists of successive iterations, based on the Algorithm MAX SP. At each iteration a certain set of spiders is constructed in order to cover as many subsets in $\mathcal{D}$ as possible. Namely, at each iteration, if $\Delta \subseteq \mathcal{D}$ is the subfamily of subsets not covered yet, the algorithm seeks the minimum number $w$ for which the algorithm MAX SP$(H_w, z, n_w(\Delta))$ returns a spider centered in $z$ that covers at least half of the subsets in $n_w(\Delta)$. The minimum number $w$ can be obtained by applying the algorithm MAX SP$(H_w, z, n_w(\Delta))$ in a binary search fashion, with $w$ in the range $[1, |\Delta|]$. Thereafter, via Lemma 3.1, one obtains $w$ spiders in $G$ from the "big" spider in $H_w$.

Algorithm SP COV$(G, s, \mathcal{D})$
Set $\Delta = \mathcal{D}$ (family of groups that need to be covered)
Set $\mathcal{S} = \emptyset$, $\bar{w} = 0$
Repeat
(i) Compute the minimum integer $w$ with $1 \le w \le |\Delta|$ such that the algorithm MAX SP$(H_w, z, n_w(\Delta))$ outputs a spider $S$ in $H_w$ reaching a set $X$ for which $C(X) = |\{D \in n_w(\Delta) : D \cap X \ne \emptyset\}| \ge |n_w(\Delta)|/2 = |\Delta|/2$
(ii) From the spider $S$ in $H_w$ obtain via Lemma 3.1 $w$ new spiders in $G$ that cover at least $|\Delta|/2$ elements of $\Delta$
(iii) Let $\Delta$ be the new family of uncovered subsets, put in $\mathcal{S}$ the new $w$ spiders, set $\bar{w} = \bar{w} + w$.
Until $\Delta = \emptyset$.
Output: $\mathcal{S}$ and $\bar{w}$.

Algorithm 2: The algorithm for the Spider Cover Problem on $G$, $s$, and $\mathcal{D}$.
The total number of spiders used, which is output by the algorithm together with the spiders themselves, is the sum of the numbers of spiders used at each iteration.
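The outer loop of the spider cover algorithm can be sketched as below. The call to MAX SP on $H_w$, together with the translation of the big spider back into $w$ spiders of $G$, is abstracted into a caller-supplied function `greedy_cover(w, uncovered)` that returns the subsets covered; this function, the name `spider_cover`, and the assumption that success is monotone in $w$ (needed for the binary search) are all hypothetical.

```python
def spider_cover(family, greedy_cover):
    """Halving loop of SP COV: returns (total number of spiders, rounds).

    `family` is a list of distinct frozensets; `greedy_cover(w, uncovered)`
    is an assumed stand-in for MAX SP on H_w and returns the covered subsets.
    """
    uncovered = list(family)
    rounds = []  # value of w chosen at each iteration
    total = 0
    while uncovered:
        lo, hi = 1, len(uncovered)
        # Binary search for the least w that covers at least half of what
        # is still uncovered (assumes the test is monotone in w).
        while lo < hi:
            mid = (lo + hi) // 2
            if 2 * len(greedy_cover(mid, uncovered)) >= len(uncovered):
                hi = mid
            else:
                lo = mid + 1
        covered = set(greedy_cover(lo, uncovered))
        if 2 * len(covered) < len(uncovered):
            raise ValueError("no w in range covers half of the subsets")
        uncovered = [D for D in uncovered if D not in covered]
        rounds.append(lo)
        total += lo
    return total, rounds
```

With a toy oracle in which every spider covers two subsets, eight subsets are handled in three rounds using 2, 1, and 1 spiders.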
We show now that the number of spiders returned by the algorithm SP COV$(G, s, \mathcal{D})$ is at most $\log_2 |\mathcal{D}| + 1$ times the optimal number of spiders necessary for the given instance $(G, s, \mathcal{D})$ of the Spider Cover Problem. Proof. Consider any iteration of the cycle. The algorithm computes the minimum integer $w$ such that MAX SP$(H_w, z, n_w(\Delta))$ outputs a spider covering at least $|\Delta|/2$ elements of the family $n_w(\Delta)$. This means that the size of the family of yet uncovered groups decreases by at least half during each iteration. Hence, the algorithm SP COV$(G, s, \mathcal{D})$ consists of at most $\log_2 |\mathcal{D}| + 1$ iterations.
Moreover, at each iteration the minimum integer $w$ computed by the algorithm is upper bounded by $w^*$, the optimal number of spiders. In fact, in $H_{w^*}$ there exists a spider covering all $|\Delta|$ elements of $n_{w^*}(\Delta) \subseteq n_{w^*}(\mathcal{D})$, for any $\Delta \subseteq \mathcal{D}$, and therefore the 2-approximation algorithm MAX SP$(H_{w^*}, z, n_{w^*}(\Delta))$ is guaranteed to find a spider that covers at least $|\Delta|/2$ elements of $n_{w^*}(\Delta)$.
We can then conclude that the total number of spiders used by SP COV$(G, s, \mathcal{D})$, which is the sum of all the values $w$ obtained at the various iterations, is upper bounded by $w^* (\log_2 |\mathcal{D}| + 1)$.
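The counting in this proof can be written out explicitly. In the following worked bound, $\Delta_i$ (the uncovered family after $i$ iterations, with $\Delta_0 = \mathcal{D}$), $w_i$ (the value computed at iteration $i$), and $r$ (the number of iterations) are symbols introduced here only for illustration.

```latex
\begin{align*}
|\Delta_i| &\le \tfrac{1}{2}\,|\Delta_{i-1}| \le \frac{|\mathcal{D}|}{2^{i}}
  && \text{(each iteration covers at least half)} \\
1 \le |\Delta_{r-1}| &\le \frac{|\mathcal{D}|}{2^{\,r-1}}
  \;\Longrightarrow\; r \le \log_2 |\mathcal{D}| + 1
  && \text{(the last iteration starts nonempty)} \\
\sum_{i=1}^{r} w_i &\le r \cdot w^{*} \le w^{*}\,(\log_2 |\mathcal{D}| + 1)
  && \text{(each } w_i \le w^{*}\text{)}
\end{align*}
```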

Final Comments
We have provided a general framework for covering problems and shown that several seemingly different problems naturally fit in our scenario. We have given approximation algorithms with best possible approximation ratios, under widely believed computational complexity assumptions. We would like to point out that we can easily extend our results to undirected graphs or to spiders defined as a collection of vertex disjoint paths sharing only a common vertex, using standard tricks.
In case the graph $G = (V, E)$ is undirected, we can consider the corresponding directed symmetric graph $G' = (V, E')$ where $E'$ contains the pair of arcs $(x, y)$ and $(y, x)$ if and only if $x$ and $y$ are neighbors in $G$. One must only be careful in the case in which one could get a spider containing both the opposite arcs, say $(x, y)$ and $(y, x)$, corresponding to one edge of $G$. However, if two branches of a spider are of the form $P_1, x, y, P_2$ and $Q_1, y, x, Q_2$, one can modify the spider so as to contain $P_1, x, Q_2$ and $Q_1, y, P_2$. This implies that we can always get spiders in $G$ with edge-disjoint branches. We can then apply the result of the present paper to the directed graph $G' = (V, E')$. In case we are interested in spiders made of vertex-disjoint paths sharing a single vertex, we can obtain the same results as for arc-disjoint spiders by substituting in $G$ each node $v$ with a pair of nodes $v'$ and $v''$, connected by the arc $(v', v'')$. Moreover, each arc entering $v$ in $G$ now enters $v'$, and each arc leaving $v$ in $G$ now leaves $v''$.
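Both transformations are easy to state in code. The sketch below assumes graphs are given as sets of vertex pairs; the function names and the `"in"`/`"out"` labels for the two copies of a node are hypothetical encodings.

```python
def symmetric_digraph(edges):
    """Replace every undirected edge {x, y} by the arcs (x, y) and (y, x)."""
    return {(x, y) for x, y in edges} | {(y, x) for x, y in edges}


def split_vertices(arcs):
    """Replace each node v by (v, "in") -> (v, "out").

    Arcs entering v now enter (v, "in"); arcs leaving v now leave (v, "out").
    """
    vertices = {x for arc in arcs for x in arc}
    new_arcs = {((v, "in"), (v, "out")) for v in vertices}
    new_arcs |= {((u, "out"), (v, "in")) for (u, v) in arcs}
    return new_arcs
```

Since the arc joining the two copies of a node can carry only one path, arc-disjoint paths in the split graph pass through each original node at most once. One natural choice, stated here as an assumption, is to take the `"out"` copy of $s$ as the center and the `"out"` copies of the desired nodes as terminals.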