On Solving the Decycling Problem in a Torus Network

Modern supercomputers are massively parallel systems: they comprise thousands of compute nodes, and sometimes several million. The torus topology has proven very popular for the interconnect of these high-performance systems. Notably, it is employed by Fugaku, the supercomputer ranked number one in the world as of November 2020. Given the high number of compute nodes in such systems, efficient parallel processing is critical to maximise computing performance. It is well known that cycles harm the parallel processing capacity of systems: for instance, deadlock and starvation are two notorious issues of parallel computing that are directly linked to the presence of cycles. Hence, network decycling is an important issue, and it has been extensively discussed in the literature. We describe in this paper a decycling algorithm for the 3-dimensional k-ary torus topology and compare it with established results, both theoretically and experimentally. (This paper is a revised version of Antoine Bossard (2020).)


Introduction
The supercomputers of the 21st century are massively parallel systems: they embody thousands of compute nodes. Some recent devices even include several millions of nodes (e.g., 10,649,600 nodes for the Sunway TaihuLight as of November 2020's TOP500 list [1]). The interconnection of these compute nodes is thus a critical issue so as to maximise the parallel processing performance and thus the machine performance overall. Thanks to its advantageous topological properties, such as regularity, the torus network topology has proven very popular for the interconnect of modern supercomputers. For example, the supercomputer ranked number one in the world in the November 2020 TOP500 ranking, the supercomputer Fugaku built by Fujitsu and RIKEN, employs the torus topology to connect its nodes (Tofu interconnect D [2]). The IBM Blue Gene/L and Blue Gene/P, Cray Titan (Gemini interconnect [3]), and Fujitsu SORA-MA (Tofu interconnect 2 [4]) are other examples of supercomputers based on the torus topology.
It is well known that parallel processing is harmed by the presence of cycles: they are at the source of the notorious resource allocation issues of deadlock, livelock, and starvation [5]. Because it has important implications for parallel processing, the decycling problem, also known as the minimum feedback vertex set problem, has been extensively addressed in the literature. Karp has shown that finding a decycling set of minimum size (i.e., an optimal decycling set) in any graph is NP-complete [6]. Exact algorithms have nevertheless been proposed: for instance, Fomin et al. have described an algorithm that solves this problem in any graph in $O(1.7548^n)$ time [7]. Furthermore, polynomial solutions have been described for several particular classes of graphs such as 3-regular graphs [8], convex bipartite graphs [9], permutation graphs [10], and hypercube-based networks [11,12]. Among others, the size of an optimal decycling set (i.e., the decycling number) has been discussed for cubes and grids in [13,14] and for hypercubes in [15]. We describe in this paper a polynomial time decycling algorithm for a 3-dimensional k-ary torus network. One should note that while the case of a grid mentioned above seems close, or at least related, to the case of the torus which we investigate hereinafter, the wrap-around edges of the torus invalidate the grid decycling approach (refer to the next section for additional details).
The rest of this paper is organised as follows. Notations, definitions, and previous results are recalled in Section 2. The decycling algorithm is presented in Section 3, including the proof of its correctness and complexity analysis. Theoretical and empirical evaluations are conducted in Section 4. Finally, concluding remarks are given in Section 5.

Preliminaries
We recall in this section several notations, definitions, and previously established results. The set of the vertices of a graph $G$ is denoted by $V(G)$, and the set of its edges by $E(G)$. A path in a graph $G$ is a subgraph of $G$ that is an alternating sequence of distinct vertices and edges. Such a vertex-edge sequence whose two terminal vertices are the same vertex is called a cycle. The length of a path or cycle is its number of edges. A graph that contains no cycle is said to be acyclic; each of its connected components is isomorphic to a tree.
A torus $T(n, k)$ is thus a regular graph of degree $2n$, of diameter $n\lfloor k/2 \rfloor$, and has $nk^n$ edges. An essential torus property is next recalled.
A torus $T(2, 3)$ is shown in Figure 1(a), and its recursive structure is illustrated in Figure 1. Next, previously established results are recalled. Beineke and Vandell have established a lower bound on the size of a decycling set for any graph [16]. This result is recalled in Theorem 3 below.
Theorem 3 (see [16]). Given a graph $G = (V, E)$ of maximum degree $\Delta$, any decycling set $S$ of $G$ satisfies the following relation:
$$|S| \geq \frac{|E| - |V| + 1}{\Delta - 1}.$$
We were unaware until very recently (after the publication of [17]) that Pike and Zou have shown in [18] how to calculate a decycling set of minimum size for a 2-dimensional torus. The corresponding result is recalled in Theorem 4 below.
Theorem 4 (see [18]). In a $T(2, k)$ with $k \geq 3$, a decycling set $S^k$ of minimum size, with $|S^k| = 3k/2$ when $k = 4$ and $|S^k| = \lceil (k^2 + 2)/3 \rceil$ otherwise, can be found in $O(k^2)$ time.
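The quantities recalled in this section can be checked numerically on small instances. The following Python sketch is illustrative only: the function names are hypothetical, the Theorem 3 bound is assumed to take the form $|S| \geq (|E| - |V| + 1)/(\Delta - 1)$ (rounded up, since $|S|$ is an integer), and the Theorem 4 sizes are those quoted in the proof of Theorem 5 ($3k/2$ when $k = 4$, $\lceil (k^2 + 2)/3 \rceil$ otherwise).

```python
from collections import deque
from itertools import product


def neighbours(v, k):
    """Neighbours of vertex v in T(n, k): +/-1 (mod k) on each coordinate."""
    out = set()
    for d in range(len(v)):
        for step in (1, -1):
            out.add(v[:d] + ((v[d] + step) % k,) + v[d + 1:])
    out.discard(v)  # only matters for the degenerate case k = 1
    return out


def torus_stats(n, k):
    """Return (|V|, |E|, set of vertex degrees, eccentricity of the origin)."""
    vertices = list(product(range(k), repeat=n))
    degrees = {len(neighbours(v, k)) for v in vertices}
    num_edges = sum(len(neighbours(v, k)) for v in vertices) // 2
    # Breadth-first search from the origin; by symmetry of the torus,
    # the eccentricity of the origin equals the diameter.
    origin = (0,) * n
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        v = queue.popleft()
        for w in neighbours(v, k):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return len(vertices), num_edges, degrees, max(dist.values())


def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def lower_bound(num_vertices, num_edges, max_degree):
    """Assumed form of Theorem 3: |S| >= (|E| - |V| + 1) / (Delta - 1)."""
    return ceil_div(num_edges - num_vertices + 1, max_degree - 1)


def optimal_size_2d(k):
    """Size of an optimal decycling set of T(2, k), k >= 3, as quoted from Theorem 4."""
    if k < 3:
        raise ValueError("Theorem 4 is stated for k >= 3")
    return 3 * k // 2 if k == 4 else ceil_div(k * k + 2, 3)


if __name__ == "__main__":
    for k in range(3, 8):
        v, e, degrees, diameter = torus_stats(2, k)
        print(k, v, e, degrees, diameter, lower_bound(v, e, 4), optimal_size_2d(k))
```

Under these assumptions, the lower bound evaluated on $T(2, k)$ can be compared directly with the optimal size of Theorem 4 for each printed value of $k$.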

The Case of a $T(3, k)$
We describe in this section the details of our approach to decycle a 3-dimensional k-ary torus $T(3, k)$.
3.1. Algorithm Description. We give below a constructive proof in the form of a decycling algorithm whose input is an arity $k$ ($k \geq 1$) and which outputs a decycling set $S^k$. The main idea is to consider one dimension $\delta$ to reduce $T(3, k)$ into $k$ 2-dimensional subtori as per Property 2 and to alternate, for each such subtorus, the optimal decycling of a $T(2, k)$ (i.e., Theorem 4) and two other decycling methods of a $T(2, k)$ which induce a graph with no edge.
Step 1. We distinguish the two cases k even and k odd.
Case k even: Define two decycling sets $S_1^k$, $S_2^k$ in a two-dimensional k-ary torus $T(2, k)$ as follows. The set $S_1^k$ is induced by the vertices of $T(2, k)$ that are taken in one particular "quincunx" manner, and the set $S_2^k$ by the vertices of $T(2, k)$ that are taken in the other "quincunx" manner. Precisely, we have $S_2^k = V(T(2, k)) \setminus S_1^k$ and $S_1^k = V(T(2, k)) \setminus S_2^k$. The sets $S_1^k$ and $S_2^k$ when $k = 4$ are illustrated in Figures 2(a) and 2(b), respectively; they consist of the red vertices.
Wireless Communications and Mobile Computing
Case k odd: Define two decycling sets $S_1^k$, $S_2^k$ in a two-dimensional k-ary torus $T(2, k)$ as follows. The set $S_1^k$ is induced by the vertices of $T(2, k)$ that are taken in one particular "quincunx" manner and also includes the vertices of the top row and those of the rightmost column. The sets $S_1^k$ and $S_2^k$ when $k = 5$ are illustrated in Figures 3(a) and 3(b), respectively; they consist of the red vertices.
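The construction of Step 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the "quincunx" patterns are taken to be the two parity classes of $x + y$, the "top row" and "rightmost column" are taken to be $y = k - 1$ and $x = k - 1$, and for $k$ odd the set $S_2^k$ is assumed to receive the same row and column as $S_1^k$.

```python
from itertools import product


def quincunx_sets(k):
    """Return (S1, S2) for T(2, k), following the textual description of Step 1."""
    if k < 3:
        raise ValueError("the construction is used for k >= 3")
    m = k if k % 2 == 0 else k - 1  # side of the even-sized part of the torus
    inner = list(product(range(m), repeat=2))
    # Assumption: S1 takes the vertices with x + y even, S2 those with x + y odd.
    s1 = {(x, y) for x, y in inner if (x + y) % 2 == 0}
    s2 = {(x, y) for x, y in inner if (x + y) % 2 == 1}
    if k % 2 == 1:
        # Assumption: top row is y = k - 1, rightmost column is x = k - 1,
        # and both sets include this row and column.
        border = {(x, y) for x, y in product(range(k), repeat=2)
                  if x == k - 1 or y == k - 1}
        s1 |= border
        s2 |= border
    return s1, s2


def leaves_no_edge(removed, k):
    """True if no two remaining vertices of T(2, k) are adjacent."""
    remaining = set(product(range(k), repeat=2)) - removed
    for x, y in remaining:
        # Every edge joins a vertex to its +1 neighbour on some dimension.
        for w in (((x + 1) % k, y), (x, (y + 1) % k)):
            if w in remaining:
                return False
    return True


if __name__ == "__main__":
    for k in (4, 5):
        s1, s2 = quincunx_sets(k)
        print(k, len(s1), len(s2), leaves_no_edge(s1, k), leaves_no_edge(s2, k))
```

With these conventions, each set has $k^2/2$ vertices when $k$ is even and $(k^2 - 1)/2 + k$ vertices when $k$ is odd, which agrees with the sizes used in the complexity analysis.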
Step 2. Let $\delta = 0$ and consider the $k$ subtori $T_{i,\delta}(2, k)$ ($0 \leq i \leq k - 1$) as per Property 2. Let $S_0^k$ be the optimal decycling set of $T(2, k)$ as induced by Theorem 4. We distinguish the three cases that are induced by the value of $k \bmod 3$.
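Case $k \equiv 0 \pmod 3$: Decycle each subtorus $T_{i,\delta}(2, k)$ ($0 \leq i \leq k - 1$) with the vertex set $S_{i \bmod 3}^k$ (see Figure 4). In this figure, sample edges are shown where there can be edges on the third dimension between two subtori.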
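Case $k \equiv 1 \pmod 3$: Decycle each subtorus $T_{i,\delta}(2, k)$ ($0 \leq i \leq k - 2$) with the vertex set $S_{i \bmod 3}^k$ and $T_{k-1,\delta}(2, k)$ with the vertex set $S_1^k$ (see Figure 5). Again in this figure, sample edges are shown where there can be edges on the third dimension between two subtori.
Case $k \equiv 2 \pmod 3$: Decycle each subtorus $T_{i,\delta}(2, k)$ ($0 \leq i \leq k - 2$) with the vertex set $S_{i \bmod 3}^k$ and $T_{k-1,\delta}(2, k)$ with the vertex set $V(T_{k-1,\delta}(2, k))$ (see Figure 6). Again in this figure, sample edges are shown where there can be edges on the third dimension between two subtori.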
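The selection rule of Step 2 can be summarised by listing, for each subtorus index $i$, which vertex set is used. The sketch below is a hypothetical helper (not part of the original algorithm description) that returns labels only: `"S0"`, `"S1"`, `"S2"` stand for $S_0^k$, $S_1^k$, $S_2^k$, and `"ALL"` stands for the whole vertex set of the subtorus. The choice of $S_1^k$ for the last subtorus when $k \equiv 1 \pmod 3$ is an assumption inferred from the size formula given in the concluding remarks.

```python
def subtorus_assignment(k):
    """Label the decycling set used for each subtorus T_{i,delta}(2, k), 0 <= i <= k - 1."""
    if k < 3:
        raise ValueError("Step 2 is described for k >= 3")
    remainder = k % 3
    # Number of leading subtori that simply follow the "i mod 3" rule.
    regular = k if remainder == 0 else k - 1
    labels = ["S%d" % (i % 3) for i in range(regular)]
    if remainder == 1:
        labels.append("S1")   # assumption: last subtorus decycled with S1
    elif remainder == 2:
        labels.append("ALL")  # last subtorus entirely removed
    return labels


if __name__ == "__main__":
    for k in (5, 6, 7):
        print(k, subtorus_assignment(k))
```

One property worth noting is that no two cyclically consecutive subtori both receive `"S0"`, which is what the correctness argument relies on for the edges of the third dimension.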

Correctness and Complexities.
In this section, we prove the correctness of the proposed algorithm and establish its complexities.
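Theorem 5. In a 3-dimensional k-ary torus $T(3, k)$ ($k \geq 1$), a decycling set $S^k$ of 0 vertices when $k = 1$, of 3 vertices when $k = 2$, and in the other cases of $k(|S_0^k| + 2|S_1^k|)/3$ vertices when $k \equiv 0 \pmod 3$, of $(k - 1)(|S_0^k| + 2|S_1^k|)/3 + |S_1^k|$ vertices when $k \equiv 1 \pmod 3$, and of $(k - 2)(|S_0^k| + 2|S_1^k|)/3 + |S_0^k| + k^2$ vertices when $k \equiv 2 \pmod 3$, where $|S_1^k| = k^2/2$ when $k$ is even and $|S_1^k| = (k^2 - 1)/2 + k$ when $k$ is odd, can be found in optimal $O(k^3)$ time.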
Proof. The cases induced by k ≤ 2 are trivial; they have already been shown at the beginning of Section 3.1. So, we can assume k ≥ 3.
By definition, the subgraph of $T(2, k)$ induced by the vertices of the set $V(T(2, k)) \setminus S_1^k$ has no edge. Similarly, the subgraph of $T(2, k)$ induced by the vertices of the set $V(T(2, k)) \setminus S_2^k$ has no edge either. Hence, by definition of the algorithm, each subtorus $T_{i,\delta}(2, k)$ ($0 \leq i \leq k - 1$) is acyclic. Moreover, the only edges on the third dimension are at a vertex of a graph induced by a subtorus decycled with $S_0^k$. Consider two such graphs induced by a subtorus decycled with $S_0^k$, say the graphs that correspond to $T_{i,\delta}(2, k)$ and $T_{j,\delta}(2, k)$ where $i < j$ and $j - i$ is minimal. In the case $k \equiv 0 \pmod 3$, we have $j - i = 3$ and the greatest such $j$ is $k - 3$. Since, for every edge on the third dimension, the vertex that is not in a graph induced by a subtorus decycled with $S_0^k$ is inside a graph induced by a subtorus that has no edge, the resulting graph is acyclic. In the case $k \equiv 1 \pmod 3$, we have $j - i = 3$ and the greatest such $j$ is $k - 4$. So, again and for the same reason, the resulting graph is acyclic. In the case $k \equiv 2 \pmod 3$, we have $j - i = 3$ and the greatest such $j$ is $k - 2$, so there could be a path on the third dimension between a vertex of $T_{k-2,\delta}(2, k)$ (i.e., the rightmost graph induced by $S_0^k$) and a vertex of $T_{0,\delta}(2, k)$ (which is also induced by $S_0^k$). However, all the vertices of $T_{k-1,\delta}(2, k)$ are removed, so there is no such path and the resulting graph is thus acyclic.
The set $S_1^k$ and the set $S_2^k$ each have $k^2/2$ vertices when $k$ is even and $(k - 1)^2/2 + (2k - 1) = (k^2 - 1)/2 + k$ vertices when $k$ is odd. They can thus each be calculated in $O(k^2)$ time. By Theorem 4, the set $S_0^k$ has $3k/2$ vertices when $k = 4$ and $\lceil (k^2 + 2)/3 \rceil$ vertices otherwise, and this set can be calculated in $O(k^2)$ time.
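The correctness argument can be checked mechanically on the smallest non-trivial instance, $T(3, 3)$. The Python sketch below is illustrative and rests on several assumptions: all function names are hypothetical, a brute-force search stands in for the construction of Theorem 4 (which is not reproduced here), the "quincunx" sets for $k$ odd are the parity classes of $x + y$ on the first $k - 1$ rows and columns together with the last row and last column, and $S_2^k$ is assumed to include that same row and column.

```python
from itertools import combinations, product

K = 3  # smallest arity covered by the general construction


def neighbours(v, k):
    """Neighbours of v in a torus of arity k (any dimension)."""
    out = set()
    for d in range(len(v)):
        for step in (1, -1):
            out.add(v[:d] + ((v[d] + step) % k,) + v[d + 1:])
    return out


def is_forest(vertices, k):
    """Union-find test: does `vertices` induce an acyclic subgraph of the torus?"""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for v in vertices:
        for w in neighbours(v, k):
            if w in parent and v < w:  # visit each edge exactly once
                root_v, root_w = find(v), find(w)
                if root_v == root_w:
                    return False  # the edge closes a cycle
                parent[root_v] = root_w
    return True


def smallest_decycling_set_2d(k):
    """Brute-force stand-in for Theorem 4 (only practical for very small k)."""
    verts = sorted(product(range(k), repeat=2))
    for size in range(len(verts) + 1):
        for cand in combinations(verts, size):
            if is_forest(set(verts) - set(cand), k):
                return set(cand)


def odd_quincunx(k, parity):
    """Assumed layout for k odd: one parity class plus the last row and column."""
    return {(x, y) for x, y in product(range(k), repeat=2)
            if x == k - 1 or y == k - 1 or (x + y) % 2 == parity}


def decycle_t3_k3():
    """Decycling set of T(3, 3): subtorus i (first coordinate) uses S_{i mod 3}."""
    sets = [smallest_decycling_set_2d(K), odd_quincunx(K, 0), odd_quincunx(K, 1)]
    return {(i,) + v for i in range(K) for v in sets[i % 3]}


if __name__ == "__main__":
    removed = decycle_t3_k3()
    remaining = set(product(range(K), repeat=3)) - removed
    print(len(removed), is_forest(remaining, K))  # prints "18 True"
```

The 18 removed vertices agree with the count $k(|S_0^k| + 2|S_1^k|)/3 = 3(4 + 14)/3$ obtained from the sizes given above.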

Figure 3: The sets $S_1^k$ (a) and $S_2^k$ (b) as defined in a $T(2, k)$ when $k = 5$; they consist of the red vertices.
Figure 4: An illustration of the case $k \equiv 0 \pmod 3$ with $k = 6$. The selected decycling set for each of the $T_{i,\delta}(2, k)$ ($0 \leq i \leq k - 1$) subtori is given below it. Where there can be edges on the third dimension between two subtori, sample edges are shown.

Discussion
In this section, we discuss the obtained theoretical results and compare them with experimental data.

4.1. Comparison with the Lower Bound. We investigate in this section how close the size of the generated decycling set is to the lower bound given by Theorem 3. Figure 7 shows the values obtained from Theorem 3 and Theorem 5. Let us recall that the result of Theorem 3 is a lower bound on the size of a decycling set, and not necessarily the size of an optimal decycling set. So, the difference plotted in Figure 7 is given for reference. It shows that the size of the obtained decycling set is promising, and possibly optimal in some cases, given that it is rather close, and sometimes equal, to the lower bound of Theorem 3.
One can also notice that, as expected, the size of the decycling set generated by the proposed algorithm is never smaller than the lower bound of Theorem 3. A smaller value would indicate a flaw in the proposed algorithm.
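The comparison above can be reproduced numerically from the closed-form sizes. The following Python sketch is a hedged illustration: the function names are hypothetical, the sizes are those stated in the concluding remarks (in terms of $|S_0^k|$ and $|S_1^k|$), and the Theorem 3 bound is assumed to take the form $|S| \geq (|E| - |V| + 1)/(\Delta - 1)$, which for $T(3, k)$ with $k \geq 3$ ($|V| = k^3$, $|E| = 3k^3$, $\Delta = 6$) gives $\lceil (2k^3 + 1)/5 \rceil$.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def size_s0(k):
    """|S_0^k|: optimal decycling set size of T(2, k) as quoted from Theorem 4."""
    return 3 * k // 2 if k == 4 else ceil_div(k * k + 2, 3)


def size_s1(k):
    """|S_1^k|: size of a 'quincunx' set of T(2, k)."""
    return k * k // 2 if k % 2 == 0 else (k * k - 1) // 2 + k


def decycling_set_size(k):
    """Size of the generated decycling set of T(3, k) for k >= 3."""
    if k < 3:
        raise ValueError("the general formulas apply to k >= 3")
    s0, s1 = size_s0(k), size_s1(k)
    remainder = k % 3
    if remainder == 0:
        return k * (s0 + 2 * s1) // 3
    if remainder == 1:
        return (k - 1) * (s0 + 2 * s1) // 3 + s1
    return (k - 2) * (s0 + 2 * s1) // 3 + s0 + k * k


def lower_bound_t3(k):
    """Assumed Theorem 3 bound for T(3, k), k >= 3: ceil((2k^3 + 1) / 5)."""
    return ceil_div(2 * k ** 3 + 1, 5)


if __name__ == "__main__":
    print("k  size  bound  gap")
    for k in range(3, 11):
        size, bound = decycling_set_size(k), lower_bound_t3(k)
        print(k, size, bound, size - bound)
```

The printed gap is the quantity discussed in this section; the sketch does not attempt to reproduce the exact data of Figure 7.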

Comparison with the Results of a Computer Experiment.
We have implemented a stochastic decycling algorithm in order to compare the obtained theoretical results with those obtained experimentally. As recalled in the introduction, finding an optimal decycling set is an NP-complete problem; hence, the graph decycling implementation we use only approximates the size of an optimal decycling set. This implementation follows the method described in [19]. The stochastic implementation was run 1,000 times.
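To give an idea of what a stochastic decycling experiment looks like, the sketch below implements a simple randomised greedy heuristic in Python. It is not the method of [19], whose details are not given here: it is a swapped-in technique chosen for brevity, and all names are hypothetical. Vertices are visited in random order; a vertex is kept only if it does not close a cycle among the vertices kept so far (tested with union-find), and the discarded vertices form the decycling set.

```python
import random
from itertools import product


def neighbours(v, k):
    """Neighbours of v in T(n, k): +/-1 (mod k) on each coordinate."""
    out = set()
    for d in range(len(v)):
        for step in (1, -1):
            out.add(v[:d] + ((v[d] + step) % k,) + v[d + 1:])
    out.discard(v)
    return out


def random_greedy_decycling_set(n, k, rng):
    """One randomised run: return a (not necessarily minimum) decycling set of T(n, k)."""
    vertices = list(product(range(k), repeat=n))
    rng.shuffle(vertices)
    parent = {}  # union-find forest over the kept vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    removed = set()
    for v in vertices:
        roots = [find(w) for w in neighbours(v, k) if w in parent]
        if len(set(roots)) < len(roots):
            # Two kept neighbours already share a component: keeping v
            # would close a cycle, so v joins the decycling set.
            removed.add(v)
        else:
            parent[v] = v
            for root in roots:
                parent[root] = v
    return removed


def best_of(n, k, runs, seed=0):
    """Smallest decycling set found over `runs` independent randomised runs."""
    rng = random.Random(seed)
    return min((random_greedy_decycling_set(n, k, rng) for _ in range(runs)), key=len)


if __name__ == "__main__":
    best = best_of(3, 4, runs=100)
    print(len(best))
```

As with any such heuristic, the returned set is only an upper bound on the decycling number, and repeating the runs (1,000 times in the experiment above) improves the best value found at the cost of running time.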
The values obtained from Theorem 5 and the computer experiment are shown in Table 1. The minimum size and the average size of the decycling set generated by the stochastic implementation are given. The values from Theorem 3 (a lower bound on the size of a decycling set, and not necessarily the size of an optimal decycling set) are also given for reference.
From these results, it can be noticed that the proposed algorithm beats the stochastic implementation on every one of its 1,000 runs when $k \equiv 2 \pmod 4$ and is equal or very nearly equal to it when $k \equiv 0 \pmod 4$. In other words, when $k$ is even, our proposal induces a decycling set that is smaller than, or nearly equal to, the best decycling set found after 1,000 runs. When $k$ is odd, the size difference between our proposal and the stochastic implementation continuously decreases as $k$ increases, and at $k = 9$, our proposal beats the stochastic implementation when considering the average value of its 1,000 runs.
In addition, the complexity of the stochastic implementation is prohibitive [19], especially compared to the worst-case time complexity of the proposal ($O(k^3)$, see Theorem 5). These are very positive results which quantitatively show the significance of the proposal. For reference, in the case $k = 10$, the stochastic implementation took more than 2.5 hours to complete the 1,000 runs on a midrange computer (Intel Core i5-1035G7 CPU, 8 GB RAM).
Finally, it can also be noticed that the size of the decycling set generated by the proposed algorithm in a $T(3, k)$ is $O(k^n)$ (here $n = 3$). Besides, by Theorem 4, the size of an optimal decycling set in a $T(2, k)$ is also $O(k^n)$ (here $n = 2$). This is yet another positive indicator of the performance of our proposal.

Concluding Remarks
The torus topology is nowadays ubiquitous in supercomputing. It is the network topology of choice for the interconnect of massively parallel systems: it is for instance employed by Fugaku, the supercomputer ranked number one in the world as of November 2020. Besides, it is common knowledge that cycles in the network of compute nodes harm parallel processing, and this is one reason why the decycling problem (NP-complete) has been extensively addressed in the literature. We have described in this paper a decycling algorithm for a torus $T(3, k)$. Thanks to the recursive property of the torus topology, this proposal can be used to decycle parts (i.e., subtori) of a torus of higher dimension, which as explained will consequently facilitate parallel processing. Precisely, we have given a constructive proof of a decycling set $S^k$ for a torus $T(3, k)$ where $S^k$ has $k(|S_0^k| + 2|S_1^k|)/3$ vertices when $k \equiv 0 \pmod 3$, $(k - 1)(|S_0^k| + 2|S_1^k|)/3 + |S_1^k|$ vertices when $k \equiv 1 \pmod 3$, and $(k - 2)(|S_0^k| + 2|S_1^k|)/3 + |S_0^k| + k^2$ vertices when $k \equiv 2 \pmod 3$, with $S_0^k$ an optimal decycling set in $T(2, k)$, $|S_1^k| = k^2/2$ when $k$ is even, and $|S_1^k| = (k^2 - 1)/2 + k$ when $k$ is odd; this set can be obtained in optimal $O(k^3)$ time. We have formally evaluated the proposed algorithm and conducted evaluation experiments to compare it to conventional approaches. The obtained results have quantitatively shown the significance of the proposal.
Regarding future work, refining the proposed decycling algorithm so that the generated decycling set includes a smaller number of vertices is a first meaningful objective. It will then be very interesting to investigate how to rely on the obtained results to produce nontrivial decycling sets for tori of higher dimensions, for instance by using the recursive property of the torus topology as explained above.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The author declares that there are no conflicts of interest regarding the publication of this paper.