Conditional Disconnection Probability in Star Graphs

Recently a new interconnection topology, the star graph, has been proposed which compares very favorably with the well-known n-cubes (hypercubes) in terms of degree, diameter, fault tolerance, and applicability in VLSI design. In this paper we use a new probabilistic measure of network fault tolerance, expressed as the probability of disconnection, to study the robustness of star graphs. We derive an analytical approximation for the disconnection probability of star graphs and verify it with Monte Carlo simulation. We then compare the results with those for hypercubes [4]. We also use the measures of network resilience and relative network resilience to evaluate the effect of the disconnection probability on the reliability of star graphs.

Very large scale integration (VLSI) technology has seen tremendous growth in the past decade, and this has intensified research in different aspects of massively parallel computing systems. A multiprocessor system consists of a large number of identical, independent processing elements that communicate with each other only by exchanging messages over an interconnection network, and application programs are modeled as collections of concurrently executable tasks that may communicate with each other. As the number of processing elements (nodes) increases, so does the failure rate; consequently, system reliability and availability become important issues in the design of such systems. Excellent analyses of the issues involved in designing reliable parallel systems can be found in Kohl and Reddy, Forbes and Raghavendra, and Pradhan [6, 7, 8].
In traditional reliable or fault-tolerant architectures, failure-free operation is achieved mainly through hardware replication or redundancy. In a large-scale parallel computing system, however, this redundancy is provided inherently by the design of the interconnection topology, and the system is allowed to degrade gracefully under failure down to the lowest acceptable performance level. Hence the design of the interconnection network becomes the most important issue in the design of large-scale systems.
The underlying topology of an interconnection network is modeled as a symmetric graph in which the nodes represent the processing elements and the edges (arcs) represent the bidirectional communication channels. Design features of an efficient interconnection topology include low degree, regularity, small diameter, high connectivity, efficient routing algorithms, high fault tolerance, and low fault diameter. Since more and more processors must work concurrently in a large-scale system, the criteria of high fault tolerance and strong resilience [1, 2, 3] have become increasingly important. A computer system is said to be $k$-fault tolerant if it can sustain up to $k$ failures and continue operating; $k$ is called the fault tolerance of the system. Network fault tolerance has been defined as the maximum number of elements that can fail without possibly disconnecting the network [8]. For example, in a regular graph of degree $m$ the network fault tolerance is at most $m - 1$, since the failure of the $m$ neighbors of any node isolates that node.
Whenever a node fails, the fault-tolerant routing algorithm bypasses the failed node. But when successive failures lead to a state of network disconnection, in which one or more healthy nodes are cut off from the rest of the system, distributed recovery is not possible because the state of the computation in the isolated nodes is unreachable. This situation is a failure state, since distributed fault detection, recovery, and restart procedures depend on graph connectedness. If we define the coverage factor as the probability of a successful recovery from failure, then the coverage factor depends on the disconnection probability of the graph. In this paper we analyze this dependence for star graphs.

STAR GRAPHS
In this section we briefly review the background information about star graphs needed for the subsequent discussion. Graph-theoretic terms not defined here can be found in Harary [9], and complete details about the properties of star graphs can be found in Akers and Krishnamurthy, and Akers et al. [1, 2]. A star graph $S_n$ of order $n$ is defined to be a symmetric graph $G = (V, E)$, where $V$ is the set of $n!$ vertices, each representing a distinct permutation of $n$ elements, and $E$ is the set of symmetric edges such that two permutations (nodes) are connected by an edge if one can be reached from the other by interchanging its first symbol with any other symbol. For example, in $S_3$ the node representing permutation ABC has edges to two other permutations (nodes), BAC and CBA. Figure 1 shows $S_3$ and $S_4$.
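As a quick illustration of the definition, the following Python sketch (ours, not from the paper) represents each node as a string of distinct symbols and generates its neighbors by interchanging the first symbol with the symbol in each other position. The helper names `star_neighbors` and `star_graph` are hypothetical.

```python
from itertools import permutations


def star_neighbors(node: str) -> list[str]:
    """Neighbors in S_n: interchange the first symbol with the symbol in position j, j = 2..n."""
    return [node[j] + node[1:j] + node[0] + node[j + 1:] for j in range(1, len(node))]


def star_graph(symbols: str) -> dict[str, list[str]]:
    """Adjacency lists of S_n, with one vertex per permutation of the n given symbols."""
    return {"".join(p): star_neighbors("".join(p)) for p in permutations(symbols)}


print(star_neighbors("ABC"))  # ['BAC', 'CBA']
s4 = star_graph("ABCD")
print(len(s4))                # 24 = 4!
```

Because interchanging the same pair of positions twice restores the original permutation, every adjacency produced this way is symmetric, which matches the undirected-edge model.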
The diameter of $S_n$ is $\lfloor 3(n-1)/2 \rfloor$, and an efficient routing algorithm exists for star graphs that computes a minimal path [2]. Let $A(G{:}v)$, or simply $A(v)$ when $G$ is understood from the context, denote the set of vertices adjacent to vertex $v$ in graph $G$. For any subset $X \subseteq V$, $A(X)$ is defined as $\bigcup_{v \in X} A(v) - X$. Then $|A(S_n{:}v)| = n - 1 = d(v)$ for every $v \in S_n$, i.e., $S_n$ is an $(n-1)$-regular graph, where $d(v)$ denotes the degree of vertex $v$. The vertex connectivity of a graph $G$ is defined as the least $|X|$ over subsets $X \subseteq V$ such that $G - X$ is disconnected. It has been shown in Akers and Krishnamurthy [1] that the vertex connectivity of $S_n$ is $n - 1$, i.e., $S_n$ is optimally $(n-2)$-fault tolerant in the sense that whenever an arbitrary set of $n - 2$ or fewer vertices is removed, the remaining graph is still connected.
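The regularity, diameter, and connectivity facts quoted above can be spot-checked by brute force for small $n$. The sketch below (ours; helper names hypothetical) runs a breadth-first search to confirm the diameter formula for $3 \le n \le 6$ and checks, for $S_4$, that removing any $n - 2 = 2$ vertices leaves the graph connected while removing the $n - 1$ neighbors of a vertex isolates it. This is a sanity check on small cases, not a proof.

```python
from collections import deque
from itertools import combinations, permutations


def star_graph(n: int) -> dict[str, list[str]]:
    """S_n as adjacency lists; nodes are permutations of the first n letters."""
    nodes = ["".join(p) for p in permutations("ABCDEFGHIJ"[:n])]
    return {v: [v[j] + v[1:j] + v[0] + v[j + 1:] for j in range(1, n)] for v in nodes}


def eccentricity(graph: dict[str, list[str]], source: str) -> int:
    """Largest BFS distance from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())


def connected_without(graph: dict[str, list[str]], removed: set[str]) -> bool:
    """True if the graph minus the removed vertices is still connected."""
    alive = [v for v in graph if v not in removed]
    seen, stack = {alive[0]}, [alive[0]]
    while stack:
        u = stack.pop()
        for w in graph[u]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(alive)


for n in range(3, 7):
    g = star_graph(n)
    assert all(len(adj) == n - 1 for adj in g.values())        # (n-1)-regular
    # S_n is a Cayley graph, hence vertex-transitive: one BFS gives the diameter.
    assert eccentricity(g, next(iter(g))) == 3 * (n - 1) // 2

g4 = star_graph(4)
assert all(connected_without(g4, set(pair)) for pair in combinations(g4, 2))
some_node = next(iter(g4))
assert not connected_without(g4, set(g4[some_node]))           # removing A(v) isolates v
print("all checks passed")
```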

DISCONNECTION PROBABILITY OF STAR GRAPHS
Corollary 2 Any edge $e = (u, v)$ in $S_n$ is a 2-cluster, and hence $|\{A(u) - v\} \cup \{A(v) - u\}| = 2n - 4$.
Definition 1 A system is in a disconnected state if and only if there exists a cluster of size $m \ge 1$ that is disconnected from the rest of the system.
Definition 2 $P(i)$ = disconnection probability = Prob[the system becomes disconnected exactly after the $i$-th failure].
Definition 3 $Q(i)$ = probability that a disconnected graph results at the $i$-th node removal, provided that no disconnection occurred before the $i$-th node removal.
Definition 4 $Q_m(i)$ = probability that a disconnected $m$-cluster results in a graph with $N - i$ nodes when a single node is removed from a connected graph with $N - i + 1$ nodes.
It readily follows from these definitions [4] that
$$P(i) = Q(i) \prod_{j=1}^{i-1} \bigl(1 - Q(j)\bigr), \qquad Q(i) = \sum_{m \ge 1} Q_m(i).$$
We now present an analytical and experimental approach to evaluating the disconnection probability of a star graph. The model is a homogeneous, non-reconfigurable, large-scale system based on star graphs. It is similar to the model used by Najjar and Gaudiot [4, 5] to analyze hypercubes and cube-connected cycles.
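Definitions 2 and 3 imply the recurrence $P(i) = Q(i)\prod_{j<i}(1 - Q(j))$: the system must survive failures $1, \dots, i-1$ and then disconnect at failure $i$. A minimal sketch of that recurrence follows; the name `disconnection_pmf` and the sample $Q(i)$ values are hypothetical and serve only to show the arithmetic.

```python
from fractions import Fraction


def disconnection_pmf(q):
    """Given [Q(1), Q(2), ...], return [P(1), P(2), ...] with P(i) = Q(i) * prod_{j<i} (1 - Q(j))."""
    p = []
    no_disconnection_yet = 1  # probability that failures 1..i-1 left the graph connected
    for q_i in q:
        p.append(q_i * no_disconnection_yet)
        no_disconnection_yet *= 1 - q_i
    return p


# Hypothetical Q(i) values, used only to illustrate the recurrence.
q = [Fraction(0), Fraction(0), Fraction(1, 10), Fraction(1, 5), Fraction(1, 2), Fraction(1)]
p = disconnection_pmf(q)
print([str(x) for x in p])  # ['0', '0', '1/10', '9/50', '9/25', '9/25']
print(sum(p))               # 1
```

Exact fractions make it easy to see that the $P(i)$ sum to 1 once some $Q(i) = 1$; with floating-point inputs the function behaves the same way up to rounding.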
Theoretical Analysis
Let $S_n = G(V, E)$ be a star graph of order $n$ as defined earlier. An $m$-cluster is any connected subset $X_m$ of $m$ nodes in $G$. $R_m$ is the number of neighbor nodes of an $m$-cluster $X_m$, i.e., $R_m = |A(X_m)|$. Also let $N(m)$ be the number of $m$-clusters in the star graph $S_n$.
Lemma 1 For an arbitrary edge $(u, v)$ in $S_n$, we have $\{A(u) - v\} \cap \{A(v) - u\} = \emptyset$.
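Proof: Let the first symbols of $u$ and $v$ be $Y$ and $X$ respectively, and let $v$ be obtained from $u$ by interchanging the first symbol with the symbol in position $j$, so that $Y$ occupies position $j$ in $v$. Every vertex in $A(v) - u$ is obtained from $v$ by interchanging $X$ with a symbol in some position other than $j$, so $Y$ remains in position $j$. On the other hand, $Y$ is the first symbol of $u$, and the vertices in $A(u)$ are generated by interchanging $Y$ with another symbol of $u$; moving $Y$ to position $j$ yields $v$ itself. Thus no vertex in $A(u) - v$ has $Y$ in position $j$, and the two sets are disjoint.
Corollary 1 For any two distinct vertices $u, v$, $|A(u) \cap A(v)| \le 1$.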
Lemma 2 Consider two edges $(u, v)$ and $(v, w)$ in $S_n$ with $u \ne w$. Then $\{A(u) - v\} \cap \{A(w) - v\} = \emptyset$. Proof: Similar to that of Lemma 1.
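Lemmas 1 and 2, Corollaries 1 and 2, and the $3n - 7$ count of Lemma 6 (stated below) are all local statements about neighborhoods, so they can be checked exhaustively on small star graphs. The sketch below is our own check, not part of the paper's argument; `boundary` computes $A(X)$ as defined earlier, and `check_local_lemmas` is a hypothetical helper name.

```python
from itertools import combinations, permutations


def star_graph(n: int) -> dict[str, set[str]]:
    """S_n with neighbor sets; nodes are permutations of the first n letters."""
    nodes = ["".join(p) for p in permutations("ABCDEFGHIJ"[:n])]
    return {v: {v[j] + v[1:j] + v[0] + v[j + 1:] for j in range(1, n)} for v in nodes}


def boundary(graph: dict[str, set[str]], cluster: set[str]) -> set[str]:
    """A(X): vertices adjacent to some vertex of X, excluding X itself."""
    return set().union(*(graph[v] for v in cluster)) - set(cluster)


def check_local_lemmas(n: int) -> bool:
    g = star_graph(n)
    for u in g:
        for v in g[u]:
            assert not (g[u] - {v}) & (g[v] - {u})          # Lemma 1
            assert len(boundary(g, {u, v})) == 2 * n - 4    # Corollary 2
            for w in g[v] - {u}:
                assert not (g[u] - {v}) & (g[w] - {v})      # Lemma 2
        for v, w in combinations(g[u], 2):
            assert len(boundary(g, {u, v, w})) == 3 * n - 7  # Lemma 6
    for u, v in combinations(g, 2):
        assert len(g[u] & g[v]) <= 1                        # Corollary 1
    return True


for n in (3, 4, 5):
    print(n, check_local_lemmas(n))
```

All of these properties follow from the star graph having no cycles of length 3 or 4, which the exhaustive loops confirm for $n \le 5$.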
It is now evident that to compute the disconnection probability of a star graph for a given number of node removals, we have to enumerate all possible $m$-clusters of the graph. This is an almost intractable combinatorial problem. We first try to develop insight into the problem by computing the number of $m$-clusters for small values of $m$.
Lemma 3 For a star graph $S_n$, $Q(i) = 0$ for $i < n - 1$, and $Q(n-1) = Q_1(n-1)$.
Proof: $S_n$ is a regular graph of degree $n - 1$ with vertex connectivity $n - 1$, and hence there cannot be any disconnection as long as the number of faults is less than $n - 1$. Also, when $i = n - 1$, only a single node (i.e., a 1-cluster) can be disconnected, since the graph is $(n-1)$-regular; this happens when the failed nodes form exactly the adjacency set of the disconnected node.
Lemma 4 For a given $S_n$,
$$Q(n-1) = \frac{N}{\binom{N}{n-1}}, \qquad N = n!.$$
Proof: In $S_n$ there are $N = n!$ nodes that can be disconnected, each by the failure of its own $n - 1$ neighbors (by Corollary 1, distinct nodes have distinct neighbor sets), and there are $\binom{N}{n-1}$ ways to select a subset of $n - 1$ nodes.
Lemma 5 For a given $S_n$,
$$Q_2(2n-4) = \frac{n!\,(n-1)/2}{\binom{n!}{2n-4}}. \tag{4}$$
Proof: In order to disconnect a 2-cluster, i.e., an edge, all of its neighbors must fail, and by Lemma 2 and its corollary any edge in $S_n$ has $2n - 4$ neighbors. We can choose an edge in $n!(n-1)/2$ ways in $S_n$, since there are exactly that many edges, and $\binom{n!}{2n-4}$ is the number of ways to choose a subset of $2n - 4$ vertices out of $n!$.
Lemma 6 Consider two edges $(u, v)$ and $(u, w)$ in $S_n$ that share the vertex $u$. Then $|A(X)| = 3n - 7$, where $X = \{u, v, w\}$.
Proof: Lemmas 1 and 2 imply that $A(u) - \{v, w\}$, $A(v) - u$, and $A(w) - u$ are pairwise disjoint. Since each vertex has degree $n - 1$, $|A(X)| = (n-3) + (n-2) + (n-2) = 3n - 7$.
Lemma 7 For a given $S_n$,
$$Q_3(3n-7) = \frac{n!\binom{n-1}{2}}{\binom{n!}{3n-7}}.$$
Proof: To disconnect a 3-cluster (i.e., two adjoining edges), all the neighbors of this 3-cluster must fail. By Lemma 6 the number of such neighbors is $3n - 7$. For a given node we can choose two incident edges in $\binom{n-1}{2}$ ways, and there are $n!$ nodes; hence $n!\binom{n-1}{2}$ gives the number of possible 3-clusters. Finally, a subset of $3n - 7$ vertices can be chosen out of $n!$ in $\binom{n!}{3n-7}$ ways.
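The three closed forms, $Q(n-1) = N/\binom{N}{n-1}$, $Q_2(2n-4) = \tfrac{n!(n-1)}{2}\big/\binom{n!}{2n-4}$, and $Q_3(3n-7) = n!\binom{n-1}{2}\big/\binom{n!}{3n-7}$, are easy to evaluate exactly with Python's arbitrary-precision integers. The sketch below (function names are ours) does so for the $S_6$ example discussed below; the ratio $Q(n-1)/Q_2(2n-4)$ comes out at about $4.3 \times 10^5$, in line with the "about half a million" quoted there.

```python
from math import comb, factorial


def lemma4_q1(n: int) -> float:
    """Q(n-1) = N / C(N, n-1): the n-1 failures are exactly the neighbors of one of N nodes."""
    N = factorial(n)
    return N / comb(N, n - 1)


def lemma5_q2(n: int) -> float:
    """Q_2(2n-4) = (number of edges) / C(N, 2n-4)."""
    N = factorial(n)
    edges = N * (n - 1) // 2
    return edges / comb(N, 2 * n - 4)


def lemma7_q3(n: int) -> float:
    """Q_3(3n-7) = (number of 3-clusters, n! * C(n-1, 2)) / C(N, 3n-7)."""
    N = factorial(n)
    return N * comb(n - 1, 2) / comb(N, 3 * n - 7)


n = 6
q1, q2, q3 = lemma4_q1(n), lemma5_q2(n), lemma7_q3(n)
print(q1, q2, q3)
print(f"Q(n-1) / Q_2(2n-4) = {q1 / q2:.3g}")  # 4.33e+05
```

Dividing two Python integers with `/` gives a correctly rounded float even when the binomial coefficients are far larger than any float, so no intermediate overflow occurs.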
Computing $Q_m$ for $m > 3$ becomes increasingly involved. Instead, for simplicity we consider only those cases where $m$ is the factorial of some integer $k$ and the cluster is an order-$k$ star graph, $m = k!$. This still gives an indication of how $Q_m(i)$ varies as a function of $m$. The approach can be justified by the fact that, of all possible configurations of an $m$-node cluster, a star graph $S_k$ with $k! = m$ has the lowest number of neighbors and hence the highest probability of disconnection.
Lemma 8 The number $R_m$ of neighbors of an $m$-cluster that forms a substar $S_k$, $m = k!$, in a star graph $S_n$ is
$$R_m = k!\,(n-k).$$
Proof: There are $k!$ nodes in a subgraph $S_k$. Each of these nodes has $n - 1$ neighbors, $k - 1$ of which belong to the subgraph itself, while the other $n - k$ are "external neighbors."
Lemma 9 The number of distinct substar graphs $S_k$ of order $k$ in a given $S_n$, when $k \le n - 2$, is
$$\binom{n}{k}\frac{(n-1)!}{(k-1)!}. \tag{6}$$
Proof: Each node is represented by a permutation of $n$ symbols, and the nodes in a subgraph $S_k$ must have the same $n - k$ symbols in the same positions. We can choose $k$ symbols out of $n$ in $\binom{n}{k}$ ways and then place the remaining $n - k$ symbols in different possible positions to get different subgraphs. For example, the first of the $n - k$ symbols can be placed in $k$ different positions in a string of $k$ symbols (the new symbol cannot be placed at the beginning, since if the leading symbol is fixed no edge can be generated). Hence the $n - k$ symbols can be placed in $k(k+1)\cdots(n-1) = (n-1)!/(k-1)!$ ways, and the result follows.
Theorem 1 The disconnection probability of a star subgraph of order $k$ (consisting of $m = k!$ nodes) in a star graph $S_n$ of $n!$ nodes is given by
$$Q_m(i) = \begin{cases} 0, & i < R_m, \\ \dfrac{N(m)}{\binom{n!}{R_m}}, & i \ge R_m. \end{cases} \tag{7}$$
Proof: Disconnection cannot occur while the number of failures is less than the number of neighbors $R_m$ of the subset to be disconnected, so the probability of disconnection in that range is zero. For larger values of $i$, the probability of disconnecting a subset of size $m$ is proportional to the number $N(m)$ of possible subsets that can be so disconnected, and each of these subsets is disconnected when a specific set of $R_m$ out of a total of $n!$ nodes has failed. Thus, the total probability of disconnection is the ratio of these two values.
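Lemmas 8 and 9 can be checked by enumeration under the characterization used in the proof of Lemma 9: an $S_k$ substar is the set of nodes that agree on the symbols in a fixed set of $n - k$ positions, none of them the first. The sketch below (helper names hypothetical) counts such groups in $S_5$ with $k = 3$, compares the count with $\binom{n}{k}(n-1)!/(k-1)!$, and verifies $R_m = k!(n-k)$ for every substar. In `theorem1_q` we take $N(m)$ to be the Lemma 9 count, which is our reading of Theorem 1 rather than something the text states explicitly.

```python
from itertools import combinations, permutations
from math import comb, factorial


def star_graph(n: int) -> dict[str, set[str]]:
    nodes = ["".join(p) for p in permutations("ABCDEFGHIJ"[:n])]
    return {v: {v[j] + v[1:j] + v[0] + v[j + 1:] for j in range(1, n)} for v in nodes}


def substars(graph: dict[str, set[str]], n: int, k: int) -> list[set[str]]:
    """Group nodes that agree on the symbols in a fixed set of n-k positions (never position 1)."""
    groups: dict[tuple, set[str]] = {}
    for positions in combinations(range(1, n), n - k):
        for v in graph:
            key = (positions, tuple(v[p] for p in positions))
            groups.setdefault(key, set()).add(v)
    return list(groups.values())


def lemma9_count(n: int, k: int) -> int:
    return comb(n, k) * factorial(n - 1) // factorial(k - 1)


def theorem1_q(n: int, k: int, i: int) -> float:
    """Eq. (7) with N(m) taken as the Lemma 9 substar count (our assumption)."""
    r_m = factorial(k) * (n - k)
    return 0.0 if i < r_m else lemma9_count(n, k) / comb(factorial(n), r_m)


n, k = 5, 3
g = star_graph(n)
subs = substars(g, n, k)
assert len(subs) == lemma9_count(n, k)              # Lemma 9: 120 substars S_3 in S_5
for s in subs:
    assert len(s) == factorial(k)                   # each substar has m = k! nodes
    outside = set().union(*(g[v] for v in s)) - s
    assert len(outside) == factorial(k) * (n - k)   # Lemma 8: R_m = k!(n-k)
print(theorem1_q(n, k, 11), theorem1_q(n, k, 12))
```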
At this point we note that Najjar and Gaudiot [4] conjectured that in any regular graph $Q_1(i) \gg Q_m(i)$ for $m > 1$, provided that for any $m$-cluster the graph satisfies the relation $1 < m < N/2 \Rightarrow R_m > n$, where $N$ is the number of nodes in the graph and $n$ is the degree of each node (in this statement $n$ denotes the node degree). Najjar and Gaudiot [4] also gave an intuitive justification for their conjecture. While it seems extremely hard to prove the conjecture rigorously, the following example and our subsequent experimental results on star graphs lend it further strong credibility.
Example: Consider the star graph $S_6$, with $n = 6$ and $N = 6! = 720$ nodes.
This example shows that when disconnection of a 2-node cluster first becomes possible, at $i = 2n - 4$, the probability of a prior single-node disconnection event is about half a million times larger; similarly, when disconnection of a 3-node cluster first becomes possible, at $i = 3n - 7$, the probability of a prior two-node disconnection is twenty thousand times larger. Although this example does not prove the conjecture of Najjar and Gaudiot [4], it further demonstrates the rationale behind the conjecture, as was shown with examples from hypercubes, cube-connected cycles, etc. We now give an approximate analytical expression for $P(i)$ based on the above results. We cannot give an exact expression for $P(i)$, but we can indicate its magnitude (and later verify it experimentally) using the approximation $Q(i) \approx Q_1(i)$, which is based on the conjecture $Q_1(i) \gg Q_m(i)$ for all $m > 1$. Hence
$$P(i) \approx Q_1(i) \prod_{j=1}^{i-1} \bigl(1 - Q_1(j)\bigr).$$
To evaluate $P(i)$ for star graphs, we need an expression for $Q_1(i)$ for $i > n - 1$.
Theorem 2 The conditional probability of disconnecting a single node at the $i$-th failure, where $n - 1 < i < 2n - 3$, is given by
$$Q_1(i) = \frac{N\,(n-1)\binom{N-n}{i-n+1}}{\binom{N}{i-1}} \cdot \frac{1}{N - i + 1}. \tag{10}$$
Proof: By definition, no disconnection occurred up to the $(i-1)$st failure, and hence the probability that one occurs at the $i$-th failure is the probability that some node has had all but one of its neighbors fail and that the remaining neighbor is the $i$-th failure. $\binom{N}{i-1}$ represents the possible combinations of $i - 1$ failures among $N = n!$ nodes. $\binom{n-1}{n-2} = n - 1$ is the number of possible choices of $n - 2$ failed neighbors among the $n - 1$ neighbors, while $\binom{N-n}{i-n+1}$ is the number of combinations of the remaining $i - n + 1$ failures among the $N - n$ nodes that are neither the isolated node nor its neighbors; the factor $N$ counts the nodes that can be isolated. Lastly, $1/(N - i + 1)$ is the probability that the last remaining neighbor is the next node to fail.
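Equation (10) gives the single-node disconnection probability when more than $n - 1$ nodes have failed. For $i \ge 2n - 3$, it is possible to have two or more single-node disconnections. However, the probability of multiple single-node disconnections is of the same order as that of a cluster disconnection with $m > 1$. Therefore, using the approximation $Q(i) \approx Q_1(i)$, we can extend the range of $i$ in (10) to $i > n - 1$. Using $\binom{N}{i-1}(N - i + 1) = i\binom{N}{i}$, we obtain
$$Q_1(i) = \frac{N\,(n-1)\binom{N-n}{i-n+1}}{i\binom{N}{i}}, \qquad i > n - 1. \tag{11}$$
From Equation (11) we can write
$$\frac{Q_1(i+1)}{Q_1(i)} = \frac{i\,(N-i-1)}{(i-n+2)(N-i)},$$
which exceeds 1 whenever $i < (n-2)N/(n-1)$. Hence $Q_1(i+1) > Q_1(i)$ for $n - 1 < i < (n-2)N/(n-1)$, a range that covers at least the first two-thirds of the failures for $n \ge 4$. Thus the approximate analysis of $P(i)$ for a star graph is complete.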
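The approximation can be evaluated numerically. The sketch below assumes the single-node term reads, as we reconstruct it from the proof of Theorem 2, $Q_1(i) = N(n-1)\binom{N-n}{i-n+1}\big/\bigl(\binom{N}{i-1}(N-i+1)\bigr)$, which equals $N(n-1)\binom{N-n}{i-n+1}\big/\bigl(i\binom{N}{i}\bigr)$, with $Q_1(i) = 0$ for $i < n - 1$ by Lemma 3. It then forms $P(i) \approx Q_1(i)\prod_{j<i}(1 - Q_1(j))$ and reports the peak for $S_4$, $S_5$, and $S_6$. The function names are hypothetical, and the output is the analytical approximation only, not the simulation data behind Figures 2 to 4.

```python
from math import comb, factorial


def q1(n: int, i: int) -> float:
    """Approximate single-node disconnection probability at the i-th failure,
    written with C(N, i-1) * (N - i + 1) = i * C(N, i)."""
    N = factorial(n)
    if i < n - 1:
        return 0.0  # Lemma 3: no disconnection before n - 1 failures
    return N * (n - 1) * comb(N - n, i - n + 1) / (i * comb(N, i))


def analytical_p(n: int) -> list[float]:
    """[P(1), ..., P(N)] with P(i) ~ Q_1(i) * prod_{j<i} (1 - Q_1(j))."""
    p, survival = [], 1.0
    for i in range(1, factorial(n) + 1):
        q = q1(n, i)
        p.append(q * survival)
        survival *= 1.0 - q
    return p


for n in (4, 5, 6):
    N = factorial(n)
    p = analytical_p(n)
    i_peak = max(range(1, N + 1), key=lambda i: p[i - 1])
    print(f"S_{n}: N = {N}, i_peak = {i_peak} ({100 * i_peak / N:.1f}% of N), "
          f"P_max = {p[i_peak - 1]:.3f}")
```

At $i = n - 1$ the expression reduces exactly to Lemma 4, $N/\binom{N}{n-1}$, which is a useful consistency check on the reconstruction.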

Monte Carlo Simulation
The objective of our simulation experiment was to measure $P(i)$ for star graphs of different sizes and to compare the values with similar results for hypercubes [4]. We developed a program that simulates node failures and checks for eventual disconnection of the graph. Each iteration of the simulation consisted of the following steps: (1) randomly choose one of the remaining $N - i$ vertices and remove it from the graph along with all its incident edges; (2) record the number and sizes of the connected components of the remaining graph; (3) if more than one component is found, record the iteration number and the size of the disconnected component and stop; otherwise, repeat.
In each case, the number of samples was greater than 2000.
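The simulation procedure above can be sketched in a few lines of Python. The paper does not give its program, so this is our own minimal version under two stated assumptions: shuffling the nodes once and failing them in that order stands in for choosing uniformly among the survivors at each step (the two are equivalent), and when the survivors split we record the smallest component as "the disconnected cluster." All function names are hypothetical.

```python
import random
from collections import Counter, deque
from itertools import permutations


def star_graph(n: int) -> dict[str, list[str]]:
    nodes = ["".join(p) for p in permutations("ABCDEFGHIJ"[:n])]
    return {v: [v[j] + v[1:j] + v[0] + v[j + 1:] for j in range(1, n)] for v in nodes}


def component_sizes(graph: dict[str, list[str]], alive: set[str]) -> list[int]:
    """Sizes of the connected components induced by the surviving nodes."""
    sizes, seen = [], set()
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in graph[u]:
                if w in alive and w not in seen:
                    seen.add(w)
                    queue.append(w)
        sizes.append(size)
    return sizes


def run_once(graph: dict[str, list[str]], rng: random.Random):
    """Fail nodes in random order; return (i, smallest component size) at the first split, else None."""
    order = list(graph)
    rng.shuffle(order)  # same as picking uniformly among the N - i survivors at each step
    alive = set(graph)
    for i, victim in enumerate(order, start=1):
        alive.remove(victim)
        sizes = component_sizes(graph, alive)
        if len(sizes) > 1:
            return i, min(sizes)
    return None


def simulate(n: int, samples: int, seed: int = 1):
    """Counters of the failure index at disconnection and of the disconnected-cluster size."""
    graph = star_graph(n)
    rng = random.Random(seed)
    at_failure, cluster_size = Counter(), Counter()
    for _ in range(samples):
        outcome = run_once(graph, rng)
        if outcome is not None:
            at_failure[outcome[0]] += 1
            cluster_size[outcome[1]] += 1
    return at_failure, cluster_size


at_failure, cluster_size = simulate(4, 2000)
total = sum(cluster_size.values())
p_hat = {i: c / 2000 for i, c in sorted(at_failure.items())}      # estimate of P(i)
f_hat = {k: c / total for k, c in sorted(cluster_size.items())}   # estimate of F_st(K)
print(p_hat)
print(f_hat)
```

Recomputing all components after every removal is wasteful, but it keeps the sketch close to the step-by-step description; it is fast enough for $S_4$ and $S_5$.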

Frequency of Disconnection
Table I shows the frequency of occurrence of disconnections of clusters of different sizes for star graphs of different sizes. For comparison, Table I also includes similar results for binary hypercubes and cube-connected cycles, taken from Akers and Krishnamurthy [1]. Let $F_{st}(K)$ denote the probability that the disconnected cluster is of size $K$, given that a disconnection occurred in the star graph; the corresponding frequencies for binary cubes and cube-connected cycles are defined similarly. Table I lists these values for $K = 1, 2, 3, 4$ as obtained from our simulation experiment. We make the following observations: For all values of $N$ (the number of nodes in the star graph), $F_{st}(1)$, the frequency of single-node disconnection, is larger than 50%, and $F_{st}(1)$ always increases with increasing $N$. Similar observations can be made about binary cubes and cube-connected cycles [1].
The dominance of single-node disconnection increases with increasing node degree. These results give no indication of the value of $P(i)$ itself, only of its composition. In the following section we give analytical results (obtained from the expressions derived earlier) as well as simulation results (obtained from our Monte Carlo simulation) for $P(i)$ of star graphs of different sizes.

Results for P(i)
Figures 2, 3, and 4 show the analytical and simulation plots of $P(i)$ versus the percentage of failed nodes ($i/N$, in percent) for star graphs of order 4, 5, and 6, respectively. We make the following observations: The curves are narrower for higher-order star graphs, and the discrepancy between analytical and simulation results never exceeds 20%.
If $P_{\max}$ denotes the maximum value of $P(i)$ and $i_{\text{peak}}$ the corresponding value of $i$, the values of $i_{\text{peak}}$ from the simulation and from the analytical model agree closely. Also, $i_{\text{peak}}$ (as a percentage of $N$) tends to be smaller for higher-order star graphs and is always less than 50% for star graphs. It may be recalled from Akers and Krishnamurthy [1] that $i_{\text{peak}}$ for binary cubes was near 50%.
The discrepancy between the analytical and simulated values of $P_{\max}$ is due to the approximation in the analytical model as well as to the statistical error in the simulation experiment.
$P_{\max}$ steadily falls with increasing size of the star graph. We conclude that $N$ is the dominant factor in determining the value of $P_{\max}$.

NETWORK RESILIENCE
We have seen that a network disconnection can impede the recovery mechanism in a gracefully degradable system. Hence the probability of no disconnection is a multiplicative coefficient of the coverage factor, the probability of successful recovery.
In other words, the coverage factor at the $i$-th failure is $1 - P(i)$ times the coverage factor of a connected network graph. The values of $P_{\max}$ obtained from the figures are far too high to be compatible with acceptable values of the coverage factor. In order to allow enough failures without reaching high values of $P(i)$, we use the concept of network resilience, which was used by Najjar and Gaudiot [4] to study hypercubes.
The network resilience $NR(p)$ of a distributed system is defined as the maximum number of node failures that can be sustained while the network remains connected with probability $1 - p$. Formally,
$$NR(p) = \max\Bigl\{\, k : \sum_{i=1}^{k} P(i) \le p \,\Bigr\}. \tag{13}$$
$1 - p$ is therefore the certainty factor of no disconnection after $NR(p)$ failures. The relative network resilience $RNR(p)$ is defined as $NR(p)/N$. Table II shows the values of $NR(p)$ and $RNR(p)$ for star graphs, hypercubes, and cube-connected cycles for $p = 0.01$. The values for hypercubes and cube-connected cycles are taken from Akers and Krishnamurthy [1] for comparison. These values represent the maximum number of nodes that can fail with at most a 1% chance of network disconnection. The plots of $RNR(0.01)$ are shown in Figure 5 (the x-axis shows $\log_2 N$ and the y-axis the relative resilience). We make the following observations: Relative network resilience decreases for cube-connected cycles, while it increases for star graphs and hypercubes.
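Network resilience is the largest $k$ for which the cumulative disconnection probability $\sum_{i \le k} P(i)$ stays at or below $p$, and relative resilience divides that by $N$. The sketch below computes both from the analytical approximation $P(i) \approx Q_1(i)\prod_{j<i}(1 - Q_1(j))$, assuming (as we read Theorem 2) $Q_1(i) = N(n-1)\binom{N-n}{i-n+1}\big/\bigl(i\binom{N}{i}\bigr)$ for $i \ge n - 1$. The function names are hypothetical, and the printed values come from this approximation, not from the simulation data in Table II.

```python
from math import comb, factorial


def q1(n: int, i: int) -> float:
    """Approximate single-node disconnection probability at the i-th failure."""
    N = factorial(n)
    if i < n - 1:
        return 0.0
    return N * (n - 1) * comb(N - n, i - n + 1) / (i * comb(N, i))


def analytical_p(n: int) -> list[float]:
    """[P(1), ..., P(N)] with P(i) ~ Q_1(i) * prod_{j<i} (1 - Q_1(j))."""
    p, survival = [], 1.0
    for i in range(1, factorial(n) + 1):
        q = q1(n, i)
        p.append(q * survival)
        survival *= 1.0 - q
    return p


def network_resilience(p_of_i: list[float], p: float) -> int:
    """NR(p): the largest k with P(1) + ... + P(k) <= p (p_of_i[0] holds P(1))."""
    total, nr = 0.0, 0
    for k, pk in enumerate(p_of_i, start=1):
        total += pk
        if total > p:
            break  # P(i) >= 0, so the cumulative sum can only grow from here
        nr = k
    return nr


for n in (4, 5, 6):
    N = factorial(n)
    nr = network_resilience(analytical_p(n), 0.01)
    print(f"S_{n}: NR(0.01) = {nr}, RNR(0.01) = {nr / N:.3f}")
```

Since $P(i) = 0$ for $i < n - 1$, $NR(p) \ge n - 2$ for any $p \ge 0$, which matches the $(n-2)$-fault tolerance of $S_n$.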
In all cases the network resilience increases with the number of nodes in the network, i.e., larger systems allow a larger number of degradation states irrespective of the topology.
When the node degree remains constant, the relative network resilience decreases with increasing $N$. A sublogarithmic increase in node degree, as in a star graph, results in a slight increase of $RNR$, and a logarithmic increase in node degree, as in a hypercube, results in a larger increase in relative network resilience.

CONCLUSIONS
In this paper we have used the probability of disconnection as a probabilistic measure of network fault tolerance and have used this criterion to study the robustness of star graphs. We have derived several interesting properties of this new family of network topologies and compared the results with those of hypercubes. Our study lends further support to two major points: disconnection probability can be used as a very meaningful criterion to measure network resilience in real-life applications, and the star graphs

FIGURE 1 Star Graphs of Order 3 and 4
FIGURE 5 RNR(N, p = 0.01) versus log2 N

FIGURE 2 Probability of disconnection for $S_4$, $N = 24$; $i$ denotes the number of failed nodes.

FIGURE 3 Probability of disconnection for $S_5$, $N = 120$; $i$ denotes the number of failed nodes.
FIGURE 4

TABLE I Frequencies of Disconnection of m-Clusters

TABLE II Resilience and Relative Resilience for p = 0.01