A Probabilistic Spatial Distribution Model for Wire Faults in Parallel Network-on-Chip Links

High-performance chip multiprocessors contain numerous parallel-processing cores where a fabric devised as a network-on-chip (NoC) efficiently handles their escalating intertile communication demands. Unfortunately, prolonged operational stresses cause accelerated physically induced wearout leading to permanent metal wire faults in links. Where only a subset of wires may malfunction, the remaining healthy wires can be leveraged to sustain connectivity when a partially faulty link recovery mechanism is utilized, whose data recovery latency overhead is proportional to the number of consecutive faulty wires. NoC link failure models are important yet absent from the existing literature, so a mathematical model of the distribution of wire faults in parallel on-chip links is critically needed. This paper takes a step in this direction; the objective is to find the probability of having a “fault segment” consisting of a certain number of consecutive “faulty” wires in a parallel NoC link. First, it is shown how the given problem can be reduced to an equivalent combinatorial problem through partitions and necklaces. Then the proposed algorithm counts certain classes of necklaces by making a separation between periodic and aperiodic cases. Finally, the resulting analytical model is tested successfully against a far more costly brute-force algorithm.


Introduction
Continuous complementary metal-oxide-semiconductor (CMOS) transistor miniaturization, following Moore's law, has sparked the multicore era [1,2], in which the architectural paradigm dictates that software application execution is handled by numerous processing cores that operate in parallel. This modular design of chips, including general-purpose chip multiprocessors (CMPs), not only ensures ultrahigh performance attainment but also provides a number of advantageous attributes such as those of power and thermal management, reconfigurability, and fault-tolerance, among others [3][4][5]. Networks-on-chips (NoCs) [6,7], microscale equivalents of large-scale interconnection networks [8,9], which also draw similarities to complex networks [10][11][12], as they are homogeneous and exhibit clustering behaviour and short-distance communication between node-pairs, have become the de facto communication backbone in these multicore chips, including CMPs such as the Tilera TILE64 CMP [2] and Intel's 48-core Single-chip Cloud Computer (SCC) [1], hence becoming inherent components in these parallel on-chip systems.
Unfortunately, deep submicron CMOS process technology is marred by increasing susceptibility to wearout, which the ITRS expects to increase tenfold over the next 10 years [13], dramatically shortening the useful lifespan of multicore systems. Point-to-point links, comprising a set of parallel metallic wires [14], interconnect neighbouring routers, allowing message transfers on-chip. Prolonged operational stress on these parallel wires gives rise to accelerated wearout, due to physical failure mechanisms primarily including electromigration (EM) and negative bias temperature instability [15], that causes permanent device faults that can, in turn, quickly lead to architectural-level failures and possible catastrophic NoC operational failure.
Faults induced by these anomalies are widely predicted to become increasingly common in the near future [16].
Research indicates that about 20% of all link errors are caused by permanent failures, occurring both at manufacture-time and at run-time [17,18]. Moreover, the wire repeaters (buffers), that is, the link drivers found in each router, the output latches, and the flip-flops of pipelined links are also susceptible and potentially vulnerable [19].
Even an isolated intrarouter or communication link failure in the NoC fabric can turn a static regular topology into an irregular one with subconnected geometry; hence, either physical connectivity among routers may not exist at all, and/or the associated routing protocol may not be able to advance packets to their destinations due to protocol-level violation(s) [20]. In-transit messages cannot traverse faulty links, with back-pressure causing the effects of the fault(s) to spread backwards, quickly causing congestion, and even leading the entire system to stall indefinitely. Further, vital components such as input/output (I/O) and various off-chip memory modules may be partitioned away from the CMP as well, making them inaccessible. Indeed, a number of surveys [4,5,21,22], which outline the design challenges and lay the roadmap in future multicore design, have emphasized the need to conduct research and identify the primary challenges in NoC reliability maintenance techniques, including link-level fault diagnosis and tolerance, as a means to safeguard the scalability and performance sustainability of general-purpose CMPs and application-driven systems-on-chips (SoCs).
The facts that high data rate on-chip links are susceptible to increasing failure rates that degrade the NoC's performance, that the NoC is critical to a CMP's overall functionality, and that no real link failure data are readily available from manufacturers (for obvious reasons) point to the crucial need to construct a mathematical model to aid in the understanding and exploration of the distribution of wire faults in parallel on-chip links. This model can potentially be coupled to fault-tolerant mechanisms at the chip's architectural level to realize improvements in intercore communication resiliency [1,2]. This work takes decisive steps in such a direction.
In this paper, we derive and demonstrate combinatorics-based models that can be used to calculate the spatial probability distribution of individual wire faults in a parallel network-on-chip (NoC) [6] interconnect link given its bit-width (the total number of healthy and unhealthy single-bit wires in this parallel link) and a given number of faulty single-bit wires that reside in this link. Modern NoCs employ interrouter links comprising several unidirectional parallel wires [14] that can transfer an entire data flit in one clock cycle. Since each wire is associated with separate driver circuitry, a particular driver failure only affects its associated wire in a parallel NoC link. (The terms "unhealthy," "corrupted," "nonoperational," and "faulty" are used interchangeably throughout this paper.) (A flit, or flow-control unit, is a logical segment of a packetized message. In wormhole flow-control, often employed in NoCs, a packet containing data, comprised of a series of bits, is often split into several flits to reduce buffering requirements and to achieve efficient communication among router nodes.)
Previous research studies in [23][24][25], where the first two works constitute our previously published research, target the recovery of partially corrupted packetized data being retransmitted, using a partially faulty link recovery mechanism (PFLRM) that employs a shifting mechanism which leverages the existing healthy wires in a partially faulty link, that is, a parallel NoC link in which a subset of its wires are faulty while the remaining wires are operational. These mechanisms retransmit a flit in a bit-shifted scheme from the sender router at every clock cycle, for a given number of cycles, so as to eventually receive all the essential information to enable recovery and reconstruction of the flit data at the receiver router. Under these mechanisms, it has been shown that the consecutiveness or "clustering" of these faulty wires, where each such cluster is separated from its neighboring clusters by at least one healthy wire, directly affects the recovery latency required to restore the received partially corrupted flit data at the receiver routers, hence directly degrading NoC performance. We are, therefore, particularly interested in the number of such consecutive faulty wires in a parallel NoC link, as the "maximum wire fault clustering" (i.e., the longest run of consecutive faulty wires in a parallel link, or fault-segment) correlates to the number of overhead clock cycles that are required to retransmit a flit over a partially faulty parallel NoC link, as Section 2 will demonstrate with a detailed example; the wider this fault clustering is, the more flit retransmissions are needed for flit recovery, and hence the lower the performance of the NoC and of the entire CMP. Note that we consider the two edge wires of the parallel link to be virtually consecutive for the functional purposes of the packetized message bit-shifting mechanism that forms part of the recovery in [23][24][25]; hence, the link arrangement forms a virtual ring, with the edge wires "touching" each other, as demonstrated in the example of Figures 1(a) and 1(b), where 5 wires are assumed to exist in a parallel link.
We adopt a random spatial distribution of faulty wires in the parallel NoC link and aim to determine the probability distribution of corrupted (and noncorrupted) flit data bits (or associated NoC link wires), as no real data for wire failures in NoC links are published by IC manufacturers. (The terms "consecutive," "clustering," "adjacent," and "segment(s)" are used interchangeably throughout this paper.) To effectively calculate the levels of these "parallel wire segmentations," we derive (or perfect) a novel algorithm that can be used to determine the segmentation probability for an ordered collection of objects (i.e., parallel wires in a NoC link) of two distinct classes: faulty wires and healthy or "nonfaulty" wires. The algorithm presented here is a more rigorous extension of a preliminary algorithm presented in [26], which heavily depended on a stated (unproven) conjecture and therefore produced results that were not fully precise.
The goal of a complete mathematical model describing the probability distribution of the length of a fault-segment for a given number of parallel NoC wires and faulty wires is reached through a series of combinatorial arguments with regard to partitions and necklaces. Necklaces, apart from their intrinsic usefulness in the field of combinatorics, have proven to be a powerful tool in other areas of mathematics and other sciences. Some customary notions and theories related to necklaces include the Lyndon word [27], the actual homonym necklace problem (see, e.g., [28]), the necklace splitting problem [29], and most notably a proof of Fermat's little theorem [30].
Figure 1: (a) Demonstration of the PFLRM functionality under all three phases of recovery, using a 5-bit flit width, a faulty wire clustering of 2, and a total of 3 faulty wires (60% faulty wires). Stuck-at-one permanent faults are assumed. In phase 2-a the fault vector is rotated twice until all bits of vector R_3 equal 1, indicating a maximum fault clustering of 2. The boxed bit numbers under phase 3 indicate the respective newly recovered flit bits from the received and corrupted flit vector D′. The final two-position anticlockwise deshifting at the downstream router recovers the final flit, which exactly equals D, the error-free flit being sent from the upstream router; the recovery phase takes 3 clock cycles (CLK_{1,2,3}) to complete (1 base plus 2 recovery cycles). (b) The five wires comprising the same parallel NoC link forming a virtual "ring."
Mathematical Problems in Engineering
The rest of this paper is organized as follows. Section 2 presents an overview of the partially faulty link recovery mechanism, published in our previous works [23,24], which forms the basis for the proposed faulty wire distribution model presented in the paper. Next, in Section 3 the problem definition is formally given. Section 4 presents the algorithm that leads to the determination of the probability distribution of the length of fault-segments for a given number of wires and faulty wires comprising a parallel NoC link. The algorithm is constructed through basic counting principles and probability rules, where appropriate, and through a derivation showing its correspondence to an equivalent necklace problem. In Section 5, an arithmetic example demonstrates the effectiveness of the obtained analytical model, which is also verified by the results of a brute-force algorithm, with a runtime computation comparison of our analytical model versus the brute-force approach demonstrating its advantageous speedup. The further applicability of the presented algorithm is discussed in Section 6. Finally, Section 7 concludes this paper.

Demonstration of the Partially Faulty Link Recovery Mechanism
For purposes of completeness, we give an outline of the partially faulty link recovery mechanism (PFLRM), which forms the basis on which we build our distribution model presented in this paper. A full description of the mechanism can be found in [23,24]. The PFLRM scheme can detect bit corruptions in received flit data caused by independent wire failures in a parallel NoC link [14]. This detection initiates a data recovery process, whereby the downstream router instructs the upstream router to retransmit the flit(s) appropriately bit-rotated over a respective number of cycles, so as to bypass the faulty wire(s) that cause(s) the respective flit-bit error(s). Healthy bit fragments are extracted from each of the received bit-rotated incarnations of the unhealthy flit and placed in an assembly block. PFLRM reacts dynamically to bypass permanent wire faults. While PFLRM also works for transient faults, for clarity we focus on permanent faults only. Preliminarily, we denote an initially healthy parallel NoC link as a vector D = (d_0, d_1, ..., d_n), where n ∈ Z+, of n + 1 noncorrupted flit bits sent from an upstream router towards a downstream router. Each such vector member represents the relevant and distinct bit of a flit traversing a relevant wire of a link. When faults occur, some of these wires, or link vector members, become faulty, and as a result a flit will be received at the downstream router with some of its bits being corrupted (while the remaining flit bits remain healthy and contain the correct data), denoted as D′. Individual corrupted flit bits are denoted as d′_i, i ∈ {0, 1, ..., n}. The relevant positions (placements or distribution) of faulty wires are assumed to be random. In our example of Figure 1(a) we assume that the wires carrying flit bits d_0, d_2, and d_3 between the upstream and the downstream routers in a 5-bit link become faulty simultaneously, respectively, denoted as d′_0, d′_2, and d′_3.
The same figure shows how PFLRM reconstructs corrupted flits transmitted over a partially faulty link (PFL) in a 3-phase scheme: (1) dynamic fault occurrence and detection, (2-a) fault vector generation, (2-b) flit recovery latency calculation, and (3) flit retransmission (upstream router), reassembly, and final flit recovery (downstream router).All 3 phases are executed when a wire fault(s) originally occurs; after the fault vector is generated, only the last phase is required, until later a new wire becomes faulty.
In phase 1, the error detection block in the downstream router detects the error (but does not recover or distinguish which bit(s) are erroneous), causing the initiation of phase 2-a. In phase 2-a, the upstream router stops the transmission of subsequent flits without dropping any packets and transmits two consecutive test vectors, T_1 and T_2, to the downstream router containing alternating "zeros" and "ones" with a one-bit shift difference between the two (refer to Figure 1(a) phase 2-a). Stuck-at-zero or stuck-at-one errors in any of the link wires are detected by a bitwise exclusive or (XOR) operation in the downstream router, indicated by a corresponding 0 in the respective generated fault vector E.
The gist in recovering received flits corrupted during transmission is to utilize this fault vector as many times as required to extract healthy flit bits and use them to reassemble the entire healthy flit at the downstream router; then, repeat for the next flit(s). To do this, each healthy flit D at the upstream router is rotated clockwise a number of times, one bit position at every clock cycle t, such that D_t = D_{t−1} ≫ 1, where ≫ denotes one-bit clockwise rotation, t ∈ Z+, and t < n + 1 (the bit-width of the link), and sent over the parallel PFL a finite number of times (see next) to bypass faulty wires, while recovering flit bits over the remaining healthy wires. Due to this bit-rotational mechanism the wires at the edges of the link are considered to be virtually adjacent to each other, forming a "ring." For each rotated version of the received corrupted flit D′_t, the healthy bits are compared against the fault vector and a flit recovery vector R_t is generated each time according to (1), where R_{t−1} is the partially recovered flit vector from the previous clock cycle and Ē is the bit-wise negation of the fault vector E.
In other terms, if a bit from the current received flit vector D′_t is healthy (i.e., it utilized a faultless wire to arrive at the downstream router), as denoted by the corresponding bit (logic 1) of the fault vector E, then it is extracted and assembled in the current flit recovery vector R_t. Otherwise, that bit of R_t is left unconsidered for recovery; instead, the previously recovered corresponding flit bit is retrieved. For instance, in phase 3 of our example in Figure 1(a), in the first cycle of recovery (CLK_1), flit bits d_1 and d_4 are recovered; in CLK_2, these bits are rotated and flit bits d_0 and d_3 are recovered at their relative bit placement, with d_2 being recovered last in CLK_3. The relative rotations of the transmitted unhealthy flit vector D′_t and the flit recovery vector R_t in each cycle ensure that the recovered flit vector R_t is progressively built.
The recovery vector R_t requires a final (t − 1)-bit anticlockwise derotation to reproduce the healthy flit vector D downstream. The number of these derotations is directly related to the number of consecutive faulty link wires; we refer to this as the "maximum wire fault clustering"; this also determines the number of additional clock cycles that are required to transmit a flit over the PFL for recovery purposes, referred to as the "flit recovery latency," with phase 2-b of PFLRM being exactly responsible for determining its size. Since in our example it equals two (with the wires carrying flit bits d_2 and d_3 in Figure 1(a) being adjacent), R_3 is finally anticlockwise-rotated two bit positions to recover D. As mentioned above, possible wire faults at the link edges are also considered consecutive (bits d_0 and d_4 in Figure 1(a)), hence forming a "ring," as Figure 1(b) shows, due to the bit-rotational nature of the PFLRM algorithm; this is a vital postulation which is considered in our proposed mathematical model in this paper (see Sections 3 to 5). Phase 2-b utilizes the same hardware and recovery principle as those of phase 3, which recovers the actual flit. It basically rotates the initial fault vector E and compares it with its previous rotated version, assembling the logic-1 fault vector bits, until all bits equal 1, indicating the absence of errors, as vector R_3 of Figure 1(a) shows. Since it uses the same hardware as that of phase 3 (calculation of the maximum wire clustering), (1) is reutilized with D′_t replaced by E (the fault vector acts as our "data flit"), which yields (2). As the same mathematical principles ((1) and (2)), and, thus, hardware, are used for both the calculation of flit recovery latency and actual flit recovery, the PFLRM hardware overhead can be reduced. In theory, PFLRM can tolerate up to n faults (flit bit width minus 1), though in such a scenario the recovery latency is prohibitive.
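The rotate-and-collect idea of phase 3 can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the hardware design of [23,24]: the function names are hypothetical, the fault vector uses 1 for a healthy wire and 0 for a faulty one (as in the text), and the final derotation is folded into index arithmetic instead of being performed on a recovery vector.

```python
def rotate_right(bits, t):
    """Clockwise rotation: the element at index i moves to index (i + t) mod len."""
    t %= len(bits)
    return bits[-t:] + bits[:-t]


def max_fault_clustering(fault_vector):
    """Longest circular run of 0s (faulty wires) in the fault vector."""
    if not any(fault_vector):
        return len(fault_vector)
    best = run = 0
    for e in fault_vector + fault_vector:  # doubling handles the wrap-around
        run = run + 1 if e == 0 else 0
        best = max(best, run)
    return best


def recover_flit(flit, fault_vector):
    """Collect healthy bits over 1 + (max clustering) cycles.

    Assumes at least one healthy wire; otherwise nothing can be recovered.
    """
    n = len(flit)
    cycles = 1 + max_fault_clustering(fault_vector)
    recovered = [None] * n
    for t in range(cycles):
        sent = rotate_right(flit, t)  # wire w now carries flit bit (w - t) mod n
        for w in range(n):
            if fault_vector[w] == 1:  # only healthy wires deliver valid bits
                src = (w - t) % n
                if recovered[src] is None:
                    recovered[src] = sent[w]
    return recovered, cycles


flit = ["d0", "d1", "d2", "d3", "d4"]
fault_vector = [0, 1, 0, 0, 1]  # wires 0, 2, 3 faulty, as in Figure 1(a)
recovered, cycles = recover_flit(flit, fault_vector)
```

With the Figure 1(a) fault pattern, the sketch collects d1 and d4 in the first cycle, d0 and d3 in the second, and d2 in the third, matching the order described above.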

Problem Definition
As in Section 2, we assume a parallel NoC link consisting of N wires (N being equal to the size of vector D) that are placed in parallel, wrapped around a common axis forming a ring shape (refer to the example contained in Figure 1 and outlined in Section 2). Each of these wires may be either healthy or faulty, but not both. The number of faulty wires k (k being equal to the number of d′_i's in vector D′), 0 ≤ k ≤ N, N > 0, and the position (placement) of faulty wires are both random.
Consecutively positioned (adjacent) faulty wires form a fault-segment (in Figure 1(a) wires 2 and 3 form a single fault-segment of size two, while wire 0 forms a separate fault-segment of size one). Let m, m ≤ k (m = 0 ⇔ k = 0), denote the size (or length) of the largest fault-segment present in the link.
Note that one should not view the representation depicted in Figure 1 in terms of graph theory, let alone random graph theory, as at best the links (possible nodes) can only form a complete cycle ("ring"), or a graph consisting of several connected components, where each node can only have exactly two neighbors (with all clear implications regarding the clustering coefficients) [31,32]. Exploring the probability distributions of the occurrence of such configurations is beyond the scope of the current paper. What is actually desired here is the following. For given values of N and k, we seek to find the probability distribution of m, P_N(m | k).

Algorithm Derivation
Hereafter, we present an algorithm in order to find the probability P_N(m | k) for each value of m, for given values of N and k. We find it useful to demonstrate the construction of the algorithm through arithmetic examples that clarify all notions involved.

Number of
Similarly, let A(N, k, m) ⊆ A(N, k) denote the set of all possible wire arrangements for given N, k, and m values. Then, the problem reduces to finding |A(N, k, m)|, which when divided by |A(N, k)| will yield exactly the required probability distribution P_N(m | k).

Size of Fault-Segment.
Let h denote the number of healthy wires in a parallel link; that is, h = N − k (h being equal to the number of d_i's in vector D′). From the problem definition, the size of the largest fault-segment m has a lower bound which is equal to zero. However, it is possible to define the greatest lower bound of m more precisely (refer to Example 1 for demonstration) as ⌈k/h⌉ ≤ m.
Example 1. Let N = 14 and k = 8. Then, h = N − k = 6 and the greatest lower bound of m is ⌈k/h⌉ = ⌈8/6⌉ = 2 ≤ m. Consequently, for this case m can never be equal to 0 or 1. An illustration of such a wire arrangement has m = 2 (clustering of faulty wires 1 and 2, as well as of 8 and 9), where the link is shown as an "unwrapped" transverse section, with I and × denoting healthy and faulty wires, respectively. The same link/wire representation is adopted throughout the remaining length of this paper.
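The bound of Example 1 can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, `x` and `I` stand for faulty and healthy wires, and the balanced arrangement it builds is just one arrangement that attains the bound, not necessarily the one illustrated in the paper.

```python
def min_largest_segment(n_wires, k_faulty):
    """Smallest possible size of the largest fault-segment on the ring."""
    h = n_wires - k_faulty
    if k_faulty == 0:
        return 0
    if h == 0:
        return n_wires  # every wire is faulty
    return -(-k_faulty // h)  # integer ceiling of k / h


def balanced_arrangement(n_wires, k_faulty):
    """Spread the faulty wires as evenly as possible between the healthy ones.

    Assumes at least one healthy wire.
    """
    h = n_wires - k_faulty
    base, extra = divmod(k_faulty, h)
    return "".join("x" * (base + (1 if i < extra else 0)) + "I" for i in range(h))


arrangement = balanced_arrangement(14, 8)
```

For 14 wires with 8 faults this produces a 14-character ring whose longest run of `x` has length 2, in agreement with ⌈8/6⌉ = 2.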
The next step is to find a general algorithm for all possible (including the nonboldface in Table 1) cases.

String Representation of Wire Arrangements.
The set A_i of P_i (equivalent) rotations (or circular shifts) {a_j, a_{j+1}, ..., a_{j+P_i−1}} of a wire arrangement a_j ∈ A(N, k, m) can equivalently be represented by the string S_i = s_1 s_2 ··· s_h, where s_l is the size of the l-th fault-segment followed by a single healthy wire. Making the convention that s_1 = m, the string satisfies (10). Note that (10) allows for s_l = 0, l ≠ 1, denoting an empty fault-segment followed by a single healthy wire (refer to Example 3 below).
The string representation for a wire arrangement, as described above, will then allow us to find all subsets A_i. We now introduce some terminology that will help us to reach this goal.
(b) If there is more than one p_i satisfying condition (a) above, then the string is said to have multiple periods that are all divisors of h.
(c) If condition (a) is not satisfied, the string S_i is nonperiodic; for the sake of generality, however, its period is considered to be p_i = h.
(d) The period P_i of any wire arrangement a_j ∈ A(N, k, m) from subset A_i, represented by the respective string S_i of period p_i, can be obtained as follows: P_i = s_1 + s_2 + ··· + s_{p_i} + p_i.
Definition 5. (a) The frequency f_i of string S_i = s_1 s_2 ··· s_h is the number of occurrences of a repeating substring s_1 s_2 ··· s_{p_i} within S_i, where p_i is the period of S_i, and is given by f_i = h / p_i.
(b) If string S_i has multiple periods, then it also has multiple frequencies.
Note that due to the convention in the definition of string S_i (refer to (11)), one string S_i corresponds to P_i equivalent rotations of wire arrangements (refer to Example 7). If string S_i is nonperiodic, that is, p_i = h, then by (10) and (12) the number of equivalent rotations of wire arrangements is P_i = k + h = N. Moreover, substituting (12) ...
However, the periodic string S = 3030 with p = 2 corresponds to only P = s_1 + s_2 + p = 3 + 0 + 2 = 5 equivalent rotations of wire arrangements.
Clearly, the introduction of the notion of the string S_i, with its one-to-one correspondence to subset A_i, as explained above, has reduced the current problem to finding all possible such sets A_i, for all i, and their cardinalities, which are nothing else than the periods P_i (related to the periods p_i of the strings S_i through (13) and (14)) of wire arrangements a_j in A_i.
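The relation between a string and the number of distinct rotations of its wire arrangement can be checked directly. The following Python sketch is illustrative; its function names are hypothetical, `x` and `I` stand for faulty and healthy wires, and `string_period` returns only the smallest period of a string.

```python
def arrangement_from_string(segments):
    """Expand fault-segment sizes into wires: each segment is followed by one healthy wire."""
    return "".join("x" * s + "I" for s in segments)


def distinct_rotations(arrangement):
    """Set of all distinct circular shifts of a wire arrangement."""
    return {arrangement[i:] + arrangement[:i] for i in range(len(arrangement))}


def string_period(segments):
    """Smallest p dividing the length such that the first p terms repeat throughout."""
    h = len(segments)
    for p in range(1, h + 1):
        if h % p == 0 and segments == segments[:p] * (h // p):
            return p


periodic = [3, 0, 3, 0]
p = string_period(periodic)
rotations = distinct_rotations(arrangement_from_string(periodic))
```

For the string 3030 the sketch finds p = 2 and 5 distinct rotations, equal to 3 + 0 + 2, while the nonperiodic string 3300 over the same 10 wires has 10 distinct rotations.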

Partitioning of the Number of Faulty Wires and Corresponding Necklaces.
We use integer partitions in order to find all string representations of all wire arrangements.
Definition 8. An r-partition λ of a positive integer q is a partition of q consisting of exactly r terms, adding zeros whenever necessary.
Returning to the presented problem for a parallel link arrangement of N wires, with k faulty wires and the largest fault-segment m, an h-partition λ of (integer) k consists of h (number of healthy wires) terms, with the largest term being equal to m (refer to Example 9).
Example 9. Let N = 15, k = 9, and m = 3. Then, h = N − k = 6. The 6-partitions of k = 9, with the largest term being equal to m = 3, are given as follows: 3 + 3 + 3 + 0 + 0 + 0, 3 + 3 + 2 + 1 + 0 + 0, 3 + 3 + 1 + 1 + 1 + 0, 3 + 2 + 2 + 2 + 0 + 0, 3 + 2 + 2 + 1 + 1 + 0, and 3 + 2 + 1 + 1 + 1 + 1.
Each partition λ defines a set of strings, which are given by specific permutations of λ's characters. For instance, λ = 3 + 3 + 3 + 0 + 0 + 0 corresponds to a set of strings, namely, S_1 = 333000, S_2 = 330030, S_3 = 303030, and S_4 = 300330. Hence, knowing the actual h-partitions corresponding to given N, k, and m still does not solve the problem, as the number of strings per partition must be found. This can be achieved by noting that the number of all possible strings S_i with nonintersecting sets of equivalent rotations of wire arrangements can be represented by the number of necklaces for each partition λ (refer to Example 11 for illustration). We recall the definition of a necklace as follows.
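The partitions of Example 9 can be listed with a short recursive generator. This is a plain illustrative sketch with a hypothetical function name, not one of the constant-amortized-time algorithms cited in the paper; it yields zero-padded tuples in non-increasing order.

```python
def partitions_with_max(k, h, m):
    """Yield h-term partitions of k (zero-padded) whose largest term is exactly m."""
    def rest(remaining, slots, cap):
        if slots == 0:
            if remaining == 0:
                yield ()
            return
        for part in range(min(cap, remaining), -1, -1):
            if part * slots < remaining:  # remaining slots cannot absorb the rest
                break
            for tail in rest(remaining - part, slots - 1, part):
                yield (part,) + tail

    # m = 0 is only valid when there are no faulty wires at all
    if h > 0 and m <= k and (m == 0) == (k == 0):
        for tail in rest(k - m, h - 1, m):
            yield (m,) + tail


example_9 = list(partitions_with_max(9, 6, 3))
```

For k = 9, h = 6, and m = 3 the generator returns six tuples, starting with (3, 3, 3, 0, 0, 0).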
Definition 10. A j-ary necklace of length h is an equivalence class of h-character strings over an alphabet of size j, taking all rotations as equivalent [33].
Example 11. Let N = 8, k = 5, and m = 4. Then, h = 3. It turns out that there is only one 3-partition of k = 5, with the largest term being equal to m = 4; namely, 4 + 1 + 0. The necklaces for the 3-partition above are 410 and 401, each corresponding to 8 equivalent rotations of wire arrangements. Note that there is a one-to-one correspondence between the necklaces above and (all possible, for this case) strings S_i, whose corresponding sets of equivalent rotations of wire arrangements do not intersect.
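For small partitions the necklaces can simply be enumerated. The sketch below is an illustrative brute-force enumeration with a hypothetical function name; it picks the lexicographically largest rotation as the representative of each class, which always starts with the largest term but need not coincide with the representatives listed in the text.

```python
from itertools import permutations


def necklaces_of_partition(partition):
    """Return one representative per rotation class of the partition's terms."""
    representatives = set()
    for perm in set(permutations(partition)):
        rotations = [perm[i:] + perm[:i] for i in range(len(perm))]
        representatives.add(max(rotations))  # canonical form of the class
    return sorted(representatives, reverse=True)


example_11 = necklaces_of_partition((4, 1, 0))
example_9_first = necklaces_of_partition((3, 3, 3, 0, 0, 0))
```

The partition 4 + 1 + 0 gives two necklaces, and 3 + 3 + 3 + 0 + 0 + 0 gives four, in line with the four strings listed in Example 9.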
For each h-partition one can compute the corresponding number of necklaces (refer to (23)), which in turn can be used to compute the number of wire arrangements. Hence, the problem reduces to finding (a) all such partitions λ, as described above, and, subsequently, (b) their corresponding number of necklaces.
There are a number of known algorithms that can actually generate such a list of h-partitions in constant amortized time [34]. Note here that a partition can be extended to an h-partition by simply adding the necessary number of zeros.
An alternative way to approach the problem of finding the required partitions is by defining a string C_i = c_0 c_1 c_2 ··· c_m, where c_l is equal to the number of occurrences of integer l in a string S_i. Then, from the way the strings S_i and C_i are constructed and from (6) and (10), we set the following constrained system of equations: c_0 + c_1 + ··· + c_m = h and 0·c_0 + 1·c_1 + ··· + m·c_m = k, with c_m ≥ 1 and c_l ≥ 0 for all l.
Let us now introduce a new string C*_i = c_x c_y ··· c_z that arises by excluding any zero terms from string C_i. Let j ≤ h be the number of nonzero terms in string C_i; we can find the number of j-ary necklaces for the corresponding original string S_i as follows:
N_j(c_x, c_y, ..., c_z) = (1/h) Σ_{d | gcd(c_x, c_y, ..., c_z)} φ(d) (h/d)! / ((c_x/d)! (c_y/d)! ··· (c_z/d)!), (23)
where N_j(c_x, c_y, ..., c_z) denotes the number of j-ary necklaces composed of c_l occurrences of l = x, y, ..., z, h = Σ_{l=x}^{z} c_l, and φ(d) = d ∏_{p|d} (1 − 1/p) is Euler's totient function, defined as the number of positive integers less than or equal to d that are coprime to d [35,36].
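The totient-based count of necklaces with prescribed symbol multiplicities can be sketched as follows. The code implements the standard fixed-content necklace formula, which is assumed here to be the formula the text refers to as (23); the function names are hypothetical, and `counts` holds the nonzero multiplicities of the distinct symbols.

```python
from functools import reduce
from math import factorial, gcd


def totient(d):
    """Euler's totient: how many integers in 1..d are coprime to d."""
    result, x, p = d, d, 2
    while p * p <= x:
        if x % p == 0:
            while x % p == 0:
                x //= p
            result -= result // p
        p += 1
    if x > 1:
        result -= result // x
    return result


def multinomial(counts):
    """Number of distinct linear orderings of a multiset with these multiplicities."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total


def count_necklaces(counts):
    """Necklaces whose symbols occur with the given (nonempty) multiplicities."""
    counts = [c for c in counts if c > 0]
    length = sum(counts)
    g = reduce(gcd, counts)
    total = sum(totient(d) * multinomial([c // d for c in counts])
                for d in range(1, g + 1) if g % d == 0)
    return total // length
```

For example, `count_necklaces([3, 3])` covers the partition 3 + 3 + 3 + 0 + 0 + 0 (three 3s and three 0s), and `count_necklaces([1, 1, 1])` covers 4 + 1 + 0.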
However, (23) counts all necklaces corresponding to strings S_i (refer to Example 7 for explanation), whether the latter are periodic or not. Clearly one should distinguish between periodic and nonperiodic strings. In order to do so, we simply consider all periods p_i (and corresponding frequencies f_i) of the set A(N, k, m), such that k/f_i ≥ m, and compute all aperiodic necklaces (or Lyndon words) of sets A(N/f_i, k/f_i, m) for each f_i (including f_1 = 1) separately, according to formula (24) [35], in which μ denotes the Moebius function.
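Counting only the aperiodic necklaces replaces the totient by the Moebius function. The sketch below uses the standard fixed-content Lyndon word count, which is assumed (not confirmed by the surviving text) to match the paper's formula (24); the function names are hypothetical.

```python
from functools import reduce
from math import factorial, gcd


def mobius(d):
    """Moebius function: 0 if d has a squared prime factor, else (-1)^(number of primes)."""
    result, p = 1, 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0
            result = -result
        p += 1
    if d > 1:
        result = -result
    return result


def count_lyndon(counts):
    """Aperiodic necklaces whose symbols occur with the given multiplicities."""
    counts = [c for c in counts if c > 0]
    length = sum(counts)
    g = reduce(gcd, counts)
    total = 0
    for d in range(1, g + 1):
        if g % d == 0:
            ways = factorial(length // d)
            for c in counts:
                ways //= factorial(c // d)
            total += mobius(d) * ways
    return total // length
```

With three 3s and three 0s the count is 3, since the periodic string 303030 is excluded from the four necklaces of that partition.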

Full Model for the Probability Distribution.
Following the analysis above, it is not difficult to derive equation (26) from (11) and (24); it yields the desired number of wire arrangements once the h-partitions, their strings (i.e., necklaces), and frequencies are known, with its summation indices running over the Lyndon words that correspond to the respective partitions of k. The probability distribution P_N(m | k) for all m (in the appropriate domain as given in (4)) is simply given by P_N(m | k) = |A(N, k, m)| / |A(N, k)|. (27)
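The pieces above can be combined into an end-to-end sketch. It is an illustrative reading of the model, not the paper's equation (26) verbatim: for every partition and every frequency f dividing all symbol multiplicities, each aperiodic necklace of the reduced content is assumed to contribute N/f distinct wire arrangements, and |A(N, k)| is taken to be the binomial coefficient. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from functools import reduce
from math import comb, factorial, gcd


def mobius(d):
    result, p = 1, 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0
            result = -result
        p += 1
    return -result if d > 1 else result


def count_lyndon(counts):
    """Aperiodic necklaces with the given symbol multiplicities."""
    length, g = sum(counts), reduce(gcd, counts)
    total = 0
    for d in range(1, g + 1):
        if g % d == 0:
            ways = factorial(length // d)
            for c in counts:
                ways //= factorial(c // d)
            total += mobius(d) * ways
    return total // length


def partitions_with_max(k, h, m):
    """h-term partitions of k (zero-padded) whose largest term is exactly m."""
    def rest(remaining, slots, cap):
        if slots == 0:
            if remaining == 0:
                yield ()
            return
        for part in range(min(cap, remaining), -1, -1):
            if part * slots < remaining:
                break
            for tail in rest(remaining - part, slots - 1, part):
                yield (part,) + tail
    if m <= k and (m == 0) == (k == 0):
        for tail in rest(k - m, h - 1, m):
            yield (m,) + tail


def count_arrangements(n, k, m):
    """Number of ring arrangements of n wires, k faulty, largest fault-segment m."""
    h = n - k
    if h == 0:  # no healthy wire: the whole ring is one fault-segment
        return 1 if m == n else 0
    total = 0
    for partition in partitions_with_max(k, h, m):
        counts = list(Counter(partition).values())
        g = reduce(gcd, counts)
        for f in range(1, g + 1):
            if g % f == 0:  # strings of frequency f repeat a block h/f long
                total += count_lyndon([c // f for c in counts]) * (n // f)
    return total


def distribution(n, k):
    """Probability of each attainable largest fault-segment size m."""
    total = comb(n, k)
    counts = {m: count_arrangements(n, k, m) for m in range(k + 1)}
    return {m: Fraction(c, total) for m, c in counts.items() if c}
```

For the 5-wire link of Figure 1 with 3 faulty wires, `distribution(5, 3)` assigns probability 1/2 to each of m = 2 and m = 3.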

Demonstration of the Effectiveness of the Derived Model and Results
The derived algorithm in Section 4 can be summarized as follows.
Label 4 (obtain full lists of the partitions of k/f_i with m being their maximum element present). Call available routines.
We demonstrate the applicability and, consequently, the effectiveness of the derived model using the following parameters that were chosen at random (relatively large numbers have been picked to show both the efficiency and the accuracy of the derived model).
The wire arrangements will have the following periods and corresponding frequencies (all common factors of N and k, such that k/f ≥ m). Let us consider all three pairs of P_i and f_i one after another.
As a result, the total number of wire arrangements is obtained according to (26), and the corresponding probability according to (27). Note that since in these two figures we use different link widths, with 16 wires in Figure 2 and 32 wires in Figure 3, the calculated faulty wire distributions are quite different from each other; even so, the applicability of our analytical model is effectively demonstrated here. Although a 2D simplified figure is in general desirable, Figure 3 is constructed in 3D so as to provide better visual resolution in demonstrating the range of results. All the demonstrated cases have been numerically tested and verified by the results obtained from a brute-force algorithm implemented in MATLAB (Section 5.1).

Verification of Mathematical Model for Correctness Using a Brute-Force Algorithm.
A brute-force algorithm, implemented using the MATLAB computing environment, is utilized to confirm and verify our presented mathematical model for any possible NoC parallel link length encompassing any number of erroneous (unhealthy) wires.This bruteforce algorithm generates all combinations of  number of faulty wires given a -length parallel link (composed of a  number of parallel wires), where  ≤  is based on a simple sequential lexicographical ascending order algorithm, which is a convenient way to generate combinatorial combinations [34].
Essentially, the algorithm generates the (, ) combinations of  objects, denoted by the set { −1 , . . .,  1 ,  0 }, chosen from the set of { −1 , . . .,  1 ,  0 } wire objects such that { −1 , . . .,  1 ,  0 } ⊆ { −1 , . . .,  1 ,  0 }.Note that there is no need to follow an ascending order, as the  combinations need not be ordered (since we are not interested in sorting the  number of combinations); the aim is to cover all combinations, and the ascending order of the lexicographical algorithm used in our brute-force approach ensures that all combinations are accounted for and covered and also that no combination cases are double-generated (due to symmetry) or repeated.For every combination generated, the number of consecutive objects that represent faulty wire elements is then accounted for; with all sizes of faulty wire clustering cases accumulated, the brute-force results are then compared to the results calculated using our mathematical model to determine its systematic accuracy, under any respective  and  values.
Each of the (, ) = !/(!( − )!) combinations requires () time to be produced as an output, and hence the sequential algorithm runs in (∑ = =1 (  ×(  , ))) time to produce, and not just generate, all possible combinations. The worst case for producing all combinations occurs when the number of faulty wires is half the link width. Additionally, as the link width grows linearly, an exponentially increasing number of iterations, and hence time, is required to output all lexicographical combinations, as presented in Section 5.2. Note, though, that optimized lexicographical algorithms exist, such as those of [37,38], which generate these combinations using smaller data structures and less memory; they are beyond the scope of the current work.
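The growth described above can be illustrated with a short Python sketch. It is not part of the MATLAB implementation; the function names are hypothetical, and it only counts the combinations that a brute-force enumeration would have to visit, without measuring run time.

```python
from math import comb


def combinations_per_fault_count(width):
    """Number of wire arrangements for each faulty-wire count 1..width.

    `width` (number of parallel wires in the link) is an illustrative
    parameter name, not the paper's notation.
    """
    return {faulty: comb(width, faulty) for faulty in range(1, width + 1)}


def total_combinations(width):
    """Total arrangements over all faulty-wire counts (equals 2**width - 1)."""
    return sum(combinations_per_fault_count(width).values())


if __name__ == "__main__":
    for width in (16, 32, 48):
        per_count = combinations_per_fault_count(width)
        # The binomial coefficient peaks when half of the wires are faulty.
        worst = max(per_count, key=per_count.get)
        print(width, worst, per_count[worst], total_combinations(width))
```

Since the total over all faulty-wire counts is one less than two raised to the link width, each additional wire roughly doubles the enumeration work.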
To generate the combinations, we use binary strings whose length equals the link width: objects with a binary value of one denote the positions of the faulty wires, while all remaining objects have a binary value of zero. These zero-valued objects represent the healthy objects (or wires) of the link, so that the set of wires is the union of the set of faulty wires and the set of healthy wires, and the link width is the sum of the two counts. As mentioned above, the brute-force approach is based on the well-known lexicographical order algorithm presented in Knuth's seminal book [34], with the addition of a subprocedure that measures the clusters of faulty wires (fault-segments) in each generated combination, as required for the comparison against our mathematical model. Without loss of generality with regard to other options, the brute-force algorithm is presented as follows.
Label 3 (count fault-segments). Measure and record the size and number of runs of consecutive object members (i.e., faulty wires).
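The enumerate-and-count procedure described above can be sketched in Python as follows. This is a minimal illustration rather than the MATLAB implementation: the function and parameter names are hypothetical, `itertools.combinations` stands in for the lexicographical generator of [34] (it also emits combinations in lexicographic order), and the `circular` flag encodes the assumption that the first and last wires are treated as adjacent, in line with the necklace formulation.

```python
from collections import Counter
from itertools import combinations


def largest_fault_segment(faulty, width, circular=True):
    """Length of the longest run of consecutive faulty wires.

    `faulty` holds the faulty wire positions (0..width-1). With
    `circular=True` (an assumption of this sketch) wire width-1 is
    adjacent to wire 0.
    """
    if not faulty:
        return 0
    if len(faulty) == width:
        return width
    is_faulty = [False] * width
    for position in faulty:
        is_faulty[position] = True
    # In the circular case, start right after a healthy wire so that
    # no run is split by the wrap-around.
    start = is_faulty.index(False) + 1 if circular else 0
    best = run = 0
    for offset in range(width):
        if is_faulty[(start + offset) % width]:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def brute_force_distribution(width, num_faulty, circular=True):
    """Probability of each largest fault-segment size, by full enumeration."""
    if not 0 <= num_faulty <= width:
        raise ValueError("num_faulty must lie between 0 and width")
    counts = Counter()
    for faulty in combinations(range(width), num_faulty):
        counts[largest_fault_segment(faulty, width, circular)] += 1
    total = sum(counts.values())
    return {size: n / total for size, n in sorted(counts.items())}


if __name__ == "__main__":
    print(brute_force_distribution(16, 4))
```

As a small check, a 4-wire link with 2 faulty wires has 6 arrangements; under the circular assumption 4 of them contain an adjacent faulty pair, so the largest fault-segment has size 2 with probability 4/6 and size 1 with probability 2/6.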

Brute-Force Algorithm versus Analytical Method Compute Costs.
To demonstrate the effectiveness of our proposed analytical method, we have run a complete set of experiments for NoC parallel links containing up to 128 wires, which is a typical link bit-width in today's chip multiprocessors [1,2]. Both methodologies (analytical and brute-force) exhibit an almost perfectly exponential relationship between compute time and the number of parallel NoC wires; however, the analytical model becomes faster by several orders of magnitude as the number of wires in a link increases, which makes it the desirable choice when designers need to compute the distribution of faults for wider links. In particular, the brute-force algorithm is about  0.5−8 times slower than its analytical counterpart. This means that for a 16-wire link the two algorithms take comparable times, while, for example, for a 32-wire link the brute-force algorithm is three orders of magnitude slower than the analytical algorithm, and for a 48-wire link it is eight orders of magnitude slower, and so forth.

Applicability of the Combinatorial Algorithm
Our combinatorial algorithm, which calculates the distribution of fault clustering, is also relevant to other studies or applications where fault clustering, that is, consecutive faults that may lie in a circular or ring topological arrangement, needs to be estimated in order to assess risk, reliability, and other parameters of interest pertaining to system or object resilience. Related applications of our mathematical model include the reliability and wearout assessment of adjoining parallel high-strength wires of suspension bridge cables [39,40], where high axial tensile stresses, in tandem with the surrounding corrosive environment, accelerate corrosion and embrittlement, causing cables to deteriorate and eventually fail over time.
Another application of our model is to aid the assessment of the psychological/death chance cost of Russian roulette, both as a glorified game of ultimate risk, where a person spins the cylinder of a revolver that contains a single bullet and aims it at their own head, and as a tool for suicide [41]. Our model can further be used to calculate the chance of drawing consecutive numbers in a standard roulette game and to derive the probability of consecutive passenger cabins in a Ferris wheel being occupied (or failing). Next, the model can be used to calculate the recovery hardware cost of adjacent channel/link/node failures in optical ring networks [42] and to assess the structural reliability of consecutive gear teeth in a spur gear, whose fatigue strength is reduced by exposure to continuous stresses [27]. Finally, the model can estimate the chance of failure of consecutive spokes in a wheel, regarded as a disk of uniform stiffness per length of circumference, due to their exposure to the relentless stresses caused by high radial loads in real-world conditions [43], among numerous other applications.

Conclusions
Networks-on-chips (NoCs) are critical on-chip communication subsystems that transfer packetized messages among the various computational tiles in today's ultrahigh performance general-purpose multicore chips such as chip multiprocessors (CMPs). These have been realized due to the continuous miniaturization of CMOS transistors, which has enabled the massive integration of transistors on a single chip, exceeding the billion-transistor mark in today's CMOS process technologies. This progress, unfortunately, has also come at the cost of increased susceptibility to wearout and permanent failures. Parallel on-chip links are particularly prone to the effects of electromigration, which can cause eventual permanent breakdown in links, manifesting as protocol-level deadlocks and indefinite CMP stalls that render the chip inoperable.
Realizing the importance of link failure models in NoCs, this paper derived and demonstrated a combinatorial algorithm that calculates the spatial probability distribution of wire faults in a parallel NoC interconnect link, given its width and the number of faulty wires that appear in it. Particular emphasis was placed on the adjacency of the faulty wires that form fault-segments separated by at least one healthy wire, as the size of the largest segment determines the additional delay required by partially faulty link recovery mechanisms, such as those of [23][24][25], to recover corrupted flit data at the receiver router. The developed model, which is almost fully analytical, constitutes an application of partitions and necklaces through a systematic approach that derives the correspondence between the presented problem and necklaces, where periodicity plays a crucial role. The model is completely verified by extensive brute-force numerical simulations and is far less costly to compute. Finally, it is worth mentioning that the resulting formula of the presented algorithm can, mutatis mutandis, serve as a prototype for applications in the various "isomorphic" problems from other disciplines, areas, or frameworks, as discussed in Section 6 [39][40][41][42][43][44].
retransmission, downstream flit data reassembly and final data recovery (derotation)

Figure 2: Probability distribution   ( | ) with a total of  = 16 parallel wires including  faulty wires, where  is the size of the largest fault-segment.

Figure 3: Three-dimensional view of probability distribution   ( | ) with a total of  = 32 parallel wires including  faulty wires, where  is the size of the largest fault-segment.
Possible Wire Arrangements.Let (, ) = { 1 ,  2 , . . .,  || } denote the set of all possible wire arrangements   for given  and  values.The cardinality (i.e., number of elements) of set (, ) is simply equal to the number of combinations in choosing  faulty wires out of  wires, given by

Table 4 𝐻 3
Figures 2 and 3 show the distribution of the corresponding probabilities   ( | ) for various values of  obtained by