Application and Analysis of Multicast Blocking Modelling in Fat-Tree Data Center Networks

Multicast can improve network performance by eliminating unnecessary duplicated flows in data center networks (DCNs), and thus it can save significant network bandwidth. However, multicast blocking may cause the retransmission of a large number of data packets and seriously degrade traffic efficiency in data center networks, especially in fat-tree DCNs with a multirooted tree structure. In this paper, we build a multicast blocking model and apply it to the problem of network blocking in fat-tree DCNs. Furthermore, we propose a novel multicast scheduling strategy. In the scheduling strategy, we select the uplinks connecting to available core switches whose remaining bandwidth is close to and no less than three times the bandwidth of the multicast request, so as to reduce the running time of the proposed algorithm. Then the blocking probability of each downlink in the next time-slot is calculated in the multicast subnetwork by using Markov chain theory. With the obtained probability, we select the optimal downlink based on the available core switches. In addition, theoretical analysis shows that the multicast scheduling algorithm has close to zero network blocking probability as well as lower time complexity. Simulation results verify the effectiveness of the proposed multicast scheduling algorithm.


Introduction
Recently, data center networks (DCNs) have been widely studied in both academia and industry because their infrastructure can support various cloud computing services. The fat-tree DCN, as a special instance and variation of the Clos network, has been widely adopted as the topology for DCNs since it can build large-scale traffic networks using fewer switches [1].
Multicast transmission is needed for the efficient and simultaneous transmission of the same information copy to a large number of nodes. It is driven by many applications that benefit from execution parallelism and cooperation, such as MapReduce-type applications for data processing [2]. In fact, multicast is the parallel transmission of data packets in a complex network. For example, the Google File System (GFS) is a distributed file system for massive data-intensive applications that transmits data in a multicast manner [3].
There have been some studies on multicast transmission in fat-tree DCNs. The stochastic load-balanced multipath routing (SLMR) algorithm selects the optimal path by obtaining and comparing the oversubscription probabilities of the candidate links, and it can balance traffic among multiple links by minimizing the probability that each link faces network blocking [4]. However, the SLMR algorithm only studies unicast traffic. The bounded congestion multicast scheduling (BCMS) algorithm, an online multicast scheduling algorithm, is able to achieve bounded congestion as well as efficient bandwidth utilization even under worst-case traffic conditions in a fat-tree DCN [5]. Moreover, the scheduling algorithm fault rate (SAFR) reflects the efficiency level of a scheduling algorithm: the larger the SAFR is, the lower the efficiency of the scheduling algorithm. The SAFR in fat-tree DCNs increases faster with the network blocking rate (NBR) than that in other DCNs, as shown in Figure 1. In fact, the NBR reflects the degree of network blocking [6].
The scheduling processes in the existing scheduling algorithms [4][5][6] are based on the network state at the current time-slot. They do not consider that the network state may change when data flows begin to transfer after the current scheduling process is finished. This may lead to network load imbalance because the bandwidth of the multicast connection has not been allocated dynamically [7]. Therefore, we develop an efficient multicast scheduling algorithm that schedules network flows according to the network state at the next time-slot in fat-tree DCNs. However, since the network state at the next time-slot is probabilistic rather than deterministic, it is difficult to predict it from the present state with certainty and to find a deterministic strategy. Markov chains can be employed to predict the network state even though the state transition is probabilistic [8]. Thus the next network states can be assessed by the set of probabilities in a Markov process [9]. The evolution of this set of probabilities essentially describes the underlying dynamical nature of a network [10]. In [11], the authors proposed a scheme using Markov approximation, which aims at minimizing the maximum link utilization (i.e., the link utilization of the most blocked link) in data center networks. Moreover, the scheme provides two strategies that construct Markov chains with different connection relationships. The first strategy simply applies Markov approximation to data center traffic engineering. The second strategy is a local search algorithm that modifies Markov approximation.
In this paper, we adopt Markov chains to deduce the link blocking probability at the next time-slot and take it as the link weight in the multicast blocking model in fat-tree DCNs. Therefore, available links are selected based on the network state at the next time-slot, and the optimal downlinks are selected by the link weight. In the downlink selection, we compare the blocking probabilities and choose the downlinks with the lowest blocking probability at the next time-slot, which avoids the failure of our multicast scheduling algorithm with Markov chains (MSaMC) due to delay error. In particular, we find that selecting uplinks whose remaining bandwidth is close to and no less than three times the multicast bandwidth request can reduce the algorithm execution time and save bandwidth consumption. Theoretical analysis shows the correctness of the strategy, while simulation results show that MSaMC can achieve higher network throughput and lower average delay.
The contributions of the paper can be summarized as follows:
(i) We analyze why multicast blocking occurs in practical applications. Afterwards, we present a novel way of forecasting multicast transmission and the multicast blocking model in fat-tree DCNs.
(ii) We put forward a multicast scheduling algorithm (MSaMC) to select the optimal uplinks and downlinks. MSaMC not only ensures lower network blocking but also maximizes the utility of network bandwidth resources.
(iii) Theoretical analysis shows that the link blocking probability is less than 1/3 under our proposed MSaMC algorithm, and the multicast network can be nonblocking if the link blocking probability is less than 0.1.
The rest of the paper is organized as follows: Section 2 describes the detrimental effects of multicast blocking in fat-tree DCNs. Section 3 establishes the multicast blocking probability model in fat-tree DCNs and deduces the link blocking probability at the next time-slot based on Markov chains. In Section 4, we propose the multicast scheduling algorithm with Markov chains (MSaMC), and in Section 5 we analyze the complexity of the MSaMC algorithm. In Section 6, we evaluate the performance of MSaMC by simulation. Finally, Section 7 concludes this paper.

Cause of Multicast Blocking
A fat-tree DCN as shown in Figure 2 is represented as a triple (, , ), where  and  denote the number of core switches and edge switches, respectively, and  indicates the number of servers connecting to an edge switch. In fat-tree DCNs, all links are bidirectional and have the same capacity. We define the uplink as the link from an edge switch to a core switch and the downlink as the link from a core switch to an edge switch. A multicast flow request  can be abstracted as a triple (, , ), where  ∈ {1, 2, . . ., } is the source edge switch and  denotes the set of destination edge switches of the multicast flow request . The number of destination edge switches of the multicast flow request  is represented as ||, || ≤  − 1, which is denoted as fanout . Note that the servers connecting to the same edge switch can freely communicate with each other, so the intraedge switch traffic can be ignored. Hence, the aggregation and edge layers can together be seen as a single edge layer.
To illustrate the disadvantages of multicast blocking in fat-tree DCNs, a simple traffic pattern in a small fat-tree DCN is depicted in Figure 3. Suppose that there are two multicast flow requests,  1 and  2 , and every flow request looks for available links by an identical scheduling algorithm. Both flow  1 and flow  2 have a source server and two destination servers located at different edge switches, and the sum of their bandwidth demands is greater than the available link bandwidth. In particular, flow  1 and flow  2 are forwarded through core switch 1 at the same time and are routed from core switch 1 to edge switch 2 through the same link by the scheduling algorithm, which will cause heavy blocking at the links connected to core switch 1. Therefore, the available bandwidth of each flow will suffer further reduction if the scheduler cannot identify heavy multicast blocking in the fat-tree DCNs. Figure 3 also explains the main reason for multicast blocking. We can see that multicast blocking has occurred at the link between core switch 1 and edge switch 2. Clearly, before the blocking at this link is alleviated, other links cannot release the occupied bandwidth. This means that the links from edge switch 1 to core switch 1, from edge switch 1 to core switch 2, from core switch 2 to edge switch 3, and from edge switch 3 to core switch 1 are not released until the multicast blocking is alleviated. However, fat-tree DCNs cannot tolerate a long time to resolve the blocking because of their low-latency requirement.
In fat-tree DCNs, different source servers may execute the scheduling algorithm at the same time, so they may occupy the same link and multicast blocking will inevitably occur. Hence, multicast blocking is a common phenomenon in DCN applications, and it reduces network performance. In addition, many servers are hotspots of user access, which may cause many-to-one data flow transfers. In fact, the key reason for multicast blocking is that the network link state at the next time-slot is not considered. Several works have been proposed to address network blocking in the transmission of multicast packets in DCNs [12,13]. As data centers usually adopt commercial switches that cannot guarantee a nonblocking network, an efficient packet repairing scheme was proposed [12], which relies on unicast to retransmit multicast packets dropped because of switch buffer overload or switching failure. Furthermore, the bloom filter [13] was proposed to compress the multicast forwarding table in switches, which avoids multicast blocking in the data center network.
To the best of our knowledge, the existing multicast scheduling algorithms only consider the network state at the current time-slot in DCNs; thus the delay between the algorithm execution time and the time at which the data flow begins to transfer can make the scheduling decision invalid. Based on this consideration, we focus on multicast scheduling according to the network state at the next time-slot based on Markov chains.
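The blocking situation described above can be checked mechanically by summing the demand that each link carries. The following is a minimal sketch, not the paper's method: the function name `overloaded_links`, the node labels, the routes, and the bandwidth numbers are all hypothetical and only mimic the pattern of two multicast flows sharing the downlink from core switch 1 to edge switch 2.

```python
from collections import defaultdict


def overloaded_links(flows, capacity):
    """Return {link: total_demand} for links whose summed demand exceeds capacity.

    flows: list of (demand, links) pairs, where links is an iterable of
           (from_node, to_node) tuples traversed by the flow.
    capacity: bandwidth of every link (all links share the same capacity).
    """
    load = defaultdict(float)
    for demand, links in flows:
        for link in set(links):  # a multicast flow loads each link only once
            load[link] += demand
    return {link: total for link, total in load.items() if total > capacity}


# Hypothetical numbers: both flows use the downlink core1 -> edge2.
flow_1 = (60.0, [("edge1", "core1"), ("core1", "edge2"), ("core1", "edge3")])
flow_2 = (70.0, [("edge3", "core1"), ("core1", "edge2"), ("core1", "edge1")])
blocked = overloaded_links([flow_1, flow_2], capacity=100.0)
print(blocked)  # {('core1', 'edge2'): 130.0}
```

Only the shared downlink is reported, because it is the only link on which the two demands add up to more than the (assumed) capacity of 100.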

Model and Probability of Multicast Blocking
In this section, we first establish the multicast blocking model based on the topology of fat-tree DCNs. Then we deduce the blocking probability of the available downlinks at the next time-slot.

Multicast Subnetwork.
A multicast bandwidth request corresponds to a multicast subnetwork in fat-tree DCNs, which consists of the core switches and edge switches available for the multicast bandwidth request. The multicast subnetwork in Figure 4 has  destination edge switches,  available core switches, and  ×  servers, where 1 ≤  ≤ . In the process of multicast connection, the link weight in the multicast subnetwork is defined as the blocking probability at the next time-slot. Thus our goal is to obtain the link blocking probability at the next time-slot for any type of multicast bandwidth request.

Multicast flow
It is known that the fat-tree DCN is a typical large-scale network, where there are many available links that can meet the multicast connection request. When a link is available for a multicast bandwidth request , the blocking probability of the link at the current time-slot is given by  = /, where  is the remaining bandwidth.
A multicast connection can be represented by the destination edge switches. Given a multicast bandwidth request  with fanout  (1 ≤  < ), () indicates the blocking probability for this multicast connection. We denote the blocking of available uplink  as the events  1 ,  2 , . . .,   , and the blocking of available downlinks between available core switches and the th (1 ≤  ≤ ) destination edge switches as the events  1 ,  2 , . . .,   . All available links form a multicast tree rooted at the core switches that can satisfy the multicast connection in the multicast network. Other notations used in the paper are summarized in Notations.
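As a small illustration of the current-slot link blocking probability (the ratio of the requested bandwidth to the remaining bandwidth of an available link), the sketch below uses the hypothetical names `request_bw` and `remaining_bw`; returning `None` for a link that cannot carry the request is a design choice of this example, reflecting that such links are not considered by the model.

```python
def link_blocking_probability(request_bw, remaining_bw):
    """Blocking probability of an available link at the current time-slot.

    Computed as requested bandwidth divided by remaining bandwidth.
    Returns None when the link cannot carry the request at all, since
    such links are excluded from the multicast subnetwork.
    """
    if request_bw <= 0:
        raise ValueError("request_bw must be positive")
    if remaining_bw < request_bw:
        return None  # link is not available for this request
    return request_bw / remaining_bw


# Hypothetical values: a request of 50 units on a link with 150 units left.
p = link_blocking_probability(50, 150)
print(p)  # one third
```

With a remaining bandwidth of exactly three times the request, the ratio is 1/3, which is the largest value the uplink selection rule of MSaMC permits.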

Multicast Blocking Model.
In the multicast subnetwork, we employ  to express the event that the request of multicast connection with fanout  cannot be satisfied in the network shown in Figure 4. We do not consider the links whose remaining bandwidth is less than multicast bandwidth request , since the link is not available when the multicast data flow  goes through the link.We let ( | ) be the conditional blocking probability of state  and () be the probability of state .Then the blocking probability of subnetwork for a multicast connection is given by For the event , the data traffic of the uplinks does not interfere with each other; that is, the uplinks are independent.Therefore, we have () =    − .
From the multicast blocking subnetwork in Figure 4, we can obtain the blocking property of the fat-tree DCNs; that is, the multicast bandwidth request  from a source edge switch to distinct destination edge switches cannot be achieved if and only if there is no available downlink connecting to all destination edge switches.
In that way, we take   to denote the event that the multicast bandwidth request  with fanout  cannot be achieved in the available uplinks.Thus we can get An available downlink   , where 1 ≤  <  and 1 ≤  ≤ , represents a link from a core switch to the th destination edge switch.The event   can be expressed by events   's as follows: Afterwards, we define that the blocking of downlinks connecting to each destination edge switch is event  = { 1 ,  1 , . . .,   }; moreover, we have Based on the theory of combinatorics, the inclusionexclusion principle (also known as the sieve principle) is an equation related to the size of two sets and their intersection.For the general case of principle, in [14] For the events  1 ,  1 , . . .,   in a probability space (Ω, , ), we can obtain the probability of the event

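$$P(\epsilon_1 \cup \epsilon_2 \cup \cdots \cup \epsilon_f) = \sum_{1 \le i \le f} P(\epsilon_i) - \sum_{1 \le i < j \le f} P(\epsilon_i \cap \epsilon_j) + \sum_{1 \le i < j < h \le f} P(\epsilon_i \cap \epsilon_j \cap \epsilon_h) - \cdots + (-1)^{f-1} P(\epsilon_1 \cap \epsilon_2 \cap \cdots \cap \epsilon_f), \tag{6}$$
where $P(\epsilon_i)$ denotes the probability of the event $\epsilon_i$ and $f$ is the number of events, that is, the fanout.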
Combining ( 1) and ( 2) with ( 6), the multicast blocking model for a multicast connection with fanout  is given by From ( 6), ∑ 1≤<≤ (  ∩   ) ≥ ∑ 1≤<<ℎ≤ (  ∩   ∩  ℎ ), the following inequality can be derived: Therefore, the minimum blocking probability of the event where ( 1 ) = ∏  =1  1 .Afterwards, we define  min () as the minimum blocking probability of multicast subnetwork, and the number of available core switches is .Thus we get where  ≤ .It is not difficult to find from (10) that the minimum blocking probability  min () is an increasing sequence with fanout .In other words, it is more difficult to realize a multicast bandwidth request with larger fanout since the number of core switches is less.Therefore, the minimum blocking probability with fanout  reflects the state of available link at next time-slot.
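The inclusion-exclusion step used in the blocking model can be checked numerically. The sketch below is an illustration under an added assumption, namely that the per-destination blocking events are independent, so that the probability of an intersection is the product of the individual probabilities; the function names and the sample probabilities are hypothetical.

```python
from itertools import combinations
from math import prod


def union_probability(probs):
    """P(at least one event occurs) via inclusion-exclusion.

    Assumes the events are independent, so the probability of any
    intersection equals the product of the individual probabilities.
    """
    total = 0.0
    for size in range(1, len(probs) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0  # alternate +, -, +, ...
        for subset in combinations(probs, size):
            total += sign * prod(subset)
    return total


def union_probability_complement(probs):
    """Same quantity computed as 1 - P(no event occurs)."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical blocking probabilities for three destination edge switches.
event_probs = [0.1, 0.2, 0.3]
by_sieve = union_probability(event_probs)
by_complement = union_probability_complement(event_probs)
print(by_sieve, by_complement)
```

Both routes give 0.496 for this input (up to floating-point rounding), and dropping every term after the first sum yields the upper bound 0.6, which is the kind of truncation the inequality derived from (6) relies on.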

Link Blocking Probability at Next Time-Slot.
In this subsection, we calculate the blocking probability of an available link at the next time-slot based on Markov chain theory. We randomly select a link, denoted as the th link, for analysis.
In the multicast blocking model, we denote the current time-slot as , and the next time-slot as  + 1.   is the th link occupied bandwidth at time-slot ; that is,   () =   .() is the sum of occupied bandwidth of all available downlinks at time-slot ; namely, () = ∑  =1   (), and   ( + 1) refers to predicted occupied bandwidth of the th link at time-slot +1.In [15], the preference or uniform selection mechanism based on Markov chains is adopted for calculating the link blocking probability at next time-slot.Based on the mechanism, the probability   of the link incoming new flow at time-slot  + 1 can be given by where 1 ≤  ≤ .
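The two selection mechanisms can be sketched as follows. Because the closed form of the selection probability is not reproduced here, this example rests on an explicit assumption: under preference selection a link receives a new flow with probability proportional to its occupied bandwidth, and under uniform selection every link is equally likely. The function name and the sample values are hypothetical.

```python
def selection_probabilities(occupied, mechanism="preference"):
    """Probability that each available link receives the next incoming flow.

    occupied: occupied bandwidth of each available downlink at the
              current time-slot.
    mechanism: "preference" (assumed proportional to occupied bandwidth)
               or "uniform" (every link equally likely).
    """
    n = len(occupied)
    if n == 0:
        raise ValueError("at least one available link is required")
    if mechanism == "uniform":
        return [1.0 / n] * n
    if mechanism == "preference":
        total = sum(occupied)
        if total == 0:
            return [1.0 / n] * n  # no load yet: fall back to uniform
        return [y / total for y in occupied]
    raise ValueError(f"unknown mechanism: {mechanism}")


# Hypothetical occupied bandwidths of three downlinks.
pref = selection_probabilities([10, 30, 60])
unif = selection_probabilities([10, 30, 60], mechanism="uniform")
print(pref, unif)
```

Either way the probabilities sum to one, which is what a one-step transition of a Markov chain over link states requires.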
In addition, we do not consider the case that the bandwidth of available link is decreasing; namely, the bandwidth of available link is enough for multicast bandwidth request.If a multicast bandwidth request selects the th link at time-slot  + 1, it means   ( + 1) will add   , where 1 ≤   ≤   ,   is defined as increasing the maximum number of data flows.Then we let    denote the probability of the th link flow remaining unchanged or increasing at time-slot  + 1; thus we can get where  = 1, 2, . . .,  and   = 0, 1, . . .,   .According to (12), we will calculate one-step transition probability of a multicast flow denoted as (  (+1) =   +  |   () =   ), which is a Markov process.

𝑃 (𝑦
where  = 1, 2, . . ., . In fact, (  ( + 1) =   +   |   () =   ) indicates the link blocking probability at time-slot  + 1, which is determined by   and   . The link blocking probability will be small when   is small at time-slot  + 1; otherwise, the link may be blocked at time-slot  + 1. Therefore, the range of   is very important to our proposed multicast scheduling algorithm. In this paper, we assume that the multicast bandwidth request  is one data flow unit, and   is an integral multiple of multicast bandwidth request .

(5)          Select the core switch  and add it into the set ;
(6)      end if
(7)  end for
(8)  // Step 2: select appropriate core switches
(9)  Calculate the blocking probability of available downlinks at time-slot  + 1,   ( + 1), by equation (13);
(10) for  = 1 to || do
(11)     Find the core switch(es) in  that are connected to a destination edge switch in ;
(12)     if there are multiple core switches to be found then
(13)         Select the core switch with the minimum blocking probability and deliver it to the appropriate set of core switches   ;
(14)     else
(15)         Deliver the core switch to the set   ;
(16)     end if
(17)     Remove destination edge switches that the selected core switch from  can reach;

Multicast Scheduling Algorithm with Markov Chains
In this section, we propose a multicast scheduling algorithm with Markov chains (MSaMC) in fat-tree DCNs, which aims to minimize the blocking probability of available links and improve the traffic efficiency of data flows in the multicast network. Then we give a simple example to explain the implementation process of MSaMC.

Description of the MSaMC.
The core of MSaMC is to select the downlinks with minimum blocking probability at time-slot  + 1. Accordingly, the first step of the algorithm is to find the available core switches, denoted as the set , || ≤ .
We take the remaining bandwidth of the th uplink as    . Based on our theoretical analysis in Section 5, the multicast subnetwork may be blocked if it is less than 3; therefore, we require    ≥ 3.
The second step is to choose, in each iteration, the appropriate core switch, which is connected to the downlink with minimum blocking probability at time-slot  + 1. At the end of the iteration, we transfer the chosen core switch from the set  to the set   . The iteration terminates when the set of destination edge switches  is empty. Obviously, the core switches in the set   are connected to the downlinks with minimum blocking probability, and the set   can satisfy an arbitrary multicast flow request in fat-tree DCNs [5].
Based on the above steps, we obtain a set of appropriate core switches   . Moreover, each destination edge switch in  can find one downlink from the set   to be connected with the minimal blocking probability at time-slot  + 1. The third step is to establish the optimal path from the source edge switch to the destination edge switches through the appropriate core switches. The state of the multicast subnetwork is updated after the source server sends the configuration signals to the corresponding forwarding devices.
The main process of the MSaMC is described in Algorithm 1.
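The three steps can be sketched in a few lines. This is one possible reading of the description above rather than the authors' implementation: the function name `msamc_schedule`, the dictionary-based inputs, and all numbers are hypothetical, and the next-slot blocking probabilities are taken as given instead of being derived from the Markov model.

```python
def msamc_schedule(source, destinations, request_bw,
                   uplink_remaining, downlink_blocking):
    """Greedy sketch of the three MSaMC steps.

    uplink_remaining: {core: remaining bandwidth of the uplink source -> core}
    downlink_blocking: {(core, dest): predicted blocking probability of the
                        available downlink core -> dest at the next time-slot}
    Returns a list of (source, core, destination) paths.
    """
    # Step 1: keep core switches whose uplink has at least 3x the request.
    available = sorted(core for core, bw in uplink_remaining.items()
                       if bw >= 3 * request_bw)

    # Step 2: repeatedly pick the core switch with the least-blocked downlink.
    unserved = list(destinations)
    assignment = {}
    while unserved:
        dest = unserved[0]
        candidates = [c for c in available if (c, dest) in downlink_blocking]
        if not candidates:
            raise RuntimeError(f"no available core switch reaches {dest}")
        best = min(candidates, key=lambda c: downlink_blocking[(c, dest)])
        # Remove every destination the selected core switch can reach.
        for d in list(unserved):
            if (best, d) in downlink_blocking:
                assignment[d] = best
                unserved.remove(d)

    # Step 3: build the paths source -> core -> destination.
    return [(source, core, dest) for dest, core in assignment.items()]


# Hypothetical state for a request from edge switch 1 to edge switches 2, 3, 4.
uplinks = {1: 120, 2: 200, 3: 180, 4: 160}
downlinks = {(2, 2): 0.20, (3, 2): 0.10, (2, 3): 0.15,
             (4, 3): 0.05, (4, 4): 0.08}
paths = msamc_schedule(1, [2, 3, 4], 50, uplinks, downlinks)
print(paths)  # [(1, 3, 2), (1, 4, 3), (1, 4, 4)]
```

With these made-up values, core switch 1 is filtered out in the first step because its uplink has less than 150 units left, and the remaining destinations are covered by core switches 3 and 4.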

An Example of the MSaMC.
For the purpose of illustration, in the following, we give a scheduling example in a simple fat-tree DCN as shown in Figure 5. Assume that we have obtained the network state at time-slot  and made a multicast flow request (1, (2, 3, 4), 50). The link remaining bandwidth  and link blocking probability  at the next time-slot are shown in Tables 1 and 2, respectively. The symbol √ denotes an available uplink and × indicates an unavailable link.
For clarity, we select only two layers of the network and give the relevant links in each step. As described in Section 4.1, the MSaMC is implemented in three steps. Firstly, we take the remaining bandwidth of the uplink as   (   ≥ 3 × 50) and find the set of available core switches; that is,  = {2, 3, 4}. Secondly, we evaluate the blocking probability of the relevant downlinks at time-slot  + 1. In effect, the blocking probability of the downlink at time-slot  + 1 from core switch 2 to destination switch 2 is higher than that from core switch 3 to destination switch 2; therefore, we select the latter downlink as the optimal path. Subsequently, core switch 3 is put into the set   . Similarly, we get core switch 4 for the set   . Finally, the optimal path is constructed and the routing information is sent to the source edge switch 1 and core switches 3 and 4.
In Figure 5(a), the link remaining bandwidth from edge switch 1 to core switch 1 is no less than 150. In this way, we find that the optimal paths for each pair of source edge switch and destination edge switch are source edge switch 1 → core switch 3 → destination edge switch 2, source edge switch 1 → core switch 4 → destination edge switch 3, and source edge switch 1 → core switch 4 → destination edge switch 4, as shown in Figure 5(b).

Theoretical Analysis
In this section, we analyze the performance of MSaMC. From (9), we derive the blocking probability bound of the multicast subnetwork, as shown in Lemma 1.

Lemma 1.
In a multicast subnetwork, the maximum subnetwork blocking probability is less than 1/3.
Proof. We take the remaining bandwidth of each uplink to be no less than 3 by the first step of Algorithm 1, and thus the maximum value of the link blocking probability  is 1/3, which is attained when the remaining bandwidth of the available link just satisfies the above condition; that is,   = 3.
From ( 9) and De Morgan's laws [16], we can obtain the probability of event Therefore, based on (10), the subnetwork blocking probability is maximum when the number of uplinks is 1.Thus we can obtain Then we have max  min () = 1/3 as  → ∞.This completes the proof.
The result of Lemma 1 does not depend on the number of ports of the switches. This is because the deduction of Lemma 1 is based on the link blocking probability ,  = /, and neither the multicast bandwidth  nor the link remaining bandwidth  is affected by the number of ports of the switches. Therefore, Lemma 1 still holds when the edge switches have more ports. Moreover, the switch radix has no effect on the performance of MSaMC.
At time-slot  + 1, the data flow of an available link will increase under the preference or uniform selection mechanism. In addition, the blocking probability of an available link should have an upper bound (maximum value) to guarantee the efficient transmission of the multicast flow. Based on (7) and Lemma 1, we can get max   = 1/3 when the numbers of uplinks and downlinks are both equal to 2. Clearly, this condition corresponds to the simplest multicast transmission model. In a real multicast network,   ≪ 1/3 generally holds.
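The per-link bound behind Lemma 1 follows from a one-line estimate. As a sketch, write $\omega$ for the multicast bandwidth request and $\mu$ for the remaining bandwidth of an available uplink; both symbols are assumed here for illustration and may differ from the notation used elsewhere in the paper.

```latex
p \;=\; \frac{\omega}{\mu} \;\le\; \frac{\omega}{3\omega} \;=\; \frac{1}{3}
\qquad \text{whenever } \mu \ge 3\omega .
```

Equality holds only when the remaining bandwidth is exactly three times the request. The estimate involves only the two bandwidths, which is why the number of switch ports does not enter.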
In addition,   is proportional to (  ( + 1) =   +   |   () =   ); namely, the link blocking probability will increase as the multicast flow gets larger. Therefore, (  ( + 1) =   +   |   () =   ) is monotonically increasing in   .
Theorem 2. As the remaining bandwidth of available link  is no less than 3, the multicast flow can be transferred to  destination edge switches.
Proof. For each incoming flow, by adopting the preferred selection mechanism in selecting the th link, when   ≥ 1, we compute the first-order derivative of (13) about   , where  = 1, 2, . . ., .

𝜕 𝜕𝑝
In (16), the third term is more than zero, and the second term is greater than the absolute value of the first term when   ≥ 3; hence, we can obtain (  (+1) =   +  |   () =   ) > 0. Therefore, (  (+1) =   +  |   () =   ) is a monotonically increasing function of   when   ≥ 3. The multicast flow request  is defined as one data unit; evidently,   ≥ 3. In other words, the remaining bandwidth of the available link can satisfy the multicast bandwidth request  at time-slot  + 1 if  ≥ 3. This completes the proof.
On the basis of Theorem 2, the first step of Algorithm 1 is reasonable and efficient. The condition  ≥ 3 not only ensures sufficient remaining bandwidth for satisfying the multicast flow request but also avoids the complex calculation of the uplink blocking probability. However, a downlink may carry data flows coming from other uplinks at any time-slot, which results in the uncertainty of the downlink state at time-slot  + 1. Therefore, we take the minimum blocking probability at time-slot  + 1 as the selection target for the optimal downlinks.
Due to the randomness and uncertainty of the downlink state, it is difficult to estimate the network blocking state at time-slot  + 1. Afterwards, we deduce the expectation that the th downlink connects to the th destination edge switch at time-slot  + 1, denoted by   (,   ),  = 1, 2, . . ., .Given that the data flow in the th downlink is   , we can obtain where By (17), we conclude the following theorem which explains the average increase rate of data flow at each downlink.
Theorem 3. In a fat-tree DCN, the increased bandwidth of downlink is no more than two units on the average at time-slot  + 1.
When   <   (,   ) −   + 1, the number of increased data flows is larger than   ; however, it is not allowed by the definition of   ; thus we can obtain  21) is very large, the blocking probability of downlink is higher, vice versa.To clarify the fact that the downlink has lower blocking probability at next time-slot, we have the following theorem.Proof.Based on (21), we take the minimum value of   as 2.
Thus we get This completes the proof.
In order to show that the MSaMC manifests the lower blocking probability of downlink at time-slot  + 1 under the different values of   , we provide the following comparison as shown in Figure 6.
In Figure 6, (  ( + 1) >   (,   ) |   () =   ) indicates the downlink blocking probability, and its values are not more than 0.125 for different   and   . At the zero point, the blocking probability is close to zero unless   > 0.1. In a real network, the condition   > 0.1 rarely holds. Therefore, the MSaMC has a very low blocking probability.
In the following, we analyze the time complexity of MSaMC. The first step of MSaMC takes the time complexity of () to identify available core switches. In the second step, the MSaMC needs to find the appropriate core switches. We need ( ⋅ ) time to calculate the blocking probability of available downlinks at time-slot  + 1 and select the appropriate core switches to the set   , where  ≤  − 1. In the end, we take ( + ) time to construct the optimal paths from the source edge switch to the destination edge switches. Thus the computational complexity of MSaMC is given by
 ( +  ⋅  +  + ) ≤  ( + ( − 1) 2 + 2 ( − 1)) =  ( 2 +  − 1) . (23)
Note that the complexity of the algorithm is polynomial in the number of core switches  and the number of edge switches , which means that the computational complexity is rather low if the fanout  is very small. Therefore, the algorithm is time-efficient in multicast scheduling.

Simulation Results
In this section, we utilize the network simulator NS2 to evaluate the effectiveness of MSaMC in fat-tree DCNs in terms of the average delay variance (ADV) of links with different time-slots. Afterwards, we compare the performance of MSaMC and the SLMR algorithm with unicast traffic [4] and present the comparison between MSaMC and the BCMS algorithm with multicast traffic [5].

Simulation Settings.
The simulation network topology adopts 1024 servers, 128 edge switches, 128 aggregation switches, and 64 core switches. The related network parameters are set in Table 3. Each flow has a bandwidth demand of 10 Mbps [4]. For the fat-tree topology, we consider a mixed traffic distribution of both unicast and multicast traffic. For unicast traffic, the flow destinations of a source server are uniformly distributed among all other servers. The packet length is uniformly distributed between 800 and 1,400 bytes, and all multicast flows have equal size [17,18].

Comparison of Average Delay Variance.
In this subsection, we first define the average delay variance (ADV) and then compare the ADV of the uplink and downlink for different numbers of packets.
Definition 5 (average delay variance). Average delay variance (ADV)  is defined as the average of the sum of the transmission delay differences of the two adjacent packets in a multicast subnetwork; that is, where  is the number of available links,  is the number of packets in an available link, and () indicates the transmission delay of packet at time-slot .
We take ADV as a metric for the network state of the multicast subnetwork. The smaller the ADV is, the more stable the network state is, and vice versa.
Figure 7 shows the average delay variance (ADV) of links as the number of packets grows. When the link remaining bandwidth  is taken as  or 2, the average delay variance has larger jitter. This is because the link remaining bandwidth cannot satisfy the multicast flow request  at time-slot  + 1.
The average delay variance is close to a straight line when the link remaining bandwidth is 3, which implies that the network state is very stable. Therefore, the simulation result indicates that the optimal value of the link remaining bandwidth  is 3.
From Figure 8, we observe that the jitter of the uplink ADV is smaller than that of the downlink ADV. This is because the fat-tree DCN is a bipartition network; that is, the bandwidth of the uplink and downlink is equal. However, the downlink load is higher than the uplink load in multicast traffic; therefore, the uplink state is more stable.
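To make the metric concrete, the sketch below computes an ADV-like quantity from per-link delay traces. Since the defining formula is not reproduced above, two details are assumptions of this example: adjacent-packet differences are taken in absolute value, and the sum is normalised by the total number of adjacent pairs over all links. The function name and the sample delays are hypothetical.

```python
def average_delay_variance(delays_per_link):
    """Average absolute delay difference between adjacent packets.

    delays_per_link: one list per available link, holding the transmission
                     delays of its packets in sending order.
    Assumptions: absolute differences, averaged over all adjacent pairs
    on all links.
    """
    total = 0.0
    pairs = 0
    for delays in delays_per_link:
        for previous, current in zip(delays, delays[1:]):
            total += abs(current - previous)
            pairs += 1
    if pairs == 0:
        return 0.0  # fewer than two packets on every link
    return total / pairs


# Hypothetical delays (ms): one perfectly steady link and one jittery link.
adv = average_delay_variance([[1.0, 1.0, 1.0], [2.0, 4.0, 3.0]])
print(adv)  # 0.75
```

A perfectly steady link contributes zero, so a value near zero corresponds to the stable network state described above.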

Total Network Throughput.
In this subsection, we set the length of time-slot  as / and 2(/). We can observe from Figure 9(a) that MSaMC achieves better performance than the SLMR algorithm when the length of time-slot  is 2(/). This is because MSaMC can quickly recover from network blocking, and thus it can achieve higher network throughput. In addition, the MSaMC cannot calculate the optimal path in real time when the length of time-slot  is /; therefore, the SLMR algorithm provides the higher throughput in that case.
Figure 9(b) shows the throughput comparison of MSaMC and the BCMS algorithm under the mixed scheduling pattern. The throughput of the BCMS algorithm becomes lower as the simulation time increases. The multicast transmission of the BCMS algorithm needs a longer time to address the problem of network blocking; therefore, the throughput will decrease sharply if the network blocking cannot be predicted. In contrast, the MSaMC can predict the probability of network blocking at the next time-slot and address the delay problem of dynamic bandwidth allocation. Therefore, the MSaMC can obtain higher total network throughput.

Average Delay.
In this subsection, we compare the average end-to-end delay of our MSaMC, the SLMR algorithm with unicast traffic, and the BCMS algorithm with mixed traffic over different traffic loads. Figure 10 shows the average end-to-end delay for the unicast and mixed traffic patterns, respectively.
We can observe from Figure 10 that, as the simulation time increases, the MSaMC with  = 2(/) has a lower average delay than the SLMR and BCMS algorithms for the two kinds of traffic. This is because the SLMR and BCMS algorithms utilize more backtracks to eliminate the multicast blocking; therefore, they take more time to forward data flows to the destination edge switches. In addition, we can also find that when the length of the time-slot is 2(/), our MSaMC has the minimum average delay. This is because a time-slot of length 2(/) is just long enough to ensure that data can be transmitted accurately to the destination switches. A shorter time-slot of less than 2(/) will lead to incomplete data transmission, while a longer time-slot of more than 2(/) will cause incorrect prediction of the traffic blocking status.

Conclusions
In this paper, we propose a novel multicast scheduling algorithm with Markov chains, called MSaMC, in fat-tree data center networks (DCNs), which can accurately predict the link traffic state at the next time-slot and achieve effective flow scheduling to efficiently improve network performance. We show that MSaMC can guarantee lower link blocking at the next time-slot in a fat-tree DCN for satisfying an arbitrary sequence of multicast flow requests under our traffic model. In addition, the time complexity analysis also shows that the performance of MSaMC is determined by the number of core switches  and the destination edge switches . Finally, we compare the performance of MSaMC with an existing unicast scheduling algorithm called SLMR algorithm and a well-known adaptive multicast scheduling algorithm called

Figure 1: The relationship between network blocking rate (NBR) and scheduling algorithm fault rate (SAFR) in different DCNs.

(18)     Update the set of remaining core switches in ;
(19) end for
(20) // Step 3: establish the optimal paths
(21) Connect the links between source switch and destination edge switches through appropriate core switches in the set   ;
(22) Send configuration signals to corresponding devices in multicast subnetwork;
Algorithm 1: Multicast scheduling algorithm with Markov chains (MSaMC).

Figure 5: An example of the MSaMC.

Figure 7: Average delay variance (ADV) comparison among links with different remaining bandwidth.

Table 2: The link probability at next time-slot (%).

Figure 5(b): The optimal paths selected by the MSaMC.