Optimal Control Strategy for Traffic Driven Epidemic Spreading Based on Community Structure

Community structure has been shown to have a strong impact on both traffic transportation and epidemic spreading. In the traffic driven epidemic spreading model, the density of infected nodes and the epidemic threshold have been shown to depend significantly on the node betweenness. In this paper, considering the impact of community structure on traffic driven epidemic spreading, we propose a novel and effective strategy to control epidemic spreading in scale-free networks. Theoretical analysis shows that the new control strategy markedly increases the ratio between the first and the second moments of the node betweenness distribution in scale-free networks. We also find that the more accurately the communities are identified and the stronger the community structure of the network, the more efficient the control strategy is. Simulations on both computer-generated and real-world networks confirm the theoretical results.


Introduction
In the past few years, many epidemics among humans, animals, and plants have caused enormous damage and loss. Since many real-world networks can be described as complex networks, with nodes representing individuals and edges denoting the interactions among them, disease outbreaks in biological systems can be viewed as epidemic spreading on complex networks. With the rapid development of complex network theory, a number of models have been proposed to characterize epidemic spreading [1][2][3][4][5][6][7][8][9][10][11]. The most extensively studied models assume that spreading is driven by reaction processes, in which the infection passes from every infected node to all of its neighbours at each time step. Some contagions, however, are found to reach only a small subset of a node's neighbours: hubs do not always interact with all their neighbours at the same time [12], and individuals in a social network do not interact simultaneously with all of their acquaintances [13]. A novel approach called traffic driven epidemic spreading was therefore introduced to investigate epidemic spreading driven by traffic flows [7][14][15][16]. Theoretical predictions and extensive numerical simulations show that traffic driven epidemic spreading depends directly on flow conditions, in particular on the node betweenness distribution. (Betweenness is a measure of the extent to which a node lies on the paths between other nodes.)
Alongside the continuing study of complex networks, community structure, a common feature of many networks [17][18][19][20][21][22][23] (the tendency for nodes to divide into subsets within which node-node connections are dense but between which connections are sparser), has been shown to affect information transfer capacity and epidemic spreading [24][25][26][27][28]. We have also proposed a routing strategy based on community structure that enhances packet delivery capability: by minimizing the number of communities that a path passes through, it reduces the betweenness of the nodes at the edges of the communities.
To control epidemic spreading, we propose a control strategy that reshapes the distribution of node betweenness based on community structure. We show that the strategy increases the average routing length and the average node betweenness, so it is not effective in homogeneous networks. However, it is clearly effective in controlling epidemic spreading in scale-free networks, because it raises the ratio between the first and the second moments of the node betweenness distribution.
Mathematical Problems in Engineering

Models
The routing dynamics model is as follows. Every node can create packets addressed to a destination, receive packets from other nodes, and route packets toward their destinations. At each time step, a packet is generated with probability $\lambda$, with randomly chosen origin and destination, and all packets are forwarded one step toward their destinations according to the routing strategy. A packet is removed from the system upon reaching its destination. In this paper, we consider two widely used routing strategies: the shortest path routing strategy [29], SHT, which selects the path with the minimum number of nodes, and the efficient path routing strategy [30], EFF, which selects the path with the minimum sum of node degrees.
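As an illustration of the two routing rules, the following minimal Python sketch computes an SHT path by breadth-first search and an EFF path by Dijkstra's algorithm, using the degree of each visited node as its cost. The function names and the toy graph are hypothetical and are not taken from [29] or [30]; the sketch only assumes an undirected network stored as an adjacency dictionary.

```python
import heapq
from collections import deque


def _walk_back(parent, dst):
    """Rebuild the path from parent pointers, or return None if dst was not reached."""
    if dst not in parent:
        return None
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def shortest_path(adj, src, dst):
    """SHT: a path with the fewest nodes, found by breadth-first search."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return _walk_back(parent, dst)


def efficient_path(adj, src, dst):
    """EFF: a path minimizing the sum of node degrees along the path (Dijkstra)."""
    cost = {src: len(adj[src])}
    parent = {src: None}
    heap = [(cost[src], src)]
    while heap:
        c, u = heapq.heappop(heap)
        if c > cost[u]:
            continue  # stale heap entry
        if u == dst:
            break
        for v in adj[u]:
            new_cost = c + len(adj[v])
            if new_cost < cost.get(v, float("inf")):
                cost[v] = new_cost
                parent[v] = u
                heapq.heappush(heap, (new_cost, v))
    return _walk_back(parent, dst)


# Hypothetical toy graph: a hub "H" offers a short route from "s" to "t",
# while "a" and "b" form a longer route through low-degree nodes.
ADJ = {
    "s": ["H", "a"], "t": ["H", "b"],
    "a": ["s", "b"], "b": ["a", "t"],
    "H": ["s", "t", "x1", "x2", "x3", "x4"],
    "x1": ["H"], "x2": ["H"], "x3": ["H"], "x4": ["H"],
}
sht_route = shortest_path(ADJ, "s", "t")
eff_route = efficient_path(ADJ, "s", "t")
```

On this toy graph the SHT route goes through the hub H, while the EFF route avoids it by taking the longer path through the low-degree nodes a and b.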
Dividing the network into communities is the essential first step. We use the extremal optimization-based algorithm of Duch and Arenas (DA) [19] to divide the network into $n_{DA}$ communities, and the GN algorithm [17] to divide the network into a given single-digit number of communities. To investigate how the accuracy of community structure identification affects the validity of the control strategy, we employ the modularity measure proposed in [17,18]. Consider a particular division of a network into $n$ communities, and define an $n \times n$ symmetric matrix $E$ whose element $e_{ij}$ is the fraction of all edges in the network that link nodes in community $i$ to nodes in community $j$. The modularity measure, $Q$, is defined as
$$Q = \sum_{i} \left( e_{ii} - a_i^2 \right), \tag{1}$$
where $a_i$ is the sum of $e_{ij}$ over $j$ for a given $i$, that is, $a_i = \sum_{j} e_{ij}$. Different divisions lead to different values of $Q$, and the maximum of them is denoted $Q_{\max}$. The higher the modularity $Q_{\max}$, the stronger the community structure of the network. In our strategy, we use the DA algorithm [19] to divide the network into $n_{DA}$ communities, which yields the highest modularity $Q_{\max}$.
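The modularity of a given division can be evaluated directly from its definition. The sketch below is a minimal Python illustration, assuming an undirected network given as an edge list and a hypothetical dictionary `community` that maps each node to its community label. Each edge between two different communities is split equally between the two symmetric off-diagonal entries so that the matrix stays symmetric. The sketch does not implement the DA or GN algorithms; it only evaluates the modularity of a division that is already known.

```python
def modularity(edges, community):
    """Modularity Q = sum_i (e_ii - a_i^2) of a given division.

    edges: list of undirected edges (u, v), each listed once.
    community: dict mapping node -> community label (hypothetical input format).
    """
    labels = sorted(set(community.values()))
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    m = len(edges)
    e = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[community[u]], index[community[v]]
        if i == j:
            e[i][i] += 1.0 / m
        else:
            # Split an inter-community edge equally to keep the matrix symmetric.
            e[i][j] += 0.5 / m
            e[j][i] += 0.5 / m
    a = [sum(row) for row in e]  # a_i = sum_j e_ij
    return sum(e[i][i] - a[i] ** 2 for i in range(n))


# Toy example: two triangles joined by a single edge.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
TWO_GROUPS = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
ONE_GROUP = {v: "A" for v in range(6)}

q_two = modularity(EDGES, TWO_GROUPS)  # equals 5/14 for this graph
q_one = modularity(EDGES, ONE_GROUP)   # a single community always gives Q = 0
```

For the two-triangle graph, the natural division gives $e_{AA} = e_{BB} = 3/7$ and $a_A = a_B = 1/2$, so $Q = 2(3/7 - 1/4) = 5/14$, whereas putting all nodes in one community gives $Q = 0$.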

Analysis
In widely used spreading models, an individual is represented by a node that is in one of two or three possible states: susceptible (not infective, but may become infected), infected (infective), and recovered (recovered and immune). We use the SI model to study the density of infected nodes $\rho(t)$ and the SIR model to investigate the epidemic threshold $\beta_c$ in both homogeneous networks and heterogeneous networks such as scale-free networks.

Traffic Driven Epidemic Spreading in Homogeneous Networks.
The SI model describes the scenario in which infected nodes remain infective forever, so there is a single transition: susceptible → infected. Starting from an initial fraction of infected individuals, $\rho_0$, the infection spreads in the network through packet exchanges. A susceptible node becomes infected with probability $\beta$ every time it receives a packet from an infected neighbour. After a transient time, we compute the average density of infected nodes, $\rho(t)$, which is the prevalence of the epidemic in the network. The approximate density of infected nodes $\rho(t)$ in homogeneous networks can be obtained from mean-field theory as follows:
$$\frac{d\rho(t)}{dt} = \beta \lambda N \langle b \rangle \rho(t) \left[ 1 - \rho(t) \right]. \tag{2}$$
Equation (2) means that the average density of newly infected nodes is proportional to the effective spreading rate $\beta$, the density of susceptible nodes that may become infected, $1 - \rho(t)$, the probability that a packet passes through a link pointing to an infected node, $\rho(t)$, the total number of packets, $\lambda N$, and the fraction of packets passing through a node, which is equal to the average betweenness, $\langle b \rangle$. Equation (2) can be solved with the initial condition $\rho(t)|_{t=0} = \rho_0$:
$$\rho(t) = \frac{\rho_0 e^{\beta \lambda N \langle b \rangle t}}{1 - \rho_0 + \rho_0 e^{\beta \lambda N \langle b \rangle t}}. \tag{3}$$
When the epidemic begins spreading, the density of infected nodes is very small, and we get
$$\rho(t) \approx \rho_0 e^{t/\tau}, \tag{4}$$
where the epidemic outbreak time scale of homogeneous networks is
$$\tau = \frac{1}{\beta \lambda N \langle b \rangle}. \tag{5}$$
The SIR model is used when nodes run stochastically through the cycle susceptible → infected → recovered. At each time step, a susceptible node becomes infected with probability $\nu$ every time it receives a packet from an infected one. At the same time, infected nodes are cured and move to the recovered state with probability $\mu$. The effective spreading rate is defined as $\beta = \nu/\mu$. Without loss of generality, we can set $\mu = 1$, since it only affects the definition of the time scale of the disease propagation. Thus, in homogeneous networks, we add a decay term, proportional to the product of the curing rate $\mu$ (here $\mu = 1$) and the average density of infected nodes $\rho(t)$, to the right side of (2):
$$\frac{d\rho(t)}{dt} = -\rho(t) + \beta \lambda N \langle b \rangle \rho(t) \left[ 1 - \rho(t) \right]. \tag{6}$$
After imposing the stationary condition $d\rho(t)/dt = 0$ on (6), we obtain the epidemic threshold of the traffic driven SIR epidemic model in homogeneous networks:
$$\beta_c = \frac{1}{\lambda N \langle b \rangle}. \tag{7}$$
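The homogeneous mean-field results can be checked numerically. The following Python sketch assumes the logistic form $d\rho/dt = \beta \lambda N \langle b \rangle \rho (1 - \rho)$ for the SI model, compares its closed-form solution with a simple Euler integration, and evaluates the outbreak time scale $1/(\beta \lambda N \langle b \rangle)$ and the threshold $1/(\lambda N \langle b \rangle)$. The parameter names `beta`, `lam`, `n_nodes`, and `mean_b` and all numerical values are illustrative assumptions, not values from the simulations.

```python
import math


def outbreak_time_scale(beta, lam, n_nodes, mean_b):
    """tau = 1 / (beta * lam * N * <b>) for homogeneous networks."""
    return 1.0 / (beta * lam * n_nodes * mean_b)


def epidemic_threshold(lam, n_nodes, mean_b):
    """beta_c = 1 / (lam * N * <b>) for homogeneous networks."""
    return 1.0 / (lam * n_nodes * mean_b)


def si_density(t, rho0, tau):
    """Closed-form logistic solution of the SI mean-field equation."""
    growth = rho0 * math.exp(t / tau)
    return growth / (1.0 - rho0 + growth)


def si_density_euler(t_end, rho0, tau, dt=1e-3):
    """Forward-Euler integration of d(rho)/dt = rho * (1 - rho) / tau."""
    rho = rho0
    for _ in range(int(round(t_end / dt))):
        rho += dt * rho * (1.0 - rho) / tau
    return rho


# Illustrative values only: lam * N * <b> = 2, so beta_c = 0.5 and tau = 1 at beta = 0.5.
tau = outbreak_time_scale(beta=0.5, lam=0.1, n_nodes=100, mean_b=0.2)
beta_c = epidemic_threshold(lam=0.1, n_nodes=100, mean_b=0.2)
exact = si_density(5.0, rho0=0.01, tau=tau)
approx = si_density_euler(5.0, rho0=0.01, tau=tau)
```

Doubling the average betweenness in this sketch halves both the time scale and the threshold, which is the mechanism by which a longer average routing length speeds up the epidemic in homogeneous networks.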

Traffic Driven Epidemic Spreading in Scale-Free Networks.
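In scale-free networks, the density $\rho_k(t)$ of infected nodes of degree $k$ obeys
$$\frac{d\rho_k(t)}{dt} = \beta \lambda N b_k \left[ 1 - \rho_k(t) \right] \Theta(t), \tag{8}$$
where $b_k$ is the betweenness of nodes of degree $k$. The right-hand side takes into account the probability that a node with $k$ links belongs to the susceptible class, represented by $1 - \rho_k(t)$, and gets the infection via packets travelling from infected nodes. The latter process is determined by the spreading probability $\beta$, the number of packets that a node of degree $k$ receives at each time step, $\lambda N b_k$, and the probability $\Theta(t)$ that a packet travels through a link pointing to an infected node.
Assuming that the network is uncorrelated, $\Theta(t)$ takes the form
$$\Theta(t) = \frac{\sum_{k} b_k P(k) \rho_k(t)}{\langle b \rangle}, \tag{9}$$
where $P(k)$ is the degree distribution, $\langle b \rangle = \sum_k b_k P(k)$, and $\langle b^2 \rangle = \sum_k b_k^2 P(k)$. When the epidemic begins spreading, the density of infected nodes $\rho_k(t)$ is very small, and (8) reduces to
$$\frac{d\rho_k(t)}{dt} = \beta \lambda N b_k \Theta(t). \tag{10}$$
Combining (9) and (10), we get
$$\frac{d\Theta(t)}{dt} = \beta \lambda N \frac{\langle b^2 \rangle}{\langle b \rangle} \Theta(t). \tag{11}$$
With the initial condition $\rho(t)|_{t=0} = \rho_0$, we obtain
$$\rho(t) = \rho_0 \left[ 1 + \frac{\langle b \rangle^2}{\langle b^2 \rangle} \left( e^{t/\tau} - 1 \right) \right], \tag{12}$$
and the epidemic outbreak time scale of scale-free networks is
$$\tau = \frac{\langle b \rangle}{\beta \lambda N \langle b^2 \rangle}. \tag{13}$$
We can also get the epidemic threshold of the traffic driven SIR epidemic model in scale-free networks:
$$\beta_c = \frac{\langle b \rangle}{\lambda N \langle b^2 \rangle}. \tag{14}$$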
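The role of the two betweenness moments can be made concrete with a short Python sketch. It assumes that the scale-free threshold has the form $\langle b \rangle / (\lambda N \langle b^2 \rangle)$ and the homogeneous threshold the form $1/(\lambda N \langle b \rangle)$, and it approximates the moments by plain averages over a per-node list of betweenness values. The function names and the two sample lists are hypothetical.

```python
def moments(b):
    """First and second moments of a list of node betweenness values."""
    n = len(b)
    mean_b = sum(b) / n
    mean_b2 = sum(x * x for x in b) / n
    return mean_b, mean_b2


def threshold_scale_free(b, lam):
    """beta_c = <b> / (lam * N * <b^2>), with N taken as len(b)."""
    mean_b, mean_b2 = moments(b)
    return mean_b / (lam * len(b) * mean_b2)


def threshold_homogeneous(b, lam):
    """beta_c = 1 / (lam * N * <b>), with N taken as len(b)."""
    mean_b, _ = moments(b)
    return 1.0 / (lam * len(b) * mean_b)


# Two hypothetical betweenness samples with the same mean (0.2) and N = 10.
UNIFORM = [0.2] * 10            # every node carries the same load
SKEWED = [0.02] * 9 + [1.82]    # one hub carries most of the load

bc_uniform = threshold_scale_free(UNIFORM, lam=0.5)
bc_skewed = threshold_scale_free(SKEWED, lam=0.5)
```

When all nodes have the same betweenness, $\langle b^2 \rangle = \langle b \rangle^2$ and the two formulas coincide. Concentrating the same total load on one hub inflates $\langle b^2 \rangle$ and lowers the threshold, so a strategy that narrows the betweenness distribution can raise the threshold even if the mean goes up.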

Simulation Results and Discussions
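In all simulations, mods denotes the number of communities into which the network is divided, and CSHT and CEFF denote our control strategy applied to SHT and EFF routing, respectively. Thus mods = 1 (no division) corresponds to the traditional shortest path routing strategy [29] or the traditional efficient path routing strategy [30].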
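The exact routing rule of the control strategy is not spelled out in this section, so the following Python sketch is only one possible reading of it: among all paths, prefer the one that crosses the fewest community boundaries, and break ties by hop count. The function name, the use of boundary crossings as a proxy for the number of communities traversed, and the toy graph are all assumptions made for illustration.

```python
import heapq


def community_aware_path(adj, community, src, dst):
    """Path minimizing (community boundary crossings, hops) in lexicographic order.

    This is a hypothetical reading of the control strategy, not the authors' code.
    """
    inf = (float("inf"), float("inf"))
    best = {src: (0, 0)}
    parent = {src: None}
    heap = [(0, 0, src)]
    while heap:
        crossings, hops, u = heapq.heappop(heap)
        if (crossings, hops) > best[u]:
            continue  # stale heap entry
        if u == dst:
            break
        for v in adj[u]:
            step = int(community[u] != community[v])
            cand = (crossings + step, hops + 1)
            if cand < best.get(v, inf):
                best[v] = cand
                parent[v] = u
                heapq.heappush(heap, (cand[0], cand[1], v))
    if dst not in parent:
        return None
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


# Toy graph: node 3 lies in another community and offers a shortcut from 0 to 2.
TOY_ADJ = {0: [1, 3], 1: [0, 6], 6: [1, 2], 2: [6, 3], 3: [0, 2]}
DIVIDED = {0: "A", 1: "A", 6: "A", 2: "A", 3: "B"}
UNDIVIDED = {v: "A" for v in TOY_ADJ}  # plays the role of mods = 1

route_divided = community_aware_path(TOY_ADJ, DIVIDED, 0, 2)
route_undivided = community_aware_path(TOY_ADJ, UNDIVIDED, 0, 2)
```

With all nodes in a single community there are no boundaries to cross, so the rule reduces to ordinary shortest path routing, in line with the meaning of mods = 1. With the division applied, the route stays inside community A at the price of one extra hop, which moves load away from the boundary node 3 and lengthens the average path.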
First, we employ a family of pseudorandom networks [18], whose properties are equivalent to those of fully random networks except for a controllable strength of community structure. These networks comprise 128 nodes split into 4 communities of 32 nodes each. Each node has on average $z_{in}$ edges connecting it to nodes of its own community and $z_{out}$ edges to nodes of other communities. We then divide these networks into mods = 1 (no division of the network, the traditional SHT or EFF strategy), 2, 4 (in fact, the DA algorithm gives $n_{DA} = 4$ in these networks), and 8 communities and test the validity of our control strategies by checking the average density of infected nodes. In all simulations, we generate 100 network instances and report the average over them. In addition, for each instance the propagation is averaged over 100 different starting configurations. As shown in Figure 1, there is only a minor difference between CSHT and CEFF in the homogeneous networks. In every case, as mods goes from 1 to 2, 4 ($n_{DA}$), and 8, the average routing length increases, which means that the traffic passes through more nodes. Consequently, the average node betweenness increases, which shortens the epidemic outbreak time scale. In the simulations, there are more infected nodes under our control strategy with mods = 4 ($n_{DA}$) than with mods = 1 (the traditional strategy), which means that our control strategy does not work in homogeneous networks.
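A network of this kind can be generated as in the following Python sketch. It assumes that edges are placed independently, with probability $z_{in}/31$ between two nodes of the same community and $z_{out}/96$ between nodes of different communities, and that $z_{out} = 16 - z_{in}$ so that the average degree stays at 16. The function name, its parameters, and the seeding are hypothetical; the construction follows the usual description of such benchmarks rather than code from [18].

```python
import random


def benchmark_network(z_in, n_comm=4, size=32, mean_degree=16, seed=0):
    """Pseudorandom network with planted communities (illustrative sketch).

    Returns (adj, community): adjacency sets and node -> community index.
    """
    rng = random.Random(seed)
    n = n_comm * size
    z_out = mean_degree - z_in
    p_in = z_in / (size - 1)    # each node has size - 1 possible internal partners
    p_out = z_out / (n - size)  # and n - size possible external partners
    community = {v: v // size for v in range(n)}
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if community[u] == community[v] else p_out
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj, community


def mean_degrees(adj, community):
    """Average total degree and average intra-community degree per node."""
    n = len(adj)
    total = sum(len(nbrs) for nbrs in adj.values()) / n
    internal = sum(
        sum(1 for v in nbrs if community[v] == community[u])
        for u, nbrs in adj.items()
    ) / n
    return total, internal


adj, community = benchmark_network(z_in=12, seed=1)
avg_degree, avg_internal = mean_degrees(adj, community)
```

Raising `z_in` toward 15 makes the planted communities sharper, while lowering it toward 8 blurs them, which is the knob used to vary the strength of community structure.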
As (4) shows, epidemic spreading in homogeneous networks accelerates as the average betweenness $\langle b \rangle$ grows. Under our strategy, packets pass through more nodes, which increases the average betweenness. That is why our control strategy does not work in homogeneous networks.
Next, we use networks with $z_{in}$ = 8 and 15 to check the critical epidemic threshold of the SIR model. The simulation results are shown in Figure 2, and the thresholds computed from (7) are listed in Table 1.
From Figure 2(a) we observe that when the spreading rate $\beta$ is below 0.49, the infected nodes die out, whereas at 0.50 the infection can proliferate on the network. The critical epidemic threshold therefore lies between 0.49 and 0.50, in good agreement with the result of (7) in Table 1; the same holds for the other cases. The comparison between mods = 1 and mods = 4 ($n_{DA}$) also confirms that our control strategy does not work in homogeneous networks.
Next, we examine the effectiveness of our strategy in scale-free networks. We generate a scale-free (BA) network [31] with 100 nodes and BA parameter $m$ = 2 and an ER network [32] with 100 nodes and ER parameter $p$ = 0.04. The two networks have the same number of nodes and the same average degree $\langle k \rangle$ = 4. As Figure 3 shows, the epidemic spreads more quickly on scale-free networks because of their heterogeneous structure. CSHT significantly reduces the average density of infected nodes in scale-free networks when the network is divided into mods = $n_{DA}$ communities, whereas it does not work in homogeneous networks. CEFF is ineffective in both cases. We therefore focus on the node betweenness of scale-free networks to examine the effect of the different control strategies.
Figure 4 provides insight into how the control strategy works by comparing the betweenness distributions for different values of mods in a BA network with 100 nodes and BA parameter $m$ = 2. Figure 4(a) shows the betweenness plotted against the node index with mods = 1 and mods = $n_{DA}$ for CSHT, while Figure 4(c) shows histograms of the betweenness distribution. When mods = 1, that is, when packets travel along the traditional shortest path or efficient path, the majority of the nodes have very low betweenness, but a small number of them are spread over a very wide range. When the network is divided into mods = $n_{DA}$ communities, node betweenness is confined to a narrow band whose upper edge mostly lies higher than before, as shown in Figures 4(a) and 4(c). This means that the average betweenness $\langle b \rangle$ of the whole network rises sharply, while the variance of the betweenness falls by more than the square of the average, $\langle b \rangle^2$, rises. Since $\langle b^2 \rangle$ is the sum of the variance and $\langle b \rangle^2$, the mean square of the betweenness $\langle b^2 \rangle$ decreases. For CEFF, as shown in Figures 4(b) and 4(d), the betweenness of every node changes only slightly when the network is divided into a different number of communities. That is why CEFF does not work.
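How routing shapes the betweenness distribution can be seen on two small graphs. The Python sketch below routes one packet between every ordered pair of nodes along a breadth-first shortest path, counts how many routes use each node as an intermediate hop, and then evaluates the ratio $\langle b \rangle / \langle b^2 \rangle$ of the resulting counts. The function names, the unnormalized counting convention, and the star and ring graphs are hypothetical choices made for illustration; they are not the networks of Figure 4.

```python
from collections import deque


def bfs_path(adj, src, dst):
    """One shortest path from src to dst (neighbours visited in sorted order)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def routing_betweenness(adj):
    """Number of routed source-destination pairs that pass through each node."""
    counts = {v: 0 for v in adj}
    for s in adj:
        for d in adj:
            if s != d:
                for v in bfs_path(adj, s, d)[1:-1]:  # intermediate nodes only
                    counts[v] += 1
    return counts


def moment_ratio(counts):
    """<b> / <b^2> of the betweenness counts (assumes at least one nonzero count)."""
    values = list(counts.values())
    mean_b = sum(values) / len(values)
    mean_b2 = sum(x * x for x in values) / len(values)
    return mean_b / mean_b2


# Heterogeneous example: a star whose hub (node 0) relays every route.
STAR = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
# Homogeneous example: a ring in which every node relays a few routes.
RING = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

star_counts = routing_betweenness(STAR)
star_ratio = moment_ratio(star_counts)
ring_ratio = moment_ratio(routing_betweenness(RING))
```

In the star, the hub relays all 20 leaf-to-leaf routes and the leaves relay none, so the ratio is 20/400 = 0.05. The ring carries a larger total load spread over all nodes and has a higher ratio, which mirrors the narrow, raised betweenness band described above.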
Next, we examine how the number of nodes $N$, the BA parameter $m$, and the accuracy of the identified communities affect our CSHT strategy. We generate a series of networks with different numbers of nodes but the same BA parameter and divide each of them into mods = 2 communities using the GN algorithm and into mods = $n_{DA}$ communities using the DA algorithm. As shown in Figure 5, dividing the network into mods = $n_{DA}$ communities yields the highest modularity $Q_{\max}$, which indicates better community structure identification.
The relation between the critical epidemic threshold and the number of nodes is shown in Figure 6(a). A series of networks with the same number of nodes but different BA parameters is then generated to obtain the relation between the critical epidemic threshold and the BA parameter, shown in Figure 6(b).
As Figure 6(a) shows, the CSHT strategy with mods = $n_{DA}$ yields a higher critical epidemic threshold than with mods = 2, which means that the better the community structure identification, the more effective our CSHT strategy is. As shown in Figure 5, when the number of nodes increases with the BA parameter fixed at 2, the community structure of the network becomes stronger. Combined with the phenomenon in Figure 6(a) that the larger the number of nodes, the more the epidemic threshold is raised, this suggests that the stronger the community structure of the network, the more effective our CSHT strategy is. Figure 6(b) supports this conclusion: as the BA parameter increases, the community structure of the BA network becomes fuzzier, and the gain in epidemic threshold shrinks.
Finally, we test our CSHT control strategy on a real-world network. We choose the E-mail network with 1133 nodes [33] and split it into mods = 2 and mods = 15 communities with the GN and DA algorithms, respectively. The critical epidemic threshold of the E-mail network rises from 0.0406 (the traditional SHT strategy) to 0.0411 and 0.0451, respectively, an increase of about 1.2% and 11.1%. Thus our strategy is also effective in a real-world network.

Conclusion
Considering the impact of community structure, this paper has proposed a new control strategy to restrain traffic driven epidemic spreading in scale-free networks. The key feature of our CSHT strategy is that it increases the ratio between the first and the second moments of the node betweenness distribution in scale-free networks by minimizing the number of communities that a routing path passes through. First, we have found that our CSHT strategy works well in scale-free networks. Second, we have found that if a network is divided into more reasonable and more accurate communities, or if the network has a stronger community structure, the increase in the epidemic threshold is more pronounced. Finally, we have applied our CSHT strategy to the E-mail network to show the validity of the strategy on real-world networks.

Table 1: Critical epidemic threshold using (7).
In the pseudorandom networks used in the simulations, each node also has on average $z_{out}$ edges to nodes of other communities. While $z_{in}$ is varied, the value of $z_{out}$ is chosen to keep the average degree constant, which is set to 16 in our paper. As $z_{in}$ is increased, the communities become better defined and easier to identify. Here, we use networks with $z_{in}$ = 8 (which is the same as a fully random network, with $Q_{\max}$ = 0.2162), 12 ($Q_{\max}$ = 0.5008), and 15 ($Q_{\max}$ = 0.6771).