Information Propagation in Online Social Network Based on Human Dynamics

and Applied Analysis 3


Introduction
The rapid development of information and communication technology has driven the wide adoption of online social networks. Online social networks such as Sina Microblog, Twitter, and Facebook have become an indispensable part of daily life: every day we sign into our homepages more than once to view and share information. These networks share common characteristics: instantaneity, simplicity, and universality. Sina Microblog, for example, unlike a traditional blog, allows users to disseminate messages of up to 140 characters from mobile devices at any time and from anywhere. Investigating online social networks is crucial in a broad range of settings, from information propagation and viral marketing to political purposes.
In recent years, online social networks have attracted widespread attention as a platform for the empirical study of information [1][2][3][4]. Despite the progress that has been made, the empirical study of information propagation is still in its infancy. Studies in this direction have mostly been hindered by the lack of large-scale data. However, the availability of large-scale data from online social networks has recently created unprecedented opportunities to explore the impact of human behavior on information propagation.
First, information propagation in online social networks is determined by the rhythms and activity patterns of humans [5,6]. An increasing number of recent measurements indicate that human activity patterns are heterogeneous and bursty [7][8][9][10][11]. If only the time interval between events is considered, these activity patterns are often described by a power-law interevent time distribution $P(\tau) \sim \tau^{-\alpha}$, where $\tau$ is the time interval between two consecutive activities [12]. Recently, researchers have begun to realize that bursty human behavior has an important impact on the dissemination of information [13,14].
Second, the wide adoption of online social networks has increased the competition among pieces of information for our limited attention. Every day we receive a large amount of information from various online social networks, but we do not have enough time or attention to disseminate every message we receive. An interesting question is whether such competition affects the velocity of information propagation. The issue of limited attention has been studied through messages posted and forwarded in online social networks [15,16]. However, how limited attention affects the velocity of information propagation is still unclear.
In this paper, we propose an extended Susceptible-Infected (SI) propagation model that, for the first time, incorporates bursty human activity patterns and limited attention. We then test the model against a large set of real data. Combining theoretical research and empirical analysis, we study the information spreading process in online social networks both qualitatively and quantitatively. The key contributions of this study are summarized as follows.
(1) From the empirical statistics we find that, at the group level, the interactive time (the time interval between two consecutive logins to the microblog homepage) follows a power-law distribution with slope ≈ 2.5, and the distribution of newly infected individuals (calculated as the number of new forwardings per day) follows a power law with slope ≈ 1.5. The two slopes satisfy the relationship 2.5 − 1.5 ≈ 1.0.
(2) Through both theoretical analysis and simulation, we show that (a) if the generation time distribution follows a power law with exponent $\beta$, then the decay of the propagation velocity is characterized by the same power law; (b) if bursty human behavior follows a power-law distribution with exponent $\alpha$, the decay of the propagation velocity also follows a power law, with exponent $\beta \approx \alpha - 1$.
In summary, although tremendous efforts have been devoted to research on information propagation, further study based on human dynamics is still needed to unveil the role of human behavior in information propagation in online social networks. In future studies, other more mature theories, such as those in [17,18], could also be used to study the spreading dynamics.
The rest of this paper is organized as follows. Section 2 gives the data description. In Section 3, we propose the extended SI model. In Section 4, we present simulation results and observations. Section 5 introduces the theoretical analysis. Finally, in Section 6, we conclude the work.

Data Description
The dataset used in this paper was collected from Sina Microblog (http://www.weibo.com/), one of the most popular microblog platforms in China at present. The dataset includes 345,095 messages from 41,667 individuals between 2009/8/16 and 2011/6/4, collected by snowball sampling. These messages were forwarded 203,997,094 times and triggered 58,617,139 comments. For each message, the message ID, release time, number of forwardings, and number of comments were recorded. For each individual, the individual ID and the times at which the individual signed in to his/her microblog homepage were recorded.
The basic statistical results show that, at the group level, the interactive time (the time interval between two consecutive logins to the microblog homepage) follows a power-law distribution with slope ≈ 2.5 (Figure 1(a)), and the distribution of newly infected individuals (calculated as the number of new forwardings per day) follows a power law with slope ≈ 1.5 (Figure 1(b)). Denoting the slope of the interactive time distribution by $\alpha$ and the slope of the distribution of newly infected individuals by $\beta$, we find that the two slopes satisfy the relationship $\beta \approx \alpha - 1$.
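As a hedged illustration of how such a slope could be estimated, the sketch below applies the continuous maximum-likelihood (Hill) estimator to synthetic interevent times. This estimator is a swapped-in technique and not necessarily the fitting procedure behind Figure 1; the function name `powerlaw_mle_exponent`, the cutoff `x_min`, and the synthetic sample are assumptions introduced here, and no Sina Microblog data is used.

```python
import math
import random


def powerlaw_mle_exponent(samples, x_min):
    """Continuous MLE of the exponent of P(x) ~ x^(-exponent) for x >= x_min."""
    tail = [x for x in samples if x >= x_min]
    if not tail:
        raise ValueError("no samples at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


# Synthetic stand-in for the login intervals (hypothetical, not the real dataset).
rng = random.Random(0)
alpha_true = 2.5
x_min = 1.0
# Inverse-transform sampling of a Pareto tail with density exponent alpha_true.
synthetic = [
    x_min * (1.0 - rng.random()) ** (-1.0 / (alpha_true - 1.0))
    for _ in range(50_000)
]

alpha_hat = powerlaw_mle_exponent(synthetic, x_min)
print(f"estimated exponent: {alpha_hat:.2f}")
```

With real data, the choice of `x_min` matters, since a power law usually holds only in the tail of the distribution.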

Model Description.
In this paper, we use branching processes [19,20] in conjunction with power-law human behavior to describe the process of information propagation. We adopt the Susceptible-Infected (SI) propagation model to simulate information propagation in online social networks. As in the classical SI model, the population is divided into two states, susceptible (S) and infected (I). In the information propagation model, however, a susceptible individual is one who does not yet know a given message, and an infected individual is one who knows the message and shares it with his/her friends. After being infected, an individual never returns to the susceptible state. At time $t$, there are $S(t)$ susceptible individuals and $I(t)$ infected individuals, and the population is $N = S(t) + I(t)$.
Initially all individuals are susceptible except for a single infected individual. Unlike in the traditional model, an infected individual can be inactive at a given time step; that is, the infected individual will not infect connected susceptible individuals at that time step. The time interval between two consecutive active steps of an infected individual is defined as the interactive time, which is often characterized by a power-law distribution $P(\tau) \sim \tau^{-\alpha}$ at the group level. Different individuals have different interactive times, and each individual $i$ acts with a fixed interactive time $\tau_i$.
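A minimal sketch of how heterogeneous but fixed interactive times could be assigned, assuming a continuous power law with a lower cutoff. The cutoff `tau_min`, the function name `sample_interactive_times`, and the use of inverse-transform sampling are assumptions made for illustration; the text does not specify how the values are drawn.

```python
import random


def sample_interactive_times(n, alpha, tau_min=1.0, seed=None):
    """Draw one fixed interactive time per individual from P(tau) ~ tau^(-alpha).

    Uses inverse-transform sampling of a Pareto tail: for u uniform in (0, 1],
    tau = tau_min * u^(-1 / (alpha - 1)) has density proportional to tau^(-alpha).
    """
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1 for a normalizable distribution")
    rng = random.Random(seed)
    return [
        tau_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
        for _ in range(n)
    ]


taus = sample_interactive_times(10_000, alpha=2.5, seed=1)
median_tau = sorted(taus)[len(taus) // 2]
print(f"median = {median_tau:.2f}, max = {max(taus):.1f}")
```

Most individuals end up with an interval close to `tau_min`, while a few are inactive for very long stretches, which is the heterogeneity the model relies on.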
On the other hand, the advent of online social networks has greatly lowered the cost of information generation and propagation, boosting the potential reach of each message. However, the abundance of information to which we are exposed through online social networks exceeds our capacity to consume it. Because of limited time and attention, an individual cannot continuously check for updates on his/her homepage. We assume that individuals interact on a directed online social network. Each individual is equipped with two lists. One is the screen, a time-ordered list in which received messages are recorded and maintained. The other is the memory, in which messages that interest the individual are recorded. Each individual can share some of the messages from the list with his/her friends. The friends in turn pay attention to a newly received message by placing it at the top of their lists. Because of limited attention, we allow a message to survive on an individual's screen for a finite amount of time $T$. We also assume that each individual forwards each message only once, after which the individual loses interest in the message. In addition, if the individual does not forward the message within $T$, the individual is no longer concerned with the message and deletes it from the screen. Each message attracts the individual's attention with probability $p$; that is, the individual forwards the message with probability $p$.
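The bookkeeping described above can be sketched as a small data structure. This is an illustrative reading, not the authors' implementation: the class `Individual`, its field names, and the choice to represent the memory as a set of already-forwarded message IDs are all assumptions.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Individual:
    window: int    # how long a message survives on the screen (time steps)
    p: float       # probability of forwarding a message when active
    screen: dict = field(default_factory=dict)    # message id -> arrival time
    forwarded: set = field(default_factory=set)   # stand-in for the memory list

    def receive(self, msg_id, t):
        """Place a newly received message on the screen unless it was already handled."""
        if msg_id not in self.forwarded and msg_id not in self.screen:
            self.screen[msg_id] = t

    def act(self, t, rng):
        """Called at an active time step; returns the message ids forwarded now."""
        sent = []
        for msg_id, arrived in list(self.screen.items()):
            if t - arrived >= self.window:
                del self.screen[msg_id]          # expired: attention window passed
            elif rng.random() < self.p:
                del self.screen[msg_id]          # forwarded once, then interest is lost
                self.forwarded.add(msg_id)
                sent.append(msg_id)
        return sent


rng = random.Random(0)
user = Individual(window=1440, p=1.0)
user.receive("m1", t=0)
first = user.act(t=10, rng=rng)            # p = 1.0, so m1 is forwarded
user.receive("m1", t=20)                   # ignored: already forwarded once
user.receive("m2", t=20)
second = user.act(t=20 + 1440, rng=rng)    # m2 has expired by now
```

A dictionary keyed by message ID keeps expiry checks simple; a time-ordered list would match the description of the screen more literally.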

SI Model Based on Bursty and Limited Attention.
According to the previous description, the SI model incorporating bursty activity and limited attention is illustrated in Figure 2. We characterize the timing of information propagation by the generation time $\Delta$, which is defined as the time interval between the forwarding by an individual and the forwarding by his/her followers.
To sum up, the extended SI model is defined as follows.
Step 1. At time step $t = t_i$, an individual $i$ posts a message. Meanwhile, individual $j$ receives the message, where $j \in \Gamma_i$ and $\Gamma_i$ is the set of individual $i$'s neighbors.
Step 2. For each individual $j$, the first active time step is $t_0$, $t_0 \in (t_i, t_i + \tau_j)$, and individual $j$ will be active at the time steps $t = t_0 + k\tau_j$, $k = 1, 2, 3, \ldots$, where $\tau_j$ is the active time interval of individual $j$.
Step 3. At each active time step, individual $j$ will forward the message with probability $p$. If individual $j$ forwards the message at time step $t_j$, we obtain the generation time $\Delta = t_j - t_i$, and the generation time must satisfy the condition $\Delta < T$.
Step 4. Update the time step $t = t_j$ and repeat Step 1 to Step 3 until the preset number of time steps is reached.
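The steps above can be sketched as an event-driven simulation. The following is a simplified reading under stated assumptions, not the authors' code: the network is an undirected preferential-attachment graph built by a hypothetical helper `ba_graph`, each receipt of the message is treated as an independent attempt, the first active time is also allowed to trigger a forward, and `window` and `p` stand for the attention time limit and the forwarding probability.

```python
import heapq
import random


def ba_graph(n, m, rng):
    """Undirected preferential-attachment graph; each new node adds m links."""
    adj = {v: set() for v in range(n)}
    targets = list(range(m))
    repeated = []                      # nodes listed once per link end
    for v in range(m, n):
        for u in targets:
            adj[v].add(u)
            adj[u].add(v)
        repeated.extend(targets)
        repeated.extend([v] * m)
        chosen = set()
        while len(chosen) < m:         # degree-proportional choice, no duplicates
            chosen.add(rng.choice(repeated))
        targets = sorted(chosen)
    return adj


def simulate_si(adj, alpha, p, window, t_max, rng, tau_min=1.0):
    """Return {node: infection time} for one run of the extended SI sketch."""
    n = len(adj)
    # Fixed interactive time per node, power-law distributed (inverse transform).
    tau = [tau_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]
    infected_at = {}
    events = [(0.0, rng.randrange(n))]          # (forwarding time, node); random seed node
    while events:
        t_i, i = heapq.heappop(events)
        if t_i > t_max:
            break
        if i in infected_at:
            continue
        infected_at[i] = t_i                    # Step 1: i posts at t_i
        for j in adj[i]:
            if j in infected_at:
                continue
            t = t_i + rng.random() * tau[j]     # Step 2: first active time of j
            while t - t_i < window:             # Step 3: generation time below the limit
                if rng.random() < p:
                    heapq.heappush(events, (t, j))
                    break
                t += tau[j]
    return infected_at


def new_infections_per_step(infected_at, t_max):
    """Bin infection times into integer time steps 0..t_max."""
    counts = [0] * (t_max + 1)
    for t in infected_at.values():
        counts[int(t)] += 1
    return counts


rng = random.Random(42)
adj = ba_graph(2000, 5, rng)
infected_at = simulate_si(adj, alpha=2.5, p=0.5, window=1440, t_max=10_000, rng=rng)
counts = new_infections_per_step(infected_at, 10_000)
print(len(infected_at), "of", len(adj), "nodes infected")
```

A priority queue avoids stepping through every time step, which matters because power-law intervals can leave long stretches with no activity.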
In addition, we introduce two indicators to characterize the velocity of information propagation: (1) the half time $t^{*}$, defined as the first time step at which the number of infected individuals exceeds half of the population; (2) the mean time $\langle t \rangle = \sum_{t=0}^{t_{\max}} t\, n(t)/N$, defined as the mean infection time of an individual after the outbreak, where $n(t)$ is the number of newly infected individuals at time $t$ and $t_{\max}$ is the maximum simulation step; in our simulations $t_{\max} = 10^{4}$.
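A minimal sketch of the two indicators, assuming the mean time is read as the average infection time weighted by the number of newly infected individuals per step and normalized by the population. The function names and the toy input are hypothetical.

```python
def half_time(new_per_step, population):
    """First time step at which the cumulative infected count exceeds population / 2."""
    total = 0
    for t, n_t in enumerate(new_per_step):
        total += n_t
        if total > population / 2:
            return t
    return None                       # half of the population was never reached


def mean_time(new_per_step, population):
    """Sum over t of t * n(t) / N, with n(t) the new infections at step t."""
    return sum(t * n_t for t, n_t in enumerate(new_per_step)) / population


# Toy series of new infections per step for a population of 10.
toy_counts = [1, 2, 4, 2, 1]
h = half_time(toy_counts, 10)         # cumulative counts 1, 3, 7: exceeds 5 at t = 2
m = mean_time(toy_counts, 10)         # (0 + 2 + 8 + 6 + 4) / 10 = 2.0
print(h, m)
```

Returning `None` when the threshold is never crossed keeps runs that die out distinguishable from runs that spread slowly.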

Simulation Results and Observations
In our simulations, initially all individuals are susceptible except for a single infected individual. Each individual $i$ has a fixed interactive time $\tau_i$, which follows a power-law distribution $P(\tau) \sim \tau^{-\alpha}$ with $2 < \alpha < 3$. We set the time limit to $T = 1440$ time steps, because messages survive in an individual's list for one day, namely, 1440 minutes [15]. Simulations were performed on a BA network with size $N = 10^{4}$ and $\langle k \rangle = 10$. We set the degree of attention to $p = 0.5$ and randomly select an initial infected node. For detailed comparison, we also performed the same SI dynamics with an exponential interactive time distribution $P(\tau) \sim e^{-\tau}$. From the numerical simulation results (Figures 3 and 4), we have the following observations of the propagation process.
Observation 1. In the power-law case, the average number of newly infected individuals $n(t)$ and the generation time distribution $g(\Delta)$ follow power laws with the exponent $\beta \approx \alpha - 1$ (Figure 3). In both panels, the black lines have slopes −1.8, −1.5, and −1.2. In the exponential case, $n(t)$ decays fast, in stark contrast to the power-law case. The results are averaged over $2 \times 10^{3}$ independent runs.
Observation 2. The smaller the exponent $\alpha$ of the interactive time distribution, that is, the larger the heterogeneity of the interactive time, the slower the propagation. The half time $t^{*}$ and mean time $\langle t \rangle$ decrease monotonically as the exponent $\alpha$ increases (Figure 4).
In order to investigate the impact of attention on the propagation process, we fixed the interactive time to follow a power-law distribution with exponent $\alpha = 2.5$ and randomly selected an initial infected node. With the time limit fixed at $T = 1440$, simulations were again performed on a BA network with size $N = 10^{4}$ and $\langle k \rangle = 10$. The results are averaged over $2 \times 10^{3}$ independent runs. From the numerical simulation results (Figure 5), we have the following observation of the propagation process.
Observation 3. The higher the degree of attention, the faster the propagation. The half time $t^{*}$ and mean time $\langle t \rangle$ decrease monotonically as the attention $p$ increases (Figure 5).

Theoretical Analysis
In this section, the properties of the propagation dynamics are analyzed. We prove that the decay exponent of the propagation velocity equals that of the generation time distribution. Furthermore, we prove that the exponent $\alpha$ characterizing the bursty behavior is related to the exponent $\beta$ of the decay of the propagation velocity by the relation $\beta = \alpha - 1$.
Proposition 1. If the distribution of generation time follows a power law $g(\Delta) \sim \Delta^{-\beta}$ with $1 < \beta < 2$, the decay of the propagation velocity also follows a power law $n(t) \sim t^{-\beta}$ with the same exponent $\beta$.
Proof. We consider a general theory of the propagation process in online social networks. We assume that the outbreak starts from a single infected individual at time $t = 0$. In this case, the average number of newly infected individuals at time $t$ is given by the expression in [19], in which $z_d$ is the average number of individuals at generation $d$ away from the first infected individual and $*$ denotes the convolution operation; for example, $g^{(0)} * g^{(1)}(t) = \int_0^t g^{(0)}(s)\, g^{(1)}(t - s)\, ds$.
This proposition means that if the generation time distribution follows a power law with exponent $\beta$, then the decay of the propagation velocity is characterized by the same power law.
Proposition 2. If the distribution of interactive time follows a power law $P(\tau) \sim \tau^{-\alpha}$ with $2 < \alpha < 3$, the decay of the propagation velocity also follows a power law $n(t) \sim t^{-\beta}$ with $1 < \beta < 2$ and $\beta = \alpha - 1$.
Thus, the proposition is proved.
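The shift of the exponent by one can be checked numerically under one additional assumption that the text does not state explicitly: that the generation time is the residual waiting time of a renewal process, $g(\Delta) = \frac{1}{\langle \tau \rangle} \int_{\Delta}^{\infty} P(\tau)\, d\tau$, which for $P(\tau) \sim \tau^{-\alpha}$ gives $g(\Delta) \sim \Delta^{-(\alpha - 1)}$. The sketch below samples such residual times and estimates the tail exponent with a maximum-likelihood (Hill) estimator; the function names, the cutoff `tau_min`, and the estimator are assumptions chosen for illustration.

```python
import math
import random


def residual_waiting_times(n, alpha, tau_min, rng):
    """Waiting time from a random arrival to the next event of a renewal process
    with interevent density P(tau) ~ tau^(-alpha), tau >= tau_min, alpha > 2."""
    out = []
    for _ in range(n):
        # A random arrival falls into an interval with probability proportional
        # to its length, so the covering interval has density ~ tau^(-(alpha - 1)).
        tau = tau_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 2.0))
        # The arrival is uniformly placed inside that interval.
        out.append(rng.random() * tau)
    return out


def tail_exponent(samples, x_min):
    """Continuous MLE of the density exponent for samples at or above x_min."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


rng = random.Random(7)
alpha = 2.5
deltas = residual_waiting_times(50_000, alpha, 1.0, rng)
beta_hat = tail_exponent(deltas, 1.0)
print(f"alpha - 1 = {alpha - 1:.2f}, estimated exponent = {beta_hat:.2f}")
```

The check only covers the relation between the interactive time and the generation time; the step from the generation time to the decay of the propagation velocity is the content of Proposition 1.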

Conclusion
An extended SI model is proposed in this paper. Rather than analyzing the network topology, we study information propagation in online social networks from the perspective of human dynamics. We find that human behavior greatly affects the range and velocity of information propagation.
In the future, with the development of online social systems, other factors may come to influence information propagation in online social networks. The propagation model must therefore be improved in order to better explain the propagation process.

Figure 1: Empirical data. (a) The distribution of interactive time at the group level. (b) The distribution of newly infected individuals; inset: the cumulative distribution of newly infected individuals, namely, the distribution of all infected individuals. The results are averaged over all messages.

Figure 4: (a) The fraction of infected nodes for different values of the exponent $\alpha$. (b) The half time $t^{*}$ and the mean time $\langle t \rangle$ as functions of the exponent $\alpha$.

Figure 5: (a) The fraction of infected nodes for different values of the attention $p$. (b) The half time $t^{*}$ and the mean time $\langle t \rangle$ as functions of the attention $p$.
Since the generation time probability density function is related to the interactive time probability density function [21], we have  (