Impact of Bursty Human Activity Patterns on the Popularity of Online Content

The dynamics of online content popularity have attracted growing research interest in recent years. In this paper, we provide a quantitative, temporal analysis of the dynamics of online content popularity in a massive system: Sina Microblog. We use time-stamped data to investigate the impact of bursty human comment patterns on the popularity of online microblog news. Statistical results indicate that the numbers of news items and comments grow exponentially. The strength of forwarding and commenting is characterized by bursts and displays a fat-tailed distribution. To characterize the dynamics of popularity, we explore the distribution of the time interval Δt between consecutive comment bursts and find that it also follows a power law. Bursty patterns of human comment are responsible for the power-law decay of popularity. These results are supported by both theoretical analysis and empirical data.


Introduction
The advent of Web 2.0 and online social media [1] is fostering web-mediated brokers such as microblogs and search engines, through which anyone can easily publish and promote content online. The dynamics of online content popularity have been deeply affected by the existence of these web-mediated brokers. Web 2.0 and online social media not only change traditional communication processes by introducing new types of phenomena, but also generate a huge amount of time-stamped data, making it possible for the first time to study the dynamics of online content popularity and human activity patterns at the global system scale.
Many human factors may affect the popularity of online content, including human interests [2-4], social identity [5, 6], limited attention [7], and memory effects [8]. In this paper, we focus on the impact of bursty patterns of human comment on the popularity of news on Sina Microblog (http://www.weibo.com/) [9]. Temporal heterogeneity and burstiness are widely observed in many human-activated systems and may result from both endogenous mechanisms, such as the highest-priority-first protocol, and exogenous factors, such as the seasonality and heterogeneity of human activities [10]. This phenomenon has been found in various human activity patterns such as instant messaging [11, 12], web browsing [13, 14], e-mail and surface mail [15, 16], and mobile phone calls [17, 18]. When only the timing of events is considered, these activity patterns are often described by a power-law distribution $P(\tau) \sim \tau^{-\alpha}$, where $\tau$ is the time interval between two consecutive activities. Many mechanisms of human dynamics have been proposed to explain these temporal bursts, such as the task priority mechanism [15], the memory effect mechanism [8, 19], the human interaction mechanism [20], the interest-driven mechanism [2, 3], and the social identity mechanism [5, 6].
Here, the popularity of microblog news is defined as the number of comments per day posted for a piece of news. It is well documented that the statistical properties of this variable are very heterogeneous, with a distribution that follows a power law. We explore the distribution of the time interval Δt between consecutive comment bursts and find that it follows a power law. Furthermore, we show that the exponent α, which characterizes the bursty patterns of human comment, is connected to the exponent θ of the decay of popularity by the relation θ = α − 1.
The rest of this paper is organized as follows. Section 2 describes the data set. In Section 3, we study the dynamics of online news popularity. Section 4 relates bursty patterns of human activity to online content popularity. The power-law hypothesis is tested in Section 5. Finally, Section 6 concludes the work.

Data Set
The data for this research were collected from Sina Microblog (http://www.weibo.com/), one of the biggest microblogging platforms in China. We collected all news about a public topic dated from August 20, 2009 to September 3, 2010, a period of 380 days. During this period, a total of 125,150 pieces of news were released, which were forwarded 2,260,826 times and triggered 1,786,000 comments. For each piece of news, the news ID, release time, number of forwards, and number of comments were recorded. We can therefore track the dynamics of a specific piece of news through its unique ID.
The statistical results show that, during the data collection window, the numbers of news items and comments grow exponentially (Figure 1). At the beginning of the observation window, only a small amount of news and comments were posted. This is because Sina Microblog had only just launched on August 14, 2009, and only a small group of people knew of the application at that time.
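As an illustration of how such exponential growth can be quantified, the following is a minimal sketch, assuming the daily counts follow N(t) ≈ N0·exp(r·t) so that log N(t) is linear in t. The function name `fit_exponential_growth` and the synthetic series are hypothetical; they are not the paper's data or its fitting procedure.

```python
import math

def fit_exponential_growth(counts):
    """Least-squares fit of log(count) = log(n0) + r * day; returns (n0, r)."""
    # Days with zero counts are skipped because log(0) is undefined.
    points = [(day, math.log(c)) for day, c in enumerate(counts) if c > 0]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    r = sxy / sxx                        # slope = growth rate per day
    n0 = math.exp(mean_y - r * mean_t)   # intercept = initial level
    return n0, r

# Synthetic daily counts (NOT the paper's data): level 5 on day 0, rate 0.02/day.
synthetic = [5.0 * math.exp(0.02 * day) for day in range(380)]
n0_hat, r_hat = fit_exponential_growth(synthetic)
print(f"n0 = {n0_hat:.3f}, growth rate = {r_hat:.4f} per day")
```

A straight line on a semi-logarithmic plot of the counts is the visual counterpart of this fit.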
The measurements also indicate that the numbers of forwards and comments are heterogeneous and bursty. In our data set, among all 125,150 pieces of news, 65,772 were forwarded and 69,440 were commented on. As shown in Figure 2, the strength of forwarding $S_{\text{forward}}$ (the number of forwards for a piece of news) and the strength of comment $S_{\text{comment}}$ (the number of comments for a piece of news) follow power laws with the same exponent. The figures show that most news received few forwards and comments, while only a small fraction received many.

The Dynamics of Online Popularity
In order to quantitatively analyze the popularity of online microblog news, we consider the number of comments per day posted for a piece of news, denoted by $X(t)$ at time $t$. We study the relative variation of this measurement per time unit. Here, we use one day as the time unit, so $X$ is the average value of comment strength, and $X(1)$ is the number of comments posted for a piece of news on the first day after its release.
The relative variation of comments per time unit is shown in Figure 3. Most news experienced a burst and received little attention thereafter. The relative variation may be negative, indicating a decrease in popularity; since our main concern is the positive values, events with negative variation are neglected.
Another way to characterize the dynamics of bursty systems is to study the distribution of time intervals between successive events. We analyzed the distribution of the time between consecutive comment bursts [21], namely, the time intervals between positive ΔX, shown in Figure 4. The intervals between bursts follow a power-law distribution.
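The extraction of inter-burst intervals can be sketched as follows. This is a minimal illustration, assuming that ΔX is the day-over-day change X(t) − X(t−1) of the daily comment count; for positive counts its sign coincides with the sign of the relative variation. The function name `burst_intervals` and the sample series are hypothetical.

```python
def burst_intervals(daily_comments):
    """Return the gaps (in days) between consecutive days with a positive change."""
    # changes[i] is the change observed on day i + 1 relative to day i.
    changes = [b - a for a, b in zip(daily_comments, daily_comments[1:])]
    burst_days = [i + 1 for i, dx in enumerate(changes) if dx > 0]
    return [later - earlier for earlier, later in zip(burst_days, burst_days[1:])]

# Made-up daily comment counts for one news item (NOT the paper's data).
series = [0, 12, 3, 1, 4, 2, 2, 9, 1]
print(burst_intervals(series))  # bursts on days 1, 4, 7 -> [3, 3]
```

Pooling these gaps over all news items gives the sample whose distribution is examined for power-law behavior.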
We use the maximum likelihood estimation (MLE) method [22] in conjunction with the Kolmogorov-Smirnov (KS) statistic table [23] to verify whether the fit is a good match to the data. In this case, the KS statistic suggests that the power-law curve fits the data well, as explained in detail in Section 5.

Relationship between Bursty Human Activity Patterns and Popularity
In this section, we focus on the impact of bursty human activity patterns on the dynamics of popularity. First, we show that the number of comments for a piece of news $X(t)$ can be derived from the comment patterns of users. Assume that a given piece of news is released at time $t_0$ and that all users can comment on it. The comment patterns differ from the browsing patterns [24]: every user can comment on a piece of news more than once. Figure 5 shows the comment pattern of one user, where each vertical line represents a separate comment on the news. The thick line denotes the time when the user comments on the news for the first time after its release at $t_0$. The release time $t_0$ divides the interval $\Delta t$ between two consecutive comments into two parts of length $t_1$ and $t_2$, where $t_1 + t_2 = \Delta t$. The probability that a user comments at time $t_2$ after the news was released is proportional to the number of possible $\Delta t$ intervals. For a user characterized by a power-law intercomment time distribution with exponent $\alpha$ and a minimum time unit $t_m$, the probability of finding a $\Delta t$ interval longer than $t_2$ is

$$P(\Delta t > t_2) = \int_{t_2}^{\infty} \frac{\alpha - 1}{t_m} \left( \frac{\Delta t}{t_m} \right)^{-\alpha} d(\Delta t) = \left( \frac{t_2}{t_m} \right)^{-(\alpha - 1)}. \tag{4.1}$$

For users characterized by different exponents, the number of comments $X(t)$ at time $t$ after release can be calculated analytically as the average of (4.1) over the observed exponent values:

$$X(t) \sim \left\langle \left( \frac{t}{t_m} \right)^{-(\alpha - 1)} \right\rangle_{\alpha}. \tag{4.2}$$

For simplicity, we assume that $t_m = 1$ and focus on the case in which all users are characterized by the same exponent $\alpha$. For example, the intercomment time distribution follows a power law with exponent $\alpha = 2.5$ at the collective level (Figure 7). Hence, (4.2) can be written as

$$X(t) \sim t^{-(\alpha - 1)}.$$

Thus, the number of comments for a piece of news, and hence its popularity, decays as a power law with exponent $\alpha - 1$.
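The tail probability used in this derivation can be checked numerically. The sketch below assumes a continuous power-law intercomment distribution normalized on Δt ≥ t_m, for which the fraction of intervals longer than t is (t/t_m)^(−(α−1)). The helper names, the seed, and the sample size are illustrative choices, not part of the original analysis.

```python
import random

def sample_intervals(alpha, t_min, size, rng):
    """Inverse-transform samples from p(dt) ~ dt**(-alpha) for dt >= t_min."""
    # rng.random() lies in [0, 1), so the base (1 - u) lies in (0, 1].
    return [t_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(size)]

def survival(samples, t):
    """Empirical fraction of intervals strictly longer than t."""
    return sum(1 for s in samples if s > t) / len(samples)

rng = random.Random(0)
alpha, t_min = 2.5, 1.0   # alpha = 2.5 is the collective exponent quoted in the text
intervals = sample_intervals(alpha, t_min, 200_000, rng)

for t in (2.0, 4.0, 8.0):
    predicted = (t / t_min) ** (-(alpha - 1.0))
    print(t, round(survival(intervals, t), 4), round(predicted, 4))
```

With α = 2.5 the predicted tail falls off with exponent α − 1 = 1.5, which is the decay exponent the argument attributes to popularity.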
In our data set, the number of comments $X(t)$ for a piece of news follows a power law with exponent $\theta = 1.5$ (Figure 6(a)). More than 80% of comments take place within the first day; the share then decays to only 10% on the second day and reaches a small amount after five days. We also compute the lifetime of all news (Figure 6(b)) and find that the average lifetime is 5.16 days. The distribution of the interval between two consecutive comments at the collective level follows a power law with exponent $\alpha = 2.5$ (Figure 7).
To sum up, we show that bursty human activity patterns deeply affect the popularity of news. The exponent $\alpha$ characterizing the bursty human activity patterns is connected to the exponent $\theta$ of the decay of popularity by the relation

$$\theta = \alpha - 1.$$

These results are supported by both the theoretical analysis and the empirical data: with the measured $\alpha = 2.5$, the relation gives $\theta = 1.5$, which matches the measured decay exponent.

Testing the Power-Law Hypothesis
Recent empirical observations suggest that power-law distributions occur in many natural and man-made systems. Unfortunately, most previous empirical studies of power-law distributed data have not attempted to test the power-law hypothesis quantitatively. Instead, they typically rely on qualitative appraisals of the data, for instance, based on visualizations.
In this section, we use a goodness-of-fit test to determine whether the fit is a good match to the data. First, we fit our empirical data to the power-law model using maximum likelihood estimation (MLE) and calculate the KS statistic for this fit [22]. Next, we use the KS table [23] as the basis for confirming or rejecting the power-law hypothesis.
Mathematically, a quantity $x$ obeys a power law if it is drawn from a probability distribution

$$p(x) \propto x^{-\alpha},$$

where $\alpha$ is a constant parameter of the distribution known as the exponent or scaling parameter.
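In the discrete case, the power-law distribution, known as the zeta distribution [25], is expressed as

$$p(x) = \frac{x^{-\alpha}}{\zeta(\alpha)}, \quad x = 1, 2, 3, \ldots,$$

where $\zeta(\alpha)$ is the Riemann zeta function, defined as

$$\zeta(\alpha) = \sum_{k=1}^{\infty} k^{-\alpha}.$$

Maximum likelihood estimation of the zeta distribution maximizes the log-likelihood function given by

$$L(\alpha \mid x) = \ln l(\alpha \mid x) = \ln \prod_{i=1}^{n} \frac{x_i^{-\alpha}}{\zeta(\alpha)} = -n \ln \zeta(\alpha) - \alpha \sum_{i=1}^{n} \ln x_i,$$

where $l(\alpha \mid x)$ is the likelihood function of $\alpha$ given the data $x = \{x_i\}$, $1 \le i \le n$, and $L(\alpha \mid x)$ is the log-likelihood function.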
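When $x_{\min} > 1$, the normalizing constant $\zeta(\alpha)$ is replaced by the generalized zeta function $\zeta(\alpha, x_{\min}) = \sum_{k=0}^{\infty} (k + x_{\min})^{-\alpha}$, and the maximum can be obtained for the zeta distribution by finding the zero of the derivative of the log-likelihood function [26]:

$$\frac{\zeta'(\hat{\alpha}, x_{\min})}{\zeta(\hat{\alpha}, x_{\min})} = -\frac{1}{n} \sum_{i=1}^{n} \ln x_i, \tag{5.4}$$

where the prime denotes differentiation with respect to $\alpha$ and $\hat{\alpha}$ is the maximum likelihood estimate.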
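The maximum likelihood step can be sketched numerically. The example below is a minimal illustration, assuming the discrete power law is normalized on x ≥ x_min by the generalized (Hurwitz) zeta function, so that the log-likelihood is −n·ln ζ(α, x_min) − α·Σ ln x_i. Instead of solving the derivative condition directly, it maximizes this concave function with a golden-section search. All function names, the truncation used for sampling, and the synthetic data are assumptions made for illustration; this is not the authors' code.

```python
import bisect
import math
import random

def hurwitz_zeta(alpha, q, terms=1000):
    """Approximate sum_{k>=0} (k + q)**(-alpha), alpha > 1 (direct sum + Euler-Maclaurin tail)."""
    head = sum((k + q) ** (-alpha) for k in range(terms))
    edge = terms + q
    tail = edge ** (1.0 - alpha) / (alpha - 1.0) + 0.5 * edge ** (-alpha)
    return head + tail

def fit_alpha(data, x_min=1, lo=1.01, hi=6.0, tol=1e-6):
    """Golden-section search for the alpha that maximizes the log-likelihood."""
    kept = [x for x in data if x >= x_min]
    n, sum_log = len(kept), sum(math.log(x) for x in kept)

    def loglik(a):
        return -n * math.log(hurwitz_zeta(a, x_min)) - a * sum_log

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        left = hi - ratio * (hi - lo)
        right = lo + ratio * (hi - lo)
        if loglik(left) < loglik(right):
            lo = left    # the maximum lies to the right of `left`
        else:
            hi = right   # the maximum lies to the left of `right`
    return 0.5 * (lo + hi)

def sample_discrete_power_law(alpha, x_min, size, rng, x_max=100_000):
    """Draw integers x >= x_min with P(x) ~ x**(-alpha); x_max truncates sampling only."""
    weights = [x ** (-alpha) for x in range(x_min, x_max + 1)]
    total = sum(weights)
    cdf, running = [], 0.0
    for w in weights:
        running += w / total
        cdf.append(running)
    return [x_min + min(bisect.bisect_left(cdf, rng.random()), len(cdf) - 1)
            for _ in range(size)]

# Synthetic check (NOT the paper's data): recover a known exponent.
rng = random.Random(1)
data = sample_discrete_power_law(2.5, 1, 20_000, rng)
alpha_hat = fit_alpha(data, x_min=1)
print(round(alpha_hat, 3))
```

Because the log-likelihood is concave in α, the maximizer found by the search is also the zero of its derivative, so the two formulations agree.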
The most commonly used goodness-of-fit test is the Kolmogorov-Smirnov (KS) test [27], whose statistic is simply the maximum distance between the cumulative distribution functions (CDFs) of the data and the fitted model:

$$D = \max_{x \ge x_{\min}} \left| F(x) - P(x) \right|,$$

where $F(x)$ is the CDF of the data and $P(x)$ is the CDF of the power-law model that best fits the data in the region $x \ge x_{\min}$. First, we fit our empirical data to the power-law model using the MLE method and calculate the KS statistic for this fit. Second, we adopt the KS table, shown in Table 1, to obtain a goodness-of-fit estimate [23]. The KS quantiles in the table were generated from simulations: for each of the logarithmically spaced sample sizes, 10,000 power-law distributions were simulated, with random exponents from 1.5 to 4.0. Third, we calculate the P value, namely, the fraction of the time that the resulting statistic is larger than the value for the empirical data. Conover [28] gives detailed instructions on how to use the KS table to obtain a goodness-of-fit estimate.
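The three steps can be sketched as follows. Two simplifications are assumed and should be read as such: a continuous power law is used, for which the MLE has the closed form α̂ = 1 + n / Σ ln(x_i / x_min), and the KS table look-up is replaced by direct Monte Carlo simulation from the fitted model. The function names, seed, trial count, and data are hypothetical and do not reproduce the table of [23].

```python
import math
import random

def fit_alpha_continuous(data, x_min):
    """Closed-form MLE of the exponent for a continuous power law on x >= x_min."""
    return 1.0 + len(data) / sum(math.log(x / x_min) for x in data)

def ks_statistic(data, alpha, x_min):
    """Maximum distance between the empirical CDF and the fitted power-law CDF."""
    ordered = sorted(data)
    n = len(ordered)
    distance = 0.0
    for i, x in enumerate(ordered):
        model = 1.0 - (x / x_min) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i + 1)/n at x; check both sides.
        distance = max(distance, abs(model - i / n), abs(model - (i + 1) / n))
    return distance

def sample_power_law(alpha, x_min, size, rng):
    """Inverse-transform samples from the continuous power law."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(size)]

def ks_p_value(data, x_min, rng, trials=200):
    """Return (D, p): p is the fraction of simulated fits whose D exceeds the observed D."""
    alpha = fit_alpha_continuous(data, x_min)
    observed = ks_statistic(data, alpha, x_min)
    exceed = 0
    for _ in range(trials):
        synthetic = sample_power_law(alpha, x_min, len(data), rng)
        refit = fit_alpha_continuous(synthetic, x_min)
        if ks_statistic(synthetic, refit, x_min) > observed:
            exceed += 1
    return observed, exceed / trials

rng = random.Random(2)
# Synthetic data only: one genuine power law and one shifted exponential.
power_law_data = sample_power_law(2.5, 1.0, 1000, rng)
exponential_data = [1.0 + rng.expovariate(1.0) for _ in range(1000)]
d_pl, p_pl = ks_p_value(power_law_data, 1.0, rng)
d_exp, p_exp = ks_p_value(exponential_data, 1.0, rng)
print("power law:   D = %.3f, p = %.2f" % (d_pl, p_pl))
print("exponential: D = %.3f, p = %.2f" % (d_exp, p_exp))
```

A small P value means that few simulated power-law samples fit as poorly as the empirical data, which argues against the power-law hypothesis; a large one means the hypothesis cannot be rejected.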
According to the above theory and the KS table, we obtain the goodness-of-fit estimates shown in Table 2.
Table 2 shows that the data sets described in this paper are consistent with a power-law distribution. For the data set shown in Figure 3, however, most values are less than 1, so the method of maximum likelihood estimation cannot be used. In this case, we use nonlinear least squares and find that the data obey a power-law distribution over a certain range.
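A nonlinear least-squares fit of a power law can be sketched without external libraries. The example below assumes the model y = c·x^(−a) and minimizes the squared error on the raw values: for a fixed exponent a the best amplitude c has a closed form, so only a needs a one-dimensional search. The function name, search bounds, and synthetic points are illustrative assumptions; the source does not specify the fitting routine that was used.

```python
import math

def fit_power_law_least_squares(xs, ys, a_lo=0.0, a_hi=5.0, grid=500, tol=1e-9):
    """Fit y = c * x**(-a) by least squares on the raw values; returns (a, c)."""
    def best_c(a):
        # For fixed a the model is linear in c, so c has a closed form.
        num = sum(y * x ** (-a) for x, y in zip(xs, ys))
        den = sum(x ** (-2.0 * a) for x in xs)
        return num / den

    def sse(a):
        c = best_c(a)
        return sum((y - c * x ** (-a)) ** 2 for x, y in zip(xs, ys))

    # A coarse grid brackets the minimum; golden-section search refines it.
    step = (a_hi - a_lo) / grid
    best_i = min(range(grid + 1), key=lambda i: sse(a_lo + i * step))
    lo = max(a_lo, a_lo + (best_i - 1) * step)
    hi = min(a_hi, a_lo + (best_i + 1) * step)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        left = hi - ratio * (hi - lo)
        right = lo + ratio * (hi - lo)
        if sse(left) > sse(right):
            lo = left
        else:
            hi = right
    a = 0.5 * (lo + hi)
    return a, best_c(a)

# Synthetic points with x in (0, 1] (NOT the paper's data): y = 3 * x**(-1.5).
xs = [0.05 * k for k in range(1, 21)]
ys = [3.0 * x ** -1.5 for x in xs]
a_hat, c_hat = fit_power_law_least_squares(xs, ys)
print(round(a_hat, 4), round(c_hat, 4))
```

Unlike a straight-line fit on log-log axes, this weights the residuals in the original units, so the largest values dominate the fit.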

Conclusion
The main goal of this paper is to explore the impact of bursty human activity patterns on the dynamics of popularity. Through theoretical analysis and empirical data, we show that bursty human activity patterns are responsible for the power-law decay of popularity. This conclusion is consistent with previous studies [21, 24, 29-31]. Our statistical results also indicate that the numbers of news items and comments grow exponentially. The strength of forwarding and commenting is characterized by bursts and displays a fat-tailed distribution. To characterize the dynamics of bursty systems, we explore the distribution of the time interval Δt between consecutive comment bursts and find that it also follows a power law.
Our measurements also indicate that microblog news has a short lifetime. Most comments take place within the first day, and the average lifetime of all news is 5.16 days. The average lifetime may vary across social media, but the decay law of popularity is very likely generic, as it does not depend on the content but is determined mainly by human activity patterns. Indeed, the exponent α, which characterizes the bursty human activity patterns, is connected to the exponent θ of the decay of popularity by the relation θ = α − 1.

Figure 1: (a) The number of news items and (b) the number of comments as functions of time. The shaded regions mark days with abnormal bursts. Further analysis of the data suggests that these bursts may result from exogenous factors. For example, two major natural disasters occurred on April 14, 2010 and August 7, 2010: the Qinghai earthquake and the Zhouqu mudslide, respectively.

Figure 4: Distribution of popularity bursts, which follows a power law with slope 3.0.

Figure 6: (a) The number of comments $X(t)$ for a piece of news as a function of time. (b) The distribution of news lifetime, which follows a power law with slope 1.4. All results are averaged over the 69,440 pieces of news that received comments.

Table 1: KS test table for power-law distributions. The table was created assuming MLE as the estimation method.

Table 2: Basic parameters of the data sets described in this paper, along with their power-law fits and the corresponding P values.