The dynamics of online content popularity have attracted increasing research attention in recent years. In this paper, we provide a quantitative, temporal analysis of the dynamics of online content popularity in a massive system: Sina Microblog. We use time-stamped data to investigate the impact of bursty human comment patterns on the popularity of online microblog news. Statistical results indicate that the numbers of news items and comments grow exponentially. The strength of forwarding and commenting is characterized by bursts and displays a fat-tailed distribution. To characterize the dynamics of popularity, we explore the distribution of the time interval Δt between consecutive comment bursts and find that it also follows a power law. Bursty patterns of human comment are responsible for the power-law decay of popularity. These results are supported by both theoretical analysis and empirical data.
1. Introduction
The advent of Web 2.0 and online social media [1] is fostering web-mediated brokers such as microblogs and search engines, through which anyone can easily publish and promote content online. The dynamics of online content popularity have been deeply affected by these web-mediated brokers. Web 2.0 and online social media not only change traditional communication processes by introducing new types of phenomena, but also generate a huge amount of time-stamped data, making it possible for the first time to study the dynamics of online content popularity and human activity patterns at the scale of the whole system.
Many human factors may affect the popularity of online content, including human interests [2–4], social identity [5, 6], limited attention [7], and memory effects [8]. In this paper, we focus on the impact of bursty patterns of human comment on the popularity of news on Sina Microblog (http://www.weibo.com/) [9]. Temporal heterogeneity and burstiness are widely observed in many human-activated systems and may result from both endogenous mechanisms, such as the highest-priority-first protocol, and exogenous factors, such as the seasonality and heterogeneity of human activities [10]. This phenomenon has been found in various human activity patterns such as instant messaging [11, 12], web browsing [13, 14], e-mail and surface mail [15, 16], and mobile phone calls [17, 18]. When only the timing of events is considered, these human activity patterns are often described by a power-law distribution $P(\tau) \sim \tau^{-\alpha}$, where τ is the time interval between two consecutive activities. Many mechanisms of human dynamics have been proposed to explain these temporal bursts, such as the task priority mechanism [15], the memory effect mechanism [8, 19], the human interaction mechanism [20], the interest-driven mechanism [2, 3], and the social identity mechanism [5, 6].
Here, the popularity of microblog news is defined as the number of comments per day posted for a piece of news. It is well documented that the statistical properties of this variable are very heterogeneous, with a distribution that follows a power law. We explore the distribution of the time interval Δt between consecutive comment bursts and find that it follows a power law. Furthermore, we show that the exponent α, which characterizes the bursty patterns of human comment, is connected to the exponent θ of the decay of popularity by the relation θ = α - 1.
The rest of this paper is organized as follows. Section 2 describes the data set. In Section 3, we study the dynamics of online news popularity. Section 4 introduces the relationship between bursty patterns of human activity and online content popularity. The power-law hypothesis is tested in Section 5. Finally, Section 6 concludes the work.
2. Data Set
The data for this research were collected from Sina Microblog (http://www.weibo.com/), one of the biggest microblogging platforms in China. We collected all news about a public topic from August 20, 2009 to September 3, 2010, a period of 380 days. During this period, a total of 125,150 pieces of news were released, which were forwarded 2,260,826 times and triggered 1,786,000 comments. For each piece of news, the news ID, release time, number of forwards, and number of comments were recorded. Therefore, we can track the dynamics of a specific piece of news through its unique ID.
The statistical results show that, during the data collection window, the numbers of news items and comments grow exponentially (Figure 1). At the beginning of the observation window, only a small amount of news and comments was posted. This is because Sina Microblog had only just been launched, on August 14, 2009, and only a small group of people knew of the application at that time.
Figure 1: (a) The number of news items and (b) the number of comments as functions of time. Abnormal bursty phenomena occur on some days, marked by shading. Further analysis of the data suggests that these phenomena may result from exogenous factors. For example, two major natural disasters occurred on April 14, 2010 and August 7, 2010: the Qinghai earthquake and the Zhouqu mudslide, respectively.
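One simple way to quantify an exponential trend like the one in Figure 1 is a least-squares line through the logarithm of the daily counts. The sketch below is only an illustration of that idea: the function name `fit_exponential` and the synthetic `daily_counts` series are assumptions, not the authors' procedure or data.

```python
import math

def fit_exponential(counts):
    """Least-squares fit of log N(t) = log N0 + rate * t; returns (N0, rate)."""
    # Days with zero counts are skipped because log(0) is undefined.
    points = [(t, math.log(c)) for t, c in enumerate(counts) if c > 0]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    rate = sxy / sxx
    return math.exp(mean_y - rate * mean_t), rate

# Hypothetical daily counts that double about every 7 days (not the paper's data).
daily_counts = [round(100 * 2 ** (t / 7)) for t in range(60)]
n0, rate = fit_exponential(daily_counts)
print(f"N0 = {n0:.1f}, rate = {rate:.4f} per day, "
      f"doubling time = {math.log(2) / rate:.2f} days")
```

Days dominated by exogenous events, such as the shaded bursts in Figure 1, would pull the fitted rate upward, so in practice they may need to be excluded before fitting.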
The measurements also indicate that the numbers of forwards and comments are heterogeneous and bursty. In our data set, among all the 125,150 pieces of news, 65,772 were forwarded and 69,440 were commented on. As shown in Figure 2, the strength of forwarding $S_{\mathrm{forward}}$ (the number of forwards for a piece of news) and the strength of comment $S_{\mathrm{comment}}$ (the number of comments for a piece of news) follow power laws with nearly the same exponent (α ≈ 1.62; see Table 2).
Figure 2: (a) The strength of forwarding $S_{\mathrm{forward}}$. (b) The strength of comment $S_{\mathrm{comment}}$. Most news received few forwards and comments; only a small fraction received many.
3. The Dynamics of Online Popularity
In order to quantitatively analyze the popularity of online microblog news, we consider the number of comments per day posted for a piece of news, denoted by X(t) at time t. We study
$$[\Delta X]_t = \frac{X(t) - X(t-1)}{\bar{X}}, \tag{3.1}$$
which represents the relative variation of the measurement per time unit. Here, we use one day as the time unit, and $\bar{X}$ is the average value of the comment strength. X(1) is the number of comments posted for a piece of news on the first day after its release.
The relative variation of comments per time unit is shown in Figure 3. Most news experienced a burst and received little attention thereafter. The relative variation may be negative, indicating a decrease in popularity; since our main concern is the positive values, events with negative variation are neglected.
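The relative variation and the filtering of negative values can be sketched as follows. This is a minimal illustration, assuming that the normalizing average is the mean of the daily series of the same news item; the function names and the sample series are hypothetical.

```python
from statistics import mean

def relative_variation(daily_comments):
    """Return (X(t) - X(t-1)) / mean(X) for t = 2..T, with days numbered from 1."""
    x_bar = mean(daily_comments)
    if x_bar == 0:
        raise ValueError("the series contains no comments")
    return [(curr - prev) / x_bar
            for prev, curr in zip(daily_comments, daily_comments[1:])]

def positive_bursts(daily_comments):
    """Keep only positive variations, paired with the day on which they occur."""
    deltas = relative_variation(daily_comments)
    # deltas[0] compares day 2 with day 1, hence the day index starts at 2.
    return [(day, d) for day, d in enumerate(deltas, start=2) if d > 0]

# Hypothetical daily comment counts for one news item (day 1, day 2, ...).
series = [80, 10, 4, 12, 2, 1, 3]
print(positive_bursts(series))
```

For this series the only increases occur on days 4 and 7, so those are the two events that would enter the distribution of positive variations.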
Figure 3: Probability distribution of ΔX. We use nonlinear least squares to estimate the exponent: slope = 1.07, SSE = $1\times10^{-4}$, R-square = 0.8999, and RMSE = $7.9\times10^{-4}$.
Another way to characterize the dynamics of bursty systems is to study the distribution of time intervals between successive events. We analyzed the distribution of the time between consecutive comment bursts [21], namely, the time intervals between positive ΔX, shown in Figure 4. The intervals between bursts follow a power-law distribution.
Figure 4: Distribution of the time intervals between popularity bursts, which follows a power law with slope = 3.0.
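Extracting the intervals between consecutive bursts can be sketched as below. The code treats a day as a burst whenever the count exceeds that of the previous day, which is equivalent to a positive ΔX when the average is positive. The function names and the two sample series are hypothetical, and the time unit of the intervals is simply that of the input series.

```python
from collections import Counter

def burst_intervals(daily_comments):
    """Intervals between consecutive time steps whose count exceeds the previous one."""
    burst_days = [t for t in range(1, len(daily_comments))
                  if daily_comments[t] > daily_comments[t - 1]]
    return [later - earlier for earlier, later in zip(burst_days, burst_days[1:])]

def interval_distribution(all_series):
    """Empirical probability of each interval length, pooled over all news items."""
    counts = Counter()
    for series in all_series:
        counts.update(burst_intervals(series))
    total = sum(counts.values())
    return {dt: c / total for dt, c in sorted(counts.items())}

# Two hypothetical news items.
series_a = [80, 10, 4, 12, 2, 1, 3]
series_b = [5, 1, 2, 1, 2, 1, 1, 4]
print(interval_distribution([series_a, series_b]))
```

The pooled distribution is what would be plotted on logarithmic axes and then fitted; a news item with fewer than two bursts contributes no interval.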
We use the maximum likelihood estimation (MLE) method [22] in conjunction with the Kolmogorov-Smirnov (KS) statistic table [23] to verify whether the fit is a good match to the data. In this case, the KS statistic suggests that the power-law curve is a good fit for the data, as explained in detail in Section 5.
4. Relationship between Bursty Human Activity Patterns and Popularity
In this section, we focus on the impact of bursty human activity patterns on the dynamics of popularity. First of all, we show that the number of comments for a piece of news X(t) can be derived from the comment patterns of users.
Assume that a given piece of news is released at time $t_0$ and that all users can comment on it. The comment patterns differ from the browsing patterns [24]: every user can comment on a piece of news more than once. Figure 5 shows the comment pattern of one user, where each vertical line represents a separate comment on the news. The thick line denotes the time when the user comments on the news for the first time after it was released at $t_0$. The release time $t_0$ divides the time interval Δt between two consecutive comments into two parts of length $t_1$ and $t_2$, where $t_1 + t_2 = \Delta t$. The probability that a user comments at time $t_2$ after the news was released is proportional to the number of possible Δt intervals. For a user characterized by a power-law intercomment time distribution with exponent α and a minimum time unit $t_m$, the probability of finding a Δt interval longer than $t_2$ is
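$$p(\Delta t > t_2) = (\alpha - 1)\, t_m^{\alpha - 1} \int_{t_2}^{\infty} (\Delta t)^{-\alpha}\, d\Delta t = \left(\frac{t_2}{t_m}\right)^{-\alpha + 1}. \tag{4.1}$$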
In (4.1) we assume that α>1.
Figure 5: Comment pattern of a user. Each vertical line represents a separate comment on the news. The thick line denotes the time when the user comments on the news for the first time after it was released at $t_0$.
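The closed form in (4.1) can be checked numerically by sampling intervals from a power-law density and counting how many exceed a given length. The sketch below uses inverse-transform sampling; the parameter values, the sample size, and the function names are illustrative assumptions.

```python
import random

def sample_interval(alpha, t_m, rng):
    """Draw an interval from a density proportional to x^(-alpha) on x >= t_m, alpha > 1."""
    u = 1.0 - rng.random()  # u lies in (0, 1], which avoids raising zero to a negative power
    # Solving (x / t_m)^(-(alpha - 1)) = u for x gives the sample.
    return t_m * u ** (-1.0 / (alpha - 1.0))

def empirical_survival(samples, t2):
    """Fraction of sampled intervals that are longer than t2."""
    return sum(1 for s in samples if s > t2) / len(samples)

alpha, t_m = 2.5, 1.0
rng = random.Random(0)
samples = [sample_interval(alpha, t_m, rng) for _ in range(200_000)]
for t2 in (2.0, 5.0, 10.0):
    predicted = (t2 / t_m) ** (-(alpha - 1.0))
    print(t2, round(empirical_survival(samples, t2), 4), round(predicted, 4))
```

The two printed columns should agree up to sampling noise, which is the content of (4.1) for a single exponent.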
For all users characterized by different exponents, the number of comments X(t) can be calculated analytically as the average of (4.1) over the observed exponent values:
$$X(t) \sim \left\langle \left(\frac{t}{t_m}\right)^{-\alpha + 1} \right\rangle_{\alpha}. \tag{4.2}$$
For simplicity, we assume that $t_m = 1$ and focus on the case in which all users are characterized by the same exponent α. For example, the intercomment time distribution follows a power law with exponent α = 2.5 at the collective level (Figure 7).
Hence, (4.2) can be written as
$$X(t) \sim t^{-\alpha + 1}. \tag{4.3}$$
Thus, the number of comments for a piece of news decays as a power law with exponent α - 1; that is, the decay of popularity follows a power law with exponent α - 1.
In our data set, the number of comments X(t) for a piece of news follows a power law with exponent θ = 1.5 (Figure 6(a)). More than 80% of comments take place within the first day; the share then drops to only 10% on the second day and becomes very small after five days. We also compute the lifetime of all news (Figure 6(b)) and find that the average lifetime is 5.16 days. The distribution of the interval between two consecutive comments at the collective level follows a power law with exponent α = 2.5 (Figure 7).
Figure 6: (a) The number of comments X(t) for a piece of news as a function of time. (b) The distribution of news lifetime, which follows a power law with slope = 1.4. All results are averaged over the 69,440 pieces of news that received comments.
Figure 7: Distribution of the intervals between two consecutive comments at the collective level, which follows a power law with exponent α = 2.5.
To sum up, bursty human activity patterns deeply affect the popularity of news. Moreover, the exponent α characterizing the bursty human activity patterns is connected to the exponent θ of the decay of popularity by the relation
$$\theta = \alpha - 1. \tag{4.4}$$
These results are supported by both the theoretical analysis and empirical data.
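As a quick consistency check of (4.4), the fitted exponents reported in Table 2 can be inserted directly: α = 2.496 for the intercomment intervals (Figure 7) and θ = 1.552 for the decay of X(t) (Figure 6(a)). The arithmetic below is only an illustration of how closely the two measured values agree.

$$\alpha - 1 = 2.496 - 1 = 1.496, \qquad \theta = 1.552, \qquad \left|\theta - (\alpha - 1)\right| = 0.056.$$

The difference is about 4% of θ, so the two independently fitted exponents are close to the predicted relation.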
5. Testing the Power-Law Hypothesis
Recent empirical observations suggested that power-law distributions occur in many natural and man-made systems. Unfortunately, most previous empirical studies of power-law distributed data have not attempted to test the power-law hypothesis quantitatively. Instead, they typically rely on qualitative appraisals of the data, for instance, based on visualizations.
In this section, we use a goodness-of-fit test to determine whether the fit is a good match to the data. First, we fit our empirical data to the power-law model using maximum likelihood estimation (MLE) and calculate the KS statistic for this fit [22]. Next, we use the KS table [23] as a basis to confirm or reject the power-law hypothesis.
Mathematically, a quantity x obeys a power-law if it is drawn from a probability distribution as follows:
$$p(x) \propto x^{-\alpha}, \tag{5.1}$$
where α is a constant parameter of the distribution known as the exponent or scaling parameter.
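In the discrete case, the power-law distribution, known as the zeta distribution [25], is expressed as
$$p(x) = \frac{x^{-\alpha}}{\zeta(\alpha, x_{\min})}, \qquad P(X \ge x) = \frac{\zeta(\alpha, x)}{\zeta(\alpha, x_{\min})}, \tag{5.2}$$
where $\zeta(\alpha, x_{\min}) = \sum_{k=0}^{\infty} (k + x_{\min})^{-\alpha}$ is the generalized (Hurwitz) zeta function. For $x_{\min} = 1$ it reduces to the Riemann zeta function $\zeta(\alpha) = \sum_{k=1}^{\infty} k^{-\alpha}$.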
Maximum likelihood estimation of the zeta distribution maximizes the log-likelihood function given by
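$$l(\alpha \mid x) = \prod_{i=1}^{n} \frac{x_i^{-\alpha}}{\zeta(\alpha)}, \qquad L(\alpha \mid x) = \log l(\alpha \mid x) = \sum_{i=1}^{n} \left(-\alpha \log x_i - \log \zeta(\alpha)\right) = -\alpha \sum_{i=1}^{n} \log x_i - n \log \zeta(\alpha), \tag{5.3}$$
where $l(\alpha \mid x)$ is the likelihood function of α given the data $x = \{x_i\}_{1 \le i \le n}$, and $L(\alpha \mid x)$ is the log-likelihood function.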
When xmin>1, this maximum can be obtained theoretically for the zeta distribution by finding the zero of the derivative of the log-likelihood function [26]:
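$$\frac{d}{d\alpha} L(\alpha \mid x) = -\sum_{i=1}^{n} \log x_i - n\, \frac{1}{\zeta(\alpha)} \frac{d}{d\alpha} \zeta(\alpha) = 0, \qquad \frac{\zeta'(\alpha)}{\zeta(\alpha)} = -\frac{1}{n} \sum_{i=1}^{n} \log x_i. \tag{5.4}$$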
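The stationarity condition (5.4) has no closed-form solution, so the exponent has to be found numerically. The sketch below maximizes the log-likelihood of (5.3) directly by ternary search, which reaches the same point as solving (5.4) because the log-likelihood is concave in α. It assumes that the normalizing constant is the Riemann zeta function ζ(α), as written in (5.3). The zeta approximation (a direct sum plus an Euler-Maclaurin tail correction), the search bounds, the function names, and the sample data are all assumptions made for illustration.

```python
import math

def zeta(s, n_terms=1000):
    """Riemann zeta for s > 1: direct sum plus an Euler-Maclaurin tail correction."""
    head = sum(k ** -s for k in range(1, n_terms))
    n = float(n_terms)
    tail = n ** (1 - s) / (s - 1) + 0.5 * n ** -s + s * n ** (-s - 1) / 12
    return head + tail

def fit_alpha(data, lo=1.05, hi=6.0, tol=1e-6):
    """Maximize L(alpha) = -alpha * sum(log x_i) - n * log zeta(alpha) on [lo, hi]."""
    n = len(data)
    sum_log = sum(math.log(x) for x in data)

    def log_likelihood(alpha):
        return -alpha * sum_log - n * math.log(zeta(alpha))

    # Ternary search is valid because the log-likelihood is concave in alpha.
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_likelihood(m1) < log_likelihood(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

# Hypothetical positive-integer sample (not the paper's data).
data = [1] * 60 + [2] * 20 + [3] * 9 + [4] * 5 + [5] * 3 + [8] * 2 + [20]
alpha_hat = fit_alpha(data)
print(round(alpha_hat, 3))
```

A root finder applied to (5.4) would work equally well; the search over the likelihood is used here only because it avoids evaluating ζ′(α).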
The most commonly used goodness-of-fit test is the Kolmogorov-Smirnov (KS) test [27], whose statistic is simply the maximum distance between the cumulative distribution functions (CDFs) of the data and the fitted model:
$$K = \max_{x \ge x_{\min}} \left| F(x) - P(x) \right|, \tag{5.5}$$
where F(x) is the CDF of the data and P(x) is the CDF of the power-law model that best fits the data in the region $x \ge x_{\min}$.
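For integer-valued data, the KS statistic can be computed by accumulating both CDFs over the integers from the lower cutoff to the largest observation, since both are step functions that jump only at integers. The sketch below assumes a discrete power-law model normalized over all integers from the cutoff upward. The helper `hurwitz_zeta`, the sample data, and the value of α are illustrative assumptions.

```python
from collections import Counter

def hurwitz_zeta(s, q=1, n_terms=1000):
    """Sum of k^(-s) over integers k >= q, for s > 1 (direct sum plus tail correction)."""
    head = sum(k ** -s for k in range(q, q + n_terms))
    n = float(q + n_terms)
    return head + n ** (1 - s) / (s - 1) + 0.5 * n ** -s + s * n ** (-s - 1) / 12

def ks_statistic(data, alpha, x_min=1):
    """Maximum of |F(x) - P(x)| over integers x >= x_min."""
    tail = sorted(x for x in data if x >= x_min)
    if not tail:
        raise ValueError("no observations at or above x_min")
    n = len(tail)
    counts = Counter(tail)
    norm = hurwitz_zeta(alpha, x_min)
    empirical = model = 0.0
    k_max = 0.0
    for x in range(x_min, tail[-1] + 1):
        empirical += counts.get(x, 0) / n   # empirical CDF F(x)
        model += x ** -alpha / norm         # model CDF P(x)
        k_max = max(k_max, abs(empirical - model))
    return k_max

# Hypothetical sample and exponent (not the paper's data).
data = [1] * 60 + [2] * 20 + [3] * 9 + [4] * 5 + [5] * 3 + [8] * 2 + [20]
print(round(ks_statistic(data, alpha=2.2), 4))
```

Beyond the largest observation the empirical CDF equals 1 while the model CDF keeps increasing toward 1, so the distance can only shrink there and the loop can stop at the maximum of the data.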
Firstly, we fit our empirical data to the power-law model using the MLE method and calculate the KS statistic for this fit. Secondly, we adopt the KS table, shown in Table 1, to obtain a goodness-of-fit estimate [23]. The KS quantiles in the table were generated from simulations: for each of the logarithmically spaced sample sizes, 10,000 power-law distributions were simulated, with random exponents from 1.5 to 4.0. Thirdly, we calculate the P value, namely, the fraction of simulated data sets whose KS statistic is larger than the value for the empirical data. Conover [28] presented detailed instructions on how to use the KS table to obtain a goodness-of-fit estimate.
Table 1: KS test table for the power-law distribution. The table was created assuming MLE as the estimation method.

| Sample number | Quantile 0.9 | Quantile 0.95 | Quantile 0.99 | Quantile 0.999 |
|---|---|---|---|---|
| 50 | 0.0826 | 0.0979 | 0.1281 | 0.1719 |
| 100 | 0.0580 | 0.0692 | 0.0922 | 0.1164 |
| 500 | 0.0258 | 0.0307 | 0.0412 | 0.0550 |
| 1000 | 0.0186 | 0.0216 | 0.0283 | 0.0358 |
| 2000 | 0.0129 | 0.0151 | 0.0197 | 0.0246 |
| 3000 | 0.0102 | 0.0118 | 0.0155 | 0.0202 |
| 4000 | 0.0102 | 0.0118 | 0.0155 | 0.0202 |
| 5000 | 0.0073 | 0.0086 | 0.0113 | 0.0147 |
| 10000 | 0.0059 | 0.0069 | 0.0089 | 0.0117 |
| 50000 | 0.0025 | 0.0034 | 0.0061 | 0.0077 |
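Reading the KS table amounts to comparing an observed KS statistic with the tabulated quantiles for the appropriate sample number. The sketch below copies the values of Table 1 and reports the highest quantile that the observed statistic exceeds. How to choose a row when the sample number falls between tabulated values is not specified in the text; using the largest tabulated sample number not exceeding n is an assumption made here, as are the names used.

```python
import bisect

QUANTILES = (0.9, 0.95, 0.99, 0.999)
# Critical values copied from Table 1, keyed by sample number.
KS_TABLE = {
    50: (0.0826, 0.0979, 0.1281, 0.1719),
    100: (0.0580, 0.0692, 0.0922, 0.1164),
    500: (0.0258, 0.0307, 0.0412, 0.0550),
    1000: (0.0186, 0.0216, 0.0283, 0.0358),
    2000: (0.0129, 0.0151, 0.0197, 0.0246),
    3000: (0.0102, 0.0118, 0.0155, 0.0202),
    4000: (0.0102, 0.0118, 0.0155, 0.0202),
    5000: (0.0073, 0.0086, 0.0113, 0.0147),
    10000: (0.0059, 0.0069, 0.0089, 0.0117),
    50000: (0.0025, 0.0034, 0.0061, 0.0077),
}

def highest_exceeded_quantile(n, k_stat):
    """Highest tabulated quantile whose critical value is below k_stat, or None."""
    sizes = sorted(KS_TABLE)
    # Assumption: use the largest tabulated sample number <= n (first row if n < 50).
    row = sizes[max(bisect.bisect_right(sizes, n) - 1, 0)]
    exceeded = [q for q, crit in zip(QUANTILES, KS_TABLE[row]) if k_stat > crit]
    return max(exceeded) if exceeded else None

print(highest_exceeded_quantile(1000, 0.0200))
print(highest_exceeded_quantile(1000, 0.0100))
```

A result of `None` means the observed statistic lies below the 0.9 quantile, so more than 10% of the simulated power-law samples produced a larger KS statistic and the power-law hypothesis is not rejected at that level.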
According to the above theory and KS table, we obtain the goodness-of-fit estimates as shown in Table 2.
Table 2: Basic parameters of the data sets described in this paper, along with their power-law fits and the corresponding P values.

| Data set | n | ⟨x⟩ | σ | ζ'(α)/ζ(α) | α | P value |
|---|---|---|---|---|---|---|
| Figure 2(a) | 125,150 | 2147.03 | 2253.87 | 0.1936 | 1.617 | 0.95 |
| Figure 2(b) | 125,150 | 1696.11 | 2489.25 | 0.2061 | 1.621 | 0.95 |
| Figure 4 | 4,138 | 478.05 | 752.74 | 0.3876 | 3.048 | 0.72 |
| Figure 6(a) | 69,440 | 0.24 | 1.55 | 0.1864 | 1.552 | 0.80 |
| Figure 6(b) | 69,440 | 966.94 | 1783.26 | 0.1573 | 1.395 | 0.90 |
| Figure 7 | 1,786,000 | 3284.13 | 22829.74 | 0.2653 | 2.496 | 0.90 |
Table 2 shows that the data sets described in this paper are consistent with a power-law distribution. For the data set shown in Figure 3, however, most values are less than 1, so maximum likelihood estimation cannot be used. Instead, we use nonlinear least squares and find that the data obey a power-law distribution over a certain range.
6. Conclusion
The main goal of this paper is to explore the impact of bursty human activity patterns on the dynamics of popularity. Through theoretical analysis and empirical data, we show that bursty human activity patterns are responsible for the power-law decay of popularity. This conclusion is consistent with previous studies [21, 24, 29–31]. Our statistical results also indicate that the numbers of news items and comments grow exponentially. The strength of forwarding and commenting is characterized by bursts and displays a fat-tailed distribution. To characterize the dynamics of bursty systems, we explore the distribution of the time interval Δt between consecutive comment bursts and find that it also follows a power law. Our measurements also indicate that microblog news has a short lifetime: most comments take place within the first day, and the average lifetime of all news is 5.16 days. The average lifetime may vary across social media, but the decay law of popularity is very likely generic, as it does not depend on the content but is determined mainly by human activity patterns. Indeed, the exponent α, which characterizes the bursty human activity patterns, is connected to the exponent θ of the decay of popularity by the relation θ = α - 1.
Acknowledgments
This work was supported by the Program for New Century Excellent Talents in University (NCET-11-0597) and the Fundamental Research Funds for the Central Universities (2012RC1002).
References
[1] Tapscott D., Williams A. D.
[2] Zhao Z. D., Xia H., Shang M. S., Zhou T. Empirical analysis on the human dynamics of a large-scale short message communication system.
[3] Shang M. S., Chen G. X., Dai S. X., Wang B. H., Zhou T. Interest-driven model for human dynamics.
[4] Han X. P., Zhou T., Wang B. H. Modeling human dynamics with adaptive interest.
[5] Yan Q., Wu L., Yi L. Research on the human dynamics in mobile communities based on social identity.
[6] Yan Q., Yi L., Wu L. Human dynamic model co-driven by interest and social identity in the MicroBlog community.
[7] Weng L., Flammini A., Vespignani A., Menczer F. Competition among memes in a world with limited attention.
[8] Vazquez A. Impact of memory on human dynamics.
[9] http://www.weibo.com
[10] Zhou T., Zhao Z.-D., Yang Z., Zhou C. Relative clock verifies endogenous bursts of human dynamics.
[11] Hong W., Han X. P., Zhou T., Wang B. H. Heavy-tailed statistics in short-message communication.
[12] Zhao Z.-D., Zhou T. Empirical analysis of online human dynamics.
[13] Gonçalves B., Ramasco J. J. Human dynamics revealed through Web analytics.
[14] Radicchi F. Human activity in the web.
[15] Barabási A. L. The origin of bursts and heavy tails in human dynamics.
[16] Oliveira J. G., Barabási A. L. Darwin and Einstein correspondence patterns.
[17] Jo H.-H., Karsai M., Kertesz J., Kaski K. Circadian pattern and burstiness in mobile phone communication.
[18] Candia J., González M. C., Wang P., Schoenharl T., Madey G., Barabási A.-L. Uncovering individual and collective human dynamics from mobile phone records.
[19] Vázquez A., Oliveira J. G., Dezsö Z., Goh K. I., Kondor I., Barabási A. L. Modeling bursts and heavy tails in human dynamics.
[20] Wu Y., Zhou C., Xiao J., Kurths J., Schellnhuber H. J. Evidence for a bimodal distribution in human communication.
[21] Ratkiewicz J., Fortunato S., Flammini A., Menczer F., Vespignani A. Characterizing and modeling the dynamics of online popularity.
[22] Clauset A., Shalizi C. R., Newman M. E. J. Power-law distributions in empirical data.
[23] Goldstein M. L., Morris S. A., Yen G. G. Problems with fitting to the power-law distribution.
[24] Dezsö Z., Almaas E., Lukács A., Rácz B., Szakadát I., Barabási A.-L. Dynamics of information access on the web.
[25] Johnson N. L., Kotz S., Kemp A. W.
[26] Bauke H. Parameter estimation for power-law distributions by maximum likelihood methods.
[27] Press W. H., Teukolsky S. A., Vetterling W. T., Flannery B. P.
[28] Conover W. J.
[29] Szabo G., Huberman B. A. Predicting the popularity of online content.
[30] Crane R., Sornette D. Robust dynamic classes revealed by measuring the response function of a social system.
[31] Salganik M. J., Dodds P. S., Watts D. J. Experimental study of inequality and unpredictability in an artificial cultural market.