Power-Law Properties of Human View and Reply Behavior in Online Society

Statistical properties of human comment behavior are studied using data from “Tianya” and “Tieba”, two very popular online social systems (forums) in China. We find that both the reply number R and the view number V of a thread in a sub-forum obey power-law distributions, $P(R) \propto R^{-\alpha}$ and $P(V) \propto V^{-\alpha}$, respectively, with exponents $\alpha$ that vary between the two quantities and between sub-forums. This indicates that a class of highly popular topics exists. These topics deserve special attention because they play an important role in the formation and control of public opinion. In addition, the relationship between R and V also obeys a power law, $R \propto V^{\gamma}$. Based on human comment habits, a model is introduced to explain the view and reply behaviors in the forum. Numerical simulations of the model agree well with the empirical results. Our findings are helpful for discovering collective patterns of human behavior and for understanding the evolution of public opinion in virtual societies as well as real ones.


Introduction
Statistical properties and models of human behavior have received much attention in different scientific fields, such as sociology, psychology, and economics. However, most existing findings are only qualitative because of the lack of real data on the complexity of human behavior. Human behavior is usually assumed to be a Poisson process [1, 2], which is a kind of Markov process. However, some researchers have found that the interevent time distribution of some human behaviors follows a power law, which means that the underlying process is non-Markov [3-14]. This topic attracts more and more researchers because of its theoretical importance and potential applications.
As an important part of modern life and human dynamics, human behavior on the Internet also attracts more and more attention. Chmiel et al. investigated the flows of visitors migrating between different portal subpages. A model of portal surfing was developed in which a browsing process corresponds to a self-attracting walk on weighted networks with a short memory [15]. Grabowski found that the distribution of human activity has the form of a power law [16]. Based on data from "Tianya", Wu et al. found that the dynamics of human comments in the online society is non-Markov and proposed a model to explain it [17]. All these studies indicate that some kinds of human behavior in on-line systems are non-Markov and share common statistical properties. More and more researchers consider the forum as a virtual society in which to study the properties and evolution of complex friendship networks [18, 19].
Mathematical Problems in Engineering
A forum is very important for the dissemination of information and the spreading of public opinion. Many public opinions are formed and then spread in forums. Analyzing user behavior in a forum is helpful not only for understanding human behavior and enhancing information spreading, but also for designing better websites. Recently in China, reports about public opinion being controlled on purpose have attracted more and more attention. One report claimed that at least half of the public opinions on the Internet were deliberately promoted by companies. It is therefore very important to study human comment behavior in forums. Yu et al. analyzed view and reply data in a forum, which marked the beginning of research on human comment behavior in forums. They found that the view and reply numbers of a thread in a sub-forum follow power laws. However, they mainly considered the statistical properties of the behavior and did not present a model to explain the basic mechanism [20].
In this paper, we consider data collected from "Tianya" and "Tieba", which are very popular on-line social systems in China. These data are different from those in [17]. We show that both the view number V and the reply number R of a thread in a sub-forum obey power-law distributions, which confirms the finding of Yu et al. [20]. The relationship between V and R is also a power law. These results indicate that many topics are important in the formation and evolution of public opinion. Furthermore, based on human habits, a model is proposed to explain these phenomena, and numerical simulations of the model are compared with the observed comment behavior in the forum. We hope this work is useful for understanding complex human behaviors in forums.
This paper is organized as follows. In Section 2, the origin of the data is introduced. The statistical results are presented in Section 3. The model and numerical simulations are presented in Section 4. Finally, our conclusion is given in Section 5.

Description of the Original Data
Our data are obtained from "Tianya" (http://www.tianya.cn) and "Tieba" (http://tieba.baidu.com), which are two of the most popular on-line social systems in China. The data are collected from sub-forums of "Tianya" and "Tieba." Each user is assigned a unique identity name (ID) in the forums. A topic in a sub-forum is called a thread. A thread is the minimal unit, and it can be divided into a root thread and reply threads. A root thread is a new topic, and the reply threads are related to a root one. Users discuss public opinion in both the root and reply threads. As of 2010/02/11, there were 33,296,350 IDs in "Tianya," and about 200,000 IDs on average were on-line at the same time. The topics and opinions in "Tianya" and "Tieba" reflect part of the public opinion of real society in China. Our data sets are collected from the threads in four sub-forums. The topics range from public news to personal stories, which suggests that our results are general across different contents. The format of the data is shown in Table 1, where the first column is the title of a thread, the second gives the name of the author of the root thread, the third shows R and V, and the last is the last update time of the thread.

Statistical Results
In a forum, the numbers of views and replies of a thread reflect the influence of a topic. Moreover, more replies mean more discussion and more communication. These two quantities play an important role in public opinion formation and in web design. Hence, we study the statistical properties of V and R for the threads of each sub-forum. Four sub-forums are randomly selected as our data sets. Their topics and some of their properties are listed in Table 2.
The distributions of V and R in each sub-forum are shown in Figure 1, from which we can clearly see that all the distributions follow power laws, although the threads differ in their contents. The exponents vary between sub-forums. These results show that the process of human commenting is non-Markov, as are the human dynamics of letter and e-mail communication, web browsing, online movie watching, and broker trades. The heavy tail of the distribution allows for many more threads with large V and R than a Poisson process would. A thread with larger V and R has more influence on public opinion. The number of such threads is so large that they cannot be ignored: a large population reads these threads, and their opinions may be influenced by them. Such threads therefore deserve close attention.
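As a minimal sketch of how the exponent of such a distribution could be estimated from per-thread counts, the following Python snippet applies the standard continuous maximum-likelihood estimator $\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{\min})$. This is an illustrative technique and not necessarily the fitting procedure behind Figure 1. The function names, the cutoff `x_min`, and the synthetic sample are assumptions introduced here.

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= x_min."""
    rng = random.Random(seed)
    # Inverse-transform sampling: x = x_min * (1 - u)^(-1 / (alpha - 1)).
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def estimate_exponent(values, x_min=1.0):
    """Maximum-likelihood estimate of alpha using only values >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    # Synthetic "view numbers" with a known exponent (not the forum data).
    views = sample_power_law(50_000, alpha=1.5)
    print(f"estimated exponent: {estimate_exponent(views):.2f}")
```

For real view or reply counts, which are integers with a lower cutoff, the choice of `x_min` affects the estimate and would need to be checked against the data.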
It is well known that the more views a thread has, the more replies it receives. However, the quantitative relationship between V and R, which is a basic property of a thread, is less obvious. Hence, we next focus on the relationship between view and reply behaviors, shown in Figure 2. We find that it appears as a straight line in a log-log plot, which means $R \propto V^{\gamma}$. The fitted slopes satisfy $\gamma < 1$ in all four sub-forums, so this nonlinear relationship means that the reply number increases more slowly than the view number when the view number is large enough. It also indicates that users' interest in replying decreases as V increases.
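A straight line in a log-log plot can be quantified by an ordinary least-squares fit of log R against log V. The sketch below shows one way to obtain the slope $\gamma$. The function name `loglog_slope` and the synthetic values are assumptions for illustration, and the fitting method used for Figure 2 may differ.

```python
import math


def loglog_slope(views, replies):
    """Least-squares fit of log(R) = gamma * log(V) + b; returns (gamma, b)."""
    # Threads with zero views or replies cannot be placed on log axes.
    pts = [(math.log(v), math.log(r)) for v, r in zip(views, replies) if v > 0 and r > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    gamma = sxy / sxx
    return gamma, mean_y - gamma * mean_x


if __name__ == "__main__":
    # Synthetic threads following R = 0.5 * V^0.85 exactly (illustrative values only).
    views = [10, 100, 1_000, 10_000, 100_000]
    replies = [0.5 * v ** 0.85 for v in views]
    gamma, intercept = loglog_slope(views, replies)
    print(f"gamma = {gamma:.2f}")
```

When $\gamma < 1$, the ratio $R/V \propto V^{\gamma - 1}$ falls as V grows, which is the sense in which replies grow more slowly than views.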

The Model and the Simulations
In order to get a better understanding of the empirical observations in Section 3, we propose a model based on our intuitive experience of human comment habits. We observe that the view number of each sub-forum increases more and more quickly as time evolves. There are many threads in each sub-forum, and each thread is viewed according to its content and its previous number of views. Hence, our model is defined by the following scheme.
Step 1 (growing). At time $t = 0$, there are a few threads in the sub-forum, and each thread has a small random V and R. At each step, a new thread is created, and $c \, t^{\theta}$ views are distributed among the old threads. Every old thread has some probability of being viewed.
Step 2 (view habit). The probability that an old thread $i$ is viewed at each step is determined by its attraction, $\Pi_i = A_i(t) / \sum_j A_j(t)$, where $A_i(t)$ is the attraction of thread $i$ at time $t$. The attraction reflects the previous view number $V_i(t)$, that is, $A_i(t) = A_0 + V_i(t)$. Here $A_0$ represents the initial attraction, which differs between topics.
Step 3 (reply habit). At each step, when a user views a thread, he or she replies to it with probability $P_i = L \, (R_i / V_i)^{\eta}$. Mathematically, the model is similar to the growing networks in [21]. Based on the analysis of growing networks in [21], we obtain that the distribution of $V_i$ is a power law, that is, $P(V_i) \propto V_i^{-\alpha}$ at a large enough time $t$, where the exponent is $\alpha = 1 + 1/(1 + \theta)$.
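The three steps can be sketched as a short simulation. This is a hedged illustration, not the authors' code: it assumes the attraction is $A_i = A_0 + V_i$ and reads the reply probability as $L (R_i / V_i)^{\eta}$. The values of `c` and `A0`, the three initial threads, the choice that a new thread starts with V = R = 1, the rounding of $c\,t^{\theta}$ to an integer, and the freezing of attractions within a step are all assumptions made here. The function name `simulate_forum` is hypothetical.

```python
import random
from itertools import accumulate


def simulate_forum(steps=2000, theta=0.9, L=0.1, eta=0.5, c=1.0, A0=1.0, seed=0):
    """Return per-thread view and reply counts after `steps` growth steps."""
    rng = random.Random(seed)
    # Assumed initial state: a few threads with small random V and R (R <= V).
    views = [rng.randint(2, 5) for _ in range(3)]
    replies = [rng.randint(1, v) for v in views]
    for t in range(1, steps + 1):
        # Step 2: attraction A_i = A0 + V_i, frozen for the duration of this step.
        cum = list(accumulate(A0 + v for v in views))
        n_views = max(1, round(c * t ** theta))
        targets = rng.choices(range(len(views)), cum_weights=cum, k=n_views)
        for i in targets:
            views[i] += 1
            # Step 3: reply with probability L * (R_i / V_i)^eta.
            if rng.random() < L * (replies[i] / views[i]) ** eta:
                replies[i] += 1
        # Step 1: one new thread per step (assumed to start with V = R = 1).
        views.append(1)
        replies.append(1)
    return views, replies


if __name__ == "__main__":
    v, r = simulate_forum(steps=500)
    print(f"threads: {len(v)}, max views: {max(v)}, max replies: {max(r)}")
```

The returned lists can be binned on logarithmic axes to inspect the distributions of V and R. No particular exponent is claimed for this sketch, since the result depends on the assumed initial conditions and run length.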
To compare our model with the empirical observations, let us take sub-forum C in our data sets as an example. We use the parameters $\theta = 0.9$, $L = 0.1$, and $\eta = 0.5$ in the simulation. The results are shown in Figure 3. Figure 3(a) shows that the distribution of views is indeed a power law with an exponent similar to that of the data. Figure 3(b) shows that the reply number also obeys a power-law distribution. The nonlinear relationship between the view and reply numbers is shown in Figure 3(c) and is similar to that from the data. In Figure 3(d), we further study the relationship between the parameter $\eta$ and the slope $\gamma$, and we see that $\gamma$ decreases as $\eta$ increases. From the analyses above, we can see that the proposed model describes well the most important features of human view and reply behaviors in online social systems.
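As a quick consistency check, assuming the analytic exponent of the view distribution is $\alpha = 1 + 1/(1 + \theta)$ as stated in the model description, substituting the simulation value $\theta = 0.9$ gives:

```latex
\alpha = 1 + \frac{1}{1 + \theta} = 1 + \frac{1}{1.9} \approx 1.526
```

This lies within the stated uncertainty of the simulated slope $\alpha = 1.51 \pm 0.02$ reported for Figure 3(a).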

Conclusion
In this paper, we analyze the statistical properties of view and reply behaviors in on-line social systems. We find that they are different types of interactive human dynamics, both of which are non-Markov. The view and reply numbers follow power-law distributions, and the relationship between them is also a power law. A model based on the attraction of each thread is introduced to explain this complex human behavior. Numerical simulations of the model agree well with the empirical results. Our work is useful for understanding complex human behavior in real society, for example, human discussion behavior in a meeting or group communications in trunked mobile telephony [22]. We expect that a quantitative understanding of human view and reply behaviors, when combined with additional content analyses, will open a new perspective on distinguishing fraudulent public opinions from genuine ones.

Figure 2: The power-law relationship between V and R for (a) sub-forum A, slope $\gamma = 0.77$; (b) sub-forum B, slope $\gamma = 0.89$; (c) sub-forum C, slope $\gamma = 0.85$; (d) sub-forum D, slope $\gamma = 0.90$.

Figure 3: Simulation results of the model with parameters $\theta = 0.9$, $L = 0.1$, $\eta = 0.5$, where the dashed line shows the slope of the fitting function. (a) The distribution of V of a thread; the slope of the fitting function is $\alpha = 1.51 \pm 0.02$. (b) The distribution of R of a thread; the slope of the fitting function is $\alpha = 1.41 \pm 0.04$. (c) The relationship between V and R; the slope of the fitting function is $\gamma = 0.9 \pm 0.03$. (d) The relationship between the slope $\gamma$ and the parameter $\eta$.

Table 1: Detailed format of a sub-forum.

Table 2: Detailed information about four randomly selected sub-forums.