Modeling Random Forwarding Actions for Information Diffusion over Mobile Social Networks

Modeling information diffusion over social networks has attracted considerable attention from both academia and industry. Based on the universal generating function (UGF) method and discrete stress-strength interference (DSSI) theory, a novel method is proposed to model users' random forwarding actions and to extract the most susceptible users. The effect of a user on information diffusion is quantified as node susceptibility (NS), defined as the probability that the quantity of information (messages) the user forwards is larger than the quantity the user receives. The model addresses three questions: which users are most susceptible, which types of information they are most susceptible to, and when they are most susceptible. The answers to these questions are helpful to practitioners. A case study illustrates the feasibility and practicality of the proposed model.


Introduction
With the development of the Internet and mobile information technologies, social networks have gradually become some of the most popular applications [1,2]. A mobile social network connects organizations or individuals who use smartphones to share information through social networking applications such as MySpace, Facebook, and scientific collaboration networks [3,4]. Mobile social networks are present in virtual communities and work with web-based social networks to spread content, increase accessibility, and connect users wherever they are [5]. One distinguishing feature of social networks is that they enable people to share information simultaneously with a large number of peers [6]. Owing to its diverse implications, research on social networking from various perspectives has received remarkable attention. In the area of social networks, modeling and analyzing network structures [7], dynamic evolution [8], and the characteristics of information diffusion [9] are active topics.
More recently, the number of smartphones has increased rapidly, and social networks are experiencing explosive growth, not only in the number of communities but also in the overall population [10]. For example, Facebook had over 1.18 billion monthly active users as of August 2015 [11]. Every day, users share and exchange information by means of "word-of-mouth" communications in such large-scale social networks. Under such circumstances, analyzing in depth and quantifying the random actions of users (whether or not to forward the information) in information diffusion over social networks has become increasingly important [12]. For each user (a human in a social network or a website on the Internet), whether to forward the information depends on several uncertain factors, including the interestingness of the information, the reliability of the information, egoistic motivation, and altruistic motivation [13]. Whatever the motivation, forwarding actions indicate that the user is susceptible to the information, and the higher the forwarding frequency, the more susceptible the user. In this sense, the study of random forwarding actions can help politicians and enterprises identify the susceptible users and thereby achieve the most effective advocacy or advertisement. On the other hand, the global diffusion of detrimental information (e.g., computer viruses and rumors) causes great damage to society. It is of great importance to identify the susceptible users and quarantine them in a timely manner [14]. Therefore, from the security point of view, the study of random forwarding actions can help prevent the diffusion of detrimental information.
The rapid development of communication and information technologies enables us to access, collect, and store real-world big data on information diffusion, making the related research meaningful and versatile and, at the same time, more challenging. As early as 2001, Pastor-Satorras and Vespignani [15] studied epidemic and computer virus spreading over networks, indicating that the study of information diffusion originates from the study of epidemic and computer virus diffusion. One of the earliest and most prominent studies on information diffusion is [16], which analyzed the dynamics of information dissemination through blogspace from two points of view: macroscopic and microscopic. Subsequently, numerous related works have addressed information diffusion over social networks. Here, we summarize the most representative works that are relevant to our study. The existing related works can be broadly classified into two categories. The first category focuses on analyzing the effect of network structure on information diffusion [17-21]. In [17], the authors studied the scaling law of a few large networks and showed that the vertex connectivity obeys a scale-free power-law distribution. Donetti et al. [18] reported that scale-free structures may be generated by optimal design of network mechanisms. The work in [19] reported that a scale-free network can optimize network performance. Recently, how the network structure of a microblog influences information diffusion was studied in [20]. By studying the followers' topology, the authors presented an invariant characteristic: the users' follower count obeys a power-law distribution with exponent near 2. In [21], the authors comparatively explored the network structure, geographic distribution of users, and interaction patterns in social networks. Based on the study, the authors suggested that information can be organized by a few central users bridging small communities.
The second category focuses on analyzing the effect of network nodes (users) on information diffusion using different mathematical models [22-28]. Kimura et al. [22] considered the optimization problem of extracting the most influential nodes over a social network. Later, Yu et al. [23] proposed a community-based greedy algorithm to mine the top-K influential nodes over mobile social networks. By identifying the important information nodes, Ilyas et al. [24] studied how to restrain the diffusion of private information. As social networks (e.g., Twitter and Facebook) became ubiquitous, the global effect of a node on the diffusion rate on Twitter was studied in [25,26]. More recently, Saito et al. [27] proposed an efficient method to find a new kind of influential node (supermediators) over a social network and characterized the properties of supermediators. From another perspective, Belák et al. [28] studied the effect of hidden nodes on information diffusion and characterized information cascades.
The study presented in this paper focuses on modeling nodes' random forwarding actions and analyzing the effect of these actions on information diffusion; that is, our study falls into the second category. Although the literature cited above provides systematic approaches and useful tools for analyzing the effect of network nodes on information diffusion, most of these works neglect the dynamic and random characteristics of the problem. Given the complexity and uncertainty of social networks, it is difficult for nodes to maintain the same effect on information diffusion during different periods. In addition, most of these works focus on extracting the most influential nodes, while mining the most susceptible nodes has not been well investigated. In fact, quantifying nodes' random forwarding actions and finding the most susceptible nodes also play an important role in information diffusion. On the one hand, from the academic research point of view, information diffusion in essence relies on nodes forwarding the information that they receive. However, whether to forward the information is uncertain and depends on many factors. Therefore, quantifying nodes' random forwarding actions can help to analyze objectively and rationally which nodes are susceptible to the information and when they are most susceptible. On the other hand, from the practical point of view, it is easier for politicians and enterprises to obtain advocacy or purchases from the susceptible nodes than from the influential nodes. Therefore, identifying the most susceptible nodes can help politicians and enterprises achieve the most effective advocacy or advertisement. This work is motivated by the challenges of quantifying nodes' random forwarding actions and finding the most susceptible nodes while emphasizing the dynamic characteristics of the problem. The study aims to address three key questions: (1) which nodes are most susceptible, (2) which types of information they are most susceptible to, and (3) when they are most susceptible. To this end, a novel and efficient model for analyzing the effect of nodes on information diffusion is proposed based on the universal generating function (UGF) method and discrete stress-strength interference (DSSI) theory. Stress-strength interference models have been widely used in component reliability analysis, but, to the best of our knowledge, this is the first time that a stress-strength interference model has been applied to information diffusion analysis. In our model, the effect of a node is quantified as node susceptibility (NS), which depends on two random variables: the quantity of information (messages) that the node receives and the quantity of information (messages) that the node forwards. NS is defined as the probability that the latter (strength) is larger than the former (stress). Based on NS, the proposed model can help decision-makers dynamically identify which nodes are most susceptible to the corresponding information at different periods of time. The innovations and practical significance of this paper are as follows.
(i) Approach Innovations. To model random forwarding actions over mobile social networks, the DSSI model is applied to information diffusion analysis for the first time. Unlike the continuous stress-strength interference (CSSI) model, the DSSI model can calculate system reliability (NS in this paper) from observations of stress and strength when the distributions of stress and strength are unavailable. Moreover, since the stress and strength in this paper are discrete random variables, the UGF method is used to represent their probability mass functions for the calculation of NS. In this sense, the calculation of NS is based on actual observation data rather than on decision-makers' subjective judgments; therefore, the decision results are objective and are updated as the observation data are updated.
(ii) Practical Significance. For decision-makers (practitioners), the modeling and decision process is easy to implement, since the calculation of NS is based on observations of random variables that can be obtained directly from the database. In conventional approaches, decision-makers need specialized knowledge to filter appropriate criteria from a large set of criteria and to specify the weights of the criteria for optimization decisions; here, they only need to record the observations of the relevant variables, which simplifies the decision process.
The rest of this paper is organized as follows. Section 2 describes the theoretical background of the proposed model. The model formulation is presented in Section 3. In Section 4, a case study is presented to illustrate the feasibility and efficiency of the proposed model. The paper ends with conclusions in Section 5.

Theoretical Background
Before describing the mathematical model, we introduce some definitions and notations related to the universal generating function (UGF) method and the discrete stress-strength interference (DSSI) model. They will be used in Section 3.

Brief Description of UGF Method.
We emphasize the basic concept rather than the underlying mathematics of the UGF method. Ushakov [29] first introduced the concept of the UGF. Then, Lisnianski and Levitin [30] and Levitin [31] applied the UGF method to reliability analysis and optimization of multistate systems.
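Definition 1 (UGF of discrete random variable). Let $X$ be a discrete random variable (r.v.) with probability mass function (p.m.f.) $P(X = x_i) = p_i$, $i = 1, 2, \ldots, k$. The UGF of $X$ is defined as a polynomial function of variable $z$, $u_X(z)$, and

$$u_X(z) = \sum_{i=1}^{k} p_i z^{x_i}.$$

It should be mentioned that there exists a one-to-one correspondence between the p.m.f. and the UGF of a discrete r.v. This means that, for an arbitrary discrete r.v., its UGF is uniquely determined by its p.m.f.

The UGF of a function $f(X_1, X_2, \ldots, X_n)$ of discrete r.v.s is obtained by applying the composition operator $\otimes_f$ to the individual UGFs:

$$u_f(z) = \otimes_f\left(u_{X_1}(z), u_{X_2}(z), \ldots, u_{X_n}(z)\right).$$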
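As an illustrative aside, the UGF of a discrete r.v. can be stored as a mapping from exponents (possible values) to coefficients (probabilities), and the composition operator then reduces to enumerating all combinations of values. The minimal Python sketch below assumes that the r.v.s are mutually independent, so that joint probabilities are products of the individual ones. The function names `ugf_from_pmf` and `compose` and all numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import product


def ugf_from_pmf(values, probs):
    """Store the UGF sum_i p_i * z**x_i as a dict {exponent x_i: coefficient p_i}."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    u = defaultdict(float)
    for x, p in zip(values, probs):
        u[x] += p  # collect like terms (repeated values)
    return dict(u)


def compose(f, *ugfs):
    """Composition operator: UGF of f(X_1, ..., X_n), assuming independent X_i."""
    out = defaultdict(float)
    for terms in product(*(u.items() for u in ugfs)):
        values = [x for x, _ in terms]
        prob = 1.0
        for _, p in terms:
            prob *= p  # independence: the joint probability is a product
        out[f(*values)] += prob  # collect like terms in z
    return dict(out)


# Hypothetical example: UGF of X_1 + X_2.
u1 = ugf_from_pmf([1, 2], [0.5, 0.5])
u2 = ugf_from_pmf([1, 3], [0.25, 0.75])
u_sum = compose(lambda a, b: a + b, u1, u2)
```

Because like terms are collected, the resulting dictionary is itself a p.m.f., which mirrors the one-to-one correspondence between a p.m.f. and its UGF.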

Discrete Stress-Strength Interference (DSSI) Model.
The stress-strength interference model [32] has been widely used for reliability analysis of components, where a "component" is not necessarily a raw good or part but can be an entire system. Stress-strength analysis is an efficient tool in reliability engineering.
Definition 4 (component reliability). Let $X_1$ and $X_2$ denote the stress on a component and the strength of the component, respectively; then, the component reliability, denoted by $R$, is defined as

$$R = P(X_2 > X_1). \tag{5}$$

Equation (5) is the most basic expression of the stress-strength interference model: the component reliability is the probability that the strength is larger than the stress.
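If the stress $X_1$ and the strength $X_2$ are treated as continuous r.v.s and their probability density functions are denoted by $f_1(x_1)$ and $f_2(x_2)$, respectively, (5) can be rewritten as

$$R = \int_{-\infty}^{+\infty} f_1(x_1) \left[ \int_{x_1}^{+\infty} f_2(x_2)\, dx_2 \right] dx_1 \tag{6a}$$

or

$$R = \int_{-\infty}^{+\infty} f_2(x_2) \left[ \int_{-\infty}^{x_2} f_1(x_1)\, dx_1 \right] dx_2. \tag{6b}$$

Figure 1: Component reliability as overlap of stress and strength (vertical axis: probability density function).

Figure 1 visualizes the component reliability, which is determined by the region where the tails of the two curves interfere, or overlap, with each other. For the sake of clarity, (6a) and (6b) are called the continuous stress-strength interference (CSSI) model.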
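As a worked illustration of the continuous case, consider an added assumption that the CSSI model itself does not require: the stress $X_1$ and the strength $X_2$ are independent and normally distributed, with means $\mu_1$, $\mu_2$ and standard deviations $\sigma_1$, $\sigma_2$. The difference $X_2 - X_1$ is then normal with mean $\mu_2 - \mu_1$ and variance $\sigma_1^2 + \sigma_2^2$, so the reliability $R = P(X_2 > X_1)$ has a closed form in terms of the standard normal distribution function $\Phi$:

$$R = P(X_2 - X_1 > 0) = \Phi\left( \frac{\mu_2 - \mu_1}{\sqrt{\sigma_1^2 + \sigma_2^2}} \right).$$

With the hypothetical values $\mu_1 = 40$, $\sigma_1 = 3$, $\mu_2 = 50$, and $\sigma_2 = 4$, this gives

$$R = \Phi\left( \frac{50 - 40}{\sqrt{3^2 + 4^2}} \right) = \Phi(2) \approx 0.9772.$$

When the distributions are not available in such a form, this closed-form route is unavailable, which is the situation the discrete model is meant to handle.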
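If the stress $X_1$ and the strength $X_2$ are two discrete r.v.s with the p.m.f.s

$$P(X_1 = x_{1i}) = p_{1i}, \quad i = 1, 2, \ldots, k_1, \qquad P(X_2 = x_{2j}) = p_{2j}, \quad j = 1, 2, \ldots, k_2, \tag{7}$$

where $k_1$ and $k_2$ are, respectively, the numbers of possible values that $X_1$ and $X_2$ can take on, then, according to Definition 1, the UGFs of $X_1$ and $X_2$ are

$$u_{X_1}(z) = \sum_{i=1}^{k_1} p_{1i} z^{x_{1i}}, \qquad u_{X_2}(z) = \sum_{j=1}^{k_2} p_{2j} z^{x_{2j}}. \tag{8}$$

If $f(X_1, X_2)$ is a function of $X_1$ and $X_2$, then, based on the UGF method introduced above, the UGF of $f(X_1, X_2)$ is

$$u_f(z) = \otimes_f\left(u_{X_1}(z), u_{X_2}(z)\right) = \sum_{i=1}^{k_1} \sum_{j=1}^{k_2} p_{1i}\, p_{2j}\, z^{f(x_{1i}, x_{2j})} = \sum_{k=1}^{K} p_k z^{f_k}, \tag{9}$$

where $f_k$ and $p_k$ ($k = 1, 2, \ldots, K$) are the possible values of the function $f(X_1, X_2)$ and the corresponding probabilities, respectively, and $K \le k_1 \times k_2$.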
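Definition 5 (discrete stress-strength interference (DSSI) model). If $f(X_1, X_2) = X_2 - X_1$, where $X_1$ is the stress and $X_2$ is the strength, the component reliability can be calculated as

$$R = P(X_2 - X_1 > 0) = \sum_{k=1}^{K} p_k\, \alpha(f_k). \tag{10}$$

Equation (10) is called the DSSI model, where $f_k$ and $p_k$ ($k = 1, 2, \ldots, K$) are the possible values of $f(X_1, X_2)$ and their probabilities, and $\alpha(f_k)$ is a binary-valued function with domain on the set of possible values of the function $f(X_1, X_2)$:

$$\alpha(f_k) = \begin{cases} 1, & f_k > 0, \\ 0, & f_k \le 0. \end{cases} \tag{11}$$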
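The DSSI calculation can be illustrated with a short Python sketch. It sums the probabilities of all stress and strength value pairs whose difference (strength minus stress) is positive, which is what the binary-valued indicator in the DSSI model selects. The sketch assumes that stress and strength are independent; the function name `dssi_reliability` and the two p.m.f.s are hypothetical.

```python
def dssi_reliability(stress_pmf, strength_pmf):
    """Reliability P(strength > stress) for two independent discrete r.v.s.

    Each p.m.f. is a dict {possible value: probability}.
    """
    reliability = 0.0
    for x1, p1 in stress_pmf.items():
        for x2, p2 in strength_pmf.items():
            f_k = x2 - x1          # value of f(X_1, X_2) = X_2 - X_1
            if f_k > 0:            # indicator equals 1 only when f_k > 0
                reliability += p1 * p2
    return reliability


# Hypothetical p.m.f.s of stress and strength.
stress = {10: 0.2, 20: 0.5, 30: 0.3}
strength = {15: 0.1, 25: 0.6, 35: 0.3}
R = dssi_reliability(stress, strength)
```

For these hypothetical inputs, the favorable pairs contribute 0.1 × 0.2 + 0.6 × 0.7 + 0.3 × 1.0 = 0.74.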

Model Formulation
In this section, a mathematical model is formulated for random forwarding actions in information diffusion over social networks. First, the model description and the notations used to develop the model are presented. Then, based on the DSSI model introduced above, the effect (forwarding actions) of a node is quantified as node susceptibility (NS). Finally, the most susceptible node is identified.
For a piece of information, the user randomly forwards it or not. Whether to forward the information depends on several uncertain factors, including the interestingness of the information, the reliability of the information, egoistic motivation, and altruistic motivation. Whatever the motivation, forwarding actions indicate that the user is susceptible to the information, and the higher the frequency, the more susceptible the user. The decision problems addressed in this paper are as follows: (1) which users are most susceptible, (2) which types of information they are most susceptible to, and (3) when they are most susceptible.

Model Description and Notations.
For the sake of clarity of model description and development, the notations used to develop the model are listed in the Notations section.

Quantifying Forwarding Actions Based on DSSI Model.
In this subsection, the definition of node (user) susceptibility (NS) is given first. Then, the calculation steps of NS are presented.

Definition of Node Susceptibility (NS).
As previously analyzed, forwarding actions indicate that the user is susceptible to the information, and the higher the frequency, the more susceptible the user. To model the forwarding actions, a novel and universal criterion for the effect of a user on information diffusion is introduced, namely, node susceptibility (NS). NS is based on the DSSI model and the UGF method introduced in Section 2. The DSSI model considers two main random variables (r.v.s): a stress, which is any load applied to a component, and a strength, which is the maximum load that the component can withstand without failing. To develop the model, this paper treats user $v_i$ as the component; the random quantity of information $I_j$ that $v_i$ receives at period $T_t$, denoted by $X_{ijt}$, as the stress; and the random quantity of $I_j$ that $v_i$ forwards at $T_t$, denoted by $Y_{ijt}$, as the strength. For the sake of clarity, we give the following definition of NS.

Figure 2: Modeling period and time node (labels: period, time interval, time node).
Definition 6 (node susceptibility (NS)). Suppose that the quantity of information $I_j$ that user $v_i$ receives at period $T_t$, $X_{ijt}$, and the quantity of $I_j$ that $v_i$ forwards at $T_t$, $Y_{ijt}$, are r.v.s. The NS of $v_i$ in regard to $I_j$ at $T_t$, denoted by $\mathrm{NS}_{ijt}$, is the probability that $Y_{ijt}$ is larger than $X_{ijt}$. As a result, $\mathrm{NS}_{ijt}$ is given by

$$\mathrm{NS}_{ijt} = P(Y_{ijt} > X_{ijt}). \tag{12}$$

The following should be noted about the definition of NS:
(1) The statistical character of $X_{ijt}$ is based on a group of observations, that is, the observation parameters $x_{ijtl}$. Similarly, the statistical character of $Y_{ijt}$ is based on a group of observations, that is, the observation parameters $y_{ijtl}$. This means that the p.m.f.s of $X_{ijt}$ and $Y_{ijt}$ can be obtained from their respective observations.
(2) Since the observations are objective, NS does not depend on the subjective judgments of the decision-makers. In addition, the more observations are available, the more accurate the evaluation becomes.
(3) According to the definition of NS, random forwarding actions are quantified as a probability, which expresses the degree of a user's susceptibility to the corresponding information at the corresponding period. Therefore, based on NS, decision-makers can identify which users are most susceptible to the corresponding information at different periods.

Calculation Steps of NS.
According to UGF method and DSSI model introduced previously, calculation steps of NS are given as follows.
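Step 1 (deriving $X_{ijt}$'s p.m.f. and $Y_{ijt}$'s p.m.f.). Suppose that the observations of the received quantity $X_{ijt}$ are $x_1, x_2, \ldots, x_{n_t}$ and the observations of the forwarded quantity $Y_{ijt}$ are $y_1, y_2, \ldots, y_{n_t}$. The two groups of observations can be described by histograms, as shown in Figure 3, from which the class intervals of the observations and their corresponding relative frequencies are obtained.
To obtain $X_{ijt}$'s p.m.f., the midpoint value of each class interval is treated as a possible value of $X_{ijt}$, and the relative frequency of that class interval is treated as the corresponding probability:

$$P(X_{ijt} = \bar{x}_a) = p^{X}_a, \quad a = 1, 2, \ldots, k_X, \tag{13}$$

where $\bar{x}_a$ and $p^{X}_a$ are the midpoint and the relative frequency of the $a$th class interval of $X_{ijt}$'s histogram, and $k_X$ is the number of class intervals. Similarly, $Y_{ijt}$'s p.m.f. is

$$P(Y_{ijt} = \bar{y}_b) = p^{Y}_b, \quad b = 1, 2, \ldots, k_Y, \tag{14}$$

where $\bar{y}_b$ and $p^{Y}_b$ are the midpoint and the relative frequency of the $b$th class interval of $Y_{ijt}$'s histogram, and $k_Y$ is the number of class intervals.
Step 2 (deriving $X_{ijt}$'s UGF, $Y_{ijt}$'s UGF, and $f(X_{ijt}, Y_{ijt})$'s UGF). According to Definition 1, the UGFs of $X_{ijt}$ and $Y_{ijt}$ are

$$u_{X_{ijt}}(z) = \sum_{a=1}^{k_X} p^{X}_a z^{\bar{x}_a}, \qquad u_{Y_{ijt}}(z) = \sum_{b=1}^{k_Y} p^{Y}_b z^{\bar{y}_b}.$$

Because $f(X_{ijt}, Y_{ijt})$ is a function of $X_{ijt}$ and $Y_{ijt}$, based on Definitions 2 and 3, the UGF of $f(X_{ijt}, Y_{ijt})$ is

$$u_f(z) = \sum_{a=1}^{k_X} \sum_{b=1}^{k_Y} p^{X}_a\, p^{Y}_b\, z^{f(\bar{x}_a, \bar{y}_b)} = \sum_{k=1}^{K} p_k z^{f_k},$$

where $f_k$ and $p_k$ ($k = 1, 2, \ldots, K$) are the possible values of the function $f(X_{ijt}, Y_{ijt})$ and the corresponding probabilities, respectively, and $K \le k_X \times k_Y$.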
Step 3 (calculating $\mathrm{NS}_{ijt}$ based on the DSSI model). Let $X_{ijt}$ and $Y_{ijt}$ denote the received and forwarded quantities, and suppose that $f(X_{ijt}, Y_{ijt}) = Y_{ijt} - X_{ijt}$; then, according to Definition 5, $\mathrm{NS}_{ijt}$ can be calculated as

$$\mathrm{NS}_{ijt} = P(Y_{ijt} - X_{ijt} > 0) = \sum_{k=1}^{K} p_k\, \alpha(f_k),$$

where $f_k$ and $p_k$ are the possible values of $f(X_{ijt}, Y_{ijt})$ and their probabilities, and $\alpha(f_k)$ is a binary-valued function with domain on the set of possible values of the function $f(X_{ijt}, Y_{ijt})$:

$$\alpha(f_k) = \begin{cases} 1, & f_k > 0, \\ 0, & f_k \le 0. \end{cases}$$

Equations (19a1), (19a2), and (19aN) show, for each period, which user is most susceptible to information $I_j$. For example, at period $T_1$, user $v_{i_1}$ is most susceptible to information $I_j$, and at period $T_N$, user $v_{i_N}$ is most susceptible to information $I_j$. Therefore, based on each user's NS, three main questions are answered: (1) which user is most susceptible, (2) which type of information the user is most susceptible to, and (3) when the user is most susceptible. In real-life decisions, more susceptible users can be extracted as needed. To this end, decision-makers (politicians or enterprises) only need to rank the NS values, set thresholds to mine the top-ranked ones, and then extract the corresponding users. To achieve the most effective advocacy or advertisement, politicians and enterprises can post the corresponding information to these users at the corresponding period.
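The three calculation steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the implementation used in the case study: class intervals are taken to be equal-width bins between the smallest and largest observation, the default of seven intervals is only an example setting, the received and forwarded quantities are treated as independent, and the names `pmf_from_observations` and `node_susceptibility` as well as the observation lists are hypothetical.

```python
def pmf_from_observations(obs, n_bins=7):
    """Step 1: histogram the observations and return {bin midpoint: relative frequency}."""
    lo, hi = min(obs), max(obs)
    if lo == hi:                      # degenerate case: a single possible value
        return {float(lo): 1.0}
    width = (hi - lo) / n_bins        # equal-width class intervals (assumption)
    counts = [0] * n_bins
    for v in obs:
        k = min(int((v - lo) / width), n_bins - 1)  # the maximum falls in the last bin
        counts[k] += 1
    n = len(obs)
    return {lo + (k + 0.5) * width: c / n for k, c in enumerate(counts) if c > 0}


def node_susceptibility(received_obs, forwarded_obs, n_bins=7):
    """Steps 2 and 3: combine the two p.m.f.s and sum the terms with forwarded > received."""
    stress = pmf_from_observations(received_obs, n_bins)     # received quantity
    strength = pmf_from_observations(forwarded_obs, n_bins)  # forwarded quantity
    return sum(
        p * q
        for x, p in stress.items()
        for y, q in strength.items()
        if y - x > 0                  # indicator: keep only positive differences
    )


# Hypothetical observations for one user, one information type, one period.
received = [3, 5, 4, 6, 2, 5, 4, 3, 6, 5]
forwarded = [2, 4, 6, 7, 3, 5, 8, 4, 6, 5]
ns_example = node_susceptibility(received, forwarded)
```

Because the p.m.f.s are rebuilt from whatever observations are passed in, rerunning the calculation on updated observations updates the NS value, which is the dynamic behavior the model relies on.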

Case Study
This section aims to illustrate the feasibility and practicality of the proposed model through its application to a test case.

Case Description.
The case study was motivated by the problem of extracting appropriate users for advertisement over a social network, Meituan. Meituan is a Chinese group-buying website for locally found consumer products and retail services; it sells vouchers from merchants for deals, subject to a minimum number of buyers who demand a discount. Meituan generates most of its revenue from mobile application services, and it has partnering agreements with 400 thousand Chinese local businesses. In 2014, Meituan accounted for 60% of the market share of deal-of-the-day group-buying websites in China, and in 2015 it had 200 million users [36]. One of the goals of Meituan is to find the most appropriate consumers for merchants and to provide the most efficient Internet promotion [37]. To this end, the model proposed in this paper is applied to address the problem. Based on the model, decision-makers can extract appropriate consumers for different advertisements (e.g., food, entertainment, and shopping) at different periods and then develop the most efficient Internet promotion strategy.
Without loss of generality, in this case study, six candidate users ($v_1, \ldots, v_6$) are under consideration, and three types of advertisement information ($I_1$, $I_2$, $I_3$) will be posted to them. The observation of random forwarding actions covers four periods ($T_1, \ldots, T_4$), and each period contains thirty time nodes ($n_t = 30$, $t = 1, 2, 3, 4$). The objective is to determine which two users should be extracted and which type of advertisement information should be posted to these users at the corresponding period. It should be noted that the size of the candidate set of users in the case study is much smaller than the actual number of users. This setting is based on two considerations. On the one hand, the main purpose of the case study is to demonstrate the application of the proposed model, and a low-dimensional parameter setting helps to demonstrate the calculation process clearly. On the other hand, although the dimensions of the parameter settings affect the computational complexity of the proposed model, the effect is small, because many data processing tools are available for massive data in a big data environment. In essence, the key step in the calculation of NS is Step 1, deriving the p.m.f.s of the received and forwarded quantities, where the class intervals of the observations and their corresponding relative frequencies are obtained from the histograms of the observations. In real life, when the number of observations is massive, the histograms can be obtained directly by using the Statistical Package for the Social Sciences (SPSS, a widely used program for statistical analysis in social science) (see the Appendix). In this sense, implementing the model in practice is feasible.
For the sake of clarity, the observation parameters of the received quantity, $x_{ijtl}$, and of the forwarded quantity, $y_{ijtl}$, are listed in Tables 1 and 2, respectively.

Results and Analysis.
Based on the calculation steps of NS introduced in Section 3, we can obtain each user's NS in regard to the corresponding advertisement information at the corresponding period. As an example of the calculation steps, we give the calculation of $\mathrm{NS}_{123}$ as follows.
According to the observation parameters in Tables 1 and 2, we describe the two groups of data (data in bold font) by histograms, as shown in Figure 4.
Table 3 and Figure 5 show which two users should be extracted and which type of advertisement information should be posted to them at the corresponding period. For example, at the first period, users $v_3$ and $v_4$ are most susceptible to information $I_1$, while users $v_2$ and $v_6$ are most susceptible to information $I_2$. At the second and third periods, users $v_2$ and $v_4$ are most susceptible to information $I_1$. Users $v_3$ and $v_4$ are most susceptible to information $I_2$ at the third and fourth periods. These results indicate that, on the one hand, different users are susceptible to different types of information at the same period and, on the other hand, the same user is susceptible to different types of information at different periods. On the whole, users $v_2$, … more attractive. To analyze this conclusion quantitatively, we calculate each user's average NS and each type of information's average NS, shown in Figures 6 and 7, respectively.
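The extraction of the two most susceptible users for a given type of information and period amounts to a ranking of NS values. The sketch below illustrates this step; the function name `top_users`, the dictionary layout keyed by (user, information type, period), and all NS values are hypothetical and are not taken from Table 3.

```python
def top_users(ns, info, period, k=2):
    """Return the k users with the largest NS for one information type and period.

    ns maps (user, info, period) -> NS value; ties are broken by user index.
    """
    candidates = [
        (value, user)
        for (user, j, t), value in ns.items()
        if j == info and t == period
    ]
    candidates.sort(key=lambda c: (-c[0], c[1]))  # descending NS, then ascending user
    return [user for _, user in candidates[:k]]


# Hypothetical NS values for four users, information type 1, period 1.
ns_values = {
    (1, 1, 1): 0.21,
    (2, 1, 1): 0.35,
    (3, 1, 1): 0.62,
    (4, 1, 1): 0.58,
    (1, 2, 1): 0.90,  # different information type, ignored for info=1
}
selected = top_users(ns_values, info=1, period=1, k=2)
```

A threshold on the NS value could be used instead of a fixed `k` when the number of users to extract is not known in advance.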
It can be seen from Figure 6 that, in regard to information $I_1$, $v_4$ is the most susceptible user, followed by $v_1$. In regard to information $I_2$, $v_4$ is also the most susceptible user, followed by $v_3$. In regard to information $I_3$, $v_6$ is the most susceptible user, followed by $v_1$. Compared with the other users, $v_5$ is the least susceptible user and seems to be susceptible to none of these types of information. Indeed, information $I_2$ and information $I_1$ are more attractive, which is also shown in Figure 7. Moreover, Figure 7 shows that the average NS values of the three types of information are highest at the fourth period, which means that advertisement at this period will be most effective.
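The averages discussed above can be computed by grouping NS values and taking means. The sketch below assumes that a user's average NS for a type of information is taken over periods and that the average NS of a type of information at a period is taken over users; this reading of Figures 6 and 7 is an assumption, and the function name `average_by` and the NS values are hypothetical.

```python
from collections import defaultdict


def average_by(ns, key_positions):
    """Average NS values grouped by the selected positions of the (user, info, period) key."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for key, value in ns.items():
        group = tuple(key[p] for p in key_positions)
        sums[group] += value
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Hypothetical NS values keyed by (user, info, period).
ns_values = {
    (1, 1, 1): 0.2, (1, 1, 2): 0.4,
    (2, 1, 1): 0.6, (2, 1, 2): 0.8,
}

# Average over periods: one value per (user, info).
user_avg = average_by(ns_values, key_positions=(0, 1))
# Average over users: one value per (info, period).
info_avg = average_by(ns_values, key_positions=(1, 2))
```

Sorting either dictionary by value then gives the ordering of users per type of information, or of periods per type of information.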

Comparison with Other Methods of Measuring User Susceptibility in the Existing Literature.
To analyze the proposed method for measuring user susceptibility comprehensively, this subsection compares it with other methods in the existing literature [33][34][35]. The comparison covers quantified items, quantitative measures, mathematical model, key input parameters for decisions, implementation process in practice, and decision objectives.
For the sake of clarity, the results of the comparison are displayed in Table 4.
Table 4: Comparison with other methods in the existing literature (columns: Literature [33], Literature [34], Literature [35], This paper).
It can be seen from Table 4 that, on the one hand, mathematical models are common tools for quantifying user susceptibility in information diffusion and, on the other hand, given the differences in decision objectives and quantitative measures, the models are expressed differently, so that the input parameters for decisions differ between methods. Generally, when the complexities of the models are equivalent, the model with fewer input parameters is easier for decision-makers (practitioners) to implement in practice. Solving a model with more input parameters requires decision-makers to have specialized knowledge and to make subjective judgments, which increases the difficulty and subjectivity of decision-making. In this sense, the models in literature [33] and this paper may be superior to those in literature [34].
With respect to the models in literature [33] and this paper, it is difficult to say which one is better, since different models serve different decision objectives.By conducting a randomized experiment, the decision results in literature [33] are that younger users are more susceptible than older users, and married individuals are the least susceptible in the decision to adopt the product offered.These decision results give some suggestions in the spread of the product in social networks from a macroscopic perspective.In comparison, the decision results (answers to the three questions) in this paper can provide the practitioners with specific reference to make rational decisions on effective information diffusion.In addition, the decision results in literature [33] are static to some extent.Our decision results are dynamic and can be updated with the updated observation data, which can help the practitioners to make information diffusion strategy dynamically.

Conclusion
In this work, a novel and efficient model for analyzing the effect of nodes on information diffusion is proposed based on the universal generating function (UGF) method and discrete stress-strength interference (DSSI) theory. In this model, the effect of a user on information diffusion is quantified as node susceptibility (NS), and, based on NS, the proposed model can help decision-makers identify which users are most susceptible to the corresponding information at different periods. The contributions of the research can be summarized as follows.
(1) To take into account the influence of randomness and uncertainty, the model introduces a novel and universal evaluation criterion, node susceptibility (NS), based on discrete stress-strength interference (DSSI) theory. Since the calculation of NS is based on realistic observations of the corresponding random variables, the decision results are rational and objective.
(2) By modeling random forwarding actions, the effect of network nodes on information diffusion is analyzed quantitatively and dynamically. In the proposed model, three main questions are answered: (i) which nodes are most susceptible, (ii) which types of information they are most susceptible to, and (iii) when they are most susceptible. The answers to these questions help practitioners make rational decisions on effective information diffusion.
(3) Unlike the existing related works, which mainly focus on extracting the most influential nodes, this work focuses on extracting the most susceptible nodes, which offers a new idea for studying information diffusion over social networks.
Despite the contributions, this study has several limitations. Although the proposed model can provide objective and dynamic decisions based on NS, it cannot explain the cause of the fluctuation of NS. In other words, decision-makers may not know why some users are not susceptible and why some types of information are not attractive. In addition, users' forwarding actions may be influenced by the network structure or the topological relationships among users, which are not considered in the model.
Based on these considerations, this research suggests two avenues for future research. (i) To make the factors that influence the fluctuation of NS more evident, future researchers can introduce an information evaluation mechanism into the model. (ii) When modeling random forwarding actions, the network structure or the topological relationships among users can be considered as one of the factors influencing information diffusion. It is envisioned that point-set topology and graph theory could be applied to address this new issue.

Appendix
… the number of output intervals is 7. Finally, click OK, and the histogram of the observations of the received quantity is obtained (Figure 12). Similarly, the histogram of the observations of the forwarded quantity can be obtained by following the steps above. The general forms of the two histograms are shown in Figure 3.
(c) Obtaining the p.m.f.s of the Received and Forwarded Quantities Based on the Histograms in (b). The midpoint values of each class interval of the two histograms are treated as the possible values of the received and forwarded quantities, respectively, and the relative frequencies of each class interval are treated as the corresponding probabilities. Thus, the two p.m.f.s can be obtained according to (13)-(14) in Section 3.2.2.
Steps 2* and 3* are the same as Steps 2 and 3 presented in Section 3.2.2, so they are not repeated here for brevity.

Notations
$i$: Index of users
$j$: Index of types of information
$t$: Index of periods
$l$: Index of time nodes of the $t$th period
$v_i$: The $i$th user
$I_j$: The $j$th type of information
$T_t$: The $t$th period

Figure 3: Histograms of stress and strength observations.

Figure 8: Set $X_{ijt}$ as the variable name in "Variable View."

Figure 12: Creating histogram with relative frequency as ordinate.