A growing number of web users around the world post their opinions on social media platforms and share them with others. Building a highly scalable evolution prediction model based on the volatility of evolution trends plays a significant role in enterprise marketing, public opinion supervision, personalized recommendation, and so forth. However, historical patterns alone cannot capture the systematic time-series dynamics and volatility features that arise in the prediction problems of a social network. This paper investigates the popularity prediction problem from a time-series perspective using dynamic linear models. First, the stationary and nonstationary time series of Weibo hot events are detected and transformed into time-dependent variables. Second, a systematic general popularity prediction model N-

Social media platforms, for example, Weibo and Facebook, enable web users to post their views in virtual communities from behind their screens. This new type of communication has been well accepted and acclaimed for its apparent advantages of low cost and low user interaction risk. However, these benefits have created problems that almost everybody may encounter. An increasing number of social users indulge in passing on unverified gossip without contributing anything meaningful to the virtual community. Establishing a highly extensible propagation prediction model is therefore recognized as an important problem in social media analysis. The related research aims to provide reasonable techniques for avoiding public opinion outbreaks and maintaining social stability.

Information popularity prediction in social networks is usually related to the information diffusion models and information-carrying patterns. The primary idea of popularity analysis is to design or learn a prediction model that can accurately reflect a hot event’s information propagation law. On the other hand, the temporal evolution of event popularity can be separated into two types (stationary and nonstationary) according to its trend degree of fluctuation. This work is based on the assumption that a hot event’s evolution prediction accuracy is closely correlated to its propagation type in a social network. Some of the current excellent literature on prediction and detection problems in engineering applications pays particular attention to deep learning [

The rest of this paper is organized as follows. Section

Social networks play an essential role in our daily life. People join multiple social network platforms, for example, Facebook and Twitter, to enjoy different services. Information propagation through online social networks has proved to be a powerful tool in many situations. The review in [

Previous studies of information propagation prediction in social networks refer to one of the following three tasks: predicting information popularity [

Weibo is China’s main microblog platform and meets regular users’ demand to send messages across the country. Public opinion that spreads in Chinese society almost always passes through the Weibo platform. When content is novel or valuable, hundreds of millions of users forward posts to Weibo from portals, forums, Moments, and other media. The Weibo social network is widely accepted as a monitoring center for public opinion in the era of big data. This study takes Weibo as the carrier for popularity prediction, considering its influence on all social networks in China.

Popularity prediction in social networks is a typical time-series problem whose solutions can support decisions on avoiding negative propaganda, containing rumors, handling hotspots, and so forth. The current research on Weibo popularity prediction is mainly based on the methods of epidemic models [

Traditional predictive models are usually related to information cascade theory [

Some investigators [

Popularity trend fluctuation in a social network can be directly exploited for the evolution prediction problem. Some studies deal with the phenomenon of trend fluctuation but lack the investigation of the volatility role in social networks’ prediction problems. Wang et al. [

Constructing a general evolutionary prediction model with time series is considered one of the most critical social network analysis tasks. Understanding the evolution trend volatility and offering some vital insights into its applications in the popularity prediction problem benefit a lot in the time-series analysis of social networks.

This paper studies the popularity prediction problem from a time-series perspective by means of dynamic linear models. A systematic general popularity prediction model N -

Time-series schematic diagram. The evolution of the stationary time series is usually around a fixed interval, while the evolution interval of the nonstationary time series has apparent variation. (a) Stationary time series. (b) Nonstationary time series.

On top of that, the explanatory compensation variables are introduced into the model N -

A key contribution of this study is a general time-series modeling method for popularity prediction, which establishes dynamic linear models based on the stationary and nonstationary evolution characteristics of time series in social networks.

The benefit of N -

This study evaluates N -

This section systematically reviews the evolution analysis technologies based on model theory, aiming to analyze evolution prediction of information diffusion dynamics in social networks. Characterizing the time series that appear in social networks is essential for understanding the rules of network information diffusion. We also need to learn the time-series law of social networks so that a high-precision information diffusion prediction model can be established. A time series is a sequence of numbers obtained by successive observations of the same phenomenon at different times. Statistically, a time series is a realization of a stochastic process and can be classified as stationary or nonstationary according to its statistical characteristics. The series that appear in social networks are, for the most part, nonstationary. The remainder of this section describes the preliminaries needed to establish a popularity evolution prediction model of information diffusion dynamics based on time-series characteristics.

The state space model (SSM) is a dynamic time-domain model with time-dependent variables. A growing number of studies around the world have started to apply SSM in social and economic analysis. One purpose of SSM was to process time sequences of several variables into vector time series, transforming the information cascade problem of the social network into a vector sequence analysis problem. Specifically, an SSM is a dynamic system that evolves over time, which is determined by two different time series. One is called the state sequence, denoted by

Markov hypothesis: the state sequence

Conditional independence assumption: under the condition of

Based on the above assumptions, the primary form of an SSM can be formalized as the state equation (

Constructing an SSM for a specific time-series system is to estimate the state sequences that may occur in the system life cycle. The state estimation problem is a dynamic estimation problem, which can be divided into three types: smoothing, filtering, and prediction. Among them, filtering is the core. The process of state estimation is shown in Figure

A state estimation problem is called smoothing if we utilize the real-time information up to the current moment

A state estimation problem is addressed as filtering if both the real-time system information up to the current moment

A state estimation problem is designated as prediction if real-time information up to the current moment

The process of state estimation. The initial system state

In summary, the SSM can be seen as a theoretical algorithm framework, which provides a flexible method for time-series analysis by means of state vectors. The system state prediction based on the SSM is convenient for the analyst employing statistical theory to test the model. Existing research on system state prediction recognizes the critical role played by the SSM. Many economic and financial time-series models can be represented in the form of SSM, such as the autoregressive integrated moving average (ARIMA) model [

The DLM is a particular case of the general SSM with Gaussian and linear characteristics. Estimation and forecasting can be carried out recursively with the well-known Kalman filter, the most prevalent and widely accepted approach to analyzing state space problems. The information cascade prediction of social networks can therefore be simplified to a state iteration problem. The defining characteristic of a DLM is that all model variables follow Gaussian distributions and satisfy linear relationships. A DLM can be represented by the following mathematical form:
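In standard textbook notation, the observation and state equations of a DLM read as follows. The symbols ($y_t$ for the observation, $\theta_t$ for the state, $F_t$ and $G_t$ for the observation and state transition matrices, $V_t$ and $W_t$ for the noise covariances, $m_0$ and $C_0$ for the initial state moments) are an assumption made for illustration and may differ from the notation used elsewhere in this paper.

```latex
\begin{aligned}
y_t      &= F_t \theta_t + v_t,      & v_t &\sim \mathcal{N}(0, V_t) && \text{(observation equation)} \\
\theta_t &= G_t \theta_{t-1} + w_t,  & w_t &\sim \mathcal{N}(0, W_t) && \text{(state equation)} \\
\theta_0 &\sim \mathcal{N}(m_0, C_0)
\end{aligned}
```

The noise sequences $v_t$ and $w_t$ are taken to be mutually independent and independent of $\theta_0$, which is what makes every conditional distribution in the model Gaussian.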

Equations (

Let

Given the state transition matrix

Given the state transition matrix

When the parameters

The information diffusion evolution prediction of social networks mainly involves the first two categories. The following is a brief description of the Kalman filtering, smoothing, prediction, and the maximum likelihood estimation method applied in this study.

As mentioned above, the state estimation can be divided into prediction, filtering, and smoothing according to the information obtained from the system. Among them, filtering is the core of state estimation. In the filtering problem, the data are supposed to arrive sequentially in time. We require a procedure to estimate the state vector’s current value, based on the observations up to time

The Kalman filtering algorithm [

The conditional distribution equation (

The first operations of Kalman filtering are the prediction operations, which aim to ascertain the prior distribution and normalization factor of the system states. First of all, we calculate the predicted mean

On top of that, we estimate the predicted mean

The correction operations of Kalman filtering are based on the parameter values obtained by the prediction operations; their objective is to calculate the filtering mean

Prediction operations:

Correction operations:
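The two groups of operations can be illustrated for the simplest case. The following is a minimal sketch, assuming a scalar state and a scalar observation so that the system matrices reduce to plain numbers. The function name `kalman_step` and the symbols `a`, `r`, `f`, `q`, `k`, `m`, and `c` follow common textbook notation and are assumptions, not the notation of this paper.

```python
def kalman_step(m_prev, c_prev, y, F=1.0, G=1.0, V=1.0, W=1.0):
    """One predict/correct cycle of the Kalman filter for a scalar DLM.

    m_prev, c_prev: filtering mean and variance at the previous time step.
    y: the new observation. F, G, V, W: scalar system "matrices" (assumed known).
    """
    # Prediction operations: prior of the state and one-step forecast of y.
    a = G * m_prev            # predicted state mean
    r = G * c_prev * G + W    # predicted state variance
    f = F * a                 # predicted observation mean
    q = F * r * F + V         # predicted observation variance

    # Correction operations: fold the new observation into the state estimate.
    k = r * F / q             # Kalman gain
    m = a + k * (y - f)       # filtering mean
    c = r - k * F * r         # filtering variance
    return {"a": a, "r": r, "f": f, "q": q, "m": m, "c": c}


# Hypothetical values: prior N(0, 1), observation y = 3.
result = kalman_step(m_prev=0.0, c_prev=1.0, y=3.0)
print(result["m"], result["c"])
```

The filtering variance `c` is always smaller than the predicted variance `r`, which reflects the information gained from the observation.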

Smoothing estimation in time-series problems is mainly used for retrospective analysis of observation sequences to explore potential phenomena or laws underlying the observed values. In many practical problems, the system’s a priori parameters cannot be calculated directly and must be estimated. For example, in economic research, researchers may need to use a country’s recent gross domestic product to understand the past socioeconomic behavior of that country’s systems. The forward-backward smoothing algorithm can make such an estimation based on an observation sequence. For each state in a discrete system, it calculates both the “forward” probability of reaching the state and the “backward” probability of generating the model’s final state. This study applies the underlying gradient descent idea in the forward-backward smoothing algorithm to modify the past state observation values. The filtering distribution is necessary for forward-backward smoothing. The derivation process of the filtering distribution is as follows:

According to equation (

This study takes the output, that is, the state mean

The smooth distribution and the normal distribution with parameters

Then, the filter distribution

Under the assumption of the Gaussian distribution related to the DLM models, the execution process of the forward-backward smoothing algorithm can be summarized as Algorithm
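For a Gaussian linear model, the backward pass of forward-backward smoothing is commonly written as the Rauch-Tung-Striebel recursion. The following is a minimal sketch of that standard recursion, not a transcription of this paper's algorithm. It assumes a scalar, time-invariant DLM; the function names and the default parameter values are hypothetical.

```python
def kalman_filter(ys, F=1.0, G=1.0, V=1.0, W=1.0, m0=0.0, c0=10.0):
    """Forward pass: return predicted (a, r) and filtered (m, c) moments."""
    a, r, m, c = [], [], [], []
    m_t, c_t = m0, c0
    for y in ys:
        a_t = G * m_t                      # predicted state mean
        r_t = G * c_t * G + W              # predicted state variance
        q_t = F * r_t * F + V              # predicted observation variance
        k_t = r_t * F / q_t                # Kalman gain
        m_t = a_t + k_t * (y - F * a_t)    # filtering mean
        c_t = r_t - k_t * F * r_t          # filtering variance
        a.append(a_t)
        r.append(r_t)
        m.append(m_t)
        c.append(c_t)
    return a, r, m, c


def backward_smooth(a, r, m, c, G=1.0):
    """Backward pass: revise each filtered estimate using later observations."""
    n = len(m)
    s, S = list(m), list(c)                # the last step equals the filtered one
    for t in range(n - 2, -1, -1):
        j = c[t] * G / r[t + 1]            # smoothing gain
        s[t] = m[t] + j * (s[t + 1] - a[t + 1])
        S[t] = c[t] + j * j * (S[t + 1] - r[t + 1])
    return s, S


# Hypothetical observation sequence.
ys = [1.0, 2.0, 3.0, 2.5]
a, r, m, c = kalman_filter(ys)
smoothed_mean, smoothed_var = backward_smooth(a, r, m, c)
```

Because smoothing uses observations after time t as well as before it, each smoothed variance is no larger than the corresponding filtering variance.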


As explained in Algorithm

Prediction of state variables:

Prediction of observation variable:
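The two prediction steps can be iterated to look several steps ahead. The following is a minimal sketch, assuming a scalar, time-invariant DLM with known parameters; the function name `forecast` and the tuple layout of its result are hypothetical.

```python
def forecast(m_t, c_t, steps, F=1.0, G=1.0, V=1.0, W=1.0):
    """k-step-ahead forecasts starting from the filtering moments (m_t, c_t).

    Returns one tuple per horizon: (state mean, state variance,
    observation mean, observation variance).
    """
    out = []
    a, r = m_t, c_t
    for _ in range(steps):
        # Prediction of state variables: push the state one step forward.
        a = G * a
        r = G * r * G + W
        # Prediction of observation variable: map the state to the observation.
        f = F * a
        q = F * r * F + V
        out.append((a, r, f, q))
    return out


# Hypothetical filtering result N(2.0, 0.5), forecast two steps ahead.
for horizon, (a, r, f, q) in enumerate(forecast(2.0, 0.5, steps=2), start=1):
    print(horizon, f, q)
```

With no new observations arriving, the forecast variance grows with the horizon, since the state noise variance `W` is added at every step.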

Maximum likelihood estimation (MLE) is one of the most important and widely used parameter estimation methods in statistics. The idea is to determine the unknown parameters in a system model by maximizing the probability of observed values. Under the condition that the state transition matrix

This section uses the simplified expression

The derivation

The purpose of equation (
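The likelihood of a DLM can be accumulated from the one-step forecast errors produced by the Kalman filter. The following is a minimal sketch of that idea, assuming a scalar DLM in which `F` and `G` are known and the noise variances `V` and `W` are the unknown parameters. The grid search is a deliberately simple stand-in for a numerical optimizer, and all names and candidate values are hypothetical.

```python
import math


def log_likelihood(ys, V, W, F=1.0, G=1.0, m0=0.0, c0=1.0):
    """Gaussian log-likelihood of ys, summed over one-step forecast errors."""
    ll = 0.0
    m, c = m0, c0
    for y in ys:
        a = G * m
        r = G * c * G + W
        f = F * a
        q = F * r * F + V
        e = y - f                                  # one-step forecast error
        ll += -0.5 * (math.log(2.0 * math.pi * q) + e * e / q)
        k = r * F / q
        m = a + k * e
        c = r - k * F * r
    return ll


def grid_mle(ys, candidates):
    """Return the (V, W) pair from the candidate grid with the highest likelihood."""
    pairs = [(v, w) for v in candidates for w in candidates]
    return max(pairs, key=lambda p: log_likelihood(ys, p[0], p[1]))


# Hypothetical observations and candidate variances.
ys = [1.0, 1.4, 0.9, 1.8, 2.2, 1.9, 2.6]
best_V, best_W = grid_mle(ys, candidates=[0.01, 0.1, 1.0, 10.0])
```

In practice the grid would be replaced by a gradient-based or quasi-Newton optimizer, but the objective function stays the same.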

A growing number of information senders and receivers around the world post their speeches, thoughts, and comments online on social media platforms such as Weibo, Facebook, Twitter, WeChat, and LinkedIn. These attention-grabbing open media have been well accepted and acclaimed for their apparent advantages: faster transmission, greater timeliness, broader coverage, greater openness, and higher interactivity. Social network platforms of this type will play an increasingly significant role in our future life, society, business, and so forth.

The graph formed by the users of a social platform and the relationships between them is called a social network. Information popularity prediction in social networks is a typical time-series problem whose solutions can support decisions on avoiding negative propaganda, containing rumors, handling hotspots, and so forth. The issue of time-series analysis in social networks has received considerable attention. However, a systematic understanding of how to analyze time-series problems in social networks, and a systematic process for doing so, are still lacking.

This section takes Weibo as an example to discuss how the DLM can be used as a standard analysis framework for the propagation and prediction issues of social networks with time-series characteristics. The study first highlights some important statistics and evolution rules of the Weibo social network, including the number of daily posts, retweets, comments, and how they change with time. Second, we use the theory of the DLM, combined with social data and search data, to decompose the hotspot events’ time series and predict their propagation effect on the Weibo network.

In this paper, the time series of information propagation in Weibo is regarded as the superposition of trend, periodic, and random terms.

The trend term characterizes the objective evolutionary trend of Weibo hotspot events.

The periodic term describes the regular changes due to cyclic factors such as the incubation, development, outbreak, maturity, recession, and extinction periods of a network public opinion event.

The random term involves a sudden change or noise disturbance. A sudden change refers to a change caused by unexpected circumstances such as the hot news retweet or emergent unexpected social events. The noise illustrates the influence of many random factors, such as different attitudes of users who release or forward messages, public opinion reversal caused by facts, and public opinion explosion.
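The superposition of the three terms can be illustrated with a toy series. The following is a minimal sketch, assuming an additive combination and a fixed cycle length of 7 steps; the numbers are invented for illustration and are not taken from the Weibo data, and the centered moving average is only one simple way to recover a trend.

```python
def compose(trend, periodic, random_term):
    """Additive superposition of the trend, periodic, and random terms."""
    return [t + p + r for t, p, r in zip(trend, periodic, random_term)]


def centered_moving_average(series, window):
    """Trend estimate by a centered moving average (window must be odd).

    Positions too close to either end have no full window and stay None.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    out = [None] * len(series)
    for t in range(half, len(series) - half):
        out[t] = sum(series[t - half:t + half + 1]) / window
    return out


n = 28
cycle = [3.0, 1.0, -2.0, -4.0, -1.0, 1.0, 2.0]   # hypothetical pattern, sums to zero
trend = [0.5 * t for t in range(n)]               # steady growth
periodic = [cycle[t % len(cycle)] for t in range(n)]
random_term = [0.0] * n
random_term[10] = 7.0                             # one sudden change at step 10

series = compose(trend, periodic, random_term)
estimated_trend = centered_moving_average(series, window=len(cycle))
```

Averaging over one full cycle cancels the periodic term, so the estimate tracks the trend except near the sudden change, where the burst is spread over the window.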

The DLM model can describe the parameters of the information propagation time series listed above. The proposed models are given in Section

A well-structured propagation prediction model helps researchers and government departments find regularity, suddenness, or reversibility in the process of information propagation. Predicting the spread of a particular news event helps to guide negative public opinion in advance. Research on information propagation models and prediction for online social networks similar to Weibo faces two challenges:

Avoidance of low-quality data: Spam users (nonzombies) flood every corner of the Weibo platform. Many users engaged in Taobao, microbusiness, and network shopping platforms continually post advertisements, order displays, and purchasing agency information on Weibo. Such jumbled information results in low-quality data. Although statistical characteristics such as the numbers of posts, comments, retweets, and mentions of a hot event on the Weibo social network are important, information propagation prediction based on these numbers alone rarely achieves acceptable accuracy.

Data association mining: The essence of prediction analysis is to find the law of event occurrence from massive data. The timestamp of a time-series system cannot be used directly for prediction. We still need to discover the underlying factors that influence the time-series state change.

This study adopts a divide-and-conquer strategy in response to the issue of low-quality data. The propagation of a hot event on Weibo can be divided into stationary and nonstationary by calculating the event’s time-series volatility. First of all, we use DLM models established by using the historical time-series information to deal with stationary event popularity prediction. On top of that, some explanatory variables (hot event characteristic variables) are captured to enhance the DLM models. The improved models are applied to deal with the propagation prediction of nonstationary situations.

Association mining methods in [

A popularity prediction framework for information propagation of a hot event on the Weibo social network. First, the explanatory (time) variables of an event are determined by data preprocessing, and the key variables affecting the event propagation are identified by correlation analysis. Second, the framework recognizes the propagation type (stationary or nonstationary) of the event. Third, a DLM is established based on the variables and the event propagation type. Consequently, time-series analysis can be conveniently achieved based on the DLM.

A DLM is called a stationary event propagation prediction model (

Definition

The model representation of

The observation equation:

The state equation:

A DLM is called a nonstationary event propagation prediction model (N -

Definition

The observation equation:

The state equation:

This section discusses the datasets, stationary and nonstationary identification, and the model evolution analysis of three hot events that appeared on the Weibo social network.

To evaluate the effectiveness of the proposed models for popularity prediction, we collected datasets from the popular Chinese social platform Sina Weibo, covering three Weibo hot public opinion events from 2015 to 2017 in the content categories of social, entertainment, and international news. A brief description of the three events is presented as follows:

The datasets “Chengdu female driver was beaten.csv,” “Baihe Bai derailed.csv,” and “THADD incident.csv” refer to the social event (Chengdu-Driver), an entertainment event (Baihe), and an international news event (THAAD), respectively.

The original hot event data were prepared by crawling the user information pages involved in the three event topics, including post contents, user attributes, follower relations, the numbers of comments and thumb-ups, post time, and forwarding (propagation) links. The data crawling and the proposed prediction models are implemented in Python 3.7 and run on a Linux server with an Intel(R) Xeon(R) CPU (E5-2620 v4) and a GeForce TITAN X GPU (12 GB memory). Valid entries account for 95%, 96%, and 98% of the total results returned by our crawler system for the three events, respectively. The details about the event datasets are shown in Table

Details about the three datasets of hot public opinion events.

Datasets | Classification | Users | Followers | Forwarding statistic |
---|---|---|---|---|
Chengdu-Driver | Social | 7259 | 135455 | 159711 |
Baihe | Entertainment | 1533 | 1395880 | 536674 |
THAAD | International news | 5532 | 1891069 | 2130227 |

Trends for the development of the three Weibo hot events. The social event is more likely to receive social attention according to an earlier increase of forwarding behaviors, followed by entertainment and international news.

Our experiments will evaluate and predict the popularity of the three hot events on the Weibo social network. The best environment setting is to get the full structure of Weibo as a map of the spread. However, Weibo allows only a small part of follower relations to be returned by crawlers. Hence, the relations that come from traditional Weibo crawling methods are quite incomplete. On the other hand, the size of Weibo network is enormous. The empirical treatment is to extract a subnet with common characteristics from the original Weibo network. We repeatedly remove the nodes with the degrees less than 2 or more than 1000. Then, the community discovery algorithm reported in [

The degree distributions of the Weibo subnet used in the experiments.
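The repeated degree-based pruning used to extract the subnet can be illustrated as follows. This is a minimal sketch, assuming that follower relations are treated as undirected edges and that pruning repeats until no node violates the bounds; the function name and the toy edge list are hypothetical.

```python
def prune_by_degree(edges, min_degree=2, max_degree=1000):
    """Repeatedly drop nodes whose degree is below min_degree or above max_degree.

    edges: iterable of (u, v) pairs, treated as undirected.
    Returns the surviving adjacency mapping {node: set of neighbors}.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue                       # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    while True:
        drop = [node for node, nbrs in adj.items()
                if len(nbrs) < min_degree or len(nbrs) > max_degree]
        if not drop:
            break
        for node in drop:
            for nbr in adj.pop(node):
                if nbr in adj:             # neighbor may already be removed
                    adj[nbr].discard(node)
    return adj


# Toy graph: a triangle (a, b, c) with one pendant node d attached to a.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]
subnet = prune_by_degree(toy_edges)
print(sorted(subnet))
```

Removing a node lowers the degrees of its neighbors, which is why a single pass is not enough and the loop runs until the node set stops changing.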

Historically, the statistical concept “volatility” has been used to describe the degree of fluctuation of investment returns in financial markets. Using this concept to distinguish stationary from nonstationary events is an essential experimental step. The experiment must therefore be explicit about how the “volatility” of a hot event on the Weibo social network is obtained. The calculation process is defined as follows.

Calculate the percentage return:

Calculate the volatility (standard deviation) of sequence

The life cycle of an event is divided into time-series intervals in days.

Count the forwarding numbers from 8:00 to 24:00 at intervals of 1 hour within a day (obtaining sequence

The percentage return is calculated to obtain the sequence

Calculate the day-to-day volatility, denoted by

The range of possible volatility values is divided into several (here, nine) intervals. The forwarding numbers of the event falling in each volatility interval are obtained by simple counting.

Estimate the ratio of the forwarding number in each volatility interval to the total number of retweets.

Calculate the sum of the forwarding ratios with volatility values greater than 1. If this sum exceeds 75%, the event is classified as nonstationary; otherwise, it is classified as stationary.
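The steps above can be sketched in code. This is a minimal sketch under stated assumptions: the percentage return is taken as the relative change between consecutive hourly counts, the volatility is the population standard deviation of those returns, and hours following a zero count are skipped. The nine-interval histogram is omitted because only the share above the cutoff of 1 enters the final decision. All function names and toy counts are hypothetical.

```python
import statistics


def percentage_returns(counts):
    """Relative change between consecutive counts; pairs with a zero base are skipped."""
    return [(cur - prev) / prev
            for prev, cur in zip(counts, counts[1:]) if prev > 0]


def daily_volatility(counts):
    """Standard deviation of one day's percentage returns (0.0 if none exist)."""
    returns = percentage_returns(counts)
    return statistics.pstdev(returns) if returns else 0.0


def classify_event(hourly_counts_by_day, vol_cutoff=1.0, share_threshold=0.75):
    """Return (is_nonstationary, share of forwards on days with volatility > cutoff)."""
    total = sum(sum(day) for day in hourly_counts_by_day)
    if total == 0:
        return False, 0.0
    volatile = sum(sum(day) for day in hourly_counts_by_day
                   if daily_volatility(day) > vol_cutoff)
    share = volatile / total
    return share > share_threshold, share


# Toy data: one calm day and one bursty day of hourly forwarding counts.
calm_day = [100, 110, 105, 100]
bursty_day = [1, 10, 1, 50]
print(classify_event([calm_day, bursty_day]))
```

In the toy data most forwards happen on the calm day, so the event is classified as stationary even though one day is highly volatile.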

The threshold of 75% is used to distinguish the application scopes of the two models

Stationary sequence recognition result on the event Chengdu-Driver: (a) forwarding longitudinal series 1–4; (b) forwarding longitudinal series 3–4. The sum of the forwarding ratios with volatility values greater than 1 equals 0.78 which is shown in (a). The score indicates the nonstationary characteristic of the event. The instability is mainly caused by the forwarding behavior of levels 3 and 4 according to the value 0.88 shown in (b).

Figure

The red point shown in Figure

Stationary sequence recognition results on the events Baihe and THAAD. The degree of nonstationarity of the three events is sorted as THAAD (0.91), Baihe (0.83), and Chengdu-Driver (0.78).

Having explained what is meant by explanatory variables, we now discuss how to find the best one. A simple linear regression model is used to establish the quantitative relationship between the event data and each explanatory variable and thus to determine their correlation scores. This study selects the best explanatory variable by comparing the Pearson correlation coefficients between each candidate variable’s time series and the time series of forwarding numbers.
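The selection rule can be illustrated with a short example. This is a minimal sketch, assuming each candidate variable is available as a series aligned with the forwarding numbers; the function names and the toy values are hypothetical and do not reproduce the experimental data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def best_explanatory_variable(forwarding, candidates):
    """Score every candidate against the forwarding series; return the top one."""
    scores = {name: pearson(series, forwarding)
              for name, series in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy series; the variable names only echo the abbreviations used in this paper.
forwarding = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "SI": [2.0, 4.0, 6.0, 8.0, 10.0],
    "TUI": [5.0, 1.0, 4.0, 2.0, 3.0],
}
best, scores = best_explanatory_variable(forwarding, candidates)
print(best)
```

A constant series has zero variance and would cause a division by zero in `pearson`, so such candidates should be filtered out before scoring.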

We refer to the percentage of forwarding numbers, comment numbers, and thumb-up numbers of an event at time

Correlation analysis results of explanatory variables.

Events | CN | TUN | SI | CI | TUI | AveL |
---|---|---|---|---|---|---|
Chengdu-Driver | 0.7233 | 0.6725 | 0.6658 | 0.5413 | 0.1442 | |
Baihe | 0.7297 | 0.5425 | 0.6835 | 0.5511 | 0.2234 | |
THAAD | 0.7784 | 0.5265 | 0.8011 | 0.5643 | 0.1143 | |

In general, the higher the correlation between the explanatory variable sequence and the target sequence, the better the prediction effect. Figure

The quantitative relationship between the event popularity evolution and the social intensity of the event Chengdu-Driver. An apparent positive correlation between popularity and social intensity can be observed.

In order to evaluate the effect of the time-series model and the temporal model after the introduction of the explanatory variable SI, we compare the gradient boosting regression tree (GBRT) model based on model combinations. The evaluation indexes adopted include the following: the determination coefficient

Comparison of the evaluation indicators of prediction models.

| Prediction model | | | | |
|---|---|---|---|---|
| GBRT | 0.7645 | 41% | 76% | 103.45 |
| | 0.8761 | 44% | 77% | 89.63 |
| N - | 0.8975 | 52% | 81% | 88.74 |

The results of this study indicate that when the popularity prediction of a Weibo hot event only depends on the time information, the error between the prediction result and the real event propagation is usually large. By comparing three hot Weibo events, we can see from Figures

Popularity evolution prediction of social event: “Chengdu female driver was beaten by changing routes.” (a) Prediction accuracy comparison. (b) Cumulative relative error analysis. The predicted result of the NSEPP model with explanatory variables is the closest to the actual evolution. Social intensity can significantly reduce the cumulative error.

Popularity evolution prediction of entertainment event: “Baihe Bai derailed.” (a) Prediction accuracy comparison. (b) Cumulative relative error analysis. A similar prediction result is obtained concerning a slightly rising error compared with Figure

Popularity evolution of international news event: “THAAD.” (a) Prediction accuracy comparison. (b) Cumulative relative error analysis. A similar prediction result is obtained concerning a slightly rising error compared with Figure

We illustrate the experimental results by using event Chengdu-Driver as an example. The blue trend line recorded in Figure

The entertainment event Baihe (Figure

The common feature of the three Weibo hotspot events is that they all have burst traffic characteristics. In order to verify the adaptability of these events to the parameters defined by the prediction models, the event THAAD with the highest volatility was selected to decompose the time-series quantification values. Figure

Time-series decomposition of nonstationary event “THAAD” with burst traffic. (a) The trend

This research is undertaken to design a time-series analysis framework and provide a generic solution for the evolutionary prediction of information popularity dynamics in social networks. This study also aims to alleviate the contradiction between stationary and nonstationary event propagation in a standard prediction model. The fact that all hot events in a social network are analyzed in a class of models without considering their volatility may not necessarily serve as an effective way to increase the prediction accuracy. Hence, the framework’s investigation divides event propagation into two kinds (stationary and nonstationary) according to event volatility. This study defines two specific DLM models, namely,

This paper mainly reveals the superiority of using the N -

Experiments on three popular hot events that appeared on the Weibo social network with different topic categories confirm the effectiveness and superiority of the N -^{2},

The data used to support the findings of this study are publicly available at

The authors declare that they have no conflicts of interest.

This work was supported by the National Natural Science Foundation of China (nos. 61902324, 11426179, and 61872298), the Social Science Planning Project of Sichuan Province (no. SC20TJ020), the Science and Technology Program of Sichuan Province (nos. 2021YFQ0008, 2020JDRC0067, and 2019GFW131), the Foundation of Cyberspace Security Key Laboratory of Sichuan Higher Education Institutions (no. sjzz2016-73), the Scientific Research Fund of Sichuan Provincial Education Committee (nos. 15ZB0134 and 17ZA0360), and the Open Fund Project of Xihua University (nos. 20170410143123 and szjj2015-059).