Evolutionary Prediction of Nonstationary Event Popularity Dynamics of Weibo Social Network Using Time-Series Characteristics



Introduction
Social media platforms such as Weibo and Facebook enable web users to post their views in virtual communities from behind their screens. This new type of communication has been widely accepted and acclaimed for its apparent advantages of low cost and low interaction risk for users. However, these obvious benefits have created potential problems that almost everybody may encounter. An increasing number of social users indulge in passing on unverified gossip without contributing anything meaningful to the virtual community. Therefore, establishing a highly extensible propagation prediction model has become recognized as a sound approach to social media analysis problems. The related research aims to provide reasonable techniques for avoiding public opinion outbreaks and maintaining social stability.
Information popularity prediction in social networks is usually related to information diffusion models and information-carrying patterns. The primary idea of popularity analysis is to design or learn a prediction model that can accurately reflect the information propagation law of a hot event. In addition, the temporal evolution of event popularity can be separated into two types, stationary and nonstationary, according to the degree of fluctuation of its trend.
This work is based on the assumption that the accuracy of a hot event's evolution prediction is closely correlated with its propagation type in a social network. Some of the current literature on prediction and detection problems in engineering applications pays particular attention to deep learning [1][2][3]. Several studies (e.g., Lin, 2020 [1]) report that a high-precision model can be trained on a small dataset. Some studies of engineering problems benefit considerably from hybrid deep learning models [2] or deep fusion models [3]. On the other hand, dynamic linear models (DLM) [4,5] are well suited to the prediction problem because their features are easy to extend. A DLM also has strong interpretability compared with deep learning models. Hence, this study seeks to obtain a general popularity prediction framework based on DLM, which will help address the evolutionary prediction problem of information propagation dynamics based on stationarity theory and time-series characteristics.
The rest of this paper is organized as follows. Section 2 summarizes the related works. Section 3 systematically reviews the evolution analysis models and assessment methods used in this study, including state space models, dynamic linear models, Kalman filtering, Kalman smoothing and prediction, and maximum likelihood estimation. Section 4 details the proposed information popularity prediction models based on DLM. Section 5 presents the experimental results. Finally, we conclude this work in Section 6.

Related Work
Social networks play an essential role in our daily life. People join multiple social network platforms, for example, Facebook and Twitter, to enjoy different services. Information propagation through online social networks has proved to be a powerful tool in many situations. The review in [6] on information propagation has highlighted several application disciplines ranging from biology to social sciences, mathematics, physics, and computer science [7]. The classical application in this field is virus spread analysis in ecology [8], biology [9], and marketing [10].
Previous studies of information propagation prediction in social networks address one of the following three tasks: predicting information popularity [11][12][13][14], predicting user influence [15][16][17][18], and predicting information diffusion paths (links) [19][20][21][22]. Some of the literature focuses on user influence in social analysis [15,17]. Some significant studies are concerned with link prediction to reveal the evolution of real social networks [19,20]. Much of the current literature on propagation prediction pays particular attention to information popularity, since it can clearly and intuitively reveal the genuine impact of a hot event by means of statistics. For example, the popularity of video content on the social network platforms TikTok and YouTube is often measured by statistics such as views, followers, favorites, shares, and downloads [23,24]. Popularity prediction focuses on the overall trend of information diffusion, for example, the propagation range [25] and lifespan [26,27], to provide valuable decision support for monitoring and guiding online public opinion. This study sets out to propose a universal scheme for information popularity prediction problems with time-series characteristics.
Weibo is the unique microblog platform of China, which can satisfy regular users' demands to send messages across the country. Any public opinion that might spread in society would certainly rely on the Weibo platform in China. There will always be hundreds of millions of users forwarding posts to Weibo from portals, forums, moments, and other media if the content is novel or valuable. The Weibo social network is widely accepted as a monitoring center for public opinion in the era of big data. This study takes Weibo as the carrier for popularity prediction, considering its influence on all social networks in China.
Popularity prediction in social networks is a typical time-series problem, whose solutions can provide useful decision support for countering negative propaganda and rumors, handling hotspots, and so forth. The current research on Weibo popularity prediction is mainly based on epidemic models [28,29], classification models [30,31], and regression models [32,33]. Such approaches highlight the need for time-series analysis of social networks. Time-series analysis can help researchers understand the random mechanism that generates time feature sequences, set up the data generation model, and predict the possible future values of a time series. However, a systematic understanding of, and process for, analyzing time-series problems in the evolution prediction of a social network is still lacking.
Traditional predictive models are usually related to information cascade theory [6] and are divided into graph-based and non-graph-based ones. The former explore the dynamics of information propagation through individual nodes, starting from an initial set of nodes and spreading through the network based on a cascade model, for example, linear threshold models [34] and independent cascade models [35]. The latter study diffusion mathematically using population-based dynamics, for example, SIR models [36]. Some recent attention has focused on combining traditional prediction models with time-series factors [37,38]. Wu et al. [39] have shown an example of applying the time series of dynamic data to the task of popularity analysis. This method focuses on the regression modeling of time series corresponding to the propagation of user-generated content. The investigation of how the popularity of user-generated content changes over time is supported by Hu et al. [40], who utilize regression models and three time-series features to recognize content changes. Matsubara et al. [41] further examined popularity prediction by applying the SpikeM pattern to fit such time-series models. The published studies have focused on using a temporal feature as an analytical tool rather than on the model theory of time series in social networks. Small differences in how the popularity prediction problem is posed may lead to large variations in approach.
Some investigators [38,42] have examined trend analysis in the popularity prediction problem of social networks. Manshad et al. [38] suggested a new time-series trend prediction method based on irregular cellular learning automata and evolutionary computation. Figueiredo et al. [42] used YouTube videos as samples to extract popularity trends from previously uploaded video objects, combining a new time-series classification algorithm (TrendLearner) with the target features to predict the trend of a new target.
Popularity trend fluctuation in a social network can be directly exploited for the evolution prediction problem. Some studies deal with the phenomenon of trend fluctuation but do not investigate the role of volatility in the prediction problems of social networks. Wang et al. [37] introduced a time-series prediction method based on complex network theory, which maps the time series to a data network and extracts fluctuation sequence features based on its network topology. In that work, the data fluctuation term is proposed for the first time as an optimization for the prediction problem. Hu et al. [43] proposed a time-series feature space containing the average of the prevalence, trend, and period to capture the epidemic behavior of viral hot topics. The study observed a high degree of similarity between the short-term trend fluctuations of these hot topics.
Constructing a general evolutionary prediction model with time series is considered one of the most critical tasks in social network analysis. Understanding the volatility of the evolution trend and its application to the popularity prediction problem would greatly benefit the time-series analysis of social networks. This paper studies the popularity prediction problem from a time-series perspective by means of dynamic linear models. A systematic general popularity prediction model, N-SEP²M, is proposed to recognize and predict the nonstationary propagation of a hot event on the Weibo social network. First, the popularity evolution of a social network can be distinctly divided into two patterns, stationary (Figure 1(a)) and nonstationary (Figure 1(b)), according to the apparent difference in volatility. The intuition behind the volatility of a time feature sequence is that the quantitative fluctuation term is highly correlated with the accuracy of event propagation prediction. Our idea requires creating separate prediction models for stationary and nonstationary events based on a volatility calculation.
In addition, explanatory compensation variables are introduced into the N-SEP²M model to optimize the dynamic time-series prediction models of Weibo hotspot event propagation. We summarize the main contributions as follows:
(i) We recommend a general time-series modeling method for popularity prediction by establishing dynamic linear models based on the stationary and nonstationary time-series evolution characteristics of social networks.
(ii) The benefit of N-SEP²M is that the feature parameters of the above social network prediction problems can be updated simply by adding rows and columns to the model matrices, thus avoiding negative effects on model design and model adaptability when the target prediction task varies.
(iii) This study evaluates N-SEP²M on three real-world hot events from the Weibo social network. The results show superior performance when the proposed explanatory compensation variable, social intensity (SI), is used. The accuracy of the N-SEP²M model can be significantly improved by adding compensation parameters.

Preliminaries
This section systematically reviews the model-based evolution analysis techniques used to analyze the evolution prediction of information diffusion dynamics in social networks. Characterizing the time series that appear in social networks is essential for understanding the rules of network information diffusion. At the same time, we need to learn the time-series law of social networks so that a high-precision information diffusion prediction model can be established. A time series is a sequence of numbers obtained by successive observations of the same phenomenon at different times. Statistically, a time series is the realization of a stochastic process, which can be classified as stationary or nonstationary according to its statistical characteristics. However, the series that appear in social networks are, for the most part, nonstationary. The remainder of this section describes the preliminaries needed to establish a popularity evolutionary prediction model of information diffusion dynamics based on time-series characteristics.

State Space Model.
The state space model (SSM) is a dynamic time-domain model with time-dependent variables. A growing number of studies around the world have started to apply the SSM in social and economic analysis. One purpose of the SSM is to process the time sequences of several variables as a vector time series, transforming the information cascade problem of the social network into a vector sequence analysis problem. Specifically, an SSM is a dynamic system that evolves over time and is determined by two different time series. One is called the state sequence, denoted by $\theta_t$, that is, $\{\theta(t)\}$, with respect to the discrete time variable $t \in \mathbb{N}$. Any state belonging to the sequence $\theta_t$ is hidden and unobservable, since the system is inevitably subject to external interference. The other is the counterpart observable sequence, denoted by $y_t$, $t \in \mathbb{N}$. The SSM connects the two series by introducing an iterative state equation and an observation equation. The former depicts the transition relation between the adjacent states $\theta_{t-1}$ and $\theta_t$. The latter indicates the internal relationship between the observation and state sequences. An SSM follows two basic assumptions:
(i) The state sequence $\theta_t$ is a Markov chain; that is, $\theta_t$ depends on the past states only through $\theta_{t-1}$.
(ii) Conditionally on the state sequence, the observations $y_t$ are mutually independent, and each $y_t$ depends only on $\theta_t$.
Based on the above assumptions, the primary form of an SSM can be formalized as the state equation (1) and the observation equation (2):

$$\theta_t = g_t(\theta_{t-1}, \omega_t), \qquad (1)$$

$$y_t = h_t(\theta_t, \nu_t), \qquad (2)$$

where $\omega_t$ and $\nu_t$ represent the system noise and the measurement noise, respectively. The two noises are typically assumed to follow mutually independent Gaussian distributions. The function $g_t$ depicts the transfer relationship of the state variables $\theta_t$. Both $g_t$ and $h_t$ may be linear or nonlinear. According to the form of these functions, SSMs can be roughly divided into three categories: linear, nonlinear, and mixed linear/nonlinear. The complexity increases from the linear to the nonlinear models. The function $h_t$ describes the mapping between the state variables in the sequence $\theta_t$ and the observation variables in the sequence $y_t$.
The purpose of constructing an SSM for a specific time-series system is to estimate the state sequences that may occur in the system life cycle. The state estimation problem is a dynamic estimation problem, which can be divided into three types: smoothing, filtering, and prediction. Among them, filtering is the core. The process of state estimation is shown in Figure 2.
(i) A state estimation problem is called smoothing if we utilize the information available up to the current moment $t$ to estimate and trace back the past states. In other words, the noise-removed state values within the total time $T$, that is, for all $t < T$, are the targets to be traced and identified by investigating the observation sequence $y_1, y_2, \ldots, y_t$.
(ii) A state estimation problem is called filtering if both the system information available up to the current moment $t$ and the observed value $y_t$ are used to estimate and revise the current state $\theta_t$.
(iii) A state estimation problem is called prediction if the information available up to the current moment $t$ is used to predict the future states. The problem estimates the state values at times $t + 1, t + 2, \ldots, t + k$ according to the known observation sequence $y_1, y_2, \ldots, y_t$.
In summary, the SSM can be seen as a theoretical algorithm framework that provides a flexible method for time-series analysis by means of state vectors. System state prediction based on the SSM makes it convenient for the analyst to test the model with statistical theory. Existing research on system state prediction recognizes the critical role played by the SSM. Many economic and financial time-series models can be represented in the form of an SSM, such as the autoregressive integrated moving average (ARIMA) model [44], the dynamic linear model (DLM) [4,5], and the stochastic volatility model (SVM) [45]. This paper proposes a general algorithm framework for the time-series analysis of information propagation in social networks by using a DLM. The approach introduced in Section 3.2 is one such well-designed DLM. An empirical analysis of information dissemination prediction is presented in Section 4 by using Weibo social network data and DLMs.

Dynamic Linear Model.
The DLM is a particular case of the general SSM with Gaussian and linear characteristics. The estimation and forecasting tasks can be carried out recursively by the well-known Kalman filter, which is the most prevalent and widely accepted approach to analyzing state space problems. The information cascade prediction of social networks can therefore be simplified to a state iteration problem. The characteristic of a DLM is that all model variables obey Gaussian distributions and satisfy linear relationships. A DLM can be represented by the following mathematical form:

$$\theta_t = G_t \theta_{t-1} + \omega_t, \qquad \omega_t \sim N(0, W_t), \qquad (3)$$

$$y_t = F_t \theta_t + \nu_t, \qquad \nu_t \sim N(0, V_t). \qquad (4)$$

Discrete Dynamics in Nature and Society

In equations (3) and (4), the uppercase parameters $G_t$ and $F_t$ denote the state transition matrix and the measurement matrix, respectively. The parameters $\omega_t$ and $\nu_t$ are the system noise and the measurement noise, respectively. The two kinds of noise are independent of each other and follow Gaussian distributions.
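As an illustration of the linear-Gaussian structure described above, the following minimal sketch simulates a scalar DLM in pure Python. The function name `simulate_dlm`, its default parameter values, and the use of constant scalar `G`, `F`, `W`, and `V` are assumptions made for illustration only; they are not taken from the original study.

```python
import random

def simulate_dlm(T, G=1.0, F=1.0, W=0.1, V=1.0, theta0=0.0, seed=0):
    """Simulate T steps of a scalar DLM (hypothetical helper).

    State equation:        theta_t = G * theta_{t-1} + w_t,  w_t ~ N(0, W)
    Observation equation:  y_t     = F * theta_t     + v_t,  v_t ~ N(0, V)
    """
    rng = random.Random(seed)
    theta = theta0
    states, observations = [], []
    for _ in range(T):
        # Hidden state evolves linearly with Gaussian system noise.
        theta = G * theta + rng.gauss(0.0, W ** 0.5)
        # Observation is a noisy linear function of the hidden state.
        y = F * theta + rng.gauss(0.0, V ** 0.5)
        states.append(theta)
        observations.append(y)
    return states, observations

states, observations = simulate_dlm(T=50)
```

With `G = F = 1`, this reduces to a random walk observed with noise, which is often used as the simplest test case for the filtering and smoothing procedures.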
Let $\varphi = \{y_t, \theta_t, G_t, F_t, W_t, V_t\}$ denote the parameter set related to a DLM. All parameters in the set $\varphi$ except the observed variables $y_t$ may be unknown when a DLM is applied to a practical task. The dynamic linear model can therefore be used for parameter estimation. The problem can be divided into three categories according to which parameters in the set $\varphi$ are to be estimated:
(i) Given the state transition matrix $G_t$, the measurement matrix $F_t$, the system noise covariance $W_t$, and the measurement noise covariance $V_t$, the target parameter is the state $\theta_t$. This is a typical state estimation problem, for which we generally use the Kalman filtering, smoothing, and prediction algorithms.
(ii) Given the state transition matrix $G_t$ and the measurement matrix $F_t$, the target is to estimate the noise terms $W_t$ and $V_t$. Here the maximum likelihood estimation approach has obvious advantages.
(iii) When the parameters $\theta_t$, $G_t$, $F_t$, $W_t$, and $V_t$ are all unknown and must be estimated, the Markov chain Monte Carlo (MCMC) method based on Markov process theory is a potential choice.
The evolution prediction of information diffusion in social networks mainly involves the first two categories. The following subsections briefly describe the Kalman filtering, smoothing, and prediction algorithms and the maximum likelihood estimation method applied in this study.

Kalman Filtering.
As mentioned above, state estimation can be divided into prediction, filtering, and smoothing according to the information obtained from the system. Among them, filtering is the core of state estimation. In the filtering problem, the data are supposed to arrive sequentially in time. We require a procedure to estimate the current value of the state vector, based on the observations up to time $t$, and to update our estimates and forecasts as new data become available at time $t + 1$. The filtering process provides the rules for updating the current inference on the state vector as new data arrive. We first give the derivation and procedure of the Kalman filtering algorithm.
The Kalman filtering algorithm [46,47] was developed to recursively solve the linear filtering problem for discrete data by using the Bayes theorem [48]. The primary task is to predict the prior distribution of the state at time $t$, that is, its probability density, based on the state of the system at time $t - 1$. Then, the algorithm corrects the prior distribution using the Bayes theorem once the likelihood factor, that is, the system observation value at time $t$, is obtained. The resulting probability density is the posterior distribution of the state at time $t$. Therefore, the state estimation at each moment can be regarded as a standard prediction/correction process. Let the abbreviated expression $y_{1:t}$ with $t \le T$ denote the observation sequence $y_1, y_2, \ldots, y_t$, and let the symbol $p$ represent a conditional distribution. We formalize the Kalman filter process below to facilitate the introduction of the filtering problems of social network information diffusion.
The conditional distribution $p(\theta_t \mid y_{1:t})$ in equation (5) describes the probability distribution of one random variable (the state value $\theta_t$) given that another random variable (the observation sequence $y_{1:t}$) takes fixed values. Both $\theta_t$ and $y_{1:t}$ follow Gaussian distributions in a DLM. All of the distributions of the form $p(\cdot)$ mentioned above are uniquely determined by their expectations and covariances. The primary operations of Kalman filtering are the prediction operations, which aim to ascertain the prior distribution and the normalization factor of the system states. First, we calculate the predicted mean $\alpha_t$ and covariance matrix $R_t$ of the state variables at any time $t$:

$$\alpha_t = G_t m_{t-1}, \qquad (6)$$

$$R_t = G_t C_{t-1} G_t' + W_t, \qquad (7)$$

where $G_t$ and $G_t'$ are the state transition matrix and its transpose, $C_{t-1}$ is the filtering covariance matrix at time $t - 1$, and $W_t$ is the system noise covariance. The notation $m_t = \sum_{k=0}^{t} \theta_k \, p(\theta_k)$ with $0 \le t \le T$ denotes the state mean from $\theta_0$ to $\theta_t$.
Next, we estimate the predicted mean $f_t$ and covariance matrix $Q_t$ of the observed variable at time $t$:

$$f_t = F_t \alpha_t, \qquad (8)$$

$$Q_t = F_t R_t F_t' + V_t, \qquad (9)$$

where $F_t$ and $F_t'$ are the measurement matrix and its transpose, and $V_t$ denotes the measurement noise covariance. $\alpha_t$ and $R_t$ are the predicted mean and covariance matrix calculated by equations (6) and (7), respectively. The correction operations of Kalman filtering build on the parameter values obtained by the prediction operations; their objective is to calculate the filtering mean $m_t$ and covariance matrix $C_t$ of the state variable at time $t$:

$$m_t = \alpha_t + R_t F_t' Q_t^{-1} (y_t - f_t), \qquad (10)$$

$$C_t = R_t - R_t F_t' Q_t^{-1} F_t R_t, \qquad (11)$$

where $Q_t^{-1}$ is the inverse of the covariance matrix $Q_t$. The detailed procedure of the Kalman filtering algorithm is given in Algorithm 1.
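The prediction and correction operations described above can be sketched for the scalar case as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `kalman_filter`, the dictionary output, and the restriction to constant scalar `G`, `F`, `W`, and `V` are hypothetical choices made to keep the example dependency-free.

```python
def kalman_filter(ys, G, F, W, V, m0, C0):
    """Scalar Kalman filter (illustrative sketch).

    Returns one dict per time step holding the predicted state moments
    (alpha, R), the predicted observation moments (f, Q), and the
    filtered state moments (m, C).
    """
    m, C = m0, C0
    history = []
    for y in ys:
        # Prediction operations: prior moments of the state.
        alpha = G * m
        R = G * C * G + W
        # Prediction operations: moments of the one-step-ahead observation.
        f = F * alpha
        Q = F * R * F + V
        # Correction operations: update with the new observation y.
        gain = R * F / Q
        m = alpha + gain * (y - f)
        C = R - gain * F * R
        history.append({"alpha": alpha, "R": R, "f": f, "Q": Q, "m": m, "C": C})
    return history

# Usage on a short, made-up observation sequence.
steps = kalman_filter([1.0, 2.0, 3.0], G=1.0, F=1.0, W=0.5, V=1.0, m0=0.0, C0=10.0)
```

In the matrix case the products become matrix products, the transposes of $G_t$ and $F_t$ appear explicitly, and the division by `Q` becomes multiplication by the inverse of $Q_t$.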

Kalman Smoothing and Prediction.
Smoothing estimation in time-series problems is mainly used for the retrospective analysis of observation sequences, to explore potential phenomena or laws underlying the observed values. In many practical problems, the system's a priori parameters cannot be calculated directly and must be estimated. For example, in economic research, researchers may need to use a country's recent gross domestic product to understand the past socioeconomic behavior of that country's systems. The forward-backward smoothing algorithm can make such an estimation based on an observation sequence. For each state in a discrete system, it calculates both the "forward" probability of reaching the state and the "backward" probability of generating the model's final state.
This study applies the idea underlying the forward-backward smoothing algorithm to revise the estimates of past states. The filtering distribution is a necessary input for forward-backward smoothing. The smoothing distribution is derived as follows:

$$p(\theta_t \mid y_{1:T}) = p(\theta_t \mid y_{1:t}) \int \frac{p(\theta_{t+1} \mid \theta_t)\, p(\theta_{t+1} \mid y_{1:T})}{p(\theta_{t+1} \mid y_{1:t})}\, d\theta_{t+1}. \qquad (12)$$

According to equation (12), the entire smoothing process runs backward in time. It employs the predicted distribution $p(\theta_{t+1} \mid y_{1:t})$, the filtering distribution $p(\theta_t \mid y_{1:t})$ of the state, and the smoothing distribution $p(\theta_{t+1} \mid y_{1:T})$ of the next moment. Through forward filtering, the filtering density $p(\theta_t \mid y_{1:t})$ and the predicted density $p(\theta_{t+1} \mid y_{1:t})$ at time $t$ can be obtained. Then, we combine them with the smoothing density $p(\theta_{t+1} \mid y_{1:T})$ at time $t + 1$. The target smoothing density $p(\theta_t \mid y_{1:T})$ at time $t$ can thus be deduced backward.
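This study takes the output of the Kalman filter algorithm (Algorithm 1), that is, the state mean $m_t$ and covariance $C_t$, as the input of the forward-backward smoothing and prediction algorithm. The aim is to obtain their smoothed estimates, denoted by $m^e_t$ and $C^e_t$, respectively. The algorithm distinguishes two cases. First, the smoothing distribution is equal to the filtering distribution if the time parameter satisfies $t = T$:

$$m^e_T = E(\theta_T \mid y_{1:T}) = m_T, \qquad C^e_T = \mathrm{Var}(\theta_T \mid y_{1:T}) = C_T.$$

Second, for $t < T$, the smoothed moments are obtained backward from those at time $t + 1$:

$$m^e_t = E(\theta_t \mid y_{1:T}) = m_t + C_t G_{t+1}' R_{t+1}^{-1} (m^e_{t+1} - \alpha_{t+1}),$$

$$C^e_t = \mathrm{Var}(\theta_t \mid y_{1:T}) = C_t - C_t G_{t+1}' R_{t+1}^{-1} (R_{t+1} - C^e_{t+1}) R_{t+1}^{-1} G_{t+1} C_t.$$

Then, the smoothing distribution $p(\theta_t \mid y_{1:T}) \sim N(m^e_t, C^e_t)$ is obtained.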
Under the assumption of the Gaussian distribution related to the DLM models, the execution process of the forward-backward smoothing algorithm can be summarized as Algorithm 2.
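The forward-backward procedure can be sketched for the scalar case as follows. This is an illustrative sketch that uses the standard backward (Rauch-Tung-Striebel) recursion for linear-Gaussian models; it is assumed, not confirmed, to correspond to the steps of Algorithm 2. The function name `filter_and_smooth` and the constant scalar parameters are hypothetical.

```python
def filter_and_smooth(ys, G, F, W, V, m0, C0):
    """Forward Kalman filter followed by backward smoothing (scalar sketch).

    Returns (m_filt, C_filt, m_smooth, C_smooth), each a list of length T.
    """
    alphas, Rs, m_filt, C_filt = [], [], [], []
    m, C = m0, C0
    # Forward pass: store predicted and filtered moments at every step.
    for y in ys:
        alpha = G * m
        R = G * C * G + W
        Q = F * R * F + V
        gain = R * F / Q
        m = alpha + gain * (y - F * alpha)
        C = R - gain * F * R
        alphas.append(alpha)
        Rs.append(R)
        m_filt.append(m)
        C_filt.append(C)
    # Backward pass: at t = T the smoothed and filtered moments coincide.
    m_smooth, C_smooth = m_filt[:], C_filt[:]
    for t in range(len(ys) - 2, -1, -1):
        B = C_filt[t] * G / Rs[t + 1]
        m_smooth[t] = m_filt[t] + B * (m_smooth[t + 1] - alphas[t + 1])
        C_smooth[t] = C_filt[t] - B * (Rs[t + 1] - C_smooth[t + 1]) * B
    return m_filt, C_filt, m_smooth, C_smooth

m_f, C_f, m_s, C_s = filter_and_smooth(
    [1.0, 2.0, 3.0], G=1.0, F=1.0, W=0.5, V=1.0, m0=0.0, C0=10.0
)
```

Because the smoothed estimate at time $t$ uses all observations up to $T$, its variance is never larger than the filtered variance at the same time step.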
As explained in Algorithm 2, the filtering distribution has been determined at any time $t$. Subsequently, we can deal with the prediction of the observed values or state values at any future time point $t + k$ ($k > 0$) by a recursive prediction process. The calculation details are as follows, where $\alpha_t(0) = m_t$ and $R_t(0) = C_t$:
(i) Prediction of state variables:

$$\alpha_t(k) = G_{t+k}\, \alpha_t(k-1), \qquad R_t(k) = G_{t+k} R_t(k-1) G_{t+k}' + W_{t+k}.$$

(ii) Prediction of observation variable:

$$f_t(k) = F_{t+k}\, \alpha_t(k), \qquad Q_t(k) = F_{t+k} R_t(k) F_{t+k}' + V_{t+k}.$$
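The recursive prediction process can be sketched as follows for the scalar case. The sketch starts from the last filtered moments and propagates them forward without any new observations; the function name `forecast` and the constant scalar parameters are assumptions for illustration.

```python
def forecast(m_T, C_T, k, G, F, W, V):
    """k-step-ahead forecast for a scalar DLM (illustrative sketch).

    Starts from the filtered mean m_T and variance C_T and returns a list of
    (state_mean, state_var, obs_mean, obs_var) tuples for steps 1..k.
    """
    a, R = m_T, C_T
    out = []
    for _ in range(k):
        # Propagate the state moments one step without a correction.
        a = G * a
        R = G * R * G + W
        # Map the state moments to the observation scale.
        f = F * a
        Q = F * R * F + V
        out.append((a, R, f, Q))
    return out

predictions = forecast(m_T=1.0, C_T=0.5, k=2, G=1.0, F=1.0, W=0.1, V=1.0)
```

The forecast variance grows with the horizon because the system noise accumulates at every step while no observation is available to correct the state.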

Maximum Likelihood Estimation.
Maximum likelihood estimation (MLE) is one of the most important and widely used parameter estimation methods in statistics. The idea is to determine the unknown parameters in a system model by maximizing the probability of the observed values. Under the condition that the state transition matrix $G_t$ and the measurement matrix $F_t$ are known from the system environment, the maximum likelihood estimates of the noise terms can be derived. This section uses the simplified expression $y_{1:T}$ to describe the observation data $y_1, y_2, \ldots, y_T$, whose distribution depends on the state noise $W$ and the observation noise $V$. Let the joint probability density $L(V, W \mid y_{1:T})$ denote the likelihood function of the noise terms $V$ and $W$ relative to the observation sequence $y_{1:T}$. We have the following formal description:

$$L(V, W \mid y_{1:T}) = \prod_{t=1}^{T} p(V, W \mid (y_t \mid y_{1:t-1})). \qquad (18)$$

The term $p(V, W \mid (y_t \mid y_{1:t-1}))$ in equation (18) is the predictive probability density function of $y_t$ given the observation sequence $y_{1:t-1}$, which is Gaussian with mean $f_t$ and variance $Q_t$ obtained by the forward Kalman filter algorithm. To simplify the operation, we convert equation (18) to its log-likelihood form:

$$\log L(V, W \mid y_{1:T}) = -\frac{1}{2} \sum_{t=1}^{T} \log |Q_t| - \frac{1}{2} \sum_{t=1}^{T} (y_t - f_t)' Q_t^{-1} (y_t - f_t) + \text{const}. \qquad (19)$$

The purpose of equation (19) is to find the noise terms $V$ and $W$ that make the occurrence probability of the observation sequence $y_{1:T}$ in the system the greatest. According to the principle of MLE, the parameters $V$ and $W$ can be estimated simply by solving the optimization task of maximizing equation (19). Finally, we use the quasi-Newton method [49] to solve for the parameters.
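The likelihood evaluation can be sketched as follows for a scalar DLM. The log-likelihood is accumulated from the one-step-ahead forecast moments produced by the Kalman filter. To keep the example dependency-free, the maximization uses a coarse grid search as a stand-in for the quasi-Newton method used in the study; the function names, the grid values, and the simulated data are all hypothetical.

```python
import math
import random

def log_likelihood(ys, V, W, G=1.0, F=1.0, m0=0.0, C0=1e6):
    """Gaussian log-likelihood of ys under a scalar DLM via the Kalman filter."""
    m, C, ll = m0, C0, 0.0
    for y in ys:
        alpha = G * m
        R = G * C * G + W
        f = F * alpha
        Q = F * R * F + V
        # Contribution of the one-step-ahead predictive density N(f, Q).
        ll += -0.5 * (math.log(2.0 * math.pi * Q) + (y - f) ** 2 / Q)
        gain = R * F / Q
        m = alpha + gain * (y - f)
        C = R - gain * F * R
    return ll

def grid_mle(ys, grid):
    """Pick the (V, W) pair on the grid with the highest log-likelihood.

    Grid search is a simple substitute here, not the quasi-Newton method.
    """
    candidates = [(V, W) for V in grid for W in grid]
    return max(candidates, key=lambda vw: log_likelihood(ys, vw[0], vw[1]))

# Simulated random-walk-plus-noise data (made-up values V = 1.0, W = 0.25).
rng = random.Random(1)
theta, ys = 0.0, []
for _ in range(200):
    theta += rng.gauss(0.0, 0.5)
    ys.append(theta + rng.gauss(0.0, 1.0))

grid = [0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 4.0]
V_hat, W_hat = grid_mle(ys, grid)
```

In practice, the negative of `log_likelihood` could be passed to a quasi-Newton optimizer such as `scipy.optimize.minimize` with `method="L-BFGS-B"`, typically after reparameterizing the variances on a log scale to keep them positive.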

Information Popularity Prediction in Social Networks Based on Dynamic Linear Models
A growing number of information senders and receivers around the world have started to post their speeches, thoughts, and comments online on new social media platforms such as Weibo, Facebook, Twitter, WeChat, and LinkedIn. These attention-grabbing open media have been well accepted and acclaimed for their apparent advantages: faster transmission, greater timeliness, broader coverage, greater publicity, and more interactivity. These types of social network platforms will play an increasingly important role.

Algorithm 1: Kalman filtering.
Input: Initial state mean $m_0$ and initial state covariance matrix $C_0$.
Output: State mean $m_t$ and state covariance matrix $C_t$ at each time $t$.
(1) for each $t \in [1, 2, \ldots, T]$ do
(2) Prediction operations: compute the predicted state mean $\alpha_t$ and covariance matrix $R_t$, then the predicted observation mean $f_t$ and covariance matrix $Q_t$.
(3) Correction operations: compute the filtering mean $m_t$ and covariance matrix $C_t$.
(4) end for

The graph formed by the users of a social platform and the relationships between them is called a social network. Information popularity prediction in social networks is a typical time-series problem, the solutions of which can provide useful decision support for countering negative propaganda and rumors, handling hotspots, and so forth. The issue of time-series analysis in social networks has received considerable attention. However, a systematic understanding of, and process for, analyzing time-series problems in social networks is still lacking.
This section takes Weibo as an example to discuss how the DLM can be used as a standard analysis framework for the propagation and prediction issues of social networks with time-series characteristics. The study first highlights some important statistics and evolution rules of the Weibo social network, including the number of daily posts, retweets, and comments and how they change with time. Second, we use the theory of the DLM, combined with social data and search data, to decompose the time series of hotspot events and predict their propagation effect on the Weibo network.
In this paper, the time series of information propagation in Weibo is regarded as the superposition of trend, periodic, and random terms.
(i) The trend term characterizes the objective evolutionary trend of Weibo hotspot events.
(ii) The periodic term describes the regular changes due to cyclic factors such as the incubation, development, outbreak, mature, recession, and extinction periods of a network public opinion event.
(iii) The random term involves a sudden change or noise disturbance. A sudden change refers to a change caused by unexpected circumstances such as the retweeting of hot news or emergent social events. The noise captures the influence of many random factors, such as the different attitudes of users who release or forward messages, public opinion reversals caused by facts, and public opinion explosions.
The DLM can describe the parameters of the information propagation time series listed above. The proposed models are given in Section 4.3.

Challenges of Information Propagation Prediction.
A well-structured propagation prediction model helps researchers or government departments find the regularity, suddenness, or reversibility in the process of information propagation. Predicting the spread of a particular news event helps to guide negative public opinion in advance.
Two challenges face research on information propagation models and prediction in online social networks similar to Weibo:
(1) Avoidance of low-quality data: Spam users (nonzombies) flood every corner of the Weibo platform. Many users engaged in Taobao, microbusiness, and online shopping platforms continually release advertisements, order displays, and purchasing agency information on Weibo. Such jumbled information results in low-quality data. Despite the importance of statistical characteristics such as the number of posts, comments, retweets, and mentions of a hot event on the Weibo social network, information propagation predictions based only on these raw counts rarely achieve acceptable accuracy.
(2) Data association mining: The essence of prediction analysis is to find the law of event occurrence in massive data. The timestamp of a time-series system cannot be used directly for prediction. We still need to discover the underlying factors that influence the state changes of the time series.

Solution.
This study adopts a divide-and-conquer strategy in response to the issue of low-quality data. The propagation of a hot event on Weibo can be classified as stationary or nonstationary by calculating the event's time-series volatility. First, we use DLMs established from the historical time-series information to deal with stationary event popularity prediction. In addition, some explanatory variables (hot event characteristic variables) are captured to enhance the DLMs. The improved models are applied to the propagation prediction of nonstationary situations.
The association mining methods in [50] are introduced to find explanatory variables from the Weibo platform by executing information filtering, text classification, data normalization, and so forth. Then, we calculate the correlation coefficient between the event propagation time series and each explanatory variable time series. The highly correlated variables are chosen as the random terms of the time series in the process of event propagation. The complete framework for information popularity prediction of a hot event on the Weibo social network is shown in Figure 3.
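The correlation-based selection step can be sketched as follows. The study does not state which correlation coefficient or cutoff is used, so this sketch assumes the Pearson coefficient and a hypothetical threshold of 0.8; the function names and the toy series are likewise illustrative only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_explanatory(target, candidates, threshold=0.8):
    """Keep candidate series whose |correlation| with target meets the threshold.

    `candidates` maps a (hypothetical) variable name to its time series.
    Returns a dict of the retained names and their correlation values.
    """
    scores = {name: pearson(target, series) for name, series in candidates.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}

# Toy example with made-up numbers.
popularity = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "search_volume": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated
    "unrelated": [1.0, -1.0, 1.0, -1.0],     # weakly correlated
}
selected = select_explanatory(popularity, candidates)
```

Using the absolute value keeps strongly negatively correlated variables as well, which may or may not be desirable depending on how the explanatory variables enter the model.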

Modeling of Weibo Hot Events Prediction Based on Dynamic Linear Models
Definition 1. A DLM is called a stationary event propagation prediction model (SEP²M) if it has the characteristics of local linear trend and seasonality and possesses the following state equation form:

Definition 1 provides a general model to deal with popularity prediction problems with stationary event behavior in social networks, where $\mu_t$ and $\delta_t$ are the level trend and the inclination trend at time $t$, respectively. $m^e_t$ denotes the state mean estimate of a periodic item $m_t$, which can be obtained by Algorithm 2. The period length is represented by the Greek symbol $\eta$. For example, if the model takes a week (seven days) as a cycle, we have $\eta = 6$: the information propagation of the seventh day is predicted using historical data from the previous six days. The tail terms $\nu_t$, $\omega_{\mu t}$, $\omega_{\delta t}$, and $\omega_{m^e t}$ of the equations are the measurement noise, level trend noise, inclination trend noise, and mean estimate noise, respectively. Representing SEP²M in matrix form is an effective way to carry out the programming and the subsequent data experiments. Hence, we convert the SEP²M described in Definition 1 into the following matrix form:
(i) The observation equation:
(ii) The state equation:
where we have the two coefficient matrices:

Definition 2. A DLM is called a nonstationary event propagation prediction model (N-SEP²M) if it has the characteristics of local linear trend, seasonality, and regression and possesses the following state equation form:

Definition 2 presents a model suitable for analyzing the propagation of nonstationary events in social networks, where $\mu_t$, $\delta_t$, and $m^e_t$ are similarly the level trend, the inclination trend, and the periodic item estimate at time $t$, respectively. Parameters $\nu_t$, $\omega_{\mu t}$, $\omega_{\delta t}$, and $\omega_{m^e t}$ are the noises related to the system, level trend, inclination trend, and periodic item mean calculation, respectively. $X_t$ represents the explanatory variables. The explanatory variable coefficient is expressed as the symbol $\beta^T$. The compound expression $\beta^T X_t$ denotes a random term in a nonstationary event that undergoes a sudden occurrence or reversal. We convert the model N-SEP²M into the following matrix form:
(i) The observation equation:
(ii) The state equation:
where the details of the coefficient matrices are as follows:
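As an illustration of how a model with local linear trend, seasonality, and regression can be put into matrix form for programming, the following minimal sketch builds an observation vector and a transition matrix in plain Python. It assumes the standard textbook construction (level plus slope, a sum-to-zero seasonal block of length η, and time-invariant regression coefficients); the state layout and the names `build_dlm_matrices`, `step`, and `observe` are hypothetical and are not taken from the paper, whose own coefficient matrices may differ.

```python
def build_dlm_matrices(eta, x_t=()):
    """Return (F_t, G) for a local linear trend + seasonal (+ regression) DLM.

    Assumed state layout (hypothetical, for illustration only):
        [mu, delta, m_1, ..., m_eta, beta_1, ..., beta_k]
    x_t holds the k explanatory variable values at time t (empty: no regression).
    """
    k = len(x_t)
    n = 2 + eta + k
    G = [[0.0] * n for _ in range(n)]
    # Local linear trend: mu' = mu + delta, delta' = delta.
    G[0][0] = 1.0
    G[0][1] = 1.0
    G[1][1] = 1.0
    # Seasonal block: the new seasonal term is minus the sum of the last
    # eta terms; the older terms shift down by one position.
    for j in range(eta):
        G[2][2 + j] = -1.0
    for j in range(1, eta):
        G[2 + j][2 + j - 1] = 1.0
    # Regression coefficients are carried over unchanged.
    for j in range(k):
        G[2 + eta + j][2 + eta + j] = 1.0
    # Observation: y = mu + m_1 + beta^T x_t (noise omitted).
    F = [1.0, 0.0, 1.0] + [0.0] * (eta - 1) + list(x_t)
    return F, G


def step(G, state):
    """Propagate the state one step ahead, ignoring the noise terms."""
    return [sum(g * s for g, s in zip(row, state)) for row in G]


def observe(F, state):
    """Noise-free observation: inner product of F_t and the state."""
    return sum(f * s for f, s in zip(F, state))
```

With `eta=6` the seasonal block repeats every seven steps, which matches the weekly-cycle example in the text. Passing an empty `x_t` gives the trend-plus-seasonality case, and a non-empty `x_t` adds the regression term.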

Experiments
This section discusses the datasets, stationary and nonstationary identification, and the model evolution analysis of three hot events that appeared on the Weibo social network.

Event Datasets and Propagation Network Setting.
To evaluate the effectiveness of the proposed models for popularity prediction, we collected datasets from the popular Chinese social platform Sina Weibo, covering three hot public opinion events in the content categories of social, entertainment, and international news from 2015 to 2017. A brief description of the three events is presented as follows: (i) Chengdu female driver incident (Chengdu-Driver).
On the afternoon of May 3, 2015, a beating incident occurred near the Jiaozi overpass on Chengdu Third Ring Road. Ms. Lu was forced to stop by Zhang at the Jiaozi interchange for changing lanes and was then beaten and injured. The incident was broadcast on Weibo. People were first shocked by the ferocity of Zhang. However, when the event video came out, public opinion turned to blame the female driver for driving recklessly and dangerously. The propagation of the event formed a typical public opinion reversal effect.
The datasets "Chengdu female driver was beaten.csv," "Baihe Bai derailed.csv," and "THADD incident.csv" refer to the social event (Chengdu-Driver), the entertainment event (Baihe), and the international news event (THAAD), respectively. The original hot event data is prepared by crawling the user information pages involved in the three event topics, including the post contents, user attributes, follower relations, the numbers of comments and thumb-ups, post time, and forwarding (propagation) links. This study implements the data crawling and the proposed prediction models in Python 3.7 and runs the code on a Linux server with an Intel(R) Xeon(R) CPU (E5-2620 v4) and a GeForce TITAN X GPU (12 GB memory). The proportion of valid entries is 95%, 96%, and 98% of the total results returned by our crawler system for the three events, respectively. The details of the event datasets are shown in Table 1. The development trends of the three events' datasets are shown in Figure 4.
Our experiments evaluate and predict the popularity of the three hot events on the Weibo social network. The ideal environment setting is to obtain the full structure of Weibo as a map of the spread. However, Weibo allows only a small part of the follower relations to be returned to crawlers. Hence, the relations obtained by traditional Weibo crawling methods are quite incomplete. On the other hand, the size of the Weibo network is enormous. The empirical treatment is to extract a subnet with common characteristics from the original Weibo network. We repeatedly remove the nodes with degrees less than 2 or more than 1000. Then, the community discovery algorithm reported in [51] is performed to find the subnet that preserves the neighbors and relations of the event-related users. The subnet has the typical social network characteristics, including an approximate power-law degree distribution (Figure 5) and a high clustering (aggregation) coefficient.
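The repeated degree-based node removal described above can be sketched as follows. This is a minimal illustration that assumes an undirected graph stored as a dictionary of neighbor sets; the function name `prune_by_degree` and the data layout are hypothetical, and the community discovery step of [51] is not covered.

```python
def prune_by_degree(adj, min_deg=2, max_deg=1000):
    """Repeatedly drop nodes whose degree is < min_deg or > max_deg.

    adj maps each node to the set of its neighbors (undirected graph,
    an assumption of this sketch). Returns a new adjacency dict; the
    input is left untouched.
    """
    g = {u: set(vs) for u, vs in adj.items()}
    while True:
        # Nodes violating the degree bounds in the current graph.
        drop = {u for u, vs in g.items() if not (min_deg <= len(vs) <= max_deg)}
        if not drop:
            return g
        # Remove them and their incident edges, then re-check, because
        # a removal can push a neighbor below the lower bound.
        g = {u: vs - drop for u, vs in g.items() if u not in drop}
```

The loop is needed because removing a node lowers the degree of its neighbors, so a single pass would not guarantee that every remaining node satisfies the bounds.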

Stationary Sequence Identification and Social Intensity.
Historically, the statistical concept "volatility" has been used to describe the degree of return fluctuation on investments in financial markets. Using this concept to distinguish stationary from nonstationary events is an essential experimental step. The experiment must state explicitly how to obtain the "volatility" of a hot event on the Weibo social network. The calculation process is defined as follows.
(1) Calculate the percentage return:
(2) Calculate the volatility (standard deviation) of sequence $y_t$:
where the sequence $X_t$ represents the values of an explanatory variable at different times $t$, which is also introduced in equation (25) in Section 4.3, and $\bar{y}$ is the average value of the sequence $y_t$.

The forwarding behavior of users on Weibo plays an increasingly important role in helping researchers measure how actively an event is discussed. The forwarding time series of an event is the concrete expression of the law of event propagation. Hence, our experiments take the forwarding number of an event on Weibo at time $t$ as the explanatory variable $X_t$. Then, the sequence $y_t$ of percentage returns and the subsequent volatility can be calculated. We follow the steps below to identify the stationary/nonstationary state of an event:
(1) The life cycle of an event is divided into time-series intervals in days.
(2) Count the forwarding numbers from 8:00 to 24:00 at intervals of 1 hour within a day (obtaining the sequence $X_t$ for each day).
(3) The percentage return is calculated to obtain the sequence $y_t$ according to equation (27).
(4) Calculate the day-to-day volatility, denoted by $V_d$, according to equation (28).
(5) The range of possible volatility values is divided into several (set as nine) intervals. The forwarding numbers of the event in the different volatility intervals are obtained by simple numerical statistics.
(6) Estimate the ratio of the forwarding number in each volatility interval to the total number of retweets.
(7) Calculate the sum of the forwarding ratios with volatility values greater than 1. If this sum is more than 75%, the event is decided to be nonstationary; otherwise, it is stationary.
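The identification steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the percentage return is assumed to be the usual relative change $(X_t - X_{t-1}) / X_{t-1}$, the volatility is assumed to be the sample standard deviation, and the nine-interval binning of steps (5) and (6) is skipped because the final decision in step (7) only needs the share of forwards on days whose volatility exceeds 1. All function names are hypothetical.

```python
import math


def percentage_returns(x):
    """Assumed form: y_t = (x_t - x_{t-1}) / x_{t-1}.

    Pairs whose previous value is 0 are skipped to avoid division by zero.
    """
    return [(b - a) / a for a, b in zip(x, x[1:]) if a != 0]


def volatility(y):
    """Sample standard deviation of the return sequence (0.0 if too short)."""
    if len(y) < 2:
        return 0.0
    mean = sum(y) / len(y)
    return math.sqrt(sum((v - mean) ** 2 for v in y) / (len(y) - 1))


def is_nonstationary(daily_counts, vol_cut=1.0, share_cut=0.75):
    """daily_counts: one list of hourly forwarding counts per day.

    Returns (share, flag), where share is the fraction of all forwards
    that fall on days whose volatility exceeds vol_cut, and flag is True
    when that share is above share_cut.
    """
    total = sum(sum(day) for day in daily_counts)
    if total == 0:
        return 0.0, False
    high = sum(
        sum(day)
        for day in daily_counts
        if volatility(percentage_returns(day)) > vol_cut
    )
    share = high / total
    return share, share > share_cut
```

For example, `is_nonstationary([[10, 10, 10, 10], [10, 100, 10, 100]])` treats the first day as calm (zero volatility) and the second as highly volatile, so the decision depends on how many forwards the second day contributes.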
The threshold of 75% is used to distinguish the application scopes of the two models SEP²M and N-SEP²M when we need to represent, analyze, and predict the process of event popularity evolution on the Weibo social network. The annual report of hot searches on Weibo shows that about 90% of the hot events on Weibo are nonstationary. The volatility distribution of an example using the Chengdu-Driver dataset is shown in Figure 6, which presents the statistical relationship between the forwarding numbers and the volatility intervals for the event Chengdu-Driver. The forwarding longitudinal series is a measure of the depth at which an event post is forwarded on the Weibo social network. Series 1-4 shown in Figures 6(a) and 6(b) illustrate that the statistical results of forwarding numbers contain a minimum of one (resp., three) forwarding action and a maximum of four forwarding longitudinal actions. We plot the data distribution of volatility intervals in descending order. The label list of proportions on the right shows the ratio of retweets in each volatility interval to the total forwarding number. The cumulative line (orange) on the second axis gives the cumulative value, from the highest volatility to the lowest, as a percentage of the total retweets based on the forwarding numbers.
The red point shown in Figure 6 marks the parameter value used to identify the stationary state of the event. As a result, the propagation of the social event Chengdu-Driver on the Weibo network is nonstationary, since the decision point 0.78 exceeds the threshold 0.75. If the experiment ignores the forwarding data of series 1 and 2, the decision point calculated in Figure 6(b) is 0.88. A decision point larger than 0.78 indicates that deep vertical propagation causes the event's principal fluctuation. Similar analyses are applied to the events Baihe and THAAD, yielding the decision points 0.83 and 0.91, which are presented in Figures 7(a) and 7(b), respectively. What is interesting about these three points (0.78, 0.83, and 0.91) is that the international news event THAAD had the most significant fluctuations among the three Weibo events. The nonstationary state implies a large amount of discussion in a short period.

Correlation Analysis of Explanatory Variables.
Having explained what is meant by explanatory variables, we now discuss how to find the best explanatory variable. A simple linear regression model is used to establish the quantitative relationships between event data and explanatory variables to determine their correlation scores. This study selects the best explanatory variable by comparing the Pearson correlation coefficients between each variable and the forwarding numbers over the time series.
We refer to the percentage of forwarding numbers, comment numbers, and thumb-up numbers of an event at time $t$ as the social intensity, comment intensity, and thumb-up intensity, respectively. The main temporal explanatory variables involved in the analysis of the three experimental datasets are the following: comment number (CN), thumb-up number (TUN), social intensity (SI), comment intensity (CI), thumb-up intensity (TUI), and average comment length (AveL). All of these explanatory variables are time-dependent. Table 2 shows the correlation scores between the six explanatory variables and the forwarding numbers for the three events. The experiment filtered out the explanatory variables with low correlation and temporal fluctuation. According to the correlation coefficient comparison, we determine the best one, that is, social intensity, for the three datasets.
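The selection by Pearson correlation can be illustrated with a short, self-contained sketch. The helper names `pearson` and `best_explanatory` are hypothetical, candidates are ranked by absolute correlation (an assumption of this sketch), and any series passed in would be the per-time-step values of the candidate variables and of the forwarding numbers.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / math.sqrt(va * vb)


def best_explanatory(target, candidates):
    """Return (name, score) of the candidate most correlated with target.

    candidates maps a variable name (e.g. "SI", "CN") to its time series.
    """
    scores = {name: pearson(series, target) for name, series in candidates.items()}
    name = max(scores, key=lambda k: abs(scores[k]))
    return name, scores[name]
```

Ranking by absolute value keeps strongly negatively correlated variables in play; ranking by the signed coefficient instead is a one-line change if only positive association is of interest.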
In general, the higher the correlation between the explanatory variable sequence and the target sequence, the better the prediction effect. Figure 8 shows the quantitative relationship between the event popularity evolution and the social intensity of the event Chengdu-Driver; the dataset has been normalized.
The evaluation indexes adopted include the following: the determination coefficient $R^2$, the root mean square error (RMSE), and the prediction accuracy at absolute error thresholds of 20% and 50% (namely, Precision@20 and Precision@50). The comparison of the evaluation indicators of the prediction models is shown in Table 3. The results indicate that when the popularity prediction of a Weibo hot event depends only on the time information, the error between the prediction result and the real event propagation is usually large. By comparing the three hot Weibo events, we can see from Figures 9, 10, and 11 that using highly correlated explanatory variables as the proposed model parameters can improve the accuracy and reliability of event popularity prediction.
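The evaluation indexes named above can be computed as in the following minimal sketch. The formulas for $R^2$ and RMSE are the standard ones; the reading of Precision@k as the share of points whose relative absolute error is at most k percent is an assumption of this sketch, since the exact definition is not spelled out here. The function names are hypothetical.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant actual series")
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def precision_at(actual, predicted, pct):
    """Share of points with relative absolute error <= pct percent.

    Assumed definition; points with actual == 0 are skipped.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to score")
    hits = sum(1 for a, p in pairs if abs(p - a) / abs(a) <= pct / 100.0)
    return hits / len(pairs)
```

Under this reading, `precision_at(actual, predicted, 20)` and `precision_at(actual, predicted, 50)` correspond to Precision@20 and Precision@50, and the second value can never be smaller than the first.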

Experimental Results and Case
We illustrate the experimental results by using the event Chengdu-Driver as an example. The blue trend line is recorded in Figure 9. Values for propagating the event Chengdu-Driver that almost coincide with the real trend lines were recorded. A comparison of the two results shown by the green and orange trend lines in Figure 9(a) reveals that the prediction accuracy of the N-SEP²M model with explanatory variables is significantly higher than that of models without explanatory variables. Furthermore, the cumulative relative error recorded in Figure 9(b) shows that the prediction results are more accurate when the explanatory variables are introduced into the N-SEP²M model for nonstationary event popularity evolution with burst traffic. The entertainment event Baihe (Figure 10) and the news event THAAD (Figure 11) showed a similar accuracy advantage in data fitting for the prediction results with explanatory variables. Comparing the model prediction results fitted with SI against the original fitting values, the model presented in this study shows a clear improvement. In contrast to the prediction result of the event Chengdu-Driver, however, a slightly larger error in the fitting values with SI was detected for the events Baihe and THAAD: the green trend line is not entirely close to the observation line in the prediction charts. The difference is due to the underlying quality of the three datasets. Together, these results provide important insights into the significance of combining dynamic linear models and highly correlated explanatory variables for hot event propagation.
The common feature of the three Weibo hotspot events is that they all have burst traffic characteristics. To verify the adaptability of these events to the parameters defined by the prediction models, the event THAAD, which has the highest volatility, was selected to decompose the time-series quantification values. Figure 11(a) represents the observation, fitting, and prediction of THAAD with burst traffic. Figures 12(a), 12(b), and 12(c) represent the trend $\mu_t$, periodicity $m^e_t$, and regression results $y_t$ of the burst flow event, respectively, reflecting the characteristics of its nonstationary time series. The decomposition results further show that the proposed dynamic linear model N-SEP²M (Definition 2) can accurately deal with the prediction task of hot event propagation on the Weibo social network.

Conclusion
This research is undertaken to design a time-series analysis framework and provide a generic solution for the evolutionary prediction of information popularity dynamics in social networks. This study also aims to alleviate the contradiction between stationary and nonstationary event propagation in a standard prediction model. Analyzing all hot events in a social network with a single class of models, without considering their volatility, may not be an effective way to increase the prediction accuracy. Hence, the framework divides event propagation into two kinds (stationary and nonstationary) according to event volatility. This study defines two specific DLM models, namely, SEP²M and N-SEP²M, to analyze stationary and nonstationary propagation, respectively. This paper mainly reveals the superiority of using the N-SEP²M model to predict nonstationary popularity evolution. N-SEP²M is constructed by introducing the nonstationary feature parameters of local linear trend, seasonality, and regression. Three different parameter estimation methods, Kalman filter, Kalman smoothing, and maximum likelihood, are used to simplify the parameter adjustment process.
Experiments on three popular hot events with different topic categories that appeared on the Weibo social network confirm the effectiveness and superiority of the N-SEP²M model through the comparison of the evaluation indexes $R^2$, RMSE, Precision@20, and Precision@50. The experimental propagation trend lines also show that the prediction accuracy of event popularity evolution can be significantly improved by introducing explanatory variables into the N-SEP²M model. The insights into model construction gained from this study may benefit the analysis of complex hot event evolution in social networks with time-series characteristics.

Data Availability
The data used to support the findings of this study are publicly available at https://github.com/YangMin-10/Datasets.

Conflicts of Interest
The authors declare that they have no conflicts of interest.