The Spread of Information in Virtual Communities

With the growth of online commerce, companies have created virtual communities (VCs) where users can create posts and reply to posts about the company's products. VCs can be represented as networks, with users as nodes and relationships between users as edges. Information propagates through edges. In VC studies, it is important to know how the number of topics concerning the product grows over time and what network features make a user more influential than others in the information-spreading process. The existing literature has not provided a quantitative method with which to determine key points during the topic emergence process. Also, few researchers have considered the link between multilayer physical features and the nodes' spreading influence. In this paper, we present two new ideas to enrich network theory as applied to VCs: a novel application of an adjusted coefficient of determination to topic growth and an adjustment to the Jaccard coefficient to measure the connection between two users. A two-layer network model was first used to study the spread of topics through a VC. A random forest method was then applied to rank various factors that might determine an individual user's importance in topic spreading through a VC. Our research provides insightful ways for enterprises to mine information from VCs.


Introduction
Virtual communities (VCs) provide an interactive experience that, if positive, may instil customer loyalty [1]. They enable consumers to learn the functions of products and follow up conveniently with buying online, as well as provide a channel for receiving customer feedback, which plays an important role in product innovation [2,3]. Mining information provided by consumers in VCs enables companies to adjust the next generation of their products to improve customer satisfaction [4].
Complex network theory has been a major tool in the study of the physical structure and dynamic processes of social, biological, and technological networks [5]. In analyses of information spreading, users are represented as nodes [6]; these nodes may reside in multiple possible states, depending on whether they have learned information and whether they can transmit it to a neighbour [7][8][9]. Among real-world VCs, social networks such as Weibo, WeChat, Twitter, and Facebook have different physical structures, leading to different patterns of topic transmission.
Consumer VCs are different from traditional social networks in that they centre around products. They provide a real-time look into customers' experiences with a product from the date of release. Users in VCs may post their feedback for other users to view, which, in the best-case scenario, may encourage loyalty among existing consumers while encouraging new consumers to buy in.
Thus, understanding consumer VCs can provide insight into public opinion trends, helping the company maintain existing markets or develop new markets [10].
In consumer VCs, information is transmitted via posts, which are about particular topics [11]. Networks change over time; thus, time dimensions, i.e., temporal networks, have been incorporated into network analyses [12]. For consumer VCs, identifying key time points during topic emergence can indicate when to deploy consulting services. Additionally, there are multiple ways for users in VCs to interact with one another, of which "replying" and "searching for interest" are common ways to obtain information. Identifying influential users and specifying the important network features are also critical for improving the efficiency of information transmission in multilayer networks [13].
Thus, this paper tries to answer the following questions: (i) Following the introduction of a new product, how does the number of topics concerning the product grow over time? (ii) How do topics spread through a VC? What features characterise the influence of individual users during the information-spreading process?
Here, we focused specifically on the Huawei P10/P10 Plus for our case study. Two new concepts are introduced. First, the coefficient of determination was applied to the growth of topics after a new product was introduced to yield the node sequential emergence coefficient of determination (NSECD), which was used to identify the moment when most of the growth had finished. Second, the Jaccard coefficient was adjusted to gain a new measure of similarity between two users in a VC. Subsequently, a two-layered network model, representing two ways of spreading information, was introduced to study the spread of a topic through a VC and identify the most influential users. The random forest method was then applied to rank the importance of various factors affecting a user's impact on spreading topics through a VC. Finally, some suggestions are provided for future research. The remainder of this paper is organised as follows. Section 2 provides a literature review on existing methods. Section 3 describes the Huawei P10/P10 Plus dataset and its preprocessing. Section 4 introduces the NSECD statistic for studying the growth of new topics after the introduction of a new product. Section 5 proposes the adjusted Jaccard coefficient and the two-layered network model. Later in the section, simulations carried out to identify key users in the network are described, with the random forest method used to find features important to users' information-spreading performance. Section 6 recaps suggestions for enterprise management of VCs and proposes directions for further research.

Literature Review
Online social networks exert a major influence on life today [14]. Sange et al. found that online social networks provide a platform for spreading both objective facts and fake news [15]. Park et al. noted the rapid growth of mobile devices such as smartphones and examined rapid information propagation in mobile social networks [16]. Up to now, the spreading of information has been investigated intensively in interdisciplinary fields [17].
Research into information propagation in VCs has two main foci: influence factor analysis and propagation path analysis. Influence factor analysis attempts to identify which factors make a node influential in a network; such factors may include gender, age, beliefs, etc. On the other hand, propagation path analysis studies the way in which information is transmitted through the network by, for instance, assigning weights to edges and setting transmission probabilities based on these weights.
We first provide examples of influence factor analysis. Li et al. developed a multinomial naive Bayes classifier, categorised microblog posts based on content, and found that the information type has a significant influence on propagation patterns in terms of scale and topological features [18]. Zeng and Zhu proposed an emotional model for information propagation based on the emotional states of network users [19]. Hsu developed an integrated conceptual model and explored the effects of brand-evangelism-related behavioural decisions of enterprises on VC members [20]. Wan et al. used a least squares support vector machine to study consumer electronics supply chains [21]. However, all of the above studies were mainly analyses of the influencing factors but did not consider the differences among VCs.
With regard to propagation form, Huo and Cheng established a modified ignorant-wiseman-spreader-stifler model to analyse the spread of rumours through a network [22]. Xu et al. proposed a new iterative algorithm called SpectralRank, which assumes that a node's propagation capability is proportional to the number of neighbouring nodes after adding a ground node to the network [23]. Shao et al. introduced the NL centrality algorithm to identify influential nodes in a network; the algorithm considers both the semilocal structure of a node and its topological position [24]. Wang et al. proposed a method based on an integral k-shell to identify the influential nodes in a command-and-control network [25]. Escalante and Odehnal proposed a deterministic SIRS-type model for rumour propagation and applied it in simulations with two types of rumours: an original rumour, followed by a second counteracting rumour based on a complex network [26]. Li et al. presented the Potential Concentration Label method to help locate multiple sources of contagion under a susceptible-infected-recovered model [27]. Zhang et al. introduced a susceptible-infected-true-removed model of rumour spreading to account for members in a network who know or can discern the truth [28]. Xiong et al. introduced the location concept to a local social network model [29] and further extended it to the recommendation system via information spreading in a local-based social network [30,31]. In their recent research, Xiong et al. combined the location and temporal effects of a social network, proposing constructive advice on dynamic management [32]. Zhang et al. studied networks that can be subdivided into smaller groups called communities and proposed a node-ranking algorithm called AI Rank using two factors: attractive power (which measures the number of followers a node has compared to its neighbours) and initiating power (which accounts for the communities that a node's neighbours belong to) [33].
Although these studies considered node importance ranking together with the topology of the network, they generally treated networks as single-layered. This does not account for the fact that there may be more than one way for information to propagate through a network. Hence, a multilayer network model may be more appropriate. We propose a two-layered model in this paper.

Data Preprocessing and Description
The Pollen Club is the official VC for Huawei's products, including smartphones, laptops, and other electronic devices. Each user is assigned a unique identifier with which they can express their opinions about products freely, look through other users' posts, and reply to posts [34].
For our initial dataset, we selected 2000 web pages about the Huawei P10/P10 Plus, containing 2,392,035 posts about the product. After removing duplicate and spam posts, we retained 57,560 original posts and 826,328 reply posts by 129,362 users as our dataset. The data were acquired directly from club.huawei.com to avoid interview effects [35].
Next, core topics were extracted from the posts. In a previous study [36], 100 topics about phones were initially selected (see Table 1). After retaining the topics with higher frequency, these were grouped into three categories (system, software, and hardware) according to their features, as shown in Table 2. The remainder of this paper focuses on 61 of these topics. These topics did not all appear on the first day of our dataset but emerged over the course of the study as users bought and used the product and formulated questions about it. In the next section, we will discuss the emergence of topics.

Dynamic Analysis of Topic Emergence
According to the classic product lifecycle, a new product goes through three stages: emergence, growth, and maturation. Emergence refers to the period before the product's launch [37]. Growth refers to a period of high consumer activity after the product launch, when consumers who had been eagerly awaiting the launch are ready to buy the product. Maturity refers to a period of low consumer activity afterwards, when consumers may continue to buy the product but do so at a slower pace because enthusiastic consumers have already done so.
For VCs, the transition point from growth to maturation is of interest to the company, because it signals the point after which fewer resources should be needed to monitor and respond to VC posts concerning the product. The purpose of this section is to introduce a statistic that will identify such a transition point.
For this purpose, we used the growth in the cumulative number of topics from the 61 topics up to a given date to measure the VC's interest in the product. We could, for instance, select the date at which the cumulative number of topics reached 90% or 95% of 61. However, any such choice would involve some uncertainty regarding which threshold to use. Instead, we propose a transition point that avoids such an arbitrary threshold choice.
Based on the adjusted coefficient of determination in statistics [38], we define the NSECD as

r_t = 1 − [T/(T − t + 1)] · [1 − n_t/(n + 1)], (1)

where r_t is the NSECD value on the t-th day, n_t is the cumulative number of new topics on the t-th day, n is the total number of topics, and T is the total number of days.
As t increases, T/(T − t + 1) increases while 1 − n_t/(n + 1) decreases. We expect that, in a typical dataset, strong growth in new topics early in the study period will lead r_t to increase initially, while faster growth in the factor T/(T − t + 1) together with saturation of new topics will lead r_t to decrease later on. Our key moment is when this function reaches its maximum; i.e.,

t* = arg max_t r_t. (2)
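As a rough illustration, the NSECD and its maximiser can be computed in a few lines. The sketch below assumes the form r_t = 1 − [T/(T − t + 1)] · [1 − n_t/(n + 1)] reconstructed above, and the daily cumulative counts are toy values, not the Pollen Club data.

```python
def nsecd(cum_topics, n_total):
    """Return the NSECD value r_t for each day t = 1..T.

    cum_topics[t-1] is the cumulative number of new topics by day t.
    """
    T = len(cum_topics)
    return [
        1 - (T / (T - t + 1)) * (1 - n_t / (n_total + 1))
        for t, n_t in enumerate(cum_topics, start=1)
    ]

def key_moment(cum_topics, n_total):
    """t* = arg max_t r_t, the transition point from growth to maturity."""
    r = nsecd(cum_topics, n_total)
    return max(range(1, len(r) + 1), key=lambda t: r[t - 1])

# Toy data: fast early growth, then saturation (not the real dataset).
cum = [0, 2, 10, 25, 40, 50, 55, 58, 59, 60, 60, 60]
t_star = key_moment(cum, 61)  # here the maximum falls on day 10
```

The early days raise r_t because n_t climbs quickly; once n_t saturates, the growing factor T/(T − t + 1) pulls r_t back down, so the argmax marks the growth-to-maturity transition without an arbitrary 90% or 95% threshold.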
Figure 1 shows the NSECD on each day in our dataset. In this dataset, it was calculated that t* = 33 and, accordingly, r_{t*} = 0.8457.
To summarise the lifecycle of topic growth in our case study, the period from the start of the study to the product launch date (Day 9) was considered the "emergence" stage. Twelve new topics were added during this stage, reflecting early interest in the Huawei P10/P10 Plus. The period from the product launch date to our key moment (Day 33) was the "growth" stage. The period from our key moment to the end of the study was considered the "maturity" stage.
In Figure 2, 12 new topics were added during the "emergence" stage, all on Day 9, the final day of the stage.
is reflected early interest in the Huawei P10/P10 Plus. e "growth" stage continued the trend of new topics from the last day of the emergence stage. New topics appeared rapidly early in this stage, as users shared their opinions towards the product from different aspects, but the rate at which new topics appeared slowed towards the end. By the end of this stage, 58 topics had emerged, 95.08% of the total number of topics considered. e "maturity" stage witnessed even slower growth in new topics compared to the earlier stages, as most topics had already appeared earlier.

Network Modelling
In this section, we first introduce a two-layer network model and then perform information-propagation simulations to determine which users are the most effective at spreading information in a VC.

Structure of the Network.
Here, we establish a two-layered network model for VCs. Each user is represented by a node that occurs on both layers. The two layers correspond to two ways in which a user in a VC may interact with another user: by replying to their posts or by searching for their posts. The first layer is the "flow of information by replies (FIR)" network, denoted as FIR(V_1, E_1, W_1). Given two users U_i and U_j, let w^1_ij be the number of times U_j replied to a post by U_i within the dataset. If w^1_ij > 0, then an arrow a^1_ij from i to j is drawn. The FIR network consists of V_1, the set of all users; W_1, the set of all w^1_ij weights; and E_1, the set of all a^1_ij arrows. The second layer is the "flow of information by interest (FII)" network, denoted as FII(V_2, E_2, W_2). It is inspired by the idea that two users are likely to search for each other's posts only when they share the same interests. Given two users U_i and U_j, we can construct a measure w^2_ij representing the commonality of their interests and draw an edge a^2_ij between them when w^2_ij is above some preset threshold ε. The FII network consists of V_2, the set of all users; W_2, the set of all w^2_ij weights; and E_2, the set of all a^2_ij edges. It remains to define each w^2_ij weight.
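A minimal sketch of how the FIR weights w^1_ij might be tabulated from (author, replier) records; the record list and user identifiers here are illustrative, not the paper's data.

```python
from collections import defaultdict

def build_fir(reply_records):
    """w1[i][j] = number of times user j replied to a post by user i.

    reply_records is an iterable of (post_author, replier) pairs.
    """
    w1 = defaultdict(lambda: defaultdict(int))
    for author, replier in reply_records:
        w1[author][replier] += 1
    return w1

# Hypothetical records: u2 replied twice to u1, u3 once, u1 once to u2.
replies = [("u1", "u2"), ("u1", "u2"), ("u1", "u3"), ("u2", "u1")]
w1 = build_fir(replies)

# An arrow a^1_ij is drawn from i to j whenever w1[i][j] > 0.
arrows = {(i, j) for i in w1 for j in w1[i] if w1[i][j] > 0}
```

The FII layer would be built analogously, by computing the interest measure w^2_ij for each pair and keeping only the pairs above the threshold ε.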

Complexity
Let N be the total number of topics, and let the topics have a fixed order. Let R_i denote the set of all topics for which U_i has at least one main post. The Jaccard coefficient of U_i and U_j can be calculated as

ρ_ij = #(R_i ∩ R_j)/#(R_i ∪ R_j), (3)

where #(R_i ∩ R_j) and #(R_i ∪ R_j) denote the number of elements in the intersection and union, respectively, of the topic sets of users U_i and U_j. The disadvantage of the Jaccard coefficient is that it does not distinguish a pair of casual users who post about only one topic, which happens to be the same, from a pair of enthusiastic users who post about many shared topics. For example, consider the following two situations. Situation 1: users U_i and U_j post only on "system" and "updates." Situation 2: users U_i and U_j post only on "system." Based on equation (3), the Jaccard coefficient assigns a weight of 1 to both situations. However, this may not be appropriate, because the users in Situation 1 may be more active in the VC and, hence, more likely to exchange information by searching.
Thus, we propose the following adjusted Jaccard coefficient:

ρ^a_ij = #(R_i ∩ R_j)/N. (4)

In our case study, N = 61. Based on equation (4), ρ^a_ij = 0.0328 in Situation 1 and ρ^a_ij = 0.0164 in Situation 2. Like the Jaccard coefficient, the adjusted Jaccard coefficient is always between 0 and 1, and it is 0 when the two users share no common topics. However, unlike the classical Jaccard coefficient, it is 1 only when the two users share all topics.
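The two situations above can be checked directly. The sketch below assumes the reconstructed form ρ^a_ij = #(R_i ∩ R_j)/N for the adjusted coefficient; the topic names are those used in the example.

```python
def jaccard(R_i, R_j):
    """Classical Jaccard coefficient of two users' topic sets."""
    union = R_i | R_j
    return len(R_i & R_j) / len(union) if union else 0.0

def adjusted_jaccard(R_i, R_j, N):
    """Adjusted coefficient: shared topics relative to all N topics."""
    return len(R_i & R_j) / N

N = 61
# Situation 1: both users post on "system" and "updates".
s1 = adjusted_jaccard({"system", "updates"}, {"system", "updates"}, N)
# Situation 2: both users post only on "system".
s2 = adjusted_jaccard({"system"}, {"system"}, N)
# Classical Jaccard gives 1.0 in both situations; the adjusted version
# gives 2/61 ≈ 0.0328 and 1/61 ≈ 0.0164, distinguishing the two pairs.
```

Only pairs with ρ^a_ij above the threshold ε receive an FII edge, which is why most pairs in the case study remain unconnected on that layer.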
In Figure 3, we plot the distribution of the values of ρ^a_ij over all pairs of users in our case study. Figures 3(a) and 3(b) show that the overwhelming majority of pairs of users were associated with a small value of ρ^a_ij. A small ρ^a_ij means that there is little possibility of information spreading. In our analysis, we used a threshold value of ε = 0.2; that is, an edge a^2_ij was drawn between users U_i and U_j only when ρ^a_ij > 0.2. This made the FII network sparse. Notably, the difference in the range of the vertical axes between Figures 3(a) and 3(b) is caused by the difference in the bar intervals.
Gephi software was used to draw illustrations of both layers for our case study. The results are shown in Figure 4.
To illustrate how our two-layer network model works, consider the following simple example with seven users, as shown in Figure 5.
Here, a-g represent the seven users. Users may send or receive information through the FIR or FII networks simultaneously. For example, User a can receive information from User b or c in the FIR network, or from User b or d in the FII network. User a can send information to User b or d in the FII network. Information can spread through both channels simultaneously. The larger the weight of an edge, the more likely information is to flow through it at any given step in both layers.

Information Propagation and User Importance.

Next, the two-layered network model will be expanded with information transmission mechanisms to simulate the flow of information through the VC. Suppose we are interested in the propagation of a specific piece of information through the network.
The information-propagation model is based on a simple two-state framework; that is, at any given time, a node is in one of two possible states: (1) a susceptible state, for users who have not yet received the information, and (2) an infected state, for users who have received the information. Time is treated as discrete. At each time point t, infected nodes have a probability of transmitting the information to their susceptible out-neighbours. The spread of the information is thus a random process. The mathematical model of this process is described below.
For convenience, the susceptible and infected states are denoted by α and β, respectively. We denote the FIR layer and the FII layer as l = 1 and l = 2, respectively. The key notation is as follows: for each user i (i = 1, 2, 3, ..., n), each day t − 1 except for the last step, and each layer l, p^{l,t−1}_i is a draw from U(0, 1). If user i is infected at t − 1, the value of this draw decides which of i's out-neighbours on layer l become infected at time t: more probable infections are always prioritised over less probable ones. A low value of p^{l,t−1}_i means that i's out-neighbours will be infected easily, whereas a high value means that they will be difficult to infect. The transmission values p^l_ij themselves depend only on the network structure from the previous section.
Note that out-neighbours and in-neighbours of node i are different only when the layer's edges are directed, i.e., the FIR network in this case. If the layer's edges are not directed, out-neighbours and in-neighbours are the same and are simply called neighbours.
At the starting time t = 1, a single node i_0 is infected while all others are susceptible; i.e., H^1_{i_0} = β and H^1_i = α for all i ≠ i_0, where H^t_i denotes the state of node i at time t. Inductively, assume we know H^{t−1}_i for all i. Then, the node states on the next day are given by

H^t_i = β if H^{t−1}_i = β, or r^{1,t}_i = 1, or r^{2,t}_i = 1; otherwise, H^t_i = α,

where r^{l,t}_i = 1 if node i has an infected in-neighbour j on layer l with p^l_ji > p^{l,t−1}_j, and r^{l,t}_i = 0 otherwise. The values p^l_ji are obtained by normalising the edge weights w^l_ji into [0, 1], so that heavier edges transmit information more readily. This means that infected nodes stay infected, whereas noninfected nodes become infected if, on some layer, one of their infected in-neighbours has a strong enough connection to overcome that in-neighbour's threshold at that time and layer. The information-spreading process begins with a single node i_0 infected, while all other nodes are susceptible. Any node can be used as the starting node. The process ends either when no infected node has any noninfected out-neighbours or when t reaches some specified time limit. In our simulations, we used a time limit of t = 10. Note that because the spreading process depends on random draws (p^{l,t−1}_i), the process itself is random.
A standard way to measure node i_0's importance in a network is by the extent of the infection that begins at i_0 [39]. To this end, we define the information-spreading rate of a node i_0 as C(i_0) = n_{i_0}/n, where n_{i_0} is the number of infected nodes at the end of the spreading process and n is the total number of nodes. Because the spread of information depends on random variables, C(i) is itself a random variable. Its expectation is too complicated to calculate explicitly; however, it can be estimated by repeated sampling as

Ĉ(i) = (1/N_0) Σ_{k=1}^{N_0} C_k(i),

where N_0 is the number of trials, i.e., the number of simulations with i as the starting node, and C_k(i) is the value obtained in the k-th trial. Due to the large size of our dataset and the resulting high computation time, N_0 was set to 30.
Users were then ranked according to their mean information-spreading rate across these 30 trials. The simulation procedure, implemented in MATLAB (MathWorks, Natick, MA, USA), is as follows. Let i_0 be the node where the information starts. We track the spread of information throughout the network through the set of infected nodes, denoted as I, and the set of noninfected nodes, denoted as S.
Step 1: begin with the network nodes V = {1, 2, ..., n}, edges E = {a^1_ij, a^2_ij} with weights W = {w^1_ij, w^2_ij}, and starting node i_0.
Step 2: initialise I = {i_0} and S = V∖{i_0}.
Step 3: for each time step t up to the time limit, and for each i in I, (1) draw p^{1,t}_i and p^{2,t}_i from U(0, 1), and (2) for each j ∈ N(i) ∩ S, calculate p^l_ij as above. If p^l_ij > p^{l,t}_i, insert j into I and remove j from S.
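The steps above can be sketched compactly. This is a minimal illustration, assuming each directed edge already carries a transmission value p^l_ij in [0, 1] derived from its weight; the tiny network used at the end is hypothetical.

```python
import random

def spread(layers, i0, t_max=10, rng=random.random):
    """Return the set of infected nodes starting from node i0.

    layers: list of {i: {j: p_ij}} dicts, one per layer, with each
    p_ij already normalised into [0, 1].
    """
    nodes = set()
    for layer in layers:
        for i, nbrs in layer.items():
            nodes.add(i)
            nodes.update(nbrs)
    infected, susceptible = {i0}, nodes - {i0}
    for _ in range(t_max - 1):
        newly = set()
        for layer in layers:
            for i in infected:
                p_i = rng()  # node i's draw for this layer and step
                for j, p_ij in layer.get(i, {}).items():
                    if j in susceptible and p_ij > p_i:
                        newly.add(j)
        if not newly:  # no infected node can reach a susceptible one
            break
        infected |= newly
        susceptible -= newly
    return infected

def spreading_rate(layers, i0, n_trials=30):
    """Monte Carlo estimate of C(i0) = n_i0 / n across n_trials runs."""
    n = len({v for layer in layers
               for i, nb in layer.items() for v in [i, *nb]})
    return sum(len(spread(layers, i0))
               for _ in range(n_trials)) / (n_trials * n)
```

Averaging `spreading_rate` over repeated trials mirrors the 30-trial estimate Ĉ(i) used to rank the Pollen Club users.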
An example of a two-layered network and an information-spreading history with node a as the starting node is provided in Figure 6.
Notice that node a cannot spread information to node b in the FIR layer but may do so via the FII layer. From there, the information can spread from node b to c via the FIR layer. For a different spreading path, notice that a cannot spread to d in the FIR layer but may do so through the FII layer. From there, the information can spread to e through the FIR layer. The algorithm was run on this simple example once with each node as the starting node. Table 3 shows the average spreading rates for the example in Figure 5 across 30 trials.
In the corresponding simulations, nodes a, b, c, and d had the highest mean spreading rates, with the possibility of spreading information to all nodes except the isolated node g. Of these four, node b performed best in terms of average spreading rate. Also, nodes c and d were more efficient than node a by 1%, although node a has more links than those two. This suggests that c and d occupy special positions in this network, which matches the graph. The algorithm was then applied to the real Pollen Club dataset described in Section 3. The results for the top 20 of 129,362 users, as ranked by mean spreading rate across 30 trials in MATLAB, are summarised in Table 4. The standard deviations were not too large for only 30 trials and could likely be shrunk further by running more trials.
In Table 4, IUG and OUG stand for "intermediary user group" and "ordinary user group," respectively [36]. e OUG refers to customers who bought Huawei products and registered for the Pollen Club. e IUG refers to customers who received official training from Huawei and are willing to answer questions from other customers.
Notably, one of the IUG members is at the top of the list, indicating that the Pollen Club is organised by the customers themselves, saving the company the effort of doing so. The number next to each OUG member is the user level. There are 12 levels in total, with higher levels indicating greater user experience. Users can advance their levels by joining activities in the Pollen Club. As can be seen, except for User 4948, the OUG members in our top 20 generally have high user levels. This shows that our method of ranking a user's spreading rate aligns well with Huawei's own method of ranking users.
To demonstrate our model's effectiveness, comparisons were conducted using the probability 1 transformation model described in the Appendix. The results using our model were more consistent with trends observed in the real data.

Feature Selection of the Spreading Process.
In this section, we investigate the relationships between the spreading rate and network features that can be computed simply, without the need for repeated simulations. Twenty-two network features [40] (denoted as x_1 through x_22) of the two-layer model are considered for each node i. Ten are for the FII layer, and 12 are associated with the FIR layer. The 22 features are listed in Table 5.
Each feature x_1 through x_22 is normalised to have a mean of 0 and a variance of 1. We then ask which of these features can predict the spreading rate. This provides insight into which features may cause a node to spread information more efficiently through the networks. We ran Breiman's random forest algorithm [41], as implemented in scikit-learn [42]. Random forest is an ensemble method for modelling regression in a nonlinear way.
Recall that the random forest algorithm randomly selects a subset of the users and a subset of the network features, then forms a tree by choosing, at each tree node, a network feature and a boundary value for that feature. This splits the users into two branches at each such node, continuing until, in all branches, the number of users is at most some threshold value.
The features in Table 5 are defined as follows.
FII layer:
x_2 (weighted degree): the sum of the weights of all edges containing node i
x_3 (eccentricity): the maximum distance from node i to any other node
x_4 (closeness centrality): the reciprocal of the sum of the shortest distances from node i to all other nodes
x_5 (harmonic closeness centrality): the sum of the reciprocals of the shortest distances from node i to all other nodes
x_6 (betweenness centrality): the number of pairs of distinct nodes j, k, both different from node i, for which the shortest path between j and k passes through node i
x_7 (hub score): letting A be the adjacency matrix of the network, the i-th component of the eigenvector corresponding to the maximum eigenvalue of AᵀA
x_8 (authority score): the weighted sum of the hub scores of node i's neighbours
x_9 (local clustering coefficient): the proportion of links between the vertices within node i's neighbourhood divided by the number of links that could possibly exist between them
x_10 (eigencentrality): the same as the hub score, except using the adjacency matrix A instead of AᵀA
FIR layer:
x_11 (indegree): the number of edges ending at node i
x_12 (outdegree): the number of edges starting at node i
x_13 (weighted indegree): the sum of the weights of all edges ending at node i
x_14 (weighted outdegree): the sum of the weights of all edges starting at node i
x_15 through x_20: the same as x_3 through x_8, computed on the FIR layer
x_21 (PageRank): an algorithm introduced by Google to rank web pages by estimating the probability that a person randomly clicking on links will arrive at the given page
x_22 (local clustering coefficient): the same as x_9, computed on the FIR layer
Given a new vector of values for the features, each tree predicts a value for the spreading rate. The forest as a whole then makes its prediction by averaging the individual trees' predictions. The key parameters in this process are as follows: (i) ntree: the number of trees used. Increasing this value should decrease variance without leading to overfitting [43]. As suggested by reference [44], the number of trees was set to 500.
(ii) m′: the number of features selected in each tree. We followed the suggestion in [44] to take m′ = log_2 m ≈ 4, where m is the total number of features.
(iii) n′: the number of users to sample in each tree. We used all the users, i.e., n′ = n = 129,362. (iv) leaf′: the maximum leaf size, which controls when the construction of each tree halts. We used leaf′ = 1; i.e., we continued splitting until there was only one user in each branch.
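The setup above maps directly onto scikit-learn. The sketch below uses synthetic stand-ins for the 22 normalised features (the toy target leans on one feature so the importance ranking has an obvious answer); it is an illustration of the parameter choices, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 22))                        # 22 features per user
y = X[:, 15] * 0.8 + rng.normal(scale=0.1, size=200)  # toy spreading rate

X = StandardScaler().fit_transform(X)   # mean 0, variance 1, as in the paper
model = RandomForestRegressor(
    n_estimators=500,     # ntree = 500
    max_features=4,       # m' = log2(m) ~ 4
    min_samples_leaf=1,   # leaf' = 1: split down to single users
    random_state=0,
)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")  # five-fold CV

model.fit(X, y)
# Rank features by total decrease in variance across the forest.
ranking = np.argsort(model.feature_importances_)[::-1]
```

On the real data the same calls would take the 22 network features as `X` and the simulated spreading rates as `y`, with `scores` giving the per-fold R² values and `ranking` the feature-importance order.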
Recall that after running the random forest algorithm on a training set, it outputs a regression function. Given a new point x = (x_1, ..., x_22), the regression function outputs a predicted spreading rate f(x), which can then be compared with the spreading rates from the simulations. We performed five-fold cross-validation on the Pollen Club dataset together with the spreading rates. Table 6 shows the resulting R² values when these regression functions were tested against the training set and the testing set. The random forest algorithm can also be used to rank the importance of the network features. Each tree ranks its m′ selected features according to the decrease in variance at the corresponding node. That is, the variance in spreading rates at the parent node is compared to the sum of the variances at the two child nodes; the greater the decrease, the more important that feature is according to that tree. The decrease in variance can be calculated in this way for each tree, and the decreases for each of the m features are then summed to give the corresponding feature importance. The per-fold feature rankings (Table 7) agree closely, corresponding to a p value of 3.8064 × 10⁻¹¹ and indicating very high confidence in a genuine agreement in the rankings.
Lastly, we obtained a single overall ranking based on the whole dataset. The results are shown in Figure 7. The top nine features all belong to the FIR layer. The most important features are closeness centrality (x_16) and harmonic closeness centrality (x_17) in the FIR layer. In the FII layer, the most important feature is eigencentrality, indicating that users in central positions in the FII network can affect information spreading. The quality of a user's neighbours in the FII network also plays an important role in transmission.

Conclusions
In this paper, we proposed the NSECD to identify the key moment in topic growth in a VC after a new product is introduced. A two-layer model was developed for assessing information propagation in a VC, where information can flow among users either by replies to posts (the FIR layer) or searching for topics of common interest (the FII layer). We applied this model to our case study, which focused on the P10/P10 Plus device in Huawei's Pollen Club, to identify which users were most effective at spreading information through the network. Lastly, we compared these results with commonly used network features using a random forest algorithm and found that spreading effectiveness correlated best with closeness centrality and harmonic closeness centrality in the FIR layer and eigencentrality in the FII layer.
We have two suggestions for how our model may be improved in future research. First, the infectiousness formula in the FIR layer can be modified to consider not just the quantity of post replies but also their quality. For instance, natural language analysis [46] could be used to score the quality of posts. Second, the network model can be extended to have more than two layers. For example, many VCs enable users to follow other users; thus, a third layer could be used to capture these follower relations.
In conclusion, this research introduces new concepts for network theory and provides suggestions for how companies could manage their VCs.

Appendix
Comparisons with a probability 1 transformation model: the probability 1 transformation model is defined as follows. The information-propagation model is also based on a simple two-state framework; that is, at any given time, a node is in one of two possible states. In contrast to the model described in Section 5.2, if node i's state is infected at time t, the states of all of node i's neighbours across all layers (i.e., N(i) = ∪_l N_l(i)) turn into the infected state. This means that, once infected, a node transmits the information to all of its neighbours with probability 1 (Figure 8).
To illustrate this clearly, consider the same example used in Figure 5 in Section 5.2. If node a is chosen as the initial source of the information, the information spreads to all other nodes in both layers, except for the isolated node, in four steps. Because the probability 1 model does not involve uncertainty, a single run suffices. To coordinate with our model as described in the main text, the maximum spreading time was set to 10. The column headings in Table 8 have the same meanings as those in Table 4. The results shown in Table 8 suggest that the lower-grade OUG members are more influential, which contradicts the meaning of the grades.
This is due to the probability 1 model, which considers only the degree of the nodes and ignores the weight preference and uncertainty. Additionally, our model was more effective in predicting how information spreads among the nodes, compared with the probability 1 model.