Recommending Mobile Microblog Users via a Tensor Factorization Based on User Cluster Approach

1 College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China
2 Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou University, Fuzhou, China
3 Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University, Fuzhou, China
4 College of Electronics and Information Science, Fujian Jiangxia University, Fuzhou, China

Wireless Communications and Mobile Computing

Introduction
Microblogging services, such as Twitter or Weibo, have become some of the most popular platforms for individuals to exchange information by posting messages or comments of up to 140 characters. With the rapid growth of mobile devices, microblogging services have launched mobile applications that give their users instant, real-time access from anywhere they can connect to the Internet. For example, as of September 2017, Sina Weibo had more than 376 million monthly active users, about 92% of whom were authenticated through mobile phones and/or tablets. A large amount of valuable content exists in microblog-generated data. However, as a result of the rapidly increasing population on microblog platforms, most users are confronted with the serious problem of information overload [1], and it is extremely difficult to find desirable information using mobile devices. In this situation, recommending relevant users to alleviate the flood of information is very significant for users [2].
User influence can provide a valuable clue about a user's preferences and is thus indispensable for recommending microblog users in mobile social networks [3]. Consequently, incorporating user influence into recommender systems has been demonstrated to improve recommendation performance and has received a lot of attention. Li et al. [2] considered social influences and their indirect structural relationships and proposed a Topic-level Social Influence-based microblog recommendation model to make user predictions. Jiang et al. [4] proved that users' decisions on information adoption can be affected by individual preference and interpersonal influence, and then integrated these two factors to construct a scalable algorithm for online behavior prediction. Chen et al. [5] took advantage of tweet content, user social relations, and explicit features and proposed a collaborative ranking model for the tweet recommendation task. Yan et al. [6] presented a graph-theoretic method to rank tweets and their authors simultaneously by utilizing several networks, i.e., the user network, the tweet network, and the network that ties the two together. Therefore, it is significant to analyze user influence in mobile social networks and integrate it into the recommendation framework.
There exist several pioneering studies on user influence analysis on mobile microblog platforms. Velissarios et al. [7] proposed four metrics emphasizing Twitter content features and the behavior of each user's followers, and then identified influential users through comprehensive metrics that consider a user's affiliation and interest rate. Mao et al. [8] introduced a learning-based method for analyzing and measuring users' social influence by predicting users' capability of propagating information; the method combines information extracted from social network structures with user behavior factors to gain better performance. Xia et al. [9] explained the propagation mechanism of influence in terms of the diffusion of users' emotions. David et al. [10] analyzed the probability of one user being activated by another user and then combined that user's other features to obtain an influence score. Cai et al. [11] proposed the OOLAM model to measure user opinion influence; they separated the users' interaction graph into two parts, a positive graph and a negative graph, and ranked users with a PageRank-analogous algorithm. The methods reviewed above explore user influence only from the perspective of users, which leads to low accuracy on specific topics.
Recently, various studies have investigated topic-level user influence. These studies showed that most information is created and diffused around topics, so user influence can be measured more precisely at the topic level. Therefore, topic-level user influence analysis has received increasing attention from researchers. Weng et al. [12] proposed TwitterRank to calculate user influence scores according to the graph structure and topic similarity. Cui et al. [13] introduced item-level influence using probabilistic hybrid factor matrix factorization. Chen et al. [14] proposed the MIRC algorithm, which can distinguish users in different groups; their experimental results showed that different influence roles may have stronger influence at their own role level. Wang et al. [15] calculated user influence with four features, i.e., Expert, Leader, Social, and Similar, and then applied user influence to group recommendations. Wei et al. [16] took users' opinions and topic relevance into consideration and then predicted user influence according to the latent factors resulting from tensor factorization.
However, most studies of topic-level user influence consider only users' explicit features, which can be obtained directly from user profiles [14,15]. In particular, these existing works neglect the temporal characteristics that can be obtained from interactions [17]. In addition, tensor factorization algorithms for user influence analysis tend to give users with low propagation ability a high ranking score, since they reduce dimensionality by retaining only the critical factors. In this paper, a Tensor Factorization based on User Cluster (TFUC) model [18] is proposed for recommending users on a specific topic. The TFUC model first clusters influential users into groups according to their temporal characteristics. Then, we measure users' influence scores by temporally constrained CP decomposition on the influential clusters. Finally, both user influence and content similarity are integrated to recommend users for a given topic. The experimental results on a Sina Weibo dataset show that the user influence ranking precision of TFUC is better than that of existing models such as TwitterRank, OOLAM, and HF CP ALS. Moreover, our proposed TFUC model significantly outperforms the baseline methods in recommendation precision. This paper extends our previous work [18], in which we proposed a tensor-factorization-based user influence analysis method. Here, we address the problem of recommending users in mobile social networks and argue that user influence is a very important factor for user recommendation. We expanded the experimental dataset and added experiments on recommendation. To summarize, the main contributions of this work are as follows.
(1) Latent influential users are identified by a neural network clustering model. This model filters out low-influence marketing users before the tensor is constructed, which is shown to significantly enhance the recommendation effect.
(2) The TFUC model incorporates temporal features, which improves user recommendation accuracy. Specifically, our method integrates temporal features into a tensor model, predicts user influence using temporally constrained CP decomposition, and finally recommends users considering both user influence and content similarity.
(3) We conduct extensive experiments on the real-world Tencent Weibo dataset to verify the effectiveness of our proposed recommendation approach. The experimental results suggest that the proposed method considerably improves recommendation precision and outperforms the baseline approaches.
The rest of the paper is organized as follows. The recommendation problem is defined in Section 2. The proposed model for user recommendation is presented in Section 3. Experiments are reported in Section 4. We conclude the work in Section 5.

Problem Formulation
It is well known that people tend to trust users with high social influence in social networks. Therefore, we apply user influence analysis to recommend items for users on different topics in social media. In this paper, the items are users.
The goal of this work is to calculate the similarity between items and users and to recommend the most similar items to users. In particular, we incorporate the influence scores of items when calculating the similarity, to verify whether a user with a higher influence score is more likely to have his recommendation accepted. To obtain these influence scores, we introduce some necessary user features, such as the number of fans and the number of posts. We let $F = \{f_1, f_2, \ldots, f_m\}$ represent users' fan characteristic and $P = \{p_1, p_2, \ldots, p_n\}$ represent users' post characteristic. Every interaction between a user $u_i$ and a user $v_j$ records the time at which it takes place, so we represent these data as $D = \{(int_1, t_1), (int_2, t_2), \ldots, (int_k, t_k)\}$.
Following Varun's theory [19], a user's influence score can hardly be obtained from a single aspect. Thus, we analyze users' influence from four aspects as follows.
(1) Users' propagation ability: measuring users' propagation ability is an important purpose in social networks. This ability is usually calculated from the accumulated time in the document collection $D$. We denote this ability of $u_i$ as $I_p(u_i, F, D)$.
(2) Users' opinion strength [20]: a user's opinion strength captures his overall tendency and effectiveness in the social network. By calculating the opinion polarity of all users who have interacted with $u_i$, we obtain an opinion score of $u_i$. We denote this score as $I_o(u_i, F, D)$, which can be analyzed from the document collection $D$.
(3) Users' fans activity [21]: users with higher levels of activity may contribute more influence to other users in a microblog social network. In our work, we regard the number of articles posted by $v_j$ as his activity. We obtain $u_i$'s global fans activity by accumulating the activity of every user $v_j$ who has ever interacted with $u_i$. Formally, we define $I_a(u_i, F, P)$ as $u_i$'s global fans activity, where $P$ is the collection of users' post features.
(4) Users' network centrality: according to [9,22], users with higher influence may have more fans. If a user's fans have more fans, the information posted by this user may spread wider. This spreading effect is known as network centrality and is denoted as $I_v(u_i, F)$.
Overall, we formalize user influence analysis as follows: given a topic $z$, the goal is to find a mapping $\mathrm{Inf}_z(u_i, F, P, D)$ that assigns each user an influence score by aggregating the four user features $I_p$, $I_o$, $I_a$, and $I_v$. After calculating the influence scores of all basic users, we obtain a user influence ranking list sorted by influence score.
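The fans activity and network centrality features above can be illustrated with a small sketch. It assumes hypothetical dictionaries `fans` (user to the users who follow or have interacted with that user) and `posts` (user to number of posted articles). The centrality formula used here, the user's own fan count plus the fan counts of those fans, is only an assumed reading of the statement that a user whose fans have more fans spreads information wider; it is not the paper's exact formula.

```python
from typing import Dict, List


def fans_activity(user: str, fans: Dict[str, List[str]], posts: Dict[str, int]) -> int:
    """I_a: total number of articles posted by all of `user`'s fans."""
    return sum(posts.get(f, 0) for f in fans.get(user, []))


def network_centrality(user: str, fans: Dict[str, List[str]]) -> int:
    """I_v (assumed form): direct fan count plus the fan counts of those fans,
    a simple two-hop estimate of how far the user's posts can spread."""
    direct = fans.get(user, [])
    return len(direct) + sum(len(fans.get(f, [])) for f in direct)


if __name__ == "__main__":
    # Hypothetical toy network: "a" is followed by "b" and "c"; "b" is followed by "c".
    fans = {"a": ["b", "c"], "b": ["c"], "c": []}
    posts = {"a": 1, "b": 5, "c": 2}
    print(fans_activity("a", fans, posts))  # 5 + 2 = 7
    print(network_centrality("a", fans))    # 2 direct fans + 1 fan of "b" = 3
```

A two-hop count is the simplest quantity that grows both with a user's own audience and with the audience of that audience; any normalization is left to the later min-max step.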

Recommendation with User Influence Analysis
In this section, we present a user influence analysis model [18] and then integrate it into recommendation. A user with high influence can receive a large number of comments in a short time, and a user prefers to accept influential users (referred to as items) when receiving recommended users from a recommendation system. Therefore, the performance of a recommendation system can be improved by involving the influence of items. Since the factorization-based method performs poorly for low-ranking users, we design a two-step method for influence analysis. In the first step, low-influence-score clusters are identified by a neural network clustering method. In the second step, user influence is predicted by a tensor factorization method.

Neural Network Clustering Model.
Users' global influence consists of multiple individual influence features, i.e., propagation ability, opinion strength, fans activity, and network centrality. Users with a higher influence rank receive more comments, have stronger opinion strength, and are more central in the network. On this basis, we first partition the data into clusters and filter out users with low influence in the set of basic users $U$. We first describe how we obtain the four user features.
(1) Let $n$ denote the number of users who have interacted with $u_i$. Within a time window $T$, we obtain, according to [23], the delay between the time of $u_i$'s first interaction in $T$ and the time at which $v_j$ interacted with $u_i$, and we model this delay with a transmission rate parameter $\alpha_v$. The transmission rate parameter captures how widely a user can reach in the network. In its computation, $u$ denotes the basic users, $v$ denotes the users who have interacted with $u'$, and the indicator function $I(u = u')$ is 1 if $u = u'$ is true and 0 otherwise. Equation (2) yields the total number of times $v$ has interacted with $u$, and (3) captures the time accumulation of those interactions. After calculating these per-pair quantities, we infer the time accumulation of $u_i$ by the aggregate function (4).
(2) Each user shows an opinion polarity toward an interbehavior, which can be inferred from the interaction between him and the basic users. Therefore, we obtain $u_i$'s global opinion strength by accumulating the opinion polarities of all his interactions, as in (5), where the indicator function $I(v_j)$ is $-1$ if $v_j$ has ever expressed a negative interbehavior and 1 otherwise.
(3) As defined previously, $u_i$'s fans activity is related to the total number of articles posted by all of his fans; it is obtained from this definition by (6).
(4) Recall that the number of fans of user $u_i$ is available directly from $F$; we calculate $u_i$'s network centrality from it by (7).
We now discuss how to partition the users in $U$ into clusters according to these four influence features. The input samples of our method are the basic users in $U$, and each sample involves the four features obtained above. We denote each sample as $Y = [y_1, y_2, y_3, y_4]$, where $y_1$ to $y_4$ are $I_p(u_i)$, $I_o(u_i)$, $I_a(u_i)$, and $I_v(u_i)$, respectively. Let $c_k$ denote the clustering centers; each center $c_k$ has four elements, i.e., $[c_{k1}, c_{k2}, c_{k3}, c_{k4}]$. For the clustering problem, the loss function (8) is defined in terms of $c$, the clustering center of $Y$, and $w$, the weight between the input layer and the intermediate layer. We update each $w$ using stochastic gradient descent as in (9), with the gradient term given by (10); bringing (10) into (9) yields (11). We update the clustering centers for each batch as

$$c_k = \frac{\sum_{i} I_{c_k}(u_i)\, Y_i}{\sum_{i} I_{c_k}(u_i)}, \qquad (12)$$

where $I_{c_k}(u_i)$ is an indicator function for clustering center $c_k$ that is 1 if $u_i$ belongs to cluster $c_k$ and 0 otherwise. The denominator in (12) is a counting function that returns the number of samples in cluster $c_k$.
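The feature extraction and the batch center update in (12) can be sketched as follows. Two assumptions are made that the text does not fix. First, the comment delay is treated as exponentially distributed, so the maximum-likelihood transmission rate for a (basic user, commenter) pair is the number of interactions divided by their summed delay, and a user's propagation ability is taken as the sum of these rates. Second, samples are assigned to the nearest center by Euclidean distance instead of through the trained network weights $w$. Only `update_centers` follows (12) directly; all function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def propagation_ability(interactions: Sequence[Tuple[str, str, float]]) -> Dict[str, float]:
    """Assumed I_p: sum over commenters v of the exponential-MLE rate
    alpha_uv = (#interactions of v with u) / (total delay of those interactions)."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    delays: Dict[Tuple[str, str], float] = defaultdict(float)
    for u, v, delay in interactions:
        counts[(u, v)] += 1
        delays[(u, v)] += delay
    ability: Dict[str, float] = defaultdict(float)
    for (u, v), n in counts.items():
        if delays[(u, v)] > 0:  # skip degenerate zero-delay pairs
            ability[u] += n / delays[(u, v)]
    return dict(ability)


def opinion_strength(opinions: Sequence[Tuple[str, str, bool]]) -> Dict[str, int]:
    """I_o in the spirit of (5): each commenter v contributes -1 if v ever reacted
    negatively to u and +1 otherwise; contributions are summed per basic user u."""
    negative: Dict[Tuple[str, str], bool] = {}
    for u, v, is_negative in opinions:
        negative[(u, v)] = negative.get((u, v), False) or is_negative
    strength: Dict[str, int] = defaultdict(int)
    for (u, _), neg in negative.items():
        strength[u] += -1 if neg else 1
    return dict(strength)


def assign(samples: List[List[float]], centers: List[List[float]]) -> List[int]:
    """Nearest-center assignment by Euclidean distance (an assumption)."""
    return [min(range(len(centers)), key=lambda k: math.dist(y, centers[k])) for y in samples]


def update_centers(samples: List[List[float]], labels: List[int],
                   centers: List[List[float]]) -> List[List[float]]:
    """Batch update (12): each center becomes the mean of the samples assigned to it."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [y for y, lab in zip(samples, labels) if lab == k]
        if not members:  # empty cluster: keep the previous center
            new_centers.append(list(old))
        else:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
    return new_centers


if __name__ == "__main__":
    print(propagation_ability([("u1", "v1", 2.0), ("u1", "v1", 2.0), ("u1", "v2", 1.0)]))
    samples = [[0, 0, 0, 0], [0, 0, 0, 1], [10, 10, 10, 10], [10, 10, 10, 11]]
    centers = [[0, 0, 0, 0], [10, 10, 10, 10]]
    print(update_centers(samples, assign(samples, centers), centers))
```

Each sample row is the four-feature vector $Y$; alternating `assign` and `update_centers` over batches mirrors the per-batch center update described above.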

Construction of Tensor User Influence Model.
We assign each cluster to a specific influence category. Specifically, the cluster containing the most latent influential users in $U$ is selected to construct the tensor model. Users in this cluster are denoted as $U_l$, where $U_l \subseteq U$. Our user influence model is represented by a third-order tensor $\mathcal{X} \in \mathbb{R}^{M \times N \times K}$, where $M$ is the number of users in $U_l$, $N$ is the number of commenting users in $V$ (the users who have interacted with users in $U_l$), and $K$ is the number of influence features. Tensor decomposition is generally used to predict the distribution of data and the latent features of data.
Tensors are widely used in many research areas, such as weather forecasting, event prediction [24], information recommendation [25], and image processing [26-28]. We place the influence features into the tensor slices as follows.
(1) The opinion slice of users: this slice records in detail the opinion of every user in $V$ toward the users in $U_l$; its elements are given by the indicator function $I(v_j)$, the same as in (5), for $u_i \in U_l$.
(2) The fans activity slice of users: in this slice, every user who has ever interacted with $u_i$ exerts an activity influence upon $u_i$, and each element of the slice is defined from this activity.
(3) The centrality slice of users: as mentioned in Section 3, we represent a user's network centrality by his diffusion ability, which is measured by his total number of neighbours.
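A sketch of how the three slices could be laid out as an $M \times N \times K$ array with $K = 3$. The element definitions are assumptions, since the slice equations are not given here: an entry is nonzero only if commenter $v_j$ has interacted with $u_i$; the opinion slice stores the polarity $I(v_j)$ from (5), the activity slice stores $v_j$'s post count, and the centrality slice stores $v_j$'s neighbour count. The function and argument names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Tensor = List[List[List[float]]]


def build_influence_tensor(users: List[str], commenters: List[str],
                           interactions: Set[Tuple[str, str]],
                           polarity: Dict[Tuple[str, str], int],
                           posts: Dict[str, int],
                           neighbours: Dict[str, int]) -> Tensor:
    """Return X with shape M x N x 3 (users in U_l, commenters in V, feature slices)."""
    row = {u: i for i, u in enumerate(users)}
    col = {v: j for j, v in enumerate(commenters)}
    X = [[[0.0, 0.0, 0.0] for _ in commenters] for _ in users]
    for u, v in interactions:
        if u not in row or v not in col:
            continue  # ignore pairs outside U_l x V
        cell = X[row[u]][col[v]]
        cell[0] = float(polarity[(u, v)])      # opinion slice: -1 or +1
        cell[1] = float(posts.get(v, 0))       # fans activity slice
        cell[2] = float(neighbours.get(v, 0))  # centrality slice
    return X


if __name__ == "__main__":
    X = build_influence_tensor(["u1", "u2"], ["v1", "v2", "v3"],
                               {("u1", "v1"), ("u2", "v3")},
                               {("u1", "v1"): -1, ("u2", "v3"): 1},
                               {"v1": 4, "v3": 2}, {"v1": 10, "v3": 3})
    print(X[0][0], X[1][2])
```

Keeping the tensor sparse in meaning (zeros where no interaction exists) lets the later CP decomposition treat missing interactions uniformly.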

Factorization of Tensor User Influence Model.
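For the tensor $\mathcal{X} \in \mathbb{R}^{M \times N \times K}$, the loss function of the rank-$R$ CP decomposition [29,30] is

$$f(\mathcal{X}; A, B, C) = \frac{1}{2}\left\| \mathcal{X} - \sum_{r=1}^{R} a_r \circ b_r \circ c_r \right\|^2,$$

where $a_r$, $b_r$, and $c_r$ are the $r$-th columns of the factor matrices $A$, $B$, and $C$. The corresponding objective function for the stochastic optimization problem is

$$\min_{A, B, C} f(\mathcal{X}; A, B, C).$$

However, this objective neglects the temporal influence feature. Thus, a time constraint is added to the user matrix, so that the influence scores of users with strong propagation ability increase, while the scores of users who post frequently but receive few comments decrease. In the new loss function, $Q$ is the time constraint matrix, which can be obtained from (4). $Q$ is diagonal, and its main diagonal elements are defined for the users $u_i$ in $U_l$.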
The objective function is given by (18). Following the method proposed in [30], we compute the gradient of (18), (19). According to the theory proposed by Acar et al. [31], this gradient can be rewritten as (20), with the auxiliary terms given in (21). We obtain a rule for updating $A$ by substituting (21) into the stochastic gradient descent method:

$$A \leftarrow A - \eta \frac{\partial f}{\partial A}, \qquad (22)$$

where $\eta$ is the step size. The updating rules of $B$ and $C$ are similar; due to space limitations, we give only the updating rule of $A$.
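The gradient update can be illustrated with a small pure-Python sketch. It assumes the time constraint enters the model as $QA$, i.e. $\hat{x}_{ijk} = q_i \sum_r a_{ir} b_{jr} c_{kr}$ with $q_i$ the diagonal of $Q$; under that assumption the gradient with respect to $a_{ir}$ is $-q_i \sum_{j,k} e_{ijk} b_{jr} c_{kr}$, where $e = x - \hat{x}$ is the residual. For clarity it takes full-gradient steps on $A$, $B$, and $C$ in turn instead of sampling entries as stochastic gradient descent would; the step size `eta`, the diagonal `q`, and the data are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]
Tensor = List[List[List[float]]]


def reconstruct(q: List[float], A: Matrix, B: Matrix, C: Matrix) -> Tensor:
    """x_hat[i][j][k] = q_i * sum_r A[i][r] * B[j][r] * C[k][r]."""
    R = len(A[0])
    return [[[q[i] * sum(A[i][r] * B[j][r] * C[k][r] for r in range(R))
              for k in range(len(C))] for j in range(len(B))] for i in range(len(A))]


def residual(X: Tensor, q: List[float], A: Matrix, B: Matrix, C: Matrix) -> Tensor:
    """e = X - x_hat, elementwise."""
    Xh = reconstruct(q, A, B, C)
    return [[[X[i][j][k] - Xh[i][j][k] for k in range(len(C))]
             for j in range(len(B))] for i in range(len(A))]


def loss(X: Tensor, q: List[float], A: Matrix, B: Matrix, C: Matrix) -> float:
    """f = 1/2 * squared Frobenius norm of the residual."""
    E = residual(X, q, A, B, C)
    return 0.5 * sum(e * e for plane in E for row in plane for e in row)


def gradient_step(X: Tensor, q: List[float], A: Matrix, B: Matrix, C: Matrix, eta: float):
    """A <- A - eta * df/dA, then B, then C. Because df/dA = -(residual-weighted sums),
    subtracting the gradient means adding eta times those sums."""
    I, J, K, R = len(A), len(B), len(C), len(A[0])
    E = residual(X, q, A, B, C)
    A = [[A[i][r] + eta * q[i] * sum(E[i][j][k] * B[j][r] * C[k][r]
                                     for j in range(J) for k in range(K))
          for r in range(R)] for i in range(I)]
    E = residual(X, q, A, B, C)
    B = [[B[j][r] + eta * sum(E[i][j][k] * q[i] * A[i][r] * C[k][r]
                              for i in range(I) for k in range(K))
          for r in range(R)] for j in range(J)]
    E = residual(X, q, A, B, C)
    C = [[C[k][r] + eta * sum(E[i][j][k] * q[i] * A[i][r] * B[j][r]
                              for i in range(I) for j in range(J))
          for r in range(R)] for k in range(K)]
    return A, B, C


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    M, N, K, R = 4, 3, 3, 2
    X = [[[rng.random() for _ in range(K)] for _ in range(N)] for _ in range(M)]
    q = [1.0, 0.8, 0.6, 1.0]  # hypothetical diagonal of Q
    A, B, C = random_matrix(M, R, rng), random_matrix(N, R, rng), random_matrix(K, R, rng)
    start = loss(X, q, A, B, C)
    for _ in range(100):
        A, B, C = gradient_step(X, q, A, B, C, eta=0.01)
    print(start, loss(X, q, A, B, C))
```

Recomputing the residual after each factor update makes every step a block-coordinate descent step, which keeps the loss non-increasing for a sufficiently small `eta`.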
3.4. Measurement of Users' Influence.
We now discuss how to calculate users' influence scores by utilizing the result of the tensor decomposition. Users' influence is calculated from three different influence scores.
(1) Score of users' opinion strength.
(2) Score of users' fans activity.
(3) Score of network centrality.
In these scores, $E_{\mathcal{X}}$ is the expectation of $\mathcal{X}$. We normalize each influence score separately using min-max normalization, and then combine the three normalized scores into the final influence score (29). A user topic similarity metric is added to this combining function to increase the influence scores of users with higher topic similarity; the metric is explained in later sections.
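Min-max normalization and the combination step can be sketched as below. The equal weighting of the three normalized scores and the additive topic-similarity term with the hypothetical weight `lam` are assumptions; the text states only that the three normalized scores are combined and that higher topic similarity should raise the final score.

```python
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def final_scores(opinion: Sequence[float], activity: Sequence[float],
                 centrality: Sequence[float], topic_sim: Sequence[float],
                 lam: float = 0.5) -> List[float]:
    """Assumed combination: mean of the normalized scores plus lam * topic similarity."""
    o, a, c = min_max(opinion), min_max(activity), min_max(centrality)
    return [(oi + ai + ci) / 3 + lam * s for oi, ai, ci, s in zip(o, a, c, topic_sim)]


def rank_users(users: Sequence[str], scores: Sequence[float]) -> List[str]:
    """Influence ranking list, highest score first."""
    return [u for u, _ in sorted(zip(users, scores), key=lambda p: p[1], reverse=True)]


if __name__ == "__main__":
    users = ["u1", "u2", "u3"]
    scores = final_scores([1, 3, 2], [10, 20, 30], [0, 5, 5], [0.0, 0.2, 0.9])
    print(rank_users(users, scores))
```

Normalizing before combining keeps a feature with a large raw range (such as post counts) from dominating the final score.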

Recommendation Model with User Social Influence.
In this section, we recommend items for users using a content-based recommendation algorithm on the Tencent Weibo dataset. The items in this dataset are persons, organizations, or groups in the real world. Initially, we obtain the preferences and interests of the users who should receive recommendations; these are analyzed from their articles and comments. After that, we establish the user characteristic model based on these preferences and interests. The other essential process is establishing the item characteristic model; to suit the dataset, items are also characterized by their preferences and interests. Based on these two models, we calculate the similarity between users and items. Furthermore, we combine the ranking indicators of items into the similarity and call the result influence-similarity. Finally, we recommend items for users according to the influence-similarity.
There are two core parts in the above content-based recommendation process: the user characteristic model and the item characteristic model. Since both users and items are individual users in the Tencent Weibo dataset, we represent every user $u$ as a characteristic vector using the TF-IDF method. Formally, we denote the user vector as $V(u) = [t_1 : w_1, t_2 : w_2, \ldots, t_n : w_n]$, where $t_i$ is a word extracted from the articles and comments that user $u$ has posted, and $w_i$ is the corresponding weight of $t_i$ in the text collection. The weight is calculated by the TF-IDF method as follows:

$$w_i = \frac{n_{i,j}}{\sum_k n_{k,j}} \times \log \frac{N_d}{N_{t_i}},$$

where $n_{i,j}$ is the number of times word $t_i$ appears in text $j$, $\sum_k n_{k,j}$ is the total number of words that text $j$ contains, $N_d$ is the total number of texts in the dataset, and $N_{t_i}$ is the number of texts that contain word $t_i$. The next step is to calculate the cosine similarity between users and items. For example, the similarity between user $u_x$ and item $e_y$ is

$$\mathrm{sim}(u_x, e_y) = \frac{V(u_x) \cdot V(e_y)}{\|V(u_x)\| \, \|V(e_y)\|}. \qquad (33)$$

However, (33) does not take the influence of item $e_y$ into consideration. Therefore, we add the influence ranking indicator of item $e_y$ to the original cosine similarity, so that items with higher influence scores have a higher probability of being recommended. The cosine similarity with item influence is calculated by (34), where $R(e_y)$ is the influence ranking indicator of item $e_y$ in its topic area.
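The TF-IDF weighting and the cosine similarity between a user and an item can be sketched as below, assuming each user's (or item's) posts and comments have already been tokenized into one list. The influence-weighted similarity is not reproduced, because the exact way it merges the ranking indicator is not given here; function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """One {word: weight} vector per document, weight = tf * log(N_d / N_t)."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())
        vectors.append({w: (c / total) * math.log(n_docs / doc_freq[w])
                        for w, c in counts.items()})
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


if __name__ == "__main__":
    docs = [["law", "court", "court"], ["law", "basketball"], ["court", "judge"]]
    user, *items = tfidf_vectors(docs)
    print([round(cosine(user, item), 3) for item in items])
```

Storing vectors as dictionaries keeps them sparse, which matters when the vocabulary of microblog text is large and each user uses only a small part of it.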
For the Sina Weibo dataset, we first crawled 2,015 basic users on different topics, including law, basketball, economy, and health. We crawled these users' information and all articles they posted from October 31, 2016, to December 1, 2016. The basic statistics of the Sina Weibo dataset are shown in Table 1. We annotate users' influence ranking manually according to [16]. In this dataset, an interaction between two users is represented as a comment: if $v_j$ commented on $u_i$'s article, an interaction is generated between them. There is a delay between the time $u_i$ posted the article and the time $v_j$ commented on it, from which we obtain the temporal characteristic. Besides, the topic similarity metric in (29) is obtained in the same way as in [16].
We obtained the Tencent Weibo dataset from KDD Cup 2012, Track 1. This dataset contains about 6,095 high-influence users on different topics; these users are called "Items" in this dataset. There are about 73,209,277 recommendation logs in this dataset. Each recommendation is sent to a user who has a profile containing, for example, his gender, the number of articles he posted, and keywords extracted from all his articles. We can infer a user's number of fans from the relational network in this dataset. In this dataset, an interaction between two users is represented as a recommendation, which carries two significant pieces of information, i.e., acceptability and time stamp. Thus, we can obtain $u_i$'s opinion strength and temporal characteristic according to (5) and (3), respectively. For experimental convenience, we choose high-influence users on four topics, and the time window is from October 12, 2011, to October 13, 2011. Table 2 shows the basic statistics of this dataset. In this dataset, the topic similarity metric in (29) is obtained as the Jaccard similarity of the users' characteristic sets.
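The Jaccard similarity used as the topic similarity on the Tencent Weibo data can be sketched as below; the example keyword sets are hypothetical, and returning 0 for two empty sets is an assumed convention.

```python
from typing import Iterable


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """|A & B| / |A | B| for two characteristic sets; 0.0 when both are empty."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


if __name__ == "__main__":
    print(jaccard({"nba", "basketball", "sports"}, {"basketball", "sports", "music"}))  # 0.5
```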

Baseline.
We compared TFUC with the following baselines:
(i) TwitterRank [12], which calculates user influence according to users' interactions on a certain topic.
(ii) TwitterRank C, in which we apply TwitterRank to calculate user influence on the cluster of latent influential users obtained by our cluster model.
(iii) OOLAM [11], a PageRank-analogous method in which interactions are divided into positive and negative parts, so that users' opinion influence is calculated on the positive and negative parts, respectively.
(iv) OOLAM c, in which we use the latent influential users to construct the positive and negative graphs, respectively.
(v) OOLAM SM, in which users' topic similarity is taken into consideration in OOLAM.
(vii) HF CP ALS [16], a tensor model in which users' opinions and topic relevance are taken into consideration.
(viii) HF CP ALS C, in which the cluster model is added to HF CP ALS.
(ix) CP SGD, in which low-influence users are not filtered out when we construct the user tensor.
Besides, we need to verify whether a recommendation system with influence performs better than one without; we choose a simple recommendation algorithm, i.e., the content-based (CB) algorithm, as the baseline.
In the ranking metrics, $i$ denotes the $i$-th rank and $N$ denotes the number of users; in the averages over topics, $z$ is a certain topic and $Z$ denotes the number of topics. The two recommendation precision measures are as follows:

$$P_z = \frac{1}{N} \sum_{i=1}^{N} a_i, \qquad (38)$$

$$\bar{P} = \frac{1}{Z} \sum_{z=1}^{Z} P_z, \qquad (39)$$

where $z$ is a certain topic, $Z$ is the number of topics, and $N$ is the number of users to be recommended. $a_i$ is an acceptance index whose value is 1 if the $i$-th user was accepted at least once when he was recommended to other users, and 0 otherwise. Equation (38) is the average precision on a single topic and reflects the performance of the model on that topic; Equation (39) reflects the overall performance of the model across all topics. A higher $\bar{P}$ means that users with higher influence scores are more likely to be accepted by other users.
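The ranking and recommendation metrics can be sketched as below. `precision_at_n` uses the usual reading of the Table 3 footnote (overlap of the real and predicted top-$N$ sets divided by $N$); computing AP as the mean of P@i for i = 1..N, i.e. the area under the P@N curve, is an assumed reading of the text rather than a reproduced equation. `topic_precision` and `overall_precision` follow (38) and (39).

```python
from typing import Sequence


def precision_at_n(true_rank: Sequence[str], pred_rank: Sequence[str], n: int) -> float:
    """P@N: share of the predicted top-N users that are also in the real top-N."""
    return len(set(true_rank[:n]) & set(pred_rank[:n])) / n


def average_precision(true_rank: Sequence[str], pred_rank: Sequence[str], n: int) -> float:
    """Assumed AP: mean of P@i over i = 1..N."""
    return sum(precision_at_n(true_rank, pred_rank, i) for i in range(1, n + 1)) / n


def mean(values: Sequence[float]) -> float:
    """Average over topics, used for MAP and for (39)."""
    return sum(values) / len(values)


def topic_precision(accepted: Sequence[int]) -> float:
    """(38): mean of the acceptance indices a_i (1 = accepted at least once)."""
    return sum(accepted) / len(accepted)


def overall_precision(accepted_by_topic: Sequence[Sequence[int]]) -> float:
    """(39): mean of the per-topic precisions."""
    return mean([topic_precision(a) for a in accepted_by_topic])


if __name__ == "__main__":
    print(precision_at_n(["a", "b", "c"], ["b", "a", "d"], 2))  # 1.0
    print(overall_precision([[1, 0], [1, 1, 1, 0]]))            # 0.625
```

MAP is then `mean` applied to the per-topic AP values, mirroring how (39) averages the per-topic precisions.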

Precision Results of User Influence Ranking.
The P@N of different methods on the Sina Weibo dataset is shown in Table 3. The P@N of our method is the best except for P@20 on the law topic. To analyze the experimental results in more detail, we compare our method with each baseline separately. Table 3 shows that our proposed method outperforms TwitterRank, which verifies that a user with strong opinion strength, many active fans, and high propagation ability tends to be influential. The precision of our method is at least 10% higher than that of OOLAM, which demonstrates that a user with high propagation ability and high topic similarity tends to have a high influence score. OOLAM SM neglects temporal features, so it performs worse than our method. HF CP ALS also does not take temporal features into consideration, so users with high propagation ability do not get high influence scores. Compared with CP SGD, the precision of our method improves by at least 10%, which means that filtering out some low-influence users can improve performance.
Furthermore, we also calculated the AP and MAP of each method. Figure 1 shows the precision at different $N$; the larger the area under the curve, the higher the AP. The detailed AP and MAP of each method are given in Table 4. The AP of our method is better than that of the other baselines except for the AP of OOLAM on the basketball topic, and the MAP of our method is the best among all methods. We conclude that our method performs better than the other baselines.

User Influence in Recommendation.
In the previous section, we showed that TFUC outperforms the other baselines. In this section, we apply TFUC to obtain users' influence scores on the Tencent Weibo dataset and then rank users according to these scores. For each basic user $u_i$ in this dataset, we obtain a recommendation result by counting all of $u_i$'s recommendation logs: if his recommendation was accepted by other users at least once, his recommendation result is recorded as 1, and 0 otherwise. By calculating the correlation coefficient between the influence ranking list and the result list, we can tell which influence analysis method used in recommendation is closer to the practical situation.
In the recommendation task, we also first recognize latent influential users with the TFUC model. We now discuss how to obtain the user features used to build the clustering model.
(1) As mentioned in the dataset description, an interaction between two users in the Tencent Weibo dataset is represented as a recommendation, and this interaction contains time information. Therefore, we obtain basic users' propagation ability from this time information by (3).
(2) Due to the lack of direct opinion information in the Tencent Weibo dataset, we treat the acceptance of a recommendation as the opinion of a user toward the basic user. In this case, we obtain basic users' opinion strength according to (5).
(3) Users' fans activity and network centrality are obtained similarly to (6) and (7).
After obtaining these four user features, the TFUC model partitions users into different clusters and constructs the tensor model from the users in the latent influential user cluster. After the tensor model is decomposed, all users' influence scores can be predicted according to (29).
To calculate the precision of the recommendations, we need the list of successfully accepted users. In the Tencent Weibo dataset, if a user $v_j$ accepts a recommendation whose entity is $u_i$, the system records this interaction in the recommendation logs. By determining which users are successfully accepted by other users in the log file, we obtain an acceptance list.
Finally, we obtain the recommendation precision by comparing the influence list with the acceptance list according to (39). Figure 2 and Table 5 show the results.
It can be seen from Figure 2 and Table 5 that the recommendation results combined with the user influence ranking list obtained from our method perform similarly to the OOLAM method on each topic. However, our method shows better overall performance across the four topics. This result illustrates that when influence scores that incorporate temporal characteristics and topic similarity are applied in the recommendation system, items with higher influence are more likely to be accepted. The $P_z$ value of our method is 2% to 8% higher than that of OOLAM SM, which reflects that when the temporal characteristic is considered, the item influence ranking list better matches the actual recommendation results. Our method also achieves higher recommendation precision than HF CP ALS, which confirms this conclusion. Compared with CP SGD, the $P_z$ value of our method improves on every topic. This is because we filter out the impact of low-influence but highly active marketing items, whose recommendations have a low probability of being accepted.
Based on the above analysis, we conclude that the high-influence items obtained from our method are more likely to be accepted by users. Therefore, we combine the recommendation system with the item influence obtained from our method. First, we calculate the recommendation result list from the rec log train dataset on topics "1.6.2.1", "1.1.2.1", "1.2.2.1", and "1.12.4.5". Next, we calculate the influence-similarity between the users and items in this recommendation result list. Then, we choose the top 100 most similar results as our recommendations for users and calculate the average recommendation precision on each topic. Finally, we obtain $\bar{P}$ by averaging the recommendation precision over the topics. The $\bar{P}$ of the content-based recommendation method is 14.5, and it improves to 15.5 when TFUC is integrated. This indicates that the performance of the recommendation system can be improved when influence is considered.

Conclusion
This paper focuses on a recommendation task in microblogs that involves user influence analysis. We introduce a two-step method for influence analysis. First, users are partitioned into an influential part and an uninfluential part. Then, we employ CP decomposition with stochastic gradient descent to expedite the decomposition, and a time constraint matrix is incorporated into the user factor matrix during the decomposition. Finally, we apply the TFUC model to recommend items for users according to the influence-similarity.
In the gradient of the factorization model, $\odot$ is the Khatri-Rao product between $C$ and $B$; $T(A, C)$, $T(A, B)$, $Y(A, \cdot, C)$, and $Y(A, B, \cdot)$ are obtained in the same way.

Figure 1: Precision at different positions for various methods.

Figure 2: The average precision of the recommendations.

Table 3: Ranking precision for various methods on four topics.
$P@N = |R_N \cap \hat{R}_N| / N$, where $R_N$ is the set of real top-$N$ users and $\hat{R}_N$ is the predicted set of top-$N$ users.

Table 4: AP and MAP comparisons between various methods.

Table 5: The mean precision of the recommendations.