
With the development of social networks, microblogs have become a major social communication tool. Microblogs contain a great deal of valuable information, such as personal preferences, public opinion, and marketing information. Consequently, research on user interest prediction in microblogs has positive practical significance. In practice, extracting information associated with a user’s interest orientation from constantly updated blog posts is not easy. Existing prediction approaches based on probabilistic factor analysis use the blog posts published by a user to predict that user’s interests. However, these methods are not very effective for users who post little but browse a lot. In this paper, we propose a new prediction model based on social hub matrix factorization, called SHMF. SHMF constructs the interest prediction model by combining the blog posts published both by the user and by the direct neighbors in the user’s social hub. The proposed model predicts user interest by integrating the user’s historical behavior and a temporal factor as well as the user’s friendships, thus achieving accurate forecasts of the user’s future interests. Experimental results on Sina Weibo show the efficiency and effectiveness of the proposed model.

Online microblog systems such as Sina Weibo, Twitter, and Facebook provide a convenient platform for users to share information. The number of social media users grew exponentially over the last decade; a recent snapshot of Facebook’s friendship network indicated that it has over 1 billion users. These social networks are becoming not only an effective means of connecting with friends but also powerful information dissemination and marketing platforms for spreading ideas, fads, and political opinions.

Microblogs contain a vast amount of information, and the topics of users and user groups constantly change over time, following hot topics at home and abroad. In this context, research on user interest prediction is useful for online marketing, public opinion analysis, and even public security [

Note that user interest prediction differs from user interest detection, which mainly focuses on mining users’ current interests. Interest prediction remains relatively understudied and poses two main challenges. First, user interest in microblogs changes over time. In a time-aware prediction model, the user’s temporal preference is an important aspect, and long-term and short-term preferences lead to different prediction results. Second, user interest is dynamic: it may migrate as the topics of the user’s social hub migrate. In the real world, capturing users’ friendships and their topics is difficult.

Recently, many prediction models have been investigated [

In fact, we observed several interesting phenomena. Some users publish few blog posts but browse many; we call them silent type users. Such users may have very explicit interests but are cautious about expressing their ideas, and they do publish their opinions at an appropriate moment. However, existing prediction models tend to fail to predict their interests. Another kind of user expands their social hub by following new friends whose topics interest them; we call them interactive type users. In other words, the interests of such users can, to some extent, be represented by the interests of the direct neighbors in their social hubs. Prediction models that ignore this interactive property therefore produce incomplete forecasts.

To overcome the shortcomings of existing work, and based on our observations of microblogs, this paper proposes SHMF, a social hub matrix factorization-based model for user interest prediction in microblogs. SHMF incorporates the impact of a user’s social hub on the user’s interests to improve prediction quality. Experimental results on a Sina Weibo dataset show that our approach improves the prediction accuracy and the performance efficiency.

The rest of this paper is organized as follows. The related work is discussed in Section

For user interest prediction in microblogs, there is a series of mature methods based on probabilistic matrix factorization within the framework of probabilistic graphical models. A probabilistic graphical model can concisely express complex probability distributions, efficiently compute marginal and conditional distributions, and conveniently learn the parameters and hyperparameters of a probability model [

In 2008, Salakhutdinov and Mnih [

When building Weibo user interest prediction models, the above studies neglect how the blogs posted by others in a user’s social hub affect the user’s future interests and behavior. To address this problem, we propose a new user interest prediction model, SHMF, based on PMF, which combines the user’s historical behavior, the user’s social trust relationships, and the influence of information from the user’s social hub on the user’s future interests. We also design experiments on a real Sina Weibo dataset to show that this prediction model and its algorithm outperform previous prediction models in top-

In this section, we give the notation used in the following discussion. In the prediction model, we have a set of users

The users’ interests, expressed as a user-topic matrix, are given in

In a microblog, each user can follow other users he or she is interested in, so users’ friendships can be described as a user-user matrix

Generally, the goal of a user interest prediction model is to generate the user-interest matrix for the next time segment. The basic matrix factorization (MF) approach finds a low-rank approximation of the original matrix and uses it as the predicted matrix. It has proven effective for learning the latent characteristics of users and topics and for predicting scores from these latent characteristics. The conditional probability of the known scores is defined as

As is shown in (
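The basic PMF objective discussed in this subsection can be sketched as follows, assuming the standard formulation of Salakhutdinov and Mnih: a logistic link g(x) = 1/(1 + e^{-x}) applied to the dot product of user and topic latent vectors, Gaussian noise on observed scores, and zero-mean Gaussian priors on the latent vectors. Under these assumptions, maximizing the log-posterior is equivalent to minimizing a regularized squared error over the observed entries only. The names `pmf_loss`, `lam_u`, and `lam_v` are hypothetical, and this sketch does not reproduce the paper’s own notation.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic link g(x), mapping a dot product into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pmf_loss(R, mask, U, V, lam_u=0.1, lam_v=0.1):
    """Negative log-posterior of PMF, up to an additive constant.

    R[i][j]    : observed user-topic score, scaled to [0, 1]
    mask[i][j] : 1 if R[i][j] is observed, else 0 (the indicator I_ij)
    U[i], V[j] : latent feature vectors of user i and topic j
    lam_u/lam_v: regularization weights (ratios of noise to prior variance)
    """
    err = 0.0
    for i, row in enumerate(R):
        for j, r in enumerate(row):
            if mask[i][j]:  # unobserved entries contribute nothing
                err += (r - sigmoid(dot(U[i], V[j]))) ** 2
    reg_u = sum(dot(u, u) for u in U)
    reg_v = sum(dot(v, v) for v in V)
    return 0.5 * err + 0.5 * lam_u * reg_u + 0.5 * lam_v * reg_v
```

Because entries with a zero mask are skipped, the many user-topic pairs with no observed score do not pull the prediction toward zero.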

The relations among users in a social network play an important role in users’ behaviors [

Figure

Graphical model of SocialMF.
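In the standard SocialMF formulation, each user’s latent vector has a Gaussian prior centred on the trust-weighted average of the latent vectors of the users he or she follows, with the follow (trust) matrix normalized row by row. In the log-posterior this appears as an extra squared-distance penalty. The sketch below computes that penalty; the names `row_normalize`, `social_term`, and `lam_t` are hypothetical, and the SocialMF graphical model in the figure may use different symbols.

```python
def row_normalize(T):
    """Turn a 0/1 follow matrix into row-stochastic trust weights.

    Rows with no followees stay all zero.
    """
    out = []
    for row in T:
        s = sum(row)
        out.append([x / s if s else 0.0 for x in row])
    return out


def social_term(U, T, lam_t=0.1):
    """SocialMF-style penalty: 0.5 * lam_t * sum_u ||U_u - sum_v T_uv U_v||^2.

    U: list of user latent vectors; T: user-user follow matrix (0/1 entries).
    For a user with no followees the prior mean is the zero vector, so the
    term reduces to plain L2 shrinkage of that user's vector.
    """
    W = row_normalize(T)
    k = len(U[0])
    total = 0.0
    for u, weights in enumerate(W):
        # trust-weighted mean of the followed users' latent vectors
        mean = [sum(w * U[v][d] for v, w in enumerate(weights)) for d in range(k)]
        total += sum((U[u][d] - mean[d]) ** 2 for d in range(k))
    return 0.5 * lam_t * total
```

Adding this term to the PMF loss pulls each user toward the users he or she follows, which is how SocialMF propagates interest information through the social network.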

The user-topic matrices in the PMF and SocialMF models are both constructed from the user’s historical behavior and do not take time into account. In contrast, the TS-PMF model captures how user interest evolves over time and applies an exponential decay function to the user-topic matrices [
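The idea of exponential decay over time segments can be illustrated with a small helper that merges per-segment user-topic matrices so that older segments count less. The weight exp(-beta * (T - t)) for segment t out of T, the normalization of the weights to sum to 1, and the names `decay_aggregate` and `beta` are assumptions chosen for illustration; TS-PMF’s exact decay formula may differ.

```python
import math


def decay_aggregate(matrices, beta=0.5):
    """Combine per-segment user-topic matrices R_1..R_T into one matrix.

    Segment t (1-based, t = T is the most recent) gets weight
    exp(-beta * (T - t)); weights are normalized to sum to 1.
    beta = 0 gives a plain average; a large beta keeps only the latest segment.
    """
    T = len(matrices)
    weights = [math.exp(-beta * (T - t)) for t in range(1, T + 1)]
    z = sum(weights)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for w, M in zip(weights, matrices):
        for i in range(rows):
            for j in range(cols):
                out[i][j] += (w / z) * M[i][j]
    return out
```

The decay rate controls the trade-off between long-term and short-term preference discussed in the introduction: a small beta emphasizes long-term interest, a large one emphasizes recent interest.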

Using the exponential decay function to model the change of user interest, the formulation is as follows:

The user’s latent feature vector is affected by both his or her historical interests and friends’ interests. Therefore, the conditional distribution of the users’ latent features can be expressed as follows:

Through Bayesian inference, we obtain the following equation for the posterior probability over the latent features of users and topics:

Maximizing the log of the posterior distribution with regard to

In this section, we present our model, SHMF, which incorporates the impact of the user’s social hub into the MF approach for prediction. SHMF combines the user’s historical behavior, social trust relationships, and blog posts published by friends in the user’s social hub.
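This sketch does not reproduce SHMF’s own equations. Purely as an illustration of how social-hub information could enter a prediction, it blends latent vectors learned from a user’s own posts with latent vectors learned from posts in the user’s social hub, using a weight `alpha` in the role of the social-hub weight listed in the notation. The linear blending rule and the names `blend`, `shmf_scores`, `H`, and `W` are hypothetical.

```python
def blend(A, B, alpha):
    """Element-wise (1 - alpha) * A + alpha * B for two lists of vectors."""
    return [[(1 - alpha) * a + alpha * b for a, b in zip(ra, rb)]
            for ra, rb in zip(A, B)]


def shmf_scores(U, V, H, W, alpha=0.3):
    """Hypothetical SHMF-style scoring (not the published formulation).

    U, V : user / topic latent vectors learned from the user's own posts
    H, W : user / topic latent vectors learned from posts in the social hub
    alpha: weight of social-hub information (0 ignores the hub entirely)
    Returns the predicted user-topic score matrix.
    """
    U_fin = blend(U, H, alpha)  # "final" user latent vectors
    V_fin = blend(V, W, alpha)  # "final" topic latent vectors
    return [[sum(u * v for u, v in zip(ui, vj)) for vj in V_fin] for ui in U_fin]
```

For a silent type user, whose own posts give a poor estimate of U, a larger alpha lets the hub-derived vectors carry more of the prediction.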

Based on the above hypothesis, we have

Therefore, the conditional distribution probability of users’ latent features can be expressed as follows:

Through Bayesian inference, we obtain the following equation for the posterior probability over the latent features of users and topics:

The log of the posterior distribution for SHMF at time point

Maximizing the log of the posterior distribution with regard to

In (

To reduce computational complexity, stochastic gradient descent is used to find a local optimum of the loss function, as shown in (
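A single stochastic gradient step on one observed user-topic entry can be sketched as below. It assumes a linear prediction u·v, L2 regularization, and a SocialMF-style pull toward the trust-weighted mean m of the user’s friends, with m held fixed during the step. The full social gradient also contains terms from the users who follow this user, which are omitted here. The function name `sgd_step` and the weights `lr`, `lam`, and `lam_t` are hypothetical, not SHMF’s actual update equations.

```python
def sgd_step(u, v, r, m, lr=0.01, lam=0.1, lam_t=0.1):
    """One SGD update on a single observed score r for a (user, topic) pair.

    Per-entry loss (linear prediction, no link function):
        0.5*(r - u.v)^2 + 0.5*lam*(|u|^2 + |v|^2) + 0.5*lam_t*|u - m|^2
    where m is the trust-weighted mean of the friends' latent vectors,
    treated as a constant during this step (a simplification).
    Returns updated copies of u and v; the inputs are not modified.
    """
    e = r - sum(a * b for a, b in zip(u, v))  # prediction error
    new_u = [a - lr * (-e * b + lam * a + lam_t * (a - c))
             for a, b, c in zip(u, v, m)]
    new_v = [b - lr * (-e * a + lam * b) for a, b in zip(u, v)]
    return new_u, new_v
```

Visiting observed entries in random order and applying this step to each is what makes the method stochastic; each step costs O(k) for latent dimension k, instead of a full pass over the matrix.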

The SHMF model provides an effective way to predict users’ interests. The prediction procedure is described by two algorithms in Section

To evaluate the effectiveness and efficiency of our approach, we implemented a prototype system for user interest prediction. Based on the SHMF model and its variant, we provide two algorithms with different parameters and procedures.

The architecture of our implementation is illustrated in Figure

The framework of predicting users’ interests.

SHMF integrates the user’s historical behavior, the user’s social trust relationships, and the influence of information from the user’s social hub. The process of predicting users’ interests with SHMF is described in Algorithm

Dataset:

The dimension of the latent feature:

Parameters:

An updating parameter:

Convergence parameter:

The maximum number of iterations:

The user-topic matrix in time segment

(1)

(2)

(3) initialize

(4)

(5) Compute the mean matrices

(6)

(7)

(8) compute the gradients in Eq. (

(9) update in Eq. (

(10) compute

(11)

(12) break

(13)

(14)

(15)

(16)

(17)

(18)

(19)

(20) predict
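The overall control flow of the algorithm above (initialize, iterate gradient updates, stop when the convergence parameter or the maximum number of iterations is reached, then predict) can be illustrated with plain matrix factorization. This is a simplified stand-in, not SHMF itself: it omits the temporal, social-trust, and social-hub terms, uses the relative change in loss as the convergence test, and all names (`train_mf`, `predict_top_k`, `eps`, `max_iter`) are hypothetical.

```python
import random


def train_mf(R, mask, k=2, lr=0.05, lam=0.01, eps=1e-6, max_iter=500, seed=0):
    """Hypothetical outer loop in the spirit of the algorithm above."""
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    # (initialize) small random latent vectors for users and topics
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(m)]
    observed = [(i, j) for i in range(n) for j in range(m) if mask[i][j]]

    def loss():
        err = sum((R[i][j] - sum(a * b for a, b in zip(U[i], V[j]))) ** 2
                  for i, j in observed)
        reg = sum(x * x for row in U + V for x in row)
        return 0.5 * err + 0.5 * lam * reg

    prev = loss()
    for _ in range(max_iter):
        rng.shuffle(observed)  # stochastic order of updates
        for i, j in observed:
            e = R[i][j] - sum(a * b for a, b in zip(U[i], V[j]))
            for d in range(k):
                u, v = U[i][d], V[j][d]
                U[i][d] += lr * (e * v - lam * u)
                V[j][d] += lr * (e * u - lam * v)
        cur = loss()
        if abs(prev - cur) <= eps * max(prev, 1e-12):  # convergence test
            break
        prev = cur
    return U, V


def predict_top_k(U, V, user, k):
    """(predict) Indices of the k topics with the highest predicted scores."""
    scores = [sum(a * b for a, b in zip(U[user], v)) for v in V]
    return sorted(range(len(V)), key=lambda j: -scores[j])[:k]
```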

We used a dataset covering 1 May 2016 to 31 May 2016, downloaded from Sina Weibo. It includes more than 20 million microblog messages, together with their timestamps and user-to-user relationships.

The basic idea of traditional collaborative filtering is that similar users make similar choices, or similar options are chosen by similar groups of users [

Given the computational cost, user selection is very important in microblog user interest prediction. Within a month, different users post very different numbers of microblogs: some post only one, while others post tens of thousands. For users who post very few microblogs in a month, neither their personal microblogs nor the microblogs in their social hub are sufficient to describe their interests. Users who post a great many microblogs in a month, on the other hand, are mostly official accounts of enterprises and institutions or commercial procurement services, so predicting their interests is meaningless. We therefore perform a statistical analysis on the Sina Weibo dataset and find that most users post 100 or fewer microblogs, as shown in Figures

Statistical analysis of the dataset from Sina Weibo.
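The user selection step can be sketched as a filter on monthly post counts. The upper bound of 100 posts follows the statistic reported above; the lower cut-off `min_posts` and the function name `select_users` are hypothetical, since the exact thresholds used in the experiments are not stated here.

```python
from collections import Counter


def select_users(post_authors, min_posts=10, max_posts=100):
    """Keep users whose monthly post count lies in [min_posts, max_posts].

    post_authors: iterable of user IDs, one entry per microblog post.
    Users below min_posts have too little text to describe their interests;
    users above max_posts are treated as official or commercial accounts.
    """
    counts = Counter(post_authors)
    return {u for u, c in counts.items() if min_posts <= c <= max_posts}
```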

After collecting the users’ blog posts, we train an LDA model and use it to automatically assign topics to the blogs posted by users and to those posted by others in their social hubs; the number of topics is chosen according to perplexity. According to the perplexity versus number-of-topics curve shown in Figure

Perplexity versus number of topics.
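Perplexity is the exponential of the negative average per-word log-likelihood, where each word’s probability in a document mixes the topic-word distributions by that document’s topic proportions. The sketch below computes it from already-estimated distributions; the names `perplexity`, `theta`, and `phi` are assumptions, and the LDA toolkit used in the paper is not specified here.

```python
import math


def perplexity(docs, theta, phi):
    """LDA perplexity: exp(-(sum of log p(w | d)) / number of tokens).

    docs : list of documents, each a list of word IDs
    theta: theta[d][k] = p(topic k | document d)
    phi  : phi[k][w]   = p(word w | topic k)
    Assumes every word has nonzero probability under the model.
    """
    log_lik, n_tokens = 0.0, 0
    for d, words in enumerate(docs):
        for w in words:
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)
```

A model that spreads probability uniformly over V words has perplexity V, so lower values mean the topics explain the posts better. One common practice is to compute this on held-out posts for several candidate numbers of topics and choose the value where the curve levels off.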

In this section, the effectiveness and efficiency of our SHMF model are evaluated. We conduct experiments on a machine with a 4-core Intel Core i7 processor running at 3.60 GHz, 24 GB of memory, and a 1 TB hard disk. The programs run on Windows 7 Professional with Anaconda 4.1.1 (64-bit).

We first present the evaluation metrics used throughout our experiments. Next, we employ the variable-controlling approach to tune the parameters of the SHMF model and the other three models. Then we compare the prediction accuracy and performance overhead of our model with those of the other models. Finally, we analyze the experimental results.

Because users’ posting behavior is highly uncertain, recall has little practical significance for this problem; moreover, in real life users pay more attention to the top-
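Since the evaluation focuses on precision rather than recall, a minimal sketch of precision over the top-k predicted topics, averaged across users, is given below. The function names and the convention of dividing by k even when fewer than k topics are predicted are assumptions.

```python
def precision_at_k(predicted, actual, k):
    """Fraction of the top-k predicted topics that the user actually posts about.

    predicted: topic IDs ranked by predicted score (best first)
    actual   : set of topic IDs the user posts about in the test segment
    """
    top = predicted[:k]
    return sum(1 for t in top if t in actual) / k


def mean_precision_at_k(predictions, truths, k):
    """Average precision@k over users; both arguments are dicts keyed by user ID."""
    users = list(predictions)
    return sum(precision_at_k(predictions[u], truths.get(u, set()), k)
               for u in users) / len(users)
```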

We set up three experiments, PMF [

First, we use the variable-controlling approach to tune the parameters to better values, and then we compare their top-

Impact of different values of different parameters in the PMF model on performance of user interest prediction.

According to Figure

Impact of different values of different parameters in the SocialMF model on performance of user interest prediction.

Based on Figure

Impact of different values of different parameters in the TS-PMF model on performance of user interest prediction.

From Figure

Impact of different values of different parameters in the SHMF model on performance of user interest prediction.

According to Figure

After tuning the model parameters in the five experiments, the average precision of the five models over most parameter settings is shown in Table

Precision of SHMF.

Model | Pre_avg |
---|---|
PMF | 17.35% |
SocialMF | 17.37% |
TS-PMF | 17.91% |
SHMF | 18.67% |

It can be seen from Table
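To make the size of the differences in the precision table concrete, the helper below computes SHMF’s absolute gain (in percentage points) and relative gain (in percent) over each baseline. The values are copied from the table; the function and variable names are illustrative.

```python
# Average precision (%) taken from the table above.
PRE_AVG = {"PMF": 17.35, "SocialMF": 17.37, "TS-PMF": 17.91, "SHMF": 18.67}


def relative_gains(table, ours="SHMF"):
    """Map each baseline to (absolute gain in points, relative gain in %)."""
    base = table[ours]
    return {name: (base - v, 100 * (base - v) / v)
            for name, v in table.items() if name != ours}
```

For example, SHMF’s gain over TS-PMF is 0.76 percentage points, about 4.2% relative, and its gain over PMF is 1.32 points, about 7.6% relative.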

Performance of SHMF.

Model | Run-time (s) |
---|---|
PMF | 698.618 |
SocialMF | 1227.088 |
TS-PMF | 1721.555 |
SHMF | 2080.513 |

It is found from Table
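The run-time table can be summarized the same way, as the ratio of SHMF’s run time to each baseline’s. The values are copied from the table; the names are illustrative.

```python
# Run time in seconds taken from the table above.
RUN_TIME_S = {"PMF": 698.618, "SocialMF": 1227.088, "TS-PMF": 1721.555, "SHMF": 2080.513}


def slowdown(table, ours="SHMF"):
    """Ratio of our run time to each baseline's run time."""
    return {name: table[ours] / t for name, t in table.items() if name != ours}
```

SHMF takes roughly 1.2 times as long as TS-PMF, 1.7 times as long as SocialMF, and 3.0 times as long as PMF. This is consistent with SHMF learning additional latent feature spaces for the social hub on top of those used by the baselines.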

Building on prior work on predicting microblog users’ interests, this paper analyzes the information in microblog users’ social hubs and proposes the SHMF model, which improves the top-

For future work on microblog users’ interest prediction, the representation of interest should be studied further to make it more accurate, since it determines the upper limit of interest prediction. In the prediction algorithm, more techniques, such as Bayesian analysis, should be added to solve the multiparameter problem by analyzing the relationships between the parameters and their practical meaning.

The user-topic matrix in time

The user’s social hub-topic matrix in time

The user-user matrix

The hub-hub matrix

The users’ latent feature space in time

The topics’ latent feature space in time

The users’ latent feature space in social hub in time

The topics’ latent feature space in social hub in time

The final users’ latent feature space in time

The final topics’ latent feature space in time

The mean matrix of

The mean matrix of

The mean matrix of

The mean matrix of

A weight that indicates how important the whole previous time points are to the current one

The kernel parameter

The dimension of latent feature space

A weight that indicates how important the user’s social hub information is to the user’s interest

The impact of the users’ latent feature vectors on users’ interests

The impact of the social hubs’ latent feature vectors on users’ interests

The impact of the topics of the blogs posted by users on users’ interests

The impact of the topics of the blogs posted by others in users’ social hub on users’ interests

The impact of the users’ relationships on users’ interests

The impact of the social hubs’ relationships on users’ interests.

The authors declare that there are no conflicts of interest regarding the publication of this paper.

This work is supported by the National Natural Science Foundation of China (31371340) and the National Key Technologies Research and Development Program of China (no. 2016YFB0502604).