The Weibo services offered by service providers are simple and uniform across users. Research based on microblog content can capture each user’s personalized features, which is important for improving user satisfaction and expanding the user base. First, a multiclass classification algorithm based on an improved binary-tree support vector machine is proposed for the interest classification problem. Second, an improved mixed interest model based on implicit feedback is proposed. It addresses the shortcomings of existing user interest models in both the establishment phase and the drift strategy used in the update phase. The improved model is applied to personalized user modeling, improving the authenticity and accuracy of the resulting models.

The main purpose of the interest classification method is to classify interest information, which is then used to create the personalized interest model of weibo users. A user’s interest in certain topics can persist for a long time; an athlete’s interest in sports, for example, is long term and may even last a lifetime. Interest in such categories is generally long term, and the user will often post weibo related to them. A user may, however, focus on a particular interest category only during a specific period: within that period the user frequently posts weibo on the topic, but afterwards seldom does. For example, while the World Cup is being held, the user follows information related to it and posts weibo about it; after the World Cup ends, the user pays little attention to this interest category and seldom posts weibo associated with it.

The short-term interest model considers only the user’s immediate interests and ignores long-standing ones. Long-standing interests are formed over a long time, and although the user may rarely post weibo in some of these categories, the algorithm should not exclude them. The long-term interest model, in turn, focuses too much on the time factor and neglects to actively discover the user’s new interests. To address the shortcomings of both the short-term and the long-term interest model, this paper proposes a hybrid model to optimize the interest model; the comparison shows that this model reflects the user’s real interests more closely.

To represent users’ interests more accurately, this paper uses a feature vector to represent microblog text. A text feature vector is composed of feature words and their corresponding weights, where each weight expresses how important the feature word is within the document. In other words, the more important a feature word is in the document, the higher its weight. Research on term weighting is relatively mature, and TF-IDF is now widely used as the calculation method [

Among them,

Although relatively mature methods exist for computing feature word weights, they still have many shortcomings. Many researchers at home and abroad have studied the problem, and some have proposed reasonable weighting algorithms. Here the weight of a feature word is calculated from its location in the document and its frequency:

The calculation of weibo feature word weight.

Count the number of weibo

First, find the feature words’ set,

Calculate document frequency

Use the TF-IDF-MI method to calculate the weight of each feature word in the candidate set of feature words:

Among them,
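As a rough illustration of the weighting steps above, the following sketch computes plain TF-IDF weights for a handful of tokenized posts. It is not the TF-IDF-MI method of this paper: the mutual-information term and the position-based adjustment are omitted, and the function name `tf_idf`, the unsmoothed logarithm, and the sample posts are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per tokenized document (plain TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Term frequency times inverse document frequency (no smoothing).
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical, already tokenized posts.
posts = [
    ["world", "cup", "final", "goal"],
    ["world", "cup", "ticket"],
    ["stock", "market", "world"],
]
weights = tf_idf(posts)
```

A word that occurs in every post ("world" here) receives weight zero, while a word confined to one post receives the largest weight, which matches the intuition that more distinctive feature words are more important.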

To provide personalized service, the user interest model must record not only the interest content but also other information, such as the creation or update time of each interest and the interest weights. How the user interest model is stored is therefore very important. In this paper, an interest tree is used to store the user interest model.

In this paper, the user interest model (including the long-term interest model, the short-term interest model, and the optimized mixed interest model) is represented with the vector space model. The vector space model is a user interest model with

The logic structure of user interest model.

The root node of the tree structure is the user node; the middle layer holds the user’s interest category nodes; the bottom layer holds the feature word nodes of each interest category. The number of all weibo in an interest category during the time period is counted, and the total is recorded as

From the discussion of the linear discriminant function, it follows that when the text set is linearly separable, a hyperplane that divides the samples correctly can always be found. In general there are infinitely many such solutions, and the discriminant function alone does not determine which one is optimal. The support vector machine selects the hyperplane that separates the two classes of text vectors with the maximum margin.
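For reference, the maximum-margin criterion is usually written as the optimization problem below. This is the standard hard-margin formulation rather than a formula recovered from this paper, and the symbols are assumptions: $x_i$ is the feature vector of the $i$-th training text, $y_i \in \{-1, +1\}$ is its class label, and $w$ and $b$ define the separating hyperplane $w^{\top}x + b = 0$.

```latex
\begin{aligned}
\min_{w,\,b}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2} \\
\text{subject to}\quad & y_i\,\bigl(w^{\top} x_i + b\bigr) \ge 1, \qquad i = 1, \dots, n .
\end{aligned}
```

Under these constraints the distance between the two class boundaries is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin.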

As shown in Figure

The two sets of linear category.

In this paper, the user interest classification problem is reduced to a multiclass text classification problem. One-to-many method first needs to train

Directed acyclic graph SVM method.

The multiclass classification algorithm of support vector machine based on binary tree consists of a training process and a classification process, which are described in turn below.
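The overall shape of such a binary tree of two-class classifiers can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of this paper: a nearest-centroid rule (`CentroidBinaryClassifier`) stands in for the two-class SVM so the example needs no external library, and the categories are split in half by alphabetical order rather than by class distance. All names and the toy vectors are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

Vector = list[float]


def centroid(vectors: list[Vector]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CentroidBinaryClassifier:
    """Stand-in for a two-class SVM: returns -1 or +1 by nearest class centroid."""

    def fit(self, neg: list[Vector], pos: list[Vector]) -> "CentroidBinaryClassifier":
        self.neg_center = centroid(neg)
        self.pos_center = centroid(pos)
        return self

    def predict(self, x: Vector) -> int:
        return -1 if sq_dist(x, self.neg_center) <= sq_dist(x, self.pos_center) else 1


@dataclass
class Node:
    label: Optional[str] = None                      # set only on leaf nodes
    classifier: Optional[CentroidBinaryClassifier] = None
    left: Optional["Node"] = None                    # subtree for CateA (label -1)
    right: Optional["Node"] = None                   # subtree for CateB (label +1)


def train(data: dict[str, list[Vector]]) -> Node:
    categories = sorted(data)
    if len(categories) == 1:
        # A single remaining category becomes a leaf carrying its label.
        return Node(label=categories[0])
    # Assumed split rule: first half vs. second half of the sorted categories.
    half = len(categories) // 2
    cate_a, cate_b = categories[:half], categories[half:]
    neg = [v for c in cate_a for v in data[c]]       # CateA samples, labeled -1
    pos = [v for c in cate_b for v in data[c]]       # CateB samples, labeled +1
    clf = CentroidBinaryClassifier().fit(neg, pos)
    return Node(classifier=clf,
                left=train({c: data[c] for c in cate_a}),
                right=train({c: data[c] for c in cate_b}))


def classify(node: Node, x: Vector) -> str:
    # Walk from the root to a leaf, following each intermediate classifier.
    while node.label is None:
        node = node.left if node.classifier.predict(x) == -1 else node.right
    return node.label


# Hypothetical three-dimensional feature vectors for three interest categories.
training = {
    "sports": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "politics": [[0.1, 0.9, 0.0], [0.2, 0.8, 0.1]],
    "it": [[0.0, 0.1, 0.9], [0.1, 0.0, 0.8]],
}
tree = train(training)
```

With k categories the tree holds k - 1 two-class classifiers, and classifying one sample evaluates at most as many classifiers as the tree is deep. Swapping the stand-in for a real SVM only requires an object with the same `fit` and `predict` methods.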

The training process of the multiclass classification algorithm of support vector machine based on binary tree.

Calculate the category total of the training data set

Construct a binary tree node.

If

The first

Each training sample in the training data subset CateA is labeled −1, and each training sample in the training data subset CateB is labeled +1; the two subsets CateA and CateB are then used to train a two-class support vector machine classifier.

The training data subsets CateA and CateB each repeat Steps

If

One category in the two-class training data is labeled −1 and the other category is labeled +1; train a two-class support vector machine (SVM) classifier.

The category label of the training data is set as a leaf node of the binary tree structure; this label is the classification result assigned when test data reaches the leaf node.

When the training is completed, the two-class support vector machine classifier is regarded as an intermediate node of the binary tree structure, Step

First, we define that class distance

Among them,

Automatic text classification refers to the process of using a computer program to determine the category of a text from its content under a given classification system. It aims to estimate the dependency between the input and output of the system from a known training data set, so that unknown outputs can be predicted as accurately as possible [

In real life, a user’s interests tend to change slowly over time. The user gradually forgets a category that was once of interest and, at the same time, slowly develops an interest in a new category. In this paper, this process of change is referred to as “interest drift.” Interest drift causes the user’s interest model to change over time, so a drift strategy must be considered when studying user interest models. Two approaches to interest drift are frequently used in user modeling research. The first represents the user interest model with a sliding time window; it places too much emphasis on the user’s real-time interests and ignores persistent ones. The second uses a forgetting function to attenuate samples; it places too much emphasis on the forgetting strategy and neglects to discover the user’s new interests [

This paper addresses the defects of the interest drift strategies that existing user interest models use in the establishment and update stages, and puts forward improvement strategies for the interest model. First, we create a user interest vector model and propose the user model attenuation algorithm; we then analyze the defects of current interest drift strategies. Finally, an improved user interest model is proposed in view of these drawbacks.

Human memory follows the natural law of forgetting: it gradually weakens with the passage of time. This paper assumes that the attenuation of user interest follows the same law, so that user interest gradually weakens over time, with attenuation that is fast at first and then slows down [

Therefore, this paper introduces a forgetting factor to attenuate the user interest model. When the user interest model is updated, the newest interest categories are added to it and the weights of the existing interest categories are adjusted. That is, the feature word weights of each interest category are corrected by the forgetting factor, and “old” feature words that are no longer in use are gradually eliminated.
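The update described here (decay existing weights, add new ones, drop words that have faded) can be sketched as follows. The exponential half-life form of the forgetting factor is an assumption chosen because it decays quickly at first and more slowly later; it is not claimed to be the exact formula of this paper. The function names, the `min_weight` pruning threshold, and the sample weights are likewise hypothetical.

```python
import math


def forgetting_factor(age_days: float, half_life_days: float) -> float:
    """Assumed exponential decay: the factor halves every half_life_days."""
    return math.exp(-math.log(2) * age_days / half_life_days)


def update_interest(model: dict[str, float],
                    new_weights: dict[str, float],
                    elapsed_days: float,
                    half_life_days: float,
                    min_weight: float = 0.01) -> dict[str, float]:
    """Decay the existing feature word weights, merge new ones, prune faded words."""
    factor = forgetting_factor(elapsed_days, half_life_days)
    updated = {word: weight * factor for word, weight in model.items()}
    for word, weight in new_weights.items():
        updated[word] = updated.get(word, 0.0) + weight
    # Words whose weight has fallen below the threshold are eliminated.
    return {word: weight for word, weight in updated.items() if weight >= min_weight}


# Hypothetical feature word weights before and after a 30-day interval.
old = {"worldcup": 0.8, "goal": 0.01, "stock": 0.4}
new = {"stock": 0.3, "election": 0.5}
model = update_interest(old, new, elapsed_days=30, half_life_days=30)
```

After exactly one half-life every old weight is halved, so a word that is not refreshed by new posts eventually falls below the threshold and is removed, while a word that keeps appearing retains a high weight.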

Forgetting factor

Among them,

As shown in Figure

The attenuation curve of interest degrees.

In recent years, research on user interest drift has mostly adopted the time window attenuation strategy. This strategy is controlled by time alone, so all interests attenuate at the same rate. Crabtree and Soltysiak adopt the time window method [

To mine user interest from user data more effectively, many researchers assign a lower weight to stale data and a higher weight to new data, and use all of the user data for modeling, which avoids the disadvantages of the time window method. Koychev and Schwab believe that the attenuation of user interest resembles the natural law of forgetting [

The framework of the forgetting model.

Although the forgetting model can handle changes in user interest over a longer period of time, it has difficulty responding to sudden changes in short-term interest. To solve this problem, this paper puts forward a hybrid model that combines the short-term interest model with the long-term interest model. As is shown in Figure

The framework of mixed interest model.
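One simple way to combine the two models is a weighted sum of their category weights, sketched below. The linear combination and the `short_share` parameter are assumptions made for illustration; the text only states that the proportion of short-term and long-term interest is determined experimentally. The sample weights are hypothetical.

```python
def mix_interest(short_term: dict[str, float],
                 long_term: dict[str, float],
                 short_share: float) -> dict[str, float]:
    """Blend two category-weight models; short_share is the short-term proportion."""
    if not 0.0 <= short_share <= 1.0:
        raise ValueError("short_share must lie in [0, 1]")
    categories = set(short_term) | set(long_term)
    return {
        # A category missing from one model contributes zero from that model.
        c: short_share * short_term.get(c, 0.0)
           + (1.0 - short_share) * long_term.get(c, 0.0)
        for c in categories
    }


# Hypothetical normalized category weights.
short_term = {"it": 0.75, "politics": 0.25}
long_term = {"sports": 0.5, "it": 0.25, "art": 0.25}
mixed = mix_interest(short_term, long_term, short_share=0.5)
```

Categories that appear only in the long-term model (such as "sports" here) stay in the mixed model with reduced weight, while a sudden short-term interest still shows up immediately.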

Short-term interest model drift strategy: short-term interest corresponds to the user’s current interest, which is active and changes frequently. Updating it requires the system to respond quickly, so a sliding time window is adopted. The data inside the window represents the user’s current interest, and the short-term interest model is updated as the user posts new weibo.
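A sliding time window of this kind can be sketched as follows. The 15-day window matches the experimental setup of this paper, but measuring interest as the share of posts per category is an assumption made for the example, as are the function name, the calendar year, and the sample posts.

```python
from collections import Counter
from datetime import date, timedelta


def short_term_interest(posts: list[tuple[date, str]],
                        today: date,
                        window_days: int = 15) -> dict[str, float]:
    """Share of posts per category inside the window (today - window_days, today]."""
    start = today - timedelta(days=window_days)
    counts = Counter(category for day, category in posts if start < day <= today)
    total = sum(counts.values())
    # Posts outside the window are ignored entirely.
    return {category: n / total for category, n in counts.items()} if total else {}


# Hypothetical (date, interest category) pairs; the year is arbitrary.
posts = [
    (date(2014, 1, 2), "sports"),
    (date(2014, 1, 20), "it"),
    (date(2014, 1, 22), "it"),
    (date(2014, 1, 24), "politics"),
    (date(2014, 1, 25), "it"),
]
model = short_term_interest(posts, today=date(2014, 1, 25))
```

The post from January 2 falls outside the window, so "sports" disappears from the short-term model even if it is a long-standing interest. This is exactly the weakness that the long-term model is meant to compensate for.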

The formula is as follows:

Among them,

Long-term interest model drift strategy: long-term interest reflects what the user has been interested in over a long time and is relatively fixed. However, the user’s interest in a given category is gradually forgotten, and the degree of interest in that category decreases over time. When the user’s long-term interest drifts, this paper first uses the attenuation algorithm of the user’s interest model (Section 4.3.2) to update the long-term interest model; second, the updated interest model is combined with the new interest model; finally, the combined model becomes the user’s latest long-term interest model. The long-term interest drift strategy is calculated as follows:

Among them,

Among them,

In the experiment, this paper uses data crawled from Sina Weibo to establish a personalized model for a user; the user Brooklyn is selected. From the 526 weibo that Brooklyn posted in the recent period, 423 useful microblogs are extracted and analyzed before mining, and the text of each microblog is then processed. First, the data of the first 15 days is used to initialize the short-term interest model, and the data of the first 30 days is used to initialize the long-term interest model. The short-term interest model is then updated every 15 days and the long-term interest model every 30 days, and experiments are performed at each time point. Finally, the impact of the long-term and short-term interest models on the user interest model is evaluated at each time point to calculate the proportion of short-term and long-term interest, which yields the optimized mixed interest model.

This paper takes a variety of combination tests to the half-life

Short-term interest weight of each interest category in each period of time is shown in Table

The short-term interest model of Brooklyn.

Timestamp | Traffic | Sports | Military | Medicine | Politics | Education | Environment | Economic | Art | IT |
---|---|---|---|---|---|---|---|---|---|---|
January 10 | 0 | 80.6 | 54.1 | 3.2 | 0 | 0 | 0 | 70.8 | 53.4 | 0 |
January 25 | 0 | 50.1 | 10.6 | 0 | 69.3 | 0 | 0 | 36.5 | 31.9 | 90.6 |
February 10 | 0 | 60.5 | 2.8 | 0 | 59.9 | 45.6 | 24.3 | 19.4 | 6.3 | 15.6 |
February 23 | 4.9 | 75.3 | 0.3 | 4.9 | 66.2 | 8.9 | 39.8 | 4.2 | 1.2 | 55.9 |
March 10 | 0 | 63.8 | 0.1 | 1.6 | 47.5 | 18.6 | 6.7 | 0.5 | 80.6 | 57.3 |
March 26 | 0.3 | 59.7 | 0 | 0 | 8.9 | 39.5 | 26.8 | 0.1 | 15.2 | 70.7 |

In the experiment, the parameters of the selected model are as follows:

The comparison diagram of different interest drift strategy experimental results.

The experimental results show that the sliding time window model, namely, the short-term interest model, refers only to the weibo the user posted in the most recent 15 days, so it can accurately capture the user’s short-term interests. When users are more concerned with short-term interests, the sliding time window model performs slightly better than the forgetting strategy model. However, because the sliding time window model does not take the user’s long-term interests into account, its overall performance is the worst.

The results can be seen from Figure

This paper puts forward an improved mixed interest model based on implicit feedback. The current state of research on personalized modeling technology is reviewed. The paper introduces common user modeling methods and highlights an interest classification method based on the support vector machine (SVM) and an interest drift method that combines the sliding time window with the forgetting strategy. The experimental results show that the proposed method achieves personalized modeling of weibo users with good performance and scalability.

The authors declare that there is no conflict of interests regarding the publication of this paper.

This work was supported by the National Natural Science Foundation of China (Project no. 71171068). The authors highly appreciate this financial support.