A Probabilistic Recommendation Method Inspired by Latent Dirichlet Allocation Model

The recent decade has witnessed an increasing popularity of recommendation systems, which help users acquire relevant knowledge, commodities, and services from an overwhelming information ocean on the Internet. Latent Dirichlet Allocation (LDA), originally presented as a graphical model for text topic discovery, has now found applications in many other disciplines. In this paper, we propose an LDA-inspired probabilistic recommendation method by treating the user-item collecting behavior as a two-step process: every user first becomes a member of one latent user-group with a certain probability, and each user-group then collects various items with different probabilities. Gibbs sampling is employed to approximate all the probabilities in the two-step process. The experiment results on three real-world data sets, MovieLens, Netflix, and Last.fm, show that our method exhibits competitive performance on precision, coverage, and diversity in comparison with four other typical recommendation methods. Moreover, we present an approximate strategy that reduces the computing complexity of our method with a slight degradation of the performance.


Introduction
The advent of the Internet has brought us into an era of exploding information. It is very difficult to select the relevant items from countless candidates on e-commerce websites. As an automatic way to help people make the right decisions under information overload, the recommendation system has become a significant issue for both academic and industrial communities.
During the last decade, many recommendation methods have been proposed, including collaborative filtering methods [1,2], content-based methods [3], spectral methods [4,5], and iterative refinement methods [6,7]. These methods are all based on the computation of user similarity, item similarity, or both. Recently, some network-based recommendation methods have been proposed to mine the latent relevance of users and items, such as the methods based on mass diffusion or association rules [8,9].
Latent Dirichlet Allocation (LDA) was first presented as a graphical model for text topic discovery by Blei et al. in 2003 [10]; it can be used to find the inherent relation of words and to generate a document set through the model. LDA has been widely used in document analysis [11][12][13], document classification, and document clustering [14][15][16]. LDA was first introduced into recommender systems for analyzing the context in content-based methods [17]. In tag-based recommendation systems, LDA is now widely used to find the latent relation between keywords of item descriptions and item tags created by users, such that the items can be recommended based on the tags [18][19][20]. For instance, Kang et al. [21] proposed an LA-LDA model which considers not only the tags created by the target user but also the tags created by his/her friends in the social network, so as to extend the scope of candidate tags created by the target user.
In this paper, we propose a new content-unaware probabilistic recommendation method inspired by the LDA model. Users' collecting behaviors are treated as probabilistic events, in which one user belongs to multiple user-groups and users in each user-group have different collecting preferences. In our method, the collecting process is regarded as two joined probabilistic processes mediated by the user-group; that is, every user is a member of one latent user-group with a certain probability, while each user-group collects various items with different probabilities.
Calculating the probabilities on the entire data set is time-consuming and space-consuming. To reduce the computing complexity of our method, we introduce an approximate strategy, which samples a part of the data set to build a rough probabilistic recommendation model at the cost of a slight degradation of the performance.
Many products on an e-commerce website are not popular; that is, the sales of every single such product lie in the tail of the sales curve, but the sales of all these unpopular products together constitute a big portion of the whole income. This is the so-called long tail phenomenon. Therefore, a good recommender system must focus on both accuracy and diversity. The experiment results on three real-world data sets, MovieLens, Netflix, and Last.fm, show that our method exhibits competitive performance not only on precision and coverage but also on diversity.

Recommendation Model.
People have different and multiple inner attributes, including physiological characteristics, preferences, taboos, and religious beliefs. These attributes can be clustered into many user-groups, each of which represents users with similar attributes. A user does not belong to only one user-group. For example, a user $u$ may be male and also a high school student, a minor, Chinese, and a Christian. One user belongs to multiple user-groups, while users in different user-groups have different habits. For instance, users in the user-groups which contain the attribute "elder" are more likely to buy health care products and presbyopia glasses than those who belong to the user-groups containing the attribute "younger." In our recommendation model, we put forward two assumptions: (1) the users' collecting behaviors are probabilistic events; (2) one user belongs to multiple user-groups and users in the same user-group have similar collecting preferences.
The collecting action of users on items is therefore considered as a two-step probabilistic process; that is, users are observed as members of several latent user-groups, and users in a user-group collect items based on the group-item probability distribution. Here we assume that $z$ is a user-group which a user belongs to, and $\theta$ is the probability vector between users and user-groups. Each column of the vector represents the probability that a user belongs to the corresponding user-group. The probability that a user $u$ belongs to user-group $z$ can be expressed as $P(z \mid u) = \theta_z^{(u)}$. Similarly, $\varphi$ is the commodity probability vector, and each column of the vector represents the probability that the users of the current user-group will collect the corresponding item; the probability that a user who belongs to user-group $z$ will buy item $i$ can be expressed as $P(i \mid z, \varphi_z) = \varphi_i^{(z)}$. In fact, $\theta$ reflects the degree of association between users and user-groups, and $\varphi$ shows which items the users who belong to a user-group are more likely to buy. For example, a user who loves both basketball and music belongs to two user-groups, but he prefers playing basketball to listening to music. The association intensity between a user and a user-group is expressed by the probability that the user belongs to each user-group. A student, for instance, may not care about household items but usually buys books for study. Based on the assumptions above, if there are $K$ groups, the probability for user $u$ to buy item $i$ can be expressed as

$$P(i \mid u) = \sum_{z=1}^{K} P(i \mid z)\, P(z \mid u) = \sum_{z=1}^{K} \varphi_i^{(z)}\, \theta_z^{(u)}. \qquad (1)$$

As long as $\theta$ and $\varphi$ are calculated, the collecting probability vector of the users can be computed. We can then obtain a list of items ranked by these probabilities, from which a proper recommendation can be given to the users. In fact, it is not easy to calculate $\theta$ and $\varphi$ directly.
To address this, we turn to Latent Dirichlet Allocation (LDA), a probabilistic model that uses a latent topic to bridge documents and words. Using the latent topic, LDA constructs documents via two probabilistic processes: it first chooses a topic according to a probability distribution and then collects words according to the topic-word distribution of that topic. Inspired by LDA, our recommendation model is designed as a three-layer Bayesian structure, that is, the user layer, followed by the user-group layer and the item layer. To construct it, parameters are used in pairs. The recommendation model is determined by the hyperparameters $\alpha$ and $\beta$, in which $\alpha$ describes the relative intensity between user-groups, $\theta \sim \mathrm{Dirichlet}(\alpha)$, and $\beta$ reflects the probability distribution of each user-group, $\varphi \sim \mathrm{Dirichlet}(\beta)$. The complete graphical model representation of LDA for the probabilistic recommendation model is shown in Figure 1. Indeed, we can construct the model without item or user descriptions, so it is a content-unaware probabilistic recommendation model.
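In our method, the probability that a user $u$ has an attribute $z$ is expressed as $P(z \mid u, \theta) = \theta_z^{(u)}$; the probability that users who have attribute $z$ purchase item $i$ is expressed as $P(i \mid z, u, \varphi) = \varphi_i^{(z)}$; and the probability that user $u$ purchases item $i$ is expressed as

$$P(i \mid u, \theta, \varphi) = \sum_{z=1}^{K} P(i \mid z, u, \varphi)\, P(z \mid u, \theta) = \sum_{z=1}^{K} \varphi_i^{(z)}\, \theta_z^{(u)}, \qquad (2)$$

where $K$ is the number of user-groups.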
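The two-step probability can be illustrated with a minimal Python sketch. It assumes that the user-group memberships of one user and the group-item probabilities are already available as NumPy arrays; the function name `recommend` and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def recommend(theta_u, phi, collected, top_n=2):
    """Rank items for one user by sum_z phi[z, i] * theta_u[z]."""
    scores = theta_u @ phi            # collecting probability of every item
    ranked = np.argsort(-scores, kind="stable")
    # Skip items the user has already collected.
    return [int(i) for i in ranked if int(i) not in collected][:top_n]

# Toy example (assumed values): 2 user-groups, 4 items.
theta_u = np.array([0.8, 0.2])                 # P(group | user)
phi = np.array([[0.5, 0.3, 0.1, 0.1],          # P(item | group 0)
                [0.1, 0.1, 0.5, 0.3]])         # P(item | group 1)
top_items = recommend(theta_u, phi, collected={0})
```

Because each row of `phi` and the vector `theta_u` sum to one, the resulting scores also sum to one over all items, so they can be read directly as collecting probabilities.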

Parameters Estimation.
There are many approximate inference strategies to estimate the parameters $\theta$ and $\varphi$ in LDA, such as Laplace approximation, variational inference, Gibbs sampling, and expectation propagation. Griffiths and Steyvers reported that Gibbs sampling performs better than the other methods in terms of perplexity and speed [11].
Since the structure of our method is similar to that of LDA, we chose the Gibbs sampling algorithm to estimate the parameters $\theta$ and $\varphi$ as well. Gibbs sampling is a simple MCMC (Markov chain Monte Carlo) method. It constructs a Markov chain which converges to the target distribution and then draws samples from it. Each state of the Markov chain represents an assignment of user-group values, and the sampled variable is a latent variable, which is assigned to each item collected by a user. The transition between states follows a simple rule: by sampling conditioned on the current values of all variables and the data set of users' purchases, the chain moves to the next state.
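Here, we use the posterior distribution $P(z_i = j \mid u, i, \mathbf{z}_{-i}, \alpha, \beta)$, which is calculated by counting the user-groups assigned to items, as the transition probability that the user-group assigned to item $i$ purchased by user $u$ shifts from its current value to $j$, as shown in

$$P(z_i = j \mid u, i, \mathbf{z}_{-i}, \alpha, \beta) \propto \frac{n_{-i,j}^{(i)} + \beta}{n_{-i,j}^{(*)} + N\beta} \cdot \frac{n_{-i,j}^{(u)} + \alpha}{n_{-i,*}^{(u)} + K\alpha}. \qquad (3)$$

Here, the subscript $-i$ marks a count that does not include the current assignment of item $i$; $n_{-i,j}^{(i)}$ is the number of times that item $i$ has been assigned to user-group $j$; $n_{-i,j}^{(*)}$ represents the number of times that all items have been assigned to user-group $j$; $n_{-i,j}^{(u)}$ is the number of times that items purchased by user $u$ have been assigned to user-group $j$; $n_{-i,*}^{(u)}$ represents the quantity of items that user $u$ has purchased; $N$ is the number of items; and $K$ is the number of user-groups.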
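The transition rule can be sketched in Python as a collapsed Gibbs sweep in the style of Griffiths and Steyvers. This is a minimal sketch under assumptions, not the authors' implementation: the function names, the array layout, and the toy purchase records are all hypothetical, and symmetric scalar hyperparameters `alpha` and `beta` are assumed.

```python
import numpy as np

def init_state(users, items, n_users, n_items, n_groups, rng):
    """Assign a random user-group to every (user, item) record and build the counts."""
    z = rng.integers(n_groups, size=len(users))
    n_iz = np.zeros((n_items, n_groups))   # times item i is assigned to group j
    n_uz = np.zeros((n_users, n_groups))   # times items of user u are assigned to group j
    for u, i, j in zip(users, items, z):
        n_iz[i, j] += 1
        n_uz[u, j] += 1
    return z, n_iz, n_uz

def gibbs_sweep(users, items, z, n_iz, n_uz, alpha, beta, rng):
    """Resample the user-group of every record once, updating the counts in place."""
    n_items, n_groups = n_iz.shape
    n_z = n_iz.sum(axis=0)                 # times any item is assigned to group j
    for r, (u, i) in enumerate(zip(users, items)):
        j = z[r]
        # Exclude the current assignment of this record from all counts.
        n_iz[i, j] -= 1; n_uz[u, j] -= 1; n_z[j] -= 1
        # Unnormalized transition probability; the user-side denominator
        # does not depend on j, so it is dropped before normalizing.
        p = (n_iz[i] + beta) / (n_z + n_items * beta) * (n_uz[u] + alpha)
        j = rng.choice(n_groups, p=p / p.sum())
        z[r] = j
        n_iz[i, j] += 1; n_uz[u, j] += 1; n_z[j] += 1

# Toy purchase records (assumed data): 3 users, 5 items, 2 user-groups.
rng = np.random.default_rng(0)
users = [0, 0, 0, 1, 1, 2, 2, 2]
items = [0, 1, 2, 0, 1, 3, 4, 2]
z, n_iz, n_uz = init_state(users, items, n_users=3, n_items=5, n_groups=2, rng=rng)
for _ in range(100):
    gibbs_sweep(users, items, z, n_iz, n_uz, alpha=0.5, beta=0.1, rng=rng)
```

Keeping the counts in arrays and updating them incrementally means each record costs time proportional to the number of user-groups, which is the usual reason for choosing this layout.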
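When the Markov chain is near the target distribution after adequate iterations, we record the current values of the latent variable $z$ and use them to estimate $\varphi$ and $\theta$ as shown in (4) and (5):

$$\varphi_i^{(j)} = \frac{n_j^{(i)} + \beta}{n_j^{(*)} + N\beta}, \qquad (4)$$

$$\theta_j^{(u)} = \frac{n_j^{(u)} + \alpha}{n_*^{(u)} + K\alpha}. \qquad (5)$$

Here $n_j^{(i)}$ is the number of times item $i$ has been assigned to user-group $j$; $n_j^{(*)}$ represents the number of times that all items have been assigned to user-group $j$; $n_j^{(u)}$ is the number of times that items purchased by user $u$ have been assigned to user-group $j$; $n_*^{(u)}$ represents the quantity of items that user $u$ has purchased; $N$ is the number of items; and $K$ is the number of user-groups.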
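As an illustration, the two estimates can be computed from the assignment counts with a few lines of Python. The sketch assumes smoothed relative frequencies with symmetric hyperparameters, as in standard LDA; the function name `estimate` and the toy counts are hypothetical.

```python
import numpy as np

def estimate(n_iz, n_uz, alpha, beta):
    """Turn assignment counts into group-item and user-group probabilities.

    n_iz[i, j]: times item i has been assigned to user-group j
    n_uz[u, j]: times items purchased by user u have been assigned to user-group j
    """
    n_items, n_groups = n_iz.shape
    # phi[j, i]: probability that user-group j collects item i.
    phi = (n_iz.T + beta) / (n_iz.sum(axis=0)[:, None] + n_items * beta)
    # theta[u, j]: probability that user u belongs to user-group j.
    theta = (n_uz + alpha) / (n_uz.sum(axis=1, keepdims=True) + n_groups * alpha)
    return phi, theta

# Toy counts (assumed): 3 items, 2 user-groups, 2 users.
n_iz = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]])
n_uz = np.array([[3.0, 1.0], [0.0, 3.0]])
phi, theta = estimate(n_iz, n_uz, alpha=0.5, beta=0.1)
```

The smoothing terms keep every probability strictly positive, so an item that was never assigned to a user-group can still be recommended with a small probability.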

Approximate Model.
In practice, the data set is updated every day. It is both time-consuming and space-consuming to use the entire data set to build the recommendation model. To save time and space, we prefer to model with less data and to generate the recommended items only when required instead of preparing them in advance. In this paper, we present an approximation method to build an approximate model of the probabilistic recommendation.
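In the approximation method, we sample part of the users' collection data from the data set to build an imprecise probabilistic recommendation model. The imprecise model serves as a guide to create a recommendation list in two ways. On one hand, the latent user-group vector $z$ is initialized by using the parameter $\varphi$ of the imprecise model. On the other hand, the transition probability $P(z_i = j \mid u, i, \mathbf{z}_{-i}, \alpha, \beta)$ is defined as the product of the constant $\varphi$ calculated in the imprecise model and the iteration parameter $\theta$, as shown in

$$P(z_i = j \mid u, i, \mathbf{z}_{-i}, \alpha, \beta) \propto \varphi_i^{(j)} \cdot \frac{n_{-i,j}^{(u)} + \alpha}{n_{-i,*}^{(u)} + K\alpha}, \qquad (6)$$

where $n_{-i,j}^{(u)}$ is the number of times that items purchased by user $u$, excluding the current assignment of item $i$, have been assigned to user-group $j$, $n_{-i,*}^{(u)}$ is the corresponding quantity of items purchased by user $u$, and $K$ is the number of user-groups. The processes of the approximate method are described as follows.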
(1) Choose part of the users from the data set for constructing the approximate model; their data are called the approximating data.
(2) Use the approximating data to initialize the Markov chain: a random user-group $z$ from 1 to $K$ is assigned to each item $i$ collected by user $u$.
(3) Use (3) to iterate the Markov chain until it converges. Equation (4) is then used to work out $\varphi$, which is used to construct the approximate model.
(4) When user $u$ needs a recommendation list, a user-group $z$ is assigned to each item $i$ collected by user $u$. The parameter $\varphi$, the collecting probability of the user-groups, which was worked out in Step (3), is used to initialize $z$ from 1 to $K$ probabilistically. This is the initial state of the Markov chain for user $u$.
(5) Use (6) to iterate for an appreciable number of times, called the burn-in space, after which the Markov chain is thought to be near the target distribution. Then, we record the current values of $z$.
(6) Sample once every certain number of iterations, which is called the thinning space [22]. According to (5), we can estimate $\theta$, the purchasing probability of user $u$ in each user-group.
(7) By ranking the product of $\theta$ and $\varphi$, the recommendation list can be provided.
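Steps (4) to (7) can be sketched for a single user as follows. This is a minimal Python sketch under stated assumptions: the group-item probabilities from the approximate model are held fixed, the membership estimates of the retained samples are averaged (the text does not say how the samples are combined), and the names `fold_in_user`, `burn_in`, `thin`, and `n_samples` are hypothetical. The user is assumed to have collected at least one item.

```python
import numpy as np

def fold_in_user(user_items, phi, alpha, burn_in=50, thin=5, n_samples=20, seed=0):
    """Estimate one user's group memberships while keeping phi fixed."""
    rng = np.random.default_rng(seed)
    n_groups = phi.shape[0]
    user_items = np.asarray(user_items)
    # Step (4): initialize each item's user-group in proportion to phi[:, item].
    z = np.array([rng.choice(n_groups, p=phi[:, i] / phi[:, i].sum())
                  for i in user_items])
    n_uz = np.bincount(z, minlength=n_groups).astype(float)
    theta_sum, kept = np.zeros(n_groups), 0
    for it in range(burn_in + thin * n_samples):
        for r, i in enumerate(user_items):
            n_uz[z[r]] -= 1                      # exclude the current assignment
            p = phi[:, i] * (n_uz + alpha)       # fixed phi times the user-side term
            z[r] = rng.choice(n_groups, p=p / p.sum())
            n_uz[z[r]] += 1
        # Steps (5)-(6): discard the burn-in, then keep one sample every `thin` sweeps.
        if it >= burn_in and (it - burn_in) % thin == thin - 1:
            theta_sum += (n_uz + alpha) / (len(user_items) + n_groups * alpha)
            kept += 1
    theta_u = theta_sum / kept
    return theta_u, theta_u @ phi                # step (7): scores to be ranked

# Toy approximate model (assumed values): 2 user-groups, 4 items.
phi = np.array([[0.45, 0.45, 0.05, 0.05],
                [0.05, 0.05, 0.45, 0.45]])
theta_u, scores = fold_in_user([0, 1, 0], phi, alpha=0.1)
```

Since only the counts of one user change during these sweeps, the cost of producing a list depends on the number of items that user has collected rather than on the size of the whole data set.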
The time consumption of the approximate model depends on the size of the approximating data, and so does its performance. In the experiment, we use different percentages of the data as approximating data to find the optimal size. The approximate model is, however, imprecise owing to the locality of the data it uses. Meanwhile, the performance oscillates when different data are chosen for approximate modeling. Different sampling strategies are compared to find out which one yields a user distribution most similar to that of the entire data and the same tendency of performance: random (randomly choose users within the entire data); item degree (sample users proportionally according to the average degree of each user's items); user degree (sample users proportionally according to the average degree of users); and quick classification (use a quick classification method to classify the users and then sample proportionally). In the experiment, we use the average value and the upper bound value to represent the performance of the approximate model.

Three real-world data sets (Table 1) are used to evaluate the performance of the proposed LDA-based recommender method. The first data set, MovieLens, is provided by the GroupLens Project at the University of Minnesota. The second data set, Netflix, is a randomly selected subset of the huge data set released by the DVD rental company Netflix for its Netflix Prize. The third data set, Last.fm, was released in the framework of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011). According to the chronological order of the data in MovieLens, we chose the earliest 80 percent of the data as the training set and the latest 20 percent as the probe set. For the Netflix and Last.fm data sets, the data are randomly divided into two parts: the training set contains 90% of the data and the remaining 10% constitutes the probe set in the experiment.
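The two ways of dividing the data can be sketched in Python as follows. The record layout `(user, item, timestamp)`, the function names, and the toy records are assumptions made for illustration; the paper does not describe its file formats.

```python
import numpy as np

def chronological_split(records, train_frac=0.8):
    """Earliest records form the training set, the latest ones the probe set."""
    ordered = sorted(records, key=lambda r: r[2])      # sort by timestamp
    cut = round(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

def random_split(records, train_frac=0.9, seed=0):
    """Randomly divide the records into a training set and a probe set."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(records))
    cut = round(len(records) * train_frac)
    train = [records[k] for k in perm[:cut]]
    probe = [records[k] for k in perm[cut:]]
    return train, probe

# Toy records (assumed): (user, item, timestamp), deliberately not in time order.
records = [(k % 3, k, 100 - k) for k in range(10)]
train_c, probe_c = chronological_split(records)        # 80% / 20% by time
train_r, probe_r = random_split(records)               # 90% / 10% at random
```

A chronological split never lets the model see purchases made after those it is asked to predict, whereas a random split ignores time order.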

Evaluation Metrics and Comparison Methods.
Three evaluation metrics were used to assess the recommendation's effect in the experiment: precision, coverage, and diversity. Precision is a basic evaluation metric. It is defined as the proportion of recommended items that users accept:

$$\mathrm{Prec} = \frac{|R \cap T|}{|R|}. \qquad (7)$$

Here, $R$ represents the list of items recommended to users, and $T$ represents the set of items that the user has bought. Different algorithms provide different recommendation lists to users. The union of the recommendation lists $R_u$ can be used to work out the proportion of recommended items in the entire item set. We use coverage to define this proportion, as shown in

$$\mathrm{Cov} = \frac{\left|\bigcup_{u} R_u\right|}{N}. \qquad (8)$$

Here, $R_u$ represents the list of items recommended to user $u$ and $N$ represents the quantity of all items. Diversity is an important metric for personalized recommender systems. It is used to evaluate the difference between users' recommendation lists, and we use the average Hamming distance of the recommendation lists to define diversity as follows:

$$\mathrm{Div} = \frac{1}{M(M-1)} \sum_{u \neq v} \frac{H(u, v)}{L}. \qquad (9)$$

Here, $H(u, v)$ is the Hamming distance between the recommendation lists of user $u$ and user $v$; $L$ is the length of the recommendation list; and $M$ represents the quantity of users.
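The three metrics can be sketched in Python as follows. The sketch assumes that precision is averaged over users and that the Hamming distance between two lists of equal length is the number of items in one list that are missing from the other, normalized by the list length; the function names and the toy lists are hypothetical.

```python
from itertools import combinations

def precision(rec, probe):
    """Mean over users of the fraction of recommended items found in the probe set."""
    hits = [len(set(r) & probe.get(u, set())) / len(r) for u, r in rec.items()]
    return sum(hits) / len(hits)

def coverage(rec, n_items):
    """Fraction of all items that appear in at least one recommendation list."""
    return len(set().union(*rec.values())) / n_items

def diversity(rec, list_len):
    """Average normalized Hamming distance over all pairs of users."""
    pairs = list(combinations(rec, 2))
    dist = [1 - len(set(rec[u]) & set(rec[v])) / list_len for u, v in pairs]
    return sum(dist) / len(dist)

# Toy example (assumed): 2 users, lists of length 2, 5 items in total.
rec = {0: [1, 2], 1: [2, 3]}          # recommendation lists
probe = {0: {1}, 1: {4}}              # items each user actually collected later
p = precision(rec, probe)
c = coverage(rec, n_items=5)
d = diversity(rec, list_len=2)
```

Averaging over unordered pairs gives the same value as averaging over ordered pairs, because the distance is symmetric when all lists have the same length.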
For comparison, we present the results of four recommendation methods: probabilistic spreading (ProbS), heat spreading (HeatS), user-based collaborative filtering (UserCF), and the association rule algorithm (ARule). User-based collaborative filtering is one of the most classic collaborative filtering methods. Based on the similarity of purchased items between users, it recommends the items that similar users have bought but the target user has not yet bought. The association rule method is also widely used in recommender systems. This method concentrates on the latent relationship between items. To find these relationships, every user's item list is analyzed to create a list of the most related items, called association rules. The heat spreading method, a variant of the probabilistic spreading method, has the highest coverage and diversity among current recommendation algorithms, but it ignores accuracy. In the experiment, we use the accuracy of heat spreading as the lower bound of precision and use its coverage and diversity as the upper bounds. Therefore, we use the enhanced metrics to evaluate the performance, as shown in (10a)-(10c):

$$\text{enhanced precision} = \frac{\mathrm{Prec}}{\mathrm{Prec}_{\mathrm{HeatS}}}, \qquad (10a)$$

$$\text{enhanced coverage} = \frac{\mathrm{Cov}}{\mathrm{Cov}_{\mathrm{HeatS}}}, \qquad (10b)$$

$$\text{enhanced diversity} = \frac{\mathrm{Div}}{\mathrm{Div}_{\mathrm{HeatS}}}, \qquad (10c)$$

where the subscript HeatS denotes the value obtained by the heat spreading method. Heat spreading and probabilistic spreading are integrated methods which do well on precision but are not so good on coverage and diversity.
To evaluate the performance of the approximate model, a missing rate and a comprehensive rate (Comp) are defined, as shown in (11). For each metric (precision, coverage, or diversity), the missing rate measures the performance of the approximate model against the performance of the normal model. The comprehensive rate combines the three missing rates using one weight per metric, the percentage of approximating data in the entire data set, and a controlling parameter $\Delta$ that accommodates the fluctuation of that percentage.

The performance of our method is far better than that of the other methods on the MovieLens data set, which has the most links in the experiment, with ProbS running a close second, while both UserCF and ARule perform significantly worse. When the length of the recommendation list is lower than 20, the performance of our method is at least twice as good as that of the other two methods. On the Netflix data set, our method consistently performs very well in terms of precision, coverage, and diversity. The precision of ProbS comes close to that of our method, while its coverage and diversity are much worse. In addition, our method achieves good comprehensive performance on Last.fm, which is the sparsest data set in the experiment. When the recommendation list length is over 50, the precision of our method is lower than that of ARule, and its coverage runs a close second. Furthermore, the consistency of its diversity could be rated as the best. The performances of the approximate model are shown in Figures 5 and 6.
Different metrics are drawn in different panels to show their tendencies, as shown in Figure 5. For precision, as the data size increases, the missing rate declines more and more slowly and levels out in the end. The missing rate stays between 0 and 20%. The tendency of coverage is different from that of precision: the missing rate increases at first, and the turning point occurs in the range of 5% to 10%. On the contrary, the missing rate of diversity is very high at the head of the line. Its lowest point is located in the range of 5% to 10% as well, and the missing rate levels out at the tail. Figure 6 shows the comprehensive performance of the approximate model. Different comprehensive curves are obtained depending on the parameters; in the experiment we kept the parameters fixed at $\Delta = 10$ and used weights of 0.50, 0.15, and 0.35 for precision, coverage, and diversity, respectively. There are some differences between the comprehensive curves of the two data sets. The comprehensive curve of the MovieLens data set is flat, with slightly lower values at the head of the curve and a slight oscillation at the tail. In contrast, the comprehensive curve of the Netflix data set is shaped like a hook. Considering the above-mentioned factors, the optimal value occurs near 10%.

Conclusions
In this paper, we proposed a method which makes use of users' behaviors to give recommendations. Instead of modeling with tags or contexts, our method takes the collecting lists to construct a recommendation model without the contents of items. As shown in the experiment, our method exhibits an all-round competitive performance on precision, coverage, and diversity, in comparison with four typical classes of recommendation algorithms. To reduce the computing complexity of our method, an approximate model is also proposed in this paper, in which the adjusting parameters determine the performance curve of the approximate model. As shown in the experiment, the approximate method is feasible since the optimal value is under 20%. When precision is considered to be the most important metric, it is

Fundamental Research Funds for the Central Universities (no. ZYGX2012J071), and Special Project of Sichuan Youth Science and Technology Innovation Research Team (no. 2013TD0006).

Figure 1: Graphical model representation of our method.

Figure 5: The performances of approximate model.

Table 1: The basic statistical features of the three data sets.