A Novel Preferential Diffusion Recommendation Algorithm Based on User's Nearest Neighbors

Recommender systems are an efficient way to deal with the problem of information overload for online users. In recent years, network-based recommendation algorithms have demonstrated much better performance than standard collaborative filtering methods. However, most network-based algorithms do not give enough weight to the influence of the target user's nearest neighbors in the resource diffusion process, while a user or an object with high degree obtains larger influence in the standard mass diffusion algorithm. In this paper, we propose a novel preferential diffusion recommendation algorithm that considers the significance of the target user's nearest neighbors, and we evaluate it on three real-world data sets: MovieLens 100k, MovieLens 1M, and Epinions. Experimental results demonstrate that the novel preferential diffusion recommendation algorithm based on the user's nearest neighbors can significantly improve recommendation accuracy and diversity.


Introduction
With the rapid development of the Internet in recent years, the amount of online information has increased exponentially, which leads to the information overload problem. When faced with a vast amount of information, we can hardly find the valuable information accurately and quickly. The personalized recommender system is one of the most effective tools to resolve this problem, and it can also help enterprises turn users' potential demand into realistic demand [1,2].
To date, various recommendation methods have been proposed and developed. One of the most successful recommender system methods is based on the collaborative filtering technique [3][4][5]. Recently, some physical methods, such as mass diffusion [6][7][8][9] and heat conduction [10,11], have found applications in personalized recommendation. The standard mass diffusion algorithm applies a three-step mass diffusion starting from the target user on a user-object bipartite network, and it outperforms the standard collaborative filtering methods in accuracy [1]. Many different bipartite network based methods [12] have been proposed to achieve even better recommendation performance. In [6], Zhou et al. proposed a hybrid method that combines mass diffusion and heat conduction to solve the apparent diversity-accuracy dilemma of recommender systems. To enhance the algorithm's ability to find unpopular and niche objects, preferential diffusion was designed in [9]. Moreover, Zhang and Zeng proposed a strategy of adding some virtual connections to the network, which is useful for dealing with the cold start problem in recommender systems [13].
However, none of these methods gives enough weight to the influence of the target user's nearest neighbors in the resource diffusion process. As we all know, birds of a feather flock together. The user's nearest neighbors are the ones who have tastes similar to those of the given user. Therefore we introduce a novel preferential diffusion recommendation algorithm that considers the significance of the target user's nearest neighbors in the diffusion process.

Standard Mass Diffusion Recommendation Algorithm.
As shown in Figure 1, the standard mass diffusion (SMD) algorithm is equivalent to a three-step random walk process. At first, objects in the bipartite network are assigned an initial resource $f$, with $f^i = \{f^i_1, f^i_2, \ldots, f^i_\alpha, \ldots, f^i_n\}$ for the target user $i$. For simplicity, if an object is collected by the user $i$, its initial resource is set to 1; otherwise it is set to 0. That is to say, the initial resource vector $f$ can be written as

$$f^i_\alpha = \begin{cases} 1, & \text{if user } i \text{ has collected object } \alpha, \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$

Then, each object's resource is redistributed equally among the users who have collected it, and each user's resource is the sum of the resources received from objects. Finally, each user's resource is reallocated equally among the objects he has collected. The final object resources can be calculated via the transformation $f' = Wf$, where $W$ is the resource transfer matrix

$$W_{\alpha\beta} = \frac{1}{k_\beta} \sum_{l} \frac{a_{l\alpha}\, a_{l\beta}}{k_l}, \tag{2}$$

where $a_{l\alpha} = 1$ if user $l$ has collected object $\alpha$ and 0 otherwise, $k_\beta$ is the degree of the object $\beta$, and $k_l$ is the degree of the user $l$.
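The three diffusion steps described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `mass_diffusion_scores`, the dictionary-of-sets data layout, and the toy network are all assumptions made for the example.

```python
from typing import Dict, Set


def mass_diffusion_scores(collections: Dict[str, Set[str]], target: str) -> Dict[str, float]:
    """Three-step mass diffusion scores for every object, seen from `target`.

    `collections` maps each user to the set of objects that user has collected
    (a hypothetical layout chosen for this sketch).
    """
    # Object degree: number of users who collected each object.
    object_degree: Dict[str, int] = {}
    for objs in collections.values():
        for obj in objs:
            object_degree[obj] = object_degree.get(obj, 0) + 1

    # Step 1: objects collected by the target user start with resource 1, others with 0.
    initial = {obj: (1.0 if obj in collections[target] else 0.0) for obj in object_degree}

    # Step 2: each object splits its resource equally among the users who collected it.
    user_resource = {user: 0.0 for user in collections}
    for user, objs in collections.items():
        for obj in objs:
            user_resource[user] += initial[obj] / object_degree[obj]

    # Step 3: each user splits the received resource equally among collected objects.
    final = {obj: 0.0 for obj in object_degree}
    for user, objs in collections.items():
        if not objs:
            continue  # a user with no objects has nothing to pass on
        share = user_resource[user] / len(objs)
        for obj in objs:
            final[obj] += share
    return final


# Toy bipartite network (made up for illustration).
toy = {
    "u1": {"a", "b"},
    "u2": {"b", "c"},
    "u3": {"a", "c", "d"},
}
scores = mass_diffusion_scores(toy, "u1")
```

Because every step only splits resource and never creates or discards it, the scores on this toy network sum to the initial total of 2, which is a convenient sanity check for any implementation. In practice the objects already collected by the target user would be dropped before ranking.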

The Novel Preferential Diffusion
where $\Omega_i$ is the nearest-neighbor set of the target user $i$. In Figure 2, $f_1 = (3, 1, 2, 2, 0)$. However, only the objects which the target user $i$ has selected can distribute their resources to users, and these resources are then redistributed via the transformation with the transfer matrix $W$, where $W$ is the same as in (2). In Figure 2, $f_2 = (2, 0.5, 1, 1.5, 0)$. Finally, we use a linear combination of the resource vectors $f_1$ and $f_2$ to get the final object resource vector $f$. That is to say, the two vectors are weighted by $\lambda$, where $\lambda$ is a tunable parameter ranging from 0 to 1.
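As a rough illustration of the two ingredients of this section, the sketch below selects a target user's nearest neighbors by Jaccard similarity and blends two resource vectors. Everything here is hypothetical: the helper names, the toy data, and in particular the convention that the mixing parameter multiplies the first vector and its complement multiplies the second, which is an assumption because the text does not make the assignment explicit.

```python
from typing import Dict, List, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two collected-object sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbors(collections: Dict[str, Set[str]], target: str, top_n: int) -> List[str]:
    """Return the top_n users most similar to `target`, excluding the target."""
    others = [u for u in collections if u != target]
    # Sort by similarity (descending); ties are broken by user id for determinism.
    others.sort(key=lambda u: (-jaccard(collections[target], collections[u]), u))
    return others[:top_n]


def blend(f1: Sequence[float], f2: Sequence[float], lam: float) -> List[float]:
    """Linear combination of two resource vectors.

    ASSUMPTION: lam weights f1 and (1 - lam) weights f2; the opposite
    convention is equally possible.
    """
    return [lam * x + (1.0 - lam) * y for x, y in zip(f1, f2)]


# Toy data (made up for illustration).
toy = {"u1": {"a", "b"}, "u2": {"b", "c"}, "u3": {"a", "b", "d"}, "u4": {"e"}}
neighbors = nearest_neighbors(toy, "u1", top_n=2)

# The two vectors quoted for Figure 2, mixed with equal weights.
mixed = blend([3, 1, 2, 2, 0], [2, 0.5, 1, 1.5, 0], lam=0.5)
```

With equal weights the choice of convention does not matter, which is why the example uses a mixing value of 0.5; for any other value the result depends on the assumed assignment.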

Data and Metrics
3.1. Data. To test the algorithmic performance, we use three benchmark data sets, as shown in Table 1. The sparsity of these data sets is shown in the last column of Table 1.

Metrics.
There has been considerable research in the area of recommender system evaluation. Accuracy is the most important aspect in evaluating the performance of a recommendation algorithm. In this paper, we use the ranking score [8] to measure the ability of a recommendation algorithm to generate a ranking list of the target user's uncollected objects that matches the user's preferences. For the target user $u_i$, the recommendation algorithm returns a ranking list of all his unselected objects. According to the test set $E^T$, if $u_i$ has selected the object $o_\alpha$ and $o_\alpha$ is at the $p_{i\alpha}$th place in the ranking list, we say the position of $o_\alpha$ is

$$RS_{i\alpha} = \frac{p_{i\alpha}}{M_i},$$

where $M_i$ is the number of his unselected objects. We obtain the mean value of all the user-object ranking scores in $E^T$; namely,

$$RS = \frac{1}{|E^T|} \sum_{(u_i,\, o_\alpha) \in E^T} RS_{i\alpha}.$$

Clearly, the larger the ranking score, the lower the algorithm's accuracy, and vice versa.
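The ranking score can be computed directly from each user's ranked list and the held-out selections. The following is a small sketch under assumed data structures; the names `ranked_lists` and `test_links` and the toy values are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Set


def ranking_score(ranked_lists: Dict[str, List[str]], test_links: Dict[str, Set[str]]) -> float:
    """Mean relative position of held-out objects in each user's ranked list.

    ranked_lists[user] ranks that user's unselected objects, best first.
    test_links[user] holds the held-out objects the user actually selected.
    """
    scores = []
    for user, held_out in test_links.items():
        ranking = ranked_lists[user]
        for obj in held_out:
            position = ranking.index(obj) + 1  # 1-based place in the list
            scores.append(position / len(ranking))
    return sum(scores) / len(scores)


# Toy example: three held-out links across two users.
ranked = {"u1": ["c", "d", "e", "f"], "u2": ["a", "e", "f", "d"]}
held = {"u1": {"c"}, "u2": {"e", "d"}}
rs = ranking_score(ranked, held)
```

A held-out object ranked first in a list of four contributes 0.25, while one ranked last contributes 1.0, so smaller values mean the algorithm places the user's actual choices nearer the top.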
In a practical recommender system, we may care about the number of objects in the recommendation list that users like. Therefore, we use another accuracy metric called precision. For a target object $o_\alpha$ and user $u_i$, there are four cases in the recommender system. The first is that the recommender system recommended the object and the user likes it. The second is that the recommender system recommended the object but the user does not like it. The third is that the user likes the object but the recommender system did not recommend it. The last is that the user does not like the object and the recommender system did not recommend it. As shown in Table 2, $N_{tp}$, $N_{fp}$, $N_{fn}$, and $N_{tn}$ denote the number of objects in the four cases, respectively.
For a target user $u_i$, the precision of recommendation $P_i(L)$ is defined as

$$P_i(L) = \frac{N_{tp}}{N_{tp} + N_{fp}}.$$

We obtain the mean precision $P(L)$ of all the users in the recommender system. Besides accuracy, diversity is taken into account as another important aspect in evaluating a recommendation algorithm. There are two kinds of diversity. One is called intrauser-diversity [17]; the other is called interuser-diversity [18]. In this paper, we consider the interuser-diversity, which measures how different the recommendation lists of different users are. For two users $u_i$ and $u_j$, the difference can be measured by the Hamming distance [18]:

$$H_{ij}(L) = 1 - \frac{Q_{ij}(L)}{L},$$

where $Q_{ij}(L)$ is the number of common objects between $u_i$ and $u_j$ in the recommendation list and $L$ is the length of the recommendation list. Clearly, if $u_i$ and $u_j$ have the same recommendation list, $H_{ij}(L) = 0$, while if the recommendation lists are completely different, $H_{ij}(L) = 1$.
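Both metrics in this paragraph reduce to simple set operations on top-of-list recommendations. The sketch below is illustrative only; the function names and the example lists are assumptions, and "liked" objects stand for the user's held-out selections.

```python
from typing import List, Set


def precision_at_l(recommended: List[str], liked: Set[str], length: int) -> float:
    """Fraction of the top-`length` recommendations that the user likes."""
    top = recommended[:length]
    hits = sum(1 for obj in top if obj in liked)
    return hits / length


def hamming_distance(list_i: List[str], list_j: List[str], length: int) -> float:
    """One minus the share of objects common to two top-`length` lists."""
    common = len(set(list_i[:length]) & set(list_j[:length]))
    return 1.0 - common / length


# Toy recommendation lists for two users (made up for illustration).
rec_u1 = ["a", "b", "c", "d"]
rec_u2 = ["a", "x", "y", "d"]
p_u1 = precision_at_l(rec_u1, liked={"b", "d", "z"}, length=4)
h_12 = hamming_distance(rec_u1, rec_u2, length=4)
```

Averaging `precision_at_l` over all users gives the mean precision, and averaging `hamming_distance` over all user pairs gives an overall interuser-diversity figure.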
In reality, it has been found that a recommender system with high accuracy might still not satisfy its users [19]. For example, for a film website, recommending popular films to the users may not always be the best recommendation, because users might have already seen those films in other ways. A good recommender system can find objects that match the users' preferences and are unlikely to be already known. As a result, novelty is also often used in evaluating the performance of a recommendation algorithm.
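The average degree of the objects in the recommendation list is widely used to quantify the novelty of a recommender system [20]; it is defined by

$$\langle k \rangle (L) = \frac{1}{mL} \sum_{i=1}^{m} \sum_{o_\alpha \in O^i_R} k_{o_\alpha},$$

where $m$ is the number of users, $L$ is the length of the recommendation list, $O^i_R$ is the recommendation list for user $u_i$, and $k_{o_\alpha}$ is the degree of the object $o_\alpha$. A lower average degree indicates that less popular, and hence more novel, objects are recommended.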
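The novelty measure is an average of object degrees over all recommendation lists, which a few lines of Python make concrete. The function name, argument layout, and degree values below are hypothetical choices for this sketch.

```python
from typing import Dict, List


def mean_recommended_degree(
    rec_lists: Dict[str, List[str]], object_degree: Dict[str, int], length: int
) -> float:
    """Average degree of the objects in every user's top-`length` list."""
    total = sum(
        object_degree[obj]
        for recs in rec_lists.values()
        for obj in recs[:length]
    )
    return total / (len(rec_lists) * length)


# Toy lists and degrees (made up for illustration).
rec_lists = {"u1": ["a", "b"], "u2": ["b", "c"]}
object_degree = {"a": 10, "b": 4, "c": 2}
avg_degree = mean_recommended_degree(rec_lists, object_degree, length=2)
```

Here the four recommended slots have degrees 10, 4, 4, and 2, so the average is 5; swapping the popular object "a" for a rarer one would lower the value and indicate more novel recommendations.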

Results and Discussion
In our first set of experiments, we compare the ranking score of the NNMD algorithm under different values of the mixing parameter $\lambda$ and the top $N$ ($N$ is the number of the target user's nearest neighbors) with that of the SMD algorithm. The results on the MovieLens 100k, MovieLens 1M, and Epinions data sets are reported in Figure 3.
In MovieLens 100k and MovieLens 1M, the ranking score decreases as $N$ increases; that is to say, the recommendation accuracy improves. However, when $N$ exceeds 30, the change in ranking score is very small. Moreover, as long as $\lambda$ is not equal to 0 or 1, the ranking score of our method is better than that of the SMD algorithm. It is interesting to note that the optimal parameters of our method are the same in MovieLens 100k and MovieLens 1M, namely $N = 50$ and $\lambda = 0.9$. In Epinions, by contrast, the improvement in ranking score is not significant. When $\lambda$ is greater than 0 or $N$ is greater than 20, the ranking score of the NNMD algorithm is slightly worse than that of the SMD algorithm, and the ranking scores of the two algorithms remain almost the same as $\lambda$ and $N$ change. But when $N$ is less than 20 and $\lambda = 0$, the ranking score of our method is better than that of the SMD algorithm.
The optimal parameters in Epinions are therefore $N = 10$ and $\lambda = 0$.
Then we examined the precision, interuser-diversity, and novelty of our novel algorithm at the optimal parameters $N$ and $\lambda$. Summaries of the results for all algorithms and metrics on the MovieLens 100k, MovieLens 1M, and Epinions data sets are shown in Table 3. The optimal parameters are those that give the lowest ranking score. The other three metrics, namely, precision, interuser-diversity, and novelty, are obtained at the optimal parameters. Clearly, the NNMD algorithm outperforms the SMD algorithm over all four evaluation metrics.
The comparison of precision between NNMD and SMD on the three data sets for different lengths of the recommendation list is shown in Figure 4. The precision of the NNMD algorithm is better than that of the SMD algorithm on all three data sets, and the improvement is very significant in MovieLens 100k and MovieLens 1M. That is to say, our method can recommend objects for users more accurately.
Figure 5 compares the interuser-diversity of our NNMD method and SMD on the three data sets for different lengths of the recommendation list. It clearly shows that the interuser-diversity of our NNMD algorithm is better than that of the SMD algorithm on all three data sets, especially MovieLens 100k and MovieLens 1M. In other words, the recommendation lists produced by our method differ more between users.
Figure 6 compares the novelty of our NNMD method and SMD on the three data sets for different lengths of the recommendation list. It clearly indicates that the novelty of our method is much better than that of SMD in MovieLens 100k and MovieLens 1M, while, in Epinions, the results of the two algorithms are very similar, although our method still shows a small improvement over the SMD algorithm.
In summary, the recommendation performance of our method is better than that of the standard mass diffusion. In particular, the precision of our method increases by an average of 13.27% compared to that of the SMD in MovieLens 100k, by 35.9% in MovieLens 1M, and by 4.47% in Epinions. The improvement in some aspects is not significant on the Epinions data set; the reason may be that the data are so sparse that the novel algorithm cannot find proper nearest neighbors for the user, which affects its performance.

Conclusion and Future Work
Most network-based recommendation algorithms have a tendency to recommend popular objects to the users [1] because an object with high degree has a significant influence in the resource diffusion process. In this paper we propose a novel preferential diffusion recommendation algorithm based on the user's nearest neighbors, which gives a high weight to the influence of the target user's nearest neighbors in the resource diffusion process. Experimental results on the MovieLens 100k, MovieLens 1M, and Epinions data sets show that suitably adjusting the parameter $\lambda$ or the size of the user's nearest-neighbor set can help the recommendation algorithm achieve better performance. It can not only provide more accurate recommendations but also generate more diverse and novel recommendations.
For future work, we intend to consider the level of rating between a user and his nearest neighbors. Moreover, we will use trust data [21,22] in the network, because it can be used to find the nearest neighbors more accurately in highly sparse data sets, and it may lead to better recommendation performance.

Figure 1: Standard mass diffusion algorithm at work on the bipartite user-object network. Users are shown as circles and objects are squares. The target user is indicated by the black circle.

Figure 2: A novel preferential diffusion algorithm based on the user's nearest neighbors at work on the bipartite user-object network. Users are shown as circles and objects are squares. The target user is indicated by the black circle, the nearest neighbors of the target user are the circles marked with a letter, and $\lambda$ is the variable parameter.

Figure 3: The overall ranking score of the NNMD algorithm under different $\lambda$ and $N$ in the MovieLens 100k, MovieLens 1M, and Epinions data sets, and the ranking score of the SMD algorithm in MovieLens 100k, MovieLens 1M, and Epinions.

Figure 4: The precision of the NNMD and SMD algorithms in MovieLens 100k, MovieLens 1M, and Epinions for different lengths of the recommendation list. The parameters for the NNMD algorithm are $N = 50$ and $\lambda = 0.9$ in MovieLens 100k and MovieLens 1M, while they are $N = 10$ and $\lambda = 0$ in Epinions.

Figure 6: The novelty of the NNMD and SMD algorithms in MovieLens 100k, MovieLens 1M, and Epinions for different lengths of the recommendation list. The parameters for the NNMD algorithm are $N = 50$ and $\lambda = 0.9$ in MovieLens 100k and MovieLens 1M, while they are $N = 10$ and $\lambda = 0$ in Epinions.
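Here $s_{ij} = \frac{|\Gamma_i \cap \Gamma_j|}{|\Gamma_i \cup \Gamma_j|}$ is the Jaccard similarity between user $i$ and user $j$, and $\Gamma_i$ and $\Gamma_j$ are the neighbor sets of user $i$ and user $j$, respectively. Then we can get the objects' initial resource, denoted by the vector $f_1$, with $f^i_1 = \{f^i_{11}, f^i_{12}, \ldots, f^i_{1\alpha}, \ldots, f^i_{1n}\}$ for the target user $i$. $f_1$ can be written as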

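Table 1: Basic properties of the three data sets. The sparsity is defined as $E/(mn)$, the number of user-object links $E$ divided by the product of the number of users $m$ and the number of objects $n$.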

Table 2: The four cases of the unselected objects of the target user in the recommender system.

Table 3: Algorithmic performance for the MovieLens 100k, MovieLens 1M, and Epinions data. The precision, interuser-diversity, and novelty correspond to $L = 20$. The parameters for NNMD are $N = 50$ and $\lambda = 0.9$ in MovieLens 100k and MovieLens 1M, while, in Epinions, the parameters are $N = 10$ and $\lambda = 0$. The entries corresponding to the best performance over all methods are emphasized in bold.