Previous research on personalized ranking has focused on either explicit feedback data or implicit feedback data rather than making full use of the information in the dataset. Until now, no personalized ranking algorithm has been studied that exploits both explicit and implicit feedback. To overcome this limitation, we propose a new personalized ranking algorithm (MERR_SVD++) based on the recent xCLiMF model and the SVD++ algorithm; it exploits both explicit and implicit feedback simultaneously and optimizes the well-known evaluation metric Expected Reciprocal Rank (ERR). Experimental results on practical datasets show that the proposed algorithm outperforms existing personalized ranking algorithms on different evaluation metrics and that the running time of MERR_SVD++ grows linearly with the number of ratings. Because of its high precision and good scalability, MERR_SVD++ is suitable for processing big data and has wide application prospects in the field of Internet information recommendation.
As e-commerce grows in popularity, an important challenge is helping customers sort through a large variety of offered products to easily find the ones they will enjoy the most. One of the tools that address this challenge is the recommender system, which has attracted a lot of attention recently [
Recently, collaborative filtering algorithms have been widely studied in both academia and industry. The data processed by collaborative filtering algorithms fall into two categories: explicit feedback data (e.g., ratings, votes) and implicit feedback data (e.g., clicks, purchases). Explicit feedback data are more widely used in recommender system research [
Examples of an explicit feedback matrix (a) and an implicit feedback matrix (b) for a recommender system.
Collaborative filtering algorithms also can be divided into two categories: collaborative filtering (CF) algorithms based on rating prediction [
Previous research on personalized ranking has focused on either explicit feedback data or implicit feedback data rather than making full use of the information in the dataset. However, in most real-world recommender systems both explicit and implicit user feedback are abundant and could potentially complement each other. It is desirable to unify these two heterogeneous forms of user feedback in order to generate more accurate recommendations. The idea of complementing explicit feedback with implicit feedback was first proposed in [
To overcome the limitations of prior research, this paper proposes a new personalized ranking algorithm (MERR_SVD++), which exploits both explicit and implicit feedback and optimizes Expected Reciprocal Rank (
The rest of this paper is organized as follows: Section
In conventional CF tasks, the most frequently used evaluation metrics are the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE). Accordingly, rating prediction (as in the Netflix Prize) has been the most popular approach to the CF problem. Rating prediction methods are typically regression based: they minimize the error between predicted and true ratings. The simplest algorithm for rating prediction is
Learning to rank (LTR) is a core technology for information retrieval. When a query is submitted to a search engine, LTR is responsible for ranking all the documents or Web pages according to their relevance to the query or other objectives. Many LTR algorithms have been proposed recently, and they can be classified into three categories: pointwise, listwise, and pairwise [
In the pointwise approach, it is assumed that each query-document pair in the training data has a numerical or ordinal score. Then the LTR problem can be approximated by a regression problem: given a single query-document pair, its score is predicted.
As the name suggests, the listwise approach takes the entire set of documents associated with a query in the training data as the input to construct a model and predict their scores.
The pairwise approach does not focus on accurately predicting the degree of relevance of each document; instead, it mainly cares about the relative order of two documents. In this sense, it is closer to the concept of “ranking.”
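The difference between the three approaches is easiest to see in the loss each one minimizes. The following is a minimal sketch, not taken from any of the cited algorithms: the function names are hypothetical, the pointwise loss is plain squared error, the pairwise loss is a logistic loss on score gaps, and the listwise loss is a ListNet-style top-one cross entropy chosen here only as one common representative.

```python
import math

def pointwise_loss(scores, labels):
    """Mean squared error: each document is scored against its own label."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)

def pairwise_loss(scores, labels):
    """Mean logistic loss over ordered pairs (i, j) with labels[i] > labels[j].
    Only the score gap s_i - s_j matters, not the absolute scores."""
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                total += math.log1p(math.exp(-(scores[i] - scores[j])))
                pairs += 1
    return total / pairs if pairs else 0.0

def _softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def listwise_loss(scores, labels):
    """Cross entropy between softmax(labels) and softmax(scores),
    computed over the whole list of documents for one query."""
    p_true, p_pred = _softmax(labels), _softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))
```

Adding a constant to every score changes the pointwise loss but leaves the pairwise and listwise losses unchanged, which reflects the remark above that the pairwise approach cares about relative order rather than exact relevance values.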
Personalized ranking algorithms can also be divided into two categories: personalized ranking with implicit feedback (PRIF) [
In this paper, we use capital letters to denote a matrix (such as
Given that a matrix
If
SVD++ is a collaborative filtering algorithm unifying explicit and implicit feedback based on rating prediction and matrix factorization [
The feature matrix of users can be defined as
So the prediction formula of
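As an illustration of how SVD++ combines the two kinds of feedback, the sketch below computes a score from an explicit user factor plus a normalized sum of implicit item factors. It follows the widely known form of SVD++ with the global mean and bias terms left out for brevity; whether the formula in this paper includes those terms cannot be told from the text, so the names `p_u`, `q_i`, `Y`, and `implicit_items` are assumptions made for this example only.

```python
import numpy as np

def svdpp_score(p_u, q_i, Y, implicit_items):
    """Predicted preference of one user for one item (bias terms omitted).

    p_u            : (d,) explicit latent factors of the user
    q_i            : (d,) latent factors of the item
    Y              : (n_items, d) implicit item factors
    implicit_items : indices of items the user gave implicit feedback on
    """
    if len(implicit_items) > 0:
        # Sum of implicit factors, scaled by 1 / sqrt(|N(u)|).
        implicit_part = Y[implicit_items].sum(axis=0) / np.sqrt(len(implicit_items))
    else:
        implicit_part = np.zeros_like(p_u, dtype=float)
    return float(q_i @ (p_u + implicit_part))
```

With an empty `implicit_items` list the score reduces to the plain matrix factorization product of `q_i` and `p_u`, which is the sense in which the implicit term augments the explicit user profile.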
In this section, we first introduce our MERR_SVD++ model, then give its learning algorithm, and finally analyze its computational complexity.
In practical applications, the user scans the results list from top to bottom and stops when a result is found that fully satisfies the user’s information need. The usefulness of an item at rank
Using the definition of ERR in [
In this paper, we define that
A toy example of a dataset in which users gave only explicit feedback. (a) denotes the explicit feedback dataset. (b) denotes the implicit feedback dataset.
A toy example of a dataset that contains both explicit and implicit feedback data. (a) denotes the explicit feedback dataset. (b) denotes the implicit feedback dataset, and the numbers in bold denote the entries for which users gave only implicit feedback.
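The two toy examples can be mimicked with small matrices. The sketch below is one reading of the captions, under the assumption that rating an item also counts as implicit feedback on it, so the implicit matrix is the binary pattern of the ratings plus any implicit-only entries. The data and the helper name `implicit_matrix` are hypothetical.

```python
import numpy as np

# Hypothetical 3-user x 4-item explicit rating matrix; 0 means "no rating".
explicit = np.array([[5, 0, 3, 0],
                     [0, 4, 0, 0],
                     [1, 0, 0, 2]])

# Hypothetical implicit-only interactions (e.g., clicks on unrated items).
implicit_only = np.array([[0, 1, 0, 0],
                          [0, 0, 0, 1],
                          [0, 0, 0, 0]], dtype=bool)

def implicit_matrix(explicit, implicit_only=None):
    """Binary implicit feedback matrix.

    Every rated entry becomes 1 (assumption: a rating implies an interaction).
    If implicit-only interactions are supplied, they are added as 1s too.
    """
    rated = explicit > 0
    if implicit_only is None:
        return rated.astype(int)
    return (rated | implicit_only).astype(int)
```

Calling `implicit_matrix(explicit)` corresponds to the first toy example, and `implicit_matrix(explicit, implicit_only)` to the second, where the extra 1s play the role of the bold entries.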
Thus, by introducing SVD++, we can exploit both explicit and implicit feedback simultaneously while optimizing the evaluation metric ERR. We therefore call our model MERR_SVD++.
Note that the value of rank
Similar to other ranking measures such as RR, ERR is also nonsmooth with respect to the latent factors of users and items, that is,
Substituting (
Given the monotonicity of the logarithm function, the model parameters that maximize (
Based on Jensen’s inequality and the concavity of the logarithm function in a similar manner to [
We can now maximize the objective function (
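Algorithm (MERR_SVD++ learning; symbols lost in extraction). Input includes the number of features and the number of iterations. Steps: index the relevant items for each user; initialize the model parameters; then perform the two update steps in each iteration.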
The published research papers in [
Here, we first analyze the complexity of the learning process for one iteration. By exploiting the data sparseness in
We use two datasets for the experiments. The first is the MovieLens 1 million dataset (ML1m) [
Just as has been justified by [
NDCG is another widely used measure for ranking problems. To define
ERR is a generalized version of Reciprocal Rank (RR) designed to be used with multiple relevance level data (e.g., ratings). It has similar properties to RR in that it strongly emphasizes the relevance of results returned at the top of the list. Using the definition of ERR in [
Since in recommender systems the user’s satisfaction is dominated by only a few items on the top of the recommendation list, our evaluation in the following experiments focuses on the performance of top-5 recommended items, that is, NDCG@5 and ERR@5.
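For concreteness, both metrics can be computed from the true ratings of the recommended items, listed in the order they were recommended. The sketch below uses the commonly used graded-relevance forms: for ERR, the probability that a user is satisfied by an item with rating g is (2^g - 1) / 2^g_max; for NDCG, the gain is 2^g - 1 with a log2 position discount. The function names, the default `max_rating=5` (a 1 to 5 star scale), and the choice to compute the ideal DCG from the same list are assumptions for this example.

```python
import math

def err_at_k(ratings, k=5, max_rating=5):
    """Expected Reciprocal Rank over the top-k items of a ranked list."""
    err, p_continue = 0.0, 1.0
    for rank, g in enumerate(ratings[:k], start=1):
        p_stop = (2 ** g - 1) / (2 ** max_rating)  # chance this item satisfies the user
        err += p_continue * p_stop / rank          # user reached this rank and stops here
        p_continue *= 1.0 - p_stop                 # user keeps scanning downward
    return err

def dcg_at_k(ratings, k=5):
    """Discounted cumulative gain with exponential gain and log2 discount."""
    return sum((2 ** g - 1) / math.log2(i + 1)
               for i, g in enumerate(ratings[:k], start=1))

def ndcg_at_k(ratings, k=5):
    """DCG normalized by the DCG of the same ratings sorted in descending order."""
    ideal = dcg_at_k(sorted(ratings, reverse=True), k)
    return dcg_at_k(ratings, k) / ideal if ideal > 0 else 0.0
```

For example, `err_at_k([5, 0, 0, 0, 0])` is 31/32, while moving the same 5-star item to rank 2 halves the value, which shows how strongly ERR emphasizes the top of the list.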
For each dataset, we randomly selected 5 rated items (movies) and 1,000 unrated items (movies) for each user to form a test set. We then randomly selected a varying number of rated items from the rest to form a training set. For example, just as in [
All the models were implemented in MATLAB R2009a. For MERR_SVD++, the value of the regularization parameter
In this section we present a series of experiments to evaluate MERR_SVD++. We designed the experiments to address the following research questions: (1) Does the proposed MERR_SVD++ outperform state-of-the-art personalized ranking approaches for top-N recommendation? (2) Does the performance of MERR_SVD++ improve when we increase only the amount of implicit feedback data for each user? (3) Is MERR_SVD++ scalable for large-scale use cases?
We compare the performance of MERR_SVD++ with that of five baseline algorithms. The approaches we compare with are listed below: Co-Rating [ SVD++ [ xCLiMF [ CofiRank [ CLiMF [
The results of the experiments on the ML1m and the extracted Netflix datasets are shown in Figure
The performance comparison of MERR_SVD++ and baselines.
Compared to Co-Rating, which is based on rating prediction, MERR_SVD++ is based on ranking prediction and succeeds in enhancing the top-ranked performance by optimizing ERR. As reported in [
Hence, we give a positive answer to our first research question.
The influence of implicit feedback on the performance of MERR_SVD++ can be found in Figure
The influence of implicit feedback on the performance of MERR_SVD++.
With this experimental result, we give a positive answer to our second research question.
The last experiment investigated the scalability of MERR_SVD++ by measuring the training time required for training sets of different scales. Firstly, as analyzed in Section
Scalability analysis of MERR_SVD++ in terms of the number of users in the training set.
The observations from this experiment allow us to answer our last research question positively.
Previous research on personalized ranking has focused on either explicit feedback data or implicit feedback data rather than making full use of the information in the dataset. Until now, no personalized ranking algorithm has been studied that exploits both explicit and implicit feedback. To overcome this limitation, in this paper we have presented a new personalized ranking algorithm (MERR_SVD++) that exploits both explicit and implicit feedback simultaneously. MERR_SVD++ optimizes the well-known evaluation metric Expected Reciprocal Rank (ERR) and is based on the recent xCLiMF model and the SVD++ algorithm. Experimental results on practical datasets showed that the proposed algorithm outperformed existing personalized ranking algorithms on different evaluation metrics and that the running time of MERR_SVD++ grew linearly with the number of ratings. Because of its high precision and good scalability, MERR_SVD++ is suitable for processing big data, can improve the speed and validity of recommendation by addressing the latency problem of personalized recommendation, and has wide application prospects in the field of Internet information recommendation. Moreover, because MERR_SVD++ exploits both explicit and implicit feedback simultaneously, it can alleviate the data sparsity and imbalance problems of personalized ranking algorithms to a certain extent.
For future work, we plan to extend our algorithm to richer models so that it can address the grey sheep and cold start problems of personalized recommendation. We would also like to extract more useful information from explicit and implicit feedback simultaneously.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work is sponsored in part by the National Natural Science Foundation of China (nos. 61370186, 61403264, 61402122, 61003140, 61033010, and 61272414), Science and Technology Planning Project of Guangdong Province (nos. 2014A010103040 and 2014B010116001), Science and Technology Planning Project of Guangzhou (nos. 2014J4100032 and 201510010203), the Ministry of Education and China Mobile Research Fund (no. MCM20121051), the second batch open subject of mechanical and electrical professional group engineering technology development center in Foshan city (no. 2015-KJZX139), and the 2015 Research Backbone Teachers Training Program of Shunde Polytechnic (no. 2015-KJZX014).