Recommendation systems are used when searching online databases. They are important tools because they provide users with predictions of the outcomes of different potential choices and help users avoid information overload. They are used on e-commerce websites and have attracted considerable attention in the scientific community. To date, many personalized recommendation algorithms, such as collaborative filtering and mass diffusion, have aimed to improve recommendation accuracy from the perspective of vertex similarity. However, diversity is also an important evaluation index for a recommendation algorithm. To study both the accuracy and the diversity of a recommendation algorithm at the same time, this study introduces a “third dimension” into the commonly used two-dimensional user/product recommendation model and proposes a recommendation algorithm based on triangular area (the TR algorithm). The proposed algorithm combines a Markov chain with collaborative filtering: it builds a triangle model for each candidate product and makes recommendations for users according to the triangle's area. Additionally, the algorithm is parameter-free, which makes it more suitable for application in real environments. Experimental results on the real MovieLens-100K and MovieLens-1M datasets show that TR achieves better diversity and novelty than the other benchmark methods.
With the rapid development of the Internet and e-commerce, which have had a tremendous impact on our lifestyles, the way information is accessed has changed. On the one hand, hundreds of millions of products are available online, making life much more convenient [
A variety of personalized recommendation algorithms have been proposed previously [
Essentially, all of these methods make recommendations for users by studying vertex similarity and focus primarily on recommendation accuracy; as a result, their recommendation diversity is relatively poor. However, a recent study found that although the most popular methods do not achieve very high accuracy [
New developments in e-commerce are taking place at two large Chinese companies, Alibaba and Jingdong. For example, these companies are now vigorously developing physical (offline) stores that operate together with their online platforms. The traditional two-dimensional (user/product) recommendation model cannot fully capture the roles of location and high diversity. The present paper therefore proposes a data model that introduces a “third dimension” into the traditional two-dimensional user/product recommendation relationship, and designs a recommendation algorithm based on triangular area that exploits this three-dimensional data relationship. The algorithm computes a weight for each of the three pairwise relationships in the three-dimensional model, uses these three weights as the side lengths of a triangle, and then calculates the triangle's area using Heron's formula. Because each triangle corresponds to one product, recommendations for the target user are made by sorting the triangular areas from largest to smallest.
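The ranking step just described can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the three pairwise weights for each candidate object have already been computed, and the object names, weight values, and function names (`heron_area`, `rank_by_area`) are hypothetical. Weight triples that cannot form a triangle are simply rejected here; the paper's own correction procedure for such cases is not reproduced.

```python
import math
from typing import Dict, List, Tuple


def heron_area(a: float, b: float, c: float) -> float:
    """Area of a triangle with side lengths a, b, c, computed with Heron's formula."""
    # A non-degenerate triangle needs each side to be shorter than the sum of the other two.
    if a + b <= c or a + c <= b or b + c <= a:
        raise ValueError("side lengths violate the triangle inequality")
    s = (a + b + c) / 2.0  # semi-perimeter
    return math.sqrt(s * (s - a) * (s - b) * (s - c))


def rank_by_area(side_weights: Dict[str, Tuple[float, float, float]], top_l: int) -> List[str]:
    """Return the top_l candidate objects sorted by triangle area, largest first."""
    areas = {obj: heron_area(*sides) for obj, sides in side_weights.items()}
    return sorted(areas, key=areas.get, reverse=True)[:top_l]


# Hypothetical pairwise weights (user-object, object-category, user-category) for one target user.
candidates = {
    "movie_a": (0.5, 0.4, 0.3),
    "movie_b": (0.6, 0.6, 0.6),
    "movie_c": (0.2, 0.25, 0.3),
}
print(rank_by_area(candidates, top_l=2))  # ['movie_b', 'movie_a']
```

Note that nothing in the ranking depends on a tunable parameter other than the list length L, which is consistent with the parameter-free character claimed for TR.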
The proposed triangle recommendation algorithm was tested on two real datasets. The results show that TR is more effective than the other benchmark methods in terms of recommendation diversity. In addition, TR is parameter-free, which makes it easier to apply in practice than some parameter-based benchmark methods. Because the present work does not focus exclusively on recommendation accuracy, recommendation diversity increased substantially.
The rest of this paper is organized as follows. The second section describes in detail how to build the triangle model and how to apply it to recommendation systems, and analyzes the diversity of the triangle model. The third section presents the experimental data and introduces the standards used to evaluate the performance of recommendation algorithms. The fourth section reports the performance of the proposed method on two public datasets and compares it with other benchmark methods. Finally, the work is summarized and directions for future work are indicated.
A user-object network without weights and directions
Both recommendation accuracy and diversity are considered when evaluating the performance of recommendation algorithms. To achieve highly diverse recommendations, a “third dimension (
These schematics illustrate the relationship between each pair of the three dimensions of the proposed recommendation algorithm.
The composition of the three dimensions is
In this investigation, the relationship between two dimensions was not studied directly from the perspective of their similarity. In the user/object bipartite graph, the relationship weightings are calculated by combining the
The construction of the three-dimensional space. The pairwise relationships between the three dimensions of the model are treated as the three sides of a triangle, and
Firstly, the calculation of weightings
Because the sameness between
The data of
The cluster of
The records of each user in
No existing study of a time-based dynamic model was available as a reference [
The initial behavior vector
Thus, the final result of
Next, in the other two relationships, which are the relationship weights of the object and the category
In addition, the weight of the user and category
Each triangle corresponds to one object. Because all three side lengths are known, the area of the triangle can be obtained from Heron's formula as follows:
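For reference, the standard statement of Heron's formula is given below with generic side lengths $a$, $b$, $c$ and semi-perimeter $p$. These symbols are assumed here, since the paper's own notation for the three relationship weights did not survive extraction.

$$
S = \sqrt{p\,(p-a)\,(p-b)\,(p-c)}, \qquad p = \frac{a+b+c}{2}
$$

As a quick check, sides $3$, $4$, $5$ give $p = 6$ and $S = \sqrt{6 \cdot 3 \cdot 2 \cdot 1} = 6$, which equals $\tfrac{1}{2} \cdot 3 \cdot 4$ for this right triangle. The expression under the root is positive only when each side is shorter than the sum of the other two, which is why the weights must first be checked for, and if necessary corrected toward, a valid triangle.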
To make sure the three weights can construct a triangle,
When increasing
It is important to note that the main purpose of the proposed recommendation algorithm based on the triangle model is to improve the diversity of the recommendation. Thus, when correcting the area after increasing
Processing of the area correction for the case of decreasing
It should be noted that to ensure recommendation accuracy, the minimum value of the denominator in (
In the construction of a triangle model, there will be cases in which the three relationship weights cannot satisfy the requirements for constructing a triangle. When
Considering
Let the function
Then,
It follows from the formula above that the user's behavior will not be affected when
Because there is no direct information representing the relationship
If
Then,
Hence,
It can be concluded from the above formula that changes in
This algorithm introduces a “third dimension” into the traditional two-dimensional recommendation algorithm and proposes a method based on triangular area to improve recommendation diversity. This raises a question: does diversity keep improving as more dimensions are added? To answer it, this section analyzes the problem using the “squeeze theorem”, which is used to determine whether a limit exists, together with equation-solving methods from linear algebra. Diversity is analyzed for the two-dimensional, four-dimensional, and higher-dimensional cases.
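The squeeze theorem invoked here is the standard result stated below. How the paper maps the two-, three-, and four-dimensional quantities onto $g$, $f$, and $h$ is not recoverable from this text, so the functions are generic.

$$
g(x) \le f(x) \le h(x) \ \text{for all } x \text{ near } a, \quad \lim_{x \to a} g(x) = \lim_{x \to a} h(x) = L \;\Longrightarrow\; \lim_{x \to a} f(x) = L
$$

In words: a quantity trapped between two bounds that converge to the same value must converge to that value as well.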
The triangular recommendation algorithm has better diversity performance than the two-dimensional recommendation algorithm.
Existing recommendation algorithms have borrowed methods from various fields and use a bipartite graph to describe how recommendations are made for target users. These algorithms focus mainly on recommendation accuracy and perform poorly in diversity and novelty. By contrast, the most popular recommendation algorithm does not achieve higher accuracy but performs better in diversity and novelty [
The
The schematic represents the contents in (
Thus, at least
Four-dimensional and even higher-dimensional recommendation algorithms can be represented by the triangle recommendation algorithm.
Assume a diversity recommendation algorithm is constructed with user, object,
Because the quadrilateral formed by the four points varies and is not necessarily a regular figure, its area is denoted by
It is not difficult to see that
(1) When
(2) When
To sum up, the two cases discussed above use the “squeeze theorem” to show why adding a “third dimension” to the two-dimensional relationship matters. The purpose is to improve recommendation diversity without abandoning recommendation accuracy altogether. In the two-dimensional relationship, it is difficult to achieve good recommendation diversity; in the four-dimensional relationship, as the discussion above shows, the fourth dimension is unnecessary. Hence, every higher-dimensional case can be reduced to the study of several three-dimensional relationships, and higher-dimensional models are not needed. The three-dimensional relationship not only retains the recommendation accuracy of the two-dimensional relationship but also improves recommendation diversity. Thus, the method proposed in this paper, which adds a third dimension to the two existing ones, is effective in improving diversity.
To the authors' knowledge, research on traditional recommendation algorithms has not used a three-dimensional method of this kind; consequently, suitable data were difficult to collect. Two commonly used real datasets were ultimately selected for this paper: MovieLens-100K and MovieLens-1M, which are often used as benchmarks for different recommendation algorithms. The MovieLens datasets are provided by the GroupLens project at the University of Minnesota. Ratings are given on a 5-point scale, and a higher score indicates that the user liked the movie more. In constructing the bipartite model, only ratings greater than or equal to 3 were retained. After this coarse-graining, the small dataset contains 82520 links and the large one contains 836478 links. Note that the bipartite network is unweighted in the following analysis; that is, the rating values are ignored. The basic statistics of the datasets are shown in Table
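The coarse-graining step described above can be sketched as follows, assuming ratings arrive as (user, object, rating) triples; loading the MovieLens files is omitted, and the function names and toy data are hypothetical. "Sparsity" is computed here as the link density |E| / (|U| x |O|), counting only users and objects that keep at least one link; this is a common convention in the bipartite-network literature but an assumption with respect to the paper's table.

```python
from typing import Iterable, Set, Tuple

Link = Tuple[str, str]


def build_bipartite(ratings: Iterable[Tuple[str, str, int]], threshold: int = 3) -> Set[Link]:
    """Keep each (user, object) pair rated >= threshold as an unweighted link."""
    # The rating value itself is discarded: the resulting network is unweighted.
    return {(user, obj) for user, obj, rating in ratings if rating >= threshold}


def basic_stats(links: Set[Link]) -> Tuple[int, int, int, float]:
    """Number of users, objects, links, and link density |E| / (|U| * |O|)."""
    if not links:
        return 0, 0, 0, 0.0
    users = {u for u, _ in links}
    objects = {o for _, o in links}
    density = len(links) / (len(users) * len(objects))
    return len(users), len(objects), len(links), density


# Hypothetical toy ratings in (user, movie, stars) form.
toy = [("u1", "m1", 5), ("u1", "m2", 2), ("u2", "m1", 3), ("u2", "m3", 4), ("u3", "m2", 1)]
links = build_bipartite(toy)
print(basic_stats(links))  # (2, 2, 3, 0.75)
```

User u3 disappears entirely because none of its ratings reaches the threshold, which is one reason the user and object counts after coarse-graining can be smaller than those of the raw dataset.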
The basic statistics for two real online rating datasets.
Data | Users | Objects | Links | Sparsity |
---|---|---|---|---|
MovieLens-100K | | | 82520 | |
MovieLens-1M | | | 836478 | |
To evaluate the performance of recommendation algorithms in practical applications, cross-validation is usually used to assess how well the results generalize to an independent dataset [
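The paper's exact splitting procedure did not survive extraction. A minimal sketch of one common hold-out scheme, a random split of links into a training set and a disjoint testing (probe) set, is shown below; the 90/10 ratio (`train_fraction=0.9`) and the function name `split_links` are assumptions for illustration only, and a k-fold scheme would differ.

```python
import random
from typing import Set, Tuple

Link = Tuple[str, str]


def split_links(links: Set[Link], train_fraction: float = 0.9,
                seed: int = 0) -> Tuple[Set[Link], Set[Link]]:
    """Randomly split links into a training set and a disjoint testing (probe) set."""
    ordered = sorted(links)  # fix the order so a given seed always yields the same split
    random.Random(seed).shuffle(ordered)
    cut = int(round(train_fraction * len(ordered)))
    return set(ordered[:cut]), set(ordered[cut:])


links = {(f"u{i % 4}", f"m{i}") for i in range(20)}
train, probe = split_links(links)
print(len(train), len(probe))  # 18 2
```

Sorting before shuffling matters because Python set iteration order is not guaranteed to be stable across runs, so shuffling the set directly would not be reproducible.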
In previous literature, extensive research has been done on how to evaluate the performance of recommendation algorithms [
Accuracy is one of the important indices for evaluating the quality of recommendation algorithms. First, the AUC (area under the ROC curve) [
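The paper's AUC formula did not survive extraction. A widely used sampling-based definition in this literature compares the score of a randomly chosen testing-set object with that of a randomly chosen object the user has never collected, repeated n times: AUC = (n' + 0.5 n'') / n, where n' counts comparisons in which the testing-set object scores higher and n'' counts ties. The sketch below assumes that definition, computes it for a single user, and leaves averaging over users implicit; names and scores are hypothetical.

```python
import random
from typing import Dict, Sequence


def sampled_auc(scores: Dict[str, float], probe_items: Sequence[str],
                unrelated_items: Sequence[str], n_samples: int = 10000, seed: int = 0) -> float:
    """Estimate AUC as P(probe score > unrelated score), with ties counted as 1/2."""
    rng = random.Random(seed)
    higher = ties = 0
    for _ in range(n_samples):
        s_probe = scores[rng.choice(probe_items)]
        s_other = scores[rng.choice(unrelated_items)]
        if s_probe > s_other:
            higher += 1
        elif s_probe == s_other:
            ties += 1
    return (higher + 0.5 * ties) / n_samples


scores = {"a": 0.9, "b": 0.7, "c": 0.2, "d": 0.1}
print(sampled_auc(scores, ["a", "b"], ["c", "d"]))  # 1.0
```

An AUC of 0.5 corresponds to random ranking, so the TR values of roughly 0.6 reported later indicate only a modest ability to rank testing-set objects above unrelated ones.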
Then, three
The Precision index is defined as the ratio of the number of objects that appear in both the recommendation list and the testing set to the length of the recommendation list. Averaged over all users, Precision is defined as
Recall is also defined as a ratio
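The Precision definition above, together with the usual Recall definition (hits divided by the size of the user's testing set, which is an assumption here because the paper's formula was lost), can be sketched as follows. Averaging only over users with at least one testing-set object is likewise an assumption, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def precision_recall_at_l(recommended: Sequence[str], probe: Set[str], l: int) -> Tuple[float, float]:
    """Precision = hits / L and recall = hits / |probe| for one user's top-L list."""
    hits = len(set(recommended[:l]) & probe)
    precision = hits / l
    recall = hits / len(probe) if probe else 0.0
    return precision, recall


def mean_precision_recall(recs: Dict[str, List[str]], probes: Dict[str, Set[str]],
                          l: int) -> Tuple[float, float]:
    """Average precision and recall over users that have at least one testing-set object."""
    users = [u for u, items in probes.items() if items]
    pairs = [precision_recall_at_l(recs.get(u, []), probes[u], l) for u in users]
    return sum(p for p, _ in pairs) / len(users), sum(r for _, r in pairs) / len(users)


print(precision_recall_at_l(["a", "b", "c", "d"], {"b", "d", "x"}, l=2))
```

With L = 2, only "b" from the top of the list is a hit, giving precision 1/2 and recall 1/3 for that user.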
In personalized recommendation, diversity is an important index that measures how varied the objects recommended by an algorithm are. Because external information on object similarity is difficult to obtain, diversity measures are usually based on the user-object rating matrix. Inter-user diversity is one of the widely used diversity indices, and it can be quantified by the Hamming distance [
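The Hamming-distance measure is commonly defined as H_ij = 1 - Q_ij / L, where Q_ij is the number of objects shared by the top-L lists of users i and j, averaged over all user pairs; larger H means more personalized (less overlapping) recommendations. The sketch below assumes that standard definition; the function name is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def mean_hamming_distance(rec_lists: Sequence[Sequence[str]], l: int) -> float:
    """Mean of H_ij = 1 - Q_ij / L over all user pairs; Q_ij is the top-L overlap."""
    tops: List[set] = [set(r[:l]) for r in rec_lists]
    pairs = list(combinations(tops, 2))
    if not pairs:
        return 0.0
    return sum(1 - len(a & b) / l for a, b in pairs) / len(pairs)


print(mean_hamming_distance([["a", "b"], ["a", "c"], ["d", "e"]], l=2))  # ≈ 0.833
```

Here the first two users share one of their two objects (H = 0.5) and every other pair shares none (H = 1), so the mean is 2.5 / 3.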
Novelty is an important index that quantifies an algorithm's ability to produce novel (i.e., less popular) and unexpected results. Here, novelty is quantified by the average popularity of the recommended objects. It can be defined as
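Assuming, as is usual, that an object's popularity is its degree in the training set, the novelty index N is the mean degree of all objects in the users' top-L lists, so a smaller N indicates more novel recommendations. The sketch below follows that assumption; the function names and toy data are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def object_degrees(train_links: Iterable[Tuple[str, str]]) -> Counter:
    """Degree (number of collecting users) of each object in the training set."""
    return Counter(obj for _, obj in train_links)


def mean_popularity(rec_lists: Sequence[Sequence[str]], degrees: Counter, l: int) -> float:
    """Average degree of all objects in the users' top-L lists; lower means more novel."""
    items = [obj for recs in rec_lists for obj in recs[:l]]
    if not items:
        return 0.0
    # Counter returns 0 for objects that never appear in the training set.
    return sum(degrees[obj] for obj in items) / len(items)


degrees = object_degrees([("u1", "a"), ("u2", "a"), ("u3", "b")])
print(mean_popularity([["a", "b"], ["b", "c"]], degrees, l=2))  # 1.0
```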
The recommendation algorithm based on triangular area was applied to two real online rating datasets. For comparison, several existing recommendation algorithms were also considered, including global ranking (GR) [
In GR, all items are sorted in descending order of their degrees, and the items with the highest degrees are recommended to the target user [
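The GR baseline can be sketched in a few lines. This assumes, as is standard, that objects the target user has already collected are excluded from the list; the tie-breaking by object identifier is an arbitrary choice, and the function name and data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def global_ranking(train_links: Iterable[Tuple[str, str]], target_user: str, l: int) -> List[str]:
    """Recommend the l highest-degree objects that target_user has not yet collected."""
    links = list(train_links)
    degrees = Counter(obj for _, obj in links)
    collected = {obj for user, obj in links if user == target_user}
    candidates = [obj for obj in degrees if obj not in collected]
    # Sort by degree (descending); ties are broken by object id, an arbitrary choice.
    candidates.sort(key=lambda obj: (-degrees[obj], obj))
    return candidates[:l]


train = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u2", "b"),
         ("u3", "c"), ("u1", "c"), ("u4", "d"), ("u2", "d")]
print(global_ranking(train, "u1", l=2))  # ['d', 'b']
```

Because every user receives essentially the same popular objects, GR tends to score poorly on the Hamming-distance and novelty indices, which matches its role as a non-personalized baseline.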
The results of the seven evaluation metrics are shown in Table
The values of the seven evaluation metrics after applying the different recommendation algorithms on the two datasets.
Method | AUC | MAP | P | R | H | I | N |
---|---|---|---|---|---|---|---|
MovieLens-100K | | | | | | | |
GR | 0.863 | 0.208 | 0.058 | 0.358 | 0.395 | 0.408 | 255 |
UCF | 0.887 | 0.315 | 0.070 | 0.476 | 0.550 | 0.394 | 242 |
ICF | 0.888 | | 0.073 | 0.494 | 0.674 | 0.413 | 211 |
MD | 0.898 | 0.325 | 0.075 | 0.527 | 0.618 | 0.355 | 230 |
CosRA | | 0.380 | | | 0.724 | 0.335 | 204 |
TR | 0.6105 | 0.0482 | 0.0446 | 0.3196 | | | |
MovieLens-1M | | | | | | | |
GR | 0.856 | 0.144 | 0.053 | 0.222 | 0.403 | 0.415 | 1660 |
UCF | 0.872 | 0.176 | 0.061 | 0.263 | 0.458 | 0.415 | 1640 |
ICF | 0.885 | | 0.072 | 0.314 | 0.629 | 0.404 | 1445 |
MD | 0.885 | 0.188 | 0.066 | 0.297 | 0.504 | 0.403 | 1618 |
CosRA | | 0.223 | | | 0.598 | 0.387 | 1541 |
TR | 0.5717 | 0.0447 | 0.0334 | 0.2940 | | | |
The performance comparison between TR and HC in seven evaluation metrics.
Method | AUC | MAP | P | R | H | I | N |
---|---|---|---|---|---|---|---|
MovieLens-100K | | | | | | | |
HC | | 0.037 | 0.021 | 0.123 | 0.858 | 0.056 | 23 |
TR | 0.6105 | | | | | | 138 |
MovieLens-1M | | | | | | | |
HC | | | | 0.162 | | 0.045 | |
TR | 0.5717 | 0.0447 | 0.0334 | | 0.8043 | | 531 |
Compared with the other recommendation algorithms, TR and HC perform better in terms of diversity and novelty but worse in terms of accuracy. Of the seven evaluation metrics, on MovieLens-100K HC outperforms TR only in AUC and novelty; on the remaining metrics, including the other accuracy metrics, TR is better than HC. On MovieLens-1M, the differences between HC and TR in diversity and novelty are not significant, whereas the difference in accuracy is very evident. This is likely because HC measures the similarity between items from the perspective of degree and allocates resources accordingly, which helps improve accuracy. There is no effective way to achieve the best performance in all three aspects of accuracy, diversity, and novelty at the same time [
From Tables
In this study, the recommendation algorithm departs from the traditional two-dimensional recommendation algorithm, and this difference was the focus of the investigation. With the rapid development of the Internet, the volume of available information has increased dramatically, and information overload has become an unavoidable problem that needs to be solved urgently. If too much attention is paid to recommendation accuracy, the recommendations can become a nuisance, and users may feel that they are unnecessary. Therefore, the present study focused on recommendation diversity. A “third dimension” was introduced alongside the two usual dimensions, and the relationship between each pair of dimensions in the three-dimensional model was studied and quantified. These three pairwise relationships were then treated as the three sides of a triangle, with each triangle corresponding to one object, and the area of each triangle was calculated using Heron's formula. Finally, the recommendation list for each target user was obtained by sorting the triangle areas in descending order. In the second part of the investigation, the effectiveness of the three-dimensional recommendation was demonstrated by solving multivariate equations and applying the “squeeze theorem”, which is used to decide whether a limit exists. The experimental results also showed that the proposed algorithm performs well in terms of recommendation diversity.
In the realization of the proposed recommendation algorithm, user behaviors are clustered and processed by combining the
The authors declare that they have no conflicts of interest.
This work was funded partially by the Open Fund of Key Laboratory of the Ministry of Education (Grant 13zxzk01) and the Digital Media Science Innovation Team of CDUT (Grant 10912-kytd201510).