Recommender systems are widespread because they help Web users surf the Internet in a personalized way. For example, a collaborative recommender system is a powerful Web personalization tool that suggests useful items to a given user based on opinions collected from his neighbors. Among many factors, the similarity measure strongly affects the performance of a collaborative recommender system. However, the similarity measure itself depends largely on the overlap between user profiles. Most previous systems are tested with a predefined number of common items and neighbors, yet the system performance may vary if we change these parameters. The main aim of this paper is to examine the performance of the collaborative recommender system under many similarity measures, common set cardinalities, rating mean groups, and neighborhood set sizes. For this purpose, we propose a modified version of the mean difference weights similarity measure and a new evaluation metric, called users’ coverage, that measures the recommender system’s ability to help users. The experimental results show that the modified mean difference weights similarity measure outperforms other similarity measures and that the performance of the collaborative recommender system varies with its parameters; hence, we must specify the system parameters in advance.
Today, Web users face an abundance of choices when they surf the Web. Hence, recommender systems (RSs), as Web personalization tools, have become necessary for offering Web users personalized items they may like. These systems are now available in many Web sites covering social networks, e-commerce, e-business, e-tourism, and many other domains [
Basically, an RS compares users based on a suitable similarity measure, which plays an important role in the success of the whole system. However, different similarity measures often lead to different sets of neighbors for a given active user. A good similarity measure produces a close set of neighbors for a given active user [
This motivates us to study the effect of the common set cardinality on the performance of different similarity measures for collaborative recommender systems. The proximity between two users based on a single commonly rated item is surely weaker than that based on 20 commonly rated items. Moreover, the second case is more reliable because close sets of neighbors are guaranteed. This paper studies the effect of three parameters, namely, the cardinality of the commonly rated items, the rating mean group, and the number of neighbors, on the performance of the collaborative recommender system. The contributions of this paper are threefold: the notion of users’ coverage is introduced as opposed to items’ coverage; a modified version of the mean difference weights similarity measure is proposed; and the experiments are conducted on both synthetic and real datasets.
The rest of this paper is organized as follows: a literature review is given in Section
Many papers have discussed and proposed similarity measures, but they fix the lowest number of common items in advance and evaluate their proposals only for that predefined number [
Usually, active users correlate highly with neighbors having a very small number of corated items. Such neighbors are terrible predictors because their similarity is based on tiny samples of common items. The authors of [
Vozalis and Margaritis [
Many types of recommender systems are proposed based on the way they build user models and their work principle [
Formally, CRS has
During the similarity computation phase, the RS matches the active user against the available database of training users according to a suitable similarity measure. The resulting value measures how closely two users resemble each other. Once similarity values are computed, the system ranks the training users by their similarity with the active user to extract his set of neighbors. After that, the CRS assigns a predicted rating to every item seen by the neighborhood set but not by the active user. The predicted rating,
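The prediction step described above can be sketched with Resnick's mean-centered weighted average, a common choice in collaborative filtering (the paper's exact prediction formula is not shown here, so this form is an assumption; names are illustrative):

```python
def rating_mean(u):
    """Mean of all ratings in a user's profile (dict: item id -> rating)."""
    return sum(u.values()) / len(u)

def predict(active, neighbors, item):
    """Predicted rating of `item` for the active user.

    Mean-centered weighted average over the neighbors who rated the item
    (Resnick-style). `neighbors` is a list of (similarity, profile) pairs.
    """
    num = den = 0.0
    for sim, prof in neighbors:
        if item in prof:
            num += sim * (prof[item] - rating_mean(prof))
            den += abs(sim)
    if den == 0.0:
        return rating_mean(active)  # no neighbor rated the item
    return rating_mean(active) + num / den
```

A neighbor who rates the item above his own mean pushes the prediction above the active user's mean, and vice versa.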
Similarity computation is the third phase of building a recommender system. Obviously, the accuracy and reliability of this phase rely largely on the two phases preceding it. This paper concentrates on the similarity computation phase and assumes that all remaining phases are fixed and accurate, except for changing the number of neighbors in some experiments. Many similarity measures are used in the literature, and this paper examines three of them. The first similarity measure is the Pearson correlation coefficient (PCC) [
PCC computes the similarity between two users based on their common ratings.
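A minimal sketch of PCC restricted to the corated items, with user profiles as dicts from item id to rating (function names are illustrative; deviations are taken from each user's overall rating mean, which matches the paper's later observation that a single common item always yields the maximum similarity value):

```python
import math

def rating_mean(u):
    """Mean of all ratings in a user's profile."""
    return sum(u.values()) / len(u)

def pcc(u, v):
    """Pearson correlation coefficient over the common ratings only.

    With one common item the numerator equals the denominator in
    magnitude, so the result is always +1 or -1.
    """
    common = set(u) & set(v)
    if not common:
        return 0.0
    mu, mv = rating_mean(u), rating_mean(v)
    num = sum((u[i] - mu) * (v[i] - mv) for i in common)
    den = math.sqrt(sum((u[i] - mu) ** 2 for i in common)) * \
          math.sqrt(sum((v[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0
```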
The second similarity measure we examine is the cosine similarity measure [
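The cosine measure over the common ratings can be sketched as (same dict-based profiles as above; a minimal illustration, not the paper's notation):

```python
import math

def cosine(u, v):
    """Cosine similarity computed over the common ratings only.

    Ratings are all positive (1-5), so one common item always yields 1,
    regardless of how far apart the two ratings are.
    """
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    den = math.sqrt(sum(u[i] ** 2 for i in common)) * \
          math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / den if den else 0.0
```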
Again, the common set is the core of this calculation. The third similarity measure is the mean difference weights (MWD) similarity measure proposed by Bobadilla et al. [
We took into account the point raised by [
The mean difference weights similarity measure does not take the user’s rating mean into consideration because it was proposed for learning algorithms such as genetic algorithms (GA). However, when we fix the weights and rely on direct calculation without a learning algorithm, a correction factor based on the rating means of the two users can be added. In this paper, we propose the following correction factor:

CF(u, v) = 1 − |r̄(u) − r̄(v)| / max(r̄(u), r̄(v)),

where r̄(u) and r̄(v) are the rating means of users u and v.
The modified mean difference weights similarity measure is simply the MWD similarity multiplied by this correction factor.
Because of the correction factor, this measure does not give a high similarity value to users with only one common item when their rating means differ. We call this similarity measure the modified mean difference weights (MMD) measure.
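The following sketch pairs an assumed fixed weight function with the rating-mean correction factor. The linear weight w(d) = 1 − d/2 for an absolute rating difference d is purely illustrative (the original MWD learns its weights, e.g. with a GA), while the correction factor CF = 1 − |mean_u − mean_v| / max(mean_u, mean_v) is the form consistent with the MMD column of the sample dataset below:

```python
def rating_mean(u):
    """Mean of all ratings in a user's profile (dict: item id -> rating)."""
    return sum(u.values()) / len(u)

def mwd(u, v):
    """Mean difference weights: average a per-item weight over the common set.

    ASSUMPTION: weight 1 - d/2 for absolute difference d on a 1-5 scale,
    spanning +1 (identical ratings) down to -1 (opposite extremes).
    """
    common = set(u) & set(v)
    if not common:
        return 0.0
    return sum(1 - abs(u[i] - v[i]) / 2 for i in common) / len(common)

def mmd(u, v):
    """Modified MWD: MWD scaled by the rating-mean correction factor."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    mu, mv = rating_mean(u), rating_mean(v)
    return mwd(u, v) * (1 - abs(mu - mv) / max(mu, mv))
```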
By increasing the cardinality of the common set, we expect that the recommender system will not be able to help all active users. Therefore, we have to measure the system’s ability to help its intended users through a measure we call users’ coverage, or penetration, which is different from items’ coverage (discussed later). This metric is defined as follows.
The users’ coverage of a given recommender system, with a minimum predefined cardinality of the common set, is the number of users benefitting from the recommender system (those who can get neighbors and hence predictions) divided by the total number of active users of the system.
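The users' coverage metric can be sketched as follows, counting the active users who find at least one neighbor under a minimum common-set cardinality (function and variable names are illustrative):

```python
def users_coverage(active_users, training_users, min_common):
    """Fraction of active users who obtain at least one neighbor when
    only neighbors sharing >= min_common corated items are allowed.

    Users are dicts from item id to rating; a user never counts as
    his own neighbor.
    """
    helped = 0
    for a in active_users:
        if any(t is not a and len(set(a) & set(t)) >= min_common
               for t in training_users):
            helped += 1
    return helped / len(active_users)
```

Raising `min_common` can only lower this fraction, which is exactly the trade-off studied in the experiments.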
This measure helps us study the effect of increasing the cardinality of the common set on the usability of the system. A low value of users’ coverage means that the system cannot help many users because the overlap among their profiles is low.
We construct a sample dataset, shown in the table below, according to the following requirements. It should cover many cardinalities of the common set; therefore, we take the values 1, 2, 5, 8, and 10. It should also cover three rating mean groups (low, medium, and high). The sample data is arranged such that one opposite-minded user and three users with different rating means (low, medium, and high) are available for each cardinality of the common set. The last two users represent opposite-minded users to the active user with two different numbers of common items, 8 and 10. For each rating mean group, the user with the bigger cardinality of the common set inherits the same items as the user with the lower cardinality, so that we can see the effect of increasing the cardinality of the common set without changing the previous set of items.
Sample dataset for 19 users with 12 items.
C | i1 | i2 | i3 | i4 | i5 | i6 | i7 | i8 | i9 | i10 | i11 | i12 | Rating mean | PCC | COS | MWD | MMD
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Active user | | | | | | | | | | | | | 3.0 | — | — | — | —
1 | 0 | 0 | 5 | 0 | 4 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 4.00 | 1.00 | 1.00 | 1.00 | 0.75
1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1.33 | 1.00 | 1.00 | 1.00 | 0.444
1 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 2 | 3 | 0 | 0 | 0 | 2.67 | 0.00 | 1.00 | 0.50 | 0.444
1 | 0 | 0 | 1 | 0 | 4 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 2.67 | −1.00 | 1.00 | −1.00 | −0.89
2 | 2 | 0 | 5 | 0 | 4 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 3.50 | 0.707 | 0.987 | 0.75 | 0.640
2 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 1.50 | 0.000 | 0.949 | 0.75 | 0.375
2 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 2 | 4 | 0 | 2 | 0 | 2.75 | 0.707 | 0.981 | 0.75 | 0.688
2 | 3 | 0 | 1 | 0 | 4 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 2.75 | −0.99 | 0.759 | 0.00 | 0.00
5 | 2 | 0 | 5 | 0 | 5 | 0 | 4 | 2 | 5 | 0 | 3 | 0 | 3.71 | 0.451 | 0.975 | 0.60 | 0.485
5 | 2 | 1 | 0 | 0 | 1 | 0 | 0 | 3 | 2 | 1 | 0 | 2 | 1.71 | 0.311 | 0.937 | 0.70 | 0.400
5 | 4 | 0 | 0 | 3 | 3 | 0 | 0 | 2 | 4 | 0 | 2 | 3 | 3.00 | 0.236 | 0.914 | 0.50 | 0.500
5 | 3 | 0 | 1 | 0 | 4 | 0 | 1 | 4 | 3 | 0 | 3 | 0 | 2.71 | −0.86 | 0.727 | 0.00 | 0.00
8 | 2 | 2 | 5 | 4 | 5 | 3 | 4 | 2 | 5 | 0 | 3 | 0 | 3.50 | 0.612 | 0.973 | 0.625 | 0.536
8 | 2 | 1 | 0 | 0 | 1 | 3 | 2 | 3 | 2 | 1 | 1 | 2 | 1.80 | 0.512 | 0.922 | 0.50 | 0.300
8 | 4 | 0 | 0 | 3 | 2 | 2 | 4 | 2 | 3 | 2 | 2 | 3 | 2.70 | 0.369 | 0.927 | 0.50 | 0.450
8 | 3 | 5 | 1 | 2 | 4 | 2 | 1 | 4 | 3 | 0 | 3 | 0 | 2.80 | −0.92 | 0.681 | −0.13 | −0.12
10 | 3 | 5 | 1 | 2 | 0 | 2 | 1 | 3 | 0 | 4 | 4 | 5 | 3.00 | −1.00 | 0.636 | −0.20 | −0.20
8 | 0 | 5 | 1 | 2 | 0 | 2 | 1 | 0 | 0 | 4 | 4 | 5 | 3.00 | −1.00 | 0.565 | −0.50 | −0.50
The similarity values between the active user and the training users of the sample dataset are listed in the table above. Three similarity measures (PCC, COS, and MWD) give a full positive similarity value for some users with only one common item. PCC with only one common item always gives the maximum similarity value irrespective of the rating means of the two users, because the numerator of the formula always equals the denominator in this case. MWD gives the same similarity value, 0.75, to the three like-minded users with two common items even though their rating means differ.
PCC can identify opposite-minded users easily, as it gives a −1 similarity value to the opposite-minded user with one common item. The results also show that COS cannot identify opposite-minded users, as it gives high similarity values, 0.636 and 0.565, to the last two opposite-minded users.
To get an overall view of each rating mean group, the sequences of similarity values are classified in the table below. Usually, similarity values decrease as we increase the cardinality of the common set because of the increased number of items in the user profile. PCC is very sensitive to the cardinality of the common set because it calculates the rating deviation from the rating mean, not the rating itself. This deviation spans negative and positive values and becomes more variable as we increase the common items. Hence, PCC gives different similarity values for different common sets, and these values decrease as the cardinality of the common set increases. Accordingly, many users may get unfair similarity values, which increases or reduces their contribution to the active user.

PCC’s behavior with opposite-minded users is reasonable, as it always gives negative values with a low deviation. That means PCC has good capabilities for capturing opposite-minded users and hence can easily prevent them from becoming neighbors of a given active user. COS is less sensitive to the common set cardinality; it always gives high values, even for opposite-minded users such as the fourth group in the table below. MWD and MMD give similarity values with less deviation than those of PCC. The weakness of these two measures lies in their ability to capture opposite-minded users: they give zero similarity values to some opposite-minded users (the second and third entries of the fourth group).
Results classification according to the rating mean groups.
Rating mean group | Similarity measure | Sequence of similarity values | Maximum difference
---|---|---|---
High | PCC | 1 → 0.707 → 0.451 → 0.612 | 0.549
 | COS | 1 → 0.987 → 0.975 → 0.973 | 0.027
 | MWD | 1 → 0.75 → 0.6 → 0.625 | 0.4
 | MMD | 0.75 → 0.64 → 0.485 → 0.536 | 0.265
Low | PCC | 1 → 0 → 0.311 → 0.512 | 1
 | COS | 1 → 0.949 → 0.937 → 0.922 | 0.078
 | MWD | 1 → 0.75 → 0.7 → 0.5 | 0.5
 | MMD | 0.444 → 0.375 → 0.4 → 0.3 | 0.144
Medium | PCC | 0 → 0.707 → 0.236 → 0.369 | 0.707
 | COS | 1 → 0.981 → 0.914 → 0.927 | 0.086
 | MWD | 0.5 → 0.75 → 0.5 → 0.5 | 0.25
 | MMD | 0.444 → 0.688 → 0.5 → 0.45 | 0.244
Opposite-minded | PCC | −1 → −0.99 → −0.86 → −0.92 | −0.14
 | COS | 1 → 0.759 → 0.727 → 0.681 | 0.319
 | MWD | −1 → 0 → 0 → −0.13 | −1
 | MMD | −0.89 → 0 → 0 → −0.12 | −0.89
This section discusses the methodology of choosing the dataset for our experiments, the way of dividing the dataset into training and test subsets, and the metrics used for evaluating the system performance. The selected dataset should reflect different mean groups and different cardinalities of the common set to study both effects. The following subsections analyze the MovieLens dataset and select the experiments dataset for this paper.
The one-million MovieLens dataset consists of 1,000,209 ratings assigned by 6,040 users to 3,900 movies [
Some statistics of 1 million MovieLens dataset.
Category | Ratings range | Mean range | Number of ratings | Number of users | Percentage | Total percentage
---|---|---|---|---|---|---
C11 | | | 565 | 13 | 0.2152 | 0.596
C21 | | | 25256 | 15 | 0.2483 |
C31 | >500 | | 130686 | 8 | 0.1325 |
C12 | | | 2759 | 521 | 8.6258 | 21.2582
C22 | | | 142875 | 580 | 9.6026 |
C32 | >500 | | 409259 | 183 | 3.0298 |
C13 | | | 5127 | 2597 | | 78.1457
C23 | | | 137292 | 1918 | |
C33 | >500 | | 146390 | 205 | |
The results of this table show that most users fall in the third mean group, which guides the selection of the dataset for our experiments.
We conduct our experiments on a dataset selected from the 1 M MovieLens dataset [
For the experiments, we use leave-one-out cross-validation: each time, one user of the dataset serves as the test user and the remaining users serve as training users. Thus, each user is used the same number of times for training and once for testing, so for our 108-user dataset there are 108 total users, 107 training users per run, and 108 active users overall.
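The leave-one-out splitting described above can be sketched as:

```python
def leave_one_out(users):
    """Yield (active_user, training_users) pairs: each user serves as the
    test (active) user exactly once, and all the others form the training
    set for that run."""
    for i, u in enumerate(users):
        yield u, users[:i] + users[i + 1:]
```

With 108 users, this produces 108 runs of 107 training users each.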
During the testing phase, the set of the active user’s declared ratings,
We conduct four experiments on the 108-user dataset, one for each similarity measure: PCC, COS, MWD, and MMD. The resulting systems are referred to as CBRS, CVRS, DWRS, and MDRS, respectively.
Each experiment is performed five times with different cardinalities of the common set. First, we require the cardinality of the common set to be greater than or equal to one, and then we raise this threshold in steps of five until we reach 20. The table below lists the five common sets and their codes.
Common sets and their cardinalities.
Common set | Code | Size |
---|---|---|
1 | CS1 | C ≥ 1 |
2 | CS2 | C ≥ 5 |
3 | CS3 | C ≥ 10 |
4 | CS4 | C ≥ 15 |
5 | CS5 | C ≥ 20 |
The performance of each examined CRS is evaluated using items’ coverage, percentage of the correct predictions (PCP), and mean absolute error (MAE) [
Here,
The PCP is the percentage of items correctly predicted by the system out of the total number of items in the test ratings set of the active user. The set of correctly predicted items for a given active user and the total PCP over all active users are defined by the following formulae [
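Since the paper's exact correctness criterion is given by its formulae, the sketch below makes an assumption: a prediction counts as correct when it falls within a tolerance of the true rating.

```python
def pcp(predicted, actual, tol=0.5):
    """Percentage of correct predictions.

    ASSUMPTION: a prediction is correct when it lies within `tol` of the
    true rating; the paper's criterion may differ (e.g. rounding to the
    same rating value).
    """
    correct = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tol)
    return 100.0 * correct / len(actual)
```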
The MAE measures the deviation of predictions generated by the CRS from the true ratings specified by the active user [
Here, the deviation is computed between the predicted rating and the true rating declared by the active user for each item in his test set.
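MAE itself reduces to a one-line average of absolute deviations:

```python
def mae(predicted, actual):
    """Mean absolute error between predicted and true ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```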
The results show that many active users cannot benefit from the systems when the cardinality of the common set is increased. The users’ coverage values for all systems are listed in the table below.
Users’ coverage of CBRS, CVRS, DWRS, and MDRS.
C | CBRS | | CVRS | | DWRS | | MDRS |
---|---|---|---|---|---|---|---|---
 | Number | % | Number | % | Number | % | Number | %
CS1 | 106 | 98.15 | 108 | 100 | 108 | 100 | 108 | 100 |
CS2 | 98 | 90.74 | 101 | 93.52 | 99 | 91.67 | 99 | 91.67 |
CS3 | 80 | 74.07 | 80 | 74.07 | 80 | 74.07 | 80 | 74.07 |
CS4 | 71 | 65.74 | 71 | 65.74 | 70 | 64.82 | 70 | 64.82 |
CS5 | 54 | 50 | 56 | 51.85 | 55 | 50.93 | 55 | 50.93 |
The results show that the users’ coverage is high for low cardinality values and starts decreasing, for all systems, as the cardinality of the common set increases. The lowest value is 50% (CBRS at CS5), which means that half of the active users cannot be served when at least 20 common items are required.
Actually, the similarity measure is a crucial part of any CRS. However, our experiments show that its impact depends largely on the cardinality of the common set. The results of all systems for all metrics, three neighborhood set sizes, and five cardinalities of the common set are listed in the table below.
PCP and coverage of the examined RS with different common set sizes.
System | Neighbors | PCP | | | | | Coverage | | | | | MAE | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
 | | CS1 | CS2 | CS3 | CS4 | CS5 | CS1 | CS2 | CS3 | CS4 | CS5 | CS1 | CS2 | CS3 | CS4 | CS5
CBRS | 10 | 10.15 | 23.22 | 27.83 | 30.18 | 32.09 | 32.05 | 69.62 | 80.08 | 84.95 | 87.99 | 7.02 | 1.66 | 1.09 | 0.9 | 0.62 |
30 | 31.7 | 34.38 | 35.34 | 35.71 | 36.43 | 89.02 | 94.46 | 95.81 | 95.78 | 96.61 | 1.23 | 0.97 | 0.77 | 0.71 | 0.49 | |
50 | 35.53 | 36.11 | 36.24 | 36.25 | 36.74 | 96.49 | 97.19 | 97.25 | 96.67 | 97.12 | 1 | 0.93 | 0.75 | 0.7 | 0.48 | |
CVRS | 10 | 6.74 | 20.24 | 27.44 | 30.5 | 31.98 | 22.98 | 60.33 | 78.13 | 85.05 | 87.39 | 9.02 | 2.15 | 1.08 | 0.85 | 0.63 |
30 | 26.02 | 33.92 | 35.23 | 35.59 | 36.05 | 75.77 | 93.52 | 95.77 | 96.19 | 96.46 | 1.84 | 1.05 | 0.74 | 0.68 | 0.52 | |
50 | 34.27 | 35.49 | 36.13 | 35.98 | 36.6 | 94.46 | 97.04 | 97.77 | 97.48 | 97.44 | 1.08 | 1 | 0.71 | 0.66 | 0.5 | |
DWRS | 10 | 18 | 26.53 | 29.94 | 31.65 | 32.95 | 51.59 | 74.52 | 82.79 | 86.66 | 88.74 | 3.62 | 1.44 | 0.99 | 0.84 | 0.6 |
30 | 34.2 | 35.45 | 36.23 | 36.09 | 36.54 | 91.71 | 94.41 | 95.49 | 95.79 | 96.34 | 1.12 | 0.95 | 0.76 | 0.69 | 0.5 | |
50 | 36.48 | 36.53 | 36.83 | 36.49 | 37.15 | 96.17 | 96.88 | 97.1 | 97.1 | 97.39 | 1.01 | 0.91 | 0.74 | 0.68 | 0.49 | |
MDRS | 10 | 23.08 | 29.22 | 31.46 | 31.78 | 32.95 | 63.93 | 79.67 | 85.41 | 87.43 | 89.16 | 2.41 | 1.3 | 0.95 | 0.83 | 0.6 |
30 | 34.84 | 35.65 | 36.13 | 36.01 | 36.68 | 93.04 | 94.74 | 95.6 | 95.83 | 96.27 | 1.09 | 0.94 | 0.76 | 0.69 | 0.5 | |
50 | 36.47 | 36.48 | 36.75 | 36.64 | 37.45 | 96.37 | 96.81 | 97.06 | 97.09 | 97.39 | 1.01 | 0.91 | 0.74 | 0.68 | 0.49 |
Horizontally, the results improve as the cardinality of the common set increases, and vertically, as the neighborhood set size increases. This improvement narrows as we move in both directions. We may conclude that implementing an RS with a high neighborhood set size and a low cardinality of the common set hides the effectiveness of the similarity measure. In general, increasing the cardinality of the common set has two contradictory effects on the CRS performance. Negatively, it reduces the users’ coverage, so the system cannot help as many users as before. Positively, it enhances the CRS’s ability to identify true neighbors and hence increases the system accuracy.
For more clarification, we computed the improvement percentages between the results of CS1 and CS5 in terms of PCP, coverage, and MAE for all systems. For comparison purposes, we used the following two formulae for measuring the increase improvement percentage (for PCP and coverage) and the decrease improvement percentage (for MAE) [
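The two formulae appear to be the usual relative changes; recomputing them from the rounded table entries lands within about 0.1 percentage points of the reported figures (e.g., CBRS PCP rising from 10.15 at CS1 to 32.09 at CS5 gives about 216.16% versus the reported 216.23%):

```python
def increase_improvement(old, new):
    """Increase improvement percentage, used for PCP and coverage."""
    return 100.0 * (new - old) / old

def decrease_improvement(old, new):
    """Decrease improvement percentage, used for MAE."""
    return 100.0 * (old - new) / old
```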
The improvement percentages of all systems are listed in the table below.
Improvement percentages of the examined RSs for PCP, coverage, and MAE.
Neighbors | PCP | | | | Coverage | | | | MAE | | |
---|---|---|---|---|---|---|---|---|---|---|---|---
 | CBRS | CVRS | DWRS | MDRS | CBRS | CVRS | DWRS | MDRS | CBRS | CVRS | DWRS | MDRS
10 | 216.23 | 374.31 | 83.07 | 42.63 | 174.49 | 280.31 | 72.01 | 39.47 | 91.21 | 92.98 | 83.34 | 75.06 |
30 | 14.93 | 38.54 | 6.83 | 5.27 | 8.53 | 27.3 | 5.05 | 3.47 | 60.35 | 71.91 | 55.63 | 54.34 |
50 | 3.4 | 6.79 | 1.83 | 2.67 | 0.66 | 3.16 | 1.27 | 1.07 | 51.97 | 53.32 | 51.95 | 51.66 |
Another important point is that MDRS obtains the lowest improvement percentages (only 42.63% for PCP with 10 neighbors) even though its overall performance is better than that of the other systems. This indicates that this similarity measure is able to elect representative neighbors from the very beginning, and hence its improvement is slow.
For more clarity, the table below indicates the best system for each metric, neighborhood set size, and cardinality of the common set. MDRS is the best in terms of all metrics for a neighborhood set size of 10. CBRS results approach those of MDRS faster than CVRS as the cardinality of the common set increases. In terms of PCP, CVRS is generally the worst. Vertically, with CS1 and CS2, MDRS and DWRS are the best in terms of PCP and MAE. At this stage, many neighbors ranked at the top by CBRS and CVRS may have only one or two items in common with the active user. These neighbors are not true neighbors because their overlap with the active user is very small.
Best system in terms of PCP, coverage, and MAE.
Neighbors | PCP | | | | | Coverage | | | | | MAE | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
 | CS1 | CS2 | CS3 | CS4 | CS5 | CS1 | CS2 | CS3 | CS4 | CS5 | CS1 | CS2 | CS3 | CS4 | CS5
10 | MDRS | MDRS | MDRS | ||||||||||||
30 | MDRS | DWRS | MDRS | MDRS | CBRS | MDRS | CVRS | CBRS | |||||||
50 | DWRS | MDRS | CBRS | CVRS | CBRS | MDRS | CVRS | CBRS |
PCP of the examined RSs for different cardinalities of the common set and different neighborhood set sizes.
Coverage of the examined RSs for different cardinalities of the common set and different neighborhood set sizes.
MAE of the examined RSs for different cardinalities of the common set and different neighborhood set sizes.
One important way of enhancing the CRS accuracy is to select a similarity measure that produces a close set of neighbors. The results show that the effect of the similarity measure depends on the cardinality of the common set, as the system accuracy improves for high values of this cardinality. Moreover, this effect becomes less significant as we increase the number of neighbors of the active user, because a large number of neighbors compensate for one another and the differences between systems become very small.
The modified mean difference weights similarity measure outperforms the other systems in many cases because it takes the rating means into consideration. In general, increasing the cardinality of the common set has two contradictory effects on the CRS performance. Negatively, it reduces the users’ coverage, so the system cannot help as many users as before. Positively, it enhances the CRS’s ability to identify true neighbors and hence increases the system accuracy. The results also show that some similarity measures outperform others for a specific cardinality of the common set, while the roles may change for another cardinality. For example, MDRS performs better than both CBRS and CVRS for CS1 and CS2.
Many approaches have been proposed for alleviating the effect of small sets of common items; some try to predict the missing items, while others devalue the contribution of the corresponding users. However, our view for future work is to propose new techniques that rely directly on the actual user data, without any predicted values to fill the missing ones.
The author declares that there are no competing interests.