Granular association rules reveal patterns hidden in many-to-many relationships which are common in relational databases. In recommender systems, these rules are appropriate for cold-start recommendation, where a customer or a product has just entered the system. An example of such rules might be “40% men like at least 30% kinds of alcohol; 45% customers are men and 6% products are alcohol.” Mining such rules is a challenging problem due to pattern explosion. In this paper, we build a new type of parametric rough sets on two universes and propose an efficient rule mining algorithm based on the new model. Specifically, the model is deliberately defined such that the parameter corresponds to one threshold of rules. The algorithm benefits from the lower approximation operator in the new model. Experiments on two real-world data sets show that the new algorithm is significantly faster than an existing algorithm, and the performance of recommender systems is stable.
Relational data mining approaches [
Granular association rules [
A granular association rule mining problem is defined as finding all granular association rules given thresholds on four measures [
In this paper, we propose a new type of parametric rough sets on two universes to study the granular association rule mining problem. We borrow some ideas from variable precision rough sets [
With the lower approximation of the proposed parametric rough sets, we design a backward algorithm for rule mining. This algorithm starts from the second universe and proceeds to the first one; hence it is called a backward algorithm. Compared with an existing sandwich algorithm [
Experiments are undertaken on two real-world data sets. One is the course selection data from Minnan Normal University during the semester between 2011 and 2012. The other is the publicly available MovieLens data set. Results show that
The rest of the paper is organized as follows. Section
In this section, we revisit granular association rules [
The data model is based on information systems and binary relations.
An example of information system is given by Table
A many-to-many entity-relationship system.
Customer
| CID | Name | Age | Gender | Married | Country | Income | NumCars |
|---|---|---|---|---|---|---|---|
| c1 | Ron | 20–29 | Male | No | USA | 60k–69k | 0–1 |
| c2 | Michelle | 20–29 | Female | Yes | USA | 80k–89k | 0–1 |
| c3 | Shun | 20–29 | Male | No | China | 40k–49k | 0–1 |
| c4 | Yamago | 30–39 | Female | Yes | Japan | 80k–89k | 2 |
| c5 | Wang | 30–39 | Male | Yes | China | 90k–99k | 2 |
Product
| PID | Name | Country | Category | Color | Price |
|---|---|---|---|---|---|
| p1 | Bread | Australia | Staple | Black | 1–9 |
| p2 | Diaper | China | Daily | White | 1–9 |
| p3 | Pork | China | Meat | Red | 1–9 |
| p4 | Beef | Australia | Meat | Red | 10–19 |
| p5 | Beer | France | Alcohol | Black | 10–19 |
| p6 | Wine | France | Alcohol | White | 10–19 |
Buys
| CID \ PID | p1 | p2 | p3 | p4 | p5 | p6 |
|---|---|---|---|---|---|---|
| c1 | 1 | 1 | 0 | 1 | 1 | 0 |
| c2 | 1 | 0 | 0 | 1 | 0 | 1 |
| c3 | 0 | 1 | 0 | 0 | 1 | 1 |
| c4 | 0 | 1 | 0 | 1 | 1 | 0 |
| c5 | 1 | 0 | 0 | 1 | 1 | 1 |
In an information system, any
The following definition was employed by Yao and Deng [
A granule is a triple [
The set of objects that are instances of
The support of the granule is the size of
Let
When
In a database, a binary relation is more often stored as a table with two foreign keys, which saves storage. For the convenience of illustration, here we represented it with an
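For concreteness, the two representations can be sketched in Python (the customer and product IDs c1–c5 and p1–p6 are illustrative, matching the running example only in spirit):

```python
# 1. As a table of two foreign keys -- compact when the relation is sparse.
buys_pairs = {("c1", "p1"), ("c1", "p2"), ("c1", "p4"), ("c1", "p5"),
              ("c2", "p1"), ("c2", "p4"), ("c2", "p6"),
              ("c3", "p2"), ("c3", "p5"), ("c3", "p6"),
              ("c4", "p2"), ("c4", "p4"), ("c4", "p5"),
              ("c5", "p1"), ("c5", "p4"), ("c5", "p5"), ("c5", "p6")}

# 2. As a boolean matrix -- convenient for illustration and constant-time lookup.
customers = ["c1", "c2", "c3", "c4", "c5"]
products = ["p1", "p2", "p3", "p4", "p5", "p6"]
matrix = [[(c, p) in buys_pairs for p in products] for c in customers]
print(matrix[0])  # row for c1: [True, True, False, True, True, False]
```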
With Definitions
A many-to-many entity-relationship system (MMER) is a 5-tuple
An example of MMER is given in Tables
Now we come to the central definition of granular association rules.
A granular association rule is an implication of the form
According to (
From the MMER given in Tables
One has
This rule raises four natural questions: How many customers are men? How many products are alcohol? Do all men like alcohol? Do all kinds of alcohol favor men?
An example of complete granular association rules with measures specified is “40% men like at least 30% kinds of alcohol; 45% customers are men and 6% products are alcohol.” Here 45%, 6%, 40%, and 30% are the source coverage, the target coverage, the source confidence, and the target confidence, respectively. These measures are defined as follows.
The source coverage of a granular association rule is
The target coverage of
There is a tradeoff between the source confidence and the target confidence of a rule. Consequently, neither value can be obtained directly from the rule. To compute any one of them, we should specify the threshold of the other. Let
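As an illustration, the measures of a rule such as “men ⇒ alcohol” can be computed directly from the Buys relation. The following is a Python sketch, not the paper's implementation; the IDs and the `source_confidence` helper are ours:

```python
# Computing the four measures of the rule "men => alcohol" on an
# illustrative Buys relation (IDs c1..c5 / p1..p6 are ours).
buys = {"c1": {"p1", "p2", "p4", "p5"}, "c2": {"p1", "p4", "p6"},
        "c3": {"p2", "p5", "p6"}, "c4": {"p2", "p4", "p5"},
        "c5": {"p1", "p4", "p5", "p6"}}
U = set(buys)                                # all customers
V = {"p1", "p2", "p3", "p4", "p5", "p6"}     # all products

men = {"c1", "c3", "c5"}                     # granule on the customer universe
alcohol = {"p5", "p6"}                       # granule on the product universe

source_coverage = len(men) / len(U)          # 3/5 = 0.6
target_coverage = len(alcohol) / len(V)      # 2/6 = 0.33...

def source_confidence(C, D, tc):
    """Fraction of customers in C who like at least a fraction tc of D."""
    return sum(len(buys[c] & D) / len(D) >= tc for c in C) / len(C)

print(source_confidence(men, alcohol, 1.0))  # 2/3: only c3 and c5 buy both
```

Lowering the target confidence threshold from 1.0 to 0.5 raises the source confidence to 1.0 in this toy relation, which illustrates the tradeoff between the two confidence measures.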
Let
The relationships between rules are interesting to us. As an example, let us consider the following rule.
One has
Rule
A straightforward rule mining problem is as follows.
Since both
In this section, we first review rough approximations [
The classical rough sets [
Let
These concepts can be employed for set approximation or classification analysis. For set approximation, the interval
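As a refresher, the classical approximations can be sketched as follows (a minimal Python sketch, assuming the equivalence relation is given as a partition of the universe; the example universe is ours):

```python
def lower_upper(partition, X):
    """Pawlak lower/upper approximations of X w.r.t. equivalence classes."""
    lower, upper = set(), set()
    for block in partition:
        if block <= X:      # class contained in X -> certainly in X
            lower |= block
        if block & X:       # class overlapping X -> possibly in X
            upper |= block
    return lower, upper

partition = [{1, 2}, {3, 4}, {5}]
print(lower_upper(partition, {1, 2, 3}))  # ({1, 2}, {1, 2, 3, 4})
```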
Ziarko [
Let
The equivalence relation
For the convenience of discussion, we rewrite his definition as follows.
Let
Note that
Ziarko [
Gong and Sun [
Since our data model is concerned with two universes, we should consider computation models for this type of data. Rough sets on two universes have been defined in [
Let
From this definition we know immediately that, for
Now we explain these notions through our example.
We have the following property concerning the monotonicity of these approximations.
Let
That is, as the object subset grows, the lower approximation shrinks while the upper approximation grows. It may seem counterintuitive to people in the rough set community that the lower approximation shrinks in this case. In fact, according to Wong et al. [
We argue that different definitions of the lower approximation are appropriate for different applications. Suppose that there is a clinic system where
In our example presented in Section
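Under the reading suggested above, for a set X of customers the lower approximation collects the products bought by all of X, and the upper approximation those bought by at least one member of X. A Python sketch with illustrative IDs:

```python
# Two-universe approximations in the running example (illustrative IDs).
buys = {"c1": {"p1", "p2", "p4", "p5"}, "c2": {"p1", "p4", "p6"},
        "c3": {"p2", "p5", "p6"}, "c4": {"p2", "p4", "p5"},
        "c5": {"p1", "p4", "p5", "p6"}}
V = {"p1", "p2", "p3", "p4", "p5", "p6"}

def lower_approx(buys, X, V):
    """Products bought by ALL customers in X."""
    return {p for p in V if all(p in buys[c] for c in X)}

def upper_approx(buys, X, V):
    """Products bought by AT LEAST ONE customer in X."""
    return {p for p in V if any(p in buys[c] for c in X)}

print(lower_approx(buys, {"c3", "c5"}, V))        # the set {p5, p6}
print(lower_approx(buys, {"c1", "c3", "c5"}, V))  # shrinks to {p5}
```

Enlarging X from {c3, c5} to {c1, c3, c5} shrinks the lower approximation and enlarges the upper one, matching the monotonicity property stated above.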
We are looking for very strong rules through the lower approximation indicated in Definition
Given a group of people, the number of products that favor all of them is often quite small. On the other hand, the set of products that favor at least one of them is usually too large to be meaningful. Similar to probabilistic rough sets, we need to introduce one or more parameters to the model.
To cope with the source confidence measure introduced in Section
Let
We do not discuss the upper approximation in the new context due to its lack of clear semantics.
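Under the reading above, the parametric lower approximation keeps a product when at least a fraction β of the customers in X buy it; β = 1 recovers the “bought by all” case. A hedged sketch (the function name and IDs are ours):

```python
# Parametric lower approximation (sketch): a product survives when at
# least a fraction beta of the customers in X buy it.
buys = {"c1": {"p1", "p2", "p4", "p5"}, "c2": {"p1", "p4", "p6"},
        "c3": {"p2", "p5", "p6"}, "c4": {"p2", "p4", "p5"},
        "c5": {"p1", "p4", "p5", "p6"}}
V = {"p1", "p2", "p3", "p4", "p5", "p6"}

def parametric_lower(buys, X, V, beta):
    return {p for p in V if sum(p in buys[c] for c in X) / len(X) >= beta}

X = {"c1", "c3", "c5"}
print(parametric_lower(buys, X, V, 1.0))    # only the product bought by all of X
print(parametric_lower(buys, X, V, 2 / 3))  # grows as beta decreases
```

As β decreases the approximation grows monotonically, in line with the monotonicity analysis that follows.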
From this definition we know immediately that the lower approximation of
The following property indicates that
Let
One has
The following property shows the monotonicity of
Let
However, given
Similar to the discussion in Section
In our previous work [
To make use of the concept proposed in Section
With this equation, we propose an algorithm to deal with Problem 1. The algorithm is listed in Algorithm
(The algorithm listing spans 11 numbered lines; line (8) outputs a rule.)
Search in
Search in
For each granule obtained in Step
Check possible rules regarding
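The steps above can be sketched as follows (a Python sketch with illustrative IDs and granule names, not the paper's exact pseudocode): for each product granule D meeting the target coverage threshold, the parametric lower approximation collects, in one pass, every customer who likes at least a fraction tc of D; the source confidence of any customer granule C is then a simple set intersection.

```python
# Sketch of the backward idea. Thresholds: ms/mt are the source/target
# coverage thresholds, sc/tc the source/target confidence thresholds.
buys = {"c1": {"p1", "p2", "p4", "p5"}, "c2": {"p1", "p4", "p6"},
        "c3": {"p2", "p5", "p6"}, "c4": {"p2", "p4", "p5"},
        "c5": {"p1", "p4", "p5", "p6"}}
U = set(buys)
V = {"p1", "p2", "p3", "p4", "p5", "p6"}

def backward_mine(buys, U, V, cust_granules, prod_granules, ms, mt, sc, tc):
    rules = []
    for d_name, D in prod_granules.items():
        if len(D) / len(V) < mt:              # target coverage threshold
            continue
        # parametric lower approximation: who likes at least tc of D?
        X = {c for c in U if len(buys[c] & D) / len(D) >= tc}
        for c_name, C in cust_granules.items():
            if len(C) / len(U) < ms:          # source coverage threshold
                continue
            if len(C & X) / len(C) >= sc:     # source confidence threshold
                rules.append((c_name, d_name))
    return rules

rules = backward_mine(buys, U, V,
                      {"men": {"c1", "c3", "c5"}},
                      {"alcohol": {"p5", "p6"}},
                      ms=0.1, mt=0.1, sc=0.6, tc=1.0)
print(rules)  # [('men', 'alcohol')]
```

The relation is scanned once per product granule rather than once per (customer granule, product granule) pair, which is the source of the speedup over the sandwich algorithm.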
Because the algorithm starts from the right-hand side of the rule and proceeds to the left-hand side, it is called a backward algorithm. It is necessary to compare the time complexities of the existing sandwich algorithm and our new backward algorithm. Both algorithms share Steps
According to the loops, the time complexity of Algorithm
Intuitively, the backward algorithm avoids computing
The space complexities of these two algorithms are also important. To store the relation
The main purpose of our experiments is to answer the following questions. Does the backward algorithm outperform the sandwich algorithm? How does the number of rules change with the number of objects? How does the run time of the algorithm change with the number of objects? How does the number of rules vary with the thresholds? How does the performance of cold-start recommendation vary between the training and testing sets?
We collected two real-world data sets for experimentation. One is course selection, and the other is movie rating. These data sets are quite representative for applications.
The course selection system often serves as an example in textbooks to explain the concept of many-to-many entity-relationship diagrams. Hence it is appropriate to produce meaningful granular association rules and test the performance of our algorithm. We obtained a data set from the course selection system of Minnan Normal University. (The authors would like to thank Mrs. Chunmei Zhou for her help in the data collection.) Specifically, we collected data during the semester between 2011 and 2012. There are 145 general education courses in the university. 9,654 students took part in course selection. The database schema is as follows. Student (student ID, name, gender, birth-year, politics-status, grade, department, nationality, and length of schooling). Course (course ID, credit, class-hours, availability, and department). Selects (student ID, course ID).
Our algorithm currently supports only nominal data. For this data set, all attributes are treated as nominal directly, so no discretization approach is needed to convert numeric values into nominal ones. We also removed student names and course names from the original data, since they are useless for generating meaningful rules.
The MovieLens data set assembled by the GroupLens project is widely used in recommender systems (see, e.g., [ ). We preprocess it as follows. First, we remove movie names; they are not useful in generating meaningful granular association rules. Second, we use the release year instead of the release date, so that the resulting granules are more reasonable. Third, we select a single movie genre. In the original data, the genre attribute is multivalued, since one movie may fall into more than one genre; for example, a movie can be both animation and children's. Unfortunately, granular association rules do not support this type of data at this time. Since the main objective of this work is to compare the performance of algorithms, we use a simple approach to deal with this issue: we sort movie genres according to the number of users they attract and keep only the highest-priority genre for each movie. We adopt the following priority (from high to low): comedy, action, thriller, romance, adventure, children, crime, sci-fi, horror, war, mystery, musical, documentary, animation, western, film-noir, fantasy, and unknown.
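The genre-selection step can be sketched as follows (Python; the genre spellings are normalized to lowercase and the function name is ours):

```python
# Keep only the highest-priority genre for each movie
# (priority list taken from the text, highest first).
PRIORITY = ["comedy", "action", "thriller", "romance", "adventure",
            "children", "crime", "sci-fi", "horror", "war", "mystery",
            "musical", "documentary", "animation", "western",
            "film-noir", "fantasy", "unknown"]
RANK = {genre: i for i, genre in enumerate(PRIORITY)}

def main_genre(genres):
    """Collapse a multivalued genre set to its single highest-priority genre."""
    return min(genres, key=RANK.__getitem__)

print(main_genre({"animation", "children"}))  # children
```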
Our database schema is as follows. User (user ID, age, gender, and occupation), Movie (movie ID, release year, and genre), Rates (user ID, movie ID).
There are 8 user age intervals, 21 occupations, and 71 release years. Similar to the course selection data set, all these data are viewed nominally and processed directly. We employ neither discretization nor symbolic value partition [
We undertake five sets of experiments to answer the questions proposed at the beginning of this section.
We compare the efficiency of the backward and sandwich algorithms. We consider only the run time of Lines 3 through 11, since this code is where the two algorithms differ.
For the course selection data set, when
Run time information: (a) course selection, (b) MovieLens (3,800 users).
Basic operations information: (a) course selection, (b) MovieLens (3,800 users).
For the MovieLens data set, we employ the data set with 3,800 users and 3,952 movies. We use the following settings:
Now we study how the number of rules changes with the increase of the data set size. The experiments are undertaken only on the MovieLens data set. We use the following settings:
First we look at the number of concepts satisfying the source confidence threshold
Number of concepts on users for MovieLens: (a)
Second we look at the number of granular association rules satisfying all four thresholds. Figure
Number of granular association rules for MovieLens: (a)
We look at the run time change with the increase of the number of users. The time complexity of the algorithm is given by (
Run time on MovieLens: (a)
Figure
Number of granular association rules: (a) course selection, (b) MovieLens.
Now we study how the performance of cold-start recommendation varies between the training and testing sets. The experiments are undertaken only on the MovieLens data set. Here we employ two data sets: one with 1,000 users and 3,952 movies, and the other with 3,000 users and 3,952 movies. We divide the users into two parts to serve as the training and testing sets. The other settings are as follows: the training set percentage is 60%,
The performance of the recommendation does not change much between the training and testing sets. This indicates that the recommender is stable.
Accuracy of cold-start recommendation on MovieLens: (a) 1,000 users and 3,952 movies, (b) 3,000 users and 3,952 movies.
Now we can answer the questions proposed at the beginning of this section. The backward algorithm outperforms the sandwich algorithm: it is more than 2 times and 3 times faster on the course selection and MovieLens data sets, respectively. Therefore our parametric rough sets on two universes are useful in applications. The number of rules does not change much for different numbers of objects, so it is not necessary to collect very large data sets to obtain meaningful granular association rules; for the MovieLens data set, for example, 3,000 users are sufficient. The run time is nearly linear in the number of objects, so the algorithm is scalable from the viewpoint of time complexity. However, the relation table might be rather big, which could become a bottleneck of the algorithm. The number of rules decreases dramatically with the increase of the thresholds. The performance of cold-start recommendation is stable on the training and the testing sets with the increase of the thresholds.
In this paper, we have proposed a new type of parametric rough sets on two universes to deal with the granular association rule mining problem. The lower approximation operator has been defined, and its monotonicity has been analyzed. With the help of the new model, a backward algorithm for the granular association rule mining problem has been proposed. Experimental results on two real-world data sets indicate that the new algorithm is significantly faster than the existing sandwich algorithm, and that the performance of the recommender is stable on the training and testing sets. In summary, this work applies rough set theory to recommender systems and is one step toward broader application of rough set theory and granular computing. In the future, we will refine our approach and compare its performance with other recommendation approaches.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work is in part supported by the National Science Foundation of China under Grant nos. 61379089, 61379049, and 61170128, the Fujian Province Foundation of Higher Education under Grant no. JK2012028, the Key Project of the Education Department of Fujian Province under Grant no. JA13192, and the Postgraduate Education Innovation Base for Computer Application Technology, Signal and Information Processing of Fujian Province (no. [2008]114, High Education of Fujian).