A New Classification Approach Based on Multiple Classification Rules

A good classifier can correctly predict new data whose class label is unknown, so it is important to construct a high-accuracy classifier. Hence, classification techniques are very useful in ubiquitous computing. Associative classification achieves higher classification accuracy than some traditional rule-based classification approaches. However, it also has two major deficiencies. First, it generates a very large number of association classification rules, especially when the minimum support is set low, which makes it difficult to select a high-quality rule set for classification. Second, the accuracy of associative classification depends on the settings of the minimum support and the minimum confidence. In contrast, some improved traditional rule-based classification approaches produce a classification rule set that plays an important role in prediction. Thus, some of these improved approaches not only achieve better efficiency than associative classification but also achieve higher accuracy. In this paper, we put forward a new classification approach called CMR (classification based on multiple classification rules). CMR combines the advantages of both associative classification and rule-based classification. Our experimental results show that CMR achieves higher accuracy than some traditional rule-based classification methods.


Introduction
Classification is a pervasive data mining problem with many applications, such as medical analysis, fraud detection, and network security [1]. A good classifier can correctly classify an object whose class label is unknown. Therefore, building an accurate and efficient classifier is one of the essential tasks of data mining, and classification techniques are quite useful in ubiquitous computing [2][3][4][5]. The classification problem has been studied extensively by the research community, and various types of classification approaches have been proposed (e.g., KNN [6], Bayesian classifiers [7], decision trees [8], support vector machines [9], neural networks [10], and associative classifiers [11]). Classification is generally divided into two steps. First, a classification model is constructed from the training dataset. Second, the model is used to predict the class labels of new instances.
In recent years, associative classification has been investigated widely [11][12][13][14][15][16][17][18][19]. It integrates association rule mining and classification. Associative classification induces from the training dataset a set of association classification rules that satisfy user-specified minimum support and minimum confidence thresholds. It then selects a small set of high-quality association classification rules and uses this rule set for prediction. The experimental results in [11, 12] indicate that associative classification achieves higher accuracy than some traditional classification approaches such as decision trees [8]. Compared with traditional rule-based classification approaches, associative classification has two characteristics: (1) it generates a large number of association classification rules, and (2) the support and confidence measures are used to evaluate the significance of association classification rules. However, associative classification has some weaknesses. First, association rule mining often generates a very large number of association classification rules, especially when the training dataset is large and dense, and it takes great effort to select a set of high-quality classification rules from among them. Second, the accuracy of associative classification depends on the settings of the minimum support and the minimum confidence. Third, the efficiency of associative classification is low when the minimum support is set low and the training dataset is large.
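To make the rule-generation step of associative classification concrete, here is a minimal brute-force sketch in the style of CBA [11]. It is not CBA's actual Apriori-based implementation. The function name `mine_class_rules`, the representation of an example as a set of attribute values paired with a class label, and the `max_len` cap are illustrative assumptions. Every pattern of up to `max_len` attribute values is paired with every class, and the resulting rule is kept if it meets both thresholds.

```python
from itertools import combinations


def mine_class_rules(examples, min_sup, min_conf, max_len=2):
    """Brute-force associative rule mining (illustrative, not CBA's algorithm).

    examples: list of (set_of_attribute_values, class_label).
    Returns (pattern, class, support, confidence) for every pattern X of up to
    max_len values and class c with sup(X -> c) >= min_sup and conf >= min_conf.
    """
    n = len(examples)
    items = sorted({v for values, _ in examples for v in values})
    classes = sorted({c for _, c in examples})
    rules = []
    for length in range(1, max_len + 1):
        for pattern in combinations(items, length):
            pset = set(pattern)
            # class labels of all examples whose attribute values contain X
            covered = [c for values, c in examples if pset <= values]
            if not covered:
                continue
            for c in classes:
                hits = covered.count(c)
                sup = hits / n                 # fraction of all examples with X and c
                conf = hits / len(covered)     # fraction of X-examples labelled c
                if sup >= min_sup and conf >= min_conf:
                    rules.append((pattern, c, sup, conf))
    return rules


if __name__ == "__main__":
    data = [({1, 2}, "a"), ({1, 3}, "a"), ({1, 2}, "b"), ({2, 3}, "b")]
    for rule in mine_class_rules(data, min_sup=0.25, min_conf=0.6):
        print(rule)
```

Even on this toy input, the candidate patterns grow combinatorially with `max_len` and the number of distinct attribute values; on a large, dense dataset with a low minimum support this is exactly the rule explosion described above.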
Traditional rule-based classification approaches have also been studied extensively [20][21][22][23][24][25]. Some traditional rule-based algorithms, such as FOIL [20], CN2 [21], and ELEM2 [22], discover a small set of high-quality classification rules. They employ a sequential covering methodology: they induce one rule at a time and then remove the positive instances covered by each newly discovered rule. This rule induction process is greedy, because a heuristic function is used to select the attribute value with which each rule is extended. These algorithms can achieve higher efficiency than associative classification. However, the accuracy of some traditional rule-based classification approaches may not be as high on some datasets. One reason is that they usually generate a much smaller set of classification rules, especially when the training dataset is small. Some novel rule-based classification approaches have been proposed recently [22][23][24][25]. They generate more classification rules than one-rule-at-a-time algorithms and achieve higher accuracy. CPAR [23] keeps all close-to-the-best attribute values during the rule-building process. By doing so, CPAR can select more attribute values and build several rules at one time, so it discovers more classification rules than FOIL. CPAR achieves higher average classification accuracy than the associative classification algorithm CBA [11]. The CATW algorithm [24] differs from PRM [23]: after an example is covered by a rule, CATW decreases both the tuple weight of the example and the weights of the attribute values that the rule contains. As a result, CATW generates a much larger set of classification rules than FOIL, and it achieves higher accuracy than CPAR and CMAR [12] in many cases. CMER [25] [...]

The outline of this paper is as follows. In Section 2, we comment on related work. In Section 3, we first give some definitions. Second, we use an example to describe the main ideas of how CMR induces classification rules. Third, we develop the algorithm of CMR. Finally, we discuss how to predict the class labels of unseen examples using the classification rule set generated by CMR. We report our experimental results in Section 4 and conclude the paper in Section 5.

Related Work
There are many approaches in the classification domain. They differ greatly in how they induce classification rules from the training dataset and how they classify an object whose class label is unknown. Our work is related to existing research on various classification methods. Therefore, we comment on some of these methods below, covering both the extraction of classification rules and the testing strategies.
(1) Generating the Set of Classification Rules from the Training Dataset. The decision tree method selects the attribute with the highest information gain and builds a decision tree that serves as a classification rule set. It generates a small rule set, especially when the training dataset is small. Yin and Han [23] introduce three ways of generating the set of classification rules from the training dataset: FOIL, PRM, and CPAR. FOIL uses foil gain to select the best attribute value from the whole training dataset. It then selects the best attribute value in the conditional database of the newly selected attribute value, adding attribute values one by one to produce a classification rule. FOIL repeatedly searches for the current best rule and removes all the positive examples covered by that rule until all the positive examples in the dataset are removed. Like a decision tree, FOIL induces a small rule set. PRM modifies FOIL. PRM uses foil gain to select the best attribute value and adds attribute values one by one. However, after an example is covered by a rule, instead of being removed, its weight is decreased by multiplying it by a factor. PRM generates more rules than FOIL, and each positive example is usually covered more than once. CPAR builds rules by adding attribute values one by one, similarly to PRM. CPAR selects the best attribute value and also keeps all close-to-the-best attribute values, connecting the best attribute value with the attribute values that are close to it. By doing so, CPAR selects more attribute values and builds several rules at one time. After a rule is generated, CATW [24] not only decreases the weight of each covered example by multiplying it by a factor but also decreases the weights of the attribute values that the rule matches. ELEM2 takes into account the support of an attribute value and selects the most relevant attribute value for formulating rules. CMER [25] uses support and foil gain to select several important attribute values to build the candidate set and the seed set. It connects the seed set with the candidate set and generates several classification rules at a time. CAEP [26] proposes a new measure, growth rate, for finding patterns. It produces emerging patterns (EPs) whose growth rate is greater than or equal to a minimum threshold and then induces a classification rule set from these patterns. In associative classification [11], association rule mining is used to generate candidate rules, which include all conjunctions of attribute values that meet the minimum support threshold. The confidence measure is then used to generate the association classification rule set.
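The difference between FOIL's and PRM's covering strategies comes down to how covered positive examples are treated after each rule. The minimal sketch below isolates that single step. The function name `update_weights` is hypothetical, and the decay factor 2/3 used in the usage lines is a hypothetical value chosen for illustration.

```python
def update_weights(weights, covered, decay=None):
    """Update per-example weights after a new rule covers the examples in `covered`.

    decay=None  -> FOIL-style covering: covered positives are removed (weight 0).
    decay=alpha -> PRM-style covering: covered positives stay, weight *= alpha.
    """
    new = list(weights)
    for i in covered:
        new[i] = 0.0 if decay is None else new[i] * decay
    return new


if __name__ == "__main__":
    w = [1.0, 1.0, 1.0, 1.0]
    print(update_weights(w, [0, 1]))  # FOIL: [0.0, 0.0, 1.0, 1.0]
    # PRM with a hypothetical decay factor of 2/3: example 1 is covered twice,
    # so its weight shrinks twice but never drops to zero.
    w1 = update_weights(w, [0, 1], decay=2 / 3)
    w2 = update_weights(w1, [1, 2], decay=2 / 3)
    print(w2)
```

Because PRM never zeroes a weight, a positive example can contribute to the foil gain of several later rules, which is why PRM generates more rules than FOIL.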
In this paper, we propose a new classification approach called CMR. First, CMR uses support and confidence to generate classification rules of length one. Second, CMR connects the seed set with the candidate set and generates rules of length two. Third, CMR builds a new seed set and then induces rules by connecting each pattern in the new seed set with the best attribute values in its conditional database. Finally, CMR removes the examples covered by the generated rules and iterates this process. CMR induces many rules at a time.
(2) Classifying New Objects. When predicting a class label, CBA uses the best rule whose body matches the example. CMAR uses multiple association classification rules and a weighted chi-square measure to evaluate the strength of a group of rules under both conditional support and class distribution. CPAR selects the best k rules for each class and compares the average expected accuracy of each class. CAEP sums the contributions of the individual emerging patterns (EPs). ELEM2 computes a decision score for each class indicated by the matched rules. CMR selects the best k rules for each class and compares the average decision scores.

CMR: Classification Based on Multiple Classification Rules
In this section, we first give some definitions. Second, we use an example to describe the rule-mining process of CMR. Third, we develop the algorithm of CMR. Finally, we give the measure of the significance of the classification rules and describe how to predict the classes of new examples.

Definition 2 (support). The support of a pattern X in a dataset D is defined as follows:

$$\mathrm{sup}(X) = \frac{\mathrm{count}(X)}{|D|},$$

where count(X) is the number of examples in D that contain pattern X and |D| is the number of examples in dataset D.
Definition 3 (confidence). The confidence of a pattern X with respect to a class c is defined as follows:

$$\mathrm{conf}(X \rightarrow c) = \frac{\mathrm{count}(X \cup c)}{\mathrm{count}(X)},$$

where count(X ∪ c) is the number of examples that contain pattern X and have class c.

Let the minimum support be 20% and the minimum confidence be 60%. We induce classification rules for the class c whose decision-attribute value is 89. First, CMR generates the rules of length one, using the minimum support and the minimum confidence as the criteria. We calculate the support and the confidence of every attribute value in the positive examples. If the support of an attribute value V is greater than the minimum support and the confidence of V is greater than the minimum confidence, then V → c is selected as a rule of length one. The resulting rules of length one are shown in Table 2.
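The first step of CMR can be sketched in a few lines, with the caveats that follow. The sketch reads "calculate the support ... in the positive examples" as sup(V) = |P_V| / |P|, and it takes confidence from Definition 3 as |P_V| / (|P_V| + |N_V|), where P_V and N_V are the positive and negative examples containing V. That reading, the function name `length_one_rules`, and the toy data (small integers standing in for attribute values such as 88 or 11, not the contents of Table 1) are all assumptions. Attribute values that are frequent but not confident enough are returned separately, since they are the ones that can still be extended.

```python
def length_one_rules(pos, neg, min_sup, min_conf):
    """Split attribute values into length-one rules and frequent-only values.

    pos / neg: lists of sets of attribute values (positive / negative examples).
    Assumed reading: sup(V) = |P_V| / |P| and conf(V -> c) = |P_V| / (|P_V| + |N_V|).
    Returns two dicts {V: (support, confidence)}.
    """
    rules, frequent = {}, {}
    for v in sorted({x for ex in pos for x in ex}):
        p_v = sum(v in ex for ex in pos)   # positive examples containing V
        n_v = sum(v in ex for ex in neg)   # negative examples containing V
        sup = p_v / len(pos)
        conf = p_v / (p_v + n_v)
        if sup > min_sup and conf > min_conf:
            rules[v] = (sup, conf)         # V -> c is a rule of length one
        elif sup > min_sup:
            frequent[v] = (sup, conf)      # frequent, but confidence too low
    return rules, frequent


if __name__ == "__main__":
    pos = [{1, 2}, {1, 3}, {1, 2}, {4, 2}]
    neg = [{1, 5}, {2, 5}, {3, 5}]
    print(length_one_rules(pos, neg, min_sup=0.2, min_conf=0.6))
```

Both comparisons are strict, matching the "greater than" wording of the text.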

Inducing Rules.
Second, CMR constructs the candidate set and the seed set. If the foil gain of an attribute value V in the positive examples is greater than zero, then V is selected as an element of the candidate set. The average foil gain of all elements in the candidate set is used as the minimum foil gain threshold. If the foil gain of an attribute value V in the candidate set is greater than the minimum foil gain, then V is selected as an element of the seed set. The candidate set is shown in Table 3. In this example, the seed set consists of a single element, the best attribute value in the candidate set: attribute value 88, with foil gain 1.150 and support 70%.
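The candidate-set and seed-set construction is sketched below under stated assumptions. It uses the FOIL gain in the form used by FOIL and CPAR [23], gain(V) = |P_V| (log2(|P_V| / (|P_V| + |N_V|)) - log2(|P| / (|P| + |N|))), and assumes CMR's foil gain takes this form. The base-2 logarithm rescales every gain by the same constant, so it changes neither the sign test nor which values beat the average. Function names and toy data are hypothetical.

```python
import math


def foil_gain(p_v, n_v, p, n):
    """FOIL gain of a body covering p_v positives / n_v negatives,
    measured against the current p positives and n negatives."""
    if p_v == 0:
        return 0.0
    return p_v * (math.log2(p_v / (p_v + n_v)) - math.log2(p / (p + n)))


def candidate_and_seed_sets(pos, neg):
    """Return (candidate set, seed set) as {attribute value: foil gain} dicts."""
    p, n = len(pos), len(neg)
    candidates = {}
    for v in sorted({x for ex in pos for x in ex}):
        gain = foil_gain(sum(v in ex for ex in pos), sum(v in ex for ex in neg), p, n)
        if gain > 0:                          # candidate set: positive foil gain
            candidates[v] = gain
    if not candidates:
        return {}, {}
    min_gain = sum(candidates.values()) / len(candidates)  # average gain as threshold
    seeds = {v: g for v, g in candidates.items() if g > min_gain}
    return candidates, seeds


if __name__ == "__main__":
    pos = [{1, 2}, {1, 3}, {1, 2}, {4, 2}]
    neg = [{1, 5}, {2, 5}, {3, 5}]
    cands, seeds = candidate_and_seed_sets(pos, neg)
    print(sorted(cands), sorted(seeds))
```

On the toy data, value 3 is as common among negatives as among positives, so its gain is negative and it never enters the candidate set.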
Third, CMR connects the seed set with the candidate set to produce patterns. If the confidence of a pattern X is greater than the minimum confidence, then X → c is a classification rule of length two. If the support of X is greater than the minimum support and the foil gain of X is greater than the minimum foil gain, then X is selected as an element of the new seed set. In this example, the new seed set consists of only one element, the pattern (88, 11).
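The connection step might look like the sketch below, under two assumptions the text leaves open. First, the foil gain of a pattern X is measured against the current positive and negative examples, exactly as for single attribute values. Second, a pattern that already yields a rule is not also kept as a seed, which mirrors the if/else-if structure of steps (4) to (7) of Algorithm 1. The function name `connect` and the toy data are hypothetical.

```python
import math


def foil_gain(p_x, n_x, p, n):
    """FOIL gain of pattern X (covering p_x positives, n_x negatives) w.r.t. p, n."""
    if p_x == 0:
        return 0.0
    return p_x * (math.log2(p_x / (p_x + n_x)) - math.log2(p / (p + n)))


def connect(seeds, candidates, pos, neg, min_sup, min_conf, min_gain):
    """Join each seed value with each other candidate value into a pattern X.

    Returns (rules, new_seeds): X -> c is a rule if conf(X) > min_conf; otherwise
    X joins the new seed set if sup(X) > min_sup and its foil gain > min_gain.
    """
    rules, new_seeds, seen = [], [], set()
    for s in seeds:
        for v in candidates:
            x = frozenset({s, v})
            if v == s or x in seen:          # skip self-joins and duplicates
                continue
            seen.add(x)
            p_x = sum(x <= ex for ex in pos)
            n_x = sum(x <= ex for ex in neg)
            if p_x == 0:
                continue
            if p_x / (p_x + n_x) > min_conf:
                rules.append(x)              # rule of length two
            elif p_x / len(pos) > min_sup and foil_gain(p_x, n_x, len(pos), len(neg)) > min_gain:
                new_seeds.append(x)          # kept for further extension
    return rules, new_seeds


if __name__ == "__main__":
    pos = [{1, 2, 3}, {1, 2, 3}, {1, 2, 4}, {1, 3}]
    neg = [{1, 2, 5}, {1, 2, 6}, {1, 3, 5}]
    print(connect([1], [1, 2, 3], pos, neg, min_sup=0.2, min_conf=0.7, min_gain=0.1))
```

Here (1, 3) is confident enough to become a rule, while (1, 2) is frequent and informative but not yet confident, so it plays the role that (88, 11) plays in the example.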
Fourth, we generate rules by extending pattern (88, 11). CMR selects an attribute value with the best foil gain from the conditional database of pattern (88, 11) [...] Finally, CMR removes the examples that are covered by the rules produced and iterates the process.
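Extending a seed pattern, as in the (88, 11) to (88, 11, 81) step, can be sketched as a greedy loop. The sketch repeatedly adds the attribute value with the highest FOIL gain inside the pattern's conditional database until the pattern's confidence exceeds the minimum confidence. It assumes the FOIL/CPAR form of the gain, stops with no rule when no value has positive gain, and breaks ties by taking the smallest value. The function names and toy data are hypothetical.

```python
import math


def foil_gain(p1, n1, p0, n0):
    """Gain of specialising a body covering (p0, n0) to one covering (p1, n1)."""
    if p1 == 0:
        return 0.0
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))


def extend_pattern(pattern, pos, neg, min_conf):
    """Greedily extend `pattern`; return the rule body, or None if it cannot be made confident."""
    pattern = frozenset(pattern)
    while True:
        # conditional database of the current pattern
        cond_pos = [ex for ex in pos if pattern <= ex]
        cond_neg = [ex for ex in neg if pattern <= ex]
        p0, n0 = len(cond_pos), len(cond_neg)
        if p0 == 0:
            return None
        if p0 / (p0 + n0) > min_conf:
            return pattern                   # pattern -> c is a rule
        best, best_gain = None, 0.0
        for v in sorted({x for ex in cond_pos for x in ex} - pattern):
            g = foil_gain(sum(v in ex for ex in cond_pos),
                          sum(v in ex for ex in cond_neg), p0, n0)
            if g > best_gain:
                best, best_gain = v, g
        if best is None:
            return None                      # no extension improves the pattern
        pattern = pattern | {best}


if __name__ == "__main__":
    pos = [{1, 2, 3}, {1, 2, 3}, {1, 2, 4}, {1, 3}]
    neg = [{1, 2, 5}, {1, 2, 6}, {1, 3, 5}]
    print(extend_pattern({1, 2}, pos, neg, min_conf=0.7))
```

Each pass adds one new value, so the loop always terminates.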

The Algorithm of Inducing Rules of CMR.
Algorithm 1 gives the procedure by which CMR induces rules.
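An end-to-end sketch of the rule-induction loop is given below for one positive class. It is a simplified reading, not the authors' implementation, and it rests on these assumptions:

* Support is measured within the current positive examples.
* Candidate values are drawn from the frequent-but-not-confident values (FP), as in steps (6) to (9) of Algorithm 1.
* The foil gain of a length-two pattern is measured against the current examples.
* Extension uses gain within the conditional database.
* Only positive examples are removed.
* The loop stops early if a round finds no rule.

All names are hypothetical.

```python
import math


def foil_gain(p_x, n_x, p, n):
    """FOIL gain of a body covering p_x positives / n_x negatives relative to p / n."""
    if p_x == 0:
        return 0.0
    return p_x * (math.log2(p_x / (p_x + n_x)) - math.log2(p / (p + n)))


def cover(x, examples):
    """Number of examples containing every attribute value of pattern x."""
    return sum(x <= ex for ex in examples)


def extend(x, pos, neg, min_conf):
    """Greedily add the best-gain value from x's conditional database."""
    while True:
        cp = [ex for ex in pos if x <= ex]
        cn = [ex for ex in neg if x <= ex]
        if not cp:
            return None
        if len(cp) / (len(cp) + len(cn)) > min_conf:
            return x
        best, best_gain = None, 0.0
        for v in sorted(set().union(*cp) - x):
            g = foil_gain(cover({v}, cp), cover({v}, cn), len(cp), len(cn))
            if g > best_gain:
                best, best_gain = v, g
        if best is None:
            return None
        x = x | {best}


def induce_rules(pos, neg, min_sup, min_conf):
    """Return rule bodies X (frozensets) such that X -> c for the positive class c."""
    pos = [frozenset(e) for e in pos]
    neg = [frozenset(e) for e in neg]
    rules = set()
    while pos:                                    # while |P| > 0
        found, gains = set(), {}
        for v in sorted(set().union(*pos)):
            p_v, n_v = cover({v}, pos), cover({v}, neg)
            if p_v / len(pos) > min_sup and p_v / (p_v + n_v) > min_conf:
                found.add(frozenset({v}))         # rule of length one
            elif p_v / len(pos) > min_sup:        # v is in FP: compute its foil gain
                g = foil_gain(p_v, n_v, len(pos), len(neg))
                if g > 0:
                    gains[v] = g                  # candidate set
        new_seeds = set()
        if gains:
            min_gain = sum(gains.values()) / len(gains)
            for s in [v for v, g in gains.items() if g > min_gain]:  # seed set
                for v in gains:
                    if v == s:
                        continue
                    x = frozenset({s, v})
                    p_x, n_x = cover(x, pos), cover(x, neg)
                    if p_x == 0:
                        continue
                    if p_x / (p_x + n_x) > min_conf:
                        found.add(x)              # rule of length two
                    elif p_x / len(pos) > min_sup and foil_gain(p_x, n_x, len(pos), len(neg)) > min_gain:
                        new_seeds.add(x)          # new seed set
        for x in new_seeds:
            r = extend(x, pos, neg, min_conf)
            if r is not None:
                found.add(r)
        if not found:
            break                                 # assumption: stop if a round finds no rule
        rules |= found
        pos = [ex for ex in pos if not any(r <= ex for r in found)]
    return rules


if __name__ == "__main__":
    pos = [{1, 2, 3}, {1, 2, 3}, {1, 2, 4}, {1, 2, 4}, {1, 6}]
    neg = [{1, 5}, {2, 5}, {2, 5}, {1, 2, 5}, {6, 5}]
    print(induce_rules(pos, neg, min_sup=0.2, min_conf=0.8))
```

Every rule found in a round covers at least one remaining positive example, so the positive set shrinks on each pass. One round can yield rules of length one, two, and three at once, which is the "many rules at a time" behaviour the text describes.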

Prediction Using Classification Rule Set.
When we predict the class of an example, we use all rules that the example satisfies. Three cases are possible when matching an example against a set of rules: there may be only one match (the example matches exactly one rule), more than one match (the example matches several rules), or no match (the example does not match any rule). If the matched rules do not agree on the class label, we compute a decision score for each class. Thus, we need to evaluate every rule to determine its significance. For a rule r: X → c, we use the expected accuracy to estimate the significance of r. The expected accuracy of rule r, denoted SIG, is given by (4):

$$\mathrm{SIG}(r) = \frac{n_c + 1}{n + m}, \qquad (4)$$

where m is the number of classes, n is the total number of examples that contain pattern X, and n_c is the number of examples that contain pattern X and have class label c.
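Read this way, SIG is the Laplace-corrected accuracy of a rule, which CPAR [23] also calls expected accuracy. The one-line sketch below assumes that reading of (4); the function name `sig` is hypothetical. Its effect is to discount rules supported by very few examples.

```python
def sig(n_c, n, m):
    """Laplace expected accuracy of a rule X -> c.

    n_c: examples containing X with class c; n: examples containing X;
    m: number of classes.
    """
    return (n_c + 1) / (n + m)


if __name__ == "__main__":
    # A rule that is 100% accurate on a single example scores lower than
    # a rule that is 90% accurate on twenty examples (two classes).
    print(sig(1, 1, 2), sig(18, 20, 2))
```

With no covering examples at all, SIG falls back to 1/m, the accuracy of guessing uniformly among the m classes.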
For a test example, we select the best k rules that the example matches. If all of these k rules have the same class label, the test example is assigned to that class. If the matched rules belong to different classes, CMR computes a decision score for each class: if j of the selected rules predict class c, the decision score of c is the average SIG of those j rules. CMR assigns the example to the class with the highest decision score.
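The prediction step can be sketched as follows. The rule representation (body, class label, SIG), the function name `predict`, and the class labels "A" and "B" are assumptions. The default k = 3 matches the setting used in the experiments. The text does not say what CMR does when no rule matches, so the sketch returns None in that case rather than guessing a default class.

```python
from collections import defaultdict


def predict(example, rules, k=3):
    """Classify `example` (a set of attribute values) by the best k matching rules.

    rules: list of (body, class_label, sig) with body a frozenset.
    Returns None when no rule matches (the no-match policy is not specified).
    """
    matched = [r for r in rules if r[0] <= example]
    if not matched:
        return None
    best = sorted(matched, key=lambda r: r[2], reverse=True)[:k]   # best k by SIG
    scores = defaultdict(list)
    for _, label, s in best:
        scores[label].append(s)
    # decision score of a class = average SIG of its rules among the best k
    return max(scores, key=lambda c: sum(scores[c]) / len(scores[c]))


if __name__ == "__main__":
    rules = [(frozenset({1}), "A", 0.9), (frozenset({2}), "B", 0.8),
             (frozenset({3}), "B", 0.7), (frozenset({1, 2}), "A", 0.6)]
    print(predict({1, 2, 3}, rules))
```

In the usage example, two of the best three rules vote for "B", but "A" wins because its average SIG (0.9) exceeds that of "B" (0.75). Averaging rather than counting votes is what distinguishes this scheme from majority voting.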

Experimental Results
All experiments are performed on the mushroom dataset. The test set contains 500 examples in all experiments; the test examples are selected in turn from the range 0-5500 of the mushroom dataset. We select the best 3 rules for prediction.

Algorithm 1: Inducing rules of CMR.
Input: training dataset D = P ∪ N (P and N are the sets of all positive and negative examples, respectively), the minimum support, and the minimum confidence.
Output: a set of classification rules R.
(1) R ← ∅, the candidate set cs ← ∅, the seed set ss ← ∅, the frequent pattern set FP ← ∅, the conditional positive examples P′ ← ∅, the conditional negative examples N′ ← ∅
(2) while (|P| > 0) do
(3) compute the support and the confidence of each attribute value V in P
(4) if (sup(V) > minsup && conf(V) > minconf)
(5) R ← R ∪ {V → c}
(6) else if (sup(V) > minsup)
(7) FP ← FP ∪ {V}
(8) end if
(9) compute the foil gain of each attribute value V in FP
(10) if ...
In Table 4, we vary the size of the training dataset from 100 to 1000. The training examples are selected randomly from the mushroom dataset, and the minimum support is varied from 3% to 7%. Table 4 shows that the accuracy of CMR varies with the minimum support: the average accuracy of CMR is highest when the minimum support is 5%. However, the accuracy of CMR does not change markedly with the minimum support.
Table 5 and Figure 1 show the accuracy of FOIL, CMR, and CMER. In Table 5 and Figure 1, the minimum support is set to 5% and the minimum confidence to 100%. From Table 5 and Figure 1, we can conclude that (1) the accuracy of CMER is higher than that of FOIL regardless of the size of the training dataset, (2) when the training dataset is small, the accuracy of CMR is much higher than that of FOIL, (3) CMR achieves higher accuracy than CMER in many cases, and (4) CMR achieves higher average classification accuracy than CMER.

Conclusions
Accuracy and efficiency are crucial factors in classification tasks in data mining. Associative classification achieves higher accuracy than some traditional rule-based classification approaches in some cases. However, it generates a large number of association classification rules. Therefore, the efficiency of associative classification is not high when the minimum support is set low and the training dataset is large. Compared with associative classification, one reason why traditional rule-based classification methods cannot achieve high accuracy is that they often generate only a few classification rules. In this paper, a new classification approach called CMR is proposed. CMR combines the advantages of both associative classification and rule-based classification. It induces many rules at a time. As a result, CMR generates many more classification rules than many other traditional rule-based classification methods, especially when the training dataset is small. Our experimental results show that the techniques developed in this paper are feasible and that CMR achieves high accuracy.

Here, |P_1| is the number of positive examples that contain attribute value V, and |N_1| is the number of negative examples that contain attribute value V.
We first generate the rules of length one. Second, we construct a candidate set and a seed set. Third, we connect the seed set with the candidate set and generate the rules of length two. Fourth, we build a new seed set and generate classification rules based on it. Finally, we remove the examples covered by the just-found rules and iterate the process. The following example shows in detail how CMR induces rules.
Example 4. The training dataset D is shown in Table 1. We suppose that one attribute of Table 1 is the decision attribute and the others are condition attributes. In D, all examples whose decision-attribute value is 89 are positive examples, and all examples whose decision-attribute value is 90 are negative examples.

Table 1 :
The training data set.

Table 2 :
The rule set with length one.

Table 3 :
The candidate set.

If the confidence of (88, 11, 81) is greater than the minimum confidence, then a classification rule is induced. Otherwise, CMR continues finding the best attribute value in the conditional database of pattern (88, 11, 81). After the rule containing pattern (88, 11) is produced, CMR removes (88, 11) from the new seed set and generates rules for the other patterns in the new seed set until no patterns remain in it.

Table 4 :
Accuracy of CMR with different minimum supports.

Table 5 :
Accuracy of FOIL, CMER and CMR.