Although the naïve Bayes learner has been shown to achieve reasonable performance in machine learning, it often suffers from two problems when handling real-world data: the conditional independence assumption and the use of frequency estimators. We therefore propose methods that address both problems within the naïve Bayes framework. An attribute weighting method handles the conditional independence assumption, while our proposed smooth kernel method weakens the negative effects of the frequency estimators. In this paper, we propose a compact Bayes model in which a smooth kernel augments weights on the likelihood estimation. We also choose an attribute weighting method that employs a mutual information metric to cooperate with the framework. Experiments were conducted on UCI benchmark datasets, and the accuracy of our proposed learner was compared with that of standard naïve Bayes. The experimental results demonstrate the effectiveness and efficiency of our proposed learning algorithm.
The naïve Bayes classifier is a supervised learning method based on Bayes' rule of probability theory. It runs on labeled training examples and is driven by a strong assumption, known as the naïve Bayes (conditional independence) assumption, that all attributes in the training examples are independent of one another given the class. The naïve Bayes classifier offers high accuracy and rapid classification, and it has proven effective especially on huge training sets with many attributes, mainly because of its independence assumption [
In practice, classification performance is affected by the attribute independence assumption, which is usually violated in the real world. Nevertheless, because of the attractive advantages of efficiency and simplicity, both stemming from this assumption, many researchers have proposed effective methods that further improve the performance of the naïve Bayes classifier by weakening the attribute independence assumption without sacrificing its advantages. We categorize some typical previous methods of relaxing the naïve Bayes assumption and give brief reviews in Section
Although Chen and Wang [
The contributions of this paper are threefold:
We briefly survey ways to improve naïve Bayes, focusing especially on attribute weighting methods.
We propose a novel attribute weighting framework called Attribute Weighting with Smooth Kernel Density Estimation (AWSKDE for short). The AWSKDE framework employs a smooth kernel that makes the probabilistic estimation of the likelihood dominated by the weights, which enables the combination of kernel methods and weighting methods. After setting up the kernel, we can generate a set of weights directly by using various methods that cooperate with the kernel.
On the AWSKDE framework, we propose a learner called AW
Our experimental results show that the mutual information criterion built on the AWSKDE framework exhibits superior performance compared with the standard naïve Bayes classifier.
The paper is organized as follows: we briefly survey ways to improve naïve Bayes in Section
A number of methods that weaken the attribute independence assumption for naïve Bayes have been proposed in recent years. Jiang et al. [
For data expansion, Kang and Sohn [
Wong [
In structure extension, Webb et al. [
As for attribute weighting methods, there are two ways to obtain attribute weights. The first is to construct a function parameterized by the attribute weights and fit it to the training data to estimate the weights. Zaidi et al. [
Chen and Wang [
There are many other methods that can be categorized as attribute weighting. Lee et al. [
In this section, we explain the concepts of machine learning methods used in this paper, including naïve Bayes classifier, naïve Bayes attribute weighting, and kernel density estimation for naïve Bayes categorical attributes. The symbols used in this paper are summarized in Notations section.
In supervised learning, consider a training data set
But likelihood
In the training phase, only
In the classification phase, if we have a test instance
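The training and classification phases just described can be illustrated in code. The following is a minimal sketch, not the paper's implementation: categorical attributes, Laplace-smoothed frequency estimates, and an argmax over log-scores; the data layout and all names are assumptions.

```python
import math
from collections import Counter, defaultdict

def train_nb(X, y):
    """Training phase: count class frequencies and, for each
    (attribute index, class) pair, the frequency of each attribute value."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)  # (attr_index, class) -> Counter of values
    for xs, c in zip(X, y):
        for j, v in enumerate(xs):
            value_counts[(j, c)][v] += 1
    return class_counts, value_counts

def classify_nb(x, class_counts, value_counts, attr_cards):
    """Classification phase: argmax_c P(c) * prod_j P(x_j | c),
    computed in log space with Laplace smoothing;
    attr_cards[j] is the cardinality of attribute j."""
    n = sum(class_counts.values())
    best_c, best_score = None, float("-inf")
    for c, nc in class_counts.items():
        score = math.log((nc + 1) / (n + len(class_counts)))
        for j, v in enumerate(x):
            score += math.log((value_counts[(j, c)][v] + 1) / (nc + attr_cards[j]))
        if score > best_score:
            best_c, best_score = c, score
    return best_c
```

The Laplace correction keeps every estimated probability strictly positive, so a single unseen attribute value cannot zero out an entire class score.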
As mentioned above, the naïve Bayes assumption conflicts with most real-world applications (note that it is rare for attributes in the same data set to have no relationships with each other). Therefore, many researchers have proposed ways to relax the naïve Bayes assumption effectively, as reviewed in Section
In this paper, we focus on attribute weighting methods combined with a kernel density estimation technique, applied to the naïve Bayes learner in order to relax the conditional independence assumption.
Generally, the naïve Bayes attribute weighting scheme can be formulated in several forms. First, a weight assigned to each attribute is defined as follows:
If the weight depends on attribute and class, the corresponding formula is as follows:
The following formula is used for the case when the weight depends on attribute value:
Referring back to (
It is worthwhile to mention that (
In our approach, we follow (
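The per-attribute weighting form can be sketched as follows. This is a hedged illustration: `cond_prob` stands in for whatever smoothed likelihood estimator is in use, and all names are assumptions, not the paper's code.

```python
import math

def classify_weighted(x, priors, cond_prob, weights, classes):
    """argmax_c P(c) * prod_j P(x_j | c)^{w_j}, computed in log space.
    cond_prob(j, v, c) must return a smoothed, strictly positive
    estimate of P(A_j = v | c); weights[j] is the weight of attribute j."""
    def score(c):
        return math.log(priors[c]) + sum(
            weights[j] * math.log(cond_prob(j, v, c))
            for j, v in enumerate(x))
    return max(classes, key=score)
```

Raising each likelihood to its weight in probability space is simply multiplying each log-likelihood by the weight in log space: a weight near 0 mutes an attribute, a weight above 1 amplifies it.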
From an information-theoretic perspective, an attribute weighting method tries to find out which attributes give more information for classification than others. If an attribute
In the naïve Bayes learner, which was discussed in Section
Given a test instance
Note that
In [
Hence, the classifier is formulated as follows:
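As a concrete illustration of the mutual information metric used for weighting, the sketch below estimates I(A; C) from empirical frequencies. The natural logarithm and all names are assumptions for illustration.

```python
import math
from collections import Counter

def mutual_information(attr_values, labels):
    """I(A; C) = sum_{v,c} P(v,c) * log( P(v,c) / (P(v) P(c)) ),
    with all probabilities estimated by empirical frequencies."""
    n = len(labels)
    n_v = Counter(attr_values)          # marginal counts of attribute values
    n_c = Counter(labels)               # marginal counts of class labels
    n_vc = Counter(zip(attr_values, labels))  # joint counts
    mi = 0.0
    for (v, c), joint in n_vc.items():
        # joint/n divided by (n_v/n)*(n_c/n) simplifies to joint*n/(n_v*n_c)
        mi += (joint / n) * math.log(joint * n / (n_v[v] * n_c[c]))
    return mi
```

An attribute that determines the class perfectly attains I(A; C) = H(C), while an attribute independent of the class scores 0, which is exactly the ordering a weighting scheme wants.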
As mentioned earlier, in this section we propose an attribute weighting framework working on categorical attributes, called
In (
The estimation
Hence, AWSKDE framework is defined as follows:
The AWSKDE framework incorporates a smooth kernel to make the probabilistic estimation of the likelihood dominated by the weights. This enables a natural combination of kernel methods and weighting methods. After setting up the kernel, we can generate a set of weights estimated by various methods cooperating with the kernel.
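One common smooth kernel for categorical attributes is the Aitchison–Aitken kernel. The sketch below is an illustration, not necessarily the exact kernel used in this framework: it shows how increasing the bandwidth flattens the likelihood estimate toward uniform, so that the attribute weights come to dominate the weighted product. The function name and the parameter `lam` are assumptions.

```python
def aa_kernel_likelihood(count_vc, n_c, cardinality, lam=0.5):
    """Aitchison–Aitken-style kernel estimate of P(A = v | c).
    Each of the n_c training values in class c contributes (1 - lam) to
    its own category and lam / (cardinality - 1) to every other category.
    With lam = 0 this is the raw frequency; as lam -> (cardinality-1)/cardinality
    the estimate flattens toward the uniform 1/cardinality."""
    f = count_vc / n_c  # empirical frequency of value v within class c
    return (1 - lam) * f + lam * (1 - f) / (cardinality - 1)
```

The estimates still sum to 1 over all values of the attribute for any bandwidth, so they remain a valid conditional distribution while becoming as smooth as desired.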
Our approach generates a set of attribute weights
The average weight
We also incorporate split information used in C4.5 [
Now, the weight of
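A plausible sketch of the weight construction described above: mutual information divided by C4.5-style split information, then rescaled by the average weight so the weights are comparable across attributes. The exact normalization step is a hedged assumption, and all names are illustrative.

```python
import math
from collections import Counter

def split_info(attr_values):
    """C4.5 split information: the entropy (base 2) of the attribute's
    value distribution, which penalizes many-valued attributes."""
    n = len(attr_values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(attr_values).values())

def normalized_weights(mi_per_attr, split_per_attr):
    """Divide each attribute's mutual information by its split information,
    then rescale so that the mean weight is 1 (one plausible reading of
    the 'average weight' step; not necessarily the paper's formula)."""
    raw = [mi / si if si > 0 else 0.0
           for mi, si in zip(mi_per_attr, split_per_attr)]
    avg = sum(raw) / len(raw)
    return [r / avg for r in raw] if avg > 0 else raw
```

Dividing by split information mirrors C4.5's gain ratio: an attribute with many distinct values cannot inflate its weight merely by fragmenting the data.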
We feed AW
(1) estimate
(2)
(3)
(a)
(b)
(c)
(1) for each dimension of test instance
(2) Output the class value
During the training phase, AW
Time complexity (
Algorithm      Training time    Classification time
NB             —                —
AWSKDE^{MI}    —                —
Here, we also present a framework named
The estimation
We also build an attribute weighting naïve Bayes learner with mutual information metric based on this AWLSKDE framework, called AW
In order to compare AW
Description of data sets used in the experiments.
Data set        Instances  Attributes  Classes  Missing  Numeric
Anneal          898        39          6        Y        Y
Balance-scale   625        5           3        N        Y
Breast-cancer   286        10          2        Y        N
Breast-w        699        10          2        Y        N
Colic           368        23          2        Y        Y
Credit-a        690        16          2        Y        Y
Dermatology     366        35          6        Y        Y
Glass           214        10          7        N        Y
Heart-statlog   250        14          2        N        Y
Hepatitis       155        20          2        Y        Y
Ionosphere      351        35          3        N        Y
Lymph           148        19          4        N        Y
Primary-tumor   339        18          21       Y        N
Segment         2310       20          7        N        Y
Sick            3772       30          2        Y        Y
Vehicle         846        19          4        N        Y
Vote            435        17          2        Y        N
Experimental results in terms of classifiers' accuracy. Note that accuracies are estimated using 10-fold cross-validation with a 95% confidence interval.
Data set        Naïve Bayes     AWSKDE^{MI}     AWLSKDE^{MI}
Anneal          93.99 ± 1.55    —               76.17 ± 2.79
Balance-scale   —               —               89.6 ± 2.39
Breast-cancer   71.68 ± 5.22    —               70.28 ± 5.30
Breast-w        —               96.85 ± 1.29    88.41 ± 2.37
Colic           —               81.79 ± 3.94    79.62 ± 4.12
Credit-a        85.94 ± 2.59    —               83.62 ± 2.76
Dermatology     —               —               75.14 ± 4.43
Glass           —               76.64 ± 5.67    62.62 ± 6.48
Heart-statlog   —               —               77.78 ± 5.15
Hepatitis       —               —               79.35 ± 6.37
Ionosphere      —               91.45 ± 2.93    86.61 ± 3.56
Lymph           —               —               76.35 ± 6.85
Primary-tumor   —               49.85 ± 5.32    24.78 ± 4.60
Segment         —               88.70 ± 1.29    75.28 ± 1.76
Sick            —               97.03 ± 0.54    93.88 ± 0.76
Vehicle         66.67 ± 3.18    —               61.82 ± 3.27
Vote            90.11 ± 2.81    89.89 ± 2.83    —
Average         84.78 ± 3.23    —               76.05 ± 3.86
In the implementation of our algorithm, all the probabilities including
To compare the performance of the algorithms, we have adapted
Table
It can be seen that AW
In this paper, a novel attribute weighting framework called
Even though AW
The
The cardinality of attribute
The value of
Training data set consists of
An instance,
Class label,
An element of
A test instance,
The unconditioned probability of event
The conditional probability of
An estimation of
The frequency of
The weight value of attribute
The mutual information between
The authors declare that there is no conflict of interest regarding the publication of this paper.
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (no. NRF2013R1A1A2013401).