Research Article Kernel Neighborhood Rough Sets Model and Its Application

Rough set theory has been successfully applied to many fields, such as data mining, pattern recognition, and machine learning. Kernel rough sets and neighborhood rough sets are two important models that differ in terms of granulation. The kernel rough sets model, which has fuzziness, is susceptible to noise in the decision system. The neighborhood rough sets model can handle noisy data well but cannot describe the fuzziness of the samples. In this study, we define a novel model called kernel neighborhood rough sets, which integrates the advantages of the neighborhood and kernel models. Moreover, the model is applied to the problem of feature selection. The proposed method is tested on the UCI datasets. The results show that our model outperforms classic models.


Introduction
Rough set theory, which was proposed by Pawlak in 1982, is a powerful mathematical method to study incomplete and imprecise information. This theory has been successfully applied to many fields, such as data mining, decision-making, pattern recognition, machine learning, and intelligent control [1][2][3][4]. Kernel rough sets [5] and neighborhood rough sets [6] are two important models in rough set theory.
Hu proposed the kernel rough sets model [5,7]. A Gaussian kernel rough sets model-based feature selection method was discussed in [8]. The information fusion problem of imperfect images has also been studied based on Hu's research [9]. Ghosh et al. proposed an efficient Gaussian kernel-based fuzzy rough sets approach for feature selection [10]. A novel fuzzy rough sets model was constructed by combining the hybrid distance and the Gaussian kernel in [11]. A new feature selection method based on kernel fuzzy rough sets and a memetic algorithm was proposed for the transient stability assessment of power systems [12]. In these studies, the information granules are constructed in the kernel structure. The "min" and "max" aggregation operations are used in approximation calculations [13][14][15]. That is, the decision for a sample depends on the nearest sample [7]. The computation of the lower approximation is therefore risky if there is noise in the datasets [16]: with the kernel rough sets model, data noise can increase the classification error rate [16].
Neighborhood is an important concept in classification and clustering. To formulate the notion of approximation, the neighborhood system was introduced into the relational model by Lin [17][18][19]. Yao presented a framework for the formulation, interpretation, and comparison of neighborhood systems and rough sets approximations [20]. Hu et al. investigated the issue of heterogeneous feature subset selection based on neighborhood rough sets [6,21]. Based on neighborhood granulation, samples are constructed as a family of neighborhood granules to approximate the object sets. The neighborhood model can handle noisy data well based on the tolerance neighborhood relation and probabilistic theory [22]. However, the main limitation of this model is that it cannot describe the fuzziness of samples [16].
Overall, the kernel rough sets model, which has fuzziness, is susceptible to noise in the decision system. The neighborhood rough sets model can handle noisy data but cannot describe the fuzziness of samples. This motivates the construction of a new model that combines the advantages of kernel and neighborhood rough sets.
In addition, increasing amounts of high-dimensional data must be processed in real applications, so feature selection plays an important role in machine learning and data mining. Neighborhood rough sets and kernel rough sets are widely used in feature selection [23][24][25][26]. The new rough sets model can also be applied to the feature selection problem.
Based on the motivations above, the contributions of this paper include the following: (1) We define a novel model, the kernel neighborhood rough sets model, which integrates the advantages of the neighborhood and kernel models. (2) The model is applied to the problem of feature selection. (3) The proposed method is tested on the UCI datasets. The results show that our model yields a better performance than classic models.
This paper is organized as follows. In Section 2, some basic concepts regarding neighborhood rough sets and kernel rough sets are briefly reviewed. In Section 3, the kernel neighborhood rough sets (KNRS) model is investigated in detail. Section 4 shows the application of KNRS to feature evaluation and feature selection. Numerical experiments are reported in Section 5. Finally, Section 6 concludes the paper.

Preliminary Knowledge
In this section, we review the kernel rough sets (KRS) model [5] and the neighborhood rough sets (NRS) model [6].

Kernel Rough Sets (KRS) Model
Definition 1. Suppose $U$ is a nonempty finite set of objects and $k$ is a Gaussian kernel function $k(x_i, x_j) = \exp\left(-\|x_i - x_j\|^2 / (2\delta^2)\right)$, where $\|x_i - x_j\|$ is the Euclidean distance. Therefore, $\langle U, A, k \rangle$ is a kernel approximation space, where

Definition 2. Given a kernel approximation space $\langle U, A, k \rangle$, $X \in F(U)$ is a fuzzy subset of $U$, and we define the lower and upper approximations of $X$ on the space $\langle U, A, k \rangle$ as follows:

Definition 4. Given a neighborhood approximation space $\langle U, A, \delta \rangle$, $\forall x \in U$, and $\delta \ge 0$, $\delta(x)$ is a $\delta$ neighborhood of $x$ whose center is $x$ and whose radius is $\delta$, where

Here, $\delta(x)$ can be considered to be the neighborhood granule.

Remark 1. Given two points in Euclidean space, the distance between them can be computed as and

Definition 5. Given a neighborhood approximation space $\langle U, A, \delta \rangle$, for any subset $X \subseteq U$, we define the lower and upper approximations of $X$ on the space $\langle U, A, \delta \rangle$, respectively, as follows:

The definitions of the lower and upper approximations are the most important concepts in KRS and NRS.
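The two granulation mechanisms reviewed in this section can be illustrated with a minimal Python sketch. The Gaussian kernel follows Definition 1. The neighborhood is taken to be the closed ball of radius $\delta$ under the Euclidean distance, which is an assumption, since the defining formula of Definition 4 is not reproduced in this text. The function names `euclidean`, `gaussian_kernel`, and `neighborhood` are illustrative only.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two samples given as equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def gaussian_kernel(x, y, delta):
    """Gaussian kernel of Definition 1: exp(-||x - y||^2 / (2 * delta^2))."""
    return math.exp(-euclidean(x, y) ** 2 / (2 * delta ** 2))


def neighborhood(i, samples, delta):
    """Indices of all samples within distance delta of samples[i].

    Assumption: the neighborhood is the closed ball (distance <= delta).
    """
    return [j for j, s in enumerate(samples) if euclidean(samples[i], s) <= delta]


samples = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]]
near = gaussian_kernel(samples[0], samples[1], 0.5)  # close pair, value near 1
far = gaussian_kernel(samples[0], samples[2], 0.5)   # distant pair, value near 0
granule = neighborhood(0, samples, 0.2)              # crisp granule around sample 0
```

The kernel returns a graded similarity in (0, 1], whereas the neighborhood returns a crisp set of indices; this is the contrast between fuzzy and crisp granulation on which the two models differ.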

Kernel Neighborhood Rough Sets (KNRS) Model
In this section, we study the KNRS model. The definitions and theorems of KNRS are discussed in detail. The kernel neighborhood decision system is also investigated.

Kernel Neighborhood Rough Sets
Definition 6. Given a kernel neighborhood approximation space $\langle U, A, k, \delta \rangle$, $\delta \ge 0$, where $k$ is a Gaussian kernel function, $\forall x, y \in U$; thus, $x_k^{\delta}$ is a kernel neighborhood granule of $x$, where

where $k$ is a kernel function, for any fuzzy subset $X \in F(U)$; we define the lower and upper approximations of $X$ on the space $\langle U, A, k, \delta \rangle$, respectively, as follows:

The method defined above is crisp and has no noise tolerance ability. Here, we propose an improved model that is called variable precision lower and upper approximation.

where $k$ is a kernel function, for any fuzzy subset $X \in F(U)$; the variable precision lower and upper approximations of $X$ are defined as follows, where $|\cdot|$ denotes the cardinality of the specified set:

Then, $0.5 \le \alpha \le 1$ as in [22].

The relation matrix is presented in Table 2. Each row is the kernel neighborhood granule of $u_i$. Given a fuzzy set $X = \{0.9/u_1, 0.8/u_2, 0.9/u_3, 0.3/u_4, 0.1/u_5\}$, we obtain

Definition 9. Given a kernel neighborhood approximation space $\langle U, A, k, \delta \rangle$, where $\delta \ge 0$ and $k$ is a kernel function, for any fuzzy subset $X \in F(U)$, the positive, negative, and boundary regions of $X$ in the space $\langle U, A, k, \delta \rangle$ are, respectively, expressed as follows:

Theorem 1. Given a kernel neighborhood approximation space $\langle U, A, k, \delta \rangle$, where x

Kernel Neighborhood Decision System.
A kernel neighborhood approximation space is called a variable precision kernel neighborhood decision system, denoted by $\langle U, C \cup D, k, \delta, \alpha \rangle$, where $C$ and $D$ are the condition and decision features, respectively.

Definition 10. Consider a variable precision kernel neighborhood decision system $\langle U, C \cup D, k, \delta, \alpha \rangle$, where $\delta \ge 0$, $0.5 \le \alpha \le 1$, and $k$ is a kernel function. Suppose $B \subseteq C$ and $U/D = \{D_1, D_2, \ldots, D_n\}$. We define the lower and upper approximations of $D$ with respect to feature subset $B$, respectively, as follows:

To improve the classification performance of the employed algorithms, we should select a proper feature subset that increases the size of the positive region and decreases the size of the boundary region.
The dependency degree reflects the approximating power of a condition feature set. A higher dependency degree means that the samples that are described by feature subset $B$ are more consistent with decision $D$.
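As a rough illustration of how the regions and the dependency degree relate, the following Python sketch works with crisp lower and upper approximations that are assumed to be already computed. The ratio $|POS_B(D)| / |U|$ used for the dependency degree is the classical rough-set form and is an assumption here, since the formula of the KNRS model is not reproduced in this text. The helper names `regions` and `dependency` are hypothetical.

```python
def regions(universe, lower, upper):
    """Positive, negative, and boundary regions from crisp approximations.

    POS = lower, NEG = U - upper, BN = upper - lower.
    """
    universe, lower, upper = set(universe), set(lower), set(upper)
    return lower, universe - upper, upper - lower


def dependency(universe, positive):
    """Assumed classical dependency degree: |POS| / |U|."""
    return len(set(positive)) / len(set(universe))


# Toy example: five objects, two certainly in the concept, one uncertain.
U = [1, 2, 3, 4, 5]
pos, neg, bnd = regions(U, lower={1, 2}, upper={1, 2, 3})
gamma = dependency(U, pos)
```

A feature subset that moves objects from the boundary into the positive region raises `gamma`, which is the behavior that the selection criterion rewards.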
Theorem 2. Given a variable precision kernel neighborhood decision system $\langle U, C \cup D, k, \delta, \alpha \rangle$, where

Feature Selection Based on KNRS
One of the most important applications of rough set theory is the evaluation of the classification power of the attributes. In this section, we define the significance of the feature subsets. The feature selection algorithm is also discussed. It is impractical to obtain the optimal subset of features from $2^n - 1$ candidates through an exhaustive search, where $n$ is the number of features. We use a forward greedy search algorithm, which is usually more efficient than a standard brute-force exhaustive search [4]. That is, one starts with an empty set of attributes and adds features to the subset of selected attributes one by one. Each selected attribute maximizes the increment of significance of the current subset.
Proof. Please refer to the proof of Theorem 2.
Corollary 1 shows that an object must belong to the positive region with respect to a feature set (such as $C_2$) if the object belongs to the positive region with respect to one of its subsets (such as $C_1$, where $C_1 \subseteq C_2 \subseteq C$). Therefore, it is unnecessary to consider every object when computing the positive region. Then, we can obtain a fast feature selection algorithm by improving Algorithm 1.
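The forward greedy search and the pruning idea behind Corollary 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the FSKNRS or FFSKNRS algorithm itself: the positive-region test is a stand-in based on a crisp Euclidean neighborhood (a sample is positive if every sample within radius `delta` shares its label), because the KNRS approximation formulas are not reproduced in this text. The names `fast_forward_selection` and `in_positive_region` are hypothetical.

```python
import math


def distance(x, y, features):
    """Euclidean distance restricted to the given feature indices."""
    return math.sqrt(sum((x[f] - y[f]) ** 2 for f in features))


def in_positive_region(i, X, y, features, delta):
    """Stand-in test: all samples within radius delta of X[i] share its label."""
    return all(y[j] == y[i] for j in range(len(X))
               if distance(X[i], X[j], features) <= delta)


def fast_forward_selection(X, y, delta=0.2, eps=1e-6):
    """Greedy forward selection that only re-examines undecided samples."""
    n_features = len(X[0])
    red = []                    # selected features, in order of selection
    S = set(range(len(X)))      # samples not yet in the positive region
    while len(red) < n_features and S:
        best, best_pos = None, set()
        for a in range(n_features):
            if a in red:
                continue
            # Only samples in S are tested; earlier positives stay positive.
            pos = {i for i in S if in_positive_region(i, X, y, red + [a], delta)}
            if len(pos) > len(best_pos):
                best, best_pos = a, pos
        # Significance of the best feature = gain in dependency degree.
        if best is None or len(best_pos) / len(X) <= eps:
            break
        red.append(best)
        S -= best_pos           # prune samples that are now decided
    return red


# Feature 0 separates the classes; feature 1 does not.
X_demo = [[0.0, 0.5], [0.1, 0.9], [0.9, 0.4], [1.0, 0.8]]
y_demo = [0, 0, 1, 1]
selected = fast_forward_selection(X_demo, y_demo, delta=0.3)
```

Shrinking `S` after each round is what makes the fast variant cheaper: once a sample is in the positive region, adding further features cannot remove it under this stand-in criterion, so it never needs to be tested again.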

Experimental Analysis
In this section, we evaluate the effectiveness of KNRS through a series of experiments. The data sets are downloaded from the UCI machine learning repository (http://archive.ics.uci.edu/ml/index.php) as in [3] and are described in Table 3. The numerical attributes of the samples are linearly normalized as follows: $f(x) = (x - x_{\min}) / (x_{\max} - x_{\min})$, where $x_{\min}$ and $x_{\max}$ are the bounds of the given attribute. Two popular machine learning algorithms, namely, CART and RBF SVM, are used to evaluate the selected features. The parameters $\delta$ and $\alpha$ of KNRS are examined through a series of experiments. We set $\delta$ from 0.05 to 0.95 with a step size of 0.1. We set the precision degree $\alpha$ from 0.5 to 0.95 with a step size of 0.05. The evaluation criterion is the classification accuracy in the selected feature subset with parameters $\delta$ and $\alpha$.
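The preprocessing and the parameter sweep described above can be sketched as follows. The normalization is assumed to be the usual min-max scaling to [0, 1], and the grid simply enumerates the stated ranges; the function names `min_max_normalize` and `parameter_grid` are illustrative and do not come from the original experiments.

```python
def min_max_normalize(column):
    """Linearly rescale one numerical attribute to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant attribute: map every value to 0 to avoid division by zero.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def parameter_grid():
    """All (delta, alpha) pairs used in the sweep.

    delta: 0.05 to 0.95 with step 0.1; alpha: 0.5 to 0.95 with step 0.05.
    """
    deltas = [round(0.05 + 0.1 * i, 2) for i in range(10)]
    alphas = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    return [(d, a) for d in deltas for a in alphas]


scaled = min_max_normalize([2.0, 4.0, 6.0])
grid = parameter_grid()
```

Each pair in `grid` would correspond to one feature selection run followed by one accuracy estimate, which gives the surface of accuracy over $\delta$ and $\alpha$.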
According to Figures 1-5, on most of the data sets, higher classification accuracy is achieved over a larger area when $\delta$ is between 0.5 and 0.95 and $\alpha$ is between 0.5 and 0.75. That is, the KNRS model is feasible and stable in most cases.

Effectiveness of the Fast Feature Selection.
We propose feature selection based on a kernel neighborhood rough set (Algorithm 1: FSKNRS) and a fast version (Algorithm 2: FFSKNRS). We evaluated the run times of the two methods. The results are listed in Table 4. The fast feature selection algorithm based on KNRS (Algorithm 2) yields a better performance than Algorithm 1.

Comparison of the Effectiveness in Feature Selection.
In this section, we select KRS [5], NRS [22], and neighborhood entropy (NE) [28] as the comparison models with KNRS. The feature subsets that are selected by different algorithms are presented in Table 5. The features are presented in the order in which they were added to the feature space.
The KNS model cannot obtain any subsets of the data set "glass." Most of the feature subsets are slightly different. A small difference in feature quality may lead to a completely different ranking. The orders of the feature subsets reflect the relative significance of the features in terms of the corresponding measures. Therefore, the large differences among these selected features are due to the differences in the qualities of the feature measures.
Then, we build classification models with the selected features and test their classification performance based on a 10-fold cross-validation. The average value and standard deviation are used to measure the classification performance. We compare KNRS, KRS, NRS, and NE in Tables 6 and 7, where the learning algorithms of CART and RBF SVM are used to evaluate the selected features.
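The evaluation protocol can be sketched in Python as follows. The sketch only covers the fold construction and the summary statistics; the classifiers themselves (CART and RBF SVM) are omitted. The contiguous, unshuffled folds and the use of the sample standard deviation are assumptions, and the names `k_fold_indices` and `summarize` are hypothetical.

```python
import statistics


def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    folds = []
    start = 0
    for f in range(k):
        # The first (n_samples % k) folds receive one extra sample.
        size = n_samples // k + (1 if f < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def summarize(accuracies):
    """Average value and sample standard deviation of per-fold accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


folds = k_fold_indices(25, k=10)
mean_acc, std_acc = summarize([0.8, 0.9, 1.0])
```

In practice the sample order would normally be shuffled (and possibly stratified by class) before the folds are formed, so that each fold resembles the full data set.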
The comprehensive correlation results are shown in Table 8. The numbers of wins for KNRS, KRS, NRS, and NE are 4, 0, 4, and 2, respectively. KNRS achieves the highest average classification accuracy by using fewer features. It is thus concluded that KNRS outperforms the other feature measures. We can interpret the results from two aspects. For the kernel model (KRS), the lower approximation is computed by the "min" operation. Then, the decision on a sample is dependent on the nearest sample. This procedure can lead to a decision error if the nearest sample is a noise point. We obtain a lower classification accuracy when there is considerable data noise in the samples. In contrast, neighborhood-based binary relations can only be expressed in terms of 0 or 1. Neighborhood models (such as NRS and NE) cannot describe the fuzziness as finely as the kernel model. KNRS is a better choice because it integrates the advantages of the kernel and neighborhood models.

Conclusion and Future Work
Genetic algorithms and neural networks are well-known mathematical models for pattern recognition, machine learning, and intelligent control. Rough set theory has also been successfully applied to these fields [5,13,15]. In this study, we define a novel model, the kernel neighborhood rough sets model, which integrates the advantages of the neighborhood and kernel models. Moreover, the model is applied to the problem of feature selection. The parameters of KNRS are also discussed in detail. Then, we evaluate the effectiveness of the fast feature selection algorithm. A comparison of the results shows that our model yields a better performance than classic models.
There are two potential directions for future work. First, many other rough sets models can be incorporated into KNRS, such as fuzzy rough sets and Pawlak rough sets. Evaluating the significance of features by using the confluent models is an important issue. Second, the application of our model to big data is necessary. Consequently, the development of a version of KNRS within a distributed framework requires further attention.

Definition 11. Consider a variable precision kernel neighborhood decision system $\langle U, C \cup D, k, \delta, \alpha \rangle$, where $\delta \ge 0$, $0.5 \le \alpha \le 1$, and $k$ is a kernel function. Suppose $B \subseteq C$ and $U/D = \{D_1, D_2, \ldots, D_n\}$. The positive, negative, and boundary regions of $D$ in the space $\langle U, A, k, \delta \rangle$ are, respectively, defined as follows:

$$POS_B(D) = \underline{KN}_B D, \quad NEG_B(D) = U - \overline{KN}_B D, \quad BN_B(D) = \overline{KN}_B D - \underline{KN}_B D. \tag{9}$$

A larger boundary region increases the uncertainty in the decision system. The samples in the boundary generally have the same condition features but belong to different decision classes. This discrepancy leads to poor classification performance of the employed algorithms.

Definition 12. Consider a variable precision kernel neighborhood decision system $\langle U, C \cup D, k, \delta, \alpha \rangle$, where $\delta \ge 0$, $0.5 \le \alpha \le 1$, and $k$ is a kernel function. Suppose $B \subseteq C$ and $U/D = \{D_1, D_2, \ldots, D_n\}$. The dependency degree of $D$ relative to $B$ is defined as follows:

Definition 13. Consider a variable precision kernel neighborhood decision system $\langle U, C \cup D, k, \delta, \alpha \rangle$, where $\delta \ge 0$, $0.5 \le \alpha \le 1$, and $k$ is a kernel function. Suppose $B \subseteq C$ and $U/D = \{D_1, D_2, \ldots, D_n\}$. The significance of feature $a$ in $B$ is defined as follows:

$$SIG(a, B, D) = \gamma_B(D) - \gamma_{B - a}(D). \tag{11}$$

$SIG(a, B, D)$ is used to evaluate the significance of attribute $a$ in subset $B$. That is, $a$ is an important feature if it increases the dependency degree of subset $B$. As mentioned in Definition 11, we need subset $B$ to be more consistent with decision $D$. Thus, we define the feature selection algorithm as follows.
Corollary 1. Given a variable precision kernel neighborhood decision system $\langle U, C \cup D, k, \delta, \alpha \rangle$, where

Algorithm 2: Fast feature selection based on a kernel neighborhood rough set (FFSKNRS).
Input: decision system ⟨U, C ∪ D, k, δ, α⟩ and stopping threshold ε
Output: selected features red
1. red ← ∅, S ← U
2. while red ≠ C

Table 4: Comparison of run times (s).

Table 5: Subsets of the features selected with KNRS, KRS, NRS, and NE.

Table 8: Comprehensive correlations of the models.