Ensemble-Based Neighborhood Attribute Reduction: A Multigranularity View



Introduction
Up to now, attribute reduction is one of the key topics in Rough Set Theory (RST) [1][2][3][4]. As a rough set-based feature selection, attribute reduction aims to reduce the data dimensions subject to a given constraint, and the constraint can be clearly explained by a considered measure such as approximation quality [5] or conditional entropy [6]. Actually, many established researches have employed different measures to design the corresponding algorithms. For example, Hu et al. [5] have studied uncertainty measures related to fuzzy rough set and then further explored approximation quality-based attribute reduction; Dai et al. [6][7][8] have investigated attribute reduction with respect to several types of conditional entropies. Therefore, the corresponding reducts are actually subsets of the condition attributes which meet the constraints defined by measures such as approximation quality and conditional entropy, respectively. These researches have investigated different kinds of attribute reduction based on different measures, and these measures characterize the uncertainties or distinguishing abilities of the decision attribute through applying the information offered by the condition attributes. However, it is worth noting that they all focus on one and only one fixed parameter in attribute reduction [9][10][11], such as the fixed Gaussian kernel parameter in fuzzy rough set or the fixed radius in neighborhood rough set. From the viewpoint of Granular Computing (GrC) [12][13][14][15][16], applying one and only one parameter in rough set can only reflect the information over a fixed granularity [17][18][19]. Therefore, the attribute reduction over a fixed granularity can be termed the single-granularity attribute reduction. Nevertheless, single-granularity attribute reduction fails to meet the requirements of practical applications.
Generally speaking, the limitations of single-granularity attribute reduction can be summarized in the following two aspects: (1) The single-granularity attribute reduction fails to select attributes from a multilevel [20][21][22] or multiview [23,24] perspective. For example, supposing that a reduct is obtained to preserve one given information granulation on the universe of discourse, this reduct may no longer be a reduct for a slightly finer or coarser level of information granulation, which may be caused by a slight variation of the data. In view of this, if a reduct with higher adaptability to the granularity world constructed by multilevel or multiview is requested, it is necessary to establish attribute reduction from the multigranularity view.
(2) The single-granularity attribute reduction may result in poor learning performance. The reduct derived from a single-granularity algorithm may only acquire the ability from one fixed sight. This is mainly because the information over a fixed granularity may lack versatility and comprehensiveness. Actually, it is reported that, given the same data sets, different granules can provide complementary predictive powers, and the learning performance may be improved by combining their information [25,26]. To improve the learning performance, it may be feasible to employ the information provided by different granularities for computing a reduct. Therefore, it is necessary to generate a reduct through combining the information from the multigranularity view.
To overcome the limitations mentioned above, it is necessary to establish attribute reduction from the multigranularity view. It contains two processes which are summarized as follows: (1) Construct the structure of multigranularity in RST. In this paper, the multigranularity structure will be constructed through employing different radii. As reported in [16,27], a radius-based neighborhood forms an information granule; the neighborhood-based single granularity can then be constructed by applying one and only one radius, and it follows that neighborhood-based multigranularity can be constructed by applying a set of different radii. In this process, applying a smaller radius will generate a finer information granule while applying a greater radius will generate a coarser information granule. The different scales of information granules can offer us the multigranularity-based results of neighborhood rough approximations [25,28]. Therefore, the structure of multigranularity is naturally obtained in neighborhood rough set. (2) Design a framework to realize attribute reduction from the multigranularity view.
The Multi-Granularity View for Computing Reduct (MG-VCR) is constructed to realize the new algorithm, and the mechanism of MG-VCR mainly faces two open problems, which can be shown as follows: (1) How to select the candidate attribute: Generally speaking, all the decision classes are considered as a union in the traditional process of computing reducts, which can be seen as the global view [29]. However, the traditional approach pays less attention to the key condition attributes related to different decision classes. In other words, the reduct derived from the traditional approach may lead to the improvement of one decision class but the deterioration of another decision class. To fill such a gap, we will apply the ensemble selector [30] from the local view. The new strategy pays much attention to the key condition attributes related to different decision classes [31]. Moreover, the majority voting strategy is employed in the ensemble selector among different decision classes for selecting the candidate attribute. (2) How to select the granularity: Different from traditional single-granularity algorithms, which focus on addressing a fixed problem over one given granularity, multigranularity attribute reduction is a framework which processes information from a new perspective. Actually, multigranularity attribute reduction can consider the differences among the different granularities to generate the multigranularity reduct. Generally speaking, a natural way of finding a multigranularity reduct is to generate reducts one by one in terms of each granularity. Nevertheless, employing all the granularities to generate reducts may be too time-consuming as the number of granularities increases. To improve the time efficiency, given a set of granularities, fusing the information over some of them may be an interesting attempt; for example, only the finest granularity and the coarsest granularity are applied.
From the discussions above, the motivation of this paper is to construct a framework which can not only take the requirements of different decision classes into account but also realize attribute reduction from the multigranularity view. For one thing, to select attributes from the viewpoint of different decision classes, an ensemble selector [30] with a majority voting strategy is applied. For another, to realize attribute reduction from the multigranularity view, a set of radii are employed to construct the neighborhood-based multigranularity, and then the new attribute reduction will be designed in terms of the multigranularity view. The ensemble selector selects key condition attributes for different decision classes, and the multigranularity view may provide new strategies for more abilities related to the reduct. Immediately, the whole process of constructing multigranularity attribute reduction is given in the following three steps: (1) Firstly, the multigranularity structure is constructed in terms of the radius-based neighborhood. A set of radii sorted in ascending order are applied, and then different granularities can be obtained in turn, i.e., the neighborhood-based multigranularity can be constructed. (2) Secondly, to accelerate the process of computing reducts, the finest granularity and the coarsest granularity instead of all the granularities are employed. (3) Finally, the fitness function searching strategy is employed to compute the reduct. In each iteration, the candidate attribute can be derived from all or most of the decision classes. In this process, the majority voting strategy is employed to determine which attribute is to be added into the potential reduct.
The main contributions of this work can be summarized in the following aspects: (1) construct the neighborhood-based multigranularity related to the given different radii; (2) design the attribute reduction from the multigranularity view; (3) combine the ensemble strategy with the multigranularity attribute reduction; (4) conduct experiments on 15 UCI data sets, whose analyses demonstrate that our algorithms can be effective in classification-oriented attribute reduction and efficient in computation for generating reducts. The structure of this paper is organized as follows. In Section 2, following some basic notions with respect to neighborhood rough set, the neighborhood-based multigranularity is constructed. In Section 3, the frequently used measures and the traditional attribute reduction are presented. Section 4 first introduces the ensemble selector from the local view into the traditional framework, and then the Multi-Granularity View for Computing Reduct (MG-VCR) is presented. The experimental results and the corresponding analyses are shown in Section 5. We conclude with some remarks and perspectives for future research in Section 6.

Neighborhood-Based Multigranularity
In rough set theory, the neighborhood [32] is an effective way to realize the information granule. It not only implies a simple way to express information granules but also provides us a natural direction to construct the concept of multigranularity.
Similar to other rough set models [31,[33][34][35][36], the neighborhood rough set can also be formed in a decision system D = ⟨U, AT ∪ D⟩: U is the set of samples, called the universe; AT is the set of condition attributes which characterize the properties of samples; D is the set of decision attributes for labeling samples. Note that the problem of a single decision attribute is mainly discussed in this paper. Therefore, we use D = {d} in which d is the considered single decision attribute. ∀x ∈ U, d(x) denotes the label or decision value of sample x. Assuming that the labels of samples are categorical, an equivalence relation over d can be defined such that IND_d = {(x, y) ∈ U × U : d(x) = d(y)}; by IND_d, a set of decision classes can be obtained such that U/IND_d = {X_1, X_2, ..., X_q}. Furthermore, ∀X_p ∈ U/IND_d is referred to as the p-th decision class in rough set theory. Specially, the decision class which contains the sample x is denoted by [x]_d.

Definition 1.
Given a decision system D, ∀A ⊆ AT, and a radius δ, a neighborhood relation can be defined as

N_A^δ = {(x, y) ∈ U × U : Δ_A(x, y) ≤ δ},    (1)

where Δ_A(x, y) is the distance between samples x and y through using the information offered by A. Note that the Euclidean distance is employed in this paper. Following equation (1), it is not difficult to construct the neighborhood of sample x such that

N_A^δ(x) = {y ∈ U : Δ_A(x, y) ≤ δ}.    (2)

Definition 2.
Given a decision system D, U/IND_d = {X_1, X_2, ..., X_q}, ∀A ⊆ AT and ∀X_p ∈ U/IND_d, the neighborhood lower and upper approximations of X_p in terms of A are defined as

N̲_A^δ(X_p) = {x ∈ U : N_A^δ(x) ⊆ X_p},
N̄_A^δ(X_p) = {x ∈ U : N_A^δ(x) ∩ X_p ≠ ∅}.

From the viewpoint of GrC, N_A^δ(x) can also be regarded as the neighborhood-based information granule [37][38][39]. Obviously, different values of δ can generate different results of N_A^δ(x). Supposing that a set of radii T = (δ_1, δ_2, ...
, δ_n) in ascending order is considered; then, given a sample x ∈ U, a set of the neighborhoods of x will be derived such that (N_A^{δ_1}(x), N_A^{δ_2}(x), ..., N_A^{δ_n}(x)). These different scales of the neighborhoods imply a structure of multigranularity [25,37,40] (in this paper, a neighborhood and a granularity are equivalent terms). This is mainly because a smaller neighborhood expresses a finer granularity while a greater neighborhood expresses a coarser granularity. The details can be referred to in Figure 1. Figure 1 shows us the variation of the neighborhood as the radius increases. It should be noticed that, in real-world applications, the samples in the given decision system may possess n attributes, and then the corresponding neighborhood can be shown in the n-dimensional situation. To simplify the description, Figure 1 is drawn in the 2-dimensional situation. In the first sub-figure, given a radius δ_1, the samples x_1, x_2, x_3 are in the neighborhood of the testing sample y; in the second sub-figure, the radius is enlarged to δ_2, and then more samples, i.e., samples x_4 and x_5, are included in the neighborhood of y; ...; finally, if the radius is enlarged to δ_n in the last sub-figure, then more samples such that x_6, x_7, x_8, x_9, x_10 fall into the neighborhood of y. In this process, the first sub-figure implies the finest granularity and the last sub-figure implies the coarsest granularity.
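As a minimal sketch of Definition 1 (the helper name and the toy 2-dimensional samples are hypothetical, not from the paper), the radius-based neighborhood can be computed directly from the Euclidean distance:

```python
from math import dist  # Euclidean distance (Python 3.8+)

def neighborhood(x, universe, radius):
    """Return the indices of the delta-neighborhood of sample x: all
    samples within Euclidean distance `radius` of x. Note that x itself
    is always included, since dist(x, x) = 0 <= radius."""
    return [i for i, y in enumerate(universe) if dist(x, y) <= radius]

# Toy universe of 2-D samples (hypothetical data).
U = [(0.0, 0.0), (0.1, 0.0), (0.5, 0.5), (0.9, 0.9)]

# A smaller radius yields a finer granule, a larger one a coarser granule.
fine   = neighborhood(U[0], U, 0.2)   # only the nearby sample is included
coarse = neighborhood(U[0], U, 1.0)   # more samples fall inside
```

Enlarging the radius can only enlarge the neighborhood, which is exactly the finer-to-coarser ordering used to build the multigranularity structure above.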
As has been pointed out in Figure 1, applying different radii can generate different granularities; then, the multigranularity lower approximation and the multigranularity upper approximation can be established through using different radii, and the detailed expressions can be shown as follows.

Some Measures.
By considering the relation between neighborhood-based information granules and decision classes, a lot of measures have been explored in neighborhood rough set theory, especially for characterizing uncertainties and neighborhood-based attribute reduction. In this section, four measures will be addressed. Approximation quality is one of the most frequently used measures, and it can be used to evaluate the certainty of belongingness by using the explanation of the lower approximation in the rough set model. The formal definition of approximation quality is shown as follows.

Definition 3.
[41] Given a decision system D, U/IND_d = {X_1, X_2, ..., X_q}, ∀A ⊆ AT, the approximation quality of d with respect to A is defined as

c_δ(A, d) = |∪_{p=1}^{q} N̲_A^δ(X_p)| / |U|,

where | · | denotes the cardinality of a set. Approximation quality reflects the percentage of samples which belong to one of the decision classes by the explanation of the lower approximations. Therefore, the greater the value of the approximation quality, the higher the degree of certain belongingness by the explanation of the lower approximations in RST.
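As a sketch of Definitions 2 and 3 under the notation above (the helper names and the toy one-attribute data are hypothetical), the approximation quality is the fraction of samples covered by some lower approximation:

```python
from math import dist

def lower_approx(X, U, A_cols, radius):
    """Neighborhood lower approximation of decision class X (a set of
    sample indices): the samples whose neighborhood, computed on the
    attribute subset A_cols, is contained in X."""
    def proj(i):  # restrict sample i to the attribute subset A_cols
        return [U[i][a] for a in A_cols]
    lower = set()
    for i in range(len(U)):
        nbhd = {j for j in range(len(U)) if dist(proj(i), proj(j)) <= radius}
        if nbhd <= X:
            lower.add(i)
    return lower

def approximation_quality(classes, U, A_cols, radius):
    """c_delta(A, d): fraction of samples in some lower approximation."""
    covered = set().union(*(lower_approx(X, U, A_cols, radius) for X in classes))
    return len(covered) / len(U)

# Toy decision system (hypothetical): one attribute, two decision classes.
U = [(0.0,), (0.1,), (0.9,), (1.0,)]
classes = [{0, 1}, {2, 3}]
```

With a small radius every neighborhood stays inside its own class, so the quality is 1; with a radius near the class gap the neighborhoods cross the boundary and the quality drops to 0.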
Conditional entropy is another widely accepted measure which can characterize the discriminating ability of the condition attributes related to the decision attribute. Presently, with respect to different requirements, various definitions of conditional entropies have been proposed [6][7][8][42][43][44]. A typical form of the conditional entropy is shown as follows.
Definition 4.
Given a decision system D, ∀A ⊆ AT, the conditional entropy of d with respect to A can be defined as

E_δ(A, d) = −(1/|U|) Σ_{x∈U} log (|N_A^δ(x) ∩ [x]_d| / |N_A^δ(x)|),

where the base of the logarithm "log" is 2.
The lower the value of the conditional entropy, the higher the discriminating ability of the condition attributes related to the decision attribute.
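The typical form above can be sketched as follows (the function name and the toy data are hypothetical); each sample's neighborhood is compared with its decision class [x]_d:

```python
from math import dist, log2

def conditional_entropy(U, labels, radius):
    """A typical neighborhood conditional entropy (sketch): for each
    sample x, measure how much of its neighborhood shares its label;
    lower values mean the attributes discriminate the labels better."""
    n = len(U)
    total = 0.0
    for i in range(n):
        nbhd = [j for j in range(n) if dist(U[i], U[j]) <= radius]
        same = [j for j in nbhd if labels[j] == labels[i]]  # nbhd ∩ [x]_d
        total -= log2(len(same) / len(nbhd))
    return total / n

# Toy data (hypothetical): two well-separated decision classes.
U = [(0.0,), (0.1,), (0.9,), (1.0,)]
labels = [0, 0, 1, 1]
```

A fine radius keeps every neighborhood label-pure (entropy 0), while a radius covering the whole universe mixes the two classes and raises the entropy.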
Similar to conditional entropy, the neighborhood discrimination index [45] can also be used to characterize the ability of the condition attributes to distinguish samples with different labels. However, conditional entropy requires the neighborhood-based information granules derived from the neighborhood relation, and obtaining the neighborhood of each sample from the neighborhood relation is time-consuming. To improve the time efficiency, Wang et al. [45] have proposed the measure of the neighborhood discrimination index, which can be obtained directly from the neighborhood relation instead of the neighborhoods. The formal definition of the neighborhood discrimination index can be shown as follows.

Definition 5.
[45] Given a decision system D, ∀A ⊆ AT, the neighborhood discrimination index of d with respect to A is defined as

H_δ(A, d) = log (|N_A^δ| / |N_A^δ ∩ IND_d|),

where |N_A^δ| denotes the cardinality of the neighborhood relation, i.e., the number of sample pairs it contains. The lower the value of the neighborhood discrimination index, the higher the discriminating ability of the condition attributes related to the decision attribute.
Another measure deserving to be investigated is the neighborhood decision error rate (NDER), which was proposed by Hu et al. [5]. NDER is commonly used to evaluate the classification performance of the neighborhood classifier (NEC) [46] in neighborhood rough set theory. Immediately, the formal definition of NDER can be given as follows.

Definition 6.
[47] Given a decision system D, ∀A ⊆ AT, the neighborhood decision error rate of d with respect to A is defined as

NDER_δ(A, d) = |{x ∈ U : Pre(x) ≠ d(x)}| / |U|,

where Pre(x) is the predicted label of the sample x derived from the used classifier.
The neighborhood decision error rate shows the percentage of samples which are misclassified when using the neighborhood classifier.
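A sketch of NDER with a simple neighborhood classifier (majority label inside the neighborhood, the sample itself excluded); the function name, the tie/isolation handling, and the toy data are assumptions, not from the paper:

```python
from math import dist
from collections import Counter

def nder(U, labels, radius):
    """Neighborhood decision error rate (sketch): each sample is
    predicted by the majority label inside its neighborhood (the
    sample itself excluded); NDER is the fraction mispredicted."""
    n = len(U)
    errors = 0
    for i in range(n):
        nbhd = [j for j in range(n) if j != i and dist(U[i], U[j]) <= radius]
        if not nbhd:          # isolated sample: count as an error here
            errors += 1
            continue
        pred = Counter(labels[j] for j in nbhd).most_common(1)[0][0]
        errors += pred != labels[i]
    return errors / n

# Toy data (hypothetical): two well-separated decision classes.
U = [(0.0,), (0.1,), (0.9,), (1.0,)]
labels = [0, 0, 1, 1]
```

With a fine radius each sample is predicted by its same-class neighbor (NDER 0); with a radius covering the whole universe the opposite class outvotes each sample's own class (NDER 1).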

Attribute Reduction.
Attribute reduction [48][49][50][51][52][53] is one of the key topics in rough set. Generally speaking, the purpose of attribute reduction is to remove the redundant attributes subject to a given constraint, and the constraint can be constructed by a measure such as approximation quality or conditional entropy. Up to now, many different types of attribute reduction with respect to different requirements have been explored by researchers. However, it can be found that those definitions of attribute reduction have a similar structure. The general form of attribute reduction [54,55] can be shown as follows.
Definition 7. Given a decision system D, ∀A ⊆ AT, and a given radius δ, A is called a reduct in terms of the measure ψ if and only if (1) A meets the ψ-constraint; (2) ∀A′ ⊂ A, A′ does not meet the ψ-constraint.

In Definition 7, ψ is a given measure which can be "c," "E," "H," or "NDER" in this paper. It is noted that the ψ-constraint should vary with the used measure ψ, and this can be further classified into two aspects: (1) if a greater value is requested through using the measure ψ (such as ψ = c), then the ψ-constraint can be set as no lower a value obtained than using the raw attribute set, i.e., ψ_δ(A, d) ≥ ψ_δ(AT, d); (2) if a lower value is requested through using the measure ψ (such as ψ = NDER), then the ψ-constraint can be set as no greater a value obtained than using the raw attribute set, i.e., ψ_δ(A, d) ≤ ψ_δ(AT, d).

Not only do the definitions of attribute reduction have a similar structure, but the reducts can also be derived by a similar searching strategy. Two main strategies are frequently used to compute reducts, i.e., the exhaustive searching strategy and the fitness function-based searching strategy. The former is computationally more expensive than the latter when dealing with data with a rapidly growing scale of samples, and the latter is usually adopted due to its computational efficiency. This is mainly because, different from the exhaustive strategy, the fitness function-based searching strategy applies a greedy mechanism to compute the reduct. Therefore, the reducts with respect to the measures shown in Section 3.1 will be computed by the similar fitness function-based searching strategy. To realize the fitness function-based searching strategy, significances are required to evaluate the importance of the attributes. Then, the significance of an attribute can be shown as follows:

Sig_δ(a_i, A, d) = ψ_δ(A ∪ {a_i}, d) − ψ_δ(A, d),    (9)
Sig_δ(a_i, A, d) = ψ_δ(A, d) − ψ_δ(A ∪ {a_i}, d),    (10)

where ψ_δ(A, d) is the value obtained by using the information over A in terms of d with the measure ψ.
The significance function shown in equation (9) indicates that the greater the value of ψ_δ(A ∪ {a_i}, d), the more important the attribute a_i to be added into A, while the significance function shown in equation (10) indicates that the lower the value of ψ_δ(A ∪ {a_i}, d), the more important the attribute a_i to be added into A. As a frequently used method of this fitness function-based searching strategy, the addition method [30,47] begins with an empty set and then selects the attribute with the maximum significance into the potential reduct in each iteration until the constraint is satisfied. Then, in the following context, we will use the addition method to further realize the fitness function-based searching strategy for computing the reduct in terms of the measure ψ.
Then, the addition method for computing reducts is shown in Algorithm 1.
Obviously, it is trivial to compute that the time complexity of Algorithm 1 is O(|U|² · |AT|²). This can be attributed to the following two aspects: (1) the time complexity for computing the neighborhood relation is O(|U|²) because any two samples in U should be used for computing a distance; (2) in the worst case, no attribute can be deleted from the raw attributes, and then the iteration shown in Step 3 should be executed |AT| times, and in the i-th iteration, Step 3 will evaluate |AT| − i + 1 candidate attributes. Finally, the time complexity of Algorithm 1 is O(|U|² · |AT|²).
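The addition method can be sketched as a generic greedy search; `evaluate` stands in for ψ_δ(·, d), and the toy measure at the bottom is hypothetical:

```python
def addition_reduct(AT, evaluate, greater_is_better=True):
    """Greedy addition method (sketch): start from the empty set and,
    in each iteration, add the attribute with maximum significance,
    until the psi-constraint (vs. the full set AT) is satisfied.
    `evaluate(A)` plays the role of psi_delta(A, d)."""
    target = evaluate(AT)
    met = (lambda v: v >= target) if greater_is_better else (lambda v: v <= target)
    A = []
    while not met(evaluate(A)):
        remaining = [a for a in AT if a not in A]
        # significance of a: change of psi when a is added to A
        sig = (lambda a: evaluate(A + [a])) if greater_is_better \
              else (lambda a: -evaluate(A + [a]))
        A.append(max(remaining, key=sig))
    return A

# Hypothetical measure: fraction of the "informative" attributes covered.
ev = lambda A: len(set(A) & {'a', 'b'}) / 2
```

In the worst case the loop runs |AT| times and the i-th iteration evaluates |AT| − i + 1 candidates, matching the O(|AT|²) factor discussed above.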

Ensemble-Based Attribute Reduction.
Generally speaking, it is observed that the algorithms for computing reducts can be regarded as processes of searching attributes. We can see from Algorithm 1 that, when selecting attributes, the decision classes are considered as a union (global view), but the requirements of different decision classes have not been fully considered. In other words, more attention can be paid to the key condition attributes related to different decision classes (local view). Presently, there is an increasing awareness of the local view in rough set theory. This is mainly because the local view may not only provide us more details of the data but also offer us a direction to study the relationships among structures with respect to different decision classes. For instance, Chen and Zhao [29] have proposed the concept of local measures and investigated the structure of local attribute reduction in fuzzy rough decision systems, which provided a new insight into the problem of attribute reduction. Following this strategy, Song et al. [31] have investigated local attribute reduction from the viewpoint of decision cost; different from the traditional approach which considers all decision classes from the global view, they focused on the attributes with closer connection to one given decision class than to the others. Yao and Zhang [53] have investigated the relationship between classification-based attribute reduction (global view) and class-specific attribute reduction (local view).
Though some previous results about local approaches have been explored in rough set theory, few of them have paid attention to the ensemble strategy from the local view. Therefore, to consider the different requirements of different decision classes, the ensemble strategy from the local view may be a feasible solution. Immediately, the local measures [56] and the ensemble strategy from the local view can be presented in the following context. The following Definition 8 will show the local measures, and the four local measures corresponding to the measures shown in Section 3.1 can be defined accordingly. Different from the measures shown in Section 3.1, the local measures only apply a specific decision class instead of all the decision classes together. For instance, the approximation quality shown in Section 3.1 reflects the percentage of the samples which belong to one of the decision classes by the lower approximations, while the local approximation quality reflects the percentage of the samples which belong to the decision class X_p by the corresponding lower approximation. Furthermore, the significances in terms of the decision class X_p can also be shown as follows:

Sig_δ(a_i, A, X_p) = ψ_δ(A ∪ {a_i}, X_p) − ψ_δ(A, X_p),    (11)
Sig_δ(a_i, A, X_p) = ψ_δ(A, X_p) − ψ_δ(A ∪ {a_i}, X_p).    (12)

The significance with respect to X_p shows the degree of importance of the attribute a_i in the individual decision class X_p. Similarly, equations (11) and (12) are suitable for different local measures; if a greater value is requested, then equation (11) is applied; otherwise, equation (12) is applied.
As the local measures and significances are presented above, the ensemble strategy from the local view will then be applied to the process of computing reducts. This process contains two key steps shown as follows: (1) each decision class selects an attribute with the maximal value of significance with respect to itself in each iteration; (2) the majority voting is employed to determine which attribute is to be added into the potential reduct. It is noticed that the attribute with the highest frequency is selected as the candidate attribute in each iteration, and if there is more than one attribute with the same highest frequency, the attribute which ranks lowest will be selected. Take the reduct in terms of approximation quality as an example: the attribute with the maximal growth of the approximation quality is selected in each iteration by Algorithm 1, while in the new ensemble-based algorithm, the attribute with the maximal growth of most of the lower approximations (local approximation quality) of the different decision classes is selected. The reason is that if most of the lower approximations of the decision classes have been increased, then the approximation quality over the union can also increase. The following example will show the process of the ensemble strategy among different decision classes.

Example 1. As the decision system shown in Table 1, U = {x_1, x_2, ..., x_6} is the set of samples, AT = {a_1, a_2, ..., a_6} is the set of condition attributes, and d is the decision attribute.
Take the measure "c" as an example. Supposing that a radius δ is given, if Algorithm 1 is applied, then a reduct A_1 is obtained. Immediately, c_δ(A_1, X_1) = 1.0000, c_δ(A_1, X_2) = 0.6667, and c_δ(A_1, X_3) = 0 can be obtained. If the ensemble strategy is employed in the process for computing reducts, then the obtained reduct is A_2 = {a_1, a_2}. Actually, in the first iteration of this new process, the attributes with the maximum value of significance with respect to the decision classes X_1, X_2, and X_3 are three attributes with the same frequency. As a_1 ranks lowest in the first iteration, a_1 is added into the potential reduct. In the next iteration, the selected attribute with respect to the decision classes X_1, X_2, X_3 is a_2, a_2, a_2, respectively; then a_2 is the attribute with the highest frequency. Therefore, a_2 is added into the potential reduct. Furthermore, the constraint is satisfied as c_δ(A_2, d) = 0.8333 > c_δ(AT, d). Immediately, c_δ(A_2, X_1) = 1.0000, c_δ(A_2, X_2) = 1.0000, and c_δ(A_2, X_3) = 0 can be obtained.

Inputs: decision system D, radius δ. Outputs: a reduct A.
ALGORITHM 1: Addition method for computing reduct (AMCR).

It is observed that, compared with Algorithm 1, when the ensemble strategy is employed, the value of the local approximation quality with respect to the decision class X_2 has increased, and the values of the local approximation quality with respect to the decision classes X_1 and X_3 remain unchanged. The above results tell us that the ensemble strategy among different decision classes can keep or improve the performance of each single decision class. A similar mechanism can be applied for the other measures.
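The voting step of the ensemble selector can be sketched as follows; the function name is hypothetical, and the tie-breaking rule (pick the lowest-ranked attribute) follows the description above:

```python
from collections import Counter

def ensemble_pick(candidates_per_class):
    """Majority-voting ensemble selector (sketch): each decision class
    proposes its locally best attribute; the most frequent proposal
    wins, and ties are broken by the lowest-ranked attribute."""
    votes = Counter(candidates_per_class)
    top = max(votes.values())
    return min(a for a, v in votes.items() if v == top)
```

With attribute indices as votes, a three-way tie returns the lowest index (as in the first iteration of Example 1), and a unanimous vote returns that attribute (as in the second iteration).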

Multigranularity Attribute Reduction.
Algorithm 1 shows the process of generating reducts over one and only one granularity, i.e., the algorithm from the single-granularity view. As pointed out in Section 1, the single-granularity attribute reduction has some inherent limitations. Therefore, in the following context, attention will be paid to the multigranularity attribute reduction. The realization of multigranularity attribute reduction is made up of two key processes: (1) construct the multigranularity structure in RST; (2) realize the attribute reduction from the multigranularity view. As shown in Section 2, given a set of radii T = (δ_1, δ_2, ..., δ_n) in ascending order, it follows that different neighborhoods can be constructed by applying different radii. Furthermore, the multigranularity structure [28,40,57] can be constructed. Then, the multigranularity attribute reduction can be designed in the following contexts.

Definition 9. Given a decision system D, U/IND_d = {X_1, X_2, ..., X_q}, ∀A ⊆ AT, and an ascending order set T, A is called the multigranularity reduct in terms of the measure ψ if and only if (1) A meets the ψ_G-constraint; (2) ∀A′ ⊂ A, A′ does not meet the ψ_G-constraint.

Similar to Definition 7, the measure ψ can also be "c," "E," "H," or "NDER," and the ψ_G-constraint is a multigranularity constraint which should also vary with the used measure ψ. However, the result should satisfy the multigranularity constraint instead of the single-granularity one. As for the multigranularity constraint, a simple way to design it is to fuse all the constraints [25] with respect to all the considered granularities. However, this will bring us two challenges: (1) the complexity of the fused constraint will lower the speed of the reduction process; (2) too many constraints will result in difficulty in eliminating attributes. Therefore, to overcome the limitations above, we will develop a quick process which is based on the fusion related to the finest and the coarsest granularities. This is mainly because, for a given testing sample, applying the finest granularity obtains the fewest neighbors while applying the coarsest granularity obtains the most neighbors. This behavior can also be clearly observed from Figure 1 in Section 2.
Section 4.1 shows the ensemble-based attribute reduction with the majority voting strategy, and the context in this section analyzes the multigranularity view; the newly proposed algorithm will then combine the multigranularity view with the ensemble strategy. The new algorithm will be termed the Multi-Granularity View for Computing Reduct (MG-VCR). Similar to Algorithm 1, MG-VCR computes reducts through using the addition method. However, there are two main differences summarized as follows: (1) MG-VCR selects attributes from the viewpoints of the different decision classes instead of their union, and the ensemble selector with the majority voting strategy is employed to select the candidate attribute; (2) the multigranularity view will be employed in MG-VCR, and as only the finest granularity and the coarsest granularity are applied, given a radii set T = (δ_1, δ_2, ..., δ_n), only δ_1 and δ_n are employed in the process of computing reducts.
The first one gives a general framework about how to select candidate attributes, and the second one shows the strategy about how to select granularities in MG-VCR. Then, the detailed process using the addition method is designed as Algorithm 2. In Algorithm 2, the time complexity of computing the reduct is O(q · |U|² · |AT|²), in which q is the number of decision classes.
Step 3 is the key step in the process of computing reducts; the following points should be emphasized: (1) The finest and the coarsest granularities each take up a weight η_δt to realize the multigranularity view, and the condition η_δ1 + η_δn = 1 holds. (2) Two mechanisms are involved: (i) the ensemble selector over the different decision classes X_p (p = 1, 2, ..., q); (ii) the majority voting mechanism used for selecting the candidate attributes. Note that "Psig" denotes the weighted average significance related to the decision class X_p. In this process, we first compute the weighted average values related to each decision class using the finest granularity and the coarsest granularity, and then the ensemble selector chooses the candidate attribute by majority voting.

Finally, notice that if only one granularity is used in this process, the algorithm degenerates into a single-granularity process, i.e., Single-granularity View for Computing Reduct (SG-VCR). Given the radii set T = (δ1, δ2, ..., δn), SG-VCR must be executed n times to obtain all the reducts, while MG-VCR only needs to be executed once, which may reduce the elapsed time since it applies the multigranularity view. The following example shows a general process of MG-VCR.

Example 2. Consider the decision system shown in Table 1 and take approximation quality as an example. Suppose that δ1 = 0.03, δn = 0.30, and η_δ1 = η_δn = 1/2 (η_0.03 = η_0.30 = 1/2). Since c_δ1(AT, d) = 0.6667 and c_δn(AT, d) = 0.3333, the fusion value of the raw attributes (AT) is 0.5000.

Note that the significances of attributes are also fusion values (weighted average values). Since q = 3, the significance of each attribute is computed 3 times to obtain the importance degree related to each decision class, and the most important attribute related to each decision class is then obtained. In this process, the attributes with the maximum fusion values of significance with respect to the decision classes X1, X2, X3 are a1, a2, a1, respectively. Then a1 is added into the potential reduct A by the majority voting strategy, the same strategy shown in Example 1.

Then we check whether the potential reduct meets the ψG-constraint. Since c_δ1(A, d) = 0.6667 and c_δn(A, d) = 0.5000, the fusion value of A is 0.5834, which is larger than the fusion value of the raw attributes. Therefore, the reduct derived from Algorithm 2 is A = {a1}.
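The fusion values in the example above can be sketched in a few lines. This is an illustrative implementation assuming Euclidean neighborhoods and the usual definition of neighborhood approximation quality (fraction of samples whose neighborhood is pure in the decision label); the function names are not the authors' code.

```python
import numpy as np

def neighborhood(Xa, i, delta):
    """Indices of samples within Euclidean distance delta of sample i."""
    d = np.linalg.norm(Xa - Xa[i], axis=1)
    return np.where(d <= delta)[0]

def approx_quality(X, y, attrs, delta):
    """Fraction of samples whose delta-neighborhood, measured on the
    selected attributes, contains only one decision label."""
    Xa = X[:, attrs]
    pure = sum(1 for i in range(len(X))
               if len(set(y[j] for j in neighborhood(Xa, i, delta))) == 1)
    return pure / len(X)

def fused_quality(X, y, attrs, radii=(0.03, 0.30), weights=(0.5, 0.5)):
    """Weighted average over the finest and the coarsest granularity,
    as in the example with eta_finest = eta_coarsest = 1/2."""
    return sum(w * approx_quality(X, y, attrs, d)
               for d, w in zip(radii, weights))
```

For instance, with qualities 1.0 at δ = 0.03 and 1/3 at δ = 0.30, the fusion value is 2/3, just as 0.6667 and 0.3333 fuse to 0.5000 in the example.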

Experimental Analysis
In this section, to validate the effectiveness of the MG-VCR proposed in this paper, 15 UCI data sets have been employed to conduct the experiments. The basic description of the data is shown in Table 2. All the experiments have been carried out on a personal computer with Windows 7, a dual-core 1.50 GHz CPU, and 8 GB memory. The programming language is MATLAB R2016a.
Fivefold cross-validation (5-CV) is used for evaluating the effectiveness of the proposed algorithm. 5-CV divides all samples into 5 groups of the same size; four groups compose the training set used to compute the reduct, while the remaining group composes the testing set used to obtain the results for comparison. The process is repeated 5 times in turn, and the mean values are recorded. Besides, in previous works [30,58], it has been experimentally proven that [0.1, 0.3] is an optimal candidate interval for the radius, in which most classifiers achieve good classification performance; to this end, 10 different values of δ, namely 0.03, 0.06, ..., 0.30, are selected. For SG-VCR, all 10 radii are employed to compute 10 reducts. As only the finest and the coarsest granularities are employed in MG-VCR, the radii applied are 0.03 and 0.30, and the weight values are set as η_δ1 = η_δn = 1/2 (η_0.03 = η_0.30 = 1/2) in this paper; it should be emphasized that MG-VCR obtains only one reduct from the given radii.
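The experimental setup above can be sketched as follows; a minimal illustration of the radii grid and the 5-CV index split, assuming a simple random permutation (the paper does not specify how the folds are shuffled).

```python
import numpy as np

# The 10 radii used by SG-VCR: 0.03, 0.06, ..., 0.30
radii = [round(0.03 * k, 2) for k in range(1, 11)]

def five_fold_indices(n, seed=0):
    """Yield (train, test) index arrays for fivefold cross-validation:
    each fold serves once as the test set, the other four as training."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), 5)
    for k in range(5):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        yield train, test
```

MG-VCR would use only `radii[0]` and `radii[-1]` (0.03 and 0.30) with weights 1/2 each, while SG-VCR loops over all ten values.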
In the following context, "single-granularity algorithm" and "multigranularity algorithm" are used to denote the algorithms Single-granularity View for Computing Reduct (SG-VCR) and Multi-Granularity View for Computing Reduct (MG-VCR), respectively. The results related to the different reducts can be classified into two types.
(1) APP, ENT, DIS, and ERR denote the results related to the reducts derived from the single-granularity algorithm in terms of approximation quality, conditional entropy, neighborhood discrimination index, and neighborhood decision error rate, respectively. These reducts are collectively termed single-granularity reducts.
(2) MGAPP, MGENT, MGDIS, and MGERR denote the results related to the corresponding reducts derived from the multigranularity algorithm in terms of approximation quality, conditional entropy, neighborhood discrimination index, and neighborhood decision error rate, respectively. These reducts are collectively termed multigranularity reducts.
Table 3 shows the comparisons among the lengths of the reducts derived from the single-granularity algorithm and the multigranularity algorithm in terms of the different measures.

Comparisons among the Lengths of Reducts.
With a careful investigation of Table 3, it is not difficult to observe the following.
In terms of length, the reducts derived from the multigranularity algorithm are generally longer than those derived from the single-granularity algorithm. In other words, more attributes are added into the reduct set when using the multigranularity algorithm. Though there exist some fluctuations between the measures "DIS" and "MGDIS," the differences between their lengths are not obvious. Furthermore, the average values in the last row demonstrate that the multigranularity algorithm needs more attributes to be added into the reduct than the single-granularity algorithm. To sum up, in the lengths of reducts, the multigranularity algorithm is roughly comparable with the single-granularity algorithm. Table 4 shows the comparisons of the elapsed time between the single-granularity algorithm and the multigranularity algorithm in terms of the different measures. It should be noticed that the elapsed time of the single-granularity algorithm is the sum over the processes using the constraints related to the 10 granularities, whereas the elapsed time of the multigranularity algorithm is that of the single process using the fusion constraint related to the finest granularity and the coarsest granularity. In this experiment, the obtained multigranularity reduct represents all the reducts related to all the given granularities, while only the finest granularity and the coarsest granularity are employed to select the attributes. Therefore, the multigranularity reduct covers more than the two granularities used to compute it, and it is employed over all the given 10 granularities. Besides, the unit of the elapsed time is "s."

Comparisons among the Elapsed Time.
With a careful investigation of Table 4, it is not difficult to observe the following: (1) The multigranularity algorithm requires less time than the corresponding single-granularity algorithm. Furthermore, the total elapsed time of the single-granularity algorithm over all the fixed radii is generally about five times the elapsed time of the multigranularity algorithm. This is mainly because the single-granularity algorithm applies all the given granularities for computing reducts, while the multigranularity algorithm applies only the finest granularity and the coarsest granularity.
(2) There also exists an exception: on the data set "Wine Quality" (ID: 15), the elapsed time of computing "MGDIS" is greater than that of computing "DIS." This can mainly be attributed to the much greater length of the reduct; as can be observed in Table 3, the "MGDIS" reduct is almost eight times as long as the "DIS" reduct. (3) Besides, the average elapsed time listed in the last row demonstrates that the multigranularity algorithm requires less time than the single-granularity algorithm. To sum up, the elapsed time in Table 4 demonstrates that the multigranularity algorithm improves the time efficiency compared with the single-granularity algorithm.

Comparisons among the Classification Accuracies
Using NEC. Figure 2 shows the comparisons among the classification accuracies with respect to the single-granularity reducts and the multigranularity reducts using NEC. It is noted that the multigranularity algorithm obtains one multigranularity reduct, and this reduct represents all the reducts related to the n (n = 10 in our experiment) granularities. Therefore, the one multigranularity reduct is employed under all the granularities. In the following charts, the X-axis represents the different kinds of reducts, and the Y-axis represents the corresponding values of classification accuracy.
With a careful investigation of Figure 2, it is not difficult to observe the following: (1) The multigranularity reducts can improve the classification accuracy compared with the single-granularity reducts. Take the data set "Ionosphere" (ID: 8) as an example: the mean classification accuracies derived from the single-granularity reducts are 0.8800, 0.8726, 0.8074, and 0.8686 using APP, ENT, DIS, and ERR, respectively, while the mean classification accuracies derived from the multigranularity reducts are 0.9057, 0.9154, 0.8646, and 0.9023 using MGAPP, MGENT, MGDIS, and MGERR, respectively. This demonstrates that attribute reduction from the multigranularity view can improve the classification accuracy.
(2) On the data set "Wine Quality" (ID: 15), the reduct lengths shown in Table 3 indicate that many more attributes are added into the "MGDIS" reduct, and the mean classification accuracy in terms of "MGDIS" is also much greater than that in terms of "DIS." Besides, the multigranularity reducts can generally improve the classification performance over the single-granularity reducts. This may indicate that, though there exist some differences among the granularities, considering the finest and the coarsest granularities may be a feasible way to introduce the multigranularity idea into attribute reduction. Table 5 shows the comparisons among the classification accuracies related to the raw attributes, the single-granularity reducts, and the multigranularity reducts. To facilitate the readers, the greatest values of classification accuracy are in bold, and the smallest values are in italics.
With a careful investigation of Table 5, it is not difficult to observe the following: (1) For the measures related to approximation quality and the neighborhood discrimination index, the largest value tends to be the one related to the multigranularity reduct, and the smallest value tends to be the one related to the single-granularity reduct. This implies that overfitting [59] may occur on most data sets when using these single-granularity reducts; in other words, the single-granularity reducts perform better than the raw attributes on the training samples but worse on the testing samples. (2) For the measures related to conditional entropy and the neighborhood decision error rate, the largest value tends to be the one related to the multigranularity reduct, and the smallest value tends to be the one related to the raw attributes. (3) To sum up, whether or not the single-granularity reducts perform well, the multigranularity reducts can improve the classification performance.
The results shown in this subsection demonstrate that the multigranularity view may suggest new trends for improving the classification performance of reducts.
Furthermore, the Wilcoxon rank sum test is employed to compare the distributions of the classification accuracies derived from the single-granularity algorithm and the multigranularity algorithm. The null hypothesis is that the two distributions of classification accuracies are the same. Assuming that the significance threshold is set as 0.05, if the obtained value is greater than 0.05, we cannot reject the null hypothesis, i.e., the distributions of the classification accuracies are considered similar.
It is observed from Table 6 that the obtained values are generally greater than 0.05. As for the values less than 0.05, we can find from Figure 2 that the mean values with respect to the multigranularity reducts are generally greater than those with respect to the single-granularity reducts. Take the data set "Wine Quality" (ID: 15) as an example: in the comparison "DIS & MGDIS," the obtained value is less than 0.05, but the mean classification accuracies in terms of "DIS" and "MGDIS" are 0.4779 and 0.5507, respectively. The only exception is the data set "Vertebral Column" (ID: 13): in the comparison "ERR & MGERR," the mean classification accuracies in terms of "ERR" and "MGERR" are 0.7768 and 0.6935, respectively. To sum up, there is no significant difference between the two distributions of classification accuracies, and the multigranularity reducts even improve the classification performance over the single-granularity reducts.
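The significance check above can be sketched with a stdlib-only normal approximation of the Wilcoxon rank sum test (the paper itself relies on MATLAB's built-in routine); `ranksum_p` and `similar_distributions` are illustrative helpers, not the authors' code.

```python
import math

def ranksum_p(x, y):
    """Two-sided p-value of the Wilcoxon rank sum test via the normal
    approximation, without tie correction (enough for an illustration)."""
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))
    # assign average ranks so tied values share one rank
    rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        for k in range(i, j):
            rank[pooled[k]] = (i + j + 1) / 2  # average of ranks i+1..j
        i = j
    w = sum(rank[v] for v in x)                 # rank sum of sample x
    mu = n1 * (n1 + n2 + 1) / 2                 # its mean under the null
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value

def similar_distributions(acc_a, acc_b, alpha=0.05):
    """Null hypothesis: both accuracy samples share one distribution;
    p > alpha means it cannot be rejected, i.e., they are 'similar'."""
    return ranksum_p(acc_a, acc_b) > alpha
```

Two identical accuracy samples give p = 1 (clearly similar), while two well-separated samples give a p-value far below 0.05.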

Comparisons among the Classification Accuracies Using SVM and CART.
To further verify the effectiveness of the multigranularity algorithm, two other popular classifiers are also applied: SVM [60][61][62] and CART [63]. To make the comparison feasible, the mean classification accuracy with respect to the single-granularity reducts is computed from the corresponding 10 values.
Tables 7 and 8 display the comparisons among the classification accuracies with respect to the raw attributes, the single-granularity reducts, and the multigranularity reducts using SVM and CART, respectively. To facilitate the readers, the greatest values of classification accuracy are in bold, and the smallest values are in italics.
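Scoring a reduct with SVM and CART can be sketched as follows; this is a toy illustration assuming scikit-learn, with made-up data and a hypothetical reduct that keeps only attribute 0.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def reduct_accuracy(clf, X_train, y_train, X_test, y_test, reduct):
    """Train a classifier on the attributes kept by the reduct and
    return its accuracy on the test samples."""
    clf.fit(X_train[:, reduct], y_train)
    return clf.score(X_test[:, reduct], y_test)

# toy, well-separated two-class data (40 samples, 2 attributes)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
reduct = [0]  # hypothetical reduct keeping only the first attribute
acc_svm = reduct_accuracy(SVC(), X, y, X, y, reduct)
acc_cart = reduct_accuracy(DecisionTreeClassifier(random_state=0),
                           X, y, X, y, reduct)
```

In the experiments proper, the training and testing parts come from the 5-CV split, and the reduct is the output of SG-VCR or MG-VCR rather than a hand-picked column.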
With a careful investigation of Tables 7 and 8, it is not difficult to observe the following: (1) In Table 7, the classification accuracies related to the raw attributes are generally the largest. In Table 8, the percentage of cases where the classification accuracy related to the raw attributes is the largest is almost 55%. This implies that overfitting occurs on most data sets when using SVM or CART.
(2) The classification accuracies with respect to the multigranularity reducts are generally greater than those of the single-granularity reducts. Actually, in Table 7, among the comparisons (15 UCI data sets), the performance of "MGENT" is the best, with larger values on 80% of the data sets (12 data sets); "MGAPP," "MGDIS," and "MGERR" all reach 66.7% (10 data sets). Similar performance can be seen in Table 8: "MGDIS" and "MGERR" perform best, with larger values on 86.7% (13 data sets); the suboptimal one is "MGAPP" at 73.3% (11 data sets), followed by "MGENT" at 66.7% (10 data sets).
(3) To sum up, the multigranularity reducts are superior to the single-granularity reducts in improving classification accuracies. As presented in Tables 7 and 8, the multigranularity reducts can generally improve the classification accuracies, and the average values over the 15 UCI data sets in both tables also demonstrate that the classification accuracies related to the multigranularity reducts are greater than those of the single-granularity reducts. The multigranularity reducts thus perform better than the single-granularity reducts when using SVM and CART.
The anova1 function (one-way ANOVA) was used to verify whether there is a significant difference in the distributions of the classification accuracies. Similar to the Wilcoxon rank sum test above, the null hypothesis is that the distributions of the classification accuracies are the same. Assuming that the significance threshold is set as 0.05, if the obtained value is greater than 0.05, we cannot reject the null hypothesis, i.e., the distributions of the classification accuracies are considered similar. The results of the comparisons among the distributions using SVM and CART are listed in Tables 9 and 10, respectively.
Following the results of Tables 9 and 10, we can find that the values are generally greater than 0.05. In the following, we explain these results in Tables 9 and 10, respectively. (1) As for the values equal to or less than 0.05 in Table 9, it can be observed from Table 7 that the classification accuracies with respect to the multigranularity reducts are greater than those in terms of the corresponding single-granularity reducts. Take the data set "Vertebral Column" (ID: 13) as an example: there are two values no greater than 0.05, in the comparisons "ENT & MGENT" and "ERR & MGERR." However, the classification accuracies with respect to these reducts (ENT, MGENT, ERR, and MGERR) are 0.7600, 0.8194, 0.7832, and 0.8323; the values in terms of "MGENT" ("MGERR") are greater than those in terms of "ENT" ("ERR"). (2) A similar discovery can be observed from Table 10: as for the values equal to or less than 0.05, the multigranularity reducts provide greater classification accuracies. The only exception is the data set "Ionosphere" (ID: 8); in the comparison "APP & MGAPP," the obtained value is less than 0.05, and the mean values in terms of "APP" and "MGAPP" are 0.8991 and 0.8800, respectively.
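As a stdlib-only illustration of what the one-way ANOVA computes, the sketch below evaluates the F statistic (between-group mean square over within-group mean square); MATLAB's anova1 additionally converts this statistic into a p-value via the F distribution. The helper name is illustrative.

```python
def f_statistic(groups):
    """One-way ANOVA F statistic for a list of accuracy samples:
    between-group mean square divided by within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g)
                    for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Groups with identical means yield F = 0 (no evidence against the null hypothesis), while well-separated groups yield a large F and hence a small p-value.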
To sum up, we can conclude that there is no significant difference between the distributions of the classification accuracies with respect to the single-granularity reducts and the multigranularity reducts. Furthermore, the multigranularity reducts not only provide better classification accuracies than the single-granularity reducts, but the multigranularity algorithm also improves the time efficiency.

Conclusions and Future Perspectives
In this paper, a framework of the multigranularity view for computing reducts has been proposed. Different from traditional reduct algorithms, which use only one fixed granularity, our algorithm is executed from the multigranularity view. In the experiments, the finest and the coarsest granularities are employed to realize the multigranularity framework. Furthermore, to select attributes from the viewpoints of the different decision classes, the ensemble strategy is introduced into the traditional fitness-function-based searching strategy, and the majority voting mechanism is employed to choose the attribute with the highest frequency across the decision classes. Compared with the single-granularity view for computing reducts, the proposed algorithm not only reduces the elapsed time but also derives reducts that improve the classification accuracies.
The following topics deserve further investigation: (1) The finest granularity and the coarsest granularity each take up a weight of 1/2 in this paper; different weights with respect to the granularities may be adopted for different requirements in future investigations. (2) The weighted average values related to the finest granularity and the coarsest granularity are applied in the multigranularity algorithm; more granularities or other fusion strategies can be considered to realize attribute reduction from the multigranularity view. (3) To consider the different requirements of different decision classes, the ensemble selector with the majority voting mechanism is employed to select the suitable attribute in our experiments; more types of strategies among the decision classes may be investigated in the future.
Data Availability

The (UCI) data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.