Combined Accelerator for Attribute Reduction: A Sample Perspective

In the field of neighborhood rough set, attribute reduction is considered a key topic. Neighborhood relation and rough approximation play crucial roles in the process of obtaining the reduct. Presently, many strategies have been proposed to accelerate this process from the viewpoint of samples. However, these methods speed up the process of obtaining the reduct through either the binary relation or the rough approximation alone, so the time consumption may not be fully reduced. To fill this gap, a combined acceleration strategy based on compressing the scanning space of both the neighborhood and the lower approximation is proposed, which aims to further reduce the time consumption of obtaining the reduct. In addition, 15 UCI data sets have been selected, and the experimental results show the following: (1) our proposed approach significantly reduces the elapsed time of obtaining the reduct; (2) compared with previous approaches, our combined acceleration strategy does not change the result of the reduct. This research suggests a new trend of attribute reduction using multiple views.


Introduction
With the rapid development of technology, a large amount of high-dimensional data, such as video, image, and text, has come into being. The high dimensionality of data [1][2][3][4][5] poses a great challenge for decision-making and data analysis. Attribute reduction [6][7][8][9][10][11][12], as an efficient tool for reducing the dimensionality of data [13][14][15], has attracted the attention of many researchers.
In the topic of attribute reduction, two notions are important for the process of obtaining the reduct. One fundamental notion is the binary relation [16], and the other is the rough approximation.
Firstly, the binary relation is an effective means of information granulation [17][18][19][20][21]. In the classical rough set [22][23][24][25], the binary relation is an equivalence relation, and each equivalence class can be regarded as an information granule [26]. Nevertheless, the equivalence relation can only deal with categorical data, which somewhat limits the applications of the rough set. Therefore, several generalized binary relations and the corresponding rough sets have been proposed [27][28][29][30][31]. Among those generalized rough sets, the neighborhood rough set offers a useful mechanism to process continuous data. This is mainly because the neighborhood relation can be directly obtained by computing the distance between samples. From the viewpoint of Granular Computing (GrC), the neighborhood rough set groups the samples into different neighborhood granules [16].
To obtain the neighborhood granule of each sample, traditional approaches [26,32,33] scan all the other samples in the data. As the size of the data increases, the elapsed time of computing neighborhood granules becomes unacceptable. To reduce this time consumption, Liu et al. [34] have introduced a hash function into the calculation of neighborhoods. Such a hash function maps all the samples into a series of buckets. Therefore, only samples in the adjacent buckets need to be considered to obtain a neighborhood granule. It should be emphasized that such a method provides a novel approach for reducing the time consumption of obtaining the neighborhood granule from the perspective of the sample.
Secondly, rough approximation is a general concept represented by a pair of sets, namely, the so-called upper and lower approximations. From the description of the mechanism of bucket, it is not difficult to observe that the bucket only reduces the scanning space of obtaining the neighborhood granule from the viewpoint of the sample. However, the mechanism of bucket takes all the samples into consideration when computing the lower approximation. As the scale of the data becomes larger, the elapsed time of deriving the lower approximation increases significantly. For this reason, Qian et al. [35] have proposed the local approximation, which aims to remove the useless information in the process of obtaining the lower approximation. Such a strategy only uses the samples in the target concept, instead of the samples in the whole universe, for deriving the lower approximation. Therefore, such a method is well suited to the requirements of big data analysis.
However, no matter which strategy is employed, it can only accelerate the process of obtaining the reduct from one point of view. To further reduce the elapsed time of obtaining the reduct, a new approach which combines the above two crucial notions is proposed from the perspective of the sample. Firstly, the mechanism of bucket is used, which aims to reduce the scanning space of deriving neighborhood granules. Secondly, by using the local approximation, the useless information granules are removed for quickly computing the lower approximation. Finally, the appropriate attributes are selected until the intended constraint defined in the attribute reduction is satisfied. The main contribution of our approach is to take advantage of both the bucket and the local approximation, so that it may be more effective for reducing the elapsed time of obtaining the reduct. It should be noticed that though our approach pays much attention to the viewpoint of the sample, our objective is actually achieved by considering two different perspectives of the sample. One is the sample related to constructing the information granule, and the other is the sample related to constructing the lower approximation.
The remainder of this paper is organized as follows. In Section 2, the basic knowledge of the neighborhood rough set is presented. In Section 3, the mechanism of bucket, local approximation, and our proposed strategy are described in detail. In Section 4, comparative experimental results and the corresponding analyses are shown over 15 UCI data sets. Finally, some remarks and perspectives for future work conclude the paper in Section 5.

Preliminaries
2.1. Binary Relation. Without loss of generality, a decision system can be represented as $\mathrm{DS} = \langle U, AT \cup \{d\} \rangle$, in which $U$ is the set of samples called the universe, $AT$ is the set of conditional attributes, and $d$ is the decision attribute. Furthermore, $\forall x_i \in U$, $a(x_i)$ indicates its value over the conditional attribute $a \in AT$, and $d(x_i)$ denotes its value over the decision attribute.
With regard to a given decision system DS, various forms of binary relations have been explored, in terms of different requirements, to derive the result of information granulation over $U$. Two typical forms are analyzed as follows:
(i) One is the equivalence relation. If information granulation is required over categorical data, then the binary relation can be given by the equivalence relation. For example, in a supervised learning task, the values of the decision attribute in a decision system are categorical. To derive the result of information granulation over the decision attribute, the equivalence relation $\mathrm{IND}_d = \{(x_i, x_j) \in U \times U \mid d(x_i) = d(x_j)\}$ can be employed. Then, the result of information granulation (i.e., the partition) $U/\mathrm{IND}_d = \{X_1, X_2, \ldots, X_s\}$ can be derived, where $s$ is the number of equivalence classes. Specifically, each equivalence class in such a partition is referred to as an information granule.
(ii) The other is the parameterized binary relation. If information granulation is required over continuous data, then the binary relation can be given by a parameterized binary relation. Take the neighborhood relation as an example: the values of the conditional attributes in the decision system are continuous, and to derive the result of information granulation over the conditional attributes, the neighborhood relation $\delta_A = \{(x_i, x_j) \in U \times U \mid \mathrm{dis}_A(x_i, x_j) \leq \delta\}$ can be used, in which $\mathrm{dis}_A(x_i, x_j)$ is the distance between samples $x_i$ and $x_j$ over $A \subseteq AT$, and $\delta$ is the radius. It follows that the neighborhood system [36] can be derived. Specifically, each neighborhood $\delta_A(x_i) = \{x_j \in U \mid (x_i, x_j) \in \delta_A\}$ is referred to as an information granule.
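The neighborhood relation in (ii) can be illustrated with a minimal Python sketch. The Euclidean distance and the names `distance` and `neighborhoods` are assumptions made for illustration only; the text above does not fix a particular distance function.

```python
import math

def distance(x, y, attrs):
    """Euclidean distance between samples x and y over the attribute indices in attrs (assumed metric)."""
    return math.sqrt(sum((x[a] - y[a]) ** 2 for a in attrs))

def neighborhoods(samples, attrs, delta):
    """For every sample index i, return the set of indices j with dis_A(x_i, x_j) <= delta.

    Every sample is compared with every other sample, so the scan is quadratic in |U|.
    """
    n = len(samples)
    return [
        {j for j in range(n) if distance(samples[i], samples[j], attrs) <= delta}
        for i in range(n)
    ]

# Toy data: two well-separated pairs of samples described by two continuous attributes.
samples = [(0.10, 0.20), (0.12, 0.22), (0.50, 0.60), (0.52, 0.58)]
granules = neighborhoods(samples, attrs=[0, 1], delta=0.1)
print(granules)
```

Each sample belongs to its own neighborhood because its distance to itself is zero, so the granules cover the universe but, unlike equivalence classes, they may overlap.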

Neighborhood Rough Set
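Definition 1. Given a decision system DS, $\forall A \subseteq AT$ and $\forall X_p \in U/\mathrm{IND}_d$ ($1 \leq p \leq s$), the neighborhood lower and upper approximations of $X_p$ with respect to $A$ are defined as
$$\underline{N}_A(X_p) = \{x_i \in U \mid \delta_A(x_i) \subseteq X_p\}, \tag{1}$$
$$\overline{N}_A(X_p) = \{x_i \in U \mid \delta_A(x_i) \cap X_p \neq \emptyset\}. \tag{2}$$
The pair $[\underline{N}_A(X_p), \overline{N}_A(X_p)]$ is referred to as a neighborhood rough set of $X_p$ in terms of $A$.
Definition 2. Given a decision system DS, the approximation quality of $d$ over $A \subseteq AT$ is defined as
$$\gamma(A, d) = \frac{\left|\bigcup_{p=1}^{s} \underline{N}_A(X_p)\right|}{|U|},$$
in which $|X|$ denotes the cardinality of set $X$.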

Mathematical Problems in Engineering
From the viewpoint of rough set theory, approximation quality reflects the percentage of the samples which certainly belong to one of the decision classes in terms of the neighborhood lower approximation. The greater the value of approximation quality, the greater the degree of such belongingness.
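As an illustration of the neighborhood approximations and the approximation quality, the following Python sketch works through a one-attribute toy example. The Euclidean distance, the toy data, and all function names are assumptions introduced here, not part of the original method description.

```python
import math

def distance(x, y, attrs):
    """Euclidean distance over the attribute indices in attrs (assumed metric)."""
    return math.sqrt(sum((x[a] - y[a]) ** 2 for a in attrs))

def neighborhoods(samples, attrs, delta):
    """Neighborhood granule of every sample, obtained by scanning the whole universe."""
    n = len(samples)
    return [
        {j for j in range(n) if distance(samples[i], samples[j], attrs) <= delta}
        for i in range(n)
    ]

def decision_classes(labels):
    """Partition the sample indices by decision value."""
    classes = {}
    for i, label in enumerate(labels):
        classes.setdefault(label, set()).add(i)
    return list(classes.values())

def lower_approximation(granules, concept):
    """Samples whose whole neighborhood lies inside the concept."""
    return {i for i, granule in enumerate(granules) if granule <= concept}

def upper_approximation(granules, concept):
    """Samples whose neighborhood intersects the concept."""
    return {i for i, granule in enumerate(granules) if granule & concept}

def approximation_quality(samples, labels, attrs, delta):
    """Fraction of samples that fall into the lower approximation of some decision class."""
    granules = neighborhoods(samples, attrs, delta)
    covered = set()
    for concept in decision_classes(labels):
        covered |= lower_approximation(granules, concept)
    return len(covered) / len(samples)

samples = [(0.10,), (0.15,), (0.30,), (0.60,), (0.65,)]
labels = [0, 0, 1, 1, 1]
granules = neighborhoods(samples, [0], 0.16)
quality = approximation_quality(samples, labels, attrs=[0], delta=0.16)
print(quality)
```

In this toy data, the samples at 0.15 and 0.30 lie within the radius of each other but carry different decision values, so neither belongs to a lower approximation; the remaining three of the five samples do, which gives an approximation quality of 0.6.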

Attribute Reduction.
Attribute reduction [32,37,38,39,40,41] aims to find an attribute subset that preserves some considered properties related to the raw attributes. Take the approximation quality-based attribute reduction as an example: such an attribute reduction obtains a minimal attribute subset that does not decrease the degree of belongingness given by Definition 2. The definition of the approximation quality reduct is presented as follows.
Definition 3. Given a decision system DS, $\forall A \subseteq AT$, $A$ is referred to as an approximation quality-based reduct if and only if
(1) $\gamma(A, d) \geq \gamma(AT, d)$;
(2) $\forall A' \subset A$, $\gamma(A', d) < \gamma(A, d)$.
Following this definition, an approximation quality reduct should satisfy two conditions. The first condition implies that the value of approximation quality over $AT$ should not be reduced, and the second condition indicates that each attribute in $A$ is individually necessary.
To find the approximation quality reduct, many results have been reported using various searching strategies [7,8,42,43].
The forward greedy searching strategy has been widely accepted for its lower time complexity. Notably, the significance function is a necessary component of forward greedy searching, because each candidate attribute is evaluated by the significance function. The definition of the significance function is as follows.
Definition 4. Given a decision system DS, $\forall A \subseteq AT$ and $\forall a \in AT - A$, the significance of the candidate attribute $a$ with respect to $A$ is defined as
$$\mathrm{Sig}(a, A, d) = \gamma(A \cup \{a\}, d) - \gamma(A, d).$$
Following Definition 4, the detailed process of obtaining the reduct by using the forward greedy searching strategy is shown in Algorithm 1.
In Algorithm 1, the time complexity of computing the neighborhood relation is $O(|U|^2)$. This is mainly because the distance between any two samples should be computed when computing the neighborhoods of all the samples. In the worst case, no attribute is redundant, and then all the attributes in the decision system should be added into the reduct. Therefore, the time complexity of Algorithm 1 is $O(|U|^2 \cdot |AT|^2)$.
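The forward greedy search can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the Euclidean distance, the stopping rule (stop once the approximation quality of the selected attributes reaches that of all attributes), and every function name are assumptions.

```python
import math

def distance(x, y, attrs):
    """Euclidean distance over the attribute indices in attrs (assumed metric)."""
    return math.sqrt(sum((x[a] - y[a]) ** 2 for a in attrs))

def approximation_quality(samples, labels, attrs, delta):
    """Fraction of samples whose neighborhood over attrs contains a single decision value.

    A sample lies in the lower approximation of some decision class exactly when
    every neighbor shares its own decision value, because it is its own neighbor.
    """
    n = len(samples)
    pure = 0
    for i in range(n):
        if all(labels[j] == labels[i] for j in range(n)
               if distance(samples[i], samples[j], attrs) <= delta):
            pure += 1
    return pure / n

def significance(samples, labels, attrs, candidate, delta):
    """Gain in approximation quality obtained by adding the candidate attribute to attrs."""
    return (approximation_quality(samples, labels, attrs + [candidate], delta)
            - approximation_quality(samples, labels, attrs, delta))

def forward_greedy_reduct(samples, labels, delta):
    """Add the most significant attribute until the quality over all attributes is reached."""
    all_attrs = list(range(len(samples[0])))
    target = approximation_quality(samples, labels, all_attrs, delta)
    reduct = []
    while (len(reduct) < len(all_attrs)
           and approximation_quality(samples, labels, reduct, delta) < target):
        candidates = [a for a in all_attrs if a not in reduct]
        best = max(candidates,
                   key=lambda a: significance(samples, labels, reduct, a, delta))
        reduct.append(best)
    return reduct

# Attribute 0 separates the two classes; attribute 1 is constant and carries no information.
samples = [(0.1, 0.5), (0.2, 0.5), (0.8, 0.5), (0.9, 0.5)]
labels = [0, 0, 1, 1]
reduct = forward_greedy_reduct(samples, labels, delta=0.15)
print(reduct)
```

Every call to `approximation_quality` scans all pairs of samples, and the loop may evaluate every remaining candidate in every round, which is where the quadratic factors in both the number of samples and the number of attributes come from.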

Bucket-Based Attribute Reduction.
As indicated in Section 2.1, the neighborhood of a sample can be regarded as an information granule. In Algorithm 1, to obtain the granule, it is necessary to scan all the samples in the decision system. However, if the number of samples, i.e., the value of $|U|$, increases to a large scale, then the time consumption of obtaining the granule can be unacceptable. To alleviate this excessive time consumption, Liu et al. [34] have proposed the mechanism of bucket, which employs a hash function to map all the samples into a series of sequenced buckets, so that only samples in the adjacent buckets need to be considered instead of all the samples in the universe. The definition of the bucket is as follows.
Definition 5 (see [34]). Given a decision system DS, $x_0$ is a special sample constructed from $U$, where $\forall a \in AT$, $a(x_0) = \min\{a(x_i) : \forall x_i \in U\}$. Then, the samples in $U$ can be divided into finite buckets $B_0, \ldots, B_t$, where $B_k$ ($0 \leq k \leq t$) can be denoted as
$$B_k = \{x_i \in U \mid \lceil \mathrm{dis}_A(x_0, x_i)/\delta \rceil = k\}.$$
Theorem 1 (see [34]). Given a decision system DS, if $B_0, \ldots, B_t$ are the buckets, then $\forall x_i \in B_q$ ($q = 1, 2, \ldots, t - 1$), the neighborhood of $x_i$ is contained in $B_{q-1} \cup B_q \cup B_{q+1}$.
Notably, Theorem 1 shows that only samples in the adjacent buckets should be taken into consideration for obtaining the information granule. Therefore, the scanning space of obtaining the information granule can be reduced. It follows that the elapsed time of calculating the information granule can be decreased. Following Definition 5 and Theorem 1, forward greedy searching based on the bucket is presented in Algorithm 2.
Different from Algorithm 1, the number of samples which should be scanned to obtain the neighborhood of a sample $x_i \in B_q$ is $|B_{q-1}| + |B_q| + |B_{q+1}|$ instead of $|U|$. It follows that the mechanism of bucket can reduce the number of samples scanned for obtaining an information granule. Therefore, the efficiency of obtaining the reduct can be improved by using Algorithm 2.
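The bucket mechanism can be sketched as follows. The ceiling-based hash, the Euclidean distance, and the function names are assumptions made for illustration; the exact hash function is the one given in [34]. The adjacent-bucket property relies on the triangle inequality: two samples within distance $\delta$ of each other differ by at most $\delta$ in their distance to $x_0$, so their bucket indices differ by at most one.

```python
import math

def distance(x, y, attrs):
    """Euclidean distance over the attribute indices in attrs (assumed metric)."""
    return math.sqrt(sum((x[a] - y[a]) ** 2 for a in attrs))

def build_buckets(samples, attrs, delta):
    """Hash every sample to a bucket index based on its distance to the minimal sample x_0."""
    origin = [min(s[a] for s in samples) for a in range(len(samples[0]))]
    bucket_of = [math.ceil(distance(origin, s, attrs) / delta) for s in samples]
    buckets = {}
    for i, k in enumerate(bucket_of):
        buckets.setdefault(k, []).append(i)
    return bucket_of, buckets

def bucket_neighborhoods(samples, attrs, delta):
    """Neighborhood of every sample, scanning only its own and the two adjacent buckets."""
    bucket_of, buckets = build_buckets(samples, attrs, delta)
    granules = []
    for i, k in enumerate(bucket_of):
        candidates = buckets.get(k - 1, []) + buckets[k] + buckets.get(k + 1, [])
        granules.append({j for j in candidates
                         if distance(samples[i], samples[j], attrs) <= delta})
    return granules

def brute_force_neighborhoods(samples, attrs, delta):
    """Reference computation that scans the whole universe for every sample."""
    n = len(samples)
    return [{j for j in range(n) if distance(samples[i], samples[j], attrs) <= delta}
            for i in range(n)]

samples = [(0.0, 0.0), (0.05, 0.0), (0.3, 0.1), (0.32, 0.12), (0.9, 0.8)]
fast = bucket_neighborhoods(samples, attrs=[0, 1], delta=0.1)
slow = brute_force_neighborhoods(samples, attrs=[0, 1], delta=0.1)
print(fast == slow)
```

On data where the samples spread over many buckets, each sample is compared with only a small part of the universe, while the resulting granules are the same as those of the full scan.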

Local Approximation-Based Attribute Reduction.
However, the mechanism of bucket shown in Section 3.1 only considers how to reduce the time consumption of obtaining the neighborhood of each sample. In the process of obtaining the reduct by considering the measure of approximation quality, deriving the lower approximation is also time consuming. Following such consideration, Qian et al. [35] have proposed a local approach which can reduce the time consumption of computing the lower approximation.
In Algorithms 1 and 2, equation (1) is adopted to compute the lower approximation, and then the value of approximation quality is derived from the obtained lower approximation. From equation (1), it is not difficult to observe that calculating the lower approximation uses all the information granules obtained by scanning all the samples, which is exceedingly time consuming. In fact, for a decision class $X_p$, the information granules of the samples outside $X_p$ are useless: since each sample belongs to its own neighborhood, the neighborhood of a sample outside $X_p$ can never be contained in $X_p$. Therefore, the local rough approximation is defined as follows.
ALGORITHM 2: Bucket-based forward greedy searching strategy for obtaining the reduct.
By using equation (6), the elapsed time of obtaining the lower approximation can be decreased, and then the time consumption of obtaining the reduct can also be reduced. The detailed algorithm is shown as follows.
In Algorithm 3, to compute the lower approximation of $X_p$, only the samples in $X_p$ instead of all the samples in $U$ should be used. Therefore, the time complexity of obtaining the lower approximation is $O(|U| \cdot |X_p|)$. In Algorithm 1, however, the time complexity of obtaining the lower approximation is $O(|U|^2)$. Therefore, by using the local approximation approach, the elapsed time of obtaining the reduct can be decreased.
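The difference between the two ways of computing the lower approximation can be sketched in Python. The local variant below follows the description above, in which only the samples of the target concept are examined; its exact form, the Euclidean distance, and the function names are assumptions for illustration.

```python
import math

def distance(x, y, attrs):
    """Euclidean distance over the attribute indices in attrs (assumed metric)."""
    return math.sqrt(sum((x[a] - y[a]) ** 2 for a in attrs))

def granule(samples, i, attrs, delta):
    """Neighborhood of sample i, obtained by scanning the whole universe."""
    return {j for j in range(len(samples))
            if distance(samples[i], samples[j], attrs) <= delta}

def global_lower_approximation(samples, attrs, delta, concept):
    """Test the granule of every sample in U against the concept: |U| granules are built."""
    return {i for i in range(len(samples))
            if granule(samples, i, attrs, delta) <= concept}

def local_lower_approximation(samples, attrs, delta, concept):
    """Test only the granules of samples inside the concept: |X_p| granules are built."""
    return {i for i in concept
            if granule(samples, i, attrs, delta) <= concept}

samples = [(0.10,), (0.15,), (0.30,), (0.60,), (0.65,)]
concept = {2, 3, 4}  # indices of the samples in the target decision class
global_result = global_lower_approximation(samples, [0], 0.16, concept)
local_result = local_lower_approximation(samples, [0], 0.16, concept)
print(global_result == local_result)
```

Both functions return the same set, since a sample outside the concept contains itself in its neighborhood and therefore can never satisfy the inclusion test; the local variant simply skips those samples.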

Bucket and Local Approximation-Based Attribute Reduction.
Based on the detailed descriptions of the bucket and the local approximation, we can observe that both of these methods reduce the time consumption from the perspective of the sample.
(i) The mechanism of bucket employs a hash function to map all the samples into a series of sequenced buckets, and only samples in the adjacent buckets need to be considered when obtaining an information granule. Then, the elapsed time of obtaining an information granule is decreased, and it follows that the time of obtaining the reduct can be saved.
(ii) The approach of local approximation reduces the scope of obtaining a lower approximation, so that useless information can be removed. The elapsed time of computing the lower approximation is decreased, and it follows that the time of obtaining the reduct can also be saved.
Nevertheless, no matter which strategy is employed, it can only improve the time efficiency of obtaining the reduct from one point of view. On this basis, we propose a strategy which combines the above two approaches to further reduce the elapsed time of obtaining the reduct. Firstly, the mechanism of bucket is employed to obtain the information granules, which reduces the scanning space of samples. Secondly, the useless information granules are removed for quickly obtaining the lower approximation. Finally, the appropriate attributes are selected by the measure of approximation quality until the constraint defined in the attribute reduction is satisfied.
In Algorithm 4, the time complexity of obtaining the lower approximation is $O(|U'| \cdot |X_p|)$. However, in Algorithm 2, the time complexity of obtaining the lower approximation is $O(|U'| \cdot |U|)$, and in Algorithm 3, it is $O(|U| \cdot |X_p|)$. Therefore, the time consumption of obtaining the reduct can be further decreased by using Algorithm 4.
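The combination of the two accelerations can be sketched as a single routine for the approximation quality. This is a hedged illustration under the same assumptions as before (Euclidean distance, ceiling-based bucket hash, hypothetical function names), not the authors' Algorithm 4 itself.

```python
import math

def distance(x, y, attrs):
    """Euclidean distance over the attribute indices in attrs (assumed metric)."""
    return math.sqrt(sum((x[a] - y[a]) ** 2 for a in attrs))

def combined_quality(samples, labels, attrs, delta):
    """Approximation quality using bucket-restricted scans and concept-local tests."""
    n = len(samples)
    # Bucket step: hash every sample by its distance to the minimal sample x_0.
    origin = [min(s[a] for s in samples) for a in range(len(samples[0]))]
    bucket_of = [math.ceil(distance(origin, s, attrs) / delta) for s in samples]
    buckets = {}
    for i, k in enumerate(bucket_of):
        buckets.setdefault(k, []).append(i)
    # Group the sample indices by decision value.
    classes = {}
    for i, label in enumerate(labels):
        classes.setdefault(label, set()).add(i)
    covered = 0
    for concept in classes.values():
        # Local step: only the samples inside the concept are examined.
        for i in concept:
            k = bucket_of[i]
            candidates = buckets.get(k - 1, []) + buckets[k] + buckets.get(k + 1, [])
            # The sample is in the lower approximation if every neighbor found
            # in the adjacent buckets belongs to the same concept.
            if all(j in concept for j in candidates
                   if distance(samples[i], samples[j], attrs) <= delta):
                covered += 1
    return covered / n

samples = [(0.10,), (0.15,), (0.30,), (0.60,), (0.65,)]
labels = [0, 0, 1, 1, 1]
quality = combined_quality(samples, labels, attrs=[0], delta=0.16)
print(quality)
```

On this toy data, the routine returns 0.6, the same value that a full scan of the universe gives, while each membership test only touches the samples in three buckets.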

Experiments
To demonstrate the effectiveness of the proposed method, 15 UCI data sets have been selected for the experiments. The basic descriptions of the data sets are shown in Table 1. All the experiments have been carried out on a personal computer with Windows 10, a dual-core 2.60 GHz CPU, and 8 GB memory. The programming language is MATLAB R2017b.
In our experiments, 5-fold cross-validation has been adopted to test the performances of different reducts. That is, the data have been divided into 5 parts of the same size. In each round of calculation, 80% of the samples are regarded as the training data for deriving the reduct, and the rest are used as testing data for classification. Furthermore, 20 radii, i.e., 0.01, 0.02, . . ., 0.2, have been selected, as recommended in [34].
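The experimental protocol can be sketched as follows. The shuffling, the seed, and the function name `five_fold_splits` are assumptions for illustration; the experiments reported here were run in MATLAB, and this Python fragment only mirrors the described setup.

```python
import random

# The 20 radii 0.01, 0.02, ..., 0.2 used in the experiments.
RADII = [k / 100 for k in range(1, 21)]

def five_fold_splits(n_samples, seed=0):
    """Yield (train, test) index lists for 5-fold cross-validation.

    The shuffle and the seed are assumptions; the source only states that the
    data are divided into 5 parts of the same size.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::5] for k in range(5)]
    for k in range(5):
        test = folds[k]
        train = [i for m, fold in enumerate(folds) if m != k for i in fold]
        yield train, test

splits = list(five_fold_splits(10))
for train, test in splits:
    print(len(train), len(test))
```

For every radius in `RADII`, a reduct would be derived on each training part and evaluated on the corresponding testing part.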

Comparisons of Elapsed Time.
In this experiment, the elapsed time of obtaining the reduct by using the four different algorithms is compared. Moreover, the standard deviation is used to characterize the stability of the elapsed time.
From Figure 1 and Tables 2 and 3, it is not difficult to observe the following:
(1) With the increasing value of δ, the elapsed time of computing the reduct by using the four algorithms shows an increasing trend in most cases. This is mainly because the number of attributes in the reduct is likely to increase if the value of δ increases.
(2) Compared with Algorithm 1, both Algorithm 2 and Algorithm 3 require lower time consumption. Moreover, Algorithm 3 requires lower time consumption than Algorithm 2 does. Take "Cardiotocography" as an example: if δ is 0.1, then the elapsed time of obtaining the reducts by using Algorithms 1, 2, and 3 is 15.7143, 14.3278, and 14.0373 seconds, respectively.
(3) Compared with the process of computing BAQR and LAQR, the process of searching BLAQR requires lower elapsed time. Take "Diabetic Retinopathy Debrecen" as an example: if δ is 0.15, then the elapsed time of obtaining BLAQR is 2.9222 seconds, whereas the elapsed time of obtaining BAQR and LAQR is 3.9703 and 3.1344 seconds, respectively. Such results indicate that our proposed strategy has sped up the process of finding the reduct significantly.
(4) From Tables 2 and 3, Algorithm 4 comes with a lower mean time. Furthermore, in most cases, its standard deviation is also lower than the ones produced by AQR, BAQR, and LAQR for obtaining reducts.

Comparisons of Stability-Based Reducts.
In this section, the stabilities of the reducts obtained by using the four different algorithms are compared. For more details about the computation of the stability, see [20]. Moreover, the standard deviation [44] is used to characterize the dispersion degree of stability. The lower the value of the standard deviation, the higher the stability of reducts.

Following Figure 2, it is not difficult to observe the following:
(1) The stabilities of the reducts obtained by using the four algorithms are the same. That is to say, the reducts derived by the four different algorithms are the same. This is mainly because the mechanism of bucket only reduces the number of samples scanned, and the obtained neighborhood relation does not change. Furthermore, the local approach only removes the useless granules, and the obtained lower approximation also does not change. Therefore, our proposed strategy, which combines the mechanism of bucket and local approximation, does not change the result of the reduct.
(2) The standard deviations of the stabilities are also the same. It follows that the dispersion degrees of the stabilities of the reducts obtained by using the four algorithms are the same.
Because the reducts obtained by using the four algorithms are the same, the classification accuracies derived from AQR, BAQR, LAQR, and BLAQR are also the same. Therefore, the results of classification are not shown in this paper.

Comparisons of Elapsed Time over Incremental Data.
In this section, the tendency of the elapsed time with the incremental scale of data is shown. The whole universe has been divided into 11 groups. Firstly, only the first group is used to obtain the reduct, and the corresponding elapsed time is recorded; secondly, the second group is added to the first group for computing the reduct, and the corresponding elapsed time is recorded, and so on; finally, all the groups are combined for computing the reduct, and the corresponding elapsed time is recorded.
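The incremental protocol can be sketched with a small timing harness. The function names are hypothetical, and `placeholder_reduct` merely stands in for any of the four reduct algorithms; it is not one of them.

```python
import time

def incremental_timings(samples, labels, reduct_fn, groups=11):
    """Time reduct_fn on the first 1, 2, ..., `groups` groups of the data.

    Returns a list of (number_of_samples_used, elapsed_seconds) pairs.
    """
    n = len(samples)
    results = []
    for g in range(1, groups + 1):
        end = n * g // groups  # the first g groups cover samples[:end]
        start = time.perf_counter()
        reduct_fn(samples[:end], labels[:end])
        results.append((end, time.perf_counter() - start))
    return results

def placeholder_reduct(samples, labels):
    """Hypothetical stand-in: returns all attribute indices without any search."""
    return list(range(len(samples[0]))) if samples else []

samples = [(i / 22, 1.0) for i in range(22)]
labels = [i % 2 for i in range(22)]
timings = incremental_timings(samples, labels, placeholder_reduct)
for size, seconds in timings:
    print(size, seconds)
```

Replacing `placeholder_reduct` with an actual reduct routine yields one curve of elapsed time against the number of samples per algorithm.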
Following Figure 3, it is not difficult to observe the following:
(1) With the increasing volume of data, the elapsed time of obtaining the reduct by using the four different algorithms increases. This is mainly because when the number of samples increases, the elapsed time of calculating the neighborhood relation and the lower approximation increases. Therefore, the time consumption of obtaining the reduct increases.
(2) Though the data have been divided into a series of groups, it is also not difficult to observe that the elapsed time of obtaining the reduct by using our approach is less than that of the other methods. Furthermore, as the size of the data becomes larger, our approach shows obvious advantages in time consumption compared with the other algorithms.

Conclusions and Future Perspectives
In this paper, we have designed an accelerator for calculating the reduct using the concepts of both bucket and local approximation. Our proposed strategy not only reduces the scanning space of obtaining information granules but also removes the useless granules for computing the lower approximations. Furthermore, the experimental results show that our proposed approach can significantly reduce the time consumption of obtaining the reduct. In addition, compared with the previous research studies, the stabilities of the reducts are the same, which implies that the obtained reducts are unchanged by our strategy. The following topics deserve further investigation:
(1) The proposed strategy is only realized from the viewpoint of the sample; other acceleration strategies which take the attribute into account can be further studied.
(2) A single granularity is applied in our proposed strategy; multiple granularities [16] can be further considered in the accelerator.

Definition 6. Given a decision system DS, $\forall A \subseteq AT$ and $\forall X_p \in U/\mathrm{IND}_d$, the neighborhood lower and upper approximations of $X_p$ with respect to $A$ are defined as

ALGORITHM 1: Forward greedy searching strategy for obtaining the reduct.
Inputs: Decision system $\mathrm{DS} = \langle U, AT \cup \{d\} \rangle$, radius $\delta$.
Outputs: An approximation quality-based reduct $A$.
(1) Compute $\gamma(AT, d)$;
(2) $A \leftarrow \emptyset$;
(3) Do
(i) $\forall x_i \in U$, compute $\delta_A(x_i)$;
(ii) Compute the lower approximation based on $\delta_A(x_i)$ by using equation (1);
(iii) $\forall a \in AT - A$, compute $\mathrm{Sig}(a, A, d)$ based on the lower approximation, and then select $b$ such that $\mathrm{Sig}(b, A, d) = \max\{\mathrm{Sig}(a, A, d) : \forall a \in AT - A\}$

ALGORITHM 2:
Inputs: Decision system $\mathrm{DS} = \langle U, AT \cup \{d\} \rangle$, radius $\delta$.
Outputs: An approximation quality reduct $A$.
(1) Compute $\gamma(AT, d)$;
(2) $A \leftarrow \emptyset$;
(3) Do
(i) $\forall x_i \in U$, compute $\delta_A(x_i)$ by using Definition 5 and Theorem 1;
(ii) Compute the lower approximation based on $\delta_A(x_i)$ by using equation (1);
(iii) $\forall a \in AT - A$, compute $\mathrm{Sig}(a, A, d)$ based on the lower approximation, and then select $b$ such that $\mathrm{Sig}(b, A, d) = \max\{\mathrm{Sig}(a, A, d) : \forall a \in AT - A\}$

Table 1: Descriptions of data sets.