Knowledge Reduction Based on Divide and Conquer Method in Rough Set Theory

The divide and conquer method is a typical granular computing method that uses multiple levels of abstraction and granulation. Although some results based on the divide and conquer method have been obtained in rough set theory, systematic methods for knowledge reduction based on the divide and conquer method are still absent. In this paper, knowledge reduction approaches based on the divide and conquer method are presented under the equivalence relation and under the tolerance relation, respectively. After that, a systematic approach, named the abstract process for knowledge reduction based on the divide and conquer method in rough set theory, is proposed. Based on this approach, two algorithms for knowledge reduction, an algorithm for attribute reduction and an algorithm for attribute value reduction, are presented. Experimental evaluations are performed on UCI data sets and KDDCUP99 data sets. The results illustrate that the proposed approaches process large data sets efficiently with good recognition rates, compared with KNN, SVM, C4.5, Naive Bayes, and CART.


Introduction
In the search for new computing paradigms, there has been a recent surge of interest, under the name of granular computing [1-3], in computations that use multiple levels of abstraction and granulation. To a large extent, existing studies cover rough sets, fuzzy sets, cluster analysis, and classical divide and conquer methods [4, 5], and aim at solving specific problems [6].
Rough set (RS) theory [7-10] is a valid mathematical theory for dealing with imprecise, uncertain, and vague information. Since it was proposed by Pawlak [10], it has been applied successfully in fields such as machine learning, data mining, intelligent data analysis, and control algorithm acquisition. Knowledge reduction is one of the most important contributions of rough set theory to machine learning, pattern recognition, and data mining. Although the problem of finding a minimal reduct of a given information system was proven to be NP-hard [8], many promising heuristics have been developed. A variety of methods for knowledge reduction and their applications can be found in [2, 7-43]. Among these existing methods, one group focuses on the indiscernibility relation in a universe, which captures the equivalence of objects, while the other considers the discernibility relation, which explores the differences between objects [42]. The indiscernibility relation can be employed to induce a partition of the universe and thereby to construct positive regions, whose objects can be undoubtedly classified into a certain class with respect to the selected attributes; knowledge reduction algorithms based on positive regions have been proposed in [8, 10, 15, 16, 21, 28, 30]. For the discernibility relation, there are knowledge reduction algorithms based on the discernibility matrix and on information entropy. Reduction methods based on the discernibility matrix [34] have a high storage cost, with space complexity O(m × n^2) for a decision table with n objects and m conditional attributes, so storing and deleting the element cells of a discernibility matrix is a time-consuming process. Many researchers have studied discernibility matrix construction and contributed a lot [10, 18, 28, 35, 38, 39, 43]. Knowledge reduction algorithms based on information entropy [20, 37, 42] have also been developed. Even so, it remains valuable to study new, highly efficient algorithms.
The divide and conquer method is a simple granular computing method. When algorithms are designed with it, a decision table can be divided recursively into many sub-decision tables in the attribute space. That is to say, an original large data set can be divided into many small ones. If the small ones can be processed one by one, instead of processing the original large one as a whole, a lot of time can be saved. Thus, it may be an effective way to process large data sets. The divide and conquer method consists of three vital stages.
Stage 1. Divide the original big problem into many independent subproblems with the same structure.

Stage 2. Conquer: solve each subproblem, recursively if necessary.

Stage 3. Merge the solutions of the subproblems into the solution of the original problem.

So far, some good results for knowledge reduction based on the divide and conquer method have been achieved, such as the computation of the attribute core and the computation of an attribute reduction under a given attribute order [12, 13]. Besides, decision tree-based methods [26, 27, 44] have been studied and are very popular. In fact, the construction of a decision tree is a special case of the divide and conquer method, because the tree is generated from top to bottom recursively. In decision tree-based methods, a tree must first be constructed by decomposition; this makes the first stage expensive, which in turn makes the following two stages convenient and cheap.
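As an illustration of the three stages, the following minimal Python sketch (the function names are ours, not from the paper) recursively splits a table of records on one attribute at a time, solves each small block directly, and merges the sub-solutions:

```python
# A minimal sketch of the three-stage divide and conquer scheme
# (illustrative names; the "solution" here is just a record count).

def solve(records, attrs):
    """Recursively divide `records` by the attributes in `attrs`,
    solve each small sub-table, and merge the sub-solutions."""
    if not attrs or len(records) <= 1:      # small enough: solve directly
        return process_small(records)
    first, rest = attrs[0], attrs[1:]
    # Stage 1 (divide): partition the records by their value on `first`.
    blocks = {}
    for rec in records:
        blocks.setdefault(rec[first], []).append(rec)
    # Stage 2 (conquer): solve each independent sub-problem recursively.
    sub_solutions = [solve(block, rest) for block in blocks.values()]
    # Stage 3 (merge): combine the sub-solutions into the overall solution.
    return merge(sub_solutions)

def process_small(records):
    return len(records)     # toy direct solution: count the records

def merge(solutions):
    return sum(solutions)   # toy merge: add the block counts
```

Any concrete reduction task replaces `process_small` and `merge` with its own direct solver and merging rule; the driver stays the same.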
However, a systematic method for knowledge reduction based on the divide and conquer method is still absent, especially for the question of "how to keep invariance between the solution of the original problem and the solutions of the subproblems." This makes it difficult to design highly efficient knowledge reduction algorithms based on the divide and conquer method. Therefore, it is urgent to discuss knowledge reduction methods based on the divide and conquer method systematically and comprehensively.
The contributions of this work are as follows. (1) Some principles for "keeping invariance between the solution of the original problem and the solutions of the subproblems" are concluded. Then, the abstract process for knowledge reduction based on the divide and conquer method in rough set theory is presented, which is helpful for designing highly efficient algorithms based on the divide and conquer method. (2) Fast approaches for knowledge reduction based on the divide and conquer method, including an algorithm for attribute reduction and an algorithm for attribute value reduction, are proposed. Experimental evaluations show that the presented methods are efficient.
The remainder of this paper is organized as follows. The basic theory and methods dealing with the application of rough set theory in data mining are presented in Section 2. Section 3 introduces the abstract process for knowledge reduction based on the divide and conquer method in rough set theory. A quick algorithm based on the divide and conquer method for attribute reduction is presented in Section 4. In Section 5, a fast algorithm for attribute value reduction using the divide and conquer method is proposed. Experimental evaluations showing the performance of the developed methods are discussed in Section 6. The paper ends with conclusions in Section 7.

Preliminaries
Rough set theory was introduced by Pawlak as a tool for concept approximation under uncertainty. Basically, the idea is to approximate a concept by three description sets, namely, the lower approximation, the upper approximation, and the boundary region. The approximation process begins by partitioning a given set of objects into equivalence classes called blocks, where the objects in each block are indiscernible from each other with respect to their attribute values. The approximation and boundary region sets are derived from the blocks of a partition of the available objects. The boundary region is the difference between the upper and lower approximations, and it provides a basis for measuring the "roughness" of an approximation. Central to the philosophy of the rough set approach to concept approximation is the minimization of the boundary region [28].
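The block-based approximations described above can be sketched in a few lines; the following is a minimal illustration (data layout and names are ours, not from the paper):

```python
# Sketch: lower/upper approximations and boundary region of a concept X,
# computed from the partition of U into indiscernibility blocks.

def blocks_by(universe, attrs, value):
    """Partition `universe` into blocks of objects indiscernible on `attrs`.
    `value(obj, a)` returns the value of attribute a on object obj."""
    blocks = {}
    for obj in universe:
        key = tuple(value(obj, a) for a in attrs)
        blocks.setdefault(key, set()).add(obj)
    return list(blocks.values())

def approximations(universe, attrs, value, X):
    """Return (lower, upper, boundary) of the concept X under attrs."""
    lower, upper = set(), set()
    for block in blocks_by(universe, attrs, value):
        if block <= X:        # every object of the block is certainly in X
            lower |= block
        if block & X:         # some object of the block is possibly in X
            upper |= block
    return lower, upper, upper - lower   # boundary = upper \ lower
```

Minimizing the third returned set (the boundary region) is exactly the goal stated above.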
For the convenience of description, some basic notions of decision tables are introduced first.

Definition 2.1 (decision table [36]). A decision table is defined as S = (U, A, V, f), where U is a non-empty finite set of objects, called the universe; A is a non-empty finite set of attributes, A = C ∪ D, where C is the set of conditional attributes and D is the set of decision attributes, D ≠ φ. For any attribute a ∈ A, V_a denotes the domain of attribute a. Each attribute determines a mapping function f : U × A → V. (2.1)

The Knowledge Reduction Based on Divide and Conquer Method under Equivalence Relation
In research on rough set theory, the divide and conquer method is an effective way to design highly efficient algorithms. It can be used to compute the equivalence classes, the positive region, and the attribute core of a decision table (see Propositions 3.1, 3.2, and 3.3), and even to execute some operations on the discernibility matrix (see Propositions 3.4 and 3.5). In this section, the divide and conquer method under the equivalence relation in rough set theory is discussed. Proposition 3.1 presents the approach for computing equivalence classes or the positive region by the divide and conquer method. Compared with a decision tree-based method (without pruning), the approach generates "clear" leaves, whose objects share the same decision and belong to the positive region, and "unclear" leaves, whose objects have different decisions and correspond to the boundary region. It may be an effective way to prevent overfitting, because the "conquer" stage can play the role of tree pruning. Furthermore, the approach needs less space because no tree has to be constructed.
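The "clear"/"unclear" leaf idea can be sketched as a recursive positive-region computation; the following Python illustration (our own sketch, not the paper's pseudocode) divides on one condition attribute at a time and merges the positive regions of the sub-tables by union:

```python
# Sketch of the idea behind Proposition 3.1: recursively divide the universe
# on the condition attributes; a "clear" leaf (one decision value) goes into
# the positive region, an "unclear" leaf with no attributes left is boundary.

def positive_region(universe, attrs, row):
    """Return the set of objects classifiable with certainty.
    `universe` is a set of object ids; `row(x)` gives the attribute values
    of x as a dict, with the decision stored under the key 'd'."""
    decisions = {row(x)['d'] for x in universe}
    if len(decisions) <= 1:        # clear leaf: all objects agree
        return set(universe)
    if not attrs:                  # unclear leaf: decisions differ, nothing
        return set()               # left to divide on -> boundary region
    first, rest = attrs[0], attrs[1:]
    blocks = {}
    for x in universe:             # divide on the first attribute
        blocks.setdefault(row(x)[first], set()).add(x)
    pos = set()
    for block in blocks.values():  # conquer each sub-table, merge by union
        pos |= positive_region(block, rest, row)
    return pos
```

Note that no tree is kept in memory: each block is consumed as soon as its sub-result is merged, which is the space advantage claimed above.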
Propositions 3.2 and 3.3 present the approach for determining the attribute core based on the divide and conquer method. Compared with computing the attribute core on the original decision table, it may be more efficient and can process larger data sets, since the big data set is divided into many small ones.
Obviously, Propositions 3.1, 3.2, and 3.3 hold. The discernibility matrix of Skowron is a useful tool for designing algorithms in rough set theory. However, due to the high complexity of explicitly computing the discernibility matrices, the efficiency of algorithms based on the discernibility matrix needs to be improved. Some useful methods have been proposed in the literature (see [26-31] by Nguyen et al., and the decomposition methods implemented in RSES). Our methods differ from the existing ones as follows.
(1) "How to keep invariance between the solution of the original problem and the solutions of the subproblems" is a key problem. We conclude some principles for computing the positive region, the attribute core, attribute reductions, and value reductions (see Propositions 3.1, 3.2, 3.3, 3.4, 3.5, 3.11, and 3.12), which had not been concluded before.
(2) Although the decision tree-based methods and our approaches both belong to the divide and conquer method, our approaches spend more on "conquer" and "merge" and less on "divide," compared with the decision tree-based methods. Furthermore, our approaches need not construct a tree, which may save space. (3) The existing heuristic methods in [26-28] improve efficiency by quickly measuring the discernibility degree of different objects. In our approaches, the element cells of the discernibility matrix can be deleted quickly by dividing the decision table, without storing the matrix. Thus, it may be a quick way to operate on the discernibility matrix within a small space (see Propositions 3.4 and 3.5).

Given a decision table S = (U, A, V, f) and its discernibility matrix M (Definition 2.8), let us denote by M the discernibility matrix of S and by M_xy the element cell of M for a pair of objects x and y (Definition 2.8). After the partition by U/C1, suppose x and y are divided into different sub-decision tables.

Let us denote by M the discernibility matrix of S. Then, from the viewpoint of operating on the discernibility matrix, dividing the decision table S by U/C1 on the attribute set C1 amounts to deleting all the element cells of M_C1 from M one by one.
According to Proposition 3.4, it is easy to see that Proposition 3.5 holds. Propositions 3.4 and 3.5 present the approach for deleting element cells of the discernibility matrix. By this approach, the element cells can be deleted quickly without constructing or storing the matrix. It may be an effective way to operate on the discernibility matrix quickly within a small space. Thus, Propositions 3.4 and 3.5 can be used to design efficient algorithms that avoid explicitly computing the discernibility matrix.
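The equivalence between "dividing on an attribute" and "deleting cells" can be checked on a toy table. In the sketch below (our own illustration; the matrix is materialized only to make the check visible, whereas the whole point of the propositions is that real algorithms never store it), the cells whose pair of objects gets separated by dividing on attribute c are exactly the cells containing c:

```python
# Illustration of Propositions 3.4/3.5 on a toy table: dividing by U/{c}
# implicitly deletes exactly the discernibility cells that contain c.

def discernibility_cells(universe, attrs, row):
    """Cells of the discernibility matrix: for each pair with different
    decisions ('d'), the set of attributes on which the pair differs."""
    cells = {}
    objs = sorted(universe)
    for i, x in enumerate(objs):
        for y in objs[i + 1:]:
            if row(x)['d'] != row(y)['d']:
                cells[(x, y)] = {a for a in attrs if row(x)[a] != row(y)[a]}
    return cells

def split_pairs(universe, attr, row):
    """Pairs of objects separated when dividing the universe by U/{attr}."""
    objs = sorted(universe)
    return {(x, y) for i, x in enumerate(objs) for y in objs[i + 1:]
            if row(x)[attr] != row(y)[attr]}
```

For every cell, "the pair is split by dividing on c" and "the cell contains c" coincide, so the division can stand in for the deletion.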

The Knowledge Reduction Based on Divide and Conquer Method under Tolerance Relation
In the course of attribute value reduction, a tolerance relation is often adopted because some attribute values on condition attributes are deleted. Thus, a tolerance relation may be needed in attribute value reduction. A method was introduced by Kryszkiewicz and Rybinski [17] to process incomplete information systems, where "*" represents a missing value on a condition attribute. Here, "*" can also represent a deleted value on a condition attribute. Based on the tolerance relation of Kryszkiewicz, the divide and conquer method under the tolerance relation is discussed below.
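The "*"-based tolerance relation can be stated compactly in code; the following minimal sketch (data layout and names are ours) treats two objects as tolerant on an attribute set B when every attribute in B agrees or carries the missing/deleted value "*":

```python
# Sketch of the Kryszkiewicz-style tolerance relation: values agree, or at
# least one of them is the missing/deleted marker "*".

MISSING = '*'

def tolerant(x, y, attrs, row):
    """True if x and y tolerate each other on every attribute in attrs."""
    return all(row(x)[a] == row(y)[a]
               or row(x)[a] == MISSING
               or row(y)[a] == MISSING
               for a in attrs)

def tolerance_class(x, universe, attrs, row):
    """T_B(x): all objects of the universe tolerant with x on attrs."""
    return {y for y in universe if tolerant(x, y, attrs, row)}
```

Unlike equivalence classes, tolerance classes may overlap, which is why the tolerance-based division of the universe below yields a covering rather than a partition.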

For each sub-decision table S_i, let us denote by d(x) the decision rule relative to an object x in S_i and by d1(x) the decision rule relative to x in S, respectively. Then d(x) = d1(x).

Proposition 3.12. Given a decision table S = (U, A, V, f), for all C1 ⊆ C, given a decomposing order c1, c2, ..., cp (p = |C1|), S can be divided according to this order into k sub-decision tables S1, S2, ..., Sk, where S_i = (U_i, A, V, f). Assume S and S1, S2, ..., Sk are processed in the same way. For each sub-decision table S_i (1 ≤ i ≤ k), let us denote by Rule the certain decision rule set of S and by Rule_i that of S_i. Then, Rule = ∪_{1≤i≤k} Rule_i.

Propositions 3.11 and 3.12 present the approach of value reduction based on the divide and conquer method. It keeps invariance between the solution of the original decision table and the solutions of the sub-decision tables. By this approach, decision rules can be generated from the sub-decision tables instead of from the original decision table. It may be a feasible way to process big data sets.

The Abstract Process for Knowledge Reduction Based on Divide and Conquer Method in Rough Set Theory
According to the divide and conquer method under the equivalence relation and under the tolerance relation, the abstract process for knowledge reduction in rough set theory based on the divide and conquer method, APFKRDAC(P, S), is discussed in this section.
Input: the problem P on S.
Output: the solution Solu of the problem P.
Step 1 (determine a similarity relation between objects). Determine a similarity relation between different objects, such as an equivalence relation or a tolerance relation. Generally, reflexivity and symmetry may be necessary.
Step 2 (determine the decomposing order). Determine the order of the attributes on which the universe will be decomposed.

Step 3 (determine the decomposing strategy).
3.1 Design a judgment criterion CanBeDivide for judging whether the universe can be decomposed.
3.2 Design a decomposing function DecomposingFunc, which is used to decompose the universe recursively.

3.3 Design a Boolean function IsEnoughSmall, which judges whether the size of a problem is small enough to be processed directly.
3.4 Design a computation function ProcessSmallProblem, which processes small problems directly.
3.5 Design a computation function MergingSolution, which merges the solutions of the subproblems.
Step 4 (process the problem based on the divide and conquer method).

Step 5 (optimize the solution). If necessary, optimize the solution Solu.
Step 6 (return the solution). RETURN Solu.
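The steps above amount to a generic recursive driver parameterized by the five design hooks of Step 3. The following Python sketch is our own rendering of that skeleton (not code from the paper); any concrete reduction plugs in its own hook functions:

```python
# A sketch of the abstract process (Algorithm 3.13) as a higher-order
# function; the five parameters mirror the hooks of Steps 3.1-3.5.

def apfkrdac(problem, is_enough_small, process_small_problem,
             can_be_divided, decomposing_func, merging_solution):
    # Step 4.1: small enough (or indivisible) -> solve directly.
    if is_enough_small(problem) or not can_be_divided(problem):
        return process_small_problem(problem)
    # Step 4.2: divide into independent sub-problems.
    sub_problems = decomposing_func(problem)
    # Step 4.3: conquer each sub-problem recursively.
    solutions = [apfkrdac(p, is_enough_small, process_small_problem,
                          can_be_divided, decomposing_func, merging_solution)
                 for p in sub_problems]
    # Step 4.4: merge the sub-solutions into the overall solution.
    return merging_solution(solutions)
```

For example, instantiating the hooks with "a list is small when it has one element", "solve by summing", "divide by halving", and "merge by summing" turns the skeleton into a divide and conquer summation.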
Now, let us give an example of computing the positive region of a decision table to explain Algorithm 3.13 (see Algorithm 3.14, the algorithm for computing the positive region based on the divide and conquer method). Algorithm 3.14 (CPRDAC(P, S)).
Input: the problem P on S: compute the positive region.
Output: the positive region Pos_C(D) of S.
Step 1 (determine the similarity relation). The equivalence relation is adopted.
Step 3 (determine the decomposing strategy).
Step 5 (optimize the solution). Solu is already an optimized result.
Step 6 (return the solution). RETURN Solu.
Example 3.15. Given a decision table S = (U, A, V, f), U = {x1, x2, x3, x4, x5, x6}, C = {c1, c2, c3}, compute the positive region of S according to Algorithm 3.14. The whole process is shown in Figure 1.
All objects have the same decision; thus, all objects of S4 belong to the positive region.
There is only one object; thus, the object of S5 belongs to the positive region.
There is only one object; thus, the object of S2 belongs to the positive region.
The decision values are not unique; thus, the objects of S7 do not belong to the positive region.

A Fast Algorithm for Attribute Reduction Based on Divide and Conquer Method
Knowledge reduction is a key problem in rough set theory. When the divide and conquer method is used to design knowledge reduction algorithms, good results may be obtained. However, implementing knowledge reduction based on the divide and conquer method is quite involved, even though the method itself is a simple granular computing method. Here, we discuss a quick algorithm for knowledge reduction based on the divide and conquer method.
In the course of attribute reduction, the divide and conquer method is used to compute the equivalence classes, the positive region, and the non-empty label attribute set, and to delete the elements of the discernibility matrix. Due to the complexity of attribute reduction, the following algorithm is not presented in as much detail as Algorithm 3.14.
According to Step 2 of Algorithm 3.13, the attribute set and the order in which the universe of the decision table will be partitioned must be determined. Generally speaking, the decomposing order depends on the problem to be solved. If the order is not given by domain experts, it can be computed by the weights in [10, 15, 23, 25, 26, 28, 33, 36, 37, 41]. Of course, if an attribute order is given, it is even more suitable for Algorithm 3.13. Most techniques discussed below are based on a given attribute order and the divide and conquer method. In this section, a quick attribute reduction algorithm based on a given attribute order and the divide and conquer method is proposed.

Attribute Order
In 2001, an algorithm for attribute reduction based on a given attribute order was proposed by Jue Wang and Ju Wang [38]. For convenience, some basic notions about attribute orders are introduced here.
Given a decision table S = (U, A, V, f), an attribute order relation SO over the condition attribute set C can

Attribute Reduction Based on the Divide and Conquer Method
Thus, the proposition holds. Therefore, Proposition 4.2 holds.
According to the algorithm in [38], in order to compute an attribute reduction of a decision table, its non-empty label attribute set L(SO) must first be calculated. Using the divide and conquer method, an efficient algorithm for computing the non-empty label attribute set L(SO) is developed. A recursive function for computing the non-empty label attribute set is used in the algorithm.

Function 1. NonEmptyLabelAttr(S, r) // S is a decision table; r is the index of the attribute being processed (1 ≤ r ≤ |C|).

Step 2.4 (merge the solutions). No operation is needed here, because the solutions are stored in the array NonEmptyLabel.
Using the above recursive function, an algorithm for computing the non-empty label attribute set of a decision table is developed.

Algorithm 4.3. Computation of the Non-empty Label Attribute Set L(SO)

Input: a decision table S and an attribute order SO.
Output: the non-empty label attribute set R1 of S.
Step 1. R1 = φ; r = 1; FOR j = 1 TO |C| DO NonEmptyLabel[j] = 0; END FOR
Step 2. NonEmptyLabelAttr(S, 1);
Obviously, Algorithm 4.3 is an instance of Algorithm 3.13. Given an attribute order on the conditional attributes of a decision table, an efficient attribute reduction algorithm is developed using Algorithm 4.3 and the divide and conquer method.

Algorithm 4.4. Computation of Attribute Reduction Based on Divide and Conquer Method

Input: a decision table S = (U, A, V, f) and an attribute order SO.
Step 2. Compute the positive region Pos_C(D) according to Algorithm 3.14.
Step 3. Compute the non-empty label attribute set R1 by Algorithm 4.3.
Step 4. // Suppose c_N is the maximum non-empty label attribute.

According to Algorithm 3.13 and Propositions 5.1, 5.2, and 5.3, a recursive function and an algorithm for value reduction based on the divide and conquer method are developed as follows.
Step 1 (ending condition). IF there is a contradiction on S, THEN return; END IF
Step 2 (value reduction on c_r based on the divide and conquer method).
Divide S into k sub-decision tables S1, S2, ..., Sk on attribute c_r by using the tolerance relation.
Denote by array Solu_i[|U_i|] the solution of S_i. Using Function 2, we present an algorithm for value reduction based on the divide and conquer method (see Algorithm 5.4).

Algorithm 5.4. An Algorithm for Value Reduction Based on Divide and Conquer Method
Input: a decision table S = (U, A, V, f).
Output: the certain rule set DR of S.
Step 2 (compute the positive region). According to Algorithm 3.14, compute the positive region Pos_C(D) of S.
Step 3 (compute the non-empty label attributes). Assume the order for dividing the decision table is c1, c2, ..., cm (m = |C|).
Compute the non-empty label attribute set ASet by using Function 1.
Step 5 (get the rule set). Let u be the number of non-empty label attributes in Step 4. Then the time complexity of Step 4 is O(u × 2n × T(1, n)), where T(1, n) is an instance of T(r, n), which can be expressed by the following recursive equation. Supposing the data obey the uniform distribution, the time complexity of Algorithm 5.4 is
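The idea behind attribute value reduction (cf. Definition 3.10) can be sketched greedily: a condition value of a certain rule may be replaced by "*" when the shortened rule still classifies consistently. The following Python sketch is a plain illustration of that idea, not the paper's Function 2, and the data layout is our assumption:

```python
# Greedy sketch of attribute value reduction: drop (set to "*") every
# condition value whose removal keeps the rule consistent, i.e. every
# object matching the remaining conditions still has the same decision.

MISSING = '*'

def matches(rule_cond, obj):
    """True if obj satisfies every non-'*' condition of the rule."""
    return all(v == MISSING or obj[a] == v for a, v in rule_cond.items())

def consistent(rule_cond, decision, universe):
    """True if every matching object carries the rule's decision 'd'."""
    return all(obj['d'] == decision
               for obj in universe if matches(rule_cond, obj))

def reduce_rule_values(obj, cond_attrs, universe):
    """Value-reduce the rule induced by obj: keep only necessary values."""
    cond = {a: obj[a] for a in cond_attrs}
    for a in cond_attrs:
        saved, cond[a] = cond[a], MISSING   # tentatively delete the value
        if not consistent(cond, obj['d'], universe):
            cond[a] = saved                 # a is necessary: restore it
    return cond
```

The divide and conquer version processes each tolerance-based sub-table this way and merges the resulting rules, which is what Propositions 3.11 and 3.12 justify.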

Experimental Evaluations
In order to test the efficiency of knowledge reduction based on the divide and conquer method, experiments were performed on a personal computer, as described below.

The Experimental Evaluations on UCI Data Sets
In this experiment, evaluations are performed to present the efficiency and recognition results of Algorithms 4.4 and 5.4. Meanwhile, some famous data mining approaches are compared with our methods. The test procedure is as follows. First, 11 UCI data sets (Zoo, Iris, Wine, Machine, Glass, Voting, Wdbc, Balance-scale, Breast, Crx, and Tic-tac-toe) are used. Second, our methods, namely the discretization algorithm of [14] (an improved version of the discretization method in [28]), the attribute reduction algorithm (Algorithm 4.4), and the attribute value reduction algorithm (Algorithm 5.4), are run on the 11 UCI data sets. Third, 5 methods, KNN, SVM, C4.5, Naive Bayes, and CART, which belong to the "top 10 algorithms in data mining" [44], are also run on the data sets; their source code is provided by the Weka software (http://en.wikipedia.org/wiki/Weka_(machine_learning)), which is used as the experimental platform, with "Java + Eclipse + Weka" as the development environment. The test method is LOOCV (leave-one-out cross-validation). The specifications of the experimental computer are an Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33 GHz, 2 GB RAM, and Microsoft Windows 7. The specifications of the 11 data sets and the experimental results are as follows.
From Table 1 and Figure 2, it can be found that the recognition results of our methods on the 11 UCI data sets are close to those of KNN and CART and better than those of Naive Bayes, C4.5, and SVM.


The Experimental Evaluations on KDDCUP99 Data Sets

First, experiments are done on 10 data sets (≤10^4 records) from the original KDDCUP99 data sets. The experimental results are shown in Tables 3, 4, and 5, where the time unit is ms in Tables 4 and 5.
From Tables 3, 4, and 5, it can be found that SVM needs much training time, which shows that SVM is not a good way to process KDDCUP99 data sets with many records. Thus, SVM is not tested in the following experiments.
Second, experiments are done on 10 data sets (≤10^5 records) from the original KDDCUP99 data sets. The results are shown in Tables 6 and 7, where "Tr" is the training time, "Te" is the test time, and the time unit is ms in Table 7. From Tables 6 and 7, it can be found that the recognition rate of Naive Bayes is lower than that of the others and that KNN needs much test time. Thus, Naive Bayes and KNN are not tested in the following experiments.
Third, experiments are done on 10 data sets (≤10^6 records) from the original KDDCUP99 data sets. The results are shown in Table 8, where "RRate" is the recognition rate, "Tr" is the training time, "Te" is the testing time, and the time unit is ms.
Fourth, experiments are done on 10 data sets (<5 × 10^6 records) from the original KDDCUP99 data sets. The results are shown in Table 9, where "RRate" is the recognition rate, "Tr" is the training time, "Te" is the testing time, "-" denotes memory overflow, and the time unit is seconds.
From Tables 8 and 9, it can be found that C4.5 is the best and our method the second best among C4.5, CART, and our method. Due to the high complexity of discretization, our method could not complete the knowledge reduction of the 4,898,432 records in this experiment.

The Conclusions of Experimental Evaluations
Now, we give some conclusions about our approaches compared with KNN, SVM, C4.5, Naive Bayes, and CART, according to the LOOCV results on the UCI data sets and the 10-fold cross-validation results on the KDDCUP99 data sets.
(1) Compared with KNN, SVM, and Naive Bayes, the LOOCV recognition results of our methods on the UCI data sets are better. Furthermore, on the KDDCUP99 data sets, our methods are more efficient than KNN, SVM, and Naive Bayes while still achieving good recognition results.
(2) Compared with CART, the LOOCV recognition results of our methods on the UCI data sets are close to those of CART, but our methods can process larger data sets than CART on the KDDCUP99 data sets, while both have good recognition results.
(3) Compared with C4.5, the LOOCV recognition results of our methods on the UCI data sets are better. Furthermore, the test time of our methods on the KDDCUP99 data sets is less than that of C4.5, while C4.5 can process larger data sets than our methods. Analyzing the two methods, we find that ours are more complex than C4.5 because of the discretization step (C4.5 can process decision tables with continuous values directly, while discretization is necessary for our methods). As a coin has two sides, this extra learning contributes to better rule sets, and thus our methods need less test time than C4.5.
Therefore, the knowledge reduction approaches based on the divide and conquer method are efficient for processing large data sets, although they still need to be improved in the future.

Conclusions
In this paper, the abstract process of knowledge reduction based on the divide and conquer method is concluded, which originates from the approaches under the equivalence relation and under the tolerance relation. Furthermore, an example of computing the positive region of a decision table is introduced. After that, two algorithms for knowledge reduction based on the divide and conquer method, an algorithm for attribute reduction and an algorithm for attribute value reduction, are presented. According to the experimental evaluations, the proposed algorithms process knowledge reduction efficiently on the UCI data sets and the KDDCUP99 data set. Therefore, the divide and conquer method is an efficient and suitable method for knowledge reduction algorithms in rough set theory. With this efficiency, widespread industrial application of rough set theory may become possible.
the positive region of S_i and that of S by Pos_C(D), respectively. Then, Pos_C(D) = ∪_{1≤i≤k} Pos_i(D).
be defined. Let us denote by M the discernibility matrix of S. For any δ ∈ M, the attributes of the element δ inherit the order relation of SO from left to right; that is, δ = {c_j} ∪ B, where c_j ∈ C and B ⊂ C − {c_j}, and c_j is the first element of δ by SO, called the non-empty label attribute of δ [33]. For c_j, a set L(SO) = {δ | δ = {c_j} ∪ B, δ inherits the order relation of SO from left to right, and δ ∈ M} is defined. Hence, M can be divided into equivalence classes by label attributes, defining a partition {[1], [2], ..., [|C|]} of M denoted by M/L(SO) [33]. Supposing N = max{k | [k] ∈ M/L(SO) ∧ [k] ≠ φ}, the maximum non-empty label attribute is c_N.

Step 4. RETURN R1.

Suppose n = |U| and m = |C|. According to the conclusion of [45], the average time and space complexities of Algorithm 4.3 are T = O(n × m log n) and S = O(m + n).

Function 2. DRAVDAC(S, c_r) // Denote by array CoreValueAttribute the result of the value reduction on c_r. The values of array CoreValueAttribute are all 0 initially.

Step 6. FOR i = 1 TO |U| DO IF x_i ∈ Pos_C(D) THEN construct a rule d_i from x_i; DR = DR ∪ {d_i}; END IF; END FOR; RETURN DR.

Suppose |C| = m and |U| = n. The time complexity of Step 1 is O(m + n). The average time complexities of Steps 2 and 3 are O(n × m log n) [45]. The time complexities of Steps 5 and 6 are both O(m × n). Now, let us analyze the time complexity of Step 4.
the time complexity of Algorithm 5.4 is less than O(n^2 × m). When p ≥ 5, the time complexity of the algorithm is less than O(n^1.5 × m). The space complexity of Algorithm 5.4 is O(m × n).

Figure 2: Recognition result on UCI data sets.
For any subset X ⊆ U and the indiscernibility relation IND(B), the B-lower approximation and B-upper approximation of X are defined as B_(X) = ∪{[x]_B | [x]_B ⊆ X} and B¯(X) = ∪{[x]_B | [x]_B ∩ X ≠ φ}.

Definition 2.4 (positive region [36]). Given a decision table S = (U, A, V, f), P ⊆ A, and Q ⊆ A, the P positive region of Q is defined as Pos_P(Q) = ∪_{X ∈ U/Q} P_(X).

Definition 2.5 (relative core [36]). Given a decision table S = (U, A, V, f), P ⊆ A, Q ⊆ A, and r ∈ P: r is unnecessary in P relative to Q if and only if Pos_P(Q) = Pos_{P−{r}}(Q); otherwise, r is necessary in P relative to Q. The core of P relative to Q is defined as CORE_Q(P) = {r | r ∈ P; r is necessary in P relative to Q}.

Definition 2.6 (see [36]). Given a decision table S = (U, A, V, f), P ⊆ A and Q ⊆ A, if every r ∈ P is necessary in P relative to Q, then P is called independent relative to Q.

Proposition 3.2 (see [13]). Given a decision table S = (U, A, V, f), for all c ∈ C, divide S into k (k = |U/{c}|) sub-decision tables S1, S2, ..., Sk by U/{c}, where S_i = (U_i, (C \ {c}) ∪ D, V_i, f_i), which satisfies ∀x, y ∈ U_i, c(x) = c(y) (1 ≤ i ≤ k) and ∀x ∈ U_i ∀z ∈ U_j, c(x) ≠ c(z) (1 ≤ i < j ≤ k). Let us denote by Core_i (1 ≤ i ≤ k) the attribute core of S_i and by Core that of S. Then, ∪_{1≤i≤k} Core_i ⊆ Core ⊆ {c} ∪ (∪_{1≤i≤k} Core_i).
In the tolerance relation-based extension of rough set theory, the lower approximation X_T^B and upper approximation X^B_T of an object set X relative to an attribute set B (B ⊆ C) are defined accordingly.

Definition 3.8 (the covering of the universe under the tolerance relation). Given a decision table S = (U, A, V, f) and a condition attribute set C1 (C1 ⊆ C), according to the tolerance relation of Kryszkiewicz, the covering of the universe of S can be defined as U = U1 ∪ U2 ∪ ... ∪ Uk, where ∀x, y ∈ U_p (1 ≤ p ≤ k) ∀c ∈ C1, c(x) = c(y) ∨ c(x) = * ∨ c(y) = *, and ∀x ∈ U_r ∀y ∈ U_p (1 ≤ r < p ≤ k) ∃c ∈ C1, (c(x) ≠ * ∧ c(y) ≠ *) ⇒ c(x) ≠ c(y).

Definition 3.9 (certain decision rule). Given a decision table S = (U, A, V, f), for all x_i ∈ Pos_C(D), the object x_i induces a certain decision rule d_i: des(x_i, C) ⇒ des(x_i, D), d_i(a) = a(x_i), a ∈ C ∪ D. d_i|C and d_i|D are called the condition attribute set and decision attribute set of d_i, respectively.

Definition 3.10 (see [36]). Given a decision table S = (U, A, V, f), for an arbitrary certain decision rule d_i, [x_i]_C ⊆ [x_i]_D holds. For all a ∈ C, if [x_i]_{C\{a}} ⊆ [x_i]_D does not hold, then a is necessary in d_i; otherwise, a is not necessary in d_i.
Definition 3.7 (see [17]). The tolerance class T_B(x) of an object x relative to an attribute set B is defined as T_B(x) = {y | y ∈ U ∧ T_B(x, y)}.

Proposition 3.11. Given a decision table S = (U, A, V, f), for all C1 ⊆ C, given a decomposing order c1, c2, ..., cp (p = |C1|), according to this order and the tolerance relation, S can be divided into k sub-decision tables S1, S2, ..., Sk, where S_i = (U_i, A, V, f). Assume S and S1, S2, ..., Sk are processed in the same way. For each sub-decision table S_i,

4.4 (Merge the solutions of the subproblems). Solu = MergingSolution(Solu_1, Solu_2, ..., Solu_k).
(Here S_i = (U_i, C_i ∪ D, V, f) with C_i ⊆ C, and P_i is the sub-problem of S_i; when the goal is the positive region, S_i = (U_i, (C − {c_1}) ∪ D, V, f) and P_i denotes computing the positive region of S_i.)

3.1 (Design a judgment criterion CanBeDivide). On attribute c_i, for all U_i (U_i ⊆ U), IF ∃x ∈ U_i ∃y ∈ U_i (c_i(x) ≠ c_i(y)) ∧ ∃z ∈ U_i ∃w ∈ U_i (D(z) ≠ D(w)) THEN CanBeDivide(c_i) = true; ELSE CanBeDivide(c_i) = false; END IF.

3.2 (Design a decomposing function DecomposingFunc). According to U/C, S can be divided into k = |U/C| sub-decision tables S_1, S_2, ..., S_k on the attributes c_1, c_2, ..., c_m recursively.

3.3 (Design a boolean function IsEnoughSmall). Let c_r be the attribute on which the universe is being decomposed. IF |U_i| = 1 or r > m or ∀x ∈ U_i ∀y ∈ U_i (D(x) = D(y)) THEN IsEnoughSmall(S_i) = true; ELSE IsEnoughSmall(S_i) = false; END IF.

3.4 (Design a computation function ProcessSmallProblem). For an arbitrary sub-decision table S_i and its universe U_i, IF ∀x ∈ U_i ∀y ∈ U_i (D(x) = D(y)) THEN U_i ⊆ Pos_C(D) and Solu_i = U_i; ELSE Solu_i = ∅; END IF.

3.5 (Design a computation function MergingSolution). Solu = Solu_1 ∪ Solu_2 ∪ ... ∪ Solu_k.

Step 4 (Process the problem based on the divide and conquer method).
4.1 IF IsEnoughSmall(S) THEN Solu = ProcessSmallProblem(S); goto Step 5.
4.2 (Divide). According to the order {c_1, c_2, ..., c_m}, S can be divided into k sub-decision tables S_1, S_2, ..., S_k on c_1 by using U/IND({c_1}).
4.3 (Conquer). Each sub-decision table S_i is processed recursively in the same way.
4.4 (Merge). Merge the solutions of the sub-problems: Solu = MergingSolution(Solu_1, Solu_2, ..., Solu_k).
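The abstract process (IsEnoughSmall, Divide, Conquer, Merge) can be sketched as one generic recursion; the function `solve`, the list-of-dicts table, and the instantiating lambdas below are hypothetical names, a sketch rather than the paper's implementation:

```python
from collections import defaultdict

def solve(S, attrs, r, is_enough_small, process_small, merge):
    """Abstract divide and conquer process: solve small problems
    directly; otherwise divide on attribute c_r via U/IND({c_r}),
    conquer each sub-table recursively, and merge the solutions."""
    if is_enough_small(S, r):
        return process_small(S)
    parts = defaultdict(list)          # Divide
    for row in S:
        parts[row[attrs[r]]].append(row)
    return merge(                      # Conquer + Merge
        solve(p, attrs, r + 1, is_enough_small, process_small, merge)
        for p in parts.values())

# Instance: collect the objects of decision-consistent leaves,
# i.e. the positive region Pos_C(D).
attrs = ["a", "b"]
table = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 0, "d": "yes"},  # conflicts with the row above
    {"a": 1, "b": 1, "d": "yes"},
]
pos = solve(
    table, attrs, 0,
    is_enough_small=lambda S, r: r >= len(attrs)
    or len({row["d"] for row in S}) <= 1,
    process_small=lambda S: list(S)
    if len({row["d"] for row in S}) <= 1 else [],
    merge=lambda sols: [x for sol in sols for x in sol],
)
```

Swapping the three plugged-in functions yields the other instances of the abstract process (attribute core, non-empty labels, and so on) without changing the recursion itself.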
(Proof, continued.) ... x ∈ Pos_C(D) ∧ x ∈ U. According to |U| > 1 ∧ |V_D| > 1, then ∃y (y ≠ x ∧ y ∈ U ∧ D(x) ≠ D(y)). If c(x) ≠ c(y), then c ∈ B_Sxy, that is, ∃α (α ∈ M ∧ c ∈ α); the proposition holds. If c(x) = c(y), then there are two cases. Case 1 (y ∈ Pos_C(D)). According to ...

Algorithm (compute non-empty labels based on divide and conquer method). Let NonEmptyLabel be an array used to store the solution. ... Step 2.2 (Divide). Divide S into S_1, S_2, ..., S_k by U/IND({c_r}); ...
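Only fragments of the non-empty-label algorithm survive above, so the following is a speculative sketch rather than the paper's algorithm: in the spirit of Step 2.2 and the CanBeDivide criterion, an attribute c_r is marked non-empty when some block of objects agreeing on c_1, ..., c_{r-1} still differs both on c_r and on the decision. All names (`non_empty_labels`, the dict-based table) are hypothetical:

```python
from collections import defaultdict

def non_empty_labels(rows, C, d="d"):
    """Mark c_r non-empty when a block obtained by recursively
    dividing on c_1..c_{r-1} mixes values of c_r and decisions."""
    label = {c: False for c in C}

    def rec(part, r):
        if r == len(C) or len(part) <= 1:
            return
        if (len({row[C[r]] for row in part}) > 1
                and len({row[d] for row in part}) > 1):
            label[C[r]] = True
        groups = defaultdict(list)     # divide by U/IND({c_r})
        for row in part:
            groups[row[C[r]]].append(row)
        for sub in groups.values():
            rec(sub, r + 1)

    rec(rows, 0)
    return [c for c in C if label[c]]
```

On a table where only a (respectively only b) separates the decisions, only that attribute receives a non-empty label, matching the intuition that labeled attributes are the ones appearing in discernibility-matrix cells.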

A Fast Algorithm for Value Reduction Based on Divide and Conquer Method
... Compute the new non-empty label attribute set R_1 of S by Algorithm 4.3; ... With n = |U| and m = |C|, the average time complexity of Algorithm 4.4 is O(n × m × m log n); its space complexity is O(m × n).

In Algorithm 4.4, Step 1 is the initialization. In Step 2, the divide and conquer method is used to compute equivalence classes and the positive region; thus Step 2 is an instance of Algorithm 3.13. In Step 3, Algorithm 4.3 is used to compute the non-empty label attribute set (Algorithm 4.3 is also an instance of Algorithm 3.13). Step 4 corresponds to Step 5 of Algorithm 3.13. In Step 4, Algorithm 4.3 is called repeatedly to remove the redundant attributes. That is, Algorithm 4.4 is composed of instances of Algorithm 4.3, which illustrates that Algorithm 4.4 is implemented by the divide and conquer method. Basically, the divide and conquer method is used in Algorithm 4.4 primarily to compute equivalence classes, the positive region, and the non-empty label attribute set, and to delete element cells in the discernibility matrix.

Proposition 5.1. Given a decision table S = (U, A, V, f), for all x ∈ Pos_C(D), a certain rule can be induced by object x.

Proposition 5.2. Given a decision table S = (U, A, V, f), let us denote by DR the certain rule set of S. Then, for all d_i ∈ DR (1 ≤ i ≤ |DR|), d_i must be induced by some x (x ∈ Pos_C(D)).

Given a decision table S = (U, A, V, f), let us denote by DR the certain rule set of S. For all x ∈ Pos_C(D), let us denote by d_1 the certain rule induced by x, and let M_1 = {B_Sxy | y ∈ U}. Then, ...
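Algorithm 4.4 itself is not fully reproduced here; the following is a greedy sketch of its redundancy-removal idea only (drop an attribute whenever the positive region is unchanged without it), with all names hypothetical and a brute-force positive-region counter standing in for the divide and conquer computation:

```python
from collections import defaultdict

def positive_region_size(rows, C, d="d"):
    """|Pos_C(D)|: count objects whose C-equivalence class is
    decision-consistent."""
    decisions = defaultdict(set)
    sizes = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in C)
        decisions[key].add(row[d])
        sizes[key] += 1
    return sum(n for key, n in sizes.items() if len(decisions[key]) == 1)

def reduce_attributes(rows, C, d="d"):
    """Greedy redundancy removal in the spirit of Algorithm 4.4's
    last step: keep dropping attributes that leave Pos_C(D) intact."""
    target = positive_region_size(rows, C, d)
    R = list(C)
    for c in list(C):
        trial = [a for a in R if a != c]
        if trial and positive_region_size(rows, trial, d) == target:
            R = trial
    return R
```

The result is a superset of the core and preserves all certain rules, though, as the NP-hardness result cited in the introduction implies, a greedy pass need not find a minimal reduct.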

Table 1: Specifications of 11 UCI data sets.

Table 3: The recognition rates of 6 algorithms on KDDCUP99 data sets (≤10^4 records).

.2. The Experimental Results on KDDCUP99 Data Sets

In order to test the efficiency of our methods in processing large data sets, some experiments are done on the KDDCUP99 data sets, which contain 4,898,432 records, 41 condition attributes, and 23 decision classes (http://kdd.ics.uci.edu/databases//kddcup99/kddcup99.html). Our methods still consist of the discretization algorithm [14] and Algorithms 4.4 and 5.4. Weka is used as the experimental platform and Java (Eclipse) as the development tool (Table 2). The test method is 10-fold cross-validation. The specifications of the experimental computer are an Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33 GHz, 4 GB RAM, and Microsoft Windows Server 2008. The experimental results are as follows.

Table 4: The training time of 6 algorithms on KDDCUP99 data sets (≤10^4 records).

Table 5: The test time of 6 algorithms on KDDCUP99 data sets (≤10^4 records).

Table 6: The recognition rates of 5 algorithms on KDDCUP99 data sets (≤10^5 records).

Table 7: The running time of 5 algorithms on KDDCUP99 data sets (≤10^5 records).

Table 8: The experimental results of 3 algorithms on KDDCUP99 data sets (≤10^6 records).