δ-Cut Decision-Theoretic Rough Set Approach: Model and Attribute Reductions

Decision-theoretic rough set is a useful rough set model that introduces the decision cost into probabilistic approximations of the target. However, Yao's decision-theoretic rough set is based on the classical indiscernibility relation, which may be too strict in many applications. To solve this problem, a δ-cut decision-theoretic rough set is proposed, which is based on the δ-cut quantitative indiscernibility relation. Furthermore, with respect to the criteria of decision monotonicity and cost decreasing, two different algorithms are designed to compute reducts. The comparisons between these two algorithms show the following: (1) with respect to the original data set, the reducts based on the decision-monotonicity criterion generate more rules supported by the lower approximation region and fewer rules supported by the boundary region, so that the uncertainty coming from the boundary region can be decreased; (2) compared with the reducts based on the decision-monotonicity criterion, the reducts based on the cost minimum criterion obtain the lowest decision costs and the largest approximation qualities. This study suggests potential application areas and new research trends concerning rough set theory.


Introduction
Decision-theoretic rough set (DTRS) was proposed by Yao et al. in the early 1990s [1,2]. It introduces the Bayesian decision procedure and loss functions into rough set theory. In DTRS, the pair of thresholds α and β, which describe the tolerance of approximations, can be calculated directly by minimizing the decision costs with Bayesian decision theory. Following Yao's pioneering work, many theoretical and applied results related to decision-theoretic rough set have been obtained; see [3][4][5][6][7][8][9][10][11][12][13] for more details.
In decision-theoretic rough set, Pawlak's indiscernibility relation is a basic concept [14][15][16][17][18][19]; it is the intersection of some equivalence relations in a knowledge base. In [20], Zhao et al. made a further investigation of the indiscernibility relation and proposed two other indiscernibility relations, referred to as the weak indiscernibility relation and the δ-cut quantitative indiscernibility relation, respectively. Correspondingly, Pawlak's indiscernibility relation is called the strong indiscernibility relation. By comparing these three binary relations, it is proven that the δ-cut quantitative indiscernibility relation is a generalization of both the strong and the weak indiscernibility relations. Therefore, it is interesting to construct a δ-cut decision-theoretic rough set based on the δ-cut quantitative indiscernibility relation, which is what will be discussed in this paper.
Furthermore, attribute reduction is one of the most fundamental and important topics in rough set theory and has drawn attention from many researchers. For attribute reduction in decision-theoretic rough set, two issues should be considered: nonmonotonicity and decision cost.
(1) On the one hand, in Pawlak's rough set model, the positive region is monotonic with respect to the set inclusion of attributes. However, this monotonicity of the decision regions with respect to the set inclusion of attributes does not hold in the decision-theoretic rough set model [21,22]. To fill this gap, Yao and Zhao proposed attribute reduction based on the decision-monotonicity criterion [23].
(2) On the other hand, decision cost is a very important notion in the decision-theoretic rough set model; to deal with the minimal decision cost, Jia et al. proposed a fitness function and designed a heuristic algorithm [24].
As a generalization of decision-theoretic rough set, our δ-cut decision-theoretic rough set is studied from the above two aspects of attribute reduction. Firstly, we introduce the decision-monotonicity criterion into attribute reduction and design significance measures for attributes; secondly, we regard the minimum decision cost problem as an optimization problem and apply a genetic algorithm to obtain a reduct with the lowest decision cost.
To facilitate our discussion, we present basic knowledge, such as Pawlak's rough set, the δ-cut quantitative rough set, and Yao's decision-theoretic rough set, in Sections 2 and 3. In Section 4, we propose the δ-cut decision-theoretic rough set and present several related properties. In Section 5, we discuss attribute reductions under two criteria. The paper ends with conclusions in Section 6.

Strong Indiscernibility Relation
An information system is a pair $I = (U, AT)$, in which the universe $U$ is a finite set of objects and $AT$ is a nonempty set of attributes such that $a: U \rightarrow V_a$ for all $a \in AT$, where $V_a$ is the domain of $a$. For all $x \in U$, $a(x)$ denotes the value of $x$ on $a$. In particular, when $AT = C \cup D$ and $C \cap D = \emptyset$ ($C$ is the set of conditional attributes and $D$ is the set of decision attributes), the information system is also called a decision system. Each nonempty subset $A \subseteq AT$ determines a strong indiscernibility relation $SIND(A)$ as follows:
$$SIND(A) = \{(x, y) \in U \times U : a(x) = a(y), \forall a \in A\}.$$
Two objects in $U$ satisfy $SIND(A)$ if and only if they have the same values on all attributes in $A$; $SIND(A)$ is an equivalence relation.
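$SIND(A)$ partitions $U$ into a family of disjoint subsets $U/SIND(A)$, called a quotient set of $U$:
$$U/SIND(A) = \{[x]_A : x \in U\},$$
where $[x]_A$ denotes the equivalence class determined by $x$ with respect to $A$; that is,
$$[x]_A = \{y \in U : (x, y) \in SIND(A)\}.$$
Definition 1. Let $I$ be an information system, let $A$ be any subset of $AT$, and let $X$ be any subset of $U$. The lower approximation of $X$, denoted $\underline{A}(X)$, and the upper approximation of $X$, denoted $\overline{A}(X)$, are defined by
$$\underline{A}(X) = \{x \in U : [x]_A \subseteq X\}, \qquad \overline{A}(X) = \{x \in U : [x]_A \cap X \neq \emptyset\}.$$
The pair $[\underline{A}(X), \overline{A}(X)]$ is referred to as Pawlak's rough set of $X$ with respect to the set of attributes $A$.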
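As an illustration of the strong indiscernibility relation and Definition 1, the following Python sketch computes the quotient set and Pawlak's lower and upper approximations. The toy table, the target set, and the function names `quotient_set` and `pawlak_approximations` are hypothetical; objects are represented by their row indices.

```python
from collections import defaultdict

def quotient_set(table, attrs):
    """U/SIND(A): group object indices by their value vector on attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def pawlak_approximations(table, attrs, target):
    """Lower and upper approximations of target (a set of indices), Definition 1."""
    lower, upper = set(), set()
    for block in quotient_set(table, attrs):
        if block <= target:   # [x]_A is contained in X
            lower |= block
        if block & target:    # [x]_A meets X
            upper |= block
    return lower, upper

# Hypothetical information system: 5 objects, attributes indexed 0..2.
table = [(1, 0, 1), (1, 0, 1), (1, 1, 0), (0, 1, 0), (0, 1, 0)]
X = {0, 1, 3}
print(pawlak_approximations(table, [0, 1, 2], X))  # ({0, 1}, {0, 1, 3, 4})
```

Object 3 falls only in the upper approximation because its class {3, 4} meets $X$ without being contained in it.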

Weak Indiscernibility Relation.
In the definition of the strong indiscernibility relation, two objects in $U$ satisfy $SIND(A)$ if and only if they have the same values on all attributes in $A$; this requirement may be too strict for many applications. To address this issue, Zhao and Yao proposed the weak indiscernibility relation, under which two objects are considered indistinguishable if and only if they have the same value on at least one attribute in $A$. In an information system $I$, for any subset $A$ of $AT$, the weak indiscernibility relation is defined as follows [20]:
$$WIND(A) = \{(x, y) \in U \times U : \exists a \in A, a(x) = a(y)\}.$$
A weak indiscernibility relation is reflexive and symmetric, but not necessarily transitive; such a relation is known as a compatibility or tolerance relation.
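Definition 2. Let $I$ be an information system, let $A \subseteq AT$, and let $X \subseteq U$. The lower and upper approximations of $X$ with respect to the weak indiscernibility relation are defined by
$$\underline{A}^{W}(X) = \{x \in U : [x]_A^{W} \subseteq X\}, \qquad \overline{A}^{W}(X) = \{x \in U : [x]_A^{W} \cap X \neq \emptyset\},$$
where $[x]_A^{W} = \{y \in U : (x, y) \in WIND(A)\}$ is the set of objects that are weakly indiscernible from $x$ in terms of the set of attributes $A$.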
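The weak classes, and the failure of transitivity, can be seen on a three-object example. This Python sketch assumes the element-based form of Definition 2 (an object is in the lower approximation when its weak class lies inside $X$ and in the upper approximation when its weak class meets $X$); the table and function names are hypothetical.

```python
def weak_class(table, i, attrs):
    """[x_i]^W_A: objects sharing the value of x_i on at least one attribute in attrs."""
    return {j for j, row in enumerate(table) if any(row[a] == table[i][a] for a in attrs)}

def weak_approximations(table, attrs, target):
    """Element-based lower/upper approximations under WIND(A)."""
    n = len(table)
    classes = [weak_class(table, i, attrs) for i in range(n)]
    lower = {i for i in range(n) if classes[i] <= target}
    upper = {i for i in range(n) if classes[i] & target}
    return lower, upper

# Hypothetical table: x0 ~ x1 (attribute 0), x1 ~ x2 (attribute 1), but not x0 ~ x2.
table = [(1, 0), (1, 1), (0, 1)]
A = [0, 1]
print([weak_class(table, i, A) for i in range(3)])  # [{0, 1}, {0, 1, 2}, {1, 2}]
print(weak_approximations(table, A, {0, 1}))        # ({0}, {0, 1, 2})
```

Because $x_0 \sim x_1$ and $x_1 \sim x_2$ while $x_0 \not\sim x_2$, the weak classes overlap instead of partitioning $U$.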

δ-Cut Quantitative Indiscernibility Relation
The strong and weak indiscernibility relations represent two extreme cases, between which lie many intermediate levels of indiscernibility. With respect to a nonempty set of attributes $A \subseteq AT$, a quantitative indiscernibility relation is defined as a mapping from $U \times U$ to the unit interval $[0, 1]$; taking its δ-cut yields a binary relation on $U$.
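Definition 3 (see [20]). Let $I$ be an information system and $\delta \in (0, 1]$; for all $A \subseteq AT$, the δ-cut quantitative indiscernibility relation $IND_{\delta}(A)$ is defined by
$$IND_{\delta}(A) = \left\{(x, y) \in U \times U : \frac{|\{a \in A : a(x) = a(y)\}|}{|A|} \geq \delta\right\},$$
where $|\cdot|$ denotes the cardinality of a set. For $\delta = 1$, $IND_{\delta}(A)$ coincides with $SIND(A)$, and for $\delta = 1/|A|$ it coincides with $WIND(A)$.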
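By the definition of the δ-cut quantitative indiscernibility relation, we can obtain the lower and upper approximations as in the following definition.
Definition 4. Let $I$ be an information system, $\delta \in (0, 1]$, $A \subseteq AT$, and $X \subseteq U$. The δ-cut quantitative lower and upper approximations of $X$ are defined by
$$\underline{A}_{\delta}(X) = \{x \in U : [x]_A^{\delta} \subseteq X\}, \qquad \overline{A}_{\delta}(X) = \{x \in U : [x]_A^{\delta} \cap X \neq \emptyset\},$$
where $[x]_A^{\delta} = \{y \in U : (x, y) \in IND_{\delta}(A)\}$ is the set of objects that are δ-cut indiscernible from $x$ in terms of the set of attributes $A$.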
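The following Python sketch computes δ-cut classes and the approximations of Definition 4, using exact fractions so that the comparison with δ is not affected by floating-point rounding. The table and helper names are hypothetical, and the approximations assume the element-based form given for Definition 4.

```python
from fractions import Fraction

def delta_class(table, i, attrs, delta):
    """[x_i]^delta_A: objects agreeing with x_i on at least a fraction delta of attrs."""
    return {
        j for j, row in enumerate(table)
        if Fraction(sum(row[a] == table[i][a] for a in attrs), len(attrs)) >= delta
    }

def delta_approximations(table, attrs, target, delta):
    """Element-based lower/upper approximations under IND_delta(A)."""
    n = len(table)
    classes = [delta_class(table, i, attrs, delta) for i in range(n)]
    lower = {i for i in range(n) if classes[i] <= target}
    upper = {i for i in range(n) if classes[i] & target}
    return lower, upper

table = [(1, 0, 1), (1, 0, 0), (1, 1, 0), (0, 1, 0)]  # hypothetical data
A = [0, 1, 2]
for d in (Fraction(1), Fraction(2, 3), Fraction(1, 3)):
    print(d, [sorted(delta_class(table, i, A, d)) for i in range(4)])
print(delta_approximations(table, A, {0, 1}, Fraction(2, 3)))  # ({0}, {0, 1, 2})
```

With three attributes, δ = 1 reproduces the strong classes (here all singletons) and δ = 1/3 = 1/|A| reproduces the weak classes, matching the two extreme cases described above.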

Decision-Theoretic Rough Set
The Bayesian decision procedure deals with making decisions with minimum risk based on observed evidence. Yao and Zhou introduced a more general rough set model called the decision-theoretic rough set (DTRS) model [25][26][27]. In this section, we briefly introduce the original DTRS model. According to the Bayesian decision procedure, the DTRS model is composed of two states and three actions. The set of states is given by $\Omega = \{X, \sim X\}$, indicating that an object is in $X$ or not, respectively. The probabilities of these two complementary states are denoted as $Pr(X \mid [x])$ and $Pr(\sim X \mid [x]) = 1 - Pr(X \mid [x])$. The set of actions is given by $\mathcal{A} = \{a_P, a_B, a_N\}$, where $a_P$, $a_B$, and $a_N$ represent the three actions in classifying an object $x$, namely, deciding that $x$ belongs to the positive region, to the boundary region, or to the negative region, respectively. The loss functions describe the risk or cost of actions in different states. Let $\lambda_{PP}$, $\lambda_{BP}$, and $\lambda_{NP}$ denote the costs incurred for taking actions $a_P$, $a_B$, and $a_N$, respectively, when an object belongs to $X$, and let $\lambda_{PN}$, $\lambda_{BN}$, and $\lambda_{NN}$ denote the costs incurred for taking the same actions when an object belongs to $\sim X$.
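According to the loss functions, the expected costs associated with taking the different actions for objects in $[x]$ can be expressed as follows:
$$R(a_P \mid [x]) = \lambda_{PP} Pr(X \mid [x]) + \lambda_{PN} Pr(\sim X \mid [x]),$$
$$R(a_B \mid [x]) = \lambda_{BP} Pr(X \mid [x]) + \lambda_{BN} Pr(\sim X \mid [x]),$$
$$R(a_N \mid [x]) = \lambda_{NP} Pr(X \mid [x]) + \lambda_{NN} Pr(\sim X \mid [x]).$$
The Bayesian decision procedure leads to the following minimum-risk decision rules:
(P) if $R(a_P \mid [x]) \leq R(a_B \mid [x])$ and $R(a_P \mid [x]) \leq R(a_N \mid [x])$, then decide $x \in POS(X)$;
(B) if $R(a_B \mid [x]) \leq R(a_P \mid [x])$ and $R(a_B \mid [x]) \leq R(a_N \mid [x])$, then decide $x \in BND(X)$;
(N) if $R(a_N \mid [x]) \leq R(a_P \mid [x])$ and $R(a_N \mid [x]) \leq R(a_B \mid [x])$, then decide $x \in NEG(X)$.
Consider a reasonable kind of loss function with $\lambda_{PP} \leq \lambda_{BP} < \lambda_{NP}$ and $\lambda_{NN} \leq \lambda_{BN} < \lambda_{PN}$; that is to say, the loss of classifying an object $x$ belonging to $X$ into the positive region is no more than the loss of classifying $x$ into the boundary region, and both of these losses are strictly less than the loss of classifying $x$ into the negative region. The reverse order of losses is used for classifying an object not in $X$. We further assume that the loss function satisfies the following condition:
$$(\lambda_{PN} - \lambda_{BN})(\lambda_{NP} - \lambda_{BP}) > (\lambda_{BN} - \lambda_{NN})(\lambda_{BP} - \lambda_{PP}).$$
Based on the above two assumptions, we have the following simplified rules:
(P1) if $Pr(X \mid [x]) \geq \alpha$, then decide $x \in POS(X)$;
(B1) if $\beta < Pr(X \mid [x]) < \alpha$, then decide $x \in BND(X)$;
(N1) if $Pr(X \mid [x]) \leq \beta$, then decide $x \in NEG(X)$,
where
$$\alpha = \frac{\lambda_{PN} - \lambda_{BN}}{(\lambda_{PN} - \lambda_{BN}) + (\lambda_{BP} - \lambda_{PP})}, \qquad \beta = \frac{\lambda_{BN} - \lambda_{NN}}{(\lambda_{BN} - \lambda_{NN}) + (\lambda_{NP} - \lambda_{BP})},$$
and $0 \leq \beta < \alpha \leq 1$. Using these three decision rules, for all $A \subseteq AT$ and for all $X \subseteq U$, we get the following probabilistic approximations:
$$\underline{A}^{(\alpha, \beta)}(X) = \{x \in U : Pr(X \mid [x]_A) \geq \alpha\}, \qquad \overline{A}^{(\alpha, \beta)}(X) = \{x \in U : Pr(X \mid [x]_A) > \beta\},$$
where $Pr(X \mid [x]_A) = |[x]_A \cap X| / |[x]_A|$.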
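To make the link between the loss functions and the thresholds concrete, this Python sketch computes α and β from six losses and checks, on a grid of probabilities, that the simplified rules (P1), (B1), and (N1) always select an action of minimum expected cost. The loss values and function names are hypothetical; the losses were chosen to satisfy both assumptions above.

```python
from fractions import Fraction

def thresholds(lam):
    """(alpha, beta) from the six losses, assuming both conditions on the loss function."""
    alpha = Fraction(lam["PN"] - lam["BN"], (lam["PN"] - lam["BN"]) + (lam["BP"] - lam["PP"]))
    beta = Fraction(lam["BN"] - lam["NN"], (lam["BN"] - lam["NN"]) + (lam["NP"] - lam["BP"]))
    return alpha, beta

def risks(lam, p):
    """Expected losses R(a_P), R(a_B), R(a_N) when Pr(X | [x]) = p."""
    return {
        "POS": lam["PP"] * p + lam["PN"] * (1 - p),
        "BND": lam["BP"] * p + lam["BN"] * (1 - p),
        "NEG": lam["NP"] * p + lam["NN"] * (1 - p),
    }

def decide(p, alpha, beta):
    """Simplified rules (P1), (B1), (N1); ties go to POS or NEG."""
    if p >= alpha:
        return "POS"
    if p <= beta:
        return "NEG"
    return "BND"

# Hypothetical integer losses satisfying both assumptions.
lam = {"PP": 0, "BP": 2, "NP": 6, "NN": 0, "BN": 1, "PN": 4}
alpha, beta = thresholds(lam)
print(alpha, beta)  # 3/5 1/5
for k in range(21):
    p = Fraction(k, 20)
    r = risks(lam, p)
    assert r[decide(p, alpha, beta)] == min(r.values())
```

At $p = \alpha$ and $p = \beta$ two actions have equal expected cost, which is why the threshold rules need a tie-breaking convention.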

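δ-Cut Decision-Theoretic Rough Set
As discussed in Section 3, the classical decision-theoretic rough set is based on the strong indiscernibility relation, which is too strict since it requires two objects to have the same values on all attributes. In this section, we introduce the δ-cut quantitative indiscernibility relation into the decision-theoretic rough set model.
Definition 5. Let $I$ be an information system, $\delta \in (0, 1]$, $A \subseteq AT$, and $X \subseteq U$. The δ-cut decision-theoretic lower and upper approximations of $X$ are defined by
$$\underline{A}_{\delta}^{(\alpha,\beta)}(X) = \{x \in U : Pr(X \mid [x]_A^{\delta}) \geq \alpha\}, \qquad \overline{A}_{\delta}^{(\alpha,\beta)}(X) = \{x \in U : Pr(X \mid [x]_A^{\delta}) > \beta\},$$
where $Pr(X \mid [x]_A^{\delta}) = |[x]_A^{\delta} \cap X| / |[x]_A^{\delta}|$. The pair $[\underline{A}_{\delta}^{(\alpha,\beta)}(X), \overline{A}_{\delta}^{(\alpha,\beta)}(X)]$ is referred to as a δ-cut decision-theoretic rough set of $X$ with respect to the set of attributes $A$.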
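A δ-cut decision-theoretic rough set replaces the equivalence class in Yao's model by the δ-cut class. The following Python sketch computes both approximations of Definition 5 on a hypothetical table with hypothetical thresholds α = 3/5 and β = 1/5; the function names are illustrative only.

```python
from fractions import Fraction

def delta_class(table, i, attrs, delta):
    """[x_i]^delta_A: objects agreeing with x_i on at least a fraction delta of attrs."""
    return {j for j, row in enumerate(table)
            if Fraction(sum(row[a] == table[i][a] for a in attrs), len(attrs)) >= delta}

def delta_dtrs(table, attrs, target, delta, alpha, beta):
    """Lower/upper approximations of target in the delta-cut DTRS (Definition 5)."""
    lower, upper = set(), set()
    for i in range(len(table)):
        cls = delta_class(table, i, attrs, delta)  # nonempty: x_i always agrees with itself
        p = Fraction(len(cls & target), len(cls))  # Pr(X | [x_i]^delta_A)
        if p >= alpha:
            lower.add(i)
        if p > beta:
            upper.add(i)
    return lower, upper

table = [(1, 0, 1), (1, 0, 0), (1, 1, 0), (0, 1, 0)]  # hypothetical data
print(delta_dtrs(table, [0, 1, 2], {0, 1}, Fraction(2, 3), Fraction(3, 5), Fraction(1, 5)))
# ({0, 1}, {0, 1, 2})
```

Object 1 enters the lower approximation because its δ-cut class {0, 1, 2} has probability 2/3 ≥ α of lying in $X$, even though the class is not contained in $X$.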

Definition of δ-Cut Decision-Theoretic Rough Set
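After obtaining the lower and upper approximations, the probabilistic positive, boundary, and negative regions are defined by
$$POS_{A}^{\delta}(X) = \underline{A}_{\delta}^{(\alpha,\beta)}(X), \quad BND_{A}^{\delta}(X) = \overline{A}_{\delta}^{(\alpha,\beta)}(X) - \underline{A}_{\delta}^{(\alpha,\beta)}(X), \quad NEG_{A}^{\delta}(X) = U - \overline{A}_{\delta}^{(\alpha,\beta)}(X).$$
Let $I = (U, C \cup D)$ be a decision system and let $\pi_D = \{D_1, D_2, \ldots, D_m\}$ be the partition of the universe $U$ defined by the decision attribute $D$, representing $m$ classes. By the definition of the δ-cut decision-theoretic rough set, the lower and upper approximations of the partition can be expressed as follows:
$$\underline{A}_{\delta}^{(\alpha,\beta)}(\pi_D) = \{\underline{A}_{\delta}^{(\alpha,\beta)}(D_1), \ldots, \underline{A}_{\delta}^{(\alpha,\beta)}(D_m)\}, \qquad \overline{A}_{\delta}^{(\alpha,\beta)}(\pi_D) = \{\overline{A}_{\delta}^{(\alpha,\beta)}(D_1), \ldots, \overline{A}_{\delta}^{(\alpha,\beta)}(D_m)\}.$$
This $m$-class problem can be regarded as $m$ two-class problems; following this approach, the positive, boundary, and negative regions of all the decision classes can be expressed as follows:
$$POS_{A}^{\delta}(\pi_D) = \bigcup_{j=1}^{m} POS_{A}^{\delta}(D_j), \quad BND_{A}^{\delta}(\pi_D) = \bigcup_{j=1}^{m} BND_{A}^{\delta}(D_j), \quad NEG_{A}^{\delta}(\pi_D) = U - \big(POS_{A}^{\delta}(\pi_D) \cup BND_{A}^{\delta}(\pi_D)\big).$$
Based on the three regions of the δ-cut decision-theoretic rough set model, three kinds of rules should be considered: positive rules, boundary rules, and negative rules. Similar to Yao's decision-theoretic rough set, when $\alpha > \beta$, for all $D_j \in \pi_D$ we obtain the following decision rules (tie-break):
(P) if $Pr(D_j \mid [x]_A^{\delta}) \geq \alpha$, decide $x \in POS_{A}^{\delta}(D_j)$;
(B) if $\beta < Pr(D_j \mid [x]_A^{\delta}) < \alpha$, decide $x \in BND_{A}^{\delta}(D_j)$;
(N) if $Pr(D_j \mid [x]_A^{\delta}) \leq \beta$, decide $x \in NEG_{A}^{\delta}(D_j)$.
Let $I$ be a decision system and $\delta \in (0, 1]$; for all $D_j \in \pi_D$, writing $p = Pr(D_j \mid [x]_A^{\delta})$, the Bayesian expected costs of the decision rules can be expressed as follows:
(P) rule: $\lambda_{PP}\, p + \lambda_{PN} (1 - p)$;
(B) rule: $\lambda_{BP}\, p + \lambda_{BN} (1 - p)$;
(N) rule: $\lambda_{NP}\, p + \lambda_{NN} (1 - p)$.
Considering the special case where we assume zero cost for a correct classification, that is, $\lambda_{PP} = \lambda_{NN} = 0$, the decision costs of the rules simplify to
(P) rule: $(1 - p)\lambda_{PN}$; (B) rule: $p\lambda_{BP} + (1 - p)\lambda_{BN}$; (N) rule: $p\lambda_{NP}$.
For any subset $A$ of conditional attributes, the overall cost of all decision rules is denoted as $COST(A)$, such that
$$COST(A) = \sum_{j=1}^{m} \Bigg( \sum_{x \in POS_{A}^{\delta}(D_j)} (1 - p_j(x))\lambda_{PN} + \sum_{x \in BND_{A}^{\delta}(D_j)} \big(p_j(x)\lambda_{BP} + (1 - p_j(x))\lambda_{BN}\big) + \sum_{x \in NEG_{A}^{\delta}(D_j)} p_j(x)\lambda_{NP} \Bigg),$$
where $p_j(x) = Pr(D_j \mid [x]_A^{\delta})$.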
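The overall cost can be computed by treating each decision class as a two-class problem and charging every object the cost of the rule it induces. This Python sketch assumes $\lambda_{PP} = \lambda_{NN} = 0$ and the summation over all objects and all decision classes written above; the data, losses, and function names are hypothetical.

```python
from fractions import Fraction

def delta_class(table, i, attrs, delta):
    """[x_i]^delta_A: objects agreeing with x_i on at least a fraction delta of attrs."""
    return {j for j, row in enumerate(table)
            if Fraction(sum(row[a] == table[i][a] for a in attrs), len(attrs)) >= delta}

def total_cost(table, decisions, attrs, delta, lam):
    """COST(A) with lambda_PP = lambda_NN = 0, summed over every object and decision class."""
    alpha = Fraction(lam["PN"] - lam["BN"], (lam["PN"] - lam["BN"]) + lam["BP"])
    beta = Fraction(lam["BN"], lam["BN"] + (lam["NP"] - lam["BP"]))
    classes = [delta_class(table, i, attrs, delta) for i in range(len(table))]
    cost = Fraction(0)
    for label in set(decisions):
        d_j = {i for i, d in enumerate(decisions) if d == label}
        for cls in classes:
            p = Fraction(len(cls & d_j), len(cls))  # p_j(x) = Pr(D_j | [x]^delta_A)
            if p >= alpha:                          # (P) rule
                cost += (1 - p) * lam["PN"]
            elif p > beta:                          # (B) rule
                cost += p * lam["BP"] + (1 - p) * lam["BN"]
            else:                                   # (N) rule
                cost += p * lam["NP"]
    return cost

table = [(1, 0, 1), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
decisions = ["y", "y", "n", "n"]
lam = {"BP": 2, "NP": 6, "BN": 1, "PN": 4}  # hypothetical losses: alpha = 3/5, beta = 1/5
for delta in (Fraction(1), Fraction(2, 3), Fraction(1, 3)):
    print(delta, total_cost(table, decisions, [0, 1, 2], delta, lam))
```

On this toy table the printed costs are 0, 16/3, and 34/3: coarser δ-cut classes mix the two decisions and push objects into costlier boundary and positive rules.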

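Proposition 6. Let $I$ be an information system; if $\alpha = 1$ and $\beta = 0$, then for all $A \subseteq AT$ and all $X \subseteq U$, $\underline{A}_{\delta}^{(\alpha,\beta)}(X) = \underline{A}_{\delta}(X)$ and $\overline{A}_{\delta}^{(\alpha,\beta)}(X) = \overline{A}_{\delta}(X)$.
Proof. Such thresholds arise when there is a unit misclassification cost if an object in $X$ is classified into the negative region or if an object in $\sim X$ is classified into the positive region, and no cost otherwise; that is, $\lambda_{PN} = \lambda_{NP} = 1$ and $\lambda_{PP} = \lambda_{BP} = \lambda_{BN} = \lambda_{NN} = 0$. By the formulas for $\alpha$ and $\beta$, we have $\alpha = 1$ and $\beta = 0$. Since $x \in [x]_A^{\delta}$, the class is nonempty, and by Definition 5, $x \in \underline{A}_{\delta}^{(\alpha,\beta)}(X)$ if and only if $Pr(X \mid [x]_A^{\delta}) = 1$, that is, $[x]_A^{\delta} \subseteq X$, which by Definition 4 means $x \in \underline{A}_{\delta}(X)$. Similarly, $Pr(X \mid [x]_A^{\delta}) > 0$ if and only if $[x]_A^{\delta} \cap X \neq \emptyset$, so $\overline{A}_{\delta}^{(\alpha,\beta)}(X) = \overline{A}_{\delta}(X)$.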
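Proposition 6 can be spot-checked numerically: with the unit misclassification losses from the proof, the thresholds become α = 1 and β = 0, and the approximations of Definitions 4 and 5 should coincide. This Python sketch samples random hypothetical tables; the helper names are illustrative, and a sampled check is of course not a proof.

```python
import random
from fractions import Fraction

def delta_class(table, i, attrs, delta):
    return {j for j, row in enumerate(table)
            if Fraction(sum(row[a] == table[i][a] for a in attrs), len(attrs)) >= delta}

def quantitative(table, attrs, X, delta):  # Definition 4
    cls = [delta_class(table, i, attrs, delta) for i in range(len(table))]
    return ({i for i, c in enumerate(cls) if c <= X},
            {i for i, c in enumerate(cls) if c & X})

def dtrs(table, attrs, X, delta, alpha, beta):  # Definition 5
    cls = [delta_class(table, i, attrs, delta) for i in range(len(table))]
    p = [Fraction(len(c & X), len(c)) for c in cls]
    return ({i for i, q in enumerate(p) if q >= alpha},
            {i for i, q in enumerate(p) if q > beta})

# Unit misclassification losses from the proof: lambda_PN = lambda_NP = 1, all others 0.
lam = {"PP": 0, "BP": 0, "NP": 1, "NN": 0, "BN": 0, "PN": 1}
alpha = Fraction(lam["PN"] - lam["BN"], (lam["PN"] - lam["BN"]) + (lam["BP"] - lam["PP"]))
beta = Fraction(lam["BN"] - lam["NN"], (lam["BN"] - lam["NN"]) + (lam["NP"] - lam["BP"]))
assert (alpha, beta) == (1, 0)

rng = random.Random(0)
attrs = [0, 1, 2, 3]
for _ in range(200):
    table = [tuple(rng.randint(0, 2) for _ in attrs) for _ in range(8)]
    X = {i for i in range(8) if rng.random() < 0.5}
    delta = Fraction(rng.randint(1, 4), 4)
    assert dtrs(table, attrs, X, delta, alpha, beta) == quantitative(table, attrs, X, delta)
print("Proposition 6 holds on all sampled tables")
```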

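Proposition 7. Let $I$ be an information system; for all $A \subseteq AT$ and all $X \subseteq U$, one has $\underline{A}_{\delta}(X) \subseteq \underline{A}_{\delta}^{(\alpha,\beta)}(X)$ and $\overline{A}_{\delta}^{(\alpha,\beta)}(X) \subseteq \overline{A}_{\delta}(X)$.
Proof. For all $x \in \underline{A}_{\delta}(X)$, by Definition 4 we have $[x]_A^{\delta} \subseteq X$, so $Pr(X \mid [x]_A^{\delta}) = 1 \geq \alpha$; by Definition 5, $x \in \underline{A}_{\delta}^{(\alpha,\beta)}(X)$. Similarly, for all $x \in \overline{A}_{\delta}^{(\alpha,\beta)}(X)$ we have $Pr(X \mid [x]_A^{\delta}) > \beta \geq 0$, so $[x]_A^{\delta} \cap X \neq \emptyset$ and $x \in \overline{A}_{\delta}(X)$.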
Propositions 6 and 7 show the relationships between the δ-cut decision-theoretic rough set and the classical δ-cut quantitative rough set: the δ-cut quantitative lower approximation is included in the δ-cut decision-theoretic lower approximation, and the δ-cut decision-theoretic upper approximation is included in the δ-cut quantitative upper approximation. In particular, when $\alpha = 1$ and $\beta = 0$, the δ-cut decision-theoretic rough set degenerates to the classical δ-cut quantitative rough set. Hence the δ-cut decision-theoretic rough set is a generalization of the classical δ-cut quantitative rough set, and it can enlarge the lower approximation and shrink the upper approximation.
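The inclusions of Proposition 7 can likewise be spot-checked on random hypothetical tables with arbitrary thresholds satisfying $0 \leq \beta < \alpha \leq 1$. This Python sketch (hypothetical helper names, sampled rather than exhaustive) also keeps the sampled thresholds exact by drawing them as multiples of 1/10.

```python
import random
from fractions import Fraction

def delta_class(table, i, attrs, delta):
    return {j for j, row in enumerate(table)
            if Fraction(sum(row[a] == table[i][a] for a in attrs), len(attrs)) >= delta}

def quantitative(table, attrs, X, delta):  # Definition 4
    cls = [delta_class(table, i, attrs, delta) for i in range(len(table))]
    return ({i for i, c in enumerate(cls) if c <= X},
            {i for i, c in enumerate(cls) if c & X})

def dtrs(table, attrs, X, delta, alpha, beta):  # Definition 5
    cls = [delta_class(table, i, attrs, delta) for i in range(len(table))]
    p = [Fraction(len(c & X), len(c)) for c in cls]
    return ({i for i, q in enumerate(p) if q >= alpha},
            {i for i, q in enumerate(p) if q > beta})

rng = random.Random(1)
attrs = [0, 1, 2]
for _ in range(300):
    table = [tuple(rng.randint(0, 1) for _ in attrs) for _ in range(8)]
    X = {i for i in range(8) if rng.random() < 0.5}
    delta = Fraction(rng.randint(1, 3), 3)
    b = rng.randint(0, 9)
    alpha, beta = Fraction(rng.randint(b + 1, 10), 10), Fraction(b, 10)
    q_lo, q_up = quantitative(table, attrs, X, delta)
    d_lo, d_up = dtrs(table, attrs, X, delta, alpha, beta)
    assert q_lo <= d_lo and d_up <= q_up  # the two inclusions of Proposition 7
print("Proposition 7 inclusions hold on all sampled cases")
```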
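Proposition 8. Let $I$ be an information system; if $\delta = 1$, then for all $A \subseteq AT$ and all $X \subseteq U$, $\underline{A}_{1}^{(\alpha,\beta)}(X) = \underline{A}^{(\alpha,\beta)}(X)$ and $\overline{A}_{1}^{(\alpha,\beta)}(X) = \overline{A}^{(\alpha,\beta)}(X)$.
Proof. By Definition 3, $IND_{1}(A) = SIND(A)$, so $[x]_A^{1} = [x]_A$ for every $x \in U$; the result then follows from Definition 5 and the definition of Yao's decision-theoretic rough set.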
Proposition 8 shows the relationship between the δ-cut decision-theoretic rough set and Yao's decision-theoretic rough set: if we set $\delta = 1$, the lower and upper approximations of the δ-cut decision-theoretic rough set are equal to those of Yao's decision-theoretic rough set. Hence the δ-cut decision-theoretic rough set is also a generalization of Yao's decision-theoretic rough set.
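Proposition 8 can be spot-checked by comparing the δ-cut model at δ = 1 with Yao's model computed directly on Pawlak equivalence classes. In this Python sketch the tables are random and hypothetical, the thresholds α = 3/5 and β = 1/5 are arbitrary, and the function names are illustrative.

```python
import random
from collections import defaultdict
from fractions import Fraction

def delta_class(table, i, attrs, delta):
    return {j for j, row in enumerate(table)
            if Fraction(sum(row[a] == table[i][a] for a in attrs), len(attrs)) >= delta}

def delta_dtrs(table, attrs, X, delta, alpha, beta):
    """Definition 5, evaluated object by object on delta-cut classes."""
    lower, upper = set(), set()
    for i in range(len(table)):
        cls = delta_class(table, i, attrs, delta)
        p = Fraction(len(cls & X), len(cls))
        if p >= alpha:
            lower.add(i)
        if p > beta:
            upper.add(i)
    return lower, upper

def yao_dtrs(table, attrs, X, alpha, beta):
    """Yao's DTRS evaluated block by block on Pawlak equivalence classes [x]_A."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    lower, upper = set(), set()
    for block in blocks.values():
        p = Fraction(len(block & X), len(block))
        if p >= alpha:
            lower |= block
        if p > beta:
            upper |= block
    return lower, upper

rng = random.Random(2)
alpha, beta = Fraction(3, 5), Fraction(1, 5)
for _ in range(300):
    attrs = list(range(rng.randint(1, 4)))
    table = [tuple(rng.randint(0, 1) for _ in attrs) for _ in range(10)]
    X = {i for i in range(10) if rng.random() < 0.5}
    assert delta_dtrs(table, attrs, X, Fraction(1), alpha, beta) == yao_dtrs(table, attrs, X, alpha, beta)
print("Proposition 8 holds on all sampled tables")
```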

Attribute Reductions in Quantitative Decision-Theoretic Rough Set

Decision-Monotonicity Criterion Based Reducts
In Pawlak's rough set theory, attribute reduction is an important concept that has been addressed by many researchers. In classical rough set theory, a reduct is a minimal subset of attributes that is independent and has the same classification power as the whole set of attributes. The positive, boundary, and negative regions are monotonic with respect to the set inclusion of attributes in classical rough set theory. However, in the decision-theoretic rough set model, the monotonicity of the decision regions with respect to the set inclusion of attributes does not hold. To solve this problem, Yao and Zhao proposed a decision-monotonicity criterion [23], which requires two things. Firstly, after attributes are reduced, a positive rule must remain a positive rule with the same decision. Secondly, after attributes are reduced, a boundary rule must remain a boundary rule or be upgraded to a positive rule with the same decision. Following their work, the decision-monotonicity criterion can be introduced into our δ-cut decision-theoretic rough set, as given in Definition 9.
Definition 9. Let $I = (U, C \cup D)$ be a decision system, $\delta \in (0, 1]$, and let $B$ be any subset of conditional attributes; $B$ is referred to as a decision-monotonicity reduct in $C$ if and only if $B$ is a minimal set of conditional attributes that preserves $\underline{C}_{\delta}^{(\alpha,\beta)}(D_j) \subseteq \underline{B}_{\delta}^{(\alpha,\beta)}(D_j)$ for each $D_j \in \pi_D$.
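For very small decision systems, decision-monotonicity reducts can be found by brute-force enumeration, which is useful as a reference when testing a heuristic. This Python sketch assumes the lower-approximation condition of Definition 9 as stated above and ignores the empty attribute set; it is an exhaustive search, not the heuristic Algorithm 1, and all names and data are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def delta_class(table, i, attrs, delta):
    return {j for j, row in enumerate(table)
            if Fraction(sum(row[a] == table[i][a] for a in attrs), len(attrs)) >= delta}

def lower_dtrs(table, attrs, d_j, delta, alpha):
    """delta-cut DTRS lower approximation of the decision class d_j."""
    out = set()
    for i in range(len(table)):
        c = delta_class(table, i, attrs, delta)
        if Fraction(len(c & d_j), len(c)) >= alpha:
            out.add(i)
    return out

def preserves(table, decisions, B, C, delta, alpha):
    """Check lower_C(D_j) is a subset of lower_B(D_j) for every decision class D_j."""
    for label in set(decisions):
        d_j = {i for i, d in enumerate(decisions) if d == label}
        if not lower_dtrs(table, C, d_j, delta, alpha) <= lower_dtrs(table, B, d_j, delta, alpha):
            return False
    return True

def monotonicity_reducts(table, decisions, C, delta, alpha):
    """All minimal nonempty subsets of C satisfying the condition (exponential search)."""
    found = []
    for size in range(1, len(C) + 1):
        for B in combinations(C, size):
            if any(set(r) <= set(B) for r in found):
                continue  # a qualifying proper subset exists, so B is not minimal
            if preserves(table, decisions, list(B), C, delta, alpha):
                found.append(B)
    return found

table = [(1, 0, 1), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
decisions = ["y", "y", "n", "n"]
print(monotonicity_reducts(table, decisions, [0, 1, 2], Fraction(1), Fraction(3, 5)))  # [(1,)]
```

Attribute 1 alone separates the two decision classes, so every positive rule obtained with all attributes remains positive.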
Let $I$ be a decision system, $\delta \in (0, 1]$, let $B$ be any subset of conditional attributes, and let $a \in C$; we define the following coefficients: the inner significance $sig_{in}(a, B, D)$ and the outer significance $sig_{out}(a, B, D)$, where $|U|$ and $m$ are the numbers of objects and decision classes, respectively.
Algorithm 1: Heuristic algorithm for attribute reduction based on decision-monotonicity criterion.
Step 1. Create an initial random population (size = 40).
Step 2. Evaluate the population.
Step 3. While the number of generations < 100, do
    Select the fittest chromosomes in the population;
    Perform crossover on the selected chromosomes to create offspring;
    Perform mutation on the selected chromosomes;
    Evaluate the new population;
End
Step 4. Select the fittest chromosome from the current population and output it as the reduct.
Algorithm 2: Genetic algorithm for attribute reduction based on cost minimum criterion.
Based on these measures, we can design a heuristic algorithm to compute a decision-monotonicity reduct; the details are shown in Algorithm 1.

Cost Minimum Criterion Based Reducts.
Cost is one of the important features of the δ-cut decision-theoretic rough set, and we discussed the cost issue of our model in Section 4.1. In the reduction process, from the viewpoint of the cost criterion, we want to obtain a reduct with a smaller, or the smallest, cost. Similar to the decision-monotonicity criterion, the cost criterion can be introduced into our rough set model.

Definition 10. Let $I = (U, C \cup D)$ be a decision system, $\delta \in (0, 1]$, and let $B$ be any subset of conditional attributes; $B$ is referred to as a cost reduct in $C$ if and only if $B$ is a minimal set of conditional attributes that satisfies $COST(B) \leq COST(C)$ and, for each set $B' \subset B$, $COST(B') > COST(B)$.
In this definition, we want to find a subset of conditional attributes such that the overall decision cost is decreased or unchanged with the reduct. In most situations, it is better for the decision maker to obtain a smaller or the smallest cost in the decision procedure. We therefore formulate an optimization problem with the objective of minimizing the cost; the minimum cost can be denoted as follows [3]:
$$\min_{B \subseteq C} COST(B).$$
The optimization problem is then to find a proper attribute set that makes the whole decision cost minimal. Therefore, we present a genetic algorithm to compute cost minimum based reducts; the details are described in Algorithm 2.
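Algorithm 2 can be sketched in Python as follows. Only the population size (40) and the number of generations (100) come from the text; the bit-string encoding of attribute subsets, the (cost, subset size) fitness ordering, keeping the fittest half as parents, one-point crossover, bit-flip mutation with rate 0.1 applied to the offspring, the repair of empty chromosomes, the cost function (which follows the assumed $COST$ form of Section 4.1 with $\lambda_{PP} = \lambda_{NN} = 0$), the toy data, and all function names are assumptions.

```python
import random
from fractions import Fraction

def delta_class(table, i, attrs, delta):
    return {j for j, row in enumerate(table)
            if Fraction(sum(row[a] == table[i][a] for a in attrs), len(attrs)) >= delta}

def total_cost(table, decisions, attrs, delta, lam):
    """Assumed COST(B) with lambda_PP = lambda_NN = 0, over all objects and classes."""
    alpha = Fraction(lam["PN"] - lam["BN"], (lam["PN"] - lam["BN"]) + lam["BP"])
    beta = Fraction(lam["BN"], lam["BN"] + (lam["NP"] - lam["BP"]))
    classes = [delta_class(table, i, attrs, delta) for i in range(len(table))]
    cost = Fraction(0)
    for label in set(decisions):
        d_j = {i for i, d in enumerate(decisions) if d == label}
        for cls in classes:
            p = Fraction(len(cls & d_j), len(cls))
            if p >= alpha:
                cost += (1 - p) * lam["PN"]
            elif p > beta:
                cost += p * lam["BP"] + (1 - p) * lam["BN"]
            else:
                cost += p * lam["NP"]
    return cost

def genetic_reduct(table, decisions, C, delta, lam,
                   pop_size=40, generations=100, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    n = len(C)

    def decode(bits):   # chromosome -> attribute subset
        return [a for a, b in zip(C, bits) if b]

    def fitness(bits):  # lower is better: decision cost first, then subset size
        return (total_cost(table, decisions, decode(bits), delta, lam), sum(bits))

    def repair(bits):   # assumption: an empty subset gets one random attribute
        if not any(bits):
            bits[rng.randrange(n)] = 1
        return bits

    # Step 1: initial random population.
    population = [repair([rng.randint(0, 1) for _ in range(n)]) for _ in range(pop_size)]
    for _ in range(generations):               # Step 3
        population.sort(key=fitness)           # evaluate and rank the population
        parents = population[: pop_size // 2]  # select the fittest half
        offspring = []
        while len(parents) + len(offspring) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]          # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            offspring.append(repair(child))
        population = parents + offspring
    best = min(population, key=fitness)        # Step 4
    return decode(best), fitness(best)[0]

table = [(1, 0, 1), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
decisions = ["y", "y", "n", "n"]
lam = {"BP": 2, "NP": 6, "BN": 1, "PN": 4}  # hypothetical losses
print(genetic_reduct(table, decisions, [0, 1, 2], Fraction(2, 3), lam))
```

Keeping the parents in the next generation is an elitist design choice: the best subset found so far can never be lost, at the price of slower exploration on larger attribute sets.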
The Scientific World Journal

Experimental Analyses.
In this subsection, we illustrate the differences between Algorithms 1 and 2 through experimental analyses. All the experiments were carried out on a personal computer with Windows 7, an Intel Core 2 Duo T5800 CPU (4.00 GHz), and 4.00 GB of memory. The programming language is Matlab 2012b. We downloaded four public data sets from the UCI Repository of Machine Learning Databases, which are described in Table 1. In the experiment, 10 different groups of loss functions are randomly generated. Tables 2, 3, 4, and 5 show the experimental results for (P) rules, (B) rules, and (N) rules. The numbers of these rules are equal to the numbers of objects in the positive region, boundary region, and negative region, respectively, because each object in the positive/boundary/negative region induces a (P)/(B)/(N) decision rule.
Based on these four tables, it is not difficult to draw the following conclusions.
(1) With respect to the original data set, decision-monotonicity reducts generate more (P) rules. This is mainly because the decision-monotonicity criterion requires that, after attributes are reduced, a positive rule remains a positive rule, and a boundary rule remains a boundary rule or is upgraded to a positive rule. This mechanism not only keeps the original (P) rules unchanged but also increases the number of (P) rules.
(2) With respect to the original data set, decision-monotonicity reducts generate fewer (B) rules. This is mainly because the second condition of the criterion requires that, after attributes are reduced, a boundary rule remains a boundary rule or is upgraded to a positive rule; that is to say, the number of (B) rules may be equal to or less than that of the original data set.
To compare decision-monotonicity criterion based reducts with cost minimum criterion based reducts, we conduct experiments from three aspects: decision costs, approximation qualities, and running times. Figure 1 compares the costs of the two attribute reduction algorithms, and Tables 6, 7, 8, and 9 show the differences between the two kinds of reducts in approximation qualities and running times.
In Figure 1, each subfigure corresponds to a data set. In each subfigure, the x-coordinate pertains to different values of δ, whereas the y-coordinate concerns the values of costs. Figure 1 shows that, for all ten values of δ used, the decision costs of cost minimum criterion based reducts are the same as or lower than those obtained by decision-monotonicity criterion based reducts.
Tables 6 to 9 show the differences between decision-monotonicity criterion based reducts and cost minimum criterion based reducts in approximation qualities and running times, respectively. From the viewpoint of approximation qualities, the approximation qualities of decision-monotonicity criterion based reducts are occasionally larger than those of cost minimum criterion based reducts; however, in most cases, the approximation qualities of cost minimum criterion based reducts are larger. From the viewpoint of running times, the running times of the genetic algorithm are greater than those of the heuristic algorithm.
To sum up, we can draw the following conclusions.
(1) From the viewpoint of decision monotonicity, our heuristic algorithm based on the decision-monotonicity criterion generates more (P) rules and fewer (B) rules with respect to the original data set. This approach not only increases the certainty expressed by (P) rules and (N) rules but also decreases the uncertainty coming from (B) rules.
(2) From the viewpoint of decision costs, the genetic algorithm based on the cost minimum criterion obtains the lowest decision costs and the largest approximation qualities compared with the heuristic algorithm based on the decision-monotonicity criterion. However, this approach loses the property of decision monotonicity and requires longer running times than the heuristic algorithm.

Conclusion
In this paper, we have developed a generalized framework of decision-theoretic rough set, which is referred to as the δ-cut decision-theoretic rough set. Different from Yao's decision-theoretic rough set model, our model is constructed on the δ-cut quantitative indiscernibility relation, and it degenerates to Yao's decision-theoretic rough set when δ = 1. Based on the proposed model, we discussed attribute reductions under two criteria; the experiments show that, on the one hand, decision-monotonicity criterion based reducts can generate more positive rules and fewer boundary rules; on the other hand, cost minimum criterion based reducts can obtain the lowest decision costs with high approximation qualities.
The present study is a first step towards the δ-cut decision-theoretic rough set. The following are challenges for further research.
(1) Extending the δ-cut decision-theoretic rough set approach to complicated data types, such as interval-valued data, is one of the challenges; incomplete data may be another interesting topic.
(2) Learning the threshold δ used in this paper is also a serious challenge.