In recent years, the theory of decision-theoretic rough sets and its applications have been studied extensively, including the attribute reduction problem.
However, most researchers focus only on decision cost rather than test cost.
In this paper, we study the attribute reduction problem with both types of costs in decision-theoretic rough set models.
A new definition of attribute reduct is given, and the attribute reduction is formulated as an optimization problem, which aims to minimize the total cost of classification.
Both backtracking and heuristic algorithms are then proposed for the new problem.
The algorithms are tested on four UCI (University of California, Irvine) datasets.
Experimental results demonstrate the efficiency and effectiveness of both algorithms.
This study provides a new insight into the attribute reduction problem in decision-theoretic rough set models.
1. Introduction
We are involved in decision making all the time. Most decisions are based on a group of criteria, so decision making is often aimed at finding a proper balance or tradeoff among multiple criteria. There are a series of methods for analyzing multicriteria decision making, such as game theory. Game theory is an effective mathematical method for formulating decision problems as competition between several entities [1]. These entities, or players, aspire either to achieve a dominant position over the other players or to cooperate with each other in order to find a position that benefits all [2]. Researchers have accumulated a vast literature on game theory and its applications. For example, recent advances in the study of evolutionary games are reviewed in [3–5], and strategies in the spatial ultimatum game are discussed in [6, 7]. However, most of these studies do not consider attribute reduction, which can significantly reduce computational complexity.
Different from the works mentioned above, in rough set theory attribute reduction is an important concept. It supports the wide applications of rough sets. Moreover, classical rough sets [8–10] and their extensions [11–15] can be used in conflict analysis [16], a field related to decision making and game theory. Decision-theoretic rough sets (DTRS) [12, 13] may be particularly relevant to decision making and benefit from new insights provided by game theory. In rough set theory, a concept is usually described by three classification regions: the positive region, the boundary region, and the negative region. The three regions in DTRS are systematically calculated from a set of loss functions according to the Bayesian decision procedure. The loss functions can be interpreted in terms of practical notions of costs and risks. In DTRS models, an object is classified into a particular region because the cost of classifying it into that region is less than the cost of classifying it into any other region. The expected cost of classifying a set of objects is called the decision cost.
Generally speaking, attribute reduction can be interpreted as a process of finding the minimal set of attributes that can preserve or improve one or several criteria. The minimal set of attributes is called an attribute reduct. Some researchers have investigated the attribute reduction problem in DTRS models. Most of them addressed the problem based on the preservation or extension of the positive region or the nonnegative region [17–19]. However, for DTRS, the regions are nonmonotonic with respect to the set inclusion of attributes [18–20], so it is difficult to evaluate and interpret region-based attribute reduction. To tackle the problem, minimal-decision-cost attribute reduction was discussed in [20]. However, most existing studies of attribute reduction in DTRS only concern decision costs but not test costs.
Test cost is the time, money, or other resources one pays for obtaining a data item of an object. Most of the existing attribute reduction problems assume that the data are already stored in datasets and available without charge. However, data are often not free in reality. Recently, the topic of test costs has drawn our attention due to its broad applications. According to the data models constructed in [21], the issues of test-cost-sensitive attribute reduction have been studied based on classical rough sets [22, 23], neighborhood rough sets [24], covering rough sets [25, 26], and so forth. In these works, both backtracking and heuristic algorithms have been implemented through an open source software Coser [27]. Unfortunately, few works have addressed attribute reduction with test cost in the context of DTRS.
In this paper, we study the cost-sensitive attribute reduction problem for DTRS by considering the tradeoff between test costs and decision costs, which is closely related to decision making and game theory. Since the purpose of decision making is to minimize cost, the process of attribute reduction should help minimize the total cost, namely, the sum of test cost and decision cost. A decreasing average-total-cost attribute reduct is defined, which ensures that the total cost of decision making using the reduct will decrease or remain unchanged. In view of this, a minimal average-total-cost reduct (MACR) in DTRS models is introduced. An optimization problem is constructed in order to minimize the average total cost. It is a generalization of the minimal-decision-cost attribute reduction problem discussed in [20].
Both backtracking and heuristic algorithms are proposed to deal with the new attribute reduction problem. The backtracking algorithm is designed to find an optimal reduct for small datasets. However, for large datasets it is not easy to find a minimal-cost attribute subset, so we propose a heuristic algorithm to deal with this problem. To study the performance of both algorithms, experiments are undertaken on four datasets from the UCI library [28] through the software Coser. Experimental results show that the efficiency of the backtracking algorithm is acceptable, especially when the loss functions are not much larger than the test costs, while the heuristic algorithm is very efficient and generates a minimal-total-cost reduct in most cases. Even when the reduct is not optimal, it is still acceptable from a statistical perspective. Moreover, both algorithms perform well in classification accuracy with CART and RBF-kernel SVM classifiers. Meanwhile, the number of selected attributes is effectively reduced by both algorithms.
The rest of the paper is organized as follows. In Section 2, we review the main ideas of DTRS. Section 3 gives a detailed explanation of the minimal-total-cost attribute reduction in DTRS models. An optimization problem is proposed. In Section 4, we present a backtracking algorithm and a heuristic algorithm to address the optimization problem. Experimental settings and results are discussed in Section 5. Section 6 concludes and suggests further research trends.
2. Decision-Theoretic Rough Set Models
In this section, we review some basic notions of the DTRS model [12, 13, 17], which provides a theoretical basis for our method.
Definition 1.
A decision system (DS) S is the 5-tuple:
(1)S=(U,C,D,V={Va∣a∈C∪D},I={Ia∣a∈C∪D}),
where U is a finite nonempty set of objects called the universe, C is the set of conditional attributes, D is the set of decision attributes with only discrete values, Va is the set of values for each a∈C∪D, and Ia:U→Va is an information function for each a∈C∪D.
In a decision system, given a set of conditional attributes A⊆C, the equivalence class of an object x with respect to A, namely, {y∈U∣Ia(x)=Ia(y),∀a∈A}, is denoted by [x]A or [x], if it is understood. In DTRS models, the set of states Ω={X,Xc} indicates that an object is in a decision class X and not in X, respectively. The probabilities for these two complement states can be denoted by P(X∣[x])=|X⋂[x]|/|[x]| and P(Xc∣[x])=1-P(X∣[x]). With respect to the three regions: positive region POS(X), boundary region BND(X), and negative region NEG(X), the set of actions regarding the state X is given by 𝒜={aP,aB,aN}, where aP,aB,aN represent the three actions of classifying an object x into the three regions, respectively. Let λPP,λBP, and λNP denote the cost incurred for taking actions aP,aB, and aN, respectively, when an object belongs to X, and λPN, λBN, and λNN denote the cost incurred for taking the same actions when the object does not belong to X. The loss functions regarding the states X and Xc can be expressed as a 2×3 matrix given in Table 1.
The loss function matrix.

      aP     aB     aN
X     λPP    λBP    λNP
Xc    λPN    λBN    λNN
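The equivalence-class and conditional-probability notions above can be sketched in Python. This is a minimal illustration, not the authors' implementation; the encoding of the information function I as a dict keyed by (object, attribute) pairs is our own assumption.

```python
from collections import defaultdict

def equivalence_classes(U, I, A):
    """Partition U into equivalence classes [x]_A under the attributes A.

    U: list of object identifiers; I: dict mapping (object, attribute)
    pairs to attribute values (the information function); A: attribute set.
    """
    blocks = defaultdict(list)
    for x in U:
        # Objects with identical values on all attributes in A share a block.
        blocks[tuple(I[(x, a)] for a in sorted(A))].append(x)
    return list(blocks.values())

def conditional_probability(X, block):
    """P(X | [x]) = |X ∩ [x]| / |[x]| for an equivalence class `block`."""
    return len(set(X) & set(block)) / len(block)
```

For instance, in the example decision system given below (Table 2), objects x2, x6, and x8 agree on all six attributes, so [x2]_C = {x2, x6, x8} and P(d1 | [x2]) = 1/3.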
Based on the loss functions, the expected costs of taking different actions for objects in [x] can be expressed as
(2) ℜ(aP∣[x]) = λPP·P(X∣[x]) + λPN·P(Xc∣[x]),
    ℜ(aB∣[x]) = λBP·P(X∣[x]) + λBN·P(Xc∣[x]),
    ℜ(aN∣[x]) = λNP·P(X∣[x]) + λNN·P(Xc∣[x]).
The Bayesian decision procedure leads to the following minimal-risk decision rules:
If ℜ(aP∣[x])≤ℜ(aB∣[x]) and ℜ(aP∣[x])≤ℜ(aN∣[x]), decide [x]⊆POS(X);
If ℜ(aB∣[x])≤ℜ(aP∣[x]) and ℜ(aB∣[x])≤ℜ(aN∣[x]), decide [x]⊆BND(X);
If ℜ(aN∣[x])≤ℜ(aP∣[x]) and ℜ(aN∣[x])≤ℜ(aB∣[x]), decide [x]⊆NEG(X).
Consider a special kind of loss functions with
(3)λPP≤λBP<λNP,λNN≤λBN<λPN.
That is, the cost of classifying an object x belonging to X into the positive region POS(X) is less than or equal to the cost of classifying x into the boundary region BND(X), and both of these costs are strictly less than the cost of classifying x into the negative region NEG(X). The reverse order of costs is used for classifying an object that does not belong to X. The decision rules can be reexpressed as follows:
If P(X∣[x])≥α and P(X∣[x])≥γ, decide x∈POS(X);
If P(X∣[x])≤α and P(X∣[x])≥β, decide x∈BND(X);
If P(X∣[x])≤β and P(X∣[x])≤γ, decide x∈NEG(X),
where the parameters α, β, and γ are defined as
(4) α = (λPN − λBN) / ((λPN − λBN) + (λBP − λPP)),
    β = (λBN − λNN) / ((λBN − λNN) + (λNP − λBP)),
    γ = (λPN − λNN) / ((λPN − λNN) + (λNP − λPP)).
When
(5)(λPN-λBN)(λNP-λBP)>(λBP-λPP)(λBN-λNN),
we have 0≤β<γ<α≤1. After tie-breaking, the simplified rules are obtained as follows:
If P(X∣[x])≥α, decide x∈POS(X);
If β<P(X∣[x])<α, decide x∈BND(X);
If P(X∣[x])≤β, decide x∈NEG(X).
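The thresholds (4) and the simplified three-way rules can be sketched as follows; this is a minimal illustration under the assumption β < α (guaranteed by condition (5)), not the authors' code.

```python
def thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Compute (alpha, beta, gamma) from the six loss values, eq. (4)."""
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta  = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    gamma = (l_pn - l_nn) / ((l_pn - l_nn) + (l_np - l_pp))
    return alpha, beta, gamma

def decide(p, alpha, beta):
    """Simplified rules: POS if p >= alpha, NEG if p <= beta, else BND."""
    if p >= alpha:
        return 'POS'
    if p <= beta:
        return 'NEG'
    return 'BND'
```

For example, with the losses λPP = 480, λBP = 2895, λNP = 6095, λPN = 7846, λBN = 4238, λNN = 373 used later in the paper, condition (5) holds and one obtains β ≈ 0.547 < γ ≈ 0.571 < α ≈ 0.599.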
Let πD={D1,D2,…,Dm} denote the partition of the universe U induced by D. Based on the thresholds (α,β), one can divide the universe U into three regions of the decision partition πD:
(6) POSA(α,β)(D) = {x∈U ∣ P(Dmax([x]A)∣[x]A) ≥ α},
    BNDA(α,β)(D) = {x∈U ∣ β < P(Dmax([x]A)∣[x]A) < α},
    NEGA(α,β)(D) = {x∈U ∣ P(Dmax([x]A)∣[x]A) ≤ β},
where Dmax([x]A)=argmaxDi∈πD{|[x]A⋂Di|/|[x]A|}.
Let p=p(Dmax([x]A)∣[x]A); the Bayesian expected cost of positive rule, boundary rule, and negative rule can be expressed, respectively, as follows:
(7)p·λPP+(1-p)·λPN;p·λBP+(1-p)·λBN;p·λNP+(1-p)·λNN.
3. Minimal-Total-Cost Attribute Reduction in Decision-Theoretic Rough Set Models
In this section, we focus on cost-sensitive attribute reduction based on test costs and decision costs in DTRS models. The objective of attribute reduction is to minimize the total cost by considering a tradeoff between test costs and decision costs. Minimizing the total cost is equivalent to minimizing the average total cost (ATC), so we study the minimal average-total-cost reduct problem.
Test cost is intrinsic to data, and there are a number of test-cost-sensitive decision systems. A corresponding hierarchy consisting of six models was proposed in [21]. Here, we consider only the test-cost-independent decision system, which is the simplest and most widely used model.
Definition 2 (see [21]).
A test-cost-independent decision system (TCI-DS) S is the 6-tuple:
(8)S=(U,C,D,V,I,tc),
where U, C, D, V, I have the same meanings as in a DS and tc:C→ℝ+ is the test cost function. Test costs are independent of one another; that is, tc(A)=∑a∈Atc(a) for any A⊆C.
By introducing test cost into DTRS models, we can obtain the following definition.
Definition 3.
A test-cost-independent cost-sensitive decision system in DTRS models (DTRS-TCI-CDS) S is the 7-tuple:
(9)S=(U,C,D,V,I,tc,(λij)2×3),
where U,C,D,V,I,tc have the same meanings as in Definition 2 and (λij)2×3 is the loss function matrix listed in Table 1, where i∈{P,B,N} and j∈{P,N}.
An example of a DTRS-TCI-CDS is given in Tables 2, 3, and 4, which show, respectively, a decision system with U={x1,x2,…,x9} and C={a1,a2,…,a6}, a corresponding test cost vector, and a corresponding loss function matrix.
An example decision system.

      a1   a2   a3   a4   a5   a6   D
x1    1    1    1    1    1    1    d1
x2    1    0    1    0    1    1    d1
x3    0    1    1    1    0    0    d2
x4    1    1    1    0    0    1    d2
x5    0    0    1    1    0    1    d2
x6    1    0    1    0    1    1    d3
x7    0    0    0    1    1    0    d3
x8    1    0    1    0    1    1    d3
x9    0    0    1    1    0    1    d3
An example test cost vector.

         a1    a2    a3    a4    a5    a6
tc(ai)   $92   $9    $96   $81   $87   $54
An example loss function matrix.

      aP      aB      aN
X     480     2895    6095
Xc    7846    4238    373
For a given DTRS-TCI-CDS and A⊆C, the decision cost is composed of the three types of cost formulated in (7), so it can be expressed as

(10) dc(U,A) = ∑_{pi≥α} (pi·λPP + (1−pi)·λPN)
             + ∑_{β<pj<α} (pj·λBP + (1−pj)·λBN)
             + ∑_{pk≤β} (pk·λNP + (1−pk)·λNN),

where pi = p(Dmax([xi]A)∣[xi]A). According to (6), we can rewrite the decision cost formulation as

(11) dc(U,A) = ∑_{xi∈POSA(α,β)(D)} (pi·λPP + (1−pi)·λPN)
             + ∑_{xj∈BNDA(α,β)(D)} (pj·λBP + (1−pj)·λBN)
             + ∑_{xk∈NEGA(α,β)(D)} (pk·λNP + (1−pk)·λNN).
Obviously, we can obtain the average decision cost as follows:
(12) dc̄(U,A) = dc(U,A) / |U|.
Because the test cost of any object is the same for the test set A, the average total cost (ATC) is given by
(13)ATC(U,A)=tc(A)+dc¯(U,A).
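Equations (10)–(13) can be sketched in Python as follows. This is a minimal illustration under our own data-representation assumptions (equivalence classes as lists of objects, decision labels in a dict); tc_A is the precomputed test cost of the attribute set A.

```python
def average_total_cost(blocks, decisions, tc_A, losses, alpha, beta):
    """ATC(U, A) = tc(A) + average decision cost, per eqs. (10)-(13).

    blocks: equivalence classes of U under A (lists of objects);
    decisions: dict mapping each object to its decision class;
    tc_A: total test cost of A; losses: the 6-tuple
    (l_pp, l_bp, l_np, l_pn, l_bn, l_nn).
    """
    l_pp, l_bp, l_np, l_pn, l_bn, l_nn = losses
    total_dc, n = 0.0, 0
    for block in blocks:
        counts = {}
        for x in block:
            counts[decisions[x]] = counts.get(decisions[x], 0) + 1
        p = max(counts.values()) / len(block)  # p(Dmax([x]_A) | [x]_A)
        if p >= alpha:                         # positive rule
            cost = p * l_pp + (1 - p) * l_pn
        elif p <= beta:                        # negative rule
            cost = p * l_np + (1 - p) * l_nn
        else:                                  # boundary rule
            cost = p * l_bp + (1 - p) * l_bn
        total_dc += cost * len(block)          # each object in the block
        n += len(block)
    return tc_A + total_dc / n
```

For instance, when every block is pure (p = 1 ≥ α), each object contributes λPP, so ATC reduces to tc(A) + λPP.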
Similar to [20], we study the decreasing cost attribute reduction to avoid the interpretation difficulties in region preservation based definitions. The definition of decreasing average-total-cost attribute reduct is presented as follows.
Definition 4.
In a DTRS-TCI-CDS S=(U,C,D,V,I,tc,(λij)2×3), R⊆C is a decreasing average-total-cost attribute reduct if and only if
ATC(U,R) ≤ ATC(U,C);
∀R′⊂R, ATC(U,R′) > ATC(U,R).
According to this definition, we choose subsets of C which ensure that the ATC of decision making will decrease or remain unchanged in the process of attribute reduction.
In most situations, users want to obtain the smallest total cost in the classification procedure, so we propose an optimization problem whose objective is to minimize the average total classification cost. The attribute set that makes the ATC minimal is called a minimal average-total-cost reduct (MACR), and the optimization problem is called the MACR problem. Both are defined as follows.
Definition 5.
In a DTRS-TCI-CDS S=(U,C,D,V,I,tc,(λij)2×3), R⊆C is a MACR if and only if
(14) R = arg min_{A⊆C} ATC(U,A).
Definition 6.
The MACR problem:
input: S=(U,C,D,V,I,tc,(λij)2×3);
output: A⊆C;
optimization objective: min ATC(U,A).
If we set tc(a)=c for all a∈C where c is a constant, the MACR problem is essentially the minimal-decision-cost attribute reduct problem [20], so the former is a generalization of the latter.
4. Algorithms
Since the MACR problem is a combinatorial problem and it is not easy to obtain the optimal solution in linear time, we use a heuristic approach to obtain an approximately optimal solution. However, to evaluate the performance of a heuristic algorithm in terms of solution quality, we must first find an optimal reduct, so an exhaustive algorithm is also needed. In this section, we propose a backtracking algorithm and a δ-weighted heuristic algorithm to address the MACR problem.
4.1. The Backtracking Attribute Reduction Algorithm
The backtracking algorithm is illustrated in Algorithm 1. In order to invoke it, several global variables should be explicitly initialized as follows:
R = ∅ is the reduct with the minimal average total cost found so far;
cmc = dc̄(U, R) is the current minimal average total cost;
l = 0 is the current level test index lower bound.
Algorithm 1: A backtracking algorithm to the MACR problem: backtracking(R, l).
(1) for (i = l; i < |C|; i++) do
(2)  A = R ∪ {a_i};
(3)  if (tc(A) ≥ cmc) then
(4)   continue; //Pruning for too expensive test costs
(5)  end if
(6)  if (ATC(U, A) < cmc) then
(7)   cmc = ATC(U, A); //Update the minimal average total cost
(8)   R = A; //Update the minimal total cost reduct
(9)  end if
(10) backtracking(A, i + 1);
(11) end for
The backtracking algorithm is denoted backtracking(R, l). A reduct with minimal ATC will be stored in R at the end of the algorithm's execution. In general, the search space of the attribute reduction algorithm has size 2^|C|. To reduce the search space, we employ one pruning technique, shown in lines 3 through 5 of Algorithm 1: the attribute subset A is discarded if the test cost of A alone is not less than the current minimal average total cost (cmc), since decision costs are nonnegative in real applications.
Note that total costs may decrease with the addition of attributes, which means that the ATC under an attribute set may be less than that under some of its subsets. This differs from previous works that considered only test cost [25], where the cost increases monotonically as more attributes are selected. The following example gives an intuitive illustration.
Example 7.
Take the DTRS-TCI-CDS listed in Tables 2–4 as an example. By computation, we find that the ATC is 3974.8 when the selected attribute set is A={a4}, while the ATC is reduced to 3346.2 when A={a4,a6}.
Therefore, regardless of whether the currently selected attribute subset A satisfies ATC(U,A)<cmc, A continues to be expanded in the search for a minimal ATC, as shown in line 10 of Algorithm 1.
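The search just described can be sketched as follows; this is a minimal illustration of Algorithm 1, not the authors' Coser implementation, and the callables tc and atc (test cost and average total cost of a subset) are assumed to be supplied by the caller.

```python
def macr_backtracking(attrs, tc, atc):
    """Backtracking search for a minimal-ATC attribute subset.

    attrs: list of attributes; tc(A): test cost of subset A;
    atc(A): average total cost of subset A.
    """
    best = {'R': frozenset(), 'cmc': atc(frozenset())}

    def backtrack(A, lower):
        for i in range(lower, len(attrs)):
            B = A | {attrs[i]}
            if tc(B) >= best['cmc']:
                continue                 # prune: test cost alone too expensive
            if atc(B) < best['cmc']:
                best['cmc'] = atc(B)     # update minimal average total cost
                best['R'] = B            # update the best reduct
            backtrack(B, i + 1)          # expand even if B did not improve

    backtrack(frozenset(), 0)
    return best['R'], best['cmc']
```

Note that the recursive call is made unconditionally, mirroring the fact that the ATC is nonmonotonic: a superset of a non-improving subset may still improve.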
The δ-weighted heuristic attribute reduction algorithm is listed in Algorithm 2; its framework contains two main steps. Let A denote the set of currently selected attributes. First, we repeatedly combine the current best attribute subset S⊆(C−A) with A, according to the heuristic attribute significance function f(A,S,tc), until A becomes a superreduct; this step is essentially attribute addition. Then, attributes are deleted from A so that A attains the current minimal total cost.
Lines 4 through 13 contain the key code of the addition step. There are two main differences from those in existing works [22, 25, 29]. One is the heuristic attribute significance function. We propose the δ-weighted attribute significance function as follows:
(15) f(A,S,tc) = (POSA∪S(α,β)(D) − POSA(α,β)(D)) × (1 + 1 / (tc(ak1)·tc(ak2)⋯tc(ak|S|))^(1/|S|)),
where aki is the attribute in S and tc(aki) is the test cost of aki.
The other difference is the computation steps. At first, the dimension of S is 1, which means that we test the remaining attributes one by one. However, since the positive region may shrink with the addition of attributes in DTRS models [19], the gain (POSA∪{ai}(α,β)(D) − POSA(α,β)(D)) may fail to be positive for every ai∈C−A. In this case, we cannot choose a suitable attribute to expand the current POSA(α,β)(D) toward POSA(α,β)(D)⊇POSC(α,β)(D). To address this situation, we gradually increase the dimension of S, that is, consider multiple attributes simultaneously, and compute the corresponding values of the attribute significance function until at least one value is greater than 0.
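The selection step above can be sketched as follows. We read (15) as weighting the positive-region gain by the geometric mean of the test costs in S; this reading is our own assumption, since the printed formula is ambiguous, but it matches the intent that cheaper tests score higher. The callable pos_gain_of (positive-region gain of a candidate set) is a hypothetical helper supplied by the caller.

```python
from itertools import combinations
from math import prod

def significance(pos_gain, costs):
    """δ-weighted significance of a candidate set S, our reading of eq. (15):
    positive-region gain times (1 + 1/geometric mean of S's test costs)."""
    gmean = prod(costs) ** (1.0 / len(costs))
    return pos_gain * (1 + 1 / gmean)

def best_candidate(remaining, pos_gain_of, tc):
    """Try candidate sets of growing size k until one yields a positive
    significance, then return the best-scoring set (None if none exists)."""
    for k in range(1, len(remaining) + 1):
        scored = [(significance(pos_gain_of(S), [tc[a] for a in S]), S)
                  for S in (frozenset(c)
                            for c in combinations(sorted(remaining), k))]
        positive = [fs for fs in scored if fs[0] > 0]
        if positive:
            return max(positive, key=lambda fs: fs[0])[1]
    return None
```

The outer loop over k realizes the dimension-growing strategy: single attributes are tried first, and multiattribute candidates only when no single attribute enlarges the positive region.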
5. Experiments
In this section, the performance of our two algorithms is studied. We try to answer the following questions by experimentation.
Are both the backtracking algorithm and the heuristic algorithm efficient?
Is the heuristic algorithm effective for the MACR problem?
Are both algorithms appropriate for classification?
5.1. Data Generation
Experiments are undertaken on four datasets obtained from the UCI library. The basic information of the datasets is listed in Table 5, where |C| is the number of condition attributes, |U| is the number of instances, and D is the name of the decision. Note that missing values in these datasets (e.g., in the Voting dataset) are treated as a particular value; that is, ? is equal to itself and unequal to any other value.
Dataset information.

Name          Domain     |C|   |U|    D={d}
Tic-tac-toe   Game       9     958    Class
Voting        Society    16    435    Vote
Zoo           Zoology    16    101    Type
Mushroom      Botany     22    8124   Class
Since the datasets mentioned above have no intrinsic test costs or loss functions, we generate these data for the experiments. First, we generate test costs, which are always represented by positive integers. For each condition attribute ai, tc(ai) is set to a random number in [1,100] drawn from the uniform distribution discussed in [22]. Then, we produce loss functions λij (i∈{P,B,N}, j∈{P,N}), which are random nonnegative integers satisfying (3) and (5). Since loss functions are often larger than test costs in real life, we set the average of λij to lie in [100,5000]. Of course, these cost assumptions could easily be changed if necessary. To observe whether algorithm efficiency is influenced by the ratio of loss functions to test costs, the experiments below are undertaken with two groups of cost settings for each dataset listed in Table 5. Each group contains 100 different cost settings. Test costs in both groups are the same, but the loss functions differ: the average values of the loss functions (ALF) in group 1 and group 2 are around 500 and 3000, respectively. Experiments are undertaken on a PC with an Intel 2.20 GHz CPU and 4 GB memory.
5.2. Efficiencies of the Two Algorithms
We study the efficiencies of both algorithms using two metrics. The first is the number of backtrack steps, that is, the number of times Algorithm 1 is invoked; comparing it with the size of the search space [25], we investigate the efficiency of the backtracking algorithm. The second is the run-time comparison between the two algorithms, which is used to study the efficiency of the heuristic algorithm. The search space size and the average number of backtrack steps for Algorithm 1 are given in Table 6, and the average run-times of both algorithms are shown in Table 7, where the unit of run-time is 1 ms.
The average number of backtrack steps for Algorithm 1.

Dataset       Search space size   Average backtrack steps
                                  ALF≈500     ALF≈3000
Tic-tac-toe   2^9                 130.68      406.34
Voting        2^16                774.18      4873.3
Zoo           2^16                547.05      5134.96
Mushroom      2^22                3758.50     307682.57
Average run-time comparison.

Dataset       ALF≈500                      ALF≈3000
              Algorithm 1   Algorithm 2    Algorithm 1   Algorithm 2
Tic-tac-toe   591.7         206.7          1376.14       177.16
Voting        975.93        203.77         3845.7        162
Zoo           252.77        1.73           2327.02       1.42
Mushroom      9541.24       807.91         64172.5       835.72
From the results, we note the following.
In both groups, the number of backtrack steps is far less than the search space size, which demonstrates the effectiveness of the pruning technique in Algorithm 1.
As ALF increases, both the backtrack steps and the run-time of Algorithm 1 grow, which means that the efficiency of the backtracking algorithm is influenced by the ratio of loss functions to test costs. The reason is that when the loss functions are much larger than the test costs, the current minimal ATC, namely, cmc in Algorithm 1, is also high compared with the current test costs; in this case, the pruning technique shown in lines 3 through 5 of Algorithm 1 has little effect.
The run-time of Algorithm 2 is small compared with that of Algorithm 1, especially for the Zoo dataset; therefore, the heuristic algorithm is very efficient. Moreover, the heuristic algorithm's run-time is stable as ALF increases.
In a word, the heuristic algorithm is highly efficient. Although the backtracking algorithm is sometimes not very efficient, it is still needed to evaluate the solution quality of the heuristic algorithm.
5.3. Effectiveness of the Two Algorithms
In this part, we observe the effectiveness of both algorithms by using four metrics. First, two metrics defined in [22], namely, finding optimal factor (FOF) and average exceeding factor (AEF), are computed to measure the performance of the heuristic algorithm from the perspective of cost. In the computations, the results of the backtracking algorithm are used to evaluate the effectiveness of the heuristic algorithm. The results of the two metrics are shown in Figure 1.
Figure 1: The effectiveness of the heuristic algorithm measured by two metrics, with ALF around 500 and 3000, respectively. (a) The finding optimal factor (FOF) is the fraction of experiments in which an optimal reduct is found; the higher the FOF, the better the heuristic algorithm. All FOF values are above 0.5. (b) The average exceeding factor (AEF) is the average fraction by which the obtained cost exceeds the minimal average total cost; the lower the AEF, the better the algorithm. All AEF values are below 0.1.
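The two metrics can be sketched as follows; this is a straightforward reading of the definitions in [22] as summarized above, not the evaluation code actually used.

```python
def finding_optimal_factor(heuristic_costs, optimal_costs):
    """FOF: fraction of runs where the heuristic cost equals the optimum."""
    hits = sum(1 for h, o in zip(heuristic_costs, optimal_costs) if h == o)
    return hits / len(optimal_costs)

def average_exceeding_factor(heuristic_costs, optimal_costs):
    """AEF: average relative excess of the heuristic cost over the optimum."""
    total = sum((h - o) / o for h, o in zip(heuristic_costs, optimal_costs))
    return total / len(optimal_costs)
```

Here the optimal costs are those produced by the backtracking algorithm, which is why an exhaustive algorithm is needed to evaluate the heuristic one.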
From the results, we note the following.
The values of FOF and AEF are not significantly different between ALF≈500 and ALF≈3000. This suggests that the performance of the heuristic algorithm is only slightly influenced by the ratio of loss functions to test costs.
All FOF are above 0.5, and all AEF are below 0.1. In other words, the results are acceptable.
Then, we compare the classification performance of the original data with that of the reduced data obtained by our two algorithms, based on 10-fold cross validation with CART and RBF-kernel SVM as learning algorithms. The results are given in Tables 8 and 9. We also compare the average numbers of selected attributes, shown in Table 10.
Classification performance comparison with the CART classifier.

Dataset       Raw data   ALF≈500                      ALF≈3000
                         Algorithm 1   Algorithm 2    Algorithm 1   Algorithm 2
Tic-tac-toe   93.72%     87.48%        87.18%         91.6%         89.4%
Voting        95.47%     90.34%        86.37%         94.07%        87.38%
Zoo           90.69%     82.85%        78.92%         89.65%        85.52%
Mushroom      99.96%     95.18%        90.51%         95.82%        90.29%
Classification performance comparison with the RBF-kernel SVM classifier.

Dataset       Raw data   ALF≈500                      ALF≈3000
                         Algorithm 1   Algorithm 2    Algorithm 1   Algorithm 2
Tic-tac-toe   87.59%     82.16%        81.8%          85.97%        84.28%
Voting        95.38%     87.35%        83.4%          92.13%        84.36%
Zoo           90.1%      83.73%        79.27%         89.52%        85.8%
Mushroom      100%       97.21%        90.38%         95.79%        91.88%
Comparison of the average numbers of selected attributes.

Dataset       Raw data   ALF≈500                      ALF≈3000
                         Algorithm 1   Algorithm 2    Algorithm 1   Algorithm 2
Tic-tac-toe   9          1.31          1.34           5.59          4.11
Voting        16         1.29          1.06           2.6           1.6
Zoo           16         1.98          1.41           3.51          2.59
Mushroom      22         1.57          1.26           6.95          5.32
From the results, we observe the following.
The classification accuracies obtained by our algorithms are slightly lower than those on the raw data, but the numbers of selected attributes are effectively reduced, which is consistent with the essence of DTRS models. Different from classical rough sets, classification error is acceptable within a certain range determined by the thresholds in DTRS models; consequently, the reduction effectiveness is improved.
As ALF increases, the numbers of selected attributes grow for all datasets, and the classification performance improves on most datasets. This means that the tolerance of classification error decreases as the classification costs increase.
For all datasets, the classification performance of Algorithm 1 is slightly better than that of Algorithm 2; meanwhile, Algorithm 1 selects more attributes in most cases.
6. Conclusions
In this paper, we address the cost-sensitive attribute reduction problem in DTRS models. By considering the tradeoff between decision costs and test costs, the minimal average-total-cost attribute reduct is defined, and the corresponding optimization problem is proposed. Both backtracking and heuristic algorithms are designed to deal with the optimization problem. Experimental results demonstrate the efficiency and effectiveness of both algorithms. By combining test costs with the existing elements of DTRS models, such as the loss functions and the probabilistic approaches, our model is practical for real applications.
The following research topics deserve further investigation.
The MACR problem could be addressed again based on more complicated test-cost-sensitive decision systems (DS), such as the simple common-test-cost DS and the complex common-test-cost DS [21]. The corresponding algorithms may also be more complicated.
Sometimes the costs one could afford are limited. We could consider the attribute reduction problem with test cost constraint or total cost constraint in DTRS models.
Recently, from the viewpoint of rough set theory, Yao [30, 31] has discussed three-way decisions, which may have many real-world applications. One could explore the cost-sensitive attribute reduction problem for three-way decisions with decision-theoretic rough sets.
In summary, this study suggests new research trends concerning decision-theoretic rough set theory, attribute reduction problem, and cost-sensitive learning applications.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is in part supported by the National Science Foundation of China under Grant nos. 61379089, 61379049, and 61170128 and the Education Department of Fujian Province under Grant no. JA12224.
References
[1] J. von Neumann and O. Morgenstern.
[2] D. Fudenberg and J. Tirole.
[3] M. Perc and A. Szolnoki, "Coevolutionary games—a mini review."
[4] M. Perc, J. Gómez-Gardeñes, A. Szolnoki, L. M. Floría, and Y. Moreno, "Evolutionary dynamics of group interactions on structured populations: a review."
[5] M. Perc and P. Grigolini, "Collective behavior and evolutionary games—an introduction."
[6] A. Szolnoki, M. Perc, and G. Szabó, "Defense mechanisms of empathetic players in the spatial ultimatum game."
[7] A. Szolnoki, M. Perc, and G. Szabó, "Accuracy in strategy imitations promotes the evolution of fairness in the spatial ultimatum game."
[8] Z. Pawlak, "Rough sets."
[9] Z. Pawlak and A. Skowron, "Rough sets: some extensions."
[10] Z. Pawlak and A. Skowron, "Rudiments of rough sets."
[11] W. Ziarko, "Variable precision rough set model."
[12] Y. Y. Yao, S. K. M. Wong, and P. Lingras, "A decision-theoretic rough set model."
[13] Y. Y. Yao and S. K. M. Wong, "A decision theoretic framework for approximating concepts."
[14] S. Greco, B. Matarazzo, and R. Slowinski, "Rough set approach to decisions under risk," in Proceedings of the 2nd International Conference on Rough Sets and Current Trends in Computing (RSCTC '00), Lecture Notes in Computer Science, 2001, pp. 160–169.
[15] S. Greco, B. Matarazzo, and R. Slowinski, "Rough sets theory for multicriteria decision analysis."
[16] A. Skowron, S. Ramanna, and J. F. Peters, "Conflict analysis and information systems: a rough set approach."
[17] Y. Yao and Y. Zhao, "Attribute reduction in decision-theoretic rough set models."
[18] Y. Zhao, S. K. M. Wong, and Y. Yao, "A note on attribute reduction in the decision-theoretic rough set model."
[19] H. Li, X. Zhou, J. Zhao, and D. Liu, "Attribute reduction in decision-theoretic rough set model: a further investigation."
[20] X. Jia, W. Liao, Z. Tang, and L. Shang, "Minimum cost attribute reduction in decision-theoretic rough set models."
[21] F. Min and Q. Liu, "A hierarchical model for test-cost-sensitive decision systems."
[22] F. Min, H. He, Y. Qian, and W. Zhu, "Test-cost-sensitive attribute reduction."
[23] F. Min, Q. Hu, and W. Zhu, "Feature selection with test cost constraint."
[24] S. Liao, J. Liu, F. Min, and W. Zhu, "Minimal-test-cost reduct problem on neighborhood decision systems."
[25] F. Min and W. Zhu, "Attribute reduction of data with error ranges and test costs."
[26] H. Zhao, F. Min, and W. Zhu, "Test-cost-sensitive attribute reduction of data with normal distribution measurement errors."
[27] F. Min, W. Zhu, H. Zhao, and G. Pan, "Coser: cost-sensitive rough sets," 2011, http://grc.fjzs.edu.cn/~fmin/coser/.
[28] C. L. Blake and C. J. Merz, "UCI repository of machine learning databases," 1998, http://www.ics.uci.edu/~mlearn/mlrepository.html.
[29] H. Zhao, F. Min, and W. Zhu, "Cost-sensitive feature selection of numeric data with measurement errors."
[30] Y. Yao, "Three-way decisions with probabilistic rough sets."
[31] Y. Yao, "The superiority of three-way decisions in probabilistic rough set models."