Mathematical Problems in Engineering, Hindawi Publishing Corporation, vol. 2014, Article ID 875918, doi:10.1155/2014/875918

Research Article

Cost-Sensitive Attribute Reduction in Decision-Theoretic Rough Set Models

Shujiao Liao,1,2 Qingxin Zhu,1 and Fan Min3

1 School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
2 School of Mathematics and Statistics, Minnan Normal University, Zhangzhou 363000, China
3 School of Computer Science, Southwest Petroleum University, Chengdu 610500, China

Received 23 October 2013; Accepted 12 January 2014; Published 27 February 2014

Academic Editor: Matjaž Perc

Copyright © 2014 Shujiao Liao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In recent years, decision-theoretic rough set theory and its applications have been studied extensively, including the attribute reduction problem. However, most researchers focus only on decision cost rather than test cost. In this paper, we study the attribute reduction problem with both types of cost in decision-theoretic rough set models. A new definition of attribute reduct is given, and attribute reduction is formulated as an optimization problem that aims to minimize the total cost of classification. Both backtracking and heuristic algorithms for the new problem are then proposed. The algorithms are tested on four UCI (University of California, Irvine) datasets. Experimental results demonstrate the efficiency and effectiveness of both algorithms. This study provides new insight into the attribute reduction problem in decision-theoretic rough set models.

1. Introduction

We are involved in decision making all the time, and most decisions are based on a group of criteria. In this case, decision making often aims to find a proper balance or tradeoff among multiple criteria. A series of methods exists for analyzing multicriteria decision making, such as game theory. Game theory is an effective mathematical method for formulating decision problems as competition between several entities. These entities, or players, aspire either to achieve a dominant position over the other players or to cooperate with each other in order to find a position that benefits all. Researchers have accumulated a vast literature on game theory and its applications; for example, recent advances in the study of evolutionary games have been reviewed, and strategies in the spatial ultimatum game are discussed in [6, 7]. However, most of these studies do not consider attribute reduction, which can significantly reduce computational complexity.

Different from the works mentioned above, attribute reduction is an important concept in rough set theory and underpins many applications of rough sets. Moreover, classical rough sets and their extensions can be used in conflict analysis, a field related to decision making and game theory. Decision-theoretic rough sets (DTRS) [12, 13] are particularly relevant to decision making and benefit from insights provided by game theory. In rough set theory, a concept is usually described by three classification regions: the positive region, the boundary region, and the negative region. In DTRS, the three regions are calculated systematically from a set of loss functions according to the Bayesian decision procedure. The loss functions can be interpreted in terms of practical notions of costs and risks. In DTRS models, an object is classified into a particular region because the cost of classifying it into that region is less than the cost of classifying it into either of the other regions. The expected cost of classifying a set of objects is called the decision cost.

Generally speaking, attribute reduction can be interpreted as the process of finding a minimal set of attributes that preserves or improves one or several criteria; such a minimal set of attributes is called an attribute reduct. Some researchers have investigated the attribute reduction problem in DTRS models. Most of them addressed the problem through the preservation or extension of the positive region or the nonnegative region. However, in DTRS the regions are nonmonotonic with respect to the set inclusion of attributes, so region-based attribute reduction is difficult to evaluate and interpret. To tackle this problem, minimal-decision-cost attribute reduction has been discussed. However, most existing studies of attribute reduction in DTRS consider only decision costs, not test costs.

Test cost is the time, money, or other resources one pays for obtaining a data item of an object. Most existing attribute reduction problems assume that the data are already stored in datasets and available free of charge. However, data are often not free in reality. Recently, the topic of test costs has drawn much attention due to its broad applications. Based on previously constructed data models, test-cost-sensitive attribute reduction has been studied for classical rough sets [22, 23], neighborhood rough sets, covering rough sets [25, 26], and so forth. In these works, both backtracking and heuristic algorithms have been implemented through the open-source software Coser. Unfortunately, few works have addressed attribute reduction with test cost in the context of DTRS.

In this paper, we study the cost-sensitive attribute reduction problem for DTRS by considering the tradeoff between test costs and decision costs, which is closely related to decision making and game theory. Since the purpose of decision making is to minimize cost, the process of attribute reduction should help minimize the total cost, namely, the sum of test cost and decision cost. A decreasing average-total-cost attribute reduct is defined, which ensures that the total cost of decision making is decreased or unchanged when the reduct is used. On this basis, a minimal average-total-cost reduct (MACR) in DTRS models is introduced, and an optimization problem is constructed to minimize the average total cost. It generalizes the minimal-decision-cost attribute reduction problem discussed previously.

Both backtracking and heuristic algorithms are proposed for the new attribute reduction problem. The backtracking algorithm is designed to find an optimal reduct for small datasets. For large datasets, however, it is not easy to find a minimal-cost attribute subset, so we propose a heuristic algorithm. To study the performance of both algorithms, experiments are undertaken on four datasets from the UCI library through the software Coser. Experimental results show that the efficiency of the backtracking algorithm is acceptable, especially when the loss functions are not much larger than the test costs, while the heuristic algorithm is rather efficient and generates a minimal-total-cost reduct in most cases. Even when the reduct is not optimal, it is still acceptable from a statistical perspective. Moreover, both algorithms perform well on classification accuracy with CART and RBF-kernel SVM classifiers, while the number of selected attributes is effectively reduced.

The rest of the paper is organized as follows. In Section 2, we review the main ideas of DTRS. Section 3 gives a detailed explanation of the minimal-total-cost attribute reduction in DTRS models. An optimization problem is proposed. In Section 4, we present a backtracking algorithm and a heuristic algorithm to address the optimization problem. Experimental settings and results are discussed in Section 5. Section 6 concludes and suggests further research trends.

2. Decision-Theoretic Rough Set Models

In this section, we review some basic notions of the DTRS model [12, 13, 17], which provides a theoretical basis for our method.

Definition 1.

A decision system (DS) S is the 5-tuple

(1) S = (U, C, D, V = {V_a | a ∈ C ∪ D}, I = {I_a | a ∈ C ∪ D}),

where U is a finite nonempty set of objects called the universe, C is the set of conditional attributes, D is the set of decision attributes with only discrete values, V_a is the set of values for each a ∈ C ∪ D, and I_a : U → V_a is an information function for each a ∈ C ∪ D.

In a decision system, given a set of conditional attributes A ⊆ C, the equivalence class of an object x with respect to A, namely, {y ∈ U | I_a(x) = I_a(y), ∀a ∈ A}, is denoted by [x]_A, or [x] when A is understood. In DTRS models, the set of states Ω = {X, X^c} indicates that an object is in a decision class X or not in X, respectively. The probabilities of these two complementary states are P(X | [x]) = |X ∩ [x]| / |[x]| and P(X^c | [x]) = 1 − P(X | [x]). With respect to the three regions, the positive region POS(X), the boundary region BND(X), and the negative region NEG(X), the set of actions regarding the state X is given by 𝒜 = {a_P, a_B, a_N}, where a_P, a_B, and a_N represent the actions of classifying an object x into the three regions, respectively. Let λ_PP, λ_BP, and λ_NP denote the costs incurred for taking actions a_P, a_B, and a_N, respectively, when the object belongs to X, and let λ_PN, λ_BN, and λ_NN denote the costs incurred for taking the same actions when the object does not belong to X. The loss functions regarding the states X and X^c can be expressed as the 2 × 3 matrix given in Table 1.

The loss function matrix.

a P a B a N
X λ P P λ B P λ N P
X c λ P N λ B N λ N N
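As a concrete illustration of these notions, equivalence classes and the conditional probability P(X | [x]) can be computed directly from tabular data. The following Python sketch uses a toy three-object universe of our own devising; the function names are assumptions, not part of any library.

```python
from collections import defaultdict

def equivalence_classes(U, A):
    """Group object indices by their value tuple on the attribute set A;
    U is a list of dicts mapping attribute name -> value."""
    blocks = defaultdict(list)
    for i, x in enumerate(U):
        blocks[tuple(x[a] for a in A)].append(i)
    return list(blocks.values())

def conditional_prob(block, X):
    """P(X | [x]) = |X ∩ [x]| / |[x]| for a block of object indices."""
    return len(set(block) & set(X)) / len(block)

# Toy universe: objects 0 and 1 share the same description on {a1, a2}.
U = [{"a1": 1, "a2": 0}, {"a1": 1, "a2": 0}, {"a1": 0, "a2": 1}]
X = {0}                                  # the decision class of object 0
blocks = equivalence_classes(U, ["a1", "a2"])
# The block containing object 0 also contains object 1, so P(X | [x0]) = 1/2.
```
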

Based on the loss functions, the expected costs of taking the different actions for objects in [x] can be expressed as

(2) R(a_P | [x]) = λ_PP P(X | [x]) + λ_PN P(X^c | [x]),
    R(a_B | [x]) = λ_BP P(X | [x]) + λ_BN P(X^c | [x]),
    R(a_N | [x]) = λ_NP P(X | [x]) + λ_NN P(X^c | [x]).

The Bayesian decision procedure leads to the following minimal-risk decision rules:

If R(a_P | [x]) ≤ R(a_B | [x]) and R(a_P | [x]) ≤ R(a_N | [x]), decide [x] ⊆ POS(X);

If R(a_B | [x]) ≤ R(a_P | [x]) and R(a_B | [x]) ≤ R(a_N | [x]), decide [x] ⊆ BND(X);

If R(a_N | [x]) ≤ R(a_P | [x]) and R(a_N | [x]) ≤ R(a_B | [x]), decide [x] ⊆ NEG(X).
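The minimal-risk rules above amount to choosing, for each equivalence class, the action with the smallest expected cost. A minimal Python sketch follows; the loss values are those of Table 4 of the example system given later, and the function names and tie-breaking order are our assumptions.

```python
def expected_costs(p, loss):
    """Expected costs of actions a_P, a_B, a_N given p = P(X | [x]).
    loss maps (action, state) pairs to λ values, e.g. ('P','N') -> λ_PN."""
    return {
        "P": p * loss[("P", "P")] + (1 - p) * loss[("P", "N")],
        "B": p * loss[("B", "P")] + (1 - p) * loss[("B", "N")],
        "N": p * loss[("N", "P")] + (1 - p) * loss[("N", "N")],
    }

def decide_region(p, loss):
    """Pick the action with minimal expected cost (ties broken as P, B, N)."""
    costs = expected_costs(p, loss)
    return min(costs, key=lambda a: (costs[a], "PBN".index(a)))

# Loss functions from Table 4 of the example system.
loss = {("P","P"): 480, ("B","P"): 2895, ("N","P"): 6095,
        ("P","N"): 7846, ("B","N"): 4238, ("N","N"): 373}
```

For these losses, a high probability such as p = 0.9 selects the positive region, a low one such as p = 0.1 the negative region, and an intermediate one such as p = 0.58 the boundary region.
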

Consider a special kind of loss functions with

(3) λ_PP ≤ λ_BP < λ_NP,  λ_NN ≤ λ_BN < λ_PN.

That is, the cost of classifying an object x belonging to X into the positive region POS(X) is less than or equal to the cost of classifying x into the boundary region BND(X), and both of these costs are strictly less than the cost of classifying x into the negative region NEG(X). The reverse order of costs is used for classifying an object that does not belong to X. The decision rules can be reexpressed as follows:

If P(X | [x]) ≥ α and P(X | [x]) ≥ γ, decide x ∈ POS(X);

If P(X | [x]) ≤ α and P(X | [x]) ≥ β, decide x ∈ BND(X);

If P(X | [x]) ≤ β and P(X | [x]) ≤ γ, decide x ∈ NEG(X),

where the parameters α, β, and γ are defined as

(4) α = (λ_PN − λ_BN) / ((λ_PN − λ_BN) + (λ_BP − λ_PP)),
    β = (λ_BN − λ_NN) / ((λ_BN − λ_NN) + (λ_NP − λ_BP)),
    γ = (λ_PN − λ_NN) / ((λ_PN − λ_NN) + (λ_NP − λ_PP)).

When

(5) (λ_PN − λ_BN)(λ_NP − λ_BP) > (λ_BP − λ_PP)(λ_BN − λ_NN),

we have 0 ≤ β < γ < α ≤ 1. After tie-breaking, the simplified rules are obtained as follows:

If P(X | [x]) ≥ α, decide x ∈ POS(X);

If β < P(X | [x]) < α, decide x ∈ BND(X);

If P(X | [x]) ≤ β, decide x ∈ NEG(X).
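Formulas (4) and the simplified rules can be sketched directly in Python. The loss values below are taken from Table 4 of the example system given later; the function names are our own.

```python
# Loss functions from Table 4 of the example system.
loss = {("P","P"): 480, ("B","P"): 2895, ("N","P"): 6095,
        ("P","N"): 7846, ("B","N"): 4238, ("N","N"): 373}

def thresholds(loss):
    """Compute (alpha, beta, gamma) of formula (4)."""
    lPP, lBP, lNP = loss[("P","P")], loss[("B","P")], loss[("N","P")]
    lPN, lBN, lNN = loss[("P","N")], loss[("B","N")], loss[("N","N")]
    alpha = (lPN - lBN) / ((lPN - lBN) + (lBP - lPP))
    beta  = (lBN - lNN) / ((lBN - lNN) + (lNP - lBP))
    gamma = (lPN - lNN) / ((lPN - lNN) + (lNP - lPP))
    return alpha, beta, gamma

def classify(p, alpha, beta):
    """Apply the simplified three-way rules to p = P(X | [x])."""
    if p >= alpha:
        return "POS"
    if p <= beta:
        return "NEG"
    return "BND"

alpha, beta, gamma = thresholds(loss)
# These losses satisfy (5), so beta < gamma < alpha holds.
```
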

Let π_D = {D_1, D_2, …, D_m} denote the partition of the universe U induced by D. Based on the thresholds (α, β), one can divide the universe U into three regions of the decision partition π_D:

(6) POS_A^(α,β)(D) = {x ∈ U | P(D_max([x]_A) | [x]_A) ≥ α},
    BND_A^(α,β)(D) = {x ∈ U | β < P(D_max([x]_A) | [x]_A) < α},
    NEG_A^(α,β)(D) = {x ∈ U | P(D_max([x]_A) | [x]_A) ≤ β},

where D_max([x]_A) = arg max_{D_i ∈ π_D} {|[x]_A ∩ D_i| / |[x]_A|}.

Let p = P(D_max([x]_A) | [x]_A); then the Bayesian expected costs of the positive rule, the boundary rule, and the negative rule can be expressed, respectively, as

(7) p · λ_PP + (1 − p) · λ_PN;  p · λ_BP + (1 − p) · λ_BN;  p · λ_NP + (1 − p) · λ_NN.

3. Minimal-Total-Cost Attribute Reduction in Decision-Theoretic Rough Set Models

In this section, we focus on cost-sensitive attribute reduction based on test costs and decision costs in DTRS models. The objective of attribute reduction is to minimize the total cost by considering a tradeoff between test costs and decision costs. Minimizing the total cost is equivalent to minimizing the average total cost (ATC), so we study the minimal average-total-cost reduct problem.

Test cost is intrinsic to data, and a number of test-cost-sensitive decision systems exist; a corresponding hierarchy consisting of six models has been proposed. Here, we consider only the test-cost-independent decision system, which is the simplest and most widely used model.

Definition 2 (see [21]).

A test-cost-independent decision system (TCI-DS) S is the 6-tuple

(8) S = (U, C, D, V, I, tc),

where U, C, D, V, and I have the same meanings as in a DS and tc : C → ℝ⁺ is the test cost function. Test costs are independent of one another; that is, tc(A) = Σ_{a ∈ A} tc(a) for any A ⊆ C.

By introducing test cost into DTRS models, we can obtain the following definition.

Definition 3.

A test-cost-independent cost-sensitive decision system in DTRS models (DTRS-TCI-CDS) S is the 7-tuple

(9) S = (U, C, D, V, I, tc, (λ_ij)_{2×3}),

where U, C, D, V, I, and tc have the same meanings as in Definition 2 and (λ_ij)_{2×3} is the loss function matrix listed in Table 1, where i ∈ {P, B, N} and j ∈ {P, N}.

An example of a DTRS-TCI-CDS is given in Tables 2, 3, and 4: a decision system with U = {x_1, x_2, …, x_9} and C = {a_1, a_2, …, a_6}, a corresponding test cost vector, and a corresponding loss function matrix, respectively.

An example decision system.

a 1 a 2 a 3 a 4 a 5 a 6 D
x 1 1 1 1 1 1 1 d 1
x 2 1 0 1 0 1 1 d 1
x 3 0 1 1 1 0 0 d 2
x 4 1 1 1 0 0 1 d 2
x 5 0 0 1 1 0 1 d 2
x 6 1 0 1 0 1 1 d 3
x 7 0 0 0 1 1 0 d 3
x 8 1 0 1 0 1 1 d 3
x 9 0 0 1 1 0 1 d 3

An example test cost vector.

a 1 a 2 a 3 a 4 a 5 a 6
tc(a_i) $92 $9 $96 $81 $87 $54

An example loss function matrix.

a P a B a N
X 480 2895 6095
X c 7846 4238 373

For a given DTRS-TCI-CDS and A ⊆ C, the decision cost is composed of the three types of cost formulated in (7), so it can be expressed as

(10) dc(U, A) = Σ_{p_i ≥ α} (p_i · λ_PP + (1 − p_i) · λ_PN) + Σ_{β < p_j < α} (p_j · λ_BP + (1 − p_j) · λ_BN) + Σ_{p_k ≤ β} (p_k · λ_NP + (1 − p_k) · λ_NN),

where p_i = P(D_max([x_i]_A) | [x_i]_A). According to (6), we can rewrite the decision cost as

(11) dc(U, A) = Σ_{x_i ∈ POS_A^(α,β)(D)} (p_i · λ_PP + (1 − p_i) · λ_PN) + Σ_{x_j ∈ BND_A^(α,β)(D)} (p_j · λ_BP + (1 − p_j) · λ_BN) + Σ_{x_k ∈ NEG_A^(α,β)(D)} (p_k · λ_NP + (1 − p_k) · λ_NN).

Obviously, the average decision cost is

(12) dc̄(U, A) = dc(U, A) / |U|.

Because the test cost of every object is the same for the test set A, the average total cost (ATC) is given by

(13) ATC(U, A) = tc(A) + dc̄(U, A).
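To make formulas (10)–(13) concrete, the following Python sketch evaluates the average total cost on the example system of Tables 2–4. The function and variable names are ours, and the exact figures produced depend on how D_max and the thresholds are instantiated; this is an illustrative transcription, not the authors' implementation.

```python
from collections import defaultdict

# Decision system of Table 2 (columns a1..a6, then the decision D).
rows = [
    (1,1,1,1,1,1,"d1"), (1,0,1,0,1,1,"d1"), (0,1,1,1,0,0,"d2"),
    (1,1,1,0,0,1,"d2"), (0,0,1,1,0,1,"d2"), (1,0,1,0,1,1,"d3"),
    (0,0,0,1,1,0,"d3"), (1,0,1,0,1,1,"d3"), (0,0,1,1,0,1,"d3"),
]
attrs = ["a1", "a2", "a3", "a4", "a5", "a6"]
tcost = {"a1": 92, "a2": 9, "a3": 96, "a4": 81, "a5": 87, "a6": 54}  # Table 3
loss = {("P","P"): 480, ("B","P"): 2895, ("N","P"): 6095,            # Table 4
        ("P","N"): 7846, ("B","N"): 4238, ("N","N"): 373}

# Thresholds alpha and beta from formula (4).
alpha = (loss[("P","N")] - loss[("B","N")]) / \
        ((loss[("P","N")] - loss[("B","N")]) + (loss[("B","P")] - loss[("P","P")]))
beta = (loss[("B","N")] - loss[("N","N")]) / \
       ((loss[("B","N")] - loss[("N","N")]) + (loss[("N","P")] - loss[("B","P")]))

def average_total_cost(A):
    """ATC(U, A) = tc(A) + dc(U, A)/|U|, following (10)-(13)."""
    idx = [attrs.index(a) for a in A]
    blocks = defaultdict(list)
    for r in rows:                      # equivalence classes under A
        blocks[tuple(r[i] for i in idx)].append(r[6])
    dc = 0.0
    for labels in blocks.values():
        p = max(labels.count(d) for d in set(labels)) / len(labels)  # P(D_max | [x]_A)
        if p >= alpha:
            per = p * loss[("P","P")] + (1 - p) * loss[("P","N")]
        elif p <= beta:
            per = p * loss[("N","P")] + (1 - p) * loss[("N","N")]
        else:
            per = p * loss[("B","P")] + (1 - p) * loss[("B","N")]
        dc += per * len(labels)         # every object in the block incurs `per`
    return sum(tcost[a] for a in A) + dc / len(rows)
```

Under this transcription, for example, reducing to the single cheap test a_2 trades a much smaller test cost against a higher decision cost than the full attribute set.
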

As in earlier work, we study decreasing-cost attribute reduction to avoid the interpretation difficulties of region-preservation-based definitions. The decreasing average-total-cost attribute reduct is defined as follows.

Definition 4.

In a DTRS-TCI-CDS S = (U, C, D, V, I, tc, (λ_ij)_{2×3}), R ⊆ C is a decreasing average-total-cost attribute reduct if and only if

(1) ATC_R ≤ ATC_C;

(2) ∀R′ ⊂ R, ATC_{R′} > ATC_R.

According to this definition, we choose subsets of C that ensure that the ATC is decreased or unchanged for decision making during attribute reduction.

In most situations, users want to obtain the smallest total cost in the classification procedure, so we propose an optimization problem whose objective is to minimize the average total classification cost. The attribute set that minimizes the ATC is called a minimal average-total-cost reduct (MACR), and the optimization problem is called the MACR problem. We define them as follows.

Definition 5.

In a DTRS-TCI-CDS S = (U, C, D, V, I, tc, (λ_ij)_{2×3}), R ⊆ C is a MACR if and only if

(14) R = arg min_{A ⊆ C} {ATC_A}.

Definition 6.

The MACR problem:

input: S = ( U , C , D , V , I , t c , ( λ i j ) 2 × 3 ) ;

output: A C ;

optimization objective: min ATC(U, A).

If we set tc(a) = c for all a ∈ C, where c is a constant, the MACR problem reduces to the minimal-decision-cost attribute reduct problem, so the former is a generalization of the latter.

4. Algorithms

Since the MACR problem is combinatorial and an optimal solution cannot easily be obtained efficiently, we use a heuristic approach to obtain an approximately optimal solution. However, to evaluate the quality of the solution produced by a heuristic algorithm, we must first find an optimal reduct, so an exhaustive algorithm is also needed. In this section, we propose a backtracking algorithm and a δ-weighted heuristic algorithm for the MACR problem.

4.1. The Backtracking Attribute Reduction Algorithm

The backtracking algorithm is illustrated in Algorithm 1. In order to invoke this backtracking algorithm, several global variables should be explicitly initialized as follows:

R = ∅, the candidate reduct with minimal average total cost;

cmc = dc̄(U, R), the currently minimal average total cost;

l = 0, the current level's test index lower bound.

Algorithm 1: A backtracking algorithm for the MACR problem.

Input: (U, C, D, {V_a}, {I_a}, tc, (λ_ij)_{2×3}), selected tests R,

current level test index lower bound l

Output: R and cmc , they are global variables

Method: backtracking

(1) for ( i = l ; i < | C | ; i ++) do

(2)    A = R ∪ {a_i}

(3)   if (tc(A) ≥ cmc) then

(4)    continue; //Pruning for too expensive test costs

(5)   end if

(6)  if   ( ATC ( U , A ) < cmc )   then

(7)      c m c = ATC ( U , A ) ; //Update the minimal average total cost

(8)      R = A ; //Update the minimal total cost reduct

(9)   end if

(10) backtracking ( A , i + 1 );

(11) end for

The backtracking algorithm is invoked as backtracking(R, l); a reduct with minimal ATC is stored in R at the end of the execution. In general, the search space of an attribute reduction algorithm has size 2^|C|. To reduce the search space, we employ the pruning technique shown in lines 3 through 5 of Algorithm 1: the attribute subset A is discarded if its test cost is not less than the currently minimal average total cost (cmc), because decision costs are nonnegative in real applications.

Note that the total cost may decrease as attributes are added, which means that the ATC under an attribute set may be less than that under some of its subsets. This differs from previous works that considered only test cost, in which the test cost increases as more attributes are selected. The following example gives an intuitive illustration.

Example 7.

Take the DTRS-TCI-CDS listed in Tables 2–4 as an example. By computation, we find that the ATC is 3974.8 when the selected attribute set is A = {a_4}, while the ATC is reduced to 3346.2 when A = {a_4, a_6}.

Therefore, regardless of whether the currently selected attribute subset A satisfies ATC(U, A) < cmc, A continues to expand in the search for a minimal ATC, as shown in line 10 of Algorithm 1.
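The search strategy of Algorithm 1 can be sketched in Python as follows. The instance data (a mock mapping from subsets to average decision costs) are hypothetical and chosen only to exercise the pruning and the nonmonotonic expansion; the function names are ours.

```python
def backtrack(R, l, attrs, atc, tc, best):
    """Sketch of Algorithm 1: depth-first search over attribute subsets
    with the test-cost pruning of lines 3-5. `best` holds the incumbent
    reduct best["R"] and its cost best["cmc"]."""
    for i in range(l, len(attrs)):
        A = R | {attrs[i]}
        if tc(A) >= best["cmc"]:
            continue                     # prune: test cost alone already >= cmc
        cost = atc(A)
        if cost < best["cmc"]:
            best["cmc"], best["R"] = cost, A        # update the incumbent
        backtrack(A, i + 1, attrs, atc, tc, best)   # keep expanding: ATC is nonmonotonic

# Hypothetical instance: dcbar maps each subset to a mock average decision cost.
costs = {"a": 5, "b": 3, "c": 8}
dcbar = {frozenset(): 100, frozenset("a"): 60, frozenset("b"): 70,
         frozenset("c"): 55, frozenset("ab"): 20, frozenset("ac"): 40,
         frozenset("bc"): 30, frozenset("abc"): 15}

def tc(A): return sum(costs[x] for x in A)
def atc(A): return tc(A) + dcbar[frozenset(A)]

best = {"R": frozenset(), "cmc": atc(frozenset())}
backtrack(frozenset(), 0, ["a", "b", "c"], atc, tc, best)
# For this instance, best["R"] == {"a", "b"} with best["cmc"] == 28
```

Note how {a, b} beats every singleton even though it costs more to test, mirroring Example 7: ATC is nonmonotonic, so the recursion continues past subsets that fail to improve cmc.
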

4.2. The δ-Weighted Heuristic Attribute Reduction Algorithm

The δ-weighted heuristic attribute reduction algorithm is listed in Algorithm 2; its framework contains two main steps. Let A denote the set of currently selected attributes. First, we repeatedly merge the current best attribute subset S ⊆ (C − A) into A according to the heuristic attribute significance function f(A, S, tc) until A becomes a super-reduct; this step is essentially attribute addition. Then, we delete attributes a from A so that A attains the current minimal total cost.

Algorithm 2: An addition-deletion cost-sensitive attribute reduction algorithm.

Input: ( U , C , D , { V a } , { I a } , t c , ( λ i j ) 2 × 3 )

Output: The reduct A

Method:

(1) A = ;

(2)  C A = C ;

(3) while (POS_A^(α,β)(D) ≠ POS_C^(α,β)(D)) do

(4)  n = 1 ; // n controls the dimension of S

(5) for (each S ⊆ C_A) do

(6)  if    ( | S | = n )   then

(7)   Compute f ( A , S , t c ) ;

(8)  end if

(9)  end for

(10) Select S with the maximal f ( A , S , t c ) ;

(11)  if (f(A, S, tc) ≤ 0) then

(12)   n ++;

(13)  Go to line 5;

(14)  else

(15)   A = A ∪ S; C_A = C_A − S;

(16)  end if

(17) end while

//Deletion

(18) while (ATC(U, A) > ATC(U, A − {a}) for some a ∈ A) do

(19)  for  (each a A )  do

(20)  Compute ATC ( U , A - { a } ) ;

(21)  end for

(22)  Select a with the minimal ATC ( U , A - { a } ) ;

(23)   A = A - { a } ;

(24) end while

(25) return A ;

Lines 4 through 13 contain the key code of the addition step. There are two main differences from existing works [22, 25, 29]. One is the heuristic attribute significance function: we propose the δ-weighted attribute significance function

(15) f(A, S, tc) = (|POS_{A∪S}^(α,β)(D)| − |POS_A^(α,β)(D)|) × (1 + 1 / (tc(a_{k_1}) · tc(a_{k_2}) ⋯ tc(a_{k_|S|}))^{1/|S|}),

where the a_{k_i} are the attributes in S and tc(a_{k_i}) is the test cost of a_{k_i}.

The other difference lies in the computation steps. Initially the dimension of S is 1; that is, we test the remaining attributes one by one. However, since the positive region may shrink as attributes are added in DTRS models, it can happen that |POS_{A∪{a_i}}^(α,β)(D)| − |POS_A^(α,β)(D)| ≤ 0 for every a_i ∈ C_A simultaneously. In this case, no single attribute can expand the current POS_A^(α,β)(D) toward POS_A^(α,β)(D) ⊇ POS_C^(α,β)(D). To address this situation, we gradually increase the dimension of S, that is, consider several attributes simultaneously, and compute the corresponding values of the attribute significance function until at least one value is greater than 0.
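The significance function can be sketched as follows, under one reading of the cost weight in (15) as the reciprocal of the geometric mean of the test costs in S; the gain values below are hypothetical, and the function names are ours.

```python
def significance(pos_gain, s_costs):
    """f(A, S, tc) = (|POS_{A∪S}| - |POS_A|) * (1 + 1/g), where g is the
    geometric mean of the test costs of the attributes in S (our reading
    of formula (15))."""
    g = 1.0
    for c in s_costs:
        g *= c
    g **= 1.0 / len(s_costs)
    return pos_gain * (1.0 + 1.0 / g)

# Choosing the best singleton (the n = 1 case of lines 5-10):
gains = {"a2": 2, "a4": 2, "a6": 0}     # hypothetical positive-region gains
tcost = {"a2": 9, "a4": 81, "a6": 54}   # test costs from Table 3
best = max(gains, key=lambda a: significance(gains[a], [tcost[a]]))
# a2 wins: the same region gain as a4, but a much cheaper test
```

When every singleton has nonpositive gain, the same function is evaluated on pairs, triples, and so on, which is exactly the dimension-raising loop of lines 11 through 13.
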

5. Experiments

In this section, the performance of our two algorithms is studied. We try to answer the following questions by experimentation.

Are both the backtracking algorithm and the heuristic algorithm efficient?

Is the heuristic algorithm effective for the MACR problem?

Are both algorithms appropriate for classification?

5.1. Data Generation

Experiments are undertaken on four datasets obtained from the UCI library. The basic information of the datasets is listed in Table 5, where |C| is the number of condition attributes, |U| is the number of instances, and D is the name of the decision. Note that missing values in these datasets (e.g., in the Voting dataset) are treated as a particular value; that is, "?" is equal to itself and unequal to any other value.

Dataset information.

Name Domain |C| |U| D
Tic-tac-toe Game 9 958 Class
Voting Society 16 435 Vote
Zoo Zoology 16 101 Type
Mushroom Botany 22 8124 Class

Since the datasets mentioned above contain no intrinsic test costs or loss functions, we create these data for the experiments. First, we generate test costs, which are always represented by positive integers: for each condition attribute a_i, tc(a_i) is set to a random integer in [1, 100] subject to the uniform distribution discussed previously. Then, we produce loss functions λ_ij (i ∈ {P, B, N}, j ∈ {P, N}), which are random nonnegative integers satisfying (3) and (5). Since loss functions are often larger than test costs in real life, we set the average of the λ_ij to lie in [100, 5000]; these assumptions about cost values could easily be changed if necessary. To observe whether the algorithm efficiency is influenced by the ratio of loss functions to test costs, the experiments below are undertaken with two groups of cost settings for each dataset listed in Table 5. Each group contains 100 different cost settings; the test costs in both groups are the same, but the loss functions differ. The average values of the loss functions (ALF) in group 1 and group 2 are around 500 and 3000, respectively. Experiments are run on a PC with a 2.20 GHz Intel CPU and 4 GB of memory.
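The generation procedure above can be sketched as follows. The sampling ranges and the rejection loop are our assumptions about how random nonnegative integers satisfying (3) and (5) might be drawn; the function name is ours.

```python
import random

def generate_setting(attrs, alf, rng):
    """Generate one cost setting: uniform integer test costs in [1, 100]
    and random nonnegative integer loss functions satisfying conditions
    (3) and (5). `alf` steers the average magnitude of the losses."""
    tc = {a: rng.randint(1, 100) for a in attrs}
    while True:
        # sorted() enforces the orderings lPP <= lBP <= lNP and lNN <= lBN <= lPN
        lPP, lBP, lNP = sorted(rng.randint(0, 2 * alf) for _ in range(3))
        lNN, lBN, lPN = sorted(rng.randint(0, 2 * alf) for _ in range(3))
        ok3 = lBP < lNP and lBN < lPN                                 # condition (3)
        ok5 = (lPN - lBN) * (lNP - lBP) > (lBP - lPP) * (lBN - lNN)   # condition (5)
        if ok3 and ok5:
            return tc, {("P","P"): lPP, ("B","P"): lBP, ("N","P"): lNP,
                        ("N","N"): lNN, ("B","N"): lBN, ("P","N"): lPN}

tc, loss = generate_setting([f"a{i}" for i in range(1, 7)], 500, random.Random(0))
```

Rejection sampling keeps the generator simple: invalid draws are discarded until both inequalities hold, which happens quickly for these ranges.
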

5.2. Efficiencies of the Two Algorithms

We study the efficiency of both algorithms using two metrics. The first is the number of backtrack steps taken by Algorithm 1; comparing it with the size of the search space indicates the efficiency of the backtracking algorithm. The second is the run-time comparison between the two algorithms, which is used to study the efficiency of the heuristic algorithm. The search space size and the average number of backtrack steps for Algorithm 1 are given in Table 6, and the average run-times of both algorithms are shown in Table 7, where run-time is measured in milliseconds.

The average number of backtrack steps for Algorithm 1.

Dataset Search space size Average backtrack steps
 ALF ≈ 500 ALF ≈ 3000
Tic-tac-toe 2^9 130.68 406.34
Voting 2^16 774.18 4873.3
Zoo 2^16 547.05 5134.96
Mushroom 2^22 3758.50 307682.57

Average run-time comparison.

Dataset ALF 500 ALF 3000
Algorithm 1 Algorithm 2 Algorithm 1 Algorithm 2
Tic-tac-toe 591.7 206.7 1376.14 177.16
Voting 975.93 203.77 3845.7 162
Zoo 252.77 1.73 2327.02 1.42
Mushroom 9541.24 807.91 64172.5 835.72

From the results, we note the following.

In both groups, the number of backtrack steps is much smaller than the search space size, which demonstrates the effectiveness of the pruning technique in Algorithm 1.

As ALF increases, both the number of backtrack steps and the run-time of Algorithm 1 grow, which means that the efficiency of the backtracking algorithm is influenced by the ratio of loss functions to test costs. The reason is that when the loss functions are much larger than the test costs, the currently minimal ATC, namely, cmc in Algorithm 1, is also high compared with the current test costs. In this case, the pruning technique shown in lines 3 through 5 of Algorithm 1 has little effect.

The run-time of Algorithm 2 is small compared with that of Algorithm 1, especially for the Zoo dataset, so the heuristic algorithm is very efficient. Moreover, its run-time remains stable as ALF increases.

In short, the heuristic algorithm is highly efficient. Although the backtracking algorithm is sometimes less efficient, it is still needed to evaluate the solution quality of the heuristic algorithm.

5.3. Effectiveness of the Two Algorithms

In this part, we evaluate the effectiveness of both algorithms using four metrics. First, two previously defined metrics, namely, the finding optimal factor (FOF) and the average exceeding factor (AEF), are computed to measure the performance of the heuristic algorithm from the perspective of cost. In these computations, the results of the backtracking algorithm serve as the optimal baseline. The results for the two metrics are shown in Figure 1.

Effectiveness of the heuristic algorithm for ALF ≈ 500 and ALF ≈ 3000, measured by two metrics. (a) The finding optimal factor (FOF) is the fraction of experiments in which an optimal reduct was found; the higher the FOF, the better the heuristic algorithm. All FOF values are above 0.5. (b) The average exceeding factor (AEF) is the average fraction by which the obtained cost exceeds the minimal average total cost; the lower the AEF, the better the algorithm. All AEF values are below 0.1.

From the results, we note the following.

The values of FOF and AEF do not differ significantly between ALF ≈ 500 and ALF ≈ 3000, which suggests that the performance of the heuristic algorithm is little influenced by the ratio of loss functions to test costs.

All FOF values are above 0.5, and all AEF values are below 0.1; in other words, the results are acceptable.
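The two metrics can be sketched as follows, paraphrasing the definitions given in the caption of Figure 1; the sample ATC values and function names are hypothetical.

```python
def finding_optimal_factor(heuristic, optimal):
    """FOF: fraction of cost settings where the heuristic cost matched
    the optimal cost found by backtracking."""
    return sum(h == o for h, o in zip(heuristic, optimal)) / len(optimal)

def average_exceeding_factor(heuristic, optimal):
    """AEF: average relative amount by which the heuristic cost exceeds
    the minimal average total cost."""
    return sum((h - o) / o for h, o in zip(heuristic, optimal)) / len(optimal)

# Hypothetical ATC values over four cost settings:
opt = [100.0, 80.0, 50.0, 120.0]
heu = [100.0, 88.0, 50.0, 120.0]
# FOF = 3/4; AEF = (0 + 0.1 + 0 + 0)/4 = 0.025
```
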

Then, we compare the classification performance on the original data and on the reduced data obtained by our two algorithms, based on 10-fold cross-validation, with CART and RBF-kernel SVM as learning algorithms. The results are shown in Tables 8 and 9. We also compare the average numbers of selected attributes in Table 10.

Classification performance comparison with CART classifier.

Dataset Raw data ALF 500 ALF 3000
Algorithm 1 Algorithm 2 Algorithm 1 Algorithm 2
Tic-tac-toe 93.72% 87.48% 87.18% 91.6% 89.4%
Voting 95.47% 90.34% 86.37% 94.07% 87.38%
Zoo 90.69% 82.85% 78.92% 89.65% 85.52%
Mushroom 99.96% 95.18% 90.51% 95.82% 90.29%

Classification performance comparison with RBF-kernel SVM classifier.

Dataset Raw data ALF 500 ALF 3000
Algorithm 1 Algorithm 2 Algorithm 1 Algorithm 2
Tic-tac-toe 87.59% 82.16% 81.8% 85.97% 84.28%
Voting 95.38% 87.35% 83.4% 92.13% 84.36%
Zoo 90.1% 83.73% 79.27% 89.52% 85.8%
Mushroom 100% 97.21% 90.38% 95.79% 91.88%

The comparison of the average numbers of selected attributes.

Dataset Raw data ALF 500 ALF 3000
Algorithm 1 Algorithm 2 Algorithm 1 Algorithm 2
Tic-tac-toe 9 1.31 1.34 5.59 4.11
Voting 16 1.29 1.06 2.6 1.6
Zoo 16 1.98 1.41 3.51 2.59
Mushroom 22 1.57 1.26 6.95 5.32

From the results, we observe the following.

The classification accuracies of our algorithms are slightly lower than those on the raw data, but the numbers of selected attributes are effectively reduced, which is consistent with the essence of DTRS models. Unlike in the classical rough set model, classification error is acceptable within a certain range according to the thresholds in DTRS models; consequently, the effectiveness of reduction is improved.

As ALF increases, the number of selected attributes grows for all datasets, and the classification performance improves on most datasets. This means that the tolerance for classification error decreases as the classification costs increase.

For all datasets, the classification performance of Algorithm 1 is slightly better than that of Algorithm 2, while Algorithm 1 selects more attributes in most cases.

6. Conclusions

In this paper, we address the cost-sensitive attribute reduction problem in DTRS models. By considering the tradeoff between decision costs and test costs, the minimal average-total-cost attribute reduct is defined and the corresponding optimization problem is proposed. Both backtracking and heuristic algorithms are designed for the optimization problem, and experimental results demonstrate their efficiency and effectiveness. By combining test costs with the existing elements of DTRS models, such as loss functions and probabilistic approaches, our model is practical in real applications.

The following research topics deserve further investigation.

The MACR problem could be revisited on more complicated test-cost-sensitive decision systems, such as the simple common-test-cost DS and the complex common-test-cost DS; the corresponding algorithms may also be more complicated.

Sometimes the affordable costs are limited. One could consider the attribute reduction problem with a test cost constraint or a total cost constraint in DTRS models.

Recently, from the viewpoint of rough set theory, Yao [30, 31] has discussed three-way decisions, which may have many real-world applications. One could explore the cost-sensitive attribute reduction problem for three-way decisions with decision-theoretic rough sets.

In summary, this study suggests new research trends concerning decision-theoretic rough set theory, attribute reduction problem, and cost-sensitive learning applications.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China under Grant nos. 61379089, 61379049, and 61170128, and by the Education Department of Fujian Province under Grant no. JA12224.

References

[1] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, Princeton University Press, Princeton, NJ, USA, 1944.
[2] D. Fudenberg and J. Tirole, Game Theory, MIT Press, Cambridge, Mass, USA, 1991.
[3] M. Perc and A. Szolnoki, "Coevolutionary games—a mini review," BioSystems, vol. 99, no. 2, pp. 109–125, 2010.
[4] M. Perc, J. Gómez-Gardeñes, A. Szolnoki, L. M. Floría, and Y. Moreno, "Evolutionary dynamics of group interactions on structured populations: a review," Journal of the Royal Society Interface, vol. 10, no. 80, Article ID 20120997, 2013.
[5] M. Perc and P. Grigolini, "Collective behavior and evolutionary games—an introduction," Chaos, Solitons & Fractals, vol. 56, pp. 1–5, 2013.
[6] A. Szolnoki, M. Perc, and G. Szabó, "Defense mechanisms of empathetic players in the spatial ultimatum game," Physical Review Letters, vol. 109, no. 4, Article ID 078701, 2012.
[7] A. Szolnoki, M. Perc, and G. Szabó, "Accuracy in strategy imitations promotes the evolution of fairness in the spatial ultimatum game," Europhysics Letters, vol. 100, no. 2, Article ID 28005, 2012.
[8] Z. Pawlak, "Rough sets," International Journal of Computer & Information Sciences, vol. 11, no. 5, pp. 341–356, 1982.
[9] Z. Pawlak and A. Skowron, "Rough sets: some extensions," Information Sciences, vol. 177, no. 1, pp. 28–40, 2007.
[10] Z. Pawlak and A. Skowron, "Rudiments of rough sets," Information Sciences, vol. 177, no. 1, pp. 3–27, 2007.
[11] W. Ziarko, "Variable precision rough set model," Journal of Computer and System Sciences, vol. 46, no. 1, pp. 39–59, 1993.
[12] Y. Y. Yao, S. K. M. Wong, and P. Lingras, "A decision-theoretic rough set model," in Methodologies for Intelligent Systems, 5 (Knoxville, TN, 1990), pp. 17–24, North-Holland, New York, NY, USA, 1990.
[13] Y. Y. Yao and S. K. M. Wong, "A decision theoretic framework for approximating concepts," International Journal of Man-Machine Studies, vol. 37, no. 6, pp. 793–809, 1992.
[14] S. Greco, B. Matarazzo, and R. Slowinski, "Rough set approach to decisions under risk," in Proceedings of the 2nd International Conference on Rough Sets and Current Trends in Computing (RSCTC '00), Lecture Notes in Computer Science, pp. 160–169, 2001.
[15] S. Greco, B. Matarazzo, and R. Slowinski, "Rough sets theory for multicriteria decision analysis," European Journal of Operational Research, vol. 129, no. 1, pp. 1–47, 2001.
[16] A. Skowron, S. Ramanna, and J. F. Peters, "Conflict analysis and information systems: a rough set approach," in Proceedings of the 1st International Conference on Rough Sets and Knowledge Technology (RSKT '06), vol. 4062 of Lecture Notes in Computer Science, pp. 233–240, 2006.
[17] Y. Yao and Y. Zhao, "Attribute reduction in decision-theoretic rough set models," Information Sciences, vol. 178, no. 17, pp. 3356–3373, 2008.
[18] Y. Zhao, S. K. M. Wong, and Y. Yao, "A note on attribute reduction in the decision-theoretic rough set model," in Transactions on Rough Sets XIII, vol. 6499 of Lecture Notes in Computer Science, pp. 260–275, 2011.
[19] H. Li, X. Zhou, J. Zhao, and D. Liu, "Attribute reduction in decision-theoretic rough set model: a further investigation," in Proceedings of the 6th International Conference on Rough Sets and Knowledge Technology (RSKT '11), vol. 6954 of Lecture Notes in Computer Science, pp. 466–475, 2011.
[20] X. Jia, W. Liao, Z. Tang, and L. Shang, "Minimum cost attribute reduction in decision-theoretic rough set models," Information Sciences, vol. 219, pp. 151–167, 2013.
[21] F. Min and Q. Liu, "A hierarchical model for test-cost-sensitive decision systems," Information Sciences, vol. 179, no. 14, pp. 2442–2452, 2009.
[22] F. Min, H. He, Y. Qian, and W. Zhu, "Test-cost-sensitive attribute reduction," Information Sciences, vol. 181, no. 22, pp. 4928–4942, 2011.
[23] F. Min, Q. Hu, and W. Zhu, "Feature selection with test cost constraint," International Journal of Approximate Reasoning, vol. 55, no. 1, part 2, pp. 167–179, 2014.
[24] S. Liao, J. Liu, F. Min, and W. Zhu, "Minimal-test-cost reduct problem on neighborhood decision systems," Journal of Information and Computational Science, vol. 9, no. 14, pp. 4083–4098, 2012.
[25] F. Min and W. Zhu, "Attribute reduction of data with error ranges and test costs," Information Sciences, vol. 211, pp. 48–67, 2012.
[26] H. Zhao, F. Min, and W. Zhu, "Test-cost-sensitive attribute reduction of data with normal distribution measurement errors," Mathematical Problems in Engineering, vol. 2013, Article ID 946070, 12 pages, 2013.
[27] F. Min, W. Zhu, H. Zhao, and G. Pan, "Coser: cost-sensitive rough sets," 2011, http://grc.fjzs.edu.cn/~fmin/coser/.
[28] C. L. Blake and C. J. Merz, "UCI repository of machine learning databases," 1998, http://www.ics.uci.edu/~mlearn/mlrepository.html.
[29] H. Zhao, F. Min, and W. Zhu, "Cost-sensitive feature selection of numeric data with measurement errors," Journal of Applied Mathematics, vol. 2013, Article ID 754698, 13 pages, 2013.
[30] Y. Yao, "Three-way decisions with probabilistic rough sets," Information Sciences, vol. 180, no. 3, pp. 341–353, 2010.
[31] Y. Yao, "The superiority of three-way decisions in probabilistic rough set models," Information Sciences, vol. 181, no. 6, pp. 1080–1096, 2011.