A Variable Precision Attribute Reduction Approach in Multilabel Decision Tables

Owing to the high dimensionality of multilabel data, feature selection is necessary in multilabel learning to remove redundant features and improve the performance of multilabel classification. Rough set theory, as a valid mathematical tool for data analysis, has been widely applied to feature selection (also called attribute reduction). In this study, we propose a variable precision attribute reduct for multilabel data based on rough set theory, called the δ-confidence reduct, which can correctly capture the uncertainty implied among labels. Furthermore, the judgement theorem and discernibility matrix associated with the δ-confidence reduct are also introduced, from which we obtain an approach to knowledge reduction in multilabel decision tables.


Introduction
Conventional supervised learning deals with single-label data, where each instance is associated with a single class label. However, in many real-world tasks, one instance may simultaneously belong to multiple class labels. For example, in text categorization problems, a document may be labeled with several predefined topics, such as religion and politics [1]; in image annotation problems, a photograph may be associated with more than one tag, such as elephant, jungle, and Africa [2]; in functional genomics, each gene may be related to a set of functional classes, such as metabolism, transcription, and protein synthesis [3]. Such data are called multilabel data.
Owing to the high dimensionality of multilabel data, feature selection is necessary in multilabel learning to remove redundant features and improve the performance of multilabel classification. Among various feature selection approaches, rough set theory, proposed by Pawlak [4], has attracted much attention due to its special advantage, that is, the capability of studying imprecise, incomplete, or vague information without requiring prior information.
Feature selection in rough set theory is also called attribute reduction. Generally speaking, attribute reduction can be interpreted as a process of finding a minimal set of attributes that preserves or improves one or several criteria. Such a minimal set of attributes is called an attribute reduct. In the past few years, many researchers have worked on attribute reduction, and important results are summarized in [5,6]. The idea of attribute reduction using the positive region originated in [7,8], aiming to remove as many redundant attributes as possible while retaining the so-called positive region. Afterwards, Ziarko introduced the variable precision rough set model and the corresponding reduct to improve the ability to model uncertain information [9]. Furthermore, Kryszkiewicz proposed five kinds of attribute reducts for inconsistent information systems [10], and the relationships among these five reducts, together with some related results, were reconsidered and rectified in [11]. Using the discernibility matrix, Skowron and Rauszer [12] proposed an attribute reduction algorithm based on computing a disjunctive normal form, which is able to obtain all attribute reducts of a given information system. On the other hand, to obtain a single reduct from a given information system in a relatively short time, many heuristic attribute reduction algorithms have been developed. To reduce computational time, Xu et al. [13] proposed a quick attribute reduction algorithm with complexity $\max(O(|C||U|), O(|C|^2 |U/C|))$, where $C$ denotes the set of condition attributes and $U$ the universe. Further, Qian et al. [14] developed a common accelerator based on four kinds of heuristic reduction algorithms to improve the time efficiency of a heuristic search process.
As far as we know, however, little work has been done on applying rough set theory to feature selection in multilabel learning. Although directly applying the existing attribute reduction methods to multilabel data is possible, doing so does not take into account the uncertainty conveyed by labels and thus can be enhanced further. In this paper, we propose a new attribute reduct for multilabel data, namely, the δ-confidence reduct, which overcomes the limitations of existing attribute reduction methods on multilabel data. Furthermore, the judgement theorem and discernibility matrix associated with the δ-confidence reduct are also established. These results provide approaches to knowledge reduction for multilabel data, which are significant from both the theoretical and applied perspectives.
The rest of this paper is organized as follows. Some basic notions in rough set theory are briefly reviewed in Section 2. Section 3 is devoted to introducing the multilabel decision table and analyzing the limitations of the existing attribute reduction methods on multilabel data. In Section 4, the new attribute reduct, the δ-confidence reduct, is proposed and the corresponding judgement theorem and discernibility matrix are also introduced. A computational example is also given to illustrate our approach. Finally, in Section 5, we conclude the paper with a summary and an outlook for further research.

Preliminaries
In this section, we will review several basic concepts in rough set theory.
A decision table is an information system $S = (U, A \cup D)$ with $A \cap D = \emptyset$, where $U = \{x_1, x_2, \ldots, x_n\}$ is a nonempty, finite set of objects called the universe; $A = \{a_1, a_2, \ldots, a_p\}$ is a nonempty, finite set of condition attributes; and $D = \{d_1, d_2, \ldots, d_q\}$ is a nonempty, finite set of decision attributes. Each nonempty subset $B \subseteq A$ determines an indiscernibility relation in the following way:
$$R_B = \{(x, y) \in U \times U : a(x) = a(y), \ \forall a \in B\}.$$
The equivalence class of $x$ under $R_B$ is written $[x]_B$. Let $X \subseteq U$ and $B \subseteq A$. One can define a lower approximation of $X$ and an upper approximation of $X$ by
$$\underline{R_B}(X) = \{x \in U : [x]_B \subseteq X\}, \qquad \overline{R_B}(X) = \{x \in U : [x]_B \cap X \neq \emptyset\},$$
respectively. The lower approximation is called the positive region of $X$ and denoted alternatively as $\mathrm{POS}_B(X)$. If $\underline{R_B}(X) \neq \overline{R_B}(X)$, then $X$ is called a rough set.
Attribute reduction is one of the most important topics in rough set theory; it aims to delete irrelevant or redundant attributes while retaining the discernibility of the original attributes. Among the many attribute reduction methods, the positive region reduct [7,8] is a representative one.
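The lower and upper approximations reviewed above can be illustrated with a minimal Python sketch. The toy table, the attribute names `a1` and `a2`, and the helper names `partition` and `lower_upper` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def partition(universe, table, attrs):
    """Group objects that agree on every attribute in attrs (classes of the indiscernibility relation)."""
    blocks = defaultdict(set)
    for x in universe:
        key = tuple(table[x][a] for a in attrs)
        blocks[key].add(x)
    return list(blocks.values())

def lower_upper(universe, table, attrs, target):
    """Return the (lower, upper) approximations of the object set target with respect to attrs."""
    lower, upper = set(), set()
    for block in partition(universe, table, attrs):
        if block <= target:   # class entirely inside the target set
            lower |= block
        if block & target:    # class overlapping the target set
            upper |= block
    return lower, upper

# Hypothetical toy data: four objects described by two condition attributes.
table = {
    "x1": {"a1": 0, "a2": 1},
    "x2": {"a1": 0, "a2": 1},
    "x3": {"a1": 1, "a2": 0},
    "x4": {"a1": 1, "a2": 1},
}
universe = sorted(table)
X = {"x1", "x3"}
lower, upper = lower_upper(universe, table, ["a1", "a2"], X)
is_rough = lower != upper
```

In this toy case `x1` and `x2` are indiscernible, so `x1` cannot be placed in the lower approximation of `X`, and `X` is a rough set.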
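A subset $B \subseteq A$ is a positive region reduct of the decision table $S = (U, A \cup D)$ if and only if $B$ satisfies the following conditions: (1) $\mathrm{POS}_B(D) = \mathrm{POS}_A(D)$; (2) $\mathrm{POS}_{B \setminus \{a\}}(D) \neq \mathrm{POS}_B(D)$ for any $a \in B$. Here $\mathrm{POS}_B(D)$ denotes the union of the positive regions $\mathrm{POS}_B(Y)$ over all decision classes $Y \in U/R_D$.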
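As a rough illustration of the positive region reduct, the sketch below enumerates attribute subsets by increasing size and keeps the minimal ones that preserve the positive region. It is a brute-force illustration on hypothetical data (attributes `a1` to `a3`, decision `d`), not one of the algorithms cited in this paper, and it assumes the empty attribute set is never a reduct.

```python
from collections import defaultdict
from itertools import combinations

def blocks(table, attrs):
    """Equivalence classes of objects that agree on all attributes in attrs."""
    groups = defaultdict(set)
    for x, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(x)
    return list(groups.values())

def positive_region(table, cond, dec):
    """Union of the cond-classes that are contained in a single decision class."""
    dec_blocks = blocks(table, dec)
    pos = set()
    for b in blocks(table, cond):
        if any(b <= d for d in dec_blocks):
            pos |= b
    return pos

def positive_region_reducts(table, cond, dec):
    """All minimal subsets of cond whose positive region equals that of cond."""
    full = positive_region(table, cond, dec)
    found = []
    for k in range(1, len(cond) + 1):
        for subset in combinations(cond, k):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if positive_region(table, list(subset), dec) == full:
                found.append(subset)
    return found

# Hypothetical toy decision table.
table = {
    "x1": {"a1": 0, "a2": 0, "a3": 0, "d": 0},
    "x2": {"a1": 0, "a2": 1, "a3": 1, "d": 1},
    "x3": {"a1": 1, "a2": 0, "a3": 1, "d": 1},
    "x4": {"a1": 1, "a2": 1, "a3": 0, "d": 0},
}
reducts = positive_region_reducts(table, ["a1", "a2", "a3"], ["d"])
```

The enumeration is exponential in the number of attributes, so it is only suitable for very small tables.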

The Multilabel Data
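In this section, we first introduce the multilabel decision table and then analyze the limitations of existing attribute reduction approaches to multilabel data.
A multilabel decision table is denoted by $S = (U, A, L)$, where $U$ is the universe, $A$ is the set of condition attributes, and $L = \{l_1, l_2, \ldots, l_m\}$ is the set of labels. The following two conventions of multilabel learning are adopted.
(1) An object having no labels is irrelevant to multilabel learning and thus is not taken into account in this setting [15,16]. Note that this convention is a prerequisite for the proposed approach, as discussed in Section 4.
(2) Each label from $L$ is associated with at least one object in $U$ [17].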
The following example depicts a multilabel decision table in more detail.

Example 2. A multilabel decision table $S = (U, A, L)$ is presented in Table 1, which is part of a document topic classification problem. It consists of nine documents that belong to one or more of three labels: religion, science, and politics. Note that each object in $U$ is associated with at least one label from $L$ and each label from $L$ is associated with at least one object in $U$.

The Limitations of Existing Attribute Reduction Approaches to Multilabel Data.
In this section, we mainly analyze the limitations of existing attribute reduction approaches to multilabel data. For a multilabel decision table $S = (U, A, L)$, each label attribute can be viewed as a binary decision attribute, so the label set $L$ forms an indiscernibility relation as follows:
$$R_L = \{(x, y) \in U \times U : l(x) = l(y), \ \forall l \in L\}.$$
$R_L$ partitions $U$ into a family of equivalence classes given by $U/R_L = \{Y_1, Y_2, \ldots, Y_r\}$. In this case, most existing attribute reduction approaches can be directly applied to multilabel data. Here we consider, for instance, the positive region reduct to delete redundant condition attributes in multilabel decision tables. The following example illustrates this process.
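Treating the label set as one composite decision attribute amounts to grouping objects by their whole label vector. A minimal sketch, with hypothetical objects and 0/1 label vectors that are not taken from Table 1:

```python
from collections import defaultdict

def label_partition(labels):
    """Group objects whose label vectors coincide (the classes induced by the label set)."""
    groups = defaultdict(list)
    for x, vec in labels.items():
        groups[tuple(vec)].append(x)
    return list(groups.values())

# Hypothetical label vectors over three labels, 1 = object carries the label.
labels = {
    "x1": (1, 0, 0),
    "x2": (1, 1, 0),
    "x3": (1, 0, 0),
    "x4": (0, 1, 1),
}
classes = label_partition(labels)
```

Note that `x1` and `x2` fall into different classes although they share the first label: the partition only records whether two label vectors are identical, not how much they overlap.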
Example 3. For the multilabel decision table $S = (U, A, L)$ given by Table 1, we can conclude that the equivalence class $X_1$ is uncertain with respect to the label set $L$. Furthermore, we can calculate that $X_3$ and $X_5$ are uncertain as well. Since $X_1$, $X_3$, and $X_5$ are all uncertain with respect to $L$, they can be safely merged without any information loss. In other words, removing either of two condition attributes is valid from the perspective of rough sets. Moreover, one can check that no more attributes can be removed from either of the two remaining attribute pairs; so both pairs are positive region reducts.
However, notice that all objects in $X_1$ must be associated with label $l_1$, may be associated with label $l_2$ with probability 1/2, and must not be associated with label $l_3$, whereas all objects in $X_5$ must not be associated with label $l_1$, must be associated with label $l_2$, and may be associated with label $l_3$ with probability 1/2. Thus, the uncertainty of $X_1$ is different from that of $X_5$.
Through the above analysis, we know that some positive region reducts cannot preserve the uncertainty implied among labels for multilabel data. In fact, since the computation of the positive region reduct has to refer to the indiscernibility relation $R_L$, the uncertainty conveyed by labels may not be analyzed thoroughly. Furthermore, note that the uncertainty characterized by $R_L$ is also what the other existing attribute reduction methods consider; so they have the same limitations on multilabel data as the positive region reduct. Thus it is necessary to reconsider the attribute reduction method for multilabel data.

The New Attribute Reduction Approach in Multilabel Data Decision Tables
In this section, we introduce a new attribute reduct referred to as the δ-confidence reduct and show some advantages of the δ-confidence reduct in unraveling the uncertainty of multilabel data. Moreover, the judgement theorem and discernibility matrix associated with the δ-confidence reduct are also established.

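δ-Confidence Reduct in Multilabel Decision Tables.
First, we present the following definition. For each label $l_i \in L$, the label decision set $D_i = \{x \in U : l_i(x) = 1\}$ consists of the objects associated with $l_i$ ($i = 1, 2, \ldots, m$).
Considering Convention 1 of multilabel learning, one has $\bigcup_{i=1}^{m} D_i = U$; that is, $D_1, D_2, \ldots, D_m$ form a cover of $U$. In the following, we present a particular function to characterize the uncertainty implied among labels.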
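Definition 5. Let $S = (U, A, L)$ be a multilabel decision table, let $P(L)$ be the power set of the label set $L$, and let $D_1, D_2, \ldots, D_m$ be the label decision sets. Given a subset $B \subseteq A$ and $\delta \in [0, 1]$, one defines a δ-confidence label function $\gamma_B^{\delta} : U \to P(L)$ as follows:
$$\gamma_B^{\delta}(x) = \left\{ l_i \in L : \frac{|[x]_B \cap D_i|}{|[x]_B|} \geq \delta \right\}.$$
The δ-confidence label function $\gamma_B^{\delta}(x)$ is the collection of the labels that are associated with at least a fraction $\delta$ (that is, $100\delta\%$) of the objects in $[x]_B$.
In other words, $\gamma_B^{\delta}(x)$ is the collection of the labels that are associated with each object in $[x]_B$ at a confidence level of at least $100\delta\%$.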
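The δ-confidence label function can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the data are hypothetical (two indiscernibility classes whose label proportions mirror the (1, 1/2, 0) and (0, 1, 1/2) pattern discussed in Section 3), labels are referred to by 0-based index, and the function name `delta_confidence_labels` is not from the paper.

```python
from collections import defaultdict
from fractions import Fraction

def delta_confidence_labels(cond, labels, attrs, delta):
    """Map each object to the set of label indices carried by at least a
    fraction delta of the objects in its indiscernibility class w.r.t. attrs."""
    groups = defaultdict(list)
    for x, row in cond.items():
        groups[tuple(row[a] for a in attrs)].append(x)
    result = {}
    for members in groups.values():
        n_labels = len(labels[members[0]])
        kept = frozenset(
            i for i in range(n_labels)
            if Fraction(sum(labels[x][i] for x in members), len(members)) >= delta
        )
        for x in members:
            result[x] = kept
    return result

# Hypothetical data: x1, x2 are indiscernible; x3, x4 are indiscernible.
cond = {
    "x1": {"a1": 0, "a2": 0},
    "x2": {"a1": 0, "a2": 0},
    "x3": {"a1": 1, "a2": 0},
    "x4": {"a1": 1, "a2": 0},
}
labels = {
    "x1": (1, 1, 0),
    "x2": (1, 0, 0),
    "x3": (0, 1, 1),
    "x4": (0, 1, 0),
}
delta = Fraction(3, 5)
full = delta_confidence_labels(cond, labels, ["a1", "a2"], delta)
merged = delta_confidence_labels(cond, labels, ["a2"], delta)
```

With both attributes, the first class keeps only label 0 and the second only label 1. Dropping `a1` merges the two classes, and the merged proportions (1/2, 3/4, 1/4) leave only label 1, so the label set of `x1` changes.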
It is a contradiction.
(4) It is straightforward by the definition of the δ-confidence label function.
Now we define the consistent δ-confidence set using the δ-confidence label function. Furthermore, we present the definition of the new attribute reduct.
Definition 8. Let $S = (U, A, L)$ be a multilabel decision table and $B \subseteq A$. If the δ-confidence label functions satisfy $\gamma_B^{\delta}(x) = \gamma_A^{\delta}(x)$ for all $x \in U$, one says that $B$ is a consistent δ-confidence set of $S$. If $B$ is a consistent δ-confidence set and no proper subset of $B$ is a consistent δ-confidence set, then $B$ is called a δ-confidence reduct of $S$.
A δ-confidence reduct is a minimal set of condition attributes that leaves the δ-confidence label function of every object in $U$ unchanged.
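The definition of the δ-confidence reduct can be checked directly on small tables by exhaustive search. The sketch below is a brute-force illustration, not the discernibility matrix method developed in this paper; the data, the 0-based label indices, and the names `gamma`, `is_consistent`, and `delta_reducts` are hypothetical, and the empty attribute set is assumed not to be a reduct.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

def gamma(cond, labels, attrs, delta):
    """delta-confidence label set of every object with respect to attrs."""
    groups = defaultdict(list)
    for x, row in cond.items():
        groups[tuple(row[a] for a in attrs)].append(x)
    out = {}
    for members in groups.values():
        kept = frozenset(
            i for i in range(len(labels[members[0]]))
            if Fraction(sum(labels[x][i] for x in members), len(members)) >= delta
        )
        for x in members:
            out[x] = kept
    return out

def is_consistent(cond, labels, attrs, all_attrs, delta):
    """True if attrs yields the same label sets as the full attribute set."""
    return gamma(cond, labels, attrs, delta) == gamma(cond, labels, all_attrs, delta)

def delta_reducts(cond, labels, all_attrs, delta):
    """Minimal consistent subsets, found by enumerating subsets by size."""
    found = []
    for k in range(1, len(all_attrs) + 1):
        for subset in combinations(all_attrs, k):
            if any(set(r) <= set(subset) for r in found):
                continue  # has a consistent proper subset, so not minimal
            if is_consistent(cond, labels, list(subset), all_attrs, delta):
                found.append(subset)
    return found

# Hypothetical multilabel table with three condition attributes and three labels.
cond = {
    "x1": {"a1": 0, "a2": 0, "a3": 0},
    "x2": {"a1": 0, "a2": 0, "a3": 0},
    "x3": {"a1": 1, "a2": 0, "a3": 1},
    "x4": {"a1": 1, "a2": 0, "a3": 1},
    "x5": {"a1": 1, "a2": 1, "a3": 1},
}
labels = {
    "x1": (1, 1, 0),
    "x2": (1, 0, 0),
    "x3": (0, 1, 1),
    "x4": (0, 1, 0),
    "x5": (0, 1, 1),
}
attrs = ["a1", "a2", "a3"]
reducts = delta_reducts(cond, labels, attrs, Fraction(3, 5))
```

In this toy table `a1` alone is not consistent: it merges `x3`, `x4`, and `x5`, which raises the proportion of the third label to 2/3 and so adds it to the label set of `x3`.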
Example 9 (continued from Example 6). For the multilabel decision table $S = (U, A, L)$ given by Table 1 and $\delta = 0.6$, we obtain a unique 0.6-confidence reduct. Considering Example 3, however, we know that there are two positive region reducts for the same multilabel decision table. We argue that the δ-confidence reduct is more appropriate for multilabel data than the positive region reduct. This is because the δ-confidence label function can characterize the uncertainty implied among labels more reasonably than the indiscernibility relation $R_L$.
Note that the uncertainty characterized by $R_L$ is also what the other existing attribute reduction methods consider. Therefore, for multilabel data, the δ-confidence reduct has significant advantages when compared with existing attribute reduction methods.

Discernibility Matrix of δ-Confidence Reduct.
This section provides a discernibility matrix approach [12] to obtain all δ-confidence reducts. Firstly, we present the judgement theorem of consistent δ-confidence sets.
Theorem 10 (judgement theorem of consistent δ-confidence set). Let $S = (U, A, L)$ be a multilabel decision table, $B \subseteq A$, and $\delta \in [0, 1]$. Then the following conditions are equivalent: (1) $B$ is a consistent δ-confidence set;
Therefore the label under consideration does not belong to the corresponding δ-confidence label set, which is a contradiction. Thus we conclude that $\gamma_B^{\delta}(x) = \gamma_A^{\delta}(x)$ for any $x \in U$. According to Definition 8, $B$ is a consistent δ-confidence set.
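A short calculation helps to see why comparing the label sets of classes pairwise is enough to judge consistency. The following is a sketch under assumed notation and is not the authors' proof: $X_1, \ldots, X_k$ are hypothetical names for the indiscernibility classes of the full attribute set $A$ that are merged into a single class $[x]_B$ when only $B \subseteq A$ is used, and $D_i$ is the set of objects carrying label $l_i$.

```latex
\frac{|[x]_B \cap D_i|}{|[x]_B|}
  = \sum_{j=1}^{k} \frac{|X_j|}{|[x]_B|} \cdot \frac{|X_j \cap D_i|}{|X_j|},
\qquad
\sum_{j=1}^{k} \frac{|X_j|}{|[x]_B|} = 1 .
```

The proportion of $l_i$ in the merged class is thus a weighted average of its proportions in the classes being merged. If every $X_j$ reaches the threshold $\delta$ for $l_i$, so does the average; if every $X_j$ stays below $\delta$, so does the average. Merging classes that share the same δ-confidence label set therefore leaves that set unchanged, whereas merging classes with different label sets changes it for at least one of them.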
Theorem 10 provides an approach to judge whether a subset of attributes is a consistent δ-confidence set in multilabel decision tables. Now we present a method for computing all δ-confidence reducts. First, we give the following notion.
Denote by $a(X)$ the value of attribute $a$ with respect to the objects in the equivalence class $X$. For the δ-confidence discernibility matrix defined from these values, we have the following property.
In the sequel we will omit the tilde from the notation when no confusion arises. Furthermore, according to related logical knowledge, we have the following theorem.
Theorem 15 provides a discernibility matrix based method to compute all δ-confidence reducts. The following example illustrates the validity of the approach.
Note that $\gamma_A^{0.6}(X_1) = \gamma_A^{0.6}(X_2) = \gamma_A^{0.6}(X_3)$. Therefore $(X_1, X_2) \notin \Delta_{0.6}$. We can calculate the 0.6-confidence discernibility matrix shown in Table 2.
Consequently, by Theorem 15 we derive the unique 0.6-confidence reduct, which accords with the result in Example 9.
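The discernibility matrix route can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact construction: an entry is computed only for pairs of indiscernibility classes whose δ-confidence label sets differ and holds the attributes on which the two classes take different values, and the minimal attribute sets that intersect every entry are taken as the reducts (these correspond to the terms of the minimal disjunctive normal form of the discernibility function). The data and all function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

def classes_with_labels(cond, labels, attrs, delta):
    """One (attribute-value tuple, delta-confidence label set) pair per class."""
    groups = defaultdict(list)
    for x, row in cond.items():
        groups[tuple(row[a] for a in attrs)].append(x)
    out = []
    for key, members in groups.items():
        kept = frozenset(
            i for i in range(len(labels[members[0]]))
            if Fraction(sum(labels[x][i] for x in members), len(members)) >= delta
        )
        out.append((key, kept))
    return out

def discernibility_entries(cond, labels, attrs, delta):
    """Attribute sets separating each pair of classes with different label sets."""
    entries = []
    for (ki, gi), (kj, gj) in combinations(classes_with_labels(cond, labels, attrs, delta), 2):
        if gi != gj:
            entries.append(frozenset(a for a, u, v in zip(attrs, ki, kj) if u != v))
    return entries

def reducts_from_entries(attrs, entries):
    """Minimal attribute subsets that intersect every entry (brute force)."""
    found = []
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            chosen = set(subset)
            if any(set(r) <= chosen for r in found):
                continue  # a smaller hitting set is already contained
            if all(chosen & e for e in entries):
                found.append(subset)
    return found

# Hypothetical multilabel table (not Table 1 of the paper).
cond = {
    "x1": {"a1": 0, "a2": 0, "a3": 0},
    "x2": {"a1": 0, "a2": 0, "a3": 0},
    "x3": {"a1": 1, "a2": 0, "a3": 1},
    "x4": {"a1": 1, "a2": 0, "a3": 1},
    "x5": {"a1": 1, "a2": 1, "a3": 1},
}
labels = {
    "x1": (1, 1, 0),
    "x2": (1, 0, 0),
    "x3": (0, 1, 1),
    "x4": (0, 1, 0),
    "x5": (0, 1, 1),
}
attrs = ["a1", "a2", "a3"]
entries = discernibility_entries(cond, labels, attrs, Fraction(3, 5))
reducts = reducts_from_entries(attrs, entries)
```

Here the three classes have pairwise different label sets, giving the entries {a1, a3}, {a1, a2, a3}, and {a2}. Every reduct must therefore contain `a2` together with `a1` or `a3`.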

Conclusion
The δ-confidence reduct presented in this paper is an attribute reduction method designed for multilabel decision tables. Compared with the existing attribute reduction methods, the δ-confidence reduct accurately characterizes the uncertainty implied among labels; thus it is more appropriate for multilabel data. Moreover, we proposed the corresponding discernibility matrix based method to compute the δ-confidence reduct, which is significant from both the theoretical and applied perspectives. In further research, the properties of the δ-confidence reduct and a corresponding heuristic algorithm will be considered.