Extended Tolerance Relation to Define a New Rough Set Model in Incomplete Information Systems

This paper proposes a rough set model for incomplete information systems that defines an extended tolerance relation using the frequency of attribute values in such a system. It first discusses some rough set extensions in incomplete information systems. Next, “probability of matching” is defined from the data in an information system and is then used to measure the degree of tolerance. Consequently, a rough set model is developed using a tolerance relation defined with a threshold. The paper discusses the mathematical properties of the newly developed rough set model and also introduces a method to derive reducts and the core.


Introduction
Rough set theory [1,2] was first proposed by Pawlak as a means to analyze vague descriptions of items. The original rough sets approach presupposes that all objects in an information system have precise attribute values. Problems arise when some of the values are unknown, which sometimes happens in the real world. Therefore, it is necessary to develop a theory which enables classification of objects even if only partial information is available. The rough set model proposed by Kryszkiewicz [3,4], for example, introduced indiscernibility based on a tolerance relation to deal with missing values in the information system. In this approach, a missing value is considered as a special value that may take any possible value.
However, the tolerance relation sometimes leads to poor results with respect to approximation. Stefanowski and Tsoukiàs [5,6] discussed this limitation and introduced a similarity relation to refine the results obtained with the tolerance relation approach. Wang [7] gave some examples to show that the similarity relation may result in lost information and proposed a limited tolerance relation. Yang et al. [8] also generalized a reasonable and flexible classification in incomplete information systems by a "new binary relation." In fact, there is an array of methods to handle incomplete objects [9,10]. Some approaches replace missing values with the most common value [11], while others consider "unknown" itself as a new value for the attribute and treat it in the same way as ordinary values [10]. The method of handling missing values should be chosen depending on the characteristics and requirements of the application. In general, approaches deal with unavailable values based on one of the following two interpretations [12]. The first is "lost value," in which unknown values of attributes are already lost. The similarity relation [5] is one example of this semantics. The second is "do not care," in which a missing value may potentially be replaced by any value in the domain. Such incomplete decision tables have been broadly studied [3,4]. Grzymala-Busse [13][14][15][16][17] built a characteristic relation based on both the "lost value" case and the "do not care" case.
In this paper, we study "probability of matching" and propose a new method of handling missing values in incomplete information systems based on tolerance degree. Our approach adopts the "lost value" interpretation. The approach is useful in knowledge acquisition from incomplete information systems in which some object values appear frequently and others do not.
The paper is organized as follows. Section 2 discusses the tolerance relation for dealing with incomplete information and its drawback. This section also introduces some extensions to avoid the issues of the tolerance relation. The next section, Section 3, is to find out how the frequency of attributes

Rough Set in Incomplete Tables
In this section, we discuss several rough set extensions in incomplete decision tables with their issues. An information system is defined as a pair $IS = (U, A)$, where $U$ is a nonempty finite set of objects called the universe and $A$ is a nonempty finite set of attributes. For every $a \in A$, there is a mapping from $U$ into a space, $f_a: U \to V_a$, and $V_a$ is called the value set of $a$ [1,2].
If $U$ contains at least one object with an unknown (missing or null) value, then $IS$ is called an incomplete information system; otherwise it is complete [3,4]. In incomplete information systems, objects may contain several unknown attribute values, but we do not assume the case where all objects take the unknown value for an attribute. Unknown values are denoted by the special symbol "$*$" in incomplete information systems and are supposed to be contained in the set $V_a$.
A decision table defined by $DT = (U, A \cup \{d\})$ is an information system, where $d \notin A$ is a distinguished attribute called the decision [18]. In a similar manner to information systems, a decision table is either incomplete or complete. However, all decision values are known in both complete and incomplete decision tables.
In a complete decision table, the relation $EQU_P(x, y)$, $P \subseteq A$, denotes a binary relation between objects that are equivalent in terms of the values of attributes in $P$ [1]. The equivalence relation is reflexive, symmetric, and transitive. Let $E_P(x) = \{y \in U \mid EQU_P(x, y)\}$ be the set of all objects that are equivalent to $x$ by $P$, and let it be called the equivalence class.

Tolerance Relation.
A tolerance relation $TOR_P(x, y)$, $P \subseteq A$, denotes a binary relation between objects that are possibly equivalent in terms of values of attributes. In incomplete information systems [3,4], the tolerance relation is defined by
$$TOR_P(x, y) \Leftrightarrow \forall a \in P\,\bigl(f_a(x) = f_a(y) \lor f_a(x) = * \lor f_a(y) = *\bigr), \tag{1}$$
where $\lor$ denotes disjunction and $f_a(x)$ is the value of object $x$ on attribute $a$. The relation is reflexive and symmetric but does not need to be transitive. Let $T_P(x) = \{y \in U \mid TOR_P(x, y)\}$ be the set of objects which are in relation with $x$ in terms of $P$ in the sense of the above tolerance relation. Due to the symmetric property, $x$ is also tolerant to the elements in $T_P(x)$.
Rough sets based on the tolerance relation in incomplete information systems are defined in a similar way to those in complete information systems [1]. Let $X \subseteq U$ and $P \subseteq A$. $\underline{appr}^{T}_P(X)$ is the lower approximation of $X$ in terms of $P$ if and only if $\underline{appr}^{T}_P(X) = \{x \in U \mid T_P(x) \subseteq X\}$. $\overline{appr}^{T}_P(X)$ is the upper approximation of $X$ in terms of $P$ if and only if $\overline{appr}^{T}_P(X) = \{x \in U \mid T_P(x) \cap X \neq \emptyset\}$.
Now, we illustrate the above concepts with an incomplete decision table from [10]. The decision table is shown in Table 1.
From Table 1, we can induce the approximation space for the group of people $X$ whose value of flu is no, based on all condition attributes. The approximations are quite poor. Moreover, there exist objects which intuitively could be classified in $X$, while they are not in the lower approximation. Take, for instance, object $x_6$; we have its complete description, and intuitively there is no other object perceived as very tolerant to it. However, it is not included in the lower approximation of $X$. This is due to the missing attribute values of object $x_8$, which is actually tolerant to $x_6$ according to Equation (1).
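The effect described above can be reproduced with a minimal Python sketch of the tolerance relation and its approximations. The table below is a small hypothetical one, not Table 1, and the names `tolerant`, `tolerance_class`, and `approximations` are illustrative choices. Object `u2` has a complete description but still falls outside the lower approximation, because the incomplete object `u3` is tolerant to it.

```python
MISSING = "*"

def tolerant(x, y):
    """TOR(x, y): on every attribute the values agree or at least one is missing."""
    return all(a == b or a == MISSING or b == MISSING for a, b in zip(x, y))

def tolerance_class(table, name):
    """T(x): all objects that are tolerant to the object called `name`."""
    return {other for other, row in table.items() if tolerant(table[name], row)}

def approximations(table, target):
    """Lower and upper approximation of `target` under the tolerance relation."""
    lower = {x for x in table if tolerance_class(table, x) <= target}
    upper = {x for x in table if tolerance_class(table, x) & target}
    return lower, upper

# Hypothetical table (assumption), columns: (temperature, headache, nausea).
table = {
    "u1": ("high", "yes", "no"),
    "u2": ("normal", "yes", "no"),
    "u3": ("*", "yes", "*"),
    "u4": ("normal", "no", "yes"),
}
target = {"u2", "u4"}  # hypothetical concept, e.g. objects whose decision is "no"

lower, upper = approximations(table, target)
print(sorted(lower))  # ['u4']
print(sorted(upper))  # ['u2', 'u3', 'u4']
```

Because `u3` matches almost anything, it enlarges the tolerance classes of `u1` and `u2`, which is the kind of coarse approximation the section attributes to the tolerance relation.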

Similarity Relation.
In the approach proposed by Stefanowski [5,6], it is assumed that an object $x$ can be considered as similar to another object $y$ only if all known attribute values of $x$ are the same as those of $y$. Such a relation is not necessarily symmetric: if one object has a more complete description than the other, the inverse relation does not hold. More formally, given an information system $IS = (U, A)$ and an attribute set $P \subseteq A$, the similarity relation is defined as follows:
$$SIM_P(x, y) \Leftrightarrow \forall a \in P\,\bigl(f_a(x) = * \lor f_a(x) = f_a(y)\bigr). \tag{5}$$
It is easy to observe that this relation is reflexive and transitive although not necessarily symmetric. Now for each object, we can induce two similarity sets: $S_P(x) = \{y \in U \mid SIM_P(y, x)\}$, the set of objects similar to $x$ (note that the arguments of $SIM_P$ are not $(x, y)$), and $S^{-1}_P(x) = \{y \in U \mid SIM_P(x, y)\}$, the set of objects to which $x$ is similar.
Clearly, $S_P(x)$ and $S^{-1}_P(x)$ are two different sets. We can now introduce the definitions of the approximation space of a set $X \subseteq U$ as follows: $\underline{appr}^{S}_P(X) = \{x \in U \mid S^{-1}_P(x) \subseteq X\}$ and $\overline{appr}^{S}_P(X) = \bigcup \{S_P(x) \mid x \in X\}$.
By the definitions of the similarity relation and the tolerance relation introduced in this section, the conditions under which the similarity relation holds are a subset of the conditions under which the tolerance relation holds (if $SIM_P(x, y)$, then $TOR_P(x, y)$). Hence, the tolerance classes of elements in $U$ are "wider" than the respective similarity classes [5,6].
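The asymmetry of the similarity relation can be checked with a short Python sketch. The two rows are hypothetical and only patterned on the situation described for a complete object and a less complete one. The function names are illustrative.

```python
MISSING = "*"

def similar(x, y):
    """SIM(x, y): every known value of x equals the corresponding value of y."""
    return all(a == MISSING or a == b for a, b in zip(x, y))

def similar_to(table, name):
    """Set of objects y that are similar to `name`, i.e. SIM(y, name) holds."""
    return {y for y in table if similar(table[y], table[name])}

def similar_from(table, name):
    """Set of objects y to which `name` is similar, i.e. SIM(name, y) holds."""
    return {y for y in table if similar(table[name], table[y])}

# Hypothetical rows (assumption), columns: (temperature, headache, nausea).
table = {
    "complete": ("high", "yes", "yes"),
    "partial": ("high", "*", "yes"),
}

print(similar(table["partial"], table["complete"]))  # True
print(similar(table["complete"], table["partial"]))  # False
print(sorted(similar_to(table, "complete")))         # ['complete', 'partial']
print(sorted(similar_from(table, "complete")))       # ['complete']
```

The less complete object is similar to the more complete one, but not the other way round, so the two similarity sets of the same object differ.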

Limited Tolerance Relation.
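Let us compare the attributes of $x_4$ with those of $x_5$ in Table 1. According to our intuition, $x_4$ seems similar to $x_5$ because they have the same description in temperature and nausea. However, it is actually not, though $x_5$ is similar to $x_4$ according to Equation (5). In a huge system, two objects may be considered as distinct, in terms of the similarity relation, because of a little missing information. For example, objects $x$ with $(*, 1, 2, 3, 4, 5, 6, 7, 8, 9)$ and $y$ with $(0, *, 2, 3, 4, 5, 6, 7, 8, 9)$, where the vectors are abbreviated representations of the attribute values of the objects, are tolerant according to Equation (1) and intuitively similar to each other. However, they do not satisfy the nonsymmetric similarity relation. To avoid such a problem, Wang [7] proposed the limited tolerance relation:
$$LTOR_P(x, y) \Leftrightarrow \forall a \in P\,\bigl(f_a(x) = f_a(y) = *\bigr) \lor \Bigl(\bigl(P_P(x) \cap P_P(y) \neq \emptyset\bigr) \land \forall a \in P\,\bigl((f_a(x) \neq * \land f_a(y) \neq *) \rightarrow f_a(x) = f_a(y)\bigr)\Bigr),$$
where $P_P(x) = \{a \in P \mid f_a(x) \neq *\}$ is the set of attributes in $P$ whose values are known for $x$. In the formula, the condition that $\forall a \in P\,((f_a(x) \neq * \land f_a(y) \neq *) \rightarrow f_a(x) = f_a(y))$ is equivalent to $TOR_P(x, y)$. Thus, the two objects that satisfy $TOR_P(x, y)$ but not $LTOR_P(x, y)$ are only those satisfying $P_P(x) \cap P_P(y) = \emptyset$.
Generally speaking, two objects are in the limited tolerance relation if they are in one of two cases. The first case is that all attribute values of the two objects are missing. The second is the case where there is at least one attribute having an ordinary value for both objects and the two objects have the same value for those attributes. Obviously, the limited tolerance relation is reflexive and symmetric but not necessarily transitive.
Thus, the limited tolerance class is denoted by $LT_P(x) = \{y \in U \mid LTOR_P(x, y)\}$. Based on that, the approximation space is defined as follows: $\underline{appr}^{LT}_P(X) = \{x \in U \mid LT_P(x) \subseteq X\}$ and $\overline{appr}^{LT}_P(X) = \{x \in U \mid LT_P(x) \cap X \neq \emptyset\}$. Wang [7] also proved that the tolerance relation and the similarity relation are the two extremes for extending the indiscernibility relation, and that the limited tolerance relation lies between the tolerance and similarity relations.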
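The three relations can be compared on the two ten-attribute vectors quoted in this section. The Python sketch below is illustrative only; the function names and the extra pair `u`, `v` are assumptions added to show a case where tolerance holds but limited tolerance does not.

```python
MISSING = "*"

def known(x):
    """Indices of attributes whose value is known for x."""
    return {i for i, v in enumerate(x) if v != MISSING}

def tolerant(x, y):
    """Tolerance relation: values agree or at least one is missing."""
    return all(a == b or a == MISSING or b == MISSING for a, b in zip(x, y))

def similar(x, y):
    """Similarity relation: every known value of x equals the value of y."""
    return all(a == MISSING or a == b for a, b in zip(x, y))

def limited_tolerant(x, y):
    """Limited tolerance: both fully missing, or they share known attributes and agree on them."""
    if not known(x) and not known(y):
        return True
    shared = known(x) & known(y)
    return bool(shared) and all(x[i] == y[i] for i in shared)

# The two vectors from the text.
x = ("*", 1, 2, 3, 4, 5, 6, 7, 8, 9)
y = (0, "*", 2, 3, 4, 5, 6, 7, 8, 9)
print(tolerant(x, y), limited_tolerant(x, y))  # True True
print(similar(x, y), similar(y, x))            # False False

# Hypothetical pair with no commonly known attribute (assumption).
u = ("a", "*")
v = ("*", "b")
print(tolerant(u, v), limited_tolerant(u, v))  # True False
```

The pair `u`, `v` is tolerant only because nothing can be compared, which is exactly the situation the limited tolerance relation excludes.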

Probability of Matching
"The most common attribute value of an attribute" is a method of handling missing values summarized by Grzymala-Busse [9,10]. In this method, missing values are replaced by the most common value of the attribute. In other words, a missing attribute value is replaced by the most probable known attribute value, where such probabilities are represented by the frequencies of the corresponding attribute values. This method of handling missing attribute values is implemented, for example, in the well-known machine learning algorithm CN2 [11]. Grzymala-Busse illustrated the method [10] with the example from Table 1. For case $x_1$, the value of headache is replaced by yes since in Table 1 the attribute headache has four values yes and two values no. Similarly, for case $x_3$, the value of temperature is high since the attribute temperature has the value very high once, normal twice, and high three times.
Using this notion and supposing that the value domains are known, we first define the minimum probability that each value of an attribute appears, based on the frequency in the dataset for each concept. Then, the minimum probability that two objects have the same value is defined in order to propose a tolerance relation.
The probability that a value $v \in V_a$, $v \neq *$, appears as a value of a certain object is between $|O_a(v)|/|U|$ and $\{|O_a(v)| + |O_a(*)|\}/|U|$, where $O_a(v)$ and $O_a(*)$ are the sets of objects whose value of attribute $a$ is $v$ and $*$, respectively. If $f_a(x) \neq *$, that is, the attribute value of an object $x$ is not missing, the probability that $f_a(x)$ appears is between $|O_a(f_a(x))|/|U|$ and $\{|O_a(f_a(x))| + |O_a(*)|\}/|U|$.
Let us define the probabilities $\rho_a(O_a(v))$ and $\rho_a(x)$, which are the minimum probability that a value of attribute $a$ is $v$, where $O_a(v)$ is the set of objects whose value of attribute $a$ is $v$, and the minimum probability that an attribute value $f_a(x) \neq *$ appears, respectively. The minimum probabilities are given as follows:
$$\rho_a(O_a(v)) = \frac{|O_a(v)|}{|U|}, \qquad \rho_a(x) = \frac{|O_a(f_a(x))|}{|U|}.$$
The minimum probabilities take a value in $[0, 1]$ in general, but $\rho_a(x)$ is greater than zero for any $x \in U$ with $f_a(x) \neq *$, because $x$ itself belongs to $O_a(f_a(x))$.
Now, we define the probability of matching between objects $x$ and $y$ on an attribute $a$ if one of their attribute values is missing.
Definition 1. Let $IS = (U, A)$ be an incomplete information system. Given that $a \in A$ and $x, y \in U$, if the value of either $x$ or $y$ is missing on $a$, the probability of matching between $x$ and $y$ on $a$, denoted by $\theta_a(x, y)$, is defined as the minimum estimation of the probability that $x$ and $y$ take the same value on $a$ and is given by the following equation:
$$\theta_a(x, y) = \begin{cases} \rho_a(x), & f_a(x) \neq *,\ f_a(y) = *, \\ \rho_a(y), & f_a(x) = *,\ f_a(y) \neq *, \\ \sum_{v \in V_a,\, v \neq *} \rho_a(O_a(v))^2, & f_a(x) = f_a(y) = *, \end{cases} \tag{12}$$
when $x \neq y$. Otherwise, $\theta_a(x, y) = \theta_a(x, x) = 1$.
If one of the two objects has a certain value, $f_a(x)$, for example, the least probability that $f_a(x)$ appears in attribute $a$ of $y$ is $\rho_a(x)$, assuming that the other objects with missing values on $a$ take another value $v \neq f_a(x)$. If both of them are missing, we take the sum of the joint probabilities over all values in the attribute domain under the same assumption.
It should be noted that $\rho_a(x) > 0$ and $\rho_a(y) > 0$ in this case because $x, y \in U$, and that $\rho_a(O_a(v)) > 0$ at least for a value $v \neq *$ because we do not assume the case where all objects take the unknown value for an attribute. They are also less than 1.0, because $x$ or/and $y$ takes the missing value. Thus, in the case of $x \neq y$, $0 < \theta_a(x, y) < 1$ is guaranteed. Take the attribute $a$ = temperature, for example, in Table 1. The minimum probability that the value $f_a(x_3)$ is the same as $f_a(x_1)$ is 0.375, and the minimum probability that the value $f_a(x_3)$ is the same as $f_a(x_8)$ is $0.125^2 + 0.375^2 + 0.25^2 = 0.219$.
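The two numbers in the temperature example can be recomputed with a short Python sketch. The column below is an assumed arrangement: the text only fixes that there are eight objects, that high occurs three times, normal twice, and very high once, that $x_1$ is high, and that $x_3$ and $x_8$ are missing. The placement of the remaining values and the function names are illustrative.

```python
from collections import Counter

MISSING = "*"

def min_probabilities(column):
    """Minimum probability of each known value: its count divided by the number of objects."""
    counts = Counter(v for v in column if v != MISSING)
    return {v: c / len(column) for v, c in counts.items()}

def matching_probability(column, i, j):
    """Probability of matching on one attribute when at least one of the two values is missing."""
    if i == j:
        return 1.0
    vi, vj = column[i], column[j]
    if vi != MISSING and vj != MISSING:
        raise ValueError("defined only when at least one of the two values is missing")
    rho = min_probabilities(column)
    if vi == MISSING and vj == MISSING:
        # Both missing: sum of squared minimum probabilities over all known values.
        return sum(p * p for p in rho.values())
    # Exactly one missing: minimum probability of the known value.
    return rho[vj if vi == MISSING else vi]

# Assumed temperature column for x1..x8 (index 0 is x1).
temperature = ["high", "very_high", "*", "high", "high", "normal", "normal", "*"]

print(matching_probability(temperature, 2, 0))  # x3 vs x1: 0.375
print(matching_probability(temperature, 2, 7))  # x3 vs x8: 0.21875
```

The second value rounds to the 0.219 quoted in the text.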

Extended Tolerance Relation
To define whether objects $x$ and $y$ are tolerant or not, we introduce the concept of tolerance degree between two objects by combining two relation indexes. One takes a binary value representing a binary equivalence relation defined by the attributes with a known value in both objects. The other is an index defined by the attributes with a missing value in either of the objects. It is obtained from the probability of matching, assuming that the $\theta_a(x, y)$ are independent of each other among attributes.
The limited tolerance relation was defined basically using attributes whose values are available in both $x$ and $y$. We define a binary function $\Theta_P(x, y)$ that represents that the LT relation can hold between the objects in the case of $P_P(x) \cap P_P(y) \neq \emptyset$, where $P_P(x)$ is the set of attributes in $P$ whose values are known for $x$, and utilize it. That is, $\Theta_P(x, y) = 1$ if $P_P(x) \cap P_P(y) \neq \emptyset$ and $f_a(x) = f_a(y)$ for all $a \in P_P(x) \cap P_P(y)$, and $\Theta_P(x, y) = 0$ otherwise.
When $\Theta_P(x, y) = 0$, there are two cases. One is the case where $P_P(x) \cap P_P(y) \neq \emptyset$ and there is $a \in P_P(x) \cap P_P(y)$ such that $f_a(x) \neq f_a(y)$. In this case, the tolerance degree $T^{\alpha}_P(x, y) = 0$. The other is the case where $P_P(x) \cap P_P(y) = \emptyset$. In this case, $0 < T^{\alpha}_P(x, y) < \alpha$, considering that $0 < \theta_a(x, y) < 1$ for $x \neq y$. Therefore, $\alpha$ could be understood as a value that separates two cases, referred to as (a) and (b). In order to separate the cases (a) and (b), $\alpha$ should satisfy $1 - \alpha \geq \alpha$. From the above, we have the constraint $\alpha \in (0, 0.5]$. If $\alpha < 0.5$, $T^{\alpha}_P(x, y)$ never takes a value between $\alpha$ and $1 - \alpha$. Hence, we define the tolerance degree by fixing $\alpha = 0.5$, though $T^{\alpha}_P(x, y)$ never takes the value of 0.5, as known from the conditions of (a) and (b). The tolerance degree with $\alpha = 0.5$ lets us differentiate the three cases discussed before by seeing whether the degree is greater or less than 0.5 and whether it is greater than or equal to zero. This feature might be useful, because users can control the conditions of the tolerance based on equivalence existence and probability of matching with just a threshold value. This process is discussed in the next step.
Table 3 shows the tolerance degree among objects in terms of all attributes.
In fact, we can choose another probability of matching on an attribute for (14) and (15). For example, instead of using $\theta_a(x, y)$ defined in (12), we can choose $\theta_a(x, y) = 1/|V_a|$ [5,6]. The choice might depend on the probability distribution of attribute values in each system.
The probabilistic terms in our tolerance degree look similar to those used by Stefanowski [6]. However, our approach uses probabilistic terms as pieces of evidence to derive tolerance relations. Furthermore, this term is combined with equivalence existence to define the relation. On the other hand, in the probabilistic approach proposed in [6], the authors make the a priori assumption that there exists a uniform probability distribution on every attribute domain and compute tolerance classes based on the joint probability distribution. Their aim seems to be to define approximation spaces applicable in many cases. Such tolerance classes could be used in some applications, but we believe not in most.
Now, we define the extended tolerance relation by controlling the tolerance degree with a threshold.
Definition 4. Given an incomplete information system $IS = (U, A)$, an attribute set $P \subseteq A$, and a threshold $\mu$, the extended tolerance relation is defined as follows: $ETR^{\mu}_P(x, y) \Leftrightarrow T_P(x, y) \geq \mu$, where $T_P(x, y)$ is the tolerance degree between $x$ and $y$. It is easy to observe that this relation is reflexive and symmetric but not necessarily transitive. In Table 3, if a threshold $\mu = 0.5$ is given, $x_4$ is tolerant to $x_5$ based on this relation.
By changing the threshold, we are able to get the same results as those obtained by the relations discussed in the previous sections. For example, in the case of the tolerance relation, the set of objects tolerant to $x_5$ is $\{x_4, x_5, x_8\}$ in Table 1. From Table 3, we also get $\{x_4, x_5, x_8\}$ as the set of objects tolerant to $x_5$ using the extended tolerance relation with $\mu = 0.01$. Similarly, we have the same result as the limited tolerance relation, $\{x_4, x_5\}$, if $\mu = 0.5$. Now, we can formalize these connections by the following propositions.
This proposition shows that with $\mu \to 0$, the extended tolerance relation gives the same results as the tolerance relation.
Then, it is evident that $LTOR_P(x, y) \Rightarrow ETR^{\mu}_P(x, y)$ except in the case where $P_P(x) = P_P(y) = \emptyset$, that is, where no attribute value in $P$ is known for either object. Proposition 7. Let $IS = (U, A)$ be an incomplete information system. Given that $P \subseteq A$ and $x \in U$, if $\mu = 1.0$, then the extended tolerance relation gives the same results as the equivalence relation.
It should be noted that the similarity and tolerance relations discussed in this paper are introduced to cope with incomplete information. However, we could also define those relations in complete information tables. For example, the relation "subclass-of" is a similarity relation. It is clearly transitive, but not necessarily symmetric. We can also take the relation "friend-of" as an example of a tolerance relation and examine its properties in the same way.
Definition 8. Let $IS = (U, A)$ be an incomplete information system. Given that $P \subseteq A$ and $x, y, z \in U$, if $T_P(x, y) > T_P(x, z)$, where $T_P$ is the tolerance degree, then $y$ is more tolerant to $x$ than $z$ based on the extended tolerance relation.
Property 1. Let $IS = (U, A)$ be an incomplete information system. Given that $P \subseteq A$ and $x, y, z \in U$, if $P_P(x) \cap P_P(y) \neq \emptyset$, $f_a(x) = f_a(y)$ for all $a \in P_P(x) \cap P_P(y)$, and $P_P(x) \cap P_P(z) = \emptyset$, where $P_P(x)$ is the set of attributes in $P$ whose values are known for $x$, then $y$ is more tolerant to $x$ than $z$ based on the extended tolerance relation.
Proof. Consider the following: Hence, the cardinality of the tolerance set of $x$ shall decrease if we increase the threshold to control the tolerance degree.
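The role of the threshold can be illustrated with a small Python sketch. The tolerance degrees below are hypothetical numbers, not the entries of Table 3; they are chosen only so that the classes of $x_5$ at thresholds 0.01 and 0.5 agree with the sets quoted in this section. The sketch also assumes that two objects are related when their degree is greater than or equal to the threshold.

```python
def extended_tolerance_class(degrees, x, threshold):
    """Objects whose tolerance degree with x reaches the threshold (assumed >= comparison)."""
    return {y for y, t in degrees[x].items() if t >= threshold}

# Hypothetical tolerance degrees of x5 with some objects (assumption, not Table 3).
degrees = {"x5": {"x1": 0.0, "x4": 0.75, "x5": 1.0, "x8": 0.05}}

for mu in (0.01, 0.5, 1.0):
    print(mu, sorted(extended_tolerance_class(degrees, "x5", mu)))
# 0.01 ['x4', 'x5', 'x8']
# 0.5 ['x4', 'x5']
# 1.0 ['x5']
```

Raising the threshold can only remove objects from the class, so the classes obtained for increasing thresholds are nested.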

Lower and Upper Approximations
For complete decision tables, lower and upper approximations are defined on the basis of the indiscernibility relation [1,2]. They can also be defined in different ways, for example, using set elements or concepts represented by subsets. In the case of nonequivalence relations, which need not be reflexive, symmetric, or transitive, approximation spaces defined in such different ways may lead to different results [14]. This section introduces lower and upper approximation definitions based on the singleton, subset, and concept methods, which were first studied and generalized by Grzymala-Busse [14,19].

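Singleton Definition. Let $ET^{\mu}_P(x) = \{y \in U \mid ETR^{\mu}_P(x, y)\}$ denote the extended tolerance class of $x$ and let $X \subseteq U$. Singleton lower approximation is $\{x \in U \mid ET^{\mu}_P(x) \subseteq X\}$.
Singleton upper approximation is $\{x \in U \mid ET^{\mu}_P(x) \cap X \neq \emptyset\}$. In the example shown in Table 1, given the threshold $\mu = 0.6$ and $P = A$, the approximations are obtained from the tolerance degrees in Table 3.
Subset Definition. Subset lower approximation is $\bigcup \{ET^{\mu}_P(x) \mid x \in U,\ ET^{\mu}_P(x) \subseteq X\}$. Subset upper approximation is $\bigcup \{ET^{\mu}_P(x) \mid x \in U,\ ET^{\mu}_P(x) \cap X \neq \emptyset\}$.
Concept Definition. Concept lower approximation is $\bigcup \{ET^{\mu}_P(x) \mid x \in X,\ ET^{\mu}_P(x) \subseteq X\}$. Concept upper approximation is $\bigcup \{ET^{\mu}_P(x) \mid x \in X,\ ET^{\mu}_P(x) \cap X \neq \emptyset\}$.
The difference between the subset and concept definitions may be missed easily. In the subset definition, the extended tolerance classes of all elements in the universal set are examined, while only elements in $X$ are examined in the case of the concept definition.
Obviously, the singleton lower and upper approximations of $X$ are subsets of the subset lower and upper approximations of $X$, respectively. The subset lower approximation is the same set as the concept lower approximation. The concept upper approximation, however, is a subset of the subset upper approximation.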
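The inclusions stated above can be checked on a small example. The Python sketch below uses the usual singleton, subset, and concept constructions over a mapping from each object to its class. The classes and the target set are hypothetical, and reflexive classes are assumed.

```python
def singleton(classes, target):
    """Collect the objects whose class is inside, or meets, the target."""
    lower = {x for x, k in classes.items() if k <= target}
    upper = {x for x, k in classes.items() if k & target}
    return lower, upper

def subset(classes, target):
    """Union of classes of all objects in the universe that are inside, or meet, the target."""
    lower = set().union(*(k for k in classes.values() if k <= target))
    upper = set().union(*(k for k in classes.values() if k & target))
    return lower, upper

def concept(classes, target):
    """Same unions, but only classes of objects that belong to the target are examined."""
    lower = set().union(*(classes[x] for x in target if classes[x] <= target))
    upper = set().union(*(classes[x] for x in target if classes[x] & target))
    return lower, upper

# Hypothetical reflexive classes (assumption) and target set.
classes = {
    "u1": {"u1", "u3"},
    "u2": {"u2", "u3"},
    "u3": {"u1", "u2", "u3"},
    "u4": {"u4"},
}
target = {"u2", "u4"}

s_low, s_up = singleton(classes, target)
b_low, b_up = subset(classes, target)
c_low, c_up = concept(classes, target)
print(sorted(s_low), sorted(s_up))  # ['u4'] ['u2', 'u3', 'u4']
print(sorted(b_low), sorted(b_up))  # ['u4'] ['u1', 'u2', 'u3', 'u4']
print(sorted(c_low), sorted(c_up))  # ['u4'] ['u2', 'u3', 'u4']
```

In this example the subset upper approximation is strictly larger than the concept upper approximation, because the class of `u3`, which lies outside the target, is also collected.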
Rough set approximations could be generalized with some other approaches [20][21][22]. Actually, the above three definitions are classified as constructive rough set formulations by Yao [20], where rough set formulations are divided into two different groups: constructive and algebraic methods. The notion of the singleton definition is indeed the same as the element based definition suggested by Yao. Meanwhile, the subset definition is an expansion of the concept definition and is also the same as the granule based definition in Yao's study. These definitions are special cases of the subsystem based definition by Yao when the covering is the set of equivalence, similarity, or tolerance classes.

Properties of Approximations.
Approximation spaces defined based on the extended tolerance relation have some properties suggested by Pawlak [1,2] as well as other properties. We discuss them in detail below.
Property 3. Let $IS = (U, A)$ be an incomplete information system, $X, Y \subseteq U$, and $P, Q \subseteq A$. Table 4 shows which properties of the original rough set model are satisfied with the singleton, subset, and concept definitions.
These properties within our approach can be proved in the same way as those in the study of Grzymala-Busse and Rzasa [19] and in Pawlak [2]. Approximation spaces of those definition methods, in general, do not have properties 7a-7d. However, they are likely to satisfy the weaker versions of 7a-7d, which are defined by Yao [21].
Besides this, our tolerance relation is controlled by the threshold of tolerance degree. Thus, new properties for the threshold can be introduced as shown below.
Property 4. Let $IS = (U, A)$ be an incomplete information system, $X \subseteq U$, and $P \subseteq A$. The following properties shall hold for arbitrary lower approximation appr

Reducts and Core
The concept of reducts and core was introduced by Pawlak [2] for complete information systems. In this section, we propose a method to derive reducts and the core for incomplete information systems based on the extended tolerance relation.
A subset of conditional attributes $P \subseteq A$ is a reduct of an incomplete information system if the tolerance classes induced by $P$ are the same as the tolerance classes induced by all attributes in set $A$ and no attribute can be removed from set $P$ without changing the tolerance classes.
Definition 9. The comparison Boolean function between two relations in terms of two attribute sets $P, Q \subseteq A$ in an incomplete information system is defined as $\sigma_{\mu}(P, Q) = 1$ if the extended tolerance classes of every $x \in U$ in terms of $P$ and in terms of $Q$ are the same, and $\sigma_{\mu}(P, Q) = 0$ otherwise. If $\sigma_{\mu}(P, Q) = 1$, the two relations developed from the two different attribute sets $P, Q$ make the same tolerance classes.
where $f_d(y)$ is the decision value of an object $y$.
Definition 11. The comparison Boolean function between two relations in terms of attribute sets $P, Q \subseteq A$ in an incomplete decision table is denoted by $\delta_{\mu}(P, Q)$.
Proposition 14. A subset $P \subseteq A$ is a reduct of the incomplete information system (or decision table) with threshold $\mu$ if and only if (i) $\sigma_{\mu}(P, A) = 1$ for incomplete information systems (or $\delta_{\mu}(P, A) = 1$ for decision tables); (ii) for all $a \in P$, $\sigma_{\mu}(P - \{a\}, A) = 0$ for incomplete information systems (or $\forall a \in P$, $\delta_{\mu}(P - \{a\}, A) = 0$ for decision tables).
Proof. Following the definition of reducts stated at the beginning of Section 6, $P \subseteq A$ is a reduct if and only if (i) the tolerance classes induced by $P$ are the same as the tolerance classes induced by all attributes in set $A$; (ii) no attribute can be removed from set $P$ without changing the tolerance classes.
In the example shown in Table 1, given the threshold $\mu = 0.6$, using the method of deriving the core, all attributes, that is, temperature, headache, and nausea, are indispensable. Hence, in this system, {temperature, headache, nausea} is the core. The core also happens to be the only reduct of this incomplete information system with $\mu = 0.6$.
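The two conditions for a reduct can be turned into a brute-force search. The Python sketch below is only an illustration under stated assumptions: it uses the plain tolerance relation of Section 2 as a stand-in for the extended tolerance relation, the table is hypothetical, and the names `classes`, `is_reduct`, `reducts`, and `core` are illustrative. Swapping in another relation only requires changing `classes`.

```python
from itertools import combinations

MISSING = "*"

def classes(table, attrs):
    """Tolerance class of every object, using only the attributes in `attrs`."""
    def tol(x, y):
        return all(x[a] == y[a] or MISSING in (x[a], y[a]) for a in attrs)
    return {u: frozenset(v for v in table if tol(table[u], table[v])) for u in table}

def is_reduct(table, subset, all_attrs):
    """(i) same classes as the full attribute set; (ii) dropping any attribute changes them."""
    full = classes(table, all_attrs)
    if classes(table, subset) != full:
        return False
    return all(classes(table, [b for b in subset if b != a]) != full for a in subset)

def reducts(table, all_attrs):
    """Enumerate all attribute subsets and keep the reducts (exponential, for small tables only)."""
    return [set(c) for r in range(1, len(all_attrs) + 1)
            for c in combinations(all_attrs, r) if is_reduct(table, c, all_attrs)]

def core(table, all_attrs):
    """Attributes whose removal from the full set changes the classes."""
    full = classes(table, all_attrs)
    return {a for a in all_attrs
            if classes(table, [b for b in all_attrs if b != a]) != full}

# Hypothetical table (assumption): t = temperature, h = headache, n = nausea.
table = {
    "u1": {"t": "high", "h": "yes", "n": "no"},
    "u2": {"t": "high", "h": "yes", "n": "yes"},
    "u3": {"t": "normal", "h": "yes", "n": "*"},
    "u4": {"t": "normal", "h": "yes", "n": "yes"},
}
attrs = ["t", "h", "n"]

print(reducts(table, attrs))       # one reduct, containing 't' and 'n'
print(sorted(core(table, attrs)))  # ['n', 't']
```

In this hypothetical table the attribute `h` is constant, so it can be dropped without changing any class, while `t` and `n` are both indispensable.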

Conclusion
This paper studies rough set theory for incomplete information systems and establishes a new model based on tolerance degree, called the extended tolerance relation based rough set model. The frequency of attribute values appearing in the decision table is used to estimate the probability of matching among data items on an attribute. Then, the tolerance degree is calculated based on the existence of equivalence on some attributes and the probability of matching. Given a threshold to control the tolerance degree, a tolerance relation is defined.
The approach is an extension of some rough set models and could solve the problem existing in the tolerance relation of Kryszkiewicz. By adjusting the threshold, we are able to get the same results as the tolerance, limited tolerance, and equivalence relations. The variable threshold also gives us a means to widen or narrow the boundary region between lower and upper approximations. Various lower and upper approximations are obtained using the approach, and the user can choose a threshold that suits his/her requirements.
The paper also discussed the mathematical properties of the extended tolerance relation based rough set model and proposed a method to derive reducts and the core.
Further research includes finding an algorithm to collect rules within the approach discussed. That is a significant application of rough set theory in knowledge acquisition from data.

Table 1: An example of dataset with missing values.

Table 2: Minimum probabilities of attribute values.

From Table 2, we can see that in the information system in Table 1, the value high of temperature occurs more frequently than the other values. The most frequent values of headache and nausea happen to be "yes."

Table 3: Tolerance degree among objects in terms of all attributes.

Table 4: Properties of approximations based on the three definitions. $\underline{appr}$ and $\overline{appr}$ are lower and upper approximations that can be defined by singleton, subset, and concept methods. $\sim X$ denotes the complementary set of $X$. "✓" indicates that the property is satisfied.

In the incomplete decision table, the function $\partial_P: U \to \mathcal{P}(V_d)$, where $P \subseteq A$ and $\mathcal{P}(V_d)$ is the power set of $V_d$, is defined as

Proposition 12. The attribute $a \in A$ is indispensable in $A$ if and only if $\sigma_{\mu}(A - \{a\}, A) = 0$ for incomplete information systems and $\delta_{\mu}(A - \{a\}, A) = 0$ for incomplete decision tables, where $\sigma_{\mu}$ and $\delta_{\mu}$ are the comparison Boolean functions for information systems and decision tables, respectively. This proposition applies to both incomplete information systems and incomplete decision tables. Consider that $\sigma_{\mu}(A - \{a\}, A) = 0$ or $\delta_{\mu}(A - \{a\}, A) = 0$ means that if $a$ is removed from $A$, the tolerance classes based on the extended tolerance relation in terms of $A - \{a\}$ are different from the tolerance classes based on $A$. Hence, $a$ is indispensable in $A$.